AWS service of the week – Introduction
Hello there and welcome, I’m Rob Harding, and I have the awesome role of being the AWS Solution Architect here at Wirehive.
I thought it would be great to write a series of blogs on a range of AWS services that I’ve found interesting when delivering projects and talk about the business value they can provide. This week I’m going to kick off with the excellent S3 or Simple Storage Service to give it its full title, and talk through what it is, and the brilliant features it can provide you!
AWS's own definition states that:
“S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance”
That’s fantastic, you may say. Or you may be thinking: what even is object storage? So, let’s start with the fundamentals.
S3 has the concept of a bucket, and this bucket is your storage area for as many objects as you want (objects can be up to 5TB each, and a single PUT upload up to 5GB), with no limit on the total amount of data a bucket stores.
Objects are pieces of data which could really be anything, such as media files, logging data, or even HTML files to host a static website! I’ve seen S3 used for many different use cases because of this versatility in the types of data that can be stored.
One important point to note is that there are scenarios where object storage is not the right solution, such as storing relational database data, but as you might expect AWS have that covered in another suite of data services I’ll talk about in a future blog!
Great, so now I have a bucket and I’ve put some objects (files) into it, how is it accessed?
Bucket names in S3 need to be globally unique, so that a bucket’s endpoint does not conflict with any other bucket name across all of AWS. We can communicate with this endpoint using several methods, such as REST API requests, AWS CLI commands, or the web console. This means the technical barrier to entry is reduced because of the different access methods.
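To make the endpoint idea concrete, here is a small sketch of how a virtual-hosted-style S3 object URL is composed from the bucket name, region, and object key. The bucket name and key below are made up for illustration, and actually fetching the object would still require the request to be signed or the object to be publicly readable.

```python
def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style URL for an S3 object.

    Illustrative only: the bucket name and key here are placeholders,
    and real access still needs valid permissions.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


# Example: an object in a (hypothetical) bucket in the London region.
print(s3_object_url("my-example-bucket", "eu-west-2", "reports/summary.pdf"))
```

Because the bucket name is baked into the hostname like this, you can see why two buckets anywhere in AWS can never share a name.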
Now where is my data stored and how is it kept backed up?
You decide which AWS region you would like your bucket to be stored in, and S3 will make sure copies of each object are stored across at least 3 AZs (Availability Zones). Because of this, S3 is designed for an industry-leading 99.999999999% (11 9’s) of data durability.
To put that number into perspective, if you stored 10,000,000 objects on S3, it is expected that you will incur a loss of 1 object every 10,000 years!
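The arithmetic behind that perspective can be sketched in a few lines, using only the figures from the text above (treating the 11 9’s durability design target as an annual figure):

```python
# Figures come straight from the text: an 11 9's durability design
# target and 10,000,000 stored objects.
durability = 0.99999999999          # 99.999999999% (11 9's)
objects_stored = 10_000_000

# Expected object losses per year = objects * annual loss probability.
expected_losses_per_year = objects_stored * (1 - durability)

# Invert to get the expected number of years between single-object losses.
years_per_single_loss = 1 / expected_losses_per_year

print(round(years_per_single_loss))  # roughly 10,000 years per lost object
```

Ten million objects times a one-in-a-hundred-billion annual loss chance works out to one ten-thousandth of an object per year, hence one expected loss every 10,000 years.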
I can get my data, and I know it’s backed up, but I’m really concerned about security: who can access my buckets/objects?
By default, S3 buckets are locked down, and the onus is on you to grant explicit access to users, IP addresses, HTTP referer headers, or other AWS accounts/services. This can all be achieved by creating a bucket policy setting out the actions you allow (e.g. read files, upload files) and the resources you want to grant access to (all objects, or just some of them).
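As a minimal sketch, here is what such a bucket policy can look like, built as a Python dict and serialised to the JSON that S3 expects. The bucket name, statement ID, and IP range are placeholders, not values from a real account:

```python
import json

# Sketch of a bucket policy allowing reads only from one office IP range.
# "my-example-bucket" and the 203.0.113.0/24 CIDR are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadFromOfficeIP",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],          # the action: read files
            "Resource": "arn:aws:s3:::my-example-bucket/*",  # all objects
            "Condition": {
                "IpAddress": {"aws:SourceIp": "203.0.113.0/24"}
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```

The `Action` and `Resource` fields map directly onto the "actions and resources" idea above; swapping `s3:GetObject` for `s3:PutObject` would grant uploads instead of reads.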
A word to the wise: don’t do what I did once and create a bucket policy so restrictive that you lock everyone out; if that happens, only the root user has access to back out the policy changes.
I can also choose to have my objects encrypted at rest by activating server-side encryption, which can use the AES-256 cipher to scramble an object’s original content before it is stored on the S3 infrastructure.
What about any other features of S3 that I should know about?
This is a feature-rich object storage solution, so there are many, but a few stand out. The first is lifecycle management of objects in buckets, where after a set period of time I can have an action performed on objects automatically. For example, I’m unlikely to need CloudTrail log objects more than 90 days after they were uploaded, so I can have them deleted, which will save money on my data storage charges.
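The 90-day CloudTrail example above could be expressed as a lifecycle configuration along these lines. The rule ID and key prefix are placeholders; the overall shape matches the JSON that S3 lifecycle configurations use:

```python
import json

# Sketch of a lifecycle rule that expires (deletes) CloudTrail log
# objects 90 days after upload. The ID and prefix are placeholders.
lifecycle_config = {
    "Rules": [
        {
            "ID": "expire-old-cloudtrail-logs",
            "Filter": {"Prefix": "AWSLogs/"},   # only objects under this prefix
            "Status": "Enabled",
            "Expiration": {"Days": 90},         # delete 90 days after upload
        }
    ]
}

print(json.dumps(lifecycle_config, indent=2))
```

Rules like this can also transition objects to cheaper storage classes at one age and delete them at another, all from the same configuration.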
Another great feature is versioning of objects. If enabled, S3 will not write over an existing object with the same name, but will instead assign a new version ID, so you will have multiple versions of that object from different points in time. This is great for critical files in production environments, such as a Terraform state file.
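The behaviour is easy to picture with a toy model in plain Python: each write under the same key keeps the earlier versions rather than overwriting them. Note that real S3 version IDs are opaque strings, not the sequential integers used here:

```python
from collections import defaultdict
from itertools import count


class VersionedBucket:
    """Toy model of S3 versioning: puts never overwrite, they append.

    Real S3 version IDs are opaque strings; sequential ints are used
    here purely for illustration.
    """

    def __init__(self):
        self._versions = defaultdict(list)  # key -> [(version_id, body), ...]
        self._ids = count(1)

    def put(self, key, body):
        version_id = next(self._ids)
        self._versions[key].append((version_id, body))
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions[key]
        if version_id is None:
            return versions[-1][1]          # latest version wins by default
        return dict(versions)[version_id]   # or fetch a specific version


bucket = VersionedBucket()
v1 = bucket.put("terraform.tfstate", "state-after-apply-1")
v2 = bucket.put("terraform.tfstate", "state-after-apply-2")

print(bucket.get("terraform.tfstate"))      # the latest write
print(bucket.get("terraform.tfstate", v1))  # the earlier state is still there
```

That last line is exactly why versioning is such a safety net for something like a Terraform state file: a bad write is recoverable rather than fatal.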
The final feature to mention today will be on storage classes.
S3 has several different storage classes to choose from, each with its own benefits, costs, and SLA.
If I have some replaceable, secondary backup data objects that I’m not too concerned about and won’t access much, I may want to use the S3 One Zone-IA class, which will not replicate the data across multiple AZs but will cost less; if I lose some files, would it really matter?
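A rough side-by-side of the classes mentioned in this series can be summarised in a small lookup table. The class names below are the identifiers S3 uses; the redundancy figures follow the text above (Standard keeps copies in at least 3 AZs, One Zone-IA in a single AZ), while the use-case notes are my own shorthand:

```python
# Rough comparison of a few S3 storage classes. Class names are the
# real S3 identifiers; "use_case" notes are this author's shorthand.
storage_classes = {
    "STANDARD":   {"min_azs": 3, "use_case": "frequently accessed data"},
    "ONEZONE_IA": {"min_azs": 1, "use_case": "replaceable, rarely accessed data"},
}


def suggest_class(replaceable: bool) -> str:
    """Naive rule of thumb matching the reasoning in the paragraph above."""
    return "ONEZONE_IA" if replaceable else "STANDARD"


print(suggest_class(replaceable=True))
print(suggest_class(replaceable=False))
```

The single-AZ entry makes the trade-off explicit: One Zone-IA is cheaper precisely because losing that one AZ means losing the data, which is only acceptable for data you can recreate.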