Amazon Simple Storage Service (Amazon S3) is storage for the Internet. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.

Its scale is almost infinite: a considerable number of websites rely on Amazon S3, as do many AWS services. It is therefore an essential building block of AWS.



Buckets and Objects

In Amazon S3, objects (files) are stored in buckets (directories).

Bucket

  • A Bucket must have a globally unique name across all of AWS, even though it is created in a specific Region
  • It follows a naming convention:
    • No uppercase letters
    • No underscores
    • Must not be formatted as an IP address
    • Must start with a lowercase letter or a number

Object

  • Represents the content of a file
  • It has a maximum size of 5 TB (uploads larger than 5 GB must use multi-part upload)
  • You can attach metadata, tags and a version ID to it

  • Objects are accessible by their Key
  • A Key is composed of a prefix and the object name:
    • Prefix: company/department/
    • Object Name: users.json
    • Key: company/department/users.json
  • For a Bucket named referential, the object will be accessed via the URL:
    • s3://referential/company/department/users.json

Even though there is no real notion of directories in S3, naming prefixes with "/" separators makes it possible to simulate a tree structure.
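
As a minimal sketch (assuming the boto3 SDK and reusing the example bucket and key above purely for illustration), this is how an object is written and read back by its Key, and how a prefix can be listed as if it were a directory:

```python
import boto3

s3 = boto3.client("s3")

# Write the object: the Key carries the "directory-like" prefix
s3.put_object(
    Bucket="referential",
    Key="company/department/users.json",
    Body=b'{"users": []}',
)

# Read it back by its Key
obj = s3.get_object(Bucket="referential", Key="company/department/users.json")
print(obj["Body"].read())

# List everything "under" the prefix, as if it were a directory
listing = s3.list_objects_v2(Bucket="referential", Prefix="company/department/")
for item in listing.get("Contents", []):
    print(item["Key"])
```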



Storage Classes

There are several categories of S3 storage (S3 Classes) that should be used depending on your use case:

  • Amazon S3 Standard:
    • General purpose
    • Highly durable data: 99.999999999% (eleven 9s), i.e. if you store 10 million objects you can statistically expect to lose one object every 10,000 years
    • 99.99% availability over 1 year (four 9s)
    • Resilient to AZ disaster (can sustain 2 concurrent facility failures)
  • Amazon S3 Standard-Infrequent Access (Standard-IA):
    • For less frequently accessed data:
      • Backups
      • Disaster Recovery
    • Highly durable data (eleven 9s, like S3 Standard)
    • 99.9% availability over 1 year (three 9s)
    • Resilient to AZ disaster (can sustain 2 concurrent facility failures)
    • Cheaper than S3 Standard
  • Amazon S3 One Zone-Infrequent Access:
    • For less frequently accessed data whose loss would be acceptable:
      • Secondary backups
      • Data that can be recreated
    • Highly durable data (eleven 9s) BUT stored in a single AZ (risk of data loss if that AZ is destroyed)
    • 99.5% availability over 1 year
    • Cheaper than S3 Standard-IA
  • Amazon S3 Intelligent-Tiering:
    • Same low latency and high throughput as S3 Standard
    • Automatically moves objects between two tiers (e.g. between S3 Standard and S3 Standard-IA) based on access patterns
    • Highly durable data (eleven 9s)
    • 99.9% availability over 1 year
    • Resilient to AZ disaster (can sustain 2 concurrent facility failures)
    • Additional cost due to the monitoring required
  • Amazon Glacier:
    • For long-term data retention (minimum 90 days, typically up to 10 years or more) that does not require immediate access:
      • Archives or backups
    • Very low storage cost BUT retrievals are billed
    • Retrieval options:
      • Expedited: 1 to 5 minutes
      • Standard: 3 to 5 hours
      • Bulk: 5 to 12 hours
    • Highly durable data (eleven 9s)
    • Maximum archive size of 40 TB
    • Storage is organized in Vaults
  • Amazon Glacier Deep Archive:
    • Like Amazon Glacier
    • For long-term data retention (minimum 180 days, typically up to 10 years or more) that does not require quick access
    • Retrieval options:
      • Standard: 12 hours
      • Bulk: 48 hours

Transition and Life Cycle

It is possible to create rules so that data is automatically migrated to a cheaper, more suitable storage class or even deleted:

  • Transition: moves objects to less expensive storage after some time
  • Expiration: deletes an object after some time
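
For example, here is a hedged boto3 sketch of such rules (the rule ID, prefix, delays and target classes are arbitrary examples): objects under company/ move to Standard-IA after 30 days, to Glacier after 90 days, and expire after one year.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="referential",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",       # example rule name
                "Status": "Enabled",
                "Filter": {"Prefix": "company/"},  # example prefix
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # Transition rules
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},       # Expiration rule
            }
        ]
    },
)
```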

Note

  • The supported transitions are constrained, and in particular it is not possible to move an object from Amazon Glacier back to S3 Standard directly
  • The ultimate goal is to reduce storage costs

Versioning

To implement the version management of objects, you must first enable versioning at the bucket level.

  • The version ID is generated by Amazon S3
  • Deleting an object then becomes a soft delete: the object is marked with a Delete Marker. It no longer appears in the list of objects, but it still exists along with its previous versions.
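
A minimal boto3 sketch, reusing the example bucket from earlier, to enable versioning and observe the behaviour described above:

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning at the Bucket level
s3.put_bucket_versioning(
    Bucket="referential",
    VersioningConfiguration={"Status": "Enabled"},
)

# Every subsequent write now receives a version ID generated by S3
resp = s3.put_object(Bucket="referential", Key="company/department/users.json", Body=b"v2")
print(resp["VersionId"])

# A simple delete only adds a Delete Marker; older versions remain listable
s3.delete_object(Bucket="referential", Key="company/department/users.json")
versions = s3.list_object_versions(Bucket="referential", Prefix="company/department/")
print(versions.get("DeleteMarkers", []), versions.get("Versions", []))
```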

Replication

It is possible to replicate an S3 Bucket between 2 Regions (Cross Region Replication, CRR) or within the same Region (Same Region Replication, SRR); a configuration sketch is given after the notes below:

  • Versioning must be enabled on the 2 Buckets
  • They can belong to 2 different accounts
  • Permissions are managed by an IAM Role
  • Replication is asynchronous but fast

Possible use cases are:

  • For CRR: regulatory compliance, lower-latency access from another Region, replication across AWS accounts
  • For SRR: data aggregation (e.g. logs), live replication between environments (e.g. production and test)

Good to know

  • Once enabled, replication only applies to objects created or modified afterwards
  • An option allows deletions to be replicated (only the Delete Markers, not permanent version deletions)
  • There is no chaining: objects replicated into a destination Bucket are not replicated again to a further Bucket
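
To make the requirements above concrete, here is a hedged boto3 sketch; the destination bucket and the IAM Role ARN are hypothetical placeholders, and both buckets must already have versioning enabled:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="referential",
    ReplicationConfiguration={
        # Hypothetical IAM Role granting S3 the replication permissions
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "crr-to-replica",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix = replicate the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                # Hypothetical destination bucket, typically in another Region for CRR
                "Destination": {"Bucket": "arn:aws:s3:::referential-replica"},
            }
        ],
    },
)
```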

Encryption

Methods

There are 4 methods of encrypting objects in S3 (a code sketch follows the list):

  • SSE-S3:
    • Key managed by AWS
    • Server Side Encryption (SSE)
    • Algorithm AES-256
    • Enabled by passing the header "x-amz-server-side-encryption": "AES256" when uploading the object
    • Can use HTTP or HTTPS
  • SSE-KMS:
    • Uses KMS (Key Management Service) to manage the key
    • Server Side Encryption (SSE)
    • Enabled by passing the header "x-amz-server-side-encryption": "aws:kms" when uploading the object
    • Uses the Customer Master Key defined in KMS for encryption
    • Can use HTTP or HTTPS
  • SSE-C:
    • Allows you to provide your own key (but it is up to you to store it)
    • Server Side Encryption (SSE), but the key is never stored in AWS!
    • Enabled by passing the key in the request headers, both when uploading the object and when reading it
    • Uses only the HTTPS protocol (to protect the key)
  • Client-Side Encryption:
    • Encryption of objects is the responsibility of the client
    • Client-Side Encryption (CSE)
    • Encryption / decryption is done on the client side, before upload and after download
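
As a sketch of the two server-side options above with boto3 (the SDK sets the x-amz-server-side-encryption header on your behalf; the KMS key alias is a hypothetical example):

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: key managed by AWS, AES-256 algorithm
s3.put_object(
    Bucket="referential",
    Key="company/department/users.json",
    Body=b"{}",
    ServerSideEncryption="AES256",
)

# SSE-KMS: key managed in KMS (hypothetical key alias)
s3.put_object(
    Bucket="referential",
    Key="company/department/secrets.json",
    Body=b"{}",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-app-key",
)
```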

Forcing encryption

There are 2 ways to force encryption of an Object in its Bucket:

  • Force encryption with an S3 Bucket Policy that only accepts PUT requests carrying an encryption header, and otherwise refuses the request (see the policy sketch after this list)
  • Enable Default Encryption on a Bucket:
    • If the object is sent with an encryption method in the request, it will be applied
    • If the object is sent without an encryption method, it will be encrypted with the default encryption method
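
A hedged sketch of both approaches with boto3: a Bucket Policy that denies any PutObject request missing the encryption header (the statement Sid is arbitrary), followed by Default Encryption with SSE-S3:

```python
import json
import boto3

s3 = boto3.client("s3")

# 1. Bucket Policy refusing unencrypted uploads
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedUploads",  # arbitrary statement name
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::referential/*",
            # Deny when the x-amz-server-side-encryption header is absent
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }
    ],
}
s3.put_bucket_policy(Bucket="referential", Policy=json.dumps(policy))

# 2. Default Encryption (SSE-S3 here), applied when no method is supplied in the request
s3.put_bucket_encryption(
    Bucket="referential",
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)
```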

To be noted

  1. The Default Encryption option therefore ensures that objects are always encrypted but does not guarantee the encryption method
  2. Bucket Policy will always be evaluated before Default Encryption

Encryption In Transit only encrypts an object with SSL/TLS while it is transferred to/from AWS. It does not encrypt the object at rest in its bucket.


Security

Access Management

Access to S3 is managed at different levels:

  • User:
    • IAM Policy: defines which S3 API calls are allowed for each IAM user
  • Resource:
    • Bucket Policy:
      • S3 Bucket Policy:
        • Configuration in JSON format
        • Allows you to configure public access to a Bucket, force the encryption of objects, or grant access to another account (Cross-Account)
      • Block Public Access:
        • Blocks public access to a Bucket
        • Prevents leakage of data stored in a Bucket
    • Object Access Control List: ACL for each object
    • Bucket Access Control List: ACL at each bucket level

Pre-signed URL

A Pre-signed URL allows you to generate a URL that is valid for a limited time (1 hour by default) so that a user can download or upload a file in a Bucket:

  • It can be generated with AWS CLI or SDK
  • The user of the Pre-signed URL inherits the same rights (GET / PUT) as the one who created it

Use case

  1. Generation of unique, temporary download URLs
  2. Generation of temporary URLs to upload into specific Bucket locations
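
A minimal boto3 sketch covering both use cases (bucket and key names are illustrative; expirations are given in seconds):

```python
import boto3

s3 = boto3.client("s3")

# 1. Temporary download URL (valid 1 hour, the default)
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "referential", "Key": "company/department/users.json"},
    ExpiresIn=3600,
)

# 2. Temporary upload URL targeting a specific Bucket location (valid 15 minutes)
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "referential", "Key": "uploads/report.csv"},
    ExpiresIn=900,
)

print(download_url)
print(upload_url)
```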

Others

  • Networking:
    • Supports VPC Endpoints (so that EC2 instances without Internet access can reach S3 privately)
  • MFA for deletion:
    • Must be enabled under Root Account with the following AWS CLI command: aws s3api put-bucket-versioning --bucket <bucket-name> --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "<mfa-device-arn> <mfa-code>"
    • Reserved for the owner of the Bucket, requires a Multi Factor Authentication (MFA) token to delete a versioned Object or delete the versioning of a Bucket

Logging and Audit

Logging Bucket

It is possible to log all access to an S3 Bucket in another S3 Bucket:

  • This bucket is called a Logging Bucket
  • All accesses, authorized or not, are logged with detailed information about the client that made the request (S3 access Log Format)
  • It will then be possible to analyse these requests (see Athena below)
  • S3 APIs calls can be logged into AWS CloudTrail

Warning

You should never configure the Logging Bucket to be the same as the monitored Bucket, otherwise you will create an endless logging loop and see your AWS bill explode!
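
A hedged boto3 sketch, assuming a separate, hypothetical referential-access-logs bucket that already grants the S3 log delivery service permission to write into it:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="referential",  # the monitored Bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            # Hypothetical Logging Bucket: must NOT be the monitored Bucket itself
            "TargetBucket": "referential-access-logs",
            "TargetPrefix": "referential/",
        }
    },
)
```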

Audit with Athena

Athena is a service that allows you to run analytical queries directly on S3 objects (without loading them into a database first):

  • It uses the SQL language
  • It provides JDBC and ODBC drivers, which allow it to be interfaced with other BI software, for example
  • It supports many formats:
    • files: CSV, TSV, delimited, JSON
    • related to Hadoop: ORC, Apache Avro, Parquet
    • log files: Logstash, AWS CloudTrail, Apache WebServer
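
As a hedged illustration with boto3, assuming a hypothetical s3_access_logs table has already been created over the Logging Bucket (for instance with a CREATE EXTERNAL TABLE statement) and a hypothetical result bucket:

```python
import boto3

athena = boto3.client("athena")

# Find requests that were denied (HTTP 403) in the hypothetical access-log table
response = athena.start_query_execution(
    QueryString=(
        "SELECT remoteip, requesturi, httpstatus "
        "FROM s3_access_logs WHERE httpstatus = '403'"
    ),
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://referential-athena-results/"},
)
print(response["QueryExecutionId"])
```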

S3 Website

  • S3 can host a website's static content
  • Static website hosting must be enabled on the bucket
  • The access URL takes one of the following forms (depending on the Region):
    • <bucket>.s3-website.<region>.amazonaws.com
    • <bucket>.s3-website-<region>.amazonaws.com
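
A minimal boto3 sketch to enable static website hosting on the example bucket (index.html and error.html are assumed to exist in the bucket):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="referential",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
# The site is then served at <bucket>.s3-website.<region>.amazonaws.com
# or <bucket>.s3-website-<region>.amazonaws.com depending on the Region
```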

S3 CORS (Cross-Origin Resource Sharing)

  • A website that references resources hosted in an S3 Bucket may require a CORS configuration on that Bucket
  • The origin of the calling website must be authorized in the Access-Control-Allow-Origin HTTP header, which is controlled by the Bucket's CORS rules
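
A hedged boto3 sketch allowing a hypothetical website origin to read objects from the example bucket:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="referential",
    CORSConfiguration={
        "CORSRules": [
            {
                # Hypothetical website origin allowed to fetch resources
                "AllowedOrigins": ["https://www.example.com"],
                "AllowedMethods": ["GET"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)
```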

Written by

Jean-Jerome Levy

DevOps Consultant

Seasoned professional in the field of information technology, I bring over 20 years of experience from working within major corporate IT departments. My diverse expertise has played a pivotal role in a myriad of projects, marked by the implementation of innovative DevOps practices.