Amazon Simple Storage Service (Amazon S3) is storage for the Internet. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.

Its scale is almost infinite: a considerable number of websites rely on Amazon S3, as do many AWS services. It is therefore an essential building block of AWS.



Buckets and Objects

In Amazon S3, objects (files) are stored in buckets (directories).

Bucket

  • A Bucket must have a globally unique name across all of AWS, even though it is created in a specific Region
  • It follows a naming convention:
    • No uppercase letters
    • No underscores
    • Must not be formatted as an IP address
    • Must start with a lowercase letter or a number

Object

  • Represents the content of a file
  • It has a maximum size of 5 TB (uploads larger than 5 GB must use multi-part upload)
  • You can attach metadata, tags and a version ID to it

  • Objects are accessible by their Key
  • A Key is composed of a prefix and the object name:
    • Prefix: company/department/
    • Object Name: users.json
    • Key: company/department/users.json
  • For a Bucket named referential, the object will be accessed via the URL:
    • s3://referential/company/department/users.json

Even though there is no real notion of directories in S3, naming prefixes with "/" separators makes it possible to simulate a tree structure.
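
As a minimal sketch (assuming the boto3 SDK and reusing the example bucket and key above purely for illustration), this is how an object is written and read back by its Key, and how a prefix can be listed as if it were a directory:

```python
import boto3

s3 = boto3.client("s3")

# Write the object: the Key carries the "directory-like" prefix
s3.put_object(
    Bucket="referential",
    Key="company/department/users.json",
    Body=b'{"users": []}',
)

# Read it back by its Key
obj = s3.get_object(Bucket="referential", Key="company/department/users.json")
print(obj["Body"].read())

# List everything "under" the prefix, as if it were a directory
listing = s3.list_objects_v2(Bucket="referential", Prefix="company/department/")
for item in listing.get("Contents", []):
    print(item["Key"])
```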



Storage Classes

There are several categories of S3 storage (S3 Classes) that should be used depending on your use case:

  • Amazon S3 Standard:
    • General purpose
    • Highly durable data: 99.999999999% (eleven 9s), i.e. if you store 10 million objects you can statistically expect to lose one object every 10,000 years
    • 99.99% availability over 1 year (four 9s)
    • Resilient to AZ disaster (can sustain 2 concurrent facility failures)
  • Amazon S3 Standard-Infrequent Access (Standard-IA):
    • For less frequently accessed data:
      • Backups
      • Disaster Recovery
    • Highly durable data (eleven 9s, like S3 Standard)
    • 99.9% availability over 1 year (three 9s)
    • Resilient to AZ disaster (can sustain 2 concurrent facility failures)
    • Cheaper than S3 Standard
  • Amazon S3 One Zone-Infrequent Access:
    • For less frequently accessed data whose loss would be acceptable:
      • Secondary backups
      • Data that can be recreated
    • Highly durable data (eleven 9s) BUT stored in a single AZ (risk of data loss if that AZ is destroyed)
    • 99.5% availability over 1 year
    • Cheaper than S3 Standard-IA
  • Amazon S3 Intelligent-Tiering:
    • Same low latency and high throughput as S3 Standard
    • Automatically moves objects between two tiers (e.g. between S3 Standard and S3 Standard-IA) based on access patterns
    • Highly durable data (eleven 9s)
    • 99.9% availability over 1 year
    • Resilient to AZ disaster (can sustain 2 concurrent facility failures)
    • Additional cost due to the monitoring required
  • Amazon Glacier:
    • For long-term data retention (minimum 90 days, typically up to 10 years or more) that does not require immediate access:
      • Archives or backups
    • Very low storage cost BUT retrievals are billed
    • Retrieval options:
      • Expedited: 1 to 5 minutes
      • Standard: 3 to 5 hours
      • Bulk: 5 to 12 hours
    • Highly durable data (eleven 9s)
    • Maximum archive size of 40 TB
    • Storage is organized in Vaults
  • Amazon Glacier Deep Archive:
    • Like Amazon Glacier
    • For long-term data retention (minimum 180 days, typically up to 10 years or more) that does not require quick access
    • Retrieval options:
      • Standard: 12 hours
      • Bulk: 48 hours

Transition and Life Cycle

It is possible to create rules so that data is automatically migrated to a cheaper, more suitable storage class or even deleted:

  • Transition: moves objects to less expensive storage after some time
  • Expiration: deletes an object after some time
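
For example, here is a hedged boto3 sketch of such rules (the rule ID, prefix, delays and target classes are arbitrary examples): objects under company/ move to Standard-IA after 30 days, to Glacier after 90 days, and expire after one year.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="referential",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",       # example rule name
                "Status": "Enabled",
                "Filter": {"Prefix": "company/"},  # example prefix
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # Transition rules
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},       # Expiration rule
            }
        ]
    },
)
```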

Note

  • The supported transitions are constrained, and in particular it is not possible to move an object from Amazon Glacier back to S3 Standard directly
  • The ultimate goal is to reduce storage costs

Versioning

To implement the version management of objects, you must first enable versioning at the bucket level.

  • The version ID is generated by Amazon S3
  • Deleting an object then becomes a soft delete: the object is marked with a Delete Marker. It no longer appears in the list of objects, but it still exists along with its previous versions.
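
A minimal boto3 sketch, reusing the example bucket from earlier, to enable versioning and observe the behaviour described above:

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning at the Bucket level
s3.put_bucket_versioning(
    Bucket="referential",
    VersioningConfiguration={"Status": "Enabled"},
)

# Every subsequent write now receives a version ID generated by S3
resp = s3.put_object(Bucket="referential", Key="company/department/users.json", Body=b"v2")
print(resp["VersionId"])

# A simple delete only adds a Delete Marker; older versions remain listable
s3.delete_object(Bucket="referential", Key="company/department/users.json")
versions = s3.list_object_versions(Bucket="referential", Prefix="company/department/")
print(versions.get("DeleteMarkers", []), versions.get("Versions", []))
```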

Replication

It is possible to replicate an S3 Bucket between 2 Regions (Cross Region Replication, CRR) or within the same Region (Same Region Replication, SRR); a configuration sketch is given after the notes below:

  • Versioning must be enabled on the 2 Buckets
  • They can belong to 2 different accounts
  • Permissions are managed by an IAM Role
  • Replication is asynchronous but fast

Possible use cases are:

  • For CRR: regulatory compliance, lower-latency access from another Region, replication across AWS accounts
  • For SRR: data aggregation (e.g. logs), live replication between environments (e.g. production and test)

Good to know

  • Once enabled, replication only applies to objects created or modified afterwards
  • An option allows deletions to be replicated (only the Delete Markers, not permanent version deletions)
  • There is no chaining: objects replicated into a destination Bucket are not replicated again to a further Bucket
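
To make the requirements above concrete, here is a hedged boto3 sketch; the destination bucket and the IAM Role ARN are hypothetical placeholders, and both buckets must already have versioning enabled:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="referential",
    ReplicationConfiguration={
        # Hypothetical IAM Role granting S3 the replication permissions
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "crr-to-replica",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix = replicate the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                # Hypothetical destination bucket, typically in another Region for CRR
                "Destination": {"Bucket": "arn:aws:s3:::referential-replica"},
            }
        ],
    },
)
```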

Encryption

Methods

There are 4 methods of encrypting objects in S3 (a code sketch follows the list):

  • SSE-S3:
    • Key managed by AWS
    • Server Side Encryption (SSE)
    • Algorithm AES-256
    • Enabled by passing the header "x-amz-server-side-encryption": "AES256" when uploading the object
    • Can use HTTP or HTTPS
  • SSE-KMS:
    • Uses KMS (Key Management Service) to manage the key
    • Server Side Encryption (SSE)
    • Enabled by passing the header "x-amz-server-side-encryption": "aws:kms" when uploading the object
    • Uses the Customer Master Key defined in KMS for encryption
    • Can use HTTP or HTTPS
  • SSE-C:
    • Allows you to provide your own key (but it is up to you to store it)
    • Server Side Encryption (SSE), but the key is never stored in AWS!
    • Enabled by passing the key in the request headers, both when uploading the object and when reading it
    • Uses only the HTTPS protocol (to protect the key)
  • Client-Side Encryption:
    • Encryption of objects is the responsibility of the client
    • Client-Side Encryption (CSE)
    • Encryption / decryption is done on the client side, before upload and after download
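
As a sketch of the two server-side options above with boto3 (the SDK sets the x-amz-server-side-encryption header on your behalf; the KMS key alias is a hypothetical example):

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: key managed by AWS, AES-256 algorithm
s3.put_object(
    Bucket="referential",
    Key="company/department/users.json",
    Body=b"{}",
    ServerSideEncryption="AES256",
)

# SSE-KMS: key managed in KMS (hypothetical key alias)
s3.put_object(
    Bucket="referential",
    Key="company/department/secrets.json",
    Body=b"{}",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-app-key",
)
```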

Forcing encryption

There are 2 ways to force encryption of an Object in its Bucket:

  • Force encryption with an S3 Bucket Policy that only accepts PUT requests carrying an encryption header, and otherwise refuses the request (see the policy sketch after this list)
  • Enable Default Encryption on a Bucket:
    • If the object is sent with an encryption method in the request, it will be applied
    • If the object is sent without an encryption method, it will be encrypted with the default encryption method
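
A hedged sketch of both approaches with boto3: a Bucket Policy that denies any PutObject request missing the encryption header (the statement Sid is arbitrary), followed by Default Encryption with SSE-S3:

```python
import json
import boto3

s3 = boto3.client("s3")

# 1. Bucket Policy refusing unencrypted uploads
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedUploads",  # arbitrary statement name
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::referential/*",
            # Deny when the x-amz-server-side-encryption header is absent
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }
    ],
}
s3.put_bucket_policy(Bucket="referential", Policy=json.dumps(policy))

# 2. Default Encryption (SSE-S3 here), applied when no method is supplied in the request
s3.put_bucket_encryption(
    Bucket="referential",
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)
```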

To be noted

  1. The Default Encryption option therefore ensures that objects are always encrypted but does not guarantee the encryption method
  2. Bucket Policy will always be evaluated before Default Encryption

Encryption In Transit only encrypts an object with SSL/TLS while it is transferred to/from AWS. It does not encrypt the object at rest in its bucket.


Security

Access Management

Access to S3 is managed at different levels:

  • User:
    • IAM Policy: defines which S3 API calls are allowed for each IAM user
  • Resource:
    • Bucket Policy:
      • S3 Bucket Policy:
        • Configuration in JSON format
        • Allows you to configure public access to a Bucket, force the encryption of objects, or grant access to another account (Cross-Account)
      • Block Public Access:
        • Blocks public access to a Bucket
        • Prevents leakage of data stored in a Bucket
    • Object Access Control List: ACL for each object
    • Bucket Access Control List: ACL at each bucket level

Pre-signed URL

A Pre-signed URL allows you to generate a URL that is valid for a limited time (1 hour by default) so that a user can download or upload a file in a Bucket:

  • It can be generated with AWS CLI or SDK
  • The user of the Pre-signed URL inherits the same rights (GET / PUT) as the one who created it

Use case

  1. Generation of unique, temporary download URLs
  2. Generation of temporary URLs to upload into specific Bucket locations
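
A minimal boto3 sketch covering both use cases (bucket and key names are illustrative; expirations are given in seconds):

```python
import boto3

s3 = boto3.client("s3")

# 1. Temporary download URL (valid 1 hour, the default)
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "referential", "Key": "company/department/users.json"},
    ExpiresIn=3600,
)

# 2. Temporary upload URL targeting a specific Bucket location (valid 15 minutes)
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "referential", "Key": "uploads/report.csv"},
    ExpiresIn=900,
)

print(download_url)
print(upload_url)
```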

Others

  • Networking:
    • Supports VPC Endpoints (so that EC2 instances without Internet access can reach S3 privately)
  • MFA for deletion:
    • Must be enabled under Root Account with the following AWS CLI command: aws s3api put-bucket-versioning --bucket <bucket-name> --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "<mfa-device-arn> <mfa-code>"
    • Reserved for the owner of the Bucket, requires a Multi Factor Authentication (MFA) token to delete a versioned Object or delete the versioning of a Bucket

Logging and Audit

Logging Bucket

It is possible to log all access to an S3 Bucket in another S3 Bucket:

  • This bucket is called a Logging Bucket
  • All accesses, authorized or not, are logged with detailed information about the client that made the request (S3 access Log Format)
  • It will then be possible to analyse these requests (see Athena below)
  • S3 APIs calls can be logged into AWS CloudTrail

Warning

You should never configure the Logging Bucket to be the same as the monitored Bucket, otherwise you will create an endless logging loop and see your AWS bill explode!
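
A hedged boto3 sketch, assuming a separate, hypothetical referential-access-logs bucket that already grants the S3 log delivery service permission to write into it:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="referential",  # the monitored Bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            # Hypothetical Logging Bucket: must NOT be the monitored Bucket itself
            "TargetBucket": "referential-access-logs",
            "TargetPrefix": "referential/",
        }
    },
)
```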

Audit with Athena

Athena is a service that allows you to run analytical queries directly on S3 objects (without loading them into a database first):

  • It uses the SQL language
  • It provides JDBC and ODBC drivers, which allow it to be interfaced with other BI software, for example
  • It supports many formats:
    • files: CSV, TSV, delimited, JSON
    • related to Hadoop: ORC, Apache Avro, Parquet
    • log files: Logstash, AWS CloudTrail, Apache WebServer
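
As a hedged illustration with boto3, assuming a hypothetical s3_access_logs table has already been created over the Logging Bucket (for instance with a CREATE EXTERNAL TABLE statement) and a hypothetical result bucket:

```python
import boto3

athena = boto3.client("athena")

# Find requests that were denied (HTTP 403) in the hypothetical access-log table
response = athena.start_query_execution(
    QueryString=(
        "SELECT remoteip, requesturi, httpstatus "
        "FROM s3_access_logs WHERE httpstatus = '403'"
    ),
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://referential-athena-results/"},
)
print(response["QueryExecutionId"])
```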

S3 Website

  • S3 can host a website's static content
  • Static website hosting must be enabled on the bucket
  • The access URL takes one of the following forms (depending on the Region):
    • <bucket>.s3-website.<region>.amazonaws.com
    • <bucket>.s3-website-<region>.amazonaws.com
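
A minimal boto3 sketch to enable static website hosting on the example bucket (index.html and error.html are assumed to exist in the bucket):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="referential",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
# The site is then served at <bucket>.s3-website.<region>.amazonaws.com
# or <bucket>.s3-website-<region>.amazonaws.com depending on the Region
```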

S3 CORS (Cross-Origin Resource Sharing)

  • A website that references resources hosted in an S3 Bucket may require a CORS configuration on that Bucket
  • The origin of the calling website must be authorized in the Access-Control-Allow-Origin HTTP header, which is controlled by the Bucket's CORS rules
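
A hedged boto3 sketch allowing a hypothetical website origin to read objects from the example bucket:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="referential",
    CORSConfiguration={
        "CORSRules": [
            {
                # Hypothetical website origin allowed to fetch resources
                "AllowedOrigins": ["https://www.example.com"],
                "AllowedMethods": ["GET"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)
```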

Written by

Jean-Jerome Levy

DevOps Consultant

Seasoned professional in the field of information technology, I bring over 20 years of experience from working within major corporate IT departments. My diverse expertise has played a pivotal role in a myriad of projects, marked by the implementation of innovative DevOps practices.