Transferring data between your Amazon Elastic Compute Cloud (EC2) instances and Amazon Simple Storage Service (S3) is a fundamental operation for most organizations. Whether you’re archiving application logs, backing up important data, or moving processed files to long-term storage, knowing the most efficient and secure way to perform this copy is crucial. This post will walk you through the primary method: using the AWS Command Line Interface (CLI) with an IAM Role.
- 1. Prerequisites: Secure Access with IAM Roles
- 2. AWS CLI s3 cp command
- 3. Using s3 sync for Advanced Transfer Options
- 4. Performance & Optimization Best Practices
1. Prerequisites: Secure Access with IAM Roles
The most secure way to grant your EC2 instance access to S3 is by attaching an IAM Role to the instance. This can be a one-time setup, or you can attach the role temporarily just for the copy and delete it afterwards if it’s no longer needed. See how to add an IAM role to an EC2 instance here – https://awscat.com/granting-s3-access-to-an-ec2-instance-with-iam-roles/
This IAM role provides the temporary credentials your EC2 instance needs to interact with S3.
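Once the role is attached, you can verify from the instance that the CLI is picking up those temporary credentials before attempting a copy. A quick sanity check (this sketch uses the example bucket name from later in this post, and the ls step assumes the role also allows listing the bucket):
# Should print the ARN of the attached instance role
aws sts get-caller-identity
# Should list the target bucket if the role's S3 permissions are in place
aws s3 ls s3://target-bucket-name/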
2. AWS CLI s3 cp command
The AWS CLI is the most common way to perform file transfers from an EC2 instance, and typically the fastest: it automatically uses multipart uploads to handle large files efficiently.
Installation
Most Amazon Linux AMIs come with the AWS CLI pre-installed, but if yours doesn’t, you’ll need to install it. For Linux distributions, this typically involves using the package manager (apt or yum) or a Python installer (pip).
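For example, on a 64-bit (x86_64) Linux instance you can install AWS CLI v2 with the bundled installer that AWS publishes:
# Download and run the AWS CLI v2 bundled installer
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
# Verify the installation
aws --version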
Copying Files and Folders
The primary command for copying data is aws s3 cp.
A. Copying a Single File
To copy a single file from your local EC2 filesystem to an S3 bucket:
aws s3 cp /path/to/local/file.csv s3://target-bucket-name/targetfolder/file.csv
- Source: /path/to/local/file.csv (the local file path on the EC2 instance)
- Destination: s3://target-bucket-name/targetfolder/file.csv (the S3 URI)
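Once the copy finishes, you can confirm the object arrived by listing the destination prefix:
aws s3 ls s3://target-bucket-name/targetfolder/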
B. Copying an Entire Folder (Recursive)
To copy an entire directory and all its contents (including subdirectories) to an S3 bucket, you must use the --recursive flag:
aws s3 cp /path/to/local/data/ s3://your-target-bucket-name/backup/ --recursive
This is the command you’ll use most often for backups or archiving. The /path/to/local/data/ structure will be mirrored under the s3://your-target-bucket-name/backup/ prefix.
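The cp command also accepts --exclude and --include filters (applied in the order given), so you can copy just part of a directory tree. As a sketch using the same example paths, this uploads only the .log files:
# Exclude everything, then re-include only files matching *.log
aws s3 cp /path/to/local/data/ s3://your-target-bucket-name/backup/ --recursive --exclude "*" --include "*.log"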
3. Using s3 sync for Advanced Transfer Options
For synchronizing a directory between your EC2 instance and S3, the aws s3 sync command is even more efficient than cp --recursive. The sync command recursively copies new and updated files from the source to the destination. It will not re-copy files that already exist at the destination with the same size and modification time, making it excellent for incremental backups.
aws s3 sync /path/to/local/app-data/ s3://your-target-bucket-name/app-backups/
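Two sync flags worth knowing: --dryrun previews what would be transferred without copying anything, and --delete removes objects from the destination that no longer exist in the source (use it with care). For example, with the same paths as above:
# Preview the changes without transferring anything
aws s3 sync /path/to/local/app-data/ s3://your-target-bucket-name/app-backups/ --dryrun
# Mirror the directory exactly, deleting remote objects that were removed locally
aws s3 sync /path/to/local/app-data/ s3://your-target-bucket-name/app-backups/ --delete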
4. Performance & Optimization Best Practices
To ensure your file transfers are as fast as possible, consider these best practices:
- Colocation (Same Region): Always ensure your EC2 instance and your target S3 bucket are in the same AWS Region. This provides the best network performance and, critically, avoids data transfer costs between the EC2 instance and S3.
- Use s3 sync for Incremental Transfers: If you are repeatedly copying the contents of a directory, use aws s3 sync to transfer only the files that have changed, saving time and potentially money.
- VPC Endpoint for S3: If your EC2 instance is in a private subnet, configure a VPC Gateway Endpoint for S3. This routes S3 traffic privately within the AWS network, improving security and often performance, and it avoids the need for a NAT Gateway for this specific traffic.
- Tuning the AWS CLI: For very high-throughput transfers of many files, you can adjust CLI settings like max_concurrent_requests and multipart_chunksize in your AWS configuration to optimize transfer speeds, though the defaults are often sufficient (see the sketch after this list).
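As a sketch of that tuning, these settings live in the CLI’s s3 configuration section and can be changed with aws configure set (the values below are illustrative, not recommendations). The last command checks the bucket’s Region for the colocation tip above:
# Raise parallelism and chunk size for many-file, high-throughput transfers
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 16MB
# Confirm the bucket lives in the same Region as the instance
aws s3api get-bucket-location --bucket your-target-bucket-name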
By following the steps above, you can easily copy files from an EC2 instance to S3 using either the cp or sync command!