Amazon S3 Features & Lab Using Python

My first exposure to AWS was a few years ago, when I needed to securely store some infrequently accessed files in the cloud in the most cost-effective way possible. The obvious solution, looking back on it now, was to store them in an S3 Glacier bucket, which is exactly what I did. Recently I've taken it upon myself to explore various AWS services, and this week I decided to get hands-on and dig into S3. I hope you enjoy what I've learned and written down below, as I give a brief description of some Amazon S3 features and then walk you through uploading files to an S3 bucket with a simple Python script.

Storage Classes

S3 offers industry-leading scalability, data availability, security, and performance. The storage containers in S3 are called buckets, and they have virtually unlimited storage capacity. By default, each AWS account can have up to 100 buckets (a soft limit you can ask AWS to raise). The items you store in a bucket are called objects and are stored as key-value pairs. AWS provides different storage classes (Figure 1) depending on your needs. In the example I gave in the introduction above, the best option for my infrequently accessed files would have been an S3 Glacier Deep Archive bucket.

[Figure 1: S3 storage classes]
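
To make this concrete, here is a minimal boto3 sketch of uploading a file straight into a specific storage class; the bucket and file names are placeholders I've made up, and DEEP_ARCHIVE selects S3 Glacier Deep Archive.

import boto3

s3 = boto3.client("s3")

# Upload a file directly into the S3 Glacier Deep Archive storage class.
# The bucket and file names here are placeholders.
s3.upload_file(
    "old-records.csv",
    "my-archive-bucket",
    "archive/old-records.csv",
    ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},
)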

Durability

Durability on S3 is measured by the likelihood of losing data. All S3 storage classes are designed for 99.999999999% durability (also referred to as 11 nines of durability). In other words, on average you could expect to lose about one object per year for every 100 billion objects you store on S3.
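
As a quick sanity check on that arithmetic (plain Python, no AWS calls involved):

durability = 0.99999999999           # 11 nines
objects_stored = 100_000_000_000     # 100 billion objects
expected_annual_loss = objects_stored * (1 - durability)
print(expected_annual_loss)          # ~1.0 object lost per year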

Versioning & Replication

The versioning feature gives you the ability to keep multiple versions of an object in the same bucket, which lets you recover from accidental deletions or overwrites. For example, you might have a file with a list of customer prices that changes from time to time. With versioning enabled, if you make an incorrect change to the original file and upload it, both versions remain in the bucket and you can restore the older version at any time.
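
If you'd rather switch versioning on from code than from the console, a minimal boto3 sketch looks like this; the bucket name is a placeholder.

import boto3

s3 = boto3.client("s3")

# Enable versioning on an existing bucket (the name is a placeholder).
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)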

S3 Replication automatically copies objects from one bucket to another. S3 lets you replicate across different regions (Cross-Region Replication) or within the same region (Same-Region Replication). Keep in mind that S3 Versioning must be enabled on both buckets in order to use replication.
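
Configuring replication from code is more involved because S3 needs an IAM role it can assume to copy the objects. The sketch below assumes that role and both (versioned) buckets already exist; the account ID, role name, and bucket names are placeholders.

import boto3

s3 = boto3.client("s3")

# Replicate all objects from a versioned source bucket to a destination bucket.
# The role ARN, account ID, and bucket names are placeholders.
s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/my-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
            }
        ],
    },
)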

Interesting Use Cases

S3 isn't only for storing infrequently accessed files. You can use it for some other surprising things as well, such as hosting a static website (see Figure 2). Your S3 static website could include a customer contact form in which users submit their contact info; each submission is dropped into the bucket as an object, and from there other AWS services can process the contact-info objects.

[Figure 2: Hosting a static website on S3]
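
Static website hosting can also be enabled from code. Here is a minimal boto3 sketch, where the bucket and document names are placeholders:

import boto3

s3 = boto3.client("s3")

# Turn on static website hosting for a bucket (names are placeholders).
s3.put_bucket_website(
    Bucket="my-website-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)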

Exercise: Upload Objects to an S3 Bucket Using Python

If you want to follow along, this exercise was done on Windows 10 using the Windows command prompt. I've also uploaded all the Python code written in this exercise to GitHub here.

1. Prerequisites

If you do not have the boto3 AWS SDK package installed, open your command prompt and install it by running the following command:

pip install boto3        

2. Setting Up a Bucket

First, log in to AWS and navigate to the S3 service. Once there, click the Create bucket button on the right-hand side of the screen. You'll be taken to a screen like the one in Figure 3 below. As pointed out in step 1 of the figure, name your bucket. Next, de-select the Block all public access check-box and check the box acknowledging the warning. (Strictly speaking, this setting governs anonymous public access; your Python script will authenticate with IAM credentials, so you can leave it enabled if you prefer.) Finally, click the Create bucket button at the bottom-right of your screen.

[Figure 3: The Create bucket screen]
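
If you prefer to create the bucket from code rather than the console, a rough boto3 equivalent is below; the bucket name and region are placeholders. Note that for us-east-1 you must omit the CreateBucketConfiguration argument entirely.

import boto3

s3 = boto3.client("s3", region_name="us-west-2")

# Create the bucket (the name and region are placeholders).
s3.create_bucket(
    Bucket="my-s3-upload-bucket-01",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)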


3. Create the User Group and User

It is considered best practice not to attach permissions policies directly to individual users whenever possible. The alternative is to attach the policies to a user group and then add each user to the group that carries the required permissions.

Navigate to the IAM service and select the User groups panel under the Access management drop-down. Then click the Create group button. You will now see a creation screen like the one in Figure 4 below. Name your user group S3-Users, filter for and select the AmazonS3FullAccess permissions policy, and create your group.

[Figure 4: The Create user group screen]
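
The same group setup can be scripted with boto3's IAM client. The policy ARN below is the AWS-managed AmazonS3FullAccess policy; the rest mirrors the console steps.

import boto3

iam = boto3.client("iam")

# Create the group and attach the AWS-managed S3 full-access policy to it.
iam.create_group(GroupName="S3-Users")
iam.attach_group_policy(
    GroupName="S3-Users",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)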

After creating your group, go to the Users panel under the Access management drop-down menu on the left. Then click the Add users button. You should see a panel like the one in Figure 5 below. Name your user S3User01 and select the Access key - Programmatic access check-box. Programmatic access will let you work with the AWS APIs using the keys that will be provided to your user.

[Figure 5: The Add user screen]

Click the Next button and you should see the permissions configuration screen in Figure 6 below. In this panel, add S3User01 to your S3-Users group by selecting the group under the Add user to group section. You can leave the remaining options at their defaults and click Next until the user is created.

[Figure 6: The permissions configuration screen]

You have now created a user and assigned it to a group with programmatic access permissions for S3. Once the user is created you will see a screen like the one in Figure 7 below. Make sure to securely store the Access key ID and Secret access key which AWS provides; the secret key is only shown once.
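
For reference, the console steps in this section map roughly onto three IAM API calls; a minimal boto3 sketch:

import boto3

iam = boto3.client("iam")

# Create the user, add it to the group, and generate programmatic access keys.
iam.create_user(UserName="S3User01")
iam.add_user_to_group(GroupName="S3-Users", UserName="S3User01")
response = iam.create_access_key(UserName="S3User01")

# The secret access key is returned only once, so store it securely now.
print(response["AccessKey"]["AccessKeyId"])
print(response["AccessKey"]["SecretAccessKey"])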

4. Creating the Upload Python Script

The Python script you'll be using is available on GitHub here. The script searches an upload directory for CSV files and uploads them to the bucket you've created. First, create a directory named s3_upload anywhere on your local machine; this is where you will store your code. Inside it, create another folder named upload. Also inside the s3_upload directory, create a Python file named secrets.py (Figure 8). This file will store the Access key ID and Secret access key which you created for your user in the previous step. Remember not to share your Secret access key.

[Figure 8: secrets.py]
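
Since the figure isn't reproduced here, below is a minimal sketch of what secrets.py might contain; the variable names are my assumption and simply need to match whatever my_bucket_upload.py imports.

# secrets.py -- never commit or share this file.
# The variable names are an assumption; match them to your import statement.
ACCESS_KEY_ID = "YOUR-ACCESS-KEY-ID"
SECRET_ACCESS_KEY = "YOUR-SECRET-ACCESS-KEY"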

Create a Python file named my_bucket_upload.py, as shown in Figure 9 and sketched after the list below. There are a few important things to note in the code:

  • On line 1, you import the boto3 AWS SDK package for Python
  • On line 3, you import the access keys from your secrets.py file
  • On line 9, make sure to replace the bucket name with the one you created
  • On line 10, the script creates a csv-dir folder in your bucket to hold the uploaded CSV files
  • On line 14, the script explicitly looks for files with the .csv extension in the upload directory

[Figure 9: my_bucket_upload.py]
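
Since Figure 9 isn't reproduced here either, here is a minimal sketch of what the script might look like, laid out so the line references above still hold. The bucket name and the variable names imported from secrets.py are assumptions; the actual script is in the GitHub repository linked earlier.

import boto3
import os
from secrets import ACCESS_KEY_ID, SECRET_ACCESS_KEY

s3 = boto3.client("s3",
                  aws_access_key_id=ACCESS_KEY_ID,
                  aws_secret_access_key=SECRET_ACCESS_KEY)

bucket_name = "my-s3-upload-bucket-01"  # replace with your bucket name
bucket_folder = "csv-dir/"              # folder created inside the bucket
upload_dir = "upload"                   # local directory to scan for CSVs

for file_name in os.listdir(upload_dir):
    if file_name.endswith(".csv"):      # only CSV files are uploaded
        print(f"Pushing {file_name} to S3...")
        s3.upload_file(os.path.join(upload_dir, file_name),
                       bucket_name,
                       bucket_folder + file_name)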

5. Run Your Script

On the command line, make sure you're in the s3_upload directory, then run your Python script by executing the following command:

python my_bucket_upload.py        

You will see a message on the command line saying your CSV files are being pushed to S3. Now navigate to your bucket in AWS: the csv-dir folder will be in your bucket, and any CSV files you uploaded will be inside it.

Takeaway

Now that I've explained S3's storage benefits and walked you through a Python exercise uploading files to S3, I hope you've started thinking of ways you could leverage S3 to your benefit. Amazon S3 provides nearly unlimited storage; last year AWS announced that S3 stores over 100 trillion objects (you can access the blog post here). That's mind-blowing to think about. What percentage of the current S3 objects do you own?

Feel free to connect with me on LinkedIn. 

Thanks for reading,

Eric
