Boto3: Amazon S3 as Python Object Store
Use Amazon Simple Storage Service (S3) as an object store to manage Python data structures.
Introduction
Amazon S3 is extensively used as a file storage system to store and share files across the internet. It is a simple key-value store that can hold objects of any type, created in any programming language, such as Java, JavaScript, Python, etc. Amazon DynamoDB, for example, recommends using S3 to store items larger than 400 KB. This article focuses on using S3 as an object store from Python.
Prerequisites
Boto3 is the official AWS SDK for accessing AWS services from Python code. Please ensure Boto3 and awscli are installed on the system.
$pip install boto3
$pip install awscli
Also, configure the AWS credentials using the "aws configure" command, or set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to hold your keys. Please DO NOT hard-code your AWS keys inside your Python program.
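If you choose the environment variable route, a minimal sketch (the values shown are placeholders for your own keys):
$export AWS_ACCESS_KEY_ID=<your-access-key-id>
$export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>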
To configure the AWS credentials, first install awscli and then use the "aws configure" command to set it up. For more details, refer to AWS CLI Setup and Boto3 Credentials.
Configure the AWS credentials using the following command:
$aws configure
Do a quick check to ensure you can reach AWS.
$aws s3 ls
The above command must list the S3 buckets created in your AWS account. The account is selected based on the credentials configured. If multiple AWS accounts are configured, use the "--profile" option in the AWS CLI; if you don't mention the "--profile" option, the CLI uses the profile named "default".
Use the below commands to configure a development profile named "dev" and validate the settings.
$aws configure --profile dev
$aws s3 ls --profile dev
The above command shows the S3 buckets present in the account that belongs to the "dev" profile.
Connecting to S3
Connecting to Default Account (Profile)
The client() API connects to the specified service in AWS. The below code snippet connects to S3 using the default profile credentials and lists all the S3 buckets.
import boto3
s3 = boto3.client('s3')
buckets = s3.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['CreationDate'].ctime(), bucket['Name'])
Connecting to Specific Account (Profile)
To connect to a specific account, first create a session using the Session() API. The Session() API allows you to mention the profile name and region, and it also allows you to specify the AWS credentials directly.
The below code snippet connects to an AWS account configured using the "dev" profile and lists all the S3 buckets.
import boto3
session = boto3.Session(profile_name="dev", region_name="us-west-2")
s3 = session.client('s3')
buckets = s3.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['CreationDate'].ctime(), bucket['Name'])
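The Session() API also accepts credentials directly. A minimal sketch that reads them from the environment rather than hard-coding them (the variable names follow the AWS convention mentioned above):
import os
import boto3
#Build a session from explicit credentials read from the environment
session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
    region_name="us-west-2")
s3 = session.client('s3')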
Storing and Retrieving a Python LIST
Boto3 supports the put_object() and get_object() APIs to store and retrieve objects in S3, but the objects must be serialized before storing. The Python pickle library supports serialization and deserialization of objects, and it is available by default in the Python installation.
The pickle.dumps() and pickle.loads() APIs are used to serialize and deserialize Python objects.
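As a quick local illustration of the round trip (no S3 involved):
import pickle
myList = [1, 2, 3]
#dumps() produces bytes, ready to be stored; loads() reverses it
serialized = pickle.dumps(myList)
assert pickle.loads(serialized) == myList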
Storing a List in S3 Bucket
Ensure the Python object is serialized before writing it into the S3 bucket. The list object must be stored using a unique "key". If the key is already present, the existing object will be overwritten.
import boto3
import pickle
s3 = boto3.client('s3')
myList=[1,2,3,4,5]
#Serialize the object
serializedListObject = pickle.dumps(myList)
#Write to Bucket named 'mytestbucket' and
#Store the list using key myList001
s3.put_object(Bucket='mytestbucket',Key='myList001',Body=serializedListObject)
The put_object() API may return a "NoSuchBucket" exception if the bucket does not exist in your account.
NOTE: Please change the bucket name to your own S3 bucket name. I don't own this bucket.
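A minimal sketch of handling that case using the modeled exception available on the client (the bucket name is a placeholder):
import boto3
import pickle
s3 = boto3.client('s3')
try:
    s3.put_object(Bucket='mytestbucket', Key='myList001', Body=pickle.dumps([1, 2, 3, 4, 5]))
except s3.exceptions.NoSuchBucket:
    #Raised when the bucket is missing in this account/region
    print("Bucket 'mytestbucket' does not exist")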
Retrieving a List From S3 Bucket
The list is stored as a stream object inside Body. It can be read using the read() API on the value returned by get_object(). get_object() can throw a "NoSuchKey" exception if the key is not present.
import boto3
import pickle
#Connect to S3
s3 = boto3.client('s3')
#Read the object stored in key 'myList001'
object = s3.get_object(Bucket='mytestbucket',Key='myList001')
serializedObject = object['Body'].read()
#Deserialize the retrieved object
myList = pickle.loads(serializedObject)
print(myList)
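Similarly, the missing-key case can be caught via the client's modeled exception; a minimal sketch:
import boto3
s3 = boto3.client('s3')
try:
    object = s3.get_object(Bucket='mytestbucket', Key='myList001')
except s3.exceptions.NoSuchKey:
    #Raised when no object exists under the given key
    print("Key 'myList001' not found")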
Storing and Retrieving a Python Dictionary
Python dictionary objects can be stored and retrieved in the same way using put_object() and get_object() APIs.
Storing a Python Dictionary Object in S3
import boto3
import pickle
#Connect to S3 default profile
s3 = boto3.client('s3')
myData = {'firstName':'Saravanan','lastName':'Subramanian','title':'Manager', 'empId':'007'}
#Serialize the object
serializedMyData = pickle.dumps(myData)
#Write to S3 using unique key - EmpId007
s3.put_object(Bucket='mytestbucket',Key='EmpId007',Body=serializedMyData)
Retrieving Python Dictionary Object From S3 Bucket
Use the get_object() API to read the object. The data is stored as a stream inside the Body object and can be read using the read() API.
import boto3
import pickle
s3 = boto3.client('s3')
object = s3.get_object(Bucket='mytestbucket',Key='EmpId007')
serializedObject = object['Body'].read()
myData = pickle.loads(serializedObject)
print(myData)
Working With JSON
When working with a Python dictionary, it is recommended to store it as JSON if the consumer applications are not written in Python or do not support the pickle library.
The json.dumps() API converts a Python dictionary into JSON, and json.loads() converts JSON back into a Python dictionary.
Storing a Python Dictionary Object As JSON in S3 Bucket
import boto3
import json
s3 = boto3.client('s3')
myData = {'firstName':'Saravanan','lastName':'Subramanian','title':'Manager', 'empId':'007'}
serializedMyData = json.dumps(myData)
s3.put_object(Bucket='mytestbucket',Key='EmpId007',Body=serializedMyData)
Retrieving a JSON From S3 Bucket
import boto3
import json
s3 = boto3.client('s3')
object = s3.get_object(Bucket='mytestbucket',Key='EmpId007')
serializedObject = object['Body'].read()
myData = json.loads(serializedObject)
print(myData)
Upload and Download a Text File
Boto3 supports the upload_file() and download_file() APIs to store and retrieve files between your local file system and S3. As per S3 conventions, if the key contains a "/" (forward slash), the portions before the slashes are treated as subfolders.
Uploading a File
import boto3
s3 = boto3.client('s3')
s3.upload_file(Bucket='mytestbucket', Key='subdir/abc.txt', Filename='./abc.txt')
Download a File From S3 Bucket
import boto3
s3 = boto3.client('s3')
s3.download_file(Bucket='mytestbucket',Key='subdir/abc.txt',Filename='./abc.txt')
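To see the "subfolder" behavior in action, list the objects under a prefix; a sketch using the list_objects_v2() API:
import boto3
s3 = boto3.client('s3')
#List every key that starts with the "subdir/" prefix
response = s3.list_objects_v2(Bucket='mytestbucket', Prefix='subdir/')
for obj in response.get('Contents', []):
    print(obj['Key'])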
Error Handling
The Boto3 APIs can raise various exceptions depending on the condition; for example, "DataNotFoundError", "NoSuchKey", "HttpClientError", "ConnectionError", and "SSLError" are a few of them. The Boto3 exceptions inherit from the Python "Exception" class, so a broad except clause for Exception will catch them in error-handling code.
import boto3
try:
    s3 = boto3.client('s3')
except Exception as e:
    print("Exception ", e)
Summary
Storing Python objects in an external store has many use cases. For example, a game developer can store an intermediate state of objects and fetch them when the gamer resumes from where they left off, and an API developer can use an S3 object store as a simple key-value store. Please refer to the URLs in the References section to learn more. Thanks!
References