Boto3: Amazon S3 as Python Object Store
Use Amazon Simple Storage Service (S3) as an object store to manage Python data structures.
Introduction
Amazon S3 is extensively used as a file storage system to store and share files across the internet. It is a simple key-value store that can hold objects of any type, created in any programming language, such as Java, JavaScript, Python, etc. Amazon DynamoDB, for example, recommends using S3 to store items larger than 400 KB. This article focuses on using S3 as an object store from Python.
Prerequisites
Boto3 is the official AWS SDK for accessing AWS services from Python code. Please ensure Boto3 and awscli are installed on the system.
$pip install boto3
$pip install awscli
Also, configure the AWS credentials using the "aws configure" command, or set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to hold your keys. Please DO NOT hard-code your AWS keys inside your Python program.
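If you choose the environment variable route, a minimal sketch (the values shown are placeholders for your own keys):
$export AWS_ACCESS_KEY_ID=<your-access-key-id>
$export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>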
To configure the AWS credentials, first install awscli and then use the "aws configure" command to set it up. For more details, refer to AWS CLI Setup and Boto3 Credentials.
Configure the AWS credentials using the following command:
$aws configure
Do a quick check to ensure you can reach AWS.
$aws s3 ls
The above command must list the S3 buckets created in your AWS account. The account is selected based on the credentials configured. If multiple AWS accounts are configured, use the "--profile" option in the AWS CLI; if you don't mention the "--profile" option, the CLI uses the profile named "default".
Use the below commands to configure a development profile named "dev" and validate the settings.
$aws configure --profile dev
$aws s3 ls --profile dev
The above command shows the S3 buckets present in the account that belongs to the "dev" profile.
Connecting to S3
Connecting to Default Account (Profile)
The client() API connects to the specified service in AWS. The below code snippet connects to S3 using the default profile credentials and lists all the S3 buckets.
import boto3
s3 = boto3.client('s3')
buckets = s3.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['CreationDate'].ctime(), bucket['Name'])
Connecting to Specific Account (Profile)
To connect to a specific account, first create a session using the Session() API. The Session() API allows you to mention the profile name and region, and it also allows you to specify the AWS credentials directly.
The below code snippet connects to an AWS account configured using the "dev" profile and lists all the S3 buckets.
import boto3
session = boto3.Session(profile_name="dev", region_name="us-west-2")
s3 = session.client('s3')
buckets = s3.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['CreationDate'].ctime(), bucket['Name'])
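The Session() API also accepts credentials directly. A minimal sketch that reads them from the environment rather than hard-coding them (the variable names follow the AWS convention mentioned above):
import os
import boto3
#Build a session from explicit credentials read from the environment
session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
    region_name="us-west-2")
s3 = session.client('s3')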
Storing and Retrieving a Python LIST
Boto3 supports the put_object() and get_object() APIs to store and retrieve objects in S3, but the objects must be serialized before storing. The Python pickle library supports serialization and deserialization of objects, and it is available by default in the Python installation.
The pickle.dumps() and pickle.loads() APIs are used to serialize and deserialize Python objects.
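As a quick local illustration of the round trip (no S3 involved):
import pickle
myList = [1, 2, 3]
#dumps() produces bytes, ready to be stored; loads() reverses it
serialized = pickle.dumps(myList)
assert pickle.loads(serialized) == myList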
Storing a List in S3 Bucket
Ensure the Python object is serialized before writing it into the S3 bucket. The list object must be stored using a unique "key". If the key is already present, the existing object will be overwritten.
import boto3
import pickle
s3 = boto3.client('s3')
myList=[1,2,3,4,5]
#Serialize the object
serializedListObject = pickle.dumps(myList)
#Write to Bucket named 'mytestbucket' and
#Store the list using key myList001
s3.put_object(Bucket='mytestbucket',Key='myList001',Body=serializedListObject)
The put_object() API may return a "NoSuchBucket" exception if the bucket does not exist in your account.
NOTE: Please change the bucket name to your own S3 bucket name. I don't own this bucket.
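A minimal sketch of handling that case using the modeled exception available on the client (the bucket name is a placeholder):
import boto3
import pickle
s3 = boto3.client('s3')
try:
    s3.put_object(Bucket='mytestbucket', Key='myList001', Body=pickle.dumps([1, 2, 3, 4, 5]))
except s3.exceptions.NoSuchBucket:
    #Raised when the bucket is missing in this account/region
    print("Bucket 'mytestbucket' does not exist")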
Retrieving a List From S3 Bucket
The list is stored as a stream object inside Body. It can be read using the read() API on the value returned by get_object(). get_object() can throw a "NoSuchKey" exception if the key is not present.
import boto3
import pickle
#Connect to S3
s3 = boto3.client('s3')
#Read the object stored in key 'myList001'
object = s3.get_object(Bucket='mytestbucket',Key='myList001')
serializedObject = object['Body'].read()
#Deserialize the retrieved object
myList = pickle.loads(serializedObject)
print(myList)
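Similarly, the missing-key case can be caught via the client's modeled exception; a minimal sketch:
import boto3
s3 = boto3.client('s3')
try:
    object = s3.get_object(Bucket='mytestbucket', Key='myList001')
except s3.exceptions.NoSuchKey:
    #Raised when no object exists under the given key
    print("Key 'myList001' not found")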
Storing and Retrieving a Python Dictionary
Python dictionary objects can be stored and retrieved in the same way using put_object() and get_object() APIs.
Storing a Python Dictionary Object in S3
import boto3
import pickle
#Connect to S3 default profile
s3 = boto3.client('s3')
myData = {'firstName':'Saravanan','lastName':'Subramanian','title':'Manager', 'empId':'007'}
#Serialize the object
serializedMyData = pickle.dumps(myData)
#Write to S3 using unique key - EmpId007
s3.put_object(Bucket='mytestbucket',Key='EmpId007',Body=serializedMyData)
Retrieving Python Dictionary Object From S3 Bucket
Use the get_object() API to read the object. The data is stored as a stream inside the Body object and can be read using the read() API.
import boto3
import pickle
s3 = boto3.client('s3')
object = s3.get_object(Bucket='mytestbucket',Key='EmpId007')
serializedObject = object['Body'].read()
myData = pickle.loads(serializedObject)
print(myData)
Working With JSON
When working with a Python dictionary, it is recommended to store it as JSON if the consumer applications are not written in Python or do not support the pickle library.
The json.dumps() API converts a Python dictionary into JSON, and json.loads() converts JSON back into a Python dictionary.
Storing a Python Dictionary Object As JSON in S3 Bucket
import boto3
import json
s3 = boto3.client('s3')
myData = {'firstName':'Saravanan','lastName':'Subramanian','title':'Manager', 'empId':'007'}
serializedMyData = json.dumps(myData)
s3.put_object(Bucket='mytestbucket',Key='EmpId007',Body=serializedMyData)
Retrieving a JSON From S3 Bucket
import boto3
import json
s3 = boto3.client('s3')
object = s3.get_object(Bucket='mytestbucket',Key='EmpId007')
serializedObject = object['Body'].read()
myData = json.loads(serializedObject)
print(myData)
Upload and Download a Text File
Boto3 supports the upload_file() and download_file() APIs to store and retrieve files between your local file system and S3. As per S3 conventions, if the key contains a "/" (forward slash), the portions before the slashes are treated as subfolders.
Uploading a File
import boto3
s3 = boto3.client('s3')
s3.upload_file(Bucket='mytestbucket', Key='subdir/abc.txt', Filename='./abc.txt')
Download a File From S3 Bucket
import boto3
s3 = boto3.client('s3')
s3.download_file(Bucket='mytestbucket',Key='subdir/abc.txt',Filename='./abc.txt')
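To see the "subfolder" behavior in action, list the objects under a prefix; a sketch using the list_objects_v2() API:
import boto3
s3 = boto3.client('s3')
#List every key that starts with the "subdir/" prefix
response = s3.list_objects_v2(Bucket='mytestbucket', Prefix='subdir/')
for obj in response.get('Contents', []):
    print(obj['Key'])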
Error Handling
The Boto3 APIs can raise various exceptions depending on the condition; for example, "DataNotFoundError", "NoSuchKey", "HttpClientError", "ConnectionError", and "SSLError" are a few of them. The Boto3 exceptions inherit from the Python "Exception" class, so a broad except clause for Exception will catch them in error-handling code.
import boto3
try:
    s3 = boto3.client('s3')
except Exception as e:
    print("Exception ", e)
Summary
Storing Python objects in an external store has many use cases. For example, a game developer can store an intermediate state of objects and fetch them when the gamer resumes from where they left off, and an API developer can use an S3 object store as a simple key-value store. Please refer to the URLs in the References section to learn more. Thanks!
References