How to Keep Your Files Safe in S3 with Versioning
Warning! Your important files are at risk of being accidentally or maliciously overwritten or deleted and will be lost forever!
A new S3 bucket has versioning disabled by default. Once you enable versioning, S3 will keep an unlimited number of historical versions of your objects. Uploading an object will not overwrite the existing one but will instead create a new version. Deleting an object will not remove it but will instead create a delete marker, which is a placeholder that enables S3 to keep track of deleted objects without actually removing the object.
This means that your files are safe because only the bucket owner, or someone who has been explicitly granted permission, can permanently delete previous versions.
What’s the downside, you ask? Cost!
AWS charges for every stored version, which can add up quickly depending on how frequently you are replacing and deleting existing objects.
When you delete an object, it isn’t actually removed, so you will continue to pay storage costs for it.
So how can you get the benefits of versioning while still keeping your costs down? The answer is lifecycle rules.
Using lifecycle rules, you can automatically delete older versions after some time (more on this later).
How to Enable Versioning for Your Bucket?
Versioning can only be enabled at the bucket level, not for individual objects.
You can turn on versioning by accessing the S3 section in the AWS Console and performing the steps below:
- Select the bucket from the list.
Tip: Enabling versioning is a prerequisite for other S3 capabilities such as cross-region replication and Object Lock.
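If you would rather script this than click through the console, the sketch below shows one way to do it. It assumes that `client` is a boto3 S3 client (for example, `boto3.client("s3")`); the function name and the "Disabled" fallback label are made up for this illustration.

```python
def enable_versioning(client, bucket):
    """Turn on versioning for `bucket` and return the resulting status.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # The response has no "Status" key for a bucket that has never had
    # versioning enabled, so fall back to a readable label (our own choice).
    response = client.get_bucket_versioning(Bucket=bucket)
    return response.get("Status", "Disabled")
```

Reading the status back is optional, but it is a cheap way to confirm the change took effect.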
Disclaimer: You Only Need to Read This if You Already Have Lifecycle Rules Set Up
If you have an existing lifecycle rule set up to delete objects after a specified period, its behavior will change. Before versioning is enabled, the lifecycle rule simply removes the object. After versioning is enabled, the object becomes a previous version instead, and you will still pay storage costs for it. To maintain the past behavior, you would need to extend the rule to previous versions as well. You can perform the following steps to accomplish this:
The above lifecycle rule translates to:
a) Create a delete marker for the newest version of an object if it was created more than a day ago.
b) Any older version which has not been the latest version for more than a day should be permanently deleted.
This means that there is a waiting period of at least 2 days before an active object can be fully cleaned up: one day to make the current version a previous one, and one more day to remove that previous version.
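For reference, here is a rough sketch of how the rule described above could be written as a lifecycle configuration for the API rather than the console. The rule ID and function names are hypothetical; the dictionary keys follow the S3 lifecycle configuration format, and applying it with boto3 is shown only as a comment.

```python
def expire_all_versions_rule(days=1, noncurrent_days=1):
    """Build a lifecycle rule matching parts (a) and (b) above."""
    return {
        "ID": "expire-current-and-previous-versions",  # hypothetical name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: apply to every object
        # (a) add a delete marker once the current version is this old
        "Expiration": {"Days": days},
        # (b) permanently delete versions that stopped being current this long ago
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
    }


def minimum_days_until_removed(rule):
    """Shortest time before an active object is fully cleaned up."""
    return (
        rule["Expiration"]["Days"]
        + rule["NoncurrentVersionExpiration"]["NoncurrentDays"]
    )


# Applying it, assuming `client` is a boto3 S3 client:
# client.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket",
#     LifecycleConfiguration={"Rules": [expire_all_versions_rule()]},
# )
```

With both periods set to one day, `minimum_days_until_removed` returns 2, which is the waiting period mentioned above.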
The Details: How Does Versioning Actually Work?
When retrieving an object, S3 will always give you the latest version. If the newest version is a delete marker, the object will appear deleted. You can ask S3 for a specific version of an object by specifying its version id in the request. Alternatively, you can tell S3 to list all the versions of an object and then choose the one you would like to retrieve.
A version is considered a regular S3 object and, therefore, can have its own permissions and encryption settings.
Creating a Version
Uploading any object will automatically create a new version with its own version id. Any object stored before versioning was enabled will have a version id of null.
Retrieving the Latest Version of An Object
When requesting an object, S3 will always give you the latest version. If the newest version is a delete marker, you will receive a 404 error and a response header of “x-amz-delete-marker:true”.
The URL to access an object consists of the bucket name, the region, and filename, for example:
GET https://bucketname.s3.region.amazonaws.com/image.jpg
Retrieving a Specific Version of An Object
You can retrieve a specific version of an object by adding a query string parameter called “versionId,” as shown below:
GET https://bucket.s3.region.amazonaws.com/image.jpg?versionId=f5jZYxWqfe.WlmF73GctmFHqVYfdrf8.
Alternatively, you can make a HEAD request just to get the metadata of the object without the actual contents, as shown below:
HEAD https://test-bucket-2-dg.s3.amazonaws.com/image.jpg?versionId=f5jZYxWqfe.WlmF73GctmFHqVYfdrf8.
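To make the URL shapes above concrete, here is a small helper that assembles them. The function name is made up for this example, and it only builds the URL: unless the object is publicly readable, a real request would also need to be signed.

```python
from urllib.parse import quote, urlencode


def object_url(bucket, region, key, version_id=None):
    """Build the URL for an object, optionally pinned to one version."""
    # quote() leaves "/" alone, so keys with folder-like prefixes still work.
    url = f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"
    if version_id is not None:
        # versionId is the query string parameter S3 expects.
        url += "?" + urlencode({"versionId": version_id})
    return url
```

Leaving out `version_id` gives you the URL for the latest version; passing it gives you the URL for that exact version.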
Listing the Files in A Bucket
Listing the files in a bucket will only return the current version of each stored object and will exclude any object whose current version is a delete marker.
An example of listing the files in a bucket is below:
GET https://bucket.s3.region.amazonaws.com/
The response…
Listing the Object Versions in A Bucket
You can list all versions of all files in a bucket by calling:
GET https://bucket.s3.region.amazonaws.com/?versions
The response…
To list only the versions of a single object, add a prefix parameter:
GET https://test-bucket-2-dg.s3.us-east-1.amazonaws.com/?versions&prefix=image.jpg
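The same listing is available through the SDK. The sketch below assumes `client` is a boto3 S3 client and uses its `list_object_versions` call; the function name and the shape of the returned tuples are choices made for this example. It reads only the first page of results, so a key with a very long history would need pagination on top of this.

```python
def summarize_versions(client, bucket, key):
    """Return (version_id, kind, is_latest) tuples for `key`, newest first."""
    response = client.list_object_versions(Bucket=bucket, Prefix=key)
    entries = []
    # Object versions and delete markers come back in two separate lists.
    for field, kind in (("Versions", "object"), ("DeleteMarkers", "delete marker")):
        for item in response.get(field, []):
            # Prefix matches any key that starts with `key`, so filter exactly.
            if item["Key"] == key:
                entries.append(
                    (item["LastModified"], item["VersionId"], kind, item["IsLatest"])
                )
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [(version_id, kind, latest) for _, version_id, kind, latest in entries]
```

If the first tuple is a delete marker with `is_latest` set, the object currently appears deleted.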
Deleting an Object
You can delete an object by hiding versions (1), selecting the object to remove (2), clicking actions (3), and then delete (4).
This means that anyone trying to retrieve the object will see that it’s deleted, but the previous versions will still exist, and you can either restore from an older version or remove the delete marker to restore the file.
Here’s how you would remove the delete marker to restore the file:
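Outside the console, the same restore can be sketched with two SDK calls: find the delete marker that is currently the latest version, then delete that marker by its version id. This assumes `client` is a boto3 S3 client; the function name is hypothetical, and only the first page of the listing is checked.

```python
def remove_delete_marker(client, bucket, key):
    """Undelete `key` by removing its current delete marker.

    Returns the version id of the marker that was removed, or None if the
    object is not currently hidden behind a delete marker.
    """
    response = client.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in response.get("DeleteMarkers", []):
        if marker["Key"] == key and marker["IsLatest"]:
            # Deleting a specific version id removes it permanently; here
            # that version is the marker itself, so the object reappears.
            client.delete_object(
                Bucket=bucket, Key=key, VersionId=marker["VersionId"]
            )
            return marker["VersionId"]
    return None
```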
Delete markers accrue a nominal charge for storage in Amazon S3. The storage size of a delete marker is equal to the size of the key name of the delete marker. The UTF-8 encoding adds from 1 to 4 bytes of storage to your bucket for each character in the name.
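As a quick illustration of that sizing rule, the billable size of a delete marker is just the length of the key name once encoded as UTF-8. The function name below is made up for this example.

```python
def delete_marker_size_bytes(key):
    """Storage size of a delete marker: the UTF-8 length of its key name."""
    return len(key.encode("utf-8"))


# ASCII characters take 1 byte each; other characters take 2 to 4 bytes.
print(delete_marker_size_bytes("image.jpg"))  # 9 ASCII characters
print(delete_marker_size_bytes("фото.jpg"))   # 4 Cyrillic letters plus ".jpg"
```

The first call prints 9 and the second prints 12, because each Cyrillic letter needs 2 bytes.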
Deleting a Version
When deleting a version, the version will be permanently erased. Only the bucket owner or those with the correct permissions can delete a version.
Additionally, you can require multi-factor authentication (MFA) to delete a version. This will require a multi-factor token to be used to permanently delete a version or change any versioning settings for the bucket. MFA Delete cannot be enabled through the AWS console at this time, and only the root account can activate it.
When MFA Delete is enabled, every request that permanently deletes a version needs the “x-amz-mfa” header, whose value is your authentication device’s serial number, a space, and the authentication code displayed on the device. An example is below:
DELETE /image.jpg?versionId=3HL4kqCxf3vjVBH40Nrjfkd
x-amz-mfa: 20899872 301749
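If you are using an SDK instead of raw HTTP, the header value is passed as a parameter. The sketch below assumes `client` is a boto3 S3 client, whose `delete_object` call accepts an `MFA` argument in the same "serial, space, code" format; both function names are hypothetical.

```python
def mfa_header_value(device_serial, code):
    """Join the device serial number and the current code with a space."""
    return f"{device_serial} {code}"


def delete_version_with_mfa(client, bucket, key, version_id, device_serial, code):
    """Permanently delete one version of an object using an MFA token."""
    return client.delete_object(
        Bucket=bucket,
        Key=key,
        VersionId=version_id,
        MFA=mfa_header_value(device_serial, code),
    )
```

For the example request above, `mfa_header_value("20899872", "301749")` produces the same value shown in the header.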
Restoring a Version
Restoring a previous version of an object can be accomplished in one of two ways:
- The first is to delete the current version of the object, which will cause S3 to promote the previous version to be the current one. You would use this technique if you are not concerned about losing any version information, as it requires you to delete every version from the current one back to the version you want to restore.
- An alternative approach is to download the version you want to restore and re-upload as the current version. You can also issue a copy API call to avoid downloading the file. You would use this technique if you didn’t want to lose any version history.
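The second approach, using a copy call, might look like the sketch below. It assumes `client` is a boto3 S3 client and uses `copy_object` with a `VersionId` inside `CopySource`; the function name is made up for this example.

```python
def restore_version_by_copy(client, bucket, key, version_id):
    """Make an old version current again without deleting any history.

    Copies the chosen version of `key` onto the same key, which creates a
    brand-new current version. Returns the new version id if S3 reports one.
    """
    response = client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key, "VersionId": version_id},
    )
    return response.get("VersionId")
```

Because this adds a version rather than removing any, every earlier version (including the one you restored from) stays in the history.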
Lifecycle Rules
You can use lifecycle rules to clean up older versions by:
- Removing previous versions after a specific period
- Adding a delete marker to current versions after a particular period
- Cleaning up old expired delete markers
To set up a lifecycle rule, perform the following steps:
Decide whether the rule should apply to every object or just specific objects based on either a tag or prefix.
Leave this next screen blank.
Decide whether to insert a delete marker on the current version (1) or have this rule permanently delete previous versions after a specified period (2). Additionally, you can have the rule clean up expired delete markers (3), which serve no purpose and can degrade the performance of list operations.
Save the rule:
Cleaning Up Expired Delete Markers
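S3 doesn’t automatically clean up expired delete markers. A delete marker is expired when it is the only version left for an object, which happens once every previous version has been removed. At that point, the marker serves no purpose. Similarly, if an object is deleted more than once, S3 will keep both the current delete marker, which indicates the object is currently deleted, and the older ones, which serve no purpose.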
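A lifecycle rule can take care of both cases. The sketch below builds a rule that permanently removes previous versions (including old delete markers) after a number of days and also removes expired delete markers. The rule ID and function name are hypothetical, while the keys follow the S3 lifecycle configuration format. Note that `ExpiredObjectDeleteMarker` cannot be combined with `Days` in the same `Expiration` block, so this rule does not expire current versions.

```python
def cleanup_rule(noncurrent_days):
    """Build a lifecycle rule that clears old versions and stray markers."""
    return {
        "ID": "clean-up-old-versions-and-markers",  # hypothetical name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: apply to every object
        # Permanently delete previous versions after this many days.
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
        # Remove delete markers that are the only version left for a key.
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


# Applying it, assuming `client` is a boto3 S3 client:
# client.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket",
#     LifecycleConfiguration={"Rules": [cleanup_rule(30)]},
# )
```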
Not Happy with Versioning! Let’s Disable It!
Let’s say you decide versioning isn’t for you! Let’s go ahead and disable it. Or can we? The truth is you can only suspend versioning, which means you can prevent new versions from being created, but suspending will not automatically clean up old versions. You will continue to be charged for those versions until they are removed from your bucket. Additionally, lifecycle rules will continue to run for previous versions even though versioning is suspended.
Only the bucket owner or those with the relevant permissions can suspend versioning.
To truly disable versioning, you would need to create a new bucket and copy all your objects into that bucket.
When versioning is suspended, your old versions are safe and can’t be accidentally overwritten, except for any object that has a version id of null. Every version id must be unique, so only one version can have an id of null. This applies to both full objects and delete markers.
Suppose you uploaded an object with versioning disabled, and that object was given a version id of null. After enabling versioning, you uploaded several more versions of that file. If you then suspend versioning and upload the file again, the new upload also receives a version id of null and replaces the original null version, while the versions created during the enabled period are left untouched.
Deleting an object with versioning suspended will create a delete marker with a version id of null. If there is already an existing version with an id of null, it will be replaced.
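The null version rules are easier to follow with a toy model. The class below is not S3 code: it is a small in-memory simulation, written for this article, of how a single key’s versions behave as the bucket moves between disabled, enabled, and suspended. The made-up ids "v1", "v2" and so on stand in for the real generated version ids.

```python
import itertools


class VersionedKey:
    """Toy model of one S3 key's versions, oldest first."""

    def __init__(self):
        self.status = "Disabled"  # "Disabled", "Enabled" or "Suspended"
        self.versions = []  # (version_id, content); content None = delete marker
        self._ids = itertools.count(1)

    def _write(self, content):
        if self.status == "Enabled":
            version_id = f"v{next(self._ids)}"
        else:
            version_id = "null"
            # Only one version may have the id null, so replace any old one.
            self.versions = [v for v in self.versions if v[0] != "null"]
        self.versions.append((version_id, content))
        return version_id

    def put(self, content):
        return self._write(content)

    def delete(self):
        if self.status == "Disabled":
            self.versions.clear()  # never versioned: the object is simply gone
            return None
        return self._write(None)  # versioned or suspended: add a delete marker


key = VersionedKey()
key.put("original")       # stored with the id null
key.status = "Enabled"
key.put("second")         # v1
key.put("third")          # v2
key.status = "Suspended"
key.delete()              # null delete marker replaces the original upload
print(key.versions)
```

After the final delete, the model holds "second" and "third" plus a null delete marker; the original upload is gone because the delete marker took over its null id.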
Cleaning up Previous Versions
To clean up older versions, you can either delete those versions manually or set up a temporary lifecycle policy to handle it for you.
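For the manual route, a script along these lines could remove everything that is no longer the current version. It is only a sketch: it assumes `client` is a boto3 S3 client, uses a hypothetical function name, and handles a single page of the listing, so you would need to call it repeatedly (or add pagination) for a large bucket. The deletions are permanent, so try it on a test bucket first.

```python
def delete_previous_versions(client, bucket):
    """Permanently delete every non-current version and delete marker
    found on the first page of the bucket's version listing.

    Returns the number of versions submitted for deletion.
    """
    response = client.list_object_versions(Bucket=bucket)
    candidates = response.get("Versions", []) + response.get("DeleteMarkers", [])
    stale = [
        {"Key": item["Key"], "VersionId": item["VersionId"]}
        for item in candidates
        if not item["IsLatest"]  # keep whatever is current for each key
    ]
    if stale:
        client.delete_objects(
            Bucket=bucket, Delete={"Objects": stale, "Quiet": True}
        )
    return len(stale)
```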
I hope you enjoyed this article. Feel free to leave any comments below.