Encrypting Sensitive Data Stored on S3
S3 offers a bunch of options to encrypt your data at rest. Check out this tutorial to learn more about using server-side and client-side encryption with S3!
S3 comes with a bunch of features to encrypt your data at rest. Data at rest refers to inactive data stored physically on disk. Before we dive into encrypting data at rest, I want to highlight that there is also data in use and data in transit: if the data is in memory, it is in use; if the data is on the network, it is in transit. If you transfer data to S3, it is TLS encrypted by default. This blog post will guide you through all the ways to encrypt your S3 data at rest.
Comparing Options
S3 offers a bunch of options to encrypt your data at rest. The fundamental questions to compare the options are:
- Who en/decrypts the data? Data encryption can happen either on your side (client-side encryption) or on AWS (server-side encryption or SSE). When you encrypt data on your side, the data transferred to S3 is already encrypted. S3 never sees the raw data. Server-side encryption is different because you send the raw data to S3 where it is encrypted.
- Who stores the secret? Imagine you encrypted all your pictures and uploaded them to S3, and you stored the secret used for encryption on a USB stick. A few months later, you want to look at your pictures. Unfortunately, the USB stick holding the secret is broken. The loss of this USB stick is a catastrophe: you can no longer decrypt your pictures, and they are gone forever.
- Who manages the secret? Data encryption makes no sense if everyone can access your secret. Managing access to the secret is a great responsibility.
The following table summarizes the available options on S3 to encrypt your data at rest.

| Option | Who en/decrypts the data? | Who stores the secret? | Who manages the secret? |
| --- | --- | --- | --- |
| SSE-AES | AWS | AWS | AWS |
| SSE-KMS (AWS-managed CMK) | AWS | AWS (KMS) | AWS |
| SSE-KMS (customer-managed CMK) | AWS | AWS (KMS) | You |
| SSE-C | AWS | You | You |
| Client-side encryption with KMS | You | AWS (KMS) | You |
| Client-side encryption with self-managed secret | You | You | You |
Let’s dive into the details of each option!
Server-Side Encryption
Server-side encryption means that you send unencrypted raw data to AWS. On the AWS infrastructure, the raw data is encrypted and finally stored on disk. When you retrieve data, AWS reads the encrypted data from the disk, decrypts the data, and sends raw data back to you. The en/decryption is transparent to the AWS user.
SSE-AES
SSE-AES is a straightforward approach: AWS handles encryption and decryption for you on the server side using the AES-256 algorithm, and AWS also controls the secret key that is used for encryption and decryption.
To upload a file and store it encrypted, run:
aws s3 cp path/to/local.file s3://bucket-name/sse-aes --sse AES256
To download the decrypted file, run:
aws s3 cp s3://bucket-name/sse-aes path/to/local.file
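If you want to verify that the object is indeed encrypted at rest, you can, for example, inspect its metadata; the ServerSideEncryption field in the response reports AES256 (a minimal sketch, assuming the bucket and key from the commands above):
aws s3api head-object --bucket bucket-name --key sse-aes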
SSE-KMS (AWS-Managed CMK)
SSE-KMS is very similar to SSE-AES. The only difference is that the secret key (the AWS-managed Customer Master Key, or CMK) is provided by the KMS service and not by S3.
To upload a file and store it encrypted, run:
aws s3 cp path/to/local.file s3://bucket-name/sse-kms --sse aws:kms
To download the decrypted file, run:
aws s3 cp s3://bucket-name/sse-kms path/to/local.file
The AWS-managed CMK comes with the following default key policy that you cannot modify. The default key policy allows:
- any principal in the same AWS account to encrypt and decrypt with the CMK, as long as the request comes through the S3 service (kms:ViaService)
- the AWS account (root) to call the read-only actions kms:Describe*, kms:Get*, and kms:List*
This is the policy in its full length:
{
  "Version": "2012-10-17",
  "Id": "auto-s3-2",
  "Statement": [
    {
      "Sid": "Allow access through S3 for all principals in the account that are authorized to use S3",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "kms:ViaService": "s3.us-east-1.amazonaws.com",
          "kms:CallerAccount": "ACCOUNT_ID"
        }
      }
    },
    {
      "Sid": "Allow direct access to key metadata to the account",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_ID:root"
      },
      "Action": [
        "kms:Describe*",
        "kms:Get*",
        "kms:List*"
      ],
      "Resource": "*"
    }
  ]
}
You can not delete or restrict the AWS managed CMK used by S3!
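You can, however, have a look at it: the AWS-managed CMK is available in your account under the alias alias/aws/s3 (it is created the first time S3 uses it), so one way to inspect it is:
aws kms describe-key --key-id alias/aws/s3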
SSE-KMS (Customer-Managed CMK)
Alternatively, you can manage the secret key (aka customer-managed Customer Master Key) using the KMS service. You create a Customer Master Key (CMK) and reference that key for encryption/decryption. At any time, you can delete the CMK to make all data useless. You also have full control over the CMK by customizing the key policy.
To create a basic CMK, run:
aws kms create-key
The key policy will allow access from all IAM entities in your AWS account (as long as the IAM policy allows it).
Your output will look similar to this:
{
  "KeyMetadata": {
    "Origin": "AWS_KMS",
    "KeyId": "858d8d36-c87b-4b48-9a41-b69b7ad9d4e2",
    "Description": "",
    "KeyManager": "CUSTOMER",
    "Enabled": true,
    "KeyUsage": "ENCRYPT_DECRYPT",
    "KeyState": "Enabled",
    "CreationDate": 1534164269.969,
    "Arn": "arn:aws:kms:us-east-1:ACCOUNT_ID:key/858d8d36-c87b-4b48-9a41-b69b7ad9d4e2",
    "AWSAccountId": "ACCOUNT_ID"
  }
}
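For reference, when you do not pass a key policy to create-key, KMS attaches a default policy along these lines (the exact Id and Sid may differ); it hands access control over to IAM in your account:
{
  "Version": "2012-10-17",
  "Id": "key-default-1",
  "Statement": [
    {
      "Sid": "Enable IAM User Permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_ID:root"
      },
      "Action": "kms:*",
      "Resource": "*"
    }
  ]
}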
Note the `KeyId` value somewhere (e.g., `858d8d36-c87b-4b48-9a41-b69b7ad9d4e2`); you will need it in the following commands.
To upload a file and store it encrypted using your newly created CMK, run the following command, replacing `KMS_KEY_ID` with the `KeyId` value:
aws s3 cp path/to/local.file s3://bucket-name/sse-kms-cmk --sse aws:kms --sse-kms-key-id KMS_KEY_ID
To download the decrypted file, run:
aws s3 cp s3://bucket-name/sse-kms-cmk path/to/local.file
Now, disable the CMK:
aws kms disable-key --key-id KMS_KEY_ID
If you try to download the file again, you will run into an error (KMS.DisabledException). That's the difference compared to the AWS-managed CMK, which you cannot control.
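If you want the data to be readable again, re-enable the key and the download works once more:
aws kms enable-key --key-id KMS_KEY_ID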
Finally, mark the CMK for deletion to avoid future costs:
aws kms schedule-key-deletion --key-id KMS_KEY_ID
You will never be able to retrieve the file from S3 once you delete the CMK!
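The deletion does not happen immediately: KMS enforces a waiting period of 7 to 30 days (30 by default). Until the waiting period is over, you can still change your mind (the key then returns to the disabled state):
aws kms cancel-key-deletion --key-id KMS_KEY_ID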
SSE-C
With `SSE-C`, you are in charge of the secret key, while AWS still takes care of the encryption and decryption. Every time you call the S3 API, you also have to provide the secret key.
To generate a random 32 bytes (256 bits) secret key, run:
openssl rand -out sse-c.key 32
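The key must be exactly 32 bytes. If you want to double-check the generated file, you can, for example, run:
wc -c sse-c.key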
To upload a file and store it encrypted, run:
aws s3 cp path/to/local.file s3://bucket-name/sse-c --sse-c AES256 --sse-c-key fileb://sse-c.key
The big difference comes when you want to download the file again. Now you also have to provide the secret key.
aws s3 cp s3://bucket-name/sse-c path/to/local.file --sse-c AES256 --sse-c-key fileb://sse-c.key
If you lose the key, you can not retrieve the data from S3 anymore!
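Every request that reads the object has to carry the key. If you try to download the object without it, S3 rejects the request with a 400 Bad Request error:
aws s3 cp s3://bucket-name/sse-c path/to/local.file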
Client-Side Encryption
Client-side encryption means that you encrypt the data before you send it to AWS. It also means that you decrypt the data that you retrieve from AWS. Usually, client-side encryption needs to be deeply embedded into your application.
AWS SDK and KMS
You can use the AWS SDK to upload/download files from S3. The KMS service can generate data keys that you can use for encryption/decryption. The data key itself is encrypted using the KMS Customer Master Key. If you want to use the encrypted data key, you have to send the encrypted data to the KMS service and ask for decryption. The decrypted data key is only returned if the CMK is still available and you have permissions to use it.
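Before diving into the SDK code, here is a quick sketch of the same round trip with the CLI: generate a data key under your CMK, keep only the encrypted copy, and later ask KMS to hand back the plaintext key (KMS_KEY_ID is the customer-managed CMK created earlier):
# generate a 256-bit data key and store only the encrypted copy on disk
aws kms generate-data-key --key-id KMS_KEY_ID --key-spec AES_256 --query CiphertextBlob --output text | base64 --decode > data.key
# later: ask KMS to decrypt the data key (the plaintext is returned base64-encoded)
aws kms decrypt --ciphertext-blob fileb://data.key --query Plaintext --output text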
I implemented client-side encryption using the AWS SDK for Node.js. The encrypted key is uploaded together with the object to reduce the risk of losing the data key for a particular object. Keep in mind that the implementation is not very efficient, since you will make a call to the KMS service for every encryption and decryption operation. It could make sense to keep the data keys in memory and reuse them for multiple objects.
You can find the full source code on GitHub.
Let’s dive into the code!
First, we need a way to create an encrypted data key. The encrypted data key is stored on disk as a performance optimization: multiple encrypt calls reuse the data key as long as you do not remove or override the file data.key in your current working directory.
const util = require('util');
const fs = require('fs');
const AWS = require('aws-sdk');

const kms = new AWS.KMS({apiVersion: '2014-11-01'});
const TEMP_DATA_KEY_FILE_NAME = 'data.key';
const writeFile = util.promisify(fs.writeFile);

exports.create = async (kmsKeyId) => {
  const data = await kms.generateDataKey({
    KeyId: kmsKeyId,
    KeySpec: 'AES_256'
  }).promise();
  await writeFile(TEMP_DATA_KEY_FILE_NAME, data.CiphertextBlob); // cache only the encrypted data key; the plaintext key is never persisted
  return TEMP_DATA_KEY_FILE_NAME;
};
You created a data key, but it is encrypted. Before you can use the data key, you have to decrypt it.
const getDecryptedDataKeyBuffer = async (encryptedKeyBuffer) => {
  const data = await kms.decrypt({CiphertextBlob: encryptedKeyBuffer}).promise();
  return data.Plaintext;
};
The AES algorithm that you will use for encryption relies on an [initialization vector (IV)](https://en.wikipedia.org/wiki/Initialization_vector). The IV is generated randomly and ensures that similar data results in very different ciphertext. The IV is also needed when decrypting the ciphertext.
const crypto = require('crypto');

const IV_LENGTH = 8;

const generateIVBuffer = (keyBuffer) => {
  const salt = crypto.randomBytes(16);
  const iv = crypto.pbkdf2Sync(keyBuffer, salt, 100000, IV_LENGTH, 'sha512');
  return iv;
};
Last but not least, we need a way to parse S3 URIs (e.g., s3://bucket-name/key) that are used to specify the location on S3.
const url = require('url');

const parseS3Uri = (uri) => {
  const u = new url.URL(uri);
  if (u.protocol !== 's3:') {
    throw new Error('invalid S3 URI');
  }
  return {
    Bucket: u.hostname,
    Key: u.pathname.substring(1) // drop the leading '/' so the S3 key matches the path in the URI
  };
};
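For example, a quick sanity check (not part of the module):
console.log(parseS3Uri('s3://bucket-name/path/to/object'));
// => { Bucket: 'bucket-name', Key: 'path/to/object' }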
You might be impatient to see the implementation of the encryption. The encrypted data key is stored together with the IV and the file's content on S3. You also add a small header (8 bytes) at the beginning of the object that carries the metadata needed for decryption. The idea of storing the encrypted data key together with the encrypted data is called envelope encryption.
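Put together, the object that ends up on S3 has the following layout (derived from the code below):
byte 0          : header version (currently 1)
bytes 1-3       : reserved for future use
bytes 4-7       : length N of the encrypted data key (uint32, little-endian)
next N bytes    : encrypted data key (the CiphertextBlob returned by KMS)
next 8 bytes    : IV
remaining bytes : AES-256 ciphertext of the file content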
const s3 = new AWS.S3({apiVersion: '2006-03-01'});

const HEADER_LENGTH = 8;
const readFile = util.promisify(fs.readFile);

exports.encrypt = async (inputFile, s3Uri) => {
  const encryptedKeyBuffer = await readFile(TEMP_DATA_KEY_FILE_NAME);
  const decryptedKeyBuffer = await getDecryptedDataKeyBuffer(encryptedKeyBuffer);
  const plainBuffer = await readFile(inputFile);
  const ivBuffer = generateIVBuffer(decryptedKeyBuffer);
  // the 8 random bytes become a 16-character hex string, which is the 16-byte IV that aes-256-cbc expects
  const cipher = crypto.createCipheriv('aes256', decryptedKeyBuffer, ivBuffer.toString('hex'));
  const headerBuffer = Buffer.alloc(HEADER_LENGTH);
  headerBuffer.writeUInt8(1, 0); // header version
  headerBuffer.writeUInt8(0, 1); // reserved for future use
  headerBuffer.writeUInt8(0, 2); // reserved for future use
  headerBuffer.writeUInt8(0, 3); // reserved for future use
  headerBuffer.writeUInt32LE(encryptedKeyBuffer.length, 4); // length of encrypted data key
  const bodyBuffer = Buffer.concat([headerBuffer, encryptedKeyBuffer, ivBuffer, cipher.update(plainBuffer), cipher.final()]);
  const params = Object.assign({}, parseS3Uri(s3Uri), {Body: bodyBuffer});
  await s3.putObject(params).promise();
  return s3Uri;
};
That's everything needed to upload an encrypted object to S3. Let's look at the reverse operation:
exports.decrypt = async (s3Uri, outputFile) => {
  const params = parseS3Uri(s3Uri);
  const object = await s3.getObject(params).promise();
  const bodyBuffer = object.Body;
  const headerBuffer = bodyBuffer.slice(0, HEADER_LENGTH);
  const headerVersion = headerBuffer.readUInt8(0);
  if (headerVersion !== 1) {
    throw new Error('Unsupported header version');
  }
  const encryptedKeyLength = headerBuffer.readUInt32LE(4);
  const encryptedKeyBuffer = bodyBuffer.slice(HEADER_LENGTH, HEADER_LENGTH + encryptedKeyLength);
  const decryptedKeyBuffer = await getDecryptedDataKeyBuffer(encryptedKeyBuffer);
  const ivBuffer = bodyBuffer.slice(HEADER_LENGTH + encryptedKeyLength, HEADER_LENGTH + encryptedKeyLength + IV_LENGTH);
  const decipher = crypto.createDecipheriv('aes256', decryptedKeyBuffer, ivBuffer.toString('hex'));
  const decryptedBuffer = Buffer.concat([decipher.update(bodyBuffer.slice(HEADER_LENGTH + encryptedKeyLength + IV_LENGTH)), decipher.final()]);
  await writeFile(outputFile, decryptedBuffer);
  return outputFile;
};
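Assuming you put the helpers above into a module (say, s3-encryption.js; the file name is just an example), a full round trip could look like this:
// Hypothetical usage of the helpers above; KMS_KEY_ID, the bucket, and the file names are placeholders.
const enc = require('./s3-encryption');
(async () => {
  await enc.create('KMS_KEY_ID');                                          // generate and cache an encrypted data key
  await enc.encrypt('path/to/local.file', 's3://bucket-name/client-side'); // encrypt locally, then upload
  await enc.decrypt('s3://bucket-name/client-side', 'path/to/local.copy'); // download, then decrypt locally
})();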
You can find the full source code on GitHub.
AWS SDK and Self-Managed Secrets
If you want to manage the secrets yourself instead of using AWS KMS, you likely need an HSM device, such as AWS CloudHSM, that speaks [PKCS#11](https://en.wikipedia.org/wiki/PKCS_11), to implement the same idea as described above.
Conclusion
S3 offers a bunch of options to encrypt your data at rest. Usually, the criticality of your data determines the options you can choose from. Is it okay if AWS technically sees your raw data? If yes, server-side encryption is the right option for you. If not, go with client-side encryption. Keep in mind that client-side encryption requires know-how and is more effort to implement compared to server-side encryption. The AWS Encryption SDKs (Java and Python) might help to implement client-side encryption.
Published at DZone with permission of Michael Wittig.