Demystifying DynamoDB Performance
Avoid DynamoDB performance, cost, and management issues with a well-designed table definition, the right pricing mode, and careful key selection.
What Is DynamoDB?
DynamoDB is a popular key-value (NoSQL) datastore offered by Amazon Web Services (AWS). It is known for low latency, high scalability, and integration with other AWS services.
Because DynamoDB is a fully managed service, it reduces the complexity of database management and the need for database administrators. DynamoDB offers two pricing models, pay-as-you-go (on-demand) and provisioned capacity, to support different business use cases. This article focuses on the following aspects of DynamoDB:
- Best practices for designing DynamoDB tables
- Different provisioning modes and choosing between them
- Detecting and avoiding hot partitions
- Impacts of throughput dilution
Designing DynamoDB Tables
DynamoDB uses two key concepts:
- Partition key: The partition key determines which internal DynamoDB partition (shard) an item is written to, which is how data is distributed internally. Choose an attribute with as many distinct values as possible so that data spreads evenly; an even spread supports low latency and high throughput.
- Range key: The range key (also called the sort key), together with the partition key, forms the primary key of the table, which uniquely identifies a record. The range key also determines the sort order of data within a partition. A timestamp is a common choice for the range key.
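To make the effect of key choice concrete, here is a minimal sketch that buckets partition key values into a fixed number of partitions. DynamoDB's real hashing and partition count are internal and not exposed, so `partitionFor`, the bucket count of 4, and the sample key values are all assumptions used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PartitionSpreadSketch {

    // Hypothetical stand-in for DynamoDB's internal hash function:
    // map a partition key value to one of `partitions` buckets.
    static int partitionFor(String partitionKey, int partitions) {
        return Math.floorMod(partitionKey.hashCode(), partitions);
    }

    // Count how many items land in each bucket.
    static int[] itemsPerPartition(List<String> partitionKeys, int partitions) {
        int[] counts = new int[partitions];
        for (String key : partitionKeys) {
            counts[partitionFor(key, partitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int partitions = 4; // assumed; real partition counts are managed by DynamoDB

        // Many distinct values: random unique identifiers.
        List<String> randomIds = IntStream.range(0, 10_000)
                .mapToObj(i -> UUID.randomUUID().toString())
                .collect(Collectors.toList());

        // Only two distinct values: a status flag used as the partition key.
        List<String> statusFlags = IntStream.range(0, 10_000)
                .mapToObj(i -> i % 2 == 0 ? "ACTIVE" : "INACTIVE")
                .collect(Collectors.toList());

        System.out.println("random IDs:   " + Arrays.toString(itemsPerPartition(randomIds, partitions)));
        System.out.println("status flags: " + Arrays.toString(itemsPerPartition(statusFlags, partitions)));
    }
}
```

With only two distinct key values, at most two buckets can ever receive data no matter how many partitions exist, while the random identifiers spread across all of them.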
How To Choose the Range Key and Partition Key
- The partition key is required. It should have many distinct values so that load is distributed evenly across partitions.
- DynamoDB has an internal partition size limit of 10 GB. If the partition key has too few distinct values, a large amount of data is written to the same partition. If a partition exceeds 10 GB, its data is automatically split across different machines, which can reduce throughput.
- The combination of partition key and range key forms the unique key of the table. Writing a record with a key combination that already exists overwrites the stored values.
- Use DynamoDB TTL to expire old items and keep partition sizes down.
- Keep the names of the partition key and range key generic so that the values stored in them can be changed in the application later without a major code refactor. Example:
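import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBIndexHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBIndexRangeKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBRangeKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@DynamoDBTable(tableName = DummyDynamoModel.TABLE_NAME)
public class DummyDynamoModel {
    public static final String TABLE_NAME = "DummyTable";
    public static final String SECOND_INDEX = "SecondIndex";
    public static final long TIME_TO_LIVE = 604800; // 7 days, in seconds

    // Generic key names: the application decides what values go into them.
    @DynamoDBHashKey
    private String primaryHashKey;
    @DynamoDBRangeKey
    private String primaryRangeKey;

    // Keys of the global secondary index.
    @DynamoDBIndexHashKey(globalSecondaryIndexName = SECOND_INDEX)
    private String secondaryHashKey;
    @DynamoDBIndexRangeKey(globalSecondaryIndexName = SECOND_INDEX)
    private String secondaryRangeKey;
}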
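The model above defines a `TIME_TO_LIVE` constant but does not show how it turns into an expiry value. DynamoDB TTL reads a numeric attribute holding the expiry time as Unix epoch seconds, so one possible way to derive that value is sketched below. The class and method names are hypothetical, and the attribute that would store the result is not part of the original model.

```java
import java.time.Duration;
import java.time.Instant;

class TtlSketch {
    // Mirrors TIME_TO_LIVE in the model: 7 days, expressed in seconds.
    static final long TIME_TO_LIVE = Duration.ofDays(7).getSeconds();

    // Expiry timestamp in Unix epoch seconds, computed from the write time.
    // The result would be stored in the table's configured TTL attribute.
    static long expiresAtEpochSeconds(Instant writtenAt) {
        return writtenAt.getEpochSecond() + TIME_TO_LIVE;
    }

    public static void main(String[] args) {
        long expiresAt = expiresAtEpochSeconds(Instant.now());
        System.out.println("expiresAt (epoch seconds) = " + expiresAt);
    }
}
```

Expired items are removed by a background process rather than at the exact expiry second, so readers that must never see stale data should still compare the stored timestamp with the current time.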
Provisioning Modes
DynamoDB supports two provisioning modes: on-demand and provisioned. In on-demand mode, the user is charged for the read/write capacity actually consumed. In provisioned mode, the user is charged for the capacity configured in advance, which is based on expected usage. Below are the details for each mode:
On-Demand
- With on-demand, the user only pays for the actual read and write capacity used by the application.
- There is no upfront cost or base price for on-demand mode.
- On-demand is ideal for applications with spiky or unpredictable traffic.
Provisioned
- Provisioned requires the user to specify the read and write throughput in advance.
- The user pays for the provisioned capacity regardless of the actual read/write usage.
- If traffic exceeds the provisioned capacity, the excess requests are throttled.
- Provisioned is the recommended mode for applications with predictable load. When the provisioned capacity is well utilized, it is usually around three times cheaper than on-demand.
In short, on-demand is the recommended approach when traffic is unpredictable, while provisioned is recommended for applications with predictable traffic.
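The trade-off between the two modes can be sketched with simple arithmetic. The prices below are round placeholder numbers, not real AWS rates, and the model ignores reads, auto scaling, and storage; current figures should come from the DynamoDB pricing page. The class and method names are hypothetical.

```java
class CapacityCostSketch {
    // Placeholder prices for illustration only, NOT actual AWS pricing.
    static final double ON_DEMAND_PRICE_PER_MILLION_WRITES = 1.00;
    static final double PROVISIONED_PRICE_PER_WCU_HOUR = 0.001;

    // On-demand: pay per write request actually made.
    static double onDemandCost(long writeRequests) {
        return writeRequests / 1_000_000.0 * ON_DEMAND_PRICE_PER_MILLION_WRITES;
    }

    // Provisioned: pay per capacity unit per hour, whether it is used or not.
    static double provisionedCost(long writeCapacityUnits, long hours) {
        return writeCapacityUnits * PROVISIONED_PRICE_PER_WCU_HOUR * hours;
    }

    public static void main(String[] args) {
        long hoursPerMonth = 30 * 24;

        // Steady load: 100 writes per second for the whole month.
        long steadyRequests = 100L * 3600 * hoursPerMonth;
        // Spiky load: the same 100 writes per second, but only 2 hours a day.
        long spikyRequests = 100L * 3600 * 2 * 30;

        // Provisioned capacity must cover the peak of 100 writes per second in both cases.
        double provisioned = provisionedCost(100, hoursPerMonth);

        System.out.printf("steady: on-demand %.2f vs provisioned %.2f%n",
                onDemandCost(steadyRequests), provisioned);
        System.out.printf("spiky:  on-demand %.2f vs provisioned %.2f%n",
                onDemandCost(spikyRequests), provisioned);
    }
}
```

Under these assumed prices, provisioned wins for the steady workload and on-demand wins for the spiky one, because provisioned capacity sized for the peak sits idle most of the day.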
Hot Partition
A hot partition occurs when traffic is spread unevenly across partitions, so requests to the busy partition are throttled even though the table's overall request rate is below the provisioned capacity. The application then behaves unpredictably, with high latency and throttled requests. Below are recommendations to avoid hot partitions:
- Use a partition key that spreads the data evenly across multiple partitions, such as a randomly generated unique identifier. Avoid a partition key with few distinct values, which overburdens individual partitions and creates hot spots.
- Use a range key in combination with the partition key; this helps spread data further within the partition.
- Use a caching strategy for frequently accessed data. For DynamoDB, AWS provides DynamoDB Accelerator (DAX), a fully managed, highly available caching service built for Amazon DynamoDB, which can be explored as an option alongside other caching solutions such as Redis.
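As a rough illustration of the caching recommendation, the sketch below puts a small in-memory, least-recently-used cache in front of a read function. It is not DAX or Redis; the `loader` function is a hypothetical stand-in for a DynamoDB read, and the sketch omits expiry and invalidation, which a real cache would need.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class ReadThroughCacheSketch<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loader; // hypothetical stand-in for a DynamoDB read
    private int loads = 0;               // how many reads reached the backing store

    ReadThroughCacheSketch(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries in least-recently-used order,
        // so the eldest entry is the one to evict when the cache is full.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Return the cached value, loading it from the backing store on a miss.
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    synchronized int loads() {
        return loads;
    }

    public static void main(String[] args) {
        ReadThroughCacheSketch<String, String> cache =
                new ReadThroughCacheSketch<>(2, key -> "item-for-" + key);
        for (int i = 0; i < 1_000; i++) {
            cache.get("popular-key");
        }
        System.out.println("backend reads: " + cache.loads()); // prints "backend reads: 1"
    }
}
```

A thousand reads of the same hot key reach the backing store once, which is the effect a cache has on a hot partition: repeated reads of popular items stop consuming that partition's throughput.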
Throughput Dilution
In DynamoDB, throughput dilution happens when the provisioned capacity of a table is spread too thinly to be used effectively. It occurs when a table that was previously scaled up for high throughput has its capacity reduced. When the table is scaled up, the extra throughput causes DynamoDB to increase the number of partitions. When the throughput is later reduced to its previous level, the table keeps the increased number of partitions, so each partition has fewer read and write throughput units than before. As a result, some requests are throttled even when the provisioned throughput on the table is not exceeded. The dilution of throughput can be handled in the following ways:
- Migrate to a new table.
- Specify higher table-level throughput to boost the throughput units per partition to previous levels.
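The arithmetic behind dilution can be sketched with a simplified model. It assumes throughput is divided evenly across partitions, that a partition serves at most 1,000 write capacity units, and that partitions created during scale-up are never merged back. Real partition counts are managed internally by DynamoDB and are not exposed, so the numbers are illustrative only and the class is hypothetical.

```java
final class ThroughputDilutionSketch {
    // Assumed per-partition write ceiling, used to estimate the partition count.
    static final int MAX_WCU_PER_PARTITION = 1_000;

    private int partitions = 1;
    private int provisionedWcu;

    ThroughputDilutionSketch(int initialWcu) {
        setProvisionedWcu(initialWcu);
    }

    // Partitions grow to fit higher throughput but do not shrink afterwards.
    void setProvisionedWcu(int wcu) {
        provisionedWcu = wcu;
        int needed = (wcu + MAX_WCU_PER_PARTITION - 1) / MAX_WCU_PER_PARTITION; // ceiling division
        partitions = Math.max(partitions, needed);
    }

    int partitions() {
        return partitions;
    }

    // Table-level throughput divided evenly across partitions.
    double wcuPerPartition() {
        return (double) provisionedWcu / partitions;
    }

    public static void main(String[] args) {
        ThroughputDilutionSketch table = new ThroughputDilutionSketch(2_000);
        System.out.println(table.partitions() + " partitions, " + table.wcuPerPartition() + " WCU each");
        // 2 partitions, 1000.0 WCU each

        table.setProvisionedWcu(10_000); // scale up for a traffic peak
        System.out.println(table.partitions() + " partitions, " + table.wcuPerPartition() + " WCU each");
        // 10 partitions, 1000.0 WCU each

        table.setProvisionedWcu(2_000); // scale back down to the original level
        System.out.println(table.partitions() + " partitions, " + table.wcuPerPartition() + " WCU each");
        // 10 partitions, 200.0 WCU each
    }
}
```

In this model the table ends up with the same 2,000 units it started with, but each partition can serve only 200 of them instead of 1,000, which is why a key that was fine before the scale-up can be throttled afterwards.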
Conclusion
In conclusion, even though DynamoDB is a fully managed datastore, it requires careful design from the start of a project. Choose between on-demand and provisioned based on the traffic pattern and a cost analysis. Design the partition and range keys of the table in a way that avoids a large refactor later. Choosing the right partition key strategy prevents future hot partition issues. By keeping an eye on DynamoDB metrics, throughput dilution can be caught early. With these strategies in mind, developers can design DynamoDB databases that are scalable, performant, and cost-effective.