Consistency Patterns Demystified
In this article, we will delve into the essential consistency patterns that can ensure the reliability of distributed systems.
A distributed system provides benefits such as scalability and fault tolerance. However, maintaining consistency across the distributed system is non-trivial. Consistency is vital to achieving reliability, deterministic system state, and improved user experience.
A distributed system replicates the data across multiple servers to attain improved fault tolerance, scalability, and reliability. The consistency patterns (consistency models) are a set of techniques for data storage and data management in a distributed system. The consistency pattern determines the data propagation across the distributed system. Hence, the consistency pattern will impact the scalability and reliability of the distributed system.
There are numerous consistency patterns in distributed systems. The choice of the consistency pattern depends on the system requirements and use cases because each consistency pattern has its benefits and drawbacks. Consistency patterns deserve particular attention in multi-data-center architectures because maintaining consistency across data centers is non-trivial. The consistency patterns can be broadly categorized as follows:
- Strong consistency
- Eventual consistency
- Weak consistency
The eventual consistency model is an optimal choice for distributed systems that favor high availability and performance over consistency. Strong consistency is an optimal consistency model when the same data view must be visible across the distributed system without delay. In summary, each consistency model fits a different use case and system requirements.
Strong Consistency
In the strong consistency pattern, read operations performed on any server must always retrieve the data that was included in the latest write operation. The strong consistency pattern typically replicates data synchronously across multiple servers. Put another way, once a write operation has completed on one server, subsequent read operations on every other server must return the latest written data.
The benefits of strong consistency are the following:
- Simplified application logic.
- Increased data durability.
- Guaranteed consistent data view across the system.
The limitations of strong consistency are as follows:
- Reduced availability of the service.
- Degraded latency
- Resource-intensive
The workflow to reach strong consistency in data replication is the following:
- The client (for example, an application server) executes a write operation against the primary database instance.
- The primary instance propagates the written data to the replica instance.
- The replica instance sends an acknowledgment signal to the primary instance.
- The primary instance sends an acknowledgment signal to the client.
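The four steps above can be sketched in a few lines of Python. This is a minimal, single-process illustration rather than a real replication protocol: the `Primary` and `Replica` classes and their methods are hypothetical names introduced here, and the network calls are replaced by plain method calls.

```python
class Replica:
    """A follower that keeps a copy of the data (hypothetical helper)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True  # acknowledgment sent back to the primary


class Primary:
    """A primary that acknowledges a write only after every replica has."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        # Step 1: the client's write arrives at the primary.
        self.data[key] = value
        # Steps 2 and 3: propagate and wait for each replica's acknowledgment.
        acks = [replica.apply(key, value) for replica in self.replicas]
        if not all(acks):
            raise RuntimeError("replication failed; write not acknowledged")
        # Step 4: only now is the client told that the write succeeded.
        return "ack"


replicas = [Replica(), Replica()]
primary = Primary(replicas)
result = primary.write("balance", 100)
```

Because the acknowledgment is withheld until every replica has applied the write, a slow or unreachable replica delays or blocks the client, which is where the latency and availability costs listed above come from.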
The popular use cases of the strong consistency model are the following:
- File systems
- Relational databases
- Financial services such as banking.
- Atomic commit protocols such as two-phase commit (2PC).
- Fully distributed consensus protocols such as Paxos.
For instance, any changes to the user's bank account balance must be immediately replicated for improved durability and reliability. Google's Spanner database is a real-world application of strong consistency, as is Google's Bigtable for reads and writes within a single cluster.
Eventual Consistency
In the eventual consistency pattern, when a write operation is executed against a server, the immediate subsequent read operations against other servers do not necessarily return the latest written data. The system will eventually converge to the same state, and the latest data will be returned by other servers on succeeding read operations. The eventual consistency pattern typically replicates the data asynchronously across multiple servers. In layman’s terms, any data changes are only eventually propagated across the system, and stale data views are expected until data convergence occurs.
Eventual consistency can be implemented through a multi-leader or leaderless replication topology. The system usually converges to the same state within a few seconds, but the time frame depends on the implementation and system requirements. The benefits of the eventual consistency pattern are as follows:
- Simple
- Highly available
- Scalable
- Low latency
The drawbacks of eventual consistency are the following:
- Weaker consistency model
- Potential data loss
- Potential data conflicts
- Data inconsistency
The workflow to attain eventual consistency in data replication is the following:
- The client executes a write operation against the primary database instance.
- The primary instance sends an acknowledgment signal to the client.
- The primary instance eventually propagates the written data to the replica instance.
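The difference from the synchronous workflow is the order of the last two steps. The sketch below is a minimal illustration under simplifying assumptions: `AsyncPrimary` is a hypothetical class, replicas are plain dictionaries, and the background replication job is modeled as an explicit `replicate()` call.

```python
from collections import deque


class AsyncPrimary:
    """A primary that acknowledges first and replicates later."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas  # plain dicts standing in for replica stores
        self.pending = deque()    # writes not yet propagated

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "ack"  # the client is acknowledged before any replica is updated

    def replicate(self):
        # Stand-in for a background job that drains the replication backlog.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


replica = {}
primary = AsyncPrimary([replica])
primary.write("profile", "v2")
stale_read = replica.get("profile")  # the replica has not caught up yet
primary.replicate()
fresh_read = replica.get("profile")  # the replica has converged
```

The window between `write()` and `replicate()` is the period in which reads against the replica return stale data. If the primary fails inside that window, the pending writes are lost, which matches the potential data loss listed among the drawbacks.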
The eventual consistency pattern is a tradeoff between data staleness and scalability. The typical use cases of eventual consistency are the following:
- Search engine indexing
- URL shortener
- Domain name server (DNS)
- Simple mail transfer protocol (SMTP)
- Object storage such as Amazon S3 (historically; S3 has provided strong read-after-write consistency since December 2020).
- Comments or posts on social media platforms such as Facebook.
- Distributed communication protocol such as gossip protocol.
- Leader-follower and multi-leader replication
- Distributed counter and live comment service
For example, any changes to the domain name records are replicated eventually by DNS. Distributed databases such as Amazon Dynamo and Apache Cassandra are real-world applications of the eventual consistency pattern. Eventual consistency is not a design flaw but a feature to satisfy certain use cases. The business owner should determine whether application data is a candidate for the eventual consistency pattern.
Weak Consistency
In the weak consistency pattern, when a write operation is executed against a server, the subsequent read operations against other servers may or may not return the latest written data. In other words, data propagation is best-effort: the data may be propagated late or, in the worst case, not at all. The distributed system must meet various conditions, such as the passing of time, before the latest written data can be returned.
The advantages of weak consistency are the following:
- High availability
- Low latency
The disadvantages of weak consistency are as follows:
- Potential data loss
- Data inconsistency
- Data conflicts
The write-behind (write-back) cache pattern is an example of weak consistency. The data will be lost if the cache crashes before propagating the data to the database. The workflow of the write-behind cache pattern is the following:
- The client executes a write operation against the cache server.
- The cache writes the received data to the message queue.
- The cache sends an acknowledgment signal to the client.
- The event processor asynchronously writes the data to the database.
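The write-behind workflow above can be sketched as follows. This is a simplified, single-process illustration: `WriteBehindCache` and `drain` are hypothetical names, the message queue is an in-memory `queue.Queue`, and the database is a plain dictionary.

```python
import queue


class WriteBehindCache:
    """Cache that acknowledges writes before they reach the database."""

    def __init__(self, write_queue):
        self.entries = {}
        self.write_queue = write_queue

    def put(self, key, value):
        self.entries[key] = value
        self.write_queue.put((key, value))  # hand the write to the queue
        return "ack"                        # acknowledge the client immediately


def drain(write_queue, database):
    """Event processor: apply every queued write to the database."""
    while True:
        try:
            key, value = write_queue.get_nowait()
        except queue.Empty:
            return
        database[key] = value


database = {}
pending = queue.Queue()
cache = WriteBehindCache(pending)
cache.put("score", 42)
before = dict(database)  # snapshot taken before the event processor runs
drain(pending, database)
after = dict(database)   # snapshot taken after the queued write is applied
```

Between `put()` and `drain()`, the write exists only in the cache and the queue. With an in-memory queue like this one, a crash in that interval loses the write, which is the data loss risk described above.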
The common use cases of weak consistency are the following:
- Real-time multiplayer video games
- Voice over Internet Protocol (VoIP)
- Live streams
- Cache server
- Data backups
For instance, video frames that are lost due to poor network connectivity are not retransmitted in a live stream.
Tradeoffs of Consistency Patterns
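The tradeoffs associated with each consistency pattern, as described in the sections above, can be outlined as follows:
- Strong consistency: a consistent data view across the system, at the cost of higher latency and reduced availability.
- Eventual consistency: high availability, low latency, and scalability, at the cost of temporarily stale reads and potential data conflicts.
- Weak consistency: high availability and low latency, with no guarantee that a read returns the latest write and a risk of data loss.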
Other Consistency Models in Distributed Systems
A distributed quorum can be used to implement various consistency patterns. The configuration of quorum parameters decides the consistency pattern that will be achieved.
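As an illustration of how quorum parameters relate to consistency, a commonly cited rule for Dynamo-style systems (an assumption here, since the article does not state it) is that with N replicas, a write quorum of W, and a read quorum of R, every read set overlaps every write set when W + R > N. The sketch below checks that rule against a brute-force enumeration; both function names are hypothetical.

```python
from itertools import combinations


def quorum_overlaps(n, w, r):
    """Rule of thumb: read and write quorums intersect when w + r > n."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n


def quorums_always_overlap(n, w, r):
    """Brute force: does every w-node set share a node with every r-node set?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


strict = quorum_overlaps(n=3, w=2, r=2)   # quorums must share a replica
relaxed = quorum_overlaps(n=3, w=1, r=1)  # quorums may miss each other
```

With overlapping quorums, at least one replica contacted by a read holds the latest acknowledged write, which leans toward strong consistency. With smaller quorums, reads can miss recent writes, and the behavior is closer to eventual consistency. Overlap alone does not guarantee linearizability in every implementation.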
Linearizability
In the linearizability pattern, every operation appears to take effect atomically at a single point between its start and its completion. As a result, once a write operation completes, all subsequent read operations against any server must return the written data or a later value. Linearizability is a variant of strong consistency and is also known as atomic consistency. The following techniques can be used to implement linearizability:
- Single leader to handle both read and write operations.
- Distributed consensus algorithms such as Paxos.
- Distributed quorum
The advantages of linearizability are as follows:
- Makes a distributed system behave as if the system were non-distributed.
- Simple for applications to use.
The disadvantages of linearizability are the following:
- Degraded performance
- Limited scalability
- Reduced availability
One of the popular use cases of the linearizability pattern is the implementation of the user ID field’s uniqueness constraint in a distributed system.
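The uniqueness constraint can be sketched with the single-leader technique listed above. This is a hedged, in-process illustration: `UserRegistry` is a hypothetical class, and a `threading.Lock` stands in for a single leader that serializes all claims, so the check and the insert behave as one atomic step.

```python
import threading


class UserRegistry:
    """Single 'leader' that serializes claims on user IDs."""

    def __init__(self):
        self._owners = {}
        self._lock = threading.Lock()

    def claim(self, user_id, owner):
        # The lock makes check-then-insert atomic, so two concurrent
        # claims on the same ID can never both succeed.
        with self._lock:
            if user_id in self._owners:
                return False
            self._owners[user_id] = owner
            return True


registry = UserRegistry()
results = []
threads = [
    threading.Thread(
        target=lambda owner=owner: results.append(registry.claim("alice", owner))
    )
    for owner in ("client-1", "client-2", "client-3")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
winners = results.count(True)  # exactly one client obtains the ID
```

In a real distributed system the lock would be replaced by a leader node or a consensus round, which is where the performance and availability costs of linearizability arise.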
Causal Consistency
In the causal consistency pattern, related events (cause and effect) are observed in the same order by all servers, while unrelated events might be observed in different orders by different servers. Causal consistency is stronger than eventual consistency but weaker than strong consistency, so it emerges as a middle ground between the two. The write operations that are causally unrelated or occur in parallel in real time are known as concurrent events. The causal consistency pattern does not guarantee ordering for concurrent events.
The cause-effect relationships in the causal consistency pattern can be implemented via vector clocks. The benefits of causal consistency are as follows:
- Low latency
- Reduced cost of synchronization
- High availability
- Relatively stronger consistency
The widespread use cases of the causal consistency pattern are the following:
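- Causally consistent client sessions in databases such as MongoDB. (Apache Cassandra's lightweight transactions are Paxos-based and provide linearizable rather than causal consistency.)
- Data propagation in the Bayou distributed database.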
- A comment thread on social media platforms such as Reddit.
For example, replies to the same comment thread on Reddit must be causally ordered. However, unrelated comment threads can be shown in any order. The causal consistency pattern is also used in real-time chat services such as Slack.
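The vector clocks mentioned earlier can be sketched as dictionaries that map a node name to a counter. This is a minimal illustration, not a production implementation: the function names and the node names `alice`, `bob`, and `carol` are hypothetical, and each clock value is treated as immutable.

```python
def increment(clock, node):
    """Return a copy of the clock with the node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum of two clocks, used when a node receives an event."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def happened_before(a, b):
    """True if a causally precedes b: no counter is larger, one is smaller."""
    nodes = a.keys() | b.keys()
    no_larger = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_smaller = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_larger and some_smaller


def concurrent(a, b):
    """True if neither clock precedes the other (equal clocks count too)."""
    return not happened_before(a, b) and not happened_before(b, a)


comment = increment({}, "alice")               # alice posts a comment
reply = increment(merge({}, comment), "bob")   # bob reads it, then replies
other = increment({}, "carol")                 # carol posts in another thread

reply_follows_comment = happened_before(comment, reply)
threads_are_concurrent = concurrent(reply, other)
```

A server that receives the reply before the comment can compare the clocks, see that the reply depends on an event it has not applied yet, and hold the reply back. Events whose clocks are concurrent, such as the reply and the post in the other thread, can be applied in any order.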
Summary
Numerous consistency patterns can be employed in different parts of the same distributed system. There is no silver bullet but only tradeoffs when it comes to choosing a suitable consistency pattern. The optimal choice of consistency pattern depends on the specific use case and requirements.
Published at DZone with permission of N K. See the original article here.