Geo-Replication
Ensure business continuity and quick disaster recovery as you explore this overview of quorum, global quorum, and the two geo-replication types.
Distributed databases spread across multiple regions use geo-replication to transfer data from one region to another. Such systems use quorum consensus to ensure fault tolerance. The reads and writes should also be ordered. Geo-replication improves the availability of data systems because, if one region goes down, other regions are available to serve requests.
Quorum
In the above picture, P represents a primary node and S1, S2, and S3 are secondary nodes. Together, P, S1, S2, and S3 form one replica unit. Each replica unit is fault tolerant; i.e., one node going down will not impact availability. If the primary goes down, one of the remaining secondary nodes is elected as the next primary. In such a configuration, when a client sends a write request W1 to the primary, the primary persists the data in its own storage and also sends it to all the secondary nodes. The primary acknowledges the write to the client only when a majority of the nodes in the replica unit have acknowledged it.
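The write path described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular database implements it: the `Node` and `ReplicaUnit` classes, the `healthy` flag, and the `write` method are hypothetical names introduced here, and primary election is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical replica node that stores records in an in-memory log."""
    name: str
    healthy: bool = True
    log: List[str] = field(default_factory=list)

    def persist(self, record: str) -> bool:
        # A node that is down cannot store or acknowledge the record.
        if not self.healthy:
            return False
        self.log.append(record)
        return True


@dataclass
class ReplicaUnit:
    """One primary plus its secondaries (P, S1, S2, S3 in the article)."""
    primary: Node
    secondaries: List[Node]

    def write(self, record: str) -> bool:
        # Assumption: a write is only accepted while the primary is up;
        # electing a new primary is outside this sketch.
        if not self.primary.persist(record):
            return False
        acks = 1 + sum(1 for s in self.secondaries if s.persist(record))
        total = 1 + len(self.secondaries)
        # Acknowledge to the client only with a strict majority of acks.
        return acks >= total // 2 + 1


s1, s2, s3 = Node("S1"), Node("S2"), Node("S3")
unit = ReplicaUnit(primary=Node("P"), secondaries=[s1, s2, s3])
w1_acked = unit.write("W1")  # all four nodes acknowledge
s1.healthy = False
w2_acked = unit.write("W2")  # three of four acknowledge
s2.healthy = False
w3_acked = unit.write("W3")  # only two of four acknowledge
```

In this sketch a write that misses quorum (W3) still sits in the logs of the nodes that accepted it; a real system would have to reconcile or roll back such entries.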
Reads can happen from any node and not just the primary. In the above configuration, the replica unit tolerates one node failure: a quorum is achieved when a majority of the nodes (three out of four) have the write. A lagging or faulty node can be recovered by copying all the data from a healthy node. When two nodes fail, only two of the four nodes remain, which is not a majority, so a quorum can't be achieved.
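The arithmetic behind this can be checked with a short Python sketch. The function names below are illustrative choices, and "majority" is taken to mean a strict majority (more than half of the nodes), which matches the three-out-of-four example.

```python
def majority(total_nodes: int) -> int:
    """Smallest node count that is more than half of total_nodes."""
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return total_nodes - majority(total_nodes)


def has_quorum(total_nodes: int, healthy_nodes: int) -> bool:
    """True when the healthy nodes alone can form a majority."""
    return healthy_nodes >= majority(total_nodes)


# The four-node replica unit from the article: P, S1, S2, S3.
needed = majority(4)                # 3
can_lose = tolerated_failures(4)    # 1
one_down = has_quorum(4, 3)         # True
two_down = has_quorum(4, 2)         # False
```

Note that under this rule a five-node unit needs the same three acknowledgements but tolerates two failures, which is one reason odd-sized replica sets are common.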
Geo-Replication and Global Quorum
In geo-replication, a write is applied in the primary region and also replicated to all the secondary regions. This provides high availability across region failures. For example, if East US goes down, then West US can serve requests by becoming the new primary region. This action is called failover and can be done automatically or manually. The primary region can acknowledge a write request only when a majority of the secondary regions have seen it. This global quorum is analogous to the local quorum within a replica unit.
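A minimal Python sketch of the global quorum check and of failover follows. Everything here is an assumption made for illustration: the function names, the dictionary shapes, the region names other than East US and West US, and the rule of promoting the first healthy region in the list. Real systems typically choose the failover target by priority or replication lag.

```python
from typing import Dict


def global_quorum(secondary_acks: Dict[str, bool]) -> bool:
    """True when a majority of the secondary regions have seen the write."""
    acked = sum(1 for seen in secondary_acks.values() if seen)
    return acked >= len(secondary_acks) // 2 + 1


def fail_over(primary: str, region_up: Dict[str, bool]) -> str:
    """Return the region that should act as primary.

    Keeps the current primary while it is up; otherwise promotes the
    first healthy region (a simplification for this sketch).
    """
    if region_up.get(primary, False):
        return primary
    for region, up in region_up.items():
        if up:
            return region
    raise RuntimeError("no healthy region available for failover")


# Hypothetical deployment: East US is primary, three secondary regions.
acks = {"West US": True, "North Europe": True, "Southeast Asia": False}
quorum_reached = global_quorum(acks)  # two of three secondaries acked

status = {"East US": False, "West US": True, "North Europe": True}
new_primary = fail_over("East US", status)
```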
Geo-Replication Types
When a client issues a write request to the primary region, it is replicated to the secondary regions via a special secondary (XS). That is, one of the secondary nodes in the replica unit is elected to do this cross-region replication. The XS node waits for a majority of regions to acknowledge the write before it commits the write to its own store.
There are broadly two types of geo-replication strategies as discussed below:
Synchronous Geo-Replication
In this case, the primary node of the write region waits for an acknowledgement from XS even if it has already satisfied its local quorum. That is, it sends an acknowledgement to the client only when there is both a local and a global quorum.
Asynchronous Geo-Replication
In this case, the primary node of the write region doesn’t wait for an acknowledgement from XS. That is, it can send an acknowledgement to the client as soon as there is a local quorum.
With synchronous geo-replication, an acknowledged write is not lost when a single region fails, because a majority of nodes and a majority of regions already have the data. With asynchronous replication, a write can be acknowledged while only the primary region knows about it; if that region suffers an outage before the write reaches the other regions, the data is lost. However, asynchronous replication systems are faster than synchronous ones because the acknowledgement wait time is shorter. Commercial databases find a balance between the two approaches.
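The two acknowledgement rules, and the window in which an asynchronous write is exposed to loss, can be expressed as a small Python sketch. The `Mode` enum and both function names are hypothetical; the logic simply restates the rules described above.

```python
from enum import Enum


class Mode(Enum):
    SYNC = "synchronous"
    ASYNC = "asynchronous"


def can_ack_client(mode: Mode, local_quorum: bool, global_quorum: bool) -> bool:
    """Decide whether the primary may acknowledge the write to the client."""
    if mode is Mode.SYNC:
        # Synchronous: wait for both the local and the global quorum.
        return local_quorum and global_quorum
    # Asynchronous: the local quorum alone is enough.
    return local_quorum


def at_risk_if_primary_region_fails(
    mode: Mode, local_quorum: bool, global_quorum: bool
) -> bool:
    """True when an acknowledged write exists only in the primary region."""
    return can_ack_client(mode, local_quorum, global_quorum) and not global_quorum


# A write that has reached local quorum but not yet global quorum:
sync_acked = can_ack_client(Mode.SYNC, True, False)
async_acked = can_ack_client(Mode.ASYNC, True, False)
sync_at_risk = at_risk_if_primary_region_fails(Mode.SYNC, True, False)
async_at_risk = at_risk_if_primary_region_fails(Mode.ASYNC, True, False)
```

The asynchronous client gets its answer earlier, and the price is the interval during which `at_risk_if_primary_region_fails` is true.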
Conclusion
This article gives a brief overview of quorum, global quorum, and the two geo-replication types. The databases supporting geo-replication offer high availability with manual or automatic failover capabilities. They offer good scalability when data is distributed geographically. These architectures ensure business continuity and quick disaster recovery in case of regional disasters or large-scale outages.
Opinions expressed by DZone contributors are their own.