Intrinsic Aspects of Apache ZooKeeper and Their Importance
This article explores ZNodes, sessions, watches, quorum, transactions, and local storage and snapshots, all aspects of Apache ZooKeeper.
Join the DZone community and get the full member experience.
Join For FreeAs a bird’s eye view, Apache Zookeeper has been leveraged to get coordination services for managing distributed applications. It holds responsibility for providing configuration information, naming, synchronization, and group services over large clusters in distributed systems. As an example, Apache Kafka uses ZooKeeper for choosing their leader node for the topic partitions. Please click here if you want to read how to set up the multi-node Apache Zookeeper cluster on Ubuntu/Linux.
ZNodes
The key concept of ZooKeeper is the ZNode, which can act as either files or directories. ZNodes can be replicated between servers as they are working in a distributed file system. A ZNode can be described by a data structure called stats, and it consolidates information about ZNode context like creation time, number of changes (as version), number of children, length of stored data or ZXID (ZooKeeper transaction ID) of creation, and last change.
For every modification of ZNodes, its version increases. The ZNodes are classified into three categories: persistence, sequential, and ephemeral. A persistence ZNode is alive even after the client, which created that particular ZNode, is disconnected. Also, they survive after ZooKeeper restarts. Ephemeral ZNodes are active until the client is alive. As soon as the client gets disconnected from the ZooKeeper ensemble, then the ephemeral ZNodes also get deleted automatically. Sequential ZNodes can be either persistent or ephemeral. Once a new ZNode is created as a sequential ZNode, then ZooKeeper sets the path of the ZNode by attaching a 10-digit sequence number to the original name. The sequential ZNode can be easily differentiated from the normal ZNode with the help of different suffixes. The ZNodes can have public or more restricted access. The access rights can be managed by special ACL permissions.
Sessions
Apache ZooKeeper’s operation relies heavily on sessions. The session will be established, and the client will be given a session ID (a 64-bit number) when they connect to the ZooKeeper server. A session has a timeout period which is specified in milliseconds. The session might get expired when the connection remains idle for more than the timeout period. The sessions are kept alive by the client sending a ping request (heartbeat) to the ZooKeeper service. By using a TCP connection, a client maintains the sessions with the Zookeeper server. When a session ends for any reason, the ephemeral ZNodes created during that session also get deleted. The right session timeout is determined by several factors, including the size of the ZooKeeper ensemble, application logic complexity, and network congestion.
Watches
The client can easily receive notifications about changes to the ZooKeeper ensemble through watches. The clients are able to set watches while reading a specific ZNode. Any time a ZNode (on which the client registers) changes, watches notify the registered client. Data associated with the ZNode or changes in the ZNode’s children are referred to as ZNode changes. Watches are only activated once. A client must perform a second read operation if they want a notification again. The client will be disconnected from the server, and the associated watches will also be removed when a connection session expires. The watches registered on a ZNode can be removed with a call to removeWatches
. Also, a ZooKeeper client can remove watches locally even if there is no server connection by setting the local flag to true.
ZooKeeper Quorum
ZooKeeper Quorum refers to the bare minimum of server nodes that must be operational and accessible to client requests. For a transaction to be successful, any client-generated updates to the ZooKeeper tree must be persistently stored in this quorum of nodes. Using the formula Q = 2N+1, where Q is the number of nodes required to form a healthy ensemble and N is the maximum number of failure nodes, quorum specifies the rule for forming a healthy ensemble.
The above formula can be considered to decide what is the safest and optimal size of a quorum. The Ensemble can be defined simply as a group of ZooKeeper servers. The minimum number of nodes that are required to form an ensemble is three. A five-node ZooKeeper ensemble can handle two node failures because a quorum can be established from the remaining three nodes as per the formula Q = 2N+1.
The following entries can be defined as the quorum of ZookKeper servers and must be available in zoo.cfg
file located under the conf directory.
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
And they follow the pattern as
server.X=server_name:port1:port2
server.X
, where X is the server number in ASCII. Prior to that, we will have to create a file named myid under the ZooKeeper data directory in each ZooKeeper server. This file should contain the server number X as an entry in it. server_name
is the hostname of the node where the ZooKeeper service is started.
port1, the ZooKeeper server uses this port to connect followers to the leader.
port2, this port is used for leader election.
Transactions
Transaction in Apache Zookeeper is atomic and idempotent and involves two steps, namely leader election, and atomic broadcast. ZooKeeper uses ZooKeeper Atomic Broadcast (ZAB), a unique atomic messaging protocol. Because it is atomic, the ZAB protocol ensures that updates will either succeed or fail.
Local Storage and Snapshots
Transactions are stored in local storage on ZooKeeper servers. The ZooKeeper Data Directory contains snapshot and transactional log files, which are persistent copies of the ZNodes stored by an ensemble. The transactions are logged into transaction logs. Any changes to ZNodes are appended to the transaction log, and when the log file size increases, a snapshot of the current state of ZNodes is written to the filesystem. In the below image, we can see how snapshots are getting persisted inside Zookeeper’s data directory.
The ZooKeeper tracks a fuzzy state of its own data tree within the snapshot files. Because ZooKeeper transaction logs are written at a rapid rate, it is critical that they be configured on a disk separate from the server’s boot device. In the event of a catastrophic failure or user error, the transactional logs and snapshot files in Apache ZooKeeper make it possible to recover data. Inside zoo.cfg
file available under the conf directory of the ZooKeeper server, the data directory is specified by the dataDir
parameter, and the data log directory is specified by the dataLogDir
parameter.
Hope you have enjoyed this read? Please like and share if you feel this composition is valuable.
Published at DZone with permission of Gautam Goswami, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments