Zookeeper cluster construction

1, Introduction to zookeeper cluster

The purpose of a ZooKeeper cluster is to guarantee the performance and availability of the system and to carry more client connections; ZooKeeper provides a dedicated mechanism for this.

The following functions can be realized through the cluster:

  • Read/write separation: improves capacity, serves more client connections, and preserves performance.
  • Automatic master-slave switchover: improves fault tolerance, so the failure of some nodes does not bring down the whole cluster.

Because ZooKeeper decides whether the whole cluster is available by checking that a majority of nodes are alive, more than half of the ZooKeeper nodes must be running.

The majority ("more than half") mechanism:

A cluster needs at least three servers, and an odd number of servers is strongly recommended, because ZooKeeper decides whether the service is available by whether a majority of nodes survive. For example, with three nodes, if two go down, the whole cluster goes down. With four nodes, if two go down, a majority no longer survives, so the cluster goes down as well; the fourth node therefore adds no fault tolerance over three.
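The majority rule can be sketched with a little shell arithmetic (this is just the formula, not anything ZooKeeper ships):

```shell
# quorum for an n-node ensemble is floor(n/2) + 1;
# the cluster stays up while at most n - quorum nodes are down
for n in 3 4 5 6; do
  quorum=$(( n / 2 + 1 ))
  echo "$n nodes: quorum $quorum, tolerates $(( n - quorum )) failure(s)"
done
```

Running it shows that a 4-node ensemble tolerates no more failures than a 3-node one, which is why odd sizes are recommended.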

Reference: https://www.cnblogs.com/ysocean/p/9860529.html

2, Cluster role

There are three roles in a ZooKeeper cluster: leader (primary node), follower (secondary node) and observer (non-voting secondary node).

  • Leader: the master node. It handles writes and is produced by election; if it goes down, a new leader is elected.
  • Follower: a secondary node that serves reads. It is also a candidate for leader and has voting rights.
  • Observer: a non-voting secondary node that serves reads. Unlike a follower, it has no voting rights and cannot be elected leader; it is also not counted when calculating whether the cluster is available.

Observer configuration: append the observer suffix to the node's entry in the cluster configuration. An example is as follows:
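The original listing did not survive; a minimal sketch (the node ID and ports here are illustrative, not from the original):

```properties
# in the observer node's own zoo.cfg:
peerType=observer

# in the server list of every node's zoo.cfg, tag the observer's entry:
server.4=127.0.0.1:2890:3890:observer
```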


3, zookeeper cluster deployment

Due to limited machine resources, a ZooKeeper pseudo cluster (all nodes on one machine) is built this time.

1. Introduction

(1) Configuration description of Zookeeper cluster:

server.<node id>=<ip>:<data sync port>:<election port>

  • Node ID: a manually assigned service ID, a number between 1 and 255, written into the {dataDir}/myid file of the corresponding node.
  • IP address: the node's IP address. In a pseudo cluster the addresses may all be the same, but this must not be done in production, because nodes on one machine provide no fault tolerance; that is exactly why this setup is called a pseudo cluster.
  • Data synchronization port: the port used for master-slave data replication.
  • Election port: the port used by the nodes for leader election.
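For instance, a three-node server list might look like this (the IPs and ports are illustrative):

```properties
server.1=192.168.1.1:2888:3888
server.2=192.168.1.2:2888:3888
server.3=192.168.1.3:2888:3888
```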

(2) myid file

When setting up a ZooKeeper cluster, each node's ID is written in its myid file, which must be placed in the dataDir directory specified by that node's configuration file.

2. Create three data directories to store the data of each node

mkdir -p /usr1/zookeeper/data/zoo1
mkdir -p /usr1/zookeeper/data/zoo2
mkdir -p /usr1/zookeeper/data/zoo3

3. Write myid file

echo 1 > /usr1/zookeeper/data/zoo1/myid
echo 2 > /usr1/zookeeper/data/zoo2/myid
echo 3 > /usr1/zookeeper/data/zoo3/myid

4. Configuration file for zookeeper

(1) Configuration file of node 1: zoo1.cfg


# Cluster configuration: must be present in the zoo.cfg file
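The configuration listing itself did not survive extraction; a minimal sketch for node 1 (the ports and tick settings are assumptions, chosen to match the data directories created above):

```properties
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr1/zookeeper/data/zoo1
clientPort=2181

# cluster configuration (identical in all three files)
server.1=127.0.0.1:2887:3887
server.2=127.0.0.1:2888:3888
server.3=127.0.0.1:2889:3889
```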

(2) Configuration file of node 2: zoo2.cfg


# Cluster configuration: must be present in the zoo.cfg file
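Again the listing is missing; a minimal sketch for node 2 (ports and paths are assumptions):

```properties
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr1/zookeeper/data/zoo2
clientPort=2182

# cluster configuration (identical in all three files)
server.1=127.0.0.1:2887:3887
server.2=127.0.0.1:2888:3888
server.3=127.0.0.1:2889:3889
```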

(3) Configuration file of node 3: zoo3.cfg


# Cluster configuration: must be present in the zoo.cfg file
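And a minimal sketch for node 3 (ports and paths are assumptions):

```properties
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr1/zookeeper/data/zoo3
clientPort=2183

# cluster configuration (identical in all three files)
server.1=127.0.0.1:2887:3887
server.2=127.0.0.1:2888:3888
server.3=127.0.0.1:2889:3889
```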

5. Start zookeeper service

./bin/zkServer.sh start conf/zoo1.cfg
./bin/zkServer.sh start conf/zoo2.cfg
./bin/zkServer.sh start conf/zoo3.cfg

6. View status separately

./bin/zkServer.sh status conf/zoo1.cfg
Mode: follower
./bin/zkServer.sh status conf/zoo2.cfg
Mode: leader
./bin/zkServer.sh status conf/zoo3.cfg
Mode: follower 

7. Check the data synchronization of the cluster

Connect the specified node:

./bin/zkCli.sh -server <ip>:<port>

Create data on any node, then check whether the other nodes have synchronized it successfully.
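For example, assuming client ports 2181 and 2183 for nodes 1 and 3 (illustrative values, set in each node's zoo.cfg):

```
# create a znode via node 1
./bin/zkCli.sh -server 127.0.0.1:2181 create /sync-test "hello"

# read it back via node 3; it should return "hello" once synchronized
./bin/zkCli.sh -server 127.0.0.1:2183 get /sync-test
```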

4, Election mechanism

1. View the status of a node

./bin/zkServer.sh status <zoo cfg file>

./bin/zkServer.sh status conf/zoo1.cfg
Mode: follower
./bin/zkServer.sh status conf/zoo2.cfg
Mode: leader
./bin/zkServer.sh status conf/zoo3.cfg
Mode: follower

We can see that node 2 is the leader.

2. Election mechanism

ZooKeeper uses the FastLeaderElection algorithm by default; a candidate wins once it receives more than half of the votes.

When will the election be triggered?

  • A service node initializes and starts up (new-cluster election);
  • More than half of the nodes cannot establish a connection with the leader.

2.1 New cluster election

In a brand-new cluster there is no data ID (zxid) or logical-clock history to influence the election.

Assume the current cluster has 5 servers, numbered 1 to 5, and the ZooKeeper services are started in numerical order.

(1) Server 1 starts and votes for itself. Since no other machine is up yet, it receives no voting feedback, so server 1 stays in the LOOKING state.

(2) Server 2 starts and first votes for itself, then exchanges votes with server 1. Because server 2 has the larger server ID, it wins the comparison and server 1 switches its vote to server 2. Server 2 now has 2 votes, which is not more than half of the cluster, so servers 1 and 2 both remain LOOKING.

(3) Server 3 starts, votes for itself, and exchanges results with servers 1 and 2. Since server 3 has the largest ID, it wins, and servers 1 and 2 switch their votes to it. Server 3 now has 3 votes, which is more than half of 5, so server 3 becomes the leader and servers 1 and 2 become followers.

(4) Server 4 starts, votes for itself, and exchanges results with servers 1, 2 and 3. Finding that a leader has already been elected, server 4 simply becomes a follower.

(5) Server 5, like server 4, becomes a follower.
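The five steps above can be sketched as a toy loop. This models only the "highest server ID wins when nothing else differs" rule and the quorum check; it is not ZooKeeper code:

```shell
# Toy model of a fresh 5-node startup election: with no data history,
# every LOOKING server switches its vote to the highest server id seen,
# so after the id-th server starts, that server holds id votes.
ensemble=5
quorum=$(( ensemble / 2 + 1 ))
leader=""
for id in 1 2 3 4 5; do
  if [ -n "$leader" ]; then
    echo "server $id: follower (leader $leader already elected)"
  elif [ "$id" -ge "$quorum" ]; then
    leader=$id   # servers 1..id all vote for id: that is a majority
    echo "server $id: elected leader with $id of $ensemble votes"
  else
    echo "server $id: LOOKING ($id votes < quorum $quorum)"
  fi
done
```

As in the walkthrough, server 3 is elected and servers 4 and 5 join as followers.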

2.2 Election in an existing (non-new) cluster

For a ZooKeeper cluster in normal operation, once a server (the leader) goes down mid-flight, the re-election takes into account the server ID, the data ID (zxid) and the logical clock (the election epoch, which increases with every round of voting).

(1) First the logical clocks are compared. A smaller logical clock indicates the node was down at some point, so its data may be incomplete; its election result is ignored and it re-votes.

(2) Once the logical clocks agree, the data ID (zxid) values are compared. The data ID reflects how up to date the data is, so the node with the larger data ID wins.

(3) If both the logical clock and the data ID are equal, the server IDs are compared; the larger server ID wins.

Put simply, a non-new cluster elects the best of the best: the leader is the most complete and most reliable server in the ZooKeeper cluster.

If a follower or observer node goes down while the cluster is running, normal service is unaffected as long as no more than half of the voting nodes are down. If the leader goes down, however, external service is suspended, and all followers enter the LOOKING state and start an election.

3. Data synchronization mechanism

ZooKeeper's data synchronization ensures that the data on every node is consistent. Two processes are involved: normal client write submission, and the re-synchronization of a cluster node after it recovers from downtime.

3.1 client write request

Write requests must be processed by the leader. If a client sends a write request to a follower, the follower forwards it to the leader for processing.



(1) The client sends a write request to a server in the ZooKeeper cluster. If that server is not the leader, it forwards the request to the leader, and the leader distributes the request transaction to the followers in the form of a proposal.

(2) Each follower processes the proposals from the leader in the order they are received.

(3) When the leader has received ACKs for a proposal from more than half of the nodes, it initiates the transaction commit by sending out a commit proposal.

(4) On receiving the commit proposal, each follower records the transaction commit and applies the data to its in-memory database.

(5) Once the write succeeds, the result is returned to the client.
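The commit condition in step (3) is the same majority rule as in the election. A toy check (not ZooKeeper code):

```shell
# the leader counts its own implicit ack plus follower acks for a proposal
ensemble=3
acks=2   # leader + 1 follower have logged the proposal
if [ "$acks" -gt $(( ensemble / 2 )) ]; then
  decision="commit"   # majority reached: send the commit proposal
else
  decision="wait"     # keep waiting for more acks
fi
echo "$acks/$ensemble acks -> $decision"
```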

3.2 service node initialization synchronization

While the cluster is running, if a follower node goes down, the cluster continues to serve normally because more than half of the nodes are still alive. But any new client requests the leader receives in the meantime cannot be synchronized to the downed node, leaving its data inconsistent. To solve this, when the node restarts, the first thing it does is find the current leader and compare data; if the data differs, it synchronizes first, and only then serves clients.

How does a node compare its data version with the leader's? Via the ZXID (transaction ID): if its ZXID is smaller than the leader's, it needs to synchronize.

ZXID Description:

A ZXID is a 64-bit number. The lower 32 bits are a counter that is incremented by 1 on every data change. The upper 32 bits hold the leader epoch number: whenever a new leader is elected, it takes the ZXID from its local transaction log, parses out the epoch in the upper 32 bits, adds 1 to it, and resets the lower 32 bits to 0. This guarantees that ZXIDs remain unique and monotonically increasing across leader elections.
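The 64-bit layout can be demonstrated with shell arithmetic (the sample epoch and counter values here are made up):

```shell
# zxid = (epoch << 32) | counter
epoch=5; counter=11
zxid=$(( (epoch << 32) | counter ))

# recover the two halves
[ $(( zxid >> 32 )) -eq "$epoch" ] && echo "high 32 bits = epoch $epoch"
[ $(( zxid & 0xffffffff )) -eq "$counter" ] && echo "low 32 bits = counter $counter"

# a newly elected leader bumps the epoch and zeroes the counter
next_epoch_first_zxid=$(( (epoch + 1) << 32 ))
echo "first zxid of epoch $(( epoch + 1 )): $next_epoch_first_zxid"
```

This relies on 64-bit shell arithmetic, which bash provides on 64-bit platforms.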

4. Four-letter operations and maintenance commands

ZooKeeper responds to a small set of commands, each made up of four letters. Commands can be sent to ZooKeeper via telnet or nc.

These commands are disabled by default; use the 4lw.commands.whitelist configuration option to enable some or all of them.

(1) Enable the four-letter commands, for example:

# enable the specified commands
4lw.commands.whitelist=stat, ruok, conf, isro

# enable all commands
4lw.commands.whitelist=*

(2) Install the Netcat tool and use the nc command:

#Installing the Netcat tool
yum install -y nc

#View server and client connection status
echo stat | nc localhost 2181

(3) Common commands

echo stat | nc localhost 2181    # brief server details, including whether this node is a follower or the leader
echo ruok | nc localhost 2181    # tests whether the server is running; replies "imok" if it is up
echo dump | nc localhost 2181    # lists unprocessed sessions and ephemeral nodes (leader only)
echo kill | nc localhost 2181    # shuts down the server
echo conf | nc localhost 2181    # prints details of the service configuration
echo cons | nc localhost 2181    # lists full connection/session details for all clients connected to the server
echo envi | nc localhost 2181    # prints details of the service environment (differs from the conf command)
echo reqs | nc localhost 2181    # lists unprocessed requests
echo wchs | nc localhost 2181    # lists brief details of the server's watches
echo wchc | nc localhost 2181    # lists the server's watches by session: outputs sessions (connections) with their watched paths
echo wchp | nc localhost 2181    # lists the server's watches by path: outputs paths (znodes) with their associated sessions


(4) Command list

1. conf: New in 3.3.0: prints details of the service configuration.
2. cons: New in 3.3.0: lists full connection/session details for all clients connected to the server, including the number of packets received/sent, session ID, operation latencies, last operation performed, etc.
3. crst: New in 3.3.0: resets connection/session statistics for all connections.
4. dump: lists outstanding sessions and ephemeral nodes. This only works on the leader.
5. envi: prints details of the service environment.
6. ruok: tests whether the server is running in a non-error state. The server responds with imok if it is running; otherwise it does not respond at all. A response of "imok" does not necessarily mean the server has joined the quorum, only that the server process is active and bound to the specified client port. Use "stat" for details on quorum state and client connection information.
7. srst: resets server statistics.
8. srvr: New in 3.3.0: lists full details of the server.
9. stat: lists brief details of the server and its connected clients.
10. wchs: New in 3.3.0: lists brief information about the server's watches.
11. wchc: New in 3.3.0: lists detailed information about the server's watches by session. This outputs a list of sessions (connections) with their associated watches (paths). Note that depending on the number of watches this operation may be expensive (i.e. impact server performance); use it carefully.
12. dirs: New in 3.5.1: shows the total size of the snapshot and log files, in bytes.
13. wchp: New in 3.3.0: lists detailed information about the server's watches by path. This outputs a list of paths (znodes) with their associated sessions. Note that depending on the number of watches this operation may be expensive (i.e. impact server performance); use it carefully.
14. mntr: New in 3.4.0: outputs a list of variables that can be used to monitor the health of the cluster.

Posted by frans-jan on Wed, 01 Jun 2022 01:55:24 +0530