ZooKeeper
ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and managing a service in a distributed environment is a complicated process. ZooKeeper addresses this with a simple architecture and API, letting developers focus on core application logic without worrying about the distributed nature of the application.
The ZooKeeper framework was originally built at Yahoo! for accessing their applications in an easy and robust manner. Later, Apache ZooKeeper became a standard coordination service used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses ZooKeeper to track the status of distributed data. This tutorial explains the basics of ZooKeeper, how to install and deploy a ZooKeeper cluster in a distributed environment, and finally concludes with a few examples using Java programming and sample applications.
How does ZooKeeper work?
The data within ZooKeeper is replicated across a collection of nodes called an ensemble, which is how it achieves high availability and consistency. If a node fails, ZooKeeper fails over quickly; for example, if the leader node fails, the remaining nodes of the ensemble elect a new one by voting among themselves. A client connected to one server can switch to a different node if the first one fails to respond.
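The client-side failover described above can be sketched roughly as follows. This is a minimal illustration, not the logic of any real ZooKeeper client library: the host names, the `probe` callback, and both function names are hypothetical, and only the comma-separated `host:port` connection string format is taken from ZooKeeper itself.

```python
from typing import Callable, List, Optional


def parse_connect_string(connect: str) -> List[str]:
    """Split a 'host:port,host:port' string into individual server addresses."""
    return [part.strip() for part in connect.split(",") if part.strip()]


def first_responsive(servers: List[str], probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first server for which probe() succeeds, or None if all fail."""
    for server in servers:
        try:
            if probe(server):
                return server
        except OSError:
            # Treat a connection error as "this server did not respond".
            continue
    return None


# Hypothetical three-node ensemble; zk1 is pretended to be down.
servers = parse_connect_string("zk1:2181, zk2:2181, zk3:2181")
down = {"zk1:2181"}
chosen = first_responsive(servers, lambda s: s not in down)
print(chosen)  # zk2:2181
```

In this sketch the servers are tried in the order given, which keeps the example deterministic; a real client may pick the order differently.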
Why is ZooKeeper necessary for Apache Kafka?
Controller election
The controller is one of the most important broker entities in a Kafka ecosystem; it is responsible for maintaining the leader-follower relationship across all partitions. If a node is shutting down for some reason, it is the controller’s responsibility to tell other replicas to take over as partition leaders for the partitions whose leaders are on the node that is about to fail. ZooKeeper is used to elect a new controller whenever the node hosting the current one shuts down, and to make sure that at any given time there is only one controller and all follower nodes agree on which one it is.
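One way to picture how an election can guarantee a single controller is the "first to create the node wins" pattern. The sketch below is a simplified in-memory simulation, assuming that the controller is whichever broker holds a `/controller` entry tied to its session; the `EphemeralRegistry` class and `elect_controller` function are hypothetical names for illustration and are not Kafka or ZooKeeper code.

```python
from typing import Dict, List, Optional


class EphemeralRegistry:
    """In-memory stand-in for session-bound nodes: path -> owning broker id."""

    def __init__(self) -> None:
        self._nodes: Dict[str, int] = {}

    def create(self, path: str, owner: int) -> bool:
        """Create path for owner; return False if the path already exists."""
        if path in self._nodes:
            return False
        self._nodes[path] = owner
        return True

    def owner(self, path: str) -> Optional[int]:
        """Return the broker id that holds path, or None if nobody does."""
        return self._nodes.get(path)

    def expire(self, owner: int) -> None:
        """Drop every node held by owner, as when its session ends."""
        self._nodes = {p: o for p, o in self._nodes.items() if o != owner}


def elect_controller(registry: EphemeralRegistry, broker_ids: List[int]) -> Optional[int]:
    """Every broker tries to create /controller; only the first attempt succeeds."""
    for broker_id in broker_ids:
        registry.create("/controller", broker_id)
    return registry.owner("/controller")


registry = EphemeralRegistry()
first = elect_controller(registry, [1, 2, 3])   # broker 1 gets there first
registry.expire(first)                          # the controller's node goes away
second = elect_controller(registry, [2, 3])     # the remaining brokers retry
print(first, second)  # 1 2
```

Because creation fails once the path exists, at most one broker can hold the role at a time, and the role frees up as soon as the holder's session ends.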
Configuration of topics
ZooKeeper stores the configuration of all topics, including the list of existing topics, the number of partitions for each topic, the location of all replicas, the configuration overrides for each topic, and which node is the preferred leader.
Access control lists
Access control lists (ACLs) for all topics are also maintained within ZooKeeper.
Membership of the cluster
ZooKeeper also maintains a list of all the brokers that are functioning at any given moment and are part of the cluster.
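A list of "currently functioning" brokers implies that entries disappear when a broker goes silent. The following is a minimal sketch of that idea using heartbeats and a timeout; the `Membership` class, its method names, and the 6-second timeout are assumptions made for illustration, not how ZooKeeper or Kafka implement it.

```python
from typing import Dict, List


class Membership:
    """Tracks brokers by last heartbeat; entries lapse after session_timeout seconds."""

    def __init__(self, session_timeout: float) -> None:
        self.session_timeout = session_timeout
        self._last_seen: Dict[int, float] = {}

    def heartbeat(self, broker_id: int, now: float) -> None:
        """Record that broker_id was alive at time now."""
        self._last_seen[broker_id] = now

    def live_brokers(self, now: float) -> List[int]:
        """Return the sorted ids of brokers heard from within the timeout."""
        return sorted(
            broker_id
            for broker_id, seen in self._last_seen.items()
            if now - seen <= self.session_timeout
        )


members = Membership(session_timeout=6.0)
members.heartbeat(1, now=0.0)
members.heartbeat(2, now=0.0)
early = members.live_brokers(now=4.0)   # both brokers are within the timeout
members.heartbeat(1, now=5.0)           # broker 2 has gone silent
late = members.live_brokers(now=8.0)    # broker 2 has lapsed
print(early, late)  # [1, 2] [1]
```

Time is passed in explicitly rather than read from a clock so that the example behaves the same on every run.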
Please note that you can’t run Kafka services without first installing ZooKeeper. However, ZooKeeper is already installed and configured for your CloudKarafka cluster.
CloudKarafka and ZooKeeper
ZooKeeper is a part of CloudKarafka, and most of our users never have to be aware of its presence. It is installed and configured by default, depending on the number of nodes in your cluster, and most customers will never actively integrate with it. However, you still have the option to reach ZooKeeper on CloudKarafka dedicated plans if you wish to.