ZooKeeper

ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and managing a service in a distributed environment is a complicated process. ZooKeeper solves this with its simple architecture and API, allowing developers to focus on core application logic without worrying about the distributed nature of the application.

The ZooKeeper framework was originally built at Yahoo! for accessing their applications in an easy and robust manner. Later, Apache ZooKeeper became a standard coordination service used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses ZooKeeper to track the status of distributed data. This tutorial explains the basics of ZooKeeper, how to install and deploy a ZooKeeper cluster in a distributed environment, and concludes with a few examples in Java and sample applications.

How does ZooKeeper work?

ZooKeeper replicates its data across a group of servers called an ensemble, and every server keeps a copy of that data; this is how it achieves high availability and consistency. If a server fails, the ensemble keeps working as long as a majority (a quorum) of its servers is still running. If the leader fails, the remaining servers quickly elect a new leader by voting among themselves. A client is given the list of servers in the ensemble and can reconnect to a different server if the one it is connected to stops responding.
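The majority rule behind this failover can be made concrete with a little arithmetic. The sketch below is a simplified model, not ZooKeeper's actual FastLeaderElection code: it assumes a leader can be elected only while a strict majority of the ensemble is alive, and that the live server with the highest last-applied transaction id (zxid) wins, with ties broken by the highest server id. The `Server` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Server:
    sid: int            # server id (the "myid" of each ensemble member)
    zxid: int           # id of the last transaction this server has applied
    alive: bool = True


def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def max_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


def elect_leader(servers: list[Server]) -> Optional[int]:
    """Return the sid of the new leader, or None if no majority is alive.

    Simplified model: real ZooKeeper also compares election epochs and
    exchanges votes over the network until a quorum agrees.
    """
    live = [s for s in servers if s.alive]
    if len(live) < quorum_size(len(servers)):
        return None  # without a majority, no leader can be elected
    # The most up-to-date server wins; ties go to the highest server id.
    return max(live, key=lambda s: (s.zxid, s.sid)).sid


if __name__ == "__main__":
    ensemble = [Server(1, 100), Server(2, 105), Server(3, 105, alive=False)]
    print(elect_leader(ensemble))             # 2
    print(max_failures(3), max_failures(5))   # 1 2
```

This arithmetic is also why ensembles usually have an odd number of servers: `max_failures(4)` is 1, the same as for three servers.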

Why is ZooKeeper necessary for Apache Kafka?

Controller election

The controller is one of the most important entities in a Kafka cluster. It is one of the brokers, and it is responsible for maintaining the leader-follower relationship across all partitions. If a broker shuts down for any reason, the controller tells replicas on other brokers to take over as leaders for the partitions the departing broker was leading. ZooKeeper is used to elect the controller itself: whenever the controller's broker shuts down, a new controller is elected, and ZooKeeper ensures that at any given time there is only one controller and that all brokers agree on which one it is.
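In ZooKeeper-based Kafka, controller election works as a race to create an ephemeral `/controller` znode: the first broker to create it becomes the controller, the others watch it, and when the controller's ZooKeeper session ends the node disappears and the remaining brokers race again. The sketch below simulates that pattern in memory so it runs without a ZooKeeper server; `FakeZooKeeper`, `Broker`, and their methods are hypothetical stand-ins, not a real client API.

```python
class NodeExistsError(Exception):
    """Raised when creating a znode that already exists."""


class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, for illustration)."""

    def __init__(self):
        self.nodes = {}      # path -> (data, owning session, or None if persistent)
        self.watchers = {}   # path -> callbacks fired when the node is deleted

    def create(self, path, data, session=None):
        """Create a znode; passing a session makes it ephemeral."""
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = (data, session)

    def get(self, path):
        return self.nodes[path][0] if path in self.nodes else None

    def watch_delete(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def expire_session(self, session):
        """Drop every ephemeral znode owned by the session and notify watchers."""
        for path in [p for p, (_, s) in self.nodes.items() if s == session]:
            del self.nodes[path]
            for callback in self.watchers.pop(path, []):
                callback()


class Broker:
    def __init__(self, broker_id, zk):
        self.broker_id = broker_id
        self.zk = zk

    def try_become_controller(self):
        try:
            # The session is named after the broker, so losing the broker drops the node.
            self.zk.create("/controller", self.broker_id, session=self.broker_id)
            return True
        except NodeExistsError:
            # Another broker won; run the election again when its node disappears.
            self.zk.watch_delete("/controller", self.try_become_controller)
            return False


if __name__ == "__main__":
    zk = FakeZooKeeper()
    brokers = [Broker(i, zk) for i in (1, 2, 3)]
    print([b.try_become_controller() for b in brokers])  # [True, False, False]
    zk.expire_session(1)            # broker 1 loses its ZooKeeper session
    print(zk.get("/controller"))    # 2
```

Because `create` fails when the path already exists, at most one broker can hold `/controller` at any moment, which is the "only one controller" guarantee described above.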

Configuration of topics

ZooKeeper stores the configuration of all topics: the list of existing topics, the number of partitions for each topic, the location of all replicas, the configuration overrides for each topic, which broker is the preferred leader for each partition, and so on.
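As an illustration of what "location of all replicas" and "preferred leader" mean, the sketch below parses a partition assignment and derives the preferred leader of each partition. The JSON payload is an assumption modelled on what Kafka keeps under `/brokers/topics/<topic>`; the exact fields differ between Kafka versions. It relies on Kafka's rule that the first replica in a partition's assigned list is its preferred leader. The function names are hypothetical.

```python
import json

# Illustrative payload in the style of the partition assignment Kafka keeps
# under /brokers/topics/<topic>; field names vary across Kafka versions.
ASSIGNMENT = '{"version": 1, "partitions": {"0": [1, 2, 3], "1": [2, 3, 1], "2": [3, 1, 2]}}'


def parse_assignment(raw: str) -> dict[int, list[int]]:
    """Map each partition number to its ordered list of replica broker ids."""
    partitions = json.loads(raw)["partitions"]
    return {int(p): replicas for p, replicas in partitions.items()}


def preferred_leaders(assignment: dict[int, list[int]]) -> dict[int, int]:
    """The first replica in each list is treated as the preferred leader."""
    return {p: replicas[0] for p, replicas in assignment.items()}


if __name__ == "__main__":
    assignment = parse_assignment(ASSIGNMENT)
    print(len(assignment))                 # 3 partitions
    print(preferred_leaders(assignment))   # {0: 1, 1: 2, 2: 3}
```

Rotating the replica lists as above spreads preferred leadership across brokers, so no single broker leads every partition.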

Access control lists

Access control lists (ACLs) for all topics are also maintained in ZooKeeper.
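Each ACL entry pairs a principal, a host, an operation, and an Allow or Deny permission with a resource such as a topic. Below is a minimal sketch of how such entries might be evaluated, assuming Kafka's documented behaviour that a matching Deny overrides any Allow and that access is refused when nothing matches (unless `allow.everyone.if.no.acl.found` is enabled and the resource has no ACLs at all). `AclEntry` and `is_authorized` are hypothetical names, wildcards are reduced to `User:*`, `*`, and `All`, and implied operations are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    principal: str   # e.g. "User:alice"; "User:*" matches any user
    host: str        # client host; "*" matches any host
    operation: str   # e.g. "Read", "Write"; "All" matches any operation
    permission: str  # "Allow" or "Deny"


def _matches(entry: AclEntry, principal: str, host: str, operation: str) -> bool:
    return (
        entry.principal in (principal, "User:*")
        and entry.host in (host, "*")
        and entry.operation in (operation, "All")
    )


def is_authorized(acls: list[AclEntry], principal: str, host: str,
                  operation: str, allow_if_no_acl: bool = False) -> bool:
    """Decide access using one resource's ACL list (simplified sketch)."""
    matching = [e for e in acls if _matches(e, principal, host, operation)]
    if any(e.permission == "Deny" for e in matching):
        return False  # an explicit Deny always wins
    if any(e.permission == "Allow" for e in matching):
        return True
    # Nothing allows the request: refuse, unless the resource has no ACLs
    # at all and the cluster is configured to allow everyone in that case.
    return allow_if_no_acl and not acls
```

For example, with `[AclEntry("User:*", "*", "Read", "Allow"), AclEntry("User:mallory", "*", "All", "Deny")]`, any user except `User:mallory` may read the topic, and nobody may write to it.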

Membership of the cluster

ZooKeeper also maintains a list of all the brokers that are currently running and part of the cluster.
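This membership list relies on ZooKeeper sessions: in Kafka each broker registers an ephemeral znode under `/brokers/ids/<id>` and keeps its session alive with heartbeats, and the node is removed once the session times out. The sketch below models only that timeout logic with an explicit clock; `BrokerRegistry` and its methods are hypothetical, and in a real deployment session expiry is decided by the ZooKeeper ensemble.

```python
class BrokerRegistry:
    """Toy model of ephemeral broker registrations (hypothetical API)."""

    def __init__(self, session_timeout: float):
        self.session_timeout = session_timeout
        self.last_heartbeat: dict[int, float] = {}  # broker id -> last heartbeat time

    def register(self, broker_id: int, now: float) -> None:
        # Stands in for creating the ephemeral znode /brokers/ids/<broker_id>.
        self.last_heartbeat[broker_id] = now

    def heartbeat(self, broker_id: int, now: float) -> None:
        # Heartbeats only refresh live sessions; an expired broker must re-register.
        if broker_id in self.last_heartbeat:
            self.last_heartbeat[broker_id] = now

    def live_brokers(self, now: float) -> list[int]:
        """Expire stale sessions, then list the brokers that remain registered."""
        expired = [b for b, t in self.last_heartbeat.items()
                   if now - t > self.session_timeout]
        for broker_id in expired:
            del self.last_heartbeat[broker_id]  # the ephemeral node disappears
        return sorted(self.last_heartbeat)


if __name__ == "__main__":
    registry = BrokerRegistry(session_timeout=6.0)
    for broker_id in (1, 2, 3):
        registry.register(broker_id, now=0.0)
    registry.heartbeat(1, now=4.0)
    registry.heartbeat(2, now=4.0)
    print(registry.live_brokers(now=8.0))  # [1, 2]: broker 3 missed its heartbeats
```

A heartbeat that arrives after expiry does not bring broker 3 back, mirroring how a broker whose session has expired must register again.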

Please note that in this ZooKeeper-based setup you can't run Kafka without first installing ZooKeeper (newer Kafka releases can instead run in KRaft mode, without ZooKeeper). However, ZooKeeper is already installed and configured for your CloudKarafka cluster.

CloudKarafka and ZooKeeper

ZooKeeper is part of CloudKarafka, and most of our users never need to be aware of its presence. ZooKeeper is installed and configured by default according to the number of nodes in your cluster, and most customers never integrate with ZooKeeper directly. However, you still have the option to reach ZooKeeper from CloudKarafka dedicated plans if you wish to.

More articles by Kishan Kumar
