APACHE KAFKA


What is Apache Kafka:

Apache Kafka is an open-source distributed event streaming platform used for building real-time data pipelines and streaming applications. It utilizes non-destructive consumer semantics and is designed to handle large volumes of data and provide high-throughput, low-latency and fault-tolerant messaging.

In simple words: Kafka is a messaging system that is capable of receiving, storing and emitting a large number of messages in a short amount of time in a highly reliable manner.
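The "receiving, storing and emitting" behavior can be illustrated with a toy in-memory log (not real Kafka; the ToyLog class is invented for illustration). The key property shown is Kafka's non-destructive consumer semantics: reading a message does not remove it, so independent consumers can each read the full stream.

```python
# A toy sketch (not real Kafka) of an append-only log with
# non-destructive reads: messages are stored permanently and
# reading them does not delete them.

class ToyLog:
    def __init__(self):
        self._records = []                 # append-only storage

    def publish(self, message):
        self._records.append(message)
        return len(self._records) - 1      # position of the stored message

    def read_from(self, position):
        # Reads return the records without removing anything,
        # so any number of consumers can replay the stream.
        return self._records[position:]

log = ToyLog()
log.publish("order placed")
log.publish("order shipped")

print(log.read_from(0))   # ['order placed', 'order shipped']
print(log.read_from(0))   # same result: reads are non-destructive
```

A traditional message queue would hand each message to one consumer and discard it; Kafka instead keeps the log and lets each consumer track its own position in it.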

What problems does Kafka solve:

Apache Kafka solves scalability, throughput, and fault tolerance challenges by introducing a distributed architecture with efficient partitioning and replication. It minimizes latency for real-time data processing and offers durable storage, allowing organizations to handle high volumes of data with low latency and ensure continuous operation in the face of failures. Additionally, Kafka provides built-in stream processing capabilities, simplifying real-time data analytics and enabling seamless integration into modern data architectures.

Components of Kafka:

Producer: Applications that send data (messages) to Kafka topics. Producers publish records to one or more Kafka topics.

Consumer: Applications that read data from Kafka topics. Consumers subscribe to one or more topics and process the records produced to those topics.

Broker: Kafka runs as a cluster of one or more servers, each of which is called a broker. Brokers are responsible for storing and serving the data, as well as handling client requests.

Topic: A category or feed name to which records are published by producers. Topics in Kafka are divided into partitions for scalability.

Partition: Each topic can be split into partitions, which are ordered and immutable sequences of records. Partitions allow data to be distributed across multiple brokers.

Offset: Each record within a partition is assigned a unique identifier called an offset, which represents its position in the partition.

Replica: Kafka maintains redundancy and fault tolerance through replicas. Each partition can have multiple replicas, which are copies of the data stored on different brokers.

Consumer Group: Consumers are organized into consumer groups. Each consumer group contains one or more consumers that jointly consume all the partitions of the topics they subscribe to.

ZooKeeper: Kafka uses ZooKeeper for managing and coordinating the Kafka brokers. It stores metadata about the cluster, such as broker information, topic configuration, and partition assignment. (Newer Kafka releases can run without ZooKeeper using the built-in KRaft consensus mode.)

Connectors: Kafka Connect is a framework for connecting Kafka with external systems such as databases, message queues, and file systems. Connectors are plugins that enable the integration of Kafka with various data sources and sinks.
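How topics, partitions, offsets, and consumer groups fit together can be sketched with a simplified in-memory model (not the real Kafka API; ToyTopic and assign_partitions are invented names, and a CRC stands in for Kafka's murmur2 partitioner). It shows two real Kafka rules: records with the same key always land in the same partition, and within a consumer group each partition is read by exactly one consumer.

```python
import zlib

class ToyTopic:
    """A toy model of a Kafka topic: a name plus a list of partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Keyed records are hashed to a partition, so records with the
        # same key stay in one partition and keep their relative order.
        index = zlib.crc32(key.encode()) % len(self.partitions)
        partition = self.partitions[index]
        partition.append(value)
        offset = len(partition) - 1    # offset = position within the partition
        return index, offset

def assign_partitions(topic, consumers):
    # Within a consumer group, each partition goes to exactly one
    # consumer; this round-robin split mimics that assignment.
    assignment = {c: [] for c in consumers}
    for p in range(len(topic.partitions)):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

orders = ToyTopic("orders", num_partitions=3)
p1, o1 = orders.publish("table-1", "pizza")
p2, o2 = orders.publish("table-1", "pasta")   # same key -> same partition

print(p1 == p2, (o1, o2))                     # True (0, 1)
print(assign_partitions(orders, ["server-a", "server-b"]))
# {'server-a': [0, 2], 'server-b': [1]}
```

Note that offsets are per partition, not per topic: each partition numbers its own records independently, which is why ordering in Kafka is only guaranteed within a single partition.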

 

Understanding Kafka with simple example:

Here is a simple explanation of Kafka using a restaurant analogy:

  1. Producer (Chef): The chef in the kitchen who prepares and cooks the food represents the Kafka producer. The chef creates various dishes and sends them out to the serving counter.
  2. Consumer (Server): The server who takes orders from customers and serves food represents the Kafka consumer. The server picks up dishes from the serving counter and delivers them to the respective tables.
  3. Broker (Kitchen Counter): The kitchen counter where the cooked dishes are placed before serving represents the Kafka broker. It stores and manages the prepared dishes until they are picked up by the server.
  4. Topic (Menu): The menu in the restaurant represents the Kafka topic. It categorizes the dishes available for order, such as appetizers, main courses, and desserts.
  5. Partition (Sections of the Menu): Each section of the menu, such as appetizers, main courses, and desserts, represents a partition within the Kafka topic. It organizes similar types of dishes together.
  6. Offset (Order Number): Each dish is assigned a sequential order number, representing its position within its menu section. This order number corresponds to the offset of a record within its partition in Kafka.
  7. Replica (Backup Dishes): The backup dishes prepared by the chef represent Kafka replicas. In case a dish gets spilled or damaged, the restaurant has backup dishes available to ensure that customers still receive their orders.
  8. Consumer Group (Team of Servers): A team of servers working together to serve customers represents a Kafka consumer group. They collaborate to ensure efficient delivery of dishes to tables and handle customer orders effectively.
  9. ZooKeeper (Restaurant Manager): The restaurant manager oversees the operations, manages staff schedules, and ensures smooth coordination between the kitchen, serving staff, and customers. In the Kafka analogy, ZooKeeper plays a similar role in coordinating and managing the Kafka cluster.
  10. Connectors (Delivery Service): A delivery service that transports ingredients and supplies to the restaurant represents Kafka connectors. Connectors facilitate the movement of data between Kafka and external systems, such as databases, message queues, and file systems, similar to how a delivery service transports goods to and from the restaurant.

 

Understanding how Kafka works at a high level:

1. A restaurant has a menu which has sections. Similarly, Kafka has topics and partitions.

2. A dine-in customer walks in and the server hands them the menu. The customer reads through the menu and places one or more orders, which the server notes down under an order number. The server noting down the order is similar to a consumer subscribing to a particular topic/partition in Kafka.

3. Chefs prepare dishes that belong to a menu section and place them on the kitchen counter. Similarly, producers publish messages that belong to a topic/partition, and the messages are stored on a broker (a Kafka instance).

4. The server checks the kitchen counter to see which dishes are ready and serves them to the customer. Once all the dishes ordered are served, the server marks the order number as completed. This is equivalent to the consumer committing its offset after a batch of messages has been processed and delivered.

The reliability of the above process is handled by replicas, ZooKeeper, consumer groups, and connectors.
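The four-step flow above can be sketched as a toy simulation (not real Kafka; the produce/poll/commit function names are invented, loosely echoing the consumer API). It shows the key step from point 4: the consumer advances its committed offset only after a batch is processed, so a restarted consumer resumes exactly where it left off.

```python
# A toy walk-through (not real Kafka) of publish -> poll -> process -> commit.

partition = []            # one partition of an "orders" topic
committed_offset = 0      # last processed position, tracked per consumer group

def produce(value):
    partition.append(value)

def poll(max_records=2):
    # Fetch the next batch, starting at the committed offset.
    return partition[committed_offset:committed_offset + max_records]

def commit(batch_size):
    global committed_offset
    committed_offset += batch_size   # advance only after processing succeeds

produce("appetizer"); produce("main course"); produce("dessert")

batch = poll()            # ['appetizer', 'main course']
commit(len(batch))
print(poll())             # ['dessert'] -- the consumer resumes after the commit
```

If the consumer crashed before committing, the same batch would be polled again on restart; this is why Kafka's default delivery guarantee is described as at-least-once.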
