Getting started with Apache Kafka
To start with, let us understand what Apache Kafka is.
Apache Kafka is a distributed streaming platform with three core capabilities:
- Messaging System
- Store Stream with fault tolerance
- Process the stream data
Let us take a moment to understand each of them.
Messaging System:
It is a message bus built for high-ingress data. It gives applications access to published data and, if needed, lets them replay it, i.e. applications can process, persist, and re-process streamed data.
We can divide a large project into small microservices and use Kafka to communicate between them.
Store Stream with fault tolerance:
Since Kafka is a distributed system, the data can be replicated across different brokers using the replication factor. If we set the replication factor to 3, we can tolerate 2 broker failures; in general, the cluster tolerates replication_factor - 1 failures.
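The formula above is simple arithmetic; a quick shell sketch (3 is just the example value used here):

```shell
# Tolerable broker failures = replication_factor - 1
replication_factor=3
tolerated_failures=$((replication_factor - 1))
echo "With replication_factor=${replication_factor}, we can tolerate ${tolerated_failures} broker failure(s)"
```

So a replication factor of 1 means no fault tolerance at all, which is why production topics typically use 3.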
Process the stream data:
Kafka provides a Streams API for data processing, which I will cover in a future article.
Basic concept of Kafka:
Topic: A unique name for a feed of records
Record: The smallest unit of data, made up of a key, a value, and a timestamp
Partition: An ordered, immutable sequence of records
Offset: A sequential ID assigned to each record within a partition
Broker: A node in the distributed system that forms the Kafka cluster
Broker ID: A unique identifier assigned to each broker
There is a lot more, such as leader, group.id, replication factor, etc., which I will cover in another article.
Installation and configuration of a 3-node cluster. Since I am using a Mac I will show the commands on macOS; they should not differ much on Linux.
- wget http://apachemirror.wuchna.com/kafka/2.5.0/kafka_2.12-2.5.0.tgz (the URL will differ depending on version you want to install)
- tar -xvf kafka_2.12-2.5.0.tgz
- cd kafka_2.12-2.5.0
- bin/zookeeper-server-start.sh config/zookeeper.properties
Create 3 copies of the Kafka server configuration by copying config/server.properties to config/server1.properties and config/server2.properties, then in each copy:
- change broker.id (increment the integer by 1)
- change the listener port (increment by 1, e.g. 9093 and 9094)
- also good to change log.dirs so each broker writes to its own directory
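The three edits above can be scripted with sed. A minimal sketch, working on a stripped-down stand-in for config/server.properties (the real file has many more settings, but these three lines are the ones we touch):

```shell
# Stand-in for config/server.properties with the stock defaults
cat > server.properties <<'EOF'
broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs
EOF

# Derive server1.properties and server2.properties:
# bump broker.id, the listener port, and the log directory for each copy
for i in 1 2; do
  sed -e "s/^broker.id=0/broker.id=${i}/" \
      -e "s/:9092/:$((9092 + i))/" \
      -e "s#/tmp/kafka-logs#/tmp/kafka-logs-${i}#" \
      server.properties > "server${i}.properties"
done

# Show the per-broker differences
grep -HE 'broker.id|listeners|log.dirs' server1.properties server2.properties
```

Each broker needs its own id, port, and log directory because all three processes run on the same machine here; on separate machines only broker.id would have to differ.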
Then start all 3 servers as follows:
- bin/kafka-server-start.sh config/server.properties
- bin/kafka-server-start.sh config/server1.properties
- bin/kafka-server-start.sh config/server2.properties
Cool, we have the servers up and running. I recommend looking at all the scripts in the bin folder; they are the best tools for managing your Kafka cluster.
Let's create a topic:
We can use the tool bin/kafka-topics.sh to create, list, describe, etc. the topics.
Command: bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic fault_tolerated_topic --partitions 3 --replication-factor 3
--create is the option to create a topic, and --topic <name of the feed> names it
--partitions tells how many partitions the data will be split across
--replication-factor defines the level of fault tolerance
Awesome, we just created our first topic. Let's see its details with the help of the --describe option:
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic fault_tolerated_topic
Topic: fault_tolerated_topic PartitionCount: 3 ReplicationFactor: 3 Configs:
Topic: fault_tolerated_topic Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
Topic: fault_tolerated_topic Partition: 1 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
Topic: fault_tolerated_topic Partition: 2 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
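Reading that output: each partition has one leader broker, a list of replicas, and the Isr (in-sync replicas) set. A quick way to see how leadership is spread across brokers, here parsing the sample output shown above:

```shell
# The --describe output from above, saved for parsing
cat > describe.txt <<'EOF'
Topic: fault_tolerated_topic Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1
Topic: fault_tolerated_topic Partition: 1 Leader: 0 Replicas: 0,1,2 Isr: 0,1,2
Topic: fault_tolerated_topic Partition: 2 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
EOF

# For each partition line, pull out the value following "Partition:" and "Leader:"
awk '{ for (i = 1; i <= NF; i++) {
         if ($i == "Partition:") part = $(i+1);
         if ($i == "Leader:") leader = $(i+1);
       }
       print "partition " part " is led by broker " leader }' describe.txt
```

Notice that each of the three brokers leads exactly one partition; Kafka spreads leadership so that no single broker carries all the read/write load for a topic.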
All right, that is a lot to grasp. I will continue from here in my next article on Kafka; for the time being, keep playing with your Kafka setup :)