Apache Kafka

What?

Apache Kafka is a distributed streaming platform and data pipeline that enables real-time data ingestion and messaging. It runs on clusters of machines to provide scalability, availability, throughput, and performance.

It allows you to build data pipelines in which producers publish data to topics (divided into partitions), and consumers read the data from the appropriate partitions.

Let us understand the key terminology:

  • producer - an application that sends messages to Kafka 
  • message - a simple array of bytes, as far as Kafka is concerned 
  • consumer - an application that reads and processes messages from Kafka 
  • cluster - a group of machines/nodes that form the Kafka server, each node running an instance of the broker 
  • topic - a unique name for a Kafka stream 
  • partition - Kafka creates multiple partitions for a topic and stores each partition on a node/machine 
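To see how these pieces fit together, here is a minimal sketch (plain Python, not a real Kafka client) of how a keyed message maps onto one of a topic's partitions. Kafka's default partitioner hashes the message key (murmur2 in the Java client); CRC32 stands in for it here purely for illustration.

```python
# Illustrative sketch only: deterministic key-to-partition mapping.
# A real client library would do this for you.
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index deterministically."""
    return zlib.crc32(key) % num_partitions

# Messages with the same key always land on the same partition,
# which is what preserves per-key ordering in Kafka.
assert pick_partition(b"user-42", 6) == pick_partition(b"user-42", 6)
```

Because the mapping is deterministic, all events for a given key are appended to the same partition log in order.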


Why Kafka over Messaging?

Kafka offers better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message/event processing applications.

Streaming vs. Message Platform 

Stream 

  • Messages/events are persisted for a specific amount of time, based on the retention-period configuration
  • Any number of consumers can pull the messages any number of times
  • Supports the partitioned consumer pattern; each consumer app maintains the counter/offset from which it starts reading on the partition
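The consumer-managed offset described above can be sketched in a few lines of plain Python (an illustration, not the real consumer API): the log retains every message for the retention period, and each consumer tracks its own read position, so consumers read the same data independently and can even rewind.

```python
# Conceptual sketch: a retained partition log plus per-consumer offsets.
log = ["m0", "m1", "m2", "m3"]           # messages kept for the retention period

class OffsetConsumer:
    def __init__(self):
        self.offset = 0                   # next position this consumer reads

    def poll(self):
        if self.offset < len(log):
            msg = log[self.offset]
            self.offset += 1              # the consumer advances its own offset
            return msg
        return None

a, b = OffsetConsumer(), OffsetConsumer()
assert a.poll() == "m0" and a.poll() == "m1"
assert b.poll() == "m0"                   # b reads independently of a
a.offset = 0                              # rewind: messages are still there
assert a.poll() == "m0"
```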

Message 

  • Messages are not persisted; a message is deleted once read by a consumer/worker app
  • Once a message is read by any one consumer, it is no longer available to other consumers
  • Supports the competing consumer pattern, where consumers compete to read messages from the broker

Use cases/Scenarios  

Messaging - compared to existing enterprise messaging tools, Kafka provides high throughput, low latency, and persistence of messages 

Clickstream analysis - website activities such as views and searches are published to topics, and the subscribers can consume the data 

Log aggregation - Kafka abstracts away the log files and provides the logs as a stream of messages 

Event sourcing - Kafka is a very good backend for event-sourcing-based applications 

Stream processing - Kafka Streams processes real-time data, aggregating and transforming it for further consumption
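The kind of aggregation a Kafka Streams topology performs can be sketched as a running per-key count. Real Kafka Streams code would use the KStream/KTable DSL in Java; plain Python stands in here just to show the shape of the computation.

```python
# Hedged sketch of a stateful per-key aggregation over a stream of events.
from collections import defaultdict

def aggregate(events):
    """Count events per key, like a grouped count over a stream."""
    counts = defaultdict(int)
    for key, _value in events:
        counts[key] += 1        # running state, updated per event
    return dict(counts)

clicks = [("page-a", "view"), ("page-b", "view"), ("page-a", "view")]
assert aggregate(clicks) == {"page-a": 2, "page-b": 1}
```

In a real topology this state would live in a partitioned, fault-tolerant store, with the aggregated results published back to a topic for downstream consumers.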

Key Benefits:

  • Kafka can handle millions of events per second.
  • Clustering enables high throughput and availability.
  • Geo-replication across clusters enables fail-over.
  • Fault-tolerant, performant and scalable.

Challenges:

  • Kafka setup and configuration are a bit complex, but this can be overcome by using managed Kafka services.
  • It lacks a full set of monitoring and management tools.
