Kafka as a Messaging System

Ashish kumar

Published Nov 30, 2019

Traditionally there were two types of messaging model: queuing and publish-subscribe. Kafka then arrived as a new kind of messaging system.

Before looking at Kafka and its features, we need to understand queuing and publish-subscribe, and the pros and cons of these traditional models that led to the invention of Kafka.

Queuing: we all know queues from data structures. A queue has two ends, front and rear, and works on a FIFO (first in, first out) basis. A messaging queue works the same way: a source produces data at one end and a consumer consumes it from the other.

Queuing lets you divide the processing of data over multiple consumer instances, which lets you scale out processing, but queues aren't multi-subscriber. Once one consumer reads a record, it is consumed and is no longer available to the others.

To make this concrete, let's take a real-life example.

Suppose you are in a queue outside an airport waiting for a taxi. When a taxi stops at the front of the queue, the first person boards, tells the driver the destination, and leaves.

Could two people take the same cab to two destinations in two different directions? Definitely not.

Other examples are your city's water supply or an Amazon package delivery. The same package cannot be delivered to two people at two different addresses.

Coming back to our point: queuing has one advantage, that it can distribute the data processing load over multiple consumers, and one disadvantage, that the same data cannot be made available to multiple consumers.
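The queue behaviour described above can be sketched in a few lines of plain Python. This is only an illustration, not any real broker's API: the function name `run_queue`, the worker names, and the round-robin hand-out are assumptions chosen to keep the result deterministic.

```python
from collections import deque

def run_queue(messages, consumer_names):
    """Hand each message to exactly one consumer.

    Round-robin is an assumption for this sketch; a real queue gives the
    next message to whichever consumer asks for it first.
    """
    queue = deque(messages)
    received = {name: [] for name in consumer_names}
    turn = 0
    while queue:
        message = queue.popleft()  # once taken, the message is gone from the queue
        consumer = consumer_names[turn % len(consumer_names)]
        received[consumer].append(message)
        turn += 1
    return received

received = run_queue(["m1", "m2", "m3", "m4"], ["worker-a", "worker-b"])
print(received)
```

The load is shared between the two workers, but no worker ever sees a message that the other one took.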

Publish-subscribe: it allows you to broadcast data to multiple consumers, but there is no way of scaling out processing, since every message goes to every subscriber.

For example, think of a radio broadcast or live news. The same feed or stream is transmitted to everybody.

So publish-subscribe has one advantage, that the data is available to multiple consumers, and one disadvantage, that processing cannot be scaled out.
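A matching sketch for publish-subscribe, again in plain Python with made-up names (`Broadcaster`, `billing`, `analytics` are hypothetical and not taken from any library): every subscriber gets its own copy of every message.

```python
class Broadcaster:
    """Toy publish-subscribe topic: every subscriber receives every message."""

    def __init__(self):
        self.subscribers = {}  # subscriber name -> list of received messages

    def subscribe(self, name):
        self.subscribers[name] = []

    def publish(self, message):
        # Broadcast: append the message to every subscriber's inbox.
        for inbox in self.subscribers.values():
            inbox.append(message)

topic = Broadcaster()
topic.subscribe("billing")
topic.subscribe("analytics")
for message in ["m1", "m2", "m3"]:
    topic.publish(message)
print(topic.subscribers)
```

Both subscribers hold the full stream, so adding more subscribers does not reduce the work any one of them has to do.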

So each model has one pro and one con, and Kafka came as a hybrid of queuing and publish-subscribe.

Kafka has the concept of a consumer group, which gives it both advantages: it can divide the data processing load over the members of a consumer group, and at the same time it makes the same data available to multiple consumer groups.
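The consumer group idea can be sketched as a toy log in which each group keeps its own read position. This is a simplification under stated assumptions: `MiniLog` and `drain` are hypothetical names, this is not the Kafka client API, and the round-robin split inside a group stands in for the partition-based split that Kafka actually uses.

```python
class MiniLog:
    """Toy append-only log; each consumer group tracks its own read position."""

    def __init__(self):
        self.records = []
        self.positions = {}  # group name -> index of the next record to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group):
        """Return the next unread record for this group, or None if caught up."""
        position = self.positions.get(group, 0)
        if position >= len(self.records):
            return None
        self.positions[group] = position + 1
        return self.records[position]

def drain(log, group, members):
    """Share one group's records among its members (round-robin stand-in)."""
    received = {member: [] for member in members}
    turn = 0
    while (record := log.poll(group)) is not None:
        received[members[turn % len(members)]].append(record)
        turn += 1
    return received

log = MiniLog()
for record in ["m1", "m2", "m3", "m4"]:
    log.append(record)

billing = drain(log, "billing", ["b1", "b2"])   # two members share the load
analytics = drain(log, "analytics", ["a1"])     # a second group sees everything again
print(billing)
print(analytics)
```

Reading does not remove records from the log, which is why the second group still receives all four records after the first group has finished.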

A traditional queue stores data in order, but when multiple consumers read from it the ordering is lost, because the data is distributed across consumers that process it in parallel.

Kafka does this better by using partitions within a topic, which provide both ordering and parallelism over a pool of consumers. Records are stored in order within each partition, and Kafka assigns the partitions of a topic to the consumers in a consumer group so that each partition is consumed by only one consumer in that group. Ordering is therefore guaranteed within a partition, not across the whole topic. Because the data is spread over many partitions, we still get parallelism over the pool of consumers.
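A rough sketch of how partitions keep ordering while still allowing parallelism, in plain Python. Everything here is an assumption made for illustration: the helper names are hypothetical, `zlib.crc32` is a stand-in for the hash a real Kafka client applies to the record key, and the simple modulo assignment is not one of Kafka's actual assignment strategies.

```python
import zlib

def partition_for(key, num_partitions):
    # Stand-in hash for this sketch; real Kafka clients use a different one.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def build_partitions(records, num_partitions):
    """Append each (key, value) record to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

def assign(num_partitions, consumers):
    """Give every partition to exactly one consumer of the group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment

records = [
    ("user-1", "login"), ("user-2", "login"), ("user-1", "view"),
    ("user-2", "view"), ("user-1", "logout"),
]
partitions = build_partitions(records, 3)
assignment = assign(3, ["c1", "c2"])

# All records with the same key sit in one partition, in the order produced.
user1_events = [value for part in partitions for key, value in part if key == "user-1"]
print(assignment)
print(user1_events)
```

Because a key always maps to the same partition and each partition has a single reader within the group, the events for one key are processed in order even though the two consumers work in parallel.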
