Apache Kafka: All about Kafka topic with example

What is a Kafka topic:

“a particular stream of data”

In a Kafka cluster, you can have multiple topics. If you compare Kafka with a database, a topic is a kind of table that stores a set of related data. For example, check the image for Kafka cluster topics. You can create as many topics as you need; Kafka imposes no fixed limit on the number of topics, although every topic consumes cluster resources.

We can identify the topic with a 'name'. For example, batch_process_status is a topic.

Kafka cluster sample demo

A Kafka topic supports any data format, such as CSV, JSON, or Avro, because Kafka stores each message as bytes.

The sequence of messages in a topic is called a "data stream."

Topics are like tables, but you can’t query them. (To read the data, you need to create Kafka consumers that read from the topic.)

Each topic is divided into data blocks called partitions. Sample partitions are shown below.

[Image: sample partitions of a topic]

Messages within a partition are ordered, and each partition holds its own data with no dependency on the other partitions.

Each message in a partition gets an incremental ID called an “offset”.

Each partition has its own sequence of offsets (this offset plays a key role in how data is read and moved).

Data in a Kafka topic is immutable once it is written to a partition. We can't change an existing message, but we can write a new message that carries the updated data.
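To make the offset idea concrete, here is a minimal sketch of a partition as an append-only log. It is a plain Python simulation, not the Kafka client API; the class name Partition and its methods are assumptions made up for illustration.

```python
class Partition:
    """Toy model of one partition: an append-only list of messages."""

    def __init__(self) -> None:
        self._records: list[bytes] = []

    def append(self, value: bytes) -> int:
        """Add a message at the end and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset: int) -> list[tuple[int, bytes]]:
        """Return (offset, message) pairs starting at the given offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return list(enumerate(self._records[offset:], start=offset))


# Two partitions of the same topic keep separate offset sequences.
partition_0 = Partition()
partition_1 = Partition()
partition_0.append(b"job-a RUNNING")
partition_0.append(b"job-a DONE")
partition_1.append(b"job-b RUNNING")
```

There is deliberately no update or delete method: a correction is simply another append, and a reader decides where to start by choosing an offset.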

Kafka Topic Example:

[Image: Kafka topic example]

Use case:

An application owner/stakeholder has 4 different ETL applications. At any point in time, they want to track the batch process status, application health, and user entitlements in a single window.

Each application will send a message to Kafka every 30 seconds. Each message will contain the batch process status at the job level.

You can have a topic 'ETL_batch_process_status' that tracks the status of all applications.

We chose to create that topic with 10 partitions (arbitrary number).
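As a rough sketch of this use case, the snippet below builds one status message as JSON bytes and estimates how many messages the topic receives per day. The field names, the application name etl_app_1, and the helper functions are assumptions for illustration only; nothing here talks to a real Kafka cluster.

```python
import json

TOPIC = "ETL_batch_process_status"
NUM_PARTITIONS = 10  # the arbitrary number chosen above


def build_status_message(app: str, job: str, status: str, sent_at: str) -> tuple[bytes, bytes]:
    """Return a (key, value) pair; both are bytes, which is what Kafka stores."""
    payload = {"application": app, "job": job, "status": status, "sent_at": sent_at}
    return app.encode("utf-8"), json.dumps(payload).encode("utf-8")


def messages_per_day(num_apps: int, interval_seconds: int) -> int:
    """Messages written to the topic per day if every app sends on schedule."""
    return num_apps * (24 * 60 * 60 // interval_seconds)


key, value = build_status_message(
    "etl_app_1", "daily_load", "RUNNING", "2024-01-01T00:00:00Z"
)
print(TOPIC, key, value)
print(messages_per_day(num_apps=4, interval_seconds=30))  # 4 * 2880 = 11520
```

Using the application name as the message key is one possible design choice: it keeps each application's status updates together in one partition, in the order they were sent.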

Limitations of Kafka Topic

Once data is written to a partition, it can’t be changed (it is immutable).

Data is kept in the partition only for a limited time (7 days by default; this is configurable).

An offset only has a meaning for a specific partition.

  • Ex: offset 5 in partition 0 doesn’t represent the same data as offset 3 in partition 1.
  • Offsets are not reused even if previous messages have been deleted.
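The retention and offset rules above can be sketched with a small simulation. This is a simplified model under stated assumptions: the class RetainedPartition is hypothetical, and it expires messages one at a time, whereas a real broker removes old data in larger chunks.

```python
from collections import deque

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


class RetainedPartition:
    """Toy partition that drops old messages but never reuses an offset."""

    def __init__(self, retention_seconds: int = SEVEN_DAYS_SECONDS) -> None:
        self.retention_seconds = retention_seconds
        self.next_offset = 0
        self.records: deque[tuple[int, int, bytes]] = deque()  # (offset, timestamp, value)

    def append(self, value: bytes, now: int) -> int:
        """Store a message with the next offset; the counter only moves forward."""
        offset = self.next_offset
        self.records.append((offset, now, value))
        self.next_offset += 1
        return offset

    def expire(self, now: int) -> None:
        """Drop messages older than the retention period, oldest first."""
        while self.records and now - self.records[0][1] > self.retention_seconds:
            self.records.popleft()

    def offsets(self) -> list[int]:
        return [offset for offset, _, _ in self.records]


partition = RetainedPartition()
partition.append(b"old status", now=0)
partition.append(b"newer status", now=100)
partition.expire(now=SEVEN_DAYS_SECONDS + 50)  # only the first message is too old
partition.append(b"latest status", now=SEVEN_DAYS_SECONDS + 50)
print(partition.offsets())  # offset 0 is gone and is not handed out again
```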

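Data is assigned to a partition without any guaranteed placement unless a key is provided (the producer typically spreads keyless messages across partitions, for example round-robin). When a key is provided, all messages with the same key go to the same partition.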
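Here is a minimal sketch of how a key can decide the partition. It assumes a simple hash-modulo rule with CRC32 as a stand-in hash and round-robin for keyless messages; Kafka's own Java producer uses a murmur2 hash by default, so the partition numbers here will not match a real cluster. The class SketchPartitioner is hypothetical.

```python
import itertools
import zlib
from typing import Optional


class SketchPartitioner:
    """Toy partitioner: hash the key if present, otherwise rotate."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread messages across partitions in turn.
            return next(self._round_robin)
        # Same key always yields the same hash, hence the same partition.
        return zlib.crc32(key) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=10)
first = partitioner.partition_for(b"etl_app_1")
second = partitioner.partition_for(b"etl_app_1")
print(first == second)  # the same key lands in the same partition
```

Because ordering is only guaranteed inside a partition, keying by application is what keeps one application's status messages in order relative to each other.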

We can create many partitions per topic. 

Thank you!


More articles by Saikrishna Cheruvu
