Generating Unique Sequences across Kafka Stream Processors
Imagine a case where you have multiple JVM processes running across multiple machines, and you need to generate identifiers that are unique across all of them.
Quick Solutions
Solution 1
Utilizing Universally Unique Identifiers (UUIDs)
A UUID is unique across all machines with very high probability; for practical purposes it has no duplicates
Problem - it's a 36-character sequence (32 hexadecimal digits plus 4 hyphens)
Most businesses require shorter IDs
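For reference, generating a UUID in Java needs no coordination between machines at all; the short sketch below simply demonstrates the 36-character drawback mentioned above:

```java
import java.util.UUID;

public class UuidExample {
    public static void main(String[] args) {
        // Each call yields an ID that is unique across machines with
        // overwhelming probability - no coordination required.
        String id = UUID.randomUUID().toString();
        System.out.println(id);
        // The drawback: always 36 characters (32 hex digits + 4 hyphens).
        System.out.println(id.length());
    }
}
```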
Solution 2
Having a synchronized counter that serves as the seed for uniqueness across machines
The counter is incremented whenever any machine in the set uses a number
This requires guaranteeing uniqueness and locking around counter usage
Distributed locking is a classic use case for consensus or metadata-management tools such as ZooKeeper
Common locking mechanisms - centralized, decentralized, optimistic, consensus-based
Problem - the locking mechanism slows the system down, some implementations lack strict synchronization guarantees, and the lock service can become a single point of failure or cause a stop-the-world situation
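As a single-JVM analogy of this approach (in a real deployment the counter would sit behind a network service or a ZooKeeper-guarded lock, which is not shown here), a synchronized counter looks like the sketch below; every call must acquire the lock, which is exactly the bottleneck described above:

```java
// Single-process analogy of the synchronized-counter approach.
// Distributed versions pay a network round trip plus lock contention
// for every single ID, and the counter is a single point of failure.
class SynchronizedCounter {
    private long value = 0;

    // The intrinsic lock serializes all callers: safe, but slow.
    public synchronized long next() {
        return ++value;
    }
}
```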
Solution 3
Use a lock-free mechanism for synchronization across multiple machines
This relies on algorithms built on atomic operations such as compare-and-swap (CAS)
Problem - this can be tricky to implement and has limited applicability
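Within a single JVM, the compare-and-swap idea looks like the sketch below; extending it across machines is the hard part that makes this approach tricky:

```java
import java.util.concurrent.atomic.AtomicLong;

// Lock-free counter built on compare-and-swap. Threads never block;
// a thread that loses the race simply retries with the fresh value.
class CasCounter {
    private final AtomicLong value = new AtomicLong(0);

    public long next() {
        long current;
        do {
            current = value.get();
            // compareAndSet succeeds only if no other thread has
            // changed the value since we read it.
        } while (!value.compareAndSet(current, current + 1));
        return current + 1;
    }
}
```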
I recently ran into this problem in a Kafka Streams setting, where I needed to generate IDs that are unique across multiple machines running the same stream processor
Kafka Streams provides stateful processing, using RocksDB to store state on each node running the stream processor
This state store is partition-based and keeps the entries for each partition in a key-value store
The state store provides a thread-safe model backed by Kafka's partitioning mechanism, and supports rebalancing and local state maintenance through changelog topics
It is fault-tolerant and can be run with exactly-once semantics, which prevents duplicate-processing issues
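This partition ownership is what makes the approach work: each partition is processed by exactly one task at a time, so a per-partition counter combined with the partition number yields globally unique IDs without any cross-machine lock. A plain-Java sketch of that composition (the class name and ID format here are my illustration, not Kafka API):

```java
// Illustration of the core idea: the ID is globally unique because the
// partition number is unique per owner and the sequence is unique
// within a partition. Names and format are hypothetical, not Kafka API.
class PartitionScopedIdGenerator {
    private final int partition;
    private long sequence = 0; // in Kafka Streams this would live in the state store

    PartitionScopedIdGenerator(int partition) {
        this.partition = partition;
    }

    // e.g. partition 3, 1st record -> "3-1"
    String nextId() {
        sequence++;
        return partition + "-" + sequence;
    }
}
```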
Working with the state store is not straightforward, though - the Kafka Streams DSL doesn't provide out-of-the-box support for state-store interactions, which are fairly low-level in nature, so the Processor API has to be used
The transition from the Kafka Streams DSL to the Processor API can be made through a few functions: processValues(), transformValues(), process(), and transform()
The Processor API provides a ProcessorContext through which one can interact with a KeyValueStore easily while keeping the consumed record's metadata intact
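Putting it together, a processor along these lines (a sketch against the Kafka Streams Processor API; the store name "seq-store" and the counter key are my placeholders, and the store must be registered on the topology separately) reads the counter from the KeyValueStore, increments it, and combines it with the partition number taken from the record metadata:

```java
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

// Sketch only: assumes a persistent KeyValueStore<String, Long> named
// "seq-store" has been added to the topology and connected to this processor.
public class UniqueIdProcessor implements Processor<String, String, String, String> {
    private ProcessorContext<String, String> context;
    private KeyValueStore<String, Long> store;

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
        this.store = context.getStateStore("seq-store");
    }

    @Override
    public void process(Record<String, String> record) {
        // The partition number comes from the record metadata that the
        // ProcessorContext keeps intact.
        int partition = context.recordMetadata()
                               .map(meta -> meta.partition())
                               .orElse(-1);
        Long seq = store.get("counter");
        long next = (seq == null ? 0L : seq) + 1;
        store.put("counter", next); // changelog-backed, hence fault-tolerant
        String uniqueId = partition + "-" + next;
        context.forward(record.withKey(uniqueId));
    }
}
```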