Generating Unique Sequence across Kafka Stream Processors

Imagine a case where you have multiple JVM processes running across multiple machines, and you need to generate identifiers that are unique across all of them.


Quick Solutions

Solution 1 

Utilizing a Universally Unique Identifier (UUID)

A UUID is unique across all machines with overwhelmingly high probability; in practice, duplicates do not occur.

Problem: it is a 36-character string (hexadecimal, with hyphens).

Most businesses require shorter IDs.
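A quick sketch of this approach in Java, using the standard library's random (version-4) UUID:

```java
import java.util.UUID;

public class UuidExample {
    public static void main(String[] args) {
        // Version-4 (random) UUID: 122 random bits, so collision
        // probability is negligible even across many machines.
        String id = UUID.randomUUID().toString();
        System.out.println(id);          // e.g. 3f2504e0-4f89-41d3-9a0c-0305e82c3301
        System.out.println(id.length()); // 36 characters, including 4 hyphens
    }
}
```

The 36-character length printed here is exactly the drawback noted above.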


Solution 2

Having a synchronized counter that serves as the seed for uniqueness across machines

The counter is incremented whenever any machine in the set uses a number.

This can be achieved by enforcing uniqueness through locking around counter usage.

Distributed locking is one of the use cases that can be implemented with consensus or metadata-management tools such as ZooKeeper.

Common locking mechanisms: centralized, decentralized, optimistic, consensus-based.

Problem: the locking mechanism slows the system down, some implementations lack strong synchronization guarantees, and the lock service can become a single point of failure or cause a stop-the-world situation.
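To make the idea concrete, here is a minimal single-JVM sketch of the synchronized-counter pattern. In a real distributed setup the critical section would run under a ZooKeeper (or similar) lock; here a plain intrinsic monitor stands in for that lock, and the class name is illustrative:

```java
public class SynchronizedCounter {
    private long counter = 0;

    // In a distributed deployment, this method body would execute while
    // holding a distributed lock (e.g. via ZooKeeper); here a single-JVM
    // monitor stands in for it.
    public synchronized long next() {
        return ++counter; // read-increment-write is atomic under the lock
    }
}
```

Every caller observes a strictly increasing, duplicate-free sequence, but each call pays the cost of acquiring the lock, which is where the slowness and single-point-of-failure concerns come from.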


Solution 3

Use a lock-free mechanism to achieve synchronization across multiple machines.

This means using an algorithm that relies on atomic operations such as compare-and-swap (CAS).

Problem: lock-free algorithms are tricky to implement correctly and have limited applicability.
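A small sketch of the compare-and-swap approach using `java.util.concurrent.atomic.AtomicLong`, which exposes CAS directly; the explicit retry loop is written out to show the technique (in practice `incrementAndGet()` does the same thing):

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasCounter {
    private final AtomicLong counter = new AtomicLong(0);

    // Lock-free increment: read the current value, then retry the
    // compare-and-swap until no other thread raced us.
    public long next() {
        long current;
        do {
            current = counter.get();
        } while (!counter.compareAndSet(current, current + 1));
        return current + 1;
    }
}
```

No thread ever blocks on a lock, but extending this beyond one process requires hardware-or-protocol-level atomic support, which is exactly the "limited applicability" noted above.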



I recently encountered this problem in a Kafka Streams setting, where I had to generate IDs unique across multiple machines running the same stream processor.

Kafka Streams provides stateful processing, using RocksDB to store state locally on each node running the stream processor.

This state store is partition-based: it keeps the entries for each partition in a key-value store.

The state store in Kafka Streams provides a thread-safe model backed by Kafka's partitioning mechanism, and supports rebalancing and local state maintenance through changelog topics.

It is fault-tolerant and can be run with exactly-once semantics, which prevents ordering and duplication issues.

Working with the state store is not straightforward, though: the Kafka Streams DSL does not provide out-of-the-box support for state-store interactions, which are fairly low-level in nature, so the Processor API has to be used.

The transition from the Kafka Streams DSL to the Processor API can be made through a few functions: processValues(), transformValues(), process(), and transform().

The Processor API provides a ProcessorContext through which one can interact with the KeyValueStore easily, while keeping the consumed record's metadata intact.
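Putting this together: because each partition is owned by exactly one processor instance at a time, a per-partition counter in the state store yields IDs of the form partition-counter that are unique across all instances. The sketch below shows that logic standalone; the class name and key scheme are illustrative assumptions, and a plain Map stands in for the RocksDB-backed KeyValueStore so the code runs without a Kafka cluster (in a real Processor, the store would come from ProcessorContext#getStateStore and be replicated via the changelog topic):

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the per-partition sequence a Kafka Streams
// Processor would maintain. Uniqueness follows from partition ownership:
// only one instance ever increments a given partition's counter.
public class PartitionIdGenerator {
    // Stand-in for the KeyValueStore<String, Long> a Processor would
    // obtain from its ProcessorContext; entries survive rebalancing
    // in Kafka via the changelog topic.
    private final Map<String, Long> store = new HashMap<>();

    public String nextId(int partition) {
        String key = "seq-" + partition;       // hypothetical key scheme
        long next = store.getOrDefault(key, 0L) + 1;
        store.put(key, next);
        return partition + "-" + next;         // e.g. "3-1", "3-2", ...
    }
}
```

IDs such as "3-17" are much shorter than a UUID, and no cross-machine lock is needed because Kafka's partition assignment already guarantees a single writer per counter.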


