Generating Unique Sequences across Kafka Stream Processors
Imagine a case where you have multiple JVM processes running across multiple machines, and you need to generate identifiers that are unique across all of them.
Quick Solutions
Solution 1
Utilizing Universally Unique Identifiers (UUIDs)
A UUID is unique across all machines with very high probability; for practical purposes it has no duplicates
Problem - it's a 36-character sequence (32 hexadecimal digits plus 4 hyphens)
Most businesses require shorter IDs
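For reference, generating a UUID in Java needs no coordination between machines at all; the short sketch below simply demonstrates the 36-character drawback mentioned above:

```java
import java.util.UUID;

public class UuidExample {
    public static void main(String[] args) {
        // Each call yields an ID that is unique across machines with
        // overwhelming probability - no coordination required.
        String id = UUID.randomUUID().toString();
        System.out.println(id);
        // The drawback: always 36 characters (32 hex digits + 4 hyphens).
        System.out.println(id.length());
    }
}
```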
Solution 2
Having a synchronized counter that serves as the seed for uniqueness across machines
The counter is incremented whenever any machine in the set uses a number
This requires guaranteeing uniqueness and locking around counter usage
Distributed locking is a classic use case for consensus or metadata-management tools such as ZooKeeper
Common locking mechanisms - centralized, decentralized, optimistic, consensus-based
Problem - the locking mechanism slows the system down, some implementations lack strict synchronization guarantees, and the lock service can become a single point of failure or cause a stop-the-world situation
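As a single-JVM analogy of this approach (in a real deployment the counter would sit behind a network service or a ZooKeeper-guarded lock, which is not shown here), a synchronized counter looks like the sketch below; every call must acquire the lock, which is exactly the bottleneck described above:

```java
// Single-process analogy of the synchronized-counter approach.
// Distributed versions pay a network round trip plus lock contention
// for every single ID, and the counter is a single point of failure.
class SynchronizedCounter {
    private long value = 0;

    // The intrinsic lock serializes all callers: safe, but slow.
    public synchronized long next() {
        return ++value;
    }
}
```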
Solution 3
Use a lock-free mechanism for synchronization across multiple machines
This relies on algorithms built on atomic operations such as compare-and-swap (CAS)
Problem - this can be tricky to implement and has limited applicability
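Within a single JVM, the compare-and-swap idea looks like the sketch below; extending it across machines is the hard part that makes this approach tricky:

```java
import java.util.concurrent.atomic.AtomicLong;

// Lock-free counter built on compare-and-swap. Threads never block;
// a thread that loses the race simply retries with the fresh value.
class CasCounter {
    private final AtomicLong value = new AtomicLong(0);

    public long next() {
        long current;
        do {
            current = value.get();
            // compareAndSet succeeds only if no other thread has
            // changed the value since we read it.
        } while (!value.compareAndSet(current, current + 1));
        return current + 1;
    }
}
```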
I recently ran into this problem in a Kafka Streams setting, where I needed to generate IDs that are unique across multiple machines running the same stream processor
Kafka Streams provides stateful processing, using RocksDB to store state on each node running the stream processor
This state store is partition-based and keeps the entries for each partition in a key-value store
The state store provides a thread-safe model backed by Kafka's partitioning mechanism, and supports rebalancing and local state maintenance through changelog topics
It is fault-tolerant and can be run with exactly-once semantics, which prevents duplicate-processing issues
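This partition ownership is what makes the approach work: each partition is processed by exactly one task at a time, so a per-partition counter combined with the partition number yields globally unique IDs without any cross-machine lock. A plain-Java sketch of that composition (the class name and ID format here are my illustration, not Kafka API):

```java
// Illustration of the core idea: the ID is globally unique because the
// partition number is unique per owner and the sequence is unique
// within a partition. Names and format are hypothetical, not Kafka API.
class PartitionScopedIdGenerator {
    private final int partition;
    private long sequence = 0; // in Kafka Streams this would live in the state store

    PartitionScopedIdGenerator(int partition) {
        this.partition = partition;
    }

    // e.g. partition 3, 1st record -> "3-1"
    String nextId() {
        sequence++;
        return partition + "-" + sequence;
    }
}
```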
Working with the state store is not straightforward, though - the Kafka Streams DSL doesn't provide out-of-the-box support for state-store interactions, which are fairly low-level in nature, so the Processor API has to be used
The transition from the Kafka Streams DSL to the Processor API can be made through a few functions: processValues(), transformValues(), process(), and transform()
The Processor API provides a ProcessorContext through which one can interact with a KeyValueStore easily while keeping the consumed record's metadata intact
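Putting it together, a processor along these lines (a sketch against the Kafka Streams Processor API; the store name "seq-store" and the counter key are my placeholders, and the store must be registered on the topology separately) reads the counter from the KeyValueStore, increments it, and combines it with the partition number taken from the record metadata:

```java
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

// Sketch only: assumes a persistent KeyValueStore<String, Long> named
// "seq-store" has been added to the topology and connected to this processor.
public class UniqueIdProcessor implements Processor<String, String, String, String> {
    private ProcessorContext<String, String> context;
    private KeyValueStore<String, Long> store;

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
        this.store = context.getStateStore("seq-store");
    }

    @Override
    public void process(Record<String, String> record) {
        // The partition number comes from the record metadata that the
        // ProcessorContext keeps intact.
        int partition = context.recordMetadata()
                               .map(meta -> meta.partition())
                               .orElse(-1);
        Long seq = store.get("counter");
        long next = (seq == null ? 0L : seq) + 1;
        store.put("counter", next); // changelog-backed, hence fault-tolerant
        String uniqueId = partition + "-" + next;
        context.forward(record.withKey(uniqueId));
    }
}
```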