Raft - Distributed Systems Consensus Algorithm And its application in the context of Kafka known as KRaft

Raft - Distributed Systems Consensus Algorithm And its application in the context of Kafka known as KRaft


Distributed System, defined by Leslie Lamport as - “.. A system in which failure of a computer you didn't even know existed, can render your own computer unusable”.

Making Distributed systems - helps to increase Availability, improve Geo proximity and hence performance, and improve fault tolerance of a system

It comes with its own headache of -

  • Network Unreliabilities
  • Node Unreliabilities
  • Time, Synchronization Challenges

Communication among nodes in a distributed system often involves broadcasting messages from one node to another. This is a fundamental aspect of achieving coordination and consistency. 

There are different types of broadcast algorithms -

  1. Eager Reliable/Gossip Algorithm
  2. FIFO Broadcast algorithm
  3. Causal (Relative/Event) broadcast
  4. Total Order Broadcast

Out of which Total order is the most reliable/perfect ordering guarantee broadcast algorithm, which involves leader based mechanism to send out messages to nodes in the system


Article content
Label: Characteristics of Total Order Broadcast Algorithm


One caveat though - How to manage the system when the leader node fails - can be a crash or liveness failure

In such cases, leader election comes into the picture

There are many consensus algorithms to effectively select a leader in a group of nodes - Paxos, Zookeeper Atomic Broadcast (ZAB), Multi-Paxos, Raft (Recent)

In this article, we’ll try to understand Raft as a consensus algorithm and then will also take a look at KRaft - a new Control Plane implementation in Kafka and how it uses Raft for electing leaders for a metadata partition.


Characteristics of the Raft consensus algorithm 

Article content
Label: Characteristics of Raft Algortihm


As mentioned in image -

  1. It detects crash or failure detection for leader node (majorly by heartbeats)
  2. Avoids having two leaders for a term (term is limited duration of time)
  3. On every election, term is incremented by 1
  4. Every candidate requires - quorum votes ((n+1)/2) to become a Leader



State Transition diagram for RAFT

Article content
Label: State Transition in Raft


Start State - Follower (Each node in a term starts as a follower and on crash recovery also - comes in a follower state).

Intermediary - Can be a candidate state and ask for Votes.

After receiving quorum votes it becomes a Leader for the system and then can decide the order of messages to be consumed or delivered.



Kafka as a case study for Raft

Kafka is a distributed Log, used for asynchronous message processing. It involves 2 major components in architecture - the Control Plane and the Data Plane.

Control Plane - Responsible for managing metadata for the Kafka cluster.

Data Plane - Responsible for replicating and managing data for the Kafka cluster.


Article content
Label: Kafka Control Plane (Kraft Mode and No Zookeeper)


In the New Control Plane implementation called KRaft - Kafka has removed Zookeeper dependency and performs metadata management using an internal single partition topic called __cluster_metadata. 


Article content
Label: Kafka Control Plane - cluster metadata management using __cluster_metadata


A few set of brokers in a Kafka Cluster acts as control plane nodes - called Controllers in Controller Pool, out of which one Controller which is the leader of the single partition topic is called an active controller. 


Note - There is no in-sync replica (ISR) maintenance for this Single partition, as it's on the metadata Plane that decides ISR for data topics.

Hence - A leader crash scenario is crucial and needs a voting technique to decide new leader - in case an active controller crashes.


In KRaft

A term is governed by the epoch time of each controller.

On leader failure, the follower controller transitions to the candidate and makes VoteRequest (Leader Election Request) to Other Controllers.

The image below explains Kraft VotingRequest and Raft Algorithm Voting Requests as a comparison:


Article content
Comparison KRaft v/s Raft: Voting Request From Candidate Node to Cluster (Expecting Quorum Acks - (n+1)/2)


Article content
Comparison KRaft v/s Raft: VotingRequest Evaluation From Candidate Node



When a candidate receives a Grant from (n+1)/2 nodes in a cluster then it broadcasts its leadership election to the cluster

The addition of Kraft in Kafka brings a significant improvement in design - due to the removal of redundant component and making the ecosystem more Kafka analogy-based, by having durability with the use of internal topics.

This article tried covering implementation semantics on a high level with the help of references I read in a few courses

References:

  1. Distributed Systems course by Martin Kleppman, Notes
  2. Kafka Internals Course By Jun Rao, Course
  3. All the images are taken from the above two references, no copyright infringement is intended

To view or add a comment, sign in

More articles by Venkatesh Wagh

  • How to manage yourself ?

    Work Edition Most mid-level and mid-senior engineers, testers, and product managers struggle with managing their own…

  • Burnout | Unintentional Choices | Auto-pilot

    If you've felt that your energy is being sapped, you not being in the mood to do stuff, feeling that you've lost…

    2 Comments
  • Generating Unique Sequence across Kafka Stream Processors

    Imagine a case where you have multiple JVM processes running across multiple machines and you’ve got a use case to…

    2 Comments

Others also viewed

Explore content categories