Using Sagas as a Pattern for Distributed transactions

Rahul C.

Published Jan 13, 2019

Definition - What does a Distributed Transaction mean?

A distributed transaction includes one or more instructions that, individually or as a group, transact on two or more distinct nodes in a network.

Two phase Commit(2PC) in a Distributed System

In a 2PC the co-ordinator sends commit/abort message to each of its resources and waits for the acknowledgement response from all of the resources. The co-ordinator then completes/undoes a transaction if it receives acknowledgment response from all of its contributing resources.

Why 2PC is not a viable solution?

Co-ordinator acts as a single point of failure.
During transactions locks are imposed on resources which brings down the overall throughput of the system.
In the worst case scenario, the retry attempts may lead to O(n^2) complexity

What is a Saga ?

"Long lived transactions (LLTs) hold on to database resources for relatively long periods of time, significantly delaying the termination of shorter and more common transactions. To alleviate these problems we propose the notion of a saga. A LLT is a saga if it can be written as a sequence of transactions that can be interleaved with other transactions. The database management system guarantees that either all the transactions in a saga are successfully completed or compensating transactions are run to amend a partial execution." -- Hector Garcaa-Molrna and Kenneth Salem , 1987

A large unit of functionality (T) can be thought of as a saga, which can be considered as a collection of sub transactions T1, T2,T3....Tn each independently running in its own context. Sub-transaction should not have a dependency on one another and they may or may not be ACID(Atomic,Consistent,Isolated,Durable) compliant. In a micro-service based architecture, a aggregator service(T)which is depending on the responses of other services (T1,T2,T3,...Tn) can be thought of as a saga comprising of sub resources/services.

Each transaction/request (Ti) in the saga has an equivalent compensating transaction/request (Ci). In other words, C1 semantically compensates the effect of T1. Ci effectively is an abort or rollback transaction/request which undoes the effect of Ti.

So in a typical scenario, execution of T1,T2,T3,...Tn means that the saga was successfully executed without any compensating actions. Or, there were compensating actions/requests Ci for every completed sub transactions Ti, in the event of an abort saga request.

How to execute a Saga ?

Each transaction/request in a saga is maintained by a Saga Execution Component(SEC). SEC is responsible for executing each of the sub transaction/requests and in the event of failure of any of the sub transaction/request(Ti), it can execute the corresponding compensating transaction/request(Ci). Typically, SEC can be a thought of as a state machine or event store managing events of each sub transactions/requests and compensating transactions/requests. So ideally, SEC should be running as a highly available, scalable process within its own separate context.

Logging saga events

Its important to have durable log streams of saga events which typically looks somewhat like this

Begin Saga:
    
   Begin Ti:
   
   End Ti
   
   Begin Ci:
   
   End Ci

End Saga

Backward Recovery

In backward recovery compensating transactions/requests try to restore the sub transactions/requests by nullifying it. So, basically compensating transactions/requests rollback the effect of the sub transactions/requests.

When to trigger Compensating requests ?

Whenever there is an abort/error saga request from any of the services to SEC.
Whenever any of the request fails due to unavailability/error.
SEC itself fails , triggering compensating requests on tear down.

What if Compensating requests fail ?

Since we are executing transactions over a distributed network, we should take into consideration that even compensating requests may fail. For that, it is critical that compensating requests are idempotent (an operation that has no additional effect if it is called more than once with the same input parameters), which implicitly mandates that compensating requests should be stateless. Compensating requests should be retried until we get a success state.

Forward Recovery

In case of pure forward recovery there are no compensating transactions rather the saga retries by restarting it from the beginning in the event of error/abort saga. So ideally in a pure forward recovery saga, each transaction/request(Ti) has to be idempotent.

Backward and Forward Recovery

Backward and Forward recovery in a saga can can be designed with proper save points.

Let's says a saga executes transactions/requests in the following order T1, T2, followed by a save-point command, and then transaction/request T3. Then during the execution of transaction/request T4 the system crashes. When the system recovers then it must must first perform a backward recovery to the save-point (aborting T4 and running C3). After ensuring that the code for running T3, T4, is available, the SEC can restart the saga in forward recovery mode for T1,T2.

Using Sagas as a Pattern for Distributed transactions

Rahul C.

Definition - What does a Distributed Transaction mean?

Two phase Commit(2PC) in a Distributed System

Why 2PC is not a viable solution?

What is a Saga ?

How to execute a Saga ?

Logging saga events

Backward Recovery

When to trigger Compensating requests ?

What if Compensating requests fail ?

Forward Recovery

Backward and Forward Recovery

Others also viewed

Serializable Snapshot Isolation: Improving on Optimistic Concurrency Control

Demystifying Cache Strategies: Unleashing the Power of Data Access Efficiency

From ACID to PACELC: How Real-World Constraints Shaped Distributed Databases

Tips and Techniques for Using TransactionScope in C#

⏳ MVCC: Your Database’s Time Machine — How Dashboards Read the Past Without Locking the Future

The 7 Real Scalability Problem Clusters Blocking Institutional Tokenization

Understanding Database Isolation Levels: A Focus on Repeatable Read

Anomalies in Database Transactions

When Kafka Scaling Fails Quietly: The “Bull Run vs Tax Reporting” Problem

Why and When to Use the @Transactional Annotation in Spring

Explore content categories