Distributed transactions in microservices
Microservice architecture has become very popular, and every system built this way has to solve a common problem: how to manage distributed transactions across multiple microservices. In this post, I will explain how we solved the problem of distributed transactions in my current project.
Distributed transaction
Transactions that span multiple physical systems or computers over a network are termed distributed transactions.
Transactions must be atomic, consistent, isolated, and durable (ACID). Transactions within a single service are ACID, but data consistency across services requires a cross-service transaction management strategy.
When a microservice architecture decomposes a monolithic system into small services, it can break transactions: a local transaction in the monolithic system is split across multiple services that are called in sequence. Our first approach should be to avoid distributed transactions as much as possible.
A database-per-microservice model provides many benefits in microservice architectures, but it also means that a single database transaction can no longer span services.
Let's consider an e-commerce application example built on a monolithic design using a local transaction:
In this example, when a customer places an order in the monolithic system, the system creates a local database transaction that works over multiple database tables. If any step fails, the whole transaction can be rolled back. This all-or-nothing behaviour is part of the ACID guarantees (Atomicity, Consistency, Isolation, Durability) provided by the database system.
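To make the local transaction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (customers, orders), the credit column, and the place_order function are assumptions made up for illustration; they are not taken from the project described in this post.

```python
import sqlite3


def make_db():
    # Hypothetical schema: one customer with a credit balance of 100.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, credit INTEGER)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
    )
    conn.execute("INSERT INTO customers VALUES (1, 100)")
    conn.commit()
    return conn


def place_order(conn, customer_id, amount):
    """Create the order and reserve credit in ONE local transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
            cur = conn.execute(
                "UPDATE customers SET credit = credit - ? WHERE id = ? AND credit >= ?",
                (amount, customer_id, amount),
            )
            if cur.rowcount == 0:
                # Second step failed: the order row inserted above is rolled back too.
                raise ValueError("insufficient credit")
        return True
    except ValueError:
        return False
```

Calling place_order twice with an amount of 60 succeeds the first time and fails the second time, and the failed call leaves no order row behind. This is exactly the guarantee we lose once the two tables live in different services.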
But when we decompose this system, both the CustomerService and OrderService will have separate databases. After the decomposition, the Sequence diagram for the customer order use case will be as below.
When a placeOrder request comes from the user, both microservices (CustomerService and OrderService) are called to apply changes to their own databases. Because the transaction now spans multiple databases, it is considered a distributed transaction.
In a monolithic system, the database system guarantees that transactions are ACID. But in a microservice-based system, there is no global transaction coordinator by default. So, maintaining data consistency across services is one of the key challenges here.
Possible solutions
This problem is very common and important for microservice-based systems. Data inconsistency issues will arise if it is not handled properly.
The following two patterns can resolve the problem:
- 2PC (two-phase commit): 2PC is widely used in database systems. In some situations, it can be used in microservices as well, but I do not recommend it, for reasons explained later in this post.
- Saga
Two-phase commit (2pc) pattern
As its name suggests, 2PC has two phases:
Prepare phase: The Coordinator asks all microservices to prepare for a data change that can be done atomically.
Commit phase: Once all microservices are prepared, the Coordinator initiates the commit phase, i.e. it asks all the microservices to commit the changes.
If at any point any of the microservices fails to prepare, the Coordinator aborts the transaction and begins the rollback process. Here is a diagram of a 2PC rollback for the customer order example:
2PC is a very strong consistency protocol. The prepare and commit phases guarantee that the transaction is atomic: it ends either with all microservices committing successfully or with no microservice having changed anything.
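The two phases can be sketched in a few lines of Python. This is only an in-memory illustration: the Participant class, its can_prepare flag, and the two_phase_commit function are hypothetical names, and a real coordinator would also have to deal with timeouts, crashes, and durable logging, which are left out here.

```python
class Participant:
    """Hypothetical in-memory stand-in for a microservice taking part in 2PC."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "IDLE"

    def prepare(self):
        # A real service would validate the change and lock the affected data here.
        self.state = "PREPARED" if self.can_prepare else "ABORTED"
        return self.can_prepare

    def commit(self):
        self.state = "COMMITTED"

    def rollback(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Return True if every participant committed, False if the transaction aborted."""
    prepared = []
    # Phase 1: ask every participant to prepare.
    for participant in participants:
        if participant.prepare():
            prepared.append(participant)
        else:
            # One "no" vote aborts the whole transaction.
            for done in prepared:
                done.rollback()
            return False
    # Phase 2: everyone voted "yes", so tell everyone to commit.
    for participant in participants:
        participant.commit()
    return True
```

With two participants that can both prepare, every state ends as COMMITTED. If CustomerService is created with can_prepare=False, the already prepared OrderService is rolled back and both end as ABORTED.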
DRAWBACK:
It is not recommended for many microservice-based systems because 2PC is synchronous (blocking): participants hold locks on the affected resources from the prepare phase until the commit or rollback completes. In a database system, transactions tend to be fast, normally within 50 ms. However, microservices have long delays with RPC calls, especially when integrating with external services such as a payment service. These long-held locks could become a system performance bottleneck.
SAGA Pattern
SAGA is one of the best ways to ensure data consistency in a distributed architecture without a single ACID transaction. A saga is a sequence of local transactions, one per service; each step has a compensating transaction that undoes its effect, so the work already done can be rolled back whenever a later step fails. In some implementations, a master process called the “Saga Execution Coordinator” (SEC) drives this sequence.
The High-level design for any e-commerce application would look like below:
There are a couple of different ways to implement a SAGA transaction, but the two most popular are:
- Events/Choreography: There is no central coordination; each service produces and listens to the other services' events and decides whether an action should be taken or not.
- Command/Orchestration: Here, a coordinator service is responsible for centralizing the saga's decision making and sequencing business logic.
Let's go a little bit deeper to understand how they work.
Events/Choreography:
In the Events/Choreography approach, the first service executes a local transaction triggered by an external action and publishes an event. This event is listened to by one or more services, which execute their own local transactions and publish new events.
The distributed transaction ends when the last service executes its local transaction and does not publish any events, or the event published is not heard by any of the saga's participants.
- When a customer places an order, Order Service receives a request. It creates a new order object, saves its state as PENDING in its own DB, and publishes an event called ORDER_CREATED_EVENT to one of the queues.
- The Payment Service listens to ORDER_CREATED_EVENT, blocks the order amount in the customer's account, and publishes the event PAYMENT_CREATED_EVENT.
- The Inventory Service listens to PAYMENT_CREATED_EVENT, checks whether the product is in its inventory, prepares the products bought in the order, and publishes the event ORDER_READY_EVENT.
- Delivery Service listens to ORDER_READY_EVENT, then picks up and delivers the product. At the end, it publishes an ORDER_DELIVERED_EVENT.
- Finally, Order Service listens to ORDER_DELIVERED_EVENT and sets the state of the order object to DONE in its own DB.
In the above case, Order Service could simply listen to all events or some events and update its order object state.
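The happy path above can be sketched with a tiny in-memory event bus. Everything except the event names is an assumption for illustration: EventBus, wire_services, and place_order are hypothetical helpers, the bus stands in for a real message broker, and the handlers skip the actual payment, inventory, and delivery work.

```python
from collections import defaultdict, deque


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.log = []  # event types in the order they were delivered

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        # Deliver queued events until no participant publishes anything new.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.log.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


def wire_services(bus, orders):
    # Each handler stands for one service's local transaction plus its next event.
    def payment_service(evt):
        bus.publish("PAYMENT_CREATED_EVENT", evt)

    def inventory_service(evt):
        bus.publish("ORDER_READY_EVENT", evt)

    def delivery_service(evt):
        bus.publish("ORDER_DELIVERED_EVENT", evt)

    def order_service(evt):
        orders[evt["order_id"]] = "DONE"  # last step publishes nothing, so the saga ends

    bus.subscribe("ORDER_CREATED_EVENT", payment_service)
    bus.subscribe("PAYMENT_CREATED_EVENT", inventory_service)
    bus.subscribe("ORDER_READY_EVENT", delivery_service)
    bus.subscribe("ORDER_DELIVERED_EVENT", order_service)


def place_order(bus, orders, order_id):
    orders[order_id] = "PENDING"
    bus.publish("ORDER_CREATED_EVENT", {"order_id": order_id})
```

After wire_services and place_order, the order stays PENDING until bus.run() has delivered all four events, at which point Order Service marks it DONE. Note that no component knows the whole workflow; it only exists as the chain of subscriptions.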
Rollbacks in distributed transactions
If any local transaction fails, the earlier steps have to be undone. Let's suppose the product is not available at the time the request reaches Inventory Service. Then the following actions will take place:
- Inventory Service will generate the PRODUCT_NOT_READY_EVENT.
- Both Order Service and Payment Service listen to this message and trigger compensating transactions that undo the changes that were made by the preceding local transactions.
- Payment Service releases the blocked amount back to the customer, and Order Service changes the state of the order object to FAIL.
Note: Here, a common shared ID for each transaction plays a significant role. We emit the transaction ID in every event so that any service can identify which transaction needs to be rolled back.
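Here is a minimal sketch of that rollback flow, with the transaction ID carried in every event. The class names mirror the services above, but their methods, the event dictionary layout (type, tx_id, product, amount), and the run_saga dispatch loop are assumptions made up for this example; a real system would deliver the events through a message broker.

```python
from collections import deque


class OrderService:
    def __init__(self):
        self.orders = {}  # transaction ID -> order state

    def create(self, tx_id, product, amount):
        self.orders[tx_id] = "PENDING"
        return [{"type": "ORDER_CREATED_EVENT", "tx_id": tx_id,
                 "product": product, "amount": amount}]

    def handle(self, event):
        if event["type"] == "PRODUCT_NOT_READY_EVENT":
            self.orders[event["tx_id"]] = "FAIL"  # compensating transaction
        return []


class PaymentService:
    def __init__(self):
        self.blocked = {}  # transaction ID -> blocked amount

    def handle(self, event):
        if event["type"] == "ORDER_CREATED_EVENT":
            self.blocked[event["tx_id"]] = event["amount"]
            return [{**event, "type": "PAYMENT_CREATED_EVENT"}]
        if event["type"] == "PRODUCT_NOT_READY_EVENT":
            # Compensating transaction: release the amount for this transaction only.
            self.blocked.pop(event["tx_id"], None)
        return []


class InventoryService:
    def __init__(self, stock):
        self.stock = stock  # product -> quantity

    def handle(self, event):
        if event["type"] != "PAYMENT_CREATED_EVENT":
            return []
        if self.stock.get(event["product"], 0) > 0:
            self.stock[event["product"]] -= 1
            return [{**event, "type": "ORDER_READY_EVENT"}]
        return [{**event, "type": "PRODUCT_NOT_READY_EVENT"}]


def run_saga(services, initial_events):
    # Hypothetical dispatch loop: every service sees every event.
    queue = deque(initial_events)
    while queue:
        event = queue.popleft()
        for service in services:
            queue.extend(service.handle(event))
```

If InventoryService is created with no stock for the ordered product, the order for that transaction ID ends in state FAIL and its blocked amount is released, while other transactions are left untouched. Using pop with a default keeps the payment compensation safe to run more than once, which matters when a broker redelivers an event.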
Advantages
- It is easy to implement for simple workflows that involve only a few microservices.
- Since the responsibilities are distributed across the saga participants, there is no central coordinator that can become a single point of failure.
Drawbacks
- The workflow can become confusing when adding new steps, as it is difficult to track which saga participants listen to which events.
- There is a risk of cyclic dependencies between saga participants, as they have to consume each other's events.
Command/Orchestration:
Orchestration is a way to coordinate sagas where a centralized controller tells the saga participants which local transactions to execute. The saga orchestrator handles all the transactions and tells the participants which operation to perform based on events. The orchestrator executes saga requests, stores and interprets the states of each task, and handles failure recovery with compensating transactions.
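A minimal orchestrator can be sketched as follows. The SagaOrchestrator class, the step functions, and the shared ctx dictionary are assumptions for illustration only: in a real system each action and compensation would be a command sent to a separate service, and the orchestrator would persist its state so it can recover after a crash.

```python
class SagaOrchestrator:
    """Runs steps in order; on failure, compensates completed steps in reverse."""

    def __init__(self, steps):
        # steps: list of (name, action, compensation) tuples
        self.steps = steps
        self.log = []

    def execute(self, ctx):
        completed = []
        for name, action, compensation in self.steps:
            try:
                action(ctx)
            except Exception:  # broad on purpose: any failure triggers compensation
                self.log.append(f"{name}:failed")
                for done_name, undo in reversed(completed):
                    undo(ctx)
                    self.log.append(f"{done_name}:compensated")
                return False
            self.log.append(f"{name}:ok")
            completed.append((name, compensation))
        return True


# Hypothetical local transactions and their compensations.
def create_order(ctx):
    ctx["order"] = "PENDING"


def reject_order(ctx):
    ctx["order"] = "FAIL"


def block_payment(ctx):
    ctx["blocked"] = ctx["amount"]


def release_payment(ctx):
    ctx["blocked"] = 0


def reserve_stock(ctx):
    if ctx["stock"] < 1:
        raise RuntimeError("product not available")
    ctx["stock"] -= 1


def release_stock(ctx):
    ctx["stock"] += 1
```

Running the three steps (order, payment, inventory) with a stock of 0 makes the inventory step fail, after which the orchestrator releases the payment and then marks the order as FAIL. The whole workflow is readable in one place, which is the main difference from the choreography approach.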
Benefits
- This approach is more suitable when the distributed transaction involves many microservices and the workflow is quite complex.
- This approach doesn't introduce cyclic dependencies, as the orchestrator is used to coordinate between them.
- Saga participants don't need to know about commands for other participants. Clear separation of concerns simplifies business logic.
Drawbacks
- A new component with complex coordination logic is added to the system.
- SPOF (single point of failure): Since all coordination among the microservices is driven by the orchestrator component, its availability, load handling, and scalability are very important.
I hope the concept of the SAGA pattern is clear now. Once the concept is clear, the implementation in a real project is not so difficult.
In the next articles, I will try to explain the implementation of this pattern without a BPM tool or any third-party framework.
We will also discuss how to use the Axon Framework and Axon Server to implement the saga pattern in a microservice-based project.
Thanks to Chris Richardson for explaining SAGA so clearly.
Refer to this YouTube link for more clarity on the SAGA pattern: https://www.youtube.com/watch?v=YPbGW3Fnmbc
Please share your feedback, comments, and suggestions, and feel free to connect in case you need any clarification. :)
I plan to use the choreography-based saga to implement the transaction, because I feel the workflow is clear with a high-level picture and really easy to understand and follow. The orchestration-based saga needs a message broker as the middleware. Could you give me a real-world example of the kind of applications for which the orchestration-based saga is best suited? Also, the choreography saga supports CQRS as well. Thank you.
(Except Axon) How can we implement 2PC, 3PC, or SAGA in Spring Cloud? I mean specifically for Spring Boot / Spring Cloud.
Easy to understand 👍