Saga Design Pattern in Microservices (With Code & Interview Questions)
Imagine you’re booking a holiday package online. You pick a flight, reserve a hotel, and rent a car — all in one go.
Each of these is handled by a different service (Flight Service, Hotel Service, Car Rental Service), with its own database.
Now, what happens if the flight is booked successfully, but the hotel reservation or the car rental fails?
Do you still want the flight booked while everything else failed? Of course not. You’d expect the whole transaction to roll back gracefully.
But here’s the problem: in microservices, there’s no single database transaction that magically undoes everything. Each service is independent.
That’s where the Saga Design Pattern steps in.
Think of it like a chain of promises with backup plans. Each step in the process has a “happy path” (book the hotel) and a “sorry, let’s undo it” path (cancel the flight if the hotel fails).
The Saga makes sure the entire journey either succeeds as a whole or gracefully compensates when things fall apart.
What Is the Saga Pattern:
A Saga is a sequence of local transactions. Each transaction updates a service’s database and then triggers the next step via an event or a command.
If one transaction fails, instead of rolling back everything, the Saga executes a series of compensating transactions to undo the previous work.
Simple Example: Travel Booking
The saga runs three steps in order: (1) reserve the flight, (2) book the hotel, (3) rent the car. If Step 2 (hotel booking) fails, you don’t roll back the flight in the database directly. Instead, you compensate by cancelling the flight reservation.
import java.util.*;

// Represents one step in the Saga: an action plus the compensation that undoes it
class SagaStep {
    Runnable action;
    Runnable compensation;

    SagaStep(Runnable action, Runnable compensation) {
        this.action = action;
        this.compensation = compensation;
    }
}

// Saga orchestrator
class Saga {
    List<SagaStep> steps = new ArrayList<>();

    void addStep(SagaStep step) {
        steps.add(step);
    }

    void execute() {
        // Completed steps, most recent first, so compensations run in reverse order
        Deque<SagaStep> completed = new ArrayDeque<>();
        try {
            for (SagaStep step : steps) {
                step.action.run();
                completed.push(step);
            }
            System.out.println("Saga completed successfully!");
        } catch (Exception e) {
            System.out.println("Saga failed: " + e.getMessage());
            System.out.println("Running compensations...");
            while (!completed.isEmpty()) {
                completed.pop().compensation.run();
            }
        }
    }
}
// Example usage
public class TravelBookingSaga {
    public static void main(String[] args) {
        Saga saga = new Saga();

        // Step 1: Reserve Flight
        saga.addStep(new SagaStep(
            () -> System.out.println("Flight reserved"),
            () -> System.out.println("Cancel flight reservation")
        ));

        // Step 2: Book Hotel (simulated failure: the booking never completes)
        saga.addStep(new SagaStep(
            () -> {
                System.out.println("Booking hotel...");
                throw new RuntimeException("Hotel booking failed!");
            },
            () -> System.out.println("Cancel hotel booking")
        ));

        // Step 3: Rent Car (never reached in this run)
        saga.addStep(new SagaStep(
            () -> System.out.println("Car rented"),
            () -> System.out.println("Cancel car rental")
        ));

        saga.execute();
    }
}
Sample Output (when hotel booking fails):
Flight reserved
Booking hotel...
Saga failed: Hotel booking failed!
Running compensations...
Cancel flight reservation
Only the flight is compensated: the hotel step never completed, so it was never recorded as a completed step, and the car rental step never ran.
Why Do We Need Sagas:
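In monolithic systems, transactions are straightforward: there is one database, so a single ACID transaction either commits everything or rolls everything back.
But microservices don’t play by those rules: each service owns its own database, so no single transaction can span all of them.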
Without something like Saga, you’re stuck with two bad choices:
Saga is the middle ground — it gives you eventual consistency with reliable compensation mechanisms.
Saga Execution Styles:
There are two main ways to coordinate Sagas:
1. Choreography (Event-Driven):
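Think of choreography like a flash mob dance: there is no central coordinator, and each service reacts to events published by the others.
How it works: each service performs its local transaction and then publishes an event that triggers the next service. When a step fails, the service publishes a failure event, and the services that already acted react to it by compensating.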
Example: Order Processing
// Order Service: starts the saga
eventBus.publish("OrderCreated", orderId);

// Payment Service: a single handler covers both success and failure
eventBus.on("OrderCreated", (orderId) -> {
    System.out.println("Processing payment for " + orderId);
    if (processPayment(orderId)) {
        eventBus.publish("PaymentProcessed", orderId);
    } else {
        eventBus.publish("PaymentFailed", orderId);
    }
});

// Inventory Service
eventBus.on("PaymentProcessed", (orderId) -> {
    System.out.println("Reserving stock for " + orderId);
    eventBus.publish("StockReserved", orderId);
});

// Shipping Service
eventBus.on("StockReserved", (orderId) -> {
    System.out.println("Shipping order " + orderId);
    eventBus.publish("OrderShipped", orderId);
});

// Order Service: compensates when payment fails
eventBus.on("PaymentFailed", (orderId) -> {
    System.out.println("Cancelling order " + orderId);
});
In real systems, this event bus could be Kafka, RabbitMQ, or any messaging system.
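The snippets above assume an eventBus object with publish and on methods. To experiment locally, a minimal in-memory stand-in might look like the sketch below. InMemoryEventBus and ChoreographyDemo are hypothetical names, not part of Kafka, RabbitMQ, or any other library, and delivery here is synchronous and in-process, which a real broker would not be.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory event bus: synchronous, single-process, no persistence.
class InMemoryEventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

    // Register a handler for an event type.
    void on(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    // Deliver the payload to every handler registered for the event type.
    void publish(String eventType, String payload) {
        // Copy the list so a handler may register further handlers while we iterate.
        List<Consumer<String>> registered =
                new ArrayList<>(handlers.getOrDefault(eventType, List.of()));
        for (Consumer<String> handler : registered) {
            handler.accept(payload);
        }
    }
}

class ChoreographyDemo {
    // Wires the happy-path chain and returns the events in the order they were handled.
    static List<String> run(String orderId) {
        InMemoryEventBus eventBus = new InMemoryEventBus();
        List<String> seen = new ArrayList<>();
        eventBus.on("OrderCreated", id -> {
            seen.add("OrderCreated");
            eventBus.publish("PaymentProcessed", id);
        });
        eventBus.on("PaymentProcessed", id -> {
            seen.add("PaymentProcessed");
            eventBus.publish("StockReserved", id);
        });
        eventBus.on("StockReserved", id -> {
            seen.add("StockReserved");
            eventBus.publish("OrderShipped", id);
        });
        eventBus.on("OrderShipped", id -> seen.add("OrderShipped"));
        eventBus.publish("OrderCreated", orderId);
        return seen;
    }
}
```

Calling ChoreographyDemo.run("order-1") walks the whole chain from a single published event, which is the point of choreography: no component knows the full workflow.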
Pros:
Cons:
Best For: Small workflows, where steps are simple and services are loosely coupled.
2. Orchestration (Centralized):
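Orchestration is more like a conductor leading an orchestra: a central orchestrator tells each service what to do and when.
How it works: the orchestrator calls each service in turn and tracks which steps have completed. If a step fails, it invokes the compensations for the completed steps in reverse order.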
Example: Order Processing
import java.util.ArrayDeque;
import java.util.Deque;

class OrderOrchestrator {
    PaymentService paymentService;
    InventoryService inventoryService;
    ShippingService shippingService;

    public void processOrder(String orderId) {
        // Compensations for the steps that have completed so far (most recent first)
        Deque<Runnable> compensations = new ArrayDeque<>();
        try {
            System.out.println("Paying for " + orderId);
            paymentService.pay(orderId);
            compensations.push(() -> paymentService.refund(orderId));

            System.out.println("Reserving stock for " + orderId);
            inventoryService.reserve(orderId);
            compensations.push(() -> inventoryService.release(orderId));

            System.out.println("Shipping " + orderId);
            shippingService.ship(orderId);

            System.out.println("Order completed successfully!");
        } catch (Exception e) {
            System.out.println("Failure: " + e.getMessage());
            System.out.println("Running compensations...");
            // Undo only the steps that actually completed, in reverse order
            while (!compensations.isEmpty()) {
                compensations.pop().run();
            }
        }
    }
}
Pros:
Cons:
Best For: Large, complex workflows that need tight coordination.
Key Design Considerations
When designing Sagas, think about:
1. Idempotency: Retries are common, so each step must be safe to run more than once.
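// Idempotency example (safe retry)
@PostMapping("/reserve")
public ResponseEntity<?> reserve(@RequestBody Request req) {
    if (alreadyReserved(req.orderId)) {
        // A retry of a request we already handled: answer without reserving again
        return ResponseEntity.ok("Already processed");
    }
    // normal reservation logic goes here
    return ResponseEntity.ok("Reserved");
}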
2. Compensation Logic: Define how to undo actions. Sometimes compensation isn’t possible — in those cases, design for manual intervention.
3. Failure Handling: What happens if compensation itself fails?
4. Monitoring & Visibility: Sagas can fail silently without good logging. You’ll need observability tools.
5. Timeouts & Retries: What if a service is slow but not dead? Balance between retries and compensation.
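For points 3 and 5, one possible approach is to retry a failing compensation a bounded number of times with growing pauses, and to hand the step over to a human if it still fails. The sketch below illustrates that idea only; CompensationRunner and its in-memory manual-review list are hypothetical, and a real system would more likely use a dead-letter queue or an alert.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: retries a compensation, then escalates to a manual-review list.
class CompensationRunner {
    private final int maxAttempts;
    private final long initialBackoffMillis;
    private final List<String> manualReview = new ArrayList<>();

    CompensationRunner(int maxAttempts, long initialBackoffMillis) {
        this.maxAttempts = maxAttempts;
        this.initialBackoffMillis = initialBackoffMillis;
    }

    // Returns true if the compensation eventually succeeded, false if it was escalated.
    boolean run(String stepName, Runnable compensation) {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                compensation.run();
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // out of attempts: fall through to escalation
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
                backoff *= 2; // exponential backoff between attempts
            }
        }
        manualReview.add(stepName); // assumption: someone monitors this list
        return false;
    }

    List<String> pendingManualReview() {
        return List.copyOf(manualReview);
    }
}
```

Retrying is only safe if the compensation itself is idempotent, which is why point 1 comes first.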
Real-World Example: Order Processing System
Let’s say you’re building an e-commerce checkout system.
If payment fails → cancel order. If inventory fails → refund payment and cancel order. If shipping fails → release inventory, refund payment, cancel order.
This is a Saga in action — each step has both a “do” and “undo” path.
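The failure rules above can be sketched as a small simulation. CheckoutSaga and the step names are illustrative assumptions; the code only records which actions and compensations would run, and does not call real services.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation of the checkout saga: logs actions instead of calling services.
class CheckoutSaga {
    // Each row is {action, compensation registered once the action succeeds}.
    // The last compensation is never used here because nothing runs after shipping.
    private static final String[][] STEPS = {
        {"create order", "cancel order"},
        {"charge payment", "refund payment"},
        {"reserve inventory", "release inventory"},
        {"ship order", "cancel shipment"},
    };

    // Runs the saga; failingStep names the action that should fail, or null for none.
    static List<String> run(String failingStep) {
        List<String> log = new ArrayList<>();
        Deque<String> compensations = new ArrayDeque<>();
        for (String[] step : STEPS) {
            if (step[0].equals(failingStep)) {
                log.add("FAILED: " + step[0]);
                // Undo the completed steps, most recent first.
                while (!compensations.isEmpty()) {
                    log.add(compensations.pop());
                }
                return log;
            }
            log.add(step[0]);
            compensations.push(step[1]);
        }
        return log;
    }
}
```

For example, CheckoutSaga.run("ship order") logs the three successful steps, the failure, and then "release inventory", "refund payment", "cancel order", matching the shipping-failure rule above.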
Benefits:
Challenges and Trade-Offs:
Use Saga when:
Don’t use Saga when:
Interview Questions:
Below are some interview questions on the Saga design pattern:
Q1. In the Saga Pattern, how do you ensure idempotency of compensating transactions in a distributed system where retries are common?
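One way to approach Q1 is to give every compensation a key (for example saga ID plus step name) and record the keys that have already been handled, so a retried or duplicated request becomes a no-op. The sketch below is an assumption-laden illustration: IdempotentCompensator is a hypothetical class, and it keeps keys in memory, whereas a real service would need to store them durably, ideally in the same local transaction as the compensation's effect.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard that runs each compensation at most once per (sagaId, step) key.
class IdempotentCompensator {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    // Returns true if the compensation ran, false if this key was already handled.
    boolean compensateOnce(String sagaId, String step, Runnable compensation) {
        String key = sagaId + ":" + step;
        if (!processed.add(key)) {
            return false; // duplicate delivery or retry: skip
        }
        try {
            compensation.run();
            return true;
        } catch (RuntimeException e) {
            processed.remove(key); // the undo did not happen, so allow a later retry
            throw e;
        }
    }
}
```

With this guard, calling compensateOnce twice for the same saga and step runs the refund once; the second call returns false without touching the payment.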
Q2. How would you handle the “double compensation problem” where two concurrent compensations might try to undo the same step in a Saga?
This happens when concurrent failures trigger rollback from multiple branches.
Solutions:
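One possible solution, sketched here under assumptions, is to track a state per step and let a compensation proceed only if it wins an atomic transition from COMPLETED to COMPENSATING. StepStateStore is a hypothetical in-memory class; in a real system the same compare-and-set would typically be a conditional update in the service's database.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical per-step state machine guarding against double compensation.
class StepStateStore {
    enum State { COMPLETED, COMPENSATING, COMPENSATED }

    private final ConcurrentMap<String, State> states = new ConcurrentHashMap<>();

    void markCompleted(String stepId) {
        states.put(stepId, State.COMPLETED);
    }

    // Atomically claims the step for compensation; only one concurrent caller can win.
    boolean tryCompensate(String stepId, Runnable compensation) {
        if (!states.replace(stepId, State.COMPLETED, State.COMPENSATING)) {
            return false; // never completed, or another branch already claimed it
        }
        // If this throws, the step stays in COMPENSATING and needs separate recovery.
        compensation.run();
        states.put(stepId, State.COMPENSATED);
        return true;
    }

    State stateOf(String stepId) {
        return states.get(stepId);
    }
}
```

Every branch that detects the failure can call tryCompensate freely; the losers simply get false back and move on.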
Q3. Compare Saga Pattern with Two-Phase Commit (2PC). Why would Saga be preferred in high-scale microservice architectures?
2PC:
Saga:
Why Saga wins in microservices:
Q4. In Saga orchestration, how do you prevent the “Orchestrator Bottleneck” problem when it becomes a single point of failure or performance choke?
Techniques that help avoid an orchestrator bottleneck:
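One technique that is often suggested is to run several orchestrator instances and route each saga to one of them by its ID, so no single instance handles every saga. The sketch below shows only the routing step and rests on assumptions: OrchestratorRouter and the instance names are hypothetical, and simple modulo hashing reshuffles most sagas when the instance list changes.

```java
import java.util.List;

// Hypothetical router: picks an orchestrator instance from the saga ID.
class OrchestratorRouter {
    private final List<String> instances;

    OrchestratorRouter(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("need at least one orchestrator instance");
        }
        this.instances = List.copyOf(instances);
    }

    // The same saga ID always maps to the same instance while the list is unchanged.
    String instanceFor(String sagaId) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(sagaId.hashCode(), instances.size());
        return instances.get(index);
    }
}
```

Routing alone does not remove the single point of failure for an individual saga; the saga state still has to be persisted so another instance can resume it.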
Q5. How do you design compensating transactions for non-reversible side effects (like sending emails, push notifications, or SMS)?
Not all actions are reversible. Strategies:
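One such strategy, shown here as a hedged sketch, is to postpone the irreversible action until the saga has succeeded: messages are queued while the saga runs and sent only at the end. DeferredNotifications is a hypothetical class, and the sender callback stands in for whatever email or SMS client a system uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer: queue notifications during the saga, send only on success.
class DeferredNotifications {
    private final List<String> pending = new ArrayList<>();
    private final Consumer<String> sender;

    DeferredNotifications(Consumer<String> sender) {
        this.sender = sender;
    }

    // Called by saga steps instead of sending immediately.
    void enqueue(String message) {
        pending.add(message);
    }

    // Saga succeeded: it is now safe to perform the irreversible sends.
    void flush() {
        for (String message : pending) {
            sender.accept(message);
        }
        pending.clear();
    }

    // Saga failed: nothing was sent, so there is nothing to undo.
    void discard() {
        pending.clear();
    }
}
```

When deferring is not possible, the usual fallback is a follow-up message that corrects the first one, such as a cancellation email.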
Q6. How would you detect and handle a stuck saga where one service does not respond indefinitely?
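A common starting point for Q6 is a watchdog: record when each saga last made progress, and periodically look for sagas that have been silent longer than a timeout. The sketch below assumes a hypothetical SagaWatchdog class and takes the current time as a parameter so it can be tested; what to do with a stuck saga (retry the step, compensate, or alert) is left to the caller.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical watchdog that flags sagas with no recent progress.
class SagaWatchdog {
    private final Map<String, Instant> lastProgress = new ConcurrentHashMap<>();
    private final Duration timeout;

    SagaWatchdog(Duration timeout) {
        this.timeout = timeout;
    }

    // Call whenever a saga starts or completes a step.
    void recordProgress(String sagaId, Instant when) {
        lastProgress.put(sagaId, when);
    }

    // Call when a saga finishes (successfully or after compensation).
    void finished(String sagaId) {
        lastProgress.remove(sagaId);
    }

    // Returns the IDs of sagas whose last progress is older than the timeout.
    List<String> findStuck(Instant now) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastProgress.entrySet()) {
            if (Duration.between(entry.getValue(), now).compareTo(timeout) > 0) {
                stuck.add(entry.getKey());
            }
        }
        return stuck;
    }
}
```

A scheduler would call findStuck(Instant.now()) at a fixed interval and pass the result to the compensation logic.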
Q7. How do you ensure consistency across long-running sagas that span hours or days, where intermediate states may change due to external events?
Challenge: data may drift while the saga is running.
Solutions:
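One widely used safeguard against drift is optimistic versioning: the saga remembers the version of the data it read, and a later step applies its change only if the version is still the same. The sketch below is illustrative; VersionedStore is a hypothetical in-memory class, and a database would normally express the same check as a conditional update on a version column.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store that attaches a version number to every value.
class VersionedStore {
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> data = new ConcurrentHashMap<>();

    Versioned read(String key) {
        return data.get(key);
    }

    // Unconditional write, e.g. from an external event; bumps the version.
    void put(String key, String value) {
        data.merge(key, new Versioned(value, 1),
                (old, fresh) -> new Versioned(value, old.version + 1));
    }

    // Writes only if the version is still the one the saga saw earlier.
    boolean updateIfUnchanged(String key, long expectedVersion, String newValue) {
        Versioned current = data.get(key);
        if (current == null || current.version != expectedVersion) {
            return false; // data drifted: the saga should re-read, re-validate, or compensate
        }
        // replace() succeeds only if the entry is still the exact object we just read.
        return data.replace(key, current, new Versioned(newValue, expectedVersion + 1));
    }
}
```

A false result tells the saga that its earlier assumptions no longer hold, so it can re-validate the step instead of silently overwriting newer data.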
Final Thoughts
Sagas aren’t about avoiding failure — they’re about failing predictably and recovering gracefully.
The Saga Design Pattern isn’t just theory — it’s the backbone of reliable distributed transactions in modern systems.
The key is not to treat it as a silver bullet. It comes with its own complexity, and designing good compensations is often harder than designing the happy path.
But once you understand it, Sagas give you a way to embrace microservices without sacrificing reliability.
At the end of the day, distributed systems are all about making trade-offs. Sagas simply make those trade-offs explicit — and manageable.