How do you call multiple downstream services from an API?
Problem
Client → API A → (calls 5 downstream services: S1, S2, S3, S4, S5)
I know the interviewer is not expecting the sequential approach, because calling S1–S5 one after another makes the total latency the sum of all five call latencies, and a single slow service delays the entire response.
Instead, I explain a couple of approaches for calling multiple downstream services from an API, depending on scalability, latency and reliability requirements.
1. Event-Driven Approach:
One approach is to use an event-based system, like Kafka, as a broker between the API and downstream services. The API can publish events and downstream consumers can subscribe to the events they are interested in. This decouples the services, allows asynchronous processing, and improves scalability.
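To make the decoupling concrete, here is a minimal in-memory publish/subscribe sketch; the topic name and handlers are illustrative stand-ins, and a real system would use the Kafka producer/consumer APIs instead:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class MiniBroker {
    // topic -> subscribers; a stand-in for Kafka topics and consumer groups
    static final Map<String, List<Consumer<String>>> TOPICS = new ConcurrentHashMap<>();

    public static void subscribe(String topic, Consumer<String> handler) {
        TOPICS.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    public static void publish(String topic, String event) {
        // the API only publishes; it never calls S1..S5 directly
        TOPICS.getOrDefault(topic, List.of()).forEach(h -> h.accept(event));
    }

    public static void main(String[] args) {
        subscribe("order-created", e -> System.out.println("S1 handled " + e));
        subscribe("order-created", e -> System.out.println("S2 handled " + e));
        publish("order-created", "order-42");
    }
}
```

The API's latency no longer depends on any subscriber: services consume the event at their own pace.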
Question - What happens if the broker goes down?
Answer - In production, to avoid a single point of failure, we run Kafka as a multi-broker cluster with a replication factor greater than one, so partitions remain available even if one broker fails.
2. CompletableFuture / Asynchronous Calls:
For direct, non-blocking calls from the API to multiple services, we can use CompletableFuture in Java to execute calls in parallel.
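A minimal sketch of this fan-out, where the hypothetical callService stands in for the real HTTP clients of S1–S5:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class ParallelCalls {
    // placeholder for an HTTP call to a downstream service
    static String callService(String name) {
        return name + "-response";
    }

    public static List<String> fetchAll() {
        // kick off all five calls at once on the common pool
        List<CompletableFuture<String>> futures = List.of("S1", "S2", "S3", "S4", "S5").stream()
                .map(s -> CompletableFuture.supplyAsync(() -> callService(s)))
                .collect(Collectors.toList());

        // allOf completes when every call does; join() afterwards returns already-computed values
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fetchAll());
    }
}
```

Total latency is then roughly the slowest single call, not the sum of all five.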
Question - What if CompletableFuture experiences high load?
Answer -
Problem 1: Thread Pool Exhaustion (the biggest problem). By default, CompletableFuture runs tasks on ForkJoinPool.commonPool().
Under load: threads get exhausted → tasks queue up → latency spikes.
Problem 2: Thread Blocking → Deadlock-like Situation
If you do:
future.get(); // blocks the calling thread
threads sit waiting instead of doing work, and under load the whole pool can stall.
Problem 3: Memory Pressure - queued tasks and the request state they capture pile up in memory while threads are busy.
Problem 4: Cascading Failures - one slow downstream ties up the shared pool, so calls to healthy services start failing too.
Because ForkJoinPool.commonPool() is sized to the number of CPU cores, it can be insufficient for I/O-bound work under high load. In that case it is better to use a dedicated ThreadPoolExecutor (or Executors.newFixedThreadPool(n)) tuned for I/O tasks, to avoid starvation and keep throughput healthy.
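A sketch of such a dedicated pool; the sizes, queue capacity, and saturation policy below are illustrative assumptions, not recommendations:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IoBoundPool {
    // I/O-bound work tolerates far more threads than CPU cores
    static final ThreadPoolExecutor IO_POOL = new ThreadPoolExecutor(
            50, 100,                          // core / max threads (assumed sizing)
            60, TimeUnit.SECONDS,             // idle keep-alive for extra threads
            new ArrayBlockingQueue<>(500),    // bounded queue -> backpressure, not OOM
            r -> { Thread t = new Thread(r); t.setDaemon(true); return t; },
            new ThreadPoolExecutor.CallerRunsPolicy()); // degrade instead of dropping work

    public static CompletableFuture<String> callAsync(String service) {
        // pass the dedicated executor instead of defaulting to commonPool()
        return CompletableFuture.supplyAsync(() -> service + "-ok", IO_POOL);
    }

    public static void main(String[] args) {
        System.out.println(callAsync("S1").join());
    }
}
```

The bounded queue plus CallerRunsPolicy means overload slows the caller down rather than silently accumulating tasks.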
Production-Grade Solution
1. Latency Control (recommended by LinkedIn)
Bound every downstream call with a timeout and enforce an overall request deadline, so the slowest service cannot stretch the response indefinitely.
2. Fault Tolerance
What if one service fails?
Handle failures per call (e.g. with exceptionally / handle) and degrade gracefully:
S3 fails → return a partial response built from the four successful results instead of failing the whole request.
3. Bulkhead Pattern
Give each downstream service its own thread pool, so one slow service cannot exhaust the threads the others depend on.
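A sketch of the bulkhead, with illustrative pool sizes and only two services shown:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Bulkhead {
    static ExecutorService pool(int n) {
        // daemon threads so the sketch exits cleanly; real services manage lifecycle explicitly
        return Executors.newFixedThreadPool(n,
                r -> { Thread t = new Thread(r); t.setDaemon(true); return t; });
    }

    // one small, isolated pool per downstream: a slow S2 can only exhaust its own pool
    static final Map<String, ExecutorService> POOLS = Map.of(
            "S1", pool(10),
            "S2", pool(10));

    public static CompletableFuture<String> call(String service) {
        return CompletableFuture.supplyAsync(() -> service + "-ok", POOLS.get(service));
    }

    public static void main(String[] args) {
        System.out.println(call("S1").join());
    }
}
```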
4. Timeout Strategy
Give every downstream call an explicit timeout (orTimeout / completeOnTimeout in Java 9+) and decide per service whether to fail or fall back to a default on expiry.
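A sketch using completeOnTimeout (Java 9+), with an artificially slow call standing in for a downstream service; the 200 ms budget is an illustrative assumption:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class Timeouts {
    // simulates a downstream call that takes far longer than our budget
    static CompletableFuture<String> slowCall() {
        return CompletableFuture.supplyAsync(() -> {
            try { TimeUnit.SECONDS.sleep(5); } catch (InterruptedException e) { /* ignore in sketch */ }
            return "late";
        });
    }

    public static String withTimeout() {
        // completeOnTimeout substitutes a default value; orTimeout would fail the future instead
        return slowCall()
                .completeOnTimeout("timed-out-default", 200, TimeUnit.MILLISECONDS)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(withTimeout());
    }
}
```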
5. Response Aggregation
Combine results:
{
"user": {...},
"orders": [...],
"payments": [...],
"recommendations": [...]
}
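The aggregation step can be sketched with allOf plus thenApply; the payload strings below are placeholders for real response objects, and only two of the five services are shown:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class Aggregator {
    static CompletableFuture<String> users()  { return CompletableFuture.supplyAsync(() -> "{...users}"); }
    static CompletableFuture<String> orders() { return CompletableFuture.supplyAsync(() -> "[...orders]"); }

    public static Map<String, String> aggregate() {
        CompletableFuture<String> u = users();
        CompletableFuture<String> o = orders();
        // allOf completes when every branch does; join() then reads already-completed futures
        return CompletableFuture.allOf(u, o)
                .thenApply(v -> Map.of("user", u.join(), "orders", o.join()))
                .join();
    }

    public static void main(String[] args) {
        System.out.println(aggregate());
    }
}
```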
6. Backpressure / Rate Limiting
Cap the number of in-flight downstream calls (bounded queues, semaphores, or a rate limiter) so traffic bursts shed load early instead of piling up in memory.
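A semaphore-based sketch of shedding load early; the permit count of 20 is an illustrative assumption:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

public class Backpressure {
    // at most 20 in-flight calls per downstream (assumed cap)
    static final Semaphore PERMITS = new Semaphore(20);

    public static CompletableFuture<String> callLimited(String service) {
        if (!PERMITS.tryAcquire()) {
            // reject immediately instead of queueing unboundedly
            return CompletableFuture.failedFuture(new RuntimeException("rate limited"));
        }
        return CompletableFuture.supplyAsync(() -> service + "-ok")
                .whenComplete((r, e) -> PERMITS.release()); // always return the permit
    }

    public static void main(String[] args) {
        System.out.println(callLimited("S1").join());
    }
}
```

Failing fast here keeps the API responsive; callers can retry with backoff rather than waiting in a hidden queue.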
7. Observability
Track per-service latency and error metrics, propagate trace context across the fan-out (distributed tracing), and alert on rising timeout and fallback rates.
8. Consider Reactive (Better at Scale) - (I forgot to cover this)
Instead of: blocking, thread-per-call chains built on CompletableFuture.
Use: end-to-end non-blocking reactive streams.
Tools: Spring WebFlux / Project Reactor (Mono, Flux), or RxJava.
Reactive stacks stay non-blocking throughout and handle high concurrency with far fewer threads.