How do you call multiple downstream services from an API?
Problem
Client → API A → (calls 5 downstream services: S1, S2, S3, S4, S5)
I know the interviewer is not expecting the sequential approach, because calling S1–S5 one after another makes the total latency the sum of all five call latencies, and a single slow service delays the entire response.
Instead, I explain a couple of approaches for calling multiple downstream services from an API, depending on scalability, latency and reliability requirements.
1. Event-Driven Approach:
One approach is to use an event-based system, like Kafka, as a broker between the API and downstream services. The API can publish events and downstream consumers can subscribe to the events they are interested in. This decouples the services, allows asynchronous processing, and improves scalability.
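To make the decoupling concrete, here is a minimal in-memory publish/subscribe sketch; the topic name and handlers are illustrative stand-ins, and a real system would use the Kafka producer/consumer APIs instead:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class MiniBroker {
    // topic -> subscribers; a stand-in for Kafka topics and consumer groups
    static final Map<String, List<Consumer<String>>> TOPICS = new ConcurrentHashMap<>();

    public static void subscribe(String topic, Consumer<String> handler) {
        TOPICS.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    public static void publish(String topic, String event) {
        // the API only publishes; it never calls S1..S5 directly
        TOPICS.getOrDefault(topic, List.of()).forEach(h -> h.accept(event));
    }

    public static void main(String[] args) {
        subscribe("order-created", e -> System.out.println("S1 handled " + e));
        subscribe("order-created", e -> System.out.println("S2 handled " + e));
        publish("order-created", "order-42");
    }
}
```

The API's latency no longer depends on any subscriber: services consume the event at their own pace.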
Question - What happens if the broker goes down?
Answer - In production, to avoid a single point of failure, we run Kafka as a multi-broker cluster with a replication factor greater than one, so partitions remain available even if one broker fails.
2. CompletableFuture / Asynchronous Calls:
For direct, non-blocking calls from the API to multiple services, we can use CompletableFuture in Java to execute calls in parallel.
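A minimal sketch of this fan-out, where the hypothetical callService stands in for the real HTTP clients of S1–S5:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class ParallelCalls {
    // placeholder for an HTTP call to a downstream service
    static String callService(String name) {
        return name + "-response";
    }

    public static List<String> fetchAll() {
        // kick off all five calls at once on the common pool
        List<CompletableFuture<String>> futures = List.of("S1", "S2", "S3", "S4", "S5").stream()
                .map(s -> CompletableFuture.supplyAsync(() -> callService(s)))
                .collect(Collectors.toList());

        // allOf completes when every call does; join() afterwards returns already-computed values
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fetchAll());
    }
}
```

Total latency is then roughly the slowest single call, not the sum of all five.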
Question - What if CompletableFuture experiences high load?
Answer -
Problem 1: Thread Pool Exhaustion (the biggest problem). By default, CompletableFuture runs tasks on ForkJoinPool.commonPool().
Under load: threads get exhausted → tasks queue up → latency spikes.
Problem 2: Thread Blocking → Deadlock-like Situation
If you do:
future.get(); // blocks the calling thread
threads sit waiting instead of doing work, and under load the whole pool can stall.
Problem 3: Memory Pressure - queued tasks and the request state they capture pile up in memory while threads are busy.
Problem 4: Cascading Failures - one slow downstream ties up the shared pool, so calls to healthy services start failing too.
Because ForkJoinPool.commonPool() is sized to the number of CPU cores, it can be insufficient for I/O-bound work under high load. In that case it is better to use a dedicated ThreadPoolExecutor (or Executors.newFixedThreadPool(n)) tuned for I/O tasks, to avoid starvation and keep throughput healthy.
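A sketch of such a dedicated pool; the sizes, queue capacity, and saturation policy below are illustrative assumptions, not recommendations:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IoBoundPool {
    // I/O-bound work tolerates far more threads than CPU cores
    static final ThreadPoolExecutor IO_POOL = new ThreadPoolExecutor(
            50, 100,                          // core / max threads (assumed sizing)
            60, TimeUnit.SECONDS,             // idle keep-alive for extra threads
            new ArrayBlockingQueue<>(500),    // bounded queue -> backpressure, not OOM
            r -> { Thread t = new Thread(r); t.setDaemon(true); return t; },
            new ThreadPoolExecutor.CallerRunsPolicy()); // degrade instead of dropping work

    public static CompletableFuture<String> callAsync(String service) {
        // pass the dedicated executor instead of defaulting to commonPool()
        return CompletableFuture.supplyAsync(() -> service + "-ok", IO_POOL);
    }

    public static void main(String[] args) {
        System.out.println(callAsync("S1").join());
    }
}
```

The bounded queue plus CallerRunsPolicy means overload slows the caller down rather than silently accumulating tasks.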
Production-Grade Solution
1. Latency Control (recommended by LinkedIn)
Bound every downstream call with a timeout and enforce an overall request deadline, so the slowest service cannot stretch the response indefinitely.
2. Fault Tolerance
What if one service fails?
Handle failures per call (e.g. with exceptionally / handle) and degrade gracefully:
S3 fails → return a partial response built from the four successful results instead of failing the whole request.
3. Bulkhead Pattern
Give each downstream service its own thread pool, so one slow service cannot exhaust the threads the others depend on.
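A sketch of the bulkhead, with illustrative pool sizes and only two services shown:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Bulkhead {
    static ExecutorService pool(int n) {
        // daemon threads so the sketch exits cleanly; real services manage lifecycle explicitly
        return Executors.newFixedThreadPool(n,
                r -> { Thread t = new Thread(r); t.setDaemon(true); return t; });
    }

    // one small, isolated pool per downstream: a slow S2 can only exhaust its own pool
    static final Map<String, ExecutorService> POOLS = Map.of(
            "S1", pool(10),
            "S2", pool(10));

    public static CompletableFuture<String> call(String service) {
        return CompletableFuture.supplyAsync(() -> service + "-ok", POOLS.get(service));
    }

    public static void main(String[] args) {
        System.out.println(call("S1").join());
    }
}
```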
4. Timeout Strategy
Give every downstream call an explicit timeout (orTimeout / completeOnTimeout in Java 9+) and decide per service whether to fail or fall back to a default on expiry.
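A sketch using completeOnTimeout (Java 9+), with an artificially slow call standing in for a downstream service; the 200 ms budget is an illustrative assumption:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class Timeouts {
    // simulates a downstream call that takes far longer than our budget
    static CompletableFuture<String> slowCall() {
        return CompletableFuture.supplyAsync(() -> {
            try { TimeUnit.SECONDS.sleep(5); } catch (InterruptedException e) { /* ignore in sketch */ }
            return "late";
        });
    }

    public static String withTimeout() {
        // completeOnTimeout substitutes a default value; orTimeout would fail the future instead
        return slowCall()
                .completeOnTimeout("timed-out-default", 200, TimeUnit.MILLISECONDS)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(withTimeout());
    }
}
```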
5. Response Aggregation
Combine results:
{
"user": {...},
"orders": [...],
"payments": [...],
"recommendations": [...]
}
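The aggregation step can be sketched with allOf plus thenApply; the payload strings below are placeholders for real response objects, and only two of the five services are shown:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class Aggregator {
    static CompletableFuture<String> users()  { return CompletableFuture.supplyAsync(() -> "{...users}"); }
    static CompletableFuture<String> orders() { return CompletableFuture.supplyAsync(() -> "[...orders]"); }

    public static Map<String, String> aggregate() {
        CompletableFuture<String> u = users();
        CompletableFuture<String> o = orders();
        // allOf completes when every branch does; join() then reads already-completed futures
        return CompletableFuture.allOf(u, o)
                .thenApply(v -> Map.of("user", u.join(), "orders", o.join()))
                .join();
    }

    public static void main(String[] args) {
        System.out.println(aggregate());
    }
}
```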
6. Backpressure / Rate Limiting
Cap the number of in-flight downstream calls (bounded queues, semaphores, or a rate limiter) so traffic bursts shed load early instead of piling up in memory.
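A semaphore-based sketch of shedding load early; the permit count of 20 is an illustrative assumption:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

public class Backpressure {
    // at most 20 in-flight calls per downstream (assumed cap)
    static final Semaphore PERMITS = new Semaphore(20);

    public static CompletableFuture<String> callLimited(String service) {
        if (!PERMITS.tryAcquire()) {
            // reject immediately instead of queueing unboundedly
            return CompletableFuture.failedFuture(new RuntimeException("rate limited"));
        }
        return CompletableFuture.supplyAsync(() -> service + "-ok")
                .whenComplete((r, e) -> PERMITS.release()); // always return the permit
    }

    public static void main(String[] args) {
        System.out.println(callLimited("S1").join());
    }
}
```

Failing fast here keeps the API responsive; callers can retry with backoff rather than waiting in a hidden queue.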
7. Observability
Track per-service latency and error metrics, propagate trace context across the fan-out (distributed tracing), and alert on rising timeout and fallback rates.
8. Consider Reactive (Better at Scale) - (I forgot to cover this)
Instead of: blocking, thread-per-call chains built on CompletableFuture.
Use: end-to-end non-blocking reactive streams.
Tools: Spring WebFlux / Project Reactor (Mono, Flux), or RxJava.
Reactive stacks stay non-blocking throughout and handle high concurrency with far fewer threads.