Building a Smart HTTP Client in Java
🚀 Dynamic Thread Management with Rate Limiting & a Circuit Breaker
📌 By Kashan Asim | Aug 2025 🔗 GitHub Repository
In my recent work with a bulk API calling system, I had to design an HTTP client that could hit downstream services at a desired input TPS (transactions per second), dynamically adapt to runtime latency or congestion, and avoid overwhelming downstream services.
Initially, the system either over-flooded the service with threads or hung when the server started lagging. So, I decided to go beyond traditional approaches like RestTemplate or WebClient with fixed thread pools and implement a self-aware HTTP client.
In this article, I’ll walk you through the problem with static thread pools, the client’s major components (circuit breaker, rate limiter, and dynamic thread tuner), and how I tested it against a simulated slow service.
💡 The Problem: Threads Gone Wild
When firing thousands of requests per minute, a static thread pool swings between two extremes: too many threads flood the downstream service, while too few leave requests queuing behind slow responses and throughput collapses.
What we needed was an adaptive mechanism that starts with a baseline thread count and dynamically scales up or down based on how quickly requests complete.
🧠 The Solution: Dynamic Threaded HTTP Client
Let’s break it down into its major components:
1️⃣ Circuit Breaker & Rate Limiter
RateLimiter rateLimiter = RateLimiter.create(targetTPS); // Guava's RateLimiter

if (!circuitBreaker.allowRequest()) {
    return CompletableFuture.completedFuture(
        new AdaptiveHttpResponse("Circuit breaker is open"));
}
rateLimiter.acquire(); // Block until a permit is available, enforcing target TPS
This makes sure requests fail fast while the circuit breaker is open, and that outgoing traffic never exceeds the configured target TPS.
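The `circuitBreaker` itself isn't shown in the article; as a rough sketch of what `allowRequest()` could do (the threshold and cool-down values here are my own assumptions, not from the repo), a minimal failure-counting breaker might look like:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Minimal circuit breaker sketch: opens after N consecutive failures,
// then closes again once a cool-down window has elapsed.
class SimpleCircuitBreaker {
    private static final int FAILURE_THRESHOLD = 5;      // assumed value
    private static final long OPEN_WINDOW_MS = 10_000;   // assumed cool-down

    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final AtomicLong openedAt = new AtomicLong(0); // 0 means closed

    boolean allowRequest() {
        long opened = openedAt.get();
        if (opened == 0) return true;                     // closed: allow
        if (System.currentTimeMillis() - opened >= OPEN_WINDOW_MS) {
            openedAt.set(0);                              // cool-down over: close and retry
            consecutiveFailures.set(0);
            return true;
        }
        return false;                                     // still open: reject fast
    }

    void recordSuccess() { consecutiveFailures.set(0); }

    void recordFailure() {
        if (consecutiveFailures.incrementAndGet() >= FAILURE_THRESHOLD) {
            openedAt.compareAndSet(0, System.currentTimeMillis());
        }
    }
}
```

A production breaker would also track a half-open trial state; this version simply closes fully after the window.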
2️⃣ Submitting Requests with Monitoring
CompletableFuture<AdaptiveHttpResponse> send(HttpRequest request, int retry) {
    rateLimiter.acquire();
    if (retry == 0) total.incrementAndGet(); // count each logical request once
    return CompletableFuture.supplyAsync(() -> {
        Instant start = Instant.now();
        try {
            // Call the downstream service
            HttpResponse<String> response = httpClient.send(request, BodyHandlers.ofString());
            return new AdaptiveHttpResponse(response.body());
        } catch (Exception ex) {
            // Retry logic and error tracking live here; fall back to an error response
            return new AdaptiveHttpResponse(ex.getMessage());
        } finally {
            updateCompletionStats(Duration.between(start, Instant.now()).toMillis());
        }
    }, executor);
}
We track latency and adjust thread counts accordingly.
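`updateCompletionStats` isn't shown in the article; one plausible sketch (the class and field names are my own, not from the repo) keeps running counters that the tuner drains each interval to derive the actual TPS:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of completion tracking: counts finished requests and total latency,
// and lets the tuner read-and-reset the counters each interval to derive TPS.
class CompletionStats {
    private final AtomicLong completed = new AtomicLong();
    private final AtomicLong totalLatencyMs = new AtomicLong();

    void updateCompletionStats(long latencyMs) {
        completed.incrementAndGet();
        totalLatencyMs.addAndGet(latencyMs);
    }

    /** Average TPS over the window, resetting counters for the next interval. */
    int drainActualTps(long windowSeconds) {
        long done = completed.getAndSet(0);
        totalLatencyMs.set(0);
        return (int) (done / Math.max(1, windowSeconds));
    }
}
```

With a 15-second tuning interval, 150 completions in a window drains to an actual TPS of 10.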
3️⃣ Dynamic Thread Tuner
We run a scheduler every 15 seconds to check current TPS and adjust thread count.
int delta = actualTPS - targetTPS;
if (delta < 0) {
    // Below target: add threads, capped at maxThreads
    currentThreads += Math.min(-delta, maxThreads - currentThreads);
    // When growing, raise the maximum before the core size so the pool
    // never sees corePoolSize > maximumPoolSize (which throws)
    ((ThreadPoolExecutor) executor).setMaximumPoolSize(currentThreads);
    ((ThreadPoolExecutor) executor).setCorePoolSize(currentThreads);
} else {
    // At or above target: shed half the surplus, but never drop below minThreads
    currentThreads = Math.max(minThreads, currentThreads - (delta / 2));
    // When shrinking, lower the core size first for the same reason
    ((ThreadPoolExecutor) executor).setCorePoolSize(currentThreads);
    ((ThreadPoolExecutor) executor).setMaximumPoolSize(currentThreads);
}
This ensures the pool grows quickly when we are under target, shrinks gently when we overshoot, and always stays within the [minThreads, maxThreads] bounds.
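Wiring this into the 15-second scheduler mentioned above could look like the following sketch (the `retuneStep` passed in is a hypothetical Runnable wrapping the delta logic; the repo may structure this differently):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: a single-threaded scheduler that re-runs the tuning step
// on a fixed cadence, independent of the request-processing pool.
class ThreadTunerScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Re-run the tuning step every 15 seconds, starting after one interval.
    void start(Runnable retuneStep) {
        scheduler.scheduleAtFixedRate(retuneStep, 15, 15, TimeUnit.SECONDS);
    }

    void stop() { scheduler.shutdownNow(); }
}
```

Using a dedicated scheduler keeps tuning decisions running even when the worker pool itself is saturated.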
🧪 The Test: Simulated Delay Service
I created a quick Spring Boot controller to simulate downstream latency and spike scenarios.
@GetMapping("/{delay}/{count}")
ResponseEntity<Integer> getDelay(@PathVariable int delay,
                                 @PathVariable int count) throws InterruptedException {
    Thread.sleep(delay * 1000L);
    if (Math.random() > 0.8)
        Thread.sleep(40_000); // ~20% of requests hit a 40-second spike
    System.out.println("Request received!\t" + count);
    return ResponseEntity.ok(count);
}
✅ This helped verify how well the HTTP client adjusts to changing server responsiveness.
📈 Example Scenario
Let’s say we want to achieve 100 TPS: the client starts with a baseline thread count and measures the actual completion rate every 15 seconds. If the downstream service responds quickly but we only measure, say, 60 TPS, the tuner adds threads; if completions overshoot or latency rises, it sheds threads instead of piling on more load.
This loop continues, always aiming to hover around the target TPS while avoiding saturation.
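To make the arithmetic concrete, here is the tuning step from earlier factored into a pure function (the bounds of 10 and 200 threads used in the examples are illustrative, not from the article):

```java
// Pure version of the tuning step: returns the new thread count
// given the measured TPS, the target, and the pool bounds.
class ThreadDelta {
    static int nextThreadCount(int current, int actualTPS, int targetTPS,
                               int minThreads, int maxThreads) {
        int delta = actualTPS - targetTPS;
        if (delta < 0) {
            // Under target: grow by the shortfall, capped at maxThreads
            return current + Math.min(-delta, maxThreads - current);
        }
        // Over target: shed half the surplus, floored at minThreads
        return Math.max(minThreads, current - (delta / 2));
    }
}
```

With a target of 100 TPS and 50 threads, measuring 60 TPS gives a delta of -40, so the pool grows to 90 threads; measuring 120 TPS gives a delta of 20, so the pool sheds 10 threads down to 40.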
🌍 Real-World Benefits
In production, this pattern keeps throughput predictable, protects downstream services from being overwhelmed, and lets the client recover gracefully once a latency spike passes.
⚠️ A Drawback to Address
One limitation in the current implementation is the blind ramp-up/down logic: it assumes all request failures are performance-related, which isn't always the case. It also does not differentiate among client-side timeouts, server-side errors, and network glitches.
💡 In a future article, I’ll improve this model by adding latency buckets, error classification, and a sliding window TPS calculator for smarter decisions. Stay tuned!
🧩 Final Thoughts
This smart HTTP client isn't just a tool — it’s a strategy for making high-volume services reliable, predictable, and efficient.
Feel free to contribute, fork, or follow the repo here: 🔗 GitHub Repository
💬 From the comments:
"Kāshān Asim, very insightful. I have a query though: did you try virtual threads with rate limiting? They are lightweight threads, so even if their number is high it will not be as intensive as platform threads. That said, I think your approach is a fine one if we use traditional threads."
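As a rough sketch of the commenter's suggestion (requires Java 21+; the Semaphore-based concurrency cap here is my own simplification standing in for Guava's RateLimiter, and the class is hypothetical), spawning one virtual thread per request removes the need for pool-size tuning entirely:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Sketch: virtual threads are cheap to create, so instead of tuning a
// platform-thread pool we spawn one task per request and cap in-flight
// work with a semaphore.
class VirtualThreadClient {
    private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
    private final Semaphore inFlight;

    VirtualThreadClient(int maxConcurrent) {
        this.inFlight = new Semaphore(maxConcurrent);
    }

    Future<String> send(String payload) {
        return executor.submit(() -> {
            inFlight.acquire();          // cap concurrency instead of pool size
            try {
                return "ok:" + payload;  // real code would call the downstream service
            } finally {
                inFlight.release();
            }
        });
    }

    void shutdown() { executor.shutdown(); }
}
```

This trades the dynamic tuner for a single knob (maximum in-flight requests), though the rate limiter and circuit breaker from the article would still be needed to protect the downstream service.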