Thread Pool Starvation: When CPU is 20% and Memory is Fine

🚨 Your service can go down even when CPU is at 20% and memory is perfectly fine. Sounds weird? It's usually thread pool starvation.

Consider a typical setup:

ExecutorService pool = Executors.newFixedThreadPool(10);

Each request does:

pool.submit(() -> {
    return dbCall(); // blocking
});

💥 What happens under load:
10 threads pick up 10 requests
All 10 are blocked waiting on the DB
The 11th request arrives → queued
The queue grows → latency spikes
Eventually → timeouts

⚠️ Observability looks confusing:
CPU → low
Memory → normal
Threads → all busy
Requests → timing out

Nothing "looks broken"… but your system is effectively stuck.

🧠 Root cause: you're using a limited thread pool for blocking operations. The threads are not doing work; they're just waiting.

🔥 Real production scenario:
DB latency increases from 20 ms → 300 ms
Same thread pool size
Throughput drops drastically
The queue explodes
The system appears to collapse suddenly

✅ Better approaches:
✔ Separate pools for blocking work
✔ Tune pool size based on workload
✔ Use async/non-blocking calls where possible
✔ Apply backpressure (don't accept an unbounded number of requests)

Example:

ExecutorService dbPool = Executors.newFixedThreadPool(50);
ExecutorService cpuPool = Executors.newFixedThreadPool(8);

💡 Key insight: more threads ≠ more throughput. If your threads are waiting, adding more just increases:
memory usage
context switching
instability

🚀 The real question isn't "How many threads do I have?" It's "What are my threads doing: working, or waiting?"

Have you ever seen a system fail with low CPU but high latency?

#BackendEngineering #Java #SoftwareEngineering #Multithreading #Concurrency #SystemDesign #SpringBoot #PerformanceEngineering
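For illustration, here is a minimal Java sketch of what "separate pools + backpressure" can look like. The pool and queue sizes are placeholders, not recommendations; tune them to your workload.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSeparationSketch {
    // CPU-bound work: roughly one thread per core is enough.
    static final ExecutorService cpuPool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Blocking I/O (e.g. JDBC calls): a larger pool with a bounded queue and an
    // explicit rejection policy, so overload surfaces as fast failures instead
    // of a silently growing queue (backpressure).
    static final ExecutorService dbPool = new ThreadPoolExecutor(
            50, 50,                                       // core = max = 50 threads (placeholder)
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<Runnable>(200),        // at most 200 queued requests
            new ThreadPoolExecutor.AbortPolicy());        // throw RejectedExecutionException when full

    public static void main(String[] args) {
        dbPool.submit(() -> System.out.println("pretend this is a blocking DB call"));
        cpuPool.submit(() -> System.out.println("pretend this is CPU-bound work"));
        dbPool.shutdown();
        cpuPool.shutdown();
    }
}
```

The key design choice is the bounded queue plus rejection policy: when the blocking pool is saturated, callers find out immediately instead of watching latency climb.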
More Relevant Posts
Our service was dropping requests under load. CPU was at 30%. Everything looked "fine."

It wasn't a capacity problem. It was thread-pool starvation. Here's how I diagnosed and fixed it:

📌 SYMPTOM
503 errors and timeouts spiking. CPU looked idle. DB looked healthy. Restarting temporarily fixed it.

🔍 DIAGNOSIS
Pulled a thread dump → every thread was in WAITING state. Pool was saturated. Queue depth was at max.

🕵️ ROOT CAUSE
One upstream service had no timeout set. Each request grabbed a thread and held it for 8-15 seconds.
200 threads × 10 s = 2,000 thread-seconds of capacity gone.

🛠 FIX
1. Set timeouts on all external calls (non-negotiable)
2. Moved slow tasks to a separate pool
3. Added circuit breakers to fail fast

📐 SIZING (the math that saved us)
I/O-bound pool size = cores × (1 + wait_time / compute_time)
We had 8 cores and 90% wait time → the pool needed ~80 threads, not 20.

The lesson: when CPU is low but requests drop, look at your thread pool, not your servers.

Have you ever debugged thread-pool starvation? Drop your experience below 👇

#SystemDesign #Backend #SoftwareEngineering #Java #DistributedSystems
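A hedged Java sketch of the sizing heuristic and the timeout fix. The wait/compute numbers and the URL are illustrative placeholders, and the HTTP client shown is Java 11+'s built-in HttpClient, not whatever the post's service actually used:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizingSketch {
    public static void main(String[] args) {
        // I/O-bound sizing heuristic: threads = cores * (1 + wait_time / compute_time).
        // Example: 8 cores, ~90 ms of waiting per ~10 ms of compute -> 8 * (1 + 9) = 80 threads.
        int cores = Runtime.getRuntime().availableProcessors();
        double waitMs = 90.0, computeMs = 10.0;   // illustrative; measure your own workload
        int ioPoolSize = (int) Math.round(cores * (1 + waitMs / computeMs));
        ExecutorService ioPool = Executors.newFixedThreadPool(ioPoolSize);
        System.out.println("I/O pool size: " + ioPoolSize);

        // The other half of the fix: never make an external call without a timeout,
        // so a slow upstream costs at most the timeout, not 8-15 s of thread time.
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("https://upstream.example/api")) // hypothetical endpoint
                .timeout(Duration.ofSeconds(2))                          // per-request timeout
                .build();

        ioPool.shutdown();
    }
}
```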
Continuous profiling usually means CPU profiles, especially in the #ebpf world. But for many incidents, memory profiling is just as useful. When an application is under memory pressure, infrastructure metrics alone are not enough. You need to dig into the code level and understand which functions allocate memory and why. Coroot can now collect heap profiles for Go applications with zero configuration. No code changes, no redeploys, and surprisingly, no eBPF involved :) I described how we implemented it in the Coroot blog: https://lnkd.in/dxwXjxmV
Day 95 of ∞: Banning the Operating System (CPU Pinning & isolcpus) 🛑

You've written the perfect C++ matching engine. Your data is cache-aligned, you are using lock-free queues, and your compiler optimizations are flawless. But at 2:00 PM, right as the market flashes, your 500-nanosecond latency randomly spikes to 50,000 nanoseconds.

Why? Because the Linux operating system decided to pause your trading thread to run a background system update.

This is called OS jitter, and here is how high-frequency trading (HFT) engineers destroy it.

🤹 The Linux Scheduler Trap
Linux is a time-sharing system. To juggle hundreds of background processes, the OS scheduler rapidly swaps programs in and out of the CPU. If the OS pauses your trading thread to process a random background task, it triggers a context switch and overwrites your perfectly warmed-up L1 cache. In HFT, this unpredictable latency spike is a death sentence.

📌 Step 1: CPU Affinity (taskset)
The first defense is "pinning." C++ engineers use POSIX threads (pthread_setaffinity_np) or the Linux taskset command to lock a specific thread to a specific CPU core. "Lock the network thread to core 2, and the trading thread to core 3." This stops your threads from bouncing across the silicon, keeping the L1 cache hot. But the OS can still put background tasks on those cores.

☢️ Step 2: Core Isolation (isolcpus)
To achieve true zero-jitter performance, quants edit the Linux kernel boot parameters and deploy isolcpus. This tells the kernel to pretend specific CPU cores do not exist. The OS is forbidden from scheduling any background tasks, network interrupts, or cron jobs on those isolated cores. They become completely silent, empty rooms.

We then manually pin our C++ trading thread onto that isolated core and lock the door. The thread spins in an infinite while(true) loop with 100% undisputed ownership of the silicon hardware. Zero context switches. Zero OS jitter.

Takeaway: if you want absolutely deterministic performance, you cannot share your hardware with the operating system.

#CPlusPlus #Quant #HighFrequencyTrading #SoftwareEngineering #Linux #OperatingSystems #ComputerScience #LearnCpp #Day95 #LearningJourney
Two C++ loops. Same code. Same flags.
One: 0.25 s. The other: 1.25 s.

👉 The code didn't change.
👉 CPU behavior did.

If:
instructions ≈ same
cycles ↑
IPC ↓
cache misses ↑

👉 It's not compute.
👉 It's memory.

I wrote a deep dive on using perf to see this clearly.
📖 https://lnkd.in/gjzwJWTP

If you're not measuring, you're guessing.

#CPP #Performance #Systems #LLVM #compilersutra
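You can feel the same effect in any language: identical arithmetic, different memory access pattern. This is an illustrative Java sketch (not the code from the linked article); the array size assumes the default heap can hold roughly 64 MB.

```java
/**
 * Same number of additions in both loops; only the traversal order differs.
 * The row-major walk streams through memory, while the column-major walk
 * jumps a whole row per access, so it is dominated by cache misses, not compute.
 */
public class MemoryBoundSketch {
    static final int N = 4096;                 // 4096 x 4096 ints ~ 64 MB
    static final int[][] data = new int[N][N];

    static long rowMajor() {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) sum += data[i][j];
        return sum;
    }

    static long columnMajor() {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++) sum += data[i][j];
        return sum;
    }

    public static void main(String[] args) {
        long t1 = System.nanoTime();
        long a = rowMajor();
        long t2 = System.nanoTime();
        long b = columnMajor();
        long t3 = System.nanoTime();
        System.out.println("row-major:    " + (t2 - t1) / 1_000_000 + " ms (sum " + a + ")");
        System.out.println("column-major: " + (t3 - t2) / 1_000_000 + " ms (sum " + b + ")");
    }
}
```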
Most .NET developers use async/await, but do you know what's actually happening under the hood? Let me explain it with a simple analogy.

Think of a thread as a string wire. Think of the CPU as a worker who processes tasks. Think of an I/O call (DB query, HTTP request, file read) as sending an errand to an external helper: the worker doesn't do this job, an outside system does.

Without async (synchronous):
→ The string wire attaches to your request
→ The CPU worker starts processing
→ An I/O call is made; the errand goes to an external helper
→ But the string wire just sits there, doing nothing, waiting for the errand to return
→ It won't let go until the response comes back
→ Every other request waits in the queue, even though the CPU is free

🚦 The wire is idle. The CPU is idle. But the system is stuck.

With async/await:
→ The string wire attaches to your request
→ The CPU worker starts processing
→ An I/O call is made; the errand goes to the external helper
→ The string wire detaches itself immediately
→ It picks up the next waiting request and brings it to the CPU
→ When the errand (I/O) returns, a wire picks the original request back up and completes it

⚡ The CPU stays busy. The threads stay busy. No one sits idle waiting.

This is the real power of async/await: it's not about making things faster on the CPU. It's about not wasting threads on idle waiting.

One important caveat though: if your CPU gets flooded with too much compute-heavy work, async won't save you. That's a different bottleneck entirely. You'll still hit 100% CPU.

So remember:
✅ Use async for I/O-bound work: DB, HTTP, file operations
🚫 Async won't help CPU-bound work: heavy computation needs a different solution

The string wire should never just sit and wait when it could be doing something useful. 🧵

Have you ever hit thread starvation or a CPU spike in production? What was the root cause? Drop it below 👇

#dotnet #asyncawait #csharp #softwareengineering #backenddev #dotnetcore
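The mechanics aren't .NET-specific. Here is a minimal sketch of the same idea on the JVM, assuming Java 11+'s HttpClient and a placeholder URL: the calling thread fires the request and is immediately free, and the continuation runs only when the I/O completes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class AsyncIoSketch {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com")).build();

        // sendAsync returns at once: no thread sits parked on the socket while
        // the "errand" (the HTTP round trip) is out with the external helper.
        CompletableFuture<HttpResponse<String>> pending =
                client.sendAsync(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("request sent, main thread is free to do other work");

        // The continuation runs when the response arrives.
        pending.thenAccept(resp -> System.out.println("status: " + resp.statusCode()));

        pending.join();   // demo only: keep the JVM alive until the response lands
    }
}
```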
A subtle multithreading issue that kills performance:

We parallelized a workload using multiple threads.
On paper: more threads = faster execution.
In reality: performance barely improved.

The reason? False sharing.

Multiple threads were updating variables that lived on the same CPU cache line.
• No locks involved
• No obvious contention
• Still… constant slowdowns

The CPU kept invalidating cache lines between cores.

The fix: align or pad shared data to avoid cache-line overlap.

Not all contention is visible in code. Some of it happens at the hardware level.

#Multithreading #Concurrency #BackendEngineering #Performance #SystemDesign #Java
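A minimal, hedged Java sketch of the effect. It assumes 64-byte cache lines and that HotSpot keeps same-width fields roughly in declaration order; for real measurements, verify the layout with JOL or use @Contended.

```java
public class FalseSharingSketch {
    static final long ITERS = 50_000_000L;

    static class SharedLine {     // 'a' and 'b' very likely land on the same cache line
        volatile long a, b;
    }

    static class PaddedLine {     // ~56 bytes of padding pushes 'b' onto another line
        volatile long a;
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long b;
    }

    static long timeMs(Runnable w1, Runnable w2) throws InterruptedException {
        Thread t1 = new Thread(w1), t2 = new Thread(w2);
        long start = System.nanoTime();
        t1.start(); t2.start();
        t1.join(); t2.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedLine s = new SharedLine();
        PaddedLine p = new PaddedLine();
        // Each thread writes only its own counter; volatile keeps the stores inside the loop.
        System.out.println("same cache line: " + timeMs(
                () -> { for (long i = 0; i < ITERS; i++) s.a++; },
                () -> { for (long i = 0; i < ITERS; i++) s.b++; }) + " ms");
        System.out.println("padded:          " + timeMs(
                () -> { for (long i = 0; i < ITERS; i++) p.a++; },
                () -> { for (long i = 0; i < ITERS; i++) p.b++; }) + " ms");
    }
}
```

On typical hardware the padded run is noticeably faster even though both runs do exactly the same writes: the only difference is whether the two counters share a cache line.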
Fresh #gRPC benchmarks just dropped (Apr 2026) and the results are spicy...

1 CPU: #Rust Tonic dominates at 102K req/s using just 15 MiB RAM.

2-4 CPUs: #Java Vert.x takes the crown at 153K req/s... but needs 305 MiB to do it. Meanwhile Rust Thruster does 139K req/s on 12 MiB. That's 25x less memory than Java.

#dotNET quietly sitting at #4-#8 across all configs. No hype, just solid numbers. The boring enterprise choice... that keeps winning.

#Scala Pekko & Akka? Surprise top-5 finishers at 3-4 CPUs. JVM tuning is real.

Now let's talk about the bottom of the chart:
- #Python: 4,492 req/s (1 CPU)
- #Node: 12,471 req/s (1 CPU)

That's 23x slower than Rust. Node doesn't even scale past 1 CPU: 16K req/s whether you give it 1 or 4 cores.

But sure, "developer productivity"... even when it's AI generating the code these days.
String validation in C++ is fast, until you have to do it millions of times per second across massive datasets.

While engineering a correctness fix for a silent-truncation bug in Apache Arrow's base64_decode utility, an automated review flagged a bottleneck: the function was using a linear search (std::string::find) to validate every single byte. For a 1 MB payload, that meant potentially tens of millions of redundant CPU operations.

Rather than bloating the initial correctness PR, I scoped the performance upgrade into a separate architectural follow-up. I replaced the linear scan with a static 256-entry lookup table (a direct-addressed array). This shifted the per-byte character check from a linear search of the alphabet to an O(1) constant-time memory lookup.

The benchmarks on a 1 MB payload:
🔴 Before (unsafe): ~2832 ms
🟡 Intermediate (strict validation, but linear): ~4302 ms
🟢 Final (strict validation + O(1) lookup): ~1126 ms

Massive thanks to Kouhei Sutou and Dmitry C. for the feedback and for helping me get this optimization across the finish line.

PR link: https://lnkd.in/gjpM5ey9

#Cpp #Apache #ApacheArrow #SystemsEngineering #DataInfrastructure #OpenSource
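The idea translates to any language. Here is an illustrative Java sketch of the technique (not the Arrow C++ code): a 256-entry direct-addressed table turns the per-byte alphabet search into a single array read.

```java
public class Base64ValidationSketch {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // 256-entry direct-addressed table: true for every valid base64 character.
    private static final boolean[] VALID = new boolean[256];
    static {
        for (char c : ALPHABET.toCharArray()) VALID[c] = true;
        VALID['='] = true; // padding character
    }

    // Slow pattern: a linear search of the alphabet for every input byte.
    static boolean validateLinear(byte[] input) {
        for (byte b : input) {
            if (b != '=' && ALPHABET.indexOf((char) (b & 0xFF)) < 0) return false;
        }
        return true;
    }

    // Fast pattern: one constant-time table lookup per input byte.
    static boolean validateLookup(byte[] input) {
        for (byte b : input) {
            if (!VALID[b & 0xFF]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] payload = "SGVsbG8sIFdvcmxkIQ==".getBytes();
        System.out.println(validateLinear(payload) + " " + validateLookup(payload));
    }
}
```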
DMA Bugs: When Memory Gets Corrupted Without Your Code Touching It

"The scary part about some memory corruptions? Your code never wrote there."

This is where DMA-related bugs become painful. Devices can write directly into memory. The CPU is not involved in every byte transfer. That is the performance advantage. And also the debugging nightmare.

🧠 What actually happens
Driver prepares a DMA buffer
Hardware writes into the mapped region
If:
* the mapping is wrong
* the length is wrong
* a stale descriptor is reused
* a sync issue exists
…hardware may overwrite memory silently. No normal function-call trace will show this.

⚡ Why this feels random
Symptoms appear far away from the root cause:
* random page faults
* strange kernel panics
* corrupted structures
* impossible pointer values
The crash happens later. The corruption happened earlier.

🔥 This is why DMA bugs waste days
Because engineers debug where it crashed, not where memory first got poisoned.

💡 What changed how I think
Whenever corruption looks "nonsensical," I ask: "Can hardware write here?" That question matters.

🧠 One line to remember: "Not every memory write comes from software."

#Linux #DMA #KernelDebugging #DeviceDrivers #EmbeddedSystems #SystemDesign #LowLevelProgramming #FirmwareDebugging
🚨 Production issue at 2 AM: the system looks fine… but new processes are failing!

CPU ✅ Memory ✅ Disk ✅ But still… something is wrong.

👉 The hidden culprit? Zombie processes 🧟

---

🧟 What is a zombie process?

A zombie process is a process that has:
✔ Finished execution
❌ But still exists in the process table

Why? 👉 Because the parent process didn't collect its exit status using wait().

In Linux:
- Every process has a parent/child relationship
- When a child exits → the parent must read its exit status
- If not → the child becomes a zombie

---

⚠️ Why should you care?

Linux has a finite PID space: 32,768 PIDs by default (check yours: cat /proc/sys/kernel/pid_max).

Zombies:
- ❌ Do NOT consume CPU or memory
- ❗ BUT occupy PID slots

👉 If too many zombies accumulate:
- The system hits the PID limit
- New processes FAIL to start
- Production impact 🚨

---

⚡ Linux signals (your control system)

Signals are a way to communicate with processes. Think of them as commands sent to running processes.

🔑 Common signals:
- SIGTERM (15) → graceful shutdown (recommended)
- SIGKILL (9) → force kill (no cleanup)
- SIGSTOP → pause a process
- SIGCONT → resume a process

👉 Signals help you control the lifecycle of processes.

---

🔗 Connection: zombies & signals

💡 Important truth: 👉 you cannot kill a zombie directly (it's already dead).

✔ Solution:
- Fix or kill the parent process
- The parent should call wait()

Or:
- Restart the parent service
- The system reparents the zombie to init/systemd, which cleans it up

---

🛠️ Practical commands

🔍 Find zombies:
ps aux | grep Z
Or: top (look for the Z state)

📌 Check the parent PID:
ps -el | grep Z

🔪 Kill the parent:
kill -15 <PPID>
kill -9 <PPID>   # if stubborn

---

🔧 Real-world scenario

We had a backend service spawning child processes for jobs.
👉 Bug: the developer forgot to call wait().

Result:
- Thousands of zombie processes
- PID exhaustion
- New API workers failed to spawn
💥 Production outage

Fix:
- Patched the code to handle wait()
- Restarted the service
- Zombies cleared automatically

---

💡 Pro tip

👉 Always monitor:
- Process states (Z, D, R, S)
- PID usage (cat /proc/sys/kernel/pid_max)

👉 In automation:
- Ensure proper child handling
- Use process supervisors (systemd, supervisord)

#Linux #DevOps #SRE #SystemAdmin #Cloud #ProductionIssues
HCLTech Abhishek Veeramalla Oracle Linux