Built CacheCore in C++ with Dynamic TTL Eviction

Built CacheCore, an in-memory key-value store in C++. One thing that bugged me was that the TTL eviction thread slept for a fixed 60 seconds between checks. It worked, but it felt wasteful. Since the next key to expire is always at the top of the min-heap, I switched it to calculate the exact sleep duration dynamically: fewer wakeups, no unnecessary work.

The rabbit holes are the best part of this kind of project. Debugging lock contention led me into MESI cache coherence: how L1/L2 caches across cores stay consistent when threads are fighting over the same mutex. I didn't plan to go there; curiosity just does that.

What's inside: LRU via doubly linked list + unordered_map (O(1)), TTL via min-heap with lazy deletion, a custom RESP protocol parser, and stateless multi-DB routing.

Benchmarked at 500K ops: ~6,260 ops/sec, p99 6.3 ms, zero deadlocks.

Repo: https://lnkd.in/gwsxQEHk

If you've worked on storage engines or low-level concurrency, feedback is welcome.

#cpp #systemsprogramming #backend #lowlevelprogramming
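Below is a minimal sketch of that dynamic-sleep idea, assuming a mutex-guarded min-heap ordered by expiry time plus a condition variable for wakeups. The names (TtlEvictor, ExpiryEntry) and the eviction hook are illustrative, not CacheCore's actual internals, and the project's lazy-deletion bookkeeping is omitted for brevity.

```cpp
// Sketch: sleep exactly until the next expiry instead of polling every 60s.
// wait_until on the heap head wakes the thread only when there is work.
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

struct ExpiryEntry {
    Clock::time_point expires_at;
    std::string key;
    bool operator>(const ExpiryEntry& other) const {
        return expires_at > other.expires_at;  // min-heap on expiry time
    }
};

class TtlEvictor {
    std::priority_queue<ExpiryEntry, std::vector<ExpiryEntry>,
                        std::greater<ExpiryEntry>> heap_;
    std::mutex mu_;
    std::condition_variable cv_;

public:
    void Add(std::string key, std::chrono::milliseconds ttl) {
        std::lock_guard<std::mutex> lk(mu_);
        heap_.push({Clock::now() + ttl, std::move(key)});
        cv_.notify_one();  // the new key may expire sooner than the current head
    }

    [[noreturn]] void Run() {
        std::unique_lock<std::mutex> lk(mu_);
        for (;;) {
            if (heap_.empty()) {
                cv_.wait(lk);  // nothing scheduled; sleep until Add() signals
                continue;
            }
            // Sleep until the earliest expiry, or until an earlier key arrives.
            if (cv_.wait_until(lk, heap_.top().expires_at) ==
                std::cv_status::timeout) {
                while (!heap_.empty() &&
                       heap_.top().expires_at <= Clock::now()) {
                    // Evict(heap_.top().key);  // drop from the main map here
                    heap_.pop();
                }
            }
        }
    }
};
```

The notify_one() on insert is the subtle part: without it, a key added with a shorter TTL than the current heap head would expire late, because the thread would still be sleeping toward the old deadline.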
More Relevant Posts
-
In C++, traditional error handling can disrupt even the most optimized hot paths. The hidden costs of exceptions are often too high for performance-critical systems.

🔹 Stack unwinding overhead: when an exception is thrown, the runtime must walk the stack to find a matching catch block. That search takes a data-dependent, hard-to-predict amount of time, producing latency spikes (jitter).

🔹 Binary bloat: supporting exceptions requires the compiler to generate extra data structures (exception/unwind tables) to track object lifetimes. This increases binary size and can hurt instruction-cache efficiency, even if an exception is never actually thrown.

🔹 The alternative: use patterns like std::expected (C++23) or simple return codes. This keeps the control flow explicit, the stack clean, and the execution time predictable.

#CPP #LowLatency #SystemsProgramming #SoftwareEngineering #PerformanceOptimization #HFT #CodingStandard #ModernCpp
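To make the alternative concrete, here is a minimal sketch of the std::expected pattern; parse_port and the ParseError enum are invented for illustration.

```cpp
// Explicit error handling with std::expected (C++23): failure is an ordinary
// return value, so there is no unwinding machinery and no hidden control flow.
#include <charconv>
#include <expected>
#include <string_view>

enum class ParseError { Empty, NotANumber, OutOfRange };

std::expected<int, ParseError> parse_port(std::string_view s) {
    if (s.empty()) return std::unexpected(ParseError::Empty);
    int value = 0;
    auto res = std::from_chars(s.data(), s.data() + s.size(), value);
    if (res.ec == std::errc::invalid_argument)
        return std::unexpected(ParseError::NotANumber);
    if (res.ec != std::errc{} || value < 1 || value > 65535)
        return std::unexpected(ParseError::OutOfRange);
    return value;
}

// The caller branches explicitly; the compiler sees plain value flow:
//   if (auto port = parse_port("8080")) use(*port);
//   else handle(port.error());
```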
-
🚨 Your application is slow… but CPU and DB look fine. Hidden culprit? Frequent garbage collection (GC).

💡 What's happening behind the scenes
GC is supposed to help by cleaning memory. But when it runs too often:
• CPU is consumed by GC threads
• Application threads pause (stop-the-world)
• Requests get delayed
👉 Result: slow APIs and poor user experience

⚠️ Signs of frequent GC issues
• Sudden response-time spikes
• High CPU with no clear reason
• Throughput drops under load
• Application "freezes" for milliseconds to seconds

🔍 When GC becomes a problem
❌ Too many GC cycles
❌ Frequent full GC
❌ Long pause times
✅ Healthy behavior: minor GC frequent but fast, full GC rare

🧠 Common root causes
• Small heap size
• High object-creation rate
• Memory leaks
• Inefficient coding patterns

🛠️ How to detect it
• GC logs (frequency and pause time)
• Heap-usage trends
• STW pause duration
👉 Correlate GC activity with response-time spikes

🎯 Key takeaway: GC is normal, but frequent GC is a performance killer. If your app pauses randomly, GC is often the hidden reason.

💬 Have you ever debugged a performance issue that turned out to be GC-related?

#Java #JVM #GarbageCollection #PerformanceTesting #BackendEngineering #Loadtesting
-
Constant-time implementation 🤔? Your compiler might break it.

An interesting vuln popped up last week: CVE-2025-66442 in Mbed TLS, a compiler-induced constant-time violation 👉 https://lnkd.in/dEqcZ-nk

And not for the first time; there are more links in the comments (the recent wolfSSL patch, the "clangover" work around ML-KEM, ...).

I was playing with a minimal repro on Godbolt, so we can literally see `mask & a | ~mask & b` turned into a branch on RISC-V 👉 https://lnkd.in/dmyjBHWz

A good reminder for anyone doing embedded/security work: check the generated assembly, not just the sources, and treat compiler flags as part of the security model.
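For readers who haven't seen the pattern, here is a generic illustration (not the Mbed TLS code) of a masked constant-time select, plus one common mitigation: laundering the mask through an inline-asm value barrier so the optimizer cannot prove it is 0 or all-ones and rewrite the select as a branch. The hardening shown is GCC/Clang-specific and not bulletproof; the post's advice stands, verify the emitted assembly on every target and flag combination.

```cpp
#include <cstdint>

// Branchless select: returns a where mask is all-ones, b where mask is zero.
static inline uint32_t ct_select(uint32_t mask, uint32_t a, uint32_t b) {
    return (mask & a) | (~mask & b);
}

// Build an all-ones/all-zeros mask from a 0/1 bit without branching.
static inline uint32_t ct_mask(uint32_t bit) {
    return 0u - bit;  // 1 -> 0xFFFFFFFF, 0 -> 0x00000000
}

// Optimization barrier: the empty asm makes x opaque to the optimizer, so it
// cannot infer the mask is boolean and lower the select back to a branch.
static inline uint32_t value_barrier(uint32_t x) {
    __asm__("" : "+r"(x));
    return x;
}

static inline uint32_t ct_select_hardened(uint32_t mask, uint32_t a,
                                          uint32_t b) {
    mask = value_barrier(mask);
    return (mask & a) | (~mask & b);
}
```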
-
String validation in C++ is fast, until you have to do it millions of times per second across massive datasets.

While engineering a correctness fix for a silent truncation bug in Apache Arrow's base64_decode utility, an automated review flagged a bottleneck: the function was using a linear search (std::string::find) to validate every single byte. For a 1MB payload, that meant potentially tens of millions of redundant CPU operations.

Rather than bloating the initial correctness PR, I scoped the performance upgrade into a separate architectural follow-up. I replaced the linear scan with a static 256-entry lookup table (a direct-addressed array), shifting character validation from a linear search over the valid-character set to an O(1) constant-time memory lookup.

Benchmarks on a 1MB payload:
🔴 Before (unsafe): ~2832 ms
🟡 Intermediate (strict validation, but linear): ~4302 ms
🟢 Final (strict validation + O(1) lookup): ~1126 ms

Massive thanks to Kouhei Sutou and Dmitry C. for the feedback and for helping me get this optimization across the finish line.

PR link: https://lnkd.in/gjpM5ey9

#Cpp #Apache #ApacheArrow #SystemsEngineering #DataInfrastructure #OpenSource
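The PR has the real code; as a general illustration of the technique, a direct-addressed table looks roughly like this (the names and the exact validation policy, e.g. padding handling, are mine, not Arrow's):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Build the 256-entry table once at compile time (C++17): each byte maps to
// its 6-bit value, or 0xFF if it is not in the base64 alphabet. The '='
// padding byte is intentionally absent and must be handled separately.
constexpr std::array<uint8_t, 256> MakeBase64Table() {
    std::array<uint8_t, 256> t{};
    for (auto& v : t) v = 0xFF;
    constexpr std::string_view alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    for (std::size_t i = 0; i < alphabet.size(); ++i)
        t[static_cast<unsigned char>(alphabet[i])] = static_cast<uint8_t>(i);
    return t;
}

inline constexpr auto kBase64Table = MakeBase64Table();

// One indexed load per byte instead of a find() over the alphabet string:
// per-character cost drops from O(alphabet size) to O(1).
inline bool IsValidBase64Char(unsigned char c) {
    return kBase64Table[c] != 0xFF;
}
```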
-
Improved my multithreaded, cache-blocked matrix multiplication engine in C++ with deeper performance optimizations.

Progression:
• Started with naive row × column multiplication
• Implemented block-based multiplication for better cache locality
• Switched from task queues + mutexes to static partitioning (lock-free parallelism)
• Added dynamic thread scaling based on workload

Benchmarks vs an optimized implementation:
• 256×256: Library 0.0153 s, Ideal 0.00527 s (~2.9× slower)
• 512×512: Library 0.102 s, Ideal 0.0218 s (~4.68× slower)
• 800×800: Library 0.414 s, Ideal 0.0620 s (~6.67× slower)

Key takeaway: performance isn't just about multithreading. Memory access patterns and cache locality dominate; even with parallelism, inefficient memory access (like column-wise traversal) can bottleneck performance.

Currently working on:
• Improving cache efficiency (transpose + loop ordering)
• Reducing memory stalls
• Closing the gap toward high-performance (BLAS-style) implementations

#cpp #multithreading #performance #systemsprogramming #hpc
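For anyone following along, here is a minimal single-threaded sketch of the cache-blocking idea; the block size, i-k-j loop order, and row-major layout are illustrative choices, not the engine's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cache-blocked C += A * B for row-major n x n matrices. Working on
// BLOCK x BLOCK tiles keeps each tile hot in L1/L2 while it is reused.
constexpr std::size_t BLOCK = 64;  // tune to cache size; illustrative value

void MatmulBlocked(const std::vector<double>& A, const std::vector<double>& B,
                   std::vector<double>& C, std::size_t n) {
    for (std::size_t ii = 0; ii < n; ii += BLOCK)
        for (std::size_t kk = 0; kk < n; kk += BLOCK)
            for (std::size_t jj = 0; jj < n; jj += BLOCK)
                // i-k-j order: the innermost loop streams rows of B and C with
                // unit stride, avoiding the column-wise traversal the post
                // calls out as the bottleneck.
                for (std::size_t i = ii; i < std::min(ii + BLOCK, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + BLOCK, n); ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + BLOCK, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```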
-
Continuous profiling usually means CPU profiles, especially in the #ebpf world. But for many incidents, memory profiling is just as useful. When an application is under memory pressure, infrastructure metrics alone are not enough; you need to dig down to the code level and understand which functions allocate memory and why.

Coroot can now collect heap profiles for Go applications with zero configuration. No code changes, no redeploys, and, surprisingly, no eBPF involved :)

I described how we implemented it on the Coroot blog: https://lnkd.in/dxwXjxmV
-
So apparently my prompt cache was cut from 1 hour to 5 minutes. Thanks, Anthropic. Claude Code users report 15-53% cost inflation on Max plans. Quite the deal for anyone who assumed cache hits between turns 🙃

I build and maintain a framework on top of Claude Code, and this kind of silent change is the worst kind: not a breaking change in the traditional sense, just a performance assumption that quietly stopped being true.

Caching behavior should be in the SLA, or at least the docs. If the TTL can move without notice, every cost estimate built on it is soft. The tooling layer will need better observability into cache behavior, not just token counts.
-
p99 latency hit 2 seconds. CPU was at 55%. Nothing looked broken.

We fixed it in 30 minutes, not by guessing, but by following a simple process: thread dump → dependencies → connection pools → GC → validate.

Most production issues aren't complex. They're just hidden behind the wrong signals.

Wrote the exact step-by-step we use 👇
https://lnkd.in/g_x-Ffgp

#SoftwareEngineering #BackendEngineering #SystemDesign #DistributedSystems #PerformanceEngineering #Debugging #Java #Microservices #Scalability #TechLeadership #SRE #DevOps #Programming #Coding #Engineering
-
good. a practical, no-nonsense how-to for when you open a blob and ghidra stares back at you. set the flash base (0x08000000), add sram/periph segments, then let SVD-Loader do the heavy lifting – saves you from red-address hell. walkthrough shows the real pain: mapping memory, not guessing strings. screenshots actually match steps. useful when you’re elbows-deep in ARM firmware and tired of fairy-tale tooling.
-
.NET 10 introduces escape-analysis-driven stack allocation:
• Non-escaping arrays → stack allocated
• Possible scalar replacement → zero allocation
• Significant reduction in Gen0 GC

This shifts heap allocation from the default to a fallback for short-lived objects.

📈 Result: better throughput, lower latency, improved cache locality.

#dotnet #jit