I didn’t just build a URL shortener; I built a distributed system capable of handling millions of requests. 🚀

Most people rely on a simple database auto-increment for IDs. But what happens when multiple instances generate IDs independently, or the database itself is sharded? You get ID collisions.

In my latest project, EspressoLinks, I tackled the challenges of System Design and High Availability:

🔹 Distributed ID Generation: Used Redis atomic counters (INCR) to ensure unique short keys across a cluster of 3 Spring Boot instances.
🔹 Latency Optimization: Implemented a Cache-Aside pattern with Redis, dropping redirection latency from 85ms to 4ms (a 21x improvement!).
🔹 Load Balancing: Configured an Nginx load balancer to distribute traffic using a round-robin algorithm.
🔹 Resilience: Built a failover mechanism: if Redis goes down, the system gracefully falls back to PostgreSQL without crashing.

This project taught me how to move beyond "it works on my machine" to "it works at scale."

🛠 Tech Stack: Java 17, Spring Boot 3, Redis, PostgreSQL, Nginx, Docker, and Bucket4j.

Check out the architecture and source code here: https://lnkd.in/gDTHaiGY

#Java #SpringBoot #SystemDesign #Redis #Docker #BackendDevelopment #SoftwareEngineering #CloudComputing
Distributed System Design for High Availability with Redis and Spring Boot
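The distributed ID generation step described in the post maps to a few lines of Spring Data Redis. Below is a minimal sketch of the idea, assuming a StringRedisTemplate bean and a hypothetical counter key name (espresso:url:id); the actual EspressoLinks code may differ.

```java
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class ShortKeyGenerator {

    private static final String COUNTER_KEY = "espresso:url:id"; // hypothetical key name
    private static final char[] BASE62 =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".toCharArray();

    private final StringRedisTemplate redis;

    public ShortKeyGenerator(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public String nextShortKey() {
        // INCR is atomic in Redis, so every application instance in the cluster
        // receives a distinct, monotonically increasing number.
        Long id = redis.opsForValue().increment(COUNTER_KEY);
        return toBase62(id);
    }

    private String toBase62(long value) {
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(BASE62[(int) (value % 62)]);
            value /= 62;
        } while (value > 0);
        return sb.reverse().toString();
    }
}
```

Encoding the counter in Base62 keeps the short keys compact while guaranteeing uniqueness without any coordination between instances.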
In-memory rate limiting is a lie. Here's what I built instead.

Most rate limiter tutorials are broken by design. They store bucket state in a HashMap inside the application. That works fine until you scale to two instances. Now user A hits pod-1 and gets a fresh 100 req/min, then hits pod-2 and gets another 100 req/min. Your "100 req/min limit" just became 200.

The fix: shared bucket state in Redis, so all pods enforce the same limit against the same counter.

I built a distributed rate limiter in Spring Boot + Bucket4j + Redis that solves exactly this.

→ 5 strategies (token bucket, burst, strict, sliding window, daily quota)
→ AOP-driven: @RateLimit on any method, zero boilerplate in business logic
→ Prometheus metrics per strategy, real-time Retry-After headers
→ Testcontainers integration tests against a real Redis 7.2 container

This week I'll walk through each layer, starting with why I picked Bucket4j over rolling my own.

What rate limiting mistake have you seen break in production? 👇

#Java #SpringBoot #Redis #BackendEngineering #SystemDesign
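For readers new to Bucket4j, here is a minimal, local (non-distributed) sketch of the token-bucket API the post builds on. The distributed version swaps this in-memory bucket for Bucket4j's Redis-backed ProxyManager, which is not shown here; the limit values are illustrative.

```java
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Refill;

import java.time.Duration;

public class TokenBucketDemo {

    public static void main(String[] args) {
        // 100 tokens per minute, refilled gradually (the classic token bucket).
        Bandwidth limit = Bandwidth.classic(100, Refill.greedy(100, Duration.ofMinutes(1)));
        Bucket bucket = Bucket.builder().addLimit(limit).build();

        for (int i = 0; i < 105; i++) {
            if (bucket.tryConsume(1)) {
                System.out.println("request " + i + " allowed");
            } else {
                // In an HTTP filter this is where you would return 429 plus a Retry-After header.
                System.out.println("request " + i + " rejected");
            }
        }
    }
}
```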
🚀 ShieldGate – Day 5 & Day 6 Update

Continuing to build my API Rate Limiter & Monitoring System (MERN + Redis), now moving towards a more scalable and production-ready backend.

⚡ Day 5: Distributed Caching with Redis
Replaced in-memory cache with Redis-based distributed caching
✔ Cache stored using SET + EXPIRE (TTL)
✔ Eliminated dependency on single-server memory
✔ Improved scalability across multiple instances
✔ Reduced repeated API calls using a Cache HIT / MISS strategy
📊 Result: Faster responses + a production-ready caching layer ⚡

🔑 Day 6: API Key-Based Rate Limiting + Logging System
Upgraded the rate limiter from IP-based to user-based (multi-tenant system)

🔹 API Key System
✔ Introduced an x-api-key header
✔ Dynamic rate limits: Free users → 10 requests/min, Premium users → 50 requests/min
✔ Redis key structure: rate_limit:apiKey:ip

📊 Logging & Monitoring
Built a request logging system using MongoDB
✔ Logs every request (allowed + blocked)
✔ Tracks: IP, API key, status (allowed/blocked), timestamp
📈 This enables traffic analysis, attack detection, and future dashboard analytics

🧠 Key Learnings
- Distributed cache > in-memory cache for scalability
- API key design enables multi-user systems
- Logging is essential for observability and monitoring
- Backend systems must be both efficient and measurable

🛠️ Tech Stack: Node.js • Express • Redis • MongoDB • Docker • Axios

🔗 GitHub Repository 👉 https://lnkd.in/gcEpiMsp

#BackendDevelopment #SystemDesign #Redis #NodeJS #MERN #SoftwareEngineering #100DaysOfCode
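ShieldGate itself is Node.js; to keep this feed's examples in one language, here is the same fixed-window, key-plus-IP counter sketched in Java with Jedis. The key layout follows the post's rate_limit:apiKey:ip structure; the per-tier limit is passed in by the caller.

```java
import redis.clients.jedis.Jedis;

public class ApiKeyRateLimiter {

    private final Jedis jedis;
    private final int limitPerMinute;   // e.g. 10 for free users, 50 for premium (per the post)

    public ApiKeyRateLimiter(Jedis jedis, int limitPerMinute) {
        this.jedis = jedis;
        this.limitPerMinute = limitPerMinute;
    }

    /** Fixed-window counter keyed by API key and client IP, mirroring rate_limit:apiKey:ip. */
    public boolean allow(String apiKey, String ip) {
        String key = "rate_limit:" + apiKey + ":" + ip;
        long count = jedis.incr(key);      // atomic increment shared by all instances
        if (count == 1) {
            jedis.expire(key, 60);         // start a 60-second window on the first hit
        }
        return count <= limitPerMinute;
    }
}
```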
Engineering for Scale: Why I Implemented Redis in My Project

Even though my current project doesn't have thousands of concurrent users yet, I wanted to tackle a very real-world problem: API latency and database load.

I noticed that routes like GET /projects/:id/tasks require heavy SQL joins and filtering. In a production environment, hitting the DB for the same data every few seconds is a bottleneck waiting to happen. To see how tech companies solve this, I decided to implement Redis as a caching layer.

Solving Real-World Challenges
I didn't just "add a cache"; I treated this as a deep dive into distributed systems:
- Read-Aside Pattern: I built a "withCache" utility that prioritizes microsecond Redis hits but falls back to the database if the cache is empty or the Redis server is unreachable (graceful degradation). A sketch of this pattern follows below.
- Auth-First Approach: One crucial takeaway was ensuring authentication always happens before checking the cache. Speed should never come at the cost of security.
- Filter-Aware Caching: I learned how to design dynamic cache keys that encode filters like status or priority. Without this, the system would accidentally serve "To-Do" tasks to a user asking for "Done" tasks.

The "Aha!" Moments and Errors
Implementation taught me things a tutorial never could:
- The ACL Trap: I learned the hard way that Redis acl.conf files don't support comments. A single "#" at the top caused a startup crash, a small detail that taught me a lot about production-ready configuration.
- Invalidation Logic: I had to ensure that cache keys are deleted after a successful DB write. If you delete them before, you open a race condition where the cache might be re-populated with stale data.

The Goal
For me, this wasn't just about making the API faster; it was about learning how to design systems that balance performance, consistency, and failure handling.

The link to the full implementation (GitHub) and documentation (inside the /docs folder) is in the comments below 👇 Don't forget to check it out!

#SoftwareEngineering #Redis #BackendDevelopment #SystemArchitecture #Postgres #NodeJS #WebPerformance
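The project above is Node.js; to keep this feed's examples in one language, here is a minimal Java sketch of the same withCache idea under stated assumptions: a StringRedisTemplate, string (JSON) values, and a caller-supplied loader standing in for the database query.

```java
import org.springframework.data.redis.core.StringRedisTemplate;

import java.time.Duration;
import java.util.function.Supplier;

public class WithCache {

    private final StringRedisTemplate redis;

    public WithCache(StringRedisTemplate redis) {
        this.redis = redis;
    }

    /**
     * Cache-aside read: try Redis first, fall back to the loader (the DB query)
     * on a miss or if Redis is down, and repopulate the cache on the way out.
     */
    public String get(String key, Duration ttl, Supplier<String> loader) {
        try {
            String cached = redis.opsForValue().get(key);
            if (cached != null) {
                return cached;                      // cache hit
            }
        } catch (Exception e) {
            // Redis unreachable: degrade gracefully and serve from the database.
            return loader.get();
        }
        String fresh = loader.get();                // cache miss: run the real query
        try {
            redis.opsForValue().set(key, fresh, ttl);
        } catch (Exception ignored) {
            // Failing to cache should never fail the request.
        }
        return fresh;
    }
}
```

A caller that encodes filters into the key, for example get("tasks:" + projectId + ":status=done", ...), gives the filter-aware cache keys the post describes.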
Part 2 of my Redis Journey: From 502 Errors to Sub-Millisecond Speeds ⚡

Yesterday, I shared how I crashed my SaaS app, TaskZilla, while learning Redis to build a background worker. Today, I took that same Redis container and used it to solve my next bottleneck: database I/O.

Every time a user loaded their Kanban board, PostgreSQL had to fetch 50+ tasks and resolve all the relationships. It worked, but it wasn't scalable.

The Solution: I integrated Flask-Caching to bypass the database entirely. Now, when a user loads their dashboard, Redis intercepts the request and serves the JSON directly from RAM. My total API round-trip time (including Cloudflare routing!) dropped to just ~30ms. 🏎️💨

I also learned a valuable lesson in cache invalidation (cache busting). I had to write custom logic to delete specific user snapshots from Redis whenever they created, updated, or deleted a task, ensuring they never see stale data.

The Real-World Hiccup: Of course, it wasn't perfectly smooth. When I pushed my code, my GitHub Actions CI/CD pipeline failed immediately. Why? Because I had manually tweaked my docker-compose.yaml on my production server yesterday to fix a network bug. Git tried to pull the new changes, saw the manual edits, and aborted to protect the server. A quick SSH into the server and a git reset --hard origin/main wiped the manual edits, synced everything to the repository (the single source of truth!), and got the pipeline glowing green again. ✅

Next up on the roadmap: real-time WebSockets so users don't even have to refresh the page.

What is your go-to strategy for cache invalidation? Do you prefer time-based TTLs or event-driven cache busting? Let me know! 👇

#SoftwareEngineering #LearningInPublic #Python #Flask #Redis #Docker #DevOps #SystemArchitecture
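The event-driven cache busting described above boils down to deleting the affected user's snapshot key inside the write path, after the database commit. A minimal sketch (in Java for consistency with the other examples here; TaskZilla itself is Flask/Python, and the key name is hypothetical):

```java
import org.springframework.data.redis.core.StringRedisTemplate;

public class DashboardCacheBuster {

    private final StringRedisTemplate redis;

    public DashboardCacheBuster(StringRedisTemplate redis) {
        this.redis = redis;
    }

    /**
     * Call this after a task create/update/delete has been committed.
     * Deleting after the write (not before) avoids a window where the
     * cache could be re-populated with stale data.
     */
    public void bustUserDashboard(long userId) {
        redis.delete("dashboard:user:" + userId);   // hypothetical key naming
    }
}
```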
⚡ How Redis Handles Millions of Requests... With Just One Thread

At first glance, this sounds impossible. How can a single-threaded system like Redis handle massive traffic without slowing down? The answer lies in a powerful concept: I/O multiplexing.

🧠 The Usual Problem
In traditional systems:
- One request = one thread
- 10,000 requests = 10,000 threads
💣 Result: high memory usage, context-switching overhead, poor scalability.

⚡ What Redis Does Differently
👉 Redis uses I/O multiplexing. Instead of creating a thread for each connection:
- It uses a single thread
- Monitors thousands of client sockets
- Processes only the ones that are ready

🔄 How It Works
1. Clients send requests
2. Redis registers all connections with the OS (via `epoll` on Linux)
3. It waits for events using I/O multiplexing
4. Only active/ready connections are processed
👉 No wasted CPU, no thread explosion. (A Java NIO sketch of the same pattern follows below.)

🎯 Key Insight
Redis is not slow because it's single-threaded; it's fast because it avoids unnecessary work.

⚔️ Why This Beats Multithreading
❌ No context switching
❌ No locks (no mutex headaches)
❌ No thread-management overhead
✅ Predictable performance
✅ High throughput
✅ Simpler design

💡 Real Impact
This is why Redis can handle millions of requests per second, serve as a cache, queue, and real-time engine, and power high-scale systems effortlessly.

🔥 But There's a Catch
- Long-running operations can block the event loop
- CPU-heavy tasks can slow everything down
👉 That's why Redis workloads must be fast, non-blocking, and lightweight.

🎯 Takeaway
Scalability is not always about adding more threads; sometimes it's about doing less work, smarter.

#SystemDesign #Redis #BackendEngineering #DistributedSystems #Scalability #Java
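Redis implements this loop in C on top of epoll/kqueue; the same single-threaded multiplexing pattern can be sketched in Java with NIO's Selector (which is backed by epoll on Linux). This is an illustration of the pattern, not Redis's actual event loop; the port is arbitrary.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SingleThreadedEventLoop {

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();                       // wraps epoll on Linux
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(6380));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (true) {
            selector.select();                                     // block until some socket is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();        // new connection, register it
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int read = client.read(buffer);
                    if (read == -1) {
                        client.close();                            // client hung up
                    } else {
                        buffer.flip();
                        client.write(buffer);                      // echo back: one thread, many sockets
                    }
                }
            }
        }
    }
}
```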
Spent the last few weeks going really deep into PostgreSQL internals. Learned how MVCC actually handles transactions under the hood and how WAL ensures data never gets lost, even when things crash. The query planner and vacuum process completely changed how I think about writing queries.

Then went through Node.js internals: how libuv actually drives the event loop and what really happens when you write async code. The difference between microtask and macrotask queues finally clicked for me at a deeper level.

Now starting Redis internals. Excited to understand how it handles memory encoding and why the persistence mechanisms are designed the way they are.

Honestly, going deep into how these tools actually work has made me a better engineer than any tutorial ever could.

If you have good resources on Redis internals, drop them below 👇

#PostgreSQL #NodeJS #Redis #BackendDevelopment #LearningInPublic
Spent 2 days debugging slow API response times. Turned out we were hitting the database for the same data on every single request: user profile, permissions, config settings, all fetched fresh every time.

The fix was embarrassingly simple: a Redis cache with a 5-minute TTL.

Before: 850ms average response time
After: 180ms average response time

That's 78% faster. No code refactor. No architecture change. Just stopped asking the database questions it had already answered.

Sometimes the bottleneck is not your code. It is how many times you ask the same question.

What is the simplest fix that gave you the biggest performance win?

#Java #Redis #Performance #Backend #SpringBoot
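Given the #SpringBoot tag, a fix like this is typically one annotation plus a TTL on the cache manager. A minimal sketch, assuming spring-boot-starter-data-redis and Spring's cache abstraction are on the classpath; the service and cache names are illustrative, not taken from the post.

```java
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheConfiguration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.stereotype.Service;

import java.time.Duration;

@Configuration
@EnableCaching
class CacheConfig {

    @Bean
    RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        // Every cache entry expires after 5 minutes, matching the post's TTL.
        RedisCacheConfiguration config = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(5));
        return RedisCacheManager.builder(connectionFactory)
                .cacheDefaults(config)
                .build();
    }
}

@Service
class UserProfileService {

    // First call hits the database; calls within the TTL are served from Redis.
    @Cacheable(cacheNames = "userProfile", key = "#userId")
    public String loadProfile(long userId) {
        return expensiveDatabaseLookup(userId);
    }

    private String expensiveDatabaseLookup(long userId) {
        return "profile-" + userId;   // stand-in for the real query
    }
}
```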
I built this project in four layers. The first one has zero dependencies. That was intentional.

Here's the structure:
- RateLimiter.Core → pure C#, no NuGet packages
- RateLimiter.Redis → depends on Core
- RateLimiter.Middleware → depends on Core
- RateLimiter.Api → depends on all three

Core has nothing. No Redis. No HTTP. No DI container. No external packages. Just interfaces, models, enums, and the algorithm.

This means one thing practically: the sliding window algorithm is unit testable with zero infrastructure. No Redis running. No HTTP context. No mocks for external dependencies. Just pure C# objects and assertions.

The dependency direction is the point. Redis references Core; Core never references Redis. Middleware references Core; Core never references Middleware. Dependencies always flow inward, never outward.

This is not just clean architecture theory. It has a real practical consequence. If tomorrow I want to swap Redis for SQL Server as the rate limit store, I implement one interface in a new project. Core doesn't change. Middleware doesn't change. Api doesn't change. Just the infrastructure layer gets replaced.

That's what loose coupling actually means in practice. Not a principle on a slide, but a decision you feel when you need to change something.

Two calls in Program.cs: AddRateLimiting() and UseRateLimiting(). Everything else (Redis connection, strategy selection, window size) lives in appsettings.json.

Architecture is not about making things complex. It's about making the right things easy to change.

What's the most painful tight coupling you've had to untangle in a codebase? 👇

Part 5 of my rate limiter build series. Follow for more.

#dotnet #csharp #cleanarchitecture #softwaredesign #backend #aspnetcore #softwaredevelopment
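The project itself is C#/.NET; to keep this feed's examples in one language, here is the same inward-pointing dependency idea sketched in Java. All names are hypothetical, and only the shape matters: the core defines the port, an infrastructure class implements it, and swapping stores touches nothing but that implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Core layer: no infrastructure imports, just the port the algorithm needs.
interface RateLimitStore {
    long incrementAndGet(String key, long windowSeconds);
}

// Core layer: the limiter depends only on the interface above, never on Redis or SQL.
class WindowedLimiter {
    private final RateLimitStore store;
    private final long limit;

    WindowedLimiter(RateLimitStore store, long limit) {
        this.store = store;
        this.limit = limit;
    }

    boolean allow(String clientKey) {
        return store.incrementAndGet(clientKey, 60) <= limit;
    }
}

// Infrastructure layer: one possible store. A Redis- or SQL-backed store would
// implement the same interface in its own module without changing the core.
class InMemoryRateLimitStore implements RateLimitStore {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    @Override
    public long incrementAndGet(String key, long windowSeconds) {
        // Window expiry is omitted for brevity; a real store would track it.
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }
}
```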
Hi everyone! Let's talk about race condition testing 🚀

Taming Race Conditions: A Modern Approach with NATS, Redis, PostgreSQL & Testcontainers

Race conditions are the silent bugs of distributed systems. They hide in plain sight, often appearing only under high load or specific timing scenarios. Recently, I've been diving deep into testing these elusive issues in a stack combining NATS for messaging, Redis for caching/locking, and PostgreSQL as the source of truth.

The challenge? Reproducing concurrency issues reliably in a local environment without spinning up complex infrastructure manually. Enter Testcontainers. 🐳

By orchestrating ephemeral, real instances of NATS, Redis, and PostgreSQL directly within our test suite, we can:
✅ Simulate high-concurrency scenarios with precision.
✅ Test actual network latency and container startup order.
✅ Ensure our distributed locks (Redis) and transaction isolation levels (PostgreSQL) hold up under fire.
✅ Validate message ordering and at-least-once delivery guarantees in NATS.

Key takeaways from our journey:
- Realism matters: mocks often fail to capture the subtle timing nuances of real databases and message brokers. Testcontainers bridges this gap.
- Deterministic chaos: we use controlled delays and parallel workers in tests to force race conditions intentionally, verifying our idempotency and locking strategies.
- CI/CD integration: these tests run in our pipeline, ensuring no regression slips through when we tweak our concurrency logic.

Testing distributed systems is hard, but with the right tools, we can make race conditions visible before they hit production.

Has anyone else tackled race condition testing in a similar stack? I'd love to hear your war stories and strategies! 👇

#SoftwareTesting #DistributedSystems #NATS #Redis #PostgreSQL #Testcontainers #Go #TypeScript #DevOps #QualityAssurance #Engineering
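The hashtags suggest a Go/TypeScript stack; Testcontainers exposes the same approach in several languages, so here is a minimal JUnit 5 + Testcontainers-Java sketch of the pattern described above: start a real Redis container and hammer one counter from parallel workers to surface (or rule out) lost updates. The key name and worker counts are illustrative.

```java
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;
import redis.clients.jedis.Jedis;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import static org.junit.jupiter.api.Assertions.assertEquals;

@Testcontainers
class ConcurrentCounterTest {

    @Container
    static final GenericContainer<?> redis =
            new GenericContainer<>(DockerImageName.parse("redis:7.2")).withExposedPorts(6379);

    @Test
    void parallelIncrementsAreNotLost() throws InterruptedException {
        int workers = 50;
        int incrementsPerWorker = 200;
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Each worker opens its own connection and increments the same key.
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try (Jedis jedis = new Jedis(redis.getHost(), redis.getMappedPort(6379))) {
                    for (int j = 0; j < incrementsPerWorker; j++) {
                        jedis.incr("orders:processed");   // atomic on the server side
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        // A client-side read-modify-write would lose updates under this contention.
        try (Jedis jedis = new Jedis(redis.getHost(), redis.getMappedPort(6379))) {
            assertEquals(workers * incrementsPerWorker,
                    Long.parseLong(jedis.get("orders:processed")));
        }
    }
}
```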
I reduced API latency by 35% using Redis. But the interesting part wasn't the caching itself; it was the decisions around it.

Here's what I actually learned:

1. Choosing what to cache is harder than how to cache. Not every endpoint deserves a cache. I only cached data that was read frequently and changed rarely. Wrong caching = stale data in production.

2. Cache invalidation is the real problem. Redis TTL handles expiry. But what if data changes before the TTL expires? I had to think about the invalidation strategy before writing a single line of caching code.

3. Eviction policy matters more than memory size. I used allkeys-lru, so when Redis memory filled up, the least recently used keys were evicted automatically. Without this, Redis throws errors under memory pressure.

4. Redis is not just a cache. The same Redis instance in my system served three jobs:
→ Cache layer (API response caching)
→ Message broker (Celery async job queue)
→ Session store (user session data)
One tool, three completely different responsibilities.

Result: a 35% latency reduction on critical endpoints, without touching a single database query.

Stack: Redis · Django · Celery

#Redis #BackendEngineering #Python #SystemDesign #Django #Celery #SES #async
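Point 3 (the eviction policy) comes down to two Redis configuration directives. The memory limit below is illustrative, not taken from the post:

```conf
# redis.conf: cap memory and evict least-recently-used keys once the cap is hit
maxmemory 256mb
maxmemory-policy allkeys-lru
```

Without a maxmemory-policy that allows eviction, Redis falls back to rejecting writes once the memory limit is reached, which is the error behavior the post warns about.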
Good design - however, Redis can become a single point of failure for the entire system. It also adds infrastructure cost.