Dead Letter Queue (When Messages Keep Failing Silently)

---

Built:
A background system processing messages from a queue (orders, emails, events).

---

Problem I faced:
Everything worked fine… until some messages started failing. Then:
- The same message kept retrying
- Logs kept growing
- The queue got slower
- Some messages were never processed successfully

The worst part? Failures were getting buried in retries.

---

What was really happening:
Messages were failing repeatedly with no exit path. Every retry pushed them back into the queue, so they kept coming back… again and again. The system was stuck in a loop.

---

How I fixed it:
I introduced a Dead Letter Queue (DLQ). Instead of retrying forever:
- Set a max retry limit
- After the limit → move the message to the DLQ
- Log and monitor failed messages
- Add manual or automated reprocessing

Now:
- The queue stays clean
- Failures are isolated
- No infinite retry loops

---

What I learned:
Not every message should be retried forever. Some failures need attention — not repetition.

---

Simple mental model:
Think of a DLQ like a “quarantine zone”.
- Healthy messages → processed normally
- Problematic messages → isolated for inspection

---

Carousel Breakdown:
Slide 1 → Messages failing repeatedly
Slide 2 → Infinite retries
Slide 3 → Queue slowdown
Slide 4 → Introduce DLQ
Slide 5 → Move failed messages
Slide 6 → Inspect & reprocess

---

Question:
In your system, what happens to messages that keep failing… do they stop somewhere, or retry forever?

#Java #SpringBoot #Programming #SoftwareDevelopment #Cloud #AI #Coding #Learning #Tech #Technology #WebDevelopment #Microservices #API #Database #SpringFramework #Hibernate #MySQL #BackendDevelopment #CareerGrowth #ProfessionalDevelopment #RDBMS #PostgreSQL #backend
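For context, here is roughly what that setup looks like with RabbitMQ and Spring AMQP. This is a minimal sketch: the broker choice and the queue/exchange names are assumptions for illustration, not the configuration from the post.

```java
// Illustrative sketch: queue and exchange names are made up, not from the post.
import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.QueueBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DlqConfig {

    // Main work queue: rejected messages are re-routed to the dead letter
    // exchange instead of being requeued forever.
    @Bean
    public Queue ordersQueue() {
        return QueueBuilder.durable("orders.queue")
                .withArgument("x-dead-letter-exchange", "orders.dlx")
                .withArgument("x-dead-letter-routing-key", "orders.dlq")
                .build();
    }

    // The quarantine zone: failed messages land here for inspection
    // and manual or automated reprocessing.
    @Bean
    public Queue ordersDlq() {
        return QueueBuilder.durable("orders.dlq").build();
    }

    @Bean
    public DirectExchange deadLetterExchange() {
        return new DirectExchange("orders.dlx");
    }

    @Bean
    public Binding dlqBinding() {
        return BindingBuilder.bind(ordersDlq()).to(deadLetterExchange()).with("orders.dlq");
    }
}
```

The retry cap itself can come from listener configuration, for example spring.rabbitmq.listener.simple.retry.enabled=true with spring.rabbitmq.listener.simple.retry.max-attempts=3; once attempts are exhausted the message is rejected without requeue, and RabbitMQ routes it to orders.dlq via the x-dead-letter-exchange argument.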
Partial Failure (When Only Part of Your System Breaks)

---

Built:
A service that aggregates data from multiple services:
- User service
- Order service
- Recommendation service

All combined into one response.

---

Problem I faced:
Everything worked fine… until one dependency started failing. Then:
- The entire API failed
- Even though the other services were working
- Users saw errors for everything

One small failure took down the whole response.

---

What was really happening:
This was a partial failure. Only one service failed… but the system treated it like a full failure.
- No isolation
- No fallback
- No graceful handling

---

How I fixed it:
Instead of failing everything:
- Added fallback responses for optional services
- Marked some data as non-critical
- Used timeouts + circuit breakers
- Returned partial responses where possible

Now:
- Core data always loads
- Optional features degrade gracefully
- The system stays usable even during failures

---

What I learned:
In distributed systems, failure is normal. The goal is not to avoid failure. It’s to limit its impact.

---

Simple mental model:
If one feature breaks, the whole app shouldn’t feel broken.

---

Carousel Breakdown:
Slide 1 → One service fails
Slide 2 → Entire API fails
Slide 3 → Identify partial failure
Slide 4 → Add fallbacks
Slide 5 → Return partial response
Slide 6 → System stays usable

---

Question:
If one dependency in your system goes down, does your API fail completely… or degrade gracefully?

#Java #SpringBoot #Programming #SoftwareDevelopment #Cloud #AI #Coding #Learning #Tech #Technology #WebDevelopment #Microservices #API #Database #SpringFramework #Hibernate #MySQL #BackendDevelopment #CareerGrowth #ProfessionalDevelopment #RDBMS #PostgreSQL #backend
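As a rough illustration of the partial-response idea, here is a minimal Java sketch using CompletableFuture. The downstream calls, the 300 ms budget, and the empty-list fallback are assumptions for the example, not the post's actual code.

```java
// Illustrative sketch: service calls, timeout, and fallbacks are assumptions.
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class ProfileAggregator {

    public Map<String, Object> aggregate(String userId) {
        // Core data: if this call fails, the request genuinely fails.
        CompletableFuture<UserData> user =
                CompletableFuture.supplyAsync(() -> fetchUser(userId));

        // Optional data: time-boxed, with an empty-list fallback so a slow or
        // broken recommendation service cannot take down the whole response.
        CompletableFuture<List<String>> recommendations =
                CompletableFuture.supplyAsync(() -> fetchRecommendations(userId))
                        .completeOnTimeout(List.of(), 300, TimeUnit.MILLISECONDS)
                        .exceptionally(ex -> List.of()); // degrade, don't fail

        return Map.of(
                "user", user.join(),                       // required
                "recommendations", recommendations.join()  // best effort
        );
    }

    // Placeholders for the real downstream calls.
    private UserData fetchUser(String userId) { return new UserData(userId, "demo"); }
    private List<String> fetchRecommendations(String userId) { return List.of(); }

    record UserData(String id, String name) {}
}
```

Which fields count as core versus optional is a product decision; the pattern is simply that every optional branch gets a time budget and a safe default.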
Sometimes your system isn’t slow because of heavy logic. It’s slow because it’s waiting.

Waiting for:
- another service
- a database
- an external API

And while it waits, threads just sit there doing nothing.

---

This is where async processing helps. The idea is simple: don’t block. Do the work later.

---

What this looks like:
Instead of doing everything in one request:
- User places an order
- System saves the order immediately
- Email is sent later
- Notification is processed in the background

The user doesn’t wait for everything.

---

How it’s usually done:
- Background jobs
- Message queues (Kafka, RabbitMQ)
- @Async in Spring Boot

You move non-critical work out of the main flow.

---

Why this matters:
Without async:
- Requests take longer
- Threads stay blocked
- The system struggles under load

With async:
- Faster response times
- Better scalability
- Smoother user experience

---

Real-world example:
When you upload a file:
- You don’t wait for processing
- You get a response quickly
- Processing happens in the background

---

Trade-offs:
Async adds complexity:
- Harder to debug
- Requires retry handling
- Failures are not immediate

---

Simple takeaway: not everything needs to happen right now.

---

If your system is slow, how much of that work actually needs to be done synchronously?

#Java #SpringBoot #Programming #SoftwareDevelopment #Cloud #AI #Coding #Learning #Tech #Technology #WebDevelopment #Microservices #API #Database #SpringFramework #Hibernate #MySQL #BackendDevelopment #CareerGrowth #ProfessionalDevelopment #RDBMS #PostgreSQL #backend
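In Spring Boot terms, the order example might look like the sketch below. The class and method names are invented, and a real setup would usually also configure a dedicated task executor.

```java
// Illustrative sketch: names are made up; the pattern is @Async in Spring Boot.
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync // without this, @Async methods run synchronously
class AsyncConfig {}

@Service
class EmailService {

    // Runs on a separate thread pool; the caller returns immediately.
    @Async
    public void sendConfirmation(String orderId) {
        // slow SMTP call happens here, after the HTTP response is already sent
    }
}

@Service
class OrderService {

    private final EmailService emailService;

    OrderService(EmailService emailService) {
        this.emailService = emailService;
    }

    public void placeOrder(String orderId) {
        saveOrder(orderId);                      // critical: the user waits for this
        emailService.sendConfirmation(orderId);  // non-critical: fire and forget
    }

    private void saveOrder(String orderId) {
        // synchronous DB write
    }
}
```

One subtlety: with the default proxy mode, calling an @Async method from inside the same class bypasses the Spring proxy and runs synchronously, which is why sendConfirmation lives in a separate bean here.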
Ever had a system challenge you not because of faulty logic, but because of the way it was built to handle scale?

I recently ran into a few interesting issues while developing a new data sourcing job. I was working with a cursor-based paginated API that returned data in batches of 500 records per request, the final combined response was huge, and the learnings were too valuable not to share.

🔴 𝗣𝗿𝗼𝗯𝗹𝗲𝗺 𝟭: 𝗢𝘂𝘁𝗢𝗳𝗠𝗲𝗺𝗼𝗿𝘆𝗘𝗿𝗿𝗼𝗿 𝗳𝗿𝗼𝗺 𝗦𝘁𝗿𝗶𝗻𝗴𝗕𝘂𝗶𝗹𝗱𝗲𝗿
The application crashed with: java.lang.OutOfMemoryError at AbstractStringBuilder.hugeCapacity()
👉 Root cause: The final response needed a JSON transformation that appended "\n" after every object, which accumulated gigabytes in memory.
✅ Fix: Switched to chunk-based processing (~1 GB chunks) instead of accumulating everything in memory, writing each chunk into the lake through the Azure upload process.

🟠 𝗣𝗿𝗼𝗯𝗹𝗲𝗺 𝟮: 𝗚𝗖 𝗢𝘃𝗲𝗿𝗵𝗲𝗮𝗱 𝗟𝗶𝗺𝗶𝘁 𝗘𝘅𝗰𝗲𝗲𝗱𝗲𝗱
Frequent GC pauses due to large in-memory lists holding paginated API data.
👉 Root cause: Storing the entire dataset in lists before DB insertion.
✅ Fix: Inserted data in batches and cleared the lists after each batch, reducing memory pressure significantly.

🔵 𝗣𝗿𝗼𝗯𝗹𝗲𝗺 𝟯: 𝗣𝗿𝗲𝗺𝗮𝘁𝘂𝗿𝗲 𝗔𝗣𝗜 𝗖𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻 𝗖𝗹𝗼𝘀𝘂𝗿𝗲
Incomplete responses due to connection drops, which caused JSON mapping into our data model to fail.
👉 Root cause: The default HTTP socket timeout was too low for large payloads.
✅ Fix: Increased the socket timeout to match the workload, ensuring full data retrieval.

💡 Key Takeaways:
- Never build massive objects in memory — stream or chunk your data.
- Always release memory proactively in long-running processes.
- Tune timeouts and resource configs based on real workload, not defaults.

These issues reinforced an important lesson:
👉 Efficient memory and resource management is as critical as writing correct logic, and even the basics of CSE101 come into the picture.

#systemdesign #scaling #java #api #development #cloud #datalake
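To make the first fix concrete, here is a minimal sketch of the chunked approach: each page of records is streamed out with its trailing newline and then released, instead of growing one giant StringBuilder. The Writer target stands in for the actual Azure lake upload stream, and the names are illustrative.

```java
// Illustrative sketch: the Writer is a stand-in for the real upload stream.
import java.io.IOException;
import java.io.Writer;
import java.util.List;

public class NdjsonChunkWriter {

    // Writes one page of records (e.g. 500 per API call) and flushes it
    // downstream, so memory usage stays bounded by the page size rather
    // than by the full dataset.
    public void writePage(Writer lakeUpload, List<String> jsonRecords) throws IOException {
        for (String json : jsonRecords) {
            lakeUpload.write(json);
            lakeUpload.write('\n'); // newline-delimited JSON, one object per line
        }
        lakeUpload.flush(); // push this chunk out before fetching the next page
    }
}
```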
𝗗𝗲𝗲𝗽-𝗱𝗶𝘃𝗶𝗻𝗴 𝗶𝗻𝘁𝗼 𝘁𝗵𝗲 𝗘𝗹𝗮𝘀𝘁𝗶𝗰𝘀𝗲𝗮𝗿𝗰𝗵 𝗖𝗼𝗿𝗲: 𝗔𝗱𝗱𝗿𝗲𝘀𝘀𝗶𝗻𝗴 𝗙𝗶𝗲𝗹𝗱 𝗖𝗮𝗽 𝗜𝗻𝗰𝗼𝗻𝘀𝗶𝘀𝘁𝗲𝗻𝗰𝗶𝗲𝘀

I’ve just submitted a Pull Request (#146105) to 𝗘𝗹𝗮𝘀𝘁𝗶𝗰𝘀𝗲𝗮𝗿𝗰𝗵 to address a nuanced bug (#109797) in the 𝗙𝗶𝗲𝗹𝗱 𝗖𝗮𝗽𝗮𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀 𝗔𝗣𝗜.

𝗧𝗵𝗲 𝗣𝗿𝗼𝗯𝗹𝗲𝗺:
When filtering by type, the API response could leak parent object fields, and when an alias points to multiple indices, "unmapped" states from one index could overshadow valid mappings in another.

𝗧𝗵𝗲 𝗙𝗶𝘅 (𝗖𝘂𝗿𝗿𝗲𝗻𝘁𝗹𝘆 𝘂𝗻𝗱𝗲𝗿 𝗿𝗲𝘃𝗶𝗲𝘄):
I’m proposing an update to the mapping coordination logic to strictly enforce type parameters, so the response is atomic: if you ask for a keyword, you get only keywords, with no leaky parent objects, and the field type remains consistent across all indices behind an alias.

𝗪𝗵𝘆 𝘁𝗵𝗶𝘀 𝗶𝘀 𝗮 𝗳𝘂𝗻 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲:
𝗗𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗲𝗱 𝗖𝗼𝗼𝗿𝗱𝗶𝗻𝗮𝘁𝗶𝗼𝗻: Merging responses from multiple shards requires absolute precision.
𝗦𝗰𝗮𝗹𝗲: In a system used by millions, a "small" API inconsistency can have massive ripple effects on data integrity.

It’s been a great experience digging into TransportFieldCapsAction and seeing how the Elastic team manages such a complex Java codebase. Looking forward to the review process!

𝗖𝗵𝗲𝗰𝗸 𝗼𝘂𝘁 𝘁𝗵𝗲 𝗣𝗥 𝗵𝗲𝗿𝗲: https://lnkd.in/g2AnUPH6

Special thanks to the Elastic team for the great codebase.

#Java #Elasticsearch #OpenSource #Backend #DistributedSystems #SoftwareEngineering #BuildInPublic
Part 1: Architecture & Real-World System Design

Modern backend systems don’t break because of scale alone — they break due to complexity. In a recent redesign, the focus was on simplifying the handling of large, dynamic form data while improving performance, maintainability, and the developer experience.

📊 The shift:
🔹 From rigid column-based schema → flexible JSONB-based storage
🔹 From heavy raw SQL → clean ORM-driven queries
🔹 From scattered APIs → structured, minimal endpoints

⚙️ Architecture Improvements
✔️ Modular design using separate Django applications
✔️ Class-based views for reusable and maintainable logic
✔️ API structuring using Django Ninja Router
✔️ Reduced the number of APIs by consolidating responses
✔️ Strong alignment with frontend for payload and contract design

📦 Data Handling Strategy
Instead of creating hundreds of columns for dynamic forms:
→ Stored complete form responses as JSON objects
→ Handled 300–500+ fields without schema changes
→ Simplified debugging with structured payloads
→ Enabled faster iteration without production risks

🔄 Processing Flow
User Input → API Validation → Store JSON (status = 0) → Async Processing (Celery + Redis) → Update status = 1 → Dashboard reflects real-time updates

🚀 Outcome
✔️ Reduced schema complexity
✔️ Improved API performance
✔️ Avoided production issues caused by raw queries
✔️ Built a scalable and flexible backend system
✔️ Delivered smoother frontend-backend integration

Security is handled via JWT-based authentication with a proper token flow. Still evolving with improvements in performance, validation, and system design.

#BackendEngineering #Django #Python #SystemDesign #PostgreSQL #APIs #Celery #Redis #JWT
🚀 Deep Internal Flow of a REST API Call in Spring Boot

🧭 1. Entry Point — The Gatekeeper
DispatcherServlet is the front controller. Every HTTP request must pass through this single door.
FLOW: Client → Tomcat (Embedded Server) → DispatcherServlet

🗺️ 2. Handler Mapping — Finding the Target
DispatcherServlet asks: “Who can handle this request?” It consults:
* RequestMappingHandlerMapping
which scans for:
* @RestController
* @RequestMapping
FLOW: DispatcherServlet → HandlerMapping → Controller Method Found

⚙️ 3. Handler Adapter — Executing the Method
Once the method is found, Spring doesn’t call it directly. It uses:
* RequestMappingHandlerAdapter
Why? Because it handles:
* Parameter binding
* Validation
* Conversion
FLOW: HandlerMapping → HandlerAdapter → Controller Method Invocation

🧭 4. Request Flow (Forward):
Controller → Service Layer (business logic) → Repository Layer → Database

🔄 5. Response Processing — The Return Journey
Now the response travels back upward:
Repository → Service → Controller → DispatcherServlet → Tomcat → Client

————————————————

⚡ Hidden Magic (Senior-Level Insights)

🧵 Thread Handling
* Each request runs on a separate thread from Tomcat’s pool

🔒 Transaction Management
* Managed via @Transactional
* Proxy-based AOP behind the scenes

🎯 Dependency Injection
* Beans wired by the Spring IoC container

🧠 AOP (Cross-Cutting)
* Logging, security, transactions wrapped around methods

⚡ Performance Layers
* Caching (Spring Cache)
* Connection pooling (HikariCP)

————————————————

🧠 The Real Insight
As a junior I thought:
👉 “An API call hits the controller”
As a senior I observe:
👉 “A chain of abstractions collaborates through well-defined contracts under the orchestration of DispatcherServlet”

#Java #SpringBoot #RestApi #FullStack #Developer #AI #ML #Foundations #Security
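To anchor steps 2 and 3, here is the kind of controller that RequestMappingHandlerMapping discovers and RequestMappingHandlerAdapter invokes. A minimal sketch; the endpoint, DTO, and service are invented for illustration.

```java
// Illustrative sketch: endpoint, DTO, and service names are made up.
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController                  // scanned by RequestMappingHandlerMapping at startup
@RequestMapping("/api/users")
class UserController {

    private final UserService userService;

    UserController(UserService userService) { // wired by the Spring IoC container
        this.userService = userService;
    }

    @GetMapping("/{id}")
    UserDto getUser(@PathVariable long id) {  // binding/conversion by the HandlerAdapter
        return userService.findUser(id);      // forward flow: controller → service → repository
    }
}

@Service
class UserService {
    UserDto findUser(long id) {
        return new UserDto(id, "demo");       // stand-in for repository + database access
    }
}

record UserDto(long id, String name) {}       // serialized to JSON on the return journey
```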
Timeouts (The Small Setting That Saves Your System)

---

Built:
A service calling multiple downstream APIs to fetch and aggregate data.

---

Problem I faced:
Everything worked fine… until one dependency slowed down. Then suddenly:
- Requests started hanging
- The thread pool got exhausted
- API response time shot up
- The entire service became slow

All because one service was taking too long.

---

How I fixed it:
The issue was missing timeouts. Requests were waiting indefinitely.

Fixes applied:
- Added strict timeouts for all external calls
- Used fallback responses where possible
- Combined with a circuit breaker for failing services
- Monitored slow calls with proper logging

Now:
- Slow services don’t block everything
- The system fails fast instead of hanging
- Overall stability improved

---

What I learned:
A slow dependency is sometimes worse than a failed one. At least failures are quick. Slow calls quietly kill your system.

---

Question:
Do your API calls have proper timeouts… or are they waiting forever without you noticing?

#Java #SpringBoot #Programming #SoftwareDevelopment #Cloud #AI #Coding #Learning #Tech #Technology #WebDevelopment #Microservices #API #Database #SpringFramework #Hibernate #MySQL #BackendDevelopment #CareerGrowth #ProfessionalDevelopment #RDBMS #PostgreSQL #backend
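For reference, here is what explicit timeouts look like with the JDK's built-in HttpClient; the post doesn't name its HTTP client, so this is a sketch of the idea, not the author's code. The URL and the 2 s / 3 s limits are arbitrary examples.

```java
// Illustrative sketch: URL and timeout values are arbitrary assumptions.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

public class TimeoutDemo {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2)) // fail fast if we can't even connect
                .build();

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://downstream.example.com/data")) // hypothetical dependency
                .timeout(Duration.ofSeconds(3))        // cap the whole exchange
                .GET()
                .build();

        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("status: " + response.statusCode());
        } catch (HttpTimeoutException e) {
            // fail fast: return a fallback instead of letting a request thread hang
            System.out.println("downstream too slow, using fallback");
        }
    }
}
```

The same idea applies with RestTemplate, WebClient, or Feign: every outbound call gets an explicit connect and read limit instead of whatever the library defaults to.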
Building scalable systems is challenging, but testing those limits is the best way to learn.

Over the past couple of weeks during my SIWES, I decided to dive deep into systems architecture, security, and high-concurrency environments. To bridge the gap between my current Python/Django expertise and my company’s C#/ASP.NET stack, I built a high-concurrency event ticketing backend, essentially a mini-Ticketmaster! 🎟️

Here is what I engineered:
🔹 Concurrency Safety: Solved the dreaded "double-booking" race condition using PostgreSQL pessimistic locking (select_for_update()).
🔹 Read-Heavy Optimization: Implemented Redis caching to shield the DB from traffic spikes, paired with strict cache invalidation to keep data accurate.
🔹 Asynchronous Processing: Decoupled slow processes using Celery & message brokers so the API stays lightning-fast while emails queue in the background.
🔹 API Defense: Built strict throttling/rate limiting to block scalper bots from spamming the purchase endpoints.
🔹 Containerization: Orchestrated the entire multi-server architecture with Docker and docker-compose.

I also spent time deploying a CRUD blog app to Azure and building a new portfolio using HTML/CSS/Python.

Next up: taking these exact same architectural concepts (caching, locking, rate limiting, and containerization) and translating them into ASP.NET.

#SoftwareEngineering #BackendDevelopment #Django #Redis #Docker #PostgreSQL #ASPNET #SIWES #TechJourney
Built a production-grade backend from scratch — here's what I learned.

TaskAlloc is an employee and task allocation REST API I built with FastAPI and PostgreSQL. Not a tutorial follow-along — I designed the architecture, made the decisions, and figured out why things break.

What's under the hood:
→ 3-tier role system (Admin / Manager / Employee) with access enforced at the query layer — not just filtered in the response
→ JWT auth with refresh token rotation. Raw tokens never touch the database, only SHA-256 hashes are stored. If the DB leaks, the tokens are useless.
→ Task state machine — PENDING → IN_PROGRESS → UNDER_REVIEW → COMPLETED. Invalid transitions are rejected before any database write.
→ Middleware that auto-logs every mutating request with who did it, what resource they touched, and the HTTP status code
→ 67 passing tests against in-memory SQLite. No external database needed to run the suite.

35+ endpoints. Soft delete. UUID primary keys. Docker + Docker Compose. Full Swagger docs.

The thing that surprised me most was how much I learned from just trying to do things the right way — not "make it work" but "make it work correctly." Things like why audit logs shouldn't have a foreign key to users, or why you write the activity log before the status update commits.

GitHub in the comments.

#FastAPI #Python #BackendDevelopment #PostgreSQL #SoftwareEngineering #BuildingInPublic #OpenToOpportunities #Development
Day 28. I fixed the N+1 problem. Or at least… I thought I did.

I had this:

```java
@Entity
public class User {
    @OneToMany(mappedBy = "user", fetch = FetchType.LAZY)
    private List<Order> orders;
}
```

And I was careful. I wasn't using EAGER. I avoided obvious mistakes. Still… something felt off. The API was slow. The query count was high.

That's when I checked the logs. And saw this:
→ 1 query to fetch users
→ N queries to fetch orders

Again. Even after "fixing" it.

Here's what was actually happening. I was mapping entities to DTOs like this:

```java
users.stream()
    .map(user -> new UserDTO(
        user.getId(),
        user.getName(),
        user.getOrders().size() // 👈 triggers lazy load per user
    ))
    .toList();
```

Looks harmless. But:
user.getOrders() → triggers lazy loading → inside a loop → causing N+1 again

That's when it clicked. N+1 isn't just about fetch type. It's about when and where you access relationships.

So I changed it. (see implementation below 👇)

What I learned:
→ LAZY doesn't mean safe
→ DTO mapping can silently trigger queries
→ N+1 often hides in transformation layers

The hard truth:
→ You think you fixed it
→ But it comes back in a different place

Writing queries is easy. Controlling when data is accessed is what makes your backend scalable.

Have you ever fixed N+1… and then seen it come back somewhere else? 👇 Drop your experience

#SpringBoot #Java #Hibernate #BackendDevelopment #Performance #JavaDeveloper
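The implementation referenced above isn't part of this capture, so here is one common way this particular N+1 gets resolved, as a hedged sketch rather than the author's actual code: fetch the association in the same query with a fetch join, so the DTO mapping never triggers lazy loads. The repository and method names are invented.

```java
// Illustrative sketch: repository and method names are assumptions.
import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

public interface UserRepository extends JpaRepository<User, Long> {

    // One query instead of 1 + N: users and their orders are loaded together,
    // so getOrders() is already initialized when the DTO mapping runs.
    @Query("select distinct u from User u left join fetch u.orders")
    List<User> findAllWithOrders();
}
```

If only the count is needed, a constructor-expression projection such as `select new com.example.UserDTO(u.id, u.name, size(u.orders)) from User u` avoids loading the orders at all; in Hibernate, JPQL's size() typically translates to a count subquery.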