One fine morning, a customer reported: “File upload sometimes fails…”

Not always. Not consistently. Just sometimes. 😄 And of course, those are the best bugs.

👉 System handles 1000+ uploads daily
👉 Issue happens randomly (10–20 times)
👉 Chunk upload + merge logic (unchanged for years)
👉 Stateless architecture (or so I thought…)

I jumped into debugging mode. After hours of checking:
✅ NFS configs
✅ Multi-server behavior
✅ Retry logic
✅ Logs (100 times)

Observation: chunks uploaded from Server A were not visible on Server B immediately (10–15 sec delay). Confusion level: 🔥🔥🔥

Then I did something simple (and often ignored)…
👉 Compared old vs new code

Guess what changed? Just one line removed (thanks to Sonar cleanup 😅):

HttpSession session = request.getSession();

And that innocent line was silently adding a JSESSIONID cookie, making requests sticky and hiding the real problem all along.

💡 So for years, reality was something like this: a stateless system… except when the upload API enters the chat 😄 Or simply: stateless most of the time, secretly stateful during uploads 🎭

And the moment I removed an “unused variable”…
💥 Load balancing started behaving correctly
💥 NFS delays became visible
💥 The hidden dependency got exposed
💥 The bug said: Hello 👋 I was always here

And the best realization:
👉 My application is perfectly stateless…
👉 Until the user hits the upload API and boom, it becomes emotional (stateful) 🤣🤣🤣

Lesson learned: sometimes the bug is not in new code… it’s in removing the wrong old code 😄 And sometimes… your system isn’t broken, your assumptions are.

Still one mystery remains:
👉 Why exactly NFS behaved that way (never got a perfect answer 😅)

#BackendStories #ProductionIssues #Java #NFS
Stateless System with a Hidden Stateful Secret
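To see why that one line mattered: request.getSession() creates a session if none exists, which makes the container set a JSESSIONID cookie, and cookie-based session affinity at the load balancer then pins the client to one server. Here is a toy model of that effect (my own illustration, not the production code or the real servlet API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a round-robin load balancer that becomes sticky once the
// client carries a session cookie. Server-side, request.getSession() is
// what first issues that cookie.
class StickyLoadBalancer {
    private final List<String> servers;
    private final Map<String, String> sessionToServer = new HashMap<>();
    private int next = 0;

    StickyLoadBalancer(List<String> servers) {
        this.servers = servers;
    }

    // sessionId is null when the client has no JSESSIONID cookie yet
    String route(String sessionId) {
        if (sessionId != null) {
            // Session affinity: pin this session to one server
            return sessionToServer.computeIfAbsent(sessionId, id -> pickNext());
        }
        // Truly stateless: plain round-robin, chunks spread across servers
        return pickNext();
    }

    private String pickNext() {
        String server = servers.get(next);
        next = (next + 1) % servers.size();
        return server;
    }
}
```

Without the cookie, consecutive chunk uploads land on different servers, where the NFS propagation delay becomes visible; with it, every chunk goes to the same server, hiding the delay entirely.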
More Relevant Posts
A recent issue reminded me that performance optimizations can sometimes become production problems.

We had an API that:
1️⃣ Fetches initial details
2️⃣ Extracts IDs from the response
3️⃣ Makes another database call to fetch larger secondary data

To speed up step 3, parallel processing was introduced using a fixed thread pool. Sounds reasonable, until load testing began.

Under heavy traffic, thread creation kept increasing across instances until limits were hit, leading to:
⚠️ "Can't create new native thread"

The interesting part? The optimization worked for individual requests. But at scale, the resource model didn’t. A request with a small number of IDs didn’t always need dedicated worker threads, yet threads were still being allocated repeatedly under concurrent load.

The fix was moving to a shared/reusable thread pool model with better resource control.

💡 My takeaway: code that is fast in isolation may fail under concurrency. When designing for performance, it’s important to ask:
- How does this behave at 1 request?
- How does this behave at 1000 requests?
- What resources grow with traffic?

Scalability is often less about speed, more about control.

#BackendEngineering #Java #PerformanceTesting #Scalability #Concurrency
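The shared-pool fix can be sketched like this (class names, pool size, and the stand-in fetch are illustrative, not from the actual system):

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.stream.Collectors;

// One bounded pool shared by all requests, instead of a new pool per request.
class SecondaryDataFetcher {
    // Shared and bounded: total threads stay constant no matter how many requests arrive
    private static final ExecutorService SHARED_POOL = Executors.newFixedThreadPool(8);

    // BAD (per request): ExecutorService pool = Executors.newFixedThreadPool(ids.size());
    // Under concurrent load this keeps allocating native threads until the OS refuses.

    static List<String> fetchAll(List<Integer> ids) {
        List<Callable<String>> tasks = ids.stream()
                .map(id -> (Callable<String>) () -> fetchOne(id))
                .collect(Collectors.toList());
        try {
            List<Future<String>> futures = SHARED_POOL.invokeAll(tasks);
            List<String> results = new java.util.ArrayList<>();
            for (Future<String> f : futures) results.add(f.get());
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    private static String fetchOne(int id) {
        return "record-" + id; // stand-in for the real secondary-data DB call
    }
}
```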
Day 2 — The bug that had no errors

Yesterday: assumptions broke production. Today’s lesson was worse.

The API didn’t crash. It returned a 200 OK. But the data was wrong.

No exceptions. No alerts. Nothing in monitoring. Just silent failure.

The only thing that helped? Logs. We added detailed logs for:
• Request input
• Intermediate steps
• Final response

And suddenly, the issue was obvious.

👉 Lesson: the most dangerous bugs don’t throw errors. They silently return wrong data. If your logs can’t explain the flow, you’re debugging blind.

Tomorrow: the logging mistake most developers make.

#Backend #Java #Debugging #SoftwareEngineering #LearningInPublic
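The three logging points can be sketched like this (a toy service; the names and pricing logic are made up for illustration, not from the post):

```java
import java.util.logging.Logger;

// Log the input, each intermediate step, and the final response,
// so a wrong-but-200 result still leaves a trail to follow.
class PriceService {
    private static final Logger LOG = Logger.getLogger(PriceService.class.getName());

    static int finalPrice(int basePrice, int discountPercent) {
        LOG.info(() -> "input: basePrice=" + basePrice + " discountPercent=" + discountPercent);
        int discount = basePrice * discountPercent / 100;
        LOG.info(() -> "intermediate: discount=" + discount);
        int result = basePrice - discount;
        LOG.info(() -> "response: finalPrice=" + result);
        return result;
    }
}
```

If the response is wrong, the intermediate log line tells you which step broke, without any exception ever being thrown.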
💥 Getting 405 Method Not Allowed in API? Here’s why 👇

I hit this error while calling my API and it took time to figure out 😤
👉 Status Code: 405 (Method Not Allowed)

🔍 The Problem: the API endpoint exists… but the HTTP method is wrong.

✅ The Fix:
✔️ Check your controller method:

[HttpPost]
public IActionResult SaveData()
{
    // code
}

👉 If your API is POST and you call it with GET → ❌ 405 error

✔️ Match the method in Postman:
GET → [HttpGet]
POST → [HttpPost]
PUT → [HttpPut]
DELETE → [HttpDelete]

✔️ Also check the route:
[Route("api/[controller]")]

⚡ Pro Tip: always read the error code carefully: HTTP status codes tell you the exact problem 💡

💬 Have you faced this error before? Let’s discuss 👇
🔖 Save this post for future debugging!

#dotnet #api #developer #coding #debugging #tricks
HTTP 429 isn’t an error. It’s a decision. And I built the system that makes that decision.

Most developers have hit rate limits. Very few understand how they actually work under the hood. So I built a production-grade API Rate Limiter from scratch — not a clone, not a tutorial.

➣ What it does
Controls how many requests a client can make within a time window:
• Within limits → HTTP 200 ✅
• Cross the limit → HTTP 429 🚫

This is what protects real APIs from:
✓ bot traffic
✓ abuse
✓ infrastructure overload

➣ 3 algorithms. 3 different trade-offs.
• Token Bucket → absorbs burst traffic (user-facing APIs)
• Sliding Window → fair distribution, no boundary exploits
• Leaky Bucket → strict constant rate (payments, critical systems)
👉 Switch between them LIVE — no restart, no downtime.

➣ Where theory meets reality
1. Race condition: two requests see “1 token left” → both pass → fixed using serialized writes + DB transactions
2. Write lock contention: high traffic = silent failures → fixed with retry logic + scoped transactions
3. Runtime algorithm switching: changing logic without breaking user state or API keys → required careful state isolation

🛠️ Stack: Python · FastAPI · SQLite · Vanilla JS · Chart.js
🔗 Live Demo: https://lnkd.in/dmK_WF6V
💻 GitHub: https://lnkd.in/dnK5AAPZ

Built this from scratch to understand how production systems think about traffic control. 🚀 Would really appreciate feedback — especially from engineers who've worked on distributed systems or high-traffic APIs.

#BackendEngineering #Python #FastAPI #SystemDesign #SoftwareEngineering #Backend
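For readers curious what one of these algorithms looks like, here is a minimal Token Bucket sketch with an injected clock (my own illustration; the project itself is Python/FastAPI with DB-backed state, and its real code is in the repo above):

```java
import java.util.function.LongSupplier;

// Token Bucket: the bucket refills at a fixed rate up to a capacity, and each
// request spends one token. Bursts up to `capacity` pass; sustained traffic
// is capped at `refillPerSecond`.
class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier clockMillis; // injected so tests are deterministic
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier clockMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.clockMillis = clockMillis;
        this.tokens = capacity;          // start full: allows an initial burst
        this.lastRefill = clockMillis.getAsLong();
    }

    // true → serve the request (HTTP 200), false → reject it (HTTP 429)
    synchronized boolean tryAcquire() {
        long now = clockMillis.getAsLong();
        tokens = Math.min(capacity, tokens + (now - lastRefill) / 1000.0 * refillPerSecond);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

The `synchronized` keyword is the single-process version of the race-condition fix described in the post; across processes you need serialized writes at the database, as the author did.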
Voice support and human-in-the-loop: these features are tremendous! HITL is mandatory in an agentic infrastructure; in many cases, a fully autonomous agent cannot be deployed. Voice support, on the other hand… I love letting my agent do my work just by talking to it. Call me lazy 😉
What's dropped recently in kagent 👇

🎙️ Voice Support
Voice input/output support was added across all agents.

🧑💻 Human-in-the-Loop
Two distinct HITL modes were shipped back-to-back:
1. Confirmation mode: agents can pause and ask the user to confirm before taking an action.
2. User input mode: agents can pause mid-task and prompt the user for freeform input.

🧠 Long-Term Memory Store
A memory store was added, enabling agents to retain and access long-term memory across sessions.

📝 Built-in Prompts, Prompt Templates & Context Management
Configurable context management was added for agents, alongside support for built-in prompts and prompt templates, making it easier to reuse and standardize instructions.

⚙️ Go/Python Runtime Selection + Go Module Refactoring
A runtime field was added to the Declarative Agent CRD, allowing users to choose between Go and Python runtimes.

📦 Git-Based Skill Fetching
Agents can now pull skills from Git repositories, with shared auth support and a lightweight init image.

📡 Distributed Tracing + A2A Trace Propagation
The controller was instrumented with distributed tracing, including propagation of traces across agent-to-agent (A2A) calls.

🗄️ Postgres Support + Security Hardening
A `--postgres-database-url-file` flag was added for file-based DB credential injection. A Postgres variant was added to the e2e CI test matrix via a matrix strategy.

🔍 Dynamic Provider/Model Discovery in UI
The UI now dynamically discovers available providers and models, rather than requiring a hardcoded list.

🔐 API Key Passthrough
API keys can now be passed through directly in `ModelConfig`, and a `--token` flag was added to `kagent invoke` for the same purpose from the CLI.

🌐 Global Default Service Account for Agent Deployments
A global default serviceAccountName can now be set for all agent deployments, reducing per-agent boilerplate.

#agenticai #KubeCon #CloudNativeCon
🚀 I’ve been thinking about thread usage in backend services recently…

Most of our services do a lot of:
• DB calls
• S3 reads
• External API calls

Basically… a lot of waiting.

Traditionally, we use thread pools. It works fine, but here’s what I’ve noticed: when a thread makes an API or DB call, it just sits there waiting. It’s not doing any work… but still consuming memory.

At scale: more requests → more threads → more pods → more cost.

To solve this, we used patterns like DeferredResult in Spring. The idea was simple:
👉 Don’t block the request thread
👉 Process in the background
👉 Send the response later

It works… but honestly:
• More complex code
• Harder to debug
• Still need to manage thread pools internally

Recently, I started exploring virtual threads (Java 21), and it feels much simpler. You just write normal code:
• Call the API
• Save to the DB
• Return the response

Even if it blocks, virtual threads handle it efficiently. For use cases like vendor API calls, reading from S3, or writing to the DB, virtual threads seem like a natural fit.

From what I understand so far:
• Thread pools → limited + need tuning
• DeferredResult → non-blocking but adds complexity
• Virtual threads → simple + scalable

And better resource usage usually means lower infra cost.

Still exploring edge cases (CPU-heavy tasks, locking, etc.), but for typical backend services, this feels like a solid improvement.

💬 Curious if anyone has replaced DeferredResult / async handling with virtual threads in production? How was your experience?
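The "just write normal blocking code" style looks like this (a minimal sketch requiring Java 21+; the names and the simulated fetch are illustrative):

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.*;

// Each blocking task gets its own cheap virtual thread, so plain blocking
// code scales without tuning a platform thread pool.
class VirtualThreadDemo {
    static List<String> fetchAll(List<String> urls) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Callable<String>> tasks = urls.stream()
                    .map(url -> (Callable<String>) () -> blockingFetch(url))
                    .toList();
            List<String> results = new java.util.ArrayList<>();
            for (Future<String> f : executor.invokeAll(tasks)) results.add(f.get());
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    private static String blockingFetch(String url) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(10)); // stand-in for a vendor API / S3 / DB call
        return "response-from-" + url;
    }
}
```

Note the contrast with DeferredResult: there is no callback handoff here, the code blocks in `blockingFetch`, and the JVM parks the virtual thread instead of wasting an OS thread.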
I was building filtering for financial records in my backend. Date range. Category. Amount range. User scope. All optional. All combinable.

I started with hardcoded query logic using if-else conditions for different filter cases. It got messy fast. Every new filter meant rewriting existing logic. At one point, the queries looked like they were never meant to be read again.

So I scrapped it.

I implemented the Specification pattern using Spring Data JPA. Each filter became an isolated, composable predicate. At runtime, only the active ones combine into a single query. No hardcoding. No duplication.

Small change in approach. Big impact on scalability and future scope. Now, adding a new filter is just one addition. Existing logic doesn't change.

This is the Open/Closed principle from SOLID in practice: open for extension, closed for modification. Each Specification also owns exactly one filter concern. Single Responsibility, naturally enforced.

The filtering layer went from something I avoided touching to something I can extend confidently, without regression risk.

Interesting how backend complexity shifts as systems grow: performance → security → maintainability. This was firmly the third.

#Backend #Java #Maintainability #SOLID #LearningInPublic #SWE
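The composition idea can be shown with plain java.util.function.Predicate as an in-memory analogue (illustrative field names and filters; real Spring Data JPA Specifications compose the same way but translate into a single SQL query):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Each filter is one small predicate owning one concern; only the active
// (non-empty) ones are AND-ed together at runtime.
record FinancialRecord(String category, long amount) {}

class RecordFilter {
    static Predicate<FinancialRecord> hasCategory(String category) {
        return r -> r.category().equals(category);
    }

    static Predicate<FinancialRecord> amountAtLeast(long min) {
        return r -> r.amount() >= min;
    }

    // Absent filters match everything; adding a new filter is one more line here
    static List<FinancialRecord> filter(List<FinancialRecord> records,
                                        Optional<String> category,
                                        Optional<Long> minAmount) {
        Predicate<FinancialRecord> spec = Stream.of(
                        category.map(RecordFilter::hasCategory),
                        minAmount.map(RecordFilter::amountAtLeast))
                .flatMap(Optional::stream)        // keep only active filters
                .reduce(r -> true, Predicate::and); // combine them
        return records.stream().filter(spec).toList();
    }
}
```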
Database errors will humble you. No matter how confident you are. Today was one of those days.

Everything looked correct:
• logic made sense
• API routes were fine
• schema was clean
But nothing worked the way it should.

Hours went into:
• Prisma validation errors
• weird TypeScript issues (the never type…)
• data not updating even though the queries looked right

And the funniest part? The actual issue is always something small. One missing field. One wrong assumption. One mismatch between frontend and backend. That’s it. But it’ll cost you hours.

What I’ve learned (again): debugging databases is not just coding. It’s patience + clarity + brutal honesty with your own logic. You have to slow down and ask:
“What is actually happening?”
“What am I assuming?”
“Where is the data breaking?”

Finally fixed it. And yeah, that moment when it works? Worth it. But still… database errors are a pain in the ass.

#webdev #programming #debugging #buildinpublic
Only way to solve a problem: fix it so it stays fixed. https://lnkd.in/ecaGs6_N

Not a workaround. Not a patch that needs revisiting next quarter. Something that handles the problem completely, works across your whole stack, and that you never have to think about again.

Enter Boomerang.

The problem: slow sync endpoints. The kind that start fine, then the business grows, the data model grows, and suddenly you're holding connections open for >30 seconds waiting on external APIs. Callers time out. Retries pile up. Everything gets worse.

The real fix is well understood: an async job with webhook delivery. But doing it right means building a lot of things that have nothing to do with your business logic: a queue, idempotency, retry backoff, SSRF protection, dead-letter storage, auth, metrics. And then building all of it again for every service, every language, every team.

So I built it once and made it language-agnostic. Boomerang runs as a standalone Docker service. Your Java service uses it. So does your Python service, your Node.js service, your Go service. One deployment. Thin SDKs handle the HTTP calls: no queue logic, no Redis on the client side. The infrastructure is written once and shared across your entire stack.

If you're all-in on Java, there's also an embedded Spring Boot starter: one annotation and it's running inside your app.
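As an example of one item on that list, here is retry with exponential backoff sketched on its own (my illustration of the general technique, not Boomerang's actual code):

```java
import java.util.function.Supplier;

// Try `attempts` times, doubling the delay after each failure:
// base, 2*base, 4*base, ...
class Retry {
    static <T> T withBackoff(Supplier<T> call, int attempts, long baseDelayMillis) {
        RuntimeException last = null;
        long delay = baseDelayMillis;
        for (int i = 0; i < attempts; i++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (i < attempts - 1) {
                    try {
                        Thread.sleep(delay); // back off before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException(ie);
                    }
                    delay *= 2;
                }
            }
        }
        throw last; // all attempts failed: candidate for dead-letter storage
    }
}
```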
🧠 LeetCode POTD — The Bug Wasn’t Logic… It Was Leading Zeros
3761. Minimum Absolute Distance Between Mirror Pairs

At first glance, this problem looked simple. Find two indices (i, j) such that:
👉 reverse(nums[i]) == nums[j]
and return the minimum distance.

My first instinct was straightforward:
👉 Store all numbers in a map
👉 Reverse the current number
👉 Check if it already exists

Simple enough.

💥 But then one small edge case caused issues: leading zeros.
Example: 120 → 21, not 021. So if you think in strings, it’s easy to make mistakes.

💡 The cleaner approach: instead of storing original numbers first,
👉 Reverse each number mathematically
👉 Store the reversed value with its latest index
👉 If the current number already exists in the map, we found a mirror pair

Why this works: if we process 120, we store 21. Later, when 21 appears, we instantly know it matches.

📌 Best part: mathematical reversal automatically handles leading zeros.
120 → 21
300 → 3
101 → 101
No extra checks needed.

💡 What I liked about this problem: the challenge wasn’t data structures. It was noticing that a small representation detail changes the whole solution. Sometimes bugs are not in algorithms. They’re hidden inside edge cases.

Curious — did anyone else first think of using strings here? 👀

#LeetCode #ProblemSolving #HashMap #SoftwareEngineering #DSA #SDE #Java #C++
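A sketch of the mathematical reversal described above: peeling digits with % and / drops the reversed value's leading zeros for free.

```java
// reverse(120) builds 0 → 2 → 21: the trailing zero of 120 contributes
// a leading zero to the reversed value, which integer arithmetic discards.
class MirrorUtil {
    static int reverse(int n) {
        int reversed = 0;
        while (n > 0) {
            reversed = reversed * 10 + n % 10; // append the last digit of n
            n /= 10;                           // drop it from n
        }
        return reversed;
    }
}
```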