A subtle mistake I’ve seen in many APIs isn’t about architecture… it’s about how we handle async code.

Everything looks correct on the surface:
• Async endpoints
• Await everywhere
• Clean structure

But then performance starts degrading under load.

Now picture this:
• You have an API built with ASP.NET Core.
• Your service calls the database using async methods from Entity Framework Core.

So far, so good. But somewhere in the code, a small decision slips in:

.Result or .Wait()

It works locally. It even passes tests. But under real load, this becomes a problem.

Why? Because you’re blocking threads in a system designed to be non-blocking.

Each blocked thread:
• Holds resources longer than necessary
• Reduces throughput
• Increases latency for other requests

In extreme cases, this leads to thread pool starvation. Now your “scalable” API starts behaving like a bottleneck.

The fix is simple, but requires discipline:
• Go fully async end-to-end
• Never mix sync-over-async
• Return Task all the way up the call stack
• Use await consistently

And this connects directly to your database layer: whether you’re using SQL Server or PostgreSQL, async queries help free threads while waiting for I/O. But only if you don’t block them afterward.

What makes this tricky is that the issue doesn’t show up immediately. It only appears when:
• Traffic increases
• Latency matters
• Concurrency grows

This is one of those small details that separates code that works from systems that scale.

#DotNet #ASPNetCore #EntityFramework #AsyncProgramming #Performance #BackendDevelopment #Microservices #Cloud #SQLServer #PostgreSQL #SoftwareEngineering #Scalability
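The "fully async end-to-end" rule is easiest to see in miniature. A hedged sketch in Python's asyncio (the thread-pool mechanics differ from .NET, but the awaiting-vs-blocking trade-off is the same; `fetch_order` is an illustrative stand-in for an async database call, not a real API):

```python
import asyncio
import time

async def fetch_order(order_id: int) -> dict:
    # Stand-in for an async database call: the await yields the
    # worker back to the scheduler while the I/O is in flight.
    await asyncio.sleep(0.1)
    return {"id": order_id}

async def handler(order_id: int) -> dict:
    # Fully async end-to-end: await at every layer, never block on
    # the result (the moral equivalent of .Result / .Wait()).
    return await fetch_order(order_id)

async def main() -> float:
    start = time.perf_counter()
    # Ten concurrent "requests" share the same event loop.
    await asyncio.gather(*(handler(i) for i in range(10)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
# All ten 0.1s waits overlap, so the total is ~0.1s rather than ~1s:
# no worker sits parked waiting on another request's I/O.
```

Blocking inside `handler` (the asyncio equivalent of sync-over-async) would serialize those waits and, in .NET, pin a thread-pool thread for each one.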
Daniel Stock’s Post
More Relevant Posts
Recently I came across a discussion on query performance that made me rethink a habit most of us have when writing APIs.

You build an endpoint in ASP.NET Core, hook it to your database, and everything works fine. Clean code, async calls, repository pattern… all good... until one day the endpoint slows down.

Not because of traffic. Not because of infrastructure. But because of data shape.

Picture this: you have an endpoint that returns a list of orders with customer info and items. So you write a query using your ORM (like Entity Framework Core):
• Include Orders
• Include Customer
• Include Items

Looks fine, right? But under the hood, this often becomes a massive join that multiplies rows:

1 order × 1 customer × N items = duplicated data over the wire

I was reading a post from SQLAuthority that reminded me of a key principle: the problem is not always the query, it’s what you ask the query to return.

Instead of loading everything in one shot, a better approach in many cases is:
• Project only what you need (SELECT specific columns)
• Split queries when relationships explode
• Avoid blindly using .Include() for complex graphs

For example:
• First query: Orders (lightweight)
• Second query: Items grouped by OrderId
• Merge in memory

Yes, it’s two queries, but often faster, smaller, and more predictable.

This becomes even more important when using databases like PostgreSQL or SQL Server in high-scale systems, where:
• Network payload matters
• Execution plans matter
• Memory pressure matters

What I like about this is how it challenges a common assumption: “Fewer queries = better performance.” In reality, better-shaped data beats fewer queries almost every time.

If you’re building APIs today, especially in microservices, it’s worth asking: are you optimizing query count... or data flow?

#DotNet #EntityFramework #SQLServer #PostgreSQL #Performance #BackendDevelopment #Microservices #API #CleanArchitecture #SoftwareEngineering #Cloud
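The split-query-plus-merge step can be sketched without any ORM. Assuming two lightweight result sets (orders first, then their items), the in-memory merge looks like this (all data and names are illustrative):

```python
from collections import defaultdict

# First query: orders only (lightweight projection, no join explosion).
orders = [
    {"order_id": 1, "customer": "Ada"},
    {"order_id": 2, "customer": "Grace"},
]

# Second query: items fetched separately, e.g. WHERE order_id IN (1, 2).
items = [
    {"order_id": 1, "sku": "A-100"},
    {"order_id": 1, "sku": "A-101"},
    {"order_id": 2, "sku": "B-200"},
]

# Merge in memory: group items by order_id, then attach to each order.
items_by_order = defaultdict(list)
for item in items:
    items_by_order[item["order_id"]].append(item["sku"])

result = [
    {**order, "items": items_by_order[order["order_id"]]}
    for order in orders
]
# Each order row crosses the wire once, instead of once per item.
```

EF Core even has a built-in switch for this shape: `AsSplitQuery()` issues a separate SQL statement per included collection instead of one multiplying join.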
Our API response time jumped from 120ms → 600ms overnight.

No code deployed. No infra change. No incidents reported. Just... slower.

Here’s how I debugged it in 40 minutes 👇

Step 1: Isolate the symptom
CloudWatch showed the spike started at 11:42 PM. But here’s the interesting part:
• P95 latency spiked
• P50 stayed normal

That usually means large payloads, heavy queries, or edge-case traffic. Not a full-system slowdown.

Step 2: Eliminate the usual suspects
I checked the obvious first:
• Lambda cold starts? ❌ Warm instances were also slow
• DB connection pool? ❌ Only 42% utilized
• External APIs? ❌ Not in the request path

That narrowed it down to one likely culprit:
➡️ The database query itself.

Step 3: Inspect the query plan
Ran EXPLAIN ANALYZE on the main trade lookup query. Result:
• Sequential scan on 2.1M rows
• Estimated cost: 48,000
• Index was no longer being chosen

Why? As the table grew, PostgreSQL recalculated cost estimates and changed the execution plan automatically. Silent. Invisible. Expensive.

Step 4: Fix it
Added a composite index on (user_id, created_at DESC). Immediately after:
• Query planner switched to Index Scan
• P95 dropped from 600ms → 89ms

The real lesson
Your system can break without deployments, because performance bugs often come from:
• Data growth
• Query planner decisions
• Traffic shape changes
• Hidden thresholds

EXPLAIN ANALYZE isn’t just an optimization tool. It’s a production survival tool. And if you’re not tracking P95 latency, you’re blind to what power users are experiencing.

My takeaway: as systems scale, the code may stay the same, but behavior changes. That’s where engineering gets interesting.

Curious: what’s the sneakiest production bug you’ve debugged? Drop it in the comments 👇 (Real stories only; those are always the best lessons.)

If this was useful, repost it so more engineers see it.

#PostgreSQL #BackendEngineering #NodeJS #SystemDesign #SoftwareEngineering #Debugging #AWS #Performance #DevOps
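The scan-to-index flip is reproducible on any engine that exposes its plan. A hedged, scaled-down analogue using SQLite's EXPLAIN QUERY PLAN (PostgreSQL's EXPLAIN ANALYZE gives much richer output, but the before/after shape is the same; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (user_id INTEGER, created_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [(i % 100, f"2024-01-{i % 28 + 1:02d}", 1.0) for i in range(1000)],
)

QUERY = "SELECT * FROM trades WHERE user_id = ? ORDER BY created_at DESC"

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN rows carry the human-readable detail last.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(r[-1] for r in rows)

before = plan(QUERY)   # full table scan: "SCAN trades ..."

# The fix from the post: a composite index matching filter + sort order.
conn.execute(
    "CREATE INDEX idx_trades_user_created ON trades (user_id, created_at DESC)"
)

after = plan(QUERY)    # "SEARCH trades USING INDEX idx_trades_user_created ..."
```

The same index shape applies in Postgres (`CREATE INDEX ... ON trades (user_id, created_at DESC)`); the point is that the equality column leads and the sort column follows, so one index satisfies both the WHERE and the ORDER BY.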
🚀 Scaling Smart: Using Bloom Filters to Eliminate Unnecessary DB Hits

In my previous post, I talked about building resilient systems with Kafka + DLQs. Today, let’s zoom into a powerful optimization technique that quietly boosts performance at scale: Bloom Filters.

💡 The Problem:
In high-traffic systems, databases often get flooded with repetitive existence checks:
• Does this user exist?
• Is this email already registered?
• Is this token valid?

Even with caching, these checks can become a bottleneck under heavy load.

⚡ The Solution: Bloom Filters
A Bloom Filter is a space-efficient probabilistic data structure that helps answer:
👉 Is this element definitely NOT in the set, or MAYBE in the set?

✔️ If it says NOT present → 100% accurate (skip DB call)
✔️ If it says MAYBE present → fall back to a DB/Redis check

This simple layer drastically reduces unnecessary database queries.

🔧 Where It Fits in Architecture:
• Placed before DB or cache lookups
• Works great with Redis-backed systems
• Ideal for auth systems (user/email existence checks)
• Can be shared across services in distributed environments

📈 Why It Matters:
⚡ Reduces DB load significantly
🚀 Improves response times
💰 Saves infrastructure cost at scale
🔄 Perfect for read-heavy systems

⚠️ Trade-off: Bloom Filters can return false positives, but never false negatives.
👉 That’s why they’re used as a first-pass filter, not a source of truth.

🧠 Pro Tip: Tune your hash functions and bit array size carefully to balance memory vs accuracy.

✨ In the next post, we’ll talk about Redis + JWT access and refresh tokens.

#SystemDesign #BackendEngineering #Scalability #BloomFilter #DistributedSystems #PerformanceOptimization #nodejs #javascript
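A minimal Bloom filter fits in a few lines. A hedged sketch (real deployments typically use a library or Redis's RedisBloom module; the sizes here are tiny, for illustration only):

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # a Python int doubles as an arbitrary-size bit array

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest (4 bytes each).
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            chunk = digest[i * 4:(i + 1) * 4]
            yield int.from_bytes(chunk, "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False => definitely absent (skip the DB call).
        # True  => maybe present (fall through to DB/Redis).
        return all(self.bits >> pos & 1 for pos in self._positions(item))

registered = BloomFilter()
registered.add("alice@example.com")
assert registered.might_contain("alice@example.com")  # never a false negative
```

The "never a false negative" guarantee is exactly why the filter can safely short-circuit the database: a False answer needs no second opinion.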
Sometimes everything in your system works fine. Then one day, traffic spikes… and multiple requests try to update the same data at the same time.

Now you get weird issues:
• Duplicate orders
• Overbooked seats
• Negative inventory

Not because of bugs. Because of concurrent updates.

---

This is where Distributed Locking comes in.
The idea is simple: only one process should modify a resource at a time. Everyone else has to wait.

---

What actually happens
Let’s say two requests try to update the same product stock.

Without locking:
• Both read stock = 10
• Both reduce it
• Final value becomes wrong

With locking:
• First request gets the lock
• Second request waits
• Updates happen safely

---

Where this is used
• Payment processing
• Inventory management
• Booking systems
• Scheduled jobs

Anywhere consistency matters.

---

Common ways to implement
• Database locks: simple, but can affect performance.
• Redis locks (like Redisson): fast and commonly used in distributed systems.
• ZooKeeper / etcd: used in large-scale systems.

---

Why this matters
In distributed systems:
• Multiple instances run in parallel
• Race conditions are common
• Data can get corrupted silently

Locks help keep things consistent.

---

But be careful: locks can slow things down. If not handled properly, they can even cause deadlocks. Use them only where necessary.

---

Simple takeaway: when multiple processes touch the same data, coordination becomes essential.

Where in your system could two requests clash at the same time without you noticing?

#Java #SpringBoot #Programming #SoftwareDevelopment #Cloud #AI #Coding #Learning #Tech #Technology #WebDevelopment #Microservices #API #Database #SpringFramework #Hibernate #MySQL #BackendDevelopment #CareerGrowth #ProfessionalDevelopment #RDBMS #PostgreSQL #backend
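The stock example is easy to reproduce in-process. A sketch using Python threads, with a sleep to force the bad interleaving (a distributed lock generalizes the `threading.Lock` here across machines; the timing is contrived for demonstration):

```python
import threading
import time

stock = {"qty": 10}
lock = threading.Lock()

def buy_unsafe() -> None:
    qty = stock["qty"]        # both threads read 10...
    time.sleep(0.1)           # ...do some "work"...
    stock["qty"] = qty - 1    # ...and both write 9: a lost update

def buy_safe() -> None:
    with lock:                # only one thread at a time in here
        qty = stock["qty"]
        time.sleep(0.1)
        stock["qty"] = qty - 1

def run(worker) -> int:
    stock["qty"] = 10
    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stock["qty"]

print(run(buy_unsafe))  # 9: one purchase was silently lost
print(run(buy_safe))    # 8: the lock serializes the read-modify-write
```

Swap the in-process lock for a Redis or ZooKeeper lock and the same reasoning covers multiple instances.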
Why Pre-signed URLs are an Engineering "Trap" (and how I solved it) 🛡️

Most developers think moving from "Server Uploads" to "S3 Pre-signed URLs" is a 5-minute task. They are wrong.

Sure, the logic is simple:
1. Client asks Server for a URL.
2. Server saves metadata to DB and gives the URL.
3. Client uploads directly to S3.

The Scalability Win: your server no longer chokes on 500MB video files. 🚀

The Distributed Nightmare: what happens when the "Handshake" breaks?

Scenario A: The user gets the URL but closes the tab. Now you have a "Zombie" record in your database for a file that doesn’t exist.

Scenario B: The file uploads to S3 successfully, but the client’s "Confirmation" request to your server fails (due to a bad network). Now S3 has the file, but your database says it’s missing. Your Source of Truth is now out of sync.

This is where real engineering begins.

How I built a "Zero-Failure" Pipeline:
To solve this, I implemented a two-layer safety net:

1. The "Push" Layer (S3 Event Notifications): Never trust the client to tell you the upload is finished. I configured S3 Event Notifications to trigger a webhook/Lambda whenever a PutObject succeeds. This allows S3 to tell my backend directly: "Hey, the file is actually here."

2. The "Cleanup" Layer (EventBridge/Scheduled Tasks): To handle the "Zombies" (metadata exists, but the upload never happened), I implemented a Stale-Record Cleaner. Using AWS EventBridge (or a simple cron job), the system scans for records in a PENDING state older than 2 hours. If the file hasn’t arrived in S3 by then, the system deletes the DB record and cleans up the "leak."

The Lesson: File offloading isn’t just about moving data; it’s about managing state. As a Backend Engineer, our job isn’t just to make it work; it’s to make sure it doesn’t break when no one is watching.

#BackendDevelopment #SystemDesign #AWS #NodeJS #CloudArchitecture #BuildInPublic
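The cleanup layer boils down to a filter over PENDING records plus a TTL. A hedged sketch of that core decision (storage and the S3 existence check are stubbed out as parameters; in the real pipeline this runs from EventBridge/cron and would consult S3 directly):

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=2)

def find_stale(records: list, exists_in_s3, now: datetime) -> list:
    """Return PENDING records older than the TTL whose file never arrived."""
    return [
        r for r in records
        if r["status"] == "PENDING"
        and now - r["created_at"] > STALE_AFTER
        and not exists_in_s3(r["key"])
    ]

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"key": "a.mp4", "status": "PENDING",
     "created_at": now - timedelta(hours=3)},     # zombie: clean up
    {"key": "b.mp4", "status": "PENDING",
     "created_at": now - timedelta(minutes=30)},  # still within the window
    {"key": "c.mp4", "status": "COMPLETE",
     "created_at": now - timedelta(hours=5)},     # confirmed by the S3 event
]

stale = find_stale(records, exists_in_s3=lambda key: False, now=now)
# Only a.mp4 qualifies; its DB record would be deleted.
```

The final `exists_in_s3` check matters: it guards against the race where the file arrives just before the cleaner runs but the S3 event hasn't been processed yet.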
Sometimes one request needs to touch multiple systems. It looks simple:
• Save order
• Update inventory
• Process payment

But what happens if one step fails?

In a single database, you’d use a transaction. In distributed systems, that’s not so simple. That’s where Distributed Transactions come in.

The problem
You’re dealing with multiple services, each with its own database. If one succeeds and another fails, your system becomes inconsistent.

The traditional approach (2PC)
Two-Phase Commit tries to solve this:
1. Ask all services if they can commit
2. If all say yes → commit everywhere
3. If any says no → roll back everywhere

Sounds perfect, but it’s:
• Slow
• Complex
• Not scalable
• Prone to locking resources

That’s why it’s rarely used in modern microservices.

The practical approach
Instead of strict transactions, systems use:
• Saga Pattern (you’ve seen this)
• Eventual Consistency
• Compensating actions

You don’t force everything to succeed together. You handle failures gracefully.

Why this matters
In distributed systems:
• Failures are normal
• Networks are unreliable
• Services are independent

Trying to make everything perfectly consistent often hurts performance and scalability.

Simple takeaway: in microservices, consistency is designed, not guaranteed.

If multiple services in your system need to update data together, are you using strict transactions, or handling it differently?

#Java #SpringBoot #Programming #SoftwareDevelopment #Cloud #AI #Coding #Learning #Tech #Technology #WebDevelopment #Microservices #API #Database #SpringFramework #Hibernate #MySQL #BackendDevelopment #CareerGrowth #ProfessionalDevelopment #RDBMS #PostgreSQL #backend
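The "compensating actions" idea can be sketched as a tiny saga runner: execute steps in order and, on failure, undo the completed ones in reverse (names here are illustrative, not a framework API; real sagas also need to persist progress so they survive crashes):

```python
def run_saga(steps):
    """Each step is (name, action, compensate). On failure, run the
    compensations for already-completed steps in reverse order."""
    done = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"did:{name}")
            done.append((name, compensate))
        except Exception:
            log.append(f"failed:{name}")
            for prev_name, comp in reversed(done):
                comp()  # e.g. restock inventory, cancel the order
                log.append(f"undid:{prev_name}")
            return log
    return log

def decline_payment():
    raise RuntimeError("payment declined")

log = run_saga([
    ("save_order",       lambda: None, lambda: None),
    ("update_inventory", lambda: None, lambda: None),
    ("process_payment",  decline_payment, lambda: None),
])
# The failed payment triggers undo of inventory, then of the order.
```

Note what this gives up compared to 2PC: between the failure and the last compensation, other requests can observe the intermediate state. That window is the "eventual" in eventual consistency.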
A month ago I ran into an interesting (and slightly scary) problem.

I had to design an API that runs a heavy background process. During that process, a lot of things could change in the database: multiple users, multiple updates, different operations happening at the same time.

And then a thought hit me: “What if multiple users trigger this API at the same time in a horizontally scaled or async system?”

That’s where things can get dangerous. You can end up with:
• race conditions
• stale or inconsistent data
• even potential data corruption when multiple operations depend on the same records

At first, the obvious solution seems to be: use database transactions (like Django’s atomic transactions) for sensitive operations. And yes, that helps, but only within a single instance.

The real challenge starts when your system scales: multiple instances, async workers, FastAPI services, distributed architecture… At that point, you realize:
👉 local locks are not enough anymore

That’s where Redis became my “safety net”. Using Redis distributed locks, you can control concurrency across the entire system:
• lock by user_id + process_name to prevent duplicate execution per user
• or lock by event_id / shared resource key to prevent conflicting operations
• ensure only one process can modify a critical dataset at a time

So instead of relying on a single app instance, you enforce global consistency across all services. This approach works regardless of:
• Django
• FastAPI
• async workers
• multiple server instances

It’s simple but powerful:
👉 “If the process is running somewhere, no one else can run it.”

And honestly, Redis saved me from a lot of potential chaos.

Key takeaway: when scaling systems, database safety is not just about transactions, it’s about coordination across processes.

#BackendDevelopment #SystemDesign #Redis #Django #FastAPI #DistributedSystems #SoftwareEngineering #Scalability #Databases #Microservices
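The acquire/release mechanics reduce to two Redis operations. A hedged sketch with an in-memory stand-in for Redis so it runs anywhere (with a real client the acquire is `SET key token NX EX ttl`, and the release should be an atomic Lua check-and-delete; libraries like redis-py's `Lock` package this up):

```python
import time
import uuid

class FakeRedis:
    """In-memory stand-in for the two Redis commands the lock needs."""
    def __init__(self):
        self.store = {}

    def set_nx_ex(self, key, value, ttl):
        # SET key value NX EX ttl: succeed only if free or expired.
        now = time.monotonic()
        current = self.store.get(key)
        if current is None or current[1] <= now:
            self.store[key] = (value, now + ttl)
            return True
        return False

    def release(self, key, value):
        current = self.store.get(key)
        if current and current[0] == value:  # only the owner may release
            del self.store[key]

def try_lock(redis, user_id, process_name, ttl=30):
    # Key by user_id + process_name, as in the post; the random token
    # ensures we never release a lock someone else re-acquired.
    key = f"lock:{process_name}:{user_id}"
    token = str(uuid.uuid4())
    return (key, token) if redis.set_nx_ex(key, token, ttl) else None

redis = FakeRedis()
first = try_lock(redis, user_id=7, process_name="heavy_job")
second = try_lock(redis, user_id=7, process_name="heavy_job")
# first holds the lock; second is None, so the duplicate run is skipped.
redis.release(*first)
```

The TTL is the safety valve: if the holder crashes, the lock self-expires instead of blocking everyone forever.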
Everyone talks about scalability. Very few show how to structure a backend properly.

Here’s a simple structure I use for production-ready systems:
→ routes/ handles the HTTP layer (FastAPI/Flask)
→ services/ holds business logic (core system behavior)
→ repositories/ handles database interactions (PostgreSQL queries)
→ models/ defines data schemas (ORM / validation)
→ utils/ contains shared helpers (logging, auth, etc.)

Request flow:
Client → Route → Service → Repository → Database → Response

Why this works:
✔ Clean separation of concerns
✔ Easy to debug
✔ Easy to scale later
✔ No unnecessary complexity

What I avoid early:
✘ Microservices
✘ Event-driven chaos
✘ Over-abstraction

Rule: “Structure first. Scale later.”

A clean monolith beats a messy distributed system.

#Backend #SystemDesign #SoftwareEngineering #APIs #PostgreSQL #Mentee #follow #followformore
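The request flow can be shown in a dozen lines. A sketch with in-memory stand-ins (in a real service the route would be a FastAPI/Flask handler and the repository would run PostgreSQL queries; all names are illustrative):

```python
# repositories/: database interactions (here: an in-memory table)
class UserRepository:
    _rows = {1: {"id": 1, "name": "Ada"}}

    def get(self, user_id):
        return self._rows.get(user_id)

# services/: business logic only -- no HTTP, no SQL
class UserService:
    def __init__(self, repo):
        self.repo = repo

    def get_user(self, user_id):
        user = self.repo.get(user_id)
        if user is None:
            raise LookupError(f"user {user_id} not found")
        return user

# routes/: HTTP layer only -- translate service results into responses
def get_user_route(user_id):
    service = UserService(UserRepository())
    try:
        return 200, service.get_user(user_id)
    except LookupError:
        return 404, {"error": "not found"}

# Client -> Route -> Service -> Repository -> "Database" -> Response
status, body = get_user_route(1)
```

Each layer can be tested and swapped in isolation: mock the repository to test the service, mock the service to test the route.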
"Just use Postgres."

Modern software engineering has become a subscription management simulator. I finally stopped the madness and consolidated most of my specialized infrastructure into a single source of truth: Postgres.

Postgres has been in active development for three decades. It is basically the Skyrim of databases: a rock-solid foundation you can mod until it replaces your entire stack.

The technical reality:
NoSQL: JSONB + GIN indexes give you document-store flexibility with ACID compliance.
Search: tsvector handles full-text search. I am glad I am not the only one who realized Elasticsearch is usually an expensive layer of overkill.
Vector DB: pgvector with HNSW indexes solves the hybrid search problem natively.
Message Queue: FOR UPDATE SKIP LOCKED creates reliable queues without adding a new service.
Time Series: partitioning + BRIN indexes handle massive telemetry without the B-tree bloat.
API Layer: Row-Level Security (RLS) can eliminate hundreds of lines of boilerplate middleware.

The result is one connection string, one backup strategy, and zero distributed-consistency headaches.

Stop over-engineering for Google-scale problems you do not have yet. Pick the tool that has been battle-tested since the 90s and just start shipping.
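The queue trick deserves the SQL. Below is the standard Postgres claim query, plus a minimal in-process model of what SKIP LOCKED buys you: a worker skips rows another worker currently holds instead of blocking on them (the Python part is only a simulation of those semantics; table and column names are illustrative):

```python
# The Postgres claim query: each worker gets a distinct queued job,
# with no double-delivery and no blocking on other workers' rows.
CLAIM_SQL = """
UPDATE jobs SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'queued'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id;
"""

# In-process model of the SKIP LOCKED semantics above.
jobs = [{"id": i, "status": "queued"} for i in range(1, 4)]
locked = set()

def claim():
    for job in jobs:
        if job["status"] == "queued" and job["id"] not in locked:
            locked.add(job["id"])      # row lock taken by this worker
            job["status"] = "running"
            return job["id"]
    return None                        # nothing claimable: back off

a, b, c, d = claim(), claim(), claim(), claim()
# Three "workers" each claim a distinct job; the fourth finds none.
```

Without SKIP LOCKED, the second worker's SELECT ... FOR UPDATE would block on the first worker's row lock, serializing the whole queue.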
I remember troubleshooting a legacy migration to .NET Core where the system would randomly hang under peak traffic, even though CPU usage was low. We found a few .Result calls hidden inside a custom logging middleware. It was a textbook case of Thread Pool Starvation: the threads were all waiting for each other to finish, creating a deadlock that didn't exist in dev but crushed production.