Recently I came across a discussion on query performance that made me rethink a habit most of us have when writing APIs.

You build an endpoint in ASP.NET Core, hook it to your database, and everything works fine. Clean code, async calls, repository pattern… all good... until one day the endpoint slows down. Not because of traffic. Not because of infrastructure. But because of data shape.

Picture this: you have an endpoint that returns a list of orders with customer info and items. So you write a query using your ORM (like Entity Framework Core):
• Include Orders
• Include Customer
• Include Items

Looks fine, right? But under the hood, this often becomes a massive join that multiplies rows:

𝟭 𝗼𝗿𝗱𝗲𝗿 × 𝟭 𝗰𝘂𝘀𝘁𝗼𝗺𝗲𝗿 × 𝗡 𝗶𝘁𝗲𝗺𝘀 = 𝗱𝘂𝗽𝗹𝗶𝗰𝗮𝘁𝗲𝗱 𝗱𝗮𝘁𝗮 𝗼𝘃𝗲𝗿 𝘁𝗵𝗲 𝘄𝗶𝗿𝗲

I was reading a post from SQLAuthority that reminded me of a key principle: the problem is not always the query, it’s what you ask the query to return.

Instead of loading everything in one shot, a better approach in many cases is:
• Project only what you need (SELECT specific columns)
• Split queries when relationships explode
• Avoid blindly using .Include() for complex graphs

For example (a sketch follows below):
• First query: Orders (lightweight)
• Second query: Items grouped by OrderId
• Merge in memory

Yes, it’s two queries, but the result is often faster, smaller, and more predictable.

This becomes even more important when using databases like PostgreSQL or SQL Server in high-scale systems, where:
• Network payload matters
• Execution plans matter
• Memory pressure matters

What I like about this is how it challenges a common assumption:

“𝘍𝘦𝘸𝘦𝘳 𝘲𝘶𝘦𝘳𝘪𝘦𝘴 = 𝘣𝘦𝘵𝘵𝘦𝘳 𝘱𝘦𝘳𝘧𝘰𝘳𝘮𝘢𝘯𝘤𝘦”

In reality, better-shaped data beats fewer queries almost every time.

If you’re building APIs today, especially in microservices, it’s worth asking:

𝘈𝘳𝘦 𝘺𝘰𝘶 𝘰𝘱𝘵𝘪𝘮𝘪𝘻𝘪𝘯𝘨 𝘲𝘶𝘦𝘳𝘺 𝘤𝘰𝘶𝘯𝘵... 𝘰𝘳 𝘥𝘢𝘵𝘢 𝘧𝘭𝘰𝘸?

#DotNet #EntityFramework #SQLServer #PostgreSQL #Performance #BackendDevelopment #Microservices #API #CleanArchitecture #SoftwareEngineering #Cloud
Optimize API Performance with Better Data Shaping
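To make the split-query idea above concrete, here is a minimal EF Core sketch. The names involved (AppDbContext, Orders, OrderItems, the Customer navigation, and the two DTOs) are assumptions for illustration, not code from the post; the exact shape will differ per model. It issues two small projected queries and merges them in memory.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical DTOs: only what the endpoint returns goes over the wire.
public record ItemDto(int Id, string Name, decimal Price);
public record OrderDto(int Id, string CustomerName, List<ItemDto> Items);

public static class OrderQueries
{
    // Assumes an AppDbContext with Orders (having a Customer navigation) and OrderItems sets.
    public static async Task<List<OrderDto>> GetOrdersAsync(AppDbContext db, CancellationToken ct)
    {
        // Query 1: orders, projected to just the columns we need (no Include).
        var orders = await db.Orders
            .Select(o => new { o.Id, CustomerName = o.Customer.Name })
            .ToListAsync(ct);

        var orderIds = orders.Select(o => o.Id).ToList();

        // Query 2: items for those orders, projected and filtered on the server.
        var items = await db.OrderItems
            .Where(i => orderIds.Contains(i.OrderId))
            .Select(i => new { i.OrderId, i.Id, i.Name, i.Price })
            .ToListAsync(ct);

        // Merge in memory: attach each order's items without duplicating order/customer data.
        var itemsByOrder = items.ToLookup(i => i.OrderId);

        return orders
            .Select(o => new OrderDto(
                o.Id,
                o.CustomerName,
                itemsByOrder[o.Id].Select(i => new ItemDto(i.Id, i.Name, i.Price)).ToList()))
            .ToList();
    }
}
```

If you would rather keep a single Include()-style LINQ query, EF Core also ships AsSplitQuery(), which tells the provider to run separate SQL statements instead of one row-multiplying join.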
More Relevant Posts
Stop Over-Engineering Your Storage Layer 🛑

I like to keep my storage layer lean: avoid SQL wherever possible and stick with document-oriented JSON databases. But sometimes the app needs only simple storage without any overhead. For my recent screenshot API I’ve switched to NATS for key:value storage, and I’m not looking back.

Why NATS? Originally a messaging service, NATS has a built-in KV store that is a minimalist’s dream:
• Zero bloat: it’s a single binary. No massive dependencies.
• Low friction: I can deploy it as a simple systemd unit in seconds.
• Developer-friendly: the CLI is intuitive, and the Go integration is clean.
• Speed: it’s lightweight enough to stay out of the way of your performance.

Start the server with JetStream enabled:
nats-server -js

In the Go app:
Import: "https://lnkd.in/eDeAAeE7"
Connect: nc, err := nats.Connect(env.natsConnect)
Init JetStream: js, err := jetstream.New(nc)
Init a KV bucket: kv, err := js.CreateKeyValue(ctx, jetstream.KeyValueConfig{Bucket: "users"})
Create a KV pair: _, err := kv.Put(context.Background(), key, []byte(value))
Get a value: entry, err := kv.Get(context.Background(), key)
Delete a KV pair: err := kv.Delete(context.Background(), key)

The Verdict
If you need complex relational queries, stick to SQL (but normally you don't). If you want a service that is fast, deployable, and straightforward, stop ignoring NATS. Simplicity is a feature, not a compromise. ⚡

#BackendDevelopment #NATS #Golang #SoftwareArchitecture #Minimalism #CloudNative
🚀 𝗦𝘁𝗼𝗽 𝗪𝗿𝗶𝘁𝗶𝗻𝗴 𝗦𝗹𝗼𝘄 𝗔𝗣𝗜𝘀 — 𝗬𝗼𝘂𝗿 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲 𝗜𝘀 𝘁𝗵𝗲 𝗥𝗲𝗮𝗹 𝗕𝗼𝘁𝘁𝗹𝗲𝗻𝗲𝗰𝗸

Most developers try to optimize code… but ignore the 𝗼𝗻𝗲 𝘁𝗵𝗶𝗻𝗴 𝘁𝗵𝗮𝘁 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝘀𝗹𝗼𝘄𝘀 𝗲𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴 𝗱𝗼𝘄𝗻: 👉 𝗧𝗵𝗲 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲.

I’ve seen APIs with clean code, great architecture… still performing terribly.

𝗪𝗵𝘆? 𝗕𝗲𝗰𝗮𝘂𝘀𝗲 𝗼𝗳 𝘁𝗵𝗶𝘀 👇
❌ SELECT * everywhere
❌ Missing indexes
❌ N+1 query problem
❌ No pagination (loading thousands of records)
❌ Blocking calls instead of async queries
❌ Over-fetching unnecessary data
❌ No caching strategy

𝗥𝗲𝘀𝘂𝗹𝘁?
⚠️ Slow APIs
⚠️ High server load
⚠️ Poor user experience
⚠️ Increased cloud cost

✅ 𝗪𝗵𝗮𝘁 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝗶𝗺𝗽𝗿𝗼𝘃𝗲𝘀 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲
✔ Use proper indexing (most ignored, most powerful)
✔ Fetch only required columns (projection)
✔ Implement pagination (Skip/Take)
✔ Use async database calls
✔ Avoid N+1 queries (use joins / includes wisely)
✔ Add caching (Redis for frequently accessed data)
✔ Use read replicas for read-heavy systems

💡 𝗥𝗲𝗮𝗹𝗶𝘁𝘆 𝗰𝗵𝗲𝗰𝗸: You don’t need a faster server… you need smarter queries.

🎯 𝗣𝗿𝗼 𝗧𝗶𝗽: Before scaling infrastructure, run query analysis. In 80% of cases, the issue is sitting in your SQL.

💬 What’s the worst DB performance issue you’ve faced?

#dotnet #sqlserver #database #performance #backend #webapi #softwareengineering #developers #optimization
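As a concrete illustration of three of those fixes (projection, pagination, async calls), here is a minimal EF Core sketch. The AppDbContext, its Products set, and ProductDto are assumed names invented for the example.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical DTO: only the columns the endpoint actually needs.
public record ProductDto(int Id, string Name, decimal Price);

public static class ProductQueries
{
    // Projection + pagination + async in one query:
    // - Select maps to specific columns instead of SELECT *
    // - Skip/Take pushes paging into the SQL the provider generates
    // - ToListAsync keeps the request thread free while the database works
    public static Task<List<ProductDto>> GetPageAsync(
        AppDbContext db, int page, int pageSize, CancellationToken ct = default)
    {
        return db.Products
            .AsNoTracking()                     // read-only query, skip change tracking
            .OrderBy(p => p.Id)                 // stable ordering is required for paging
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .Select(p => new ProductDto(p.Id, p.Name, p.Price))
            .ToListAsync(ct);
    }
}
```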
I recently re-architected a core part of my system (CodeSM) while migrating from MongoDB to Postgres, and it exposed some flawed design decisions I had been ignoring.

Old architecture (submission flow):
1. User clicks Submit
2. Create a submission entry in the DB
3. Enqueue a job (BullMQ + Redis)
4. Worker: downloads test cases from S3, spins up a Docker container, compiles and executes the code, stores the result in a job_results table

This worked well for SUBMIT, but I handled RUN very differently.

Previous RUN design (flawed). For the "Run Code" button:
❌ No DB entry
❌ Payload pushed directly to the queue: { "code": "...", "language": "...", "problemId": "..." }

Why was that a problem? Workers had no persistent source of truth:
- No debugging or traceability
- No retry capability
- Two separate execution pipelines (RUN vs SUBMIT)
- Harder to scale and maintain

New architecture (unified and scalable). I redesigned the system to treat RUN and SUBMIT the same way:
✅ Create a submission entry for both
✅ Add mode = RUN | SUBMIT
✅ Enqueue only the submissionId

The worker now fetches the data from the DB, executes based on mode, and updates the submission state.

New challenge: RUN generates a large amount of temporary data. Solution:
- Mark RUN submissions as temporary
- Add a cleanup job (cron)
- Delete entries older than 1–6 hours

Key takeaways (a small sketch of the pattern follows below):
- Don’t create separate pipelines for similar workflows
- Persist minimal identifiers, not full payloads
- Design for retries and debugging from day one
- Temporary data still needs lifecycle management

This redesign made the system more consistent, easier to debug, and more scalable under load. Still iterating, but this was a big shift from “making it work” to designing for scale and reliability.

Sharing the system design diagram as well; would love to hear feedback.
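The "persist first, enqueue only an identifier" idea is language-agnostic. The original system uses Node with BullMQ; purely for illustration, here is a rough C# sketch of the same shape. ISubmissionStore and IJobQueue are hypothetical abstractions invented for this example, not part of any real library.

```csharp
using System;
using System.Threading.Tasks;

public enum SubmissionMode { Run, Submit }

public record Submission(Guid Id, SubmissionMode Mode, string Code, string Language, string ProblemId);

// Hypothetical abstractions: the DB is the source of truth, the queue carries only IDs.
public interface ISubmissionStore
{
    Task SaveAsync(Submission submission);
    Task<Submission> GetAsync(Guid id);
}

public interface IJobQueue
{
    Task EnqueueAsync(Guid submissionId);
}

public class SubmissionService
{
    private readonly ISubmissionStore _store;
    private readonly IJobQueue _queue;

    public SubmissionService(ISubmissionStore store, IJobQueue queue)
    {
        _store = store;
        _queue = queue;
    }

    // One path for both RUN and SUBMIT: persist the submission, then enqueue just its ID.
    public async Task<Guid> CreateAsync(string code, string language, string problemId, SubmissionMode mode)
    {
        var submission = new Submission(Guid.NewGuid(), mode, code, language, problemId);
        await _store.SaveAsync(submission);        // persistent, debuggable, retryable
        await _queue.EnqueueAsync(submission.Id);  // minimal payload on the queue
        return submission.Id;
    }
}
```

The worker side then does the inverse: dequeue the ID, load the row, branch on Mode, and update the submission state, which is what makes retries and cleanup straightforward.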
🚀 How to Handle Large Data (Millions of Rows) Efficiently

Most systems work fine… until data grows. Then suddenly:
- APIs slow down 🐌
- Queries start timing out ⏳
- Servers struggle under load 🔥

If you're dealing with millions of rows, your approach needs to change. Here’s what actually works in real-world systems 👇

🔹 1. Stop Loading Everything
Pulling full datasets into memory is a silent killer.
✅ Use:
- Pagination (Skip/Take)
- Server-side filtering
- Streaming (IAsyncEnumerable in .NET)

🔹 2. Index Smartly (Not Excessively)
Indexes speed up reads… but they slow down writes.
⚠️ Too many indexes can hurt:
- INSERT performance
- UPDATE operations
- Bulk processing
✅ Best practice:
- Index only what you query frequently
- Avoid duplicate/unused indexes
- Review index usage periodically

🔹 3. Choose the Right Tool
Not every query needs an ORM.
👉 Use:
- EF Core for maintainability
- Dapper for high-performance queries

🔹 4. Optimize Your Queries
Bad queries don’t scale.
❌ Avoid:
- SELECT *
- Unnecessary joins
- Heavy subqueries
✅ Prefer:
- Projections
- Filtering early
- Analyzing execution plans

🔹 5. Use Caching Smartly
Don’t hit the database every time.
💡 Cache:
- Frequently accessed data
- Read-heavy endpoints
Tools: Redis, MemoryCache

🔹 6. Batch Processing Over Loops
Processing row-by-row is slow.
✅ Use:
- Bulk inserts/updates
- SQL batching
- Background jobs

🔹 7. Think in Terms of Architecture
At scale, code isn’t enough. Consider:
- Read replicas
- CQRS pattern
- Data partitioning (sharding)

🔥 Real Talk: Handling large data isn’t about one trick. It’s about balancing reads vs writes and making trade-offs.

💬 Curious: Have you ever removed indexes to improve write performance? What was your experience?

#dotnet #aspnetcore #sqlserver #backenddevelopment #softwarearchitecture #performance #scalability #saas #developers #programming
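Point 1 mentions streaming with IAsyncEnumerable; here is a minimal EF Core sketch of what that can look like, assuming a hypothetical AppDbContext with an Orders set that has CreatedAt and Total columns. Streaming avoids materializing millions of rows at once, and ASP.NET Core can serialize an IAsyncEnumerable response as it is produced.

```csharp
using Microsoft.EntityFrameworkCore;

public record OrderSummary(int Id, decimal Total);

public static class OrderStreaming
{
    // Streams rows one by one instead of loading the whole result set.
    // AsNoTracking + AsAsyncEnumerable keep memory flat even for very large tables.
    public static IAsyncEnumerable<OrderSummary> StreamOrders(AppDbContext db, DateTime since)
    {
        return db.Orders
            .AsNoTracking()
            .Where(o => o.CreatedAt >= since)              // filter on the server, not in memory
            .Select(o => new OrderSummary(o.Id, o.Total))  // project only what we need
            .AsAsyncEnumerable();
    }
}

// Usage in a minimal API endpoint (ASP.NET Core serializes the stream incrementally):
// app.MapGet("/orders/stream",
//     (AppDbContext db) => OrderStreaming.StreamOrders(db, DateTime.UtcNow.AddDays(-7)));
```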
You added an index. Your query is still slow. Here's the part nobody taught you.

Adding an index isn't optimization; knowing which index to add is. Most devs are using 20% of what indexes can actually do. 👇

─────────────────────────

1/ Composite indexes: order is everything
An index on (user_id, created_at) ≠ (created_at, user_id). PostgreSQL reads the columns left to right; the wrong order can mean the index is ignored entirely. Your query filters matter. Column sequence matters more.

2/ Partial indexes: index only what you query
If 95% of queries filter WHERE status = 'active', why index every archived and deleted row too?
CREATE INDEX ON orders (user_id) WHERE status = 'active';
Smaller. Faster. Less memory. Zero downsides.

3/ Covering indexes: eliminate the table lookup entirely
A normal index finds the row, then fetches it from the table. A covering index contains every column your query needs (e.g., with PostgreSQL's INCLUDE clause: CREATE INDEX ON orders (user_id) INCLUDE (status, total);). PostgreSQL never touches the main table, and the cost drops to near zero.

4/ Expression indexes: index the transformation
Querying WHERE LOWER(email) = 'user@email.com'? Your index on email is completely useless here.
Fix: CREATE INDEX ON users (LOWER(email)); → now it's instant.

─────────────────────────

The consequence of not knowing this:
→ You keep adding indexes that don't help
→ Write performance degrades from useless index overhead
→ Your DB grows heavier and somehow stays just as slow

Most teams don't have an indexing problem. They have an indexing knowledge problem.

─────────────────────────

Dharmops audits your entire index structure: what's missing, redundant, or wrongly ordered.
→ Free diagnosis: https://lnkd.in/dYGfeSmt

Which of these 4 did you not know before today? Drop it below 👇

#DatabaseIndexing #PostgreSQL #BackendEngineering #QueryOptimization #Dharmops #DevTools #SoftwareEngineering #SystemDesign #TechFounders #SaaS
A subtle mistake I’ve seen in many APIs isn’t about architecture… it’s about how we handle async code.

Everything looks correct on the surface:
• Async endpoints
• Await everywhere
• Clean structure

But then performance starts degrading under load.

Now picture this:
• You have an API built with ASP.NET Core.
• Your service calls the database using async methods from Entity Framework Core.

So far, so good. But somewhere in the code, a small decision slips in: .𝗥𝗲𝘀𝘂𝗹𝘁 or .𝗪𝗮𝗶𝘁()

It works locally. It even passes tests. But under real load, this becomes a problem.

𝗪𝗵𝘆? Because you’re blocking threads in a system designed to be non-blocking. Each blocked thread:
• Holds resources longer than necessary
• Reduces throughput
• Increases latency for other requests

In extreme cases, this leads to thread pool starvation. Now your “scalable” API starts behaving like a bottleneck.

The fix is simple, but requires discipline:
• Go fully async end-to-end
• Never mix sync-over-async
• Return Task all the way up the call stack
• Use await consistently

And this connects directly with your database layer: whether you’re using SQL Server or PostgreSQL, async queries help free threads while waiting for I/O. But only if you don’t block them afterward.

What makes this tricky is that the issue doesn’t show up immediately. It only appears when:
• Traffic increases
• Latency matters
• Concurrency grows

This is one of those small details that separates code that works from systems that scale.

#DotNet #ASPNetCore #EntityFramework #AsyncProgramming #Performance #BackendDevelopment #Microservices #Cloud #SQLServer #PostgreSQL #SoftwareEngineering #Scalability
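A minimal sketch of the difference, assuming a hypothetical AppDbContext with an Orders set and an invented OrdersController: the first action blocks a thread pool thread on .Result, the second stays async end-to-end.

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.EntityFrameworkCore;

[ApiController]
[Route("api/orders")]
public class OrdersController : ControllerBase
{
    private readonly AppDbContext _db;
    public OrdersController(AppDbContext db) => _db = db;

    // ❌ Sync-over-async: .Result blocks the request thread until the query finishes.
    // Under load this ties up thread pool threads and can lead to starvation.
    [HttpGet("blocking/{id}")]
    public IActionResult GetBlocking(int id)
    {
        var order = _db.Orders.FirstOrDefaultAsync(o => o.Id == id).Result;
        if (order is null) return NotFound();
        return Ok(order);
    }

    // ✅ Async end-to-end: the thread goes back to the pool while the DB call is in flight.
    [HttpGet("{id}")]
    public async Task<IActionResult> GetAsync(int id, CancellationToken ct)
    {
        var order = await _db.Orders.FirstOrDefaultAsync(o => o.Id == id, ct);
        if (order is null) return NotFound();
        return Ok(order);
    }
}
```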
Our API response time jumped from 120ms → 600ms overnight. No code deployed. No infra change. No incidents reported. Just... slower.

Here’s how I debugged it in 40 minutes 👇

Step 1: Isolate the symptom
CloudWatch showed the spike started at 11:42 PM. But here’s the interesting part: P95 latency spiked while P50 stayed normal. That usually means large payloads, heavy queries, or edge-case traffic, not a full-system slowdown.

Step 2: Eliminate the usual suspects
I checked the obvious first:
- Lambda cold starts? ❌ Warm instances were also slow
- DB connection pool? ❌ Only 42% utilized
- External APIs? ❌ Not in the request path
That narrowed it down to one likely culprit: ➡️ the database query itself.

Step 3: Inspect the query plan
Ran EXPLAIN ANALYZE on the main trade lookup query. Result:
- Sequential scan on 2.1M rows
- Estimated cost: 48,000
- The index was no longer being chosen
Why? As the table grew, PostgreSQL recalculated cost estimates and changed the execution plan automatically. Silent. Invisible. Expensive.

Step 4: Fix it
Added a composite index on (user_id, created_at DESC). Immediately after:
- The query planner switched to an Index Scan
- P95 dropped from 600ms → 89ms

The real lesson
Your system can break without deployments, because performance bugs often come from:
- Data growth
- Query planner decisions
- Traffic shape changes
- Hidden thresholds
EXPLAIN ANALYZE isn’t just an optimization tool. It’s a production survival tool. And if you’re not tracking P95 latency, you’re blind to what power users are experiencing.

My takeaway
As systems scale, the code may stay the same, but behavior changes. That’s where engineering gets interesting.

Curious: what’s the sneakiest production bug you’ve debugged? Drop it in the comments 👇 (Real stories only; those are always the best lessons.)

If this was useful, repost it so more engineers see it.

#PostgreSQL #BackendEngineering #NodeJS #SystemDesign #SoftwareEngineering #Debugging #AWS #Performance #DevOps
🐘 PostgREST turns your #PostgreSQL #database into a fully #RESTful #API — instantly. No backend code needed 🚀 #PostgreSQL #REST #OpenSource #DevTools

⚡ What makes PostgREST special?
Your database schema IS your API. Tables, views, stored procedures — everything becomes a RESTful endpoint automatically. No ORM, no controller code, no boilerplate.

🚀 Performance on a completely different level:
Written in #Haskell with the Warp HTTP server. JSON gets serialized directly in SQL. Connection pooling via Hasql. Sub-second response times at 2,000 req/sec, even on free-tier hosting.

🔐 Security is built in, not bolted on:
Authentication runs via JWT. Authorization is controlled directly in the database through roles. One single declarative source of truth for security. The server assumes the identity of the authenticated user.

📖 Automatic API documentation:
PostgREST generates #OpenAPI specs automatically from your schema. Hook up Swagger UI → interactive, always up-to-date docs for your API. No more manual doc maintenance.

🏗️ Versioning through DB schemas:
Instead of managing API versions in code, PostgREST uses #PostgreSQL schemas. Tables can change, the public API stays stable behind views. Clean separation, no brittle endpoints.

🛡️ Data integrity lives in the database:
No imperative validation in app code. Constraints, foreign keys, check constraints — all declarative in PostgreSQL. No app can corrupt your data. PUT requests are enforced as idempotent.

📦 Installation is simple:
#Docker image available. Native binaries for all platforms. Just run postgrest --help and get started. Current version: v14.4. 195 contributors strong.

→ https://lnkd.in/dRfwgGbQ
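To show what "the schema is the API" looks like from a client, here is a small C# sketch that queries a hypothetical orders table through PostgREST using its standard query syntax (select for projection, eq filters, order and limit for result shaping). The base URL, table name, columns, and token are assumptions for the example.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class PostgrestClientExample
{
    // Reads the 10 most recent active orders, selecting only two columns.
    // PostgREST exposes the "orders" table directly; there is no controller code on the server.
    public static async Task<string> GetActiveOrdersAsync(string baseUrl, string jwt)
    {
        // baseUrl is something like "http://localhost:3000/" (trailing slash matters for relative paths).
        using var http = new HttpClient { BaseAddress = new Uri(baseUrl) };

        // Role-based authorization happens inside PostgreSQL; the JWT just carries the role claim.
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", jwt);

        // select = projection, status=eq.active = filter, order/limit = result shaping.
        var response = await http.GetAsync(
            "orders?select=id,total&status=eq.active&order=created_at.desc&limit=10");
        response.EnsureSuccessStatusCode();

        return await response.Content.ReadAsStringAsync(); // JSON array produced by PostgREST
    }
}
```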
Most developers skip database indexing until things get slow. By then, they've already shipped a performance problem to production.

Here's the thing: a query that returns in 12ms on your local machine with 500 rows can take 4+ seconds in production with 500,000 rows. Same query. Same code. Completely different result.

I hit this on a SaaS project. A dashboard endpoint was timing out for users with large datasets. The backend logic looked fine. The query looked fine. But there were no indexes on the columns being filtered and sorted. Adding two targeted indexes dropped response time from ~3.8 seconds to under 80ms.

Here's what I check now before going to production:
1. Any column used in a WHERE clause — index it
2. Any column used in ORDER BY on large tables — index it
3. Foreign keys — almost always need an index
4. Columns used in JOIN conditions — same rule
5. Composite indexes when you filter by multiple columns together

One thing people miss: an index on column A and a separate index on column B is not the same as a composite index on (A, B). The database uses them differently.

You don't need to index everything. Over-indexing slows down writes. The goal is to index what your actual queries need.

Performance problems are usually not about better code — they're about understanding what the database is actually doing.

What's the worst performance issue you've found after going live?

#PostgreSQL #BackendDevelopment #WebDevelopment #FullStack
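If the application happens to sit on EF Core, those indexes can be declared in the model configuration so they ship with migrations instead of being added after the first slowdown. A minimal sketch, assuming a hypothetical Order entity with UserId, Status, and CreatedAt columns:

```csharp
using System;
using Microsoft.EntityFrameworkCore;

public class Order
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public string Status { get; set; } = "";
    public DateTime CreatedAt { get; set; }
}

public class AppDbContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Composite index matching the common filter + sort pattern
        // (WHERE UserId = ... ORDER BY CreatedAt DESC), not two separate single-column indexes.
        modelBuilder.Entity<Order>()
            .HasIndex(o => new { o.UserId, o.CreatedAt })
            .IsDescending(false, true);   // CreatedAt descending (available in EF Core 7+)

        // Single-column index for a frequently filtered column.
        modelBuilder.Entity<Order>()
            .HasIndex(o => o.Status);
    }
}
```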