⚡ Your FastAPI isn't fast. Here's why.

FastAPI gives you async superpowers. But ONE blocking call can stall every single concurrent request. Here's the cheat sheet I wish I had before going to production 👇

🔴 The #1 Killer: Blocking the Event Loop

These look innocent. They're not:
• Loading ML models from disk
• time.sleep() instead of asyncio.sleep()
• Synchronous DB calls
• Heavy Pandas/NumPy computation

❌ BLOCKING - kills everything:
result = heavy_model.predict(data)

✅ NON-BLOCKING - event loop stays free:
result = await asyncio.to_thread(heavy_model.predict, data)

Fix: offload to a thread pool.

🔍 How to FIND Blocking Calls (the part nobody teaches)

Add this to your middleware:
blocking_time = wall_time - event_loop_time

If blocking_time > 10ms → you have a problem. Expose it as a response header during development:
X-Blocking-Time-Ms: 47.23

You'll be shocked what you find.

🏗️ The 5-Point Scalability Checklist

Before you ship, verify:
✅ Load models ONCE at startup (lifespan pattern)
✅ Use asyncio.to_thread() for CPU-heavy work
✅ Connection pooling for every external call
✅ Async libraries only (httpx, asyncpg, aiofiles)
✅ Load test with 50+ concurrent users

If you can't check all 5 → don't deploy yet.

💡 The Golden Rule

FastAPI is async by default. Your code probably isn't. Find the blocking calls. Fix them. That's the whole game.

Load test. Watch p99 latency. If it spikes under concurrency — you're blocking somewhere.

The framework is fast. The question is: is your code?

♻️ Repost if this saves someone a production incident.
💬 What's the sneakiest blocking call you've found?

#FastAPI #Python #Backend #SystemDesign #SoftwareEngineering #AsyncProgramming #Performance #API #MLOps
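A runnable illustration of the asyncio.to_thread fix, using only the standard library. Here heavy_predict is a hypothetical stand-in for blocking model inference, and the heartbeat task plays the role of other concurrent requests — it keeps ticking because the heavy call runs off the event loop:

```python
import asyncio
import time

def heavy_predict(data):
    # Hypothetical stand-in for blocking model inference (CPU or disk work).
    time.sleep(0.2)
    return [x * 2 for x in data]

async def handler(data):
    # The fix: run the blocking call in a worker thread,
    # so the event loop stays free to serve other requests.
    return await asyncio.to_thread(heavy_predict, data)

async def heartbeat(ticks):
    # Simulates other requests being served while "inference" runs.
    for i in range(10):
        await asyncio.sleep(0.01)
        ticks.append(i)

async def main():
    ticks = []
    result, _ = await asyncio.gather(handler([1, 2, 3]), heartbeat(ticks))
    return result, ticks

result, ticks = asyncio.run(main())
print(result)      # [2, 4, 6]
print(len(ticks))  # 10 -- the loop kept serving during the heavy call
```

If you replace the awaited to_thread call with a direct heavy_predict(data), the heartbeats stall until the 200 ms call finishes — that stall is exactly what X-Blocking-Time-Ms would surface.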
FastAPI Performance: Identify and Fix Blocking Calls
More Relevant Posts
𝗙𝗮𝘀𝘁𝗔𝗣𝗜 𝗶𝘀𝗻'𝘁 𝗳𝗮𝘀𝘁 𝗯𝗲𝗰𝗮𝘂𝘀𝗲 𝗼𝗳 𝗙𝗮𝘀𝘁𝗔𝗣𝗜. 𝗜𝘁'𝘀 𝗳𝗮𝘀𝘁 𝗯𝗲𝗰𝗮𝘂𝘀𝗲 𝗼𝗳 𝘄𝗵𝗮𝘁'𝘀 𝘂𝗻𝗱𝗲𝗿𝗻𝗲𝗮𝘁𝗵.

Most people stop at "FastAPI is faster than Flask." Few ask 𝘸𝘩𝘺. Here's what's actually happening:

𝗙𝗹𝗮𝘀𝗸 runs on 𝗪𝗦𝗚𝗜. One request = one thread = blocked until done. Your thread waits while the DB responds. It does nothing. Just sits there.

𝗙𝗮𝘀𝘁𝗔𝗣𝗜 runs on 𝗔𝗦𝗚𝗜. One thread handles 𝘵𝘩𝘰𝘶𝘴𝘢𝘯𝘥𝘴 of connections. While one request waits for the DB, the thread picks up another. No idle time.

But FastAPI doesn't do this alone. The real stack:
• 𝗨𝘃𝗶𝗰𝗼𝗿𝗻 — the ASGI server (built on uvloop)
• 𝗦𝘁𝗮𝗿𝗹𝗲𝘁𝘁𝗲 — the async engine (handles requests, WebSockets, middleware)
• 𝗙𝗮𝘀𝘁𝗔𝗣𝗜 — the developer layer (validation, docs, type hints)

Think of it this way: Starlette = 𝘵𝘩𝘦 𝘦𝘯𝘨𝘪𝘯𝘦. FastAPI = 𝘵𝘩𝘦 𝘥𝘢𝘴𝘩𝘣𝘰𝘢𝘳𝘥. Uvicorn = 𝘵𝘩𝘦 𝘧𝘶𝘦𝘭.

Flask was built for a 𝘀𝘆𝗻𝗰𝗵𝗿𝗼𝗻𝗼𝘂𝘀 world. FastAPI was built for an 𝗮𝘀𝘆𝗻𝗰-𝗳𝗶𝗿𝘀𝘁 world. The speed difference isn't a feature. It's a 𝗳𝗼𝘂𝗻𝗱𝗮𝘁𝗶𝗼𝗻 difference.

Next time someone says "FastAPI is fast", ask them: 𝘐𝘴 𝘪𝘵 𝘍𝘢𝘴𝘵𝘈𝘗𝘐, 𝘰𝘳 𝘪𝘴 𝘪𝘵 𝘚𝘵𝘢𝘳𝘭𝘦𝘵𝘵𝘦?

#FastAPI #Flask #Starlette #Python #AsyncProgramming #BackendEngineering #SystemDesign #SoftwareEngineering
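The "one thread, thousands of connections" claim is easy to demonstrate with a stdlib-only sketch (simulated DB waits, no real server). Five overlapped 100 ms waits complete in roughly one wait's time, not five:

```python
import asyncio
import time

DB_LATENCY = 0.1  # simulated DB round-trip, in seconds

async def handle_request(request_id):
    # While this coroutine awaits "the DB", the single event-loop
    # thread is free to pick up other requests (the ASGI model).
    await asyncio.sleep(DB_LATENCY)
    return f"response-{request_id}"

async def serve(n):
    # One thread, n in-flight requests, waits overlapped.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))

start = time.perf_counter()
responses = asyncio.run(serve(5))
elapsed = time.perf_counter() - start

print(responses[0])              # response-0
print(elapsed < 5 * DB_LATENCY)  # True: ~0.1 s total, not ~0.5 s
```

A WSGI worker in the same scenario would serve these sequentially (or burn one thread per request) — that is the foundation difference the post describes.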
🎯 Precision Engineering: Beyond Basic Queries

A great API doesn't just give you data—it gives you the right data, or a clear reason why it can't. 🛡️

Today I expanded my TodoApp by implementing path parameters. Moving beyond fetching all records, I've added logic to retrieve specific tasks by their ID.

Key technical highlights from this update:
✅ Input validation: used FastAPI's Path to ensure only valid IDs (greater than 0) are processed.
✅ Robust error handling: integrated HTTPException to return a clean 404 Not Found status if a user requests an ID that doesn't exist.
✅ Clean code: refactored using Annotated dependencies to keep the route handlers lean and readable.

Building a backend isn't just about the happy path—it's about handling every edge case with precision.

Next: implementing POST requests to allow users to create their own tasks! 🚀

#FastAPI #Python #BackendDevelopment #WebAPI #CleanCode #SoftwareEngineering
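For readers following along, the validate-then-404 pattern described above boils down to two guard clauses. A framework-free sketch (the store and task names are hypothetical; in FastAPI the two branches map to Path(gt=0) and HTTPException(status_code=404)):

```python
# Hypothetical in-memory store standing in for a real database.
TODOS = {1: "write the README", 2: "add path parameters"}

def get_todo(todo_id: int) -> str:
    # Mirrors Path(gt=0): reject non-positive IDs before touching the store.
    if todo_id <= 0:
        raise ValueError("todo_id must be greater than 0")
    # Mirrors HTTPException(status_code=404): a clean miss, not a crash.
    if todo_id not in TODOS:
        raise KeyError(f"404 Not Found: todo {todo_id}")
    return TODOS[todo_id]

print(get_todo(2))  # add path parameters
```

The point of the ordering: validation errors (bad input, 422 in FastAPI) and lookup misses (404) are different failures and deserve different responses.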
End-to-End Project: Telecom Churn Prediction System

I've recently completed the development and deployment of a customer churn prediction application. The project focuses on bridging the gap between a predictive model and a production-ready interface.

Core implementation:
• Machine learning: classification model developed using Scikit-Learn.
• Backend: REST API built with FastAPI for real-time inference.
• Frontend: interactive dashboard designed with Jinja2 and HTML/CSS.
• Deployment: hosted on Render via a GitHub-integrated CI/CD pipeline.

The application is fully functional and optimized for low-latency predictions.

Links:
Live app: https://lnkd.in/gTM7RFsH
GitHub: https://lnkd.in/gXcEm2Gc

#DataScience #FastAPI #MachineLearning #Python #Deployment #Portfolio
Built something I've been wanting to exist for a while — a storage health monitor that actually tells you *why* your drive might fail, not just that it's at 80% health.

Instead of a vague percentage, it surfaces the actual reasons — things like high sustained temperature, dropping spare block capacity, or a spike in media errors. Stuff that matters, explained plainly.

Here's what's under the hood 🧵

→ The FastAPI backend calls smartmontools to read raw S.M.A.R.T. data straight from your drive's firmware — temperature, power-on hours, reallocated sectors, media errors. Real numbers, not estimates.

→ That data gets fed into a Random Forest classifier I trained on the Backblaze Hard Drive Dataset — 6M+ rows of real drive telemetry with actual failure labels. The model outputs a failure probability, which shows up as an animated risk gauge on the React frontend.

→ NVMe and SATA drives are handled differently. NVMe drives don't have the same SMART attributes as SATA, and public NVMe failure data barely exists. So for NVMe I built heuristics based on what the NVMe spec and manufacturers actually define as end-of-life indicators — available spare capacity, percentage used, media errors.

→ Auto-refreshes every 30 seconds, detects all connected drives automatically, rate-limited endpoints.

One thing I learned building this — the hardest part of applied ML isn't the model. It's the data. You either wait and collect your own, or you rely on what the drive's firmware tells you.

Stack: React + Vite · FastAPI · scikit-learn · smartmontools · psutil

Still a lot to build — Tauri desktop app, React Native mobile companion, and a database to log history so predictions improve over time.

GitHub link in the comments 👇
https://lnkd.in/gQ3TYzTW

#MachineLearning #React #FastAPI #Python #OpenSource #SideProject
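A heuristic like the NVMe one described above can be as simple as weighted threshold checks over the health-log fields the post names (available spare, percentage used, media errors). A sketch with illustrative weights and thresholds — these numbers are assumptions, not from any spec or from the project itself:

```python
def nvme_risk(available_spare, percentage_used, media_errors):
    """Heuristic NVMe end-of-life score in [0, 1] plus plain-English reasons.

    Inputs are NVMe health-log fields; the weights and cutoffs below are
    illustrative only.
    """
    risk, reasons = 0.0, []
    if available_spare < 10:      # spare blocks nearly exhausted
        risk += 0.5
        reasons.append(f"available spare down to {available_spare}%")
    if percentage_used >= 100:    # rated write endurance consumed
        risk += 0.3
        reasons.append(f"percentage used at {percentage_used}%")
    if media_errors > 0:          # any media error is worth surfacing
        risk += 0.2
        reasons.append(f"{media_errors} media errors logged")
    return min(risk, 1.0), reasons

score, why = nvme_risk(available_spare=5, percentage_used=100, media_errors=3)
print(round(score, 2))  # 1.0
print(why[0])           # available spare down to 5%
```

Returning the reasons alongside the score is the design choice that matters here: it's what turns "80% health" into an explanation a user can act on.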
Just came across this. Another AI agent framework CVE, another reminder that "run this arbitrary code" features are attack surfaces.

PraisonAI's run_python() used shell=True with incomplete escaping - $() and backticks slipped right through.

The real kicker? The default Flask server ships with AUTH_ENABLED=False, turning a "local" vulnerability into something reachable over the network.

AI agents with code execution need the same scrutiny we give to eval() and exec() in web apps.
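This class of bug reproduces in a few lines: blacklist-style escaping misses command substitution, while shlex.quote (or better, avoiding shell=True entirely by passing an argument list) neutralizes it. The naive_escape helper below is a hypothetical stand-in for the vulnerable pattern, not PraisonAI's actual code:

```python
import shlex

def naive_escape(arg):
    # Hypothetical blacklist escaping, the kind that ships in vulnerable
    # code: it handles quotes and separators but forgets $() and backticks.
    for ch in ('"', "'", ";", "&", "|"):
        arg = arg.replace(ch, "\\" + ch)
    return arg

payload = "notes.txt $(id) `whoami`"

unsafe_cmd = f"cat {naive_escape(payload)}"  # $() and `` survive intact
safe_cmd = f"cat {shlex.quote(payload)}"     # single-quoted: no substitution

print("$(id)" in unsafe_cmd)  # True -- would execute under shell=True
print(safe_cmd)               # cat 'notes.txt $(id) `whoami`'
```

The safest option is subprocess.run(["cat", payload]) with no shell at all — then there is nothing to escape in the first place.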
📣 We added multi-turn memory to RAG across three frameworks. The LoC gap is the widest in the entire benchmark series.

SynapseKit: 6 lines. One constructor argument. memory_window=5 and you're done.
LlamaIndex: 9 lines. Token-budget buffer — more predictable prompt sizes than turn-count windows.
LangChain: 17 lines. Session store, LCEL wiring, explicit config on every invocation.

That's not the interesting part though. The persistence story is what actually matters for production.

→ SynapseKit — in-memory only. Session ends, history gone.
→ LlamaIndex — JSON file. Lightweight, no multi-user sessions.
→ LangChain — Redis, DynamoDB, Postgres. Swap backends with one import change.

If you're building a multi-user app, LangChain is the only one that gives you proper session persistence out of the box. The 17 lines are the price of that flexibility. It's worth paying.

The thing most engineers miss when adding memory: memory and RAG compete for the same token budget. Most teams wire in memory and never adjust retrieval depth. Context grows. At some point something gets truncated — silently. The retrieved documents get cut first. The model starts answering from memory instead of documents. Retrieval quality degrades. The answers still sound coherent. Nobody notices until a user catches a hallucination.

Do the maths before you hit the limit. Pick the framework that matches where your app needs to be in six months, not where it is today.

Full benchmark + reproducible Kaggle notebook → engineersofai.com

#Python #AI #LLM #RAG #MLEngineering #OpenSource #AIEngineering #EngineersOfAI #SynapseKit
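"Do the maths" is worth making concrete. A sketch of the token-budget arithmetic — the context limit and token counts are made-up round numbers; real counts come from your tokenizer:

```python
def retrieval_budget(context_limit, system_tokens, memory_tokens, response_reserve):
    """Tokens left for retrieved documents after everything else is seated."""
    return max(0, context_limit - system_tokens - memory_tokens - response_reserve)

# Hypothetical 8192-token context, 500-token system prompt,
# 1024 tokens reserved for the model's answer.
before_memory = retrieval_budget(8192, 500, 0, 1024)
after_memory = retrieval_budget(8192, 500, 2400, 1024)  # 5-turn window grew to ~2.4k

print(before_memory)  # 6668
print(after_memory)   # 4268 -- memory silently ate a third of retrieval
```

The failure mode in the post is exactly this: nobody recomputes after_memory, so retrieval depth stays tuned for before_memory and the excess gets truncated silently.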
Let’s talk about something fun and interesting I did quite a while ago. I optimized a keyword-driven query system, focusing on improving throughput and stability under constraints.

The core problem: maximize queries/hour while avoiding conflicts, throttling, and system instability.

Key optimizations:
• Parallel processing with controlled concurrency
• Keyword-based query pipeline for structured input distribution
• User-agent rotation to distribute request patterns
• Retry + backoff mechanisms for handling transient failures
• Idempotent execution to avoid duplicate processing

One interesting tweak that made a noticeable difference: I introduced a keyword expansion strategy - combining each keyword with incremental alphabet variations (e.g., keyword + a, keyword + b, ...). This helped:
• Increase result coverage without changing the core keyword set
• Avoid repetitive query patterns
• Improve overall discovery efficiency per keyword

After multiple iterations, the system stabilized at ~70 leads/hour, up from about 15–20 leads/hour, with consistent performance.

This was one of the most interesting things I’ve worked on. It may not be flashy, but it’s striking that such a small change can have such a great impact!

Curious to know your thoughts!

#Optimizations #Python #Software #SaaS
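The alphabet-expansion tweak is a one-liner. A sketch — the space separator and the choice of lowercase ASCII suffixes are assumptions, since the post doesn't specify:

```python
import string

def expand_keyword(keyword):
    # Original keyword first, then "keyword a" ... "keyword z".
    return [keyword] + [f"{keyword} {c}" for c in string.ascii_lowercase]

variants = expand_keyword("plumber dallas")
print(len(variants))  # 27
print(variants[1])    # plumber dallas a
```

Each suffix nudges autocomplete-style backends toward a different result slice, which is why coverage grows without touching the core keyword set.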
We just launched JSON Craft - a free, AI-powered JSON toolkit that does something no other tool does.

You know the pain. You're debugging an API response that's 500 lines deep. Nested objects inside arrays inside objects. You find the key you need, but now you have to manually figure out the path to access it:

data.response.users[0].address.coordinates.lat

You count brackets. You scroll up. You mess it up. You try again.

We fixed this. Click any key in the tree view → instantly get the exact access path in 8 languages. JavaScript, Python, Java, C#, Ruby, PHP, Go - one click. Copy. Done.

No more counting brackets. No more guessing. No more wasted time.

And that's just one feature. JSON Craft also includes:
🔧 Editor — format, validate, minify, and search with JSONPath
🔍 Diff — side-by-side comparison with visual highlighting
⚡ Transform — filter, sort, flatten, group, convert case — all in-browser
📊 Visualize — tree graphs, tables, and charts — with fullscreen mode

Oh, and when your JSON is broken? One click → AI fixes it for you.

Completely free. No signup. No limits. No tracking. Your data never leaves your browser (except the optional AI fix).

🔗 Try it: https://lnkd.in/dtk4Qdnv

Built with love ❤️ by the team at KSPR Technologies.

#JSON #DeveloperTools #WebDev #API #FreeTools #AI #Programming #JavaScript #Python #SoftwareEngineering #DevEx #OpenSource #Productivity #KSPRTECH
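Path extraction like this is a small recursive walk over the parsed document. A sketch emitting JavaScript-style dot/bracket notation — this is an illustration of the technique, not JSON Craft's implementation; the root variable name data and the sample structure come from the post's example:

```python
def find_path(obj, target_key, path="data"):
    """Return the access path to the first occurrence of target_key,
    in JavaScript-style dot/bracket notation, or None if absent."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}"
            if key == target_key:
                return child
            found = find_path(value, target_key, child)
            if found:
                return found
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            found = find_path(value, target_key, f"{path}[{i}]")
            if found:
                return found
    return None

doc = {"response": {"users": [{"address": {"coordinates": {"lat": 52.5}}}]}}
print(find_path(doc, "lat"))  # data.response.users[0].address.coordinates.lat
```

Supporting the other seven languages is then just a formatting layer over the same (key-or-index) path segments — e.g. PHP would render a segment as ['key'] instead of .key.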
Most operational software I encounter wasn't built to talk to anything else.

With FastAPI, you can build a lightweight API layer on top of almost any system, whether it's a database, a legacy application, or a third-party platform. Once that layer is in place, other systems can pull data from it, push data to it, or trigger actions automatically.

The result isn't just a technical improvement. It means processes that used to require manual exports, emails back and forth, or someone running a report every morning can simply run on their own.

The only thing required is a small Python application. Deployed, maintained, and adapted when business requirements change. No large dev team needed.

How many manual actions does your most painful data process require? Drop a number below!

D-Data

#Python #FastAPI #DataEngineering #SoftwareEngineering #BusinessAutomation #APIIntegration
Built a local LLM benchmarking dashboard. Here's what it does and why it matters:

𝗪𝗵𝗮𝘁 𝗶𝘁 𝗱𝗼𝗲𝘀: Run any prompt → measure response time → compare answers side by side.

𝗪𝗵𝘆 𝗱𝗼𝗲𝘀 𝘁𝗵𝗶𝘀 𝗺𝗮𝘁𝘁𝗲𝗿? In real-world scenarios, choosing the right model is one of the most important decisions you make. A model that is too slow will frustrate users. A model that is too small might give poor-quality answers. This project teaches you how to make that decision using data, not guesswork.

• Speed matters: a 30-second response time is unacceptable in most products
• Quality matters: a fast model that gives wrong answers is useless
• Cost matters: larger models use more compute and cost more to run
• The right model depends on the use case; there is no single best model

𝗠𝗼𝗱𝗲𝗹𝘀 𝘁𝗲𝘀𝘁𝗲𝗱: llama3.2 (2GB) · phi4-mini (3GB) · qwen3.5:9b (6.6GB)
All using Q4_K_M quantization.

𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁: Model size alone does not determine speed. Architecture, quantization format, and memory loading all play a role. Always benchmark on your own hardware — results vary by machine.

𝗧𝗲𝗰𝗵 𝘀𝘁𝗮𝗰𝗸: Python · Ollama · Streamlit
Runs fully locally — zero API costs.

GitHub: https://lnkd.in/drwCqcgJ

#AIEngineering #LLM #Benchmarking #Ollama #Python #BuildingInPublic
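The core of such a dashboard is a tiny timing loop around the model call. A stdlib sketch — run_model is a hypothetical callable standing in for an Ollama request, and fake_run lets the example run without Ollama installed:

```python
import time

def benchmark(models, prompt, run_model):
    """Time each model on the same prompt; run_model(name, prompt) -> str."""
    results = {}
    for name in models:
        start = time.perf_counter()
        answer = run_model(name, prompt)
        results[name] = {
            "latency_s": round(time.perf_counter() - start, 3),
            "answer": answer,
        }
    return results

def fake_run(name, prompt):
    # Stand-in backend: pretend inference takes a little time.
    time.sleep(0.01)
    return f"{name}: answer to {prompt!r}"

report = benchmark(["llama3.2", "phi4-mini"], "What is ASGI?", fake_run)
print(sorted(report))                            # ['llama3.2', 'phi4-mini']
print(report["llama3.2"]["latency_s"] >= 0.01)   # True
```

Because every model sees the identical prompt and wall-clock measurement, the latency numbers are directly comparable — which is the whole point of benchmarking on your own hardware.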