Python 3.14 Now Supports True Multithreading with FastAPI

This is bigger than it looks.

━━━ First, Understand the Problem ━━━

You buy a powerful server with 10 CPU cores. You build a Python API. You deploy it. Python uses 1 core. The other 9 sit there. Idle. Doing nothing. You paid for 10, got 1.

This wasn't a bug. It was a design decision from the 1990s called the GIL — the Global Interpreter Lock. A rule that said: only ONE thread runs Python code at a time, no matter how many cores you have.

Why did it exist? It made Python safer and simpler to build back then. Memory management was easier when only one thing ran at a time. It was a smart tradeoff — for 1991. For 2025? Not so much.

Since Python couldn't use multiple cores in one process, the workaround was:
→ Run 10 separate Python processes instead of 10 threads
→ Each process gets its own RAM, its own startup time, its own everything
→ 10 processes × 500MB RAM = 5GB just to use the machine you already paid for

It worked. But it was expensive, wasteful, and messy. Teams switched to Go or Node.js specifically because of this.

━━━ What Actually Changed? ━━━

🔹 Python 3.13 (October 2024) → Free-threaded build introduced. Experimental.
🔹 Python 3.14 (2025) → Free-threaded build officially supported. No longer experimental. Still optional.

Note: The GIL hasn't been deleted. It's been made OPTIONAL. You choose to disable it. This was a deliberate, careful decision — the Python team didn't want to break the entire ecosystem overnight.

FastAPI 0.136.0 now officially supports running on this free-threaded Python.

━━━ So What Does This Actually Mean? ━━━

Remember that 10-core machine? With free-threaded Python, FastAPI can now actually use those 10 cores — inside a single process — running threads in true parallel.

Real benchmark numbers:
→ 5 threads on standard Python (with GIL): same speed as 1 thread. No improvement.
→ 5 threads on free-threaded Python (no GIL): 4.8x faster.
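You can see the difference yourself. Here's a minimal sketch (not from the release notes — my own illustrative benchmark) that checks whether the GIL is active and times CPU-bound work across threads. `sys._is_gil_enabled()` exists on Python 3.13+; the `getattr` fallback assumes older versions always have the GIL. Exact timings will vary by machine.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def gil_enabled() -> bool:
    # sys._is_gil_enabled() is available on Python 3.13+;
    # on older versions, assume the GIL is on.
    check = getattr(sys, "_is_gil_enabled", None)
    return check() if check else True

def burn(n: int) -> int:
    # Pure CPU-bound busy work: no I/O, so extra threads
    # only help when the GIL is disabled.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(workers: int, n: int = 2_000_000) -> float:
    # Run `workers` copies of the same workload in parallel threads
    # and return the wall-clock time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(burn, [n] * workers))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"GIL enabled: {gil_enabled()}")
    print(f"1 thread:  {timed(1):.2f}s")
    print(f"5 threads: {timed(5):.2f}s")
```

On a standard (GIL) build, the 5-thread run takes roughly 5x the 1-thread run; on a free-threaded build with enough cores, the two times are close.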
In practical terms for your API:
→ Same traffic, fewer servers needed
→ Fewer servers = less RAM, less cost, less complexity
→ Response times improve under heavy load
→ Scaling becomes a choice, not a survival requirement

━━━ Who Should Pay Attention? ━━━

If you're building:
🔹 ML inference APIs — running a model on every request
🔹 Data processing endpoints — transforming, aggregating, scoring
🔹 Real-time pipelines — processing events as they arrive
🔹 Document parsing — PDFs, contracts, files at volume
🔹 Any API that actually computes something, not just fetches from a DB

━━━ The Catch ━━━

The GIL was also acting as an invisible safety net — it prevented two threads from touching the same data at the exact same moment. Without it, if two threads modify the same variable simultaneously, you can get corrupted data or crashes. These bugs are hard to reproduce and painful to debug.

The gains are real. But they require intentional adoption. If you're building Python APIs, this release deserves more than a scroll. Read the changelog. Test it. The ceiling just got raised.

Thank you FastAPI
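P.S. for anyone experimenting — the race-condition hazard described above, in a minimal self-contained sketch (a hypothetical shared counter, not FastAPI-specific). The unprotected `counter += 1` is a read-modify-write that can lose updates under true parallelism; a `threading.Lock` is the standard fix.

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n: int) -> None:
    # Read-modify-write with no protection: updates can be lost
    # when threads interleave, especially on a free-threaded build.
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n: int) -> None:
    # The lock ensures only one thread mutates the counter at a time.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

def run(worker, n: int = 100_000, threads: int = 4) -> int:
    global counter
    counter = 0
    ts = [threading.Thread(target=worker, args=(n,)) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter

if __name__ == "__main__":
    print("locked total:  ", run(safe_increment))    # always 400000
    print("unlocked total:", run(unsafe_increment))  # may fall short of 400000
```

This is exactly the class of bug the post warns about: the unlocked version usually "works" in small tests, then silently drops updates under load.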
