When Python became multithreaded
The recent release of Python 3.14 (October 2025), with its new features around multithreading performance, has served as a good excuse for me to write this article about Python.
Python is a programming language that has been gaining popularity in recent years, for many reasons:
Since Python's launch, it has been widely used as a "glue" language for tasks related to scripting, file management, and task automation.
Due to its ease of learning, Python was quickly adopted by the scientific community (often replacing FORTRAN), by system administrators, and by data scientists, and (in 2025) it has become one of the most popular languages.
Data scientists have been one of the main drivers of Python's popularity. The other major contributor has been AWS Lambda, with its Python runtime and the serverless paradigm.
However, like everything in life, Python also has several disadvantages:
The GIL problem
The GIL (Global Interpreter Lock) is a limitation of the Python runtime: only a single thread can execute Python bytecode at a time, which prevents multithreaded programs from achieving true parallelism.
The Python interpreter supports multithreading; however, because only one thread can hold the interpreter lock at any moment, programs that rely on parallel CPU-bound processing suffer in performance.
The GIL was implemented as an easy way to manage memory and to simplify tasks like garbage collection. It also helps avoid certain race conditions inside the interpreter itself. However, it brings other important problems, chief among them limiting actual parallelism.
This design decision/limitation wasn't important in Python's early days (1990s), when most computers were single-core and single-processor, but it is a major limiting factor in today's processors with 4, 8, 16, or more cores.
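Note that even with the GIL, user-level race conditions are still possible: the GIL serializes bytecode instructions, but a compound operation like `counter += 1` spans several instructions, so explicit locking is still required. A minimal sketch (the counter and thread count are illustrative, not from the article):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times: int) -> None:
    global counter
    for _ in range(times):
        # `counter += 1` is a read-modify-write sequence, not atomic
        # even under the GIL, so an explicit lock is still needed.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000
```

Without the lock, lost updates can occur on any interpreter flavour, GIL or not.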
Interestingly, the GIL bears a strong resemblance to the Linux kernel's Big Kernel Lock from the same era, which forced many parts of the kernel to run effectively single-processor (https://en.wikipedia.org/wiki/Giant_lock).
Starting with Python 3.14 (and, partially, Python 3.13), the CPython interpreter comes in three flavours: the standard interpreter, a free-threading build, and a JIT-enabled build.
The free-threading interpreter disables the Global Interpreter Lock (GIL), a change that promises to unlock great speed gains in multi-threaded applications.
The JIT interpreter includes an on-the-fly compiler to native code, which should, in theory, help portions of code that run multiple times get faster by having them compiled to native code only once.
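A program can check at runtime which flavour it is running on. The sketch below relies on the build-time `Py_GIL_DISABLED` config variable (defined by PEP 703) and the `sys._is_gil_enabled()` helper added in 3.13; the function name `interpreter_flavour` is my own, not from the article:

```python
import sys
import sysconfig

def interpreter_flavour() -> str:
    """Best-effort detection of the running CPython flavour."""
    # Py_GIL_DISABLED is set at build time for free-threading builds (PEP 703).
    if sysconfig.get_config_var("Py_GIL_DISABLED"):
        # sys._is_gil_enabled() (3.13+) reports whether the GIL was
        # re-enabled at runtime, e.g. by an incompatible C extension.
        gil_on = getattr(sys, "_is_gil_enabled", lambda: True)()
        return "free-threading (GIL re-enabled)" if gil_on else "free-threading"
    return "standard (GIL)"

print(interpreter_flavour())
```

On a standard build this prints "standard (GIL)"; on a free-threading build it distinguishes whether the GIL was re-enabled at runtime.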
For developers, the no-GIL/free-threading interpreter means that threads can finally run Python code in parallel. But what does that mean from a practical point of view?
Here is a sample benchmark: a multi-threaded Fibonacci test with four threads, each independently computing the 40th Fibonacci number (on Linux and macOS).
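The benchmark can be sketched roughly as follows (this is my reconstruction, not the article's exact script; the article uses fib(40), while the sketch defaults to a smaller n so it finishes quickly):

```python
import threading
import time

def fib(n: int) -> int:
    # Naive recursive Fibonacci: purely CPU-bound, no I/O.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def bench(n: int, workers: int) -> float:
    """Run `workers` threads, each computing fib(n); return elapsed seconds."""
    threads = [threading.Thread(target=fib, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = bench(30, 4)
    print(f"4 threads x fib(30): {elapsed:.2f}s")
```

Under the GIL the four threads take roughly as long as running the four calculations sequentially; on a free-threading build they can run on separate cores.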
The JIT interpreter doesn't help (and wasn't expected to help) in this scenario. But the free-threading interpreter shows how removing the GIL can help when running multiple CPU-hungry threads.
In Python 3.13 the free-threading interpreter ran about 2.2x faster than the standard interpreter. In 3.14 the performance improvement is about 3.1x.
To be honest, this is an optimal scenario for this kind of test: a purely computational multithreaded workload with no input or output.
Final conclusions
Some final conclusions and considerations: