When Python became multithreaded

The recent release of Python 3.14 (October 2025), with its new features around multithreading performance, gave me a good excuse to write this article about Python.

Python is a programming language that has been gaining popularity in recent years, for several reasons:

  • Easy to learn, with a simple syntax.
  • Multipurpose, with a large catalog of reusable libraries for almost any task.
  • A "lightweight" runtime (at least compared to Java).

Since its launch, Python has been widely used as a "glue" language for scripting, file management, and task automation.

Thanks to its ease of learning, Python was widely adopted by the scientific community (often replacing Fortran), by system administrators, and by data scientists, and (as of 2025) it has become one of the most popular languages.

The latter group has been one of the main drivers of Python's popularity. The other element that has contributed the most to the language's popularity has been AWS Lambda, with its Python runtime and the serverless paradigm.

However, like everything in life, Python also has several disadvantages:

  • It's an interpreted language, which can make it slower to execute than compiled languages.
  • It's dynamically typed, which can lead to runtime errors, slower performance, and weaker error detection at development/compile time.
  • It's not ideal for mobile applications.
  • The Global Interpreter Lock (GIL).

It's on this last point, the GIL, that I want to focus the rest of this article.

The GIL problem

The GIL (Global Interpreter Lock) is a limitation of the CPython runtime: only a single Python thread can execute bytecode in the interpreter at a time, which restricts the ability to truly leverage multithreading for parallelism.

The Python interpreter supports multithreading; however, because the GIL serializes bytecode execution across threads, programs that rely on parallel processing can suffer in performance.

The GIL was implemented as a simple way to manage memory and to ease tasks like garbage collection. It also helps avoid situations like race conditions on the interpreter's internal state. However, it brings an important drawback: it limits actual parallelism.

This design decision wasn't a problem in Python's early days (the 1990s), when most computers had a single core and a single processor, but it is a major limiting factor on today's processors with 4, 8, 16, or more cores.
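
To make the limitation concrete, here is a minimal sketch (the busy-loop function, iteration count, and timings are illustrative, not from any official benchmark) comparing the same CPU-bound work done sequentially and in two threads. On a standard (GIL) build, the threaded run takes roughly as long as the sequential one, because only one thread executes bytecode at a time.

```python
# Illustrative sketch: CPU-bound work gains little from threads under the GIL.
import threading
import time

def count_down(n):
    # Pure CPU-bound busy loop; no I/O, so the GIL is never released for long.
    while n > 0:
        n -= 1

N = 10_000_000

# Sequential: two calls, one after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Threaded: the same two calls in parallel threads.
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

On a GIL build you should see the two times in the same ballpark; on a free-threaded build with two or more cores, the threaded time should drop noticeably.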

Interestingly, the GIL bears a strong resemblance to the Linux kernel's Big Kernel Lock from the same era, which forced many parts of the kernel to run effectively single-processor (https://en.wikipedia.org/wiki/Giant_lock).

Starting with Python 3.14 (and partially with Python 3.13), the CPython interpreter comes in three flavours:

  1. Standard
  2. Free-threading (FT), also called no-GIL or GIL-free
  3. Just-in-time (JIT)
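
If you are unsure which flavour you are running, recent CPython exposes this at runtime. A short sketch (note: sys._is_gil_enabled() exists from CPython 3.13 onward; the fallback below assumes the GIL is on for older versions):

```python
# Sketch: detect whether this CPython build is free-threaded / has the GIL.
import sys
import sysconfig

# sys._is_gil_enabled() was added in CPython 3.13; fall back to "GIL on".
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print(f"GIL enabled at runtime: {gil_enabled}")

# Py_GIL_DISABLED is a build-time flag set only on free-threading builds.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print(f"Free-threading build: {free_threaded_build}")
```

Note that the two checks differ: the build flag tells you whether the interpreter *can* run without the GIL, while the runtime check tells you whether the GIL is actually disabled in this process.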

The free-threading interpreter disables the Global Interpreter Lock (GIL), a change that promises significant speed gains in multi-threaded applications.

The JIT interpreter adds an on-the-fly compiler to native code, which should, in theory, speed up portions of code that run many times by compiling them to native code only once.

For developers, the no-GIL/free-threading interpreter means that:

  • CPU-bound code can finally benefit from threading.
  • You can fully utilize all cores from within Python.
  • You no longer need to rewrite logic in C, Rust, or use multiprocessing for speed.

But what does it mean from a practical point of view?

Here is a sample benchmark: a multi-threaded Fibonacci test with four threads, each independently computing the 40th Fibonacci number (on Linux and macOS).
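
The benchmark described above can be sketched roughly like this (a naive recursive Fibonacci; the test in the article used the 40th number, but a smaller N is shown here so the example finishes quickly):

```python
# Rough sketch of a multi-threaded Fibonacci benchmark of the kind described.
# The article's test computes fib(40) per thread; N = 30 keeps this run short.
import threading
import time

def fib(n):
    # Deliberately naive recursion: pure CPU-bound work, no I/O.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

N = 30
start = time.perf_counter()
threads = [threading.Thread(target=fib, args=(N,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"4 threads x fib({N}): {elapsed:.2f}s")
```

On a standard (GIL) build the four threads serialize, so the elapsed time is close to four sequential runs; on a free-threaded build with four or more cores it approaches the time of a single run.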

[Chart: benchmark results for the standard, free-threading, and JIT interpreters]

The JIT interpreter doesn't help (and wasn't expected to help) in this scenario. But the free-threading interpreter shows how removing the GIL can help when running multiple CPU-hungry threads.

In Python 3.13 the free-threading interpreter ran about 2.2x faster than the standard interpreter. In 3.14 the performance improvement is about 3.1x.

Admittedly, this is an optimal scenario for this test: a purely computational multithreaded application with no input or output.

Final conclusions

Some final conclusions and considerations:

  • CPython 3.14 appears to be the fastest of all the CPythons.
  • Test with your own code. The 3.14 free-threading interpreter is faster than the standard interpreter for CPU-heavy multi-threaded applications, so it is worth a try if your application fits this use case. In single-threaded or non-multi-threaded applications, the gains are minimal or nonexistent.
  • Future gains will depend on Python algorithms and libraries also being multithreaded. Until now, this wasn't a very important element, but with the elimination of the GIL and the performance gains seen previously, it will become an important factor when selecting which algorithms and libraries to use.
  • If you can't upgrade to 3.14 just yet, consider using 3.11 or later, as these releases are significantly faster than 3.10 and older. For reference, at the time of writing (October 2025), the latest Python version available on AWS Lambda is 3.13.

Article by Juan Ignacio "Iñaki" Codoñer
