Beyond the GIL: True Parallelism in Python with Subinterpreters (PEP 554)

We're all familiar with the classic challenge of parallelizing CPU-bound workloads in Python. We've navigated threads limited by the GIL, processes with serialization overhead, and asyncio for I/O-bound tasks. But what if the next evolutionary step is already here, just waiting to be adopted?

I'm talking about subinterpreters: isolated interpreters within a single process, which gained a per-interpreter GIL in Python 3.12. Each subinterpreter owns a private GIL, so multiple threads can genuinely run Python code simultaneously on different CPU cores, all within the same process and sharing some memory (e.g., for immutable objects). This isn't multiprocessing, with its heavy pickling and memory separation via queues, nor is it GIL-bound multithreading. It's a fundamentally different concurrency model.

So why aren't we all rewriting our applications to use subinterpreters? Because calling Py_NewInterpreter() from C is just the beginning. The core issue is state: many popular C extensions, like numpy, are not yet isolation-ready, as they rely on global static state. Until we have a robust, safe mechanism for sharing data between these isolated worlds (hint: it will likely resemble an actor model more than classic shared memory), this technology will remain niche. However, it represents a critical frontier for pushing Python's performance boundaries, demanding architectural decisions that go far beyond choosing between asyncio and multiprocessing.

#Python #GIL #HighPerformanceComputing #HPC #PythonAdvanced #SoftwareArchitecture #Concurrency #Subinterpreters
Python 3.12's per-interpreter GIL enables true parallelism with subinterpreters
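A minimal sketch of what driving a subinterpreter looks like from Python itself. The `concurrent.interpreters` module is the PEP 734 API that ships with Python 3.14; on earlier versions it doesn't exist, so this sketch guards the import and falls back to plain `exec` in the current interpreter — treat the subinterpreter branch as an assumption about your runtime:

```python
# Sketch: run source in an isolated subinterpreter when the PEP 734 API
# is available, otherwise fall back to the main interpreter.
def run_isolated(source: str) -> bool:
    """Return True if the code actually ran in a subinterpreter."""
    try:
        from concurrent import interpreters  # Python 3.14+ (PEP 734)
    except ImportError:
        exec(source, {})  # fallback: same interpreter, shared GIL
        return False
    interp = interpreters.create()  # fresh interpreter with its own GIL
    try:
        interp.exec(source)  # fully isolated from our module globals
    finally:
        interp.close()
    return True

ran_in_subinterpreter = run_isolated("total = sum(range(1_000_000))")
```

Note that the source string cannot see our globals either way in the subinterpreter branch: isolation is the whole point, which is why data sharing is the hard open problem.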
-
Using async in Python DOES NOT always make your code run faster...

Synchronous code 𝘮𝘶𝘴𝘵 finish task A before it can even 𝘵𝘩𝘪𝘯𝘬 about starting task B. If your code makes a network request, it may wait 2-3 seconds with your CPU basically sitting idle. This is what asyncio is for: it lets you "pause" a task that's waiting and immediately switch to another task that's ready to run.

But here's the catch: asyncio 𝗺𝗮𝗸𝗲𝘀 𝘆𝗼𝘂𝗿 𝗰𝗼𝗱𝗲 𝗳𝗮𝘀𝘁𝗲𝗿 𝗼𝗻𝗹𝘆 𝘄𝗵𝗲𝗻 𝗶𝘁'𝘀 𝗜/𝗢-𝗯𝗼𝘂𝗻𝗱 (𝘄𝗮𝗶𝘁𝗶𝗻𝗴), 𝗻𝗼𝘁 𝘄𝗵𝗲𝗻 𝗶𝘁'𝘀 𝗖𝗣𝗨-𝗯𝗼𝘂𝗻𝗱 (𝗰𝗮𝗹𝗰𝘂𝗹𝗮𝘁𝗶𝗻𝗴).

• I/O-Bound: your program is waiting for something EXTERNAL, like making different API calls or reading multiple files from a network drive.
• CPU-Bound: your program is actively calculating something, like training a machine learning model or running complex transformations on a large pandas DataFrame.

Also, asyncio operates on a single CPU core, so it provides concurrency but not true parallelism. If your bottleneck is the CPU, asyncio won't help; you're looking for parallelism. You need to leverage multiple CPU cores, and that's a job for Python's multiprocessing library.

#python #asyncio #concurrency #parallelism #softwarearchitecture #backend #apis
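The I/O-bound claim is easy to verify: ten simulated 0.1-second network waits finish in roughly 0.1 seconds total, not 1 second, because asyncio overlaps the waiting. A self-contained sketch, using asyncio.sleep as a stand-in for real I/O:

```python
import asyncio
import time

async def io_task():
    # Simulated network wait: yields control back to the event loop.
    await asyncio.sleep(0.1)

async def main() -> float:
    start = time.perf_counter()
    # Ten 0.1 s waits run concurrently, so they overlap almost entirely.
    await asyncio.gather(*(io_task() for _ in range(10)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
# elapsed is ~0.1 s, not ~1 s
```

Replace `asyncio.sleep` with a CPU-bound loop and the speedup vanishes: a coroutine that never awaits simply hogs the single core the event loop runs on.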
-
🚀 Python 3.13 Takes a Huge Step Forward: The GIL Can Now Be Turned Off

One of the longest-standing limitations in Python has finally started to loosen its grip. With the release of Python 3.13, developers now have the option to run Python without the Global Interpreter Lock (GIL), something many of us have been waiting on for years.

For anyone who's worked on CPU-intensive Python applications, the GIL has always been the invisible ceiling. It kept Python predictable and safe internally, but it also meant threads could never take full advantage of multi-core processors. That changes now.

👉 Python 3.13 introduces an experimental "no-GIL" mode, giving developers true multi-threaded execution without relying on multiprocessing or pushing performance-critical logic into C/C++.

Here's why this matters:

⚡ Real parallelism becomes possible: multiple threads can finally run Python code at the same time on different cores.
📈 Better performance for ML, data pipelines, and backend systems: tasks that were previously bottlenecked by the GIL can now scale across CPUs naturally.
🔧 Simpler concurrency models: a lot of the complexity around avoiding the GIL simply disappears.

Yes, it's still early, and the no-GIL mode is optional for now. But this is the clearest signal yet of where Python is heading:
🔹 faster,
🔹 more scalable,
🔹 and far better suited for modern multi-core hardware.

This update has the potential to reshape how we think about Python for high-performance work.

💬 What part of a no-GIL Python are you most excited about?

#Python #Python3 #GIL #ParallelComputing #Concurrency #AI #SoftwareEngineering #ProgrammingTrends #PythonCommunity
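Since the no-GIL mode is opt-in, the first practical question is whether the interpreter you're on actually supports it. CPython 3.13+ exposes two hooks for this: the build-time `Py_GIL_DISABLED` config variable and the runtime `sys._is_gil_enabled()` check. The helper below guards for older versions; the report strings are my own wording:

```python
import sys
import sysconfig

def gil_status() -> str:
    """Best-effort report of this interpreter's GIL situation.

    Py_GIL_DISABLED is set at build time on free-threaded builds;
    sys._is_gil_enabled() (added in 3.13) reports the runtime state,
    since extensions can force the GIL back on even in a "t" build.
    """
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    checker = getattr(sys, "_is_gil_enabled", None)
    if checker is None:
        return "GIL build (pre-3.13 interpreter)"
    if checker():
        return "GIL enabled at runtime"
    return "free-threaded: GIL disabled" if free_threaded_build else "GIL disabled"

status = gil_status()
```

On a stock 3.13 install this reports the GIL as enabled; only the separately distributed free-threaded binary disables it.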
-
🔥 Python 3.14 introduces true multithreading: the GIL is finally optional!

After decades of limitation, Python 3.14 now ships with a free-threaded (no-GIL) build, officially enabling parallel execution of Python threads across multiple CPU cores.

🧠 What This Means

The Global Interpreter Lock (GIL) prevented multiple threads from executing Python bytecode at the same time, effectively serializing all CPU-bound code. With the new build:
• The GIL is disabled
• Reference counting is atomic & thread-safe
• Multiple threads can run concurrently and in parallel

⚙️ Quick Example

import threading

def cpu_heavy():
    sum(range(50_000_000))

threads = [threading.Thread(target=cpu_heavy) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

🧩 In Python ≤3.13: only one thread executes at a time.
🚀 In Python 3.14 (no-GIL build): all 8 threads run in true parallelism.

📈 Key Takeaways
• ✅ True Multithreading: threads run on different cores
• ⚙️ Optional Build: the GIL build still exists; use the free-threaded (--disable-gil) version for free-threaded mode
• 🧩 Extension Update Needed: libraries like NumPy and Pandas need thread-safe adjustments
• ⚡ Performance: slight single-thread overhead, massive gains for multi-core workloads

🔮 What's Next

This is officially Phase II of Python's no-GIL adoption (PEP 703). The community will refine it further before it becomes the default build. Once that happens, Python moves into the same parallel performance league as Java and C++.

🐍 Python 3.14 isn't just another release: it's the start of Python's true multithreading era.

#Python #NoGIL #Python314 #Multithreading #Concurrency #AI #MachineLearning #ParallelComputing #Developers #Tech
-
🚀 𝐏𝐲𝐭𝐡𝐨𝐧 𝟑.𝟏𝟒 - 𝐓𝐡𝐞 𝐧𝐞𝐰 𝐅𝐫𝐞𝐞-𝐓𝐡𝐫𝐞𝐚𝐝𝐢𝐧𝐠 𝐦𝐨𝐝𝐞𝐥 𝐚𝐧𝐝 𝐰𝐡𝐲 𝐢𝐭 𝐦𝐚𝐭𝐭𝐞𝐫𝐬 🚀

If you've been writing Python for a while, you've probably bumped into the limitations of the Global Interpreter Lock (GIL). The GIL means that even on a multi-core machine, threads in one Python process can't execute Python bytecode truly in parallel. Only one thread runs at a time!

With Python 3.14, the "𝒇𝒓𝒆𝒆-𝒕𝒉𝒓𝒆𝒂𝒅𝒆𝒅" or "no-GIL" build is officially supported. That means you can opt into a version of CPython where the GIL is disabled and threads can truly run in parallel across multiple CPU cores.

⚠️ 𝐖𝐡𝐚𝐭'𝐬 𝐭𝐡𝐞 𝐆𝐈𝐋?
In previous Python versions, the Global Interpreter Lock (GIL) ensured only one thread could really execute Python bytecode at a time, so even on multi-core hardware, threads couldn't fully run in parallel.

💡 𝐖𝐡𝐚𝐭 𝐜𝐡𝐚𝐧𝐠𝐞𝐬 𝐰𝐢𝐭𝐡 𝐟𝐫𝐞𝐞-𝐭𝐡𝐫𝐞𝐚𝐝𝐢𝐧𝐠?
- Threads can now truly run in parallel on multiple cores when using a free-threaded build of Python (python3.14t).
- This opens up real gains for CPU-bound, multithreaded Python workloads.
- Existing Python libraries written in a thread-safe way should work without modification and can utilize all CPU cores.
- A C extension that hasn't been explicitly marked as free-thread-safe will re-enable the GIL for the lifetime of that process.

🔍 𝐁𝐨𝐭𝐭𝐨𝐦 𝐥𝐢𝐧𝐞
If your Python apps care about multi-core performance or threading, this update is worth watching (or even experimenting with). It's a strong signal that Python is leveling up its concurrency game and making it easier for developers to build more scalable, high-performance systems.

#Python #Python314 #Concurrency #Multithreading #GIL #SoftwareEngineering #DevCommunity
-
🚀 Python 3.14: The GIL Is No Longer a Ceiling

If you've been using Python for a while, you know the Global Interpreter Lock (GIL) has always been the silent gatekeeper stopping true multi-threaded performance in CPython. But now, with Python 3.14, the GIL becomes optional. Yes, Python is finally stepping into the world of true parallel execution! 🧵

✅ What This Unlocks
- True parallel execution on multi-core CPUs
- Major performance boost for CPU-bound workloads
- More competitive performance for ML, simulation, and backend systems

⚠️ What to Watch Out For
- Slight drop in single-threaded performance (~10%)
- Compatibility challenges for existing libraries/extensions
- Increased need for thread-safety awareness and synchronization
- Not production-ready for all projects (still maturing)

🎯 My Takeaway
This change marks one of Python's biggest leaps in decades. It doesn't mean every app will instantly become faster, but it opens the door for a new era of concurrency in Python. If your work involves CPU-heavy or parallel workloads, it's time to start experimenting with the no-GIL build of Python 3.14. Measure, test, and see how your libraries evolve to support this exciting change.

📚 Credits & Worth-Read Articles
A huge shoutout to these incredible authors and resources who've covered the topic brilliantly 👇
🧠 Hamilton's deep dive on hamy.xyz (https://lnkd.in/gcJWSGqd)
🧩 Python Cheatsheet article on breaking free from the GIL (https://lnkd.in/gXQS4-3A)
💡 PubNub Blog: Understanding Python's Global Interpreter Lock (https://lnkd.in/gzrngCus)

All are worth a read if you want to understand the technical depth and performance tradeoffs behind Python's biggest change in years.

What's your take on this: will removing the GIL finally make Python a true multi-threaded powerhouse? 💭

#Python #GIL #Concurrency #Multithreading #Performance #BackendDevelopment #OpenSource #PythonCommunity
-
Antonio Cuni (20 years PyPy core) dropped SPy: a compiled Python variant designed for performance.

𝗧𝗵𝗲 𝗰𝗼𝗿𝗲 𝗶𝗱𝗲𝗮
Python's dynamism makes it fundamentally hard to optimize. Everything is mutable, dispatch is complex, and pointer chasing destroys cache locality. JITs help but introduce unpredictable performance cliffs. SPy takes a different approach: remove the dynamism that kills performance, but add new features that keep Python's expressiveness intact.

𝗪𝗵𝗮𝘁 𝗺𝗮𝗸𝗲𝘀 𝗦𝗣𝘆 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁
👉🏽 Import time vs runtime: the world freezes after imports. Modules and classes become immutable at runtime.
👉🏽 Redshifting: blue expressions evaluate at compile time, red at runtime. It's partial evaluation on steroids.
👉🏽 @blue functions: write metaprogramming code that runs during compilation. Like C++ templates, but debuggable with Python's interpreter.
👉🏽 Static dispatch: operator lookup happens at compile time based on static types. The runtime overhead just vanishes.

𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁?
A raytracing example runs 200x faster than CPython. The compiler generates code comparable to C/Rust, with predictable performance.

𝗧𝗿𝗮𝗱𝗲-𝗼𝗳𝗳𝘀
SPy isn't 100% Python compatible (by design). It formalizes constraints that Python devs already follow in practice: stable types, immutable classes, minimal monkey-patching. But you get metaprogramming power back through @blue functions, which feel surprisingly natural.

The project is early stage but the ideas are solid. Worth watching if you care about Python performance without JIT unpredictability.

#python #ai #opensource #llm
-
🚀 Python 3.14: The Biggest Step Forward Is Here

Python 3.14 arrives, and with it the long-standing Global Interpreter Lock (GIL) can finally be switched off, via the officially supported free-threaded build.

For years, the GIL has quietly kept Python single-threaded. Even if you had 8 CPU cores, only one could run Python code at a time. That's why multithreaded Python often didn't run faster. We worked around it using multiprocessing, native extensions, or async code (common in LLM workloads).

Now we won't have to. With Python 3.14, true parallelism in pure Python becomes real.

✅ Multiple threads can run at once across all CPU cores
✅ Async and threads can work together smoothly
✅ Heavy workloads like data prep, ML pipelines, and simulations can scale naturally

In my next post, I'll share real performance comparisons between Python 3.13 and 3.14: thread scaling, CPU utilization, and runtime differences. Follow me if you'd like to see how the no-GIL build performs in practice.
-
Still relying on print statements to debug your Python code? There's a much better way: clean, structured, multi-destination, non-blocking logging.

Python's built-in logging module has been around since 2002. And while it's powerful, its age also means this:
→ outdated patterns
→ inconsistent conventions (PEP 8 didn't even exist when it was designed)
→ tutorials that often make things more confusing

If logging has ever felt clunky, overly complex, or just not worth the effort, you're definitely not alone. But here's the good news: modern Python logging is simple once you understand the updated patterns. And it's far more powerful than people realize.

James Murphy (https://mcoding.io) created an excellent video that cuts through the legacy noise and shows how to use Python logging the right way in 2025. No old baggage, just clean, modern practices that scale.

One highlight: he demonstrates how to write a custom JSONFormatter to produce clean JSON Lines (JSONL) logs, perfect for parsing, storing, and analyzing across multiple systems.

If you're ready to level up your debugging and observability skills, this video is absolutely worth your time: https://lnkd.in/g2sNTx_m

#SoftwareDevelopment #Python #Coding #Logging

Image credits: frame captured from James Murphy's video
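As a rough sketch of the JSONL idea (the field names below are my own choice, not necessarily what the video uses), a custom formatter only needs to override format() and return one JSON object per record:

```python
import json
import logging

class JSONFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (JSONL)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style args
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

# Wire it up: one handler, one formatter, structured output.
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("user %s logged in", "alice")
```

Because each line is standalone JSON, the output can be shipped straight to tools that ingest JSONL without any custom parsing.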
-
Python's 30-Year Limitation - Finally SOLVED! 🐍🔥

Python 3.14 💥 makes the Global Interpreter Lock (GIL) optional (not completely removed), unlocking TRUE parallelism across multiple CPU cores 🧠⚙️

Before 🧱: threads blocked by the GIL
After ⚡: threads running truly in parallel

💡 Example:

# Before (Python ≤3.13)
# Only one thread runs at a time 😩
import threading

def work():
    for _ in range(10**7):
        pass

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# ~1.2s runtime

# After (Python 3.14 🚀, free-threaded build)
# All 4 threads use real cores 💪
# ~0.47s runtime 🎯

📈 Results: ~2.6x faster, real concurrency, no GIL bottleneck!

💬 Python just entered the multithreaded era! 🧩

Python Developer Community

#Python #GIL #Multithreading #Performance #AI #Developers #Innovation #ParallelComputing 🚀🐍