Concurrency & Threading in Python: How to Achieve Real Parallelism (Without the Usual Pitfalls)

Concurrency is one of those topics every backend engineer thinks they understand until race conditions, idle workers, or mysterious performance bottlenecks show up in production.

Over time, I’ve learned that most concurrency problems aren’t caused by threads themselves, but by how work is claimed, scheduled, and coordinated.

This post breaks down practical techniques for safe, scalable concurrency in Python, especially in I/O-bound systems, without overengineering.

The Most Common Concurrency Anti-Pattern

A very typical design looks like this:

  • Fetch a batch of items
  • Split the batch across workers
  • Process items sequentially inside each worker
  • Wait for all workers
  • Sleep
  • Repeat

It feels parallel, but in practice:

  • Fast workers sit idle waiting for slow ones
  • Throughput is limited by the slowest task
  • Artificial sleep introduces unnecessary latency
  • Capacity is wasted even when work exists

Concurrency isn’t about batching. It’s about continuous flow.

The Key Idea: Persistent Workers, Not Batch Jobs

A far more effective model is persistent polling workers:

  • Each worker runs independently
  • It claims one unit of work
  • Processes it
  • Immediately asks for more
  • Sleeps only when there is no work

This creates a system where:

  • Workers never wait for each other
  • Slow tasks don’t block fast ones
  • Throughput scales naturally with worker count
  • Latency drops dramatically

Think of workers as always-on consumers, not scheduled batch processors.
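A minimal sketch of this model, using `queue.Queue` as a stand-in for whatever backs your work source (a database table, a message broker): each worker claims one item, processes it, and immediately loops back for more.

```python
import queue
import threading
import time

def worker(work_queue, results, stop):
    """Persistent worker: claim one item, process it, immediately ask for more."""
    while not stop.is_set():
        try:
            item = work_queue.get(timeout=0.1)  # claim one unit of work
        except queue.Empty:
            time.sleep(0.1)  # sleep only when there is no work
            continue
        results.append(item * 2)  # stand-in for real processing
        work_queue.task_done()    # then immediately loop back for more

# Usage: three always-on workers draining a shared queue.
work_queue = queue.Queue()
for i in range(10):
    work_queue.put(i)

results = []
stop = threading.Event()
threads = [threading.Thread(target=worker, args=(work_queue, results, stop))
           for _ in range(3)]
for t in threads:
    t.start()

work_queue.join()  # wait until every item has been processed
stop.set()
for t in threads:
    t.join()

print(sorted(results))  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

No worker ever waits for a sibling: a thread that finishes early simply claims the next item, and the `stop` event gives you a clean shutdown path.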


The Real Problem: Race Conditions

As soon as multiple threads or processes fetch work from a shared database, race conditions appear:

Two workers select the same rows, both believe they own the same task, and the same work gets processed twice.

This isn’t a threading bug; it’s a data ownership problem.

The Solution: Atomic Work Claiming

Instead of:

  1. Selecting rows
  2. Updating them later

You must claim work atomically.


In PostgreSQL, the most powerful (and underused) tool for this is:

FOR UPDATE SKIP LOCKED

What it gives you:

  • Row-level locking
  • No blocking between workers
  • Each worker gets different rows
  • Zero application-level locking logic

Multiple threads or even multiple services can safely run the same query concurrently without collisions.
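A typical claim query looks something like this (the `tasks` table and its `status`/`payload` columns are hypothetical; adapt the names to your schema):

```sql
-- Claim one pending row atomically. Concurrent workers skip rows
-- that are already locked instead of blocking on them or colliding.
UPDATE tasks
SET status = 'in_progress'
WHERE id = (
    SELECT id
    FROM tasks
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;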

This single technique eliminates:

  • Duplicate processing
  • Race conditions
  • Complex mutex logic in application code


“But Python Has the GIL…”

Yes—and it matters far less than people think.

The Global Interpreter Lock only serializes the execution of Python bytecode; threads that are blocked waiting on I/O release it.

Most real-world backend systems are:

  • Database heavy
  • Network heavy
  • API-call heavy
  • File-system heavy

All of these release the GIL.

If your workload is:

  • 95% I/O
  • 5% Python logic

Then threading gives you near-real parallelism with far less complexity than multiprocessing.

When threading is a great choice

  • Database queries
  • HTTP calls
  • External APIs
  • Message queues

When it’s not

  • Heavy computation
  • Data science workloads
  • Tight numeric loops
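To see the I/O case concretely, here is a sketch using `concurrent.futures.ThreadPoolExecutor`, with `time.sleep` standing in for a blocking network call (like a real socket read, it releases the GIL):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Stand-in for an I/O-bound call (HTTP request, database query)."""
    time.sleep(0.2)  # releases the GIL while "waiting on the network"
    return f"response from {url}"

urls = [f"https://example.com/item/{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    responses = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# Eight 0.2 s "requests" overlap, so total wall time is close to 0.2 s,
# not 1.6 s: the GIL does not serialize blocked I/O.
print(f"{len(responses)} responses in {elapsed:.2f}s")
```

Swap `fetch` for a real HTTP or database call and the shape of the code stays the same.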

Why Batch Size = 1 Often Wins

Counterintuitive but true:

Claiming one task at a time per worker usually outperforms batch processing.

Why?

  • No task waits behind a slow sibling
  • Workers rebalance automatically
  • Long-running tasks don’t stall short ones
  • Throughput stays high under uneven workloads

The cost (more DB round trips) is usually negligible compared to the gains.
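A back-of-the-envelope simulation makes the effect visible. With two workers and one 10-second straggler among quick tasks, splitting the batch up front traps quick tasks behind the slow one, while claim-one assignment lets the free worker absorb them:

```python
# Task durations in seconds; one straggler among quick tasks.
durations = [10, 1, 1, 1, 1, 1, 1, 1]
workers = 2

# Batch model: split the list up front, each worker runs its half sequentially.
half = len(durations) // 2
batch_makespan = max(sum(durations[:half]), sum(durations[half:]))

# Claim-one model: each worker grabs the next task the moment it is free,
# modeled here as greedy assignment to the least-loaded worker.
loads = [0] * workers
for d in durations:
    loads[loads.index(min(loads))] += d
claim_one_makespan = max(loads)

print(batch_makespan, claim_one_makespan)  # → 13 10
```

The batch split finishes in 13 seconds (the straggler's worker also owns three quick tasks); claim-one finishes in 10, bounded only by the single longest task.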

Sleep Only When Idle

Another subtle performance killer is sleeping after every cycle.

Bad pattern:

  • Process work
  • Sleep
  • Miss available work

Better pattern:

  • Process work
  • Immediately check again
  • Sleep only when no work exists

This alone can reduce latency by seconds under load.
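The better pattern is a drain-then-sleep loop: keep claiming as long as work exists, and only back off when the source comes up empty. A minimal sketch (the sleep is commented out so the function returns for demonstration):

```python
import queue
import time

def drain_then_sleep(work_queue, idle_sleep=0.5):
    """Process everything available; sleep only once the queue is empty."""
    processed = []
    while True:
        try:
            item = work_queue.get_nowait()  # immediately check again
        except queue.Empty:
            # In a long-running worker: time.sleep(idle_sleep), then continue.
            break
        processed.append(item)
    return processed

q = queue.Queue()
for i in range(5):
    q.put(i)
print(drain_then_sleep(q))  # → [0, 1, 2, 3, 4]
```

The sleep moves from "after every cycle" to "only on an empty poll", so backlog is drained at full speed.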

Connection Pools Are Non-Negotiable

Threading without a database connection pool is asking for trouble.

A proper pool:

  • Reuses connections efficiently
  • Enforces upper limits
  • Prevents connection exhaustion
  • Keeps memory usage predictable

Each thread borrows a connection briefly and returns it: no shared connections, no leaks.
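In production you would reach for an existing pool (psycopg2's `ThreadedConnectionPool`, SQLAlchemy's pooling), but the borrow/return mechanics fit in a few lines. A minimal sketch using `queue.Queue` to enforce the limit, with SQLite standing in for the real database:

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Minimal fixed-size pool: connections are borrowed and returned,
    never shared between threads, and the upper limit is enforced."""

    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._pool.get(timeout=timeout)  # blocks if all are borrowed
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always return it, even on error

# Usage with SQLite as a stand-in; a Postgres app would pass a psycopg2 factory.
pool = ConnectionPool(lambda: sqlite3.connect(":memory:", check_same_thread=False),
                      size=3)

with pool.connection() as conn:
    row = conn.execute("SELECT 1 + 1").fetchone()
print(row)  # → (2,)
```

The `finally` clause is the important part: a connection is returned even when processing raises, so the pool never shrinks under failure.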

Thread Safety Isn’t About Threads

The biggest takeaway:

Thread safety is mostly a data problem, not a threading problem. If:

  • Work ownership is atomic
  • State transitions are clear
  • Database guarantees exclusivity

Then your application code becomes dramatically simpler.

No global locks, no complex coordination, no fragile in-memory state.

Just workers doing work.

Key takeaways:

  • True concurrency comes from continuous work claiming, not batching
  • Databases can coordinate concurrency better than application code
  • FOR UPDATE SKIP LOCKED is one of PostgreSQL’s most powerful features
  • Python threading is excellent for I/O-bound systems
  • Small design choices (batch size, sleep placement) have a huge impact
  • Simpler concurrency models are usually more scalable and reliable
