Concurrency & Threading in Python: How to Achieve Real Parallelism (Without the Usual Pitfalls)

Concurrency is one of those topics every backend engineer thinks they understand until race conditions, idle workers, or mysterious performance bottlenecks show up in production.

Over time, I’ve learned that most concurrency problems aren’t caused by threads themselves, but by how work is claimed, scheduled, and coordinated.

This post breaks down practical techniques for safe, scalable concurrency in Python, especially in I/O-bound systems, without overengineering.

The Most Common Concurrency Anti-Pattern

A very typical design looks like this:

  • Fetch a batch of items
  • Split the batch across workers
  • Process items sequentially inside each worker
  • Wait for all workers
  • Sleep
  • Repeat

It feels parallel, but in practice:

  • Fast workers sit idle waiting for slow ones
  • Throughput is limited by the slowest task
  • Artificial sleep introduces unnecessary latency
  • Capacity is wasted even when work exists

Concurrency isn’t about batching. It’s about continuous flow.

The Key Idea: Persistent Workers, Not Batch Jobs

A far more effective model is persistent polling workers:

  • Each worker runs independently
  • It claims one unit of work
  • Processes it
  • Immediately asks for more
  • Sleeps only when there is no work

This creates a system where:

  • Workers never wait for each other
  • Slow tasks don’t block fast ones
  • Throughput scales naturally with worker count
  • Latency drops dramatically

Think of workers as always-on consumers, not scheduled batch processors.
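A minimal sketch of this model, using `queue.Queue` as a stand-in for whatever backs your work source (a database table, a message broker): each worker claims one item, processes it, and immediately loops back for more.

```python
import queue
import threading
import time

def worker(work_queue, results, stop):
    """Persistent worker: claim one item, process it, immediately ask for more."""
    while not stop.is_set():
        try:
            item = work_queue.get(timeout=0.1)  # claim one unit of work
        except queue.Empty:
            time.sleep(0.1)  # sleep only when there is no work
            continue
        results.append(item * 2)  # stand-in for real processing
        work_queue.task_done()    # then immediately loop back for more

# Usage: three always-on workers draining a shared queue.
work_queue = queue.Queue()
for i in range(10):
    work_queue.put(i)

results = []
stop = threading.Event()
threads = [threading.Thread(target=worker, args=(work_queue, results, stop))
           for _ in range(3)]
for t in threads:
    t.start()

work_queue.join()  # wait until every item has been processed
stop.set()
for t in threads:
    t.join()

print(sorted(results))  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

No worker ever waits for a sibling: a thread that finishes early simply claims the next item, and the `stop` event gives you a clean shutdown path.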


The Real Problem: Race Conditions

As soon as multiple threads or processes fetch work from a shared database, race conditions appear:

Two workers select the same rows, both believe they own the same task, and the same work gets processed twice.

This isn’t a threading bug; it’s a data ownership problem.

The Solution: Atomic Work Claiming

Instead of:

  1. Selecting rows
  2. Updating them later

You must claim work atomically.


In PostgreSQL, the most powerful (and underused) tool for this is:

FOR UPDATE SKIP LOCKED

What it gives you:

  • Row-level locking
  • No blocking between workers
  • Each worker gets different rows
  • Zero application-level locking logic

Multiple threads or even multiple services can safely run the same query concurrently without collisions.
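A typical claim query looks something like this (the `tasks` table and its `status`/`payload` columns are hypothetical; adapt the names to your schema):

```sql
-- Claim one pending row atomically. Concurrent workers skip rows
-- that are already locked instead of blocking on them or colliding.
UPDATE tasks
SET status = 'in_progress'
WHERE id = (
    SELECT id
    FROM tasks
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;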

This single technique eliminates:

  • Duplicate processing
  • Race conditions
  • Complex mutex logic in application code


“But Python Has the GIL…”

Yes—and it matters far less than people think.

The Global Interpreter Lock only serializes the execution of Python bytecode; threads that are blocked waiting on I/O release it.

Most real-world backend systems are:

  • Database heavy
  • Network heavy
  • API-call heavy
  • File-system heavy

All of these release the GIL.

If your workload is:

  • 95% I/O
  • 5% Python logic

Then threading gives you near-real parallelism with far less complexity than multiprocessing.

When threading is a great choice

  • Database queries
  • HTTP calls
  • External APIs
  • Message queues

When it’s not

  • Heavy computation
  • Data science workloads
  • Tight numeric loops
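To see the I/O case concretely, here is a sketch using `concurrent.futures.ThreadPoolExecutor`, with `time.sleep` standing in for a blocking network call (like a real socket read, it releases the GIL):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Stand-in for an I/O-bound call (HTTP request, database query)."""
    time.sleep(0.2)  # releases the GIL while "waiting on the network"
    return f"response from {url}"

urls = [f"https://example.com/item/{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    responses = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# Eight 0.2 s "requests" overlap, so total wall time is close to 0.2 s,
# not 1.6 s: the GIL does not serialize blocked I/O.
print(f"{len(responses)} responses in {elapsed:.2f}s")
```

Swap `fetch` for a real HTTP or database call and the shape of the code stays the same.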

Why Batch Size = 1 Often Wins

Counterintuitive but true:

Claiming one task at a time per worker usually outperforms batch processing.

Why?

  • No task waits behind a slow sibling
  • Workers rebalance automatically
  • Long-running tasks don’t stall short ones
  • Throughput stays high under uneven workloads

The cost (more DB round trips) is usually negligible compared to the gains.
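A back-of-the-envelope simulation makes the effect visible. With two workers and one 10-second straggler among quick tasks, splitting the batch up front traps quick tasks behind the slow one, while claim-one assignment lets the free worker absorb them:

```python
# Task durations in seconds; one straggler among quick tasks.
durations = [10, 1, 1, 1, 1, 1, 1, 1]
workers = 2

# Batch model: split the list up front, each worker runs its half sequentially.
half = len(durations) // 2
batch_makespan = max(sum(durations[:half]), sum(durations[half:]))

# Claim-one model: each worker grabs the next task the moment it is free,
# modeled here as greedy assignment to the least-loaded worker.
loads = [0] * workers
for d in durations:
    loads[loads.index(min(loads))] += d
claim_one_makespan = max(loads)

print(batch_makespan, claim_one_makespan)  # → 13 10
```

The batch split finishes in 13 seconds (the straggler's worker also owns three quick tasks); claim-one finishes in 10, bounded only by the single longest task.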

Sleep Only When Idle

Another subtle performance killer is sleeping after every cycle.

Bad pattern:

  • Process work
  • Sleep
  • Miss available work

Better pattern:

  • Process work
  • Immediately check again
  • Sleep only when no work exists

This alone can reduce latency by seconds under load.
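The better pattern is a drain-then-sleep loop: keep claiming as long as work exists, and only back off when the source comes up empty. A minimal sketch (the sleep is commented out so the function returns for demonstration):

```python
import queue
import time

def drain_then_sleep(work_queue, idle_sleep=0.5):
    """Process everything available; sleep only once the queue is empty."""
    processed = []
    while True:
        try:
            item = work_queue.get_nowait()  # immediately check again
        except queue.Empty:
            # In a long-running worker: time.sleep(idle_sleep), then continue.
            break
        processed.append(item)
    return processed

q = queue.Queue()
for i in range(5):
    q.put(i)
print(drain_then_sleep(q))  # → [0, 1, 2, 3, 4]
```

The sleep moves from "after every cycle" to "only on an empty poll", so backlog is drained at full speed.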

Connection Pools Are Non-Negotiable

Threading without a database connection pool is asking for trouble.

A proper pool:

  • Reuses connections efficiently
  • Enforces upper limits
  • Prevents connection exhaustion
  • Keeps memory usage predictable

Each thread borrows a connection briefly and returns it: no shared connections, no leaks.
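In production you would reach for an existing pool (psycopg2's `ThreadedConnectionPool`, SQLAlchemy's pooling), but the borrow/return mechanics fit in a few lines. A minimal sketch using `queue.Queue` to enforce the limit, with SQLite standing in for the real database:

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Minimal fixed-size pool: connections are borrowed and returned,
    never shared between threads, and the upper limit is enforced."""

    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._pool.get(timeout=timeout)  # blocks if all are borrowed
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always return it, even on error

# Usage with SQLite as a stand-in; a Postgres app would pass a psycopg2 factory.
pool = ConnectionPool(lambda: sqlite3.connect(":memory:", check_same_thread=False),
                      size=3)

with pool.connection() as conn:
    row = conn.execute("SELECT 1 + 1").fetchone()
print(row)  # → (2,)
```

The `finally` clause is the important part: a connection is returned even when processing raises, so the pool never shrinks under failure.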

Thread Safety Isn’t About Threads

The biggest takeaway:

Thread safety is mostly a data problem, not a threading problem. If:

  • Work ownership is atomic
  • State transitions are clear
  • Database guarantees exclusivity

Then your application code becomes dramatically simpler.

No global locks, no complex coordination, no fragile in-memory state.

Just workers doing work.

Key takeaways:

  • True concurrency comes from continuous work claiming, not batching
  • Databases can coordinate concurrency better than application code
  • FOR UPDATE SKIP LOCKED is one of PostgreSQL’s most powerful features
  • Python threading is excellent for I/O-bound systems
  • Small design choices (batch size, sleep placement) have a huge impact
  • Simpler concurrency models are usually more scalable and reliable
