Scaling Async Python: How the Kernel, Network Stack, and epoll Manage Massive Connections


In Blog 1, we discussed how Python’s asyncio, file descriptors (FDs), and event loops enable massive concurrency without threads.

In this post, we go a layer deeper — into the kernel and network stack — and see how a single NIC (Network Interface Card) can serve millions of requests per second, with help from epoll.

Let’s break it down.


1. The Kernel’s Role: The Real Network Engine

Your async Python code never deals with network packets directly.

Here’s who does the real work:

NIC (Network Interface Card)

  • Operates at the Physical and Data Link Layers (Layers 1 and 2)
  • Converts raw electrical or optical signals into Ethernet frames
  • Sends/receives frames to/from the wire
  • Transfers received data into memory (via DMA)

Kernel’s Network Stack (L3 & L4)

  • Implements the IP Layer (Layer 3) and TCP/UDP (Layer 4)
  • Validates and reassembles incoming IP packets and TCP segments
  • Handles TCP connection state, retransmissions, and ordering
  • Manages flow control and buffering
  • Routes data to the correct socket (and FD)

Socket Buffer

  • Every open socket (e.g., from asyncio.open_connection) has a receive buffer in the kernel
  • The kernel places data here after TCP reassembly
  • Python's await reader.read() reads from this buffer
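
Here is a minimal sketch of that buffering, assuming a system where `socket.socketpair()` is available. It uses a local socket pair rather than a real TCP connection, so no NIC or TCP reassembly is involved. The only point is that bytes sit in a kernel buffer until the application asks for them.

```python
import socket

# A connected pair of sockets stands in for a client/server connection.
sender, receiver = socket.socketpair()

# Ask the kernel how large a receive buffer it allocated for this socket.
rcvbuf = receiver.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"kernel receive buffer: {rcvbuf} bytes")

# The sender writes; nobody is reading yet, so the kernel holds the bytes.
sender.sendall(b"hello from the other side")

# Later, the application pulls the bytes out of the kernel buffer.
data = receiver.recv(1024)
print(data)

sender.close()
receiver.close()
```

The buffer size printed will vary by operating system and configuration.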

Key Insight: All packet processing, validation, and reassembly are done in the kernel, before your Python code even wakes up.


2. Enter epoll: Scalable Notification, Not Packet Processing

Let’s clear this up: epoll (on Linux) and kqueue (on BSD/macOS) do not handle or touch packets.

Instead, they solve a different problem:

“How do I wait for thousands of sockets and know exactly which one is ready to read/write — without wasting CPU?”

Here’s how epoll works:

✅ What epoll does:

  • Watches many file descriptors (FDs) simultaneously
  • Efficiently waits for I/O readiness (read/write/error)
  • Notifies your event loop only when there’s actual work (e.g., FD has data)

❌ What epoll does not do:

  • Does not read data
  • Does not handle packets or connections

epoll Workflow:

  1. Python registers interest in an FD using epoll_ctl
  2. The event loop sleeps using epoll_wait
  3. When the kernel sees data ready (e.g., in the socket buffer), it wakes up your event loop
  4. Python resumes the coroutine waiting on that FD
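
The four steps above can be sketched with Python's `selectors` module, which picks epoll on Linux and kqueue on BSD/macOS. This is a simplified illustration using a local socket pair, not how asyncio is implemented internally. The `"right-side"` label is just an arbitrary tag chosen for this example.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS
left, right = socket.socketpair()
right.setblocking(False)

# Step 1: register interest in the FD (epoll_ctl under the hood on Linux).
sel.register(right, selectors.EVENT_READ, data="right-side")

# Nothing has been sent yet, so a zero-timeout wait reports no ready FDs.
ready_before = sel.select(timeout=0)

# The peer writes; the kernel buffers the bytes and marks the FD readable.
left.sendall(b"ping")

# Steps 2-3: wait for readiness (epoll_wait on Linux).
ready_after = sel.select(timeout=1)

# Step 4: only now does user code actually read the data.
received = []
for key, events in ready_after:
    received.append((key.data, key.fileobj.recv(1024)))

print(ready_before)
print(received)

sel.unregister(right)
sel.close()
left.close()
right.close()
```

Note that the selector only reports which FD is ready. The `recv` call is still made by user code.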


epoll Event Types (Bitmasks)

Here are the main flags you might see:

| Event      | Meaning                                  |
| ---------- | ---------------------------------------- |
| `EPOLLIN`  | FD is readable (data available)          |
| `EPOLLOUT` | FD is writable (buffer space available)  |
| `EPOLLERR` | An error occurred                        |
| `EPOLLHUP` | Hang-up on the FD (e.g., connection shut down) |
| `EPOLLET`  | Edge-triggered mode (for advanced usage) |

Python’s asyncio and selectors module handle all of this for you.
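
To see what "bitmask" means in practice, here is a small sketch that decodes an event mask into flag names. The `describe` helper and the `FLAGS` table are hypothetical names made up for this example. The constants come from Python's `select` module where it exposes them (Linux), and the numeric fallbacks are the Linux values, included only so the sketch runs elsewhere.

```python
import select

# Use the constants from the select module where they exist (Linux);
# the numeric fallbacks are the Linux values, for illustration only.
FLAGS = {
    "EPOLLIN": getattr(select, "EPOLLIN", 0x001),
    "EPOLLOUT": getattr(select, "EPOLLOUT", 0x004),
    "EPOLLERR": getattr(select, "EPOLLERR", 0x008),
    "EPOLLHUP": getattr(select, "EPOLLHUP", 0x010),
}

def describe(mask: int) -> list[str]:
    """Return the names of the flags that are set in an event mask."""
    return [name for name, bit in FLAGS.items() if mask & bit]

# A single event can carry several flags at once, OR-ed together.
mask = FLAGS["EPOLLIN"] | FLAGS["EPOLLHUP"]
print(describe(mask))
```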

3. 🔥 How a Single NIC Handles Millions of Requests

It sounds unbelievable at first — millions of requests per second on one NIC?

Here’s how it’s done:

💡 NIC Hardware Capabilities:

  • Multiple queues with interrupt steering (RSS)
  • DMA to transfer data directly to RAM
  • Large hardware buffers
  • Interrupt coalescing to reduce overhead

⚙️ Kernel Network Stack Optimizations:

  • TCP segmentation offload (TSO) and checksum offload
  • Zero-copy transfers and batch processing to cut memory copies and user-kernel transitions
  • TCP Fast Open, SO_REUSEPORT, and more

⛓ Combined With epoll + asyncio:

  • You don’t spawn a thread per connection
  • Kernel buffers and waits
  • Async Python just reacts to readiness, not raw I/O
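
As a rough sketch of the "no thread per connection" point, the example below runs a tiny echo server and 50 clients inside one thread and one event loop. The names `handle_echo`, `client`, and `main`, and the figure of 50 clients, are choices made for this illustration. It runs over loopback, so it says nothing about real NIC throughput.

```python
import asyncio

async def handle_echo(reader, writer):
    # Runs once per connection, as a coroutine rather than a thread.
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def client(port, i):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"msg {i}\n".encode())
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply

async def main(n_clients=50):
    # Port 0 asks the kernel for any free port.
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # All clients are in flight at once, interleaved on a single thread.
        return await asyncio.gather(*(client(port, i) for i in range(n_clients)))

replies = asyncio.run(main())
print(len(replies), replies[0])
```

Every `await` in the handler is a point where the coroutine is parked until the kernel reports that its socket is ready.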


4. Visualizing the Journey from NIC to Coroutine

   [NIC Hardware]
         ↓
[Kernel Network Stack (TCP/IP)]
         ↓
[Socket Receive Buffer (linked to FD)]
         ↓
   epoll watches FD readiness
         ↓
Kernel triggers epoll_wait wake-up
         ↓
Asyncio event loop resumes coroutine
         ↓
Your Python code reads data        

Each connection is just:

  • 1 FD
  • Managed by epoll
  • Paired with a coroutine
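
The FD-to-coroutine pairing can be sketched with the event loop's low-level `add_reader` API. This assumes a selector-based event loop (the default on Linux and macOS; the default Windows loop does not support `add_reader`). `wait_readable` is a hypothetical helper written for this example, not an asyncio function.

```python
import asyncio
import socket

async def wait_readable(sock):
    """Suspend the calling coroutine until the kernel reports sock readable."""
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def on_readable():
        # Called by the event loop when the selector reports the FD ready.
        if not fut.done():
            fut.set_result(None)

    loop.add_reader(sock.fileno(), on_readable)
    try:
        await fut
    finally:
        loop.remove_reader(sock.fileno())

async def main():
    left, right = socket.socketpair()
    right.setblocking(False)
    loop = asyncio.get_running_loop()
    # Simulate a peer sending data a little later.
    loop.call_later(0.05, left.sendall, b"wake up")
    await wait_readable(right)   # the coroutine is parked here
    data = right.recv(1024)      # the bytes are already in the kernel buffer
    left.close()
    right.close()
    return data

result = asyncio.run(main())
print(result)
```

asyncio's streams do this bookkeeping for you. The sketch only exposes the link between one FD and one waiting coroutine.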


5. Responsibilities Breakdown

| Layer        | Role                                           |
| ------------ | ---------------------------------------------- |
| NIC          | Hardware packet send/receive (Layer 2)         |
| Kernel       | TCP/IP processing, buffering, connection state |
| epoll/kqueue | Notify which FD is ready (no polling!)         |
| asyncio      | Map FDs to coroutines and resume them          |
| Your Code    | Handles app logic after data is ready          |


✅ Final Thoughts

To summarize:

  • epoll is not magic — it’s a notification mechanism, not a packet handler
  • The kernel does the heavy lifting of buffering and TCP state
  • Asyncio + epoll = powerful user-space + kernel collaboration
  • One thread, one NIC = millions of concurrent sockets if designed properly (and given enough memory and raised FD limits)

Understanding this stack helps you:

  • Debug async bottlenecks
  • Build scalable, non-blocking systems
  • Appreciate what your OS and hardware are doing for you


Until next time — keep your FDs clean, your loops tight, and your coroutines flying. And remember: by the time you await reader.read(), the kernel’s already done the heavy lifting.
