Async IO in Node.js

A deep dive into Node.js internals (blocking and non-blocking IO, select/poll/epoll, event loop)

We all know by now that Node.js scales, but why and how? Many people ask this question and few get a satisfactory answer, because most of the content available online skips the internal workings or provides misleading information. In this article, I will explain why I believe Node.js is fast.

Understanding the need for non-blocking IO

A server application binds a socket to an address and port. When a client sends a request to this application, it first establishes a connection, and the server accesses that connection through a file descriptor. Whenever the client sends data to the server, it uses the same address and port. The operating system associates the incoming data with the corresponding file descriptor and stores it in a kernel buffer. The application must then read this data, which copies it from the kernel buffer into user-space memory.

If you have issued a read, but the client has not yet written any data into the socket, what should you do? You have to wait for the data to become available before you can proceed. However, the thread is stuck for as long as it waits, even though there may be other important work it could have been doing.

Every task needs a thread to run on, whether it is waiting for data, writing to a file, or reading from something. Scaling an application with this synchronous model is difficult, particularly in Node.js, where your JavaScript runs on a single main thread and blocking it stalls the whole application. If there are many requests that require reading or writing, threads spend most of their time waiting, and those resources are wasted.
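The cost of blocking the main thread can be made visible with a small experiment. This is a minimal sketch: `blockFor` and `measureTimerDelay` are hypothetical helper names, and `Atomics.wait` is used only as a stand-in for a blocking call. A timer scheduled for 0 ms cannot fire until the blocking call returns.

```javascript
// Puts the calling thread to sleep for about `ms` milliseconds.
// Atomics.wait is a stand-in here for any blocking system call.
function blockFor(ms) {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, ms); // nobody notifies, so this returns on timeout
}

// Schedules a 0 ms timer, then blocks the thread, and reports how late the timer ran.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockFor(blockMs); // no callback can run on this thread until this returns
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`a 0 ms timer fired after ${delay} ms`);
});
```

The reported delay should be at least as long as the blocking call, which is what every pending request would experience on a blocked main thread.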

The same can be said for reading from a file. There is no waiting on a remote peer as with network calls, but the thread is still blocked while the data is read or written. DNS resolution (DNS is the protocol that resolves domains and hostnames to network addresses) is also a blocking operation, even though it is a network request. That is because many frameworks and runtimes rely on the operating system's own resolver (for example getaddrinfo), which is synchronous and blocks the calling thread. So, just like a file read, DNS resolution blocks.

Non-Blocking IO in Node.js

This is where Node.js's asynchronous non-blocking IO and its thread pool come to the rescue.

What happens in the case of Socket IO?

If data is not yet available in the buffer, a read could block, so Node.js puts the socket into non-blocking mode using a system call such as fcntl. In non-blocking mode, a read on a socket with no data returns immediately instead of blocking, so the thread can move on and come back later, i.e. polling.
Suppose the server is polling for data on 100 connections. Each connection has its own file descriptor, so the server has to monitor 100 file descriptors to check whether data is available for reading.
To handle this, Node.js relies on OS facilities such as epoll on Linux and kqueue on macOS, which can monitor a large number of file descriptors efficiently. With async non-blocking IO, the thread doesn’t wait for the client to write data into the socket. Instead, epoll reports which file descriptors are ready for reading or writing.
Incoming data lands in a kernel buffer first. Once the file descriptor is reported as ready, the data is read from it, which copies it from kernel memory into user-space memory.
When writing Node.js code, we don’t read from the socket or connection directly. Instead, we listen for events that notify us when data becomes available. Our callback function then executes with that data.
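The event-driven style described above looks roughly like this with the built-in `net` module. This is a minimal sketch: `echoOnce` is a hypothetical helper that starts a throwaway echo server on a free local port, sends one message, and resolves with the reply. Note that no code ever calls a read function; it only registers `'data'` callbacks.

```javascript
const net = require('node:net');

// Starts a one-shot echo server on a free port, sends `message`, resolves with the reply.
function echoOnce(message) {
  return new Promise((resolve, reject) => {
    const server = net.createServer((socket) => {
      // We never read from the socket ourselves. Node.js calls this
      // callback once the socket has data for us.
      socket.on('data', (chunk) => socket.end(chunk)); // echo back, then close
    });
    server.on('error', reject);

    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const chunks = [];
      const client = net.connect(port, '127.0.0.1', () => client.write(message));
      client.on('data', (chunk) => chunks.push(chunk));
      client.on('end', () => {
        server.close(); // stop listening so the process can exit
        resolve(Buffer.concat(chunks).toString());
      });
      client.on('error', reject);
    });
  });
}

echoOnce('hello').then((reply) => console.log('server replied:', reply));
```

Between registering the callbacks and receiving the data, the main thread is free to serve other connections.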

select/poll/epoll only tell you which file descriptors are ready. You still have to call read/write/recv/send to actually perform the IO; because the sockets are in non-blocking mode, those calls return immediately instead of waiting.


What happens in the case of File IO and DNS Resolution?

Non-blocking sockets solve the problem for network calls, but what about file IO or DNS resolution? This is where the thread pool comes in handy.

Because file IO and DNS resolution through the OS resolver (dns.lookup in Node.js) are synchronous by nature, Node.js runs these operations on threads from the thread pool.
Some CPU-intensive functions in Node.js, such as parts of the crypto module, use the thread pool too.
So rather than performing the blocking operation on the main thread, we delegate the task to another thread.

All of this is implemented in libuv, the library Node.js uses for its event loop and asynchronous IO.
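The delegation can be observed from JavaScript. The sketch below assumes a hypothetical helper, `runOnThreadPool`, and a scratch file created just for the demo. It starts an asynchronous file read and a `crypto.pbkdf2` key derivation, both of which run on the thread pool, and records what happens first.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Starts two thread-pool tasks and records the order in which things happen.
function runOnThreadPool() {
  return new Promise((resolve, reject) => {
    const order = [];
    let pending = 2;
    const finish = (label) => (err) => {
      if (err) return reject(err);
      order.push(label);
      pending -= 1;
      if (pending === 0) resolve(order);
    };

    // Scratch file, so the example does not depend on any existing path.
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pool-demo-'));
    const file = path.join(dir, 'data.txt');
    fs.writeFileSync(file, 'hello from disk');

    // Both calls return immediately; the work happens on pool threads.
    fs.readFile(file, finish('file read done'));
    crypto.pbkdf2('password', 'salt', 100000, 32, 'sha256', finish('pbkdf2 done'));

    // This line runs before either callback, because the main thread was never blocked.
    order.push('main thread continued');
  });
}

runOnThreadPool().then((order) => console.log(order));
```

The first entry is always the main thread's own line; the order of the two completions depends on which task finishes first.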

More on select, poll, and epoll

Imagine you are a web server. Each time you accept a connection with the accept system call, you receive a new file descriptor that represents that connection. It is possible to have thousands of connections open simultaneously. To know when people send new data on these connections, so that you can process it and respond, you don’t want to burn CPU time by repeatedly asking “are there updates now? how about now? how about now? how about now?”. Instead, you’d rather just ask the Linux kernel: “hey, here are 100 file descriptors. Tell me when one of them is updated!”

select, poll and epoll are what Node.js relies on under the hood. You hand one of these system calls a list of file descriptors, and the kernel checks whether anything has changed on them. Along with the list, you pass a timeout that says how long you are willing to wait when nothing has changed. If something changes, the call returns right away. If not, it returns once the timeout expires, and the event loop checks again on its next iteration.
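How an event loop might pick that timeout can be sketched in a few lines. This is a simplified model loosely based on how libuv chooses it, not libuv's actual code; `nextPollTimeout` and its parameters are hypothetical names. The idea is to wait no longer than the nearest timer allows, and not to wait at all if callbacks are already queued.

```javascript
// Returns how long the loop may wait in the poll call, in milliseconds.
//   0  -> do not wait, just check for ready descriptors
//  -1  -> wait indefinitely until some descriptor becomes ready
function nextPollTimeout(timerDueTimes, now, hasPendingCallbacks) {
  if (hasPendingCallbacks) return 0;          // work is already waiting
  if (timerDueTimes.length === 0) return -1;  // nothing scheduled, IO is the only wake-up
  const nearest = Math.min(...timerDueTimes); // earliest timer decides the wait
  return Math.max(0, nearest - now);          // overdue timers mean no waiting
}

console.log(nextPollTimeout([150, 120], 100, false));
console.log(nextPollTimeout([], 100, false));
```

With timers due at 150 and 120 and the clock at 100, the loop can wait at most 20 ms for IO before it has to run the nearest timer.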

With select and poll, the time each call takes grows linearly with the number of connections (i.e. file descriptors), because the whole list has to be scanned on every call. epoll works differently. File descriptors are registered once and kept in a red-black tree, a self-balancing binary search tree, so adding, removing, or looking up a descriptor takes logarithmic time. The kernel also maintains a list of the descriptors that are ready, so a wait returns only those instead of rescanning everything, and its cost barely grows with the number of monitored fds. The table below compares the three for 100,000 monitoring operations, and epoll is the clear winner:

CPU time in seconds for 100,000 monitoring operations:

# file descriptors monitored |  poll  |  select   | epoll
10                           |   0.61 |    0.73   | 0.41
100                          |   2.9  |    3.0    | 0.42
1000                         |  35.0  |   35.0    | 0.53
10000                        | 990.0  |  930.0    | 0.66
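The shape of those numbers can be reproduced with a toy model. This is not how the kernel is implemented, and `pollStyleWait` and `epollStyleWait` are hypothetical names. It only counts how many descriptors each approach has to look at per wait: the poll-style scan walks the whole interest list, while the epoll-style wait drains a queue of descriptors that were already marked ready.

```javascript
// poll/select style: every wait walks the entire interest list.
function pollStyleWait(interest, readySet) {
  let inspected = 0;
  const ready = [];
  for (const fd of interest) {
    inspected += 1;
    if (readySet.has(fd)) ready.push(fd);
  }
  return { ready, inspected };
}

// epoll style: ready descriptors were queued as events arrived,
// so a wait only has to drain that queue.
function epollStyleWait(readyQueue) {
  const ready = readyQueue.splice(0); // take everything currently queued
  return { ready, inspected: ready.length };
}

const interest = Array.from({ length: 10000 }, (_, fd) => fd);
const readyNow = new Set([3, 4096]);

console.log('poll-style inspected:', pollStyleWait(interest, readyNow).inspected);
console.log('epoll-style inspected:', epollStyleWait([...readyNow]).inspected);
```

With 10,000 monitored descriptors and two of them ready, the scan touches all 10,000 while the queue-based wait touches two.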

The epoll group of system calls (epoll_create, epoll_ctl, epoll_wait) lets you ask the Linux kernel to monitor a list of file descriptors and report activity on them.

Here are the steps for using epoll:

Use epoll_create to tell the kernel that you will be using epoll. It returns a file descriptor that identifies the epoll instance.
Use epoll_ctl to tell the kernel which file descriptors you want to receive updates about.
Use epoll_wait to wait for updates on the file descriptors you are interested in.

Remember that select/poll/epoll only report readiness. The IO itself is still performed with system calls like read/write/recv/send, which return immediately on a non-blocking socket.

Manik Mudholkar
