Evolution of web server architecture
In the beginning we had the multi-process architecture: every request was handled by a separate process. Processes could be long-lived (pooled) and reused across multiple requests, but at any given point in time one request mapped to exactly one process. The idea was simple and let a single machine use multiple cores without changing the code much. The main benefit was isolation: if processing one request caused a crash, it took down only that process, not the entire server. Each process had its own memory; if state needed to be shared, you could use shared memory, but that sharing had to be set up explicitly.
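As a rough illustration, here is a minimal sketch of the process-per-request model using Python's standard library. ForkingTCPServer forks a fresh child per connection rather than maintaining a pool, and the host/port are placeholders, but the isolation property is the same: a crash in the handler kills only that child, never the parent listener.

```python
# Minimal sketch of the process-per-request model (POSIX only).
# Host/port are illustrative placeholders, not from the original post.
import socketserver

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Runs inside a freshly forked child process with its own memory;
        # an unhandled crash here terminates only this child.
        data = self.request.recv(1024)
        self.request.sendall(b"echo: " + data)

if __name__ == "__main__":
    # The parent process accepts connections and forks one child per request.
    with socketserver.ForkingTCPServer(("localhost", 8000), EchoHandler) as srv:
        srv.serve_forever()
```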
Then came the multithreaded web server. Instead of mapping each request to a process with its own memory space, each request mapped to a single thread, and all threads shared the memory space by default. If access to shared state needed to be coordinated, the methods or functions touched by multiple threads had to use monitors (locks) to avoid race conditions. Data that was local to a thread had to be kept on the stack or in thread-local storage explicitly to avoid sharing. Since we no longer had isolation, processing a single request could bring down the whole web server, so we needed multiple instances of the server to keep availability high. An orthogonal direction for managing shared data was actors: essentially a runtime attached to each object instance that processes a queue of incoming messages one at a time, instead of callers invoking the object's methods directly through function calls, which could surface race conditions on the caller's side. Maybe at some level this can be considered true multi-threaded encapsulation. Debugging issues in this model is made more complex by the high degree of sharing.
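Here is a minimal sketch of the thread-per-request model in the same style (names and port are again illustrative). The shared counter shows why explicit coordination is needed: all handler threads share the process's memory, so without the lock, concurrent increments would race.

```python
# Minimal sketch of the thread-per-request model: memory is shared by
# default, so shared mutable state must be guarded explicitly.
import socketserver
import threading

request_count = 0              # shared by every handler thread
count_lock = threading.Lock()  # monitor guarding the shared counter

class CountingHandler(socketserver.BaseRequestHandler):
    def handle(self):
        global request_count
        with count_lock:       # without this, concurrent threads race on +=
            request_count += 1
            n = request_count
        self.request.sendall(f"request #{n}\n".encode())

if __name__ == "__main__":
    # One thread is spawned per incoming connection.
    with socketserver.ThreadingTCPServer(("localhost", 8001), CountingHandler) as srv:
        srv.serve_forever()
```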
The next phase was the rise of event-driven, reactive servers like netty, node.js, and nginx. Each thread runs a full web server (listening on sockets, running business logic) and essentially occupies an entire core on the machine. In this model everything is shared, including the thread. A thread does not process a request from beginning to end; it works on whichever parts of request processing currently need CPU, switching between requests so the core stays busy even while requests wait on blocking calls to downstream services. The state machine describing what remains to be done for each request has to be kept in memory explicitly, since the call stack can no longer hold it. We have even less isolation in this model, and debugging gets progressively harder.
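A minimal sketch of this model using Python's asyncio (again with placeholder names and port): a single thread multiplexes many in-flight requests, and every `await` is a point where the loop parks one request's state and resumes another. With async/await the runtime keeps the per-request state machine for you in the coroutine frame; older callback-style code had to maintain it by hand.

```python
# Minimal sketch of the event-driven model: one thread, many requests.
import asyncio

async def handle(reader, writer):
    data = await reader.read(1024)   # yields the core while waiting on I/O
    await asyncio.sleep(0.1)         # stand-in for a call to a downstream service
    writer.write(b"done: " + data)
    await writer.drain()
    writer.close()

async def main():
    # The event loop juggles all connections on this single thread.
    server = await asyncio.start_server(handle, "localhost", 8002)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```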
A similar thing is playing out in cloud architecture. We started with virtual machines, which offered very high isolation and security but were heavyweight, each carrying its own dedicated OS. Since this was too heavy for a lot of use cases, we moved to containers, where only userspace is isolated inside the container and everything else, including the kernel, is shared. Now code/processes owned by different entities could be running on the same operating system (shared socket buffers, disk caches, etc.) for less isolation but more efficient use of resources. However, in this architecture containers can still sit idle with no requests to process, so we moved to the lambda architecture, where a container is spawned only when there is a request for it to handle (an event-driven model).
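For concreteness, this is roughly the shape of a function-as-a-service handler (shown here in the form AWS Lambda expects for Python): the platform spawns or reuses a container only when an event arrives, calls the function, and may freeze or discard the container afterwards. The event fields below are hypothetical; real event shapes depend on the trigger.

```python
# Sketch of a serverless handler: no listening socket, no loop.
# The platform owns the container lifecycle and invokes this per event.
def handler(event, context):
    name = event.get("name", "world")  # hypothetical input field
    return {"statusCode": 200, "body": f"hello, {name}"}
```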
If we take increased sharing, as a way to achieve higher efficiency and reduce costs (at the expense of increased complexity and developer overhead), to be a long-term trend, then it would follow that the next architecture allows sharing of the containers/processing logic themselves: services where requests from multiple entities (users or companies, as the case may be) share the same container instance at the same time. This would mean that instead of users/companies sharing static, read-only container images that each of them deploys separately, each container would expose a service by default that anyone could invoke safely, leading to fewer wasted resources. Scaling and isolating the storage attached to each consumer would need to be tackled by a robust framework, much as container technology did for compute.
Curious to see what the next computing model will look like, and how it will improve efficiency and achieve more with fewer resources.