Scalability: A deep-dive

I have been reading about scaling in distributed systems from various sources, and I also touched on scalability in my last article: Building Reliable, Scalable and Maintainable Applications | LinkedIn

Hence, I decided to take a deeper dive into scalability. The topics covered are:

  1. What is scalability, and when is an application considered scalable?
  2. Types of scalability and their pros & cons
  3. Determining the right scalability approach for an application
  4. Bottlenecks that hurt the scalability of a system
  5. Improving and testing the scalability of our application


What is scalability, and when is an application considered scalable?

Scalability refers to an application's capacity to manage an increased workload without compromising its performance. The response time for a user request should remain consistent even when the number of concurrent requests increases, and the app's back-end infrastructure should be able to handle high traffic without experiencing latency or system failure.

What is latency and how to measure it

Latency refers to the time it takes for a system to respond to a user request. A system's scalability is determined by its ability to maintain low latency even with an increased workload.

Latency is measured as the time difference between a user's action and the system's response, which is divided into network and application latency.

Network latency refers to the time it takes for a data packet to travel from point A to point B in a network. To reduce network latency, businesses use CDNs to deploy servers closer to end users at edge locations.

Application latency, on the other hand, is the time it takes for the application to process a user request. To reduce application latency, stress and load tests can be run to identify bottlenecks that slow down the system. Additional measures can also be taken to optimize code, improve hardware and software infrastructure, and implement caching and other techniques.
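As a rough illustration, application latency can be measured by timing each request and looking at percentiles rather than just averages, since tail latency is what users notice. This is a minimal Python sketch; the `handler` function and the request list are stand-ins for a real request path:

```python
import statistics
import time

def measure_latency(handler, requests, percentile=0.95):
    """Time each call to `handler` and return (median, p95) latency in seconds."""
    samples = []
    for req in requests:
        start = time.perf_counter()
        handler(req)
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Index of the requested percentile in the sorted samples.
    idx = min(int(len(samples) * percentile), len(samples) - 1)
    return statistics.median(samples), samples[idx]
```

Reporting the median alongside the 95th percentile makes it obvious when a minority of slow requests is dragging down the user experience even though the "typical" request is fast.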


Types of scalability and their pros & cons

Diagrammatic representation of vertical and horizontal scaling

Vertical, Horizontal and Cloud Elasticity Scaling

Vertical scaling means adding more power to our server. Let’s say our app is hosted by a server with 16 gigs of RAM. To handle the increased load, we now augment the RAM to 32 gigs. Here, we have vertically scaled the server.

Horizontal scaling involves adding more hardware resources, such as servers or data centers, to an existing resource pool to increase the system's computational power and handle increased traffic. This approach allows for near-limitless scalability and dynamic scaling in real time as traffic fluctuates.

Cloud elastic scaling is the ability of the cloud to dynamically add or remove servers in the hardware resource pool in response to changes in traffic load on a website or application. This allows businesses to stretch beyond, or shrink back to, their baseline computational capacity on the fly, without manually adding or removing servers.
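The core of an elastic scaling policy is a simple feedback rule on a utilization metric. This is a hedged sketch only: the CPU thresholds, doubling/halving strategy, and server limits are illustrative assumptions, not any cloud provider's actual API or defaults:

```python
def desired_servers(current, cpu_utilization, low=0.30, high=0.70,
                    min_servers=2, max_servers=20):
    """Scale out when average CPU is high, scale back in when it is low."""
    if cpu_utilization > high:
        return min(current * 2, max_servers)   # stretch capacity
    if cpu_utilization < low:
        return max(current // 2, min_servers)  # return toward baseline
    return current                             # within the comfort band
```

Real autoscalers add cooldown periods and smoothing so the pool does not flap between sizes on every metric sample, but the decision logic follows this shape.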

Horizontal scaling pros:

  • Scalability: horizontal scaling allows for virtually unlimited growth and increased computational power as more hardware is added to the existing resource pool.
  • Availability: horizontal scaling increases availability by distributing the workload across multiple servers, reducing the risk of a single point of failure.
  • Cost-effective: horizontal scaling is more cost-effective than vertical scaling because it utilizes less expensive, commodity hardware that can be easily replaced or upgraded.
  • Dynamic scaling: horizontal scaling allows for dynamic scaling to match the changing traffic demands, ensuring optimal performance at all times.

Horizontal scaling cons:

  • Complexity: horizontal scaling can be more complex to set up and maintain than vertical scaling, requiring load balancers and other tools to distribute traffic and manage multiple servers.
  • Limited performance improvements: adding more servers may not always result in linear performance improvements due to factors such as network latency and synchronization issues.
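The load balancer mentioned in the complexity point above can, at its simplest, be a round-robin dispatcher that hands each incoming request to the next server in the pool. A minimal Python sketch (the server names are illustrative):

```python
import itertools

class RoundRobinBalancer:
    """Distribute requests across a pool of servers in round-robin order."""

    def __init__(self, servers):
        # itertools.cycle yields the servers endlessly, in order.
        self._pool = itertools.cycle(servers)

    def next_server(self):
        return next(self._pool)

balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
```

Production load balancers add health checks, weighting, and session affinity on top of this basic rotation, which is part of the operational complexity horizontal scaling introduces.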

Vertical scaling pros:

  • Simplicity: vertical scaling is simpler to set up and maintain than horizontal scaling as it involves adding more resources to the existing server.
  • Performance improvements: adding more resources to the existing server can result in significant performance improvements for applications with specific resource requirements.
  • Fewer synchronization issues: vertical scaling eliminates the need to synchronize data across multiple servers, reducing the likelihood of synchronization issues.

Vertical scaling cons:

  • Limited scalability: vertical scaling has a limited capacity for scalability as it is restricted by the maximum capacity of the existing server.
  • High cost: vertical scaling can be expensive as it often requires specialized hardware that can be costly to replace or upgrade.
  • Single point of failure: vertical scaling increases the risk of a single point of failure as all resources are concentrated in a single server.


Determining the right scalability approach for an application

Determining the right scalability approach for your application depends on various factors such as application architecture, resource requirements, traffic patterns, and budget.

If your application has a monolithic architecture and limited resource requirements, vertical scaling can be a good choice. However, if your application has a distributed architecture and requires more resources, horizontal scaling is often the better choice.

And if your application follows a microservice architecture, you can also consider scaling only a specific service, based on a ballpark estimate of its expected load.

You should also consider your traffic patterns. If your traffic is consistent, then vertical scaling can be a better choice. However, if your traffic is variable or unpredictable, then horizontal scaling can be a better choice.

Lastly, you should consider your budget. Vertical scaling can be more cost-effective for small applications, while horizontal scaling can be more cost-effective for larger applications.

In summary, to determine the right scalability approach for your application, you should consider factors such as application architecture, resource requirements, traffic patterns, and budget. You may also want to consult with experts in the field for advice and guidance.
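As a rough illustration only, the heuristics above can be encoded as a small decision helper. The category labels here are assumptions made for the sketch, and any real decision should be validated with load data rather than rules of thumb:

```python
def suggest_scaling(architecture, traffic, app_size):
    """Rough heuristic: map architecture, traffic pattern, and app size
    to a scaling approach. Illustrative only, not a substitute for
    measurement."""
    if architecture in ("distributed", "microservices") or traffic == "variable":
        return "horizontal"
    if architecture == "monolith" and traffic == "steady" and app_size == "small":
        return "vertical"
    return "vertical"
```

For example, a small monolith with steady traffic maps to vertical scaling, while a microservice system with variable traffic maps to horizontal scaling, matching the guidance above.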


Bottlenecks that hurt the scalability of a system

There are several bottlenecks that can affect the scalability of a system, including:

  1. CPU: The processing power of the CPU is a critical resource for any system. If the CPU is overloaded, it can result in slow response times and increased latency.
  2. Memory: Insufficient memory can result in frequent disk I/O operations, which can cause delays and slow down the system.
  3. Network: Network bottlenecks can result in increased latency and packet loss, which can cause slow response times and reduced throughput.
  4. Storage: Insufficient storage space can cause performance issues, especially if the system is performing frequent I/O operations.
  5. Database: Database bottlenecks can occur due to inefficient queries, locking, or indexing issues. These can lead to slow response times and reduced throughput.
  6. Architecture: The system architecture itself can be a bottleneck. A poorly designed application architecture can limit scalability and result in performance issues.
  7. No appropriate caching: Caching can be deployed at several layers of an application and can dramatically speed up response times. A cache also cuts down the overall load on the app by intercepting requests before they hit the origin servers.

Apart from the reasons mentioned above, there are other bottlenecks such as incorrect configuration of load balancers, putting business logic in the database, choosing the wrong database, and poorly performing code.

It's important to identify and address these bottlenecks to ensure the scalability of the system.
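One of the cheapest fixes for the caching bottleneck listed above is an in-process cache. In Python, `functools.lru_cache` provides a memoizing cache out of the box; the `expensive_db_lookup` function here is a hypothetical stand-in for a real database query or remote call:

```python
import functools

def expensive_db_lookup(user_id):
    # Stand-in for a slow database query or remote call.
    return {"id": user_id, "name": f"user-{user_id}"}

@functools.lru_cache(maxsize=1024)
def get_user_profile(user_id):
    # The first call per user_id hits the backend; repeated calls are
    # served from memory, cutting load on the origin.
    return expensive_db_lookup(user_id)
```

The same idea scales up to shared caches like Redis or Memcached when multiple app servers need to share cached results, and to CDN caches at the network edge.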


Improving and testing the scalability of our application

  1. Identify potential bottlenecks: Conduct a thorough analysis of the application's architecture to identify potential bottlenecks that may affect scalability, such as resource constraints, poor database design, or inefficient algorithms.
  2. Optimize code and architecture: Once bottlenecks are identified, optimize code and architecture to address them. This may involve re-architecting the system, optimizing database queries, or implementing caching mechanisms.
  3. Load testing: Conduct load testing to determine how the application handles high traffic loads. This can be done using load testing tools that simulate user traffic.
  4. Performance testing: Conduct performance testing to identify how well the application performs under various loads. This can be done using profiling tools that monitor the application's performance metrics.
  5. Capacity testing: Conduct capacity testing to determine the maximum capacity of the system. This can be done by gradually increasing the load on the system until it fails.
  6. Scaling tests: Conduct scaling tests to determine how well the system scales as additional resources are added. This can be done by gradually adding resources to the system and monitoring its performance.
  7. Continuously monitor and optimize: Continuously monitor the application's performance and optimize it as needed to ensure scalability as traffic and usage patterns change.
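The load-testing step above can be approximated with nothing more than a thread pool that fires concurrent requests and measures throughput. This is a minimal sketch (the `handler` is a stand-in for an HTTP call to the system under test); dedicated tools such as JMeter, Locust, or k6 do this at much larger scale with richer reporting:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_test(handler, concurrency=10, requests=100):
    """Fire `requests` calls at `handler` from `concurrency` workers and
    return the observed throughput in requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Consume the iterator so we wait for every request to finish.
        list(pool.map(handler, range(requests)))
    elapsed = time.perf_counter() - start
    return requests / elapsed
```

Running this with gradually increasing `concurrency` while watching latency and error rates gives a first approximation of the capacity and scaling tests described above.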


Reference and credits

  1. How to Improve and Test the Scalability of our Application? - Web Application and Software Architecture 101 (educative.io)
  2. How production engineers support global events on Facebook - Engineering at Meta (fb.com)
  3. "Millions scale" simulations, by Ashutosh Agrawal (Disney+ Hotstar)
  4. Designing Data-Intensive Applications (O'Reilly)

More articles by Deepak Paramesh
