Consistency, Scalability and Performance (CSP) Theorem for an Application

Let's face it: everyone wants to do their job quickly, and the software they use should be the last thing that hinders their progress. As architects, we build these requirements into our software designs, whether for a project or for a product that satisfies a business need.

As with any release, we can't satisfy every customer. For most architectural issues we can point to known patterns and principles such as the CAP theorem, SOA, microservices and N-tier architecture to explain why a need could not be met. For this important aspect, however, there was nothing to point to, because for some reason it had not been clearly abstracted until now. This article attempts to explain why a compromise has to be made when dealing with a business application.

Let's understand the known factors that are related to this Theorem.

Consistency

This refers to a business user wanting to see the latest record that he or she has persisted in the data store. If a user saves an entity with some attributes and relationships, the user expects that entity to come back and be shown on the User Interface (UI) exactly as it was last saved. The same is expected if the user searches for the record.
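
The expectation described above can be sketched in a few lines. This is only an illustration, not a real persistence layer: the `EntityStore` class, its method names and the sample customer data are all assumptions made up for this example.

```python
class EntityStore:
    """Hypothetical in-memory store, used only to illustrate the expectation."""

    def __init__(self):
        self._records = {}

    def save(self, entity_id, attributes):
        # Persist a copy so later changes to the caller's dict do not leak in.
        self._records[entity_id] = dict(attributes)

    def get(self, entity_id):
        return dict(self._records[entity_id])

    def search(self, **criteria):
        # Return the ids of records whose attributes match every criterion.
        return [
            entity_id
            for entity_id, attrs in self._records.items()
            if all(attrs.get(key) == value for key, value in criteria.items())
        ]


store = EntityStore()
store.save("cust-1", {"name": "Asha", "city": "Pune"})
store.save("cust-1", {"name": "Asha", "city": "Chennai"})  # the user edits the record

latest = store.get("cust-1")             # view after save
matches = store.search(city="Chennai")   # search after save
```

Both the view and the search reflect the last save, which is what a business user means by a consistent application.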

Performance

This refers to how fast a user can perform a single operation on an entity. Whether the user is saving an entity, searching for one, or viewing its attributes and relationships, the expectation is that this atomic operation is super fast. You are usually measured against a single user performing these operations on the application, so you cannot excuse slowness by pointing out that others are working on the application at the same time. Architectural concepts such as SOA and N-tier can help you explain the slowness, but remember that you can't beat a single-tier system.
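
Performance in this sense is the elapsed time of one operation measured in isolation. A minimal way to measure it, assuming a hypothetical helper named `time_operation` and using Python's built-in `sum` as a stand-in for a real save or search, might look like this:

```python
import time


def time_operation(operation, *args):
    """Run one operation and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = operation(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in for a single user performing a single atomic operation.
result, elapsed = time_operation(sum, [1, 2, 3])
```

The measurement deliberately involves one caller and one call; what happens when many callers arrive at once is a separate question.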

Scalability

This refers to how predictably the application performs when N users (where N can be arbitrarily large) carry out N concurrent operations. It is often confused with the performance of a single operation. If N users are performing N creates or reads on N entities, the application should behave predictably so that users do not see random results, i.e. one user waiting 1 second for an operation while another waits 5 seconds for the same operation. An application can therefore be slow in relative terms, say 3 seconds to create an entity, and still be scalable as long as it remains 3 seconds when 1000 users perform the operation simultaneously.
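
One way to make the difference from performance concrete is to look at the spread of latencies across users rather than at any single latency. The sketch below is an assumption-laden illustration: the function name `is_predictable`, the tolerance of 1.5 times the median, and the sample latencies are all invented for this example.

```python
from statistics import median


def is_predictable(latencies_s, tolerance=1.5):
    """Return True when the slowest operation is within `tolerance` times the median."""
    return max(latencies_s) <= tolerance * median(latencies_s)


# Every user waits about 3 seconds: slow, but the same for everyone.
slow_but_scalable = [3.0, 3.1, 2.9, 3.0]

# Most users wait about 1 second, but one waits 5 seconds for the same operation.
fast_but_erratic = [1.0, 1.1, 5.0, 0.9]

scalable = is_predictable(slow_but_scalable)
erratic = is_predictable(fast_but_erratic)
```

By this measure the slower application is the scalable one, because its users all see the same behaviour.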

What does the Theorem Say?

As with the CAP theorem, the CSP theorem states that an application can be built to satisfy only two of the three properties, that is, one side of the triangle, at the expense of the third property.
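
The statement can be written out as a small lookup: each side of the triangle is a pair of properties, and choosing a side sacrifices whatever is left. The names `PROPERTIES` and `tradeoffs` are assumptions made for this sketch.

```python
from itertools import combinations

PROPERTIES = ("Consistency", "Scalability", "Performance")


def tradeoffs():
    """Map each side of the triangle (a pair of properties) to the property it sacrifices."""
    result = {}
    for pair in combinations(PROPERTIES, 2):
        sacrificed = next(p for p in PROPERTIES if p not in pair)
        result[pair] = sacrificed
    return result


sides = tradeoffs()
```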

Consistency - Performance

If an application is intended for Consistency and Performance, you have to sacrifice scalability: you avoid introducing many layers, multiple data stores and so on, so that you get the best consistency and performance from the hardware provided. This usually works for applications of moderate scale. Users need to be told that the application was designed for X users and that they should expect scalability issues when the number of users exceeds that recommendation.

Consistency - Scalability

If an application is intended for Consistency and Scalability, you need to sacrifice the performance of a single operation. Users should be told clearly that the change is being made for scalability and that individual per-operation performance will therefore be slower. Single-tier and two-tier applications often make the mistake of selling a scalability release as a performance release when they move to an N-tier architecture.

Performance - Scalability

This side of the triangle is one of the most misunderstood sides of the theorem. Without realizing what we intend to solve, we might bring in a cache layer to boost the performance of the application once we are done with its Consistency and Scalability aspects. In effect, the cache is being used to solve all three sides at once, without clearly articulating to the user that Consistency is being compromised. The resulting blunder is to serve reads from the cache while writes go to the actual data. Though this gives the illusion of consistency and performance in the short run, the complexity required to keep the reads and writes consistent can overwhelm the application being developed and lead to catastrophic failures.
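
The blunder of reading from the cache while writing to the actual data can be reproduced in a few lines. This is a deliberately naive sketch: the `CachedReader` class and the sample values are hypothetical, and the missing cache invalidation is the point of the example, not an oversight.

```python
class CachedReader:
    """Hypothetical application layer: writes go to the store, reads go to a cache."""

    def __init__(self, store):
        self._store = store   # source of truth, receives every write
        self._cache = {}      # read copy, filled on first read

    def write(self, key, value):
        self._store[key] = value   # note: the cache is NOT invalidated here

    def read(self, key):
        if key not in self._cache:
            self._cache[key] = self._store[key]
        return self._cache[key]


store = {}
app = CachedReader(store)

app.write("cust-1", "Pune")
first = app.read("cust-1")     # loads the value into the cache

app.write("cust-1", "Chennai")
second = app.read("cust-1")    # still served from the cache
```

After the second write the store holds the new value while the read still returns the old one. Closing that gap means invalidating or updating the cache on every write path, which is where the complexity starts to grow.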

Ideal Condition

Ideally, all three sides can be satisfied if the entity object that is read to be shown to the user is the same object that is persisted in the store when we write.

In the real world, one of two things happens. Either the entity, with its attributes and relationships, is broken down into individual elements when it is stored, and the read has to do extra processing to display a combination of entities persisted in different objects; or the write goes to a single object, and the read has to fetch a subset of that object or its related objects.

E.g. a customer record may be broken down into individual records, one for each attribute value, when we persist it in the data stores. If, on the other hand, we persist the customer record as a single object along with its attributes and relationship references, then when we read it we would also like to show the related children associated with the customer, which leads to extra reads on the customer's related entities.
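
The two storage shapes in the customer example can be sketched side by side. Everything here is illustrative: the identifiers, the sample customers and the helper names `read_from_rows` and `read_with_children` are assumptions, and plain Python lists and dicts stand in for the data stores.

```python
# Shape 1: one stored row per attribute value (the entity is broken into elements).
attribute_rows = [
    ("cust-1", "name", "Asha"),
    ("cust-1", "city", "Pune"),
]


def read_from_rows(entity_id):
    # The read has to reassemble the entity from its pieces.
    return {attr: value for eid, attr, value in attribute_rows if eid == entity_id}


# Shape 2: one stored object per entity, holding references to related entities.
documents = {
    "cust-1": {"name": "Asha", "city": "Pune", "children": ["cust-2", "cust-3"]},
    "cust-2": {"name": "Ravi", "children": []},
    "cust-3": {"name": "Mira", "children": []},
}


def read_with_children(entity_id):
    # One lookup for the customer, plus one extra lookup per related child.
    lookups = 1
    record = dict(documents[entity_id])
    resolved = []
    for child_id in record["children"]:
        resolved.append(documents[child_id]["name"])
        lookups += 1
    record["children"] = resolved
    return record, lookups


assembled = read_from_rows("cust-1")
record, lookups = read_with_children("cust-1")
```

In the first shape the write is simple per attribute and the read pays for reassembly. In the second the write is a single object and the read pays one extra lookup per related child.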

Is there a possible approach to an application that can incorporate all factors?

Before we can answer this question, we need to understand the variations of read and write.

Read with an Intent to Write

This refers to a user asking for a record with the intent to edit it. The user expects to see the latest, consistent version of the record at this point in time.

Read with an Intent to Read

This refers to a user (such as an integration system user) asking for a record with the intent to view it or pass it on to another system. This user is not the one editing and wants to see the record as of a point in time. The user may not be happy, but would be satisfied, if the data comes with information showing the point in time at which it was valid.
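
One simple way to give such a user proof of validity is to hand over the data together with the time at which it was copied. The sketch below assumes a hypothetical `Snapshot` type and `take_snapshot` helper; the timestamp and the customer values are made-up sample data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    data: dict        # copy of the record as it was when the snapshot was taken
    as_of: datetime   # the point in time at which the copy was valid


def take_snapshot(record, now):
    # Copy the record so later edits to the live data do not change the snapshot.
    return Snapshot(data=dict(record), as_of=now)


live = {"name": "Asha", "city": "Pune"}
snap = take_snapshot(live, datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc))

live["city"] = "Chennai"   # someone edits the live record afterwards
```

The reader still sees the older value, but the `as_of` stamp tells them exactly how old it is.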

Write only

This refers to writing or persisting the current entity data, namely its local values (its attributes and relationship references), in a data store.

Given these definitions, we now need to decide which users can be moved to which side of the triangle. One possible approach, which uses two sides of the triangle, is described here.

Users who read with the intent to write can be satisfied only by the Consistency and Scalability side of the triangle.

Users who read with the intent to read can be satisfied by either the Consistency and Scalability side or the Scalability and Performance side. If these users do not need the latest record but do want records fast, the obvious choice is to move them to the Scalability and Performance side of the triangle.
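
The routing rule for the two kinds of reader can be expressed as a small decision function. The `Intent` enum, the `choose_side` function and the `needs_latest` flag are assumptions introduced for this sketch; a real application would route to actual data stores rather than return a label.

```python
from enum import Enum


class Intent(Enum):
    READ_TO_WRITE = "read with intent to write"
    READ_TO_READ = "read with intent to read"


def choose_side(intent, needs_latest=False):
    """Pick the side of the triangle that should serve a read."""
    # Editors always need the latest record; viewers only if they ask for it.
    if intent is Intent.READ_TO_WRITE or needs_latest:
        return "Consistency-Scalability"
    return "Scalability-Performance"


editor_side = choose_side(Intent.READ_TO_WRITE)
viewer_side = choose_side(Intent.READ_TO_READ)
strict_viewer_side = choose_side(Intent.READ_TO_READ, needs_latest=True)
```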

This does, however, raise the question of who prepares the data for the Scalability and Performance side of the triangle. The application needs extra storage, on disk or in memory, to hold the copy used for "read with the intent to read", as well as extra processing power to prepare that data. The customer therefore needs to invest more to get these.
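
The extra storage and processing can be made visible with a small refresh routine. This is a sketch under stated assumptions: `refresh_read_copy` is a hypothetical function, both stores are plain dicts, and a real system would more likely use change feeds or scheduled jobs than a full scan.

```python
def refresh_read_copy(primary, read_copy):
    """Copy every new or changed record from the primary store into the read copy.

    Returns the number of records copied, as a rough measure of the work done.
    """
    copied = 0
    for key, record in primary.items():
        if read_copy.get(key) != record:
            read_copy[key] = dict(record)
            copied += 1
    return copied


primary = {"cust-1": {"city": "Pune"}, "cust-2": {"city": "Delhi"}}
read_copy = {}   # the extra storage the customer has to pay for

first_run = refresh_read_copy(primary, read_copy)    # everything is new
second_run = refresh_read_copy(primary, read_copy)   # nothing has changed

primary["cust-1"]["city"] = "Chennai"
third_run = refresh_read_copy(primary, read_copy)    # one record has changed
```

The `read_copy` dict is the additional storage, and each refresh run is the additional processing. Between runs, the read copy lags behind the primary store.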

With these concepts, and depending on the type of application being developed, one should be able to articulate clearly to the customer what they can expect from the application and what compromises they need to make to satisfy their needs.

Credits:

I would like to thank Gopal Ramakrishnan and Vishal Jariwala for helping me hash out the finer details of the theorem and living through some of these challenges.

Disclaimer

This is my personal view. The thoughts expressed here are my own and not those of my employer.








More articles by Anton Alfred
