CAP Theorem
- CAP stands for Consistency Availability & Partition Tolerance.
- What is availability? Any request sent to non-failed node should respond back with non error response.
- What is consistency? Every read request should respond back with most recent data or an error code.
- What is network partition tolerance? In the event of network partition, Ability of the system to continue to operate and serve the requests.
- What it means forfeiting the partition tolerance? As a result, the system’s behaviour will be undefined.
- What CAP theorem stats? In the event of network partition, A distributed database can achieve either consistency or availability. So, you will end up having CP or AP systems.
- It’s not possible to build CA systems. Simply because, in the event of network partition, Systems behaviour will be undefined. In such cases, we will not able to able to provide any guarantee on availability( remember any non-failed node should respond back with non error response).
- How to achieve strong consistency?
- In leader-follower replication model, Write requests will always be sent to leader and read requests can go to either leader or followers. If write requests are replicated to all it’s followers synchronously, then read requests served by any of the follower can provide strong read consistency. Alternatively, if write requests are replicated to all its follower asynchronously, then read requests served by any of the follower might return stale data resulting in eventual consistency model. Note: Synchronous replication can have a negative impact on latency (or response time) and availability.
- In leaderless replication model, coordinator can propagate the write requests to all N nodes and expect W nodes to acknowledge so that the further read request can read data from R nodes to reconcile and return the final outcome. This model will produce strong consistency if W+R > N where N is number of nodes in the cluster. It’s called quorum operation.
9. The following are out of the scope for this theorem,
- It considers network partition as the only failure. In reality, there can be more failures such as node failure, transient network failure etc.
- It focuses on distributed database, not single node database.
- It considers only 2 levels of consistency(strongly consistent & eventually consistent) while in reality there are various levels of consistency.