High Availability of Processing Elements in Computer Systems
High availability of data, particularly in the distributed systems of web-scale organizations, is a hot research topic, and organizations currently use multiple schemes to achieve it. Data is black gold for these organizations, so making appropriate investments makes perfect sense.
A less discussed topic is the high availability of processing elements. Most of the available literature advocates bringing up a new instance of a processing element as soon as the currently working one becomes unavailable. In this article, I try to summarize some of the most commonly used methodologies I have encountered during my stint as a software developer.
Each method above has a tradeoff associated with it. From (1) to (4), the time needed to take over active processing increases, but the resource wastage (both CPU and memory) decreases.
A standby processing element taking over from a previously active processing element sounds nice, but what exactly is the new processing element taking over?
The answer is the data that the previously active processing element was working on. In some cases, it is the (stable) state of objects; in others (transaction-based systems), it is the list of (partially and/or completely) processed transactions. Whatever it is, this data needs to be available to the standby processing element when it takes over.
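To make the idea of "taking over the data" concrete, here is a minimal sketch of a standby that loads the last stable state and then finishes the transactions the active element left incomplete. Everything in it is hypothetical and chosen only for illustration: the `Transaction` and `ProcessingElement` classes, the `take_over` function, and the account-balance example. It also assumes that the checkpoint already reflects every completed transaction and that a partially processed transaction has had no visible effect on it.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    txn_id: int
    account: str
    amount: int
    completed: bool = False  # False means the transaction was only partially processed


@dataclass
class ProcessingElement:
    name: str
    balances: dict = field(default_factory=dict)

    def apply(self, txn: Transaction) -> None:
        """Process one transaction and mark it as completed."""
        self.balances[txn.account] = self.balances.get(txn.account, 0) + txn.amount
        txn.completed = True


def take_over(standby: ProcessingElement, checkpoint: dict, txn_log: list) -> list:
    """Hypothetical takeover: load the stable state, then finish incomplete work.

    Assumes the checkpoint includes the effect of all completed transactions
    and none of the effect of incomplete ones.
    """
    standby.balances = dict(checkpoint)  # copy, so the checkpoint itself is not mutated
    resumed = []
    for txn in txn_log:
        if not txn.completed:
            standby.apply(txn)
            resumed.append(txn.txn_id)
    return resumed


if __name__ == "__main__":
    checkpoint = {"alice": 100}
    log = [
        Transaction(1, "alice", 100, completed=True),
        Transaction(2, "alice", -30),
        Transaction(3, "bob", 50),
    ]
    standby = ProcessingElement("standby")
    print(take_over(standby, checkpoint, log), standby.balances)
```

In a real system, the checkpoint and the transaction list would have to live somewhere the standby can still reach after the active element is gone, and that is the question the alternatives discussed next try to answer.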
There are multiple ways in which this can be achieved. Following are a few alternatives:
Again, each method above has its positives and negatives. Note that alternative (2) is applicable only if both the active and standby processing elements are expected to be on the same physical node, whereas alternatives (1) and (3) can be used in multi-node setups. Of course, the nodes in a multi-node setup could even be located on different continents.
How the standby processing element is notified that the active processing element has become unavailable is also important and needs to be considered on an equal footing.
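As an illustration of what such a notification can look like, the sketch below uses a heartbeat timeout: the standby treats the active element as unavailable once no heartbeat has arrived within a configured window. This is an assumption made for illustration, not necessarily one of the schemes this article has in mind, and the `HeartbeatMonitor` class, its method names, and the `timeout_s` parameter are all hypothetical.

```python
import time


class HeartbeatMonitor:
    """Hypothetical detector run by the standby processing element."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock  # injectable, so the logic can be exercised without sleeping
        self._last_seen = clock()

    def record_heartbeat(self) -> None:
        """Call whenever a heartbeat from the active element arrives."""
        self._last_seen = self._clock()

    def active_is_unavailable(self) -> bool:
        """True once the silence has lasted longer than the timeout."""
        return self._clock() - self._last_seen > self.timeout_s


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=0.2)
    monitor.record_heartbeat()
    time.sleep(0.3)  # simulate the active element going silent
    if monitor.active_is_unavailable():
        print("standby: active element presumed unavailable, taking over")
```

The timeout is itself a tradeoff: a short window shortens the takeover time but risks declaring a slow yet healthy active element unavailable, while a long window does the opposite.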
Common schemes used are
In conclusion, we have a multitude of options available to cater to the high availability of an application, each with its tradeoffs. For each use case, we could employ one or a mix of these to achieve the required outcome.
Please provide your valuable feedback and comments if something needs to be added or modified in these methodologies.