Software High Availability In A Nutshell - Part 1 (What is High Availability)

What is High Availability?

The easiest way to define software high availability, from my point of view, is the ability to provide continuous service using a cluster of machines that can each provide this service.

While this definition might seem very short given that high availability is a huge subject, it is the most essential requirement I would expect from any high availability system.

How Do We Define Continuous Service?

Continuous service is actually a pretty loose term.

Let's take a look at the following case.

We have a database that provides service for a computing lab. One day the database crashed. Someone was called in, and the database was up and running again after 1 hour. After that, the database worked without any problem for 1 year.

Was this service continuous?

Well, that depends on how "continuous" is perceived. I found 10 definitions of "continuous" on The Free Dictionary website.

None of those definitions is complete on its own, but one thing can be derived from all of them together: continuous service is the ability to provide service for a long time, and the length of the interruption is not taken into consideration when defining "continuous".

In this case the service was, from my point of view, continuous: after an interruption of one hour it continued to work for an additional 8765 hours, which means the service was down for only about 0.01% of the entire year. That sounds pretty good.

So a service can be considered continuous as long as the downtime caused by unpredictable interruptions is acceptable.
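The arithmetic behind that 0.01% figure can be checked with a few lines of Python. This is only a minimal sketch; the function name `downtime_percent` is made up for illustration.

```python
def downtime_percent(downtime_hours: float, uptime_hours: float) -> float:
    """Return downtime as a percentage of the whole observed period."""
    total_hours = downtime_hours + uptime_hours
    if total_hours <= 0:
        raise ValueError("observed period must be positive")
    return 100.0 * downtime_hours / total_hours


# The lab database: 1 hour down, then 8765 hours of service.
print(f"{downtime_percent(1, 8765):.3f}%")  # prints 0.011%
```

Whether roughly 0.01% is good enough is a separate question; the code only measures the downtime, it does not judge it.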

But Was It A High Availability System?

Definitely not. It required human intervention to restore service, and there was a single machine rather than a cluster of machines.

A true High Availability system would have had a backup database waiting to take control and continue the service within a very short time.
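To make the idea of a backup "waiting to take control" a bit more concrete, here is a toy sketch of automatic failover. Everything in it is an assumption for illustration: `Database` and `FailoverCluster` are hypothetical classes, not a real database client, and a real system would also need health checks, data replication and protection against both nodes acting as primary.

```python
class Database:
    """Hypothetical stand-in for a database node; not a real client library."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def query(self, sql: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} answered: {sql}"


class FailoverCluster:
    """Send queries to the active node and promote the standby on failure."""

    def __init__(self, primary: Database, standby: Database) -> None:
        self.active = primary
        self.standby = standby

    def query(self, sql: str) -> str:
        try:
            return self.active.query(sql)
        except ConnectionError:
            # Automatic failover: swap roles without human intervention.
            self.active, self.standby = self.standby, self.active
            return self.active.query(sql)


primary = Database("db-primary")
standby = Database("db-standby")
cluster = FailoverCluster(primary, standby)

print(cluster.query("SELECT 1"))  # served by db-primary
primary.healthy = False           # simulate the crash
print(cluster.query("SELECT 1"))  # served by db-standby
```

The point of the sketch is only that the caller keeps getting answers after the crash, with no one being called in to fix anything first.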

For example, let's say it would have taken such a system 10 minutes to resume the service, after which it continued to work for 1 year.

This means that the service was down for 10 out of 525948 minutes in a year, or about 0.002% of the year, which sounds amazing.

Still, whether this service counts as continuous depends on how much downtime is considered acceptable.
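One way to make "acceptable" measurable is to pick an availability target and turn it into a yearly downtime budget. The sketch below does that with the 525948 minutes per year used above. The targets (99.9%, 99.99%, 99.999%) are example values I chose for illustration, not requirements taken from any specific system.

```python
MINUTES_PER_YEAR = 525_948  # same figure as in the example above


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, for a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


def is_acceptable(downtime_minutes: float, availability_percent: float) -> bool:
    """True if the observed downtime fits inside the yearly budget."""
    return downtime_minutes <= allowed_downtime_minutes(availability_percent)


# Compare the 10-minute failover and the 60-minute manual repair
# against a few example targets.
for target in (99.9, 99.99, 99.999):
    budget = allowed_downtime_minutes(target)
    print(
        f"{target}% -> {budget:.1f} min/year, "
        f"10 min ok: {is_acceptable(10, target)}, "
        f"60 min ok: {is_acceptable(60, target)}"
    )
```

Under these example targets, the one-hour manual repair only fits the loosest budget, while the 10-minute failover still misses the strictest one, so "continuous" really does depend on the target you choose.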

No Downtime At All

In my opinion there is no such thing as no downtime at all. No matter how many safety mechanisms are in use, there is always a chance of an unplanned interruption.

All that can be done is to reduce the chance of such cases by planning a good High Availability system and software.

 

In my next post I will go through the basic concepts of Software High Availability. I hope you enjoyed this one.


More articles by Tal Engle-Potlog
