System availability formula: assumptions to remember
System availability, the probability that a system is working when one needs it to, directly affects maintenance policy, efficiency and redundancy in system design. The formula A = MTTF / (MTTF + MTTR) is the most commonly used expression to describe and calculate system availability. Its primary advantage is its simplicity. Unfortunately, however, the formula is frequently mistaken for the actual definition of availability. It is not. The formula is the result of a series of mathematical calculations based on the assumption of a single independent component, with exponential distributions for failure and repair, in steady state conditions. In this post I shall give a step-by-step overview of how the formula is derived.
Introduction
Assuming that S(B)=1 denotes a state (B), or a union of states (B), for which a given system is defined as operational, system availability, A(t), is the probability that S(B)=1 at time t. System reliability, R(t), is the probability that the system has not been in a state of failure, S(B)=0, at any time up to t. Reliability can therefore be expressed as follows:
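One way to write this, as a sketch that assumes the standard single-component relation between reliability and the failure CDF F₁(t):

```latex
R(t) = \Pr\{\, S(B) = 1 \ \text{throughout}\ [0, t] \,\},
\qquad
R(t) = 1 - F_1(t) \quad \text{(single component)}
```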
Reliability has a simple relation to a single component failure probability, F₁(t). Availability, however, presents a more complex problem. There are a number of different approaches to its solution. In this post I shall present the Markov equation approach.
Hazard function and system engineering
In the field of system engineering, the hazard function, z(t), describes the conditional probability per unit time that an event (system failure) will occur at time t, given that it has not already occurred up to that point in time. The hazard function (which is also known as the failure rate, hazard rate, or force of mortality) is the ratio between the probability density function (PDF), f(t), and the survival function, S(t) = 1 − F(t), where F(t) is the cumulative distribution function (CDF). The hazard function plays an important role in system engineering, since failure data are collected by observing systems for as long as they remain operational.
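Written out with z(t) for the hazard, the symbol used in the definitions that follow, and with T introduced here as an assumed name for the random time of failure, the ratio takes the standard form:

```latex
z(t)\,dt = \Pr\{\, t < T \le t + dt \mid T > t \,\}
         = \frac{f(t)\,dt}{1 - F(t)}
\quad\Longrightarrow\quad
z(t) = \frac{f(t)}{1 - F(t)} \tag{2}
```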
Equation (2) formulates the definition of the hazard using Bayes’ theorem, where:
F(t) denotes the probability that an event (system failure) will occur during the interval from 0 until t.
1-F(t) denotes the probability that an event has not occurred until time t.
f(t)dt denotes the probability that an event will occur within a small interval dt at time t.
z(t)dt denotes the conditional probability that an event will occur within a small interval dt at t, given that the event has not occurred up to that time.
The name hazard function was adopted for z(t) because it is related to failures. System failures are important events in system engineering, but they are not the only possible events considering the life history of a system. There are many other events, such as repairs, inspection, maintenance, changes in the operating state, and many more that may also be of a statistical nature.
Exponential distribution
The exponential distribution is a special case where the hazard is constant.
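Substituting an exponential density with rate λ into the ratio f(t)/(1 − F(t)) shows this directly (a sketch using the standard exponential PDF and CDF):

```latex
f(t) = \lambda e^{-\lambda t}, \qquad
F(t) = 1 - e^{-\lambda t}, \qquad
z(t) = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda
```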
Thus, an exponential failure density corresponds to a constant hazard function that is independent of time; in other words, the component does not age. The constant hazard function is a consequence of the memoryless property of the exponential distribution: the distribution of the subject’s remaining survival time, given that it has survived until time t, does not depend on t. Put differently, the probability of death in a time interval [t, t + y], given survival to t, is independent of the starting point t [1].
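The memoryless property can be checked numerically. The sketch below is only an illustration: the rate `lam`, the window `y` and the starting points are assumed values, not data from any real system. It computes the conditional probability of failure within a window of length y for several starting times.

```python
import math

def survival(t: float, lam: float) -> float:
    """Survival function 1 - F(t) of an exponential distribution with rate lam."""
    return math.exp(-lam * t)

def conditional_failure_prob(t: float, y: float, lam: float) -> float:
    """P(failure in (t, t + y] | survived to t) = (S(t) - S(t + y)) / S(t)."""
    return (survival(t, lam) - survival(t + y, lam)) / survival(t, lam)

lam = 0.01  # assumed failure rate, per hour (illustrative value)
y = 50.0    # assumed window length, hours (illustrative value)

# The same window is evaluated at three different starting points.
probs = [conditional_failure_prob(t, y, lam) for t in (0.0, 100.0, 1000.0)]
for t, p in zip((0.0, 100.0, 1000.0), probs):
    print(f"start t = {t:7.1f} h -> P(fail within {y:g} h) = {p:.6f}")
```

Every starting point gives the same value, 1 − exp(−λy), which is what "does not age" means in practice.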
The exponential distribution is the most widely used distribution in system engineering applications. This is mainly because it is the simplest function to work with, not because it is the correct one.
Markovian processes
A Markov process is a stochastic process in which the future state of a system is independent of the past, given the present. In such models, the probability distribution of the future state depends only on the present state of the system. If all the components in the system have exponentially distributed failure and repair times, then the system is Markovian and the future depends only upon the state in the present.
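For n states, with Pᵢ(t) the probability of being in state i, zᵢ the total rate of leaving state i, and zᵢⱼ the rate of transfer from state j into state i, the resulting balance can be sketched as follows (the explicit summation range is my assumption about how the original is written):

```latex
\frac{dP_i(t)}{dt} = -z_i\,P_i(t) + \sum_{j \ne i} z_{ij}\,P_j(t),
\qquad i = 1, \dots, n \tag{4}

\sum_{i=1}^{n} P_i(t) = 1
```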
Equation (4) is known as the Markov equation [2]. It is a set of simultaneous first-order differential equations, derived as a special case of the transport equation for n single-entry states. The Markov equation has a simple physical interpretation. zᵢ represents the rate of leaving state i, thus zᵢPᵢ(t) represents the reduction of the population in this state per unit time. Likewise, components flow from other states into state i. The rate of transfer from any state j into state i is zᵢⱼ, resulting in an increase of the population in state i of zᵢⱼPⱼ(t) per unit time. The Markov chain has a finite number of states, n, and the probabilities of being in each of them sum to 1 at any time (this is known as the normalization equation).
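The physical interpretation above translates almost line by line into a numerical integration. The following is a minimal sketch using a forward-Euler step, not a production solver. The function name `markov_step`, the rate values and the two-state example (one failed state, one operational state) are all assumptions made for illustration.

```python
def markov_step(p, rates, dt):
    """One forward-Euler step of dP_i/dt = -z_i P_i + sum_j z_ij P_j.

    rates[i][j] is the transfer rate from state j into state i (i != j);
    diagonal entries are ignored. z_i is the total rate of leaving state i.
    """
    n = len(p)
    new_p = []
    for i in range(n):
        leave = sum(rates[j][i] for j in range(n) if j != i)           # z_i
        inflow = sum(rates[i][j] * p[j] for j in range(n) if j != i)   # gain
        new_p.append(p[i] + dt * (inflow - leave * p[i]))
    return new_p

# Hypothetical two-state system: state 0 = failed, state 1 = operational.
failure_rate = 0.01   # assumed rate of moving from state 1 into state 0, per hour
repair_rate = 0.5     # assumed rate of moving from state 0 into state 1, per hour
rates = [[0.0, failure_rate],
         [repair_rate, 0.0]]

p = [0.0, 1.0]        # the system starts in the operational state
dt = 0.01             # time step, hours
for _ in range(100_000):   # integrate up to t = 1000 hours
    p = markov_step(p, rates, dt)

print("state probabilities:", p)
print("normalization check:", sum(p))
```

Each step moves probability between states without creating or destroying any, so the normalization equation is preserved up to rounding error.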
For the simple case of two states with exponential distributions, equation (4) takes a simple form:
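Taking state 1 as operational and state 0 as failed (the labeling implied by the condition P₁(0)=1 used further on), the pair of equations can be sketched as:

```latex
\frac{dP_1(t)}{dt} = -\lambda\,P_1(t) + \mu\,P_0(t), \qquad
\frac{dP_0(t)}{dt} = \lambda\,P_1(t) - \mu\,P_0(t) \tag{5}
```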
Here λ is the hazard for failure, often called the failure rate, and μ is the hazard for repair, called the repair rate. This, together with the normalization equation P₁(t)+P₀(t)=1, yields:
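Substituting P₀(t) = 1 − P₁(t) into the balance for the operational state leaves a single first-order equation, which under these assumptions reads:

```latex
\frac{dP_1(t)}{dt} = -(\lambda + \mu)\,P_1(t) + \mu \tag{6}
```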
Solving equation (6) with the initial condition P₁(0)=1 yields:
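For a linear first-order equation of this kind the solution is a constant plus a decaying exponential. A form that satisfies both the equation and P₁(0)=1 is:

```latex
A(t) = P_1(t) = \frac{\mu}{\lambda + \mu}
     + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu)\,t}
```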
The availability of a single component with two states contains two terms. The second term is a transient that goes to zero as time increases. The first term is called the steady state availability, A∞. Noting that 1/λ is the Mean Time To Failure (MTTF) and 1/μ is the Mean Time To Repair (MTTR), one obtains:
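Written out step by step, with the limit taken over the two-term solution for a single two-state component:

```latex
A_\infty = \lim_{t \to \infty} A(t)
         = \frac{\mu}{\lambda + \mu}
         = \frac{1/\lambda}{1/\lambda + 1/\mu}
         = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} \tag{7}
```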
This is one of the most commonly used expressions to describe and calculate availability. The main advantage of equation (7) is its simplicity. Indeed, it is so intuitive that many assume the equation is the definition of availability. As derived here, however, the formula relies on both the failure probability density function and the repair PDF being exponential, and it holds only in steady state (t → ∞).
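To see how much the steady state assumption matters, the time-dependent availability can be compared with its limit. This is a minimal sketch for a single two-state component with exponential failure and repair. The function names and the MTTF and MTTR values are hypothetical and chosen only for illustration.

```python
import math

def availability(t: float, mttf: float, mttr: float) -> float:
    """Time-dependent availability A(t) of a single two-state component,
    assuming exponential failure and repair and an operational start."""
    lam = 1.0 / mttf   # failure rate
    mu = 1.0 / mttr    # repair rate
    steady = mu / (lam + mu)
    transient = (lam / (lam + mu)) * math.exp(-(lam + mu) * t)
    return steady + transient

def steady_state_availability(mttf: float, mttr: float) -> float:
    """Steady state availability MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

mttf, mttr = 1000.0, 10.0   # hypothetical values, in hours
for t in (0.0, 5.0, 50.0, 500.0):
    print(f"A({t:g} h) = {availability(t, mttf, mttr):.6f}")
print(f"A_inf    = {steady_state_availability(mttf, mttr):.6f}")
```

Shortly after start-up the availability is higher than the steady state value, because the component is known to have started in the operational state. The transient decays on a time scale of 1/(λ + μ).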