Data-driven maintenance - simplified

Predictive maintenance is one of the top use cases of the Industrial Internet of Things (IIoT). Recent developments in sensors, data storage, computing, and internet technologies have created a solid foundation for deploying more sophisticated maintenance strategies in manufacturing. This article attempts to structure and simplify the landscape of maintenance strategies, and briefly describes the role of data and analytics in each of them.

The goal of all maintenance strategies is the same: maintain and restore equipment to good health so that it continues to be productive. At the highest level, maintenance strategies come in two flavors: reactive and proactive.

Reactive maintenance - "revive the dead"

The central philosophy of the reactive maintenance strategy is: “If it ain’t broke, don’t fix it”.

This is the simplest and earliest form of maintenance strategy, in which equipment is fixed only after it fails. If the cost of maintaining equipment ahead of failure is much higher than the potential losses due to an unscheduled breakdown, this strategy is useful, even in the contemporary IIoT world!

Think about equipment that operates in remote or inaccessible places. In the pre-IIoT world, there was perhaps no way of quickly knowing whether such equipment was functioning; the symptoms of a failure may have become visible only long after the failure had happened. In this case, IIoT technologies can help to remotely monitor the status of the equipment and trigger repairs soon after it has failed.

Proactive maintenance - "reduce risk"

All the other maintenance strategies are proactive, i.e., maintenance activities are meant to be undertaken prior to failure. As you may have guessed, proactive maintenance strategies are useful if the cost of maintenance is less than the losses due to unscheduled breakdowns. Figure 1 shows the different types of proactive maintenance strategies (preventive, condition-based, predictive and prescriptive) on a typical P-F (potential failure to functional failure) curve.

[Figure 1: Proactive maintenance strategies on a typical P-F curve]

Preventive maintenance - "be careful"

This is the most prevalent proactive maintenance strategy. Preventive maintenance came into focus with the advent of Total Productive Maintenance (TPM).

The idea behind preventive maintenance strategy is to maintain, i.e., inspect, clean and service equipment at a predetermined frequency, in order to keep it in good health.

Since time to failure usually varies from one piece of equipment to the next, some equipment gets maintained before it fails, but for other equipment preventive maintenance comes too late.

How many pieces of equipment are maintained in time, and how many fail before their turn comes, depends on the time period chosen for maintenance. How does one decide on a good time period? This decision is data-driven! Using historical time-to-failure data, one can model the time to failure using either parametric models such as a Weibull distribution or semi-parametric models such as Cox regression, to establish the probability of failure as a function of the time of operation. Once the probability of failure is established, planning maintenance comes down to the trade-off between the cost of maintenance and the cost of unscheduled breakdowns. Figure 2 shows an example Weibull plot on the left, and the trade-off between the two costs on the right.

[Figure 2: Example Weibull plot (left) and the trade-off between maintenance cost and breakdown cost (right)]
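
As a rough illustration of the two steps described above (fit a time-to-failure model, then weigh the cost of planned maintenance against the cost of an unscheduled breakdown), here is a minimal Python sketch. It is not the analysis behind Figure 2. The failure history is synthetic, the cost figures are made up, every failure time is assumed to be fully observed (no censoring), and the classical age-replacement cost model is just one possible way to express the trade-off. All function names are hypothetical.

```python
import math
import random


def fit_weibull(times):
    """Maximum-likelihood fit of a two-parameter Weibull to complete failure times."""
    t_max = max(times)
    scaled = [t / t_max for t in times]  # rescale to (0, 1] to avoid overflow
    mean_log = sum(math.log(s) for s in scaled) / len(scaled)

    def score(k):
        # The root of this function is the maximum-likelihood shape parameter.
        weights = [s ** k for s in scaled]
        weighted_log = sum(w * math.log(s) for w, s in zip(weights, scaled)) / sum(weights)
        return weighted_log - 1.0 / k - mean_log

    lo, hi = 0.01, 100.0
    for _ in range(100):  # bisection works because score() increases with k
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = t_max * (sum(s ** shape for s in scaled) / len(scaled)) ** (1.0 / shape)
    return shape, scale


def survival(t, shape, scale):
    """Probability that the equipment is still running at time t."""
    return math.exp(-((t / scale) ** shape))


def cost_rate(interval, shape, scale, cost_planned, cost_failure, steps=500):
    """Expected cost per unit time if equipment is serviced at `interval` or at failure."""
    dt = interval / steps
    values = [survival(i * dt, shape, scale) for i in range(steps + 1)]
    # Trapezoidal estimate of the expected cycle length (integral of the survival curve).
    expected_cycle = dt * (sum(values) - 0.5 * (values[0] + values[-1]))
    still_running = values[-1]
    expected_cost = cost_planned * still_running + cost_failure * (1.0 - still_running)
    return expected_cost / expected_cycle


def best_interval(shape, scale, cost_planned, cost_failure, grid_points=300):
    """Grid search for the interval with the lowest expected cost per unit time."""
    candidates = [3.0 * scale * (i + 1) / grid_points for i in range(grid_points)]
    return min(candidates, key=lambda t: cost_rate(t, shape, scale, cost_planned, cost_failure))


# Synthetic history: 2000 hypothetical failure times in operating hours.
random.seed(0)
history = [random.weibullvariate(1000.0, 2.5) for _ in range(2000)]

shape, scale = fit_weibull(history)
interval = best_interval(shape, scale, cost_planned=1.0, cost_failure=10.0)
print(f"shape={shape:.2f}, scale={scale:.0f} h, suggested interval={interval:.0f} h")
```

The sketch only makes sense when the fitted shape parameter is above 1 (wear-out behavior) and a breakdown costs more than a planned intervention. Otherwise the cost curve has no interior minimum and running to failure is the cheaper option.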

If a company has a well-functioning maintenance department, it probably already uses such techniques to plan maintenance of key equipment. The time-to-failure models mentioned above can be constructed not just for individual parts or equipment but also for entire systems. A disadvantage of the classical time-based preventive maintenance strategy, though, is that some equipment is serviced even though it might be working just fine and has shown no signs of degradation, while for other equipment servicing may come too late. This is depicted in Figure 1 too: the range of preventive maintenance extends from normal operation to beyond failure.

Let us look at the other more modern proactive strategies one by one.

Condition-based maintenance - "listen in"

A piece of equipment or a system usually shows some symptoms of impending failure before it fails. Sensors can be used to measure the condition of the equipment or the presence of any failure symptoms. A classic example of this maintenance strategy is the detection of bearing degradation using vibration sensors. The vibration signal of bearings changes as they begin to wear out. Typically, the raw vibration signal from the bearing is processed further to calculate derived measures, for example the RMS amplitude. Maintenance intervention is recommended when the range or the pattern of these values deviates from what is considered normal. An example of the difference in vibration signals between healthy and faulty bearings is shown in Figure 3.

[Figure 3: Example vibration signals of healthy and faulty bearings]

Provided it is accurate, condition-based maintenance is more efficient than preventive maintenance, in that repair and restoration happen only after there is evidence of equipment degradation.
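
The bearing example can be sketched in a few lines of Python. This is only a minimal illustration: the vibration signal is synthetic (a sine wave plus noise, not real bearing data), the window length is arbitrary, and the "mean plus three standard deviations" alarm limit is an assumed rule of thumb, not a standard prescribed by any particular tool. The function names are hypothetical.

```python
import math
import random
import statistics


def rms(samples):
    """Root-mean-square amplitude of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def windowed_rms(signal, window):
    """RMS of consecutive, non-overlapping windows of the raw signal."""
    return [rms(signal[i:i + window]) for i in range(0, len(signal) - window + 1, window)]


def alarm_limit(healthy_rms, n_sigma=3.0):
    """Assumed rule: flag anything above mean + n_sigma * stdev of the healthy baseline."""
    return statistics.mean(healthy_rms) + n_sigma * statistics.stdev(healthy_rms)


def synthetic_vibration(n, amplitude, rng):
    # Hypothetical signal: a single sine component plus Gaussian sensor noise.
    return [amplitude * math.sin(0.3 * i) + rng.gauss(0.0, 0.2) for i in range(n)]


rng = random.Random(42)
healthy = synthetic_vibration(5000, 1.0, rng)  # baseline recorded on a healthy bearing
worn = synthetic_vibration(1000, 1.8, rng)     # later recording with larger amplitude

limit = alarm_limit(windowed_rms(healthy, 250))
alarms = [value > limit for value in windowed_rms(worn, 250)]
print(f"alarm limit={limit:.3f}, windows flagged={sum(alarms)} of {len(alarms)}")
```

In practice the baseline would come from a period in which the equipment is known to be healthy, and the derived measure could just as well be a spectral feature instead of the RMS amplitude.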

For complex higher-order equipment, or at the process level, finding the right signals to watch can be tricky, but machine learning algorithms, specifically classification and anomaly detection, can help here. Commercially available software for condition-based monitoring and maintenance makes use of such algorithms.

Condition-based maintenance was practiced in the pre-IIoT era too, but with a manual approach. Maintenance engineers, like physicians, examined the health of key equipment by using sensors to periodically collect condition data and analyzing it to diagnose equipment health. With the proliferation of sensors and algorithms, condition-based maintenance can now be done continuously, at very large scale and for very complex systems. Condition-based maintenance has thus been the biggest beneficiary of the IIoT revolution, and is being rapidly taken up in industry.

Predictive maintenance - "look ahead"

While condition-based maintenance alerts us when the equipment has degraded, predictive maintenance goes a step further. It provides an estimate of how much time is left before the equipment fails or, equivalently, for how much longer the equipment will continue to operate productively.

Very often the term “Remaining Useful Life” or RUL is used for this estimate. How is this estimate made? This is a challenging problem to solve and an area of active research. Machine learning algorithms are typically at work here. The algorithms create a model to predict the trajectory of equipment condition all the way until its failure. Strictly speaking, RUL models are valid only for continual degradation processes, which is a reasonable assumption in many cases. Thus “infant mortality” of equipment or sudden catastrophic failures cannot be predicted by such models.

There are two data-driven approaches to RUL modeling. The first approach is to simply extend the degradation path available from condition-based monitoring data further into the future, for example using a forecasting model. The second, more sophisticated approach accounts for both the condition measurements and the variables that influence the condition, for example process conditions, material properties, weather, etc. As one may expect, this approach is more robust, but it needs deeper knowledge of the root causes of degradation. For example, if a higher concentration of particulate matter in the liquid phase is the root cause of degradation of a pump, this variable will be a part of the RUL model of that pump.
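
The first approach can be sketched as follows. This is a minimal illustration that assumes the degradation indicator grows roughly linearly and that failure corresponds to crossing a fixed threshold. The readings, the threshold value and the function names are all hypothetical, and a real forecasting model would usually be more elaborate than a straight line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remaining_useful_life(times, indicator, failure_threshold):
    """Extrapolate the fitted trend to the threshold; None if there is no upward trend."""
    slope, intercept = fit_line(times, indicator)
    if slope <= 0:
        return None
    time_of_failure = (failure_threshold - intercept) / slope
    return max(0.0, time_of_failure - times[-1])


# Hypothetical condition-monitoring history: operating hours vs. RMS vibration amplitude.
hours = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
rms_trend = [0.70, 0.72, 0.75, 0.74, 0.79, 0.83, 0.85, 0.90, 0.92, 0.97]

rul = remaining_useful_life(hours, rms_trend, failure_threshold=1.5)
print(f"estimated remaining useful life: {rul:.0f} operating hours")
```

The second approach would add the influencing variables (process conditions, material properties and so on) as further inputs to the model instead of extrapolating the condition indicator alone.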

While the root causes of failure of certain equipment types such as compressors and pumps are well known, identifying them may at first appear to be an impossible task for complex higher-order systems. However, machine learning algorithms can sift through hundreds or thousands of sensor measurements and identify the most important variables influencing (or, strictly speaking, correlated with) equipment condition. A wide variety of algorithms, ranging from simple regression to neural networks, have been applied to RUL modeling. There is no silver-bullet algorithm, however, and a range of approaches may need to be tried in order to create a suitable RUL model.

[Figure 4: RUL estimation for a turbofan engine using a health-trajectory-similarity approach]

Figure 4 above shows an example approach to finding the RUL of a turbofan engine using health-trajectory similarity. The health trajectory of an engine (shown in red) is calculated based on sensor measurements. From historical data, engines with a known time to failure and trajectories similar to that of the engine of interest are identified. The failure time of the engine of interest is then simply the average or the median of the failure times of these comparable engines.
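
A minimal Python sketch of this similarity idea is given below. It is not the model behind Figure 4. The health trajectories are synthetic straight lines, similarity is measured with a plain mean squared difference, and the number of comparable engines (k) is an arbitrary choice. All names are hypothetical.

```python
import statistics


def trajectory_distance(current, reference):
    """Mean squared difference over the cycles observed so far on the current engine."""
    n = len(current)
    return sum((c - r) ** 2 for c, r in zip(current, reference[:n])) / n


def similarity_rul(current, history, k=3):
    """Median, minimum and maximum remaining cycles of the k most similar engines.

    Each entry of `history` is a full run-to-failure health trajectory,
    with the failure occurring at its last sample.
    """
    n = len(current)
    candidates = [h for h in history if len(h) >= n]  # engines that lasted at least this long
    if not candidates:
        raise ValueError("no historical engine ran as long as the current one")
    nearest = sorted(candidates, key=lambda h: trajectory_distance(current, h))[:k]
    remaining = [len(h) - n for h in nearest]
    return statistics.median(remaining), min(remaining), max(remaining)


def linear_trajectory(life):
    # Hypothetical health index: 1.0 when new, falling linearly to 0.0 at failure.
    return [1.0 - i / life for i in range(life + 1)]


# Synthetic fleet history and an engine that has been observed for 60 cycles so far.
fleet = [linear_trajectory(life) for life in (100, 120, 150, 200, 220, 250)]
engine = linear_trajectory(155)[:60]

median_rul, lowest, highest = similarity_rul(engine, fleet, k=3)
print(f"RUL estimate: {median_rul} cycles (range {lowest} to {highest})")
```

The spread between the lowest and highest value among the comparable engines gives a crude sense of how uncertain the estimate is.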

As you can see, there is quite some uncertainty in the RUL model prediction, and there is a good reason for that. An RUL model predicts equipment life based on the process conditions that it has observed to date, but future process conditions may be significantly different from those in the past, and the model simply cannot know them, and take them into account, in advance.

Predictive maintenance based on an RUL model with a very large prediction interval or low accuracy is effectively equivalent to preventive maintenance, as the predictions are likely to lead to too-early or too-late maintenance of some equipment.

Predictive maintenance, like condition-based maintenance, is efficient because it helps us to do maintenance only after the equipment has degraded. But its benefits go beyond that. Since a predictive maintenance strategy provides an estimate of when the equipment is expected to fail, more accurate maintenance scheduling becomes possible. Maintenance activities can be planned well in advance, resulting in minimum disruption to operations and in efficiency gains.

Prescriptive maintenance - "avoid trouble"

This maintenance strategy is one level higher in sophistication than predictive maintenance. A prescriptive maintenance strategy aims not only to flag current or future problems with equipment, but also to recommend solutions or changes in order to mitigate problems and extend the life of the equipment. A simple example: based on the level of usage of a pump, a prescriptive maintenance model adjusts the interval at which its lubrication oil is replenished.

As you can imagine, recommending robust solutions is possible only if equipment degradation routes and root causes are well understood. So, the journey to a prescriptive maintenance strategy starts with predictive maintenance. Unsurprisingly, except for the most well-understood equipment such as rotating machinery, prescriptive maintenance is still in its infancy.


That was all! Hope this write-up provided a good overview of the different data-driven maintenance strategies. Thanks for reading it until the end!

***

Vidya is the founder of Decodexis consulting services, a management consultant and a data science practitioner. He loves to help B2B industry clients solve their problems using data analytics. He has deep experience with analytics at both the strategic and the number-crunching levels, in operations and commercial functions.



 

Thank you for sharing! I was wondering if there was more than one failure mode in Figure 4? That could be one of the reasons that the variance of the RUL is high. Another reason, which I found in some experiments, lies in the type of model and the measures used. For example, failure of the windings of a motor could occur suddenly, but with the right measure you can predict or monitor it very easily! Like you said, you have to understand the root causes. Therefore I believe that we will find a lot of new applications for predictive or RUL algorithms in the future :-)
