When Instruments Fail: Understanding Failure Modes Beyond the Device
In the process industries, instrumentation reliability is often discussed in terms of accuracy, calibration, and maintenance history.
But when we step into the domain of functional safety and reliability engineering, the discussion must go much deeper.
The key question is no longer:
“Can this instrument fail?”
Every experienced engineer knows the answer is yes.
The real engineering question is:
“How does the instrument fail, and what does that failure do to the Safety Instrumented Function (SIF)?”
This is the foundation of failure mode analysis in safety systems.
Instrument Failure Modes – What Actually Happens in the Field
Consider a typical two-wire pressure transmitter generating a 4–20 mA signal.
In practice, the device rarely fails in a binary “working or not working” manner. Instead, it shows several distinct failure behaviors.
Each of these behaviors represents a different failure mode, and each has very different consequences depending on how the instrument is used in the process.
This is where the analysis becomes interesting, because an instrument failure is not inherently safe or dangerous.
Its consequence depends entirely on the function it is performing.
The Context Matters: Failure Mode vs Safety Function
Imagine a pressure transmitter connected to a Safety PLC programmed to trip on high pressure.
If the transmitter fails in certain ways, the consequences can be very different:
Fail-Safe Behavior
If the transmitter output suddenly drives to the upper current limit, the system may interpret it as high pressure, triggering a trip.
The plant shuts down.
Production may be lost.
But the safety function still works.
This is typically considered fail-safe behavior, although in operational terms it becomes a spurious trip.
Fail-Danger Behavior
Now consider other failure modes, in which the transmitter output no longer follows the real process pressure.
In these cases, a real high-pressure condition may exist, yet the safety system never receives the correct signal.
The SIF does not trip.
The protection layer silently disappears.
This is a dangerous failure mode.
And these are precisely the types of failures that functional safety engineers must identify, quantify, and mitigate.
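The dependence on the safety function can be made concrete with a small sketch. It assumes a simplified logic solver that compares the raw loop current against a trip setpoint and has no out-of-range diagnostics. The setpoints, the current values, and the names `Effect`, `trips`, and `classify_pinned_output` are hypothetical and are not taken from any real safety PLC.

```python
from enum import Enum


class Effect(Enum):
    SAFE = "safe: spurious trip"
    DANGEROUS = "dangerous: demand would be missed"


def trips(signal_ma: float, setpoint_ma: float, trip_on_high: bool) -> bool:
    """True if a simple comparator would demand a trip for this signal."""
    if trip_on_high:
        return signal_ma >= setpoint_ma
    return signal_ma <= setpoint_ma


def classify_pinned_output(pinned_ma: float, setpoint_ma: float,
                           trip_on_high: bool) -> Effect:
    """Classify a failure that pins the output at pinned_ma, whatever the
    real process value is (assumption: no out-of-range diagnostics)."""
    if trips(pinned_ma, setpoint_ma, trip_on_high):
        return Effect.SAFE
    return Effect.DANGEROUS


# Hypothetical setpoints: high-pressure trip at 16 mA, low-pressure trip at 8 mA.
print(classify_pinned_output(21.5, 16.0, trip_on_high=True))   # driven upscale, high trip
print(classify_pinned_output(12.0, 16.0, trip_on_high=True))   # frozen mid-range, high trip
print(classify_pinned_output(21.5, 8.0, trip_on_high=False))   # driven upscale, low trip
```

Under these assumptions, the same upscale failure is safe for the high-pressure trip but dangerous for a low-pressure trip, while an output frozen mid-range is dangerous for both.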
Why Device Reliability Metrics Alone Are Not Enough
Traditional reliability metrics tell us how often equipment may fail.
But they do not tell us whether the failure compromises safety.
Functional safety introduces additional metrics of its own.
These metrics only make sense once individual equipment failure modes are properly categorized.
Without understanding failure modes, reliability numbers alone become misleading indicators of safety integrity.
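As an illustration of why categorization has to come first, the sketch below computes two metrics commonly used in IEC 61508 work: safe failure fraction (SFF) and a simplified average probability of failure on demand (PFDavg) for a single 1oo1 device. Choosing these two metrics is an assumption, as are the failure rates and the one-year proof test interval. The PFDavg expression is the usual first-order approximation λDU × TI / 2, which ignores repair time and common cause.

```python
from dataclasses import dataclass


@dataclass
class FailureRates:
    """Failure rates per hour, split by category (hypothetical values only)."""
    safe: float                  # lambda_S
    dangerous_detected: float    # lambda_DD
    dangerous_undetected: float  # lambda_DU

    def total(self) -> float:
        return self.safe + self.dangerous_detected + self.dangerous_undetected

    def safe_failure_fraction(self) -> float:
        """SFF = (lambda_S + lambda_DD) / (lambda_S + lambda_DD + lambda_DU)."""
        return (self.safe + self.dangerous_detected) / self.total()

    def pfd_avg_1oo1(self, proof_test_interval_h: float) -> float:
        """First-order approximation for one device: lambda_DU * TI / 2."""
        return self.dangerous_undetected * proof_test_interval_h / 2


# Two hypothetical devices with the same total failure rate (1e-6 per hour)
# but a different split between categories.
device_a = FailureRates(safe=4e-7, dangerous_detected=4e-7, dangerous_undetected=2e-7)
device_b = FailureRates(safe=2e-7, dangerous_detected=2e-7, dangerous_undetected=6e-7)

for name, rates in (("A", device_a), ("B", device_b)):
    print(name,
          f"total={rates.total():.1e}/h",
          f"SFF={rates.safe_failure_fraction():.0%}",
          f"PFDavg={rates.pfd_avg_1oo1(8760):.2e}")
```

Both devices would look identical on a plain failure-rate basis, yet the one with the larger dangerous undetected share has a PFDavg three times higher in this sketch.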
Failure Modes Across the Entire SIF Chain
The same reasoning must be applied to every element in the Safety Instrumented Function.
Sensors
Transmitters may fail in several ways.
Each of these changes how the process deviation is detected or hidden.
Logic Solvers
Even safety PLCs have failure behaviors of their own.
Some of these failures lead to automatic shutdown, while others may silently prevent the logic from executing correctly.
Final Elements
Final elements introduce another set of critical failure modes.
The most important engineering question becomes:
Will the valve move to the safe position when demanded?
In the end, the valve is often the last line of protection between the process and the hazard.
Another Often Ignored Category: Annunciation Failures
A particularly subtle failure category occurs when automatic diagnostics stop working.
The instrument may still appear operational, but the diagnostic system responsible for detecting failures has stopped functioning.
In such situations:
These failures rarely appear in routine maintenance discussions but are highly relevant in reliability modeling.
Not Every Failure Matters to the Safety Function
Some equipment failures have no effect on the safety function.
For example:
These are typically categorized as “No Effect” failures and may not influence safety metrics.
However, a competent analyst still identifies them to ensure nothing critical is overlooked.
The Real Lesson for Instrument Engineers
One of the biggest mistakes in engineering discussions is to treat instrumentation failures as pure hardware problems.
In reality, failures must always be analyzed in three contexts simultaneously:
Only then can we correctly determine whether a failure becomes:
This systems perspective is essential for:
In Conclusion
Instrumentation does not simply measure the process.
In safety-critical systems, instrumentation defines whether hazards are detected or ignored.
And when failures occur, as they inevitably will, it is not the existence of the failure that determines safety.
It is how the system interprets that failure.
That distinction is what separates device maintenance from true safety lifecycle engineering.
It's important to have a philosophy for driving the transmitter output on detected faults in the direction you want, which varies situationally. 1ooX is different than 2ooX. The logic solver configuration matters quite a bit as well: configuring how the voter treats below scale or above scale or "bad process value" aka "BadPV". Alarms have to be configured or you could sit in a detected fault indefinitely. And then there's deviation alarming between transmitters that are supposed to be reading the same - that can help with the stuck or drifting readings. Good topic!
Very well explained. In practice, device diagnostics are still under-utilized. Many OEM platforms provide alarm management suites and some level of SIF tracking to highlight signal degradation, but the analysis often stops at identifying a bad signal or alarm. What is still missing is deeper correlation between device diagnostics, signal behavior, and actual SIF activation or degradation. Bridging this gap could significantly improve safety monitoring and early detection of hidden failures.