When Instruments Fail: Understanding Failure Modes Beyond the Device

In the process industries, instrumentation reliability is often discussed in terms of accuracy, calibration, and maintenance history.

But when we step into the domain of functional safety and reliability engineering, the discussion must go much deeper.

The key question is no longer:

“Can this instrument fail?”

Every experienced engineer knows the answer is yes.

The real engineering question is:

“How does the instrument fail, and what does that failure do to the Safety Instrumented Function (SIF)?”

This is the foundation of failure mode analysis in safety systems.


Instrument Failure Modes – What Actually Happens in the Field

Consider a typical two-wire pressure transmitter generating a 4–20 mA signal.

In practice, the device rarely fails in a binary “working or not working” manner. Instead, we see several distinct failure behaviors, such as:

  • Output signal frozen at a fixed value
  • Output drifting slowly away from the true process value
  • Erratic or noisy output
  • Current driven to the upper limit (≈20 mA)
  • Current driven to the lower limit (≈4 mA or below)
  • Internal diagnostics failure
  • Communication failure with host systems

Each of these behaviors represents a different failure mode, and each has very different consequences depending on how the instrument is used in the process.
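The current-driven behaviors in this list can be sketched in a few lines of code. The thresholds below follow the widely used NAMUR NE43 convention for 4–20 mA loops (illustrative values; always confirm against the device manual), and note that frozen, drifting, or erratic outputs cannot be detected from a single sample at all:

```python
def classify_current(ma: float) -> str:
    """Map a single 4-20 mA loop reading to a coarse signal state.

    Thresholds follow the common NAMUR NE43 convention (illustrative;
    confirm against the specific device manual). A frozen or drifting
    output cannot be caught this way -- it looks like a valid reading.
    """
    if ma < 3.6:
        return "fault_low"       # wire break or driven-low fault annunciation
    if ma > 21.0:
        return "fault_high"      # driven-high fault annunciation
    if ma < 3.8 or ma > 20.5:
        return "out_of_range"    # saturated beyond the measuring span
    return "in_range"            # nominal measurement band

print(classify_current(12.0))    # in_range
print(classify_current(3.5))     # fault_low
print(classify_current(22.0))    # fault_high
```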

This is where the analysis becomes interesting.

Because an instrument failure is not inherently safe or dangerous.

Its consequence depends entirely on the function it is performing.


The Context Matters: Failure Mode vs Safety Function

Imagine a pressure transmitter connected to a Safety PLC programmed to trip on high pressure.

If the transmitter fails in certain ways, the consequences can be very different:

Fail-Safe Behavior

If the transmitter output suddenly drives to the upper current limit, the system may interpret it as high pressure, triggering a trip.

The plant shuts down.

Production may be lost.

But the safety function still works.

This is typically considered fail-safe behavior, although in operational terms it becomes a spurious trip.


Fail-Danger Behavior

Now consider other failure modes:

  • Output stuck at a low value
  • Signal drifting slowly downward
  • Erratic measurement masking the true process value

In these cases, a real high-pressure condition may exist, yet the safety system never receives the correct signal.

The SIF does not trip.

The protection layer silently disappears.

This is a dangerous failure mode.

And these are precisely the types of failures that functional safety engineers must identify, quantify, and mitigate.
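The fail-safe versus fail-dangerous contrast above can be expressed as a toy classifier. The mode names and the `trip_on` parameter are invented for this sketch; frozen, drifting, and erratic outputs are treated conservatively as dangerous because they can mask a real demand:

```python
SAFE = "fail-safe (spurious trip risk)"
DANGEROUS = "fail-dangerous"

def consequence(failure_mode: str, trip_on: str) -> str:
    """Classify a transmitter failure mode against the SIF trip direction.

    trip_on is "high" or "low". The same device failure is safe or
    dangerous depending on the function it serves, not on the device.
    """
    drives_high = failure_mode in {"driven_high", "saturated_high"}
    drives_low = failure_mode in {"driven_low", "saturated_low"}
    if failure_mode in {"frozen", "drifting", "erratic"}:
        return DANGEROUS  # may mask a real demand; treated conservatively
    if (trip_on == "high" and drives_high) or (trip_on == "low" and drives_low):
        return SAFE       # failure pushes the signal toward the trip point
    return DANGEROUS      # failure pushes the signal away from the trip point

print(consequence("driven_high", "high"))  # fail-safe (spurious trip risk)
print(consequence("driven_low", "high"))   # fail-dangerous
```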


Why Device Reliability Metrics Alone Are Not Enough

Traditional reliability metrics such as:

  • MTBF (Mean Time Between Failures)
  • MTTF (Mean Time To Failure)
  • Availability

tell us how often equipment may fail.

But they do not tell us whether the failure compromises safety.

Functional safety introduces additional metrics such as:

  • Probability of Failure on Demand (PFD)
  • Average PFD (PFDavg)
  • Risk Reduction Factor (RRF)
  • Mean Time To Dangerous Failure (MTTFd)

These metrics only make sense once individual equipment failure modes are properly categorized.

Without understanding failure modes, reliability numbers alone become misleading indicators of safety integrity.
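To make the connection concrete: the well-known IEC 61508 simplified approximation for a single (1oo1) channel, PFDavg ≈ λ_DU · T_proof / 2, only works because the dangerous-undetected rate λ_DU has already been separated out from the total failure rate by failure mode analysis. A minimal sketch with assumed example numbers (not from any real device):

```python
def pfd_avg_1oo1(lambda_du_per_hr: float, proof_test_interval_hr: float) -> float:
    """Simplified IEC 61508 approximation for a single (1oo1) channel:
    PFDavg ~= lambda_DU * T_proof / 2. Ignores repair time, common
    cause, and imperfect proof testing -- a first-order estimate only."""
    return lambda_du_per_hr * proof_test_interval_hr / 2.0

# Assumed example: lambda_DU = 2e-7 /hr, annual proof test (8760 hr)
pfd = pfd_avg_1oo1(2e-7, 8760)
rrf = 1.0 / pfd  # Risk Reduction Factor is the reciprocal of PFDavg
print(f"PFDavg = {pfd:.2e}, RRF = {rrf:.0f}")
```

Note that only the dangerous-undetected portion of the failure rate enters this formula; the MTBF of the whole device never appears.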


Failure Modes Across the Entire SIF Chain

The same reasoning must be applied to every element in the Safety Instrumented Function.

Sensors

Transmitters may fail by:

  • Freezing output
  • Drifting
  • Saturating high or low
  • Losing communication
  • Losing diagnostics capability

Each of these changes how the process deviation is detected or hidden.


Logic Solvers

Even safety PLCs have failure behaviors such as:

  • Digital inputs stuck high or low
  • Output channels stuck
  • CPU execution faults
  • Memory corruption
  • Power supply anomalies

Some of these failures lead to automatic shutdown, while others may silently prevent the logic from executing correctly.


Final Elements

Final elements introduce another set of critical failure modes:

  • Solenoid valve stuck
  • Actuator spring failure
  • Mechanical binding
  • Valve shaft failure
  • Valve stuck in position

The most important engineering question becomes:

Will the valve move to the safe position when demanded?

Because in the end, the valve is often the last line of protection between the process and the hazard.


Another Often-Ignored Category: Annunciation Failures

A particularly subtle failure category occurs when automatic diagnostics stop working.

The instrument may still appear operational, but the diagnostic system responsible for detecting failures has stopped functioning.

In such situations:

  • Dangerous failures become harder to detect
  • Proof test coverage becomes less effective
  • The real system risk increases without being visible

These failures rarely appear in routine maintenance discussions but are highly relevant in reliability modeling.
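The effect described above can be shown numerically. In the simplified 1oo1 model, detected dangerous failures are repaired quickly, while undetected ones persist until the next proof test; if the diagnostics silently stop working, the detected fraction effectively joins the undetected one. All numbers below are assumed for illustration:

```python
def pfd_avg_1oo1(lambda_du: float, t_proof: float,
                 lambda_dd: float = 0.0, mttr: float = 8.0) -> float:
    """Simplified 1oo1 approximation: undetected dangerous failures sit
    until the proof test; detected ones sit only for the repair time."""
    return lambda_du * t_proof / 2.0 + lambda_dd * mttr

lam_d, dc = 5e-7, 0.9   # assumed total dangerous rate /hr, diagnostic coverage
t_proof = 8760          # annual proof test interval, hours

healthy = pfd_avg_1oo1(lam_d * (1 - dc), t_proof, lambda_dd=lam_d * dc)
# Annunciation failed: diagnostics contribute nothing, so every dangerous
# failure now behaves as undetected until the next proof test.
degraded = pfd_avg_1oo1(lam_d, t_proof)

print(f"PFDavg healthy:  {healthy:.2e}")
print(f"PFDavg degraded: {degraded:.2e}")
```

With these assumed numbers the loss of diagnostics worsens PFDavg by roughly a factor of ten, even though nothing about the transmitter itself has changed.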


Not Every Failure Matters to the Safety Function

Some equipment failures have no effect on the safety function.

For example:

  • Display panel failure
  • Non-critical internal electronics
  • Auxiliary functions not required by the SIF

These are typically categorized as “No Effect” failures and may not influence safety metrics.

However, a competent analyst still identifies them to ensure nothing critical is overlooked.


The Real Lesson for Instrument Engineers

One of the biggest mistakes in engineering discussions is to treat instrumentation failures as pure hardware problems.

In reality, failures must always be analyzed in three contexts simultaneously:

  1. Device behavior
  2. System architecture
  3. Safety function response

Only then can we correctly determine whether a failure becomes:

  • Fail-Safe (spurious trip)
  • Fail-Danger (protection unavailable)
  • Annunciation failure
  • No effect

This systems perspective is essential for:

  • SIL verification
  • PFD calculations
  • Proof test strategy development
  • Lifecycle reliability management


In Conclusion

Instrumentation does not simply measure the process.

In safety-critical systems, instrumentation defines whether hazards are detected or ignored.

And when failures occur, as they inevitably will, it is not the existence of the failure that determines safety.

It is how the system interprets that failure.

That distinction is what separates device maintenance from true safety lifecycle engineering.

Comments

It's important to have a philosophy for driving the transmitter output on detected faults in the direction you want; the right choice varies with the situation, and 1ooX behaves differently from 2ooX. The logic solver configuration matters quite a bit as well: how the voter treats below-scale, above-scale, or "bad process value" (BadPV) readings. Alarms also have to be configured, or the loop could sit in a detected fault indefinitely. And then there's deviation alarming between transmitters that are supposed to be reading the same value, which can help catch stuck or drifting readings. Good topic!

Very well explained. In practice, device diagnostics are still under-utilized. Many OEM platforms provide alarm management suites and some level of SIF tracking to highlight signal degradation, but the analysis often stops at identifying a bad signal or alarm. What is still missing is deeper correlation between device diagnostics, signal behavior, and actual SIF activation or degradation. Bridging this gap could significantly improve safety monitoring and early detection of hidden failures.

More articles by Amit Singh
