Edge Data Analytics: Sensor Data Prep

Miles Sims

Published May 6, 2025

Clean Signals, Smart Decisions: Why Edge Data Prep is a Must-Have in Industrial Systems

Imagine trying to diagnose engine noise with earmuffs on. That’s what analyzing raw industrial signals is like.

Whether it’s a vibration probe on a motor or a pressure transducer on a pipeline, sensors in industrial systems speak in voltages, counts, and pulses—not English. These raw signals are often noisy, jittery, misaligned, and incomplete. And if we pump that raw feed into AI, dashboards, or alarms, we risk triggering false shutdowns, missing real failures, or—even worse—taking dangerous actions.

In the world of OT, where safety and uptime rule, data prep isn’t optional. It’s your first line of defense.

Why Raw Signals Are a Minefield

Industrial environments are signal-hostile by default. Here's why:

Electrical Noise – Drives, relays, and switching gear inject broadband interference into analog lines.
Sensor Drift – Temperature swings or aging components slowly shift the zero-line over time.
Quantization and Sampling Errors – Low-resolution analog-to-digital converters (ADCs) add step-like quantization error, and sampling below twice the highest signal frequency causes aliasing.
Data Gaps – Glitches, buffer overruns, or flaky connectors can create dropouts, stuck values, or spikes.

Example: A vibration spike from a loose cable could falsely flag a bearing failure—costing you thousands in unnecessary maintenance.

From Raw to Ready — The Data Cleaning Playbook

Think of this as your industrial “laundry cycle”:

🌀 Filtering & Smoothing

Low-pass filters (such as an exponential moving average or a low-pass FIR design) remove high-frequency “buzz” that sits above the band your process actually moves in.
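Here is a minimal sketch of the exponential moving average mentioned above. The function name ema_filter and the default alpha of 0.2 are illustrative assumptions, not tuned values; a smaller alpha smooths harder but responds more slowly to real changes.

```python
def ema_filter(samples, alpha=0.2):
    """Exponential moving average: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for x in samples:
        # Seed with the first sample so the output does not ramp up from zero.
        previous = x if previous is None else alpha * x + (1.0 - alpha) * previous
        smoothed.append(previous)
    return smoothed


# Hypothetical readings with one noisy jump at index 4.
noisy = [10.0, 10.4, 9.7, 10.2, 14.0, 10.1, 9.9]
print(ema_filter(noisy, alpha=0.2))
```

The filter keeps only one previous value in memory, which is why it suits small edge devices.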

⚠️ Spike & Outlier Removal

Use robust methods like Median Absolute Deviation (MAD) to catch anomalies without killing true signals.
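A rough sketch of a MAD-based check, using the modified z-score. The name mad_outliers and the 3.5 threshold are assumptions for illustration (3.5 is a commonly quoted starting point, not a rule), and returning nothing when the MAD is zero is a deliberately conservative choice.

```python
from statistics import median


def mad_outliers(samples, threshold=3.5):
    """Return indices of samples whose modified z-score exceeds threshold."""
    center = median(samples)
    mad = median(abs(x - center) for x in samples)
    if mad == 0:
        # At least half the samples equal the median, so deviations cannot be
        # scaled. This sketch flags nothing rather than guessing.
        return []
    # 0.6745 rescales MAD so scores are comparable to z-scores for normal data.
    return [i for i, x in enumerate(samples)
            if 0.6745 * abs(x - center) / mad > threshold]


# Hypothetical window with one obvious spike at index 5.
window = [10.0, 10.1, 9.9, 10.2, 9.8, 50.0, 10.0]
print(mad_outliers(window))
```

What to do with a flagged sample (drop it, clamp it, or just tag it) depends on the signal; the sketch only locates candidates.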

🧭 Drift Compensation

Calibrate sensors or subtract adaptive baselines to remove slow offset changes over time.
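One way to picture an adaptive baseline is a trailing rolling median that gets subtracted from each sample. This is a sketch under assumptions: remove_drift and the 50-sample window are hypothetical, and the window must be much longer than any event you want to keep, or the baseline will absorb the event too.

```python
from collections import deque
from statistics import median


def remove_drift(samples, window=50):
    """Subtract a trailing rolling-median baseline from each sample."""
    history = deque(maxlen=window)  # keeps only the most recent `window` samples
    corrected = []
    for x in samples:
        history.append(x)
        baseline = median(history)
        corrected.append(x - baseline)
    return corrected


# Hypothetical slow ramp standing in for thermal drift.
drifting = [0.01 * i for i in range(200)]
flattened = remove_drift(drifting)
print(flattened[100], flattened[199])
```

Because the window trails the current sample, a steady ramp leaves a small constant offset instead of a growing one. That is usually acceptable for thresholding, but it does not replace periodic calibration.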

🧩 Missing Value Imputation

Don’t feed your models holes. Use linear interpolation, forward fill, or ML-based methods to plug gaps responsibly.
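As a sketch of filling gaps "responsibly", the function below interpolates only short interior gaps and leaves long dropouts untouched so they stay visible downstream. The name interpolate_gaps, the use of None for missing readings, and the max_gap default are all assumptions made for this example.

```python
def interpolate_gaps(samples, max_gap=3):
    """Linearly fill runs of None no longer than max_gap; leave longer runs as None."""
    filled = list(samples)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i  # first missing index in this run
        while i < n and filled[i] is None:
            i += 1
        end = i  # first valid index after the run, or n if the run hits the end
        gap = end - start
        # Fill only interior gaps with valid neighbours on both sides.
        if start > 0 and end < n and gap <= max_gap:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                filled[start + k] = left + step * (k + 1)
    return filled


print(interpolate_gaps([1.0, None, None, 4.0]))
```

Leading and trailing gaps are left as None because there is nothing on one side to interpolate toward.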

🧮 Feature Extraction

Derive higher-order indicators like:

Rolling RMS for vibration
FFT peaks for bearing frequency analysis
Kurtosis or skew for early warning signs
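The three indicators above can be sketched in plain Python. The function names are hypothetical, and the frequency search uses a naive O(n²) DFT purely to stay dependency-free; on a real device you would likely use an FFT library instead.

```python
import cmath
import math


def rms(window):
    """Root mean square of one window of samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def rolling_rms(samples, window):
    """RMS over each full trailing window of the given length."""
    return [rms(samples[i - window + 1:i + 1])
            for i in range(window - 1, len(samples))]


def kurtosis(window):
    """Population excess kurtosis (0 for a normal distribution)."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((x - mean) ** 2 for x in window) / n
    if m2 == 0:
        return 0.0  # flat window: no spread to measure
    m4 = sum((x - mean) ** 4 for x in window) / n
    return m4 / (m2 * m2) - 3.0


def dominant_frequency(window, sample_rate_hz):
    """Frequency in Hz of the largest non-DC bin, via a naive DFT."""
    n = len(window)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(window))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n


# Hypothetical 50 Hz vibration tone sampled at 1 kHz.
tone = [math.sin(2 * math.pi * 50 * i / 1000) for i in range(200)]
print(dominant_frequency(tone, 1000), rms(tone), kurtosis(tone))
```

A clean sine has negative excess kurtosis, so a rise toward or above zero on a vibration channel is one hint that impulsive content is appearing.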

📏 Normalization

Scale all features to a common range before feeding into ML or thresholds—ensuring apples-to-apples comparison.
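A small sketch of min-max scaling to a 0 to 1 range. The names fit_minmax and apply_minmax are hypothetical. The key assumption is that bounds are fitted once on reference data (for example, a commissioning run) and then reused on live data, so the scale does not shift with every new window.

```python
def fit_minmax(reference):
    """Learn scaling bounds from a reference window of feature values."""
    return min(reference), max(reference)


def apply_minmax(values, bounds, clip=True):
    """Scale values to 0..1 using previously fitted bounds."""
    low, high = bounds
    span = high - low
    if span == 0:
        return [0.0 for _ in values]  # flat reference: nothing to scale against
    scaled = [(v - low) / span for v in values]
    if clip:
        # Live values outside the reference range are pinned to the edges.
        scaled = [min(1.0, max(0.0, s)) for s in scaled]
    return scaled


bounds = fit_minmax([0.0, 50.0, 100.0])
print(apply_minmax([0.0, 25.0, 100.0, 120.0], bounds))
```

Clipping hides how far out of range a value went, so pass clip=False if that distance is itself a useful signal for your model.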

🛠 Best Practice: Implement these steps locally (at the edge) to avoid transmitting noisy bulk data to central systems.

Where Does Data Prep Fit? (Architectural View)

Let’s zoom out.

📍 IIRA Context

In the IIRA (Industrial Internet Reference Architecture), your data-prep layer lives in the Operations & Information Domains, just beneath the Application Domain. It connects:

Control-level devices (Level 1, Purdue Model) with
site-operations and enterprise decision engines (Levels 3-4).

🧱 Deployment Example: 3-Tier Edge Architecture

Edge Tier: Sensor filtering on Raspberry Pi / Jetson Nano
Platform Tier: OPC UA server aggregates filtered data
Enterprise Tier: Cloud or SCADA dashboard consumes cleaned signals

🧩 Integration Tip: Use tools like Node-RED or EdgeX Foundry for low-code deployment of filter pipelines.

Don’t Just Clean—Secure It

Edge data prep usually happens outside the traditional IT firewall. Here’s how to stay safe:

Run Local, Transmit Less: Filtering at the edge means less raw, potentially sensitive data crosses the network.
Use Secure Protocols: Favor OPC UA (w/ encryption) or TLS-wrapped MQTT for transport.
Harden Your Devices: Use TPMs, certificates, and minimal OS builds (Yocto, BalenaOS).
Log the Pipeline: Record every step, from spike removal to model updates, in a tamper-evident audit log (e.g., append-only records in InfluxDB or SQLite, chained with checksums).

Reference: These steps align with the OT cybersecurity guidance in NIST SP 800-82 Rev. 3.
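To make the "log with checksums" idea concrete, here is a minimal sketch of a hash-chained log held in memory. Everything in it is an assumption for illustration: the function names, the record fields, and the choice of SHA-256. A real deployment would persist the records and add timestamps.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(step, details, prev_hash):
    body = {"step": step, "details": details, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, step, details):
    """Append a record whose hash covers its content and the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"step": step, "details": details, "prev_hash": prev_hash,
                "hash": _digest(step, details, prev_hash)})


def verify_chain(log):
    """Return True if every record's hash and back-link are consistent."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(entry["step"], entry["details"], entry["prev_hash"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "spike_removal", {"removed": 3})
append_entry(log, "ema_filter", {"alpha": 0.2})
print(verify_chain(log))
```

Editing or reordering any earlier record breaks the chain. Chopping records off the end does not, so the latest hash should also be stored somewhere the edge device cannot rewrite.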

Real-World Example — Smarter Anomaly Detection at the Edge

Industry: Midstream Natural Gas Facility

Challenge: The site experienced intermittent pressure spikes on pipeline transducers. Static threshold alarms misread these spikes, leading to unnecessary shutdowns.

Solution:

Engineers deployed a local anomaly detection model (Isolation Forest) on an edge device.
The model was trained on rolling statistical features — such as mean, variance, and kurtosis — extracted from the smoothed pressure signal.
All data preprocessing (filtering, feature generation) happened directly at the edge.

Result:

Within 48 hours, the model flagged two transmitters with abnormal noise signatures.
Field inspection revealed damaged shielding near a high-voltage bus — the true root cause.
After replacing the shielding and recalibrating sensors, false alarms dropped by 80%.

Impact:

Improved anomaly classification accuracy by 25%.
Saved an estimated $150,000 in avoided downtime over six months.

Takeaway: Edge data cleaning + embedded AI doesn’t just detect faults — it helps improve the signal pipeline itself, leading to better decisions, fewer interruptions, and real operational ROI.

Don’t Skip the Validation

Even a good filter can go bad.

Add checkpoints:

Compare raw vs. filtered data over time
Validate against known failure events
Track “data quality scores” (e.g., % spike-free or % complete)
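A data quality score can be as simple as two percentages per window. This sketch assumes missing readings are stored as None and that spike positions come from whatever detector you already run; the name quality_scores and the output keys are hypothetical.

```python
def quality_scores(samples, spike_indices):
    """Return percent-complete and percent-spike-free for one window.

    samples: readings, with None marking a missing value.
    spike_indices: positions already flagged as spikes by an upstream detector.
    """
    total = len(samples)
    if total == 0:
        return {"pct_complete": 0.0, "pct_spike_free": 0.0}
    present = sum(1 for x in samples if x is not None)
    spikes = len(set(spike_indices))  # de-duplicate repeated flags
    return {
        "pct_complete": 100.0 * present / total,
        "pct_spike_free": 100.0 * (present - spikes) / present if present else 0.0,
    }


# Hypothetical window: one dropout, one flagged spike at index 3.
print(quality_scores([1.0, None, 1.1, 9.0, 1.0], spike_indices=[3]))
```

Trending these numbers per sensor makes it easier to spot a channel whose wiring or filter settings are degrading before the alarms do.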

🧪 Consider A/B testing: One pump gets cleaned signal, one gets raw. Compare model outputs or alarm rates.

What Great Looks Like

✅ Clean Data Pipeline Includes:

Edge-resident filters (Node-RED or Python)
Timestamp alignment via NTP/PTP
Smart protocol routing (OPC UA/MQTT)
Logging and audit trail of transforms
Validated outcomes with baseline benchmarks

🔁 Continuous Improvement:

Tune your thresholds and filters based on feedback
Update ML models with new feature sets
Monitor data-prep CPU/memory to avoid overload

Conclusion: Prep Is Your First Line of Defense

In an edge-first OT world, your sensors are only as smart as the pipeline between them and your decisions.

Data prep isn't “just” cleanup—it's your quality gate, your compliance backbone, and your predictive engine primer. Skipping it will cost you in AI misfires, regulatory headaches, and unplanned downtime.

Clean early. Filter locally. Analyze wisely.

About This Series: Edge Data Analytics

This is an exploratory series of posts about how Edge Data Analytics empowers real-time insights and actionable intelligence in complex environments like manufacturing, energy, and field service. The examples are illustrative, yet grounded in the real-world challenges I’ve faced on the plant floor and in control rooms.

My goal is to keep these posts practical, technical, and yes, a little fun—because we deserve more than generic analytics buzzwords and abstract slides. Full transparency: I’m using AI to help generate this content and explore how edge-first strategies can tackle the messiness of industrial operations (and maybe teach me a thing or two along the way).

If you’re evaluating edge data strategies for manufacturing or energy, let’s connect:

💬 Reach out to me here on LinkedIn
