We Finished the Second Plane. Here’s What the Full Stack Actually Looks Like.
A technical update on the ISA-95 bottling line demo — three planes running, data governed end to end, and a downtime model that mirrors how real plants actually work.
In the first update, I showed a historian full of telemetry and an AI answering questions about microstops from live signal data. It was a proof of concept for the telemetry plane — “what’s happening, second by second.”
This update is about what we built next.
The transactional plane. The governed layer. The part of the architecture that transforms signals into facts.
Where We Are Now
The demo system now runs three planes end to end on two Raspberry Pis costing less than €200 combined.
Plane 1 — Machine Layer
A Python simulator running on a Pi 4, publishing directly to MQTT. It executes a weekly production schedule loaded from an Excel workbook via a FastAPI ERP shim. The schedule drives everything: order sequences, changeover types, CIP cycles, and three injected major breakdowns.
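As a rough sketch of what that ERP shim might look like (the endpoint path, workbook filename, sheet name, and column layout here are all assumptions for illustration, not the repo's actual code):

```python
# Hypothetical sketch of the ERP shim: serve the weekly schedule from Excel.
# Endpoint path, filename, sheet name, and columns are assumptions.
from fastapi import FastAPI
from openpyxl import load_workbook

app = FastAPI()

@app.get("/schedule/week")
def weekly_schedule():
    wb = load_workbook("weekly_schedule.xlsx", read_only=True)
    ws = wb["Schedule"]
    return [
        {"order_id": r[0], "sku": r[1], "planned_qty": r[2], "start": str(r[3])}
        for r in ws.iter_rows(min_row=2, values_only=True)  # skip header row
        if r[0] is not None
    ]
```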
The simulator runs as a systemd service — starts on boot, restarts on failure, logs to journald. It’s not a script you run in a terminal. It’s a plant that runs continuously.
Ten microstop types fire probabilistically during production, each with a distinct signal fingerprint. Fill Stabilisation Wait (MS02) is weighted more heavily on large-volume SKUs. Cap Feed Stutter (MS04) fires independently of fill state. Each one maps to a defined pattern of signals in the historian — scale_stable = false AND fill_time_ms > expected, for example — that the AI can cluster and identify without any operator input.
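A minimal sketch of how such a fingerprint check could work. The scale_stable and fill_time_ms signals come from the example above; the per-SKU baselines, the cap_feed_jammed signal, and the function itself are illustrative assumptions:

```python
# Sketch: classify a short stop window against microstop fingerprints.
# scale_stable and fill_time_ms are from the example above; the baselines
# and the cap_feed_jammed signal are hypothetical.
EXPECTED_FILL_MS = {"2L": 2400, "500ml": 900}  # assumed per-SKU baselines

def classify_microstop(window: dict) -> str | None:
    """window holds the historian signals sampled during a short stop."""
    if (not window["scale_stable"]
            and window["fill_time_ms"] > EXPECTED_FILL_MS[window["sku"]]):
        return "MS02"  # Fill Stabilisation Wait
    if window["cap_feed_jammed"]:
        return "MS04"  # Cap Feed Stutter, independent of fill state
    return None  # no fingerprint matched; leave unclassified
```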
Plane 2 — Telemetry
High-rate signals published to MQTT and stored in a TimeBase historian on the Pi 5. 64 tags, ISA-95 hierarchy baked into every tag name:
historian/Amarach/Crosshaven/Bottling/Line01/Filler01/actual_weight_g
The historian stores what’s happening. Continuous. Store-on-change. No schema, no governance — just the signal stream. This is where the microstop fingerprints live, and where the AI reads them.
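For illustration, store-on-change publishing against that tag path is a few lines of Python, assuming a local broker and a simple last-value cache:

```python
# Sketch: store-on-change publishing to the ISA-95 tag path shown above.
# Broker address and the change-detection cache are assumptions.
import paho.mqtt.client as mqtt

TAG = "historian/Amarach/Crosshaven/Bottling/Line01/Filler01/actual_weight_g"
client = mqtt.Client()  # paho-mqtt 1.x style; 2.x needs a CallbackAPIVersion
client.connect("localhost", 1883)

_last = {}

def publish_on_change(tag: str, value: float) -> None:
    """Only publish when the value actually changes (store-on-change)."""
    if _last.get(tag) != value:
        _last[tag] = value
        client.publish(tag, value)
```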
Plane 3 — Transactions
This is what’s new.
Every governed event — OrderStarted, StateChanged, FaultRaised, MicrostopEnded, BottleCompleted — flows from the simulator over MQTT onto a separate topic plane. Node-RED subscribes, validates, classifies, and writes to PostgreSQL.
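For a feel of what travels on that topic plane, a transaction event might look like this on the wire. Only the eventType values come from this article; the topic shape and every other field name are assumptions:

```python
# Illustrative transaction event as the simulator might publish it.
# Field names other than eventType are assumptions, not the real payload.
import json
import time

event = {
    "eventType": "MicrostopEnded",
    "site": "Crosshaven",
    "line": "Line01",
    "orderId": "ORD-0042",   # hypothetical
    "reasonCode": "MS04",    # Cap Feed Stutter
    "durationMs": 8350,
    "endedTsUtc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
}
payload = json.dumps(event)
# client.publish("transactions/Amarach/Crosshaven/Bottling/Line01/MicrostopEnded", payload)
```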
The separation matters. Historians are not databases. You cannot run OEE queries against a time-series store and expect the right answer. The historian tells you what the line was doing. The transactional store tells you what happened, why it happened, and how long it lasted.
The Node-RED Architecture
Four flows, each with a distinct responsibility:
Schema Init runs once on startup. It executes the full PostgreSQL DDL — all tables, all indexes, all seed data — using IF NOT EXISTS everywhere. The system is self-initialising. You don’t need a DBA or a manual setup step. Deploy Node-RED, point it at Postgres, restart it, and the schema appears.
Transaction Listener subscribes to transactions/+/+/+/+/# and routes by eventType. Each event type has its own handler function that builds the appropriate SQL and writes to the correct table. Orders land in orders. State transitions land in state_events. Faults, changeovers, CIP, and stops all land in downtime_events with the right classification. Bottle events are sampled 1-in-10 into bottle_events.
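Sketched in Python for readability (in the demo this logic lives in Node-RED function nodes), the routing and 1-in-10 sampling look roughly like this; the handler and helper names are illustrative:

```python
# Python sketch of the Transaction Listener routing; the real version is
# Node-RED function nodes. Names and sampling detail are illustrative.
def insert(table: str, evt: dict) -> None:
    """Stand-in for the parameterised SQL write each handler builds."""
    print(f"INSERT INTO {table}: {evt['eventType']}")

_bottle_seq = 0

def handle_bottle(evt: dict) -> None:
    """Sample every tenth BottleCompleted into bottle_events."""
    global _bottle_seq
    _bottle_seq += 1
    if _bottle_seq % 10 == 0:
        insert("bottle_events", evt)

HANDLERS = {
    "OrderStarted":    lambda e: insert("orders", e),
    "StateChanged":    lambda e: insert("state_events", e),
    "FaultRaised":     lambda e: insert("downtime_events", e),
    "MicrostopEnded":  lambda e: insert("downtime_events", e),
    "BottleCompleted": handle_bottle,
}

def route(evt: dict) -> None:
    handler = HANDLERS.get(evt["eventType"])
    if handler is not None:
        handler(evt)
```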
OEE Aggregator subscribes to the telemetry line_state signal and accumulates time buckets on the current open order. Every state change increments either planned_downtime_ms (for CIP and changeovers) or unplanned_downtime_ms (for everything else). It writes these buckets back to the orders table in real time. OEE% is never stored — it’s always calculated on query from the components.
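The accumulation logic, sketched under assumptions: the state names, the PLANNED_STATES set, and the add_to_order helper are illustrative, while the two bucket columns are the real ones from the orders table:

```python
# Sketch of the time-bucket accumulation on the current open order.
import time

PLANNED_STATES = {"CIP", "CHANGEOVER"}  # everything else counts as unplanned

class OeeAggregator:
    def __init__(self) -> None:
        self.state = "RUNNING"
        self.since = time.monotonic()

    def on_state_change(self, new_state: str, order_id: str) -> None:
        elapsed_ms = int((time.monotonic() - self.since) * 1000)
        if self.state != "RUNNING":  # running time is not downtime
            bucket = ("planned_downtime_ms" if self.state in PLANNED_STATES
                      else "unplanned_downtime_ms")
            add_to_order(order_id, bucket, elapsed_ms)
        self.state, self.since = new_state, time.monotonic()

def add_to_order(order_id: str, column: str, ms: int) -> None:
    """Stand-in for: UPDATE orders SET <column> = <column> + ms."""
    print(f"orders[{order_id}].{column} += {ms}")
```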
Downtime Classifier runs every 60 seconds. It queries for any downtime events where is_assigned = false AND classification = 'OPERATOR' AND ended_ts_utc IS NOT NULL, packages them as an UnassignedDowntime payload, and publishes to MQTT for tablet pickup. This is how the accountability gap metric works — unassigned downtime is tracked as a live number, not a gap you discover at end of shift.
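A sketch of that sweep, assuming a psycopg-style connection. The is_assigned, classification, and ended_ts_utc columns are quoted above; the other column names, the SQL shape, and the MQTT topic are assumptions:

```python
# Sketch of the 60-second classifier sweep. Selected columns beyond the
# filter predicates, and the topic, are illustrative assumptions.
import json

UNASSIGNED_SQL = """
    SELECT event_id, started_ts_utc, ended_ts_utc, duration_ms
    FROM downtime_events
    WHERE is_assigned = false
      AND classification = 'OPERATOR'
      AND ended_ts_utc IS NOT NULL
"""

def sweep(conn, client) -> None:
    rows = conn.execute(UNASSIGNED_SQL).fetchall()
    if rows:
        payload = {"eventType": "UnassignedDowntime",
                   "events": [list(map(str, r)) for r in rows]}
        client.publish("transactions/ui/unassigned_downtime", json.dumps(payload))
```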
The PostgreSQL Schema
Six tables, all following the same naming discipline: snake_case, _ts_utc suffixes on timestamps, _ms suffixes on durations, is_ prefix on booleans.
shifts holds one row per shift slot (AM/PM), keyed on shift code and date.
orders holds the full order lifecycle — planned quantity, good count, reject count, all OEE time buckets, and a generated total_count column.
downtime_events is the unified stop log: microstops, auto-classified events, and operator-assigned stops all live here, with a classification field (MICROSTOP, AUTO, OPERATOR) and a source field (derived, system, operator).
downtime_reasons is the reference table, seed-populated on startup.
state_events stores every StateChanged transaction.
bottle_events stores sampled quality records with weight and torque.
The most important design decision: total_count is a generated column (GENERATED ALWAYS AS (good_count + reject_count) STORED). You never update it directly. You never recalculate it. It’s always correct.
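In PostgreSQL syntax, that column might be declared like this inside the Schema Init DDL. The surrounding column definitions are trimmed and partly assumed; only the counts, the downtime buckets, and total_count come from the article:

```python
# The generated-column DDL, held as a string the Schema Init flow could
# execute. Column names beyond those named in the article are assumptions.
ORDERS_DDL = """
CREATE TABLE IF NOT EXISTS orders (
    order_id              text PRIMARY KEY,
    good_count            integer NOT NULL DEFAULT 0,
    reject_count          integer NOT NULL DEFAULT 0,
    planned_downtime_ms   bigint  NOT NULL DEFAULT 0,
    unplanned_downtime_ms bigint  NOT NULL DEFAULT 0,
    total_count           integer GENERATED ALWAYS AS (good_count + reject_count) STORED
);
"""
```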
Three Types of Downtime
This is the model that makes the demo work.
Type 1 — Operator Assigned
When an unclassified stop occurs — the line goes to STOPPED with no known cause — a downtime_events row is created with is_assigned = false. The operator selects a reason from a tree on the tablet. Until they do, that duration appears as an “accountability gap.” The percentage of unassigned downtime is a real metric on the dashboard.
Type 2 — Auto Classified
When the system already knows the cause from the stop code — CIP, a planned changeover, a major breakdown with a fault code — the downtime_events row is inserted with is_assigned = true immediately. No operator input needed. The system classifies it from the event payload.
Type 3 — Microstops
Sub-120-second interruptions that never trigger a formal stop record. Derived from signal fingerprints in the historian. Never visible in the operator UI. The operator’s availability figure says 87%. The AI queries the historian and finds that 22% of the hidden loss is MS02 Fill Stabilisation Wait events on the 2L SKU. That’s the power moment.
Three types in one system. The operator sees Type 1 and Type 2. The AI sees all three.
OEE from First Principles
The formula is:
Availability = (Planned Production Time - Unplanned Downtime) / Planned Production Time
Performance = (Ideal Cycle Time × Good Count) / (Planned Production Time - Unplanned Downtime)
Quality = Good Count / Total Count
OEE = Availability × Performance × Quality
Everything in that formula is a stored component. None of the percentages are stored. When the definition of “planned” changes — or when you want to include or exclude microstops from the availability figure — you change the query, not the data.
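A sketch of what that query-side calculation could look like. The planned_production_ms and ideal_cycle_ms column names are assumptions; the arithmetic follows the formula above:

```python
# OEE computed at query time from stored components, never stored itself.
# planned_production_ms and ideal_cycle_ms are assumed column names.
OEE_SQL = """
SELECT
    (planned_production_ms - unplanned_downtime_ms)::numeric
        / planned_production_ms                           AS availability,
    (ideal_cycle_ms * good_count)::numeric
        / (planned_production_ms - unplanned_downtime_ms) AS performance,
    good_count::numeric / NULLIF(total_count, 0)          AS quality
FROM orders
WHERE order_id = %s;
"""
# OEE itself is availability * performance * quality, multiplied in the
# client or an outer SELECT.
```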
This is not a subtle distinction. It’s the difference between a system you can trust and a number on a dashboard you can’t explain.
The Four Dashboards
Grafana (v12 on the Pi 4) has four dashboards, provisioned from file so they survive container restarts:
Order Summary queries PostgreSQL. Active orders with progress against planned quantity. Completed orders with yield, duration, and actual vs planned time.
Downtime Pareto queries PostgreSQL. All downtime events ranked by total duration, including microstops. Joined to downtime_reasons for human-readable names. The Pareto includes everything — planned, unplanned, and derived. The accountability gap is a separate stat.
Fill Quality overlays historian data (fill weight deviation per bottle) with order context from PostgreSQL. This is the panel that demonstrates the historian-database join — signal data enriched by governed context.
Line Live Status queries the TimeBase historian via the Infinity datasource. Line state, speed, fill weight, torque. The historian’s ISA-95 tag structure means the Infinity queries are just URL-encoded tag names — no custom data transformation needed. This is the one I’m still struggling with: it works, then fails, so more digging is needed to get it stable.
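The tag-name encoding itself is one line of Python. The endpoint, port, and query parameters below are guesses at the historian's REST shape for illustration, not its documented API:

```python
# Sketch: build an Infinity-style URL from an ISA-95 tag name.
# Host, port, path, and parameters are hypothetical.
from urllib.parse import quote

tag = "historian/Amarach/Crosshaven/Bottling/Line01/Filler01/actual_weight_g"
url = f"http://pi5.local:4511/api/history?tag={quote(tag, safe='')}&last=5m"
print(url)
```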
What We Learned Building This
A few things are worth documenting.
Spending time defining the transaction payloads was worth it. Using B2MML as the base meant we had the data we needed when we went to build the application logic and the dashboards.
Transactions and telemetry are genuinely different problems. We tested routing everything through the historian. It was wrong — not inefficient, wrong. A historian has no concept of a validated fact. It stores what it receives. A transaction store stores what happened, after validation, with full context. Mixing them destroys the semantics of both.
Systemd is underrated for demos. The simulator and ERP shim run as services that start on boot, restart on failure, and write structured logs. The Pi is always running. No one has to SSH in and start a script before a demo.
The ISA-95 hierarchy is load-bearing. Every component — the MQTT topic structure, the historian tag names, the PostgreSQL schema design, the Node-RED routing logic — depends on the ISA-95 physical hierarchy being consistent and correct. It’s not documentation. It’s the primary key of the whole system.
What’s Next
The core infrastructure is in place. The remaining work falls into three areas.
A handful of bugs need fixing: Grafana’s Infinity query format for the historian, a fault code matching issue in the Node-RED downtime handlers, and SQL NULL handling in the OEE calculations.
The full Excel weekly schedule needs to run continuously at speed to generate a meaningful week of production data — orders, breakdowns, changeovers, microstop distributions — that tells a real story in the dashboards.
And then there are the MCP demo queries, the five questions every plant manager asks.
Each of those should return an answer in under five seconds from structured data. No report writer. No analyst. No custom query per question.
That’s the point of building the foundation correctly.
Everything is open source: 👉 https://github.com/SparkyLarks/ISA95-bottling-simulator
Hardware: Raspberry Pi 4 + Pi 5 — under €200 total.
Stack: Python, FastAPI, MQTT (MonsterMQ), Node-RED (FlowFuse), PostgreSQL, TimeBase Historian, Grafana 12.
Architecture: ISA-95 physical hierarchy, ISA-88 state machine, never store rolled-up OEE.