Building an Agent Builder
Here's a confession: most "agent" systems aren't really agent systems. They're a single LLM call dressed up with a while loop. That works for demos. It falls apart the moment a real team with real requirements tries to use it.
A true Multi-Agent System (MAS) is something different. It's a platform where you can define multiple specialized agents, wire them together visually, run them in a coordinated pipeline, pause for human review mid-flight, and watch every decision happen in real time - all without touching infrastructure.
This post covers the engineering behind building that.
What Are We Actually Building?
Think of it like this. You want to build a system where someone like a product manager, an analyst or an engineer can open a browser, create a "Research Agent" or a "Writing Agent," connect them together on a canvas like a flowchart, add a human review step in the middle, hit run, and watch their pipeline execute in real time.
Underneath that simple UI is a surprisingly sophisticated system with three distinct layers: the backend, the frontend, and the infrastructure.
💡 Design Principle
Before diving in, one principle shapes every decision in this system. Each layer only talks to the layer below it through a defined contract - REST APIs, message queues, or WebSocket channels. No layer reaches into another's database directly. This means you can swap out any individual piece without rebuilding the whole system.
⚙️ The Backend
Most people start with the frontend because it's visible and satisfying. That's a mistake. The backend is the foundation everything else rests on, and if it's wrong, no amount of beautiful UI will save you. Start here.
1. THE REGISTRY - SOURCE OF TRUTH
The Registry is the backbone of the entire system. It's a database of every entity you've ever created: every agent, every tool, every model configuration, and every system prompt. Think of it like npm for your AI building blocks. Each entity has a name, a version, and a configuration.
Here's why versioning matters.
You build a pipeline in January: Researcher ➡️ Writer ➡️ Publisher. It works.
In March, a teammate improves the Researcher Agent and saves it as v3. Without versioning, your pipeline silently upgrades to v3 and breaks.
With versioning, your pipeline stays pinned to v2 forever. Nothing changes unless you choose to upgrade. Same principle as pinning a package version in npm. You're not pointing at "whatever the latest is". You're pointing at an exact, known, tested thing.
The result: pipelines are stable, updates are safe, and nothing breaks unless you want it to.
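To make the pinning concrete, here is a minimal sketch of how a pipeline node might reference an exact registry version. The entities, fields, and `resolve` helper are hypothetical, purely to illustrate the idea.

```python
# Hypothetical in-memory registry keyed by (name, version).
REGISTRY = {
    ("researcher", 2): {"model": "gpt-4", "tools": ["web_search"], "prompt_id": "research-v2"},
    ("researcher", 3): {"model": "gpt-4", "tools": ["web_search", "arxiv"], "prompt_id": "research-v3"},
}

def resolve(name: str, version: int) -> dict:
    """Return the exact configuration the pipeline was built and tested against."""
    return REGISTRY[(name, version)]

# The pipeline stores the pin, not "latest".
pipeline_node = {"agent": "researcher", "version": 2}
config = resolve(pipeline_node["agent"], pipeline_node["version"])
# Publishing researcher v3 changes nothing here until someone edits the pin.
```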
2. THE ORCHESTRATION ENGINE - THE BRAIN
The Orchestrator is the project manager of the system. It doesn't do any AI work. Its job is purely coordination: figure out what needs to run, in what order, and make sure data flows from one agent to the next correctly.
When you hit "Run" on the canvas, the frontend sends the pipeline to the orchestrator as a DAG - a Directed Acyclic Graph. This is a mathematical structure that says: "Node A feeds into Node B. Node B and C can run at the same time. Node D only runs when both B and C are done."
Why a DAG and not a simple ordered list?
Agents depend on each other's outputs. A "Writer" agent needs the output from the "Researcher" agent before it can start. A "Fact Checker" might need output from both the researcher AND writer simultaneously.
An ordered list can only express "A then B then C".
A DAG can express dependencies, parallelism, and branching all at once, without you having to hardcode any of it.
The orchestrator runs a topological sort on the DAG - an algorithm that looks at all the dependencies and produces a safe execution order - and automatically identifies which agents can run in parallel. You never have to think about this; it just happens.
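Here is a compact sketch of that idea: Kahn's algorithm over the node and edge lists, grouping nodes into "waves" that can safely run in parallel. The node names are made up for illustration.

```python
from collections import defaultdict

def execution_waves(nodes, edges):
    """Group DAG nodes into waves; every node in a wave can run in parallel."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    waves, ready = [], [n for n in nodes if indegree[n] == 0]
    while ready:
        waves.append(ready)
        next_ready = []
        for node in ready:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = next_ready
    return waves

# A feeds B and C; D waits for both B and C.
print(execution_waves(
    ["A", "B", "C", "D"],
    [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")],
))  # [['A'], ['B', 'C'], ['D']]
```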
3. THE AGENT RUNTIME - THE WORKER
The Agent Runtime is where the actual AI work happens. It's a worker that receives a task from the queue, executes the agent, and returns an output. The most important thing about it: it is completely stateless.
Why stateless?
A stateless worker carries all the information it needs inside the task payload itself. It doesn't rely on any shared memory or global state. This means you can run 1 worker or 10,000 workers with zero changes to the code. When a worker finishes, it dies. When you need more capacity, you spin up more workers. If one crashes, another picks up the retry. No coordination, no race conditions, no bottlenecks.
Inside the runtime, the agent goes through a tool-calling loop. The agent sends a prompt to the LLM. The LLM might respond with "I need to search the web first." The runtime executes the web search, feeds the result back to the LLM, and calls it again. This repeats until the LLM produces a final answer with no more tool requests.
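In Python the loop looks roughly like this. `call_llm` and `run_tool` are stand-ins for whatever LLM client and tool executor you actually use, and the message format is a generic chat shape rather than any one provider's API - the shape of the loop is the point.

```python
def run_agent(call_llm, run_tool, system_prompt, user_input, max_steps=10):
    """Tool-calling loop: keep calling the LLM until it answers without requesting a tool."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)               # your LLM client goes here
        messages.append(reply)
        if not reply.get("tool_calls"):          # no tool requested: this is the final answer
            return reply["content"]
        for call in reply["tool_calls"]:
            result = run_tool(call["name"], call["arguments"])  # e.g. execute the web search
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("Agent exceeded the maximum number of tool-calling steps")
```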
But what if your agent needs memory?
Some agents need to remember context across multiple turns. A customer support agent that recalls what the user said earlier in the session. A coding assistant that remembers what it already tried. If your worker is stateless, it forgets everything the moment the task ends, so asking "what did you do in step 1?" gets a blank stare.
The solution is simple: don't make the worker stateful, just give it a memory store. Every turn follows the same three steps: load ➡️ use ➡️ save. At the start of each task the worker loads the conversation history from Redis. It uses that history as context when calling the LLM. When the task finishes it saves the updated history back to Redis. Next turn, same thing: load the updated history, call the LLM, save again. The worker itself stays stateless and scalable, but it carries memory across every turn. Stateless architecture, stateful behaviour.
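A minimal sketch of that load ➡️ use ➡️ save cycle, assuming redis-py, JSON-serialized history, and a made-up key scheme:

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def handle_turn(session_id: str, user_message: str, call_llm) -> str:
    """Stateless worker, stateful behaviour: memory lives in Redis, not in the process."""
    key = f"memory:{session_id}"                   # hypothetical key scheme
    history = json.loads(r.get(key) or "[]")       # 1. load

    history.append({"role": "user", "content": user_message})
    answer = call_llm(history)                     # 2. use history as context
    history.append({"role": "assistant", "content": answer})

    r.set(key, json.dumps(history))                # 3. save for the next turn
    return answer
```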
4. THE HITL SERVICE - HUMAN IN THE LOOP
HITL stands for Human-in-the-Loop. It is the thing that separates a system you can show in a demo from a system you can trust in production. The idea is simple: pause the pipeline at a specific point, show a human what happened so far, get their approval or edits, then continue.
In practice, implementing this correctly is genuinely hard. Here is the challenge: a pipeline might be mid-execution. The orchestrator is in the middle of running. You cannot just "pause" a running process, especially not for hours while someone reviews the work.
Why checkpoint instead of just pausing?
If you pause a running process and the server restarts (it will), you lose everything. A checkpoint is a complete snapshot of the execution state: every result produced so far, every variable, the exact position in the pipeline, all saved to persistent storage. The pipeline can be paused for 2 seconds or 2 days, the server can restart five times, and when the human clicks Approve, execution resumes from that exact snapshot. Nothing reruns. Nothing is lost.
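One way to sketch a checkpoint, assuming the execution state fits in a JSON document; the field names and storage helpers are illustrative, not a prescribed schema.

```python
import json
import time

def make_checkpoint(execution_id, completed_outputs, paused_at_node):
    """Snapshot everything needed to resume: results so far plus the paused position."""
    return {
        "execution_id": execution_id,
        "paused_at_node": paused_at_node,
        "completed_outputs": completed_outputs,   # e.g. {"researcher": "...", "writer": "..."}
        "created_at": time.time(),
    }

def save_checkpoint(redis_client, db_insert, checkpoint):
    """Write to Redis for a fast resume and to Postgres (via db_insert) so it survives restarts."""
    payload = json.dumps(checkpoint)
    redis_client.set(f"checkpoint:{checkpoint['execution_id']}", payload)
    db_insert("INSERT INTO checkpoints (execution_id, payload) VALUES (%s, %s)",
              (checkpoint["execution_id"], payload))

def resume_from_checkpoint(redis_client, execution_id):
    """On Approve, reload the snapshot and continue from paused_at_node; nothing reruns."""
    return json.loads(redis_client.get(f"checkpoint:{execution_id}"))
```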
🎨 The Frontend - Two Surfaces, One Purpose
The frontend's entire job is to make the backend feel simple. There are two distinct screens.
1. THE CONFIG STUDIO - BUILD YOUR PRIMITIVES
This is a tabbed settings panel with one tab per entity type: Agents, Tools, Models, System Prompts, Memory, Guardrails. In each tab, you fill out a form and click Save. That form data gets POSTed to the Registry API and comes back as a versioned entity you can reuse anywhere.
Think of it as building LEGO pieces before you assemble them. The canvas is where you assemble. The Studio is where you design each individual piece.
🧩 Each entity type has different fields. An Agent has a name, a model reference, a list of tools, and a system prompt. A Tool has a name, a description of what it does (the LLM reads this to decide when to use it), and a connection to an API or function. A Model is just a configuration wrapper around an LLM provider like GPT-4, Claude, or Gemini, with temperature, max tokens, and so on.
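For a feel of what those forms actually submit, here are hypothetical payloads for the three entity types. The field names and the registry endpoint are illustrative, not a fixed schema.

```python
import requests  # assuming a plain REST client in front of the Registry API

model = {"type": "model", "name": "default-gpt4", "provider": "openai",
         "model": "gpt-4", "temperature": 0.2, "max_tokens": 2000}

tool = {"type": "tool", "name": "web_search",
        "description": "Search the web and return the top results as text",
        "endpoint": "https://example.internal/search"}

agent = {"type": "agent", "name": "researcher", "model": "default-gpt4@1",
         "tools": ["web_search@1"], "system_prompt": "research-brief@2"}

for entity in (model, tool, agent):
    # Hypothetical endpoint; the response would include the assigned version.
    requests.post("https://registry.example.internal/entities", json=entity)
```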
2. THE VISUAL CANVAS - WIRE IT TOGETHER
The canvas is a node editor, like draw.io or Figma but for agent pipelines. Every entity in your registry appears as a draggable node. You drag them onto the canvas, then draw edges (arrows) between them to define the flow of data.
The most important thing the canvas does is completely invisible. Every time you make a change, it serializes the entire diagram into a PipelineDefinition, a structured JSON document that describes every node, every edge, and every configuration, and saves it to the backend. This is the document the orchestrator uses to execute. The canvas is just a pretty editor for it.
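A stripped-down PipelineDefinition for the Researcher ➡️ Writer ➡️ Review ➡️ Publisher example might look like this - the exact fields vary by implementation, but the shape (versioned node references plus edges) is the important part.

```python
pipeline_definition = {
    "name": "research-and-publish",
    "nodes": [
        {"id": "n1", "agent": "researcher@2"},
        {"id": "n2", "agent": "writer@1"},
        {"id": "n3", "type": "hitl", "instructions": "Review the draft before publishing"},
        {"id": "n4", "agent": "publisher@1"},
    ],
    "edges": [
        {"from": "n1", "to": "n2"},
        {"from": "n2", "to": "n3"},
        {"from": "n3", "to": "n4"},
    ],
}
```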
3. THE LIVE TRACE PANEL: WATCH EVERY DECISION
When a pipeline runs, the browser opens a WebSocket connection, a persistent two-way channel to the server. Every time any agent worker emits an event like "started", "called web search", "got LLM response", or "completed", it travels through the event stream and arrives at the browser in real time. The canvas node changes color. The trace log gets a new entry. Token usage updates. If a HITL node is hit, the approval modal appears automatically.
This is how debugging actually becomes possible. When something goes wrong at 2am, you can replay the trace and see exactly what every agent said, what every tool returned, and which decision led to the failure.
🏗️ Infrastructure
Infrastructure is the part that nobody wants to think about until production breaks. Think about it upfront. Here's every piece and exactly why it exists.
1. THE TASK QUEUE - DECOUPLING SAVES LIVES
When the orchestrator decides to run an agent, it does not run it directly. Instead it writes a task onto a queue and moves on. A separate worker process picks that task up and executes it. This sounds like an unnecessary middle step. It is not.
Why not just run the agent directly?
An agent run can take 30, 60, even 120 seconds. An HTTP request cannot wait that long and will time out. So instead of the API sitting there waiting, it drops the task into the queue, immediately responds to the browser with an execution ID, and walks away. The worker handles the rest in the background, completely independent of the API.
This separation also means you can scale the two sides independently. You can run 2 API servers and 50 agent workers if agent tasks are the bottleneck. Or 10 API servers and 5 workers if incoming requests are the bottleneck. Each side scales on its own without touching the other.
And when a worker crashes mid-task, the task does not disappear. It goes back into the queue and another worker picks it up automatically, retrying with exponential backoff: wait 1 second, try again, wait 2 seconds, try again, wait 4 seconds. The task survives even if the worker does not.
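A bare-bones sketch of that pattern, using a Redis list as the queue purely for illustration (a real deployment would more likely use Celery, SQS, or RabbitMQ); the queue name and payload fields are made up.

```python
import json
import time
import redis

r = redis.Redis(decode_responses=True)
QUEUE = "agent-tasks"   # hypothetical queue name

def enqueue(execution_id: str, node_id: str, payload: dict) -> None:
    """API side: drop the task on the queue and return immediately with the execution ID."""
    r.lpush(QUEUE, json.dumps({"execution_id": execution_id, "node_id": node_id,
                               "payload": payload, "attempt": 0}))

def worker_loop(run_task, max_attempts: int = 4) -> None:
    """Worker side: pop tasks forever; on failure, requeue with exponential backoff."""
    while True:
        _, raw = r.brpop(QUEUE)
        task = json.loads(raw)
        try:
            run_task(task)
        except Exception:
            if task["attempt"] + 1 < max_attempts:
                time.sleep(2 ** task["attempt"])       # wait 1s, 2s, 4s ...
                task["attempt"] += 1
                r.lpush(QUEUE, json.dumps(task))
```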
2. EVENT STREAMING - THE NERVOUS SYSTEM
Every agent worker, as it runs, shouts out updates. "I started." "I called this tool." "The LLM responded." "I am done." These updates need to travel from a server process all the way to your browser in real time, so the canvas can update live without you ever refreshing the page.
The path is: worker ➡️ Redis Streams ➡️ fan-out service ➡️ WebSocket ➡️ browser canvas.
Here is what each step does.
The worker writes every event into Redis Streams, which is an append-only logbook stored in Redis. Think of it like a running diary. Every worker writes its events into this diary as they happen, each entry tagged with an execution ID so events from different runs never get mixed up. Fifty agents running simultaneously means fifty workers all writing into the same logbook, cleanly separated by their execution IDs.
A dedicated fan-out service watches this logbook continuously. The moment a new entry appears, it checks which browser tabs are currently watching that execution and pushes the event to all of them instantly via WebSocket. WebSocket is a persistent open connection between the server and your browser, meaning the server can send messages to your browser at any time without your browser having to ask. No polling. No refreshing. The update just arrives.
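Mechanically, the worker side is a single XADD and the fan-out side is a loop over XREAD. A rough sketch with redis-py - the actual WebSocket push is left as a stand-in because it depends on your server framework.

```python
import redis

r = redis.Redis(decode_responses=True)

def emit_event(execution_id: str, event_type: str, detail: str) -> None:
    """Worker side: append one entry to the execution's event stream."""
    r.xadd(f"events:{execution_id}", {"type": event_type, "detail": detail})

def fan_out(execution_id: str, push_to_subscribers) -> None:
    """Fan-out side: block on new entries and forward each one to every watching browser tab."""
    last_id = "$"  # only events that arrive from this point on
    while True:
        response = r.xread({f"events:{execution_id}": last_id}, block=5000)
        for _stream, entries in response or []:
            for entry_id, fields in entries:
                push_to_subscribers(execution_id, fields)  # stand-in for the WebSocket send
                last_id = entry_id
```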
The result is that every node on your canvas reacts in real time. Agent starts: node turns orange. Agent finishes: node turns green. HITL checkpoint reached: approval modal appears automatically. If five people have the same run open in five different browser tabs, all five see the exact same updates at the exact same moment.
3. TWO-TIER STATE STORAGE - HOT AND COLD
Not everything needs to be stored the same way. Some data needs to be read in milliseconds. Some data needs to survive forever. Trying to use one storage system for both is how you end up with either a slow system or a fragile one. So execution state lives in two places.
Redis is an in-memory storage. Reads are sub-millisecond. It is fast because the data lives in RAM rather than on disk. It is the right place for anything that needs to be accessed right now.
Postgres is durable disk-backed storage. Slower to read, but survives server restarts, crashes, and anything else that can go wrong. It is the right place for anything that needs to exist forever.
Here is what lives where and why.
Active node outputs live in Redis. When the Researcher agent finishes and the Writer agent is about to start, the Writer needs to read the Researcher's output instantly. That read cannot take seconds. Redis makes it sub-millisecond.
HITL checkpoints live in both. Redis for speed, so when a human approves, the pipeline resumes instantly by loading the checkpoint from memory. Postgres as a durable backup, so if Redis restarts while the pipeline is waiting for approval (which could be hours or days), the checkpoint is not lost.
Full execution audit logs live in Postgres only. Every run, every agent decision, every tool call, forever. This is your source of truth for debugging, billing, compliance, and replaying any past execution exactly as it happened.
WebSocket connection registry lives in Redis only, and intentionally so. This is just a map of which browser tabs are watching which execution. The moment a tab closes, that entry is gone. There is no reason to persist it. Ephemeral by design.
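In code, the split is simply "write to both, read from the fast one." A sketch assuming redis-py and psycopg2; the key scheme, table, and connection string are illustrative.

```python
import json
import redis
import psycopg2

r = redis.Redis(decode_responses=True)
pg = psycopg2.connect("dbname=agents")   # assumed connection string

def record_node_output(execution_id: str, node_id: str, output: dict) -> None:
    # Hot path: the next agent reads this in under a millisecond.
    r.set(f"output:{execution_id}:{node_id}", json.dumps(output))

    # Cold path: the permanent audit log, kept forever for debugging and replay.
    with pg, pg.cursor() as cur:
        cur.execute(
            "INSERT INTO audit_log (execution_id, node_id, output) VALUES (%s, %s, %s)",
            (execution_id, node_id, json.dumps(output)),
        )
```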
4. SECRETS MANAGEMENT - THE VAULT
Tools need credentials. The web search tool needs an API key. The email tool needs OAuth tokens. The database tool needs a password. These credentials must never live in code, in environment variables committed to git, or in the registry database.
They live in a dedicated secrets manager. For most teams starting out, AWS Secrets Manager or Doppler gets the job done without any infrastructure to manage. At scale, HashiCorp Vault gives you full control. The principle is the same regardless of which you use.
When a worker starts a task, it fetches only the credentials it needs for that specific tool, uses them for the duration of that one task, and they are gone from memory. No developer ever sees production credentials. If a worker is somehow compromised, the blast radius is limited to a single task execution.
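With AWS Secrets Manager, for example, the worker-side fetch can be as small as this (shown with boto3; Doppler and Vault have equivalent clients). The secret name is hypothetical.

```python
import json
import boto3

def get_tool_credentials(secret_name: str) -> dict:
    """Fetch credentials for one tool, for one task; nothing is cached or written to disk."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

# e.g. creds = get_tool_credentials("prod/tools/web-search")  # hypothetical secret name
```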
🔄 The Full End-to-End Flow
Now let's put it all together. Here is exactly what happens, step by step, from the moment a user clicks "Run Pipeline" to the moment they see results on their screen.

1. The canvas serializes the diagram into a PipelineDefinition and sends it to the orchestrator, which returns an execution ID to the browser.
2. The orchestrator topologically sorts the DAG and writes the first wave of tasks onto the queue; the browser opens a WebSocket for that execution ID.
3. Stateless workers pick up tasks, resolve pinned entities from the Registry, fetch the credentials their tools need, run the tool-calling loop, and write outputs to Redis and the audit log to Postgres.
4. Every event the workers emit flows through Redis Streams to the fan-out service and over the WebSocket, and the canvas nodes update live.
5. When a HITL node is reached, a checkpoint is written to Redis and Postgres and the approval modal appears; on Approve, execution resumes from that exact snapshot.
6. When the final node completes, the results and full trace are on screen, and the permanent record lives in Postgres.
🎯 Final Thoughts
The reason most agent systems never make it to production has nothing to do with the AI. It is that teams underestimate how much engineering surrounds a reliable LLM call. Versioning. Orchestration. Stateless workers. Checkpointing. Observability. Secrets management. The LLM call is 5% of the system. The infrastructure around it is the other 95%.
Build each of these correctly and you do not just have an agent. You have a platform. One where any team member can design, deploy, and monitor complex AI workflows without touching infrastructure. One that scales, recovers from failure, and gives you full visibility into every decision at every step.
Treat agents like infrastructure. Versioned, observable, checkpointable, and built to fail safely. That is what makes the difference between a demo and something you can actually ship.