Agentic Workflow Engineering
Agentic Workflow Engineering, by Stephen Redmond and ChatGPT

Building real agentic systems has taught me a number of lessons, particularly when those systems need to run at scale and without surprises. The gap between what works in a demo and what survives daily operation is wider than many people expect.

A common narrative from vendors is that autonomous agents can plan and execute tasks end to end, including orchestrating other agents. With increasingly capable reasoning models, this story is compelling. It suggests that complexity can be delegated and that systems can manage themselves.

Several issues arise in practice. The one that has stung me, and many others, is that costs can rise quickly. Agents pull in excessive context, models over-reason, and failed operations are retried multiple times. All of this amplifies token consumption.

I cannot always predict what the model will do, so I have ended up over-engineering prompts to cover every possibility, and still see outcomes I did not expect. To make matters worse, it can be hard to see exactly what the agent has done, so I cannot explain the behaviour that led to the outcome. When I cannot predict what an agent will do, and cannot see what it is doing, governance becomes difficult to enforce.

These issues do not always appear immediately, but they often surface once a system is expected to operate reliably over time.

Lessons learned

What follows are the lessons I have learned about what actually matters when building agentic systems that work in the real world.

Do not make the agent do too much

The most common mistake is asking a single agent to perform too many jobs. It is far simpler and more reliable to decompose work into multiple agents, each with a narrow and clearly defined responsibility.

A practical signal that an agent has become too complex is when branching logic starts to appear inside the prompt itself. At that point, reasoning and orchestration are being mixed together, which increases cost and reduces predictability. Breaking the work apart almost always leads to a better outcome.
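As a minimal sketch of what breaking the work apart can look like: each function below is one narrow agent with one job, and the sequencing lives in plain code rather than inside a prompt. The ticket-handling task, the function names, and call_model() are all illustrative; call_model() is a deterministic stub standing in for a real LLM call.

```python
# Sketch: decompose one broad agent into narrow ones (illustrative example).

def call_model(prompt: str, text: str) -> str:
    # Stub: in a real system this would be one model call doing one narrow job.
    if "Classify" in prompt:
        return "billing" if "invoice" in text.lower() else "technical"
    return text.split(".")[0].strip() + "."

def classify(ticket: str) -> str:
    # One agent, one job: label the ticket. Nothing else.
    return call_model("Classify this ticket as 'billing' or 'technical'.", ticket)

def summarise(ticket: str) -> str:
    # A second agent with its own narrow responsibility.
    return call_model("Summarise this ticket in one sentence.", ticket)

def handle(ticket: str) -> dict:
    # Branching and sequencing live here, in code, not inside a prompt.
    return {"category": classify(ticket), "summary": summarise(ticket)}
```

The point is the shape, not the stub: two small agents composed by ordinary code are easier to test and cost less than one agent asked to do both jobs at once.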

Use the simplest model that will work

Once agents are narrow in scope, model choice becomes much simpler. Most tasks do not require the most advanced reasoning models.

For many use cases, simpler models such as Claude Haiku or gpt-5-mini are sufficient. In some cases, older models like gpt-4.1 or o3 perform perfectly well. It is also worth testing small language models such as Llama or Gemma. These models are faster, cheaper, and often more stable for focused tasks.

Simpler prompts and simpler models tend to reinforce each other. When the task is clear, there is rarely a need for high reasoning depth.
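One way to make this concrete is to route each task type to the cheapest model that handles it, escalating only for the rare hard cases. The mapping below is a sketch; the task names are illustrative and the model identifiers should be checked against current provider availability and pricing.

```python
# Sketch: default to the cheapest adequate model, not the most capable one.
# Task names and model identifiers are illustrative assumptions.
MODEL_FOR_TASK = {
    "classification": "claude-haiku",   # narrow, cheap, fast
    "extraction": "gpt-5-mini",
    "drafting": "gpt-4.1",
    "complex_reasoning": "o3",          # reserved for genuinely hard cases
}

def pick_model(task_type: str) -> str:
    # Unknown tasks fall back to the cheapest option by default.
    return MODEL_FOR_TASK.get(task_type, "claude-haiku")
```

Making the routing explicit also makes the cost profile of the system visible: you can see at a glance which tasks are allowed to spend on heavy reasoning.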

Strictly control the context each agent sees

Another recurring failure mode is excessive context. The majority of agents do not need the entire thread of a conversation to perform their task.

Providing only the context that is strictly required improves reliability and reduces unintended behaviour. It also makes agent behaviour easier to reason about and debug. Context should be treated as a deliberate input, not as something that accumulates by default.
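Treating context as a deliberate input can be as simple as an explicit allow-list per task: each agent receives only the fields its job needs, and everything else, including the conversation history, is withheld. The task and field names below are illustrative.

```python
# Sketch: pass each agent only the fields its task requires,
# never the full conversation thread. Names are illustrative.

def build_context(conversation: dict, task: str) -> dict:
    # Explicit allow-list per task; anything not listed is withheld by default.
    needed = {
        "refund": ["order_id", "last_user_message"],
        "faq": ["last_user_message"],
    }
    return {k: conversation[k] for k in needed.get(task, []) if k in conversation}
```

Because the allow-list is data, it doubles as documentation: anyone debugging the system can see exactly what each agent was shown.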

Manage the workflow with code

Agent workflows are often more deterministic than people expect. Keeping track of system state, managing task lists, maintaining running totals, and handling branching or looping logic are all areas where code performs extremely well.

In these cases, using AI to orchestrate the workflow introduces unnecessary uncertainty. Code should manage the workflow by default, with agents invoked at well defined points to handle tasks that genuinely require interpretation or judgement.
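A sketch of that division of labour: code owns the state, the loop, and the branching, and the model is consulted only at the one point that requires judgement. The triage scenario is illustrative, and assess() is a deterministic stand-in for a real model call.

```python
# Sketch: code manages the workflow; the model is invoked only where
# interpretation is genuinely needed. assess() stands in for an LLM call.

def assess(item: str) -> bool:
    # Hypothetical model call: "does this item need human escalation?"
    return "urgent" in item.lower()

def run_workflow(items: list[str]) -> dict:
    state = {"done": [], "escalated": [], "total": 0}  # state lives in code
    for item in items:                                 # looping lives in code
        state["total"] += 1
        if assess(item):                               # judgement: the model
            state["escalated"].append(item)
        else:                                          # branching lives in code
            state["done"].append(item)
    return state
```

Notice that running totals, the task list, and the branch structure are all deterministic here; only the single assess() decision point carries any model uncertainty.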

Do not use AI to write your agent prompts

This often surprises people. In my experience, AI tends to produce bloated prompts. Each iteration adds more rules, more examples, and more words. Asking an AI to review a prompt usually results in even more being added.

The most effective and cost efficient prompts are short, explicit, and usually written by humans. They define boundaries clearly and avoid unnecessary guidance. Prompt writing benefits from restraint more than creativity.
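To illustrate the kind of restraint meant here, a short, explicit prompt might look like the constant below. The classification task and its labels are illustrative, not from any particular system.

```python
# Sketch: a short, explicit, human-written prompt. Task and labels are
# illustrative assumptions; the point is the brevity and the hard boundary.
PROMPT = (
    "Classify the ticket below as 'billing' or 'technical'.\n"
    "Reply with exactly one of those two words and nothing else.\n\n"
    "Ticket: {ticket}"
)

def render(ticket: str) -> str:
    return PROMPT.format(ticket=ticket)
```

Three lines of instruction, one clear output constraint, no examples or caveats: there is nothing for the model to over-interpret.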

Agentic workflow engineering

I refer to this overall approach as agentic workflow engineering. Humans explicitly design how work moves through a system, where decisions are made, and on what context those decisions are based. Control is deliberate and observable.

This approach is made up of agentic workflow design, context engineering, state management, and prompt engineering. None of these are glamorous, but together they determine whether a system behaves predictably over time.

Agentic systems work best when they are engineered by humans.
