Agentic Workflow Engineering
Agentic Workflow Engineering, by Stephen Redmond and ChatGPT

Building real agentic systems has taught me a number of lessons, particularly when those systems need to run at scale and without surprises. The gap between what works in a demo and what survives daily operation is wider than many people expect.

A common narrative from vendors is that autonomous agents can plan and execute tasks end to end, including orchestrating other agents. With increasingly capable reasoning models, this story is compelling. It suggests that complexity can be delegated and that systems can manage themselves.

Several issues arise in practice. The one that has stung me, and many others, is that costs can rise quickly. Agents pull in excessive context, models over-reason, and failed operations are retried multiple times. All of this amplifies token consumption.

I cannot always predict what the model will do, so I have ended up over-engineering prompts to cover every possibility, and still see outcomes I did not expect. To make matters worse, it can be hard to see exactly what the agent has done, so I cannot explain the behaviour that led to the outcome. When I cannot predict what an agent will do, and cannot see what it is doing, governance becomes difficult to enforce.

These issues do not always appear immediately, but they often surface once a system is expected to operate reliably over time.

Lessons learned

What follows are the lessons I have learned about what actually matters when building agentic systems that work in the real world.

Do not make the agent do too much

The most common mistake is asking a single agent to perform too many jobs. It is far simpler and more reliable to decompose work into multiple agents, each with a narrow and clearly defined responsibility.

A practical signal that an agent has become too complex is when branching logic starts to appear inside the prompt itself. At that point, reasoning and orchestration are being mixed together, which increases cost and reduces predictability. Breaking the work apart almost always leads to a better outcome.
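As a minimal sketch of what breaking the work apart can look like: each function below is one narrow agent with one job, and the sequencing lives in plain code rather than inside a prompt. The ticket-handling task, the function names, and call_model() are all illustrative; call_model() is a deterministic stub standing in for a real LLM call.

```python
# Sketch: decompose one broad agent into narrow ones (illustrative example).

def call_model(prompt: str, text: str) -> str:
    # Stub: in a real system this would be one model call doing one narrow job.
    if "Classify" in prompt:
        return "billing" if "invoice" in text.lower() else "technical"
    return text.split(".")[0].strip() + "."

def classify(ticket: str) -> str:
    # One agent, one job: label the ticket. Nothing else.
    return call_model("Classify this ticket as 'billing' or 'technical'.", ticket)

def summarise(ticket: str) -> str:
    # A second agent with its own narrow responsibility.
    return call_model("Summarise this ticket in one sentence.", ticket)

def handle(ticket: str) -> dict:
    # Branching and sequencing live here, in code, not inside a prompt.
    return {"category": classify(ticket), "summary": summarise(ticket)}
```

The point is the shape, not the stub: two small agents composed by ordinary code are easier to test and cost less than one agent asked to do both jobs at once.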

Use the simplest model that will work

Once agents are narrow in scope, model choice becomes much simpler. Most tasks do not require the most advanced reasoning models.

For many use cases, simpler models such as Claude Haiku or gpt-5-mini are sufficient. In some cases, older models like gpt-4.1 or o3 perform perfectly well. It is also worth testing small language models such as Llama or Gemma. These models are faster, cheaper, and often more stable for focused tasks.

Simpler prompts and simpler models tend to reinforce each other. When the task is clear, there is rarely a need for high reasoning depth.
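One way to make this concrete is to route each task type to the cheapest model that handles it, escalating only for the rare hard cases. The mapping below is a sketch; the task names are illustrative and the model identifiers should be checked against current provider availability and pricing.

```python
# Sketch: default to the cheapest adequate model, not the most capable one.
# Task names and model identifiers are illustrative assumptions.
MODEL_FOR_TASK = {
    "classification": "claude-haiku",   # narrow, cheap, fast
    "extraction": "gpt-5-mini",
    "drafting": "gpt-4.1",
    "complex_reasoning": "o3",          # reserved for genuinely hard cases
}

def pick_model(task_type: str) -> str:
    # Unknown tasks fall back to the cheapest option by default.
    return MODEL_FOR_TASK.get(task_type, "claude-haiku")
```

Making the routing explicit also makes the cost profile of the system visible: you can see at a glance which tasks are allowed to spend on heavy reasoning.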

Strictly control the context each agent sees

Another recurring failure mode is excessive context. The majority of agents do not need the entire thread of a conversation to perform their task.

Providing only the context that is strictly required improves reliability and reduces unintended behaviour. It also makes agent behaviour easier to reason about and debug. Context should be treated as a deliberate input, not as something that accumulates by default.
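Treating context as a deliberate input can be as simple as an explicit allow-list per task: each agent receives only the fields its job needs, and everything else, including the conversation history, is withheld. The task and field names below are illustrative.

```python
# Sketch: pass each agent only the fields its task requires,
# never the full conversation thread. Names are illustrative.

def build_context(conversation: dict, task: str) -> dict:
    # Explicit allow-list per task; anything not listed is withheld by default.
    needed = {
        "refund": ["order_id", "last_user_message"],
        "faq": ["last_user_message"],
    }
    return {k: conversation[k] for k in needed.get(task, []) if k in conversation}
```

Because the allow-list is data, it doubles as documentation: anyone debugging the system can see exactly what each agent was shown.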

Manage the workflow with code

Agent workflows are often more deterministic than people expect. Keeping track of system state, managing task lists, maintaining running totals, and handling branching or looping logic are all areas where code performs extremely well.

In these cases, using AI to orchestrate the workflow introduces unnecessary uncertainty. Code should manage the workflow by default, with agents invoked at well defined points to handle tasks that genuinely require interpretation or judgement.
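A sketch of that division of labour: code owns the state, the loop, and the branching, and the model is consulted only at the one point that requires judgement. The triage scenario is illustrative, and assess() is a deterministic stand-in for a real model call.

```python
# Sketch: code manages the workflow; the model is invoked only where
# interpretation is genuinely needed. assess() stands in for an LLM call.

def assess(item: str) -> bool:
    # Hypothetical model call: "does this item need human escalation?"
    return "urgent" in item.lower()

def run_workflow(items: list[str]) -> dict:
    state = {"done": [], "escalated": [], "total": 0}  # state lives in code
    for item in items:                                 # looping lives in code
        state["total"] += 1
        if assess(item):                               # judgement: the model
            state["escalated"].append(item)
        else:                                          # branching lives in code
            state["done"].append(item)
    return state
```

Notice that running totals, the task list, and the branch structure are all deterministic here; only the single assess() decision point carries any model uncertainty.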

Do not use AI to write your agent prompts

This often surprises people. In my experience, AI tends to produce bloated prompts. Each iteration adds more rules, more examples, and more words. Asking an AI to review a prompt usually results in even more being added.

The most effective and cost efficient prompts are short, explicit, and usually written by humans. They define boundaries clearly and avoid unnecessary guidance. Prompt writing benefits from restraint more than creativity.
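To illustrate the kind of restraint meant here, a short, explicit prompt might look like the constant below. The classification task and its labels are illustrative, not from any particular system.

```python
# Sketch: a short, explicit, human-written prompt. Task and labels are
# illustrative assumptions; the point is the brevity and the hard boundary.
PROMPT = (
    "Classify the ticket below as 'billing' or 'technical'.\n"
    "Reply with exactly one of those two words and nothing else.\n\n"
    "Ticket: {ticket}"
)

def render(ticket: str) -> str:
    return PROMPT.format(ticket=ticket)
```

Three lines of instruction, one clear output constraint, no examples or caveats: there is nothing for the model to over-interpret.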

Agentic workflow engineering

I refer to this overall approach as agentic workflow engineering. Humans explicitly design how work moves through a system, where decisions are made, and on what context those decisions are based. Control is deliberate and observable.

This approach is made up of agentic workflow design, context engineering, state management, and prompt engineering. None of these are glamorous, but together they determine whether a system behaves predictably over time.

Agentic systems work best when they are engineered by humans.
