Using Spec Driven Development for Agentic Delivery

Almost 6 months ago I wrote an article where I stated we have begun the epoch of "no human coding", an era where AI takes on much of the design, coding, and testing, while humans provide oversight, context, and direction.

https://www.garudax.id/pulse/from-spark-working-software-intentdriven-development-using-rutter-sv4ne/

How things have changed in just a few months. Models have improved in capability and can now run for long periods with minimal supervision. Coding agents have planning workflows and context integration through MCP and Skills. Specification driven development is called out in the Thoughtworks Technology Radar and is being positioned by major vendors and practitioner communities as a repeatable workflow. End-to-end delivery of new features, from specification to working product, is now achievable using agents for many use cases.

Everything I wrote in my previous article is still valid, and even more so. As models increase in capability and autonomous runtime, it is even more important that we concentrate on how we use AI to improve the specification, and let the AI deal with the implementation and testing, with the human in the loop validating the approach and the output.

Workflow and the latest models over specific coding assistants

There is an interesting podcast with Lex Fridman and Peter Steinberger (originator of OpenClaw) that goes deeper than I can here on the practical differences between Claude and OpenAI's coding models. The discussion matched my experience: the underlying capability is broadly comparable, with some nuances. Peter also notes that switching models has a learning curve - he suggests giving it about a week to develop a gut feel. https://lexfridman.com/peter-steinberger-transcript/

I have tried various AI coding assistants and specification driven development frameworks. I think it is fair to say that the coding assistants are in such competition with each other that all the leading ones are sufficiently capable. What is more important is equipping developers with the latest models, using the planning workflows effectively, and improving the repeatability of your workflow.

A quick clarification on vibe coding

A year ago you could argue that vibe coding was the only practical way to keep a coding assistant on the rails. You made a small request, got a small change, corrected it, and repeated. Some people also work well like this; it can feel reassuring because you are always involved. And it may still have a place in exploration, spikes, and learning.

Now that agents can plan better, run longer, and produce coherent end-to-end changes, the "vibe incremental loop" approach becomes inefficient, a poor substitute for a proper spec. You are not engineering the target. You are compensating for the absence of one, by steering constantly.

Once you are used to vibe coding, re-learning to do things a different way can be challenging. As with swimming or skiing, as a beginner you may manage a couple of lengths or a blue run and be completely exhausted. You rightly feel a sense of accomplishment. However, to get to the next level you need to unlearn some bad habits. As you correct the fundamentals - posture, breathing, rhythm - you can swim or ski with far less effort and at a far greater pace.

Vibe coding rewards the wrong behaviours: fast feedback, a sense of control, dopamine hits from frequent small wins. And before you know it, your "spec" is a trail of chat messages and half decisions. Specification-driven development is a fundamental change: less wasteful, faster, and cheaper.

Stop driving your Ferrari in first gear!

Specification Engineering

Specification engineering is key to being able to turn intent into working software. How do we define all the different aspects of a problem? The architectural approach of breaking it down into the why, what, how, and with what seems applicable.

The ‘why’ is requirements and outcomes. Who are we serving? What does success look like? What trade-offs are acceptable?

The ‘what’ is the specification itself, but it is not just functional requirements. It is defining the target. How does the agent know when it is successful? What are the constraints that must not be violated?

The ‘how’ is the plan. Earlier we might have been assigning small tasks to the agent. Agents are capable of operating at feature level. Features might still need decomposing into smaller tasks, just like you would have done previously with a team of developers. The difference is that the agent can do that decomposition quickly, and you can review the plan like you would review a design.

The ‘with what’ is your guardrails. This is codified into your policy and instruction files. It is critical to define the constraints within which the agent operates.
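To make this concrete, guardrails are typically captured in an instructions file the agent reads on every run, such as an AGENTS.md or Copilot instructions file. The rules below are hypothetical examples of the kind of constraints teams codify, not a recommendation:

```markdown
# Agent guardrails (illustrative excerpt)

## Must
- All new endpoints require authentication; no anonymous access.
- Secrets come from the environment, never from committed files.
- Every change ships with tests; never delete failing tests to pass CI.

## Must not
- Do not modify files under /infra without an approved ticket.
- Do not add new third-party dependencies without flagging them in the plan.
```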

Supplement all of this with your enterprise, domain and project information. In practice that context is often split across many places. The project context is in git. Work tracking and requirements are in Jira, ADO, or PRDs. Enterprise context and ADRs are in SharePoint. The modern challenge is not "can the agent code", it is "can the agent see the whole truth".

(Also see my article on context and AI walled gardens: https://www.garudax.id/pulse/agentic-tsunami-your-enterprise-ready-agents-act-david-rutter-ztm1e/)

The biggest shift: iterate early, then let the agent run

Iteration is still very valuable, but do it in the beginning.

Use the assistant to run a structured question and answer session to bottom out ambiguity. There is a lot of information that is not codified in documentation. A Q&A elaboration with your agent allows you to create full and complete specifications, collaboratively. Once the spec is clear enough, the agent can move from synchronous coding to more autonomous execution.

The planning modes in the top coding assistants have been greatly enhanced to facilitate this Q&A process. And you may find that clarification questions are asked as a batch, with the tooling doing the iteration, rather than the agent.

That distinction matters. It changes what your time is spent on. You stop being a human linter for half-formed ideas, and you become a reviewer of intent, constraints, and trade-offs.

A note on “experienced engineers” and “juniors”

There is a common narrative that AI is accelerating the output of experienced developers and engineers, and that agents are replacing the roles of juniors.

We are finding that when appropriate guidance and mentoring are provided, our more junior developers are just as capable of driving these agentic workflows.

In fact, in a spec-driven world, the centre of competence shifts. It becomes less about memorising patterns and more about asking good questions, writing clear specs, and validating outcomes against explicit criteria. Those are teachable skills. That matters if you care about scaling delivery capability, not just individual productivity.

A real example: building an MCP server for SharePoint and OneDrive search

Talking about specification driven development can sound abstract, so I want to bring it to life with a practical example. I describe a simple example below and here is a link to a short video showing the workflow end-to-end. https://youtu.be/E8YgHZNRsMw

First, the background. Agents need the full context, and in many organizations the context of the solution and the enterprise - the ADRs, and often the high-level PRDs - lives in SharePoint. So in this example we build an MCP server for SharePoint and OneDrive that I can use in my coding assistant.

This is the basic prompt:

Create a local MCP server with SharePoint and OneDrive search.

The purpose is that as a knowledge worker using an MCP-enabled client, I want to issue a natural language query against my organization’s SharePoint or OneDrive content so that I can obtain a concise list of relevant items I am permitted to access and use that context in downstream reasoning.

- Authentication is through the MCP client, with options for Device Code Flow and OBO Authentication and use of a .env file
- Provide two tools searchSharePoint/searchOneDrive
- Use the copilot retrieval API
- Returns a passthrough retrievalHits list (no local normalization). 

Also provide a command line tool for testing.
Use node for the implementation. 
Ask clarification questions before proceeding.        

This prompt has the key elements needed by agents: the why (the purpose and user value), the what (a local MCP server with two types of search), the how (passthrough of raw results and a testing tool), and the with what (the auth model, the retrieval API, and the use of Node).

Do I need the "how" and “with what”? I could have left them out and relied on the agent to ask me clarification questions. However, it is best to remove as much ambiguity as possible at the start, to avoid wasting the agent's time and your own. Even with the above, the agent will still ask for some clarifications (see the later discussion on drift control).

There is an important assumption in this prompt that is worth calling out. It assumes that the agent is provided, through MCP servers, with context about MCP clients, retrieval APIs, and OBO authentication.

That is what specification engineering looks like in practice. It is not only about the words in one prompt. It is about the full context surface area the agent can access, and the constraints you enforce.
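To make the "with what" concrete, the passthrough search behind the two tools can be sketched as a single call to the retrieval endpoint. This is a sketch under assumptions: the endpoint URL and request body shape are illustrative and should be checked against Microsoft's Copilot Retrieval API documentation; getToken stands in for whichever auth flow the MCP client negotiated; fetchFn is injectable so the call can be stubbed in tests.

```javascript
// Sketch of the passthrough search behind searchSharePoint/searchOneDrive.
// Assumptions: endpoint and payload shape are illustrative, not verified.
async function retrievalSearch(queryString, dataSource, getToken, fetchFn = fetch) {
  const token = await getToken(); // e.g. from device code flow or OBO
  const res = await fetchFn("https://graph.microsoft.com/v1.0/copilot/retrieval", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    // dataSource selects SharePoint vs OneDrive content
    body: JSON.stringify({ queryString, dataSource }),
  });
  if (!res.ok) throw new Error(`Retrieval API failed: ${res.status}`);
  const body = await res.json();
  // Passthrough: return retrievalHits untouched, no local normalization
  return body.retrievalHits ?? [];
}
```

Because fetchFn is a parameter, the command line testing tool in the spec can exercise the same function against the live endpoint, while unit tests supply a stub.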

Why I did not over-optimise the spec

The prompt leaves room for improvement. I could have expanded the technical details and provided a proper definition of done. However the latest models are quite forgiving, and it passes the litmus test - it works.

If I were working within a project team, I would have additional scaffolding. The key requirements would be in a ticketing system. I would have skills and/or custom prompts. I would have guidance on coding and testing guidelines. In that world my prompts would be much simpler:

/elaborate ticket #10065 
/implement        

This is an important point for larger teams. Spec-driven does not mean "write more in chat". It means "make sure the spec is accessible to the agent".
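As an illustration, a custom /elaborate prompt can be a short markdown file kept in the repository alongside the code. The content below is a hypothetical sketch of such a prompt file, not a prescription:

```markdown
# /elaborate

Given a ticket reference:
1. Read the ticket, the linked ADRs, and the team coding guidelines.
2. Ask clarification questions as a single batch and wait for answers.
3. Rewrite the requirements in EARS form, with Gherkin acceptance scenarios.
4. Write the result to specs/<ticket-id>.md and wait for review before implementing.
```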

Making specs unambiguous, testable and verifiable

In my previous article, I recommended making specifications both unambiguous and testable by combining:

  • EARS (Easy Approach to Requirements Syntax): a lightweight, controlled natural language framework for writing clear requirements.
  • Gherkin: a business readable DSL for defining testable, behaviour driven scenarios.

Whilst you do not need to do this, it helps ensure better specifications. It forces you to define success conditions in a way that both humans and machines can check, thus providing a level of verification. One of the advantages of tools like Kiro is that they generate requirements in a structured EARS format, which makes assumptions explicit and easier to verify.

Here is a simple example of the difference it makes.

"Resetting a password should log the user out everywhere."        

That reads well, but it is vague. How fast? What counts as "everywhere"? What about token refresh? What about active sessions?

"When a user completes a password reset, the system shall invalidate all active sessions and authentication tokens for that user within 60 seconds."

Now you have a measurable constraint, and you have surfaced design implications early. This is the kind of sentence that prevents days of rework later.
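A requirement written this way maps almost directly onto an executable check. A minimal sketch in plain Node, with a hypothetical in-memory session store standing in for the real session infrastructure:

```javascript
// Hypothetical in-memory store: userId -> Set of active session/token ids.
const sessions = new Map();

function addSession(userId, sessionId) {
  if (!sessions.has(userId)) sessions.set(userId, new Set());
  sessions.get(userId).add(sessionId);
}

// "When a user completes a password reset, the system shall invalidate
// all active sessions and authentication tokens for that user."
function onPasswordReset(userId) {
  sessions.set(userId, new Set());
}

// Executable acceptance check derived from the EARS requirement.
addSession("alice", "s1");
addSession("alice", "s2");
addSession("bob", "s3");
onPasswordReset("alice");
console.assert(sessions.get("alice").size === 0, "all alice sessions invalidated");
console.assert(sessions.get("bob").size === 1, "other users unaffected");
```

The 60-second bound would be checked differently, as a timing assertion in an integration test, but the point stands: the EARS sentence tells both the human and the agent exactly what to verify.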

Frameworks vs process

There are several popular SDD frameworks, for example Kiro, spec-kit, and GSD. My view is that it is the process that matters, not the framework. The planning modes in Claude Code and GitHub Copilot are now very powerful, and easy to combine with your own prompt and skill files.

I have a test case that I have used with several coding assistants and SDD frameworks. The best results for me have come from using:

  • a capable LLM
  • a planning mode
  • simple custom prompts, e.g. /elaborate and /implement
  • explicit acceptance criteria and constraints
  • version control of the specification

This is a useful message for enterprise adoption. You do not have to bet the farm on a single framework to start getting the benefits. You need a disciplined workflow and a clear target.

Drift Control

If I make decisions in a question and answer session with the agent, I risk replicating the vibe coding issue where the specification is hidden in the chat. To manage drift, the specification needs to live in one canonical place and be under version control (ideally in git, otherwise in the ticketing system). Any changes to the original requirements, and any decisions agreed by the developer during the specification process, such as those from the question and answer iteration, must be incorporated into that canonical specification.

This is where the popular frameworks are lacking: they do not enforce a single canonical specification. It is, however, relatively easy to address using your own prompt files and ways of working.

The economics

What is the cost of the spec driven development approach? Using my GitHub Copilot licence, this could be as little as three requests: one for the initial prompt, one for the questions, and one for the implementation. My time is needed for validating the approach, answering questions, and user acceptance testing. In between I can do other tickets, meetings, or make coffee.

If I were doing this by vibe coding, I would need many more requests and would be much more hands-on. (Kiro and Claude Code are harder to assess since their cost is token based; again, though, vibe coding is likely more expensive because the whole context is read more times.)

Recap

  • As agents become more capable and more autonomous, the bottleneck shifts. It is not typing speed. It is specification quality.
  • Vibe coding is an anti-pattern for enterprise delivery not because it never works, but because it turns your spec into a moving target and makes human attention a constant steering loop.
  • The leverage is in iterating early on the spec, using planning to flush out ambiguity, and then letting the agent execute against a target that is explicit, testable, and constrained.

What next?

Autonomous agents are no longer just implementing specs. They are increasingly capable of generating bespoke, personalised software solutions from intent and context. Plenty to explore in a future article!

Closing question

If you are using agentic coding in enterprise environments, where does it break most often for you today? Is it ambiguity in requirements, missing non-functional constraints, verification quality, or governance and tool access?

I am interested in what is working in real teams, not idealised demos. Add a comment and I will happily compare notes.
