Spec-driven development workflow - how to get production-ready code from AI

  • If agentic coders don't work for you the way you want them to.
  • Or if you wonder how it's possible that people claim to no longer write code, while what you see is far from production-ready.
  • Or if you wonder what code review should look like when the code was written by AI.

The Spec-Driven Development workflow might be the approach that helps. I built a plugin for Claude Code that I'd like to present, and I'd also like to share the Spec-Driven Development concept itself, which I find very useful for improving the quality of agentic engineering.

After 1.5 years building fully autonomous AI agents, I've learned that once you've sorted out tooling and architecture, context is the only lever left. With AI coders, context is the only thing we manage — and that's exactly what SDD helps improve.


Why agentic coders underperform

Let's name the three root causes of poor agentic coder performance:

  • Request is too short — when a prompt is short and ambiguous, it leaves room for LLM creativity that might not match your expectations
  • Request is too hard — when your request is too complex to deliver in one shot, the model gets confused as it might lack the required tooling or context to deliver it
  • Context is too big and messy — when context grows, model performance drops (https://claude.com/blog/1m-context-ga)

For small, simple tasks in small codebases, running plan mode, verifying the plan, and writing the code with AI might be sufficient. This doesn't scale to harder tasks and bigger codebases. The idea of Spec-Driven Development is to write a detailed spec for the agent that describes what we expect in full detail. Research supports this: detailed, executable specs reduce AI code errors by up to 50% (https://arxiv.org/abs/2602.00180) and security defects by 73% (https://arxiv.org/abs/2602.02584).


sddw: spec-driven development workflow for Claude Code

The idea of my plugin (sddw) is to use the agentic coder itself to help you write the spec, split the coding flow into multiple steps, and decompose a feature into atomic coding tasks.

If you've been building ML workflows, this should look familiar — think Argo Workflows or Kubeflow: you have a task, you split it into atomic steps in a workflow, and the steps read and write artifacts. That's exactly what sddw does, just for agentic coding with Claude Code.


The workflow

sddw consists of 4 steps:

  1. Write requirements — define user stories, functional requirements, acceptance criteria, constraints
  2. Analyse existing codebase (optional) — extract patterns, interfaces, conventions from the target codebase
  3. Design solution — decompose the feature into self-contained tasks with architecture, contracts, and data models
  4. Implement tasks — implement one task at a time following TDD
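
In Claude Code terms, a full run looks something like the session below. Only /sddw:requirements is named explicitly later in this article; the other command names are my shorthand inferred from the step names, so treat them as illustrative:

```
/sddw:requirements     # interactive Q&A -> requirements spec (artifact 1)
/clear                 # drop the accumulated context
/sddw:code-analysis    # optional: extract patterns from the codebase (artifact 2)
/clear
/sddw:tasks            # design + decomposition into atomic tasks (artifact 3)
/clear
/sddw:implement        # TDD implementation, one task per session
```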

(Workflow diagram: requirements → code-analysis → tasks → implement)

After every step you clear the context or start a fresh session.

After every step an artifact is created.

That artifact becomes the input to the next step.

This artifact-passing mechanism gives us modularity that is otherwise not possible with Claude Code today.
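
As a sketch, the artifacts can be thought of as plain files that each step reads and writes (the file names here are hypothetical, not the plugin's actual layout):

```
specs/
  requirements.md     # step 1 output: stories, requirements, acceptance criteria
  code-analysis.md    # step 2 output (optional): patterns, interfaces, conventions
  tasks.md            # step 3 output: architecture, contracts, atomic task list
```

Each step starts from a clean context plus only the artifacts it needs, which is what makes the steps swappable.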


How it works in practice

The role of sddw is not only to code a solution for you but to help you define the spec — what the solution should look like in the first place. sddw navigates you through a predefined flow, one step at a time:

  1. Asks you a question that helps build the spec
  2. Does research and analysis
  3. Proposes a solution

You are in control — approving, declining, or modifying the suggestions. You can ask for help answering a question, ask to analyse state-of-the-art solutions, or do research. At the end of this interactive process, a spec is born. Clear context, start a new session, and move to the next step — which follows the same pattern, unless it's the final implementation step.


The spec is the artifact for peer review, not AI-generated code

The spec contains valuable compressed information — a result of your inputs and AI research, approved by you. It has structure: acceptance criteria, TDD approach, architecture decisions with rationale, rejected alternatives. This is what you review and iterate on. The code that follows is a verified implementation of an already-approved spec.
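
To make that concrete, a spec reviewed this way might be a markdown document along these lines (the section names are assembled from the structure described above, not copied from the plugin's actual template):

```
# Spec: <feature>

## User stories
## Functional requirements
## Acceptance criteria
## Constraints
## Architecture decisions    (each with its rationale)
## Rejected alternatives     (and why they were rejected)
## TDD approach              (which tests to write before which code)
```

A reviewer commenting on "Rejected alternatives" changes what gets built; a reviewer commenting on a generated diff usually doesn't.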

There is not much point in reviewing AI-generated code on its own. If the review doesn't change the agentic coder's context — its instructions, skills, or system prompt — the same problems will appear in the next pull request. The coder doesn't learn from your comments.

Reviewing the spec is more valuable. The spec is the most important part of the model's context — it clarifies expectations and provides relevant information. It's also more interesting to review: it was written with human supervision and captures high-level design decisions, not implementation details.


How sddw addresses the root causes

Let's get back to the issues we defined:

  • Request is too short → the generated spec is bigger, better structured, and more thorough — it contains acceptance criteria, TDD approach, functional requirements, constraints
  • Request is too hard → two levels of decomposition make it simpler. We progress step by step through the workflow: defining requirements, analysing code, splitting the solution into tasks, implementing tasks one by one
  • Context is too big and messy → we clear context after every step. We keep only what's relevant for a single step or a single coding task


How sddw is built

I used commands, not skills, because commands are namespaced while skills are not. This allows using /sddw:requirements instead of just /requirements. Commands are thin entry points — they don't contain instructions inline but reference them via @ file includes. The structure of every command is identical:

  • Reference to instruction (process rules)
  • Reference to dialog flow (questionnaire)
  • Reference to output spec template
  • References to available input specs (delivered by previous steps)
  • Service metadata: name, description, previous step, next step

This keeps the structure modular and clear. It's easy to add or remove a component.
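
For illustration, a command file in this style might look roughly like the sketch below. Claude Code slash commands are markdown files with YAML frontmatter; the specific paths and field names here are my guesses, not the plugin's actual source:

```
---
description: Design the solution and decompose it into atomic tasks
---

@instructions/tasks.md      <- process rules for this step
@dialogs/tasks.md           <- the questionnaire to walk through
@templates/tasks-spec.md    <- template for the output artifact
@specs/requirements.md      <- input spec from the previous step
@specs/code-analysis.md     <- optional input, if that step was run

Step: tasks (previous: code-analysis, next: implement)
```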


What's next

I think Claude Code would benefit from a workflows abstraction. sddw is a first step towards understanding what such workflows might look like. I'm exploring whether these abstractions — steps, specs, questionnaires, instructions — are sufficient to build agentic workflows in general:

  • Define steps in a workflow
  • Use agent assistance to build the required subcomponents for every step
  • Automatically wire references to previous artifacts and adjacent steps
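
Purely as speculation (this is not an existing Claude Code feature), such a workflow abstraction might reduce to a manifest from which the artifact wiring in the last bullet could be generated:

```
workflow: sddw
steps:
  - name: requirements
    produces: specs/requirements.md
  - name: code-analysis
    optional: true
    consumes: [specs/requirements.md]
    produces: specs/code-analysis.md
  - name: tasks
    consumes: [specs/requirements.md, specs/code-analysis.md]
    produces: specs/tasks.md
  - name: implement
    consumes: [specs/tasks.md]
```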


Try it

Link to sddw: https://github.com/sermakarevich/sddw

I'm very curious about your experience with agentic coding — it's basically greenfield and everyone is figuring things out. Have you used or heard of SDD? If you try sddw and find it useful, please consider commenting, sharing, or starring the repo.

If you are an organisation planning to use sddw, please consider sponsoring the project on GitHub.

Comments

I've landed somewhere in between. I start with a lightweight spec, build the smallest working version, then iterate the spec alongside the code. Full upfront specs assume you know the problem well enough, which is rarely true at the start. The spec grows and improves as feedback and features come in. Claude Code handles the iteration speed; you handle the judgment calls. So the input for any new feature is:

  • the current spec
  • the feature description and acceptance criteria

This goes through planning mode. Output: the implementation and the updated spec, where relevant.

The architectural shift you are describing goes deeper than most people realize. The hard problem is not building the elicitation interface. It is teaching the agent when to ask. Most current human-in-the-loop implementations are either too conservative (interrupt on everything, defeating the purpose of automation) or too aggressive (only surface critical failures after they have propagated). The elicitation pattern essentially requires the agent to maintain a calibrated uncertainty estimate over its own decision boundary. It needs to distinguish between 'I am 70% confident and should proceed' versus 'I am 70% confident but the cost of being wrong here is catastrophic, so I should ask.' That is a fundamentally different capability than just generating good outputs. It requires the agent to reason about its own epistemic state relative to the stakes of the decision. The interesting follow-up question is whether this calibration can be learned from interaction history, or whether it needs to be architecturally specified per workflow.

waiting for the plugin demo - curious if the spec becomes the new "unit test" for code quality
