From Spark to Working Software: Intent‑driven Development using GenAI

Introduction

We have entered the epoch of 'no human coding': an era where AI takes on much of the design, coding, and testing, while humans provide oversight, context, and direction.

Imagine sketching an idea in plain language in the morning and seeing it come alive as a working feature by the afternoon. That is the disruptive promise of Generative AI in software engineering. By combining intelligent coding assistants, orchestration frameworks, and simple prompts within a structured context facilitated by the Model Context Protocol (MCP), we are moving from code‑centric development to intent‑driven creation.

Of course, this shift isn’t without friction. What feels like a burst of creative energy - vibe coding, or prompt‑only development - can resemble a garage project on steroids: thrilling, rapid, and most likely chaotic. The challenge is not just to build something fast, but to build something maintainable, testable, and scalable. In other words: how do we create new GenAI‑driven workflows that can transform sparks of inspiration into working, professional‑grade code?

This article introduces an innovation in GenAI workflows – a repeatable, scalable process that can be applied to real client projects.

From a business perspective, this shift has tangible outcomes: faster time‑to‑market as features move from idea to deployment in hours or days rather than weeks; lower delivery costs as repetitive coding work is automated; stronger consistency and compliance thanks to standards embedded into the workflow; and better talent leverage, freeing teams to focus on strategy, design, and innovation instead of mechanical coding.

This article will be valuable to business and technical leaders exploring how Generative AI workflows, AI coding assistants (e.g. GitHub Copilot, Claude Code, Cursor, Kiro), and intent-driven delivery transform time-to-value, quality, and team productivity.

From Idea to Requirements

Every application feature starts with an idea. In traditional development, this might become a specification document or a Feature item in Jira or Azure DevOps, which may take many hours or days to elaborate, discuss, and refine. (I am keeping this generic; some might use the term Product Requirements Document, or PRD.)

In our GenAI‑driven approach, we take that idea (often just a few lines of plain English) and, with the existing documentation and code as context, use a single, focused prompt to have the assistant produce a first‑cut, usable specification. This is typically more prescriptive and more detailed than a traditional specification, ensuring the coding assistant has the required context.

This specification‑first approach ensures the knowledge of why, what, and how lives in durable requirements and documentation, not in a series of ephemeral prompts. The assistant then advances the work one step at a time: elaborating requirements, generating backlog items, writing code, and drafting tests; always anchored to the specification.

To make specifications both unambiguous and testable, we combine (see Appendix for details):

  • EARS (Easy Approach to Requirements Syntax): a lightweight, controlled natural‑language framework for writing clear requirements.
  • Gherkin: a business‑readable DSL for defining testable, behaviour‑driven scenarios.

Take the feature request: “User Personas.” The requirement is: “During a conversation, each user should be able to select a persona from a list. The personas and their descriptions come from a configuration file under source control.”
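
To make the configuration aspect concrete, here is a minimal sketch of loading and validating such a file. The file name personas.json and its fields are illustrative assumptions, not part of the requirement:

import json
from pathlib import Path

# Hypothetical personas.json under source control, e.g.:
# [{"name": "Coach", "description": "A supportive mentor",
#   "system_prompt": "You are a supportive coach."}]

def load_personas(path: str = "personas.json") -> list[dict]:
    """Load the persona list from the version-controlled configuration file."""
    personas = json.loads(Path(path).read_text(encoding="utf-8"))
    for persona in personas:
        # Fail fast on malformed configuration rather than mid-conversation.
        missing = {"name", "description", "system_prompt"} - persona.keys()
        if missing:
            raise ValueError(f"Persona entry missing keys: {missing}")
    return personas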

By providing this requirement to the GenAI assistant, we can turn it into a draft specification with a simple prompt: “Create me this feature in Jira / ADO and elaborate so that someone could break this down later into a sequence of backlog items.”

In EARS style:

“The system shall allow a user to select a persona from a configuration‑defined list during a conversation.”        

In Gherkin:

Scenario: User selects a persona during a conversation
  Given the system has a configuration with personas
  When the user selects a persona from the list
  Then the system applies the corresponding system prompt        
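
Because Gherkin is executable, the scenario above can be wired directly to automated acceptance tests. Below is a minimal sketch using pytest-bdd (an assumption on my part; the approach does not prescribe a test framework), with the scenario saved as personas.feature:

# test_personas.py (requires: pip install pytest-bdd)
from pytest_bdd import given, when, then, scenario

@scenario("personas.feature", "User selects a persona during a conversation")
def test_persona_selection():
    pass

@given("the system has a configuration with personas", target_fixture="personas")
def personas():
    # Stand-in for the version-controlled configuration file.
    return {"Coach": "You are a supportive coach."}

@when("the user selects a persona from the list", target_fixture="conversation")
def select_persona(personas):
    return {"system_prompt": personas["Coach"]}

@then("the system applies the corresponding system prompt")
def assert_prompt_applied(conversation):
    assert conversation["system_prompt"] == "You are a supportive coach."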

Grounding everything in specifications reduces ambiguity and creates a repeatable, client‑ready workflow that guides AI assistants and other agents without leaving critical knowledge hidden in the chat history.

Many people find mock-ups helpful to visualize the new feature. We can supplement the specification with a quick Figma mock-up created by a designer, or better still, we guide the GenAI assistant to generate the mock-up directly from the existing codebase and design guidelines. Prompt: “Create a mock-up for this new feature.”

At this stage, just like in traditional approaches, it’s essential to review and revise the specification.

Backlog Creation

The amount of work that a state‑of‑the‑art model can complete from a single prompt (think unsupervised work) currently seems to be roughly the equivalent of an hour of human effort. It is unlikely that the GenAI assistant will reliably implement a whole new feature in one go, so we need to break it down into smaller chunks.

It is important to note that this metric is increasing rapidly and is likely to continue doing so. Independent evaluations by METR (Model Evaluation & Threat Research) show that the length of software‑engineering tasks which frontier AI models can reliably complete doubled approximately every seven months between 2019 and 2024 (metr.org). As of mid‑2025, METR measured that GPT‑5 has a 50% time horizon of around two hours on agentic software‑engineering tasks, meaning it has a 50% chance of correctly completing tasks that would take a human professional unfamiliar with the codebase around that long (metr.github.io). This provides an external benchmark for how quickly this capability is scaling.

Using the high‑level specification and design, we prompt the GenAI assistant to elaborate it into a backlog. The assistant breaks the feature into independent, testable backlog items, each with its own acceptance criteria.

Prompt: “Break down this feature into a set of backlog items.”        

This transformation ensures that each backlog item can be developed, tested, and validated independently. Especially for a complex feature, we may want to validate the breakdown, and again the GenAI assistant can do this with a simple prompt: “Review the feature specification and validate that it is consistent with its backlog items.”
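
For illustration, the persona feature might break down along these lines (a hypothetical sketch; real items would live in Jira or ADO rather than in code):

from dataclasses import dataclass, field

@dataclass
class BacklogItem:
    title: str
    acceptance_criteria: list[str] = field(default_factory=list)

persona_backlog = [
    BacklogItem("Load personas from the version-controlled configuration file",
                ["A malformed configuration fails fast with a clear error"]),
    BacklogItem("Let the user select a persona during a conversation",
                ["Every configured persona appears with its description"]),
    BacklogItem("Apply the selected persona's system prompt",
                ["Subsequent responses use the selected persona's prompt"]),
]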

Hosted vs. Self‑Hosted Models

As an aside, I am often asked whether we can use self‑ or client‑hosted models for the GenAI assistant. The current reality is that state‑of‑the‑art hosted models are significantly more capable than self‑hosted alternatives. A good measure of whether a model is “good enough” is the amount of unsupervised work it can do reliably.

Building with AI Assistants + MCP

Now that we have a backlog in hand, the real work begins. Because the backlog lives in our ticketing system, we can assign each item to either an AI agent or a human developer.

AI agents, acting as developers, pick up items and implement them using AI assistants integrated with MCP.

The prompts for the coding assistant might look like:

“Implement and test backlog item nnnn.”

The intent is to iterate until we have met our definition of done. This goes beyond unit tests: it includes passing tests against the acceptance criteria, integration checks where relevant, adherence to coding standards, security standards, updated documentation, and alignment with project guidance.

To set expectations, the GenAI assistant is unlikely to get everything right first time. The assistant generates and runs code and tests in a feedback loop. Whether you see tests as coming first or code as coming first, the key point is that both evolve together until the definition of done is met. It will automatically correct build errors, and when it runs tests it may need to redo some of the code. Sometimes the developer will need to intervene.

There are two key contexts that the GenAI assistant needs for this process:

  • The specification and design which are in the backlog item; think of this as the what we are building.
  • Instructions and guidance, typically in the form of Markdown files, that provide the how we are building (see the illustrative excerpt after this list).
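
For illustration, an instructions file might contain entries like the following (a hypothetical excerpt; the file names and rules are assumptions, and real guidance is project‑specific):

# instructions.md (hypothetical excerpt)
- All new code must include unit tests and pass linting before a commit.
- Use the repository's logging wrapper; do not print directly.
- Follow the error-handling conventions documented in docs/standards.md.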

The human‑in‑the‑loop is essential to this stage of the process: to review what the agent is doing and intervene when the GenAI assistant asks for assistance or when the developer has concerns about the steps the agent is performing. Whilst it is tempting for a developer to intervene by writing code or prompts directly, to keep this a simple, repeatable process the developer instead needs to:

  • Identify the cause of the concern or issue.
  • If it relates to the what being built, update the specification (and align other backlog items).
  • If it relates to the how the backlog item is being implemented, update the instructions.

This process of code creation is not magic, and the developer’s role and experience are essential. It is an iterative collaboration.

Why MCP Matters

The Model Context Protocol (MCP) is crucial to making this whole process repeatable and reliable. MCP ensures that the coding assistant has structured access to all the context it needs: the specification, implementation instructions, standards, library documentation, test scripts, and other project artefacts. Without this context, even the most capable model would operate with gaps, leading to errors, rework, and drift.

By supplying structured context rather than relying on memory or long prompts, MCP enables the assistant to:

  • Align every action to the specification and design.
  • Apply client- and project-specific standards consistently.
  • Reuse design and environment guidance across projects.
  • Access library and API documentation on demand.
  • Generate, execute, and validate automated tests alongside implementation.

In practice, MCP transforms the assistant from a helpful autocomplete tool into a reliable participant in the development workflow, ensuring that code creation and testing remain grounded in intent, standards, and shared context.
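
As a minimal sketch of what this looks like in practice, the official MCP Python SDK can expose project artefacts as resources that an MCP‑aware assistant reads on demand. The resource URIs and file paths below are illustrative assumptions:

# A minimal MCP server exposing project context (requires: pip install mcp).
from pathlib import Path
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("project-context")

@mcp.resource("spec://features/{feature_id}")
def feature_spec(feature_id: str) -> str:
    """Return the durable specification for a given feature."""
    return Path(f"docs/specs/{feature_id}.md").read_text(encoding="utf-8")

@mcp.resource("standards://coding")
def coding_standards() -> str:
    """Return the project's coding and security standards."""
    return Path("docs/standards.md").read_text(encoding="utf-8")

if __name__ == "__main__":
    mcp.run()  # serve the context to the coding assistant over stdio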

Testing and Automation

No feature is complete until it is tested end to end and deployed to the requisite environments. The GenAI assistant can help here by generating scripts for end‑to‑end validation. Some steps can be fully automated; others are flagged for manual testing. The important point is that testing is broader than just unit tests—it includes integration, acceptance, security, and deployment checks. This ensures that code remains aligned with specifications and that changes don’t introduce regressions.
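
For example, a post‑deployment smoke check generated by the assistant might look like this (a hypothetical sketch using only the Python standard library; the endpoint and expected payload are assumptions):

# Hypothetical post-deployment smoke test.
import json
import urllib.request

def check_health(base_url: str) -> None:
    # Hit the service's health endpoint and verify it reports healthy.
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
        assert resp.status == 200, f"unexpected status {resp.status}"
        body = json.loads(resp.read())
        assert body.get("status") == "ok", f"service unhealthy: {body}"

if __name__ == "__main__":
    check_health("https://staging.example.com")
    print("Smoke test passed")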

Testing becomes part of the same prompt‑driven workflow, linking requirements → backlog → code → tests → deployment → validation.

Issues and Learnings

Working with GenAI isn’t frictionless. Some hard‑earned lessons include:

  • Human guidance is non‑negotiable. AI can drift without supervision.
  • Context contamination is real. Start a new chat per backlog item to avoid instructions bleeding across tasks.
  • Agents overreach. Without guidance, they attempt refactoring or unrelated tasks.
  • Tooling gaps exist.

On the last point, although coding agents still have limitations, they are advancing at a remarkable pace. Today their strengths markedly outweigh their shortcomings, and their trajectory suggests even greater capability ahead. I have deliberately avoided naming specific tools in this article because the techniques described are tooling‑agnostic. Vendors themselves do not possess a unique 'secret sauce'; they all rely on the latest large language models with prompts and patterns that are discoverable and repeatable by other vendors.

Beyond the Code

It is tempting to assume that software can be fully understood from its source code, but this is not the case. Code captures what the system does, but much of the why—the decisions, trade-offs, and cultural influences—is lost. Decisions made during design, trade‑offs between performance and maintainability, organizational culture, and even the skill level of the original developers all shape the final product. These aspects cannot be reverse engineered from the code alone.

Other irretrievable elements include:

  • Business intent and priorities: Code may implement a feature, but it rarely explains why it was prioritized over alternatives.
  • Cultural context: Coding style, naming conventions, and architecture choices often reflect team culture and norms.
  • Constraints and compromises: Security concerns, compliance requirements, and time pressures drive decisions invisible in the code.

This highlights why specifications, documentation, and structured GenAI workflows are essential. They preserve context that cannot be reconstructed from source code, ensuring that future development builds on intent rather than guesswork.

As AI builds software reliably from intent and specification, ‘code’ loses its primacy: it becomes a byproduct. This mirrors the philosophy of reproducible builds, where binaries lose meaning as long as they can be deterministically generated from the source.

Consequences for Teams and Organizations

GenAI in the Software Development Life Cycle (SDLC) isn’t just a developer story. Business analysts, testers, architects, and PMs all benefit when AI tools plug into shared documentation and ticketing systems. Consistency demands that everyone has access to the same context, not just the coders.

This shift changes the shape of collaboration. Instead of throwing documents over the wall, the team works through a shared AI‑enhanced fabric. It also has implications for governance and compliance: consistent, transparent documentation of requirements, backlog, and tests makes it easier to demonstrate alignment with regulatory obligations and internal quality standards.

Conclusion

Generative AI reframes software development. The work shifts from typing syntax to codifying intent. The lifecycle becomes an unbroken chain:

Idea → Requirements → Backlog → Code → Tests → Deployment → Working Feature.

It is faster, more traceable, and potentially more inclusive. But it still relies on thoughtful human guidance. Our potential to generate business value with software has long been constrained by cost, resource availability, and complexity. New GenAI capabilities massively unlock this potential.

This is not theory - this is how smart teams already deliver intent-driven software with measurable speed, consistency, and impact. While it's not appropriate to share client details here, there are initiatives like spec-kit from GitHub that will provide you with an accelerator for this approach.

Looking ahead, as AI builds software reliably from intent and specification, ‘code’ loses its primacy—it becomes a byproduct.

Appendix

The following is included for readers unfamiliar with EARS and Gherkin.

EARS (Easy Approach to Requirements Syntax)

A lightweight, controlled‑natural‑language framework that constrains free‑text requirements into defined patterns. It uses keywords like While, When, If, and Where to create unambiguous statements. Each requirement is written in a simple template that makes it easier to read, review, and translate into tests. Typical patterns include:

  • Event‑driven: "When X happens, the system shall do Y." These capture cause‑and‑effect behaviors clearly.
  • State‑driven: "While X is true, the system shall do Y." These describe continuous conditions and responses.
  • Optional feature: "Where X is included, the system shall do Y." These cover requirements that apply only when a particular feature or configuration is present.
  • Unwanted behavior: "If X occurs, then the system shall do Y." These capture the required response to failures or misuse, and help address safety and security requirements.

By applying these patterns consistently, teams avoid ambiguity and make requirements easier to validate and automate. EARS is especially effective for specifications because it reduces ambiguity, provides a consistent way to capture the why, what, and how of a system, and accelerates the path to automation. Example:

“When the user clicks Submit, the Payment System shall validate card details.”        

Gherkin

A business‑readable domain‑specific language used to describe behavior in plain language. It is the standard for Behavior‑Driven Development (BDD). Gherkin uses a simple syntax with Given, When, Then statements to define testable scenarios. Gherkin is particularly valuable for acceptance tests, ensuring that requirements are translated into clear, executable validations. Example:

Scenario: Withdraw cash when funds are sufficient
  Given the account balance is $100
  And the card is valid
  And the ATM contains enough money
  When the account holder requests $20
  Then the ATM should dispense $20
  And the account balance should be $80
  And the card should be returned        

