Stop vibe coding blind. Start using Test-Driven Navigation
This is a pattern I observe in posts and articles on agentic development. A developer prompts Claude Code or Cursor to build a feature. The agent generates code. The developer scans the output, tweaks a variable name, and ships it.
A week later, something breaks in production. Nobody traces the failure to a specific decision. Because there was no decision. There was a prompt and an output. The space between them was unstructured.
Steve Yegge and Gene Kim describe this exact failure mode in their book “Vibe Coding”. Steve puts it bluntly: the coding agent had silently deleted or hacked tests to make them pass, and in one large suite had outright deleted 80 percent of the test cases. Gene frames 4,000 lines in four days as a redefinition of what is possible. The question Test-Driven Navigation answers is: possible for whom? Without tests derived from the business context, those 4,000 lines are unverified assumptions.
The problem is not the speed. The problem is the absence of verifiable success criteria.
Alex Bunardzic discussed the Test-Driven Navigation concept at Devoxx Belgium 2025, demonstrating how AI, when paired with tests, safely guides refactoring through legacy code.
In traditional TDD, tests are about design discipline. Kent Beck's Red-Green-Refactor loop has been around for 25 years. Most developers know it. Most do not practice it because the upfront cost feels high when you write both the tests and the implementation.
Agentic tools change the equation. The cost of generating code dropped to near-zero. But the cost of wrong code did not drop. The agent generates confidently, quickly, and at volume. A human developer who is uncertain will pause and think. An LLM will not. It will write more code.
Test-Driven Navigation reframes what tests do in this new workflow. Tests are not about design. They are about navigation. They tell the agent where to go, confirm it arrived, and prevent it from drifting into confident-sounding nonsense.
But tests need to come from somewhere. Prompting the agent with "write tests for a discount calculator" produces tests validating whatever the agent decides to build. This tells you nothing about whether the code does what your business needs.
In my book “The AI-Native Software Development Lifecycle”, I describe a Five-Layer Prompt Architecture. Layer 1, the Context Prompt, defines the application's domain, purpose, constraints, and success criteria. Every downstream layer inherits from it. Layer 4, Testing and Validation Prompts, inherits directly from the Context Prompt and the module specifications above it.
This is the key discipline: you define the Context Prompt. You use AI to help you generate tests based on it. Then the agent implements code to satisfy those tests. The Context Prompt is the source of truth. The tests are the executable form of the truth. The agent is the engine.
Let me make this concrete with Python.
You have a Context Prompt specifying: customers in the gold tier get a 20% discount, customers in the silver tier get 10%, customers in unknown tiers get no discount, and negative prices are invalid.
You generate the tests from this context:
# test_discount.py
import pytest

from discount import calculate_discount


def test_gold_tier_gets_20_percent():
    assert calculate_discount(100.0, "gold") == 80.0


def test_silver_tier_gets_10_percent():
    assert calculate_discount(100.0, "silver") == 90.0


def test_unknown_tier_gets_no_discount():
    assert calculate_discount(100.0, "bronze") == 100.0


def test_negative_price_raises_error():
    with pytest.raises(ValueError):
        calculate_discount(-50.0, "gold")
Now you tell the agent: implement calculate_discount in discount.py to make these tests pass.
The agent produces:
# discount.py
TIER_DISCOUNTS = {"gold": 0.20, "silver": 0.10}


def calculate_discount(price: float, tier: str) -> float:
    if price < 0:
        raise ValueError("Price cannot be negative")
    rate = TIER_DISCOUNTS.get(tier, 0.0)
    return round(price * (1 - rate), 2)
Clean. Minimal. Exactly what the tests demanded. No surprise "platinum" tier nobody asked for. No 50-line class hierarchy when a dictionary lookup does the job.
One of the biggest risks in agentic development is not wrong code. It is too much code. Code you did not need. Code introducing complexity you now have to maintain. Tests derived from a Context Prompt constrain the output to exactly the scope you specified.
In practice, the loop runs like this:
Context Prompt -> AI generates tests -> Agent implements -> pytest runs
If RED: agent reads failures, iterates
If GREEN: you review the diff, commit
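The RED branch of that loop is easy to mechanize. The sketch below shows one way to wire it up; the only real dependency is pytest invoked through subprocess, and ask_agent_to_fix is a hypothetical placeholder standing in for whatever agent API or CLI you actually use.

# run_loop.py: a minimal sketch of the RED/GREEN navigation loop.
# ask_agent_to_fix() is a hypothetical placeholder, not a real agent API.
import subprocess

MAX_ATTEMPTS = 5


def run_tests() -> subprocess.CompletedProcess:
    # Run the suite quietly and capture the failure report as text.
    return subprocess.run(
        ["pytest", "test_discount.py", "-q"],
        capture_output=True,
        text=True,
    )


def ask_agent_to_fix(failure_report: str) -> None:
    # Placeholder: hand the pytest output back to the agent, together with
    # the Context Prompt, and let it revise discount.py.
    print(failure_report)


for attempt in range(1, MAX_ATTEMPTS + 1):
    result = run_tests()
    if result.returncode == 0:
        print("GREEN: review the diff, then commit.")
        break
    print(f"RED (attempt {attempt}): feeding failures back to the agent.")
    ask_agent_to_fix(result.stdout)
else:
    print("Still RED after the iteration budget: a human takes over.")

The specific script does not matter. What matters is that the agent only ever iterates against failures produced by tests you own.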
The moment you let the agent write both the tests and the implementation, you lose the feedback loop. The agent will write tests validating its own code. I have seen teams try this shortcut. The agent produces tests and implementation together, all tests pass, everyone feels productive. Then in code review someone notices the tests do not cover negative prices, or the edge case where tier is None, or the behavior when the price is zero. The tests were written to validate the implementation, not to define the requirement.
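When you catch that in review, the fix is to add the missing cases to the test file yourself, derived from the Context Prompt rather than from the implementation. The tests below are a sketch; they assume the Context Prompt treats None as just another unknown tier and a zero price as valid, which is my wording of those rules, not something the agent decided.

# test_discount_edges.py: edge cases an agent-authored suite tends to skip.
# Assumes the Context Prompt counts None as an unknown tier and zero as a valid price.
import pytest

from discount import calculate_discount


def test_none_tier_gets_no_discount():
    assert calculate_discount(100.0, None) == 100.0


def test_zero_price_is_valid():
    assert calculate_discount(0.0, "gold") == 0.0


def test_negative_price_rejected_for_any_tier():
    with pytest.raises(ValueError):
        calculate_discount(-1.0, "silver")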
An LLM does not "know" what your function should do. It has a statistical model of what functions like yours tend to look like, which is a fundamentally different thing. When a human developer writes code without tests, there is still a brain running informal verification. The LLM does not have this. Tests give it something it lacks on its own: a concrete, executable definition of correct behavior.
If you are using an agentic coding tool today and you are not doing this, start small. Write a Context Prompt for one module. Four or five lines of business rules and constraints. Use AI to help you generate a test file from the context. Then ask the agent to implement against those tests. Compare the output to what you get from a bare "build me a function" prompt.
The difference is usually obvious on the first try.
The Context Prompt concept extends beyond code generation into a full five-layer prompt hierarchy governing architecture, modules, testing, and operations. I cover this in “The AI-Native Software Development Lifecycle”. The code generation layer is where most teams start, because it is where the pain is most visible. But the architectural thinking is the same at every layer.
the "too much code" problem is a definition problem in disguise. AI builds more when it hasn't been told what "enough" looks like. it fills the blank - a 50-line hierarchy, three abstraction layers, helper functions four deep. not because it's bad at coding. because no one defined the constraints. the code bloat is a planning failure, not a model failure.
In many cases the risk is not correctness but complexity inflation. Without clear constraints, agents tend to optimize for completeness rather than simplicity. Guardrails like tests or scoped prompts become essential.
I've learned to stop agents before they over-engineer solutions. I request the simplest version first, then iterate only when necessary. This approach saves hours of refactoring unnecessary complexity.