Show, Don't Tell: The Power of Sample Implementation for Coding Agents
Really, this title should read “Show and Tell,” but more on that later. If you stick with me until the end, you’ll see what I mean.
There are so many opinions being shared about AI in software engineering, but not much real data. To cut through the noise, our team at HTD has been running experiments to figure out what actually works. I'll be sharing results as we go, including what worked, what didn't, and what surprised us.
Recently, the HTD Labs team ran an experiment to answer a question that sounds simple but turns out to be surprisingly poorly understood: what drives code quality when AI agents generate software components?
Some people may suggest that “code quality” does not matter in a world where agents generate all the code. Who cares about readability when the bots generate everything?
We take a more measured view, particularly when building software for critical systems such as healthcare and medical devices: code quality is multi-dimensional and will always matter. Readability, traceability, security, runtime performance, accessibility, and regulatory compliance are just some of the dimensions that matter in production software systems. That is why, as a firm, we are betting on quality as a foundation.
There’s also a practical reason: high-quality code that is readable and semantically well structured makes automated code generation more efficient. LLMs produce output that is mathematically closest, in a high-dimensional space, to an input that has been transformed and constrained to the same output dimensionality. It stands to reason, then, that dimensionally consistent inputs are hugely beneficial for generating high-quality outputs.
In our experiment we gave a coding agent a user story and supporting context to generate a React component. In some cases, the context was limited to well specified markdown guidance for implementation, in this case specified for ADA compliance. In other cases, the context included that same guidance along with executable code examples.
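To make the two kinds of context concrete, here is a rough sketch of the difference. The markdown guidance states rules in prose, for example that every input must have a programmatically associated label and that error messages must be announced to assistive technology. An executable example encodes those same rules as working code the agent can compose from. The component below is illustrative only; names like TextField are hypothetical rather than the actual fixtures we used.

```tsx
// Illustrative sample implementation: accessibility behavior expressed as
// working code rather than described in prose. TextField is a hypothetical
// name, not the component from our experiment.
import React, { useId } from "react";

interface TextFieldProps {
  label: string;
  value: string;
  onChange: (value: string) => void;
  error?: string;
  required?: boolean;
}

export function TextField({ label, value, onChange, error, required = false }: TextFieldProps) {
  const inputId = useId();
  const errorId = useId();

  return (
    <div>
      {/* The label is programmatically associated with the input, not merely adjacent to it. */}
      <label htmlFor={inputId}>
        {label}
        {required && <span aria-hidden="true"> *</span>}
      </label>
      <input
        id={inputId}
        value={value}
        required={required}
        aria-required={required}
        // Validation state is exposed to assistive technology, not only styled visually.
        aria-invalid={error ? true : undefined}
        aria-describedby={error ? errorId : undefined}
        onChange={(e) => onChange(e.target.value)}
      />
      {/* The error message is announced by screen readers when it appears. */}
      {error && (
        <p id={errorId} role="alert">
          {error}
        </p>
      )}
    </div>
  );
}
```

When a file like this sits in the agent's context, the accessibility attributes are not rules waiting to be interpreted; they are patterns available to be copied.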
While we expected that guidance provided to the agent together with a sample implementation would perform better than guidance alone, we were surprised by just how much of a difference it made.
The implication about what AI agents are actually doing with the information you provide is clear: agents appear to compose from examples far more effectively than they interpret and follow written rules.
This makes sense once you recognize that LLMs (even “reasoning” LLMs) do not reason in the sense of formal logic. Instead, as noted above, they generate a result that honors a constraint function and sits closest, in a computed multi-dimensional space, to the input they were given. It follows that generation will be more effective when the input provides both an instruction and a prototype.
The documentation most engineering teams have built in the past was designed for human developers. Style guides, compliance checklists, and architecture decision records all assume a reader who can interpret an abstract rule and apply it to a specific situation. That's a sound assumption when the reader is a person. Our data suggests it's a flawed one when the reader is an AI coding agent.
Memorization vs. Intelligence
The common mental model is that AI agents read documentation, understand what's required, and reason about implementation. What we observed in our experiment looks closer to pattern composition. The agent looks at the working code in its context and produces output that mirrors those patterns. When the examples include accessibility attributes, the output includes them. When accessibility exists only as a checklist in a separate document, it doesn't reliably transfer even though that document was in the same context window.
LLMs process natural language and therefore prose context matters for scoping tasks and conveying business logic. But when it comes to generating code that conforms to specific technical standards, pattern composition appears to dominate over rule interpretation by a wide margin.
The key takeaway is not that LLMs are unintelligent, but rather that the behavior we observed looks much more like sophisticated composition from examples than reasoning.
The practical consequence here is that most organizations' documentation investments need a parallel layer they don't currently have. The prose standards still serve human team members and still help scope what the AI builds. But those same standards also need to be expressed as executable code embedded directly in the patterns the AI will compose from.
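One way to picture that parallel layer, as a sketch rather than a prescription of specific tooling: keep each prose standard next to a reference implementation that embodies it, and hand the agent both when it generates a component. The file paths and the buildAgentContext helper below are hypothetical, assumed only for illustration.

```ts
// Hypothetical sketch: pair each human-readable standard with an executable
// pattern, and assemble both into the agent's context. Paths and names are
// illustrative assumptions, not an existing tool.
import { readFileSync } from "node:fs";

interface Standard {
  name: string;
  guidancePath: string; // prose rules, written for humans
  patternPath: string;  // working code that embodies those rules
}

const standards: Standard[] = [
  {
    name: "ADA-compliant form fields",
    guidancePath: "standards/accessibility/form-fields.md",
    patternPath: "patterns/accessibility/TextField.tsx",
  },
];

// Build a prompt that gives the agent both the rule and a pattern to compose from.
export function buildAgentContext(userStory: string, selected: Standard[]): string {
  const sections = selected.map((s) => {
    const guidance = readFileSync(s.guidancePath, "utf8");
    const pattern = readFileSync(s.patternPath, "utf8");
    return [
      `Standard: ${s.name}`,
      "Guidance (for scoping and intent):",
      guidance,
      "Reference implementation (compose from this):",
      pattern,
    ].join("\n\n");
  });

  return ["User story:", userStory, ...sections].join("\n\n");
}
```

The helper is trivial by design; the real work is maintaining patterns that genuinely embody the standards, because those patterns are what the agent will reproduce.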
This is a clear signal from one experiment, not a universal law. But if the actual mechanism is pattern composition rather than rule interpretation, then a lot of the current assumptions about AI-assisted development, such as the tooling and the way teams prepare context, may be built on a misunderstanding of what the AI is doing with the information we give it.
If you're investing in AI-assisted development, it's worth asking: are your standards expressed as rules for humans to read, or as patterns for agents to compose from?
As we run more experiments, we'll keep sharing what we find.