From Vibe Coding to Harness Engineering
In a little over a year, AI-assisted coding has moved through four distinct eras — vibe coding, prompt engineering, context engineering, and now harness engineering. Each one has quietly shifted where the real leverage lives, and if you blinked, you may have missed the point where the game stopped being about the model and started being about everything wrapped around it.
It started with vibe coding. Andrej Karpathy coined the term, and for many developers it was their first taste of writing software by feel rather than by syntax. Before that, most of us lived inside tab completions. You'd write a comment at the top of a file and let the model finish the rest, one suggestion at a time.
Prompt engineering was the first real break from that pattern. Instead of coaxing the model line by line, you could describe what you wanted in natural language and get a meaningful chunk of code back. A wave of VS Code extensions landed around the same time, introducing inline prompting and conversational edits directly inside the editor. For anyone who knew what they were doing, it was at least a 5–10x speed-up over tab completion.
Then came context engineering. It was an evolution on top of prompt engineering, but the center of gravity moved. The prompt itself didn't have to be elegant anymore; it just had to be okay. What mattered was the context you fed the model, and context engineering really comes down to four parts: the instructions you set, the knowledge you retrieve, the history you carry forward, and the tools you expose.
This is the era where RAG chatbots really thrived. The prompt engineering era produced them; context engineering made them actually work.
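To make those four parts concrete, here is a minimal sketch of context assembly, assuming a generic chat-completion-style payload. The retrieve() helper and the message shapes are illustrative, not any specific vendor's API:

```python
# Sketch of the four parts of context engineering assembled around one prompt.
# retrieve() is a hypothetical RAG helper returning [{"text": ...}, ...].

def build_context(user_prompt: str, history: list[dict],
                  retrieve, tools: list[dict]) -> dict:
    system = "You are a coding assistant. Ground answers in the provided documents."
    documents = retrieve(user_prompt, top_k=5)             # knowledge you retrieve

    messages = [{"role": "system", "content": system}]     # instructions you set
    messages += history[-10:]                              # history you carry forward
    messages.append({
        "role": "user",
        "content": "\n\n".join(d["text"] for d in documents)
                   + f"\n\nQuestion: {user_prompt}",       # the prompt itself, merely okay
    })
    return {"messages": messages, "tools": tools}          # tools you expose
```

Notice how little of the final payload is the prompt itself. That shift in proportions is the whole era.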
And now we're in the harness engineering era. Every big lab is focused on perfecting its harness, and mid-sized companies will follow in the coming months.
A harness is everything built around the model that helps it do its job. Think of it as the orchestration layer between the user and the LLM. When you type a prompt into Claude Code or Codex, the harness is what decides: if the task needs a file search, call the search function; if the output is code, render it in a bash terminal; if a command needs to be executed, run it through the terminal and report the result; if the request is ambiguous, ask the user a clarifying question, and make it a multiple-choice one so it's easy to answer. All of those decisions live in the harness, not in the model, and together they give the LLM the best possible chance of producing a high-quality final response.
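To make that concrete, here's a minimal sketch of such a loop. The model.respond() client, the tool names, and the helper functions are all hypothetical; this is not how Claude Code or Codex are actually implemented, just the shape of the control flow:

```python
import subprocess
from pathlib import Path

def search_files(query: str, root: str = ".") -> str:
    """Naive grep stand-in; a real harness would use ripgrep or an index."""
    hits = [str(p) for p in Path(root).rglob("*.py")
            if query in p.read_text(errors="ignore")]
    return "\n".join(hits) or "no matches"

def present_choices(question: str, options: list[str]) -> str:
    """Multiple-choice clarifying question, so the user can answer with one keypress."""
    print(question)
    for i, opt in enumerate(options, 1):
        print(f"  {i}. {opt}")
    return options[int(input("> ")) - 1]

def run_harness(model, user_prompt: str, max_steps: int = 10) -> str:
    """One model turn per iteration: dispatch any tool call, feed the result back."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = model.respond(messages)              # hypothetical client: text or tool call
        if reply.tool_call is None:
            return reply.text                        # final answer goes to the user

        call = reply.tool_call
        if call.name == "search_files":              # task needs a file search
            result = search_files(call.args["query"])
        elif call.name == "run_command":             # command execution via the terminal
            proc = subprocess.run(call.args["cmd"], shell=True,
                                  capture_output=True, text=True, timeout=60)
            result = proc.stdout + proc.stderr
        elif call.name == "ask_user":                # ambiguous request: clarify
            result = present_choices(call.args["question"], call.args["options"])
        else:
            result = f"unknown tool: {call.name}"

        # Report the tool result back to the model and loop for the next turn.
        messages.append({"role": "tool", "name": call.name, "content": result})
    return "step limit reached"
```

Real harnesses layer sandboxing, permission prompts, context management, and retries on top, but the control flow keeps this shape: dispatch, execute, report back, loop.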
The whole arc, from vibe coding to harness engineering, tells us something uncomfortable: progress in the base models themselves slowed down roughly a year and a half ago. Most of the real progress since then has come from context engineering and harness engineering, even at OpenAI and Anthropic. You can read it straight off the model naming conventions. No one really wants to say it out loud, but we're at Claude Opus 4.7 low, medium, high, extra high, max... and GPT 5.4 low, medium, high. The labs are clearly finding it harder to make the next leap with the base models, and are putting their energy into the apparatus around them instead.
If that's the ceiling for now, it's probably the right move. The big labs have largely exhausted the public data pool. What's left is steering these models — teaching them to use tools, giving them the right context, and wrapping them in harnesses that make their responses genuinely useful.
The good news for everyone outside the big labs: SaaS companies are now in a position to build their own harnesses, borrowing patterns from the frontier labs and customizing them for their specific domains. As the underlying models evolve, the shape of those harnesses will keep shifting. But the window to build something genuinely custom is open right now, and every SaaS company should be using it. That's what I'm trying to do, at least.
Agree on the era framing. One nuance worth adding: the lock-in isn't only big-lab shaped — OSS harness frameworks are evolving weekly, and today's clean abstraction is next quarter's rewrite. The move I'd bet on is building thin, use-case-driven harnesses you actually own: policy gates, tool routing, context strategy, evals. Borrow patterns from the frameworks, don't inherit their abstractions. The moat is in your use case, not in whichever harness you picked in 2025 or in 2026.
The harness engineering era framing is accurate and the timing is right. The teams that shipped reliable agents in 2025 stopped treating the orchestration layer as boilerplate and started treating it as the product. Policy gates, eval harnesses, context routers, tool call auditing: the harness is where agent reliability either gets built in or does not exist. SaaS companies that let the big labs define the harness shape will find themselves locked into whoever controls the runtime. Building the harness is building the moat.