From Executable Code to Executable Judgment
A few weeks ago I published an article called “Stop Building Mouse Traps.” The core argument: most engineering teams build elaborate platforms before confirming the problem is real. I laid out the trap patterns, the warning signs, the questions to ask. My team jokes about printing the euphemisms on a wall. It made for a decent read. Maybe it sticks in the back of your mind next time someone pitches an internal framework.
Now consider this. Sahil Lavingia took the entire methodology from The Minimalist Entrepreneur and published it as executable Claude Code skills. One of them is called Validate Idea. You invoke it when you have a business idea. It does not tell you what to think about. It asks you who exactly has the problem, what workarounds they use today, whether you can solve it manually before writing a line of code. It pushes back: if you cannot name ten specific people with this problem, stop. If nobody is already paying for an inferior solution, stop. It runs you through a validation framework and gives you a verdict: validated, needs more evidence, or pivot.
My article transfers knowledge. His skill transfers judgment. Both encode operating experience. One you read and hope to remember. The other you invoke at the exact moment you need it, and it applies the author’s reasoning to your specific situation.
That distinction is what this piece is about.
The Real Bottleneck
Data alone is insufficient. Large language models already have more factual knowledge than any individual. They retrieve, summarize, and synthesize faster than any team. The missing layer is judgment: not “What are the facts?” but “Given these facts, what should I do?”
This is the knowledge that has always lived in experienced operators' heads. The exception logic, the pattern recognition, the taste that tells you when something is off before you can articulate why. The SRE who can tell from a dashboard glance whether an incident is real or a monitoring blip. The product manager who kills features before they waste a quarter. None of them would describe what they do as "judgment." They would call it experience.
I ran into this myself. I gave my AI agents every fact I had: financial records, health data, career goals, domain files, calendars. I gave them memory. The output was competent but generic. They knew what I knew. They did not know how I think. You could interview experts and try to extract this knowledge, but the nuance, the sequencing, the conditional logic: it evaporated in translation.
That is changing. Not through better interviews. Through a new medium.
What Garry Tan Actually Open-Sourced
When Garry Tan released gstack in March 2026, the headlines focused on the numbers: 10,000 lines of code per week, 100 pull requests, 20,000 GitHub stars in days. Those numbers are real, and they are also the less interesting part of what happened.
What Tan actually open-sourced was how he thinks about shipping software. Gstack is not a framework. It is not a library. It is fifteen structured prompts that encode his judgment about what matters at each stage of the development process. A CEO skill that rethinks the product before touching code. An engineering manager skill that locks architecture decisions. A designer skill that catches AI slop. A QA skill that opens a real browser.
Each skill is a markdown file. There is no clever engineering. The value is the judgment encoded in those files: decades of experience building products at Y Combinator, compressed into reusable, transferable instructions that any developer can install and invoke.
The repo is MIT licensed. Anyone can fork it. But what they are forking is not code. It is Garry Tan’s operating model.
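To make the medium concrete, here is a minimal sketch of what a skill file in this style might look like. The frontmatter fields and exact wording are my illustration, not taken from the gstack repo; the gating logic is drawn from the Validate Idea skill described earlier:

```markdown
---
name: validate-idea
description: Pressure-test a business idea before any code is written
---

# Validate Idea (illustrative sketch)

When the user presents a business idea, do not brainstorm features.
Work through these gates in order and stop at the first failure:

1. **Name the people.** Ask for ten specific people who have this
   problem. If the user cannot name them, the verdict is stop.
2. **Find the workaround.** Ask what those people do today. If nobody
   is already paying for an inferior solution, the verdict is stop.
3. **Manual first.** Ask whether the service can be delivered by hand
   for the first customers. If yes, that is the next step, not code.

End with exactly one verdict: validated, needs more evidence, or pivot.
```

The point is the flatness of the artifact: a few hundred words of conditional reasoning in plain text, versioned and forkable like any other file.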
What Actually Changes
When judgment becomes executable, three things shift in practice.
Delegation becomes auditable. When I hand a task to my AI agent, the skill defines how it reasons through the problem. I can read the logic. I can see where it weighs one factor over another. If the output is wrong, I do not debug code. I update the judgment. Two sentences in a markdown file. The fix propagates to every future invocation. Inside an enterprise, this changes what delegation means. A manager does not explain their reasoning in a meeting and hope it sticks. The reasoning lives in a skill that every agent on the team applies consistently.
Judgment becomes repeatable. Before I encoded my financial planning skill, I would re-derive the same tradeoffs every time a tax question came up: Roth versus traditional, conversion timing around income changes, RSU sequencing around vesting schedules. Now the agent applies my reasoning consistently. The quality of the output no longer depends on whether I remembered all the variables on a given day. It depends on whether the skill captures them. For a contact center, this is the difference between one case worker who knows when a prior authorization will get denied on appeal and an entire team that applies the same judgment to every case.
Operator taste becomes transferable. This is the one that matters most. When Garry Tan publishes gstack, any developer can ship software using his judgment about what matters at each stage. When Sahil publishes his entrepreneurship skills, any founder can apply his framework to their specific situation. The taste, the prioritization instinct, the pattern recognition that took years to develop: it moves. Not perfectly. But enough to close the gap between a novice operating alone and a novice operating with an expert’s judgment embedded in their tools. For organizations, this is the most expensive problem in the building. Operational judgment walks out the door every time someone leaves. Skills do not leave. They compound.
The book (or the manual) teaches the framework. The skill applies it.
You Do Not Need the Expert to Write It Down
Everything so far assumes the expert sits down and intentionally encodes their judgment. Garry Tan writes gstack. Sahil publishes his nine skills. That works when the expert is willing and self-aware enough to articulate how they think.
That is not always the case. Most operational judgment inside an enterprise is never written down, because the people who hold it often do not know they hold it. They just do the work.
There is a second path. Instead of asking experts to write their mental models top-down, you trace their work bottom-up. Log the conversations, the decisions, the context surrounding each one. Then generalize backward: what patterns emerge? What principles explain why this person consistently makes better calls than their peers?
This is implementable today with humans doing the pattern extraction. A team lead reviews a quarter's worth of decision traces from their best operator, identifies the recurring tradeoffs and heuristics, and formalizes those into a skill. The expert does not need to introspect. Their work speaks for itself. The skill gets validated against new situations, refined, and deployed.
The sequence matters: trace first, encode second, automate third. You do not start by handing an AI agent a vague mandate to “make decisions like Sarah.” You start by understanding how Sarah actually decides, turning that into an inspectable skill, and then letting an agent apply it. The agent is the last step, not the first.
This is where decision traceability becomes critical. Not just logging what happened, but logging the context, the alternatives considered, the factors weighed, and the reasoning that connected them. When that data exists, generalizing it into reusable judgment is a human-scale problem today, and an AI-scale problem tomorrow.
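The trace-first approach can be sketched in a few lines of code. This is a minimal, hypothetical schema, not the API of any real tool; the field names and the toy contact-center example are assumptions chosen to mirror the kind of record described above:

```python
from dataclasses import dataclass

@dataclass
class DecisionTrace:
    """One logged decision: not just the outcome, but the reasoning around it.
    Hypothetical schema for illustration."""
    situation: str            # the context the operator faced
    alternatives: list[str]   # options that were actually considered
    factors: dict[str, str]   # factor -> how it was weighed
    decision: str             # what the operator chose
    reasoning: str            # why, in the operator's own words

def extract_heuristics(traces: list[DecisionTrace]) -> dict[str, int]:
    """First-pass pattern extraction: count which factors recur across an
    operator's decisions. A human reviewer (or, later, an agent) reads the
    most frequent factors and formalizes them into a skill."""
    counts: dict[str, int] = {}
    for trace in traces:
        for factor in trace.factors:
            counts[factor] = counts.get(factor, 0) + 1
    # Most frequent factors first
    return dict(sorted(counts.items(), key=lambda kv: -kv[1]))

traces = [
    DecisionTrace(
        situation="Prior-auth request for imaging, borderline criteria",
        alternatives=["approve", "deny", "request more records"],
        factors={"payer history": "this payer tends to deny on appeal",
                 "documentation": "chart notes incomplete"},
        decision="request more records",
        reasoning="Incomplete notes plus a strict payer means a denial on appeal.",
    ),
]
print(extract_heuristics(traces))
```

The extraction step here is deliberately dumb (frequency counting); the claim in the text is only that once traces with this shape exist, generalizing them is tractable, first for a human reviewer and eventually for a model.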
The Principle
For years, the world’s best operating knowledge was locked inside experienced practitioners. Inspectable only through conversation. Transferable only through apprenticeship. Lost entirely when people moved on.
Open source solved this for code. Executable judgment solves it for operational competence.
The people who sit down and write their mental models as skills are doing the easy version. The harder, more valuable version is building the traceability infrastructure that lets you extract judgment from how your best people already work. Trace first. Encode second. Automate third. The organizations that get this sequence right will not just have better AI agents. They will have turned their most perishable asset into their most durable one.