Building for the Wrong Model
Most enterprise AI investments are compensating for limitations that are about to disappear. When they do, the scaffolding becomes the problem.
Every enterprise AI deployment I have read about in the last six months has the same architecture. A thick layer of prompt scaffolding to compensate for model confusion. A retrieval pipeline to spoon-feed context the model cannot find on its own. Hardcoded business rules because the model cannot infer them reliably. Procedural guardrails bolted on after the model did something unexpected in production. Human review gates at every stage because nobody trusts the output enough to let it ship.
This architecture is rational. It reflects the models we have today. It also means that most enterprise AI investment is not building capability. It is building compensation. And compensation has a shelf life.
The Rough Edges Are Disappearing
Claude Mythos found a 27-year-old bug in OpenBSD that survived decades of human review. It produced 181 working exploits against Firefox where the previous best model managed two. It was not trained for cybersecurity. It was trained to understand code deeply, and the security capability emerged as a side effect of reasoning well.
This is not an incremental improvement. It is the kind of step change that makes entire categories of scaffolding unnecessary. A model that reasons this well about code does not need a 3,000-token procedural prompt telling it to classify intent, then check for hallucinated URLs, then verify against the knowledge base. It needs to know the goal and the constraints. The rest it can work out.
The rough edges we have spent the last two years building around (the hallucinations, the context window limitations, the inability to follow multi-step processes reliably) are not permanent features of AI. They are symptoms of current model capability. And current model capability is about to change significantly.
The Stranded Investment Problem
Here is the uncomfortable pattern. The more an organization has invested in compensating for model limitations, the harder it will be to take advantage of models that no longer have those limitations.
That 3,000-token system prompt is not just unnecessary with a smarter model. It is actively harmful. It over-constrains the model, forces it into procedural steps it no longer needs, and wastes context window on instructions that a more capable model would handle implicitly.
I learned this the hard way. I had built an elaborate planning skill for my development workflow: detailed SDLC commands, step-by-step procedures, structured phases. One day I forgot to invoke it. Claude planned the work better without it than it ever had with it. The detailed procedure was not helping. It was micromanaging. I ended up throwing away most of the commands I had built.
The lesson generalizes. AI is becoming more like skilled people. We do not micromanage a senior engineer’s implementation steps. We tell them what we need and why it matters, and they figure out the how. They choose the tools, the sequence, the architecture. That is exactly what capable models now do when we get out of the way. Every procedural instruction we leave in the prompt is a vote of no confidence in a system that no longer needs it.
The same pattern repeats across the stack. Complex RAG pipelines that pre-chunk, re-rank, and filter before the model ever sees the data: a model with better reasoning and larger effective context can often find what it needs from a well-organized repository without all that machinery. Hardcoded business rules embedded in prompts: a smarter model infers them from a single example. Human review gates at every handoff: when the model’s logic error rate drops, those gates become bottlenecks rather than safety nets.
The scaffolding that was infrastructure becomes technical debt overnight.
What Enterprises Should Be Building Instead
The organizations that will pivot fastest are the ones building around outcomes rather than around limitations. The difference is structural.
Outcome specifications, not procedural prompts. Instead of telling the model how to handle a customer inquiry step by step, specify what a good resolution looks like and what constraints it must respect. “Resolve this customer’s issue using our knowledge base and policies. The customer should leave satisfied. The resolution must comply with our return policy.” Compare that to the 14-category intent classifier, the five-article retrieval step, the response template. One survives a model upgrade. The other becomes a liability.
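To make the contrast concrete, here is a minimal sketch of what an outcome specification might look like in code. The function name and fields are illustrative, not any particular framework's API: the point is that the prompt states the goal, the definition of success, and the constraints, and never the steps.

```python
# Illustrative sketch: an outcome-oriented prompt builder.
# It captures what and why, and deliberately has nowhere to put "how".

def outcome_spec(goal: str, success: str, constraints: list[str]) -> str:
    """Render a prompt as goal + success criteria + constraints."""
    lines = [f"Goal: {goal}", f"Success looks like: {success}", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)

prompt = outcome_spec(
    goal="Resolve this customer's issue using our knowledge base and policies.",
    success="The customer leaves satisfied.",
    constraints=["The resolution must comply with our return policy."],
)
print(prompt)
```

Notice what is absent: no intent taxonomy, no retrieval step count, no response template. Those are exactly the parts that would have to be rewritten at the next model upgrade.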

Constraints and guardrails that are model-agnostic. “Never disclose customer financial data” is a business rule that holds regardless of how smart the model gets. “Always classify intent into one of 14 categories before responding” is a process artifact that exists because the model needed it. Learn to tell the difference. Keep the business rules. Delete the process compensation.
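One way to operationalize that distinction is to tag each guardrail with whether it is model-agnostic, so the process compensation can be deleted wholesale at upgrade time. A hedged sketch with made-up example rules:

```python
# Sketch: separate durable business rules from model-specific
# process compensation. Rules here are illustrative examples only.
from dataclasses import dataclass

@dataclass
class Guardrail:
    text: str
    model_agnostic: bool  # True = business rule, False = process compensation

GUARDRAILS = [
    Guardrail("Never disclose customer financial data.", True),
    Guardrail("Always classify intent into one of 14 categories "
              "before responding.", False),
]

def rules_after_upgrade(guardrails: list[Guardrail]) -> list[str]:
    """Keep the business rules; drop the compensation."""
    return [g.text for g in guardrails if g.model_agnostic]
```

The tagging forces the conversation at write time, when the distinction is still obvious, rather than at upgrade time, when nobody remembers why a rule exists.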
Evaluation at the end, not checkpoints along the way. When models produce output that is right 99% of the time instead of 85%, intermediate review gates create more drag than value. Build one comprehensive evaluation at the end of the pipeline that tests everything: functional requirements, non-functional requirements, edge cases. If it passes, ship. If it does not, send it back. This is how you scale without making humans the bottleneck.
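The shape of that single gate can be sketched in a few lines. The check functions here are hypothetical stand-ins for real functional, policy, and safety tests; what matters is that they all run once, against the final output.

```python
# Sketch: one comprehensive evaluation at the end of the pipeline.
# Check names and predicates are illustrative placeholders.

def evaluate(output: str, checks: list[tuple]) -> tuple[str, list[str]]:
    """Run every check against the final output; ship only if all pass."""
    failures = [name for name, check in checks if not check(output)]
    return ("ship", []) if not failures else ("send_back", failures)

checks = [
    ("functional", lambda o: "refund" in o),          # does it solve the issue?
    ("policy",     lambda o: "30 days" in o),         # does it cite the policy?
    ("no_pii",     lambda o: "SSN" not in o),         # does it leak nothing?
]
```

A failed run returns the list of failed checks, which is the feedback the model needs to retry, rather than a human in the loop at every intermediate step.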
Tools with clear interfaces, not orchestration logic. Define what your tools do. Let the model decide when to call them and in what order. The model is increasingly better than we are at sequencing tool calls. Our job is to make sure the tools themselves are reliable and well-documented.
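In practice that means investing in the tool definition itself: a name, a precise description, and a typed parameter schema, in the JSON-schema style most function-calling APIs accept. A sketch with an invented example tool:

```python
# Sketch of a tool definition in the JSON-schema style common to
# function-calling APIs. The tool name and fields are illustrative.
# The model decides when and in what order to call it; we only
# guarantee that the interface is reliable and well-documented.

LOOKUP_ORDER_TOOL = {
    "name": "lookup_order",
    "description": "Return the status and line items of an order by its ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The customer-visible order identifier.",
            },
        },
        "required": ["order_id"],
    },
}
```

Note that there is no orchestration logic here, no "call this after classification". The sequencing is the model's job; the contract is ours.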
The Org Problem Nobody Is Talking About
The technical pivot is the easy part. The organizational pivot is where enterprises will struggle.
Most AI teams today are structured around compensating for model limitations. There are prompt engineers refining system prompts. There are pipeline engineers building retrieval architectures. There are review teams evaluating model output at every stage. When the models improve enough to make much of that work unnecessary, those roles do not disappear. They transform. The prompt engineer becomes an outcome specifier. The pipeline engineer becomes a tool designer. The reviewer becomes an evaluator of final output rather than intermediate checkpoints.
But this transformation requires organizational awareness that the ground is shifting. The teams that are rewarded for building ever-more-complex scaffolding need to hear, clearly, that simplification is the goal. That deleting a thousand lines of prompt is more valuable than adding a hundred. That the art of working with these models is increasingly about what we leave out.
In Montreal the research community at Mila has been studying these models longer than most. The interpretability work coming out of Bengio’s lab is not abstract. It is telling us, concretely, what these models can and cannot do, and the boundary is moving faster than most enterprise roadmaps account for.
The Window
Mythos is not the last model of its class. OpenAI, Google, and others will ship comparable capabilities within months. The question is not whether this step change is coming. It is whether your architecture is ready to benefit from it or whether it will need to be rebuilt first.
The organizations that invested in clean data, clear outcome specifications, model-agnostic guardrails, and end-to-end evaluation will plug in a smarter model and immediately see the benefit. The organizations that invested in compensating for a specific model’s weaknesses will find that their compensation layer is now the thing standing between them and the next generation of capability.
The window to simplify is now. Not when Mythos ships. Now. Because simplification takes longer than anyone expects, and the models are not waiting.
If you have been following The Hitchhiker’s Guide to the K-Shaped Economy, this is the lesson that ties the series together. Specification, Judgment, Decomposition, Orchestration, Intent, and Evaluation: every one of those skills becomes more valuable as models improve, and every piece of scaffolding that substitutes for them becomes less valuable.