Architecture After Code in the AI Era

I read Annie Vella’s article Finding Comfort in the Uncertainty, and I realized my first reaction to it was too generic. It’s easy to read her piece as a meditation on ambiguity, or as another reminder that “things are changing fast.” But that’s not what she actually wrote. Her article is much more concrete. It’s a report from a very specific place and time: a room full of experienced practitioners who came together expecting answers, and instead discovered that nobody really has them yet. Annie describes conversations that mix technical skepticism (“AI-generated code is pretty crap”) with very real pressure (“people are losing their jobs”).

“And nobody has it all figured out.”

The problem isn’t uncertainty. It’s where it shows up now.

One of Annie’s key observations is that for many teams, engineering capacity is no longer the bottleneck. Agents can generate code faster than organizations can absorb change. Backlogs disappear, but delivery doesn’t speed up in the way people expected.

What slows things down instead is decision-making, verification, coordination, and trust.

Architecturally, this is a big shift. For years we optimized systems around the idea that writing code was expensive. AI breaks that assumption. Code is now cheap. Understanding is not.

This is why reviews feel heavier, not lighter. This is why people approve changes they don’t fully understand. The bottleneck didn’t go away. It moved.

Architecture has to move with it.

When code is cheap, intent becomes the scarce resource

The most architecturally interesting part of Annie’s article is where she describes sessions questioning the primacy of source code.

She quotes people talking about code as “just another projection” of intended behavior. That sounds abstract until you look at what agents actually do. Given enough context, they can regenerate an implementation again and again. What they can’t reliably regenerate is intent unless we make it explicit. This is where TDD and BDD quietly change meaning. Tests stop being a safety net for code. They become the durable artifact. They are the thing that survives regeneration.

A TDD test pins down a local contract. A BDD scenario pins down domain meaning. Together, they act as executable intent. In an AI-assisted workflow, this flips the usual hierarchy. Code becomes disposable. Tests become architectural.
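The flip from code-as-artifact to tests-as-artifact can be sketched in plain Java. Everything here is illustrative, not from Annie’s article: ReturnPolicy is a hypothetical domain rule, and the checks stand in for a real TDD/BDD suite.

```java
// A minimal sketch of "executable intent" in plain Java.
// ReturnPolicy and its 30-day rule are hypothetical, chosen to show
// the two kinds of pins: a local contract and a domain meaning.
public class ExecutableIntent {

    // Disposable implementation: an agent could regenerate this class body.
    record ReturnPolicy(int windowDays) {
        boolean accepts(int daysSincePurchase) {
            return daysSincePurchase >= 0 && daysSincePurchase <= windowDays;
        }
    }

    // Durable artifact: these checks survive regeneration of the code above.
    static void check(boolean condition, String intent) {
        if (!condition) throw new AssertionError("violated intent: " + intent);
    }

    public static void main(String[] args) {
        ReturnPolicy policy = new ReturnPolicy(30);

        // TDD-style: pins the local contract at its boundary values.
        check(policy.accepts(30), "day 30 is still inside the window");
        check(!policy.accepts(31), "day 31 is outside the window");

        // BDD-style: pins domain meaning. Given a 30-day policy, when a
        // customer returns after 10 days, then the return is accepted.
        check(policy.accepts(10), "returns within the window are accepted");

        System.out.println("intent verified");
    }
}
```

If the implementation is regenerated, the checks still define what “correct” means; the class body is the cheap part.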

Architecture moves into the “middle loop”

Annie uses the phrase “supervisory engineering” to describe the kind of work she brought into the retreat. That phrase sticks because it names something many teams already feel but haven’t articulated.

She describes a new “middle loop” emerging. The inner loop of coding is increasingly automated. The outer loop of delivery still exists. But between them sits a growing layer of supervision: steering agents, evaluating output, deciding what to trust, and understanding what actually changed.

That middle loop is architecture now.

This is also where risk accumulates. Annie mentions skill decay and trust debt. If people stop understanding the systems they ship, their ability to supervise collapses. And once supervision collapses, speed turns into chaos.

So the architectural conclusion isn’t “we should code faster with agents.” It’s: we have to redesign the system of work so human judgment remains sustainable. Because we just relocated the hardest work into the human brain.

Why Java suddenly makes a lot of sense again

The case for Java in the AI era is not that it’s trendy. It’s that it behaves like a constraint system with an enormous ecosystem of constraint tooling. AI thrives when the world is crisp.

Java is one of the few languages where constraints are a first-class design goal

“The Java programming language is strongly and statically typed.”

Why that matters for AI-assisted development is straightforward: the more mistakes you can surface as deterministic compiler feedback, the less you rely on human intuition to “smell” correctness in machine-generated output.

In practice, Java gives AI:

  • A narrower space of valid programs (because types, visibility, and declarations constrain what can compile).
  • Higher-quality, earlier failure signals (compiler errors are fast, specific, and repeatable).
  • More reliable refactoring surfaces (because the compiler becomes part of your verification pipeline).

This doesn’t eliminate uncertainty, but it reshapes it into something that can be iterated on cheaply. In an AI-assisted workflow, every mistake caught deterministically is one less thing a human has to reason about in the middle loop.
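One concrete way Java narrows the space of valid programs is sealed types combined with exhaustive switches (final since Java 21). The PaymentEvent hierarchy below is a hypothetical example, not from the article:

```java
// Sketch: encoding a constraint so the compiler, not a reviewer, enforces it.
// PaymentEvent and its variants are illustrative names.
public class CompilerConstraints {

    sealed interface PaymentEvent permits Authorized, Captured, Refunded {}
    record Authorized(long cents) implements PaymentEvent {}
    record Captured(long cents) implements PaymentEvent {}
    record Refunded(long cents) implements PaymentEvent {}

    // The switch must be exhaustive over the sealed hierarchy: if an agent
    // adds a new event type, this method stops compiling until the new
    // case is handled. The mistake surfaces as deterministic compiler
    // feedback instead of a reviewer's intuition.
    static String describe(PaymentEvent event) {
        return switch (event) {
            case Authorized a -> "authorized " + a.cents();
            case Captured c -> "captured " + c.cents();
            case Refunded r -> "refunded " + r.cents();
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Captured(1250)));
    }
}
```

Note there is no `default` branch: omitting it is the point, because it forces the compiler to re-verify exhaustiveness on every change.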

Benchmarks suggest AI is already broadly competent in Java, and the hard part moved elsewhere

Adam Bien recently showed the AutoCodeBench results in his Jfokus session. Across popular languages, the difference between top models’ average Pass@1 scores is small, which the authors interpret as evidence that models have been adequately trained on widely used languages.

This supports a pragmatic architectural stance: if you’re building on Java, the baseline “AI can write plausible code in this ecosystem” assumption is more defensible than it is in lower-resource languages. The differentiator is not whether AI can write Java syntax; it’s whether your architecture makes intent verifiable.

AutoCodeBench’s other results reinforce the point. Performance drops significantly on multi-logical problems (tasks that require implementing multiple distinct functions/classes), and the authors explicitly connect this to real-world agent scenarios. That is exactly where architecture lives: orchestration, boundaries, cross-cutting behaviors, and invariants, i.e., the things LLMs struggle with when structure is implicit.

Quarkus makes Java even more “AI-shaped” by standardizing the shape of applications

When you pair Java with modern frameworks like Quarkus, you’re not just picking a runtime. You’re picking an opinionated structure generator that reduces the number of architectural degrees of freedom.

Quarkus REST is explicitly an implementation of Jakarta REST built to be tightly integrated and to move “a lot of work to build time.” That build-time shift is architectural leverage: it’s where metadata becomes enforceable, where code generation becomes consistent, where mistakes can be caught before runtime.

Quarkus also leans into standards and predictable annotation-based programming models. Jakarta REST itself is defined as a foundational API for RESTful web services under Jakarta EE. Stable standards matter for AI because agents generate better code when the “right way” is publicly documented, conventional, and consistent across projects.
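As a rough sketch of what that conventional, annotation-based shape looks like, here is a minimal Jakarta REST resource of the kind Quarkus standardizes. The `/orders` endpoint and class name are invented for illustration, and the class only compiles inside a Quarkus (or other Jakarta REST) project:

```java
// Illustrative Jakarta REST resource; the annotations are the standard
// Jakarta REST API, and Quarkus processes them at build time.
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/orders")
public class OrderResource {

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public String list() {
        // Placeholder body; a real handler would return domain objects.
        return "[]";
    }
}
```

Because the shape is this conventional, an agent generating a new resource has very few degrees of freedom to get wrong, and a human supervisor has very little novelty to review.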

Add in frictionless local environments (Dev Services auto-provisioning) and you get a platform that is structurally legible to both humans and agents.

This is the deeper point: good frameworks reduce the cost of supervision. They compress an enormous space of possible architectures into a smaller set of conventional, verifiable shapes.

APIs that are easy for AI are the same APIs that are easy to govern

Vella’s “ledger” theme is the governance endpoint of this whole conversation:

“we need a complete, verifiable record of everything agents do.”

This “ledger” idea is not mystical. It’s an extension of classic audit trail thinking: the ability to reconstruct events and changes is foundational to accountability, recovery, and trust.

Java + Quarkus + standards-based APIs push you toward systems where:

  • Changes are version-controlled and compile-verified.
  • Behaviors are executable via tests (TDD/BDD).
  • Interfaces are describable and machine-checkable (e.g., via generated API specifications).
  • Platform defaults reduce variance, which reduces governance surface area.

That is exactly the architectural direction implied by Vella’s retreat notes: if the world can now generate unlimited software, then our job becomes building the constraint systems that make that software trustworthy, habitable, and accountable.

In the AI era, architecture doesn’t disappear.

It moves upstream.
