Forward-Deployed Engineering, Human Judgment, and the Reality of Building with AI

By: Nathan Stricker, Chief Portfolio Officer, Sogeti, part of Capgemini

Executive Summary

In the span of five intensely focused days, I designed, built, and deployed a mission‑critical, live interaction app to support Sogeti’s annual Global Kickoff. On the day of the event, over 200 leaders interacted with it in real time to ask questions, allocate investments, and converge on a set of portfolio priorities that would shape real follow‑on action.

The app worked. That matters—but it is not the most interesting part of the story.

What mattered more was how it came together under pressure, what nearly derailed it, and what the experience revealed about building software in an AI‑augmented world when the outcome actually counts. This is not advice from a distance. It is a firsthand account of the decisions, missteps, resets, and tradeoffs involved in forward‑deployed engineering with AI in the loop.

Along the way, I confronted several questions many enterprises are quietly wrestling with today:

How fast is too fast? When does acceleration turn into fragility? And why do traditional engineering disciplines still matter when AI can generate working code in minutes?

The core conclusion is simple, but earned:

AI can dramatically accelerate delivery, but only human judgment, orchestration, and disciplined, end-to-end Quality Engineering make results trustworthy and recoverable when change is inevitable.

Why I Did This

I did not approach this effort as a theoretical exercise. I did it deliberately to put my money where my mouth is.

I spend a lot of time talking with clients and teams about AI‑accelerated delivery, modern engineering practices, and the growing enthusiasm around “vibe coding.” I am often asked where the real limits are, where quality breaks down, and how much governance is actually necessary.

I did not want to answer those questions from concepts, secondhand examples, or abstract frameworks. I wanted firsthand exposure to both the excitement and the discomfort. I wanted to feel the pull to move faster than was safe. I wanted to experience how easily momentum can be mistaken for progress. And I wanted to do it in a context where failure would be visible, not hypothetical.

A live executive event provides exactly that kind of constraint. There is no hiding behind roadmaps. Whatever you ship is what leaders experience. That pressure changes how you think, how you design, and how you decide.

The Context: Why This Was Different

The app was not intended to be a demo or an innovation showcase. It existed to support a very specific moment: a live, tightly choreographed executive forum where senior leaders would listen, engage, challenge ideas, and ultimately commit to priorities.

That meant the system had to do several things simultaneously:

  • Support multiple roles with real authority, not just viewers
  • Enforce timing and structure without drawing attention to itself
  • Make participation feel fair and transparent
  • Produce outputs that leaders could trust immediately

There would be no opportunity to “explain the app” during the event. If something was confusing, slow, or ambiguous, it would break the flow—and credibility—instantly.

This context eliminated a lot of otherwise tempting options. Clever features, experimental UX, and optimistic assumptions all became liabilities.

How the Work Actually Started

The first step was not writing code. It was deciding what not to allow.

Before any implementation, I spent time defining roles, phases, and boundaries. Who is allowed to act when? What must be locked? What is authoritative, and what is merely draft? These questions sound procedural, but they are foundational when you expect hundreds of people to interact simultaneously.
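Those questions—who may act, when, and what is locked—can be made concrete before a single feature is built. The sketch below shows one way to express them as an explicit, deny-by-default permission table; all phase, role, and action names are illustrative assumptions, not the actual ones used for the event.

```python
from enum import Enum, auto

# Hypothetical phases and roles -- the article does not disclose the real ones.
class Phase(Enum):
    BRIEFING = auto()
    QUESTIONS = auto()
    ALLOCATION = auto()
    LOCKED = auto()

class Role(Enum):
    FACILITATOR = auto()
    LEADER = auto()
    VIEWER = auto()

# Who is allowed to act, and when: one explicit table instead of
# scattered if-statements. Anything not listed here is denied.
PERMISSIONS: dict[tuple[Phase, Role], set[str]] = {
    (Phase.QUESTIONS, Role.LEADER): {"submit_question"},
    (Phase.ALLOCATION, Role.LEADER): {"draft_allocation", "revise_allocation"},
    (Phase.QUESTIONS, Role.FACILITATOR): {"advance_phase"},
    (Phase.ALLOCATION, Role.FACILITATOR): {"advance_phase", "lock_results"},
}

def is_allowed(phase: Phase, role: Role, action: str) -> bool:
    """Deny by default; allow only what the table explicitly grants."""
    return action in PERMISSIONS.get((phase, role), set())
```

The value of a table like this is that it is reviewable and testable on day one, before hundreds of simultaneous users ever touch the system.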

At this stage, AI was enormously helpful. Drafting specs, stress‑testing flows, and challenging assumptions went quickly. The danger, in hindsight, was that things felt too smooth. It was easy to believe that good prompts and clear issues would naturally translate into clean execution.

That assumption did not survive contact with reality.

The First Major Breakdown: Parallelism Without Control

Once the core issues and agent prompts were written, I made what seemed like a modern, rational decision: assign everything in parallel and let execution scale.

On paper, it looked efficient. Each agent had a clear task. The boundaries were documented. The intent was explicit.

In practice, the system began to drift almost immediately.

Each agent optimized locally. Small interpretation differences accumulated. Shared surfaces were touched in ways that technically made sense but broke overall coherence. Integration became increasingly fragile, and reviewing progress started to feel like archaeology.

The real warning sign was a simple question I could no longer answer confidently: Can I trust the current state of the system?

At that point, the only responsible move was to stop. Roughly two days of work were discarded. The codebase was reset.

That decision hurt—but pushing forward would have been worse.

The Reset: Slowing Down to Regain Control

After the reset, the operating model changed completely.

All work flowed through a single orchestrator directly overseen by me. Agents no longer operated independently. Instead, work was tightly sequenced. Each step had a clear starting point, a narrow scope, and an explicit definition of done.
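The shape of that sequenced model can be sketched in a few lines: each step carries a narrow scope and an explicit definition of done, and the next step cannot start until the previous one verifiably finished. This is a minimal illustration of the pattern, not the actual orchestration tooling used.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[], None]          # the narrow unit of work
    done: Callable[[], bool]         # explicit, checkable definition of done

def orchestrate(steps: list[Step]) -> list[str]:
    """Execute steps strictly in order, verifying each before advancing."""
    completed: list[str] = []
    for step in steps:
        step.run()
        if not step.done():
            # Fail loudly at the step boundary instead of letting
            # drift compound silently across later steps.
            raise RuntimeError(f"Step '{step.name}' did not meet its definition of done")
        completed.append(step.name)
    return completed
```

The point is not the code but the contract: there is never a moment where the state of the system is unknown, because every transition is gated by a verifiable check.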

Progress felt slower at first. There was less visible activity. Fewer things were happening at once.

But something important returned almost immediately: confidence.

At every point, I knew what had changed, why it had changed, and what state the system was in. Integration issues surfaced early instead of compounding silently. Decisions became easier because the blast radius was controlled.

This was not the fastest possible way to work. It was the safest way to move quickly under pressure.

Where Quality Engineering Showed Its Value

One of the most important lessons from this effort is how quietly Quality Engineering carried the load.

There was no formal “testing phase.” Instead, quality was embedded end-to-end throughout the exercise:

  • Unambiguity and testability of all requirements
  • Clear contracts instead of inferred behavior
  • Server‑side enforcement instead of optimistic UI flows
  • Explicit failure modes instead of silent degradation
  • Adaptable, continuous testing
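“Server‑side enforcement instead of optimistic UI flows” and “explicit failure modes” can be illustrated together. In the sketch below, the server is the sole authority on the current phase, so a stale or optimistic client cannot sneak an action through; phase names and error codes are assumptions for illustration only.

```python
# Server-side state: the authoritative phase is never trusted from the client.
CURRENT_PHASE = "allocation"
ALLOWED_ACTIONS = {"allocation": {"revise_allocation"}, "questions": {"submit_question"}}

def handle_request(action: str, client_phase: str) -> dict:
    """Validate every action server-side, failing explicitly rather than silently."""
    if client_phase != CURRENT_PHASE:
        # Explicit failure mode: the client learns exactly why it was
        # refused and can resynchronize, instead of degrading silently.
        return {"ok": False, "error": "stale_phase", "server_phase": CURRENT_PHASE}
    if action not in ALLOWED_ACTIONS.get(CURRENT_PHASE, set()):
        return {"ok": False, "error": "action_not_allowed"}
    return {"ok": True}
```

The design choice is that rejection is a first-class, named outcome: a refused request carries enough information for the client to recover on its own.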

AI can generate code quickly. What it cannot do is judge whether a system will remain trustworthy when stressed by real users, real timing constraints, and real authority.

That judgment comes from experience—and from treating quality as a first‑order design concern, not a clean‑up activity.

A Micro-Moment That Changed the Shape of the System

One moment in particular crystallized what forward-deployed engineering actually feels like.

In my original vision for the experience, audience interaction was rich and expansive. Participants could generate questions, up‑vote and down‑vote them, draft investments continuously, revise allocations in real time, and generally interact with the system throughout the flow of the session. On paper, it was engaging, democratic, and powerful.

Then we rehearsed.

Very quickly, it became clear that while each interaction made sense in isolation, the combined experience was cognitively heavy. People hesitated. Attention shifted from the conversation on stage to the mechanics on their screens. What felt elegant in design started to feel busy in practice.

This created a moment of real tension. The event was close. The system was largely built. And yet the end experience was not quite right.

The obvious risk was feature creep at the worst possible time. Last‑minute changes are how systems unravel. I had seen that movie before.

At the same time, doing nothing would have meant shipping an experience that was technically impressive but experientially wrong.

The only viable option was to simplify—surgically.

Because the guardrails, sequencing, and contracts were already in place, I was able to instruct the agents to selectively feature‑toggle pieces of interaction out of the live experience without destabilizing the system. Some capabilities remained in the codebase but were intentionally hidden. Others were deferred entirely.
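The “hide, don’t delete” pattern behind that decision is simple to sketch: capabilities stay in the codebase but are switched out of the live experience. The flag names below are hypothetical, chosen only to mirror the kinds of interaction described above.

```python
# Illustrative feature flags: built capabilities that can be hidden from the
# live experience without touching the code paths that implement them.
FEATURE_FLAGS: dict[str, bool] = {
    "audience_questions": True,      # kept in the live experience
    "question_voting": False,        # built, but intentionally hidden
    "continuous_allocation": False,  # deferred entirely
}

def visible_features(flags: dict[str, bool]) -> list[str]:
    """Only enabled capabilities reach the UI; disabled code stays intact."""
    return [name for name, enabled in flags.items() if enabled]
```

Because the toggle is configuration rather than code surgery, turning a capability back on later is a one-line change with a small, known blast radius.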

Before making those changes, I tagged the system in GitHub. That tag was not ceremony—it was psychological safety. It meant I could move forward knowing there was a clean recovery point if something went wrong.

What followed was one of the most reassuring moments of the entire effort. The changes landed cleanly. Nothing else broke. The system remained coherent.

That was the payoff of discipline earlier in the process. Because sequencing had been enforced and boundaries respected, late refinements did not cascade into chaos.

The result was an experience that felt more straightforward, peaceful, and efficient—not because it offered less, but because only the essentials appeared when they were most needed.

The Live Event

By the time the Global Kickoff arrived, the system felt calm—and that calm was intentional.

What was on screen had been deliberately reduced to what mattered most in the moment. Interaction was focused. Cognitive load was low. Participants knew where to look, when to act, and when to simply pay attention to the conversation on stage.

Over 200 people interacted with the app in real time. Questions flowed when they were supposed to. Decisions and commitments were made without hesitation. Phases progressed without explanation or interruption.

Equally important were the things that did not happen. There was no visible confusion. No frantic last-second troubleshooting. No need to explain mechanics mid-session.

The system did not demand attention. It supported it.

That outcome was not the result of clever features or late heroics. It was the compound effect of earlier discipline: clear boundaries, enforced sequencing, embedded quality, and the willingness to simplify when the experience called for it.

Think Twice Before Doing This Alone

From the outside, it is tempting to look at this experience and conclude that the lesson is simply to “move faster with AI.” That conclusion misses the point.

In the end, the app I built was relatively simple. How does my experience scale up to enterprise-grade, mission-critical systems?

The hardest parts of this journey had nothing to do with writing code. They had everything to do with recognizing when speed was creating fragility, knowing when to reset, and having the discipline to impose structure when optimism was highest.

Most software teams do not fail at efforts like this because they lack talent or motivation. They fail because the experience required to govern AI‑accelerated delivery under pressure is still rare.

What looks like velocity from the outside can quickly become instability on the inside.

A Question I Had to Answer for Myself

One question lingered throughout this effort, and I think it is one many enterprises are quietly wrestling with right now:

Why not take a shortcut?

There are now excellent platforms that promise rapid application creation with minimal ceremony. Tools like Base44, Lovable, Replit, or Bolt can produce working software at remarkable speed. In some contexts, they are exactly the right choice.

I considered them.

Ultimately, I chose not to use them—not because they are flawed, but because they optimize for a different set of tradeoffs than the one I was facing.

This effort was not about seeing whether something could be built. It was about understanding how to react when something goes wrong—and why we advise clients to follow this pattern.

I needed to be able to stop. I needed to be able to reset. I needed to be able to simplify late without breaking unrelated behavior. I needed to know, with confidence, what state the system was in at any moment.

Those needs pushed me toward a more traditional, structured, and frankly heavier approach: explicit repositories, clear ownership, deliberate sequencing, version control, and strong boundaries.

What I gained in return was not elegance or speed. It was recoverability.

When parallel execution collapsed into drift, I could reset.

When rehearsal revealed cognitive overload, I could feature-toggle instead of panic.

When late changes were required, I could make them surgically rather than globally.

Many rapid-build platforms intentionally hide this complexity, and that is their strength for exploration. But in an environment where outcomes matter, where credibility is on the line, and where change is inevitable, that hidden complexity becomes a source of risk rather than relief.

This experience reinforced a belief I already held but had not fully articulated: enterprise patterns exist less to slow teams down than to give them safe ways to change their minds.

AI makes building faster. It does not make being wrong less likely.

When the cost of being wrong is high, the ability to recover matters more than the ability to start.

Implications for Software Teams

There are three implications I would highlight for software teams navigating AI-accelerated work—but they need to be understood in more concrete, operational terms than they are today.

First, orchestration is not overhead. It is the job.

In an AI-augmented delivery model, orchestration is what prevents speed from collapsing into rework. Someone must own coherence end-to-end: how intent is captured, how it is translated into behavior, how changes propagate, and how correctness is continuously validated. Without that ownership, AI does exactly what it is designed to do—it accelerates divergence.

Second, Quality Engineering must span the entire lifecycle, not just the moment code exists.

One of the quiet risks of vibe coding is that it concentrates attention on code generation while allowing ambiguity to leak in upstream and instability to emerge downstream. In practice, quality starts well before implementation and extends well beyond it.

In this effort, quality discipline showed up in ways that were easy to miss but decisive:

  • During requirements and intent definition, ambiguity was treated as a defect. If something could not be clearly stated, it could not be safely built.
  • When shaping user stories and flows, the constant question was whether those stories were testable—not just plausible or appealing.
  • As behavior emerged, explicit contracts and enforcement points made it possible to validate correctness continuously, rather than assuming it based on UI behavior.

This is how mature Quality Engineering operates at scale. Requirements are written so they can be validated. User stories are constructed so they map cleanly to test scenarios. Test strategy is not an afterthought—it is the mechanism that keeps fast change from becoming fragile change.
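A tiny example makes the point about testable requirements concrete. Take an illustrative rule of the kind this system needed—“once a phase is locked, allocation revisions must be rejected.” Because the statement is unambiguous, it maps directly to test scenarios; the function and names below are hypothetical.

```python
def revise_allocation(locked: bool, amount: int) -> tuple[bool, str]:
    """Illustrative behavior for the rule: locked phases reject revisions."""
    if locked:
        return False, "phase_locked"
    if amount < 0:
        return False, "invalid_amount"
    return True, "accepted"

# The requirement, restated as tests -- one scenario per clause of the rule.
def test_locked_phase_rejects_revisions():
    accepted, reason = revise_allocation(locked=True, amount=100)
    assert not accepted and reason == "phase_locked"

def test_open_phase_accepts_valid_revisions():
    accepted, reason = revise_allocation(locked=False, amount=100)
    assert accepted and reason == "accepted"
```

When every requirement can be restated this way, late change stops being frightening: the test suite is the mechanism that tells you, quickly and continuously, whether a change had unintended consequences.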

AI can accelerate each of these steps, but it does not remove the need for them. In fact, it makes them more important, because the system evolves faster than any individual can reason about intuitively.

Third, testing is what enables late change without fear.

The rehearsal-driven simplification late in this effort would have been reckless without confidence in the surrounding system. The reason it was survivable was not luck. It was because boundaries were clear, behavior was testable, and there was a reliable way to know whether a change had unintended consequences.

This is where end-to-end Quality Engineering quietly becomes a strategic advantage. When teams know how to validate intent, behavior, and outcomes continuously, they gain optionality. They can simplify, refine, and adapt late—without triggering systemic collapse.

For software teams, the implication is not that AI makes quality less important. It is that AI makes disciplined Quality Engineering and Testing the only way speed remains sustainable.

And it is in this area where Sogeti’s AI-Powered Quality Engineering and Testing offering plays a valuable role for our clients.

Author Perspective

I brought more than curiosity to this effort. I brought over 30 years of hands‑on experience across software engineering, product design, portfolio leadership, and technology consulting.

That experience shaped how quickly risks were recognized, when resets were necessary, and how tradeoffs were made under pressure. The lessons I learned in this exercise are credible precisely because they were earned through real delivery, failure, recovery, and success.

Closing Reflection

The most important lesson from this journey is not how quickly something can be built with AI. It is whether the result holds up when it matters.

In an AI‑augmented world, the organizations that succeed will not be those that chase speed in isolation. They will be the ones that combine human judgment, disciplined Quality Engineering, and deliberate execution.
