What Does a Dev Manager Actually Do When AI Writes the Code?
Something quietly shifted in the last year. AI agents now write a significant share of code at companies like Anthropic (reportedly 70–90 percent) and Spotify, where senior engineers reportedly haven't written a line of code since December 2024. The tools are real. The speed gains are real.
From the conversations I've been having with engineering teams, the main concern isn't whether AI can generate code faster. It's whether teams can still review that code well enough to trust it. Quality and review burden come up again and again. Faros AI's data points in the same direction: across more than 10,000 developers, teams using agentic workflows merged 98 percent more pull requests, while review times grew 91 percent longer, and PR sizes increased by 154 percent. Code generation sped up. Careful judgment did not.
That gap is where dev managers either step up or get buried.
Here are four things I think engineering leaders need to take seriously right now. The evidence here ranges from formal studies to vendor telemetry and company case studies, but the pattern is consistent: AI speeds authoring faster than it speeds judgment.
1. Treat Documentation as Infrastructure
This one sounds boring. It's not.
If an API doesn't have a clear contract, an AI agent can't use it reliably. If business logic lives in someone's head, the agent will guess, and guess wrong. Anthropic's context-engineering guidance and Thoughtworks' writing on the topic point to the same conclusion: for many teams, documentation quality is one of the most important practical levers on the quality of AI-generated code.
For dev managers, this means documentation stops being a nice-to-have and becomes load-bearing infrastructure. The scarce skill is shifting upward — from syntax production toward specification, judgment, and validation. Teams that can specify clearly get dramatically better output from agents. Teams that can't will spend their time fixing what the agent misunderstood.
And here's the thing — building this documentation doesn't have to be yet another manual chore. Documentation is widely treated as toil by developers. AI can help. It can draft API references, architectural decision records, and onboarding guides, removing the part of the job most people quietly avoid.
But the same caution applies here as everywhere else. METR's research found that AI capabilities are comparatively lower in settings with implicit quality standards, and documentation is full of implicit standards. Qodo's data shows that only 3.8 percent of developers report both low hallucination rates and high confidence in shipping AI output without review. So the same gated model applies: AI drafts, a human reviews for accuracy and the kind of institutional context only a team member would know.
This tracks with what I take from DORA's 2025 data: AI tends to amplify whatever engineering quality already exists. Documentation created by AI without review risks amplifying confusion. Documentation created by AI with human review amplifies knowledge. The difference is the loop.
Worth trying:
Consider updating your definition of done so that no feature ships without its API contracts and architectural decision records in place. And experiment with using AI to draft that documentation — then have it reviewed by a team member who knows the system's history and quirks. You may find that the combination removes the biggest excuse teams have for skipping docs entirely.
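If you want that definition of done to be enforceable rather than aspirational, a small CI check can do it. The sketch below is a minimal example, not a prescription: the directory names (api/, docs/adr/, docs/api/) and the base branch are placeholders you would adapt to your own repository layout.

```python
# Minimal sketch of a definition-of-done check: fail the build when API code
# changes without an accompanying contract or ADR update. Paths are placeholders.
import subprocess
import sys

API_PATHS = ("api/", "src/api/")        # assumed locations of API code
DOC_PATHS = ("docs/adr/", "docs/api/")  # assumed locations of contracts and ADRs

def changed_files(base: str = "origin/main") -> list[str]:
    """Return files changed relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]

def main() -> int:
    files = changed_files()
    api_changed = any(f.startswith(API_PATHS) for f in files)
    docs_changed = any(f.startswith(DOC_PATHS) for f in files)
    if api_changed and not docs_changed:
        print("API code changed but no API contract or ADR was updated.")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Run it as a required check and the merge button enforces the definition of done, instead of a reviewer having to remember it.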
2. Build Review Capacity — and Don't Go It Alone
When AI generates code at scale, review becomes the bottleneck. Not because reviewers are slow — because there's dramatically more to review, and the diffs are bigger and less familiar.
Some teams are already experimenting with agent-assisted review to handle this. HubSpot built an internal tool called Sidekick that assigns AI agents to review pull requests, with a separate "judge agent" that filters noise before comments reach human developers. The result: 90 percent faster first feedback on PRs and an 80 percent engineer approval rate. That's a company case study, not a controlled experiment — but it's a concrete signal.
The broader data support the bottleneck argument. Qodo's State of AI Code Quality report, drawing on data from monday.com's engineering team, found that 17 percent of PRs contained high-severity issues invisible to standard diff inspection. DORA's 2025 findings point the same way: AI works best inside disciplined engineering systems, not as a replacement for them.
Worth trying:
Consider a two-layer model: AI agents handle the first pass (style conformance, common bug patterns, security scanning) and flag issues before a human reviewer sees the PR. Humans then focus on architecture, business logic, and the things agents reliably miss. Try tracking Mean Time to Verification as a team metric. It might reveal more about your real throughput than lines of code merged.
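Mean Time to Verification isn't a standardized metric; here I'm taking it to mean the elapsed time from a PR being opened to a human approving it. A minimal sketch, assuming you can export those two timestamps per PR from your code host (the field names below are illustrative):

```python
# Minimal sketch of Mean Time to Verification: average time from a PR being
# opened to a human approving it. Field names are illustrative placeholders.
from datetime import datetime, timedelta

def mean_time_to_verification(prs: list[dict]) -> timedelta:
    """Average opened-to-approved duration across PRs that have been approved."""
    durations = [
        pr["approved_at"] - pr["opened_at"]
        for pr in prs
        if pr.get("approved_at") is not None
    ]
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)

if __name__ == "__main__":
    sample = [
        {"opened_at": datetime(2025, 6, 2, 9, 0), "approved_at": datetime(2025, 6, 2, 15, 30)},
        {"opened_at": datetime(2025, 6, 3, 10, 0), "approved_at": datetime(2025, 6, 4, 11, 0)},
        {"opened_at": datetime(2025, 6, 5, 8, 0), "approved_at": None},  # still in review
    ]
    print(mean_time_to_verification(sample))  # 15:45:00 across the two approved PRs
```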
3. Adopt a Gated Workflow
This is a framework I've been developing and testing with my team. It's not an established research model — it's a managerial design pattern synthesized from what the evidence suggests works. The component pieces each have independent support; the combination is mine.
The workflow has three stages, each with a human gate before proceeding:
Stage 1: Specify. An interactive brainstorming session with an AI agent produces a rich feature spec — edge cases, acceptance criteria, constraints. A human reviews and approves the spec before anything else happens.
Stage 2: Decompose. The agent generates development tasks from the approved spec. A human reviews the task breakdown — checking dependencies, risk, and whether any tasks need human-only handling — before approving.
Stage 3: Implement and Review. The agent executes against the approved plan. A human reviews the code, with agent-assisted review handling the first pass.
Two design choices make this more than a checklist. First, any stage can be restarted if the results aren't satisfactory. This sounds obvious, but most linear workflows create sunk-cost pressure to push forward. Explicit restart permission changes team behaviour. Second, feedback is recorded at each gate, even if it's just a manual note. Over time, patterns emerge: if the Stage 3 gate keeps catching the same type of issue, your Stage 1 spec template probably needs updating.
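To make the shape concrete, here is a minimal sketch of the loop in Python. The specify, decompose, and implement functions stand in for agent calls and gate stands in for a human decision; they are stubs you would wire to your own tooling, not part of any established framework.

```python
# Minimal sketch of the three-gate workflow. For simplicity a rejection reruns
# only the current stage; in practice a reviewer might send work back further.
from typing import Callable

def run_gated_workflow(
    specify: Callable[[], str],
    decompose: Callable[[str], list[str]],
    implement: Callable[[list[str]], str],
    gate: Callable[[str, object], bool],  # human decision: True = approve
) -> str:
    spec = specify()
    while not gate("spec", spec):      # Gate 1: approve the spec
        spec = specify()
    tasks = decompose(spec)
    while not gate("tasks", tasks):    # Gate 2: approve the task breakdown
        tasks = decompose(spec)
    code = implement(tasks)
    while not gate("code", code):      # Gate 3: approve the implementation
        code = implement(tasks)
    return code
```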
Why do I think this works? In one 2025 METR study of experienced open-source developers working in familiar repositories, unstructured AI use actually slowed task completion by 19 percent — a reminder that tool gains depend heavily on context and workflow design. IBM and NIST data consistently show that fixing defects at integration costs 10–30x more than catching them at the design stage. Specification quality matters. Early gates matter. These aren't controversial claims — but combining them into a daily operating rhythm is harder than it sounds.
Worth trying:
Start with one feature. Run it through all three gates. Track where the human reviewer pushes back and why. After five or six cycles, you'll have enough data to see whether the gates are catching real problems or just adding process. Adjust from there.
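A spreadsheet is enough for recording that pushback, but if you want something queryable, a sketch like the one below works. The field names and example categories are illustrative, not a standard.

```python
# Minimal sketch of gate-feedback records and a query for recurring issues.
from collections import Counter
from dataclasses import dataclass

@dataclass
class GateRecord:
    feature: str         # feature or ticket identifier
    stage: int           # 1 = specify, 2 = decompose, 3 = implement and review
    outcome: str         # "approved", "revised", or "restarted"
    issue_category: str  # e.g. "missing edge case", "unstated constraint"
    note: str            # free-text reviewer comment

def recurring_issues(records: list[GateRecord], stage: int, top_n: int = 3) -> list[tuple[str, int]]:
    """Most frequent issue categories caught at a given gate. If the Stage 3
    gate keeps surfacing the same category, revisit the Stage 1 spec template."""
    counts = Counter(
        r.issue_category for r in records if r.stage == stage and r.outcome != "approved"
    )
    return counts.most_common(top_n)
```

After a handful of cycles, a query like recurring_issues(records, stage=3) tells you whether the gates are catching real problems or just adding process.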
4. Manage Energy, Not Just Throughput
This might be the most important point, and it's still under-discussed in engineering leadership conversations.
NYU's Julian Togelius has been observing how developers interact with AI agents, and a few of his comments stuck with me. Managing multiple agents feels "super powerful," he notes — but constantly context-switching between agent outputs gives developers less agency, not more. Watching AI work delivers dopamine hits similar to scrolling social media. People feel productive. Whether they actually are is a different question.
For dev managers, this creates a new kind of risk. Your team's velocity metrics might look great. Commits are up. PRs are merging. But if your senior engineers are spending their days reviewing AI-generated output instead of thinking deeply about architecture and design, you're trading long-term capability for short-term throughput.
Worth trying:
Experiment with protecting blocks of time where senior engineers aren't reviewing AI output at all — they're thinking about system design, mentoring, or working through hard problems without an agent in the loop. Consider tracking not just what your team ships, but how they feel about the work. It might be the opposite of what your dashboards suggest.
For many teams, raw code generation is getting cheaper faster than review, validation, and integration. That's the real shift. Not that AI writes code, but that the human judgment around it is now the constraining resource.
Dev managers who treat this as a tooling upgrade will struggle. The ones who treat it as an operating model change — restructuring how their teams review, specify, and protect their capacity for judgment — will build something durable.
The code writes itself now. The question is whether anyone's still paying close enough attention.