Ready for agentic coding at scale?

Agentic AI amplifies the issues and risks already seen with GenAI by an order of magnitude. The same dynamics now operate at machine speed.

If you want evidence of today's issues, see Part 1. This follow-up looks to the future. Underlying assumption: Over the next 24–36 months, agentic coding tools will not replace teams, but they will force us to reconsider how we specify, verify, measure and organize. There are real benefits to be had if we’re ready, but there are challenges, too.

We will examine the road ahead through five lenses based on the current state of research: People, Organization, Technical, Measurement and Opportunities. Hopefully, you will be able to take away clear, evidence-based insights on how to benefit from Agentic Coding Tools.

Organisational Lens

Agentic AI has the potential to transform the way organizations develop and deliver software. The challenge does not lie in adopting a faster toolset, but in adapting workflows, roles, and guardrails to ensure that increased productivity does not result in a loss of coherence.

Team topology: the consequences of increased productivity

How will teams change if agentic tools materially increase productivity?

  • Scrum defines teams as cross-functional, meaning they can ideally create value each Sprint without external dependencies. If agentic tools increase productivity and widen skillsets, which aspects of cross-functionality will remain human, and which will shift into service interfaces and automation contracts?
  • If more work can be produced quickly, operating risk shifts to batch size. Do teams shorten planning cycles and keep diffs small to avoid overwhelming the review and release process? Agile advocates are already predicting shorter cycles as capacity increases, with the weight of ceremony decreasing and guardrails increasing.
  • If the coordination load increases between services rather than functions, will scaling frameworks shift from adding meetings to strengthening interface contracts, audit trails and automated handshakes between teams and services? Dare one say: More processes and tools?

Bottlenecks

Automation relocates constraints; it does not remove them. Likely candidates to watch out for as new bottlenecks:

  • Orchestration capacity: coordinating many concurrent human–agent interactions across services, repositories, and pipelines.
  • Intent translation: ambiguous goals still result in rework, but clear, testable specifications at the interface boundary reduce thrash and increase productivity.
  • Verification load: reviewing, testing and integrating agent output can create a human bottleneck. However, containerised, reproducible evaluation harnesses are a pragmatic solution.
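
To make the verification-load point concrete, here is a minimal sketch of an evaluation harness in Python. The check names, the evidence shape and the example results are invented for illustration; the idea is simply that agent output is judged by reproducible checks, not by narrative claims.

```python
# Hypothetical sketch of an evaluation harness: run every registered check
# against an agent's change and collect structured, reviewable evidence.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Evidence:
    check: str    # name of the verification step
    passed: bool  # did the check succeed?
    detail: str   # short human-readable summary for the reviewer

def run_harness(checks: dict[str, Callable[[], tuple[bool, str]]]) -> list[Evidence]:
    """Run every registered check and collect its outcome as evidence."""
    results = []
    for name, check in checks.items():
        try:
            passed, detail = check()
        except Exception as exc:  # a crashing check counts as a failing check
            passed, detail = False, f"check raised {exc!r}"
        results.append(Evidence(name, passed, detail))
    return results

# Illustrative checks an agent's change might be evaluated against:
checks = {
    "unit_tests": lambda: (True, "128 tests passed"),
    "diff_size": lambda: (True, "+84/-12 lines, within the 400-line budget"),
}
report = run_harness(checks)
print(all(e.passed for e in report))
```

In practice each check would shell out to a containerised test run; the point of the structure is that humans review evidence, not prose.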

Work within bounded service interfaces

Early evaluations scope agents within clearly defined boundaries, typically in the form of containerised environments. This suggests that effective autonomy is confined to systems rather than extending freely across stacks. But what happens when those boundaries start to blur? For example, what if agents request or implement changes across services?

People Lens

Redefining roles and competencies

The human dimension of software development is evolving. As AI systems mature from assistive copilots to autonomous agents, the engineer's role is shifting from execution to direction. The 2025 DORA data already reflects this change: although developers report higher perceived productivity and flow, objective delivery metrics remain unchanged, revealing a pattern of feeling faster while going slower and indicating new friction and cognitive overhead in the system. This paradox signals a deeper change: productivity is less and less a function of individual velocity and more a function of system throughput and orchestration capacity. As AI autonomy increases, this tension will only grow.

From Coding to Intent Engineering

Emerging research on promptable systems and agentic workflows describes a decisive transition: developers are shifting their focus from writing code to expressing intent. Rather than mastering syntax, the critical skill is the ability to articulate what success looks like, the constraints under which it is achieved, and the boundaries within which it is executed.

This transformation is similar to the shift from 'AI-assisted coding' to 'agent-based engineering': instead of providing solutions, engineers now design missions for autonomous agents to carry out and verify.

To thrive in this new landscape, teams must cultivate cognitive skills such as clarity in goal formulation, adversarial thinking in verification, and fluency in communicating with non-human collaborators. In this emerging discipline, the ability to specify intent clearly and verifiably will define engineering excellence — consider specification-driven development, for example.

Shifting and New Roles

Given the hypothesis that agents will carry out significant parts of the work, research suggests the evolution of new roles. These roles have not yet been proven in reality; however, they provide an indication of what to look out for, in the same way that the 'Prompt Engineer' role signalled that a new skill had become important. Notable roles include:

  • Agent Orchestrators define and coordinate multi-agent workflows.
  • Verifiers audit correctness, safety, and compliance of AI-generated outputs.
  • Enablers maintain the infrastructure and automation pipelines that make this collaboration possible.

Competencies Over Skills

Upskilling is not just a matter of taking a 'prompt engineering' course. It means developing a broader range of competencies, including AI literacy, data reasoning and systems thinking. AI literacy, in particular, is becoming fundamental: understanding model limitations, probabilistic behaviour and bias is now an integral part of core engineering practice. Collaboration with agents that is both sustainable and effective depends on a shared conceptual model between human and machine, one grounded in transparency, not blind trust.

Psychological Safety

Teams that integrate agentic tools face new cognitive risks, such as unclear authorship, verification fatigue, and shifting accountability. Psychological safety remains an important factor, particularly during times of change and disruption to established patterns. It was a success criterion for teams in the past and is likely to remain as important as ever.

Technical Lens

Promptability

'Promptability' has no formal, peer-reviewed definition. In a world where humans use systems less and agents use them more, the key issue is machine usability: can a machine discover what your system does, access it safely, observe the effects and prove the outcomes without bespoke integration? In this context, 'promptability' is a buzzword that encapsulates the concept of machine usability across APIs, documentation, data contracts and runtime signals in the short term. Its impact is practical rather than philosophical; it largely determines how well agents will perform in your environment.

Enable “human in the loop”

Put the human at the heart of the process. Emerging research uses terms such as 'evaluation harness' and 'merge-readiness pack' to enable this. Interpret these as guidance, not as instructions. An evaluation harness indicates that agent output should be evaluated against reproducible and verifiable evidence rather than narrative claims. A merge-readiness pack indicates that reviewers should be provided with a concise proof bundle that clearly shows intent, changes and outcomes at a glance. Neither term dictates tools or rituals; both point to the same goal: reducing cognitive load, raising the profile of what matters and enabling humans to exercise judgement regarding when to question, when to proceed and when to escalate. What constitutes "enough evidence" varies by domain, risk tolerance and stack maturity, and will evolve. The aim is not to standardize people, but to provide clarity in areas where judgement creates the most value.
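
One possible shape for such a merge-readiness pack, sketched in Python. The section names and fields are assumptions for illustration, not a standard; the only claim is that intent, change and outcomes should be visible at a glance.

```python
# Hedged sketch: assemble a "merge-readiness pack" as one concise summary a
# reviewer can scan before exercising judgement. Structure is illustrative.
def merge_readiness_pack(intent: str, diffstat: str, outcomes: list[str]) -> str:
    """Render intent, change and evidence as a single reviewable text block."""
    lines = [
        "## Intent", intent,
        "## Change", diffstat,
        "## Evidence",
    ]
    lines += [f"- {o}" for o in outcomes]
    return "\n".join(lines)

# Invented example values:
pack = merge_readiness_pack(
    intent="Reduce p95 latency of /search below 200 ms",
    diffstat="+120/-45 across 3 files",
    outcomes=["all 212 unit tests pass",
              "load test: p95 fell from 310 ms to 180 ms"],
)
print(pack.splitlines()[0])
```

Whether this renders as a PR description, a bot comment or a dashboard is a team choice; the cognitive-load reduction comes from the fixed, scannable shape.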

CI is the control system

It's not a silver bullet, but continuous integration is the place where fast, machine-generated change can be checked, corrected and contained. The top risks associated with LLM code generation are well known and exacerbated by agentic tools: code inflation, technical debt, verification and integration bottlenecks, and security exposure. CI is where these risks become visible and manageable. To benefit from higher throughput, the level of automation in CI must increase. This requires more automated checks, clearer evidence and faster feedback, enabling people to focus on judgement rather than reconstruction. One practical point: this does not happen by accident. The enabling and platform teams that provide small product teams with ready-to-use CI configurations, standards and support are a quiet success factor.
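
One such automated check can be very simple. A minimal sketch of a diff-size gate against code inflation; the 400-line budget is an arbitrary, illustrative threshold, not a recommendation.

```python
# Hypothetical CI gate: reject changes whose diff exceeds a reviewable budget,
# keeping batch size small enough for meaningful human review.
def diff_within_budget(added: int, removed: int, budget: int = 400) -> bool:
    """True if the total changed-line count is small enough to review well."""
    return added + removed <= budget

print(diff_within_budget(84, 12))    # a small agent change
print(diff_within_budget(2200, 40))  # an inflated change that should be split
```

In a real pipeline the line counts would come from the diff itself, and the gate would fail the build with a message telling the agent (or human) to split the change.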

Measurement Lens

Be data-driven

Teams often report faster progress with AI, while system data remains flat or even worsens. In an agentic future, this disparity is likely to increase. The consequence is clear: a more data-driven, outcome-oriented view of productivity will become mandatory. The point is not to count AI (or human) activity; the point is to determine whether value moves through the system more quickly, safely and reliably.

Lead Time for Changes, Deployment Frequency, Change Failure Rate and MTTR, as defined by the DORA report, remain the true indicators of delivery. Keep these metrics at the forefront to determine whether autonomy improves flow and stability rather than just local speed. Around this core, find a strategy to surface hidden issues such as short-term churn, duplication, refactoring versus reuse, dependency freshness and dead code removal. These indicators reveal whether increased velocity is creating technical debt.
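
To make the first of these concrete, Lead Time for Changes can be approximated as the time from commit to deployment, aggregated with a median. An illustrative sketch with made-up timestamps:

```python
# Illustrative only: compute DORA "Lead Time for Changes" as commit-to-deploy
# time per change, then take the median across recent changes.
from datetime import datetime, timedelta
from statistics import median

def lead_times(changes: list[tuple[datetime, datetime]]) -> list[timedelta]:
    """Each change is a (committed_at, deployed_at) pair."""
    return [deployed - committed for committed, deployed in changes]

# Invented sample data: three changes with lead times of 6 h, 24 h and 2 h.
changes = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 15, 0)),
    (datetime(2025, 1, 7, 10, 0), datetime(2025, 1, 8, 10, 0)),
    (datetime(2025, 1, 8, 11, 0), datetime(2025, 1, 8, 13, 0)),
]
print(median(lead_times(changes)))  # → 6:00:00
```

The median is deliberate: a single slow change should not mask whether typical flow is improving.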

The Economics

Token prices look low today, but they're not a constant. Providers change tiers and features, so prices are a moving target, especially since the cost of data-center build-out will eventually catch up with reality. Beyond tokens, this change brings many additional costs. The real question isn't what tokens cost; it's what they get you.

In an agentic future, activity will explode: agents will run more experiments and need more iterations to get things right. The temptation to read “more tokens” as “more progress” will get stronger. Resist it. What counts is conversion: intent into evidence, evidence into decisions, decisions into durable changes. Prices will move; autonomy will rise and fall with use cases. What do you get for the money?
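
A hedged sketch of that conversion view: track cost per merged change rather than raw token spend. All prices and counts below are invented for illustration.

```python
# Illustrative conversion metric: dollars of model spend per change that
# actually landed, instead of raw token counts.
def cost_per_merged_change(tokens_used: int, price_per_mtok: float,
                           merged_changes: int) -> float:
    """Model spend (USD) divided by the number of merged changes."""
    if merged_changes == 0:
        return float("inf")  # spend with nothing landed: infinitely expensive
    return tokens_used / 1_000_000 * price_per_mtok / merged_changes

# Invented example: 40M tokens at $3 per million tokens, 30 merged changes.
print(round(cost_per_merged_change(40_000_000, 3.0, 30), 2))  # → 4.0
```

Tracked over time, a rising cost per merged change signals thrash even while token dashboards look busy.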

Final thought on measurement

Accept that measurement is behind the curve. There are no peer-reviewed replacements for DORA at the delivery level. There are also no validated metrics for promptability, automability, API agent-friendliness, agent-specific DevOps outcomes or production success criteria for autonomous changes. Organisations moving towards agentic workflows will be operating partly blind unless they expand today's measurements. The pragmatic approach is to retain DORA as the foundation and incorporate cost, quality and evidence indicators. Agentic AI has the potential to increase throughput. Only system-level, cost-aware measurement will show whether that speed has been converted into reliable delivery rather than longer queues, bigger diffs and more costly failures.

Opportunities Lens

What becomes possible if we get this right?

Agentic coding tools have great potential, but they can also exacerbate existing problems. If we change the way we specify, verify, measure and organise, we can achieve more with the same number of people. We can also create products that previously did not have a viable business case.

Projects that were previously considered unimportant, such as internal tools, data clean-up, niche integrations and bespoke reports, become worthwhile when an agent can perform the repetitive, well-defined tasks and a human can oversee, verify and combine the results. The same effect applies to legacy modernisation: once contracts are explicit and test oracles exist, agents can work on migrations, documentation and refactoring that would never have been prioritised in a human-only queue. 

There’s also faster exploration. With a promptable architecture, teams can run more experiments per week, such as alternative algorithms, UI variations and infrastructure configurations, without committing to large rewrites. Evidence-first pull requests make it safe to experiment and easy to revert changes. In product terms, this means more validated learning and fewer speculative epics.

In terms of personnel, the work shifts towards higher-leverage activities. Engineers spend less time on scaffolding and more time framing problems, designing interfaces and making trade-offs explicit. Agents can produce initial versions that seniors can then refine rather than recreate. This is not a replacement per se, but a different division of labor that keeps judgement human. Be ready for those changes.

At the organizational level, the opportunity lies in smaller, sharper teams that coordinate in new ways, cutting out the complexity that inevitably comes with the exponential growth of communication channels. Take a moment to remember the best Product Owner, Engineer and QA you've encountered, and imagine what they might achieve together.

None of this suggests that the road will be smooth. However, it does suggest that the destination is worth aiming for: more value shipped, fewer slowdowns and safer speed. Agentic AI won't write all the code or run your processes for you. However, if we learn to use it effectively by changing how we work, not just the tools we use, we can achieve what was previously impossible.

More articles by Michael Heß
