The Advent of Dark Code

Over the past two years, I have led a meaningful transformation in how my engineering organization writes software. It has been awesome. We have embedded AI agents into our SDLC and dramatically accelerated delivery. I am genuinely excited about what collaborative intelligence between humans and machines makes possible.

And yet, the more deeply I work inside this new world, the more clearly I see something accumulating in our codebases that we do not have a name for yet. Or at least I am not aware of one in common use.

I am calling it Dark Code.




Not dark as in malicious. Dark as in unilluminated.

There is an important distinction between code that has been forgotten and code that was never known. Legacy systems carry a different kind of darkness. That code was written by someone who understood it, debated it, possibly argued over it in a review. The light existed once. It just went out as teams turned over and institutional memory eroded.

Dark Code is different in kind, not just degree. It is darkness at birth. It enters the codebase without ever passing through a human mind that interrogated it. No one held it up to the light. No one asked why. No one took accountability for what it does or what it hides. It exists the way dark matter exists in the universe, measurable in aggregate effect, invisible to direct observation, and exerting gravitational pull on your system long before you realize it is there.

I have already seen the cultural symptoms appear. Engineers announcing in Slack that a decision should be made because Cursor said so. Work products shared with the implicit disclaimer that they were generated by an agent, which in practice means: do not ask me to defend this. The agent is becoming a shield against accountability, and that is exactly the wrong direction.

The tool should amplify your judgment, not replace it.




The death of the interrogation ritual

I want to be precise about where the danger lives, because it is not where most people are looking.

Organizations are reaching for metrics to manage this. Test coverage. Cyclomatic complexity. Dependency graphs. Mutation testing scores. These are not useless. They measure something real. But they measure the illuminated parts of the codebase. They are instruments pointed at the wrong sky.

Coverage tells you whether the code does what you asked. It cannot tell you whether the code means what your system intends, given years of architectural decisions the agent never saw. It cannot tell you whether a new abstraction has quietly expanded your attack surface. It cannot tell you whether dead code has been introduced that will compound into unreadability, or whether a missed factorization will betray you under load.

But the deeper problem is not the metrics. The deeper problem is that we have been quietly retiring the interrogation ritual that makes code review load-bearing in the first place.

When a senior engineer reviews a PR and asks why you structured something a particular way, that question is not bureaucratic friction. It is a meaning-transfer mechanism. The act of explaining forces the author to surface the gap between what they built and what the system actually needs. Many of the bugs caught in code review are caught not by reading the code but by the author's own articulation while explaining it.

Agents cannot participate in that ritual. You can interrogate the output. You cannot interrogate the process. There was no moment of judgment you can examine. The reasoning is either absent or reconstructed after the fact, which is not the same thing at all.




Why this matters more in payments

I run engineering for payments. That context is not incidental to this argument. It sharpens it.

PCI DSS and SOC 2 compliance do not just require working systems. They require traceable decisions. Auditors want to understand why a security boundary was drawn where it was drawn, why an encryption choice was made, why a particular data retention pattern was implemented. Dark Code has no decision trail. The intent behind a boundary was never articulated by a human who could be held accountable for it. In a breach post-mortem or a compliance gap surfaced in an audit, that absence is not just uncomfortable. There is no forensic trail to reconstruct because one was never laid.

The second angle is equally serious. When bank rules change, when card network mandates shift, when a regulator introduces new settlement requirements, an experienced payments engineer knows which assumptions in the existing code are now load-bearing and newly wrong. That knowledge is not in any context window. It is not in any markdown file you prompt an agent with. It is embodied in the engineer who has lived through the previous change cycle and knows where the hidden dependencies are. An agent writes confidently against the regulatory reality it was trained on. The tests still pass. The assumption is now wrong.




Gradually, then suddenly

There is a Hemingway line about bankruptcy that I keep returning to.

How did you go bankrupt? Two ways. Gradually, then suddenly.

Dark Code accumulates exactly that way. Each merged PR feels fine. Each sprint closes green. The coverage number holds. The velocity is real and the business value is real. No alarm sounds. And underneath all of that, the mass grows quietly in the spaces between velocity and accountability.

The danger is clear and present, not a future warning. I am watching it unfold gradually, in real time, across the industry as I compare notes with leaders outside my own organization. Ignorance is not bliss here, nor is it an option. The “suddenly”, when it comes, will not announce itself as an AI problem. It will announce itself as a breach, a silent failure, a compliance gap, a system that no one alive can fully explain. And by then the remediation cost is not linear. You cannot review retroactively what was never reviewed. You cannot reconstruct intent that was never recorded.




Experience is not about slowing down. It is about knowing what to distrust.

The difference between a senior engineer and a novice in an agentic world is not who uses the tools. Everyone uses the tools. The difference is who stays paranoid about the output.

A senior engineer works with the agent and then interrogates it. Not because they distrust the technology but because they carry system memory the agent cannot possibly have. No context window holds the full weight of architectural decisions made over years, the regulatory change that broke something three years ago, the performance failure that revealed a factorization mistake, the security incident that rewrote how the team thinks about data boundaries. That accumulated judgment is not transferable through a prompt. It cannot be fully captured in a markdown file. It lives in the engineer.

The novice sees green tests and calls it done. The experienced engineer sees green tests and starts asking what the tests are not measuring.




The future is collaborative, not abdicating

I want to be clear about what I am not saying. I am not saying slow down. I am not saying stop using agents. I have built an organization that uses them deliberately and I believe deeply in the productivity and quality gains that thoughtful AI adoption in the SDLC delivers.

What I am saying is that the future of engineering is collaborative intelligence between humans and machines, not the replacement of human judgment with machine output. The agent accelerates. The engineer governs. The agent generates. The engineer interrogates. That is the model. For now, at least. Until we improve the inherent biases and deficiencies in the models. The moment an organization stops distinguishing between the two, it has started trading accountability for velocity on terms it does not yet understand.

Trust the tool. Verify the output. Stay paranoid about what your instruments cannot measure. That is not a rejection of the AI revolution. That is how you survive it long enough to benefit from it.




A horizon worth watching

There is a version of this problem that is further out but worth naming. Today most codebases are hybrid: human-originated foundations with agent-generated layers building on top. Dark Code is dangerous here because agents inherit context they do not understand. But greenfield systems are increasingly being built agent-first, from the ground up, with no human-originated foundation at all. The nature of the darkness in those systems will be different and in some ways more profound. Not code that escaped interrogation, but code for which the concept of interrogation was never part of the design. That is a conversation this industry has not seriously started yet. We should start it before the “gradually” becomes “suddenly” there too.
