The Missing Piece in Agentic Software Engineering: Security by Design
Image generated with claude.ai

The Missing Piece in Agentic Software Engineering: Security by Design

My previous articles described what works for me in AI-assisted development, and how disciplined AI-assisted engineering addresses the understanding bottleneck and the risk of cognitive debt.

Those articles focused on interactive tools like Claude Code, where the developer stays in the loop. That model has its own security challenges, but tools like Claude Code are at least moving in the right direction: OS-level sandboxing with filesystem and network isolation, explicit permission boundaries, and the developer as a human checkpoint on every significant action. It's not perfect, but the developer's presence limits the blast radius.

The industry, however, is moving toward fully autonomous agents. And that's where the current security model breaks down.

The Productivity Is Real. The Risk Is Too.

AI agents writing code, triaging issues, scanning for opportunities, 24/7. If you've watched an agent autonomously work through a well-scoped GitHub issue, you know the feeling. It borders on magic.

The productivity gains can be real, though they aren't guaranteed. Open source maintainers are already drowning in high-volume, low-quality pull requests generated by autonomous agents, what many call "AI slop." That's a discipline problem, not a fundamental one, and I believe it's solvable with the engineering practices I described in my previous articles. The security problem is different. It may not be easily resolved.

To be useful, these agents need access. To your codebase. To your requirements and architecture, the artifacts meant to distinguish your business from competitors. To your credentials. That access is what makes agents powerful. It's also what makes them a target.

Now Read It Again as a CISO

The dominant agent frameworks today give agents broad, often unrestricted access to system resources. Shell commands. File systems. API calls with real credentials sitting right there in the context window.

Now read that sentence again, but this time imagine you're a CISO at a bank. Or a healthcare company. Or any organisation operating under regulatory oversight.

The numbers paint a grim picture. A recent Gravitee survey of 900+ executives found that 81% of teams have moved past the planning phase with AI agents, but only 14.4% have full security approval for what they've deployed [1]. Nearly 88% of organisations reported confirmed or suspected AI agent security incidents in the past year [1]. And Gartner forecasts 40% of enterprise applications will feature task-specific agents by 2026 [2], while research shows a mere 6% of organisations have an advanced AI security strategy in place [3].

Prompt Injection: A Structural Problem, Not a Bug

The most fundamental issue is prompt injection, and the uncomfortable truth is that it may never be fully solved. Even OpenAI has admitted this publicly, acknowledging that their Atlas browser expands the security threat surface and that deterministic security guarantees aren't possible [4]. When the company with arguably the most resources on the planet to throw at this problem says it's structurally unsolvable, we should listen.

And these aren't theoretical risks. In January 2026, the "Reprompt" attack against Microsoft Copilot showed how a single phished link could hijack an authenticated session and exfiltrate data without the user ever noticing [5]. Indirect prompt injections—malicious instructions hidden in emails, documents, and web pages that agents process—are succeeding with fewer attempts than direct attacks, according to Lakera's Q4 2025 data [6].

For agentic software engineering specifically, think about what this means: your agent reads a GitHub issue, and that issue contains adversarial content designed to make the agent do something you didn't authorise. Maybe it exfiltrates credentials from its context. Maybe it writes malicious code.

Why "Just Be Careful" Isn't Good Enough

The way most agent frameworks are built makes it nearly impossible to be careful enough. The trust model is the problem.

Most agent security today relies on one of two approaches, both fundamentally flawed:

Advisory security: telling the agent via system prompts not to do bad things. This is the equivalent of putting up a "Please Don't Rob Us" sign. OpenClaw is the cautionary tale: a wildly popular open-source agent that went from zero to 145,000 GitHub stars in weeks, built on the premise that skills-based instructions are sufficient guardrails. They aren't. The agent can even install new skills autonomously, without human approval. Cisco's AI Defense team ran their scanner against OpenClaw's most popular community skill and found it was functionally malware, silently exfiltrating data to attacker-controlled servers [7]. Snyk's audit of nearly 4,000 skills on ClawHub found that 36% contained prompt injection vulnerabilities and identified 76 confirmed malicious payloads designed for credential theft and data exfiltration [8]. The barrier to publishing a skill? A Markdown file and a week-old GitHub account. No code signing. No security review. No sandbox.

Bolt-on governance: intercepting agent tool calls through MCP gateways or proxy layers. Better, but these only govern tools exposed via the MCP protocol. Most of an agent's capabilities, shell access, file system operations, direct API calls, don't go through MCP at all. You're governing a fraction of what the agent can do, while it still has the keys to everything.

Both approaches share the same fundamental flaw: the agent has access to things it shouldn't, and we're trying to prevent it from using that access. The agent sees credentials. The agent can reach arbitrary endpoints. The agent has broad file system access. We're just asking it nicely not to abuse it.

What Enterprise-Grade Security Actually Requires

After spending considerable time thinking about this, and working through what a proper solution might look like, I've come to believe the answer requires a philosophical shift, not just a technical one.

Security by absence, not by interception. The agent shouldn't be constrained by governance layers that watch what it does. The agent should simply not have capabilities beyond what's explicitly granted. No shell access. No direct API calls. No credentials in the context window. If the capability isn't explicitly exposed through a controlled gateway, it doesn't exist.

Mandatory access control, not discretionary. The agent cannot grant itself additional permissions. It cannot override policies. It cannot escalate privileges. Policies are centrally administered and enforced regardless of what the agent requests. Inspired by how SELinux works, not how most agent frameworks work.

Credential isolation. The agent authenticates with a single scoped token. All upstream service credentials live elsewhere: in a gateway, a vault, somewhere the agent never touches. A compromised agent context leaks exactly nothing useful.

Bidirectional content inspection. Everything flowing into and out of the agent gets scanned. Prompt injection heuristics on inbound content. Sensitive information detection on outbound content. Not as advisory checks, but as pipeline enforcement.

Human approval gates on destructive operations. The agent wants to push code? Open a PR? Delete something? A human approves or denies, with full context of what the agent is trying to do and why. Think App Store parental controls.

The Cost of Security

Every one of these security measures costs something. Credential isolation means the gateway adds latency to every request. Mandatory access control means someone has to write and maintain policies. Human approval gates mean the agent stops and waits, and every time it waits, you lose the autonomy that made agents valuable in the first place.

Taken to the extreme, you end up with an agent that needs permission for everything, at which point you might as well do the work yourself. The value of an agent is precisely that it acts on your behalf. Security that eliminates agency eliminates the point.

The question isn't whether to constrain agents. It's where to draw the line. And the answer depends on what the agent is doing. A read-only scan of public web pages needs no approval gates, but the content still needs to be scanned for prompt injection. Pushing code to a production repository and running a deployment pipeline needs maximum governance. The same agent, the same gateway, different policy profiles for different risk levels.

The goal isn't zero risk. It's proportional risk, with full visibility into what the agent actually did.

The Regulated Enterprise Won't Compromise

Regulated industries don't get to move fast and break things. When you're dealing with patient data, financial records, or critical infrastructure, "we'll fix the security later" isn't a strategy. It's a career-ending decision.

And the regulatory landscape is catching up. OWASP's 2025 Top 10 for LLM Applications puts prompt injection at number one [9]. Compliance frameworks like NIST AI RMF and ISO 42001 now explicitly address prompt injection as a risk requiring specific controls. The first lawsuits holding executives personally liable for AI agent incidents are widely predicted for this year [3].

The enterprise adoption curve for agentic AI is going to hit a wall. Not because the technology doesn't work, but because CISOs can't sign off on the current security model. Innovation will stall not due to technical limitations, but due to an inability to prove to the board that the risks are managed.

What You Can Do Today (Even in Regulated Environments)

None of this means regulated organisations should wait on the sidelines. The software engineering revolution is happening now, and the teams that build experience early will have a decisive advantage.

Start by forming a small team that works with autonomous agents and builds discipline and understanding. Give them an isolated environment: separate network zone, reverse proxies and firewalls that prevent access to production data (or copies of it). Let them learn what works and what doesn't without putting anything critical at risk.

Pick non-mission-critical services for early projects. Replace your time reporting SaaS with a custom-built solution. Rebuild an internal booking tool or an expense tracker. These are commodity services where the downside is limited, but the learning is real. You get a custom-tailored solution, you might save some licence fees that help cover exploration costs, and most importantly, you build the organisational knowledge you'll need when the security model matures enough to tackle what matters.

Where This Leads

Agents should operate in zero-trust environments where every capability is explicitly granted, every action is logged, and destructive operations require human approval. The teams that figure this out first—security by design, not security by afterthought—will be the ones that actually get agentic AI into production at enterprises that matter.

The future of agentic software engineering isn't just about what agents can do. It's about what they verifiably can't.


Full disclosure: This article was drafted using Claude in conversation to brainstorm, structure my thoughts, and fix grammar and spelling.


References

  1. Gravitee, "The State of AI Agent Security 2026," February 2026. https://www.gravitee.io/state-of-ai-agent-security
  2. Gartner, "Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026," August 2025. https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025
  3. Palo Alto Networks, "6 Cybersecurity Predictions for the AI Economy in 2026," Harvard Business Review (sponsored), December 2025. https://hbr.org/sponsored/2025/12/6-cybersecurity-predictions-for-the-ai-economy-in-2026
  4. OpenAI, "Continuously Hardening ChatGPT Atlas Against Prompt Injection Attacks," December 2025. https://openai.com/index/hardening-atlas-against-prompt-injection/
  5. Varonis Threat Labs, "Reprompt: The Single-Click Microsoft Copilot Attack that Silently Steals Your Personal Data," January 2026. https://www.varonis.com/blog/reprompt
  6. Lakera, "The Year of the Agent: What Recent Attacks Revealed in Q4 2025," December 2025. https://www.lakera.ai/blog/the-year-of-the-agent-what-recent-attacks-revealed-in-q4-2025-and-what-it-means-for-2026
  7. Cisco, "Personal AI Agents like OpenClaw Are a Security Nightmare," February 2026. https://blogs.cisco.com/ai/personal-ai-agents-like-openclaw-are-a-security-nightmare
  8. Snyk, "ToxicSkills: Malicious AI Agent Skills in the ClawHub Supply Chain," February 2026. https://snyk.io/blog/toxicskills-malicious-ai-agent-skills-clawhub/
  9. OWASP, "Top 10 for Large Language Model Applications 2025—LLM01:2025 Prompt Injection." https://genai.owasp.org/llmrisk/llm01-prompt-injection/

To view or add a comment, sign in

More articles by Michael Würsch

Others also viewed

Explore content categories