Agentic Code Execution: Moving Beyond Traditional Function-Based Tools

Context

When building AI agents today, the core design challenge is crafting an efficient workflow around one key element: tools.

Tools allow a model to interact with external systems, perform tasks, and retrieve information. The ecosystem has matured rapidly, and there are now two main paradigms for enabling tool usage:

  1. Function Tool Calling – The model selects a function, fills in structured arguments, and receives structured results.
  2. MCP (Model Context Protocol) – A protocol-based abstraction that allows models to interact with tools in a more declarative and standardized way.
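In the function-calling paradigm, each tool is described by a JSON schema that the model fills in with arguments. A minimal sketch (the field names follow the common OpenAI-style convention and may differ for your provider; the resize tool itself is hypothetical):

```python
# Hypothetical tool definition in the JSON-schema style used by most
# function-calling APIs (exact field names vary by provider)
resize_tool = {
    "name": "resize_image",
    "description": "Resize an image on disk and return the output path.",
    "parameters": {
        "type": "object",
        "properties": {
            "path":   {"type": "string"},
            "width":  {"type": "integer"},
            "height": {"type": "integer"},
        },
        "required": ["path", "width", "height"],
    },
}

# The model responds with structured arguments, e.g.:
model_call = {
    "name": "resize_image",
    "arguments": {"path": "cat.png", "width": 64, "height": 64},
}
```

The runtime then dispatches `model_call` to the real Python function and feeds the structured result back into the context.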

Both are powerful, but each inherits a foundational limitation: tools are static.

They must be defined in advance with fixed arguments, fixed internal logic, and fixed return types.

This rigidity becomes problematic as agents become more capable, multi-step, and autonomous.


The Problem

1. Tools are inherently static

A Python function used as a tool might look like:

def resize_image(path: str, width: int, height: int) -> str:
    ...

This function has:

  • Fixed arguments
  • Fixed internal code
  • Fixed return value format

If the model wants to:

  • Resize an image and crop it
  • Resize it using a different algorithm
  • Resize multiple images in parallel
  • Produce additional metadata

…it simply cannot, unless the developer modifies the function. This means tool design is a bottleneck for agent intelligence.


2. Too many tools = too many tokens

Every tool definition is injected into the model’s context window.

If you create dozens of tools, you automatically:

  • Inflate the base prompt
  • Increase latency
  • Increase token cost
  • Increase the chance of model confusion
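To make the cost concrete, here is a rough sketch. The tool definitions and the ~4-characters-per-token ratio are illustrative assumptions, not measurements; the point is that prompt size grows linearly with the tool count:

```python
import json

def tool_def(i: int) -> dict:
    # Hypothetical tool definition, a few hundred characters each
    return {
        "name": f"tool_{i}",
        "description": f"Does operation number {i} on the user's files.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}, "flag": {"type": "boolean"}},
        },
    }

for n in (5, 20, 50):
    prompt = json.dumps([tool_def(i) for i in range(n)])
    # Very rough heuristic: ~4 characters per token for English/JSON text
    print(n, "tools ->", len(prompt), "chars, ~", len(prompt) // 4, "tokens")
```

Every one of those characters is re-sent on every request, before the model has done any work.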

AI engineers often try to compress functionality using feature flags:

def file_manager(path: str, delete: bool = False, read: bool = False, write: bool = False):
    ...

But this quickly becomes unwieldy:

  • Argument lists grow large
  • Internal code becomes complex
  • Models make more mistakes filling arguments
  • The tool becomes unclear and harder to maintain
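The internal logic such a flag-driven tool ends up needing looks something like this (a sketch; the `content` argument is a hypothetical addition that the `write` flag would force on you):

```python
from typing import Optional

def file_manager(path: str, delete: bool = False, read: bool = False,
                 write: bool = False, content: Optional[str] = None) -> str:
    # Every new capability adds another branch and another argument to validate
    if delete and (read or write):
        raise ValueError("delete cannot be combined with read/write")
    if write and content is None:
        raise ValueError("write requires content")
    if delete:
        return f"deleted {path}"          # placeholder for os.remove(path)
    if write:
        return f"wrote {len(content)} chars to {path}"
    if read:
        return f"contents of {path}"      # placeholder for open(path).read()
    raise ValueError("no operation selected")
```

Each flag multiplies the validation surface, and the model has to navigate all of it when filling arguments.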


3. Intermediate results consume tokens

Each tool call returns data that the model must read and interpret.

For multi-step reasoning, intermediate data may flood the context and waste tokens.

Example:

  1. A tool call returns a JSON list of 10,000 items
  2. That list is returned → 10,000 JSON items go into the context
  3. The model passes this list to a subsequent tool call → 10,000 JSON items go into the context
  4. Second tool call produces another list of 5,000 items → 5,000 JSON items go into the context
  5. and so on …

You pay for all of it.
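A rough illustration of steps 1 and 2 above (the item shape and the ~4-chars-per-token ratio are assumptions for the sake of the estimate):

```python
import json

# Step 1: a tool returns 10,000 items
items = [{"id": i, "name": f"item_{i}", "status": "active"} for i in range(10_000)]

# Step 2: the list is serialized into the model's context
payload = json.dumps(items)
print(len(payload), "characters enter the context")
print("~", len(payload) // 4, "tokens at ~4 chars/token")
```

If the model then echoes the list into the next tool call, the same payload is paid for a second time. With code execution, the list can stay inside the sandbox and only the final answer is printed.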


The Solution — Exec: Dynamic Code Tools

Instead of exposing static functions, expose a single tool that simply executes arbitrary code generated by the model.

Example API

import io, contextlib

def execute_python(code: str) -> str:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):  # capture anything the code prints
        exec(code, globals())              # run the model-generated source string
    return buf.getvalue()

Model usage

{
  "tool": "execute_python",
  "code": "import os\nfiles = [f for f in os.listdir('.') if f.endswith('.log')]\nfiles = files[:50]\nprint('\\n'.join(files))"
}

Now your tool collapses dozens of predefined functions into one universal interface.

Advantages

  • Infinite flexibility
  • No predefined arguments needed
  • No complex branching logic
  • The agent controls its own workflow
  • Can dynamically generate bespoke code per task
  • Significant reduction in token usage for tool definitions

But there’s a big issue

Security.

Executing model-generated Python inside your main interpreter is dangerous:

  • os.remove("/")
  • exfiltration of files
  • infinite loops
  • memory exhaustion
  • environment modification

Unless executed inside a hardened sandbox, exec is unsafe.


The Solution — Subprocess Sandboxing

A safer approach is executing model-generated code inside a subprocess with strict isolation.

Example implementation

import subprocess
import tempfile
import os
import sys

def run_python_subprocess(code: str) -> str:
    # Write the model-generated code to a throwaway script file
    with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as f:
        f.write(code.encode())
        path = f.name

    try:
        result = subprocess.run(
            [sys.executable, path],   # separate interpreter process
            capture_output=True,
            text=True,
            timeout=5,                # kill runaway or infinite-looping code
            cwd=tempfile.gettempdir()
        )
        return result.stdout or result.stderr
    except subprocess.TimeoutExpired:
        return "Error: execution timed out"
    finally:
        os.remove(path)               # always clean up the temp file

Why subprocess is safer

  • Interpreter runs in a separate process
  • OS-level restrictions are possible: seccomp, cgroups, AppArmor / SELinux, CPU, memory, and timeout limits
  • Can run inside a throwaway container
  • Isolated filesystem access

Example usage

{
  "tool": "run_python_subprocess",
  "code": "import json, glob, os\n\nimages = glob.glob('*.png')\nresult = [{'file': f, 'size': os.path.getsize(f)} for f in images]\n\nprint(json.dumps(result))"
}

This allows:

  • dynamic logic
  • multi-step pipelines
  • data processing
  • arbitrary algorithms

…all with one flexible tool.

Alongside subprocess sandboxing (which isn't 100% safe as implemented above), you can use other techniques such as running Python in isolated mode (python -I), preventing dangerous imports by overriding builtins.__import__, adding timeout protection, containerization, and more.

Do not use this code in production—it's written only to showcase an approach you can use to overcome some limitations of traditional tool calling.


Practical Use Cases

1. Data Transformation Tools

Agents write custom logic to generate CSVs, JSON, XML, XLSX, etc.
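For example, an agent asked for a CSV report might emit code like this inside the sandbox (the rows and column names are made up for illustration):

```python
import csv
import io

rows = [
    {"file": "a.png", "size": 1024},
    {"file": "b.png", "size": 2048},
]

# Build the CSV in memory; an agent would typically write it to a file instead
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["file", "size"])
writer.writeheader()
writer.writerows(rows)

report = buf.getvalue()
print(report)
```

The same pattern applies to JSON, XML, or XLSX: the agent writes whatever transformation the task needs instead of waiting for a predefined converter tool.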

2. CodeAssist / CodeMode

Agents write and execute code as part of their reasoning loop.

3. Autonomous Engineering Agents

Agents can:

  • explore environments
  • run tests
  • build artifacts
  • parse data

without predefined static functions.


References

Agentic code execution is evolving rapidly. To explore further:

  • What is an AI agent? (McKinsey Explainers) – https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-an-ai-agent

Conclusions

Traditional function-based tools limit the flexibility and intelligence of agents.

As workflows grow more complex, static tool definitions become a bottleneck and increase token usage.

Agentic code execution—where the model writes and executes its own code—provides a solution:

  • One tool replaces dozens
  • Token usage decreases
  • Workflows become adaptive
  • Developers write less boilerplate
  • Agents gain full expressive power

exec enables this paradigm, while subprocess sandboxing makes it practical and considerably safer.

Modern agent architectures are shifting from “functions with arguments” to dynamic code execution tools, enabling the next generation of powerful agentic systems.
