Agentic Code Execution: Moving Beyond Traditional Function-Based Tools

Context

When building AI agents today, the core design challenge is crafting an efficient workflow around one key element: tools.

Tools allow a model to interact with external systems, perform tasks, and retrieve information. The ecosystem has matured rapidly, and there are now two main paradigms for enabling tool usage:

  1. Function Tool Calling – The model selects a function, fills in structured arguments, and receives structured results.
  2. MCP (Model Context Protocol) – A protocol-based abstraction that allows models to interact with tools in a more declarative and standardized way.
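In the function-calling paradigm, each tool is described by a JSON schema that the model fills in with arguments. A minimal sketch (the field names follow the common OpenAI-style convention and may differ for your provider; the resize tool itself is hypothetical):

```python
# Hypothetical tool definition in the JSON-schema style used by most
# function-calling APIs (exact field names vary by provider)
resize_tool = {
    "name": "resize_image",
    "description": "Resize an image on disk and return the output path.",
    "parameters": {
        "type": "object",
        "properties": {
            "path":   {"type": "string"},
            "width":  {"type": "integer"},
            "height": {"type": "integer"},
        },
        "required": ["path", "width", "height"],
    },
}

# The model responds with structured arguments, e.g.:
model_call = {
    "name": "resize_image",
    "arguments": {"path": "cat.png", "width": 64, "height": 64},
}
```

The runtime then dispatches `model_call` to the real Python function and feeds the structured result back into the context.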

Both are powerful, but each inherits a foundational limitation: tools are static.

They must be defined in advance with fixed arguments, fixed internal logic, and fixed return types.

This rigidity becomes problematic as agents become more capable, multi-step, and autonomous.


The Problem

1. Tools are inherently static

A Python function used as a tool might look like:

def resize_image(path: str, width: int, height: int) -> str:
    ...

This function has:

  • Fixed arguments
  • Fixed internal code
  • Fixed return value format

If the model wants to:

  • Resize an image and crop it
  • Resize it using a different algorithm
  • Resize multiple images in parallel
  • Produce additional metadata

…it simply cannot, unless the developer modifies the function. This means tool design is a bottleneck for agent intelligence.


2. Too many tools = too many tokens

Every tool definition is injected into the model’s context window.

If you create dozens of tools, you automatically:

  • Inflate the base prompt
  • Increase latency
  • Increase token cost
  • Increase the chance of model confusion
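To make the cost concrete, here is a rough sketch. The tool definitions and the ~4-characters-per-token ratio are illustrative assumptions, not measurements; the point is that prompt size grows linearly with the tool count:

```python
import json

def tool_def(i: int) -> dict:
    # Hypothetical tool definition, a few hundred characters each
    return {
        "name": f"tool_{i}",
        "description": f"Does operation number {i} on the user's files.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}, "flag": {"type": "boolean"}},
        },
    }

for n in (5, 20, 50):
    prompt = json.dumps([tool_def(i) for i in range(n)])
    # Very rough heuristic: ~4 characters per token for English/JSON text
    print(n, "tools ->", len(prompt), "chars, ~", len(prompt) // 4, "tokens")
```

Every one of those characters is re-sent on every request, before the model has done any work.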

AI engineers often try to compress functionality using feature flags:

def file_manager(path: str, delete: bool = False, read: bool = False, write: bool = False):
    ...

But this quickly becomes unwieldy:

  • Argument lists grow large
  • Internal code becomes complex
  • Models make more mistakes filling arguments
  • The tool becomes unclear and harder to maintain
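The internal logic such a flag-driven tool ends up needing looks something like this (a sketch; the `content` argument is a hypothetical addition that the `write` flag would force on you):

```python
from typing import Optional

def file_manager(path: str, delete: bool = False, read: bool = False,
                 write: bool = False, content: Optional[str] = None) -> str:
    # Every new capability adds another branch and another argument to validate
    if delete and (read or write):
        raise ValueError("delete cannot be combined with read/write")
    if write and content is None:
        raise ValueError("write requires content")
    if delete:
        return f"deleted {path}"          # placeholder for os.remove(path)
    if write:
        return f"wrote {len(content)} chars to {path}"
    if read:
        return f"contents of {path}"      # placeholder for open(path).read()
    raise ValueError("no operation selected")
```

Each flag multiplies the validation surface, and the model has to navigate all of it when filling arguments.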


3. Intermediate results consume tokens

Each tool call returns data that the model must read and interpret.

For multi-step reasoning, intermediate data may flood the context and waste tokens.

Example:

  1. A tool call returns a JSON list of 10,000 items
  2. That list is returned → 10,000 JSON items go into the context
  3. The model passes this list to a subsequent tool call → 10,000 JSON items go into the context
  4. Second tool call produces another list of 5,000 items → 5,000 JSON items go into the context
  5. and so on …

You pay for all of it.
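A rough illustration of steps 1 and 2 above (the item shape and the ~4-chars-per-token ratio are assumptions for the sake of the estimate):

```python
import json

# Step 1: a tool returns 10,000 items
items = [{"id": i, "name": f"item_{i}", "status": "active"} for i in range(10_000)]

# Step 2: the list is serialized into the model's context
payload = json.dumps(items)
print(len(payload), "characters enter the context")
print("~", len(payload) // 4, "tokens at ~4 chars/token")
```

If the model then echoes the list into the next tool call, the same payload is paid for a second time. With code execution, the list can stay inside the sandbox and only the final answer is printed.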


The Solution — Exec: Dynamic Code Tools

Instead of exposing static functions, expose a single tool that simply executes arbitrary code generated by the model.

Example API

import io, contextlib

def execute_python(code: str) -> str:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):  # capture anything the code prints
        exec(code, globals())              # run the model-generated source string
    return buf.getvalue()

Model usage

{
  "tool": "execute_python",
  "code": "import os\nfiles = [f for f in os.listdir('.') if f.endswith('.log')]\nfiles = files[:50]\nprint('\\n'.join(files))"
}

Now your tool collapses dozens of predefined functions into one universal interface.

Advantages

  • Infinite flexibility
  • No predefined arguments needed
  • No complex branching logic
  • The agent controls its own workflow
  • Can dynamically generate bespoke code per task
  • Significant reduction in token usage for tool definitions

But there’s a big issue

Security.

Executing model-generated Python inside your main interpreter is dangerous:

  • os.remove("/")
  • exfiltration of files
  • infinite loops
  • memory exhaustion
  • environment modification

Unless executed inside a hardened sandbox, exec is unsafe.


The Solution — Subprocess Sandboxing

A safer approach is executing model-generated code inside a subprocess with strict isolation.

Example implementation

import subprocess
import tempfile
import os
import sys

def run_python_subprocess(code: str) -> str:
    # Write the model-generated code to a throwaway script file
    with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as f:
        f.write(code.encode())
        path = f.name

    try:
        result = subprocess.run(
            [sys.executable, path],   # separate interpreter process
            capture_output=True,
            text=True,
            timeout=5,                # kill runaway or infinite-looping code
            cwd=tempfile.gettempdir()
        )
        return result.stdout or result.stderr
    except subprocess.TimeoutExpired:
        return "Error: execution timed out"
    finally:
        os.remove(path)               # always clean up the temp file

Why subprocess is safer

  • Interpreter runs in a separate process
  • OS-level restrictions are possible: seccomp, cgroups, AppArmor / SELinux, CPU, memory, and timeout limits
  • Can run inside a throwaway container
  • Isolated filesystem access

Example usage

{
  "tool": "run_python_subprocess",
  "code": "import json, glob, os\n\nimages = glob.glob('*.png')\nresult = [{'file': f, 'size': os.path.getsize(f)} for f in images]\n\nprint(json.dumps(result))"
}

This allows:

  • dynamic logic
  • multi-step pipelines
  • data processing
  • arbitrary algorithms

…all with one flexible tool.

Alongside subprocess sandboxing (which isn't 100% safe as implemented above), you can use other techniques such as running Python in isolated mode (python -I), preventing dangerous imports by overriding builtins.__import__, adding timeout protection, containerization, and more.

Do not use this code in production—it's written only to showcase an approach you can use to overcome some limitations of traditional tool calling.


Practical Use Cases

1. Data Transformation Tools

Agents write custom logic to generate CSVs, JSON, XML, XLSX, etc.
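For example, an agent asked for a CSV report might emit code like this inside the sandbox (the rows and column names are made up for illustration):

```python
import csv
import io

rows = [
    {"file": "a.png", "size": 1024},
    {"file": "b.png", "size": 2048},
]

# Build the CSV in memory; an agent would typically write it to a file instead
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["file", "size"])
writer.writeheader()
writer.writerows(rows)

report = buf.getvalue()
print(report)
```

The same pattern applies to JSON, XML, or XLSX: the agent writes whatever transformation the task needs instead of waiting for a predefined converter tool.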

2. CodeAssist / CodeMode

Agents write and execute code as part of their reasoning loop.

3. Autonomous Engineering Agents

Agents can:

  • explore environments
  • run tests
  • build artifacts
  • parse data

without predefined static functions.


References

Agentic code execution is evolving rapidly. To explore further:

  • What is an AI agent? (McKinsey Explainers) – https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-an-ai-agent

Conclusions

Traditional function-based tools limit the flexibility and intelligence of agents.

As workflows grow more complex, static tool definitions become a bottleneck and increase token usage.

Agentic code execution—where the model writes and executes its own code—provides a solution:

  • One tool replaces dozens
  • Token usage decreases
  • Workflows become adaptive
  • Developers write less boilerplate
  • Agents gain full expressive power

exec enables this paradigm, while subprocess sandboxing makes it practical and considerably safer.

Modern agent architectures are shifting from “functions with arguments” to dynamic code execution tools, enabling the next generation of powerful agentic systems.
