How AI Agents (Claude Code / Codex) Actually Work

This is as concise and quick as I could get it: a complete explanation, from first principles, of how things go from you writing a prompt to Claude Code writing and editing a bunch of code. I iterated on my understanding of this with ChatGPT, and this is basically the distillation 😊.

I covered the super-basic practical flow of how LLMs turn your prompt into a response here. The key fact from that piece was basically:

An LLM receives text and produces text.

Super simple. Statistical token prediction/generation machine. That is all it does. So the question becomes:

How can a system that only outputs text end up editing files, running tests, and writing working code?

The answer is that another normal program reads the LLM’s text output and turns it into actions.

The Complete System

A coding agent usually contains four components.

  1. User
  2. Agent Program (normal software)
  3. LLM
  4. Tools / Environment

Example environment:

  • filesystem
  • git repository
  • shell commands
  • compiler
  • test runner

These pieces interact in a loop.

The Whole Process (Bird's-Eye View)

User writes a request
↓
Agent program sends context to the LLM
↓
LLM outputs structured text describing an action
↓
Agent program reads that output
↓
Agent program executes the action
↓
Result is sent back to the LLM
↓
LLM chooses the next action
↓
Repeat        

This repeated cycle is called an agent loop. Each loop iteration is one step.

Step 0: The User Prompt

Example request:

Add a --json flag to this CLI tool

The agent program prepares information for the LLM:

  • goal
  • repository file tree
  • available tools
  • instructions for tool usage
  • recent actions

This becomes the prompt sent to the LLM.
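
To make this concrete, here is a rough sketch in TypeScript of what that prompt assembly could look like. The names (AgentState, buildPrompt) are illustrative stand-ins, not the actual internals of Claude Code or Codex:

// Illustrative only: a prompt is just a big string assembled from plain data.
interface AgentState {
  goal: string;             // the user's request
  fileTree: string;         // e.g. output of a directory listing
  toolInstructions: string; // human-written descriptions of each tool and its format
  recentSteps: string[];    // summaries of prior tool calls and their results
}

function buildPrompt(state: AgentState): string {
  return [
    `Goal: ${state.goal}`,
    `Repository files:\n${state.fileTree}`,
    `Available tools:\n${state.toolInstructions}`,
    `Recent actions:\n${state.recentSteps.join("\n")}`,
    `Respond with a single JSON tool call.`,
  ].join("\n\n");
}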

Step 1: The LLM Generates Output

The LLM receives the prompt and produces text. But the prompt instructs the LLM to output actions in a strict format. Examples:

Read a file

{"tool":"read_file","path":"src/cli.ts"}        

Search code

{"tool":"search_files","query":"cli argument parser"}        

Run command

{"tool":"run_shell","command":"pnpm test"}        

Apply code changes

{"tool":"apply_patch","patch":"diff text"}        

Finish task

{"tool":"finish","message":"task completed"}        

These are tool calls. The model does not execute them. It only outputs text describing them.
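
Because those tool calls are just text, the agent program typically defines the expected shapes up front and parses the model's output against them. A minimal TypeScript sketch (the tool names mirror the examples above; this is illustrative, and real agents usually validate with a proper schema library as well):

// The shapes of the example tool calls above (illustrative, not an official spec).
type ToolCall =
  | { tool: "read_file"; path: string }
  | { tool: "search_files"; query: string }
  | { tool: "run_shell"; command: string }
  | { tool: "apply_patch"; patch: string }
  | { tool: "finish"; message: string };

// The model only produced text, so the agent has to parse it back into data.
function parseToolCall(llmOutput: string): ToolCall | null {
  try {
    return JSON.parse(llmOutput) as ToolCall; // a real agent would also validate the fields
  } catch {
    return null; // not JSON: treat it as plain text (e.g. planning output)
  }
}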

Step 2: Agent Program Interprets the Output

The agent program reads the LLM output. Example output:

{"tool":"read_file","path":"src/cli.ts"}        

The program has code that handles that tool name:

read the file src/cli.ts from the filesystem

This execution is done by ordinary code. No AI is required here.
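
That ordinary code is usually just a dispatch on the tool name. A hedged sketch using Node's built-in fs and child_process modules (ToolCall is the union type from the sketch above):

import { readFileSync } from "node:fs";
import { execSync } from "node:child_process";

// Plain dispatch: look at the tool name, run normal code, return the result as text.
function executeTool(call: ToolCall): string {
  switch (call.tool) {
    case "read_file":
      return readFileSync(call.path, "utf8");
    case "run_shell":
      try {
        return execSync(call.command, { encoding: "utf8" });
      } catch (err: any) {
        // Failures are results too; the model needs to see them to adjust.
        return `exit_code=${err.status}\n${err.stderr ?? ""}`;
      }
    default:
      return `tool "${call.tool}" not implemented in this sketch`;
  }
}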

Step 3: The Tool Returns a Result

The tool returns structured data. Example:

{
  "contents":"export function parseArgs(argv) { ... }"
}        

The agent program then sends that result back to the LLM.

Step 4: LLM Uses the New Information

The prompt now contains:

Goal: add a --json flag

File src/cli.ts contents:
...

Available tools:
read_file
apply_patch
run_shell        

The LLM now predicts the next output. Example:

{"tool":"read_file","path":"src/cli.test.ts"}        

Again the program executes it. LOOPING.

Step 5: LLM Eventually Suggests a Code Edit

After reading relevant files, the LLM may output:

{
  "tool":"apply_patch",
  "patch":"diff adding --json flag"
}        

The program applies that patch to the repository.

Step 6: LLM Requests Testing

The LLM might then output:

{"tool":"run_shell","command":"pnpm test"}        

The agent program runs the tests. Result:

{
  "exit_code":1,
  "stderr":"Expected JSON output"
}        

Step 7: The Error Becomes New Input

The failure output is added to the next prompt. Now the prompt contains:

Test failure:
Expected JSON output        

The LLM now predicts a fix. Example:

{"tool":"apply_patch","patch":"fix output format"}        

The loop continues.

Step 8: The Task Ends

Eventually tests succeed. The LLM outputs:

{"tool":"finish","message":"tests pass"}        

The agent program stops the loop.

Why This Works

Because the system includes real feedback from the environment. The model does not have to guess everything correctly. It can:

  1. try something
  2. observe failure
  3. adjust
  4. try again

This loop is similar to how humans debug.

Where “Planning” Comes From

Sometimes the LLM outputs planning text:

Plan:
1. inspect CLI parser
2. inspect tests
3. add flag
4. run tests        

This is simply another type of output text. It helps maintain coherence but is not required.

What the Agent Program Actually Does

The agent program is usually very simple. Pseudo-code:

while true
  prompt = build_prompt(state)

  output = call_llm(prompt)

  if output contains tool_call
    result = execute_tool(output)
    state.add(result)
    continue

  if output contains finish_signal
    break        

The program keeps state such as:

  • goal
  • files already read
  • recent tool results
  • number of steps
  • test status
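
Filling that pseudo-code in, a stripped-down TypeScript version might look roughly like this. Everything declared at the top is a stand-in for the sketches earlier in the article (and for whichever LLM API the agent uses), not Claude Code's real internals:

// Assumed stand-ins: one LLM call (text in, text out), plus the parse/execute
// helpers sketched earlier. A real agent adds retries, sandboxing, and summarization.
declare function callLLM(prompt: string): Promise<string>;
declare function parseToolCall(text: string): { tool: string; message?: string } | null;
declare function executeTool(call: { tool: string }): string;

async function runAgent(goal: string): Promise<void> {
  const recentSteps: string[] = [];
  const MAX_STEPS = 50; // real agents cap steps and runtime

  for (let step = 0; step < MAX_STEPS; step++) {
    // 1. build the prompt from the goal plus everything learned so far
    const prompt = [
      `Goal: ${goal}`,
      `Recent steps:\n${recentSteps.join("\n")}`,
      `Respond with a single JSON tool call, or a "finish" call when done.`,
    ].join("\n\n");

    // 2. one LLM call: the model only ever returns text
    const output = await callLLM(prompt);

    // 3. interpret that text
    const call = parseToolCall(output);
    if (call === null) {
      recentSteps.push(`model note: ${output}`); // planning text; keep it as context
      continue;
    }
    if (call.tool === "finish") return;          // the finish signal stops the loop

    // 4. ordinary code touches the real world; the result feeds the next round
    const result = executeTool(call);
    recentSteps.push(`${call.tool} -> ${result.slice(0, 2000)}`); // trim long results
  }
}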

Important Clarification

The LLM never directly interacts with the real world. It only outputs text. The agent program is responsible for:

  • executing actions
  • running commands
  • editing files
  • collecting results
  • feeding results back

Modern coding agents like Claude Code / Codex run those actions in a sandboxed environment connected to the code repository.

Why LLMs Are Good At This

Because during training they saw many examples of:

  • code
  • bug fixes
  • stack traces
  • PR discussions
  • tutorials
  • debugging sessions

So when given code / error messages / goal, etc., they often predict useful next actions.

Final Mental Model

Without the loop:

LLM
↓
one code answer        

With the loop:

LLM suggests action (structured data response)
↓
program executes it
↓
environment returns result
↓
LLM suggests next action
↓
repeat        

The Key Insight

A coding agent is simply:

LLM
+ action format (tool calls)
+ program that executes those actions
+ feedback from the environment
+ repeated iteration        

That is how statistical token prediction becomes working software.

Now we can walk through a more realistic practical example (having Claude produce a plan doc after lots of its own research, with no code writing at all, to make it slightly easier to follow), if you'd like more clarification on how the pedal actually hits the metal.


But first: What an “Agent” Actually Is

An agent is simply a normal program that repeatedly calls an LLM and executes the actions the LLM describes.

The LLM itself can only produce text. It cannot read files, run commands, or edit code directly. So the agent program sits between the LLM and the real environment and does three things:

  1. Send context to the LLM
  2. Read the LLM’s output
  3. If the output requests a tool → execute it

Those requested tools are ordinary functions written by humans, such as:

read_file(path)
search_files(query)
write_file(path, content)
run_shell(command)
apply_patch(diff)
web_search(query)        
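
Two of those, written out as plain Node functions, just to show there is nothing special about them (illustrative; a real agent adds path sandboxing, size limits, and a faster search such as ripgrep):

import { readdirSync, readFileSync, writeFileSync, statSync } from "node:fs";
import { join } from "node:path";

// write_file: literally just writes the file and reports what it did.
function write_file(path: string, content: string): string {
  writeFileSync(path, content, "utf8");
  return `wrote ${content.length} characters to ${path}`;
}

// search_files: naive recursive text search over the repository.
function search_files(query: string, dir = "."): string[] {
  const hits: string[] = [];
  for (const name of readdirSync(dir)) {
    if (name === "node_modules" || name === ".git") continue;
    const full = join(dir, name);
    if (statSync(full).isDirectory()) {
      hits.push(...search_files(query, full));
    } else if (readFileSync(full, "utf8").includes(query)) {
      hits.push(full);
    }
  }
  return hits;
}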

The LLM outputs structured text describing the action it wants, for example:

{
  "tool": "read_file",
  "path": "src/auth.ts"
}        

The agent program reads that output, executes read_file("src/auth.ts"), then sends the result back to the LLM in the next prompt.

In addition to interpreting tool calls, the agent program also manages a few practical things:

  • looping until the task is finished
  • keeping track of recent results and state
  • limiting steps or runtime
  • summarizing context if it grows too large
  • handling errors from tools
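
As one tiny example of that housekeeping, trimming the running history when it grows too large might look like this (purely illustrative; real agents often ask the LLM itself to summarize old steps rather than just dropping them):

// Drop the oldest steps until the history fits a rough character budget.
function trimHistory(steps: string[], maxChars = 40_000): string[] {
  let total = steps.reduce((sum, s) => sum + s.length, 0);
  let start = 0;
  while (total > maxChars && start < steps.length - 1) {
    total -= steps[start].length; // forget the oldest step first
    start++;
  }
  return start === 0 ? steps : ["[earlier steps omitted]", ...steps.slice(start)];
}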

So the simplest accurate definition is:

An agent is a program that interprets LLM tool-call outputs, executes the corresponding tools, feeds the results back to the LLM, and repeats until the task is complete.

Realistic Example: Planning the Ideal Login Flow

Example prompt:

Plan the ideal user login flow for a modern web app, focused on product and UX, based on current security and passkey guidance

A sensible high-level target today is usually:

  • passkeys first
  • social sign-in optional
  • password fallback only if needed
  • clear recovery path
  • phishing-resistant MFA where appropriate

That direction matches current guidance from NIST, FIDO, Google’s passkey UX docs, and OWASP.

What the Agent Is Actually Asked to Produce

In this example, the user does not want code first.

They want a plan file such as:

note/login-flow-plan.md

That file might contain:

  • goals
  • user journeys
  • recommended flow
  • edge cases
  • security requirements
  • rollout phases
  • metrics
  • open questions

So the task is:

  1. research
  2. explore repo
  3. understand product context
  4. draft plan
  5. revise plan
  6. write file

What Happens in Practice

Step 1: The Agent Starts With Very Little

Initial prompt might contain:

  • User request
  • Available tools
  • Repo tree
  • Instruction: create a high-level product plan

At this point the model still only outputs text.

So it might output a tool call like:

{"tool":"search_files","query":"auth login signup session user account"}        

The program executes that search.

Step 2: Early Exploration

The point of early exploration is to answer:

  • What kind of app is this?
  • What auth already exists?
  • Where are product docs?
  • Are there current login screens or auth APIs?

So the agent may do things like:

{"tool":"search_files","query":"auth"}
{"tool":"read_file","path":"README.md"}
{"tool":"search_files","query":"login OR sign-in OR passkey OR oauth"}
{"tool":"list_files","path":"note/"}
{"tool":"read_file","path":"src/auth.ts"}        

This is the explore phase. Nothing magical is happening. The model is just repeatedly outputting action requests like:

  • read this
  • search that
  • open this doc

The program executes them and feeds back results.

Why This Can Take Many Minutes

Because the loop may run many times. Not because one giant answer is being generated in one shot. A longer task might involve:

  • 20 searches
  • 15 file reads
  • 5 web lookups
  • 3 plan rewrites
  • 2 file writes
  • 2 validation passes

That can easily add up to tens of thousands of tokens because each round includes:

  • instructions
  • prior findings
  • tool results
  • new reasoning
  • next action

So when people say a coding agent spent several minutes and 50k+ tokens, that often means:

  • many small loop iterations
  • not one enormous burst of thinking
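
As rough back-of-the-envelope arithmetic (the counts come from the list above; the per-round size is an assumption, since each round re-sends instructions, findings, and tool results):

47 loop iterations (20 + 15 + 5 + 3 + 2 + 2)
× roughly 1,000 to 1,500 tokens sent and received per round
≈ 50,000 to 70,000 tokens for the whole task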

Where the “Latest Research” Gets Plugged In

If the user asked for a plan based on current best practice, the agent may also use web sources. For login-flow planning, that often means a handful of authoritative sources like NIST, FIDO, and OWASP (how it knows which ones to trust is itself mostly more LLM looping, haha). Examples of what it may extract:

  • prefer phishing-resistant options such as passkeys
  • offer recovery when creating passkey-first accounts
  • verify social login tokens on the backend
  • avoid arbitrary periodic password resets

Those points are basically a distillation of current best practices after a few loops. So the loop may temporarily look like:

search repo
↓
realize external guidance is needed
↓
search trusted web sources
↓
bring findings back into the next prompt
↓
continue planning        

A Realistic Loop Walkthrough

Iteration 1: Find Product Context

LLM output:

{"tool":"read_file","path":"README.md"}        

Tool result says:

  • consumer SaaS app
  • mobile + web
  • email login exists today

Iteration 2: Find Existing Auth

LLM output:

{"tool":"search_files","query":"login signup session oauth password"}        

Tool result shows:

  • src/routes/login.tsx
  • src/api/session.ts
  • src/lib/oauth.ts

Iteration 3: Inspect Current UX

LLM output:

{"tool":"read_file","path":"src/routes/login.tsx"}        

Tool result shows:

  • email + password form
  • Google sign-in button
  • no passkeys
  • forgot password link

Now the model has enough context to form a rough internal picture:

current state = traditional login
desired state = modernized login plan        

Iteration 4: Check Current Best Practice

LLM output may now be something like:

{"tool":"web_search","query":"current passkey UX best practices recovery NIST OWASP FIDO"}        

The external guidance comes back. The useful conclusions may be summarized as:

  • use passkeys as a primary option where possible
  • keep recovery simple and explicit
  • support account recovery before removing password paths
  • make social sign-in secure on the server side

Those are grounded in current guidance. [NIST Publications link or whatever]

Iteration 5: The Model Forms a Draft Plan

At this point the model may output plain text instead of a tool call:

Draft plan:
1. Introduce passkey-first sign-in
2. Keep Google sign-in as secondary option
3. Retain password login as fallback during migration
4. Add explicit recovery setup during passkey enrollment
5. Add account settings page for managing passkeys
6. Measure adoption and failed-login rates        

This is the plan phase. Again, this is just text output. The surrounding program may store that summary and then ask the model to keep going.

Writing the Plan File

Iteration 6: Create a Document Structure

The model may now output a tool call like:

{"tool":"write_file","path":"note/login-flow-plan.md","content":"# Ideal Login Flow\n\n## Goals\n..."}        

The first draft might include:

  • Goals
  • Principles
  • Recommended user journeys
  • Signup flow
  • Returning user flow
  • Recovery flow
  • Risk-based step-up auth
  • Rollout plan
  • Success metrics

Iteration 7: Improve the Draft

Now the program can read back the file and ask the model to refine it. The model may notice missing things like:

  • account recovery
  • device loss
  • enterprise SSO later
  • copywriting guidance
  • edge cases for shared computers

Then it outputs either:

{"tool":"apply_patch","patch":"add recovery and rollout sections"}        

or another full rewrite. This is common. The first draft is rarely the final draft.

What “Explore → Plan → Write” Really Looks Like

In a real agent run, it is usually not:

  1. explore completely
  2. then plan completely
  3. then write completely

It is more like:

  1. explore a bit
  2. form a rough plan
  3. explore one gap
  4. improve the plan
  5. write a first draft
  6. research one missing point
  7. patch the draft

So the real loop is closer to:

  1. explore
  2. plan
  3. explore
  4. write
  5. revise
  6. write
  7. validate
  8. finish

What the Prompt Keeps Growing With

As the loop runs, the prompt accumulates useful state such as:

Goal: create ideal login flow plan
App type: consumer SaaS
Current auth: email/password + Google
Desired direction: passkey-first
Research findings: recovery required, backend token verification, avoid forced password resets
Current draft file: note/login-flow-plan.md
Outstanding gaps: rollout phases, metrics, edge cases        

That growing state is what lets the next LLM call stay coherent.

Why the Plan Can Become Good

Because each round improves one piece. Example sequence:

Round 1
Find current login implementation

Round 2
Find current product constraints

Round 3
Check external best practices

Round 4
Draft recommended flow

Round 5
Write markdown plan file

Round 6
Patch missing recovery section

Round 7
Patch rollout section

Round 8
Patch metrics section

Round 9
Finish        

The quality comes from:

  • many grounded corrections
  • not one perfect first answer

What the Final Plan Might Recommend

A realistic modern recommendation for many apps would look something like:

  1. New users can create an account with passkey or Google
  2. If using passkey, require a recovery method to be set up
  3. Returning users see passkey first, then Google, then password fallback
  4. Password remains during migration, then can be deemphasized
  5. Sensitive actions may require step-up verification
  6. Users can manage passkeys and recovery options in settings
  7. Backend verifies all social identity tokens server-side
  8. Success is measured by login completion, recovery success, support volume, and passkey adoption

The Shortest Accurate Mental Model

For a long planning task, the real mechanism is:

  • the model keeps outputting small next moves
  • the program keeps executing them
  • the results keep coming back
  • the draft keeps improving
  • until the plan file is good enough

That is what “the agent spent several minutes planning” usually means in practice. Not one giant mysterious thought. Haha. I had no idea how it worked until I wrote this up.

It is instead many small grounded loops:

  1. search
  2. read
  3. summarize
  4. research
  5. draft
  6. patch
  7. refine
  8. finish

Final Intuition

The power comes from:

  1. small step
  2. real feedback
  3. small step
  4. real feedback
  5. small step
  6. real feedback

not from hidden magic. Yay. It's actually understandable. To some degree :]

Explore content categories