How Claude Code Orchestrates a Team of Agents

I opened ~/.claude/teams while swarm mode was running. Here's the entire coordination protocol for agent teams.

Today I ran three Claude Code agents in parallel. They reviewed mock exam papers, checked answers, flagged syllabus violations, and sent reports back to a team lead agent. The whole thing took about five minutes.

When it finished, I opened a second terminal and started poking around the filesystem. What I found surprised me. The entire multi-agent coordination system runs on plain JSON files sitting in two directories under your home folder. No database, no message queue, no socket connections. Just files.

I spent the next hour reading every single one. This is what the system actually looks like from the inside.

Two directories. That's the whole system.

After the session, I navigated to the team directory:

~/.claude/teams/exam-review  ls
config.json  inboxes

And inside inboxes:

answer-checker.json   syllabus-checker.json
style-reviewer.json   team-lead.json

One JSON file per agent. Including the team lead itself.

The sibling directory holds the task board:

~/.claude/tasks/exam-review  ls
.lock  1.json  2.json  3.json  4.json  5.json  6.json

That's it. The entire state of a three-agent team lives in eleven JSON files and a lock file across two folders.

The team registry tells you everything about every agent.

I opened config.json and found the complete membership record. Each spawned agent gets an entry with its name, the model it runs on, what color its messages appear in the terminal, and most interestingly, the full system prompt it was given at spawn time.

Here is one of the member entries, trimmed for readability:

{
  "agentId": "syllabus-checker@exam-review",
  "name": "syllabus-checker",
  "agentType": "general-purpose",
  "model": "claude-opus-4-6",
  "prompt": "You are a curriculum compliance reviewer for 111...",
  "color": "green",
  "backendType": "in-process",
  "joinedAt": 1770331883262
}

The prompt field contains the entire instruction set. Hundreds of words. After a session ends, you can open this file and see exactly what every agent was told to do. There is no hidden state.

The team lead entry looks slightly different. Its tmuxPaneId field is an empty string, while spawned agents carry "backendType": "in-process". This hints at two possible execution backends: one where agents run as subprocesses inside the lead's process, and another where each might get its own terminal pane through tmux. The current default is in-process.

There is also a subscriptions field on every member, always set to an empty array. It looks like infrastructure for a pub/sub routing layer that either hasn't shipped yet or is only used internally.

The task board is a folder of numbered JSON files


Each task is its own file. The first three were the actual work items I created:

Task 1 asked the answer-checker to independently solve every exam question and compare answers. Task 2 asked the syllabus-checker to verify nothing fell outside the allowed lecture scope. Task 3 asked the style-reviewer to compare question patterns against the professor's actual exam.

Each file records the task subject, a detailed description, the current status (pending, in_progress, or completed), and dependency information through blocks and blockedBy arrays. Agents claim tasks by writing their name into an owner field.

There is no central scheduler distributing work. Each agent reads the task directory on its own, looks for unclaimed items with no unresolved dependencies, and grabs one. A .lock file in the directory serializes writes so two agents don't accidentally claim the same task.

Then I noticed something I didn't expect. Files 4, 5, and 6 existed too, but they looked different:

{
  "id": "4",
  "subject": "answer-checker",
  "description": "You are an expert Java instructor reviewing mock exam answer keys for a COSC 111 (Intro to Java) mid",
  "status": "in_progress",
  "metadata": { "_internal": true }
}

The system creates a shadow task for every spawned agent. The subject is just the agent's name, the description is a truncated copy of its prompt, and the _internal flag marks it as a system artifact rather than a user-created task. These shadow entries let the task board double as a process tracker. Scanning the task list shows both what work exists and which agents are alive.
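Under that scheme, listing live agents is just a filter over the task files. A sketch, assuming the fields shown in the shadow task above (metadata._internal, subject, status):

```python
import json
from pathlib import Path

def live_agents(task_dir: Path) -> list:
    """Return names of agents whose shadow task is still in progress.

    Shadow tasks carry metadata._internal = true and store the agent's
    name in the subject field.
    """
    agents = []
    for f in sorted(task_dir.glob("*.json")):
        task = json.loads(f.read_text())
        if (task.get("metadata", {}).get("_internal")
                and task.get("status") == "in_progress"):
            agents.append(task["subject"])
    return agents
```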

Inboxes are where it gets interesting


Each agent has a file in the inboxes directory. The file is a JSON array. Sending a message to an agent means appending an object to their array.

I opened the answer-checker's inbox first. It contained exactly four messages, all from the team lead:

The first was a plain text work assignment: "Please start working on Task 1." The second arrived about 40 seconds later, a progress check: "How is your progress?" The third came two minutes in, more direct: "Please send me your complete report." The fourth was a shutdown request.

Every message had a read field set to true, confirming the agent consumed all of them before the session ended. The message format is simple: from, text, summary, timestamp, read. The summary field shows up as a preview label in the terminal UI.
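Sending a message, then, is an append to a JSON array. A sketch using the observed envelope fields (from, text, summary, timestamp, read); the naive read-modify-write here stands in for whatever locking the real system does around inbox writes.

```python
import json
import time
from pathlib import Path

def send_message(inbox_dir: Path, recipient: str, sender: str,
                 text: str, summary: str) -> None:
    """Append a message object to the recipient's inbox array."""
    inbox = inbox_dir / f"{recipient}.json"
    messages = json.loads(inbox.read_text()) if inbox.exists() else []
    messages.append({
        "from": sender,
        "text": text,
        "summary": summary,               # preview label in the terminal UI
        "timestamp": int(time.time() * 1000),
        "read": False,                    # flipped when the recipient consumes it
    })
    inbox.write_text(json.dumps(messages, indent=2))
```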

The syllabus-checker's inbox revealed something different. Its first message was from itself:

{
  "from": "syllabus-checker",
  "text": "{\"type\":\"task_assignment\",\"taskId\":\"2\",\"subject\":\"Check syllabus compliance...\"}",
  "color": "green",
  "read": false
}

Agents write to their own inbox at spawn time. The message contains a stringified JSON object with type: "task_assignment" and the full task description. When the agent's first turn begins, the system reads this self-addressed message and injects it as conversation context. It solves the cold-start problem: the agent "remembers" its assignment from the moment it wakes up.

This first message has read: false, which makes sense. It was never delivered through the normal polling cycle. The system consumed it directly during initialization.
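Constructing that bootstrap message is simple. A sketch of the self-assignment envelope, with the stringified task_assignment payload nested inside the text field exactly as observed:

```python
import json

def self_assignment(agent: str, task_id: str, subject: str) -> dict:
    """Build the self-addressed bootstrap message an agent writes to its
    own inbox at spawn time: a stringified task_assignment payload inside
    an ordinary message envelope."""
    payload = {"type": "task_assignment", "taskId": task_id, "subject": subject}
    return {
        "from": agent,          # sender and recipient are the same agent
        "text": json.dumps(payload),
        "read": False,          # consumed directly during initialization
    }
```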

The style-reviewer's inbox showed the same pattern: self-assignment first, then progress pings from the lead, then shutdown. Its color field said "yellow", matching the color assigned in config.json.


The team lead's inbox is the complete audit trail


This was the longest file. It contained every report from every agent, every idle notification, and every shutdown confirmation. Reading it top to bottom reconstructs the entire session.

The first two messages arrived within five seconds of each other: the answer-checker's verification report and the style-reviewer's alignment analysis. Both were long, detailed plain text. Then came an idle notification, which is where the protocol gets weird.

Idle notifications, shutdown requests, and shutdown confirmations are all encoded as stringified JSON inside the text field of a regular message:

{
  "from": "answer-checker",
  "text": "{\"type\":\"idle_notification\",\"from\":\"answer-checker\",\"idleReason\":\"available\"}"
}

The system overloads the human-readable text channel for machine-readable protocol messages. It works, but it means the inbox format contains two different kinds of data in the same field. A plain text report from an agent and a structured shutdown handshake look identical at the envelope level. The system has to try parsing the text field as JSON to figure out which is which.
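Distinguishing the two kinds of data comes down to attempting a parse. A sketch of how a consumer of these files might classify an inbox entry (the "chat" fallback label is mine, not the system's):

```python
import json

def classify_message(msg: dict) -> str:
    """Try to parse the text field as JSON; if it yields an object with a
    type field, treat it as a protocol message, otherwise as plain chat."""
    try:
        payload = json.loads(msg["text"])
        if isinstance(payload, dict) and "type" in payload:
            return payload["type"]   # e.g. "idle_notification", "shutdown_request"
    except (json.JSONDecodeError, TypeError):
        pass
    return "chat"
```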

My guess is the inbox format was designed for chat messages first, then protocol messages were added later without changing the schema. Pragmatic, if a bit awkward when you're reading the raw files.

The shutdown handshake runs on request IDs

The team lead fired all three shutdown requests within 1.2 seconds of each other. Each one contained a requestId built from a timestamp and the target agent's name: shutdown-1770332090297@answer-checker, shutdown-1770332090894@syllabus-checker, shutdown-1770332091487@style-reviewer.

The agents echoed back the same request ID in their approval messages. The answer-checker responded in 2.8 seconds. The style-reviewer took 31 seconds. The syllabus-checker took 45 seconds. The slower two were re-sending their full reports one more time before confirming the shutdown. You can see the long report messages timestamped between the request and the approval in the lead's inbox.

The request ID echo matters because the lead sent three requests almost simultaneously and approvals returned out of order. Without the ID, the lead couldn't tell which agent approved which request.

An agent can also reject a shutdown. The protocol supports sending back approve: false with a reason, like "Still working on task 3, need 5 more minutes." This prevents the lead from killing an agent that's in the middle of something.
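The handshake bookkeeping is easy to sketch. The requestId format follows the observed shutdown-<timestamp>@<agent> pattern, and the approve field matches the protocol described above; everything else here is simplified.

```python
import time

def make_shutdown_request(target: str) -> dict:
    """Build a shutdown request with a timestamped requestId."""
    request_id = f"shutdown-{int(time.time() * 1000)}@{target}"
    return {"type": "shutdown_request", "requestId": request_id}

def resolve_approvals(pending: dict, approvals: list) -> list:
    """Match out-of-order approvals back to pending requests by the
    echoed requestId; returns targets now safe to terminate."""
    done = []
    for msg in approvals:
        rid = msg.get("requestId")
        if rid in pending and msg.get("approve", False):
            done.append(pending.pop(rid))
    return done
```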

Reconstructing the timeline from file timestamps.

Every message has a timestamp. Piecing them together across all four inbox files gives the full picture.

The team was created at 22:50:18. Task files appeared a second later. The syllabus-checker spawned at 22:51:23 and immediately wrote a self-assignment to its own inbox. The style-reviewer followed nine seconds later. The team lead sent its first work instruction at 22:51:56.

At 22:52:33, the lead pinged all three agents asking for progress. All three pings landed within two seconds of each other. Then about 60 seconds of silence while the agents read PDFs, traced Java code, and composed their reports.

The answer-checker reported at 22:53:38. The style-reviewer five seconds later. The syllabus-checker came in at 22:54:12, a bit behind because it was reading the actual syllabus PDF.

Shutdown requests went out at 22:54:50. The last agent confirmed at 22:55:36. Cleanup deleted both directories. Total wall time: about five minutes and twenty seconds for three agents working in parallel, each reading multiple documents and producing multi-page analysis reports.

What this design gets right.

The whole thing is inspectable. While agents are running, you can open another terminal, cat any inbox file, and watch messages appear. You can read task files to see what's claimed and what's pending. There is no opaque runtime state. Everything is on disk, in a format you can read with a text editor.

If Claude Code crashes mid-session, the directories survive. Unread messages stay in inbox files. Incomplete tasks keep their in_progress status. A resumed session could theoretically pick up from where things broke.

There is natural backpressure built into the design. Messages land in inbox files and sit there until the recipient's next turn starts. An agent processing a complex task won't get interrupted by a flood of incoming messages. They queue up and get delivered in batch. This avoids the cascading interruption problem that makes real-time multi-agent setups fragile.
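That turn-based batching might look like the following: drain every unread message at the start of a turn and mark the whole batch read. The real delivery mechanism is internal to Claude Code; this only sketches the behavior the files imply.

```python
import json
from pathlib import Path

def drain_inbox(inbox_path: Path) -> list:
    """At the start of an agent's turn, collect all unread messages in one
    batch and mark everything read before handing the batch to the agent."""
    messages = json.loads(inbox_path.read_text())
    unread = [m for m in messages if not m.get("read")]
    for m in messages:
        m["read"] = True
    inbox_path.write_text(json.dumps(messages, indent=2))
    return unread
```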

And the auditability story is strong. The prompt field in config.json shows what every agent was told. The inbox files preserve the complete conversation. The task board shows what work was defined, who owned it, and what finished. After a session, you can reconstruct every decision.

What's awkward.

The JSON-in-JSON encoding is the biggest rough edge. Protocol messages get serialized into a string, then stuffed into the text field of a regular message object. Reading raw inbox files means mentally parsing escaped quotes inside escaped quotes. A dedicated type field at the message envelope level would make the files much easier to read by hand.

The self-assignment pattern, where agents write a task_assignment message to their own inbox at startup, feels like a workaround rather than a designed feature. The same information already exists in the agent's prompt field in config.json. Storing it twice, in two different formats, adds redundancy.

Scaling has a natural ceiling. File I/O per message works fine for three to five agents. At fifty agents with high message frequency, the append-and-read cycle would slow down noticeably. But that's not what this system is for. It's a CLI tool running on one machine. The design fits the use case.

There is no real-time message delivery either. If Agent A sends an urgent correction to Agent B while B is mid-computation, B won't see it until its current turn ends. Sometimes that's a feature, sometimes it's a limitation. Depends on the situation.

The architecture in one paragraph.

When you spawn a team, Claude Code creates two sibling directories under ~/.claude/. The teams/ directory holds a config.json (the member registry with full prompts) and an inboxes folder with one JSON array file per agent. The tasks/ directory holds numbered JSON files for the shared task board plus a lock file. Communication happens by appending messages to inbox files. Agents bootstrap by writing self-assignment messages to their own inbox. The system tracks agent lifecycle through shadow tasks marked with _internal: true. Shutdown follows a two-phase commit pattern with echoed request IDs. Cleanup deletes both directories after all agents confirm termination.

No servers. No databases. No sockets. Eleven JSON files and a lock file in two folders on your local disk. That's the entire protocol for multi-agent coordination in Claude Code.

If you run a team session, try opening ~/.claude/teams/ in another terminal while it's still going. Watch the inbox files grow. It's the most transparent multi-agent system I've come across, and the simplicity of the implementation is the most interesting part.


All snippets in this post come from actual files observed on disk during a real session. The agent teams feature is experimental, gated behind CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS in settings.json, and the internals will probably change. But the filesystem-as-coordination-layer philosophy is the kind of design decision that tends to stick.
