Claude Code Architecture
A deep dive into Anthropic's agentic CLI — the query loop, tool execution harness, hooks system, permission model, multi-agent coordination, and prompt caching strategy.
High-Level Architecture
Claude Code is a terminal-based agentic coding assistant. At its core, it's a loop: send messages to the Claude API, receive tool calls, execute them with permission checks and hooks, feed results back, and repeat.
The entire system is built as async generators — each layer yields messages upstream for the UI to render in real-time. This is what makes tool progress, streaming text, and hook feedback all appear incrementally.
The Query Loop query.ts
The heart of Claude Code. queryLoop() is a while(true) loop that handles message preparation, API calls, tool execution, and error recovery across turns.
Each Iteration
- Call the callModel() generator and receive text/thinking/tool_use blocks as they stream.
- If stop_reason === "tool_use", append results and loop. Otherwise, run stop hooks and return.
- The loop has seven continue sites: collapse drain retry, reactive compact retry, max output token escalation, max output token recovery, stop hook blocking, token budget continuation, and the normal next-turn continue. Each one transitions the State object and records why the continue happened.
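The iteration steps above can be sketched as a small async generator. This is a minimal illustration, not the actual queryLoop(): the Block, Message, and ModelTurn types and the injected callModel/runTool parameters are hypothetical stand-ins, the model call is a plain promise rather than a stream, and compaction, hooks, and the recovery continue sites are left out.

```typescript
// Hypothetical shapes; the real message and State types are not shown in the text.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };
type Message = { role: "assistant" | "user"; content: string | Block[] | ToolResult[] };
interface ModelTurn {
  blocks: Block[];
  stopReason: "tool_use" | "end_turn";
}

async function* queryLoop(
  initial: Message[],
  callModel: (history: Message[]) => Promise<ModelTurn>, // stand-in for the streaming call
  runTool: (name: string, input: unknown) => Promise<string>,
): AsyncGenerator<Message, void> {
  const history = [...initial];
  while (true) {
    const turn = await callModel(history);
    const assistant: Message = { role: "assistant", content: turn.blocks };
    history.push(assistant);
    yield assistant; // surfaced upstream so a UI could render it

    // No tool calls requested: the real loop would run stop hooks here.
    if (turn.stopReason !== "tool_use") return;

    const results: ToolResult[] = [];
    for (const block of turn.blocks) {
      if (block.type !== "tool_use") continue;
      const content = await runTool(block.name, block.input);
      results.push({ type: "tool_result", tool_use_id: block.id, content });
    }
    const user: Message = { role: "user", content: results };
    history.push(user);
    yield user; // tool results are fed back and the loop continues
  }
}
```

Because every layer is a generator, a caller can consume the loop with for await and render each message as soon as it is yielded.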
Streaming & Fallback
The API call uses a generator (callModel()) that yields StreamEvent messages as they arrive from the Anthropic API. Text, thinking blocks, and tool_use blocks arrive incrementally.
Streaming Tool Executor
While the model is still generating, tool_use blocks that have completed streaming are immediately dispatched for execution via the StreamingToolExecutor. This means tools can finish before the model even stops generating.
// While the model is still streaming, start executing completed tool_use blocks
if (streamingToolExecutor) {
  streamingToolExecutor.addTool(toolUseBlock)   // Feed in as they arrive
  yield* streamingToolExecutor.drainResults()   // Yield completed results
}
Fallback Model
If the primary model throws a FallbackTriggeredError (overloaded, rate limited), the loop catches it, clears partial state, switches to the fallback model, and retries from the same point. Thinking blocks are stripped from the retry to avoid signature mismatches between models.
Compaction Pipeline 5 Layers
Before each API call, the conversation is compressed through up to 5 stages to stay within context limits:
| Stage | Strategy | When |
|---|---|---|
| Tool Result Budget | Limits per-message tool output size | Always |
| History Snip | Removes messages beyond a threshold | HISTORY_SNIP gate |
| Microcompact | Client-side tool call summarization (time-based decay) | Always |
| Context Collapse | Strategic staged summaries at breakpoints | CONTEXT_COLLAPSE gate |
| Autocompact | Full conversation summary when over token limit | When approaching limit |
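The staged design in the table can be illustrated with a tiny pipeline in which each stage is a function over the message list and gated stages are simply skipped. This is a sketch under stated assumptions: the Msg and Stage shapes, runCompaction, and both example stages are hypothetical, and the real stages (summarization, token counting) are far more involved.

```typescript
// Hypothetical message shape and stage interface, for illustration only.
interface Msg {
  role: "user" | "assistant" | "tool";
  text: string;
}
interface Stage {
  name: string;
  enabled: boolean; // stands in for "Always" or a feature gate
  apply: (msgs: Msg[]) => Msg[];
}

// Run every enabled stage in order, feeding each stage the previous output.
function runCompaction(msgs: Msg[], stages: Stage[]): Msg[] {
  return stages.filter((s) => s.enabled).reduce((acc, s) => s.apply(acc), msgs);
}

// Analogue of the tool result budget: cap the size of each tool output.
function toolResultBudget(maxChars: number): Stage {
  return {
    name: "tool-result-budget",
    enabled: true,
    apply: (msgs) =>
      msgs.map((m) =>
        m.role === "tool" && m.text.length > maxChars
          ? { ...m, text: m.text.slice(0, maxChars) + " [truncated]" }
          : m,
      ),
  };
}

// Analogue of history snip: keep only the most recent messages (keepLast must be > 0).
function historySnip(keepLast: number, gateOn: boolean): Stage {
  return { name: "history-snip", enabled: gateOn, apply: (msgs) => msgs.slice(-keepLast) };
}
```

Ordering matters in a pipeline like this: cheap, lossless-ish stages run first so that the expensive full summary is only needed when the earlier stages did not free enough room.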
Error Recovery
The query loop has multiple recovery strategies that fire in escalating order:
| Error | Recovery | Max Retries |
|---|---|---|
| Prompt too long | 1. Drain staged context collapses 2. Reactive compact (full summary) | Until exhausted |
| Max output tokens | 1. Escalate to 64K cap 2. Multi-turn recovery with nudge message | 3 turns |
| Model overloaded | Switch to fallback model, strip thinking blocks, retry | 1 |
| Abort (Ctrl+C) | Synthetic tool_results for orphaned tool_use blocks, cleanup hooks | — |
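The abort row relies on the fact that every tool_use block needs a matching tool_result before the conversation can continue. A minimal sketch of that repair step follows; the function name, the block shapes, and the "Interrupted by user" text are assumptions made for illustration.

```typescript
// Simplified block shapes (assumed); only the fields needed for pairing are modeled.
interface ToolUse {
  type: "tool_use";
  id: string;
}
interface ToolResult {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Produce a placeholder result for every tool_use that never received one.
function synthesizeOrphanResults(
  toolUses: ToolUse[],
  results: ToolResult[],
  reason = "Interrupted by user", // hypothetical wording
): ToolResult[] {
  const answered = new Set(results.map((r) => r.tool_use_id));
  return toolUses
    .filter((t) => !answered.has(t.id))
    .map((t) => ({
      type: "tool_result" as const,
      tool_use_id: t.id,
      content: reason,
      is_error: true,
    }));
}
```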
Tool Architecture Tool.ts
Every tool is built with buildTool() which applies fail-closed defaults:
- isConcurrencySafe: false
- isReadOnly: false
- isDestructive: false
- isEnabled: true
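A sketch of how a builder can apply these defaults is shown below. The ToolDef and BuiltTool interfaces are hypothetical, and the real buildTool() may expose these flags as methods rather than plain booleans; only the default values come from the list above.

```typescript
// Hypothetical definition shape: authors only state what differs from the defaults.
interface ToolDef {
  name: string;
  call: (input: unknown) => Promise<unknown>;
  isConcurrencySafe?: boolean;
  isReadOnly?: boolean;
  isDestructive?: boolean;
  isEnabled?: boolean;
}
type BuiltTool = Required<ToolDef>;

// A tool that forgets to declare itself read-only or concurrency-safe is
// treated as neither, so it runs serially and is handled as a write.
function buildTool(def: ToolDef): BuiltTool {
  return {
    name: def.name,
    call: def.call,
    isConcurrencySafe: def.isConcurrencySafe ?? false,
    isReadOnly: def.isReadOnly ?? false,
    isDestructive: def.isDestructive ?? false,
    isEnabled: def.isEnabled ?? true,
  };
}
```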
Tool Capabilities
| Property | Purpose |
|---|---|
| inputSchema | Zod v4 schema for input validation |
| call() | The actual execution function |
| checkPermissions() | Tool-specific permission logic |
| preparePermissionMatcher() | Normalizes input for permission rule matching |
| backfillObservableInput() | Adds legacy fields before permission checks |
| maxResultSizeChars | Limit before result is persisted to disk |
| searchHint | 3-10 word description for ToolSearch matching |
| isDeferred | Schema loaded lazily via ToolSearch |
Tool Execution Lifecycle toolExecution.ts
Every tool call follows this exact sequence:
runToolUse(toolUseBlock)
  // 1. Find tool by name (with deprecated alias fallback)
  // 2. Validate input against Zod schema
  // 3. Run PreToolUse hooks → executePreToolHooks(toolName, input)
  //    Hooks can: approve, deny, modify input, inject context
  // 4. Check permissions → resolveHookPermissionDecision()
  //    Combines hook decision + canUseTool() + rule-based checks
  //    Priority: deny > ask > allow > passthrough
  // 5. Execute tool → tool.call(validatedInput, context)
  //    Returns: { data, newMessages?, success }
  // 6. Run PostToolUse hooks → runPostToolUseHooks(toolName, input, output)
  //    Hooks can: modify MCP output, inject context, stop continuation
  // 7. On error: run PostToolUseFailure hooks instead
A hook decision of "allow" does NOT bypass settings.json deny/ask rules. The hook permission and the rule-based permission are combined, with deny always winning.
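The priority order deny > ask > allow > passthrough amounts to taking the most restrictive decision across all sources. A minimal sketch, assuming each source (hook, canUseTool(), settings rules) has already been reduced to a single decision; combineDecisions and the RANK table are hypothetical names.

```typescript
type Decision = "deny" | "ask" | "allow" | "passthrough";

// Higher rank is more restrictive and wins.
const RANK: Record<Decision, number> = { deny: 3, ask: 2, allow: 1, passthrough: 0 };

function combineDecisions(...decisions: Decision[]): Decision {
  return decisions.reduce<Decision>(
    (best, d) => (RANK[d] > RANK[best] ? d : best),
    "passthrough",
  );
}
```

With this shape, a hook returning "allow" combined with a settings rule returning "deny" still yields "deny", which is the behavior described above.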
Concurrency Model toolOrchestration.ts
When the model returns multiple tool_use blocks, they're partitioned into batches:
partitionToolCalls([Read, Read, Edit, Grep, Bash])
→ Batch 1: [Read, Read]   // concurrent (both read-only)
→ Batch 2: [Edit]         // serial (writes)
→ Batch 3: [Grep]         // concurrent (read-only)
→ Batch 4: [Bash]         // serial (not concurrency-safe)
Concurrent batches run up to 10 tools in parallel. Context modifiers from concurrent tools are queued and applied after the batch completes. Serial tools apply context modifiers immediately.
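The batching behavior can be reproduced by grouping consecutive concurrency-safe calls and giving every other call a batch of its own. The sketch below is an assumption about the shape of the algorithm, not the actual partitionToolCalls(); the ToolCall and Batch types are hypothetical, and the 10-tool parallelism cap is not modeled.

```typescript
interface ToolCall {
  name: string;
  concurrencySafe: boolean;
}
interface Batch {
  concurrent: boolean;
  calls: ToolCall[];
}

// Preserve the model's ordering: only adjacent safe calls share a batch.
function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.concurrencySafe && last && last.concurrent) {
      last.calls.push(call);
    } else {
      batches.push({ concurrent: call.concurrencySafe, calls: [call] });
    }
  }
  return batches;
}
```

Keeping the original order means a read that follows an edit always observes the edited file, at the cost of some lost parallelism.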
Streaming Tool Executor
The most interesting optimization: tools start executing while the model is still generating.
Hooks System 5,000+ lines
The hooks system is the extensibility layer of the harness. There are 25+ hook events across the entire lifecycle:
| Event | When | Can Do |
|---|---|---|
| PreToolUse | Before tool execution | Block, modify input, set permission |
| PostToolUse | After tool success | Modify MCP output, inject context |
| PostToolUseFailure | After tool error | React to failures |
| UserPromptSubmit | User sends message | Transform/block input |
| Stop | Model finishes turn | Block stop with feedback, rewake |
| SessionStart | Session begins | Setup, inject context, set watch paths |
| SessionEnd | Session ends | Teardown |
| SubagentStart/Stop | Agent spawning | Track sub-agents |
| PreCompact/PostCompact | Context compression | React to compaction |
| FileChanged | FS watch fires | React to external changes |
| PermissionRequest | Permission check | Custom permission logic |
Hook Configuration
Hooks are configured in settings.json at user, project, or policy level:
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "my-lint-check.sh",
            "if": "Bash(npm *)",
            "timeout": 10
          }
        ]
      }
    ]
  }
}
Four Hook Types
async and asyncRewake for background execution.
Shell can be bash or powershell.
$ARGUMENTS placeholder.
Can specify a different model (e.g. claude-sonnet-4-6).
$VAR_NAME (requires allowedEnvVars list). Must return JSON.
HookJSONOutput.
Matchers & Conditions
Each hook entry has a matcher (which tool to match) and optional if condition:
- "matcher": "Write" (exact match on tool name)
- "matcher": "Write|Edit" (pipe-separated alternatives)
- "matcher": "^Write.*" (regex match)
- "matcher": "*" (matches all tools)
- "if": "Bash(git *)" (permission rule syntax, filters before spawning)
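The four matcher forms can be handled by one small function. This is a sketch of one plausible dispatch order (wildcard, then plain names and alternatives, then regex); the function name matchesTool and the rule used to tell a plain name from a regex are assumptions.

```typescript
function matchesTool(matcher: string, toolName: string): boolean {
  // "*" matches every tool.
  if (matcher === "*") return true;

  // Only letters, digits, underscores and pipes: treat as exact names,
  // optionally separated by "|".
  if (/^[A-Za-z0-9_|]+$/.test(matcher)) {
    return matcher.split("|").includes(toolName);
  }

  // Anything else is tried as a regular expression; invalid patterns never match.
  try {
    return new RegExp(matcher).test(toolName);
  } catch {
    return false;
  }
}
```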
stdin/stdout Protocol Command Hooks
Command hooks communicate via JSON over stdin/stdout. Here's the exact protocol:
Input (stdin)
{
  "session_id": "uuid",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/working/directory",
  "permission_mode": "default",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": { "command": "git status" },
  "tool_use_id": "toolu_abc123"
}
Output (stdout)
{
  "decision": "approve", // or "block"
  "reason": "Looks safe",
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow", // "allow" | "deny" | "ask"
    "updatedInput": { "command": "git status --short" },
    "additionalContext": "Reminder: this repo uses trunk-based development"
  }
}
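The decision logic of a command hook that speaks this protocol can be written as a pure function from the stdin payload to the stdout payload. The sketch below uses only field names that appear in the two JSON samples above; the decide function, the trimmed-down interfaces, and the rm -rf policy are hypothetical.

```typescript
// Subset of the stdin payload used by this example.
interface HookInput {
  hook_event_name: string;
  tool_name: string;
  tool_input: { command?: string };
}
// Subset of the stdout payload produced by this example.
interface HookOutput {
  reason: string;
  hookSpecificOutput: {
    hookEventName: string;
    permissionDecision: "allow" | "deny" | "ask";
  };
}

// Example policy (an assumption): deny recursive force-deletes, allow the rest.
function decide(input: HookInput): HookOutput {
  const command = input.tool_input.command ?? "";
  const risky = input.tool_name === "Bash" && /\brm\s+-rf\b/.test(command);
  return {
    reason: risky ? "Recursive force-delete is not allowed" : "Looks safe",
    hookSpecificOutput: {
      hookEventName: input.hook_event_name,
      permissionDecision: risky ? "deny" : "allow",
    },
  };
}
```

A real hook script would parse the JSON it receives on stdin, pass it to a function like this, and print the JSON-serialized result to stdout.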
Async Detection
A hook can background itself by emitting this as its first line of stdout:
{ "async": true, "asyncTimeout": 5000 }
The harness immediately returns an empty result and the hook continues running in the AsyncHookRegistry. If asyncRewake is set and the hook exits with code 2, it re-wakes the model with its output.
Exit Code Semantics
| Exit Code | Behavior | Who Sees Output |
|---|---|---|
| 0 | Success | stdout shown in transcript mode only (Ctrl+O) |
| 2 | Blocking error — halts tool execution | stderr injected into conversation as system message (the model sees it) |
| 1, 3+ | Non-blocking error | stderr shown to user only, not to model |
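The exit code table maps directly onto a three-way classification. A minimal sketch, with hypothetical type and function names; only the routing rules come from the table.

```typescript
type HookOutcome =
  | { kind: "success"; transcriptOnly: string } // stdout, visible in transcript mode
  | { kind: "blocking"; toModel: string } // stderr, injected into the conversation
  | { kind: "non_blocking"; toUser: string }; // stderr, shown to the user only

function classifyHookExit(code: number, stdout: string, stderr: string): HookOutcome {
  if (code === 0) return { kind: "success", transcriptOnly: stdout };
  if (code === 2) return { kind: "blocking", toModel: stderr };
  return { kind: "non_blocking", toUser: stderr };
}
```

The practical consequence for hook authors: a message intended for the model has to be written to stderr and paired with exit code 2, since any other non-zero code keeps the output away from the model.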
Permission System 5 Rule Sources
Every tool invocation passes through a multi-layer permission check:
- ~/.claude/settings.json: user's global preferences
- .claude/settings.json: repo-level configuration
- --allow / --deny flags on invocation
Permission Behaviors
- allow: tool runs without prompting
- deny: tool is blocked, model sees denial message
- ask: user is prompted interactively
Rule Syntax
// Allow all git commands
{ "tool": "Bash(git *)", "permission": "allow" }
// Deny writes to node_modules
{ "tool": "Write(node_modules/**)", "permission": "deny" }
// Ask before any destructive bash command
{ "tool": "Bash(rm *)", "permission": "ask" }
Rules use bash-style glob patterns. Tools implement preparePermissionMatcher() to normalize their input for pattern matching (e.g., FileEditTool matches against file paths, BashTool matches against commands).
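To make the rule syntax concrete, here is a sketch that splits a rule such as Bash(git *) into a tool name and a pattern, then matches the pattern against already-normalized input. The glob handling is deliberately simplified (any run of * matches any characters, including /), and parseRule, globToRegExp, and ruleMatches are hypothetical names rather than the actual matcher.

```typescript
// Split "Tool(pattern)" into its parts; a bare "Tool" has no pattern.
function parseRule(rule: string): { tool: string; pattern: string | null } {
  const m = /^([^()]+)\((.*)\)$/.exec(rule);
  if (!m) return { tool: rule, pattern: null };
  const [, tool = "", pattern = ""] = m;
  return { tool, pattern };
}

// Simplified glob: escape regex metacharacters, then turn "*" runs into ".*".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*+/g, ".*");
  return new RegExp(`^${escaped}$`);
}

// `input` is whatever the tool exposes for matching (a command, a file path).
function ruleMatches(rule: string, toolName: string, input: string): boolean {
  const parsed = parseRule(rule);
  if (parsed.tool !== toolName) return false;
  return parsed.pattern === null || globToRegExp(parsed.pattern).test(input);
}
```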
Speculative Classifiers
In auto-mode, two classifiers run speculatively in the background while the permission prompt is shown to the user: a bash safety classifier and a transcript classifier.
Both are gated by feature flags (BASH_CLASSIFIER, TRANSCRIPT_CLASSIFIER). There's also denial tracking — after N auto-denials, the system falls back to interactive prompting.
System Prompt Assembly
The system prompt is built from a 5-level priority stack:
| Priority | Source | Behavior |
|---|---|---|
| 0 (highest) | Override | Replaces everything. Used by loop mode, testing. |
| 1 | Coordinator | Multi-worker mode prompt. Feature-gated. |
| 2 | Agent | Sub-agent definitions. In proactive mode: appended to default. Otherwise: replaces default. |
| 3 | Custom | --system-prompt flag. |
| 4 (lowest) | Default | The standard Claude Code prompt. |
appendSystemPrompt is always added at the end (unless override is set).
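One way to read the priority table is "the first source that is present wins, then appendSystemPrompt is attached unless an override is set". The sketch below encodes that reading. The PromptSources shape, the resolveSystemPrompt name, the blank-line separator, and the assumption that higher levels fully shadow lower ones are all illustrative guesses.

```typescript
interface PromptSources {
  override?: string;
  coordinator?: string;
  agent?: string;
  custom?: string; // value of --system-prompt
  defaultPrompt: string;
  append?: string; // appendSystemPrompt
  proactive?: boolean;
}

function resolveSystemPrompt(s: PromptSources): string {
  // Priority 0: override replaces everything, including the appended text.
  if (s.override !== undefined) return s.override;

  let base: string;
  if (s.coordinator !== undefined) {
    base = s.coordinator;
  } else if (s.agent !== undefined) {
    // Proactive mode appends the agent prompt to the default; otherwise it replaces it.
    base = s.proactive ? `${s.defaultPrompt}\n\n${s.agent}` : s.agent;
  } else {
    base = s.custom ?? s.defaultPrompt;
  }
  return s.append ? `${base}\n\n${s.append}` : base;
}
```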
Default Prompt Sections
The default system prompt is assembled from these sections in prompts.ts:
- Intro (model description, safety instructions)
- System section (tool execution, tags, compression)
- Doing tasks section (task approach guidance)
- Actions section (reversibility, blast radius)
- Using your tools section (tool-specific instructions)
- Tone and style section
- Output efficiency section
- — DYNAMIC BOUNDARY —
- Session guidance, memory, environment info, language, output style, MCP instructions...
Cache Boundary Critical Optimization
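The system prompt is split at __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__. Sections before the boundary form the stable prefix; session guidance, memory, environment info, and the other dynamic sections come after it.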
Section Caching
Dynamic sections use two caching strategies:
- systemPromptSection(): computed once, cached until /clear or /compact
- DANGEROUS_uncachedSystemPromptSection(): recomputed every turn, breaks prompt cache on changes. Used for MCP instructions (servers connect/disconnect between turns)
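The difference between the two strategies comes down to whether a section's value is memoized between turns. A minimal sketch, assuming a per-session cache keyed by section name; the Section type and SectionCache class are hypothetical and do not mirror the real helpers' signatures.

```typescript
interface Section {
  name: string;
  cached: boolean; // false corresponds to the "uncached" variant
  compute: () => string;
}

class SectionCache {
  private values = new Map<string, string>();

  resolve(section: Section): string {
    // Uncached sections are recomputed on every turn.
    if (!section.cached) return section.compute();

    const hit = this.values.get(section.name);
    if (hit !== undefined) return hit;
    const value = section.compute();
    this.values.set(section.name, value);
    return value;
  }

  // What a /clear or /compact would trigger in this model.
  clear(): void {
    this.values.clear();
  }
}
```

Any section resolved through the uncached path can change the prompt bytes from one turn to the next, which is why it invalidates the prompt cache and carries the DANGEROUS_ prefix.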
Context Injection
Two context objects are injected alongside the system prompt:
System Context
- Git status — branch, main branch, status (max 2,000 chars), recent commits, git user
- Cache breaker — ANT-only ephemeral injection for debugging
User Context
- CLAUDE.md files — auto-discovered via directory walk, filtered to remove injected memory files
- Current date — always included
System context is prepended, user context is appended. Both are memoized at module load and cached for the session.
Memory System memdir/
Persistent memory stored in ~/.claude/projects/<slug>/memory/:
| Constant | Value |
|---|---|
| Max MEMORY.md lines | 200 |
| Max MEMORY.md bytes | 25,000 |
| Memory types | user, feedback, project, reference |
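The two MEMORY.md limits can be enforced together by keeping whole lines until either cap would be exceeded. This is only a sketch: the clampMemory name and the choice to cut at line boundaries (rather than mid-line) are assumptions, and the real loader may truncate differently.

```typescript
function clampMemory(text: string, maxLines = 200, maxBytes = 25000): string {
  const encoder = new TextEncoder();
  const kept: string[] = [];
  let bytes = 0;

  for (const line of text.split("\n").slice(0, maxLines)) {
    // UTF-8 size of the line, plus one byte for the newline that joins it to the previous one.
    const cost = encoder.encode(line).length + (kept.length > 0 ? 1 : 0);
    if (bytes + cost > maxBytes) break;
    kept.push(line);
    bytes += cost;
  }
  return kept.join("\n");
}
```

Measuring bytes instead of characters matters for non-ASCII notes, where a single character can take several bytes in UTF-8.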
Three Memory Modes
Daily-Log Mode (KAIROS)
Append-only daily logs at memory/logs/YYYY/MM/YYYY-MM-DD.md. No MEMORY.md content loading. Used in assistant/proactive mode.
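The log path layout can be derived from a date as shown below. The dailyLogPath helper is hypothetical, and using UTC is an assumption made to keep the example deterministic; the actual implementation may use local time.

```typescript
function dailyLogPath(date: Date): string {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0"); // months are 0-based
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `memory/logs/${yyyy}/${mm}/${yyyy}-${mm}-${dd}.md`;
}
```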
Team Memory Mode (TEAMMEM)
Sync'd across teammates via permissionSync.ts. Secret-guarded: checkTeamMemSecrets() prevents committing credentials to shared memory.
Auto Memory (default)
Standard mode. Memory directory created idempotently once per session. Builds behavioral instructions + MEMORY.md content into system prompt. Auto-extracted from conversations via extractMemories service.
Sub-Agent System AgentTool
The AgentTool spawns isolated sub-agents that run their own query loops with restricted tool sets.
Agent Properties
- Each agent gets a subagent_type, custom system prompt, and tool restrictions
- Optional isolation: "worktree" for git worktree isolation
- Auto-backgrounds after 120 seconds (GrowthBook-gated)
- Background tasks disabled globally via CLAUDE_CODE_DISABLE_BACKGROUND_TASKS
- Results arrive as <task-notification> XML blocks
Fork Subagent FORK_SUBAGENT gate
The most sophisticated agent pattern. When enabled, omitting subagent_type triggers an implicit fork of the current agent.
Cache Optimization
Fork children receive the parent's exact system prompt bytes to preserve prompt cache identity. The fork message contains:
- The entire parent assistant message (all tool_use blocks, thinking, text)
- Identical placeholder tool results for all sibling forks: "Fork started — processing in background"
- Only the final text block differs per child (the directive)
This maximizes cache hits across siblings — the messages are identical until the final divergence point.
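The shared-prefix idea can be sketched as follows: every child gets the same parent assistant message and the same placeholder tool results, and only the last block carries the child-specific directive. The block types and the buildForkMessages function are hypothetical, and the placeholder text is passed in as a parameter rather than hard-coded.

```typescript
type ContentBlock = { type: "text"; text: string } | { type: "tool_use"; id: string };
type UserBlock =
  | { type: "tool_result"; tool_use_id: string; content: string }
  | { type: "text"; text: string };

interface ForkMessages {
  assistant: ContentBlock[];
  user: UserBlock[];
}

function buildForkMessages(
  parentAssistant: ContentBlock[],
  placeholder: string,
  directive: string,
): ForkMessages {
  const user: UserBlock[] = [];
  // One identical placeholder result per tool_use keeps the prefix byte-for-byte equal.
  for (const block of parentAssistant) {
    if (block.type === "tool_use") {
      user.push({ type: "tool_result", tool_use_id: block.id, content: placeholder });
    }
  }
  // The only per-child difference comes last, after the cacheable prefix.
  user.push({ type: "text", text: directive });
  return { assistant: parentAssistant, user };
}
```

Putting the divergent block at the very end matters because prompt caching works on prefixes: any difference earlier in the message would invalidate everything after it.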
Fork Child Rules
- You ARE a fork — don't spawn sub-agents
- No conversation; use tools directly
- Commit changes before reporting (include hash)
- Minimal output: "Scope:", "Result:", "Key files:" format
Coordinator Mode
Gate: COORDINATOR_MODE + CLAUDE_CODE_COORDINATOR_MODE env var.
A single coordinator spawns multiple worker agents via the Agent tool. Workers run as background agents with restricted tool sets.
Workers can share knowledge via a scratchpad directory (gated by tengu_scratch). The coordinator waits for actual results — no prediction.
Swarm Infrastructure
Full multi-process agent orchestration with multiple terminal backends:
| Backend | File | How |
|---|---|---|
| In-Process | inProcessRunner.ts | Isolated agent in same process (53KB) |
| tmux | backends/tmux | Spawns panes in tmux session |
| iTerm2 | backends/iterm2 | Uses iTerm2 scripting API |
| Kitty | backends/kitty | Uses Kitty remote control |
Permission bubbling: teammate agents can escalate permission requests up to the leader. Team files managed via teamHelpers.ts with task numbering.
Feature Flags ~50+ gates
Bun's feature() function enables build-time dead code elimination. Disabled features are completely stripped from the binary.
| Gate | Controls |
|---|---|
| PROACTIVE / KAIROS | Autonomous assistant mode, daily logs, brief tool |
| COORDINATOR_MODE | Multi-worker orchestration |
| FORK_SUBAGENT | Implicit fork delegation |
| VOICE_MODE | Voice input command |
| AGENT_TRIGGERS | Cron-scheduled agents |
| BRIDGE_MODE | IDE extension communication |
| CACHED_MICROCOMPACT | Prompt cache-aware compaction |
| TRANSCRIPT_CLASSIFIER | Auto-mode transcript analysis |
| BASH_CLASSIFIER | Auto-mode bash safety check |
| TOKEN_BUDGET | Per-task token budget tracking |
| CONTEXT_COLLAPSE | Staged context summaries |
| HISTORY_SNIP | Old history removal |
| REACTIVE_COMPACT | On-error full compaction |
| EXPERIMENTAL_SKILL_SEARCH | Skill discovery |
| TEAMMEM | Team memory sync |
ANT-Only Code
process.env.USER_TYPE === 'ant' is a build-time constant. Ant-only features are DCE'd from external builds:
- Remote isolation mode for agents
- agents-platform command
- Internal telemetry, cache breaking injection
- Numeric length anchors (token reduction hints)
- Model codename references
Speculative Execution Hidden Feature
Claude Code can speculatively execute the next task in the background before you even ask.
How It Works
- After completing a turn, a forked agent runs in an isolated overlay directory
- The overlay allows write tools (Edit, Write) to modify copies of files
- Read-only tools (Read, Glob, Grep) access the real filesystem
- Max 20 turns, 100 messages per speculation
- If the user accepts the suggestion, overlay files are copied to the real working directory
- If rejected, the overlay is discarded
Undercover Mode ANT-Only
When Anthropic engineers use Claude Code on public repos, this auto-activates:
- Strips model codenames (Tengu, Capybara, etc.) from commits and PRs
- Strips internal repo names and Slack references
- External repos are default — only an internal allowlist turns it off
- No force-off — safety guarantee that codenames never leak
- Controlled via CLAUDE_CODE_UNDERCOVER=1 or auto-detection
Cost Tracking cost-tracker.ts
Token usage and costs are tracked per-session and persisted across resumes:
- Input, output, cache read, cache write tokens tracked per model
- API duration, tool duration, wall duration tracked
- Lines of code changed tracked
- FPS metrics (terminal rendering performance) optionally captured
- Advisor tool usage tracked recursively
- Persisted to project config on session switch, restored on resume
Limits & Constants
| Limit | Value | Where |
|---|---|---|
| MEMORY.md max lines | 200 | memdir.ts |
| MEMORY.md max bytes | 25,000 | memdir.ts |
| Git status max chars | 2,000 | context.ts |
| Max concurrent tools | 10 | toolOrchestration.ts |
| Max output token recovery turns | 3 | query.ts |
| Max compact retries | 2 | compact.ts |
| Post-compact restored files | 5 | compact.ts |
| Post-compact token budget | 50,000 | compact.ts |
| Post-compact per file max | 5,000 tokens | compact.ts |
| Speculation max turns | 20 | speculation.ts |
| Speculation max messages | 100 | speculation.ts |
| FileEditTool max file size | 1 GiB | FileEditTool.ts |
| BashTool security rules | 102 KB | bashSecurity.ts |
| Auto-background agent timeout | 120 seconds | AgentTool.tsx |
| Bash classifier timeout | 2 seconds | useCanUseTool.tsx |
Trust Gate
All hooks require workspace trust in interactive mode. This is a defense-in-depth measure:
function shouldSkipHookDueToTrust(): boolean {
  const isInteractive = !getIsNonInteractiveSession()
  if (!isInteractive) return false   // SDK mode: trust implicit
  const hasTrust = checkHasTrustDialogAccepted()
  return !hasTrust                   // Interactive: all hooks require trust
}
This prevents a malicious .claude/settings.json in a cloned repo from executing arbitrary commands before the user accepts the trust dialog. Applied to ALL hook types with no exceptions.
instructkr/claude-code.