Claude Code Architecture

A deep dive into Anthropic's agentic CLI — the query loop, tool execution harness, hooks system, permission model, multi-agent coordination, and prompt caching strategy.

1,902 TypeScript files | 512,685 lines of code | Runtime: Bun | UI: React + Ink

High-Level Architecture

Claude Code is a terminal-based agentic coding assistant. At its core, it's a loop: send messages to the Claude API, receive tool calls, execute them with permission checks and hooks, feed results back, and repeat.

Core Execution Path

Each stage hands off to the next, top to bottom:

Stage	Component	File
Entry	REPL.tsx	screens/REPL.tsx
Orchestrator	QueryEngine	QueryEngine.ts
Core Loop	queryLoop()	query.ts
API Call	callModel()	services/api/
Dispatch	runTools()	toolOrchestration.ts
Execute	runToolUse()	toolExecution.ts

The entire system is built as async generators — each layer yields messages upstream for the UI to render in real-time. This is what makes tool progress, streaming text, and hook feedback all appear incrementally.
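The layering can be sketched with plain async generators. This is a minimal illustration rather than the actual source: the `UiMessage` type and the `fakeModel`, `fakeTools`, `fakeLoop`, and `collect` functions are hypothetical stand-ins for the API call, tool dispatch, query loop, and UI layers.

```typescript
// Hypothetical message type; the real harness uses richer message objects.
type UiMessage = { kind: "text" | "tool_progress" | "tool_result"; body: string };

// Innermost layer: stands in for the streaming API call.
async function* fakeModel(): AsyncGenerator<UiMessage> {
  yield { kind: "text", body: "Reading the file..." };
}

// Middle layer: stands in for tool dispatch, yielding progress as it goes.
async function* fakeTools(): AsyncGenerator<UiMessage> {
  yield { kind: "tool_progress", body: "Read: started" };
  yield { kind: "tool_result", body: "Read: 42 lines" };
}

// Outer layer: re-yields everything from the inner layers with yield*,
// so the consumer sees each message as soon as it is produced.
async function* fakeLoop(): AsyncGenerator<UiMessage> {
  yield* fakeModel();
  yield* fakeTools();
}

// A consumer (the UI, in the real system) handles messages one at a time.
async function collect(): Promise<string[]> {
  const seen: string[] = [];
  for await (const msg of fakeLoop()) {
    seen.push(`${msg.kind}: ${msg.body}`);
  }
  return seen;
}
```

Because every layer is a generator, no layer has to buffer a whole turn before the layer above it can act on the first message.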

The Query Loop query.ts

The heart of Claude Code. queryLoop() is a while(true) loop that handles message preparation, API calls, tool execution, and error recovery across turns.

Each Iteration

Compaction

Apply tool result budgets, snip old history, microcompact, context collapse, autocompact

API Streaming

Call callModel() generator, receive text/thinking/tool_use blocks as they stream

Tool Dispatch

Partition tool calls into concurrent/serial batches, execute with permissions and hooks

Result Collection

Gather tool results, attachment messages, queued commands, task notifications

Recurse or Stop

If stop_reason === "tool_use", append results and loop. Otherwise, run stop hooks and return.

Key insight: The loop has 7 different continue sites — collapse drain retry, reactive compact retry, max output token escalation, max output token recovery, stop hook blocking, token budget continuation, and the normal next-turn continue. Each transitions the State object with tracking of why the continue happened.
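One way to picture the tracked transitions is a union of continue reasons plus a function that produces the next state. This is a sketch under assumptions: `ContinueReason`, `LoopState`, and `transition` are illustrative names, and the rule that only a next-turn continue advances the turn counter is an assumption of this example, not something the source states.

```typescript
// The seven continue sites named above, as a union type (names are illustrative).
type ContinueReason =
  | "collapse_drain_retry"
  | "reactive_compact_retry"
  | "max_output_tokens_escalate"
  | "max_output_tokens_recovery"
  | "stop_hook_blocking"
  | "token_budget_continuation"
  | "next_turn";

// Hypothetical state shape: just enough to show the idea.
interface LoopState {
  turn: number;
  lastContinue?: ContinueReason;
  history: ContinueReason[];
}

// Each continue site builds the next state and records why it continued.
function transition(state: LoopState, reason: ContinueReason): LoopState {
  return {
    // Assumption: retries re-run the same turn, a normal continue starts a new one.
    turn: reason === "next_turn" ? state.turn + 1 : state.turn,
    lastContinue: reason,
    history: [...state.history, reason],
  };
}

let state: LoopState = { turn: 0, history: [] };
state = transition(state, "reactive_compact_retry"); // same turn, retried after a compact
state = transition(state, "next_turn");              // tool results appended, new turn
```

Recording the reason on the state makes it possible to cap specific recovery paths (for example, a fixed number of max-output-token recoveries) without a separate counter per site.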

Streaming & Fallback

The API call uses a generator (callModel()) that yields StreamEvent messages as they arrive from the Anthropic API. Text, thinking blocks, and tool_use blocks arrive incrementally.

Streaming Tool Executor

While the model is still generating, tool_use blocks that have completed streaming are immediately dispatched for execution via the StreamingToolExecutor. This means tools can finish before the model even stops generating.

// While model still streams, start executing completed tool_use blocks
if (streamingToolExecutor) {
  streamingToolExecutor.addTool(toolUseBlock)       // Feed in as they arrive
  yield* streamingToolExecutor.drainResults()     // Yield completed results
}

Fallback Model

If the primary model throws a FallbackTriggeredError (overloaded, rate limited), the loop catches it, clears partial state, switches to the fallback model, and retries from the same point. Thinking blocks are stripped from the retry to avoid signature mismatches between models.
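The retry path might look roughly like the following. It is a simplified, synchronous sketch: the message shapes, `stripThinking`, `callWithFallback`, and the model names are hypothetical, and the real loop is an async generator that also clears partial streaming state.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "tool_use"; id: string; name: string };

interface Msg { role: "user" | "assistant"; content: Block[] }

// Stand-in for the error class named above.
class FallbackTriggeredError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on old targets
  }
}

// Remove thinking blocks so the fallback model never receives signatures
// produced by the primary model. Messages left empty are dropped (an assumption).
function stripThinking(messages: Msg[]): Msg[] {
  return messages
    .map((m) => ({ ...m, content: m.content.filter((b) => b.type !== "thinking") }))
    .filter((m) => m.content.length > 0);
}

// Try the primary model once; on FallbackTriggeredError retry once on the fallback.
function callWithFallback(
  messages: Msg[],
  call: (model: string, messages: Msg[]) => string,
  primary: string,
  fallback: string,
): string {
  try {
    return call(primary, messages);
  } catch (err) {
    if (!(err instanceof FallbackTriggeredError)) throw err; // other errors propagate
    return call(fallback, stripThinking(messages));
  }
}
```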

Compaction Pipeline 5 Layers

Before each API call, the conversation is compressed through up to 5 stages to stay within context limits:

Stage	Strategy	When
Tool Result Budget	Limits per-message tool output size	Always
History Snip	Removes messages beyond a threshold	`HISTORY_SNIP` gate
Microcompact	Client-side tool call summarization (time-based decay)	Always
Context Collapse	Strategic staged summaries at breakpoints	`CONTEXT_COLLAPSE` gate
Autocompact	Full conversation summary when over token limit	When approaching limit

Post-compact restoration: After a full compact, up to 5 recently-read files are re-injected (budget: 50K tokens total, 5K per file) so the model doesn't lose track of what it was working on.
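The restoration limits can be illustrated with a small selection function. The three constants come from the text above; `RecentFile`, `pickFilesToRestore`, and the choice to truncate an oversized file to the per-file cap (rather than skip it) are assumptions made for this sketch.

```typescript
const MAX_FILES = 5;            // up to 5 recently-read files
const TOTAL_BUDGET = 50_000;    // tokens across all restored files
const PER_FILE_BUDGET = 5_000;  // tokens per restored file

interface RecentFile { path: string; tokens: number }

// `recent` is assumed to be ordered most-recently-read first.
function pickFilesToRestore(recent: RecentFile[]): RecentFile[] {
  const picked: RecentFile[] = [];
  let remaining = TOTAL_BUDGET;
  for (const file of recent) {
    if (picked.length >= MAX_FILES) break;
    const tokens = Math.min(file.tokens, PER_FILE_BUDGET); // cap each file
    if (tokens > remaining) continue;                      // does not fit in what is left
    picked.push({ path: file.path, tokens });
    remaining -= tokens;
  }
  return picked;
}

const restored = pickFilesToRestore(
  Array.from({ length: 7 }, (_, i) => ({ path: `src/file${i}.ts`, tokens: 8_000 })),
);
```

With these numbers, five files at the 5K cap use 25K tokens, so in this sketch the file count binds before the total budget does.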

Error Recovery

The query loop has multiple recovery strategies that fire in escalating order:

Error	Recovery	Max Retries
Prompt too long	1. Drain staged context collapses 2. Reactive compact (full summary)	Until exhausted
Max output tokens	1. Escalate to 64K cap 2. Multi-turn recovery with nudge message	3 turns
Model overloaded	Switch to fallback model, strip thinking blocks, retry	1
Abort (Ctrl+C)	Synthetic tool_results for orphaned tool_use blocks, cleanup hooks	—

Tool Architecture Tool.ts

Every tool is built with buildTool() which applies fail-closed defaults:

Defaults (safe)

isConcurrencySafe: false
isReadOnly: false
isDestructive: false
isEnabled: true

Must Opt In

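Tools must explicitly declare themselves concurrency-safe or read-only; until they do, the harness assumes they are neither. isDestructive is the exception in the list above: it defaults to false, so a destructive tool has to set it explicitly.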
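A minimal sketch of the defaulting pattern, assuming the flags are plain booleans as listed above. The `ToolFlags`, `ToolDef`, and `Tool` types are hypothetical and this `buildTool` is a toy version, not the real signature.

```typescript
interface ToolFlags {
  isConcurrencySafe: boolean;
  isReadOnly: boolean;
  isDestructive: boolean;
  isEnabled: boolean;
}

// A definition names the tool and may override any flag.
interface ToolDef extends Partial<ToolFlags> { name: string }
type Tool = ToolFlags & { name: string };

// Defaults as listed in the article.
const DEFAULTS: ToolFlags = {
  isConcurrencySafe: false,
  isReadOnly: false,
  isDestructive: false,
  isEnabled: true,
};

// Defaults first, then the definition: anything not declared stays at its default.
function buildTool(def: ToolDef): Tool {
  return { ...DEFAULTS, ...def };
}

const readTool = buildTool({ name: "Read", isConcurrencySafe: true, isReadOnly: true });
const bashTool = buildTool({ name: "Bash" }); // declares nothing, so it runs serially
```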

Tool Capabilities

Property	Purpose
`inputSchema`	Zod v4 schema for input validation
`call()`	The actual execution function
`checkPermissions()`	Tool-specific permission logic
`preparePermissionMatcher()`	Normalizes input for permission rule matching
`backfillObservableInput()`	Adds legacy fields before permission checks
`maxResultSizeChars`	Limit before result is persisted to disk
`searchHint`	3-10 word description for ToolSearch matching
`isDeferred`	Schema loaded lazily via ToolSearch

Tool Execution Lifecycle toolExecution.ts

Every tool call follows this exact sequence:

runToolUse(toolUseBlock)
  // 1. Find tool by name (with deprecated alias fallback)
  // 2. Validate input against Zod schema
  // 3. Run PreToolUse hooks
  → executePreToolHooks(toolName, input)
      // Hooks can: approve, deny, modify input, inject context

  // 4. Check permissions
  → resolveHookPermissionDecision()
      // Combines hook decision + canUseTool() + rule-based checks
      // Priority: deny > ask > allow > passthrough

  // 5. Execute tool
  → tool.call(validatedInput, context)
      // Returns: { data, newMessages?, success }

  // 6. Run PostToolUse hooks
  → runPostToolUseHooks(toolName, input, output)
      // Hooks can: modify MCP output, inject context, stop continuation

  // 7. On error: run PostToolUseFailure hooks instead

Permission precedence: A PreToolUse hook returning "allow" does NOT bypass settings.json deny/ask rules. The hook permission and the rule-based permission are combined, with deny always winning.
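The stated priority order (deny > ask > allow > passthrough) amounts to taking the strictest decision from all sources. The sketch below shows only that ordering; `Decision`, `RANK`, and `combineDecisions` are illustrative names, and the real resolveHookPermissionDecision() likely handles more cases than this.

```typescript
type Decision = "deny" | "ask" | "allow" | "passthrough";

// Higher rank wins when decisions from different sources are combined.
const RANK: Record<Decision, number> = { deny: 3, ask: 2, allow: 1, passthrough: 0 };

function combineDecisions(...decisions: Decision[]): Decision {
  return decisions.reduce<Decision>(
    (winner, d) => (RANK[d] > RANK[winner] ? d : winner),
    "passthrough", // no opinion from anyone
  );
}

// A hook says "allow", but a settings.json rule says "deny": deny wins.
const hookAllowRuleDeny = combineDecisions("allow", "deny");
// A hook says "allow", but a rule says "ask": the user is still prompted.
const hookAllowRuleAsk = combineDecisions("allow", "ask");
```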

Concurrency Model toolOrchestration.ts

When the model returns multiple tool_use blocks, they're partitioned into batches:

partitionToolCalls([Read, Read, Edit, Grep, Bash])
  → Batch 1: [Read, Read]         // concurrent (both read-only)
  → Batch 2: [Edit]               // serial (writes)
  → Batch 3: [Grep]               // concurrent (read-only)
  → Batch 4: [Bash]               // serial (not concurrency-safe)

Concurrent batches run up to 10 tools in parallel. Context modifiers from concurrent tools are queued and applied after the batch completes. Serial tools apply context modifiers immediately.
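The partitioning in the example above can be reproduced by grouping consecutive concurrency-safe calls and giving every other call a batch of its own. This is a sketch: `ToolCall`, `Batch`, and the boolean `concurrencySafe` field are hypothetical, and the 10-tool parallelism cap is not modeled.

```typescript
interface ToolCall { name: string; concurrencySafe: boolean }
interface Batch { concurrent: boolean; calls: ToolCall[] }

function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.concurrencySafe && last !== undefined && last.concurrent) {
      last.calls.push(call); // extend the current concurrent run
    } else {
      // Unsafe calls always get their own serial batch; a safe call after
      // a serial batch starts a new concurrent batch.
      batches.push({ concurrent: call.concurrencySafe, calls: [call] });
    }
  }
  return batches;
}

const batches = partitionToolCalls([
  { name: "Read", concurrencySafe: true },
  { name: "Read", concurrencySafe: true },
  { name: "Edit", concurrencySafe: false },
  { name: "Grep", concurrencySafe: true },
  { name: "Bash", concurrencySafe: false },
]);
```

Keeping the original order across batches means a read that follows a write always observes the result of that write.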

Streaming Tool Executor

The most interesting optimization: tools start executing while the model is still generating.

Timeline

API stream (model generates tool_use blocks) → Parallel (completed blocks dispatched immediately) → Result (tool results ready before the model stops)

Hooks System 5,000+ lines

The hooks system is the extensibility layer of the harness. There are 25+ hook events across the entire lifecycle; the most prominent are:

Event	When	Can Do
`PreToolUse`	Before tool execution	Block, modify input, set permission
`PostToolUse`	After tool success	Modify MCP output, inject context
`PostToolUseFailure`	After tool error	React to failures
`UserPromptSubmit`	User sends message	Transform/block input
`Stop`	Model finishes turn	Block stop with feedback, rewake
`SessionStart`	Session begins	Setup, inject context, set watch paths
`SessionEnd`	Session ends	Teardown
`SubagentStart/Stop`	Agent spawning	Track sub-agents
`PreCompact/PostCompact`	Context compression	React to compaction
`FileChanged`	FS watch fires	React to external changes
`PermissionRequest`	Permission check	Custom permission logic

Hook Configuration

Hooks are configured in settings.json at user, project, or policy level:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "my-lint-check.sh",
            "if": "Bash(npm *)",
            "timeout": 10
          }
        ]
      }
    ]
  }
}

Four Hook Types

command

Spawns a shell process. Receives JSON on stdin, returns JSON on stdout. Supports async and asyncRewake for background execution. Shell can be bash or powershell.

prompt

Sends a prompt to an LLM for evaluation. Supports $ARGUMENTS placeholder. Can specify a different model (e.g. claude-sonnet-4-6).

http

POSTs JSON to an external URL. Headers can interpolate env vars via $VAR_NAME (requires allowedEnvVars list). Must return JSON.

agent

Spins up a full agentic verification loop with a prompt. Can specify model. Returns structured HookJSONOutput.

Matchers & Conditions

Each hook entry has a matcher (which tools it applies to) and an optional if condition:

"matcher": "Write" — exact match on tool name
"matcher": "Write|Edit" — pipe-separated alternatives
"matcher": "^Write.*" — regex match
"matcher": "*" — matches all tools
"if": "Bash(git *)" — permission rule syntax, filters before spawning

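The matcher forms listed above could be evaluated along these lines. This is a sketch under assumptions: `matchesTool` is a hypothetical helper, and the rule used to tell a plain name list from a regex (only letters, digits, underscores, and pipes) is a guess at reasonable behavior, not the harness's actual logic.

```typescript
function matchesTool(matcher: string, toolName: string): boolean {
  // "*" matches every tool.
  if (matcher === "*") return true;

  // Plain names and pipe-separated alternatives: exact comparison per name.
  if (/^[A-Za-z0-9_|]+$/.test(matcher)) {
    return matcher.split("|").some((name) => name === toolName);
  }

  // Anything else is treated as a regular expression.
  try {
    return new RegExp(matcher).test(toolName);
  } catch {
    return false; // an invalid pattern matches nothing
  }
}
```

For example, `matchesTool("Write|Edit", "Edit")` is true, while `matchesTool("Write", "WriteFile")` is false because plain names are compared exactly.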
stdin/stdout Protocol Command Hooks

Command hooks communicate via JSON over stdin/stdout. Here's the exact protocol:

Input (stdin)

{
  "session_id": "uuid",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/working/directory",
  "permission_mode": "default",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": { "command": "git status" },
  "tool_use_id": "toolu_abc123"
}

Output (stdout)

{
  "decision": "approve",               // or "block"
  "reason": "Looks safe",
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow",     // "allow" | "deny" | "ask"
    "updatedInput": { "command": "git status --short" },
    "additionalContext": "Reminder: this repo uses trunk-based development"
  }
}
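As an illustration of the protocol, the decision logic of a command hook can be written as a pure function from the stdin payload to the stdout payload. The field names follow the examples above; the `HookInput` and `HookOutput` types, the `decide` function, and the rm -rf policy are hypothetical.

```typescript
// Only the input fields this example reads.
interface HookInput {
  hook_event_name: string;
  tool_name: string;
  tool_input: { command?: string };
}

interface HookOutput {
  decision: "approve" | "block";
  reason: string;
  hookSpecificOutput: {
    hookEventName: string;
    permissionDecision: "allow" | "deny" | "ask";
  };
}

// Takes the raw JSON read from stdin, returns the JSON to print on stdout.
function decide(rawStdin: string): string {
  const input = JSON.parse(rawStdin) as HookInput;
  const command = input.tool_input.command ?? "";
  // Example policy (an assumption): refuse recursive force-deletes in Bash.
  const risky = input.tool_name === "Bash" && /\brm\s+-rf\b/.test(command);
  const output: HookOutput = {
    decision: risky ? "block" : "approve",
    reason: risky ? "rm -rf is not allowed by this hook" : "Looks safe",
    hookSpecificOutput: {
      hookEventName: input.hook_event_name,
      permissionDecision: risky ? "deny" : "allow",
    },
  };
  return JSON.stringify(output);
}
```

A real hook script would read all of stdin, pass it to `decide`, and write the result to stdout; keeping the policy in a pure function makes it easy to test without spawning a process.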

Async Detection

A hook can background itself by emitting this as its first line of stdout:

{ "async": true, "asyncTimeout": 5000 }

The harness immediately returns an empty result and the hook continues running in the AsyncHookRegistry. If asyncRewake is set and the hook exits with code 2, it re-wakes the model with its output.

Exit Code Semantics

Exit Code	Behavior	Who Sees Output
0	Success	stdout shown in transcript mode only (Ctrl+O)
2	Blocking error — halts tool execution	stderr injected into conversation as system message (the model sees it)
1, 3+	Non-blocking error	stderr shown to user only, not to model

Exit code 2 is the power feature. It lets a hook inject feedback directly into the model's context. Example: a linter hook that returns exit 2 with the lint errors — the model sees them and can fix the code.
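The exit-code table maps onto a three-way result. The sketch below encodes only the routing described in the table; `HookOutcome` and `interpretExit` are illustrative names.

```typescript
type HookOutcome =
  | { kind: "success"; transcriptOnly: string } // stdout, visible in transcript mode
  | { kind: "blocking"; toModel: string }       // stderr, injected into the conversation
  | { kind: "non_blocking"; toUser: string };   // stderr, shown to the user only

function interpretExit(code: number, stdout: string, stderr: string): HookOutcome {
  if (code === 0) return { kind: "success", transcriptOnly: stdout };
  if (code === 2) return { kind: "blocking", toModel: stderr };
  return { kind: "non_blocking", toUser: stderr }; // 1, 3, and above
}

// A linter hook that fails with exit 2: the model receives the lint errors.
const lintFailure = interpretExit(2, "", "src/app.ts:3 unused variable 'x'");
```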

Permission System 5 Rule Sources

Every tool invocation passes through a multi-layer permission check:

Policy Settings

Admin-enforced, managed remotely. Highest priority — cannot be overridden.

User Settings

~/.claude/settings.json — user's global preferences

Project Settings

.claude/settings.json — repo-level configuration

CLI Arguments

--allowedTools / --disallowedTools flags on invocation

Session Rules

Transient rules set during the session via REPL commands

Permission Behaviors

allow — tool runs without prompting
deny — tool is blocked, model sees denial message
ask — user is prompted interactively

Rule Syntax

// Allow all git commands
{ "tool": "Bash(git *)", "permission": "allow" }

// Deny writes to node_modules
{ "tool": "Write(node_modules/**)", "permission": "deny" }

// Ask before any destructive bash command
{ "tool": "Bash(rm *)", "permission": "ask" }

Rules use bash-style glob patterns. Tools implement preparePermissionMatcher() to normalize their input for pattern matching (e.g., FileEditTool matches against file paths, BashTool matches against commands).
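Rule matching can be sketched as two steps: split a rule such as Bash(git *) into a tool name and a pattern, then match the pattern against the tool's normalized input. The helpers `parseRule`, `globToRegExp`, and `ruleMatches` are hypothetical, and treating every run of * as "any characters" is a simplification of real glob semantics.

```typescript
interface ParsedRule { tool: string; pattern: string | null }

// "Bash(git *)" -> { tool: "Bash", pattern: "git *" }; "Read" -> pattern null.
function parseRule(rule: string): ParsedRule {
  const m = /^([A-Za-z0-9_]+)(?:\((.*)\))?$/.exec(rule);
  if (m === null) throw new Error(`Unparseable rule: ${rule}`);
  const [, tool = "", pattern] = m;
  return { tool, pattern: pattern ?? null };
}

// Escape regex metacharacters, then turn each run of "*" into ".*".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*+/g, ".*") + "$");
}

// `normalizedInput` is what the tool's matcher would produce:
// a command string for Bash, a file path for file tools.
function ruleMatches(rule: string, toolName: string, normalizedInput: string): boolean {
  const parsed = parseRule(rule);
  if (parsed.tool !== toolName) return false;
  if (parsed.pattern === null) return true; // a bare tool name covers every input
  return globToRegExp(parsed.pattern).test(normalizedInput);
}
```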

Speculative Classifiers

In auto-mode, two classifiers run speculatively in the background while the permission prompt is shown to the user:

Bash Classifier

Trained on safe bash patterns. Runs with a 2-second timeout. If it classifies a command as safe before the user responds, auto-approves.

Transcript Classifier

Analyzes the full conversation transcript to determine if a tool call is safe. More powerful but slower than bash classifier.

Both are gated by feature flags (BASH_CLASSIFIER, TRANSCRIPT_CLASSIFIER). There's also denial tracking — after N auto-denials, the system falls back to interactive prompting.

System Prompt Assembly

The system prompt is built from a 5-level priority stack:

Priority	Source	Behavior
0 (highest)	Override	Replaces everything. Used by loop mode, testing.
1	Coordinator	Multi-worker mode prompt. Feature-gated.
2	Agent	Sub-agent definitions. In proactive mode: appended to default. Otherwise: replaces default.
3	Custom	`--system-prompt` flag.
4 (lowest)	Default	The standard Claude Code prompt.

appendSystemPrompt is always added at the end (unless override is set).
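Read as code, the priority stack is a chain of fallbacks. The `PromptSources` shape and `resolveSystemPrompt` are hypothetical, and joining pieces with a blank line is an assumption; only the ordering and the append rule are taken from the table above.

```typescript
interface PromptSources {
  override?: string;           // priority 0
  coordinator?: string;        // priority 1
  agent?: string;              // priority 2
  custom?: string;             // priority 3, e.g. from --system-prompt
  defaultPrompt: string;       // priority 4
  appendSystemPrompt?: string;
  proactive?: boolean;
}

function resolveSystemPrompt(s: PromptSources): string {
  // An override replaces everything, including the appended text.
  if (s.override !== undefined) return s.override;

  let base: string;
  if (s.coordinator !== undefined) {
    base = s.coordinator;
  } else if (s.agent !== undefined) {
    // Proactive mode appends the agent prompt to the default; otherwise it replaces it.
    base = s.proactive ? `${s.defaultPrompt}\n\n${s.agent}` : s.agent;
  } else if (s.custom !== undefined) {
    base = s.custom;
  } else {
    base = s.defaultPrompt;
  }
  return s.appendSystemPrompt !== undefined ? `${base}\n\n${s.appendSystemPrompt}` : base;
}
```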

Default Prompt Sections

The default system prompt is assembled from these sections in prompts.ts:

Intro (model description, safety instructions)
System section (tool execution, tags, compression)
Doing tasks section (task approach guidance)
Actions section (reversibility, blast radius)
Using your tools section (tool-specific instructions)
Tone and style section
Output efficiency section
— DYNAMIC BOUNDARY —
Session guidance, memory, environment info, language, output style, MCP instructions...

Cache Boundary Critical Optimization

The system prompt is split at __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__:

Before Boundary (scope: global)

Static instructions, tool descriptions, safety rules. Cacheable across all users and organizations. Computed once, reused for every API call.

After Boundary (scope: session)

User context, CLAUDE.md files, git status, memory, MCP instructions. User/session-specific, not cached cross-org.
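The split itself is simple string handling. In this sketch the marker is the one named above, while `PromptBlock`, `splitAtBoundary`, and the decision to treat a prompt without a marker as session-scoped are assumptions.

```typescript
const BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

interface PromptBlock { text: string; cacheScope: "global" | "session" }

function splitAtBoundary(prompt: string): PromptBlock[] {
  const idx = prompt.indexOf(BOUNDARY);
  // Without a marker, be conservative and treat everything as session-specific.
  if (idx === -1) return [{ text: prompt, cacheScope: "session" }];

  const before = prompt.slice(0, idx).trim();
  const after = prompt.slice(idx + BOUNDARY.length).trim();
  const blocks: PromptBlock[] = [];
  if (before.length > 0) blocks.push({ text: before, cacheScope: "global" });
  if (after.length > 0) blocks.push({ text: after, cacheScope: "session" });
  return blocks;
}

const promptBlocks = splitAtBoundary(
  `Static instructions and tool descriptions\n${BOUNDARY}\nGit status: clean`,
);
```

Keeping everything user-specific after the marker means the text before it stays byte-identical across sessions, which is what allows it to be cached broadly.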

Section Caching

Dynamic sections use two caching strategies:

systemPromptSection() — computed once, cached until /clear or /compact
DANGEROUS_uncachedSystemPromptSection() — recomputed every turn, breaks prompt cache on changes. Used for MCP instructions (servers connect/disconnect between turns)
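The difference between the two strategies can be shown with a toy cache. The names `cachedSection`, `uncachedSection`, and `resetSections` are hypothetical stand-ins chosen to avoid implying the real signatures; `resetSections` plays the role of /clear or /compact.

```typescript
const sectionCache = new Map<string, string>();

// Computed on first use, then served from the cache until reset.
function cachedSection(name: string, compute: () => string): string {
  const hit = sectionCache.get(name);
  if (hit !== undefined) return hit;
  const value = compute();
  sectionCache.set(name, value);
  return value;
}

// Recomputed on every call; if the value changes between turns,
// the prompt changes and any prompt cache entry after this point is invalidated.
function uncachedSection(compute: () => string): string {
  return compute();
}

function resetSections(): void {
  sectionCache.clear();
}
```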

Context Injection

Two context objects are injected alongside the system prompt:

System Context

Git status — branch, main branch, status (max 2,000 chars), recent commits, git user
Cache breaker — ANT-only ephemeral injection for debugging

User Context

CLAUDE.md files — auto-discovered via directory walk, filtered to remove injected memory files
Current date — always included

System context is prepended, user context is appended. Both are memoized at module load and cached for the session.

Memory System memdir/

Persistent memory stored in ~/.claude/projects/<slug>/memory/:

Constant	Value
Max MEMORY.md lines	200
Max MEMORY.md bytes	25,000
Memory types	user, feedback, project, reference
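The two MEMORY.md limits could be enforced as follows. The constants come from the table above; `truncateMemory` and the strategy of dropping whole trailing lines until the byte limit is met are assumptions, since the source does not say how over-limit content is cut.

```typescript
const MAX_LINES = 200;
const MAX_BYTES = 25_000;

function truncateMemory(content: string): { text: string; truncated: boolean } {
  let lines = content.split("\n");
  let truncated = false;

  // Line limit first.
  if (lines.length > MAX_LINES) {
    lines = lines.slice(0, MAX_LINES);
    truncated = true;
  }

  // Then the byte limit, measured in UTF-8, dropping whole lines from the end.
  const encoder = new TextEncoder();
  while (lines.length > 0 && encoder.encode(lines.join("\n")).length > MAX_BYTES) {
    lines.pop();
    truncated = true;
  }
  return { text: lines.join("\n"), truncated };
}
```

Measuring bytes rather than characters matters for non-ASCII notes, where one character can take up to four bytes in UTF-8.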

Three Memory Modes

Daily-Log Mode (KAIROS)

Append-only daily logs at memory/logs/YYYY/MM/YYYY-MM-DD.md. No MEMORY.md content loading. Used in assistant/proactive mode.

Team Memory Mode (TEAMMEM)

Synced across teammates via permissionSync.ts. Secret-guarded: checkTeamMemSecrets() prevents committing credentials to shared memory.

Auto Memory (default)

Standard mode. Memory directory created idempotently once per session. Builds behavioral instructions + MEMORY.md content into system prompt. Auto-extracted from conversations via extractMemories service.

Sub-Agent System AgentTool

The AgentTool spawns isolated sub-agents that run their own query loops with restricted tool sets.

Agent Properties

Each agent gets a subagent_type, custom system prompt, and tool restrictions
Optional isolation: "worktree" for git worktree isolation
Auto-backgrounds after 120 seconds (GrowthBook-gated)
Background tasks disabled globally via CLAUDE_CODE_DISABLE_BACKGROUND_TASKS
Results arrive as <task-notification> XML blocks

Fork Subagent FORK_SUBAGENT gate

The most sophisticated agent pattern. When enabled, omitting subagent_type triggers an implicit fork of the current agent.

Cache Optimization

Fork children receive the parent's exact system prompt bytes to preserve prompt cache identity. The fork message contains:

The entire parent assistant message (all tool_use blocks, thinking, text)
Identical placeholder tool results for all sibling forks: "Fork started — processing in background"
Only the final text block differs per child (the directive)

This maximizes cache hits across siblings — the messages are identical until the final divergence point.
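The shared-prefix construction might be sketched like this. The placeholder string is quoted from the list above; the `Block` and `Msg` shapes and the `buildForkMessages` helper are hypothetical simplifications.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Msg { role: "user" | "assistant"; content: Block[] }

const FORK_PLACEHOLDER = "Fork started — processing in background";

// Builds the two messages a fork child receives on top of the shared history.
function buildForkMessages(parentAssistant: Msg, directive: string): Msg[] {
  // One identical placeholder result per tool_use block in the parent message.
  const placeholders: Block[] = parentAssistant.content
    .filter((b): b is Extract<Block, { type: "tool_use" }> => b.type === "tool_use")
    .map((b) => ({ type: "tool_result" as const, tool_use_id: b.id, content: FORK_PLACEHOLDER }));

  return [
    parentAssistant, // reused unchanged so every sibling shares it
    // Only the final text block differs between siblings.
    { role: "user", content: [...placeholders, { type: "text", text: directive }] },
  ];
}

const parentMsg: Msg = {
  role: "assistant",
  content: [
    { type: "text", text: "Splitting the work." },
    { type: "tool_use", id: "toolu_1", name: "Agent" },
    { type: "tool_use", id: "toolu_2", name: "Agent" },
  ],
};
const childA = buildForkMessages(parentMsg, "Refactor the parser");
const childB = buildForkMessages(parentMsg, "Update the tests");
```

Everything up to the last block serializes identically for both children, so a prefix-based prompt cache can serve both from the same entry.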

Fork Child Rules

You ARE a fork — don't spawn sub-agents
No conversation; use tools directly
Commit changes before reporting (include hash)
Minimal output: "Scope:", "Result:", "Key files:" format

Coordinator Mode

Gate: COORDINATOR_MODE + CLAUDE_CODE_COORDINATOR_MODE env var.

A single coordinator spawns multiple worker agents via the Agent tool. Workers run as background agents with restricted tool sets:

Simple Mode

Workers get: Bash, Read, Edit only

Full Mode

Workers get all tools except: TeamCreate, TeamDelete, SendMessage, StructuredOutput

Workers can share knowledge via a scratchpad directory (gated by tengu_scratch). The coordinator waits for actual results — no prediction.

Swarm Infrastructure

Full multi-process agent orchestration with multiple terminal backends:

Backend	File	How
In-Process	`inProcessRunner.ts`	Isolated agent in same process (53KB)
tmux	`backends/tmux`	Spawns panes in tmux session
iTerm2	`backends/iterm2`	Uses iTerm2 scripting API
Kitty	`backends/kitty`	Uses Kitty remote control

Permission bubbling: teammate agents can escalate permission requests up to the leader. Team files managed via teamHelpers.ts with task numbering.

Feature Flags ~50+ gates

Bun's feature() function enables build-time dead code elimination. Disabled features are completely stripped from the binary.

Gate	Controls
`PROACTIVE` / `KAIROS`	Autonomous assistant mode, daily logs, brief tool
`COORDINATOR_MODE`	Multi-worker orchestration
`FORK_SUBAGENT`	Implicit fork delegation
`VOICE_MODE`	Voice input command
`AGENT_TRIGGERS`	Cron-scheduled agents
`BRIDGE_MODE`	IDE extension communication
`CACHED_MICROCOMPACT`	Prompt cache-aware compaction
`TRANSCRIPT_CLASSIFIER`	Auto-mode transcript analysis
`BASH_CLASSIFIER`	Auto-mode bash safety check
`TOKEN_BUDGET`	Per-task token budget tracking
`CONTEXT_COLLAPSE`	Staged context summaries
`HISTORY_SNIP`	Old history removal
`REACTIVE_COMPACT`	On-error full compaction
`EXPERIMENTAL_SKILL_SEARCH`	Skill discovery
`TEAMMEM`	Team memory sync

ANT-Only Code

process.env.USER_TYPE === 'ant' is a build-time constant, so Ant-only features are removed from external builds by dead code elimination (DCE):

Remote isolation mode for agents
agents-platform command
Internal telemetry, cache breaking injection
Numeric length anchors (token reduction hints)
Model codename references

Speculative Execution Hidden Feature

Claude Code can speculatively execute the next task in the background before you even ask.

How It Works

After completing a turn, a forked agent runs in an isolated overlay directory
The overlay allows write tools (Edit, Write) to modify copies of files
Read-only tools (Read, Glob, Grep) access the real filesystem
Max 20 turns, 100 messages per speculation
If the user accepts the suggestion, overlay files are copied to the real working directory
If rejected, the overlay is discarded

This is why an accepted suggestion can appear to complete instantly: the work was already done speculatively, and accepting it only copies the overlay files into place.

Undercover Mode ANT-Only

When Anthropic engineers use Claude Code on public repos, this auto-activates:

Strips model codenames (Tengu, Capybara, etc.) from commits and PRs
Strips internal repo names and Slack references
Every repo is treated as external by default; only a match on an internal allowlist turns undercover mode off
No force-off — safety guarantee that codenames never leak
Controlled via CLAUDE_CODE_UNDERCOVER=1 or auto-detection

Cost Tracking cost-tracker.ts

Token usage and costs are tracked per-session and persisted across resumes:

Input, output, cache read, cache write tokens tracked per model
API duration, tool duration, wall duration tracked
Lines of code changed tracked
FPS metrics (terminal rendering performance) optionally captured
Advisor tool usage tracked recursively
Persisted to project config on session switch, restored on resume

Limits & Constants

Limit	Value	Where
MEMORY.md max lines	200	`memdir.ts`
MEMORY.md max bytes	25,000	`memdir.ts`
Git status max chars	2,000	`context.ts`
Max concurrent tools	10	`toolOrchestration.ts`
Max output token recovery turns	3	`query.ts`
Max compact retries	2	`compact.ts`
Post-compact restored files	5	`compact.ts`
Post-compact token budget	50,000	`compact.ts`
Post-compact per file max	5,000 tokens	`compact.ts`
Speculation max turns	20	`speculation.ts`
Speculation max messages	100	`speculation.ts`
FileEditTool max file size	1 GiB	`FileEditTool.ts`
BashTool security rules	102 KB	`bashSecurity.ts`
Auto-background agent timeout	120 seconds	`AgentTool.tsx`
Bash classifier timeout	2 seconds	`useCanUseTool.tsx`

Trust Gate

All hooks require workspace trust in interactive mode. This is a defense-in-depth measure:

function shouldSkipHookDueToTrust(): boolean {
  const isInteractive = !getIsNonInteractiveSession()
  if (!isInteractive) return false  // SDK mode: trust implicit
  const hasTrust = checkHasTrustDialogAccepted()
  return !hasTrust  // Interactive: all hooks require trust
}

This prevents a malicious .claude/settings.json in a cloned repo from executing arbitrary commands before the user accepts the trust dialog. Applied to ALL hook types with no exceptions.

Built from analysis of the Claude Code source snapshot (March 31, 2026). Source: instructkr/claude-code.