Claude Code Architecture

A deep dive into Anthropic's agentic CLI — the query loop, tool execution harness, hooks system, permission model, multi-agent coordination, and prompt caching strategy.

1,902 TypeScript files · 512,685 lines of code · Runtime: Bun · UI: React + Ink

High-Level Architecture

Claude Code is a terminal-based agentic coding assistant. At its core, it's a loop: send messages to the Claude API, receive tool calls, execute them with permission checks and hooks, feed results back, and repeat.

Core Execution Path

  1. Entry: REPL.tsx (screens/REPL.tsx)
  2. Orchestrator: QueryEngine (QueryEngine.ts)
  3. Core Loop: queryLoop() (query.ts)
  4. API Call: callModel() (services/api/)
  5. Dispatch: runTools() (toolOrchestration.ts)
  6. Execute: runToolUse() (toolExecution.ts)

The entire system is built as async generators — each layer yields messages upstream for the UI to render in real-time. This is what makes tool progress, streaming text, and hook feedback all appear incrementally.
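The layering described above can be sketched with nested async generators. This is a minimal illustration, not the real code: it borrows the names queryLoop, runTools, and runToolUse from the execution path only as labels, while the Message type, the collect helper, and all function bodies are assumptions made up for this example.

```typescript
// Hypothetical message shape; the real message types are not shown in the source.
type Message = { layer: string; text: string };

// Innermost layer: pretend to run one tool and report progress as it happens.
async function* runToolUse(name: string): AsyncGenerator<Message> {
  yield { layer: "tool", text: `${name}: started` };
  yield { layer: "tool", text: `${name}: done` };
}

// Middle layer: forwards every message from each tool with yield*.
async function* runTools(names: string[]): AsyncGenerator<Message> {
  for (const name of names) {
    yield* runToolUse(name);
  }
}

// Outer layer: adds its own messages around the delegated ones.
async function* queryLoop(names: string[]): AsyncGenerator<Message> {
  yield { layer: "loop", text: "turn started" };
  yield* runTools(names);
  yield { layer: "loop", text: "turn finished" };
}

// A consumer (standing in for the UI) receives each message as soon as it is yielded.
async function collect(names: string[]): Promise<string[]> {
  const seen: string[] = [];
  for await (const message of queryLoop(names)) {
    seen.push(`${message.layer}:${message.text}`);
  }
  return seen;
}
```

Because each layer delegates with yield*, the consumer never waits for a whole layer to finish before it sees that layer's first message.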

The Query Loop (query.ts)

The heart of Claude Code. queryLoop() is a while(true) loop that handles message preparation, API calls, tool execution, and error recovery across turns.

Each Iteration

  1. Compaction: Apply tool result budgets, snip old history, microcompact, context collapse, autocompact
  2. API Streaming: Call callModel() generator, receive text/thinking/tool_use blocks as they stream
  3. Tool Dispatch: Partition tool calls into concurrent/serial batches, execute with permissions and hooks
  4. Result Collection: Gather tool results, attachment messages, queued commands, task notifications
  5. Recurse or Stop: If stop_reason === "tool_use", append results and loop. Otherwise, run stop hooks and return.
Key insight: The loop has 7 different continue sites — collapse drain retry, reactive compact retry, max output token escalation, max output token recovery, stop hook blocking, token budget continuation, and the normal next-turn continue. Each transitions the State object with tracking of why the continue happened.
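To make the idea of tracked continue sites concrete, here is a small synchronous sketch of a while(true) loop that records why it continued. The State and Outcome shapes, the reason names, and runLoop are all hypothetical; only two of the seven continue sites are modeled, and model calls are replaced by a scripted list of outcomes.

```typescript
// Hypothetical subset of the continue reasons listed above.
type ContinueReason = "next_turn" | "max_output_tokens_recovery";

interface State {
  turn: number;
  recoveryAttempts: number;
  lastContinue?: ContinueReason;
}

// Hypothetical result of one model call.
interface Outcome {
  stopReason: "tool_use" | "end_turn" | "max_tokens";
}

// Mirrors the "3 turns" recovery limit quoted in this document.
const MAX_RECOVERY_TURNS = 3;

function runLoop(outcomes: Outcome[]): State {
  let state: State = { turn: 0, recoveryAttempts: 0 };
  while (true) {
    const outcome = outcomes[state.turn];
    if (outcome === undefined) return state; // scripted outcomes exhausted

    if (outcome.stopReason === "tool_use") {
      // Normal next-turn continue: tool results would be appended here.
      state = { ...state, turn: state.turn + 1, lastContinue: "next_turn" };
      continue;
    }

    if (outcome.stopReason === "max_tokens" && state.recoveryAttempts < MAX_RECOVERY_TURNS) {
      // Recovery continue: retry with a nudge, bounded by the retry limit.
      state = {
        ...state,
        turn: state.turn + 1,
        recoveryAttempts: state.recoveryAttempts + 1,
        lastContinue: "max_output_tokens_recovery",
      };
      continue;
    }

    return state; // any other stop reason ends the loop
  }
}
```

Replacing the state object on every continue, rather than mutating it in place, keeps each transition and its reason explicit.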

Streaming & Fallback

The API call uses a generator (callModel()) that yields StreamEvent messages as they arrive from the Anthropic API. Text, thinking blocks, and tool_use blocks arrive incrementally.

Streaming Tool Executor

While the model is still generating, tool_use blocks that have completed streaming are immediately dispatched for execution via the StreamingToolExecutor. This means tools can finish before the model even stops generating.

// While model still streams, start executing completed tool_use blocks
if (streamingToolExecutor) {
  streamingToolExecutor.addTool(toolUseBlock)       // Feed in as they arrive
  yield* streamingToolExecutor.drainResults()     // Yield completed results
}

Fallback Model

If the primary model throws a FallbackTriggeredError (overloaded, rate limited), the loop catches it, clears partial state, switches to the fallback model, and retries from the same point. Thinking blocks are stripped from the retry to avoid signature mismatches between models.
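The fallback path might look roughly like the sketch below. FallbackTriggeredError is named in the text, but the Block type, the synchronous CallModel signature, and callWithFallback are simplifying assumptions; the real call is a streaming generator.

```typescript
class FallbackTriggeredError extends Error {}

interface Block {
  type: "text" | "thinking" | "tool_use";
  content: string;
}

// Hypothetical, simplified model call used only for this sketch.
type CallModel = (model: string, history: Block[]) => Block[];

function callWithFallback(
  call: CallModel,
  primary: string,
  fallback: string,
  history: Block[],
): { model: string; blocks: Block[] } {
  try {
    return { model: primary, blocks: call(primary, history) };
  } catch (error) {
    // Only the dedicated error type triggers a model switch; everything else propagates.
    if (!(error instanceof FallbackTriggeredError)) throw error;
    // Thinking blocks are dropped because their signatures belong to the primary model.
    const withoutThinking = history.filter((block) => block.type !== "thinking");
    return { model: fallback, blocks: call(fallback, withoutThinking) };
  }
}
```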

Compaction Pipeline (5 Layers)

Before each API call, the conversation is compressed through up to 5 stages to stay within context limits:

Stage | Strategy | When
Tool Result Budget | Limits per-message tool output size | Always
History Snip | Removes messages beyond a threshold | HISTORY_SNIP gate
Microcompact | Client-side tool call summarization (time-based decay) | Always
Context Collapse | Strategic staged summaries at breakpoints | CONTEXT_COLLAPSE gate
Autocompact | Full conversation summary when over token limit | When approaching limit
Post-compact restoration: After a full compact, up to 5 recently-read files are re-injected (budget: 50K tokens total, 5K per file) so the model doesn't lose track of what it was working on.
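The restoration budget can be illustrated with a short selection function. The three constants come from the text; the RecentFile shape, the newest-first ordering, and the choice to truncate oversized files rather than skip them are assumptions for this sketch.

```typescript
interface RecentFile {
  path: string;
  tokens: number;
  lastReadAt: number; // larger means more recent
}

interface RestoredFile {
  path: string;
  tokens: number;
}

const MAX_RESTORED_FILES = 5;
const TOTAL_TOKEN_BUDGET = 50_000;
const PER_FILE_TOKEN_BUDGET = 5_000;

function selectFilesToRestore(files: RecentFile[]): RestoredFile[] {
  // Copy before sorting so the caller's array is left untouched.
  const newestFirst = [...files].sort((a, b) => b.lastReadAt - a.lastReadAt);
  const restored: RestoredFile[] = [];
  let remaining = TOTAL_TOKEN_BUDGET;

  for (const file of newestFirst) {
    if (restored.length >= MAX_RESTORED_FILES) break;
    // Assumption: an oversized file is truncated to the per-file cap, not skipped.
    const tokens = Math.min(file.tokens, PER_FILE_TOKEN_BUDGET);
    if (tokens > remaining) break;
    restored.push({ path: file.path, tokens });
    remaining -= tokens;
  }
  return restored;
}
```

With these numbers, five files at 5K tokens each use at most 25K, so under the truncation assumption the 50K total acts only as a defensive upper bound.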

Error Recovery

The query loop has multiple recovery strategies that fire in escalating order:

Error | Recovery | Max Retries
Prompt too long | 1. Drain staged context collapses; 2. Reactive compact (full summary) | Until exhausted
Max output tokens | 1. Escalate to 64K cap; 2. Multi-turn recovery with nudge message | 3 turns
Model overloaded | Switch to fallback model, strip thinking blocks, retry | 1
Abort (Ctrl+C) | Synthetic tool_results for orphaned tool_use blocks, cleanup hooks | -

Tool Architecture (Tool.ts)

Every tool is built with buildTool() which applies fail-closed defaults:

Defaults (safe)
isConcurrencySafe: false
isReadOnly: false
isDestructive: false
isEnabled: true
Must Opt In
Tools must explicitly declare themselves safe for concurrency, read-only, or non-destructive. The default assumes the worst.
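A defaults-first builder of this kind can be sketched in a few lines. The four default values mirror the list above; the ToolFlags, ToolDefinition, and Tool types and the body of buildTool are assumptions, since the real signature is not shown.

```typescript
interface ToolFlags {
  isConcurrencySafe: boolean;
  isReadOnly: boolean;
  isDestructive: boolean;
  isEnabled: boolean;
}

// A definition must name the tool; every flag is optional.
interface ToolDefinition extends Partial<ToolFlags> {
  name: string;
}

type Tool = ToolFlags & { name: string };

// Default values as listed in this document.
const TOOL_DEFAULTS: ToolFlags = {
  isConcurrencySafe: false,
  isReadOnly: false,
  isDestructive: false,
  isEnabled: true,
};

function buildTool(definition: ToolDefinition): Tool {
  // Defaults are spread first, so only flags the definition sets explicitly are overridden.
  return { ...TOOL_DEFAULTS, ...definition };
}
```

For example, buildTool({ name: "Bash" }) stays serial and non-read-only, while a read tool has to pass isReadOnly: true and isConcurrencySafe: true to be batched concurrently.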

Tool Capabilities

Property | Purpose
inputSchema | Zod v4 schema for input validation
call() | The actual execution function
checkPermissions() | Tool-specific permission logic
preparePermissionMatcher() | Normalizes input for permission rule matching
backfillObservableInput() | Adds legacy fields before permission checks
maxResultSizeChars | Limit before result is persisted to disk
searchHint | 3-10 word description for ToolSearch matching
isDeferred | Schema loaded lazily via ToolSearch

Tool Execution Lifecycle (toolExecution.ts)

Every tool call follows this exact sequence:

runToolUse(toolUseBlock)
  // 1. Find tool by name (with deprecated alias fallback)
  // 2. Validate input against Zod schema

  // 3. Run PreToolUse hooks
  → executePreToolHooks(toolName, input)
      // Hooks can: approve, deny, modify input, inject context

  // 4. Check permissions
  → resolveHookPermissionDecision()
      // Combines hook decision + canUseTool() + rule-based checks
      // Priority: deny > ask > allow > passthrough

  // 5. Execute tool
  → tool.call(validatedInput, context)
      // Returns: { data, newMessages?, success }

  // 6. Run PostToolUse hooks
  → runPostToolUseHooks(toolName, input, output)
      // Hooks can: modify MCP output, inject context, stop continuation

  // 7. On error: run PostToolUseFailure hooks instead
Permission precedence: A PreToolUse hook returning "allow" does NOT bypass settings.json deny/ask rules. The hook permission and the rule-based permission are combined, with deny always winning.
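The stated priority order (deny > ask > allow > passthrough) amounts to picking the strictest decision from all sources. The sketch below shows that combination rule only; combineDecisions and the Decision type are hypothetical names, and the real resolveHookPermissionDecision() likely carries more context than a bare string.

```typescript
type Decision = "deny" | "ask" | "allow" | "passthrough";

// Strictest first, matching the priority order quoted above.
const DECISION_PRIORITY: Decision[] = ["deny", "ask", "allow", "passthrough"];

function combineDecisions(...decisions: Decision[]): Decision {
  for (const candidate of DECISION_PRIORITY) {
    if (decisions.includes(candidate)) return candidate;
  }
  // No source expressed an opinion.
  return "passthrough";
}
```

Under this rule, a hook that returns "allow" combined with a settings rule that returns "deny" still yields "deny", which is the behavior the precedence note describes.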

Concurrency Model (toolOrchestration.ts)

When the model returns multiple tool_use blocks, they're partitioned into batches:

partitionToolCalls([Read, Read, Edit, Grep, Bash])
  → Batch 1: [Read, Read]         // concurrent (both read-only)
  → Batch 2: [Edit]               // serial (writes)
  → Batch 3: [Grep]               // concurrent (read-only)
  → Batch 4: [Bash]               // serial (not concurrency-safe)

Concurrent batches run up to 10 tools in parallel. Context modifiers from concurrent tools are queued and applied after the batch completes. Serial tools apply context modifiers immediately.
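The batching shown above can be reproduced by grouping consecutive concurrency-safe calls and giving every other call a batch of its own. This is a sketch under that assumption; the ToolCall and Batch shapes are invented, and the 10-tool parallelism cap and context modifiers are not modeled.

```typescript
interface ToolCall {
  name: string;
  isConcurrencySafe: boolean;
}

interface Batch {
  concurrent: boolean;
  calls: ToolCall[];
}

function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.isConcurrencySafe && last !== undefined && last.concurrent) {
      // Extend the current run of concurrency-safe calls.
      last.calls.push(call);
    } else {
      // Unsafe calls always get their own batch; a safe call after one starts a new run.
      batches.push({ concurrent: call.isConcurrencySafe, calls: [call] });
    }
  }
  return batches;
}
```

Order is preserved across batches, so a write never runs before the reads that preceded it in the model's output.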

Streaming Tool Executor

The most interesting optimization: tools start executing while the model is still generating.

Timeline
  API Stream: Model generates tool_use blocks...
  Parallel: Completed blocks dispatched immediately
  Result: Tool results ready before model stops

Hooks System (5,000+ lines)

The hooks system is the extensibility layer of the harness. There are 25+ hook events across the entire lifecycle:

Event | When | Can Do
PreToolUse | Before tool execution | Block, modify input, set permission
PostToolUse | After tool success | Modify MCP output, inject context
PostToolUseFailure | After tool error | React to failures
UserPromptSubmit | User sends message | Transform/block input
Stop | Model finishes turn | Block stop with feedback, rewake
SessionStart | Session begins | Setup, inject context, set watch paths
SessionEnd | Session ends | Teardown
SubagentStart/Stop | Agent spawning | Track sub-agents
PreCompact/PostCompact | Context compression | React to compaction
FileChanged | FS watch fires | React to external changes
PermissionRequest | Permission check | Custom permission logic

Hook Configuration

Hooks are configured in settings.json at user, project, or policy level:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "my-lint-check.sh",
            "if": "Bash(npm *)",
            "timeout": 10
          }
        ]
      }
    ]
  }
}

Four Hook Types

command
Spawns a shell process. Receives JSON on stdin, returns JSON on stdout. Supports async and asyncRewake for background execution. Shell can be bash or powershell.
prompt
Sends a prompt to an LLM for evaluation. Supports $ARGUMENTS placeholder. Can specify a different model (e.g. claude-sonnet-4-6).
http
POSTs JSON to an external URL. Headers can interpolate env vars via $VAR_NAME (requires allowedEnvVars list). Must return JSON.
agent
Spins up a full agentic verification loop with a prompt. Can specify model. Returns structured HookJSONOutput.

Matchers & Conditions

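Each hook entry has a matcher (which tool to match) and an optional if condition. In the configuration example above, the matcher is "Bash" and the if condition "Bash(npm *)" further restricts the hook, presumably to npm commands.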

stdin/stdout Protocol (Command Hooks)

Command hooks communicate via JSON over stdin/stdout. Here's the exact protocol:

Input (stdin)

{
  "session_id": "uuid",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/working/directory",
  "permission_mode": "default",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": { "command": "git status" },
  "tool_use_id": "toolu_abc123"
}

Output (stdout)

{
  "decision": "approve",               // or "block"
  "reason": "Looks safe",
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow",     // "allow" | "deny" | "ask"
    "updatedInput": { "command": "git status --short" },
    "additionalContext": "Reminder: this repo uses trunk-based development"
  }
}

Async Detection

A hook can background itself by emitting this as its first line of stdout:

{ "async": true, "asyncTimeout": 5000 }

The harness immediately returns an empty result and the hook continues running in the AsyncHookRegistry. If asyncRewake is set and the hook exits with code 2, it re-wakes the model with its output.
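Detecting the async marker could be as simple as parsing the first line of stdout. The sketch below assumes exactly that; detectAsyncHook is a hypothetical helper, and how the real harness reads the stream or interprets the asyncTimeout value is not shown in the source.

```typescript
interface AsyncMarker {
  asyncTimeout?: number;
}

// Returns the marker if the first stdout line declares { "async": true }, otherwise null.
function detectAsyncHook(stdout: string): AsyncMarker | null {
  const firstLine = (stdout.split("\n", 1)[0] ?? "").trim();
  try {
    const parsed: unknown = JSON.parse(firstLine);
    if (typeof parsed !== "object" || parsed === null) return null;
    const candidate = parsed as { async?: unknown; asyncTimeout?: unknown };
    if (candidate.async !== true) return null;
    return typeof candidate.asyncTimeout === "number"
      ? { asyncTimeout: candidate.asyncTimeout }
      : {};
  } catch {
    // First line is not JSON, so this is an ordinary synchronous hook.
    return null;
  }
}
```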

Exit Code Semantics

Exit Code | Behavior | Who Sees Output
0 | Success | stdout shown in transcript mode only (Ctrl+O)
2 | Blocking error (halts tool execution) | stderr injected into conversation as system message (the model sees it)
1, 3+ | Non-blocking error | stderr shown to user only, not to model
Exit code 2 is the power feature. It lets a hook inject feedback directly into the model's context. Example: a linter hook that returns exit 2 with the lint errors — the model sees them and can fix the code.
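The exit code table translates directly into a small classifier. The HookExit shape and interpretExitCode are hypothetical names for this sketch; only the mapping itself comes from the table.

```typescript
interface HookExit {
  kind: "success" | "blocking_error" | "non_blocking_error";
  haltsTool: boolean;
  modelSeesStderr: boolean;
}

function interpretExitCode(code: number): HookExit {
  if (code === 0) {
    return { kind: "success", haltsTool: false, modelSeesStderr: false };
  }
  if (code === 2) {
    // The only code that both stops the tool and feeds stderr back to the model.
    return { kind: "blocking_error", haltsTool: true, modelSeesStderr: true };
  }
  // 1, 3, and above: reported to the user, invisible to the model.
  return { kind: "non_blocking_error", haltsTool: false, modelSeesStderr: false };
}
```

A linter hook would therefore print its findings to stderr and exit with 2 when it wants the model to react, and exit with 1 when it only wants to warn the user.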

Permission System (5 Rule Sources)

Every tool invocation passes through a multi-layer permission check:

  1. Policy Settings: Admin-enforced, managed remotely. Highest priority; cannot be overridden.
  2. User Settings: ~/.claude/settings.json (user's global preferences)
  3. Project Settings: .claude/settings.json (repo-level configuration)
  4. CLI Arguments: --allow / --deny flags on invocation
  5. Session Rules: Transient rules set during the session via REPL commands

Permission Behaviors

Rule Syntax

// Allow all git commands
{ "tool": "Bash(git *)", "permission": "allow" }

// Deny writes to node_modules
{ "tool": "Write(node_modules/**)", "permission": "deny" }

// Ask before any destructive bash command
{ "tool": "Bash(rm *)", "permission": "ask" }

Rules use bash-style glob patterns. Tools implement preparePermissionMatcher() to normalize their input for pattern matching (e.g., FileEditTool matches against file paths, BashTool matches against commands).
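A rough sketch of how a rule such as "Bash(git *)" could be matched against normalized tool input follows. The exact glob semantics are not given in the source, so this simplification treats any run of * as "any characters, including /"; globToRegExp and ruleMatches are hypothetical helpers, and treating a bare tool name as matching every input is also an assumption.

```typescript
type Permission = "allow" | "deny" | "ask";

interface Rule {
  tool: string; // e.g. "Bash(git *)"
  permission: Permission;
}

// Simplified glob: escape regex metacharacters, then map runs of * to ".*".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*+/g, ".*") + "$");
}

function ruleMatches(rule: Rule, toolName: string, normalizedInput: string): boolean {
  const parsed = /^(\w+)\((.*)\)$/.exec(rule.tool);
  // Assumption: a rule without parentheses applies to every input of that tool.
  if (parsed === null) return rule.tool === toolName;
  const ruleTool = parsed[1];
  const pattern = parsed[2] ?? "";
  return ruleTool === toolName && globToRegExp(pattern).test(normalizedInput);
}
```

Here normalizedInput stands for whatever preparePermissionMatcher() would produce, such as the command string for Bash or the file path for an edit.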

Speculative Classifiers

In auto-mode, two classifiers run speculatively in the background while the permission prompt is shown to the user:

Bash Classifier
Trained on safe bash patterns. Runs with a 2-second timeout. If it classifies a command as safe before the user responds, auto-approves.
Transcript Classifier
Analyzes the full conversation transcript to determine if a tool call is safe. More powerful but slower than bash classifier.

Both are gated by feature flags (BASH_CLASSIFIER, TRANSCRIPT_CLASSIFIER). There's also denial tracking — after N auto-denials, the system falls back to interactive prompting.

System Prompt Assembly

The system prompt is built from a 5-level priority stack:

Priority | Source | Behavior
0 (highest) | Override | Replaces everything. Used by loop mode, testing.
1 | Coordinator | Multi-worker mode prompt. Feature-gated.
2 | Agent | Sub-agent definitions. In proactive mode: appended to default. Otherwise: replaces default.
3 | Custom | --system-prompt flag.
4 (lowest) | Default | The standard Claude Code prompt.

appendSystemPrompt is always added at the end (unless override is set).
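One way to read the priority stack is "the highest-priority source that is present wins, with two special cases". The sketch below encodes that reading; PromptSources and resolveSystemPrompt are hypothetical, and the assumption that a coordinator prompt fully replaces lower levels is an interpretation of the table rather than something the source states.

```typescript
interface PromptSources {
  override?: string;
  coordinator?: string;
  agent?: string;
  custom?: string;
  defaultPrompt: string;
  appendSystemPrompt?: string;
  proactive?: boolean;
}

function resolveSystemPrompt(sources: PromptSources): string {
  // Priority 0: override replaces everything, including the appended text.
  if (sources.override !== undefined) return sources.override;

  let base: string;
  if (sources.coordinator !== undefined) {
    base = sources.coordinator;
  } else if (sources.agent !== undefined) {
    // Proactive mode appends the agent prompt to the default; otherwise it replaces it.
    base = sources.proactive ? `${sources.defaultPrompt}\n\n${sources.agent}` : sources.agent;
  } else if (sources.custom !== undefined) {
    base = sources.custom;
  } else {
    base = sources.defaultPrompt;
  }

  return sources.appendSystemPrompt !== undefined
    ? `${base}\n\n${sources.appendSystemPrompt}`
    : base;
}
```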

Default Prompt Sections

The default system prompt is assembled from these sections in prompts.ts:

  1. Intro (model description, safety instructions)
  2. System section (tool execution, tags, compression)
  3. Doing tasks section (task approach guidance)
  4. Actions section (reversibility, blast radius)
  5. Using your tools section (tool-specific instructions)
  6. Tone and style section
  7. Output efficiency section
  8. — DYNAMIC BOUNDARY —
  9. Session guidance, memory, environment info, language, output style, MCP instructions...

Cache Boundary (Critical Optimization)

The system prompt is split at __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__:

Before Boundary (scope: global)
Static instructions, tool descriptions, safety rules. Cacheable across all users and organizations. Computed once, reused for every API call.
After Boundary (scope: session)
User context, CLAUDE.md files, git status, memory, MCP instructions. User/session-specific, not cached cross-org.
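Splitting at the marker can be sketched as follows. The marker string comes from the text; PromptBlock, the cacheScope labels, splitAtBoundary, and the whitespace trimming are illustrative assumptions, and how the resulting blocks are sent to the API is not covered here.

```typescript
const DYNAMIC_BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

interface PromptBlock {
  text: string;
  cacheScope: "global" | "session";
}

function splitAtBoundary(prompt: string): PromptBlock[] {
  const index = prompt.indexOf(DYNAMIC_BOUNDARY);
  if (index === -1) {
    // Assumption: without a marker, nothing is treated as shareable across users.
    return [{ text: prompt, cacheScope: "session" }];
  }
  const staticPart = prompt.slice(0, index).trimEnd();
  const dynamicPart = prompt.slice(index + DYNAMIC_BOUNDARY.length).trimStart();

  const blocks: PromptBlock[] = [{ text: staticPart, cacheScope: "global" }];
  if (dynamicPart.length > 0) {
    blocks.push({ text: dynamicPart, cacheScope: "session" });
  }
  return blocks;
}
```

The global block only stays cacheable if its text is identical on every call, which is why anything user-specific has to live after the marker.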

Section Caching

Dynamic sections use two caching strategies:

Context Injection

Two context objects are injected alongside the system prompt:

System Context

User Context

System context is prepended and user context is appended. Both are memoized at module load and cached for the session.

Memory System (memdir/)

Persistent memory stored in ~/.claude/projects/<slug>/memory/:

Constant | Value
Max MEMORY.md lines | 200
Max MEMORY.md bytes | 25,000
Memory types | user, feedback, project, reference

Three Memory Modes

Daily-Log Mode (KAIROS)

Append-only daily logs at memory/logs/YYYY/MM/YYYY-MM-DD.md. No MEMORY.md content loading. Used in assistant/proactive mode.

Team Memory Mode (TEAMMEM)

Synced across teammates via permissionSync.ts. Secret-guarded: checkTeamMemSecrets() prevents committing credentials to shared memory.

Auto Memory (default)

Standard mode. Memory directory created idempotently once per session. Builds behavioral instructions + MEMORY.md content into system prompt. Auto-extracted from conversations via extractMemories service.

Sub-Agent System (AgentTool)

The AgentTool spawns isolated sub-agents that run their own query loops with restricted tool sets.

Agent Properties

Fork Subagent (FORK_SUBAGENT gate)

The most sophisticated agent pattern. When enabled, omitting subagent_type triggers an implicit fork of the current agent.

Cache Optimization

Fork children receive the parent's exact system prompt bytes to preserve prompt cache identity. The fork message contains:

This maximizes cache hits across siblings — the messages are identical until the final divergence point.

Fork Child Rules

  1. You ARE a fork — don't spawn sub-agents
  2. No conversation; use tools directly
  3. Commit changes before reporting (include hash)
  4. Minimal output: "Scope:", "Result:", "Key files:" format

Coordinator Mode

Gate: COORDINATOR_MODE + CLAUDE_CODE_COORDINATOR_MODE env var.

A single coordinator spawns multiple worker agents via the Agent tool. Workers run as background agents with restricted tool sets:

Simple Mode
Workers get: Bash, Read, Edit only
Full Mode
Workers get all tools except: TeamCreate, TeamDelete, SendMessage, StructuredOutput

Workers can share knowledge via a scratchpad directory (gated by tengu_scratch). The coordinator waits for actual results — no prediction.

Swarm Infrastructure

Full multi-process agent orchestration with multiple terminal backends:

Backend | File | How
In-Process | inProcessRunner.ts | Isolated agent in same process (53KB)
tmux | backends/tmux | Spawns panes in tmux session
iTerm2 | backends/iterm2 | Uses iTerm2 scripting API
Kitty | backends/kitty | Uses Kitty remote control

Permission bubbling: teammate agents can escalate permission requests up to the leader. Team files managed via teamHelpers.ts with task numbering.

Feature Flags (50+ gates)

Bun's feature() function enables build-time dead code elimination. Disabled features are completely stripped from the binary.

Gate | Controls
PROACTIVE / KAIROS | Autonomous assistant mode, daily logs, brief tool
COORDINATOR_MODE | Multi-worker orchestration
FORK_SUBAGENT | Implicit fork delegation
VOICE_MODE | Voice input command
AGENT_TRIGGERS | Cron-scheduled agents
BRIDGE_MODE | IDE extension communication
CACHED_MICROCOMPACT | Prompt cache-aware compaction
TRANSCRIPT_CLASSIFIER | Auto-mode transcript analysis
BASH_CLASSIFIER | Auto-mode bash safety check
TOKEN_BUDGET | Per-task token budget tracking
CONTEXT_COLLAPSE | Staged context summaries
HISTORY_SNIP | Old history removal
REACTIVE_COMPACT | On-error full compaction
EXPERIMENTAL_SKILL_SEARCH | Skill discovery
TEAMMEM | Team memory sync

ANT-Only Code

process.env.USER_TYPE === 'ant' is a build-time constant. Ant-only features are removed from external builds by dead code elimination.

Speculative Execution (Hidden Feature)

Claude Code can speculatively execute the next task in the background before you even ask.

How It Works

  1. After completing a turn, a forked agent runs in an isolated overlay directory
  2. The overlay allows write tools (Edit, Write) to modify copies of files
  3. Read-only tools (Read, Glob, Grep) access the real filesystem
  4. Max 20 turns, 100 messages per speculation
  5. If the user accepts the suggestion, overlay files are copied to the real working directory
  6. If rejected, the overlay is discarded
This is why Claude Code feels fast. It's pre-computing work speculatively so results appear instant when you accept a suggestion.
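The overlay idea in the steps above is a form of copy-on-write, which can be shown with an in-memory stand-in for the filesystem. Everything here is hypothetical: the SpeculationOverlay class, the Files map, and in particular the choice to let reads see the overlay's own pending writes, which the source does not specify.

```typescript
// In-memory stand-in for a directory: path -> file content.
type Files = Map<string, string>;

class SpeculationOverlay {
  private readonly real: Files;
  private readonly overlay: Files = new Map<string, string>();

  constructor(real: Files) {
    this.real = real;
  }

  // Assumption: reads prefer a speculative copy when one exists, else the real file.
  read(path: string): string | undefined {
    return this.overlay.has(path) ? this.overlay.get(path) : this.real.get(path);
  }

  // Writes only ever touch the overlay while speculation is running.
  write(path: string, content: string): void {
    this.overlay.set(path, content);
  }

  // Suggestion accepted: copy overlay files into the real directory.
  accept(): void {
    this.overlay.forEach((content, path) => this.real.set(path, content));
    this.overlay.clear();
  }

  // Suggestion rejected: drop all speculative changes.
  discard(): void {
    this.overlay.clear();
  }
}
```

Because the real files are untouched until accept() runs, a rejected speculation costs nothing beyond the discarded overlay.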

Undercover Mode (ANT-Only)

When Anthropic engineers use Claude Code on public repos, this auto-activates:

Cost Tracking (cost-tracker.ts)

Token usage and costs are tracked per-session and persisted across resumes:

Limits & Constants

Limit | Value | Where
MEMORY.md max lines | 200 | memdir.ts
MEMORY.md max bytes | 25,000 | memdir.ts
Git status max chars | 2,000 | context.ts
Max concurrent tools | 10 | toolOrchestration.ts
Max output token recovery turns | 3 | query.ts
Max compact retries | 2 | compact.ts
Post-compact restored files | 5 | compact.ts
Post-compact token budget | 50,000 | compact.ts
Post-compact per file max | 5,000 tokens | compact.ts
Speculation max turns | 20 | speculation.ts
Speculation max messages | 100 | speculation.ts
FileEditTool max file size | 1 GiB | FileEditTool.ts
BashTool security rules | 102 KB | bashSecurity.ts
Auto-background agent timeout | 120 seconds | AgentTool.tsx
Bash classifier timeout | 2 seconds | useCanUseTool.tsx

Trust Gate

All hooks require workspace trust in interactive mode. This is a defense-in-depth measure:

function shouldSkipHookDueToTrust(): boolean {
  const isInteractive = !getIsNonInteractiveSession()
  if (!isInteractive) return false  // SDK mode: trust implicit
  const hasTrust = checkHasTrustDialogAccepted()
  return !hasTrust  // Interactive: all hooks require trust
}

This prevents a malicious .claude/settings.json in a cloned repo from executing arbitrary commands before the user accepts the trust dialog. Applied to ALL hook types with no exceptions.

Built from analysis of the Claude Code source snapshot (March 31, 2026). Source: instructkr/claude-code.