The Governance Gap Inside the World's Leading AI Agent

Anthropic's Claude Code source leak reveals the most sophisticated permission system in production agentic AI. It also reveals what's missing: proof.

What happened

You might assume the most sophisticated AI coding agent in production also has the most sophisticated governance. Risk classification on every action. ML-based auto-approval. Path traversal prevention. Permission modes from interactive prompts to full autonomy. Claude Code has all of it. What it does not have is a single durable record that any of it happened.

On March 31, 2026, Anthropic shipped Claude Code v2.1.88 to the npm registry with a 59.8MB source map file containing the entire original TypeScript codebase. Over 512,000 lines across roughly 1,900 files. The tool system, permission model, system prompts, multi-agent orchestration logic, and unreleased feature roadmap are now permanently public. VentureBeat and others have covered the incident itself.

This post covers what the leaked code reveals about the state of agentic governance. I'll walk through what Anthropic built, where the proof layer is missing, how the hooks architecture creates a literal integration surface for external governance, and why multi-agent swarms and autonomous daemon modes make this gap existential.

The engineering is genuinely impressive. Claude Code is not a wrapper around an API. It is a production-grade agent harness with deeply considered architecture. That's what makes the governance gap so instructive. If the best-resourced AI lab in the world ships enforcement without proof, the pattern is structural, not accidental.

The permission system: what they built

Claude Code implements a permission gate on every tool invocation. There are roughly 40 tools, each defined as a self-contained module with its own input schema, permission model, and execution logic. Every action routes through a toolPermission hook that checks the configured permission mode and either prompts the user, auto-resolves via an ML-based classifier, or blocks outright.

The risk classification system scores every tool action as LOW, MEDIUM, or HIGH. Protected files (.gitconfig, .bashrc, .zshrc, .mcp.json) are guarded from automatic editing. Path traversal prevention handles URL-encoded attacks, Unicode normalization, backslash injection, and case-insensitive path manipulation. A separate LLM call generates human-readable explanations of tool risks before the user approves.
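The path traversal defenses described above can be sketched as a single guard function. This is a hypothetical illustration, not the leaked implementation: the function name `isPathAllowed` and the exact ordering of checks are assumptions, but the categories of attack it handles (URL encoding, backslash injection, Unicode tricks, case manipulation) are the ones named in the code.

```typescript
import * as path from "node:path";

// Hypothetical sketch of the guard described above. `root` is the directory
// the agent may touch; `requested` is the raw path from a tool call.
export function isPathAllowed(root: string, requested: string): boolean {
  let candidate = requested;
  try {
    // Collapse URL-encoded traversal sequences like %2e%2e before resolving.
    candidate = decodeURIComponent(candidate);
  } catch {
    return false; // malformed encoding is rejected outright
  }
  // Treat backslashes as separators to defeat backslash injection.
  candidate = candidate.replace(/\\/g, "/");
  // Unicode normalization so visually equivalent paths compare equal.
  candidate = candidate.normalize("NFC");

  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);

  // Case-insensitive prefix check guards case-insensitive filesystems.
  const a = (resolved + path.sep).toLowerCase();
  const b = (resolvedRoot + path.sep).toLowerCase();
  return a.startsWith(b);
}
```

Note that even a correct guard like this only produces a boolean at runtime; it leaves no record that the check ran.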

The permission modes range from default (interactive prompts), to auto (ML classifier decides), to plan (read-only analysis before edits), to bypassPermissions (full autonomous access). There's also a hooks system that supports external policy enforcement via HTTP endpoints.
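The mode names and risk levels come straight from the leaked code as described above; the branching below is a minimal sketch of how such a resolver might behave, not the actual decision logic. In the `auto` branch, a real ML classifier would sit where the simple risk comparison is.

```typescript
type PermissionMode = "default" | "auto" | "plan" | "bypassPermissions";
type Risk = "LOW" | "MEDIUM" | "HIGH";
type Decision = "allow" | "deny" | "prompt";

// Illustrative resolver: mode and risk names are from the post; the
// branch logic is an assumption, not the leaked implementation.
function resolvePermission(
  mode: PermissionMode,
  risk: Risk,
  isReadOnly: boolean,
): Decision {
  switch (mode) {
    case "bypassPermissions":
      return "allow"; // full autonomy: nothing is ever prompted
    case "plan":
      return isReadOnly ? "allow" : "deny"; // read-only analysis phase
    case "auto":
      // stand-in for the ML classifier: escalate only high-risk actions
      return risk === "HIGH" ? "prompt" : "allow";
    case "default":
      return risk === "LOW" ? "allow" : "prompt"; // interactive prompts
  }
}
```

Whichever branch fires, the function returns a decision and nothing else: no artifact survives the call.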

This is real governance engineering. And none of it produces a durable record.

The gap: enforcement without proof

The permission checks are ephemeral. They fire at runtime, make a decision, and execution continues. Nothing is signed. No artifact captures which policy was in effect, what the tool was attempting, or what decision was made. The governance is self-attested by the very system that performed it.

For a developer sitting at their terminal, this might be fine. They saw the prompt. They clicked approve. They trust the tool.

For the enterprise deploying agentic AI at scale, it is not fine. When the White House AI framework shifted downstream liability to deployers, it created a class of buyer that needs to prove governance was applied, not just assert it. Regulators, auditors, and counterparties need evidence. An ephemeral runtime check is not evidence.

Governance Flow: Enforcement Without Proof

1. Agent requests tool execution
2. PreToolUse hook fires
3. Risk classification: LOW / MEDIUM / HIGH
4. Permission mode resolves: approve / deny / prompt
   ⚠ No durable record is produced beyond this point. The decision is ephemeral: no receipt, no signature.
5. Tool executes
6. PostToolUse fires, then the event is discarded

Governance Flow: With Sanna

1. Agent requests tool execution
2. PreToolUse hook fires
3. Sanna constitution enforcement evaluates the action
4. Ed25519-signed Governance Receipt produced: constitution hash + action + decision + timestamp
5. Decision returned: approve / deny
6. Tool executes (if approved)
7. Receipt stored: tamper-evident, offline-verifiable

The integration surface already exists

Claude Code's hooks architecture defines lifecycle events that external systems can intercept. PreToolUse fires before tool execution. PostToolUse fires after. PermissionRequest fires when a permission dialog is about to be shown. HTTP hooks can return structured JSON decisions that allow, deny, or escalate tool calls.

This is a literal integration surface for external governance. A Sanna endpoint sitting behind a PreToolUse HTTP hook would evaluate every tool invocation against the deployer's constitution, return an allow/deny decision, and produce a signed Governance Receipt, all before the tool executes.

.claude/settings.json:
{
  "hooks": {
    "PreToolUse": [
      {
        "type": "http",
        "url": "https://api.sanna.cloud/v1/evaluate",
        "timeout_ms": 5000
      }
    ]
  }
}
Sanna evaluation response:
{
  "decision": "allow",
  "receipt": {
    "id": "rcpt_a1b2c3d4",
    "constitution_hash": "sha256:9f86d08...",
    "action": "BashTool",
    "input_hash": "sha256:e3b0c44...",
    "decision": "allow",
    "timestamp": "2026-03-31T18:42:07Z",
    "signature": "ed25519:b5bb9d80..."
  }
}

The receipt is the artifact that closes the gap. It binds the constitution hash (which specific policy was in effect), the action (what the tool was attempting), the decision (allow or deny), and the timestamp into a single Ed25519-signed object. Anyone can verify it after the fact without trusting Sanna, the agent, or the deployer. The receipt is self-proving.
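The sign-then-verify round trip can be sketched with Node's built-in Ed25519 support. The receipt fields mirror the example response above; the in-process keypair and the helper names (`canonical`, `signReceipt`, `verifyReceipt`) are illustrative, and a real deployment would publish the public key while keeping the private key inside the governance service.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Field names mirror the example evaluation response above.
interface Receipt {
  id: string;
  constitution_hash: string;
  action: string;
  input_hash: string;
  decision: "allow" | "deny";
  timestamp: string;
}

// Canonical byte encoding: sort keys so signer and verifier agree
// on the exact bytes that were signed.
function canonical(r: Receipt): Buffer {
  const sorted = Object.fromEntries(
    Object.entries(r).sort(([a], [b]) => a.localeCompare(b)),
  );
  return Buffer.from(JSON.stringify(sorted));
}

// Illustrative in-process keypair, generated here only for the sketch.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function signReceipt(r: Receipt): string {
  // For Ed25519, Node's sign() takes null as the algorithm argument.
  return sign(null, canonical(r), privateKey).toString("base64");
}

function verifyReceipt(r: Receipt, signature: string): boolean {
  // Offline verification: only the receipt, signature, and public key
  // are needed -- no call back to the signer.
  return verify(null, canonical(r), publicKey, Buffer.from(signature, "base64"));
}
```

Flipping any field, even a single character of the decision, invalidates the signature, which is what makes the receipt tamper-evident.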

Where it gets harder: swarms and daemons

The leaked code reveals two capabilities that make the governance gap significantly worse.

Multi-agent orchestration. Claude Code can spawn sub-agents ("swarms") through an AgentTool, with a coordinator module handling orchestration. Each sub-agent runs in its own context with specific tool permissions. The ULTRAPLAN feature offloads complex planning to a remote container running for up to 30 minutes. When a primary agent delegates to sub-agents, who governs the swarm? The code shows that sub-agents inherit the parent's permission mode. But inheritance is not proof. There is no chain-of-custody across the orchestration boundary.

KAIROS (autonomous daemon mode). Gated behind feature flags but fully built, KAIROS is an always-on background agent that doesn't wait for user input. It observes, logs, and proactively acts. A background process called autoDream performs "memory consolidation" during idle periods, merging observations, removing contradictions, and converting tentative insights into verified facts. An agent that acts without a human in the loop breaks the permission-prompt model entirely. There is no one to click "approve." Governance must be policy-as-code, evaluated automatically, with durable proof that evaluation happened.

The permission-prompt model was designed for a human at a terminal. Multi-agent swarms and autonomous daemons are what's shipping next. The governance architecture needs to evolve ahead of the capability, not behind it.

Sanna's receipt architecture was designed from day one to support cross-agent trust. A receipt chain that binds constitution-hash to action across agent boundaries provides the chain-of-custody that inheritance alone cannot. When Agent A delegates to Agent B, and Agent B invokes a tool, the receipt proves which constitution governed that action, regardless of which agent executed it. This is not a Phase 5 aspiration. It's baked into the receipt primitive.
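One way to picture the cross-agent chain is hash-linking: each receipt carries the hash of the receipt that authorized the step before it, so custody can be re-verified end to end. This is a minimal sketch of the idea, assuming an illustrative `parent_receipt_hash` field name, not Sanna's actual wire format.

```typescript
import { createHash } from "node:crypto";

// Illustrative chained-receipt shape; `parent_receipt_hash` is an
// assumed field name, null for the root of the delegation.
interface ChainedReceipt {
  agent: string;
  action: string;
  constitution_hash: string;
  parent_receipt_hash: string | null;
}

function receiptHash(r: ChainedReceipt): string {
  return "sha256:" + createHash("sha256").update(JSON.stringify(r)).digest("hex");
}

// Verify custody: every link must reference the hash of the link before it.
function verifyChain(chain: ChainedReceipt[]): boolean {
  return chain.every((r, i) =>
    i === 0
      ? r.parent_receipt_hash === null
      : r.parent_receipt_hash === receiptHash(chain[i - 1]),
  );
}
```

If any upstream receipt is altered after the fact, every downstream link's parent hash stops matching and the chain fails verification.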

Direct comparison

| Dimension | Claude Code permissions | Sanna governed capability |
| --- | --- | --- |
| Pre-execution enforcement | Yes: PreToolUse hook with risk classification | Yes: constitution evaluation before execution |
| Durable proof of governance | No: decision is ephemeral | Yes: Ed25519-signed Governance Receipt |
| Offline verification | No: must trust the system's assertion | Yes: receipt verifiable without network access |
| Constitution binding | Partial: permission mode is set per-session | Yes: constitution hash bound to each receipt |
| Cross-agent chain of custody | No: sub-agents inherit mode, no proof chain | Yes: receipt chains across agent boundaries |
| Autonomous / daemon mode support | No: permission model assumes a human in the loop | Yes: policy-as-code with automatic receipt generation |
| Third-party auditability | No: no exportable governance artifact | Yes: receipts are portable, tamper-evident records |

The structural argument

This is not a gap unique to Claude Code. It is the industry default. Every major agentic framework, whether from Anthropic, OpenAI, Google, or the open-source ecosystem, implements some form of runtime permission checking. None of them produce durable, independently verifiable proof that governance was applied at the moment of action.

The reason is structural. Model providers cannot credibly self-govern. An agent asserting that it checked its own permissions is a self-referential claim. The proof must come from an independent layer that is open-source, cryptographically verifiable, and not controlled by the same entity that benefits from the agent's execution.

This is what Sanna provides: an open-source governance protocol (Apache 2.0) built on two equal pillars. Constitution enforcement blocks out-of-scope actions before execution; Ed25519-signed Governance Receipts produce tamper-evident, offline-verifiable proof that governance was applied at the moment of action. Neither pillar works without the other.

Enforcement that produces no proof is incomplete. And proof without enforcement is just audit theatre. Sanna provides both as a single integration at the point of action.

The hooks architecture that Claude Code already ships confirms the integration pattern. PreToolUse with HTTP endpoints returning structured decisions is the natural insertion point for an external governance layer. The same pattern applies to any agentic framework that implements lifecycle hooks. Sanna sits at this point across all of them.

What this means for deployers

If you are deploying agentic AI in an enterprise context, the Claude Code source leak just showed you the ceiling of what runtime permission systems provide. The agent checks permissions, makes a decision, executes the tool, and then the evidence is gone.

As agents move from interactive tools to autonomous systems (multi-agent orchestration, background daemons, proactive action without user prompts), the governance gap doesn't shrink. It widens. Every new capability that reduces human oversight increases the need for a durable, verifiable governance layer.

Liability sits with the deployer now. The receipt is how they prove they governed.

Build with verifiable governance

Sanna is open-source trust infrastructure for the agentic economy. We're working with design partners to integrate governance receipts into production agent deployments.

Become a design partner