The Governance Gap Inside the World's Leading AI Agent
Anthropic's Claude Code source leak reveals the most sophisticated permission system in production agentic AI. It also reveals what's missing: proof.
What happened
You might assume the most sophisticated AI coding agent in production also has the most sophisticated governance. Risk classification on every action. ML-based auto-approval. Path traversal prevention. Permission modes from interactive prompts to full autonomy. Claude Code has all of it. What it does not have is a single durable record that any of it happened.
On March 31, 2026, Anthropic shipped Claude Code v2.1.88 to the npm registry with a 59.8MB source map file containing the entire original TypeScript codebase: over 512,000 lines across roughly 1,900 files. The tool system, permission model, system prompts, multi-agent orchestration logic, and unreleased feature roadmap are now permanently public. VentureBeat and others have covered the incident itself.
This post covers what the leaked code reveals about the state of agentic governance. I'll walk through what Anthropic built, where the proof layer is missing, how the hooks architecture creates a literal integration surface for external governance, and why multi-agent swarms and autonomous daemon modes make this gap existential.
The engineering is genuinely impressive. Claude Code is not a wrapper around an API. It is a production-grade agent harness with deeply considered architecture. That's what makes the governance gap so instructive. If the best-resourced AI lab in the world ships enforcement without proof, the pattern is structural, not accidental.
The permission system: what they built
Claude Code implements a permission gate on every tool invocation. There are roughly 40 tools, each defined as a self-contained module with its own input schema, permission model, and execution logic. Every action routes through a toolPermission hook that checks the configured permission mode and either prompts the user, auto-resolves via an ML-based classifier, or blocks outright.
The risk classification system scores every tool action as LOW, MEDIUM, or HIGH. Protected files (.gitconfig, .bashrc, .zshrc, .mcp.json) are guarded from automatic editing. Path traversal prevention handles URL-encoded attacks, Unicode normalization, backslash injection, and case-insensitive path manipulation. A separate LLM call generates human-readable explanations of tool risks before the user approves.
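The layered path defenses described here can be sketched in TypeScript. This is an illustrative reconstruction of the pattern, not Claude Code's actual code; the function name, the root-directory parameter, and the exact check order are assumptions.

```typescript
import { normalize, resolve, sep } from "node:path";

// Illustrative path-traversal guard in the spirit of the protections
// described above (not Claude Code's actual implementation).
function isPathAllowed(requested: string, root: string): boolean {
  let p = requested;
  let prev: string;
  // Decode repeatedly so double-URL-encoding (%252e -> %2e -> .)
  // cannot slip through a single decode pass.
  do {
    prev = p;
    try { p = decodeURIComponent(p); } catch { return false; }
  } while (p !== prev);
  // Unicode normalization, then backslash injection.
  p = p.normalize("NFC").replace(/\\/g, "/");
  const abs = resolve(root, normalize(p));
  const base = resolve(root);
  // Case-insensitive prefix check defeats case manipulation on
  // case-insensitive filesystems.
  const a = abs.toLowerCase(), b = base.toLowerCase();
  return a === b || a.startsWith(b + sep);
}
```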
The permission modes range from default (interactive prompts), to auto (ML classifier decides), to plan (read-only analysis before edits), to bypassPermissions (full autonomous access). There's also a hooks system that supports external policy enforcement via HTTP endpoints.
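A minimal sketch of how the four modes could route a tool call. The type and function names are hypothetical, and the real dispatch logic is far richer, but the shape of the decision is the point:

```typescript
// Hypothetical types mirroring the modes and risk tiers described above.
type PermissionMode = "default" | "auto" | "plan" | "bypassPermissions";
type Risk = "LOW" | "MEDIUM" | "HIGH";
type Decision = "allow" | "deny" | "prompt";

interface ToolCall {
  tool: string;
  mutates: boolean; // does this call write files, run commands, etc.?
  risk: Risk;       // output of the risk classifier
}

function gate(call: ToolCall, mode: PermissionMode): Decision {
  switch (mode) {
    case "bypassPermissions":
      return "allow";                                   // full autonomy
    case "plan":
      return call.mutates ? "deny" : "allow";           // read-only analysis
    case "auto":
      return call.risk === "HIGH" ? "prompt" : "allow"; // classifier stand-in
    case "default":
      return "prompt";                                  // interactive approval
  }
}
```

Note what every branch has in common: the decision is returned, acted on, and discarded.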
This is real governance engineering. And none of it produces a durable record.
The gap: enforcement without proof
The permission checks are ephemeral. They fire at runtime, make a decision, and execution continues. Nothing is signed. No artifact captures which policy was in effect, what the tool was attempting, or what the decision was. The governance is self-attested by the system that performed it.
For a developer sitting at their terminal, this might be fine. They saw the prompt. They clicked approve. They trust the tool.
For the enterprise deploying agentic AI at scale, it is not fine. When the White House AI framework shifted downstream liability to deployers, it created a class of buyer that needs to prove governance was applied, not just assert it. Regulators, auditors, and counterparties need evidence. An ephemeral runtime check is not evidence.
The integration surface already exists
Claude Code's hooks architecture defines lifecycle events that external systems can intercept. PreToolUse fires before tool execution. PostToolUse fires after. PermissionRequest fires when a permission dialog is about to be shown. HTTP hooks can return structured JSON decisions that allow, deny, or escalate tool calls.
This is a literal integration surface for external governance. A Sanna endpoint sitting behind a PreToolUse HTTP hook would evaluate every tool invocation against the deployer's constitution, return an allow/deny decision, and produce a signed Governance Receipt, all before the tool executes.
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "type": "http",
        "url": "https://api.sanna.cloud/v1/evaluate",
        "timeout_ms": 5000
      }
    ]
  }
}
```
The endpoint returns a structured decision along with the signed receipt:

```json
{
  "decision": "allow",
  "receipt": {
    "id": "rcpt_a1b2c3d4",
    "constitution_hash": "sha256:9f86d08...",
    "action": "BashTool",
    "input_hash": "sha256:e3b0c44...",
    "decision": "allow",
    "timestamp": "2026-03-31T18:42:07Z",
    "signature": "ed25519:b5bb9d80..."
  }
}
```
The receipt is the artifact that closes the gap. It binds the constitution hash (which specific policy was in effect), the action (what the tool was attempting), the decision (allow or deny), and the timestamp into a single Ed25519-signed object. Anyone can verify it after the fact without trusting Sanna, the agent, or the deployer. The receipt is self-proving.
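A sketch of what offline verification could look like, using Node's built-in Ed25519 support. The receipt shape mirrors the example above, but the canonicalization scheme (sorted-key JSON over every field except the signature) is an assumption; Sanna's actual wire format isn't specified here.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical receipt shape mirroring the example above.
interface Receipt {
  id: string;
  constitution_hash: string;
  action: string;
  input_hash: string;
  decision: "allow" | "deny";
  timestamp: string;
  signature?: string; // base64 Ed25519 signature over the other fields
}

// Canonical bytes to sign: all fields except the signature, keys sorted.
// (Assumed canonicalization; a real protocol would pin this down exactly.)
function canonicalBytes(r: Receipt): Buffer {
  const { signature, ...body } = r;
  return Buffer.from(JSON.stringify(body, Object.keys(body).sort()));
}

function signReceipt(r: Receipt, priv: KeyObject): Receipt {
  return { ...r, signature: sign(null, canonicalBytes(r), priv).toString("base64") };
}

// Offline verification: only the issuer's public key is needed. No call to
// Sanna, the agent, or the deployer.
function verifyReceipt(r: Receipt, pub: KeyObject): boolean {
  if (!r.signature) return false;
  return verify(null, canonicalBytes(r), pub, Buffer.from(r.signature, "base64"));
}
```

Flipping any field (the decision, the constitution hash, the timestamp) invalidates the signature, which is what makes the receipt tamper-evident rather than merely logged.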
Where it gets harder: swarms and daemons
The leaked code reveals two capabilities that make the governance gap significantly worse.
Multi-agent orchestration. Claude Code can spawn sub-agents ("swarms") through an AgentTool, with a coordinator module handling orchestration. Each sub-agent runs in its own context with specific tool permissions. The ULTRAPLAN feature offloads complex planning to a remote container running for up to 30 minutes. When a primary agent delegates to sub-agents, who governs the swarm? The code shows that sub-agents inherit the parent's permission mode. But inheritance is not proof. There is no chain-of-custody across the orchestration boundary.
KAIROS (autonomous daemon mode). Gated behind feature flags but fully built, KAIROS is an always-on background agent that doesn't wait for user input. It observes, logs, and proactively acts. A background process called autoDream performs "memory consolidation" during idle periods, merging observations, removing contradictions, and converting tentative insights into verified facts. An agent that acts without a human in the loop breaks the permission-prompt model entirely. There is no one to click "approve." Governance must be policy-as-code, evaluated automatically, with durable proof that evaluation happened.
Sanna's receipt architecture was designed from day one to support cross-agent trust. A receipt chain that binds constitution-hash to action across agent boundaries provides the chain-of-custody that inheritance alone cannot. When Agent A delegates to Agent B, and Agent B invokes a tool, the receipt proves which constitution governed that action, regardless of which agent executed it. This is not a Phase 5 aspiration. It's baked into the receipt primitive.
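The chain-of-custody idea can be illustrated with a hash-linked chain. This is a conceptual sketch, not Sanna's actual schema: each receipt carries the hash of its parent, so a verifier can walk from a sub-agent's action back to the constitution that governed the original delegation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical chained receipt: each delegation link carries the hash of
// its parent receipt, binding the sub-agent's action to the constitution
// that governed the delegation.
interface ChainedReceipt {
  agent: string;
  action: string;
  constitution_hash: string;
  parent_hash: string | null; // null for the root (top-level) agent
}

function receiptHash(r: ChainedReceipt): string {
  return "sha256:" + createHash("sha256").update(JSON.stringify(r)).digest("hex");
}

// Walk the chain from root to leaf, checking each parent link.
function verifyChain(chain: ChainedReceipt[]): boolean {
  for (let i = 1; i < chain.length; i++) {
    if (chain[i].parent_hash !== receiptHash(chain[i - 1])) return false;
  }
  return chain.length > 0 && chain[0].parent_hash === null;
}
```

Inheritance tells you what mode a sub-agent started with; a chain like this proves what actually governed each action after the handoff.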
Direct comparison
| Dimension | Claude Code permissions | Sanna governed capability |
|---|---|---|
| Pre-execution enforcement | Yes. PreToolUse hook with risk classification. | Yes. Constitution evaluation before execution. |
| Durable proof of governance | No. Decision is ephemeral. | Yes. Ed25519-signed Governance Receipt. |
| Offline verification | No. Must trust the system's assertion. | Yes. Receipt verifiable without network access. |
| Constitution binding | Partial. Permission mode is set per-session. | Yes. Constitution hash bound to each receipt. |
| Cross-agent chain of custody | No. Sub-agents inherit mode, no proof chain. | Yes. Receipt chains across agent boundaries. |
| Autonomous / daemon mode support | No. Permission model assumes human in loop. | Yes. Policy-as-code with automatic receipt generation. |
| Third-party auditability | No. No exportable governance artifact. | Yes. Receipts are portable, tamper-evident records. |
The structural argument
This is not a gap unique to Claude Code. It is the industry default. Every major agentic framework, whether from Anthropic, OpenAI, Google, or the open-source ecosystem, implements some form of runtime permission checking. None of them produce durable, independently verifiable proof that governance was applied at the moment of action.
The reason is structural. Model providers cannot credibly self-govern. An agent asserting that it checked its own permissions is a self-referential claim. The proof must come from an independent layer that is open-source, cryptographically verifiable, and not controlled by the same entity that benefits from the agent's execution.
This is what Sanna provides. An open-source governance protocol (Apache 2.0) built on two equal pillars. Constitution enforcement blocks out-of-scope actions before execution. Ed25519-signed Governance Receipts produce tamper-evident, offline-verifiable proof that governance was applied at the moment of action. Neither pillar works without the other.
The hooks architecture that Claude Code already ships confirms the integration pattern. PreToolUse with HTTP endpoints returning structured decisions is the natural insertion point for an external governance layer. The same pattern applies to any agentic framework that implements lifecycle hooks. Sanna sits at this point across all of them.
What this means for deployers
If you are deploying agentic AI in an enterprise context, the Claude Code source leak just showed you the ceiling of what runtime permission systems provide. The agent checks permissions, makes a decision, executes the tool, and then the evidence is gone.
As agents move from interactive tools to autonomous systems (multi-agent orchestration, background daemons, proactive action without user prompts), the governance gap doesn't shrink. It widens. Every new capability that reduces human oversight increases the need for a durable, verifiable governance layer.
Liability sits with the deployer now. The receipt is how they prove they governed.
Build with verifiable governance
Sanna is open-source trust infrastructure for the agentic economy. We're working with design partners to integrate governance receipts into production agent deployments.
Become a design partner