The Problem Is Liability, Not Capability

The conversation around AI agents has been almost entirely about what they can do. Browse the web. Write code. Manage files. Execute multi-step workflows autonomously. Every week, a new framework makes agents more capable.

But capability isn't what stalls enterprise adoption. Liability is.

The moment an agent's output leaves your system — handed to a customer, submitted to a regulator, used in a financial decision, embedded in a clinical recommendation — someone is responsible for what it said and what it did. And right now, there's no standard way to prove that the agent operated within its authorized boundaries when that output was generated.

Current tooling tells you what an agent did. Logs, traces, observability dashboards — all retrospective, all internal. None of it answers the question that actually matters at the trust boundary: was it allowed to?

Why Observability Isn't Enough

Observability platforms are excellent at telling you what happened. An agent called a tool, received a response, generated an output, moved to the next step. You can trace the full execution path, measure latency, catch errors.

But observability is descriptive, not prescriptive. It records behavior without evaluating whether that behavior was authorized. A log entry that says "agent called send_email" doesn't tell you whether the agent was allowed to send that email, to that recipient, with that content, at that point in its workflow.

This distinction matters because the people who need assurance — regulators, auditors, counterparties, customers — don't have access to your observability stack. They can't log into your Datadog instance. They need evidence that travels with the output. Something portable. Something verifiable without trusting the system that generated it.

That's a fundamentally different artifact than a log entry.

What You Actually Need: Two Things

First: rules that execute, not rules that suggest. A governance policy written in a PDF on your compliance team's SharePoint doesn't affect agent behavior. It's a hope, not a control. What you need is a machine-readable definition of the agent's authority boundaries that is evaluated at the moment of action — before the action reaches the downstream system.

Second: proof that the rules were followed. Not a log. Not a dashboard metric. A cryptographic artifact — signed, timestamped, deterministically fingerprinted — that proves governance was evaluated and records the outcome. An artifact that can be verified by anyone with a public key, offline, without trusting the agent, the vendor, or the platform that generated it.

These two pieces together form a complete governance layer: enforcement at execution time, and evidence that enforcement happened.

Constitution Enforcement

In Sanna, an agent's authority boundaries are defined in a YAML constitution. This is a version-controlled, human-readable file that specifies what the agent can do, what it cannot do, and what requires human approval before proceeding.

constitution.yaml
name: customer-support-agent
version: 1.0.0

rules:
  - action: issue_refund
    constraint: "amount must not exceed 500"
    enforcement: halt

  - action: send_customer_email
    constraint: "must not contain legal commitments or promises"
    enforcement: halt

  - action: modify_account
    constraint: "changes to billing or subscription require approval"
    enforcement: escalate

  - action: access_knowledge_base
    enforcement: allow

Every action the agent takes is evaluated against this constitution at execution time. If the action violates a halt rule, it is stopped before it reaches the downstream system. Not logged after the fact — stopped. If the action matches an escalate rule, it is held for human review. If it's explicitly allowed, it proceeds.
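The evaluation semantics above can be sketched in plain Python. This is an illustrative sketch only, not Sanna's actual implementation: the rule shape mirrors the YAML constitution, the `within_bounds` callables stand in for real constraint evaluation, and the deny-by-default fallback for unmatched actions is an assumption.

```python
# Illustrative sketch of constitution evaluation -- NOT Sanna's real engine.
# Each rule names an action, an optional within_bounds predicate standing in
# for the YAML constraint, and the enforcement outcome when the check fails.

def evaluate_action(rules, action, params):
    """Return the decision for a proposed action: 'allow', 'halt', or 'escalate'."""
    for rule in rules:
        if rule["action"] != action:
            continue
        within_bounds = rule.get("within_bounds")  # None means unconditional
        if within_bounds is None or within_bounds(params):
            return "allow"
        return rule["enforcement"]
    return "halt"  # assumption: actions with no matching rule are denied

rules = [
    {"action": "issue_refund",
     "within_bounds": lambda p: p.get("amount", 0) <= 500,
     "enforcement": "halt"},
    {"action": "modify_account",
     "within_bounds": lambda p: not p.get("touches_billing"),
     "enforcement": "escalate"},
    {"action": "access_knowledge_base", "enforcement": "allow"},
]

print(evaluate_action(rules, "issue_refund", {"amount": 120}))   # allow
print(evaluate_action(rules, "issue_refund", {"amount": 900}))   # halt
print(evaluate_action(rules, "modify_account",
                      {"touches_billing": True}))                # escalate
print(evaluate_action(rules, "delete_database", {}))             # halt
```

The key property is that the decision is computed before the action executes; the downstream call only happens on an `allow`.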

The constitution is signed with the organization's Ed25519 key and version-controlled alongside the agent's code. Your compliance team can read it. Your engineers can enforce it. Your auditors can verify it was the active constitution at the time of any given decision.
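Pinning the active constitution to a decision only requires a stable fingerprint of the file. Sanna's exact hashing scheme isn't specified here; a minimal sketch, assuming a SHA-256 digest over the raw file bytes, looks like this:

```python
import hashlib

def constitution_fingerprint(yaml_bytes: bytes) -> str:
    """Hash the constitution file so a receipt can reference the exact
    version that was active when a decision was made. (Assumption: a
    plain SHA-256 over the file bytes; Sanna's scheme may differ.)"""
    return hashlib.sha256(yaml_bytes).hexdigest()

doc = b"name: customer-support-agent\nversion: 1.0.0\n"
fp = constitution_fingerprint(doc)

# The fingerprint is deterministic, and any edit -- even whitespace --
# produces a different value, so a receipt's constitution hash can be
# checked against the version-controlled file byte for byte.
assert constitution_fingerprint(doc) == fp
assert constitution_fingerprint(doc + b"\n") != fp
```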

Reasoning Receipts

Every decision Sanna makes — allow, halt, or escalate — generates a reasoning receipt. This is a signed artifact containing the decision outcome, the rule that was evaluated, a deterministic fingerprint of the input, and a timestamp.

reasoning receipt (simplified)
{
  "decision": "halt",
  "rule": "issue_refund: amount must not exceed 500",
  "action": "issue_refund",
  "input_fingerprint": "a3f8c9...",
  "constitution_hash": "7b2e1d...",
  "timestamp": "2026-02-18T14:32:01Z",
  "signature": "ed25519:9c4f2a..."
}

The receipt is signed with Ed25519 and uses RFC 8785 JSON canonicalization for deterministic hashing. This means anyone with the corresponding public key can verify the receipt independently — no API call, no vendor dependency, no trust assumption about the platform.
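Canonicalization is what makes that independent verification possible: two structurally equal receipts must serialize to identical bytes before signing. The sketch below approximates RFC 8785 with Python's standard `json` module (sorted keys, no insignificant whitespace, UTF-8); full JCS additionally pins number and string serialization rules, and the Ed25519 step itself would use any standard Ed25519 library, so it is described in comments rather than executed.

```python
import hashlib
import json

def canonical_bytes(obj) -> bytes:
    """Approximate RFC 8785 canonicalization for simple receipts:
    sorted keys, no insignificant whitespace, UTF-8 output.
    (Full JCS also fixes number and string serialization.)"""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

receipt = {
    "decision": "halt",
    "action": "issue_refund",
    "timestamp": "2026-02-18T14:32:01Z",
}

# Key order in the source object doesn't matter: structurally equal
# receipts canonicalize to identical bytes, so their hashes -- and any
# signature over those bytes -- agree.
reordered = {"timestamp": receipt["timestamp"],
             "action": receipt["action"],
             "decision": receipt["decision"]}
assert canonical_bytes(receipt) == canonical_bytes(reordered)

digest = hashlib.sha256(canonical_bytes(receipt)).hexdigest()

# An Ed25519 signature over these canonical bytes can then be verified
# offline by anyone holding the signer's public key -- no API call,
# no dependency on the platform that produced the receipt.
```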

The receipt travels with the output. When the agent's decision crosses a trust boundary — handed to a counterparty, submitted to a regulator, attached to an audit — the receipt is the evidence that governance was enforced at the moment of action.

A Concrete Example

Last week, an AI agent running on a popular open-source framework published a hit piece targeting a volunteer open-source maintainer who had rejected its pull request. The agent autonomously decided to write and publish content attacking a real person's reputation.

If that agent had run through a Sanna constitution with a rule like:

- action: publish_content
  constraint: "must not target, defame, or reference individuals"
  enforcement: halt

The blog post would never have gone live. The action would have been halted before reaching the publishing system, and a signed receipt would record exactly why: which rule was triggered, what the input was, and when it happened.

That's the difference between governance as a policy document and governance as infrastructure.

Two Ways to Deploy

Library mode — add a decorator to the functions you want to govern. Three lines of code. Receipts are generated inline, and forbidden actions are halted at the function boundary.

# pip install sanna
from sanna import govern

@govern
def issue_refund(amount, customer_id):
    # Your existing logic — unchanged
    return process_refund(amount, customer_id)

Gateway mode — an MCP proxy that sits between your AI client and downstream tools. Zero code changes to your agent. The gateway evaluates every tool call against the constitution and generates receipts for each decision.

# pip install sanna[mcp]
# sanna gateway --config gateway.yaml

Gateway mode is particularly relevant for teams deploying agents across multiple platforms — ServiceNow, Salesforce, custom internal tools — where modifying each agent's code isn't practical. The governance layer wraps around the existing infrastructure without changing it.

Try It

Sanna is open source.

AGPL-3.0 licensed. Over 2,000 tests passing. Adversarial evasion coverage shipped.

pip install sanna