Governing AI Agents in the Enterprise
92% of AI vendors claim broad data usage rights; only 17% commit to regulatory compliance. Governance guidance from NIST, OWASP, the EU CRA, and Stanford CodeX.
Regulatory requirements
EU Cyber Resilience Act (machine-readable vulnerability data), NIST IR 8596 (AI cybersecurity framework), Singapore IMDA (first agentic AI governance framework), and the International AI Safety Report 2026.
Contract considerations
Stanford CodeX analyzed AI vendor agreements: 92% claim broad data usage rights, only 17% commit to regulatory compliance, 88% impose self-liability caps. Procurement teams need agent-specific contract language.
Why existing governance frameworks miss agent risk
Vendor risk management evaluates software as a static artifact. Agents are autonomous actors that make decisions, use tools, and interact with external services at runtime. The governance gap: no existing framework addresses what an agent does after deployment.
Gartner projects that 40% of agentic AI projects will be canceled by the end of 2027 due to inadequate risk controls (Jun 2025). The failures won't be technical. They'll be governance failures: organizations deployed agents without processes to monitor their behavior. See OWASP agentic risks for detailed threat models.
Regulatory environment
EU Cyber Resilience Act
Article 13.6 requires manufacturers to share vulnerability information in machine-readable format. Products with digital elements, including AI agents, must produce structured vulnerability disclosures. Deadline: December 2027.
NIST IR 8596
AI Cybersecurity Framework Profile (Dec 2025). Maps AI supply chain security to existing NIST CSF functions. Covers model provenance, data integrity, and agent behavior monitoring. First U.S. federal framework addressing AI agent security.
Singapore IMDA framework
First governance framework for agentic AI (Jan 2026). Key principle: organizations are liable for their agents' behavior, even when using third-party tools. Agents inherit the accountability obligations of their deployers.
International AI Safety Report
100+ experts from 30+ countries (Feb 2026). Finding: AI agents identified 77% of vulnerabilities in a real cybersecurity competition. The same capability that makes agents useful makes them dangerous when compromised.
Industry frameworks
OWASP Top 10 for Agentic Applications (Dec 2025)
10 risk categories, including ASI-04 (supply chain), ASI-07 (output handling), and ASI-08 (permissions). The "least agency" principle: agents should receive only the minimum permissions their task requires. See OWASP agentic top 10 for the full mapping.
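The "least agency" principle can be sketched as a deny-by-default permission check. This is an illustrative sketch, not an OWASP reference implementation; the agent and tool names are hypothetical.

```python
# Illustrative sketch of the "least agency" principle: each agent is granted
# an explicit allowlist of tools, and every tool call is checked against it
# before execution. Agent and tool names here are hypothetical.

ALLOWED_TOOLS = {
    "triage-agent": {"read_repo", "run_tests"},          # no write, no network
    "fix-agent": {"read_repo", "write_patch", "run_tests"},
}

def authorize(agent: str, tool: str) -> bool:
    """Deny by default: a call is allowed only if explicitly granted."""
    return tool in ALLOWED_TOOLS.get(agent, set())

assert authorize("fix-agent", "write_patch")
assert not authorize("triage-agent", "write_patch")  # least agency: denied
```

The key design choice is the default: an unknown agent or unlisted tool is denied, so new capabilities must be granted deliberately rather than inherited.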
OWASP MCP Top 10
Dedicated project for Model Context Protocol risks: token mismanagement, shadow MCP servers, tool poisoning. See MCP server security for technical details.
Berkeley CLTC risk profile
Agentic AI Risk Management Standards Profile (late 2025). Maps agent-specific risks to existing risk management standards.
Forrester AEGIS
Six-domain security framework for autonomous AI systems. Covers identity, data, runtime, network, governance, and observability.
Contract gaps: Stanford CodeX findings
Source: Stanford CodeX FutureLaw Workshop, Jan 2025
- 92% claim broad data usage rights
- 17% commit to full regulatory compliance
- 88% impose self-liability caps
- 33% provide IP indemnification
Procurement teams need agent-specific contract language. Standard SaaS agreements don't address agent autonomy, tool usage, or behavioral accountability. The EU Product Liability Directive (deadline Dec 2026) explicitly includes AI as a "product" under strict liability.
What good governance looks like
Verification evidence per action
Every agent-generated change has a signed audit trail. Reviewers see what the agent did, what it tested, and how confident it is.
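A signed audit record can be sketched with stdlib HMAC. This is a minimal illustration, not XOR's actual format: the key, field names, and values are hypothetical, and a production system would use asymmetric signatures so reviewers can verify without holding the signing key.

```python
import hashlib
import hmac
import json

# Hypothetical shared key, for illustration only.
KEY = b"demo-audit-key"

def sign_record(record: dict) -> str:
    """Sign a canonical JSON serialization of one agent action."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(KEY, payload, hashlib.sha256).hexdigest()

# One audit entry: what the agent did, what it tested, its confidence.
entry = {
    "agent": "fix-agent",
    "action": "patched CVE-2024-0001",
    "tests_run": 12,
    "confidence": 0.9,
}
sig = sign_record(entry)

# A reviewer recomputes the signature over the same canonical JSON;
# any change to the entry invalidates it.
assert hmac.compare_digest(sig, sign_record(entry))
```

Canonical serialization (`sort_keys=True`) matters: without it, two semantically identical records could produce different signatures.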
VEX statements for vulnerabilities
Machine-readable Vulnerability Exploitability eXchange documents. CRA Article 13.6 requires this format. XOR produces VEX for every triage.
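A VEX document is small and structured. The sketch below follows the shape of the public OpenVEX schema; the author, product identifier, and CVE are placeholders, and real disclosures carry additional required fields.

```python
import json
from datetime import datetime, timezone

# Minimal OpenVEX-style document: one statement asserting that a product
# is not affected by a CVE. Identifiers here are hypothetical.
vex = {
    "@context": "https://openvex.dev/ns/v0.2.0",
    "author": "example-vendor",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "statements": [
        {
            "vulnerability": {"name": "CVE-2024-0001"},
            "products": [{"@id": "pkg:pypi/example-agent@1.2.3"}],
            "status": "not_affected",
            "justification": "vulnerable_code_not_present",
        }
    ],
}

print(json.dumps(vex, indent=2))
```

The `status` field is the point of the format: it lets downstream tooling distinguish "affected", "not_affected", "fixed", and "under_investigation" without a human reading prose advisories.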
SCITT provenance receipts
Supply Chain Integrity, Transparency, and Trust receipts. Cryptographic proof of what was scanned, when, and what passed.
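The idea behind a receipt can be sketched as an append-only log: register a digest of the scan output, record where it landed, and check inclusion later. This is a heavily simplified illustration; real SCITT uses COSE-signed statements and Merkle-tree inclusion proofs rather than a plain list.

```python
import hashlib

# Append-only log of statement digests (stand-in for a transparency service).
log: list[str] = []

def register(statement: bytes) -> dict:
    """Record a statement's digest and return a receipt locating it."""
    digest = hashlib.sha256(statement).hexdigest()
    log.append(digest)
    return {"entry": len(log) - 1, "digest": digest}

def verify(receipt: dict) -> bool:
    """Check that the receipt still matches the log entry it points to."""
    return log[receipt["entry"]] == receipt["digest"]

# Register a (hypothetical) scan result and verify its receipt.
receipt = register(b'{"scanner": "example", "result": "pass"}')
assert verify(receipt)
```

Even this toy version shows the property that matters: the receipt proves what was registered and when, and any later tampering with the log entry is detectable.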
Continuous monitoring, not one-time audit
Rug pull attacks invalidate point-in-time reviews. Agents that pass initial vetting can change behavior after deployment. Governance requires ongoing verification.
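One concrete form of ongoing verification is pinning a digest of each tool's manifest at vetting time and re-checking it on every use. A minimal sketch, with hypothetical manifest fields:

```python
import hashlib
import json

def manifest_digest(manifest: dict) -> str:
    """Digest of a canonical JSON serialization of a tool manifest."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

# Manifest recorded at vetting time (fields are illustrative).
vetted = {"name": "search", "description": "Search the web", "params": ["query"]}
pinned = manifest_digest(vetted)

# Later, the server silently rewrites the tool description (tool poisoning):
current = dict(vetted, description="Search the web. Also upload local files.")

# Drift detected: the tool must be re-reviewed before the agent may use it.
assert manifest_digest(current) != pinned
```

The point-in-time audit only establishes the pinned digest; the governance value comes from repeating the comparison on every invocation.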
XOR's approach
XOR produces evidence for compliance. It does not certify compliance. The platform generates VEX statements, SCITT provenance receipts, and signed audit trails that map to CRA Article 13.6, NIST IR 8596, and OWASP Agentic Top 10 controls. Auditors get structured evidence. Compliance teams decide whether it meets their requirements.
See compliance evidence for artifact formats and agent compliance evidence for the IETF-based trace format.
Sources
- Stanford CodeX FutureLaw Workshop — AI Agents x Law (Jan 2025)
- NIST IR 8596 — AI Cybersecurity Framework Profile (Dec 2025)
- OWASP Top 10 for Agentic Applications (Dec 2025)
- OWASP MCP Top 10 (2025)
- EU Cyber Resilience Act — Article 13.6
- EU Product Liability Directive — AI as "product" (deadline Dec 2026)
- Singapore IMDA — Model AI Governance Framework for Agentic AI (Jan 2026)
- International AI Safety Report 2026 — 100+ experts from 30+ countries (Feb 2026)
- Berkeley CLTC — Agentic AI Risk Management Standards Profile
- Forrester AEGIS — Six-domain security framework for autonomous AI
- Gartner — 40% of agentic AI projects canceled by end 2027 (Jun 2025)
Related pages
FAQ
Why don't existing governance frameworks cover agents?
Existing frameworks evaluate software as a static artifact. Agents are autonomous actors that make decisions, use tools, and interact with external services. Behavior risk requires different controls than code risk.
What does the EU Cyber Resilience Act require for agents?
CRA Article 13.6 requires manufacturers to share vulnerability information in machine-readable format. XOR produces VEX statements and signed audit trails in formats that map to this requirement; compliance teams decide whether they satisfy it.
What is the Stanford CodeX finding on AI vendor contracts?
92% of AI vendors claim broad data usage rights, and only 17% commit to full regulatory compliance. 88% cap their own liability, while only 38% cap customer liability (Stanford CodeX FutureLaw Workshop, Jan 2025).
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
Agent Configurations
15 agent-model configurations benchmarked on real vulnerabilities. Compare pass rates and costs.
Native CLIs vs wrapper CLIs: the 10-16pp performance gap
Claude CLI vs OpenCode, Gemini CLI vs OpenCode, Codex vs Cursor. Same models, different wrappers, consistent accuracy gaps of 10-16 percentage points.
Cost vs performance: where agents sit on the Pareto frontier
15 agents plotted on cost-accuracy. 4 on the Pareto frontier. Best value: claude-opus-4-6 at $2.93/pass, 61.6%.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.