Agent Trace Format
Cryptographically signed traces for AI agent actions. Tamper-evident records for compliance and audit.
Signed proof of what every agent did
Outcome: Produce audit-ready evidence chains for every patch and every agent run.
Mechanism: Every tool call, file edit, and reasoning step is cryptographically signed. Auditors verify independently.
Proof: IETF-aligned trace format. Produces evidence for SOC 2, EU AI Act, Cyber Resilience Act, PCI DSS, and FedRAMP.
Trace structure
Each trace is a complete record of agent behavior. It contains the input, reasoning, actions, output, and a cryptographic proof.
Compliance alignment
Signed traces provide evidence for SOC 2, PCI DSS, EU AI Act, and NIST CSF.
Agent ──▶ Action ──▶ Structured Envelope ──▶ Transparency Registry
  │         │               │                        │
  │         │               │                        ▼
  │         │               │                 Transparency Log
  │         │               ▼                 (tamper-evident)
  │         │          Signature +
  │         │          Timestamp
  │         ▼
  │     Observation
  │     (tool output)
  ▼
Decision
(next action)

Schema validated against CVE repair trajectories from CVE-Agent-Bench.
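To make the "tamper-evident" property of the transparency log concrete, here is a minimal sketch of a hash-chained log in Python. It is an illustration under stated assumptions, not XOR's implementation: the `entry_hash`, `append`, and `verify_chain` helpers, the `GENESIS` value, and the second timestamp are all hypothetical, and a real transparency registry would typically use signed envelopes and a Merkle-tree structure instead of a simple chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for an empty log


def entry_hash(prev_hash: str, envelope: dict) -> str:
    """Hash the previous entry's hash together with this envelope."""
    # sort_keys and fixed separators give the same bytes for the same content.
    payload = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, envelope: dict) -> None:
    """Append an envelope, chaining it to the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"envelope": envelope, "hash": entry_hash(prev, envelope)})


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev, entry["envelope"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"tool": "write_file", "path": "src/fix.c",
             "timestamp": "2026-02-07T14:23:01Z"})
append(log, {"tool": "run_tests", "result": "pass",
             "timestamp": "2026-02-07T14:23:09Z"})  # illustrative timestamp
print(verify_chain(log))  # True

log[0]["envelope"]["path"] = "src/other.c"  # simulate tampering
print(verify_chain(log))  # False
```

The chain does not prevent edits. It makes them detectable, which is the sense in which such a log is tamper-evident.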
What a trace looks like
// Sanitized trace excerpt
{
  "@type": "AgentTrace",
  "timestamp": "2026-02-07T14:23:01Z",
  "actions": [
    { "tool": "write_file", "path": "src/fix.c", "lines": 23 },
    { "tool": "run_tests", "result": "pass", "exit_code": 0 },
    { "tool": "verify_patch", "cve": "CVE-2024-XXXX", "status": "resolved" }
  ],
  "signature": "digitally-signed-attestation..."
}
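A consumer of such a trace might first check its shape before doing anything else. The sketch below assumes the four top-level fields shown in the excerpt are required. That requirement, the `check_trace` helper, and the `"placeholder"` signature value are assumptions made for illustration and are not taken from a published schema. The check covers structure only and does not verify the signature.

```python
import json
from datetime import datetime

# Assumed required fields, based only on the excerpt above.
REQUIRED_FIELDS = ("@type", "timestamp", "actions", "signature")


def check_trace(trace: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in trace]
    if "@type" in trace and trace["@type"] != "AgentTrace":
        problems.append("unexpected @type")
    if "timestamp" in trace:
        try:
            # Older Python versions reject a trailing "Z", so normalize it.
            stamp = str(trace["timestamp"]).replace("Z", "+00:00")
            datetime.fromisoformat(stamp)
        except ValueError:
            problems.append("timestamp is not ISO 8601")
    for index, action in enumerate(trace.get("actions", [])):
        if "tool" not in action:
            problems.append(f"action {index} has no tool name")
    return problems


raw = """
{
  "@type": "AgentTrace",
  "timestamp": "2026-02-07T14:23:01Z",
  "actions": [
    {"tool": "write_file", "path": "src/fix.c", "lines": 23},
    {"tool": "run_tests", "result": "pass", "exit_code": 0}
  ],
  "signature": "placeholder"
}
"""
trace = json.loads(raw)
print(check_trace(trace))                     # []
print([a["tool"] for a in trace["actions"]])  # ['write_file', 'run_tests']
```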
Why traces matter
Unsigned logs can be spoofed. Verifiable traces create audit-ready evidence: what the agent did, when it did it, and which patch was verified. Without signed traces, there is no way to distinguish a legitimate agent action from a tampered record. This distinction matters when regulators or auditors ask for proof of automated changes.
- Compliance-ready evidence for SOC 2 and EU AI Act requirements
- Tamper-evident records: any alteration after signing is detectable
- Each trace links to the specific bug and test result
Trace fields
- Which agent ran and when
- Every tool call and output (file edits, commands, results)
- Which bug was targeted, what fix was applied, and whether it passed
- Digital signature proving the record is authentic
How traces connect to audits
Audit teams need traceable evidence of who changed what, when, and whether the fix passed. XOR traces are signed with COSE_Sign1 (IETF RFC 9052), an open standard, and provide evidence for SOC 2 and ISO 27001 change control requirements. Regulatory auditors (SEC, CISA, etc.) increasingly require proof of what AI systems did in production. Traces supply that proof in a format that is cryptographically verifiable and tamper-evident.
SOC 2
Signed traces provide the audit trail that change control audits require.
ISO 27001
Tamper-evident records of who (agent), what (fix), when (timestamp), and outcome (pass/fail).
Regulatory audits
Signed traces are the evidence regulators need when they ask for proof of what your AI agents did.
[NEXT STEPS]
1,920 traces and counting
Every test run in the XOR benchmark produces a signed audit log. See verified results from real-world testing.
FAQ
What does a signed agent trace contain?
Tool calls, file edits, reasoning steps, the outcome (pass, fail, or error), and a cryptographic signature (COSE_Sign1).
How is the trace signed?
Traces are signed with COSE_Sign1, as defined in IETF RFC 9052. The signature makes traces tamper-evident, and auditors can verify it independently.
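The sign-then-verify idea can be sketched with the Python standard library alone. This is a stand-in and not COSE_Sign1: it uses HMAC-SHA256 over canonical JSON, which is a symmetric MAC, whereas COSE_Sign1 carries a signature over a CBOR structure, typically asymmetric, so that an auditor needs only the public key. The `canonical_bytes`, `sign`, and `verify` helpers and the demo key are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(trace_body: dict) -> bytes:
    """Serialize deterministically so equal content yields equal bytes."""
    return json.dumps(trace_body, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def sign(trace_body: dict, key: bytes) -> str:
    """Stand-in for signing: an HMAC-SHA256 tag over the canonical bytes."""
    return hmac.new(key, canonical_bytes(trace_body), hashlib.sha256).hexdigest()


def verify(trace_body: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(trace_body, key), tag)


key = b"demo-key-not-for-production"  # hypothetical key, for illustration only
body = {
    "@type": "AgentTrace",
    "timestamp": "2026-02-07T14:23:01Z",
    "actions": [{"tool": "run_tests", "result": "pass", "exit_code": 0}],
}
tag = sign(body, key)
print(verify(body, tag, key))  # True

body["actions"][0]["result"] = "fail"  # alter the record after signing
print(verify(body, tag, key))  # False
```

Changing any signed field invalidates the tag, which is the behavior the FAQ answer describes as tamper-evident.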
Can I export traces?
Yes. Traces export as JSON (human-readable), CBOR (compact), or YAML (audit-log friendly), and can be sent to your SIEM or compliance database.
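As one possible shape for the JSON path, the sketch below writes traces as JSON Lines, one object per line, a layout many log pipelines can ingest. The JSON Lines layout and the `export_jsonl` helper are assumptions, since the export mechanism itself is not described here. CBOR and YAML output would need third-party libraries and are left out.

```python
import io
import json


def export_jsonl(traces, stream) -> int:
    """Write one JSON object per line and return the number written."""
    count = 0
    for trace in traces:
        # sort_keys keeps field order stable across exports.
        stream.write(json.dumps(trace, sort_keys=True) + "\n")
        count += 1
    return count


traces = [
    {"@type": "AgentTrace", "timestamp": "2026-02-07T14:23:01Z",
     "actions": [{"tool": "run_tests", "result": "pass"}]},
]
buffer = io.StringIO()  # stands in for a file or network stream
written = export_jsonl(traces, buffer)
print(written)  # 1
print(buffer.getvalue(), end="")
```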
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
Agent Configurations
15 agent-model configurations benchmarked on real vulnerabilities. Compare pass rates and costs.
Native CLIs vs wrapper CLIs: the 10-16pp performance gap
Claude CLI vs OpenCode, Gemini CLI vs OpenCode, Codex vs Cursor. Same models, different wrappers, consistent accuracy gaps of 10-16 percentage points.
Cost vs performance: where agents sit on the Pareto frontier
15 agents plotted on cost-accuracy. 4 on the Pareto frontier. Best value: claude-opus-4-6 at $2.93/pass, 61.6%.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.