Agent Trace Format

Cryptographically signed traces for AI agent actions. Tamper-evident records for compliance and audit.

Signed proof of what every agent did

Outcome: Produce audit-ready evidence chains for every patch and every agent run.

Mechanism: Every tool call, file edit, and reasoning step is cryptographically signed. Auditors verify independently.

Proof: IETF-aligned trace format. Produces evidence for SOC 2, EU AI Act, Cyber Resilience Act, PCI DSS, and FedRAMP.

Trace structure

Each trace is a complete record of one agent run. It contains the input, reasoning, actions, output, and cryptographic proof.

Compliance alignment

Signed traces provide evidence for SOC 2, PCI DSS, EU AI Act, and NIST CSF.

[VERIFIABLE VIBES]

Agent ──▶ Action ──▶ Structured Envelope ──▶ Transparency Registry
  │         │            │                          │
  │         │            │                          ▼
  │         │            │                  Transparency Log
  │         │            ▼                  (tamper-evident)
  │         │       Signature +
  │         │       Timestamp
  │         ▼
  │    Observation
  │    (tool output)
  ▼
Decision
(next action)

Schema validated against vulnerability repair trajectories from Vulnerability-Agent-Bench.
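
The tamper-evident log at the end of the flow above can be illustrated with a simple hash chain. This is a minimal sketch, not XOR's implementation: the function names, the GENESIS value, and the chaining scheme are assumptions made for illustration, and production transparency logs commonly use Merkle trees rather than a linear chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def entry_hash(prev_hash: str, envelope: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON envelope."""
    body = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, envelope: dict) -> None:
    """Append an envelope, chaining it to the last entry in the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"envelope": envelope, "hash": entry_hash(prev, envelope)})


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev, entry["envelope"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"tool": "write_file", "path": "src/fix.c", "lines": 23})
append(log, {"tool": "run_tests", "result": "pass", "exit_code": 0})
print(verify_chain(log))  # True

log[0]["envelope"]["lines"] = 99  # simulate tampering with a stored entry
print(verify_chain(log))  # False
```

A plain chain like this does not detect removal of the most recent entries, which is one reason real logs also publish signed checkpoints of the latest hash.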

What a trace looks like

// Sanitized trace excerpt

{
  "@type": "AgentTrace",
  "timestamp": "2026-02-07T14:23:01Z",
  "actions": [
    { "tool": "write_file", "path": "src/fix.c", "lines": 23 },
    { "tool": "run_tests", "result": "pass", "exit_code": 0 },
    { "tool": "verify_patch", "cve": "CVE-2024-XXXX", "status": "resolved" }
  ],
  "signature": "digitally-signed-attestation..."
}
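
Before a trace is checked cryptographically, a consumer might first confirm it has the expected shape. The sketch below assumes the four top-level keys shown in the excerpt are required and that every action carries a "tool" field. The function name check_trace and these rules are illustrative assumptions, not the published schema.

```python
from datetime import datetime

# Assumed required keys, taken from the excerpt above (not an official schema).
REQUIRED_KEYS = {"@type", "timestamp", "actions", "signature"}


def check_trace(trace: dict) -> list:
    """Return a list of structural problems; an empty list means the shape is OK."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(trace))]
    if "@type" in trace and trace["@type"] != "AgentTrace":
        problems.append("@type is not 'AgentTrace'")
    if "timestamp" in trace:
        try:
            # The excerpt uses UTC timestamps with a trailing 'Z'.
            datetime.strptime(trace["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        except (ValueError, TypeError):
            problems.append("timestamp is not in YYYY-MM-DDTHH:MM:SSZ form")
    if "actions" in trace:
        actions = trace["actions"]
        if not isinstance(actions, list) or not actions:
            problems.append("actions must be a non-empty list")
        else:
            for i, action in enumerate(actions):
                if not isinstance(action, dict) or "tool" not in action:
                    problems.append(f"action {i} has no 'tool' field")
    return problems


trace = {
    "@type": "AgentTrace",
    "timestamp": "2026-02-07T14:23:01Z",
    "actions": [
        {"tool": "write_file", "path": "src/fix.c", "lines": 23},
        {"tool": "run_tests", "result": "pass", "exit_code": 0},
    ],
    "signature": "digitally-signed-attestation...",
}
print(check_trace(trace))  # []
```

A shape check like this says nothing about authenticity. It only rejects malformed records early, before signature verification.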

Why traces matter

Unsigned logs can be spoofed. Verifiable traces create audit-ready evidence: what the agent did, when it did it, and which patch was verified. Without signed traces, there is no way to distinguish a legitimate agent action from a tampered record. This distinction matters when regulators or auditors ask for proof of automated changes.

Compliance-ready evidence for SOC 2 and EU AI Act requirements
Tamper-evident records: any change made after signing invalidates the signature
Each trace links to the specific bug and test result
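
To make the difference between an unsigned log and a signed record concrete, here is a minimal sketch of tamper detection. It uses HMAC-SHA256 from the Python standard library as a stand-in for the COSE_Sign1 signature the format uses, so it illustrates the principle rather than the real mechanism. The key, the function names, and the record fields are assumptions.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. HMAC is symmetric; a real deployment would use an
# asymmetric key pair so auditors can verify without being able to sign.
DEMO_KEY = b"demo-key-not-for-production"


def canonical(record: dict) -> bytes:
    """Serialize deterministically so the same record always hashes the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(record: dict, key: bytes = DEMO_KEY) -> str:
    """Return a hex MAC over the canonical form of the record."""
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()


def verify(record: dict, signature: str, key: bytes = DEMO_KEY) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(sign(record, key), signature)


record = {"tool": "run_tests", "result": "pass", "exit_code": 0}
sig = sign(record)
print(verify(record, sig))  # True

tampered = dict(record, result="fail")  # edit the record after signing
print(verify(tampered, sig))  # False
```

The record itself can still be edited. What the signature provides is that the edit is detectable by anyone who re-runs verification.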

Trace fields

Which agent ran and when
Every tool call and output (file edits, commands, results)
Which bug was targeted, what fix was applied, and whether it passed
Digital signature proving the record is authentic

How traces connect to audits

Audit teams need traceable evidence of who changed what, when, and whether the fix passed. XOR traces are signed with COSE_Sign1, an open IETF standard (RFC 9052), and satisfy SOC 2 and ISO 27001 change control requirements. Regulators and auditors (SEC, CISA, and others) increasingly require proof of what AI systems did in production. Traces are that proof, in a format that is cryptographically verifiable and tamper-evident.

SOC 2

Signed traces satisfy audit trail requirements for change control audits.

ISO 27001

Tamper-evident records of who (agent), what (fix), when (timestamp), and outcome (pass/fail).
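
The who/what/when/outcome view can be derived from a trace mechanically. The sketch below is an illustration under assumptions: the top-level "agent" field and the audit_row helper are hypothetical, and the tool names follow the sanitized excerpt shown earlier.

```python
def audit_row(trace: dict) -> dict:
    """Flatten a trace into the four fields a change-control reviewer asks for."""
    actions = trace.get("actions", [])
    tests = [a for a in actions if a.get("tool") == "run_tests"]
    patches = [a for a in actions if a.get("tool") == "verify_patch"]
    return {
        "who": trace.get("agent", "unknown"),  # 'agent' is an assumed field
        "what": patches[-1].get("cve") if patches else None,
        "when": trace.get("timestamp"),
        "outcome": tests[-1].get("result") if tests else None,
    }


example = {
    "@type": "AgentTrace",
    "agent": "example-agent",  # hypothetical identifier
    "timestamp": "2026-02-07T14:23:01Z",
    "actions": [
        {"tool": "write_file", "path": "src/fix.c", "lines": 23},
        {"tool": "run_tests", "result": "pass", "exit_code": 0},
        {"tool": "verify_patch", "cve": "CVE-2024-XXXX", "status": "resolved"},
    ],
}
print(audit_row(example))
```

The last matching action wins in this sketch, on the assumption that a later test run supersedes an earlier one.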

Regulatory audits

Signed traces are the evidence regulators need when they ask for proof of what your AI agents did.

[NEXT STEPS]

1,920 traces and counting

Every test run in the XOR benchmark produces a signed audit log. See verified results from real-world testing.


FAQ

What does a signed agent trace contain?

Tool calls, file edits, reasoning steps, the outcome (pass/fail/error), and a cryptographic signature (COSE_Sign1).

How is the trace signed?

Traces are signed with COSE_Sign1, as defined in IETF RFC 9052. Any later change to a trace invalidates its signature, so traces are tamper-evident and auditors can verify them independently.

Can I export traces?

Yes. Traces export as JSON (human-readable), CBOR (compact), or YAML (audit-log friendly), ready to load into your SIEM or compliance database.
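
As an illustration of the JSON path, the sketch below writes traces as JSON Lines (one object per line), a layout many log pipelines accept. The export_jsonl helper is hypothetical and is not part of any XOR tooling. CBOR and YAML output would need third-party packages such as cbor2 or PyYAML, which are left out here.

```python
import io
import json


def export_jsonl(traces, stream) -> int:
    """Write one compact JSON object per line; return the number written."""
    count = 0
    for trace in traces:
        stream.write(json.dumps(trace, sort_keys=True) + "\n")
        count += 1
    return count


buf = io.StringIO()  # stands in for a file or a SIEM ingestion stream
written = export_jsonl(
    [{"@type": "AgentTrace", "timestamp": "2026-02-07T14:23:01Z"}],
    buf,
)
print(written)  # 1
print(buf.getvalue(), end="")
```

Re-serializing a trace changes its bytes, so a real exporter would need to carry the original signed payload alongside any human-readable rendering.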

Benchmark Results

62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.

Benchmark Methodology

How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.

Agent Configurations

15 agent-model configurations benchmarked on real vulnerabilities. Compare pass rates and costs.

Native CLIs vs wrapper CLIs: the 10-16pp performance gap

Claude CLI vs OpenCode, Gemini CLI vs OpenCode, Codex vs Cursor. Same models, different wrappers, consistent accuracy gaps of 10-16 percentage points.

Cost vs performance: where agents sit on the Pareto frontier

15 agents plotted on cost-accuracy. 4 on the Pareto frontier. Best value: claude-opus-4-6 at $2.93/pass, 61.6%.

See which agents produce fixes that work

128 vulnerabilities. 15 agents. 1,920 evaluations. Agents learn from every run.