Agent Strategies
How different agents approach the same bug. Behavioral clustering reveals that strategy matters as much as model capability.
Strategy clusters
K-means clustering on session features (turns, tool calls, file reads, edits, backtracking) reveals distinct behavioral patterns. Agents cluster by approach, not just by model.
What this means for agent selection
If an agent falls into a low-pass-rate cluster, its behavioral pattern may be the bottleneck - not its model intelligence. Strategy is sometimes more tunable than the model itself.
How AI agents approach the same bug differently
K-means clustering on 10 session features (turns, tool calls, file reads, edits, backtracking) reveals 3 distinct behavioral patterns across 973 sessions. Some agents explore broadly. Others edit fast. The pattern predicts the outcome.
Behavior matters because it tells you why agents succeed or fail - not just whether they do. An agent that reads every file in a project before making a single edit uses more tokens than an agent that reads targeted files. An agent that generates many candidate patches and tests them has different cluster characteristics from one that generates a single patch. These patterns emerge from the data.
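The clustering step above can be sketched in miniature. This is a minimal pure-Python K-means over session feature vectors, using five of the ten named features; the session values are invented for illustration, and a real pipeline would standardize features before clustering so that high-magnitude counts like tool calls do not dominate the distance metric.

```python
import random

# Hypothetical session feature vectors: (turns, tool_calls, file_reads,
# edits, backtracks). Values are illustrative, not benchmark data.
sessions = [
    (12, 30, 25, 3, 4), (11, 28, 22, 2, 5), (40, 90, 70, 8, 15),
    (38, 85, 66, 7, 14), (6, 10, 4, 2, 0), (5, 9, 3, 2, 1),
]

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct sessions
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each session joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Update step: recompute each centroid as its cluster's mean.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(dim) / len(members)
                                           for dim in zip(*members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans(sessions, k=3)
```

Each resulting cluster groups sessions with similar turn, read, and edit profiles, which is how behavioral labels like "explorer" or "speed-runner" can then be attached by inspecting centroids.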
[KEY INSIGHT]
60% pass rate in the best cluster
The cluster with the highest pass rate shares specific behavioral traits. Pass rate correlates with approach - not just model capability.
This is important because it means behavioral tuning can improve outcomes without changing the underlying model. If an agent falls into a low-performing cluster, its behavioral pattern (how it reads, explores, edits) might be the bottleneck. Changing the prompt or agent instructions to shift behavior could improve pass rates more than swapping to a different model.
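Measuring whether a behavioral cluster is the bottleneck reduces to comparing pass rates per cluster. A minimal sketch, assuming each session record carries its assigned cluster label and a pass/fail outcome; the records below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical (cluster_label, passed) records, one per session.
results = [
    ("surgical-expert", True), ("surgical-expert", True),
    ("surgical-expert", False), ("explorer", False),
    ("speed-runner", True), ("speed-runner", False),
]

by_cluster = defaultdict(lambda: [0, 0])  # cluster -> [passes, total]
for cluster, passed in results:
    by_cluster[cluster][0] += int(passed)
    by_cluster[cluster][1] += 1

pass_rates = {c: p / n for c, (p, n) in by_cluster.items()}
```

A large pass-rate gap between clusters that share the same underlying model is the signal that behavior, not capability, is driving the outcome.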
Cluster composition
Each cluster's size, pass rate, and dominant agents. Larger clusters represent the most common behavioral strategy.
Cluster size reveals what approaches agents naturally converge on. If most agents fall into a single cluster, they are using similar strategies. If agents spread across clusters, they take diverse approaches. Diversity can be good (different strengths on different bug types) or bad (no coherent strategy).
speed-runner - 211 sessions
explorer - 25 sessions
surgical-expert - 737 sessions
DPO preference pairs
For each CVE sample, we construct preference pairs from the evaluation outcomes.
Gold pairs
Pass vs build-fail. Strongest signal. The winning patch fixes the vulnerability while the losing patch breaks the build.
Silver pairs
Pass vs test-fail. Medium signal. Both patches compile but only one fixes the bug.
Bronze pairs
Test-fail vs build-fail. Weakest signal. Neither fixes the bug, but one at least compiles.
[NEXT STEPS]
Drill into session-level data
The session analysis page shows per-agent metrics: turns, tool calls, and token usage. The trajectory data lets you trace individual agent decision paths.
Explore more
- Bug complexity - which bugs are easy, which are impossible
- Economics - cost per fix by agent and strategy
FAQ
How do agents differ in their approach?
Some agents explore broadly (read many files, backtrack often). Others edit fast (fewer reads, targeted changes). The approach pattern predicts the outcome.
Can I configure agent strategy?
Often yes. System prompts, tool access, and memory settings influence agent behavior. The best-performing cluster shares specific traits: fewer file reads, targeted edits, less backtracking.
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Agent Cost Economics
Fix vulnerabilities for $2.64–$52 with agents. 100x cheaper than incident response. Real cost data.
Agent Configurations
15 agent-model configurations benchmarked on real vulnerabilities. Compare pass rates and costs.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
Validation Process
25 questions we ran against our own data before publishing. Challenges assumptions, explores implications, extends findings.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.