Cost Analysis
9 findings on what AI patching costs and whether it is worth buying. 1,920 evaluations analyzed.
Should you buy
Break-even against manual fixing ($150/hr) happens at 10 CVEs per year for the cheapest agent. Most teams patch more than that.
Which agent to buy
Four agents sit on the Pareto frontier: the best trade-off between cost and pass rate. Nine agents are dominated: for each, another agent is both cheaper and more accurate.
What AI vulnerability patching costs and whether it is worth it
9 findings from the benchmark economics analysis. 6 are backed by measured token data; 1 relies on heuristic cost estimates with lower confidence; the rest rest on partial measurements.
These findings answer practical questions. What does an agent cost per CVE when you account for failures? Is it cheaper to use the most expensive agent or run multiple cheap agents? When does manual patching beat AI? The data comes from real API logs, measured in production conditions.
[KEY INSIGHT]
3-agent waterfall = 66.9% at $4.22/pass
The top 3 agents cover 66.9% of bugs. Agents 4 through 9 add only 7.4 percentage points more. Diminishing returns hit fast.
This matters because you do not need all 13 agents. Running 3 agents in a waterfall strategy (try the cheapest first, escalate on failure) fixes two-thirds of bugs at lower total cost than running the most expensive single agent. The drop-off after agent 3 is sharp.
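The waterfall's expected cost can be sketched directly. This is a minimal model assuming independent per-agent pass rates; the pass rates and per-eval prices below are illustrative placeholders, not the benchmark's measured per-agent numbers.

```python
# Expected cost per sample for a cheapest-first waterfall dispatch.
# Assumption: each agent's pass rate is independent of the others',
# which real solve-set overlap violates; treat this as a sketch.
def waterfall_cost(agents):
    """agents: list of (pass_rate, cost_per_eval), tried in order."""
    expected_cost = 0.0
    p_reach = 1.0   # probability a sample is still unsolved when this agent runs
    p_solved = 0.0
    for pass_rate, cost in agents:
        expected_cost += p_reach * cost   # we pay whenever the agent runs
        p_solved += p_reach * pass_rate
        p_reach *= (1 - pass_rate)
    return expected_cost, p_solved

# Hypothetical 3-agent waterfall: cheap agent first, escalate on failure.
cost, coverage = waterfall_cost([(0.55, 1.13), (0.20, 2.50), (0.15, 6.00)])
print(f"${cost:.2f}/sample, {coverage:.1%} coverage, ${cost / coverage:.2f}/pass")
```

The useful property this exposes: the expensive third agent only runs on the residue the first two failed, so its high per-eval price is multiplied by a small reach probability.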
Confidence distribution
Each finding rated by evidence quality. High = measured token data from API logs. Medium = partial measurements. Low = heuristic estimates only.
We distinguish between measured and estimated costs so you know which numbers are rock-solid and which ones need more data. Measured means we have actual API response logs showing token counts. Heuristic means we estimated from published rates and context window patterns. Both are published so you can see the methodology.
All 9 findings
Is the $4.85/fix Gemini cost real or an artifact of turn_heuristic?
Verdict
UNCERTAIN — heuristic-only, no measured token data
Gemini cost of $4.85/pass uses turn_heuristic (0 sessions with token data). This assumes 1.96/eval based on agent-average turn counts. Only Claude agents have measured token data (114+122 sessions). The heuristic could over- or under-estimate by 2-5x depending on actual Gemini token usage patterns (...
Are cost/pass efficiency rankings stable if infra failures excluded?
Verdict
UNSTABLE — ranking order changes
Excluding 377 infra failures from denominators changes the cost/pass ranking. Agents with high infra failure rates (claude-claude-opus-4-5) benefit most from exclusion. Top-3 order: claude-claude-opus-4-5, claude-claude-opus-4-6, cursor-composer-1.5.
Optimal sequential dispatch strategy for 3-agent ensemble?
Verdict
Best 3-agent waterfall: claude-claude-opus-4-5 → claude-claude-opus-4-6 → gemini-gemini-3-pro-preview
Simulated all 504 possible 3-agent dispatch orderings across 136 samples. Best strategy: claude-claude-opus-4-5 → claude-claude-opus-4-6 → gemini-gemini-3-pro-preview achieves 66.9% pass rate at $4.22/pass ($2.83/sample avg). Cheapest-3 agents (claude-claude-opus-4-5, claude-claude-opus-4-6, cursor-...
Marginal value of 4th through 9th agent in ensemble?
Verdict
Top-3 cover 66.9%, agents 4-9 add only 7.4pp more
Adding agents in cost-efficiency order: first 3 agents cover 66.9% of solvable samples. Agents 4-9 collectively add 7.4pp coverage at rapidly increasing cost. The marginal value of each additional agent decreases sharply after the 3rd.
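The marginal-coverage calculation is a straightforward set-union pass. The solved-sample sets below are fabricated to illustrate the shape of the curve (they are chosen so agent 3 lands near the finding's 66.9% of 136 samples); the real analysis uses each agent's actual solve set.

```python
# Cumulative and marginal coverage when adding agents in a fixed
# (e.g. cost-efficiency) order. Later agents mostly re-solve samples
# that earlier agents already covered, so marginal gains collapse.
def marginal_coverage(agent_solves, total):
    """agent_solves: list of sets of solved sample ids, in dispatch order."""
    covered, marginals = set(), []
    for solves in agent_solves:
        gained = len(solves - covered)   # samples only this agent adds
        covered |= solves
        marginals.append((gained, len(covered) / total))
    return marginals

# Hypothetical, heavily overlapping solve sets over 136 samples.
agents = [set(range(0, 60)), set(range(40, 80)), set(range(70, 91)),
          set(range(85, 95)), set(range(88, 96))]
for i, (gained, cum) in enumerate(marginal_coverage(agents, 136), 1):
    print(f"agent {i}: +{gained} samples, cumulative {cum:.1%}")
```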
Break-even ROI for early stopping at turn 5-8 at 1000 CVEs/year?
Verdict
Turn-8 cutoff saves ~$4658/run, kills 125 passes
Analyzed 465 sessions. Median turns for passes: 5, for fails: 9. Early stopping at turn 8 saves 4607 turns (~$4658 compute) but kills 125 successful patches (45.1% false positive rate). At 1000 CVEs/year: ~$34248 annual savings.
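The cutoff decision reduces to one inequality: compute saved versus the value of the patches killed. A sketch, with the per-turn price backed out of the finding's own numbers (~$4658 / 4607 turns ≈ $1.01/turn) and the per-fix value left as a parameter you supply (e.g. the $600 manual-fix baseline):

```python
# Break-even check for a turn-8 early-stopping cutoff.
turns_saved = 4607
passes_killed = 125
price_per_turn = 4658 / 4607          # assumption: ~$1.01/turn, derived above
compute_saved = turns_saved * price_per_turn

def net_value(fix_value):
    """Positive means the cutoff pays off at this dollar value per lost fix."""
    return compute_saved - passes_killed * fix_value

print(net_value(10))    # if a lost fix is worth little, the cutoff wins
print(net_value(600))   # at the manual-fix baseline, the cutoff loses badly
```

At $600 per lost fix the 125 killed patches swamp the compute savings, which is why the 45.1% false-positive rate makes this cutoff hard to justify.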
At what infra failure rate does retry cost dominate agent selection?
Verdict
At current 21.8% rate, retries add ~$1885 waste/run
Current infra failure rate is 21.8% (377/1727). For expensive agents (OpenCode-Claude at $22.12/eval), each infra failure wastes $22.12. At >15% infra rate, retry cost for expensive agents exceeds the savings from not using a cheaper agent. For cheap agents (Claude-4.5 at $1.13/eval), infra rate wou...
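A simple retry model makes the threshold concrete. Assuming infra failures are independent per attempt with probability q, the expected attempts until a clean run is 1/(1-q), so expected waste per eval is cost × q/(1-q). This is a sketch of the mechanism, not the finding's exact accounting (which counts observed failures per run):

```python
# Expected wasted spend per eval from infra-failure retries,
# assuming independent failures with probability q per attempt.
def retry_waste(cost_per_eval, infra_rate):
    return cost_per_eval * infra_rate / (1 - infra_rate)

q = 0.218  # measured: 377 / 1727
print(retry_waste(22.12, q))  # expensive agent (OpenCode-Claude)
print(retry_waste(1.13, q))   # cheap agent (Claude-4.5)
```

The waste scales linearly with per-eval cost, which is why the same infra rate that is tolerable for a $1.13 agent becomes a deciding factor for a $22.12 one.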
Build Pareto frontier for production agent selection — which agents are dominated?
Verdict
4 Pareto-optimal agents, 9 dominated
Pareto-optimal agents: cursor-opus-4.6, codex-gpt-5.2, claude-claude-opus-4-6, claude-claude-opus-4-5. Dominated agents (should never be deployed): cursor-gpt-5.3-codex, opencode-gpt-5.2, codex-gpt-5.2-codex, opencode-claude-opus-4-6, cursor-gpt-5.2, cursor-composer-1.5, gemini-gemini-3-pro-preview,...
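The dominance test behind this list is mechanical: an agent is dominated if some other agent costs no more and passes at least as often, with a strict improvement on at least one axis. The costs and pass rates below are placeholders, not the benchmark's measured values.

```python
# Pareto filter over (cost_per_fix, pass_rate): lower cost and higher
# pass rate both count as better.
def pareto_frontier(agents):
    """agents: dict name -> (cost_per_fix, pass_rate). Returns the frontier."""
    frontier = {}
    for name, (cost, rate) in agents.items():
        dominated = any(
            (c <= cost and r >= rate) and (c < cost or r > rate)
            for other, (c, r) in agents.items() if other != name
        )
        if not dominated:
            frontier[name] = (cost, rate)
    return frontier

# Hypothetical agents: "c" is dominated by "b" (cheaper AND more accurate).
agents = {"a": (2.64, 0.55), "b": (5.00, 0.60), "c": (6.00, 0.58), "d": (9.00, 0.62)}
print(sorted(pareto_frontier(agents)))
```

A dominated agent has no deployment scenario in which it is the right pick, which is the practical meaning of "should never be deployed" above.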
Design sequential waterfall dispatch protocol and model expected cost per fix
Verdict
Waterfall achieves 101/136 (74.3%) at $31.85/pass
Full 9-agent waterfall (cheapest first): 101/136 passes (74.3%) at $31.85/pass ($23.65/sample avg). Average 2.0 agents tried per resolved sample. The cheapest agent resolves most samples, with diminishing returns from escalation.
Break-even: automated patching vs manual developer fixes ($150/hr)?
Verdict
ALL agents are cost-effective vs manual ($600/fix). Best: 227.4x cheaper.
At $150/hr × 4hr = $600/manual fix, every agent is cheaper: from $2.64/fix (claude-claude-opus-4-5, 227.4x cheaper) to $76.54/fix (cursor-opus-4.6, 7.8x cheaper). Break-even: 10 CVEs/year justifies infrastructure for the cheapest agent.
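The break-even volume is annual fixed infrastructure cost divided by the per-fix saving over manual. The $5,700/year infrastructure figure below is a hypothetical stand-in chosen to reproduce the finding's roughly 10-CVE break-even for the cheapest agent; substitute your own.

```python
import math

# CVEs/year needed before automated patching beats a $600 manual fix,
# given a fixed annual infrastructure cost.
def breakeven_cves(infra_per_year, agent_cost_per_fix, manual_cost=600.0):
    saving_per_fix = manual_cost - agent_cost_per_fix
    return math.ceil(infra_per_year / saving_per_fix)

print(breakeven_cves(5700, 2.64))    # cheapest agent: ~10 CVEs/year
print(breakeven_cves(5700, 76.54))   # priciest agent: slightly more
```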
[NEXT STEPS]
Use these numbers
Cost rankings, ensemble economics, and Pareto analysis are all on the main economics page. Cost model sources and methodology are documented separately.
Explore more
- Validation process: the 25 questions that produced these findings
FAQ
What does AI vulnerability patching cost?
$2.64 to $52 per verified fix, depending on agent and model. The cheapest agent is 227x cheaper than manual fixing at $150/hr.
How many agents do I need?
Three agents cover 66.9% of bugs. Agents 4 through 9 add only 7.4 percentage points more. Diminishing returns hit fast.
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Agent Cost Economics
Fix vulnerabilities for $2.64–$52 with agents. 100x cheaper than incident response. Real cost data.
Agent Configurations
15 agent-model configurations benchmarked on real vulnerabilities. Compare pass rates and costs.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
Validation Process
25 questions we ran against our own data before publishing. Challenges assumptions, explores implications, extends findings.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.