Agent Cost Economics
Fix vulnerabilities for $2.64–$52 with agents. 100x cheaper than incident response. Real cost data.
Agent Cost Tiers
- Standard agents: $2.64–$15 per fix
- Advanced agents: $15–$45 per fix
- Frontier agents: $30–$52 per fix
Hidden Costs of Incident Response
When a CVE hits production, costs multiply: engineer time, customer notifications, reputation damage, regulatory fines. Fixing in pre-production saves orders of magnitude.
Cost Optimization
Use cheaper agents for easy bugs (syntax errors, refactors). Reserve frontier agents for hard architectural problems. XOR tracks which agent solves which classes of bugs best.
What it costs to fix a bug with AI
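We ran 15 agents across 128 real bugs, at a total of about $11,660 in API costs (the sum of the API Cost column in the rankings table below). The cheapest agent fixes bugs for $2.64 each. The most expensive costs $51.88 per fix, while the most accurate (62.7% pass rate) costs $5.30 per fix. The benchmark is growing to 6,138+ vulnerabilities across 250+ projects.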
Cost matters because you will run this repeatedly. If a bug costs $5 to patch and you run this across 500 vulnerabilities, your spend is $2,500. The same bugs with a $0.50 agent cost $250. These numbers drive real procurement decisions. We measure actual token consumption from API logs, not estimates.
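The arithmetic above can be sketched in a few lines of Python. The function name `projected_spend` is hypothetical, and the sketch assumes every vulnerability is fixed once at a flat cost per fix, which is a simplification.

```python
def projected_spend(cost_per_fix: float, num_vulnerabilities: int) -> float:
    """Total spend if each vulnerability is fixed once at a flat cost per fix."""
    if cost_per_fix < 0 or num_vulnerabilities < 0:
        raise ValueError("cost and count must be non-negative")
    return cost_per_fix * num_vulnerabilities


# The two scenarios from the paragraph above.
print(projected_spend(5.00, 500))  # 2500.0
print(projected_spend(0.50, 500))  # 250.0
```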
Budget with real data
Security ROI = (risk reduced - cost) / cost. These tested cost-per-fix numbers replace guesswork in your budget.
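As a minimal sketch of that formula, the snippet below plugs in illustrative numbers. The function name `security_roi` and both dollar inputs are assumptions for the example, not benchmark results.

```python
def security_roi(risk_reduced: float, cost: float) -> float:
    """ROI = (risk reduced - cost) / cost, with both values in the same currency."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (risk_reduced - cost) / cost


# Hypothetical inputs: $10,000 of expected loss avoided for $2,500 of agent spend.
roi = security_roi(risk_reduced=10_000, cost=2_500)
print(f"{roi:.1f}")  # 3.0
```

An ROI of 3.0 means each dollar spent returns three dollars of reduced risk beyond the dollar itself.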
Cost vs Performance
Each dot is an agent. X-axis: cost per successful patch (log scale). Y-axis: pass rate. The dashed line shows the best trade-off: for every agent off the line, some agent on it is at least as cheap and at least as accurate.
The scatter reveals agent clusters. Some agents cluster together at 40-50% pass rate with similar costs; they are functionally equivalent. But outliers exist: agents that are cheap but miss easy bugs, or expensive but uniquely capable on hard ones. Your choice depends on your budget and your bug distribution.
Pareto Frontier with Confidence Intervals
Cost efficiency frontier with 95% Wald confidence intervals on pass rates. The Pareto frontier identifies agents where no alternative is both cheaper AND more accurate. Every agent on this line represents a genuine trade-off decision: lower cost or higher accuracy, but not both.
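One way to compute such a frontier is sketched below, using the $/Pass and Pass Rate figures from the Cost Efficiency Rankings table. The `Agent` type and `pareto_frontier` function are hypothetical names, and this is an illustration of the definition rather than the exact procedure behind the chart.

```python
from typing import List, NamedTuple


class Agent(NamedTuple):
    name: str
    cost_per_pass: float  # dollars per successful patch
    pass_rate: float      # percent


AGENTS = [
    Agent("claude-claude-opus-4-5", 2.64, 45.7),
    Agent("claude-claude-opus-4-6", 2.93, 61.6),
    Agent("gemini31-gemini-3.1-pro-preview", 3.92, 58.7),
    Agent("cursor-composer-1.5", 3.93, 45.2),
    Agent("gemini-gemini-3-pro-preview", 4.85, 43.0),
    Agent("codex-gpt-5.2", 5.30, 62.7),
    Agent("opencode-gemini-gemini-3.1-pro-preview", 5.81, 54.9),
    Agent("cursor-gpt-5.3-codex", 6.16, 50.4),
    Agent("cursor-gpt-5.2", 6.26, 51.6),
    Agent("codex-gpt-5.2-codex", 6.65, 49.2),
    Agent("opencode-gpt-5.2", 6.65, 51.6),
    Agent("opencode-gpt-5.2-codex", 8.73, 37.8),
    Agent("cursor-opus-4.6", 35.40, 62.5),
    Agent("opencode-claude-opus-4-5", 40.13, 36.8),
    Agent("opencode-claude-opus-4-6", 51.88, 47.5),
]


def pareto_frontier(agents: List[Agent]) -> List[Agent]:
    """Keep agents that no cheaper-or-equal agent matches or beats on pass rate."""
    frontier = []
    best_rate = float("-inf")
    # Walk from cheapest to most expensive; on a cost tie, visit the
    # higher pass rate first so the weaker one is dropped.
    for agent in sorted(agents, key=lambda a: (a.cost_per_pass, -a.pass_rate)):
        if agent.pass_rate > best_rate:
            frontier.append(agent)
            best_rate = agent.pass_rate
    return frontier


for agent in pareto_frontier(AGENTS):
    print(agent.name, agent.cost_per_pass, agent.pass_rate)
```

With the table's figures and this definition, three agents remain: claude-claude-opus-4-5, claude-claude-opus-4-6, and codex-gpt-5.2.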
Confidence intervals show the statistical range around each agent's pass rate. Wider intervals indicate greater uncertainty, typically from agents with lower sample counts or pass rates near 50%, where a Wald interval is widest. Agents on the frontier with tight confidence intervals are more reliable choices than those with wide bands.
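For reference, a 95% Wald interval on a pass rate is p ± 1.96 · sqrt(p(1 − p) / n). The sketch below uses made-up counts (64 passes out of 128, then 16 out of 32) to show how a smaller sample widens the band; the function name `wald_interval` is hypothetical.

```python
import math
from typing import Tuple


def wald_interval(passes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wald confidence interval for a pass rate, clamped to [0, 1]."""
    if n <= 0 or not 0 <= passes <= n:
        raise ValueError("need n > 0 and 0 <= passes <= n")
    p = passes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Illustrative counts only: same 50% pass rate, different sample sizes.
lo, hi = wald_interval(passes=64, n=128)
print(f"{lo:.3f} to {hi:.3f}")  # 0.413 to 0.587

lo_small, hi_small = wald_interval(passes=16, n=32)
print(hi_small - lo_small > hi - lo)  # True: fewer samples, wider interval
```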
Oracle Set Cover
Greedy set cover showing marginal value of adding each agent to the ensemble. This analysis answers a practical question: if you can run multiple agents on the same bug, which ones should you add to maximize coverage? Start with the best agent (highest pass rate) and add agents that fix bugs the leader misses.
The first agent covers maybe 60% of bugs. Adding the second agent might bring you to 72%. Adding a third might hit 78%. But at some point, the marginal gain from each new agent drops below the cost. This visualization helps you find the optimal ensemble size for your budget.
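A minimal sketch of greedy set cover follows. The agent names and bug IDs are toy data invented for the example (the per-bug results are not shown on this page), and `greedy_set_cover` is a hypothetical helper, not XOR's implementation.

```python
from typing import Dict, List, Optional, Set, Tuple


def greedy_set_cover(
    solved_by: Dict[str, Set[int]], max_agents: Optional[int] = None
) -> List[Tuple[str, int, int]]:
    """Return (agent, marginal gain, cumulative coverage) in greedy order."""
    covered: Set[int] = set()
    remaining = dict(solved_by)
    order: List[Tuple[str, int, int]] = []
    while remaining and (max_agents is None or len(order) < max_agents):
        # Pick the agent that fixes the most bugs not yet covered.
        name = max(remaining, key=lambda n: len(remaining[n] - covered))
        gain = len(remaining[name] - covered)
        if gain == 0:
            break  # no remaining agent adds coverage
        covered |= remaining.pop(name)
        order.append((name, gain, len(covered)))
    return order


# Toy data: which bug IDs each hypothetical agent fixed.
toy_results = {
    "agent_a": {1, 2, 3, 4, 5, 6},
    "agent_b": {4, 5, 6, 7, 8},
    "agent_c": {1, 2, 9},
    "agent_d": {2, 3},
}

for name, gain, total in greedy_set_cover(toy_results):
    print(name, gain, total)
```

In the toy data the gains shrink from 6 to 2 to 1, and the fourth agent adds nothing, which is the diminishing-returns pattern described above. Comparing each marginal gain against that agent's cost per run gives a stopping rule for ensemble size.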
Cost Efficiency Rankings
| Rank | Agent | $/Pass | API Cost | Pass Rate | Passes |
|---|---|---|---|---|---|
| 1 | claude-claude-opus-4-5 | $2.64 | $153 | 45.7% | 58 |
| 2 | claude-claude-opus-4-6 | $2.93 | $225 | 61.6% | 77 |
| 3 | gemini31-gemini-3.1-pro-preview | $3.92 | $251 | 58.7% | 64 |
| 4 | cursor-composer-1.5 | $3.93 | $224 | 45.2% | 57 |
| 5 | gemini-gemini-3-pro-preview | $4.85 | $267 | 43.0% | 55 |
| 6 | codex-gpt-5.2 | $5.30 | $419 | 62.7% | 79 |
| 7 | opencode-gemini-gemini-3.1-pro-preview | $5.81 | $389 | 54.9% | 67 |
| 8 | cursor-gpt-5.3-codex | $6.16 | $394 | 50.4% | 64 |
| 9 | cursor-gpt-5.2 | $6.26 | $394 | 51.6% | 63 |
| 10 | codex-gpt-5.2-codex | $6.65 | $419 | 49.2% | 63 |
| 11 | opencode-gpt-5.2 | $6.65 | $419 | 51.6% | 63 |
| 12 | opencode-gpt-5.2-codex | $8.73 | $419 | 37.8% | 48 |
| 13 | cursor-opus-4.6 | $35.40 | $2832 | 62.5% | 80 |
| 14 | opencode-claude-opus-4-5 | $40.13 | $1846 | 36.8% | 46 |
| 15 | opencode-claude-opus-4-6 | $51.88 | $3009 | 47.5% | 58 |
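The $/Pass column appears to be API Cost divided by Passes. The check below reproduces four rows under that assumption; small differences are expected because the API Cost column is rounded to whole dollars. The helper name `cost_per_pass` is hypothetical.

```python
def cost_per_pass(api_cost: float, passes: int) -> float:
    """Dollars of API spend per successful patch."""
    if passes <= 0:
        raise ValueError("passes must be positive")
    return api_cost / passes


# (agent, API cost in dollars, passes, $/Pass as listed in the table)
ROWS = [
    ("claude-claude-opus-4-5", 153, 58, 2.64),
    ("claude-claude-opus-4-6", 225, 77, 2.93),
    ("codex-gpt-5.2", 419, 79, 5.30),
    ("opencode-claude-opus-4-6", 3009, 58, 51.88),
]

for name, api_cost, passes, listed in ROWS:
    computed = cost_per_pass(api_cost, passes)
    # Allow a couple of cents of slack for the rounded API Cost column.
    print(f"{name}: computed {computed:.2f}, listed {listed:.2f}")
```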
Optimize your agent spend
The cheapest path: claude-claude-opus-4-5 at $2.64/fix. The most accurate: codex-gpt-5.2 at a 62.7% pass rate and $5.30/fix. The most expensive: opencode-claude-opus-4-6 at $51.88/fix. For most teams, the best pair covers 96/128 bugs.
Explore more
- Agent leaderboard: pass rates and rankings
- Agent profiles: how agents differ, where they agree
- How verification works: isolation, safety checks, bug reproduction
FAQ
How much does an agent fix cost?
$2.64 to $52 depending on agent and model. Calculated from real API costs across 1,920 evaluations.
Why such a wide range?
Different agents have different API costs (Claude vs Codex vs Gemini). Different bugs require different reasoning depth. Some agents solve in one attempt; others need multiple tries.
How does this compare to incident response?
Incident response for a critical CVE typically costs $10K–$50K in engineer time and downtime. Agent-based pre-production fixing costs dollars, which makes it 100x–1000x cheaper.
What if the agent fails?
Failed fixes still provide learning signals. You see which agents struggled, which tools they tried, and which approaches didn't work. No wasted money, just data.
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Agent Configurations
15 agent-model configurations benchmarked on real vulnerabilities. Compare pass rates and costs.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
Validation Process
25 questions we ran against our own data before publishing. Challenges assumptions, explores implications, extends findings.
Cost Analysis
10 findings on what AI patching costs and whether it is worth buying. 1,920 evaluations analyzed.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.