[COST ANALYSIS]

Cost Analysis

9 findings on what AI patching costs and whether it is worth buying, drawn from 1,920 evaluations.

Should you buy

Break-even against manual fixing ($150/hr) happens at 10 CVEs per year for the cheapest agent. Most teams patch more than that.

Which agent to buy

Four agents sit on the Pareto frontier, meaning no other agent beats them on both cost and pass rate. Nine agents are dominated: for each of them, another agent is both cheaper and more accurate.

Economics findings: 9
High confidence: 6
Cheapest fix: $2.64
Savings vs manual fixing: 227x

What AI vulnerability patching costs and whether it is worth it

9 findings from the benchmark economics analysis. 6 are backed by measured token data, 2 by partial measurements, and 1 relies on heuristic cost estimates with lower confidence.

These findings answer practical questions. What does an agent cost per CVE when you account for failures? Is it cheaper to use the most expensive agent or run multiple cheap agents? When does manual patching beat AI? The data comes from real API logs, measured in production conditions.

[KEY INSIGHT]

3-agent waterfall = 66.9% at $4.22/pass

The top 3 agents cover 66.9% of bugs. Agents 4 through 9 add only 7.4 percentage points more. Diminishing returns hit fast.

This matters because you do not need all 13 agents. Running 3 agents in a waterfall strategy (try the cheapest first, escalate on failure) fixes two-thirds of bugs at lower total cost than running the most expensive single agent. The drop-off after agent 3 is sharp.

Confidence distribution

Each finding rated by evidence quality. High = measured token data from API logs. Medium = partial measurements. Low = heuristic estimates only.

We distinguish between measured and estimated costs so you know which numbers are rock-solid and which ones need more data. Measured means we have actual API response logs showing token counts. Heuristic means we estimated from published rates and context window patterns. Both are published so you can see the methodology.

High: 6
Medium: 2
Low: 1

All 9 findings

#1 (low confidence)

Is the $4.85/fix Gemini cost real or artifact of turn_heuristic?

Verdict

UNCERTAIN — heuristic-only, no measured token data

The Gemini cost of $4.85/pass uses turn_heuristic (0 sessions with token data). This assumes $1.96/eval based on agent-average turn counts. Only Claude agents have measured token data (114+122 sessions). The heuristic could over- or under-estimate by 2-5x depending on actual Gemini token usage patterns (...

#2 (high confidence)

Are cost/pass efficiency rankings stable if infra failures excluded?

Verdict

UNSTABLE — ranking order changes

Excluding 377 infra failures from denominators changes the cost/pass ranking. Agents with high infra failure rates (claude-claude-opus-4-5) benefit most from exclusion. Top-3 order: claude-claude-opus-4-5, claude-claude-opus-4-6, cursor-composer-1.5.

#3 (high confidence)

Optimal sequential dispatch strategy for 3-agent ensemble?

Verdict

Best 3-agent waterfall: claude-claude-opus-4-5 → claude-claude-opus-4-6 → gemini-gemini-3-pro-preview

Simulated all 504 possible 3-agent dispatch orderings across 136 samples. Best strategy: claude-claude-opus-4-5 → claude-claude-opus-4-6 → gemini-gemini-3-pro-preview achieves 66.9% pass rate at $4.22/pass ($2.83/sample avg). Cheapest-3 agents (claude-claude-opus-4-5, claude-claude-opus-4-6, cursor-...
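The count of 504 orderings is consistent with picking 3 of 9 agents where order matters (9 × 8 × 7). A minimal sketch of such an exhaustive search follows. The function names, the toy data, and the ranking rule (most passes first, then lowest total cost) are illustrative assumptions, not the benchmark's actual code or selection criterion.

```python
from itertools import permutations

def run_waterfall(order, cost_per_eval, solved_by, samples):
    """Try agents in order on each sample and stop at the first pass."""
    total_cost, passes = 0.0, 0
    for sample in samples:
        for agent in order:
            total_cost += cost_per_eval[agent]  # every attempt is billed
            if sample in solved_by[agent]:
                passes += 1
                break
    return passes, total_cost

def best_ordering(agents, cost_per_eval, solved_by, samples, k=3):
    """Score every k-agent ordering; assumed rule: more passes, then lower cost."""
    best = None
    for order in permutations(agents, k):
        passes, cost = run_waterfall(order, cost_per_eval, solved_by, samples)
        key = (-passes, cost)
        if best is None or key < best[0]:
            best = (key, order, passes, cost)
    return best[1], best[2], best[3]

# 9 agents taken 3 at a time, order matters
n_orderings = sum(1 for _ in permutations(range(9), 3))

# Hypothetical toy data: three agents, five samples (sample 5 is unsolved)
toy_cost = {"a": 1.0, "b": 2.0, "c": 5.0}
toy_solved = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2, 3, 4}}
toy_samples = [1, 2, 3, 4, 5]
order, passes, cost = best_ordering(list(toy_cost), toy_cost, toy_solved, toy_samples)
```

On the toy data every ordering reaches the same four passes, so the search reduces to finding the cheapest escalation path, which is why trying the cheap agents first wins.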

#4 (high confidence)

Marginal value of 4th through 9th agent in ensemble?

Verdict

Top-3 cover 66.9%, agents 4-9 add only 7.4pp more

Adding agents in cost-efficiency order: first 3 agents cover 66.9% of solvable samples. Agents 4-9 collectively add 7.4pp coverage at rapidly increasing cost. The marginal value of each additional agent decreases sharply after the 3rd.
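Marginal coverage of this kind can be computed by walking the agents in a fixed order and counting only the samples not already solved. The sketch below assumes a hypothetical `solved_by` mapping from agent name to the set of samples it passed; the toy values are made up for illustration.

```python
def marginal_coverage(order, solved_by, n_samples):
    """Return (agent, cumulative coverage %, marginal gain in pp) per agent."""
    covered = set()
    rows = []
    for agent in order:
        before = len(covered)
        covered |= solved_by[agent]  # only previously unsolved samples add coverage
        gain_pp = 100.0 * (len(covered) - before) / n_samples
        rows.append((agent, 100.0 * len(covered) / n_samples, gain_pp))
    return rows

# Hypothetical toy data: agent "c" solves nothing that "a" and "b" missed
toy_solved = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 4}}
rows = marginal_coverage(["a", "b", "c"], toy_solved, n_samples=5)

# Consistency check on the reported figures: top-3 coverage plus the
# gain from agents 4-9 should equal the full 9-agent waterfall pass rate.
full_coverage = round(66.9 + 7.4, 1)
```

The reported numbers line up: 66.9% plus 7.4pp gives the 74.3% pass rate quoted for the full 9-agent waterfall.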

#5 (medium confidence)

Break-even ROI for early stopping at turn 5-8 at 1000 CVEs/year?

Verdict

Turn-8 cutoff saves ~$4,658/run, kills 125 passes

Analyzed 465 sessions. Median turns for passes: 5, for fails: 9. Early stopping at turn 8 saves 4,607 turns (~$4,658 compute) but kills 125 successful patches (45.1% false positive rate). At 1,000 CVEs/year: ~$34,248 annual savings.
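The trade-off behind an early-stopping cutoff can be tallied from per-session logs. The sketch below is an assumption about how such a count might work, not the benchmark's method: it treats any session that ran past the cutoff as truncated, and any pass that needed more turns than the cutoff as lost. The session tuples are hypothetical.

```python
def early_stop_tradeoff(sessions, cutoff):
    """sessions: list of (turns_used, passed). Returns (turns_saved, passes_killed)."""
    # Turns beyond the cutoff are never executed, so they count as savings.
    turns_saved = sum(max(0, turns - cutoff) for turns, _ in sessions)
    # A passing session that needed more than `cutoff` turns would have been cut off.
    passes_killed = sum(1 for turns, passed in sessions if passed and turns > cutoff)
    return turns_saved, passes_killed

# Hypothetical sessions: (turns used, whether the patch passed)
toy_sessions = [(5, True), (9, False), (12, True), (8, True), (20, False)]
saved, killed = early_stop_tradeoff(toy_sessions, cutoff=8)
```

Multiplying the saved turns by a per-turn cost gives the compute saving, which then has to be weighed against the value of the killed passes.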

#6 (medium confidence)

At what infra failure rate does retry cost dominate agent selection?

Verdict

At current 21.8% rate, retries add ~$1885 waste/run

Current infra failure rate is 21.8% (377/1727). For expensive agents (OpenCode-Claude at $22.12/eval), each infra failure wastes $22.12. At >15% infra rate, retry cost for expensive agents exceeds the savings from not using a cheaper agent. For cheap agents (Claude-4.5 at $1.13/eval), infra rate wou...
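One way to see why infra failures hurt expensive agents more is to model the expected spend wasted per completed evaluation. The sketch below assumes independent failures and retrying until an attempt completes, which gives p / (1 - p) wasted attempts on average; this model is an illustration and not necessarily how the ~$1885 figure was derived.

```python
def expected_retry_waste(cost_per_eval, infra_failure_rate):
    """Expected dollars lost to infra failures per completed evaluation.

    Assumes each attempt fails independently with probability p and is
    retried until it completes, so wasted attempts average p / (1 - p).
    """
    if not 0 <= infra_failure_rate < 1:
        raise ValueError("infra_failure_rate must be in [0, 1)")
    return cost_per_eval * infra_failure_rate / (1 - infra_failure_rate)

rate = 377 / 1727  # infra failure rate reported above, about 21.8%

# Per-eval costs quoted above for an expensive and a cheap agent
waste_expensive = expected_retry_waste(22.12, rate)
waste_cheap = expected_retry_waste(1.13, rate)
```

Under this model the waste scales linearly with per-eval cost, so the same failure rate costs the $22.12 agent roughly twenty times more per evaluation than the $1.13 agent.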

#7 (high confidence)

Build Pareto frontier for production agent selection — which agents are dominated?

Verdict

4 Pareto-optimal agents, 9 dominated

Pareto-optimal agents: cursor-opus-4.6, codex-gpt-5.2, claude-claude-opus-4-6, claude-claude-opus-4-5. Dominated agents (should never be deployed): cursor-gpt-5.3-codex, opencode-gpt-5.2, codex-gpt-5.2-codex, opencode-claude-opus-4-6, cursor-gpt-5.2, cursor-composer-1.5, gemini-gemini-3-pro-preview,...
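A Pareto filter over cost and pass rate can be sketched in a few lines. The version below uses the standard weak-dominance rule (another agent is at least as good on both axes and strictly better on one), which is an assumption about the exact rule used here. The agent names and numbers are hypothetical, not benchmark results.

```python
def pareto_frontier(agents):
    """agents: dict of name -> (cost_per_pass, pass_rate). Returns non-dominated names."""
    frontier = []
    for name, (cost, rate) in agents.items():
        dominated = any(
            o_cost <= cost and o_rate >= rate and (o_cost < cost or o_rate > rate)
            for other, (o_cost, o_rate) in agents.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    # Sort cheapest first so the frontier reads as an escalation ladder
    return sorted(frontier, key=lambda n: agents[n][0])

# Hypothetical agents: "bad" costs more than "mid" and passes less often
toy_agents = {
    "cheap": (2.0, 0.40),
    "mid": (10.0, 0.55),
    "pricey": (50.0, 0.60),
    "bad": (12.0, 0.50),
}
frontier = pareto_frontier(toy_agents)
```

Every agent left on the frontier buys extra pass rate for extra cost; anything off the frontier can be replaced by a strictly better option.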

#8 (high confidence)

Design sequential waterfall dispatch protocol and model expected cost per fix

Verdict

Waterfall achieves 101/136 (74.3%) at $31.85/pass

Full 9-agent waterfall (cheapest first): 101/136 passes (74.3%) at $31.85/pass ($23.65/sample avg). Average 2.0 agents tried per resolved sample. The cheapest agent resolves most samples, with diminishing returns from escalation.
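The per-pass and per-sample figures are two views of the same total spend: cost per pass equals the average cost per sample times the number of samples, divided by the number of passes. A short check of the reported numbers, with a helper name chosen here for illustration:

```python
def cost_per_pass(avg_cost_per_sample, n_samples, n_passes):
    """Total spend across all samples divided by the number of passing samples."""
    if n_passes <= 0:
        raise ValueError("n_passes must be positive")
    return avg_cost_per_sample * n_samples / n_passes

# Figures reported for the full 9-agent waterfall
full_waterfall = round(cost_per_pass(23.65, 136, 101), 2)
pass_rate_pct = round(100 * 101 / 136, 1)
```

The gap between $23.65 per sample and $31.85 per pass is the cost of the 35 samples that no agent resolved, spread across the 101 that passed.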

#9 (high confidence)

Break-even: automated patching vs manual developer fixes ($150/hr)?

Verdict

ALL agents are cost-effective vs manual ($600/fix). Best: 227.4x cheaper.

At $150/hr × 4hr = $600/manual fix, every agent is cheaper: from $2.64/fix (claude-claude-opus-4-5, 227.4x cheaper) to $76.54/fix (cursor-opus-4.6, 7.8x cheaper). Break-even: 10 CVEs/year justifies infrastructure for the cheapest agent.
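The savings multiples follow directly from dividing the manual cost by the agent cost per fix. The sketch below reproduces that arithmetic and adds a break-even helper; the fixed infrastructure cost passed to `break_even_cves` is a hypothetical input, since the page does not state the figure behind the 10 CVEs/year threshold.

```python
import math

MANUAL_COST_PER_FIX = 150 * 4  # $150/hr x 4 hr = $600, as stated above

def savings_multiple(agent_cost_per_fix, manual_cost=MANUAL_COST_PER_FIX):
    """How many times cheaper the agent is than a manual fix."""
    return manual_cost / agent_cost_per_fix

def break_even_cves(fixed_cost, agent_cost_per_fix, manual_cost=MANUAL_COST_PER_FIX):
    """Fixes per year needed before per-fix savings cover a fixed setup cost."""
    saving_per_fix = manual_cost - agent_cost_per_fix
    if saving_per_fix <= 0:
        raise ValueError("agent is not cheaper than manual fixing")
    return math.ceil(fixed_cost / saving_per_fix)

cheapest = savings_multiple(2.64)   # about 227x; the page's 227.4x likely uses an unrounded cost
priciest = savings_multiple(76.54)  # about 7.8x
```

For example, `break_even_cves(5000, 100)` returns 10 for a hypothetical $5,000 setup cost and a $100 agent cost per fix.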


[NEXT STEPS]

Use these numbers

Cost rankings, ensemble economics, and Pareto analysis are all on the main economics page. Cost model sources and methodology are documented separately.


FAQ

What does AI vulnerability patching cost?

$2.64 to $76.54 per verified fix, depending on agent and model. The cheapest agent is 227x cheaper than a $600 manual fix (4 hours at $150/hr).

How many agents do I need?

Three agents cover 66.9% of bugs. Agents 4 through 9 add only 7.4 percentage points more. Diminishing returns hit fast.

[RELATED TOPICS]

See which agents produce fixes that work

128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.