[COMPARISON]

Cost vs performance: where agents sit on the Pareto frontier

15 agents plotted on cost-accuracy. 4 on the Pareto frontier. Best value: claude-opus-4-6 at $2.93/pass, 61.6%.

[PARETO ANALYSIS]

Pareto frontier: 4 agents dominate the rest

When cost is plotted against accuracy, only 4 of 15 agents sit on the Pareto frontier. All others are dominated by at least one frontier agent: that agent is at least as accurate and at least as cheap, and strictly better on one of the two.

The frontier agents represent the efficient trade-offs:

Agent             Model      Pass rate   Cost per fix   Frontier?
Claude Opus 4.5   Opus 4.5   45.7%       $2.64          Yes
Claude Opus 4.6   Opus 4.6   61.6%       $2.93          Yes
Gemini 3.1 Pro    3.1 Pro    58.7%       $3.92          Yes
Codex GPT-5.2     GPT-5.2    62.7%       $5.30          Yes

Everything else is economically dominated. OpenCode with Opus 4.6 costs $51.88 per fix at 47.5% accuracy. Claude Opus 4.6 achieves 61.6% at $2.93. That's 14.1 percentage points higher accuracy for about 5.6% of the cost ($2.93 / $51.88).

Below the frontier

The 11 non-frontier agents fall into two categories: same-accuracy-worse-cost or worse-accuracy-same-cost.

Cursor with Opus 4.6 (62.5% at $35.40) nearly matches Codex's accuracy (62.7% at $5.30) but costs 6.7 times more. The accuracy is there. The economics aren't.

OpenCode with GPT-5.2 (51.6% at $6.65) and Cursor with GPT-5.2 (51.6% at $6.26) both underperform Codex by 11.1 percentage points. If you need GPT-5.2, use the native Codex CLI, not the wrappers.

Gemini 3.0 (43.0% at $4.85) is dominated by Gemini 3.1 Pro (58.7% at $3.92). The 3.1 upgrade is 15.7 percentage points better at lower cost, one of the largest model-to-model improvements in the benchmark.

Cursor with Composer 1.5 (45.2% at $3.93) is beaten by Claude Opus 4.5 (45.7% at $2.64) on accuracy with lower cost, and by Claude Opus 4.6 (61.6% at $2.93) by a wide margin.
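The pairwise comparisons in this section can be checked mechanically. The snippet below is a minimal sketch, assuming the usual two-axis definition of Pareto dominance (at least as good on pass rate and cost per fix, strictly better on one). The `Agent` type and the `dominates` helper are illustrative names, not part of the benchmark's tooling; the figures are copied from this page.

```python
from typing import NamedTuple


class Agent(NamedTuple):
    name: str
    pass_rate: float     # percent of CVEs fixed
    cost_per_fix: float  # US dollars per successful fix


def dominates(a: Agent, b: Agent) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.pass_rate >= b.pass_rate and a.cost_per_fix <= b.cost_per_fix
    strictly_better = a.pass_rate > b.pass_rate or a.cost_per_fix < b.cost_per_fix
    return no_worse and strictly_better


# Figures as quoted on this page.
opus_45 = Agent("Claude Opus 4.5", 45.7, 2.64)
opus_46 = Agent("Claude Opus 4.6", 61.6, 2.93)
gemini_31 = Agent("Gemini 3.1 Pro", 58.7, 3.92)
codex = Agent("Codex GPT-5.2", 62.7, 5.30)
gemini_30 = Agent("Gemini 3.0", 43.0, 4.85)
cursor_composer = Agent("Cursor Composer 1.5", 45.2, 3.93)
cursor_opus46 = Agent("Cursor Opus 4.6", 62.5, 35.40)
cursor_gpt52 = Agent("Cursor GPT-5.2", 51.6, 6.26)
opencode_gpt52 = Agent("OpenCode GPT-5.2", 51.6, 6.65)
opencode_opus46 = Agent("OpenCode Opus 4.6", 47.5, 51.88)

# (dominating agent, dominated agent) pairs named in the text.
pairs = [
    (opus_46, opencode_opus46),
    (codex, cursor_opus46),
    (codex, opencode_gpt52),
    (codex, cursor_gpt52),
    (gemini_31, gemini_30),
    (opus_45, cursor_composer),
    (opus_46, cursor_composer),
]

for winner, loser in pairs:
    print(f"{winner.name} dominates {loser.name}: {dominates(winner, loser)}")
```

Every pair named in the text passes this check. Applied strictly to the four frontier rows, the same check also reports Claude Opus 4.6 as dominating Gemini 3.1 Pro (higher pass rate at lower cost), so the page's frontier presumably relies on a criterion beyond these two columns.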

The value zone

Everything on the frontier costs $2.64 to $5.30 per fix. Anything above $6 per fix is economically dominated by frontier agents.

With a $100 budget:

  • Claude Opus 4.5: fixes 37.9 CVEs
  • Claude Opus 4.6: fixes 34.1 CVEs
  • Gemini 3.1 Pro: fixes 25.5 CVEs
  • Codex GPT-5.2: fixes 18.9 CVEs
  • OpenCode Opus 4.6: fixes 1.9 CVEs

The cost differences determine throughput more than the accuracy differences. Claude Opus 4.6 (61.6% accurate) and Codex GPT-5.2 (62.7% accurate) are nearly identical on accuracy, but Claude Opus 4.6 fixes more CVEs per dollar because each fix costs less ($2.93 vs. $5.30).
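The budget figures above are the budget divided by each agent's cost per fix. A minimal sketch of that arithmetic follows; `fixes_for_budget` and `COST_PER_FIX` are illustrative names, and the costs are the ones quoted on this page.

```python
def fixes_for_budget(budget_usd: float, cost_per_fix: float) -> float:
    """Expected number of successful fixes a budget buys at a given cost per fix."""
    return budget_usd / cost_per_fix


# Cost per successful fix in US dollars, as quoted on this page.
COST_PER_FIX = {
    "Claude Opus 4.5": 2.64,
    "Claude Opus 4.6": 2.93,
    "Gemini 3.1 Pro": 3.92,
    "Codex GPT-5.2": 5.30,
    "OpenCode Opus 4.6": 51.88,
}

for name, cost in COST_PER_FIX.items():
    print(f"{name}: fixes {fixes_for_budget(100, cost):.1f} CVEs")
```

Because cost per fix already accounts for failed attempts, pass rate does not enter this calculation a second time.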

Beyond the frontier

No agent sits above the frontier. The highest-accuracy agent (Codex GPT-5.2 at 62.7%) is also on the frontier at $5.30 per fix. You cannot buy higher accuracy by spending more money.

This suggests the benchmark has hit a capability wall around 62.7%. None of the 15 configurations exceed that. Two others come within about 1 percentage point (Cursor Opus 4.6 at 62.5%, Claude Opus 4.6 at 61.6%), but none surpass the leader.

Model upgrades matter. The Gemini 3.0 to 3.1 jump shows a 15.7pp improvement, and the frontier table shows a similar step from Claude Opus 4.5 (45.7%) to Claude Opus 4.6 (61.6%), a 15.9pp gain.

Decision framework

If accuracy is paramount: Use Codex GPT-5.2 (62.7% pass rate). It's the highest point on the frontier.

If cost matters equally: Use Claude Opus 4.6 (61.6% pass rate at $2.93 per fix). It is only 1.1 percentage points behind the leader, while Codex GPT-5.2 costs 1.8 times as much per fix.

If budget is tight: Use Claude Opus 4.5 (45.7% pass rate at $2.64 per fix). Cheapest frontier option, lowest pass rate, but still beats all non-frontier agents on value.

If you're evaluating a new model: Check whether it sits on the frontier or below it. If below, it's dominated. No amount of tuning will change that until the underlying model improves.

If you need multi-model deployment: Run each model's native CLI. Don't use wrappers. The 10-16 percentage point accuracy gap is too high a price for the added flexibility.
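The first three rules above reduce to one question: what is the cheapest frontier agent that meets a required pass rate? The snippet below is a rough sketch of that selection, not an official tool. The `min_pass_rate` threshold and the `cheapest_meeting` helper are hypothetical names introduced here for illustration; the frontier rows are copied from the table on this page.

```python
from typing import NamedTuple, Optional


class FrontierAgent(NamedTuple):
    name: str
    pass_rate: float     # percent of CVEs fixed
    cost_per_fix: float  # US dollars per successful fix


# The four frontier rows as listed on this page.
FRONTIER = [
    FrontierAgent("Claude Opus 4.5", 45.7, 2.64),
    FrontierAgent("Claude Opus 4.6", 61.6, 2.93),
    FrontierAgent("Gemini 3.1 Pro", 58.7, 3.92),
    FrontierAgent("Codex GPT-5.2", 62.7, 5.30),
]


def cheapest_meeting(min_pass_rate: float) -> Optional[FrontierAgent]:
    """Return the cheapest frontier agent whose pass rate meets the threshold.

    min_pass_rate is a hypothetical requirement supplied by the caller.
    Returns None when no listed agent reaches it.
    """
    candidates = [a for a in FRONTIER if a.pass_rate >= min_pass_rate]
    if not candidates:
        return None
    return min(candidates, key=lambda a: a.cost_per_fix)


for threshold in (0.0, 60.0, 62.0, 70.0):
    choice = cheapest_meeting(threshold)
    label = choice.name if choice else "no agent on this page qualifies"
    print(f"pass rate >= {threshold}%: {label}")
```

With no accuracy requirement this picks Claude Opus 4.5, a 60% requirement picks Claude Opus 4.6, and a 62% requirement leaves only Codex GPT-5.2, matching the three rules. A threshold above 62.7% returns nothing, which is the capability wall described earlier.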

See Benchmark results | Native vs wrapper analysis | Economics analysis | Codex GPT-5.2 profile

FAQ

Which agent has the best cost-accuracy tradeoff?

Claude Opus 4.6 at $2.93/pass and 61.6% pass rate sits on the Pareto frontier. Gemini 3.1 Pro at $3.92/pass and 58.7% is the next-best option.

[RELATED TOPICS]

See which agents produce fixes that work

128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.