OpenCode Claude Opus 4.6 — CVE-Agent-Bench profile
47.5% pass rate at $51.88 per fix, the most expensive per fix in the benchmark. Opus 4.6 via OpenCode, 136 evaluations.
OpenCode Claude Opus 4.6
Claude Opus 4.6, deployed via the OpenCode wrapper, achieves a 47.5% pass rate across 136 CVE patch evaluations. The agent produced 58 successful patches, 15 failures, 49 build errors, and 14 infrastructure timeouts. At $51.88 per successful patch, it is the most expensive agent in the benchmark.
This agent achieves the second-lowest actual failure rate (15 failures) among all benchmarked agents, but it also has the highest per-patch cost and a high build-failure count (49 of 136).
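The outcome counts can be turned into the headline rates with a few lines of Python. Note that 58 of 136 is 42.6%, not 47.5%; the published figure matches 58 of 122, which suggests (an inference from the numbers, not a documented rule) that infrastructure timeouts are excluded from the pass-rate denominator. The sketch below makes that assumption explicit:

```python
# Outcome counts reported for OpenCode + Claude Opus 4.6 (136 evaluations).
outcomes = {"pass": 58, "fail": 15, "build": 49, "infra": 14}
total = sum(outcomes.values())


def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# ASSUMPTION: infrastructure timeouts are not scored, so they are dropped
# from the pass-rate denominator. This reproduces the published 47.5%.
scored = total - outcomes["infra"]
pass_rate = pct(outcomes["pass"], scored)

# For comparison: the pass rate if infra timeouts counted as non-passes.
pass_rate_all_runs = pct(outcomes["pass"], total)

# The reported build-failure and failure rates use all 136 runs.
build_rate = pct(outcomes["build"], total)
fail_rate = pct(outcomes["fail"], total)

print(f"evaluations={total} scored={scored}")
print(f"pass rate={pass_rate}% (all runs: {pass_rate_all_runs}%)")
print(f"build-failure rate={build_rate}% failure rate={fail_rate}%")
```

If the denominator assumption is wrong, only `pass_rate` changes; the build-failure (36.0%) and failure (11.0%) rates quoted later in this profile are computed over all 136 runs either way.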
Behavioral profile
On the benchmark's behavioral scores, Speed 100 indicates that the agent generates patches with minimal conversation turns, and Efficiency 0 reflects the highest cost per patch in the benchmark. Precision 100 means every patch output contains substantive code changes, with no hallucinated or padding content.
The speed-focused approach may contribute to the high build-failure count: the agent moves fast and appears to skip environment verification steps.
Model upgrade effect
Upgrading from Opus 4.5 (36.8%) to Opus 4.6 (47.5%) via OpenCode yields a 10.7-percentage-point improvement. Build failures barely change (50 to 49), while actual failures nearly halve (29 to 15).
The model upgrade reduces incorrect patches but does not fix the wrapper's build-failure issue. The OpenCode environment setup remains problematic regardless of model version.
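Percentage points and relative change tell different stories about this upgrade. A short sketch using only the Opus 4.5 and Opus 4.6 figures given in this profile:

```python
# OpenCode deployments of Opus 4.5 and Opus 4.6 (figures from this profile).
opus_45 = {"pass_rate": 36.8, "build": 50, "fail": 29}
opus_46 = {"pass_rate": 47.5, "build": 49, "fail": 15}


def reduction(before: int, after: int) -> float:
    """Percentage reduction from `before` to `after`, one decimal place."""
    return round(100 * (before - after) / before, 1)


# Absolute gain in percentage points (pp).
pp_gain = round(opus_46["pass_rate"] - opus_45["pass_rate"], 1)
# The same gain expressed relative to the old pass rate.
relative_gain = round(100 * pp_gain / opus_45["pass_rate"], 1)

build_drop = reduction(opus_45["build"], opus_46["build"])
fail_drop = reduction(opus_45["fail"], opus_46["fail"])

print(f"+{pp_gain} pp ({relative_gain}% relative)")
print(f"build failures -{build_drop}%, actual failures -{fail_drop}%")
```

Read this way, the 10.7-point gain is roughly a 29% relative improvement, and actual failures fall by about 48% while build failures fall by only 2%.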
Comparison with CLI deployment
The same Opus 4.6 model achieves dramatically different results by deployment method:
- Claude CLI: 61.6% pass rate at $2.93 per patch
- Cursor wrapper: 62.5% pass rate at $35.40 per patch
- OpenCode wrapper: 47.5% pass rate at $51.88 per patch
The OpenCode wrapper loses 14.1 percentage points compared to Claude CLI while costing 17.7 times more per successful patch. Claude CLI is better on both pass rate and cost, so it is strictly superior to the OpenCode deployment.
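The deployment comparison can be expressed as a small dominance check: one deployment dominates another if it has a higher pass rate and a lower cost per successful patch. A sketch using the three deployment figures for Opus 4.6 (the `Deployment` type and `dominates` helper are illustrative names, not part of the benchmark tooling):

```python
from typing import NamedTuple


class Deployment(NamedTuple):
    pass_rate: float     # percent
    cost_per_fix: float  # USD per successful patch


# Claude Opus 4.6 under three deployments (figures from this profile).
deployments = {
    "Claude CLI": Deployment(61.6, 2.93),
    "Cursor": Deployment(62.5, 35.40),
    "OpenCode": Deployment(47.5, 51.88),
}


def dominates(a: Deployment, b: Deployment) -> bool:
    """True if `a` beats `b` on both pass rate and cost per fix."""
    return a.pass_rate > b.pass_rate and a.cost_per_fix < b.cost_per_fix


opencode = deployments["OpenCode"]
comparison = {}
for name, d in deployments.items():
    if name == "OpenCode":
        continue
    gap_pp = round(d.pass_rate - opencode.pass_rate, 1)
    cost_ratio = round(opencode.cost_per_fix / d.cost_per_fix, 1)
    comparison[name] = (gap_pp, cost_ratio, dominates(d, opencode))
    print(f"{name}: +{gap_pp} pp, {cost_ratio}x cheaper, "
          f"dominates OpenCode={comparison[name][2]}")
```

Both Claude CLI and Cursor dominate OpenCode on these two measures. Neither dominates the other: Cursor has the slightly higher pass rate, Claude CLI the far lower cost.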
Second-lowest failure rate
Only 15 actual failures from 136 evaluations means that when this agent produces a patch that builds, the patch is usually correct: 58 of the 73 patches that built passed (about 79%). The failure rate of 11.0% (15 of 136) is the second-lowest in the benchmark.
This agent either succeeds or runs into build or infrastructure issues; only 15 of its 78 non-passing runs are incorrect patches. The bottleneck is execution environment compatibility, not reasoning quality.
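Conditioning the failure count on patches that actually built gives a clearer picture of correctness. This sketch assumes every run labeled a failure is a patch that compiled and then failed validation (an assumption about how outcomes are labeled, not something the profile states):

```python
# Outcome counts for OpenCode + Claude Opus 4.6.
passed, failed, build_errors, infra_timeouts = 58, 15, 49, 14

# ASSUMPTION: "failed" runs are patches that compiled and then failed
# validation, so passed + failed is the set of patches that built.
built = passed + failed
correct_given_built = round(100 * passed / built, 1)

# Among runs that did not pass, what share ended in build or infra
# problems rather than an incorrect patch?
non_passing = failed + build_errors + infra_timeouts
environment_share = round(100 * (build_errors + infra_timeouts) / non_passing, 1)

print(f"{passed}/{built} built patches passed ({correct_given_built}%)")
print(f"{build_errors + infra_timeouts}/{non_passing} non-passing runs were "
      f"build or infra issues ({environment_share}%)")
```

Roughly one in five built patches is still wrong, so the agent's profile is better summarized as "most non-passing runs are environmental" (about 81%) than as near-perfect correctness.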
Precision and reliability trade-off
Precision is maxed at 100 (all patches are substantive). Reliability is 55 (moderate environmental issues). This combination indicates a capable model hampered by wrapper infrastructure.
Build failure analysis
49 build failures from 136 evaluations is a 36.0% build-failure rate. The minimal improvement from Opus 4.5 (50 failures) suggests the root cause is not model reasoning but OpenCode environment setup.
Possible causes:
- Missing dependencies in the container environment
- Incomplete tool support for repository build systems
- Time limits cutting patch generation short
- Weaker error recovery from compilation failures
Cost analysis
At $51.88 per successful patch, this agent costs more than any alternative, including deployments of the same model with higher pass rates. For the cost of one OpenCode Opus 4.6 fix, you could obtain about 17.7 successful Claude CLI Opus 4.6 fixes (61.6% pass rate).
At scale, this cost is prohibitive unless multi-model flexibility is essential.
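Because cost per successful patch already folds in the cost of non-passing runs, it can be used for rough budget planning. The sketch below assumes that cost per fix is total spend divided by passes and that it stays constant as volume grows, which is a simplification; `budget_for` and `fixes_for` are illustrative helpers, not benchmark tooling:

```python
# Cost per successful patch for Claude Opus 4.6 by deployment (USD).
COST_PER_FIX = {"Claude CLI": 2.93, "Cursor": 35.40, "OpenCode": 51.88}


def budget_for(fixes: int, deployment: str) -> float:
    """Approximate spend for a target number of successful patches.

    ASSUMPTION: spend scales linearly with the number of fixes.
    """
    return round(fixes * COST_PER_FIX[deployment], 2)


def fixes_for(budget: float, deployment: str) -> int:
    """Whole successful patches a budget buys, under the same assumption."""
    return int(budget // COST_PER_FIX[deployment])


# Spend implied by this benchmark run: 58 successful OpenCode patches.
print(budget_for(58, "OpenCode"))
# Whole Claude CLI fixes purchasable for the price of one OpenCode fix.
print(fixes_for(COST_PER_FIX["OpenCode"], "Claude CLI"))
# Budget for 100 fixes under each deployment.
for name in COST_PER_FIX:
    print(name, budget_for(100, name))
```

Under these assumptions, the 58 OpenCode passes imply roughly $3,009 of spend, and 100 fixes would cost about $5,188 via OpenCode versus about $293 via Claude CLI.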
When OpenCode Opus 4.6 is appropriate
Use this agent only if all of the following are true:
- Multi-model support is a hard requirement
- Build failures are acceptable or recoverable in your workflow
- Per-patch cost is not a constraint
- You are not already using Claude CLI or Cursor
For any single-model strategy, Claude CLI is superior.
Upgrade paths
If you are using OpenCode Opus 4.6:
- Switch to Claude CLI Opus 4.6 (61.6% vs 47.5%, 17.7x cheaper): recommended for single-model workloads
- Switch to Cursor Opus 4.6 (62.5% pass rate, $35.40 per patch): if you want an alternative to Claude CLI, at roughly 12 times its per-patch cost
- Stay with OpenCode and accept the cost: only if multi-model flexibility justifies the premium
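The selection criteria and upgrade paths above amount to a simple decision rule. A hypothetical encoding (the `Requirements` fields and `recommend` function are illustrative names, not part of any tool):

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of a team's constraints."""
    multi_model_required: bool
    build_failures_recoverable: bool
    cost_unconstrained: bool
    already_on_cli_or_cursor: bool
    wants_cursor: bool = False  # prefers Cursor over Claude CLI


def recommend(req: Requirements) -> str:
    """Apply this profile's guidance for Claude Opus 4.6 deployments."""
    # OpenCode only makes sense when all four conditions hold.
    if (req.multi_model_required
            and req.build_failures_recoverable
            and req.cost_unconstrained
            and not req.already_on_cli_or_cursor):
        return "OpenCode Opus 4.6"
    # Otherwise switch: Cursor if preferred, else Claude CLI (cheapest per fix).
    if req.wants_cursor:
        return "Cursor Opus 4.6"
    return "Claude CLI Opus 4.6"


print(recommend(Requirements(False, True, True, False)))
print(recommend(Requirements(True, True, True, False)))
```

Failing any single condition routes away from OpenCode, which reflects the "all of the following" wording of the criteria.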
Learn more
View the full benchmark results and agent leaderboard to see all 15 agents ranked by pass rate. Read the Anthropic lab profile for context on all Anthropic models. Compare wrapper impacts on cost and accuracy across Claude CLI, Cursor, and OpenCode.
FAQ
What is the cost/accuracy tradeoff for OpenCode Opus 4.6?
A 47.5% pass rate across 136 evaluations at $51.88 per fix, the highest cost in the benchmark. Outcomes: 58 passes, 15 failures, 49 build errors, and 14 infrastructure timeouts.
What is the cost per successful patch for OpenCode Claude Opus 4.6?
$51.88 per successful patch, the highest cost in the entire benchmark. The same model costs $2.93 via Claude CLI (17.7x cheaper) and $35.40 via Cursor (1.5x cheaper). No other agent-model configuration in the benchmark costs more per successful patch.
How does OpenCode Claude Opus 4.6 compare to other agents?
47.5% pass rate is near the 47.3% benchmark average, but the cost is far above average. Only 15 actual failures from 136 evaluations gives it the second-lowest failure rate. The bottleneck is 49 build failures, not model reasoning. Claude CLI Opus 4.6 at 61.6% is strictly better for single-model deployments.