OpenCode Claude Opus 4.5 — CVE-Agent-Bench profile
36.8% pass rate at $40.13 per fix. Same Opus 4.5 via OpenCode wrapper. 136 evaluations.
Claude Opus 4.5, deployed via the OpenCode wrapper, achieves a 36.8% pass rate across 136 CVE patch evaluations. The agent produced 46 successful patches, 29 failures, 50 build errors, and 11 infrastructure timeouts. At $40.13 per successful patch, it is among the most expensive agents in the benchmark.
The 50 build failures amount to a 36.8% build-failure rate, the highest in the benchmark. The cost per patch is 15.2 times that of the same model deployed through Claude CLI.
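The headline rates can be reproduced from the four outcome counts. The following is a minimal sketch in Python; the dictionary keys and the `rate` helper are illustrative names, not part of the benchmark's tooling. One assumption deserves a flag: the profile does not say which denominator the pass rate uses. The reported 36.8% matches 46 passes over the 125 evaluations that did not hit an infrastructure timeout, whereas 46 over all 136 gives 33.8%. The build-failure rate matches 50 over all 136.

```python
# Outcome counts as reported in this profile.
OUTCOMES = {"pass": 46, "fail": 29, "build_error": 50, "infra_timeout": 11}


def rate(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


total = sum(OUTCOMES.values())
# Assumption: infrastructure timeouts are excluded from the pass-rate denominator.
non_infra = total - OUTCOMES["infra_timeout"]

pass_rate_all = rate(OUTCOMES["pass"], total)            # passes over every evaluation
pass_rate_non_infra = rate(OUTCOMES["pass"], non_infra)  # passes over non-timeout evaluations
build_failure_rate = rate(OUTCOMES["build_error"], total)

print(total, non_infra, pass_rate_all, pass_rate_non_infra, build_failure_rate)
```

The two 36.8% figures therefore appear to coincide by accident of rounding: they share a value but, under this reading, not a denominator.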
Behavioral profile
The behavioral scores describe a speed-runner agent that reaches conclusions quickly with minimal exploration. A speed score of 100 means it used the fewest conversation turns; a breadth score of 0 means it made the fewest tool invocations. The agent takes the shortest path to a patch, which often ends in a build failure.
Despite the low accuracy score, precision is maxed at 100. When patches succeed, they contain actual substantive code changes, not padding or hallucinations.
Wrapper penalty analysis
The same Opus 4.5 model achieves 45.7% pass rate at $2.64 per patch when deployed through Claude CLI. Through the OpenCode wrapper, it achieves only 36.8% at $40.13 per patch.
This is a 15.2x cost multiplier for an 8.9 percentage point accuracy loss. The OpenCode wrapper does not replicate the Claude CLI's tool calling optimization, environment setup, or error recovery.
The wrapper penalty is the dominant factor in this agent's poor performance. The model itself is competent. The deployment infrastructure is the bottleneck.
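The multiplier and the accuracy gap follow directly from the four reported figures. The sketch below recomputes them in Python and adds a hypothetical `budget_for` helper that projects spend for a target number of successful patches. That projection assumes cost per fix scales linearly with volume, which the benchmark does not claim.

```python
# Figures as reported in this profile.
CLAUDE_CLI = {"pass_rate": 45.7, "cost_per_fix": 2.64}
OPENCODE = {"pass_rate": 36.8, "cost_per_fix": 40.13}

# Ratio of cost per successful patch, and the pass-rate gap in percentage points.
cost_multiplier = round(OPENCODE["cost_per_fix"] / CLAUDE_CLI["cost_per_fix"], 1)
accuracy_gap_pp = round(CLAUDE_CLI["pass_rate"] - OPENCODE["pass_rate"], 1)


def budget_for(fixes: int, cost_per_fix: float) -> float:
    """Projected spend for a number of successful patches (hypothetical helper).

    Assumes cost per fix stays constant as volume grows.
    """
    return round(fixes * cost_per_fix, 2)


print(cost_multiplier, accuracy_gap_pp)
print(budget_for(100, OPENCODE["cost_per_fix"]), budget_for(100, CLAUDE_CLI["cost_per_fix"]))
```

Under that linear assumption, 100 successful patches would cost about $4,013 through OpenCode versus $264 through Claude CLI.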
Build failure root causes
50 out of 136 evaluations end in build failures. This is the highest rate in the benchmark. Possible causes include:
- Repository environment setup mismatches (missing dependencies, version conflicts)
- Incomplete tool support in the OpenCode wrapper (fewer system commands available)
- Aggressive time limits that cut patch generation short
- Weaker error handling for compilation errors
The OpenCode wrapper's speed-first design may be terminating patch attempts before the agent completes dependency resolution or environment configuration.
Accuracy dimension at zero
The accuracy score of 0 is the lowest in the benchmark. It reflects the combination of 29 incorrect patches and 50 build failures that block validation. When the agent works quickly, it misses context that would prevent wrong patches.
Precision preserved
Despite an accuracy score of 0, precision remains at 100. Every patch that compiles contains real code changes. The agent does not produce padding, explanations, or empty diffs. This precision is valuable in mixed portfolios: if you filter by compilation success, the remaining patches are substantive.
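Filtering by compilation success can be made concrete with the outcome counts reported earlier. The sketch below assumes that the 46 passes and 29 failures are exactly the patches that compiled and reached validation, and that build errors and infrastructure timeouts never did. The resulting figure is derived here for illustration and is not reported by the benchmark; `pass_rate_given_build` is a hypothetical helper name.

```python
# Outcome counts as reported in this profile.
PASSES, FAILS, BUILD_ERRORS, INFRA_TIMEOUTS = 46, 29, 50, 11


def pass_rate_given_build(passes: int, fails: int) -> float:
    """Pass rate among patches that compiled and reached validation, in percent."""
    built = passes + fails
    if built == 0:
        raise ValueError("no patches reached validation")
    return round(100 * passes / built, 1)


built_patches = PASSES + FAILS
print(built_patches, pass_rate_given_build(PASSES, FAILS))
```

Under these assumptions, 75 patches reach validation and roughly 61% of them pass, which is the number a portfolio that discards non-compiling patches would effectively see.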
Multi-model advantage
The primary reason to use OpenCode despite its benchmark disadvantage is multi-model support. OpenCode is a single CLI that works with multiple models (Claude, GPT, Gemini, Llama, and others). If your workflow requires flexibility across model providers, OpenCode offers that integration.
For single-model use cases focused on Claude, Claude CLI delivers better performance at lower cost.
When OpenCode Opus 4.5 is appropriate
Use this agent only if:
- Multi-model flexibility is non-negotiable, and
- Build failures are acceptable or manageable in your environment, and
- You are not cost-sensitive
For most single-model workloads, Claude CLI Opus 4.5 is superior.
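The three conditions above combine with a logical AND, which can be written as a small decision function. This is a sketch of the profile's guidance only; the function and parameter names are hypothetical, and the fallback simply returns the profile's default recommendation.

```python
def recommend_deployment(needs_multi_model: bool,
                         tolerates_build_failures: bool,
                         cost_sensitive: bool) -> str:
    """Return the deployment this profile's guidance points to.

    OpenCode is suggested only when all three conditions hold; otherwise the
    profile's default (Claude CLI) is returned.
    """
    if needs_multi_model and tolerates_build_failures and not cost_sensitive:
        return "OpenCode Opus 4.5"
    return "Claude CLI Opus 4.5"


print(recommend_deployment(True, True, False))
print(recommend_deployment(True, True, True))
```

A team that needs multi-model support but fails one of the other conditions would still have to weigh that trade-off itself; the function only encodes the rule as stated.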
Upgrade path
If you are using OpenCode Opus 4.5 and want better results, either switch to Claude CLI Opus 4.5 (8.9pp improvement, roughly 15x cost reduction) or switch to OpenCode Opus 4.6 (10.7pp improvement, same cost).
Learn more
Compare this agent against all evaluated agents in the benchmark. Read the Anthropic lab profile to understand Anthropic's model positioning. Explore wrapper impacts on performance across all three Claude deployment methods.
FAQ
Why does OpenCode Opus 4.5 underperform?
It reaches a 36.8% pass rate versus 45.7% via Claude CLI, so the OpenCode wrapper carries an 8.9pp penalty. Across 136 evaluations, the outcomes were 46 passes, 29 failures, 50 build errors, and 11 infrastructure timeouts.
What is the cost per successful patch for OpenCode Claude Opus 4.5?
$40.13 per successful patch, 15.2 times more than Claude CLI Opus 4.5 ($2.64). The OpenCode wrapper adds overhead from API translation and environment setup. For most workloads, Claude CLI is strictly superior.
How does OpenCode Claude Opus 4.5 compare to other agents?
Its 36.8% pass rate is below the 47.3% benchmark average. It has the highest build-failure rate in the benchmark: 50 out of 136 evaluations. The same model via Claude CLI achieves a pass rate 8.9 percentage points higher at roughly 15x lower cost. Use OpenCode only if multi-model CLI flexibility is mandatory.