[AGENT]

OpenCode Claude Opus 4.5 — CVE-Agent-Bench profile

36.8% pass rate at $40.13 per fix. Same Opus 4.5 via OpenCode wrapper. 128 evaluations.

[ANTHROPIC]

Opencode Claude Opus 4.5

36.8%

Pass Rate

$40.13

Cost per Pass

136

Total Evals

opencode

CLI

Outcome Distribution

Pass (46)

Fail (29)

Build (50)

Infra (11)

Claude Opus 4.5, deployed via the OpenCode wrapper, achieves 36.8% pass rate across 128 CVE patch evaluations. The agent produced 46 successful patches, 29 failures, 50 build errors, and 11 infrastructure timeouts. At $40.13 per successful patch, it is among the most expensive agents in the benchmark.

The 50 build failures represent a 36.8% build-failure rate. Highest in the benchmark. The cost per patch is 15.2 times higher than the same model deployed through Claude CLI.

Behavioral profile

This profile indicates a speed-runner agent that reaches conclusions quickly but with minimal exploration. Speed 100 means fewest conversation turns. Breadth 0 means minimal tool invocation. The agent attempts to generate patches in the shortest path, which often leads to build failures.

Despite the low accuracy score, precision is maxed at 100. When patches succeed, they contain actual substantive code changes, not padding or hallucinations.

Wrapper penalty analysis

The same Opus 4.5 model achieves 45.7% pass rate at $2.64 per patch when deployed through Claude CLI. Through the OpenCode wrapper, it achieves only 36.8% at $40.13 per patch.

This is a 15.2x cost multiplier for 8.9 percentage point accuracy loss. The OpenCode wrapper does not replicate the Claude CLI's tool calling optimization, environment setup, or error recovery.

The wrapper penalty is the dominant factor in this agent's poor performance. The model itself is competent. The deployment infrastructure is the bottleneck.

Build failure root causes

50 out of 128 evaluations end in build failures. This is the highest rate in the benchmark. Possible causes include:

Repository environment setup mismatches (missing dependencies, version conflicts)
Incomplete tool support in the OpenCode wrapper (fewer system commands available)
Aggressive time limits that cut patch generation short
Weaker error handling for compilation errors

The OpenCode wrapper's speed-first design may be terminating patch attempts before the agent completes dependency resolution or environment configuration.

Accuracy dimension at zero

The accuracy 0 score is the lowest in the benchmark. This reflects the combination of 29 actual failures (incorrect patches) plus 50 build failures that block validation. When the agent attempts to work quickly, it misses context that would prevent wrong patches.

Precision preserved

Despite accuracy 0, precision remains at 100. Every patch that compiles contains real code changes. The agent does not produce padding, explanations, or empty diffs. This precision is valuable in mixed portfolios. If you filter by compilation success, the remaining patches are substantive.

Multi-model advantage

The primary reason to use OpenCode despite benchmark disadvantage is multi-model support. OpenCode is a single CLI that works with multiple models (Claude, GPT, Gemini, Llama, etc.). If your workflow requires flexibility across model providers, OpenCode offers that integration.

For single-model use cases focused on Claude, the CLI delivers better performance at lower cost.

When OpenCode Opus 4.5 is appropriate

Use this agent only if:

Multi-model flexibility is non-negotiable, and
Build failures are acceptable or handleable in your environment, and
You are not cost-sensitive

For most single-model workloads, Claude CLI Opus 4.5 is superior.

Upgrade path

If you are using OpenCode Opus 4.5 and want to improve, switch to Claude CLI Opus 4.5 (8.9pp improvement, 15x cost reduction) or switch to OpenCode Opus 4.6 (10.7pp improvement, same cost).

Learn more

Compare this agent against all evaluated agents in the benchmark. Read the Anthropic lab profile to understand Anthropic's model positioning. Explore wrapper impacts on performance across all three Claude deployment methods.

FAQ

Why does OpenCode Opus 4.5 underperform?

36.8% pass rate vs 45.7% via Claude CLI. OpenCode wrapper adds 8.9pp penalty. 46 passes, 29 fails, 50 build, 11 infra on 128 CVEs.

What is the cost per successful patch for OpenCode Claude Opus 4.5?

$40.13 per successful patch, 15.2 times more than Claude CLI Opus 4.5 ($2.64). The OpenCode wrapper adds overhead from API translation and environment setup. For most workloads, Claude CLI is strictly superior.

How does OpenCode Claude Opus 4.5 compare to other agents?

36.8% pass rate is below the 47.3% benchmark average. It has the highest build-failure rate in the benchmark: 50 out of 128 evaluations. The same model via Claude CLI achieves 8.9 percentage points higher accuracy at 15x lower cost. Use OpenCode only if multi-model CLI flexibility is mandatory.

[RELATED TOPICS]

Benchmark Results

62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.

Benchmark Methodology

How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.

Agent Configurations

15 agent-model configurations benchmarked on real vulnerabilities. Compare pass rates and costs.

Native CLIs vs wrapper CLIs: the 10-16pp performance gap

Claude CLI vs OpenCode, Gemini CLI vs OpenCode, Codex vs Cursor. Same models, different wrappers, consistent accuracy gaps of 10-16 percentage points.

Cost vs performance: where agents sit on the Pareto frontier

15 agents plotted on cost-accuracy. 4 on the Pareto frontier. Best value: claude-opus-4-6 at $2.93/pass, 61.6%.

Claude Opus 4.5 — CVE-Agent-Bench profile

45.7% pass rate at $2.64 per fix. Anthropic model via Claude Code CLI. 128 real CVEs evaluated.

See which agents produce fixes that work

128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.