[AGENT]

OpenCode Claude Opus 4.6 — CVE-Agent-Bench profile

47.5% pass rate at $51.88 per fix. Most expensive per fix. Opus 4.6 via OpenCode. 136 evaluations.

[ANTHROPIC]

OpenCode Claude Opus 4.6

Pass Rate: 47.5%
Cost per Pass: $51.88
Total Evals: 136
CLI: opencode

Outcome Distribution

  • Pass: 58
  • Fail: 15
  • Build: 49
  • Infra: 14

Claude Opus 4.6, deployed via the OpenCode wrapper, achieves a 47.5% pass rate on the CVE patch benchmark. Across 136 evaluations the agent produced 58 successful patches, 15 failures, 49 build errors, and 14 infrastructure timeouts. The 47.5% figure matches 58 passes out of the 122 evaluations that did not end in an infrastructure timeout. At $51.88 per successful patch, it is the most expensive agent in the benchmark.

This agent has the second-lowest actual failure rate (15 failures) among all benchmarked agents, but also the highest per-patch cost and a high build-failure count (49 of 136).
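The rates quoted in this profile can be reproduced from the four outcome counts. The sketch below is illustrative only: the `Outcomes` class and its property names are hypothetical, and it assumes that infrastructure timeouts are excluded from the pass-rate denominator, which is the reading that reproduces the published 47.5%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcomes:
    """Outcome counts for one agent (hypothetical helper, not benchmark code)."""
    passed: int
    failed: int
    build: int
    infra: int

    @property
    def total(self) -> int:
        return self.passed + self.failed + self.build + self.infra

    @property
    def scored(self) -> int:
        # Assumption: infra timeouts do not count against the pass rate.
        return self.total - self.infra


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


opencode_opus_46 = Outcomes(passed=58, failed=15, build=49, infra=14)

pass_rate = pct(opencode_opus_46.passed, opencode_opus_46.scored)     # 58 / 122
failure_rate = pct(opencode_opus_46.failed, opencode_opus_46.total)   # 15 / 136
build_rate = pct(opencode_opus_46.build, opencode_opus_46.total)      # 49 / 136

print(pass_rate, failure_rate, build_rate)
```

Under that assumption the three values come out as 47.5, 11.0, and 36.0, matching the figures used throughout this page.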

Behavioral profile

[Figure: agent personality radar chart with axes Accuracy, Speed, Efficiency, Precision, Breadth, Reliability]

Speed 100 indicates the agent generates patches with minimal conversation turns. Efficiency 0 reflects the highest cost-per-patch in the benchmark. Precision 100 means every patch output contains substantive code changes, with no hallucinations or padding.

The speed-focused approach explains the high build-failure count: the agent moves fast but skips environment verification steps.

Model upgrade effect

Upgrading from Opus 4.5 (36.8%) to 4.6 (47.5%) via OpenCode yields a 10.7 percentage point improvement. Build failures drop minimally (50 to 49). Actual failures drop substantially (29 to 15).

The model upgrade strengthens analytical reasoning but does not fix the wrapper's build-failure issue. The OpenCode environment setup remains problematic regardless of model version.
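A short sketch of the upgrade arithmetic, using only the figures quoted above. The dictionary keys and the helper name `relative_drop_pct` are hypothetical and chosen for illustration.

```python
# Figures quoted in this section for the two model versions via OpenCode.
opus_45 = {"pass_rate": 36.8, "failures": 29, "build_failures": 50}
opus_46 = {"pass_rate": 47.5, "failures": 15, "build_failures": 49}


def relative_drop_pct(before: int, after: int) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return round(100 * (before - after) / before, 1)


# Absolute change in pass rate, in percentage points.
pass_rate_gain = round(opus_46["pass_rate"] - opus_45["pass_rate"], 1)

failure_drop = relative_drop_pct(opus_45["failures"], opus_46["failures"])
build_drop = relative_drop_pct(opus_45["build_failures"], opus_46["build_failures"])

print(pass_rate_gain, failure_drop, build_drop)
```

Actual failures fall by roughly half while build failures fall by about 2%, which is the asymmetry the paragraph above attributes to the wrapper rather than the model.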

Comparison with CLI deployment

The same Opus 4.6 model achieves dramatically different results by deployment method:

  • Claude CLI: 61.6% pass rate at $2.93 per patch
  • Cursor wrapper: 62.5% pass rate at $35.40 per patch
  • OpenCode wrapper: 47.5% pass rate at $51.88 per patch

The OpenCode wrapper loses 14.1 percentage points compared to the CLI while costing 17.7 times more per patch. The CLI deployment is strictly superior to OpenCode on both pass rate and cost.
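The gap and cost multiple can be checked directly from the three deployment figures. This is a minimal sketch; the `DEPLOYMENTS` table and the `compare` helper are hypothetical names, and the numbers are the ones listed above.

```python
# (pass rate in %, cost per successful patch in USD) for Opus 4.6 per deployment.
DEPLOYMENTS = {
    "Claude CLI": (61.6, 2.93),
    "Cursor": (62.5, 35.40),
    "OpenCode": (47.5, 51.88),
}


def compare(baseline: str, other: str) -> tuple[float, float]:
    """Return (pass-rate gap in points, cost multiple) of `other` vs `baseline`."""
    base_rate, base_cost = DEPLOYMENTS[baseline]
    other_rate, other_cost = DEPLOYMENTS[other]
    gap_points = round(base_rate - other_rate, 1)
    cost_multiple = round(other_cost / base_cost, 1)
    return gap_points, cost_multiple


cli_vs_opencode = compare("Claude CLI", "OpenCode")
cursor_vs_opencode = compare("Cursor", "OpenCode")

print(cli_vs_opencode)     # gap and cost multiple relative to the CLI
print(cursor_vs_opencode)  # gap and cost multiple relative to Cursor
```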

Second-lowest failure rate

Only 15 of 136 evaluations ended in an actual failure, a failure rate of 11.0% and the second-lowest in the benchmark. Of the 73 evaluations that ended in either a pass or an actual failure, 58 passed, so when this agent generates a patch that compiles, the patch is usually correct.

This agent either succeeds or hits build or infrastructure issues; it rarely produces incorrect patches. The bottleneck is execution environment compatibility, not reasoning quality.

Precision and reliability trade-off

Precision is maxed at 100 (all patches are substantive). Reliability is 55 (moderate environmental issues). This combination indicates a capable model hampered by wrapper infrastructure.

Build failure analysis

49 build failures from 136 evaluations is a 36.0% build-failure rate. The minimal improvement from Opus 4.5 (50 failures) suggests the root cause is not model reasoning but OpenCode environment setup.

Possible causes:

  • Missing dependencies in the container environment
  • Incomplete tool support for repository build systems
  • Time limits cutting patch generation short
  • Weaker error recovery from compilation failures

Cost analysis

At $51.88 per successful patch, this agent costs more than any other agent in the benchmark. For the cost of one successful OpenCode Opus 4.6 patch, Claude CLI Opus 4.6 (61.6% pass rate, $2.93 per successful patch) delivers roughly 17.7 successful patches.

For operations at scale, this cost is prohibitive unless multi-model flexibility is essential.
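To make the scale argument concrete, the sketch below projects the per-patch costs onto a fixed budget. It assumes cost per successful patch stays constant as volume grows, which the benchmark does not claim; the function names and the $1,000 budget are hypothetical.

```python
# Cost per successful patch (USD) for Opus 4.6, as quoted on this page.
COST_PER_PASS_USD = {
    "OpenCode": 51.88,
    "Cursor": 35.40,
    "Claude CLI": 2.93,
}


def patches_within_budget(budget_usd: float, cost_per_pass_usd: float) -> int:
    """Whole successful patches a budget buys, assuming constant per-patch cost."""
    return int(budget_usd // cost_per_pass_usd)


def budget_for_patches(n_patches: int, cost_per_pass_usd: float) -> float:
    """Projected spend for n successful patches, assuming constant per-patch cost."""
    return round(n_patches * cost_per_pass_usd, 2)


# Hypothetical budget of $1,000.
per_budget = {
    name: patches_within_budget(1000, cost)
    for name, cost in COST_PER_PASS_USD.items()
}

# Hypothetical target of 100 successful patches.
per_target = {
    name: budget_for_patches(100, cost)
    for name, cost in COST_PER_PASS_USD.items()
}

print(per_budget)
print(per_target)
```

Under these assumptions, $1,000 buys 19 successful patches through OpenCode, 28 through Cursor, and 341 through the Claude CLI.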

When OpenCode Opus 4.6 is appropriate

Use this agent only if all of the following are true:

  1. Multi-model support is a hard requirement
  2. Build failures are acceptable or recoverable in your workflow
  3. Per-patch cost is not a constraint
  4. You are not already using Claude CLI or Cursor

For any single-model strategy, the CLI is superior.
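The checklist above is a plain conjunction: every condition must hold. A minimal sketch of that rule follows; the function and parameter names are hypothetical and simply mirror the four list items.

```python
def opencode_opus_46_is_appropriate(
    multi_model_required: bool,
    build_failures_recoverable: bool,
    cost_unconstrained: bool,
    already_on_cli_or_cursor: bool,
) -> bool:
    """Return True only if all four conditions from the checklist hold."""
    return (
        multi_model_required
        and build_failures_recoverable
        and cost_unconstrained
        and not already_on_cli_or_cursor
    )


# A team that needs multi-model support, tolerates build failures,
# has no cost ceiling, and is not yet on Claude CLI or Cursor.
fits = opencode_opus_46_is_appropriate(True, True, True, False)

# Same team, but with a per-patch cost constraint.
does_not_fit = opencode_opus_46_is_appropriate(True, True, False, False)

print(fits, does_not_fit)
```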

Upgrade paths

If you are using OpenCode Opus 4.6:

  • Switch to Claude CLI Opus 4.6 (61.6% vs 47.5%, 17.7x cheaper): recommended for single-model workloads
  • Switch to Cursor Opus 4.6 (62.5% accuracy, $35.40 per patch): if you need a different CLI
  • Stay with OpenCode and accept the cost: only if multi-model flexibility justifies the premium

Learn more

View the full benchmark results and agent leaderboard to see all 15 agents ranked by pass rate. Read the Anthropic lab profile for context on all Anthropic models. Compare wrapper impacts on cost and accuracy across Claude CLI, Cursor, and OpenCode.

FAQ

What is the cost/accuracy tradeoff for OpenCode Opus 4.6?

A 47.5% pass rate across 136 CVE evaluations at $51.88 per fix, the highest cost overall. Outcomes: 58 passes, 15 failures, 49 build errors, and 14 infrastructure timeouts.

What is the cost per successful patch for OpenCode Claude Opus 4.6?

$51.88 per successful patch, the highest cost in the entire benchmark. The same model costs $2.93 via Claude CLI (17.7x cheaper) and $35.40 via Cursor (1.5x cheaper). The OpenCode wrapper is the most expensive way to run any model.

How does OpenCode Claude Opus 4.6 compare to other agents?

Its 47.5% pass rate is near the 47.3% benchmark average, but its cost is far above average. Only 15 actual failures from 136 evaluations gives it the second-lowest failure rate. The bottleneck is 49 build failures, not model reasoning. Claude CLI Opus 4.6 at 61.6% is strictly better for single-model deployments.
