[COMPARISON]

Native CLIs vs wrapper CLIs: the 10-16pp performance gap

Claude CLI vs OpenCode, Gemini CLI vs OpenCode, Codex vs Cursor. Same models, different wrappers, consistent accuracy gaps of 10-16 percentage points.

[BENCHMARK COMPARISON]

Native CLIs consistently outperform wrappers

The benchmark reveals a systematic accuracy gap between native CLIs and wrapper CLIs running the same underlying model. Native configurations achieve 10-16 percentage point higher pass rates while costing between 1.2 and 18 times less per fix.

Claude CLI with Opus 4.6 reaches 61.6% accuracy at $2.93 per fix. OpenCode with the same model hits 47.5% at $51.88: 14.1 percentage points lower at 17.7 times the cost. Cursor with Opus 4.6 achieves 62.5% but at $35.40 per fix, 12 times more expensive than Claude CLI despite nearly identical accuracy.

This pattern holds across all three major model families:

| Configuration | Model | Pass rate | Cost per fix | Gap from native |
| --- | --- | --- | --- | --- |
| Claude CLI | Opus 4.6 | 61.6% | $2.93 | baseline |
| OpenCode | Opus 4.6 | 47.5% | $51.88 | -14.1pp |
| Cursor | Opus 4.6 | 62.5% | $35.40 | +0.9pp |
| Codex | GPT-5.2 | 62.7% | $5.30 | baseline |
| OpenCode | GPT-5.2 | 51.6% | $6.65 | -11.1pp |
| Cursor | GPT-5.2 | 51.6% | $6.26 | -11.1pp |
| Gemini CLI | 3.1 Pro | 58.7% | $3.92 | baseline |
| OpenCode | 3.1 Pro | 54.9% | $5.81 | -3.8pp |
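The per-model gaps and cost multiples can be recomputed directly from the raw pass rates and per-fix costs. A minimal sketch, with the figures transcribed from the table above (the data structure is ours, not part of the benchmark tooling):

```python
# Benchmark figures transcribed from the table above.
# Each entry: (configuration, pass_rate_pct, cost_per_fix_usd).
# The first entry per model is the native CLI baseline.
results = {
    "Opus 4.6": [("Claude CLI", 61.6, 2.93),
                 ("OpenCode", 47.5, 51.88),
                 ("Cursor", 62.5, 35.40)],
    "GPT-5.2": [("Codex", 62.7, 5.30),
                ("OpenCode", 51.6, 6.65),
                ("Cursor", 51.6, 6.26)],
    "3.1 Pro": [("Gemini CLI", 58.7, 3.92),
                ("OpenCode", 54.9, 5.81)],
}

for model, configs in results.items():
    _, native_rate, native_cost = configs[0]
    for name, rate, cost in configs[1:]:
        gap = rate - native_rate        # percentage-point gap from native
        multiple = cost / native_cost   # cost multiple vs native
        print(f"{model}: {name} is {gap:+.1f}pp at {multiple:.1f}x native cost")
```

Running this reproduces the "Gap from native" column, including the 17.7x cost multiple for OpenCode on Opus 4.6.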

Why native CLIs win

Three factors explain the performance gap. First, native CLIs handle environment setup more reliably; fewer build failures mean more test runs actually execute the agent's patch. OpenCode and Cursor delegate build responsibility to the agent, which must install dependencies, configure paths, and handle compiler flags, all steps that can fail.

Second, native CLIs integrate tools more efficiently. Claude CLI gives direct file manipulation, testing, and execution without abstraction layers. Wrappers like OpenCode and Cursor route agent actions through intermediate interfaces, introducing latency and communication overhead that slows decision cycles.

Third, native CLIs use model-specific prompt engineering. The Claude team optimizes its CLI prompts for how Opus behaves. The Codex CLI is tuned for GPT-5.2's completion patterns. Wrappers use generic prompts designed to work with multiple models, which means they optimize for none.

The cost multiplier

The cost difference amplifies the accuracy gap. Claude CLI costs nearly 18 times less than OpenCode for Opus 4.6. This isn't just pricing; it reflects infrastructure efficiency. Native CLIs bundle the model call with the CLI runtime, minimizing overhead, while wrappers add service fees, agent abstraction layers, and API mark-ups.

For an organization fixing 100 CVEs:

  • Claude CLI: 61.6 fixes at $180 total cost ($2.93 × 61.6)
  • OpenCode: 47.5 fixes at $2,464 total cost ($51.88 × 47.5)

Same model. The wrapper version costs 13.7 times more to achieve 14.1 percentage points lower accuracy.

Cursor's exception

Cursor with Opus 4.6 is the one wrapper that approaches native performance on accuracy (62.5% vs 61.6%). However, it costs $35.40 per fix compared to Claude CLI's $2.93. The accuracy is comparable; the efficiency is not.

For 100 CVEs:

  • Claude CLI: 61.6 fixes at $180
  • Cursor: 62.5 fixes at $2,212

Even when Cursor wins on accuracy, it loses on cost. If accuracy is your only metric, Cursor matches the native CLI. If cost matters at all, it doesn't.
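One way to make the efficiency argument concrete is fixes per dollar. A quick sketch using the 100-CVE totals above:

```python
# Fixes per dollar spent, from the 100-CVE totals above.
claude = 61.6 / 180    # Claude CLI: successful fixes per dollar
cursor = 62.5 / 2212   # Cursor: successful fixes per dollar

print(f"Claude CLI: {claude:.3f} fixes/$, Cursor: {cursor:.4f} fixes/$")
print(f"Claude CLI delivers about {claude / cursor:.0f}x more fixes per dollar")
```

Despite Cursor's slightly higher pass rate, Claude CLI delivers roughly 12 times more fixes per dollar spent.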

Recommendations

Use the native CLI for your chosen model. If you deploy Claude exclusively, use Claude CLI. If you standardize on GPT-5.2, use Codex. If Gemini is your platform, use Gemini CLI.

If you need multi-model support and can't justify three separate native tools, OpenCode with Gemini 3.1 Pro offers the best wrapper compromise: 54.9% pass rate at $5.81 per fix. It's 3.8 percentage points below the native Gemini CLI but cheaper than Cursor and ahead of OpenCode's other configurations.

The native-vs-wrapper gap exists because CLIs are built for one model. Wrappers are built for all models. One beats the other.

See Benchmark results | Economics analysis | Claude Opus 4.6 profile | Codex GPT-5.2 profile | Gemini 3.1 Pro profile

FAQ

How much does the CLI wrapper affect agent performance?

Native CLIs outperform wrapper CLIs by 10-16 percentage points on the same model. Claude CLI achieves 61.6% vs OpenCode's 47.5% with the identical Opus 4.6 model. The cost difference is even larger: $2.93 vs $51.88 per fix.

[RELATED TOPICS]

See which agents produce fixes that work

128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.