Cursor Composer 1.5 — CVE-Agent-Bench profile
45.2% pass rate at $3.93 per fix. Cursor's proprietary model. Second-lowest infrastructure failure count.
Cursor Composer 1.5
Cursor Composer 1.5 is Cursor's proprietary model for agentic code generation. It is the only agent in CVE-Agent-Bench that does not rely on an OpenAI, Google, or Anthropic base model. Across 128 evaluations, Composer 1.5 achieved a 45.2% pass rate at a cost of $3.93 per successful fix. The model passed 57 samples, failed 39, experienced 30 build failures, and encountered 2 infrastructure failures; the pass rate is computed over the 126 evaluations that did not fail on infrastructure (57/126 = 45.2%).
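A quick sanity check in Python confirms the arithmetic behind these counts (assuming, as the figures imply, that infrastructure failures are excluded from the pass-rate denominator):

```python
# Outcome counts for Cursor Composer 1.5, taken from this profile.
passed, failed, build_failed, infra_failed = 57, 39, 30, 2

total_evals = passed + failed + build_failed + infra_failed
assert total_evals == 128

# The 45.2% headline matches only if infrastructure failures are
# excluded from the denominator (an assumption inferred from the
# numbers themselves, not stated by the benchmark).
scored = total_evals - infra_failed   # 126 evaluations
pass_rate = passed / scored
print(f"pass rate: {pass_rate:.1%}")  # → pass rate: 45.2%
```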
Proprietary model positioning
Cursor's decision to develop a proprietary model sets Composer 1.5 apart. The model is not a wrapper around GPT, Claude, or Gemini. It is trained on Cursor's code generation data and fine-tuned for the Cursor IDE's use cases. In CVE-Agent-Bench, Composer 1.5 is evaluated in the same way as all other agents: a generic CLI interface with no Cursor IDE integration.
The 45.2% pass rate positions Composer 1.5 slightly below the field average (47.3%) but ahead of first-generation models like Gemini 3 Pro (40.4%) and the OpenCode variants. The cost of $3.93 per fix is reasonable, landing it in the lower-cost tier alongside Gemini 3.1 Pro ($3.92).
Speed-efficiency profile
Composer 1.5's behavioral signature shows efficiency as a key strength. Efficiency is 97, speed is 85, and precision is 100. The model generates patches quickly and uses tokens conservatively. This makes Composer 1.5 useful for rapid evaluation cycles where you need many quick passes rather than a few perfect ones.
Accuracy is 32, the sixth-lowest in the benchmark. This reflects the model's design bias: speed and cost over thoroughness. Reliability is 55, in the middle range. The model is not fragile, but it is not the most stable either.
Failure modes
The 39 fail count is the third-highest in the benchmark. These are cases where Composer 1.5 generates valid patches that build and run but do not resolve the CVE. The model produces output and the test environment can execute it, but the patch does not fully address the vulnerability.
This is the opposite pattern from many other agents. OpenCode agents tend to fail at build time (build failures outnumber fails). Composer 1.5 fails at the semantic level: the code compiles, but the vulnerability is not fixed. This suggests the model's issue is not environment compatibility but vulnerability reasoning.
Precision is 100, meaning the output is well-formed. The failure mode is logical: the model knows how to generate valid code but sometimes misunderstands what the CVE requires.
Infrastructure reliability
Two infrastructure failures out of 128 evals (1.6%) is the second-best result in the benchmark. Only cursor-gpt-5.3-codex records fewer (1 failure). Composer 1.5's runtime is stable: you can rely on the test environment to stay running.
This stability is valuable for production systems. If you deploy Cursor Composer 1.5 in your CVE patching pipeline, expect minimal operational interruptions from environment issues.
Build failure analysis
Build failures (30) account for 23.4% of evals, higher than the field median (20%) but not extreme. These are cases where the test environment cannot compile Composer 1.5's patch. The most common reasons are missing dependencies, incompatible language-version assumptions, or syntax that requires a toolchain the container lacks.
Composer 1.5's 30 build failures, combined with 39 semantic fails and 2 infra failures, account for 71 of the 128 evals. The remaining 57 evals pass. This breakdown shows the model's primary challenge is vulnerability reasoning, not environment integration.
Model exclusivity
Composer 1.5 is only available through Cursor's API. You cannot use it via OpenAI, Google, or Anthropic endpoints. This means evaluating Cursor Composer in your own system requires integrating with Cursor's proprietary infrastructure.
For teams that already use Cursor for IDE-based development, integrating Composer 1.5 into your CVE patching system is straightforward. For teams that don't, it requires an additional API credential and library. This is a trade-off: proprietary models are not universally accessible.
Cost positioning
At $3.93 per fix, Composer 1.5 is cost-competitive. It is cheaper than most Claude variants and comparable to Gemini 3.1 Pro ($3.92). The 45.2% pass rate means you need more evaluations than higher-accuracy models to achieve the same fix volume, but per-evaluation cost is low.
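The trade-off between low per-fix cost and a mid-range pass rate can be budgeted directly. A minimal planning sketch using the figures from this profile (the `plan` helper is illustrative, not part of any benchmark tooling):

```python
import math

pass_rate = 0.452      # Composer 1.5 pass rate from this profile
cost_per_fix = 3.93    # USD per successful fix

def plan(target_fixes: int) -> tuple[int, float]:
    """Evaluations needed and expected spend for a target fix count."""
    evals_needed = math.ceil(target_fixes / pass_rate)
    expected_spend = target_fixes * cost_per_fix
    return evals_needed, expected_spend

evals, spend = plan(50)
print(evals, round(spend, 2))  # roughly 111 evaluations, ~$196.50
```

A higher-accuracy model would need fewer evaluations for the same 50 fixes, but at a higher per-fix price the total spend could still be larger.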
For teams running many proofs-of-concept on many CVEs, Composer 1.5 is affordable. For teams running a few high-value evaluations, the accuracy matters more than cost.
Comparison to Cursor's other agent
Cursor also offers a GPT-5.2-based variant in the benchmark. That variant achieves 51.6% pass rate, compared to Composer 1.5's 45.2%. The GPT-5.2 variant is 6.4 percentage points more accurate. Cursor offers both options: use Composer 1.5 for rapid, cheap evaluation and GPT-5.2 for accuracy when it matters.
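That accuracy gap also translates into evaluation volume: to match the GPT-5.2 variant's fix yield, Composer 1.5 needs proportionally more runs. A quick calculation from the quoted pass rates:

```python
composer_rate = 0.452  # Composer 1.5 pass rate
gpt52_rate = 0.516     # Cursor GPT-5.2 variant pass rate

# For the same expected number of fixes, the lower-accuracy agent
# needs (higher rate / lower rate) times as many evaluations.
extra_runs = gpt52_rate / composer_rate
print(f"Composer 1.5 needs {extra_runs:.2f}x the evaluations")  # ≈ 1.14x
```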
Ecosystem positioning
Composer 1.5 is Cursor's entry into the agentic CVE patching market. The benchmark results show it is competitive on cost but lags on accuracy. For Cursor, this is a reasonable position: the company's strength is the IDE and user experience, not the base model. Integrating better base models (like GPT-5.2) into Cursor is a natural upgrade path.
When to choose Composer 1.5
Use Cursor Composer 1.5 if you need low-cost CVE patch evaluation and you have Cursor infrastructure already deployed. The 45.2% pass rate is acceptable for reconnaissance: run it on 100 CVEs, get 45 patches, manually validate the promising ones.
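The reconnaissance math above can be checked directly. A small sketch using this profile's figures (expected spend assumes cost scales linearly with successful fixes, which is an approximation):

```python
pass_rate = 0.452    # Composer 1.5 pass rate from this profile
cost_per_fix = 3.93  # USD per successful fix

cves = 100
expected_patches = cves * pass_rate             # ≈ 45 candidate patches
expected_spend = expected_patches * cost_per_fix
print(round(expected_patches), round(expected_spend))  # ≈ 45 patches, ≈ $178
```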
Use Cursor's GPT-5.2 variant if you need higher accuracy. Use other agents if you need to minimize proprietary dependencies.
Summary
Cursor Composer 1.5 is a cost-effective proprietary model for CVE patching. The 45.2% pass rate at $3.93 per fix is reasonable for rapid evaluation. Infrastructure reliability is excellent (only 2 failures). Semantic failure (vulnerability reasoning) is the primary limitation. For teams already using Cursor, Composer 1.5 is a viable option. For teams evaluating models independently, GPT and Claude variants offer higher accuracy at comparable or lower cost.
Explore CVE-Agent-Bench results to compare Cursor Composer 1.5 against other agents. Review benchmark economics to calculate evaluation cost for your CVE volume. See Cursor's GPT-5.2 variant for a higher-accuracy alternative within Cursor's ecosystem.
FAQ
How reliable is Cursor Composer 1.5?
45.2% pass rate on 128 CVEs at $3.93 per fix. Second-lowest infrastructure failure count (2 total). 57 passes, 39 fails, 30 build failures.
What is the cost per successful patch for Cursor Composer 1.5?
$3.93 per successful patch across 128 evaluations. Cost-competitive with Gemini 3.1 Pro ($3.92) despite a lower pass rate. For teams already using Cursor, Composer 1.5 is an affordable option for rapid CVE evaluation.
How does Cursor Composer 1.5 compare to other agents?
45.2% pass rate sits slightly below the benchmark average of 47.3%. Its primary distinction is infrastructure stability: only 2 infra failures from 128 evaluations. The main weakness is semantic failure, 39 incorrect patches that compile but do not fix the CVE.