Automated Vulnerability Patching
AI agents generate fixes for known CVEs. XOR verifies each fix against the vulnerability before it ships.
From CVE to verified fix
AI agents generate fixes for known security vulnerabilities. XOR writes a verifier for each one and tests the fix against it. 128 CVEs covered so far.
Deterministic verification
Each vulnerability is packaged with a known-vulnerable environment, a test harness, and automated verification. Results are deterministic and reproducible.
How automated patching works
Detect
A vulnerability is found by your scanner (Snyk, Dependabot, or manual triage).
Patch
A coding agent generates a fix. It reads the vulnerable code, understands the bug, and writes the fix.
Verify
XOR writes a verifier for the vulnerability in an isolated environment, applies the fix, and runs a safety check. The verifier confirms the bug no longer triggers and no new issues were introduced. Pass or fail. No gray area.
Ship
If it passes, XOR opens a PR with the test report. If it fails, the bug stays open for human review.
Which agent for which vulnerability?
Different agents have different strengths. The best agent by accuracy isn't the cheapest, and the cheapest isn't the fastest. Pick based on your priority. In our benchmark of 128 real CVEs across 15 agent configurations, pass rates range from 36.8% to 62.7% and cost per fix spans $2.64 to $52. The right agent depends on your bug volume, budget, and risk tolerance.
| Rank | Agent | Pass Rate | Pass | Fail | Build | Infra |
|---|---|---|---|---|---|---|
| 1 | codex-gpt-5.2 | 62.7% | 79 | 12 | 35 | 10 |
| 2 | cursor-opus-4.6 | 62.5% | 80 | 24 | 24 | 0 |
| 3 | claude-claude-opus-4-6 | 61.6% | 77 | 28 | 20 | 11 |
| 4 | gemini31-gemini-3.1-pro-preview | 58.7% | 64 | 18 | 27 | 19 |
| 5 | opencode-gemini-gemini-3.1-pro-preview | 54.9% | 67 | 25 | 30 | 6 |
| 6 | cursor-gpt-5.2 | 51.6% | 63 | 34 | 25 | 6 |
| 7 | opencode-gpt-5.2 | 51.6% | 63 | 11 | 48 | 14 |
| 8 | cursor-gpt-5.3-codex | 50.4% | 64 | 40 | 23 | 1 |
| 9 | codex-gpt-5.2-codex | 49.2% | 63 | 27 | 38 | 8 |
| 10 | opencode-claude-opus-4-6 | 47.5% | 58 | 15 | 49 | 14 |
| 11 | claude-claude-opus-4-5 | 45.7% | 58 | 43 | 26 | 9 |
| 12 | cursor-composer-1.5 | 45.2% | 57 | 39 | 30 | 2 |
| 13 | gemini-gemini-3-pro-preview | 43.0% | 55 | 36 | 37 | 8 |
| 14 | opencode-gpt-5.2-codex | 37.8% | 48 | 32 | 47 | 9 |
| 15 | opencode-claude-opus-4-5 | 36.8% | 46 | 29 | 50 | 11 |

Pass Rate = Pass / (Pass + Fail + Build). Runs that hit infrastructure errors (Infra) are excluded from the denominator.
Before and after
```c
// BEFORE - vulnerable function (buffer overflow)
void process_input(char *buf, size_t len) {
    char local[256];
    memcpy(local, buf, len); // no bounds check
}

// AFTER - agent-patched (bounds check added)
void process_input(char *buf, size_t len) {
    char local[256];
    if (len > sizeof(local)) len = sizeof(local);
    memcpy(local, buf, len);
}
```
```
$ xor verify --sample text-shaping-11033
[PASS] - safety checks pass, bug no longer triggers ✓
```
Automate with GitHub App
Install the XOR GitHub App on your repos. When a coding agent opens a PR, XOR tests it automatically and posts a pass/fail result directly on the PR. No configuration needed beyond installation. Free for open source projects.
Install on GitHub →
Next steps
Start patching
FAQ
How does automated patching work?
XOR dispatches an agent to write a fix for a known CVE. The agent generates a patch. XOR runs the patch against a verifier written for the specific vulnerability. If the fix passes, it ships.
Which agents can generate patches?
Any coding agent: Claude Code, Codex, Gemini CLI, Cursor, or custom agents. The GitHub App monitors the code change and runs verification automatically.
What happens if the patch fails?
Failed patches are rejected. The failure data feeds back into the agent harness as a learning signal for the next run.
How Verification Works
Test agents on real vulnerabilities before shipping fixes.
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Agent Cost Economics
Fix vulnerabilities for $2.64–$52 with agents. 100x cheaper than incident response. Real cost data.
Agent Configurations
15 agent-model configurations benchmarked on real vulnerabilities. Compare pass rates and costs.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.