[PATCHING]

Automated Vulnerability Patching

AI agents generate fixes for known CVEs. XOR verifies each fix against the vulnerability before it ships.

From CVE to verified fix

AI agents generate fixes for known security vulnerabilities. XOR writes a verifier for each one and tests the fix against it. 128 tested so far.

Deterministic verification

Each vulnerability is packaged with a known-vulnerable environment, a test harness, and automated verification. Results are deterministic and reproducible.

Top agent pass rate: 62.7%
Cheapest per fix: $2.64
Best possible (all agents combined): 80.5%
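
One way to read the "all agents combined" figure is as the share of vulnerabilities fixed by at least one agent. The source does not publish its formula, so the sketch below is an assumption: the function name `union_pass_rate` and the flattened results matrix are hypothetical, not part of XOR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: fraction of CVEs fixed by at least one agent.
   passed is a row-major matrix: passed[a * n_cves + c] is true when
   agent a produced a verified fix for CVE c. */
double union_pass_rate(const bool *passed, size_t n_agents, size_t n_cves) {
    assert(passed != NULL);
    if (n_cves == 0) return 0.0;

    size_t solved = 0;
    for (size_t c = 0; c < n_cves; c++) {
        for (size_t a = 0; a < n_agents; a++) {
            if (passed[a * n_cves + c]) {
                solved++;   /* count each CVE once, however many agents fixed it */
                break;
            }
        }
    }
    return (double)solved / (double)n_cves;
}
```

Under this reading, the combined rate can exceed the top single-agent rate whenever different agents fix different vulnerabilities.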

How automated patching works

01 Detect

Your scanner (such as Snyk or Dependabot) or manual triage finds a vulnerability.

02 Patch

A coding agent reads the vulnerable code, identifies the bug, and writes a fix.

03 Verify

XOR writes a verifier for the vulnerability in an isolated environment, applies the fix, and runs a safety check. The verifier confirms the bug no longer triggers and no new issues were introduced. Pass or fail. No gray area.

04 Ship

If it passes, XOR opens a PR with the test report. If it fails, the bug stays open for human review.
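
The Verify and Ship steps come down to a two-condition gate. A minimal sketch of that decision, assuming hypothetical type and function names (`verify_result`, `decide`); XOR's actual data model is not described here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct {
    bool bug_still_triggers;  /* verifier reproduced the vulnerability after patching */
    bool new_issues_found;    /* safety check flagged a regression */
} verify_result;

typedef enum { ACTION_OPEN_PR, ACTION_HUMAN_REVIEW } next_action;

/* Step 03: pass only if the bug is gone AND nothing new broke. */
bool verification_passed(const verify_result *r) {
    assert(r != NULL);
    return !r->bug_still_triggers && !r->new_issues_found;
}

/* Step 04: a pass opens a PR; a fail leaves the bug open for human review. */
next_action decide(const verify_result *r) {
    return verification_passed(r) ? ACTION_OPEN_PR : ACTION_HUMAN_REVIEW;
}
```

There is no third outcome in this sketch, which mirrors the "pass or fail, no gray area" rule: a patch that fixes the bug but introduces a new issue is still a fail.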

Which agent for which vulnerability?

Different agents have different strengths. The most accurate agent isn't the cheapest, and the cheapest isn't the fastest, so pick based on your priority. In our benchmark of 136 real CVEs across 15 agent configurations, pass rates range from 36.8% to 62.7% and cost per fix spans $0.53 to $40. The right agent depends on your bug volume, budget, and risk tolerance.
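
Picking by priority can be as simple as "the most accurate agent that fits the budget". A minimal sketch, assuming a hypothetical `agent_stats` record and selection rule; the per-agent costs you feed it would come from your own runs or the full leaderboard:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one agent configuration. */
typedef struct {
    const char *name;
    double pass_rate;     /* 0.0 to 1.0 */
    double cost_per_fix;  /* USD */
} agent_stats;

/* Returns the index of the highest-pass-rate agent whose cost per fix is
   within budget, or -1 if no agent qualifies. */
int pick_agent(const agent_stats *agents, size_t n, double max_cost_per_fix) {
    assert(agents != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (agents[i].cost_per_fix > max_cost_per_fix) continue;  /* over budget */
        if (best < 0 || agents[i].pass_rate > agents[(size_t)best].pass_rate) {
            best = (int)i;
        }
    }
    return best;
}
```

A team with high bug volume might set a tight budget and accept a lower pass rate; a team with low risk tolerance might raise the budget and take the top agent.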

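| Rank | Agent | Pass Rate | Pass | Fail | Build | Infra |
|------|-------|-----------|------|------|-------|-------|
| 1 | codex-gpt-5.2 | 62.7% | 79 | 12 | 35 | 10 |
| 2 | cursor-opus-4.6 | 62.5% | 80 | 24 | 24 | 0 |
| 3 | claude-claude-opus-4-6 | 61.6% | 77 | 28 | 20 | 11 |
| 4 | gemini31-gemini-3.1-pro-preview | 58.7% | 64 | 18 | 27 | 19 |
| 5 | opencode-gemini-gemini-3.1-pro-preview | 54.9% | 67 | 25 | 30 | 6 |
| 6 | cursor-gpt-5.2 | 51.6% | 63 | 34 | 25 | 6 |
| 7 | opencode-gpt-5.2 | 51.6% | 63 | 11 | 48 | 14 |
| 8 | cursor-gpt-5.3-codex | 50.4% | 64 | 40 | 23 | 1 |
| 9 | codex-gpt-5.2-codex | 49.2% | 63 | 27 | 38 | 8 |
| 10 | opencode-claude-opus-4-6 | 47.5% | 58 | 15 | 49 | 14 |
| 11 | claude-claude-opus-4-5 | 45.7% | 58 | 43 | 26 | 9 |
| 12 | cursor-composer-1.5 | 45.2% | 57 | 39 | 30 | 2 |
| 13 | gemini-gemini-3-pro-preview | 43.0% | 55 | 36 | 37 | 8 |
| 14 | opencode-gpt-5.2-codex | 37.8% | 48 | 32 | 47 | 9 |
| 15 | opencode-claude-opus-4-5 | 36.8% | 46 | 29 | 50 | 11 |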
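
The Pass Rate column appears to be Pass divided by Pass + Fail + Build, with Infra runs left out of the denominator. That is an inference from the figures in the table, not a formula stated here, and the function name below is hypothetical:

```c
#include <assert.h>

/* Assumed formula: pass rate as a percentage of scored runs, where a scored
   run is one that ended in Pass, Fail, or Build (Infra runs excluded). */
double pass_rate_pct(unsigned pass, unsigned fail, unsigned build) {
    unsigned scored = pass + fail + build;
    assert(scored > 0);  /* undefined when nothing was scored */
    return 100.0 * (double)pass / (double)scored;
}
```

For example, the codex-gpt-5.2 row gives 79 / (79 + 12 + 35) = 79 / 126, which rounds to 62.7%, and the cursor-opus-4.6 row gives 80 / 128 = 62.5%.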

See the full leaderboard →

Before and after

// BEFORE - vulnerable function (buffer overflow)
#include <string.h>

void process_input(char *buf, size_t len) {
    char local[256];
    memcpy(local, buf, len); // no bounds check: len > 256 writes past local
}

// AFTER - agent-patched (bounds check added)
#include <string.h>

void process_input(char *buf, size_t len) {
    char local[256];
    if (len > sizeof(local)) len = sizeof(local); // clamp to the buffer size
    memcpy(local, buf, len);
}

$ xor verify --sample text-shaping-11033

[PASS] - safety checks pass, bug no longer triggers ✓
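
To make "bug no longer triggers" concrete, here is a minimal sketch of a regression check for the patched copy logic. It is not XOR's verifier: the helper names (`copy_clamped`, `oversized_input_is_contained`) and the guard-byte approach are assumptions for illustration, and a real harness would more likely run the original crashing input under a memory sanitizer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LOCAL_SIZE 256

/* The patched copy logic, with the destination passed in so a test can
   inspect it. Returns the number of bytes actually copied. */
size_t copy_clamped(char *dst, size_t dst_size, const char *src, size_t len) {
    assert(dst != NULL && src != NULL);
    if (len > dst_size) len = dst_size;  /* the bounds check added by the patch */
    memcpy(dst, src, len);
    return len;
}

/* Hypothetical check: feed an input twice the buffer size and confirm the
   guard bytes placed right after the buffer are untouched. */
bool oversized_input_is_contained(void) {
    struct { char local[LOCAL_SIZE]; unsigned char guard[16]; } frame;
    char input[LOCAL_SIZE * 2];

    memset(input, 'A', sizeof(input));
    memset(frame.guard, 0x5A, sizeof(frame.guard));

    size_t copied = copy_clamped(frame.local, sizeof(frame.local),
                                 input, sizeof(input));

    for (size_t i = 0; i < sizeof(frame.guard); i++) {
        if (frame.guard[i] != 0x5A) return false;  /* write escaped the buffer */
    }
    return copied == LOCAL_SIZE;
}
```

Note that clamping silently truncates oversized input. Whether truncation or outright rejection is the right behavior depends on the caller, which is one reason a verifier also checks that no new issues were introduced.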

Automate with GitHub App

Install the XOR GitHub App on your repos. When a coding agent opens a PR, XOR tests it automatically and posts a pass/fail result directly on the PR. No configuration needed beyond installation. Free for open source projects.

Install on GitHub →

[NEXT STEPS]

Start patching

FAQ

How does automated patching work?

XOR dispatches an agent to write a fix for a known CVE, then runs the resulting patch against a verifier written for that specific vulnerability. If the fix passes, it ships.

Which agents can generate patches?

Any coding agent: Claude Code, Codex, Gemini CLI, Cursor, or custom agents. The GitHub App monitors the code change and runs verification automatically.

What happens if the patch fails?

Failed patches are rejected. The failure data feeds back into the agent harness as a learning signal for the next run.

[RELATED TOPICS]

See which agents produce fixes that work

128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.