Guard check vulnerabilities in CVE-Agent-Bench
155 evaluations of missing conditional guards before unsafe operations. Common fix pattern: add if-statement to check preconditions.
Guard check vulnerabilities arise when code performs an unsafe operation without first verifying that preconditions are met. A function might dereference a pointer without checking if it is null, or call a system function without verifying that the resource is available, or access an object property without confirming the object exists.
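A minimal sketch of the null-pointer case in C (the `config` struct and function names here are hypothetical, chosen only to illustrate the pattern):

```c
#include <stddef.h>

/* Hypothetical config object used to illustrate a missing guard. */
struct config {
    int port;
};

/* Unsafe: dereferences cfg without verifying the precondition. */
int get_port_unsafe(const struct config *cfg) {
    return cfg->port;            /* crashes if cfg == NULL */
}

/* Guarded: check the precondition before the dereference. */
int get_port_safe(const struct config *cfg) {
    if (cfg == NULL)
        return -1;               /* fail safely on a NULL pointer */
    return cfg->port;
}
```

The two functions differ only by the guard, which is what makes this class of fix mechanically simple once the precondition is identified.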
What makes guard checks necessary
155 evaluations in CVE-Agent-Bench involve missing or incorrect guard checks. These vulnerabilities require defensive programming: adding conditional logic to ensure safe preconditions before executing potentially dangerous operations.
Guard checks are common in network protocols (envoyproxy), system utilities (apache, openvswitch), and middleware that processes untrusted data. A missing or faulty guard can lead to null pointer dereferences, use-after-free errors, double-free bugs, or unintended state transitions.
The vulnerability manifests when external input or concurrent execution creates an unexpected state. A file might be closed between the check and the use, a network socket might disconnect, or a configuration dictionary might be empty at the moment the code attempts to access a key.
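One way to sketch the check-then-use gap in C, without threads, is a callback that may invalidate state between the initial guard and the use. All names here (`resource`, `read_after_notify`, the callbacks) are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

struct resource {
    bool open;
    int value;
};

typedef void (*notify_fn)(struct resource *);

/* Two illustrative callbacks: one benign, one that closes the
 * resource, simulating state changing between check and use. */
static void keep_open(struct resource *r) { (void)r; }
static void close_resource(struct resource *r) { r->open = false; }

int read_after_notify(struct resource *r, notify_fn notify) {
    if (r == NULL || !r->open)   /* initial guard */
        return -1;
    notify(r);                   /* arbitrary code runs; state may change */
    if (!r->open)                /* re-check: the first guard is now stale */
        return -1;
    return r->value;
}
```

The second guard is the fix: a check performed before intervening code runs cannot be trusted at the point of use.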
The fix pattern
Guard check fixes follow a standard defensive pattern. Before performing an unsafe operation, add an if-statement to check a precondition. If the precondition is not met, either return early, skip the operation, or execute an error handler.
Common guard patterns include checking for null pointers, verifying collection size before iteration, confirming resource availability before use, and validating state before mutation. The fix typically involves wrapping the dangerous operation in an if-block or adding an early return when conditions are not safe.
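The early-return form might look like the following sketch, which guards an empty or missing collection before iterating over it (function and parameter names are illustrative):

```c
#include <stddef.h>

/* Average of an array; the guard prevents a NULL dereference and a
 * division by zero when the collection is empty. */
double average(const int *vals, size_t n) {
    if (vals == NULL || n == 0)   /* early return when unsafe */
        return 0.0;
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += vals[i];
    return (double)sum / (double)n;
}
```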
Some fixes require multiple precondition checks in sequence. A function might need to verify that a pointer is not null, that a lock is held, and that a counter is above a threshold before proceeding. These multi-condition guards are more complex but follow the same pattern: validate before executing.
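A multi-condition guard of that shape could be sketched as below; the `worker` struct and its fields are hypothetical stand-ins for a pointer precondition, a state invariant, and a counter threshold:

```c
#include <stdbool.h>
#include <stddef.h>

struct worker {
    int *queue;       /* must be non-NULL */
    bool lock_held;   /* state invariant: caller holds the lock */
    int pending;      /* counter: must be above zero */
};

/* Every precondition is validated before the unsafe access. */
int pop_if_safe(struct worker *w) {
    if (w == NULL || w->queue == NULL)  /* pointer preconditions */
        return -1;
    if (!w->lock_held)                  /* state invariant */
        return -1;
    if (w->pending <= 0)                /* counter threshold */
        return -1;
    w->pending--;
    return w->queue[w->pending];        /* safe: all guards passed */
}
```

An incomplete fix that checks only the pointer but not the counter would still compile and often still pass simple tests, which is part of what makes multi-condition guards harder.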
Agent performance on guard checks
Guard checks require agents to identify preconditions and understand when they fail. Agents perform well on simple guards (null checks) but struggle with domain-specific preconditions (resource availability, state invariants, permission checks).
Agents often correctly identify obvious null checks but miss subtle guard conditions specific to the codebase. Some agents add guards in the wrong location, checking conditions after the dangerous operation rather than before. Others add incomplete guards that check one condition but miss a second related precondition.
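The wrong-location failure mode can be shown in a small C sketch (the function is hypothetical; the buggy ordering appears in the comment):

```c
#include <stdlib.h>
#include <string.h>

/* Wrong: guard placed after the operation it should protect.
 *     size_t n = strlen(s);        // crashes first if s == NULL
 *     if (s == NULL) return NULL;  // too late to help
 *
 * Right: guard every precondition before the first unsafe use. */
char *dup_or_null(const char *s) {
    if (s == NULL)
        return NULL;
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)     /* second precondition: allocation succeeded */
        return NULL;
    strcpy(copy, s);
    return copy;
}
```

Note that the correct version also needs the second, easily missed guard on the `malloc` result, matching the incomplete-guard failure described above.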
The benchmark results show that agents succeed far more often on guards that match common programming patterns (null checks, bounds checks) than on guards that require reasoning about domain-specific safety properties.
Comparison to other vulnerability types
Guard checks occupy a middle ground between bounds checks and logic fixes. They are more complex than bounds checks because they require understanding domain-specific preconditions, but simpler than logic fixes because the fix pattern is straightforward once the precondition is identified.
Guard checks test whether agents can read error handling patterns, understand state management, and recognize when operations are unsafe. This skill set is essential for writing reliable software but is not always well-represented in training data.
Explore more
- Agent leaderboard: see which agents handle guard checks best
- Agent profiles: compare agent behavior on safety-critical repairs
- Bug complexity analysis: where guard checks fit in the difficulty distribution
- Methodology: how guard checks are evaluated and verified
FAQ
How do agents perform on guard check vulnerabilities?
Guard checks require identifying preconditions and adding defensive branches. Agents perform well on simple guards (null checks) but struggle with domain-specific preconditions (resource availability, state invariants).
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
Bounds check vulnerabilities in CVE-Agent-Bench
164 evaluations of missing or incorrect bounds checks. The most common fix pattern: add length validation before buffer access.
Logic fix vulnerabilities in CVE-Agent-Bench
71 evaluations of incorrect program logic. Common fix patterns: change comparison operator, swap variables, fix control flow.
Allocation fix vulnerabilities in CVE-Agent-Bench
15 evaluations of memory allocation errors. Hardest category: wrong size, missing allocation, double-free. Requires deep memory model understanding.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.