libgit2 in CVE-Agent-Bench — 6 vulnerabilities tested
6 vulnerability samples from libgit2 (Git library), generating 90 evaluations across 15 agents.
Overview
libgit2 is a C library implementation of Git used by GitHub Desktop, GitKraken, Plastic SCM, and other Git clients. The library implements Git's object storage model and its network transfer protocols, which requires careful handling of object graphs and packfile parsing. Git operations routinely process repository data that may have been maliciously crafted, so input validation is critical.
Benchmark coverage
6 vulnerability samples from libgit2 are included in CVE-Agent-Bench, generating 90 individual evaluations (6 samples × 15 agent configurations). The samples include heap corruption, integer overflows, and other memory safety issues in packfile parsing and object handling.
Vulnerability classes
libgit2 samples cover vulnerability patterns in complex data structure handling:
- Heap buffer overflows in packfile parsing where size calculations lead to insufficient buffer allocation
- Integer overflows in offset calculations within the Git pack format, leading to incorrect pointer arithmetic
- Out-of-bounds reads when traversing object deltas or compressed object streams
- Path traversal vulnerabilities in checkout operations where symbolic links escape repository boundaries
- Signature verification bypass bugs where cryptographic signature validation is incorrectly implemented
- Use-after-free in object reference counting during concurrent repository operations
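The first two classes above usually share a root cause: a length or count read from untrusted input is used in size arithmetic before anyone checks that the arithmetic fits. The following is a minimal sketch of the defensive pattern, not libgit2's code; `checked_mul`, `checked_add`, and `alloc_table` are hypothetical helpers, and the "header plus N fixed-size entries" layout is an assumption chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: writes a * b to *out, or returns -1 if it would wrap. */
static int checked_mul(size_t a, size_t b, size_t *out) {
    if (a != 0 && b > SIZE_MAX / a)
        return -1;
    *out = a * b;
    return 0;
}

/* Hypothetical helper: writes a + b to *out, or returns -1 if it would wrap. */
static int checked_add(size_t a, size_t b, size_t *out) {
    if (b > SIZE_MAX - a)
        return -1;
    *out = a + b;
    return 0;
}

/* Hypothetical: allocate hdr_len bytes followed by `count` entries of
 * entry_size bytes, where `count` comes from attacker-controlled data. */
static void *alloc_table(size_t hdr_len, size_t count, size_t entry_size) {
    size_t body, total;
    if (checked_mul(count, entry_size, &body) < 0 ||
        checked_add(hdr_len, body, &total) < 0)
        return NULL; /* refuse, rather than allocate a wrapped (too small) size */
    return calloc(1, total);
}
```

Without the checks, a huge `count` makes `hdr_len + count * entry_size` wrap to a small value, the allocation succeeds, and later writes of `count` entries run past the end of the heap buffer.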
Why libgit2 bugs are interesting for agent evaluation
libgit2 vulnerabilities test an agent's ability to work with complex data structures and binary format parsing. The codebase requires an understanding of Git's object model, zlib compression, and the packfile format. Bugs often involve off-by-one errors in boundary checks or incorrect handling of variable-length encoded data. Agents must generate fixes that preserve compatibility with the Git protocol while closing memory safety gaps.
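Variable-length encoding is a good example of where these off-by-one and overflow bugs hide. In the packfile format, each object entry starts with a header whose first byte holds a 3-bit object type and the low 4 bits of the size; while a byte's high bit is set, another byte follows carrying 7 more size bits. The sketch below decodes such a header with the two checks a fix typically needs: never read past the end of the buffer, and never shift size bits past bit 63. `decode_obj_header` is a hypothetical function written for illustration, not libgit2's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: decode a packfile object header from buf[0..len).
 * Returns the number of bytes consumed, or 0 if the header is truncated
 * or its size cannot be represented in 64 bits. */
static size_t decode_obj_header(const unsigned char *buf, size_t len,
                                unsigned *type, uint64_t *size) {
    size_t used = 0;
    unsigned shift = 4; /* first byte already supplied 4 size bits */
    unsigned char c;

    if (len == 0)
        return 0;
    c = buf[used++];
    *type = (c >> 4) & 0x7; /* bits 6..4: object type */
    *size = c & 0x0f;       /* bits 3..0: low bits of size */

    while (c & 0x80) {      /* high bit set: another size byte follows */
        if (used >= len)    /* continuation bit set on the last byte */
            return 0;
        if (shift > 57)     /* 7 more bits would extend past bit 63 */
            return 0;
        c = buf[used++];
        *size |= (uint64_t)(c & 0x7f) << shift;
        shift += 7;
    }
    return used;
}
```

Checking `used >= len` before each read is the boundary check whose absence produces out-of-bounds reads; the `shift` limit prevents a long run of continuation bytes from silently wrapping the decoded size.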
Git repositories can be cloned from untrusted sources, meaning a single malformed packfile can compromise a developer's machine. This makes repository parsing one of the highest-stakes attack surfaces in modern development workflows.
Agent performance on libgit2
Per-project performance data is not yet published. The full benchmark results aggregate performance across all projects. You can review how individual agents performed on the full results page, where results can be sorted by pass rate and cost. The benchmark methodology page documents the evaluation approach.
Related projects
Projects with similar binary format parsing requirements:
- libarchive, multi-format binary parsing with compression handling
- harfbuzz, complex binary structure parsing in font files
- libjxl, image codec with variable-length data handling
Explore more
- Full benchmark results
- Agent profiles
- Methodology
- Economics analysis, cost per verified patch
FAQ
What makes libgit2 a good benchmark?
libgit2 requires an understanding of Git's object model and packfile format. Its 6 samples test agents on complex binary format parsing and memory safety.
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
harfbuzz in CVE-Agent-Bench — 19 vulnerabilities tested
19 vulnerability samples from harfbuzz (text shaping library), generating 285 evaluations across 15 agents.
libarchive in CVE-Agent-Bench — 12 vulnerabilities tested
12 vulnerability samples from libarchive (archive handling), generating 180 evaluations across 15 agents.
envoyproxy in CVE-Agent-Bench — 9 vulnerabilities tested
9 vulnerability samples from envoyproxy (layer 7 proxy), generating 135 evaluations across 15 agents.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.