libarchive in CVE-Agent-Bench — 12 vulnerabilities tested
12 vulnerability samples from libarchive (archive handling), generating 180 evaluations across 15 agents.
Overview
libarchive is a widely used C library for reading and writing archive formats including tar, zip, cpio, and ISO 9660. It ships with FreeBSD, powers the macOS Installer, and is integrated into CMake. Because it parses untrusted binary input in many formats, it sits at the boundary between user-supplied files and system code, which makes format-parsing bugs particularly dangerous.
Benchmark coverage
12 vulnerability samples from libarchive are included in CVE-Agent-Bench, generating 180 individual evaluations across 15 agent configurations. These samples include heap buffer overflows, null pointer dereferences, and integer overflow bugs that occur during archive extraction and format parsing.
Vulnerability classes
libarchive samples cover vulnerability patterns specific to multi-format archive handling:
- Heap buffer overflows during decompression when output buffer size calculations are incorrect
- Path traversal vulnerabilities in archive extraction where relative paths escape the target directory
- Integer overflows in size calculations that lead to undersized buffers being allocated
- Null pointer dereferences when malformed archive headers are missing expected fields
- Use-after-free in format detection logic when archive handle state is incorrectly managed
- Format confusion bugs where polyglot archive files trigger unexpected behavior in format switching
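To make the path traversal class above concrete, here is a minimal sketch of the kind of check a fix might add before writing an entry to disk. The function name `entry_path_is_safe` is hypothetical and is not part of libarchive's API; the sketch assumes POSIX-style `/` separators and does not resolve symlinks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libarchive's API: returns true when an
 * entry path is relative and contains no ".." component. */
bool entry_path_is_safe(const char *path)
{
    assert(path != NULL);

    /* Reject empty and absolute paths. */
    if (path[0] == '\0' || path[0] == '/')
        return false;

    const char *p = path;
    while (*p != '\0') {
        /* Measure the current component (text up to the next '/'). */
        size_t len = strcspn(p, "/");
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return false;
        p += len;
        /* Skip one or more separators. */
        while (*p == '/')
            p++;
    }
    return true;
}
```

This check is deliberately conservative: it rejects any `..` component, including paths such as `a/../b` that would still resolve inside the target directory. A real fix would also have to consider symlinks inside the extraction tree, which a string check alone cannot catch. libarchive itself exposes extraction flags such as `ARCHIVE_EXTRACT_SECURE_NODOTDOT` for callers who want this behavior from the library.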
Why libarchive bugs are interesting for agent evaluation
libarchive vulnerabilities test an agent's ability to reason about multiple related file formats and to write defensive parsing code. The codebase requires understanding of archive specification details and careful validation of untrusted file metadata. Bugs often involve edge cases in format switching, malformed header handling, and memory allocation. Agents must generate fixes that validate input correctly without breaking compatibility with existing archive formats.
Archive handling is particularly challenging because the library must remain backward compatible with archives produced by decades of archive creation tools, so fixes cannot simply reject unusual but valid archive structures.
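The size-calculation and memory-allocation bugs mentioned above usually come down to multiplying two attacker-influenced values without checking for wraparound. The following is a minimal sketch of an overflow-checked allocation, assuming a record count read from an untrusted header. Both `checked_mul_size` and `alloc_records` are hypothetical names used for illustration, not libarchive functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: computes count * size into *out, or returns false
 * if the product would not fit in size_t. */
bool checked_mul_size(size_t count, size_t size, size_t *out)
{
    assert(out != NULL);
    if (size != 0 && count > SIZE_MAX / size)
        return false;
    *out = count * size;
    return true;
}

/* Hypothetical allocator for a table of records whose count comes from an
 * untrusted header field. Returns NULL on overflow, on a zero-sized
 * request, or on allocation failure. */
void *alloc_records(uint64_t count_from_header, size_t record_size)
{
    size_t total;

    /* Reject counts that do not fit in size_t (matters on 32-bit targets). */
    if (count_from_header > SIZE_MAX)
        return NULL;
    if (!checked_mul_size((size_t)count_from_header, record_size, &total))
        return NULL;
    if (total == 0)
        return NULL;
    return malloc(total);
}
```

Note that the check rejects only sizes that cannot be represented, not large but valid ones, which fits the compatibility constraint: an unusual archive with many records still parses, while a header that would wrap the size calculation is refused before any buffer is allocated.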
Agent performance on libarchive
Per-project performance data is not yet published. The benchmark aggregates results across all evaluated projects. You can view overall agent performance at the full results page and compare agents by pass rate, cost, or other criteria. For details on how agents were evaluated, see the benchmark methodology.
Related projects
Projects with similar untrusted-input parsing challenges:
- harfbuzz: complex binary structure parsing in font files
- libjxl: image format decoding with bounds-checking requirements
- openvswitch: protocol packet parsing with multiple encapsulation formats
Explore more
- Full benchmark results
- Agent profiles
- Methodology
- Economics analysis, cost per verified patch
FAQ
What kinds of libarchive bugs are in the benchmark?
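The benchmark includes 12 libarchive samples spanning multiple archive formats (tar, zip, cpio). They include heap buffer overflows, null pointer dereferences, and integer overflows that occur during archive extraction and format parsing. To fix them, agents must understand format switching and validation of untrusted input.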
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.