Archive Library in Vulnerability-Agent-Bench — 12 vulnerabilities tested

12 vulnerability samples from an archive library, generating 180 evaluations across 15 agents.

Overview

This archive library is a widely deployed C library for handling archive formats including tar, zip, cpio, and ISO 9660. It ships with FreeBSD, powers the macOS Installer, and is integrated into CMake. Because it parses untrusted binary input across all of these formats, it sits at the boundary between user-supplied files and system code, where format parsing bugs are particularly dangerous.

Benchmark coverage

Vulnerability-Agent-Bench includes 12 vulnerability samples from this archive library. Each sample is run against 15 agent configurations, for a total of 180 individual evaluations. The samples include heap buffer overflows, null pointer dereferences, and integer overflows that occur during archive extraction and format parsing.

Vulnerability classes

Archive library samples cover vulnerability patterns specific to multi-format archive handling:

Heap buffer overflows during decompression when output buffer size calculations are incorrect
Path traversal vulnerabilities in archive extraction where relative paths escape the target directory
Integer overflows in size calculations that lead to undersized buffers being allocated
Null pointer dereferences when malformed archive headers are missing expected fields
Use-after-free in format detection logic when archive handle state is incorrectly managed
Format confusion bugs where polyglot archive files trigger unexpected behavior in format switching
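
The path traversal class listed above can be illustrated with a small lexical check. This is a minimal sketch, not the library's actual extraction logic: the function name entry_path_is_safe is hypothetical, and the check assumes '/' separators and does not account for symlinks already present in the target directory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any real library API): returns true when
 * an archive entry name is a relative path that cannot climb above the
 * extraction root. The check is purely lexical; symlinks are not considered.
 */
bool entry_path_is_safe(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return false;               /* missing or empty name */
    if (name[0] == '/')
        return false;               /* absolute path */

    int depth = 0;                  /* directory depth relative to the root */
    const char *p = name;
    while (*p != '\0') {
        /* Length of the current path component. */
        size_t len = strcspn(p, "/");
        if (len == 2 && p[0] == '.' && p[1] == '.') {
            if (--depth < 0)
                return false;       /* ".." would escape the root */
        } else if (len > 0 && !(len == 1 && p[0] == '.')) {
            depth++;                /* ordinary component */
        }
        p += len;
        while (*p == '/')
            p++;                    /* skip separators, including repeats */
    }
    return true;
}
```

With this sketch, a name such as "a/../b" is accepted because it never leaves the root, while "../etc/passwd" and "/etc/passwd" are rejected. A stricter policy would reject any ".." component outright, at the cost of refusing some unusual but harmless archives.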

Why archive library bugs are interesting for agent evaluation

Archive library vulnerabilities test an agent's ability to reason about multiple related file formats and to parse them defensively. Working in the codebase requires understanding archive specification details and carefully validating untrusted file metadata. Bugs often involve edge cases in format switching, malformed header handling, and memory allocation. Agents must generate fixes that validate input correctly without breaking compatibility with existing archive formats.
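
As a rough illustration of the kind of validation described here, the sketch below checks header fields before a buffer size is derived from them. The struct entry_header layout, its field names, and both functions are hypothetical and are not taken from the library's code; the caller-supplied limit is likewise an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed header; field names are illustrative only. */
struct entry_header {
    const char *name;        /* NULL if the field was absent in the archive */
    uint32_t record_count;   /* untrusted: read directly from the file */
    uint32_t record_size;    /* untrusted: read directly from the file */
};

/* Multiply two sizes, reporting overflow instead of silently wrapping. */
int checked_mul(size_t a, size_t b, size_t *out)
{
    if (a != 0 && b > SIZE_MAX / a)
        return -1;
    *out = a * b;
    return 0;
}

/*
 * Compute the buffer size needed for the records described by `h`.
 * Returns 0 and stores the size in `*out_len` on success. Returns -1 when
 * the header or its name is missing, the product overflows size_t (possible
 * where size_t is 32 bits wide), or the result exceeds `limit`.
 * A zero-length entry is accepted, since empty entries are valid.
 */
int records_size(const struct entry_header *h, size_t limit, size_t *out_len)
{
    size_t total;

    if (h == NULL || h->name == NULL || out_len == NULL)
        return -1;
    if (checked_mul(h->record_count, h->record_size, &total) != 0)
        return -1;
    if (total > limit)
        return -1;
    *out_len = total;
    return 0;
}
```

The sketch rejects only inputs that cannot be handled safely (a missing field, an overflowing product, an oversized request) and leaves unusual but valid cases such as empty entries alone, which is the balance a compatible fix has to strike.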

Archive handling is particularly challenging because the library must remain backward compatible with archives produced by decades of archive creation tools, so fixes cannot simply reject unusual but valid archive structures.

Agent performance on archive library

Per-project performance data is not yet published. The benchmark aggregates results across all evaluated projects. You can view overall agent performance at the full results page and compare agents by pass rate, cost, or other criteria. For details on how agents were evaluated, see the benchmark methodology.

Codebases with similar multi-format handling challenges:

Text Shaping, complex binary structure parsing in font files
Image Codec, image format decoding with bounds-checking requirements
Network Switch, protocol packet parsing with multiple encapsulation formats

Explore more

Full benchmark results
Agent profiles
Methodology
Economics analysis, cost per verified patch

FAQ

What kinds of archive library bugs are in the benchmark?

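The benchmark includes 12 samples from the archive library, spanning multiple archive formats (tar, zip, cpio). They include heap buffer overflows, null pointer dereferences, and integer overflows triggered during archive extraction and format parsing. To fix them, agents must understand format switching and validation of untrusted input.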

Benchmark Results

62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.

Benchmark Methodology

How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.