
harfbuzz in CVE-Agent-Bench — 19 vulnerabilities tested

19 vulnerability samples from harfbuzz (text shaping library), generating 285 evaluations across 15 agents.

Overview

harfbuzz is a Unicode text shaping library used by Chrome, Firefox, Android, and LibreOffice to render complex scripts correctly. Because it parses font data that may come from untrusted sources, it depends on precise memory management and bounds checking, and its use in these products makes it a critical component in millions of software systems. The library handles text and fonts ranging from simple Latin scripts to intricate Indic and other Asian writing systems.

Benchmark coverage

19 vulnerability samples from harfbuzz are included in CVE-Agent-Bench, generating 285 individual evaluations across 15 agent configurations. These samples focus on buffer overflows, heap memory corruption, and out-of-bounds reads that occur during font parsing and glyph shaping operations.

Vulnerability classes

harfbuzz samples cover specific vulnerability patterns common in font processing code:

  • Heap buffer overflows in font table parsing, where a malformed or truncated font file causes reads or writes past the bounds of allocated memory
  • Out-of-bounds reads in glyph shaping operations, triggered by invalid glyph indices or corrupted outline data
  • Integer overflows in size calculations that lead to undersized buffer allocation
  • Use-after-free bugs in font object reference counting during decompression or format conversion
  • Null pointer dereferences when expected font table structures are missing or malformed
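Two of the classes above, integer overflow in a size calculation and an out-of-bounds read through an untrusted glyph index, usually come down to a single missing check. The C sketch below illustrates both checks under stated assumptions: it is not harfbuzz code, and the helper names (`checked_mul_size`, `get_advance`) and the flat array of advance widths are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes count * elem_size into *out, or returns false
   if the product would not fit in size_t. Skipping this check is how a huge
   count read from a font file becomes an undersized allocation. */
static bool checked_mul_size(size_t count, size_t elem_size, size_t *out)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return false; /* product would wrap around */
    *out = count * elem_size;
    return true;
}

/* Hypothetical helper: reads the advance width for a glyph index taken from
   untrusted input. Without the comparison against num_glyphs, an invalid
   index reads past the end of the array. */
static bool get_advance(const uint16_t *advances, size_t num_glyphs,
                        uint32_t glyph_index, uint16_t *out)
{
    if (glyph_index >= num_glyphs)
        return false; /* invalid glyph index: reject instead of reading */
    *out = advances[glyph_index];
    return true;
}
```

Both helpers report failure to the caller instead of aborting, so a parser can reject the offending font table and keep running.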

Why harfbuzz bugs are interesting for agent evaluation

harfbuzz vulnerabilities test an agent's ability to understand complex memory safety issues in font parsing code. The project has intricate data structures for font tables, glyph outlines, and shaping algorithms. Bugs often require domain-specific knowledge of the OpenType font specification and careful handling of variable-length binary data. Agents must balance defensive programming against the performance constraints of a widely used library.
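Handling variable-length binary data largely means validating a count read from the file against the bytes actually available before reading the array it describes. A minimal C sketch, assuming a hypothetical record made of a big-endian uint16 count followed by that many big-endian uint16 values (the same shape as the glyph array in an OpenType Coverage format 1 table, minus its format field). The function names are illustrative and are not part of harfbuzz's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads a big-endian uint16 at `offset`, failing if the
   two bytes are not both inside the buffer. OpenType stores integers big-endian. */
static bool read_u16be(const uint8_t *buf, size_t len, size_t offset, uint16_t *out)
{
    if (offset > len || len - offset < 2)
        return false;
    *out = (uint16_t)((buf[offset] << 8) | buf[offset + 1]);
    return true;
}

/* Hypothetical parser for a uint16 count followed by `count` uint16 values.
   The count comes from the file, so the whole array is checked against the
   buffer length before any element is read. */
static bool parse_u16_array(const uint8_t *buf, size_t len,
                            uint16_t *dst, size_t dst_cap, size_t *out_count)
{
    uint16_t count;
    if (!read_u16be(buf, len, 0, &count))
        return false; /* not even room for the count field */
    /* count is at most 65535, so count * 2 cannot overflow size_t;
       len >= 2 here, so len - 2 cannot underflow. */
    if ((size_t)count * 2 > len - 2)
        return false; /* truncated: the array would run past the buffer */
    if (count > dst_cap)
        return false; /* caller's destination is too small */
    for (size_t i = 0; i < count; i++)
        dst[i] = (uint16_t)((buf[2 + 2 * i] << 8) | buf[3 + 2 * i]);
    *out_count = count;
    return true;
}
```

This sketch rejects the whole record when the buffer is short rather than clamping the count to what fits; how a production parser handles truncated data is a separate design choice.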

The 19 samples in the benchmark represent the types of issues that can lead to remote code execution when processing untrusted font files. A single malformed font embedded in a web page, PDF, or email attachment can trigger memory corruption.

Agent performance on harfbuzz

Per-project performance data is not yet published. The full benchmark results aggregate performance across all projects. You can review how individual agents performed overall at the full results page, where you can sort by pass rate, cost, and other metrics. The methodology behind agent evaluation is documented in the benchmark methodology guide.

Other memory-safety intensive projects in the benchmark include:

  • libarchive, archive format parsing with similar bounds-checking challenges
  • libjxl, image codec with comparable decoding complexity
  • libgit2, binary format parsing with variable-length data handling


FAQ

How do agents perform on harfbuzz vulnerabilities?

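harfbuzz has 19 samples in CVE-Agent-Bench, the largest per-project sample set. Font parsing bugs test domain-specific knowledge that varies across agent models. Per-project pass rates are not yet published; the full benchmark results report each agent's performance across all projects.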



128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.