Text Shaping in Vulnerability-Agent-Bench — 19 vulnerabilities tested

19 vulnerability samples from a text shaping library, generating 285 evaluations across 15 agents.

Overview

This text shaping library is a Unicode text shaping engine used by Chrome, Firefox, Android, and LibreOffice to render complex scripts correctly. Font processing requires precise memory management and bounds checking, and because the library ships in millions of software systems, flaws in it have wide reach. The library handles font files of varying complexity, from simple Latin scripts to intricate Asian and Indic writing systems.

Benchmark coverage

19 vulnerability samples from this text shaping library are included in Vulnerability-Agent-Bench, generating 285 individual evaluations across 15 agent configurations. These samples focus on buffer overflows, heap memory corruption, and out-of-bounds reads that occur during font parsing and glyph shaping operations.

Vulnerability classes

Text shaping samples cover specific vulnerability patterns common in font processing code:

Heap buffer overflows in font table parsing, where malformed or truncated font files exceed allocated memory bounds
Out-of-bounds reads in glyph shaping operations, triggered by invalid glyph indices or corrupted outline data
Integer overflows in size calculations that lead to undersized buffer allocation
Use-after-free bugs in font object reference counting during decompression or format conversion
Null pointer dereferences when expected font table structures are missing or malformed
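
To make the integer overflow class concrete, the sketch below simulates 32-bit unsigned arithmetic in Python. It is illustrative only and is not taken from the library or from any benchmark sample; the function names and the example values are assumptions chosen for the demonstration.

```python
U32_MAX = 0xFFFFFFFF  # largest value a 32-bit unsigned integer can hold


def wrapped_alloc_size(count: int, elem_size: int) -> int:
    """Size a C program would compute if count * elem_size is done in 32 bits."""
    return (count * elem_size) & U32_MAX


def checked_alloc_size(count: int, elem_size: int) -> int:
    """Reject the request when the true product does not fit in 32 bits."""
    if count < 0 or elem_size <= 0:
        raise ValueError("count must be >= 0 and elem_size must be > 0")
    if count > U32_MAX // elem_size:
        raise OverflowError("count * elem_size overflows a 32-bit size")
    return count * elem_size


# Hypothetical record count read from an untrusted font table.
count = 0x10000001
elem_size = 16

# The wrapped product is tiny, so the buffer would be far too small
# for the number of records the parser later tries to write.
print(wrapped_alloc_size(count, elem_size))  # 16

try:
    checked_alloc_size(count, elem_size)
except OverflowError as exc:
    print("rejected:", exc)
```

The checked variant divides before multiplying, so the comparison itself cannot overflow. The same idea carries over to C or C++, where the wrap happens silently unless the code tests for it.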

Why text shaping bugs are interesting for agent evaluation

Text shaping vulnerabilities test an agent's ability to understand complex memory safety issues in font parsing code. The project has intricate data structures for font tables, glyph outlines, and shaping algorithms. Bugs often require domain-specific knowledge of the OpenType font specification and careful handling of variable-length binary data. Agents must balance defensive programming against performance constraints in a widely used library.
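
As an illustration of the variable-length binary data involved, the sketch below reads an OpenType table directory (a 12-byte header followed by 16-byte table records) and validates every offset and length before use. It is a minimal Python sketch and not the library's code; `FontParseError`, `read_table_directory`, and `get_table` are hypothetical names, and the sample bytes are made up for the demonstration.

```python
import struct

HEADER = struct.Struct(">IHHHH")  # sfntVersion, numTables, searchRange, entrySelector, rangeShift
RECORD = struct.Struct(">4sIII")  # tag, checksum, offset, length


class FontParseError(ValueError):
    """Raised when the font data is truncated or internally inconsistent."""


def read_table_directory(data: bytes) -> dict:
    """Return {tag: (offset, length)} after bounds-checking each record."""
    if len(data) < HEADER.size:
        raise FontParseError("truncated header")
    _version, num_tables, _, _, _ = HEADER.unpack_from(data, 0)
    if HEADER.size + num_tables * RECORD.size > len(data):
        raise FontParseError("table directory extends past end of file")
    tables = {}
    for i in range(num_tables):
        tag, _checksum, offset, length = RECORD.unpack_from(
            data, HEADER.size + i * RECORD.size
        )
        # Compare against the remaining bytes so offset + length is never
        # computed in a way that could wrap in a fixed-width language.
        if offset > len(data) or length > len(data) - offset:
            raise FontParseError(f"table {tag!r} lies outside the file")
        tables[tag] = (offset, length)
    return tables


def get_table(data: bytes, tables: dict, tag: bytes):
    """Return the table bytes, or None when the table is absent."""
    entry = tables.get(tag)
    if entry is None:
        return None  # caller must handle the missing table explicitly
    offset, length = entry
    return data[offset:offset + length]


# Made-up font: one 'cmap' table holding four payload bytes at offset 28.
payload = b"\x00\x01\x02\x03"
font = HEADER.pack(0x00010000, 1, 0, 0, 0) + RECORD.pack(b"cmap", 0, 28, 4) + payload

tables = read_table_directory(font)
print(get_table(font, tables, b"cmap") == payload)  # True
print(get_table(font, tables, b"GSUB"))             # None
```

Truncating `font` or enlarging the recorded length makes `read_table_directory` raise instead of reading past the buffer. Returning `None` for an absent table forces the caller to handle that case, which is the situation behind the null pointer dereference class listed above.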

The 19 samples in the benchmark represent the types of issues that can lead to remote code execution when processing untrusted font files. A single malformed font embedded in a web page, PDF, or email attachment can trigger memory corruption.

Agent performance on text shaping

Per-project performance data is not yet published. The full benchmark results aggregate performance across all codebases. You can review how individual agents performed overall at the full results page, where you can sort by pass rate, cost, and other metrics. The methodology behind agent evaluation is documented in the benchmark methodology guide.

Other memory-safety intensive codebases in the benchmark include:

Archive Library, archive format parsing with similar bounds-checking challenges
Image Codec, image decoding with comparable parsing complexity
Git Library, binary format parsing with variable-length data handling

Explore more

Full benchmark results
Agent profiles
Methodology
Economics analysis, cost per verified patch

FAQ

How do agents perform on text shaping vulnerabilities?

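Per-project pass rates are not yet published; the full results aggregate performance across all codebases. The text shaping project contributes 19 samples to Vulnerability-Agent-Bench, the largest per-project sample. Font parsing bugs test domain-specific knowledge that varies across agent models.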

Benchmark Results

62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.

Benchmark Methodology

How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.