[PROJECT]

PGP Library in CVE-Agent-Bench — 5 vulnerabilities tested

Five vulnerability samples from a PGP library, each run against 15 agent configurations, for 75 evaluations in total.

Overview

This PGP library is an OpenPGP implementation in C++ used by Thunderbird for end-to-end email encryption. The library implements the complete OpenPGP specification including key parsing, signature verification, and encryption operations. Cryptographic libraries must combine security correctness with memory safety, as a single bug can compromise confidentiality or authentication.

Benchmark coverage

Five vulnerability samples from this PGP library are included in CVE-Agent-Bench. Each sample is run against 15 agent configurations, giving 75 individual evaluations (5 × 15). The samples include buffer overflows in PGP key parsing, signature verification bypass vulnerabilities, and memory safety issues in cryptographic operations.

Vulnerability classes

PGP library samples cover vulnerability patterns in cryptographic code and key handling:

Heap buffer overflows in PGP key parsing where a computed packet size exceeds the allocated buffer
Integer overflow in packet length handling leading to incorrect buffer boundaries
Signature verification bypass where cryptographic checks are insufficient or skipped
Out-of-bounds reads in key material access during encryption or decryption operations
Use-after-free in key object management where reference counting is incorrect
Memory corruption in bignum arithmetic used for RSA and other asymmetric cryptography
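
The first two patterns in the list above often share a root cause: a length field read from untrusted input is trusted before it is compared with the bytes actually available. As a minimal sketch, not code from the library under test, the following C++ decodes an OpenPGP new-format body length (the one-, two-, and five-octet encodings described in RFC 4880, section 4.2.2) and rejects any value that would run past the input. The names `BodyLength` and `decode_body_length` are hypothetical, and partial body lengths are deliberately left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical result type for this sketch: the decoded body length and the
// number of octets occupied by the length field itself.
struct BodyLength {
    std::uint32_t length;
    std::size_t header_len;
};

// Decode a new-format OpenPGP body length from `data`, of which `avail`
// octets are readable. Returns std::nullopt when the input is truncated,
// uses a partial body length (not handled in this sketch), or declares more
// body octets than remain after the length field.
std::optional<BodyLength> decode_body_length(const std::uint8_t* data,
                                             std::size_t avail) {
    if (data == nullptr || avail == 0) {
        return std::nullopt;
    }

    const std::uint32_t first = data[0];
    BodyLength out{};

    if (first < 192) {
        // One-octet length: 0..191.
        out.length = first;
        out.header_len = 1;
    } else if (first < 224) {
        // Two-octet length: 192..8383.
        if (avail < 2) return std::nullopt;
        out.length = ((first - 192u) << 8) + data[1] + 192u;
        out.header_len = 2;
    } else if (first == 255) {
        // Five-octet length: 0xFF followed by a big-endian 32-bit value.
        if (avail < 5) return std::nullopt;
        out.length = (static_cast<std::uint32_t>(data[1]) << 24) |
                     (static_cast<std::uint32_t>(data[2]) << 16) |
                     (static_cast<std::uint32_t>(data[3]) << 8) |
                     static_cast<std::uint32_t>(data[4]);
        out.header_len = 5;
    } else {
        // 224..254 encodes a partial body length; rejected in this sketch.
        return std::nullopt;
    }

    // Compare by subtraction. `header_len + length` could wrap where size_t
    // is 32 bits wide, while `avail - header_len` cannot underflow because
    // each branch above already ensured avail >= header_len.
    if (out.length > avail - out.header_len) {
        return std::nullopt;
    }
    return out;
}
```

Called on the four-octet buffer `{3, 0xAA, 0xBB, 0xCC}`, the function reports a body length of 3 and a header length of 1. Given only the first three octets of the same buffer, it returns `std::nullopt`, because the declared body would extend past the input.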

Why PGP library bugs are interesting for agent evaluation

PGP library vulnerabilities test an agent's ability to understand cryptographic protocol implementation and security-critical code. The codebase requires understanding of OpenPGP key material handling and signature verification correctness. Bugs often involve subtle issues in signature validation logic or incorrect bounds checking in key parsing. Agents must generate fixes that maintain cryptographic correctness and secure key handling, as mistakes directly impact email encryption reliability.
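
To make the signature validation point concrete, here is a minimal sketch of a fail-closed acceptance check: a signature is accepted only when every individual check has passed, and any other outcome is a rejection. The `SigChecks` fields, the `SigStatus` values, and both function names are hypothetical and do not reflect the library's actual API.

```cpp
#include <cstdint>

// Hypothetical outcome of verifying one signature.
enum class SigStatus : std::uint8_t {
    Valid,
    ParseError,
    UnknownKey,
    BadSignature,
    ExpiredKey
};

// Hypothetical record of the individual checks. The defaults are chosen so
// that a default-constructed value is rejected.
struct SigChecks {
    bool parsed_ok = false;       // signature packet parsed without error
    bool key_found = false;       // issuer key is present in the keyring
    bool digest_matches = false;  // cryptographic verification succeeded
    bool key_expired = true;      // issuer key is past its expiration time
};

// Map the checks to a single status. Every failure returns early, so Valid
// is reachable only when all checks have passed.
SigStatus classify(const SigChecks& c) {
    if (!c.parsed_ok) return SigStatus::ParseError;
    if (!c.key_found) return SigStatus::UnknownKey;
    if (!c.digest_matches) return SigStatus::BadSignature;
    if (c.key_expired) return SigStatus::ExpiredKey;
    return SigStatus::Valid;
}

// Fail closed: anything other than Valid is treated as a rejection.
bool accept_signature(const SigChecks& c) {
    return classify(c) == SigStatus::Valid;
}
```

The design choice here is that callers compare against the single success value instead of enumerating failure cases, so a status added later is rejected by default.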

Email encryption raises the stakes because exploitation is often invisible to the people affected: an attacker who can forge signatures or decrypt messages may do so without the sender's knowledge. This makes OpenPGP implementations a high-stakes target for vulnerability research.

Agent performance on PGP library

Per-project performance data is not yet published. Overall agent performance across all codebases is available at the full results page, where you can sort agents by pass rate and cost. The benchmark methodology explains the evaluation process.

Codebases with similar cryptographic and security-critical implementation requirements:

Git Library, binary format parsing with signature verification
Text Shaping, complex data structure parsing in production systems
Archive Library, untrusted input parsing with security implications

Explore more

Full benchmark results
Agent profiles
Methodology
Economics analysis, cost per verified patch

FAQ

Why include cryptographic libraries in the benchmark?

This PGP library is used in Thunderbird for email encryption, so its bugs have direct real-world impact. The five samples test whether agents can preserve cryptographic correctness when patching security-critical code such as signature verification.

[RELATED TOPICS]

Benchmark Results

62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.

Benchmark Methodology

How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.