rnpgp in CVE-Agent-Bench — 5 vulnerabilities tested
5 vulnerability samples from rnpgp (OpenPGP library), generating 75 evaluations across 15 agents.
Overview
rnpgp is an OpenPGP implementation in C++ used by Thunderbird for end-to-end email encryption. The library implements the OpenPGP specification, including key parsing, signature verification, and encryption operations. A cryptographic library must be both cryptographically correct and memory-safe, because a single bug can compromise confidentiality or authentication.
Benchmark coverage
CVE-Agent-Bench includes 5 vulnerability samples from rnpgp. Each sample is run against 15 agent configurations, for 75 individual evaluations. The samples include buffer overflows in PGP key parsing, signature verification bypass vulnerabilities, and memory safety issues in cryptographic operations.
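The evaluation counts quoted on this page follow from multiplying sample counts by the number of agent configurations. The snippet below is a small arithmetic check of that relationship, using only figures that appear on this page; the variable and function names are illustrative and are not part of the benchmark's tooling.

```python
# Figures taken from this page; names are illustrative only.
AGENT_CONFIGS = 15

SAMPLES_PER_PROJECT = {
    "rnpgp": 5,
    "harfbuzz": 19,
    "libarchive": 12,
    "envoyproxy": 9,
}


def evaluations(sample_count: int, agents: int = AGENT_CONFIGS) -> int:
    """Each sample is evaluated once per agent configuration."""
    return sample_count * agents


per_project = {name: evaluations(n) for name, n in SAMPLES_PER_PROJECT.items()}
total_for_benchmark = evaluations(128)  # 128 CVEs across all projects

print(per_project)
print(total_for_benchmark)
```

For rnpgp this gives 5 × 15 = 75, matching the figure above.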
Vulnerability classes
rnpgp samples cover vulnerability patterns in cryptographic code and key handling:
- Heap buffer overflows in PGP key parsing where packet size calculations exceed allocated memory
- Integer overflow in packet length handling leading to incorrect buffer boundaries
- Signature verification bypass where cryptographic checks are insufficient or skipped
- Out-of-bounds reads in key material access during encryption or decryption operations
- Use-after-free in key object management where reference counting is incorrect
- Memory corruption in bignum arithmetic used for RSA and other asymmetric cryptography
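As an illustration of the packet length class of bugs, here is a minimal sketch of bounds-checked decoding of a new-format OpenPGP body length, following the encoding described in RFC 4880 section 4.2.2. It is not rnpgp's code: `PacketError`, `read_new_format_length`, and `read_body` are hypothetical names, and the sketch is in Python rather than C++, so it shows the checks a parser needs rather than the memory corruption itself.

```python
from typing import Tuple


class PacketError(ValueError):
    """Raised when a header is malformed or points outside the buffer."""


def read_new_format_length(buf: bytes, offset: int) -> Tuple[int, int]:
    """Decode a new-format body length at buf[offset].

    Returns (body_length, length_octets_consumed). Partial body lengths
    are rejected to keep the sketch short.
    """
    if offset < 0 or offset >= len(buf):
        raise PacketError("length octet is outside the buffer")
    first = buf[offset]
    if first < 192:
        # One-octet length: 0..191
        return first, 1
    if first < 224:
        # Two-octet length: 192..8383
        if offset + 2 > len(buf):
            raise PacketError("truncated two-octet length")
        return ((first - 192) << 8) + buf[offset + 1] + 192, 2
    if first == 255:
        # Five-octet length: 0xFF followed by a 4-byte big-endian value
        if offset + 5 > len(buf):
            raise PacketError("truncated five-octet length")
        return int.from_bytes(buf[offset + 1:offset + 5], "big"), 5
    raise PacketError("partial body lengths are not handled in this sketch")


def read_body(buf: bytes, offset: int) -> bytes:
    """Return the packet body, refusing lengths that overrun the input."""
    length, consumed = read_new_format_length(buf, offset)
    start = offset + consumed
    # Compare against the bytes that remain. In C or C++ this form also
    # avoids the wraparound that `start + length > size` can hit.
    if length > len(buf) - start:
        raise PacketError("declared body length exceeds remaining input")
    return buf[start:start + length]
```

The key design choice is that the declared length is never trusted: it is validated against the remaining input before any read. A Python slice would silently truncate instead of failing, so the explicit check is still needed to reject malformed packets.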
Why rnpgp bugs are interesting for agent evaluation
rnpgp vulnerabilities test an agent's ability to reason about a cryptographic protocol implementation and security-critical code. Working in the codebase requires familiarity with OpenPGP key material handling and the rules for correct signature verification. Bugs often involve subtle flaws in signature validation logic or incorrect bounds checking in key parsing. Agents must generate fixes that preserve cryptographic correctness and secure key handling, because mistakes directly affect the reliability of email encryption.
Bugs in email encryption are especially hard to detect locally: an attacker may be able to forge signatures or decrypt messages without the sender's knowledge. This makes OpenPGP implementations some of the highest-stakes targets for vulnerability research.
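One way a validation check can end up "skipped" is a decision function that succeeds when nothing was actually verified. The sketch below illustrates that fail-open pattern and a fail-closed alternative. It is a hypothetical model, not rnpgp's API: `SigStatus`, `naive_is_authenticated`, and `message_is_authenticated` are invented for illustration, and the policy of requiring every signature to be valid is an assumption.

```python
from enum import Enum
from typing import Sequence


class SigStatus(Enum):
    """Hypothetical per-signature verification outcomes."""
    VALID = "valid"
    INVALID = "invalid"
    UNKNOWN_KEY = "unknown_key"
    ERROR = "error"


def naive_is_authenticated(statuses: Sequence[SigStatus]) -> bool:
    # Fail-open pitfall: all() over an empty sequence is True, so a
    # message carrying no signatures at all would be accepted.
    return all(s is SigStatus.VALID for s in statuses)


def message_is_authenticated(statuses: Sequence[SigStatus]) -> bool:
    # Fail closed: no signatures means nothing was verified.
    if not statuses:
        return False
    # Any outcome other than VALID (including ERROR) counts as failure.
    return all(s is SigStatus.VALID for s in statuses)
```

A patch for this kind of bug has to keep accepting correctly signed messages while rejecting the empty and error cases, which is why a fix that only silences a crash is not enough.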
Agent performance on rnpgp
Per-project performance data is not yet published. Overall agent performance across all projects is available at the full results page, where you can sort agents by pass rate and cost. The benchmark methodology explains the evaluation process.
Related projects
Projects with similar security-critical parsing and verification requirements:
- libgit2, binary format parsing with signature verification
- harfbuzz, complex data structure parsing in production systems
- libarchive, untrusted input parsing with security implications
Explore more
- Full benchmark results
- Agent profiles
- Methodology
- Economics analysis, cost per verified patch
FAQ
Why include cryptographic libraries in the benchmark?
Cryptographic libraries are security-critical: rnpgp is used in Thunderbird for email encryption, and its 5 samples test agents on cryptographic correctness and signature verification.
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
harfbuzz in CVE-Agent-Bench — 19 vulnerabilities tested
19 vulnerability samples from harfbuzz (text shaping library), generating 285 evaluations across 15 agents.
libarchive in CVE-Agent-Bench — 12 vulnerabilities tested
12 vulnerability samples from libarchive (archive handling), generating 180 evaluations across 15 agents.
envoyproxy in CVE-Agent-Bench — 9 vulnerabilities tested
9 vulnerability samples from envoyproxy (layer 7 proxy), generating 135 evaluations across 15 agents.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.