OpenThread in CVE-Agent-Bench — 5 vulnerabilities tested
5 vulnerability samples from OpenThread (mesh networking), generating 75 evaluations across 15 agents.
Overview
OpenThread is Google's open-source implementation of the Thread networking protocol. Thread is a low-power mesh networking standard backed by companies including Google, Apple, and Amazon, and it is one of the network transports used by the Matter smart home specification. The implementation handles mesh routing, encryption, and device discovery in resource-constrained IoT environments. IoT devices often cannot be updated easily, so the security of the code they ship with is critical.
Benchmark coverage
CVE-Agent-Bench includes 5 vulnerability samples from OpenThread. Each sample is run against 15 agent configurations, giving 75 individual evaluations (5 × 15). The samples include heap overflows in mesh networking code, stack buffer overflows, and memory corruption in protocol parsing.
Vulnerability classes
OpenThread samples cover vulnerability patterns in embedded network protocol implementation:
- Stack buffer overflows in CoAP message handling where variable-length payloads exceed stack allocation
- Heap buffer overflows in mesh routing where neighbor discovery packets trigger out-of-bounds writes
- Out-of-bounds reads in frame parsing where field offsets are not validated against frame size
- Integer overflow in length calculations leading to undersized stack allocation
- Assertion failures in protocol state machines where unexpected messages cause crashes
- Resource exhaustion where crafted packets trigger excessive memory allocation in constrained environments
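The out-of-bounds read and length-calculation patterns in the list above can be illustrated with a small parser. The following is a minimal sketch, not OpenThread code: the `tlv_view` type, the `tlv_find` function, and the 1-byte type / 1-byte length layout are all hypothetical and chosen only for illustration. It shows the general technique of validating each declared length against the bytes that actually remain in the frame before reading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLV layout, for illustration only:
 * 1-byte type, 1-byte length, then `length` bytes of value. */
typedef struct {
    const uint8_t *value;  /* points into the caller's frame, no copy */
    uint8_t        length; /* number of value bytes */
} tlv_view;

/* Finds the first TLV of the given type.
 * Returns 0 on success, -1 if the type is absent or the frame is malformed. */
static int tlv_find(const uint8_t *frame, size_t frame_len,
                    uint8_t type, tlv_view *out)
{
    /* A NULL output pointer is a programmer error, not attacker input. */
    assert(out != NULL);

    size_t off = 0; /* invariant: off <= frame_len */

    /* Continue only while a full 2-byte header fits in what remains. */
    while (frame_len - off >= 2) {
        uint8_t t         = frame[off];
        uint8_t vlen      = frame[off + 1];
        size_t  remaining = frame_len - off - 2;

        /* Compare against the remaining bytes instead of computing
         * off + 2 + vlen first, so the check itself cannot overflow. */
        if (vlen > remaining) {
            return -1; /* declared length runs past the end of the frame */
        }
        if (t == type) {
            out->value  = frame + off + 2;
            out->length = vlen;
            return 0;
        }
        off += 2 + (size_t)vlen; /* safe: vlen <= remaining */
    }
    return -1; /* not found, or trailing bytes too short for a header */
}
```

A caller would pass the received frame and its actual length, then treat a `-1` result as a reason to drop the frame rather than continue parsing.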
Why OpenThread bugs are interesting for agent evaluation
OpenThread vulnerabilities test an agent's ability to understand embedded networking protocols and memory constraints. The codebase handles complex mesh routing state machines with limited RAM and CPU. Bugs often involve buffer handling in constrained memory environments or incorrect bounds checking in protocol parsing. Agents must generate fixes that close security gaps while fitting within the tight resource constraints of IoT devices.
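As a rough illustration of a fix that closes a bounds gap without adding memory cost, the sketch below appends a payload to a fixed-size buffer and rejects anything that does not fit, rather than growing the buffer on the heap. The `msg_buf` type, `msg_append` function, and the 64-byte `MSG_BUF_SIZE` are assumptions made for this example and do not come from the OpenThread codebase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_BUF_SIZE 64u /* hypothetical fixed capacity, no heap growth */

typedef struct {
    uint8_t  data[MSG_BUF_SIZE];
    uint16_t length; /* bytes currently used in data */
} msg_buf;

/* Appends payload to msg. Returns 0 on success, -1 if it does not fit
 * or if msg is already in an inconsistent state. */
static int msg_append(msg_buf *msg, const uint8_t *payload, uint16_t payload_len)
{
    /* A NULL message is a programmer error, not attacker input. */
    assert(msg != NULL);

    /* Reject a corrupted length so the subtraction below cannot wrap. */
    if (msg->length > MSG_BUF_SIZE) {
        return -1;
    }
    /* Compare against the free space instead of summing the two lengths
     * and storing the result in a narrow type, where it could truncate. */
    if (payload_len > MSG_BUF_SIZE - msg->length) {
        return -1;
    }
    if (payload_len > 0) {
        memcpy(msg->data + msg->length, payload, payload_len);
    }
    msg->length = (uint16_t)(msg->length + payload_len);
    return 0;
}
```

The design choice here is to fail closed: an oversized payload is refused and the buffer is left unchanged, which keeps RAM use constant at the price of dropping the message.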
IoT mesh networks are particularly challenging because devices may be physically inaccessible after deployment, and a single compromised device can attack all of its neighbors in the mesh. This makes the security of the initial implementation exceptionally important.
Agent performance on OpenThread
Per-project performance data is not yet published. Aggregate results across all projects are available at the full results page, where you can compare agents by pass rate and cost. The benchmark methodology documents the evaluation approach.
Related projects
Projects with similar embedded and resource-constrained implementation challenges:
- libarchive, binary format parsing with variable-length data
- openvswitch, network packet processing with protocol parsing
- blosc, high-performance compression under memory constraints
Explore more
- Full benchmark results
- Agent profiles
- Methodology
- Economics analysis, cost per verified patch
FAQ
How does OpenThread relate to CVE-Agent-Bench?
OpenThread powers Matter smart home devices that communicate over Thread. CVE-Agent-Bench includes 5 OpenThread samples, which test agents on embedded systems, resource constraints, and mesh networking protocol correctness.
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
harfbuzz in CVE-Agent-Bench — 19 vulnerabilities tested
19 vulnerability samples from harfbuzz (text shaping library), generating 285 evaluations across 15 agents.
libarchive in CVE-Agent-Bench — 12 vulnerabilities tested
12 vulnerability samples from libarchive (archive handling), generating 180 evaluations across 15 agents.
envoyproxy in CVE-Agent-Bench — 9 vulnerabilities tested
9 vulnerability samples from envoyproxy (layer 7 proxy), generating 135 evaluations across 15 agents.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.