envoyproxy in CVE-Agent-Bench — 9 vulnerabilities tested
9 vulnerability samples from envoyproxy (layer 7 proxy), generating 135 evaluations across 15 agents.
Overview
Envoy is a layer 7 proxy written in C++. It serves as the data plane for service meshes such as Istio and AWS App Mesh. The proxy processes HTTP/2, gRPC, and other protocols, which requires careful handling of streaming data and protocol state machines. Because Envoy sits between clients and backend services, protocol violations can leak data, corrupt streams, or enable injection attacks.
Benchmark coverage
9 vulnerability samples from envoyproxy are included in CVE-Agent-Bench, generating 135 individual evaluations across 15 agent configurations. These samples focus on HTTP/2 frame parsing bugs, header injection vulnerabilities, and connection state handling issues.
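The evaluation count follows from multiplying samples by agent configurations. The short check below reproduces that arithmetic from figures stated on this page; the assumption that every sample is run exactly once per agent configuration is ours, and the function name is illustrative only.

```python
# Figures stated on this page.
ENVOY_SAMPLES = 9
AGENT_CONFIGS = 15
TOTAL_CVES = 128
TOTAL_EVALUATIONS = 1920


def evaluations(samples: int, agents: int) -> int:
    """Assumption: each sample is evaluated once per agent configuration."""
    return samples * agents


envoy_evals = evaluations(ENVOY_SAMPLES, AGENT_CONFIGS)
envoy_share = envoy_evals / TOTAL_EVALUATIONS

print(envoy_evals)           # 135
print(f"{envoy_share:.1%}")  # 7.0%
```

Under that assumption, envoyproxy accounts for roughly 7% of all evaluations in the benchmark.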
Vulnerability classes
envoyproxy samples cover vulnerability patterns in network protocol implementation:
- HTTP/2 frame parsing vulnerabilities where malformed frames trigger memory corruption or assertion failures
- Header injection bugs where insufficient validation allows attackers to inject additional headers or control characters
- Connection state machine violations where out-of-order frames or unexpected state transitions cause incorrect behavior
- Resource exhaustion vulnerabilities where attacker-controlled values trigger excessive memory or CPU consumption
- Stream handling bugs in HTTP/2 multiplexing where frame routing is incorrect
- Flow control bypass vulnerabilities where window size calculations allow exceeding connection limits
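Two of the classes above, header injection and flow control bypass, come down to checks that are easy to state. The following is a minimal sketch in Python, not Envoy's C++ implementation: the function names are hypothetical, and the rules encoded are the HTTP/2 requirements that a field value must not contain NUL, CR, or LF and that a flow-control window must not exceed 2^31-1 octets.

```python
MAX_WINDOW = 2**31 - 1  # largest flow-control window HTTP/2 permits
FORBIDDEN_VALUE_BYTES = frozenset(b"\x00\r\n")  # NUL, CR, LF


def is_safe_header_value(value: bytes) -> bool:
    """Return True if the value contains no NUL, CR, or LF byte."""
    return not any(b in FORBIDDEN_VALUE_BYTES for b in value)


def apply_window_update(current_window: int, increment: int) -> int:
    """Return the new window size, or raise ValueError on a violation."""
    # A zero increment is a protocol error; the field itself is 31 bits wide.
    if not 1 <= increment <= MAX_WINDOW:
        raise ValueError("WINDOW_UPDATE increment must be in 1..2^31-1")
    new_window = current_window + increment
    # Check the sum explicitly so the window can never exceed the limit.
    if new_window > MAX_WINDOW:
        raise ValueError("flow-control window would exceed 2^31-1")
    return new_window


print(is_safe_header_value(b"text/html"))            # True
print(is_safe_header_value(b"ok\r\nX-Injected: 1"))  # False
print(apply_window_update(65535, 1000))              # 66535
```

In a fixed-width integer language such as C++, the window sum needs an overflow-safe comparison rather than a plain addition, which is one place where bugs of this class tend to appear.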
Why envoyproxy bugs are interesting for agent evaluation
envoyproxy vulnerabilities test an agent's ability to understand network protocol implementations and state machines. The proxy code requires a deep understanding of the HTTP/2 specification, connection pooling, and multiplexing. Bugs often involve subtle protocol violations or header validation gaps. Agents must generate fixes that enforce protocol correctness while maintaining performance in high-throughput production environments.
Network proxies are particularly difficult to reason about because bugs can silently corrupt data in transit without any local symptom, so a correct fix requires careful reading of the relevant RFC specifications.
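To make the state machine aspect concrete, here is a deliberately reduced sketch of HTTP/2 stream state tracking on the receiving side. It is an illustration under stated assumptions, not Envoy's codec: it models only HEADERS, DATA, and RST_STREAM from the peer, omits reserved and half-closed (local) states, and collapses the specification's distinct error codes into one hypothetical `ProtocolError`.

```python
from enum import Enum


class StreamState(Enum):
    IDLE = "idle"
    OPEN = "open"
    HALF_CLOSED_REMOTE = "half-closed (remote)"
    CLOSED = "closed"


class ProtocolError(Exception):
    """Raised when a frame is not allowed in the current stream state."""


def on_frame_received(state: StreamState, frame: str,
                      end_stream: bool = False) -> StreamState:
    """Return the next state after the peer sends `frame` on this stream."""
    if frame == "RST_STREAM":
        # RST_STREAM must not be sent for a stream that is still idle.
        if state is StreamState.IDLE:
            raise ProtocolError("RST_STREAM on idle stream")
        return StreamState.CLOSED
    if state is StreamState.IDLE and frame == "HEADERS":
        return StreamState.HALF_CLOSED_REMOTE if end_stream else StreamState.OPEN
    if state is StreamState.OPEN and frame == "DATA":
        return StreamState.HALF_CLOSED_REMOTE if end_stream else StreamState.OPEN
    if state is StreamState.OPEN and frame == "HEADERS" and end_stream:
        # Trailers: a second HEADERS frame that also ends the stream.
        return StreamState.HALF_CLOSED_REMOTE
    # Everything else (e.g. DATA on an idle stream) is rejected.
    raise ProtocolError(f"{frame} not allowed in state {state.value}")


state = on_frame_received(StreamState.IDLE, "HEADERS")
state = on_frame_received(state, "DATA", end_stream=True)
print(state.value)  # half-closed (remote)
```

The final `raise` is the important design choice: any (state, frame) pair not explicitly allowed is refused, so an out-of-order frame fails loudly instead of being processed in the wrong state.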
Agent performance on envoyproxy
Per-project performance data is not yet published. Aggregate results across all projects are available at the full results page, where you can view individual agent pass rates and costs. The benchmark methodology documents the evaluation process in detail.
Related projects
Projects with similar protocol state machine challenges:
- openvswitch, network packet processing with protocol parsing
- Apache, HTTP request handling with protocol compliance requirements
- libgit2, protocol implementation with binary format parsing
Explore more
- Full benchmark results
- Agent profiles
- Methodology
- Economics analysis, cost per verified patch
FAQ
Why is envoyproxy important for agent evaluation?
envoyproxy handles HTTP/2 and gRPC traffic in production. Its 9 samples test agents on network protocol implementation and state machine correctness.
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
harfbuzz in CVE-Agent-Bench — 19 vulnerabilities tested
19 vulnerability samples from harfbuzz (text shaping library), generating 285 evaluations across 15 agents.
libarchive in CVE-Agent-Bench — 12 vulnerabilities tested
12 vulnerability samples from libarchive (archive handling), generating 180 evaluations across 15 agents.
Apache in CVE-Agent-Bench — 7 vulnerabilities tested
7 vulnerability samples from Apache HTTP Server and related projects, generating 105 evaluations across 15 agents.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.