Verified AgentSkills

Supply chain verification for AI agent skills. 36.82% of agent skills have known vulnerabilities.

The problem

AI agents run third-party tools with real permissions, and nobody checks whether those tools are safe. Snyk's ToxicSkills audit found that 36.82% of 3,984 agent skills (1,467) contain known vulnerabilities.

What Verified AgentSkills does

Verified AgentSkills scans third-party skills for vulnerabilities, signs verified skills with a tamper-evident signature, and blocks unverified skills from running in production.
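As a rough illustration of the sign-then-block step, here is a minimal sketch. It is not the Verified AgentSkills implementation. It assumes a skill is shipped as a byte bundle, and it uses an HMAC from the Python standard library in place of the asymmetric signature a real verifier would more likely use. The names SIGNING_KEY, sign_skill, is_verified, and load_skill are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key for illustration only. A real verifier would more likely
# use an asymmetric scheme so that install hosts hold only a public key.
SIGNING_KEY = b"example-key-for-illustration-only"


def sign_skill(bundle: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a hex HMAC-SHA256 tag computed over the skill bundle bytes."""
    return hmac.new(key, bundle, hashlib.sha256).hexdigest()


def is_verified(bundle: bytes, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign_skill(bundle, key)
    return hmac.compare_digest(expected, signature)


def load_skill(bundle: bytes, signature: Optional[str]) -> str:
    """Refuse to load a skill that has no signature or a non-matching one."""
    if signature is None or not is_verified(bundle, signature):
        raise PermissionError("unverified skill blocked")
    return "loaded"
```

Any change to the bundle after signing changes the tag, so the modified skill fails the check and is blocked along with skills that were never signed.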

[COMING SOON]

Supply chain verification for AI agent skills

Verified AgentSkills checks third-party skills before your agents run them: each one is scanned, signed, and independently tested. Learn more about secure skill development.

What agent skills are

Agent skills (also called tools, plugins, or functions) are the integrations that AI coding agents use to accomplish work. A skill might search a codebase, run tests, commit code, call APIs, or integrate with external services. Most agent platforms provide a marketplace where third-party developers publish skills. Your agents can install and call any published skill with a single line of configuration. See third-party risk for why this matters.

Why skill security matters

Skills are code execution. When your agent installs a skill, the agent can run whatever code that skill contains. A malicious or vulnerable skill can exfiltrate secrets, inject backdoors, corrupt your codebase, or open access to your systems. The Snyk ToxicSkills research (February 2026) audited 3,984 real agent skills and found that 36.82% had at least one known vulnerability. 13.4% had critical issues: credential theft, data exfiltration, SQL injection, or remote code execution. These skills have been installed millions of times. See agent attack landscape for real-world attack data.

How XOR verifies skills

Verified AgentSkills applies the same approach XOR uses for code patches: run the skill in an isolated environment, watch what it does, and verify it doesn't steal secrets or corrupt data. We scan for credential leaks, analyze API calls to detect exfiltration, and test with adversarial inputs to trigger injection attacks. Each verified skill gets a signed certificate proving it passed review as of the test date.
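The credential-leak scan mentioned above could look something like the sketch below. It covers only that one step, not the isolated run or the API call analysis, and it is not XOR's scanner. The three patterns and the function name scan_for_secrets are assumptions chosen for illustration; a production rule set would be much larger.

```python
import re

# Illustrative rules only, not a complete or authoritative rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded_password": re.compile(r"(?i)\bpassword\s*=\s*['\"][^'\"]+['\"]"),
}


def scan_for_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A static scan like this only catches secrets written into the skill's source. Secrets read at runtime and sent elsewhere are what the isolated run and API call analysis are meant to catch.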

What this means for your deployments

Instead of reviewing every skill manually or trusting a vendor's word, you can install only verified skills. Each skill carries a link to its test report, timestamp, and scope. You decide which verification standard your team requires. Public package? Free verification. Enterprise? Custom testing based on your risk profile.
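One way a team might enforce its own verification standard is a small policy check at install time. The sketch below is an assumption about how that could work, not a description of the product. The record fields mirror the report link, timestamp, and scope mentioned above, but the class names, field names, and scope labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class VerificationRecord:
    """Hypothetical shape of the metadata a verified skill carries."""
    skill: str
    report_url: str
    tested_at: datetime
    scope: frozenset  # names of the checks the test run covered


@dataclass(frozen=True)
class TeamPolicy:
    """Hypothetical team standard: required checks and maximum report age."""
    required_scope: frozenset
    max_age: timedelta


def is_allowed(record: Optional[VerificationRecord],
               policy: TeamPolicy,
               now: datetime) -> bool:
    """Allow a skill only if it is verified, recent enough, and fully covered."""
    if record is None:
        return False  # no verification record at all
    fresh = now - record.tested_at <= policy.max_age
    covered = policy.required_scope <= record.scope  # subset test
    return fresh and covered
```

Passing now in as an argument keeps the check deterministic. The caller is assumed to use the same kind of timestamp (both naive or both timezone-aware) for now and tested_at.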

Get notified at launch

Enter your email to be notified when Verified AgentSkills is available.

FAQ

What is agent skill verification?

AI agents use third-party skills (MCP servers, plugins, tool packages). 36.82% of these have known vulnerabilities. Verified AgentSkills scans, signs, and checks skills before your agents run them.

Where does the 36.82% figure come from?

Snyk ToxicSkills audit, February 2026. 3,984 agent skills audited across public marketplaces. 1,467 had known vulnerabilities.
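The two counts and the percentage agree with each other. A quick check, using only the numbers quoted above:

```python
audited = 3_984      # skills audited
vulnerable = 1_467   # skills with known vulnerabilities

share = vulnerable / audited * 100
print(f"{share:.2f}%")           # prints 36.82%

# Going the other way: 36.82% of the audited skills, rounded to a whole skill.
print(round(audited * 0.3682))   # prints 1467
```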

When does Verified AgentSkills launch?

Coming soon. Enter your email on the contact page to get notified at launch.
