Skip to main content
[ATTACK SURFACE]

How Agents Get Attacked

20% jailbreak success rate. 42 seconds average. 90% of successful attacks leak data. Threat data grounded in published research.

Agent attack taxonomy

Four primary vectors: prompt injection, tool poisoning, skill supply chain compromise, and protocol exploits. Each vector has distinct detection and mitigation requirements.

Real-world attack data

Pillar Security monitored 2,000+ LLM applications and found a 20% jailbreak success rate with 42-second average time. 90% of successful attacks resulted in sensitive data leakage.

20%
Jailbreak success rate
42s
Average jailbreak time
90%
Successful attacks that leak data

Agent threat surface: real-world data, not theory

Every stat on this page comes from published research. Pillar Security monitored 2,000+ LLM applications. MCPSecBench tested 17 attack types across 4 surfaces. arXiv papers documented multi-agent exploitation at scale. This is not theoretical. See OWASP agentic risks for mapped controls.

Attack taxonomy

Prompt injection

20% success rate across 2,000+ applications. Average time: 42 seconds. 90% of successful attacks result in data leakage (Pillar Security, Oct 2024).

Tool poisoning

36.5% average attack success rate. Manipulated tool descriptions trick agents into executing harmful actions. o1-mini hit 72.8% success (MCPTox, arXiv:2508.14925). Learn more about MCP security.

Skill supply chain

36.82% of 3,984 agent skills contain security flaws. 76 confirmed malicious payloads in public marketplaces (Snyk ToxicSkills). 350% rise in GitHub Actions supply chain attacks in 2025 (StepSecurity). See building secure skills for prevention.

Protocol exploits

17 attack types across 4 MCP surfaces. 85%+ of identified attacks compromise at least one platform (MCPSecBench, arXiv:2508.13220).

Multi-agent amplification

When multiple agents collaborate, a compromised agent can propagate attacks across the system. Research shows 58-90% success rates for arbitrary code execution via multi-agent orchestration systems, with some configurations reaching 100% (arXiv:2503.12188).

Prompt injection on a single agentic coding assistant can compromise the entire supply chain of projects it touches (arXiv:2601.17548).

Defense effectiveness

A meta-analysis of 78 published studies found that attackers with adaptive strategies succeed at 85%+ rates. Most defense mechanisms achieve less than 50% mitigation (arXiv:2506.23260). This gap between attack and defense effectiveness means detection and response are more reliable than prevention alone. XOR's verification pipeline focuses on output validation rather than perfect input protection.

"Prompt injection is defining the AI era". CrowdStrike 2026 Threat Report

What XOR catches

Verification pipeline

Every agent-generated fix is tested against the original vulnerability. Bad patches are rejected before review.

Guardrail review

Inline review comments on risky changes. Uncertainty stop: XOR says when confidence is low instead of guessing.

CI hardening

Actions pinned to SHA. Workflow permissions reduced to least-privilege. Counters the 350% rise in Actions supply chain attacks.

Skill scanning

Agent tools checked against vulnerability databases before execution. Unsigned tools are blocked.

Sources

  • Pillar Security, State of Attacks on GenAI (2024-2025), 2,000+ LLM apps
  • arXiv:2601.17548 — Prompt Injection Attacks on Agentic Coding Assistants
  • arXiv:2503.12188 — Multi-Agent Systems Execute Arbitrary Malicious Code
  • arXiv:2510.23883 — Agentic AI Security: Threats, Defenses, Evaluation
  • arXiv:2506.23260 — Adaptive attack strategies, 78 studies meta-analysis
  • International AI Safety Report 2026. 100+ experts, 30+ countries
  • CrowdStrike 2026 Threat Report, AI threat vectors
  • StepSecurity, GitHub Actions supply chain attacks

[NEXT STEPS]

Related pages

FAQ

How often do jailbreak attacks succeed?

20% of jailbreak attempts succeed with an average time of 42 seconds. 90% of successful attacks result in sensitive data leakage (Pillar Security, 2,000+ LLM applications monitored).

Can multi-agent systems be exploited for code execution?

Yes. Research shows 58-90% success rates for arbitrary code execution via multi-agent orchestration systems, with some configurations reaching 100% (arXiv:2503.12188).

How effective are current defenses?

Most defense mechanisms achieve less than 50% mitigation against adaptive attack strategies. Attackers with budget for multiple attempts succeed at 85%+ rates across 78 published studies (arXiv:2506.23260).

[RELATED TOPICS]

See which agents produce fixes that work

128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.