[STRATEGIES]

Agent Strategies

How different agents approach the same bug. Behavioral clustering reveals that strategy matters as much as model capability.

Strategy clusters

K-means clustering on session features (turns, tool calls, file reads, edits, backtracking) reveals distinct behavioral patterns. Agents cluster by approach, not just by model.

What this means for agent selection

If an agent falls into a low-pass-rate cluster, its behavioral pattern may be the bottleneck - not its model intelligence. Strategy is sometimes more tunable than the model itself.

3 behavioral clusters · 973 sessions analyzed · 60% best cluster pass rate · 10 clustering features

How AI agents approach the same bug differently

K-means clustering on 10 session features (turns, tool calls, file reads, edits, backtracking) reveals 3 distinct behavioral patterns across 973 sessions. Some agents explore broadly. Others edit fast. The pattern predicts the outcome.

Behavior matters because it tells you why agents succeed or fail - not just whether they do. An agent that reads every file in a project before making a single edit uses more tokens than an agent that reads targeted files. An agent that generates many candidate patches and tests them has different cluster characteristics than one that generates a single patch. These patterns emerge from the data.
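The clustering step described above can be sketched in a few lines. This is an illustrative toy, not the report's pipeline: it uses 5 of the 10 listed features (turns, tool calls, file reads, edits, backtracking), synthetic session data around three invented behavioral archetypes, and a plain Lloyd's-algorithm k-means.

```python
import random

random.seed(0)

def make_session(center, spread=2.0):
    # A session is a noisy sample around a behavioral archetype.
    return [max(0.0, c + random.gauss(0, spread)) for c in center]

# Synthetic archetypes (illustrative numbers only):
sessions = (
    [make_session([8, 20, 5, 6, 1]) for _ in range(30)]        # fast, few reads
    + [make_session([30, 90, 40, 10, 12]) for _ in range(30)]  # broad explorer
    + [make_session([15, 35, 10, 4, 2]) for _ in range(30)]    # targeted edits
)

def kmeans(points, init_indices, iters=50):
    centroids = [list(points[i]) for i in init_indices]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2
                                  for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

labels, centroids = kmeans(sessions, init_indices=[0, 30, 60])
```

With well-separated archetypes each synthetic group recovers its own cluster; real session features would first need normalization, since counts like turns and tokens live on very different scales.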

[KEY INSIGHT]

60% pass rate in the best cluster

The cluster with the highest pass rate shares specific behavioral traits. Pass rate correlates with approach - not just model capability.

This is important because it means behavioral tuning can improve outcomes without changing the underlying model. If an agent falls into a low-performing cluster, its behavioral pattern (how it reads, explores, edits) might be the bottleneck. Changing the prompt or agent instructions to shift behavior could improve pass rates more than swapping to a different model.

Cluster composition

Each cluster's size, pass rate, and dominant agents. Larger clusters represent the most common behavioral strategy.

Cluster size reveals what approaches agents naturally converge on. If most agents fall into a single cluster, they are using similar strategies. If agents spread across clusters, they take diverse approaches. Diversity can be good (different strengths on different bug types) or bad (no coherent strategy).
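The convergence-vs-diversity question can be made quantitative. One common measure (an assumption here, not the report's stated methodology) is the normalized entropy of the cluster-size distribution, using the three cluster sizes reported in this section:

```python
import math

# Cluster sizes from the composition in this section:
# speed-runner, explorer, surgical-expert.
sizes = [211, 25, 737]
total = sum(sizes)
shares = [s / total for s in sizes]

# Normalized Shannon entropy: 0.0 means every session is in one cluster
# (full convergence); 1.0 means sessions spread evenly (max diversity).
entropy = -sum(p * math.log(p) for p in shares) / math.log(len(sizes))
```

Here the value lands in the middle of the range: agents mostly converge on the surgical-expert pattern, with a meaningful minority elsewhere.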

Cluster 0: speed-runner — 60% pass rate, 211 sessions
Top agents: claude-claude-opus-4-5 (106), claude-claude-opus-4-6 (105)

Cluster 1: explorer — 32% pass rate, 25 sessions
Top agents: claude-claude-opus-4-6 (17), claude-claude-opus-4-5 (8)

Cluster 2: surgical-expert — 55% pass rate, 737 sessions
Top agents: cursor-gpt-5.3-codex (128), cursor-composer-1.5 (127), cursor-gpt-5.2 (127), cursor-opus-4.6 (126), codex-gpt-5.2-codex (118)

DPO preference pairs

For each CVE sample, we construct preference pairs from the evaluation outcomes.

Gold pairs

Pass vs build-fail. Strongest signal. The winning patch fixes the vulnerability while the losing patch breaks the build.

Silver pairs

Pass vs test-fail. Medium signal. Both patches compile but only one fixes the bug.

Bronze pairs

Test-fail vs build-fail. Weakest signal. Neither fixes the bug, but one at least compiles.
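The three tiers above imply a strict ordering of outcomes (pass > test-fail > build-fail), which makes pair construction mechanical. A minimal sketch, assuming per-patch outcome labels matching the tier descriptions; the function and field names are hypothetical, not the project's actual API:

```python
from itertools import product

# Rank outcomes from best to worst.
RANK = {"pass": 2, "test-fail": 1, "build-fail": 0}

# Tier names keyed by the (winner, loser) outcome combination.
TIER = {
    ("pass", "build-fail"): "gold",
    ("pass", "test-fail"): "silver",
    ("test-fail", "build-fail"): "bronze",
}

def build_pairs(patches):
    """patches: list of (patch_id, outcome) tuples for one CVE sample."""
    pairs = []
    for (id_w, out_w), (id_l, out_l) in product(patches, patches):
        if RANK[out_w] > RANK[out_l]:
            pairs.append({
                "chosen": id_w,
                "rejected": id_l,
                "tier": TIER[(out_w, out_l)],
            })
    return pairs

sample = [("p1", "pass"), ("p2", "test-fail"), ("p3", "build-fail")]
pairs = build_pairs(sample)
# One gold (p1 > p3), one silver (p1 > p2), one bronze (p2 > p3).
```

With more than one patch per outcome, every cross-outcome combination yields a pair, so a sample with several passing and several failing patches produces many pairs at once.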


[NEXT STEPS]

Drill into session-level data

The session analysis page shows per-agent metrics: turns, tool calls, and token usage. The trajectory data lets you trace individual agent decision paths.

Explore more

FAQ

How do agents differ in their approach?

Some agents explore broadly (read many files, backtrack often). Others edit fast (fewer reads, targeted changes). The approach pattern predicts the outcome.

Can I configure agent strategy?

Often yes. System prompts, tool access, and memory settings influence agent behavior. The best-performing cluster shares specific traits: fewer file reads, targeted edits, less backtracking.

[RELATED TOPICS]

See which agents produce fixes that work

128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.