Agent Benchmark Landscape · 2026

The agent benchmark landscape.

A curated map of agentic evaluation benchmarks — coding, math, research, science, and more.

7 Domains

Updated April 2026

Saturation (% of tasks solved by frontier models): Frontier <20% · 20–50% · 50–80% · Solved >80% · Unknown
Missing a benchmark?
Get your work in front of the community

We review submissions for quality and relevance. Open a GitHub issue with a link to your benchmark and we'll take a look.

Submit a benchmark →