Agent Benchmark Landscape · 2026

The agent benchmark landscape.

A curated map of agentic evaluation benchmarks — coding, math, research, science, and more. A benchmark that gets solved has done its job.

7 Domains

Updated April 2026

Benchmark maturity — % of tasks solved by frontier models: Frontier <20% · 20–50% · 50–90% · Solved >90% · — Unknown
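As a rough illustration of how the legend above could be applied, here is a minimal Python sketch that maps a solve rate to a maturity bucket. The function name `saturation_tier` is hypothetical and not taken from this site. The legend names only the "Frontier" and "Solved" tiers, so the two middle buckets are labeled by their ranges (written with a plain hyphen). The legend also does not say which bucket a score of exactly 50% belongs to; the sketch assumes the lower one.

```python
from typing import Optional


def saturation_tier(solved_pct: Optional[float]) -> str:
    """Map a frontier-model solve rate (0-100) to a legend bucket.

    Hypothetical helper: thresholds follow the legend, but the handling
    of exactly 50% is an assumption, since the legend leaves it open.
    """
    if solved_pct is None:
        return "Unknown"  # no reported solve rate
    if not 0 <= solved_pct <= 100:
        raise ValueError(f"solve rate must be in [0, 100], got {solved_pct}")
    if solved_pct < 20:
        return "Frontier"  # legend: <20%
    if solved_pct <= 50:
        return "20-50%"  # assumption: exactly 50% stays in this bucket
    if solved_pct <= 90:
        return "50-90%"  # legend: "Solved" starts strictly above 90%
    return "Solved"  # legend: >90%


if __name__ == "__main__":
    # Illustrative inputs only; these are not real benchmark results.
    for pct in (None, 12.5, 35.0, 72.0, 95.0):
        print(pct, saturation_tier(pct))
```

Using `None` for a missing score keeps "Unknown" distinct from a genuine 0% solve rate, which would fall in the Frontier bucket.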

Missing a benchmark?

Get your work in front of the community

We review submissions for quality and relevance. Open a GitHub issue with a link to your benchmark and we'll take a look.
