AI-Ladder
06/03/2026

Independent model evaluation, provenance, and cost context

Benchmark intelligence for choosing AI models

AI-Ladder turns capability, price, context windows, confidence intervals, and source timestamps into a practical decision surface for model selection.

Live panel

Capability ranking

Top measured models with score evidence.

1502  Claude Opus 4.6
1500  Claude Opus 4.7
1498  Claude Opus 4.6
1492  Claude Opus 4.7
1489  Muse Spark
1488  Gemini 3.1 Pro
1486  gemini 3 pro

Cost context

Average USD per 1M tokens. Lower is better.

0.1  Muse Spark
0.2  Gemini 3.5 Flash
3.1  Gemini 3.1 Pro
3.1  gemini 3 pro
10   gpt 5.5 high
10   gpt 5.4 high
45   Claude Opus 4.7

Coverage map

Evidence across text, code, vision, document, and generation tasks.

10  Text
37  Code
0   Vision
0   Doc
0   Image
0   Video

Compare benchmark slices

Avoid a single opaque score. AI-Ladder separates preference, capability, and product context.

View leaderboard

Build a model shortlist

Move two to four candidates into the comparison sandbox and judge trade-offs with cost and context limits.

Compare models

Inspect the evidence

Every public metric should carry source, version, timestamp, and caveats so rankings remain reviewable.

Read methodology
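
As one way to picture the evidence requirement above, here is a minimal Python sketch of a metric record that carries source, version, timestamp, and caveats. The class name `MetricRecord`, its field names, the `is_reviewable` check, and the example values are all hypothetical illustrations; none of them is AI-Ladder's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class MetricRecord:
    """Hypothetical shape for one published metric and its provenance."""
    model: str
    metric: str
    value: float
    source: str            # where the number came from
    version: str           # version of the benchmark or snapshot
    measured_at: datetime  # when the number was captured
    caveats: Tuple[str, ...] = ()


def is_reviewable(record: MetricRecord) -> bool:
    """True when source and version are non-empty and the timestamp is timezone-aware."""
    return (
        bool(record.source.strip())
        and bool(record.version.strip())
        and record.measured_at.tzinfo is not None
    )


# Placeholder provenance values, for illustration only.
example = MetricRecord(
    model="Claude Opus 4.7",
    metric="capability_score",
    value=1500.0,
    source="example-source",
    version="example-version",
    measured_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
    caveats=("illustrative record",),
)
print(is_reviewable(example))
```

A record with a blank source, a blank version, or a naive timestamp would fail the check, which is the property that keeps a ranking traceable.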

Capability vs Cost

Normalized Arena text, code, and vision scores plotted against average token cost.

The homepage set blends value-frontier models, top-capability models, and provider champions from Kimi, DeepSeek, Xiaomi, and Qwen.
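
The page does not define how the value frontier is computed. One common reading is a Pareto frontier: a model stays on the frontier if no other model is both at least as cheap and at least as capable. The sketch below assumes that reading. The names `ModelPoint` and `value_frontier` are hypothetical, and pairing the scores and prices from the two panels above is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ModelPoint:
    name: str
    score: float       # capability score, higher is better
    usd_per_1m: float  # average USD per 1M tokens, lower is better


def value_frontier(points: Iterable[ModelPoint]) -> List[ModelPoint]:
    """Return the models not beaten on score by any model of equal or lower cost."""
    # Cheapest first; at equal cost, the higher score comes first.
    ordered = sorted(points, key=lambda p: (p.usd_per_1m, -p.score))
    frontier: List[ModelPoint] = []
    best_score = float("-inf")
    for point in ordered:
        # Keep a model only if it improves on every cheaper-or-equal model seen so far.
        if point.score > best_score:
            frontier.append(point)
            best_score = point.score
    return frontier


# Illustrative pairing of figures shown in the panels above.
candidates = [
    ModelPoint("Claude Opus 4.7", 1500, 45),
    ModelPoint("Muse Spark", 1489, 0.1),
    ModelPoint("Gemini 3.1 Pro", 1488, 3.1),
    ModelPoint("gemini 3 pro", 1486, 3.1),
]
print([p.name for p in value_frontier(candidates)])
# prints ['Muse Spark', 'Claude Opus 4.7']
```

Under these figures the two Gemini entries drop out because Muse Spark scores higher at a lower price, while Claude Opus 4.7 stays because nothing cheaper matches its score.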

Model Capabilities Evolution

Historical average Elo rating changes across primary model lineages.
