Mindber Model Index — Methodology
How we rank AI models across quality, speed, and price into one number.
Data sources
Arena.ai — Crowd Elo
Pairwise comparisons by real users across 12 boards spanning Agent, Text, Search, Vision, Document, and Code (WebDev and Image-to-WebDev). A model's position on each board it appears on contributes a quality signal, weighted by the votes behind it.
Artificial Analysis — Objective benchmarks
Provider-agnostic measurements of Intelligence Index, output speed (tokens/s), latency (time to first token, TTFT), and blended price per 1M tokens, updated continuously across 200+ model–provider combinations.
What we exclude
Image and video generation boards (text-to-image, image-edit, text-to-video, image-to-video, video-edit) are excluded from the MMI. These boards rank creative generation models on aesthetic preference, which is a different task class from language reasoning. Mixing their rankings with those of text LLMs would produce a misleading composite.
Formula
Step 1 — Percentile normalization
Every raw metric is converted to a percentile rank within its own pool so that metrics on different scales become comparable. Output speed is higher-is-better and is ranked directly; latency and price are lower-is-better, so their ranks are inverted (lower TTFT = higher percentile; cheaper = higher percentile). Result: all signals live in [0, 1].
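A minimal TypeScript sketch of the percentile step. It assumes a mid-rank convention for ties and maps a single-model pool to 0.5; the methodology does not say how either case is handled, and `percentileRanks` with its `lowerIsBetter` flag are hypothetical names.

```typescript
// Percentile rank of each value within its pool, mapped to [0, 1].
// Assumptions (not specified by the methodology): ties share the
// midpoint rank, and a single-model pool maps to 0.5.
function percentileRanks(values: number[], lowerIsBetter = false): number[] {
  const n = values.length;
  if (n === 1) return [0.5];
  return values.map((v) => {
    let worse = 0; // other values that v beats
    let tied = 0; // values equal to v, including v itself
    for (const other of values) {
      if (other === v) tied++;
      else if (lowerIsBetter ? other > v : other < v) worse++;
    }
    // Count each other tie as half a win, then scale by the n - 1 rivals.
    return (worse + 0.5 * (tied - 1)) / (n - 1);
  });
}

// Output speed (t/s): higher is better, ranked directly.
const speedPct = percentileRanks([180, 95, 240]); // → [0.5, 0, 1]
// Blended price ($/1M tokens): lower is better, so the rank is inverted.
const pricePct = percentileRanks([3.0, 0.5, 15.0], true); // → [0.5, 1, 0]
```

The pairwise loop is O(n²) but keeps the tie rule explicit; a sort-based version would be preferable for very large pools.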
Step 2 — Quality fusion (Q)
Arena and Artificial Analysis (AA) each produce a quality estimate. Each estimate is weighted by a confidence score before the two are fused:
cA = totalVotes / (totalVotes + 2000) // Arena confidence; saturates at ≈1 for high-vote models
cAA = 1 if model appears in AA data, else 0
Q = (cA × qArena + cAA × qAA) / (cA + cAA)
qArena is the vote-weighted mean of the model's per-board Arena percentiles. qAA is its AA Intelligence Index percentile. Models with no quality signal from either source (cA + cAA = 0, which would leave Q undefined) are excluded.
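The fusion step can be sketched as follows. The `BoardEntry` shape, the `fuseQuality` name, and the assumption that `totalVotes` is the sum of a model's votes across all included boards are illustrative, not taken from Mindber's code.

```typescript
// Hypothetical input shape: one entry per included Arena board the model is on.
interface BoardEntry {
  percentile: number; // the model's Step 1 percentile on that board
  votes: number; // votes the model received on that board
}

const ARENA_PRIOR_VOTES = 2000; // constant from cA = totalVotes / (totalVotes + 2000)

// Returns null when neither source has a signal (the model is excluded).
function fuseQuality(
  boards: BoardEntry[],
  aaPercentile: number | null, // AA Intelligence Index percentile, or null if absent
): { Q: number; cA: number } | null {
  // Assumption: totalVotes sums the model's votes over all included boards.
  const totalVotes = boards.reduce((sum, b) => sum + b.votes, 0);
  const cA = totalVotes / (totalVotes + ARENA_PRIOR_VOTES); // 0 when there are no votes
  // Vote-weighted mean of per-board percentiles.
  const qArena =
    totalVotes > 0
      ? boards.reduce((sum, b) => sum + b.votes * b.percentile, 0) / totalVotes
      : 0;
  const cAA = aaPercentile === null ? 0 : 1;
  const qAA = aaPercentile ?? 0;

  if (cA + cAA === 0) return null;
  return { Q: (cA * qArena + cAA * qAA) / (cA + cAA), cA };
}

// 8,000 votes gives cA = 0.8 and qArena = 0.75, so Q = (0.8 × 0.75 + 0.7) / 1.8 ≈ 0.722
const example = fuseQuality(
  [{ percentile: 0.8, votes: 6000 }, { percentile: 0.6, votes: 2000 }],
  0.7,
);
```

Because cAA is either 0 or 1 while cA only approaches 1, an AA-listed model with few Arena votes is scored almost entirely by its AA percentile.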
Step 3 — Composite score (raw)
Efficiency metrics (speed, latency, price) are blended with Q using the Overall preset weights. Missing metrics are dropped and remaining weights are renormalized, so a model with no price data isn't penalized.
raw = (W_q×Q + Σ W_m×m) / (W_q + Σ W_m)   // both sums run only over the efficiency metrics m ∈ {spd, lat, prc} present for this model
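A sketch of Step 3 using the Overall weights. The field names `spd`, `lat`, and `prc` follow the formula; the `Weights` and `SubScores` types and the `rawComposite` function are hypothetical.

```typescript
// Preset weights as fractions; these are the Overall row of the preset table.
interface Weights { q: number; spd: number; lat: number; prc: number }
const OVERALL: Weights = { q: 0.6, spd: 0.12, lat: 0.08, prc: 0.2 };

// Q plus Step 1 percentiles; an undefined field means the metric is missing.
interface SubScores { Q: number; spd?: number; lat?: number; prc?: number }

function rawComposite(s: SubScores, w: Weights = OVERALL): number {
  let num = w.q * s.Q;
  let den = w.q;
  for (const k of ["spd", "lat", "prc"] as const) {
    const v = s[k];
    if (v === undefined) continue; // dropped from numerator and denominator
    num += w[k] * v;
    den += w[k];
  }
  return num / den; // dividing by den renormalizes the present weights
}

rawComposite({ Q: 0.8, spd: 0.5, lat: 0.5, prc: 1 }); // ≈ 0.78
rawComposite({ Q: 0.8, spd: 0.5, lat: 0.5 }); // ≈ 0.725 (0.58 / 0.8)
```

In the second call the price weight (0.20) is removed from both sums, so the remaining weights are effectively scaled by 1/0.8 instead of scoring the missing price as zero.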
Step 4 — Coverage shrinkage
Models with thin data (few votes, only one source) are shrunk toward the median raw score of the pool (median_raw) so that a handful of votes cannot catapult an obscure model to the top.
presence = ((hasArena ? 1 : 0) + (hasAA ? 1 : 0)) / 2
coverage = presence × (0.5 + 0.5 × cA) // clamped [0, 1]
MMI = 100 × (raw × coverage + median_raw × (1 − coverage))
A model with millions of votes and AA data has coverage ≈ 1, so its MMI is approximately its raw score × 100. A model with 50 votes and no AA data has cA ≈ 0.024 and coverage ≈ 0.5 × 0.512 ≈ 0.26, so roughly three quarters of its MMI comes from the pack median.
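Step 4 can be sketched as below. Parenthesizing each ternary in `presence` matters: under JavaScript/TypeScript precedence, `hasArena ? 1 : 0 + hasAA ? 1 : 0` parses as a nested conditional whose value never exceeds 1, which would cap presence at 0.5. The function names and the sample pool of raw scores are hypothetical.

```typescript
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b); // numeric sort on a copy
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function coverage(hasArena: boolean, hasAA: boolean, cA: number): number {
  // Each ternary is parenthesized so the two indicators are added.
  const presence = ((hasArena ? 1 : 0) + (hasAA ? 1 : 0)) / 2;
  const c = presence * (0.5 + 0.5 * cA);
  return Math.min(1, Math.max(0, c)); // clamp to [0, 1]
}

// Blend the model's raw score with the pool median, then scale to 0-100.
function mmi(raw: number, cov: number, medianRaw: number): number {
  return 100 * (raw * cov + medianRaw * (1 - cov));
}

// Hypothetical pool of raw scores and a thin-data model (50 votes, Arena only).
const medianRaw = median([0.42, 0.5, 0.61, 0.47, 0.55]); // 0.5
const cov = coverage(true, false, 50 / (50 + 2000)); // ≈ 0.256
const score = mmi(0.9, cov, medianRaw); // ≈ 60.2 rather than 90
```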
Rank by presets
The Rank by toggle on the Overall board re-weights and re-ranks client-side without a new data fetch. Sub-scores (quality, speed, latency, price) are shipped with each row for this purpose.
| Preset | Quality (Q) | Speed | Latency | Price | Use case |
|---|---|---|---|---|---|
| Overall | 60% | 12% | 8% | 20% | Default — balanced across all signals |
| Frontier | 90% | 3% | 3% | 4% | Quality-dominant; speed, latency, and price carry only 10% combined |
| Value | 45% | 10% | 5% | 40% | Intelligence per dollar |
| Speed | 45% | 25% | 20% | 10% | Latency-sensitive applications |
Weights are renormalized over present metrics per model — missing efficiency data does not count against a model.
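Because each row ships with its sub-scores, switching presets only requires re-scoring and re-sorting in the browser. A sketch, assuming rows hold Q and the Step 1 percentiles under the hypothetical field names below. It ranks by the preset-weighted raw composite and does not re-apply Step 4 shrinkage, since the methodology does not say whether the live board does.

```typescript
type Preset = "Overall" | "Frontier" | "Value" | "Speed";
interface Weights { q: number; spd: number; lat: number; prc: number }

// The preset table, expressed as fractions.
const PRESETS: Record<Preset, Weights> = {
  Overall: { q: 0.6, spd: 0.12, lat: 0.08, prc: 0.2 },
  Frontier: { q: 0.9, spd: 0.03, lat: 0.03, prc: 0.04 },
  Value: { q: 0.45, spd: 0.1, lat: 0.05, prc: 0.4 },
  Speed: { q: 0.45, spd: 0.25, lat: 0.2, prc: 0.1 },
};

// Hypothetical row shape: the sub-scores shipped with each row.
interface Row { name: string; Q: number; spd?: number; lat?: number; prc?: number }

function presetScore(r: Row, w: Weights): number {
  let num = w.q * r.Q;
  let den = w.q;
  for (const k of ["spd", "lat", "prc"] as const) {
    const v = r[k];
    if (v === undefined) continue; // missing metric: weight renormalized away
    num += w[k] * v;
    den += w[k];
  }
  return num / den;
}

// Returns a new array sorted best-first; the input rows are left untouched.
function rankBy(rows: Row[], preset: Preset): Row[] {
  const w = PRESETS[preset];
  return [...rows].sort((a, b) => presetScore(b, w) - presetScore(a, w));
}

const rows: Row[] = [
  { name: "A", Q: 0.95, spd: 0.2, lat: 0.2, prc: 0.1 }, // strong but slow and pricey
  { name: "B", Q: 0.7, spd: 0.9, lat: 0.9, prc: 0.95 }, // weaker but fast and cheap
];
rankBy(rows, "Frontier").map((r) => r.name); // ["A", "B"]
rankBy(rows, "Value").map((r) => r.name); // ["B", "A"]
```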
Attribution
- Arena.ai — crowd preference data, Agent Arena methodology, pairwise comparison infrastructure
- Artificial Analysis — objective intelligence benchmarks, speed, latency, and pricing data
Mindber does not claim ownership of source data. MMI is a derived composite, computed on read from publicly available leaderboard snapshots. Last methodology revision: 2026-06-14.