Mindber Model Index — Methodology

How we rank AI models by combining quality, speed, latency, and price into one number.

Data sources

Arena.ai — Crowd ELO

Pairwise comparisons by real users across 12 boards spanning Agent, Text, Search, Vision, Document, and Code (WebDev + Image-to-WebDev). Each model's position on each board contributes a vote-weighted quality signal.

Artificial Analysis — Objective benchmarks

Provider-agnostic measurements of Intelligence Index, output speed (t/s), latency (TTFT), and blended price per 1M tokens — updated continuously across 200+ model–provider combinations.

What we exclude

Image and video generation boards (text-to-image, image-edit, text-to-video, image-to-video, video-edit) are excluded from the MMI. These boards rank creative generation models on aesthetic preference — a different task class from language reasoning. Mixing them with text LLMs produces a misleading composite.

Formula

Step 1 — Percentile normalization

Every raw metric is converted to a percentile rank within its own pool so different scales are comparable. Output speed (t/s) is higher-is-better and is ranked directly; latency and price are lower-is-better and are inverted, so faster, lower-latency, and cheaper models all receive higher percentiles. Result: all signals live in [0, 1].
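A minimal sketch of this step, assuming a simple rank-based percentile in which ties share their mean rank and a pool of one maps to 0.5. The exact percentile definition used for MMI is not stated here, and the function name `percentile_ranks` and the sample values are hypothetical.

```python
def percentile_ranks(values, lower_is_better=False):
    """Map raw metric values to percentile ranks in [0, 1] within their pool.

    Assumption: ties share the mean of their rank positions; the best value
    in the pool maps to 1.0 and the worst to 0.0.
    """
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [0.5]
    # Flip the sign so that "larger is better" holds for every metric.
    keyed = [-v if lower_is_better else v for v in values]
    order = sorted(range(n), key=lambda i: keyed[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and keyed[order[j + 1]] == keyed[order[i]]:
            j += 1
        mean_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank / (n - 1)
        i = j + 1
    return ranks


speed_tps = [40.0, 120.0, 80.0]   # tokens per second
price_usd = [10.0, 0.5, 3.0]      # blended price per 1M tokens

speed_pct = percentile_ranks(speed_tps)
price_pct = percentile_ranks(price_usd, lower_is_better=True)
```

In this toy pool the second model is both the fastest and the cheapest, so it receives 1.0 on both signals.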

Step 2 — Quality fusion (Q)

Arena and AA each produce a quality estimate. Confidence in each source is weighted before fusing:

cA = totalVotes / (totalVotes + 2000) // Arena confidence; saturates at ≈1 for high-vote models

cAA = 1 if model appears in AA data, else 0

Q = (cA × qArena + cAA × qAA) / (cA + cAA)

qArena is the vote-weighted mean of the model's per-board Arena percentiles. qAA is its AA Intelligence Index percentile. Models with no quality signal from either source are excluded.
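The fusion above can be sketched as follows. This is an illustration under stated assumptions rather than the production code: the function names are hypothetical, and a missing source is represented by `None`.

```python
ARENA_VOTE_PRIOR = 2000  # the constant in the cA formula


def arena_confidence(total_votes):
    """cA = totalVotes / (totalVotes + 2000)."""
    return total_votes / (total_votes + ARENA_VOTE_PRIOR)


def fuse_quality(q_arena, total_votes, q_aa):
    """Return Q, or None when neither source provides a quality signal.

    q_arena and q_aa are percentiles in [0, 1]; pass None for a missing source.
    """
    c_a = arena_confidence(total_votes) if q_arena is not None else 0.0
    c_aa = 1.0 if q_aa is not None else 0.0
    if c_a + c_aa == 0:
        return None  # no quality signal: the model is excluded
    return (c_a * (q_arena or 0.0) + c_aa * (q_aa or 0.0)) / (c_a + c_aa)


# 2000 votes gives cA = 0.5, so AA counts twice as much as Arena here.
q_both = fuse_quality(q_arena=0.9, total_votes=2000, q_aa=0.6)
# With a single source, Q is simply that source's percentile.
q_arena_only = fuse_quality(q_arena=0.9, total_votes=50, q_aa=None)
```

With 2000 votes the fused value is (0.5 × 0.9 + 1 × 0.6) / 1.5 = 0.7. When only one source is present its confidence cancels out of the ratio, which is why thin coverage is handled separately in Step 4.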

Step 3 — Composite score (raw)

Efficiency metrics (speed, latency, price) are blended with Q using the Overall preset weights. Missing metrics are dropped and remaining weights are renormalized, so a model with no price data isn't penalized.

raw = (W_q×Q + W_s×spd + W_l×lat + W_p×prc) / (W_q + W_s + W_l + W_p) // terms for missing metrics are dropped from numerator and denominator alike
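A short sketch of the renormalization, using the Overall weights from the preset table. The dictionary keys and the function name `composite_raw` are hypothetical; a missing metric is passed as `None`.

```python
OVERALL_WEIGHTS = {"quality": 0.60, "speed": 0.12, "latency": 0.08, "price": 0.20}


def composite_raw(q, efficiency, weights=OVERALL_WEIGHTS):
    """Blend Q with whichever efficiency percentiles are present.

    efficiency maps "speed", "latency", "price" to a percentile or None.
    """
    num = weights["quality"] * q
    den = weights["quality"]
    for name in ("speed", "latency", "price"):
        value = efficiency.get(name)
        if value is None:
            continue  # missing metric: its weight leaves both sums
        num += weights[name] * value
        den += weights[name]
    return num / den


full = composite_raw(0.8, {"speed": 0.5, "latency": 0.5, "price": 1.0})
no_price = composite_raw(0.8, {"speed": 0.5, "latency": 0.5, "price": None})
quality_only = composite_raw(0.8, {})
```

With no price data the denominator falls from 1.0 to 0.8, so the score is 0.58 / 0.8 = 0.725 rather than 0.58. A model with no efficiency data at all scores exactly its Q.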

Step 4 — Coverage shrinkage

Models with thin data (few votes, only one source) are shrunk toward the median to prevent a handful of votes from catapulting an obscure model to the top.

presence = ((hasArena ? 1 : 0) + (hasAA ? 1 : 0)) / 2

coverage = presence × (0.5 + 0.5 × cA) // clamped [0, 1]

MMI = 100 × (raw × coverage + median_raw × (1 − coverage))

A model with millions of votes and AA data has coverage ≈ 1, so its MMI is approximately its raw score × 100. A model with 50 votes and no AA data has cA ≈ 0.024 and presence = 0.5, giving coverage ≈ 0.26, so roughly three quarters of its MMI comes from the pack median.
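The shrinkage step can be worked through numerically. The sketch below follows the three formulas in this step; the function names and the sample raw and median values are hypothetical.

```python
def arena_confidence(total_votes):
    """cA = totalVotes / (totalVotes + 2000)."""
    return total_votes / (total_votes + 2000)


def coverage(total_votes, has_arena, has_aa):
    presence = ((1 if has_arena else 0) + (1 if has_aa else 0)) / 2
    c_a = arena_confidence(total_votes) if has_arena else 0.0
    value = presence * (0.5 + 0.5 * c_a)
    return min(1.0, max(0.0, value))  # clamp to [0, 1]


def mmi(raw, median_raw, total_votes, has_arena, has_aa):
    cov = coverage(total_votes, has_arena, has_aa)
    return 100 * (raw * cov + median_raw * (1 - cov))


# Same raw score (0.9) against a pool median of 0.5, with different coverage.
thin = mmi(0.9, 0.5, total_votes=50, has_arena=True, has_aa=False)
well_covered = mmi(0.9, 0.5, total_votes=2_000_000, has_arena=True, has_aa=True)
```

Under these assumed inputs the thinly covered model lands near 60 while the well-covered one stays just under 90, even though both have the same raw score.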

Rank by presets

The Rank by toggle on the Overall board re-weights and re-ranks client-side without a new data fetch. Sub-scores (quality, speed, latency, price) are shipped with each row for this purpose.

Preset	Quality (Q)	Speed	Latency	Price	Use case
Overall	60%	12%	8%	20%	Default — balanced across all signals
Frontier	90%	3%	3%	4%	Quality-dominant; cost and speed carry minimal weight
Value	45%	10%	5%	40%	Intelligence per dollar
Speed	45%	25%	20%	10%	Latency-sensitive applications

Weights are renormalized over present metrics per model — missing efficiency data does not count against a model.
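To illustrate how a preset changes the ordering, here is a small sketch that re-scores rows from their shipped sub-scores. It is written in Python for brevity, although the real toggle runs client-side. The row field names and sample rows are hypothetical, and the sketch assumes every row has a quality sub-score. It computes only the weighted composite; whether coverage shrinkage is reapplied after re-weighting is not stated here.

```python
PRESETS = {
    "Overall":  {"quality": 0.60, "speed": 0.12, "latency": 0.08, "price": 0.20},
    "Frontier": {"quality": 0.90, "speed": 0.03, "latency": 0.03, "price": 0.04},
    "Value":    {"quality": 0.45, "speed": 0.10, "latency": 0.05, "price": 0.40},
    "Speed":    {"quality": 0.45, "speed": 0.25, "latency": 0.20, "price": 0.10},
}


def preset_score(row, preset):
    """Weighted mean of the row's present sub-scores under one preset."""
    num = den = 0.0
    for metric, weight in PRESETS[preset].items():
        value = row.get(metric)
        if value is None:
            continue  # renormalize over present metrics only
        num += weight * value
        den += weight
    return num / den


def rerank(rows, preset):
    """Return rows sorted best-first under the given preset."""
    return sorted(rows, key=lambda r: preset_score(r, preset), reverse=True)


rows = [
    {"model": "model-a", "quality": 0.95, "speed": 0.30, "latency": 0.40, "price": 0.10},
    {"model": "model-b", "quality": 0.70, "speed": 0.80, "latency": 0.70, "price": 0.95},
    {"model": "model-c", "quality": 0.85, "speed": 0.60, "latency": None, "price": None},
]

frontier_order = [r["model"] for r in rerank(rows, "Frontier")]
value_order = [r["model"] for r in rerank(rows, "Value")]
```

In this toy data the high-quality but expensive model-a leads under Frontier and drops to last under Value, while model-c is scored only on the two sub-scores it has.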

Attribution

Arena.ai — crowd preference data, Agent Arena methodology, pairwise comparison infrastructure
Artificial Analysis — objective intelligence benchmarks, speed, latency, and pricing data

Mindber does not claim ownership of source data. MMI is a derived composite, computed on read from publicly available leaderboard snapshots. Last methodology revision: 2026-06-14.
