Output Receipts

Pick a prompt to see how tools respond side by side, with token cost, duration, and a judge score. Every output is auditable, labelled by method, and cached for fair comparison.

analyze

Analyze a churn data snippet and propose 3 hypotheses
Expected: 300-500 words
Hypotheses are distinct, plausible, and testable with clear metrics.
Review a small PR diff
Expected: 200-400 words
Catches the validation removal and the injection risk; severity ordering is sane.

blog

1000-word blog intro on AI coding assistants for small teams
Expected: ~1000 words
Balanced comparison, includes a real table, identifies trade-offs, avoids vendor-speak.
800-word blog intro on sustainable fashion
Expected: ~800 words
Strong hook, scannable structure, specific stats, clear article preview, no filler.

code

email

marketing

SaaS landing page hero (headline + subhead + 2 CTAs)
Expected: 150-300 words
Variants are distinct, headlines are punchy, CTAs match the angle.

social

LinkedIn post for a product launch
Expected: 180-220 words
Hook works, concrete rather than buzzword-laden, quote feels real, post is scannable.

summarize

translate