Output Receipts
Pick a prompt to see how tools respond side by side, with token cost, duration, and a judge score for each. Every output is auditable, labelled by method, and cached for fair comparison.
analyze
blog
- 1000-word blog intro on AI coding assistants for small teams. Expected: ~1000 words
Balanced comparison, includes a real table, identifies trade-offs, avoids vendor-speak.
- 800-word blog intro on sustainable fashion. Expected: ~800 words
Strong hook, scannable structure, specific stats, clear article preview, no filler.