Python CSV dedupe with fuzzy matching
Category: code
Prompt
Write a Python 3.11 script that reads a CSV, deduplicates rows by a configurable column using rapidfuzz (threshold >=92), and writes a cleaned CSV plus a report of merges. Handle 1M+ rows efficiently with streaming.
Rubric
Streams input, correct rapidfuzz usage, outputs merge report, reasonable complexity.
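A minimal sketch of one way the prompt and rubric above could be satisfied, not a reference answer. The names dedupe_csv and best_match, the report column names, and the lowercase/strip normalization are all assumptions made for illustration. The difflib fallback is also an assumption, included only so the sketch runs where rapidfuzz is not installed.

```python
import csv

try:
    from rapidfuzz import fuzz, process

    def best_match(key, kept, threshold):
        """Return (index, score) of the best kept key scoring >= threshold, else None."""
        if not kept:
            return None
        # extractOne returns (choice, score, index), or None below score_cutoff.
        hit = process.extractOne(key, kept, scorer=fuzz.ratio, score_cutoff=threshold)
        return None if hit is None else (hit[2], hit[1])

except ImportError:
    # Assumption: stdlib fallback so the sketch still runs without rapidfuzz.
    from difflib import SequenceMatcher

    def best_match(key, kept, threshold):
        """Return (index, score) of the best kept key scoring >= threshold, else None."""
        best = None
        for i, cand in enumerate(kept):
            score = 100.0 * SequenceMatcher(None, key, cand).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (i, score)
        return best


def dedupe_csv(src, dst, report, column, threshold=92.0):
    """Stream src row by row; write first-seen rows to dst and merges to report.

    Returns the number of rows dropped as fuzzy duplicates.
    """
    kept_keys = []  # normalized key of every row written to dst
    kept_rows = []  # 1-based input row number for each kept key
    merged = 0
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout, \
         open(report, "w", newline="", encoding="utf-8") as frep:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        rep = csv.writer(frep)
        rep.writerow(["dropped_row", "dropped_value", "kept_row", "kept_value", "score"])
        for row_no, row in enumerate(reader, start=1):
            # Hypothetical normalization: trim whitespace and ignore case.
            key = row[column].strip().lower()
            hit = best_match(key, kept_keys, threshold)
            if hit is None:
                kept_keys.append(key)
                kept_rows.append(row_no)
                writer.writerow(row)
            else:
                idx, score = hit
                rep.writerow([row_no, row[column], kept_rows[idx],
                              kept_keys[idx], round(score, 1)])
                merged += 1
    return merged
```

Rows are read and written one at a time, so only the normalized keys of kept rows (plus their row numbers) stay in memory. Each incoming row is still compared against every kept key, which makes the cost grow with rows times distinct values. For 1M+ rows with many distinct values, some form of blocking (for example, comparing only within groups that share a prefix) would likely be needed, at the price of possibly missing matches across groups.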