
Python CSV dedupe with fuzzy matching

Category: code

Prompt
Write a Python 3.11 script that reads a CSV, deduplicates rows by a configurable column using rapidfuzz (similarity threshold >= 92), and writes a cleaned CSV plus a report of merges. Handle 1M+ rows efficiently with streaming.
Rubric

The script streams its input, uses rapidfuzz correctly, outputs a merge report, and has reasonable complexity.
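
As a rough illustration of what a response meeting this rubric might look like, here is a minimal sketch. It is not a reference answer. The function name `dedupe_csv`, the keep-first-row policy, the lowercase/whitespace normalization, and the blocking on the first character of the normalized value are all assumptions made for the example; none of them come from the prompt. The `difflib` fallback is also an assumption, included only so the sketch runs where rapidfuzz is not installed.

```python
import csv
from collections import defaultdict

try:
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        # rapidfuzz's fuzz.ratio returns a score between 0 and 100.
        return fuzz.ratio(a, b)
except ImportError:
    # Assumption: stdlib fallback so the sketch still runs without rapidfuzz.
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        return 100.0 * SequenceMatcher(None, a, b).ratio()


def dedupe_csv(src, dst, report, column, threshold=92.0):
    """Stream src row by row, keeping the first row of each fuzzy-duplicate group.

    Only the key values of kept rows are held in memory, never the whole file.
    Returns (rows_kept, rows_merged).
    """
    # Hypothetical blocking scheme: compare only against kept values that
    # share the same first character, to avoid an all-pairs comparison.
    blocks = defaultdict(list)  # first char -> [(normalized, raw), ...]
    kept = merged = 0
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout, \
         open(report, "w", newline="", encoding="utf-8") as frep:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        rep = csv.writer(frep)
        rep.writerow(["dropped_value", "kept_value", "score"])
        for row in reader:
            raw = row[column] or ""
            norm = " ".join(raw.lower().split())
            block = blocks[norm[:1]]
            match = None
            for seen_norm, seen_raw in block:
                score = similarity(norm, seen_norm)
                if score >= threshold:
                    match = (seen_raw, score)
                    break
            if match is None:
                # First time we see this value: keep the row.
                block.append((norm, raw))
                writer.writerow(row)
                kept += 1
            else:
                # Fuzzy duplicate: drop the row and log the merge.
                rep.writerow([raw, match[0], f"{match[1]:.1f}"])
                merged += 1
    return kept, merged
```

A call such as `dedupe_csv("in.csv", "clean.csv", "merges.csv", column="name")` would write the cleaned file and the merge report in one pass. The first-character blocking keeps the comparison count manageable but misses duplicates whose first characters differ. A fuller answer might use a stronger blocking key or a rapidfuzz batch helper instead.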
