Dedupe massive wordlists without changing order

"The duplicut tool finds and removes duplicate entries from a wordlist, without changing the order, and without getting OOM on huge wordlists whose size exceeds available memory. ... [W]ritten in C, and optimized to be as fast and memory frugal as possible."

Refreshingly simple installation and syntax:

make release
./duplicut <WORDLIST_WITH_DUPLICATES> -o <NEW_CLEAN_WORDLIST>
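For a sense of what "without changing the order" means in practice, here is the classic in-memory awk idiom, wrapped in a function I'm calling dedupe_in_order (a made-up name, not part of duplicut). This is only a sketch for small files: it keeps every distinct line in memory, which is exactly the approach that falls over on huge wordlists, and it is not how duplicut works internally.

```shell
# Hypothetical helper, not part of duplicut.
# Prints each distinct line once, in first-seen order.
# awk's `seen` array holds every distinct line in memory,
# so this only suits wordlists that fit comfortably in RAM.
dedupe_in_order() {
    awk '!seen[$0]++' "$@"
}

# Tiny demo: the repeated "password" and "letmein" are dropped,
# and the surviving lines keep their original relative order.
printf 'password\nletmein\npassword\nhunter2\nletmein\n' | dedupe_in_order
```

Unlike sort -u, nothing gets reordered: the first occurrence of each line stays where it was.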

UPDATE: Royce Williams kindly alerted me to possible issues around longer line lengths and non-ASCII characters, and the author of duplicut, nil0x42, was kind enough to set me straight: I just needed to specify --line-max-size 254 so that lines up to that length are not truncated.
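Before settling on a value for --line-max-size, it can help to know how long the longest line in a wordlist actually is. A rough sketch, with max_line_len as a made-up helper name: LC_ALL=C makes awk count bytes rather than characters, which I'm assuming is the measure that matters for non-ASCII entries (I haven't checked that against duplicut's source).

```shell
# Hypothetical helper, not part of duplicut.
# Prints the length in bytes of the longest line read from the
# given file(s), or from stdin when no file is given.
# LC_ALL=C forces byte counting, so multi-byte UTF-8 characters
# count as more than one.
max_line_len() {
    LC_ALL=C awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$@"
}

# Demo: "naïve" is five characters but six bytes in UTF-8.
printf 'abc\nnaïve\n' | max_line_len
```

If the number reported is above the limit you pass to duplicut, those lines are the ones to check in the output.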

/nix | Oct 30, 2019

