"The duplicut tool finds and removes duplicate entries from a wordlist, without changing the order, and without getting OOM on huge wordlists whose size exceeds available memory. ... [W]ritten in C, and optimized to be as fast and memory frugal as possible."
Refreshingly simple installation and syntax:
make release
./duplicut <WORDLIST_WITH_DUPLICATES> -o <NEW_CLEAN_WORDLIST>
UPDATE: Royce Williams kindly alerted me to possible issues around longer line lengths and non-ASCII characters, and the author of duplicut, nil0x42, was kind enough to set me straight: I just needed to specify --line-max-size 254 to avoid truncation of lines under that threshold.
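For anyone wondering what "removes duplicates without changing the order" amounts to, here is a tiny Python sketch of the idea. To be clear, this is my own illustration and not how duplicut works internally: it keeps every distinct line in an in-memory set, which is exactly the approach that falls over on wordlists larger than RAM. The function name dedupe_keep_order and the sample words are made up for the example.

```python
def dedupe_keep_order(lines):
    """Yield each distinct line once, in first-seen order.

    Illustration only: the set of seen lines lives entirely in memory,
    so this does not scale to huge wordlists the way duplicut does.
    """
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# Working on bytes rather than str sidesteps encoding questions
# for non-ASCII entries (the fourth entry is UTF-8 encoded).
words = [b"password", b"letmein", b"password", b"caf\xc3\xa9", b"letmein"]
unique = list(dedupe_keep_order(words))
```

The first occurrence of each entry wins and later repeats are dropped, so the surviving lines stay in their original relative order.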
/nix | Oct 30, 2019