tinyapps.org / blog

Deduping files in Mac OS X

While recovering data, PhotoRec terminated prematurely with a "segmentation fault" error. I then ran Data Rescue on the same drive, and while it completed successfully, I wanted to merge and dedupe the files recovered by both apps.

For deduping, I used Adrian Lopez's open source fdupes (sudo port install fdupes*). It first compares file sizes and MD5 signatures, then performs a byte-by-byte comparison in case two different files have the same signature.
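To make the staged comparison concrete, here is a minimal sketch of the same idea for a single pair of files. It is not fdupes' actual code: same_file is a hypothetical helper name, and cksum (a CRC) stands in for the MD5 signature so the example only needs standard tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of fdupes): returns 0 when the two files
# passed as $1 and $2 have identical contents. Cheap checks run first.
same_file() {
    # Stage 1: different sizes mean different contents.
    [ "$(wc -c < "$1" | tr -d ' ')" = "$(wc -c < "$2" | tr -d ' ')" ] || return 1
    # Stage 2: different checksums mean different contents.
    # Assumption: cksum (CRC) is used here in place of fdupes' MD5.
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ] || return 1
    # Stage 3: a byte-for-byte comparison rules out a checksum collision.
    cmp -s "$1" "$2"
}
```

Something like same_file a.jpg "copy of a.jpg" && echo duplicate would then report a match only if all three stages agree.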

To process filenames that contain spaces, Geekpoet suggests setting the internal field separator (IFS) to $'\n':
$ IFS=$'\n'
$ for i in $(fdupes -f ./); do echo "deleting: $i"; rm -- "$i"; done
I used the single-line option (-1), which prints each set of duplicates on one line, and xargs instead:
$ fdupes -rf1 ./ | xargs rm
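Before letting xargs loose with rm, it can be worth previewing what it would delete. The sketch below assumes that fdupes escapes spaces in -1 mode as "\ " (worth checking against your version); the printf line is a stand-in for one line of fdupes output, and preview_removals is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read fdupes-style single-line output on stdin and
# print one "would remove:" line per file instead of deleting anything.
preview_removals() {
    # xargs treats a backslash as escaping the next character, so
    # "copy\ of\ a.jpg" arrives at echo as a single argument.
    xargs -n 1 echo would remove:
}

# Stand-in for one line of `fdupes -rf1 ./` output (assumed format).
printf '%s\n' './copy\ of\ a.jpg ./b.jpg' | preview_removals
```

Once the preview looks right, the real pipeline would be fdupes -rf1 ./ | preview_removals with rm swapped back in. Filenames containing quotes or newlines would still trip up xargs.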
The Big Mean Folder Machine then merged (and automatically renamed where necessary) the remaining 20,000+ files into a single directory. However, the Finder choked when opening the directory, so two new directories (small and medium) were created alongside it and files moved like so:
$ find ./ -type f -size -128k -exec mv {} ../small/ \;
$ find ./ -type f -size -1024k -exec mv {} ../medium/ \;
Since the remaining files were all 1024k or larger, the directory was simply renamed to "large".
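The two find passes can be wrapped up as a small function. This is only a sketch: bucket_by_size and its SRC and DEST arguments are hypothetical names, DEST is assumed to live outside SRC, and files with the same name in different subdirectories of SRC would overwrite each other.

```shell
#!/bin/sh
# Hypothetical helper: move regular files under SRC ($1) into DEST/small
# and DEST/medium by size, leaving files of 1024k and up where they are.
# Assumption: DEST ($2) is not inside SRC.
bucket_by_size() {
    src=$1
    dest=$2
    mkdir -p "$dest/small" "$dest/medium"
    # Order matters: the first pass removes the smallest files, so the
    # second pass only sees files of 128k and up.
    find "$src" -type f -size -128k  -exec mv {} "$dest/small/"  \;
    find "$src" -type f -size -1024k -exec mv {} "$dest/medium/" \;
}
```

For the layout described above, the call would be along the lines of bucket_by_size ./ ../ from inside the merged directory.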

* Use MacPorts; avoid DarwinPorts.com.

/mac | Dec 24, 2010
