tinyapps.org / blog

Using wget on Amazon to download search results

Neither Google nor Amazon indexes the book descriptions written by used book sellers, making it difficult to find unique or mislisted volumes. To download these comments so they can be searched with ack, the first attempt used wget:
$ wget "http://www.amazon.com/gp/offer-listing/0131103628/?condition=used"
HTTP request sent, awaiting response... 204 NoContent
Hrm. How about omitting the User-Agent header?
$ wget --user-agent="" "http://www.amazon.com/gp/offer-listing/0131103628/?condition=used"
HTTP request sent, awaiting response... 200 OK
Bingo. If there is more than one page of listings (i.e., more than 15 used books available), all pages can be downloaded with something like wget -i urls.txt --user-agent="", where urls.txt contains one URL per line.
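One way to build urls.txt is a small shell loop. This is only a sketch: the query parameter that selects a listing page is not given in this post, so PAGE_PARAM below is a placeholder (as is the page count), and the real value has to be copied from the "next page" link in a browser.

```shell
#!/bin/sh
# Sketch: write one listing URL per line to urls.txt for use with wget -i.
BASE='http://www.amazon.com/gp/offer-listing/0131103628/?condition=used'
PAGE_PARAM='page'   # hypothetical parameter name, not confirmed for Amazon
PAGES=3             # hypothetical number of listing pages

make_urls() {
    # Print one URL per line for pages 1 through $1.
    i=1
    while [ "$i" -le "$1" ]; do
        printf '%s&%s=%s\n' "$BASE" "$PAGE_PARAM" "$i"
        i=$((i + 1))
    done
}

make_urls "$PAGES" > urls.txt

# Then download every page without sending a User-Agent header:
#   wget -i urls.txt --user-agent=""
```

Once the pages are saved, they can be searched locally with ack as described above.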

/nix | Dec 01, 2011
