tinyapps.org / blog


Using wget on Amazon to download search results

Neither Google nor Amazon indexes the descriptions that sellers write for used books, which makes it difficult to find unique or mislisted volumes. To download these descriptions for searching locally with ack, wget was tried first:
$ wget "http://www.amazon.com/gp/offer-listing/0131103628/?condition=used"
...
HTTP request sent, awaiting response... 204 NoContent
...
Hrm. How about omitting the User-Agent header? An empty --user-agent value tells wget not to send one at all:
$ wget --user-agent="" "http://www.amazon.com/gp/offer-listing/0131103628/?condition=used"
...
HTTP request sent, awaiting response... 200 OK
...
Bingo. If there is more than one page of listings (i.e., more than 15 used copies available), all pages can be downloaded with something like wget -i urls.txt --user-agent="", where urls.txt contains one URL per line, with startIndex advancing by 15 for each page:
http://www.amazon.com/gp/offer-listing/0131103628/ref=olp_page_1?startIndex=0&condition=used
http://www.amazon.com/gp/offer-listing/0131103628/ref=olp_page_2?startIndex=15&condition=used
http://www.amazon.com/gp/offer-listing/0131103628/ref=olp_page_3?startIndex=30&condition=used
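The URL list follows a simple pattern: page n uses ref=olp_page_n and startIndex=15*(n-1). Below is a minimal POSIX shell sketch that generates urls.txt from the number of used copies Amazon reports. offer_urls is a hypothetical helper, not part of the original post; it assumes 15 offers per page, as implied by the URLs above, and Amazon's URL scheme may well have changed since 2011.

```shell
# Hypothetical helper (not from the original post): print one used-offer
# listing URL per page for an ISBN-10/ASIN, given how many used copies
# are listed. Assumes 15 offers per page, matching the startIndex steps
# in the example URLs.
offer_urls() {
    asin=$1                          # e.g. 0131103628
    count=$2                         # number of used copies listed
    pages=$(( (count + 14) / 15 ))   # ceiling of count / 15
    page=1
    while [ "$page" -le "$pages" ]; do
        start=$(( (page - 1) * 15 )) # 0, 15, 30, ...
        printf 'http://www.amazon.com/gp/offer-listing/%s/ref=olp_page_%d?startIndex=%d&condition=used\n' \
            "$asin" "$page" "$start"
        page=$(( page + 1 ))
    done
}
```

For a book with, say, 31 used copies:
$ offer_urls 0131103628 31 > urls.txt
$ wget -i urls.txt --user-agent=""
Taking the copy count rather than a page count keeps the ceiling division in one place; 31 copies yields the three URLs shown above.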

/nix | Dec 01, 2011

