tinyapps.org / blog


Sequence expressions in wget and curl

Given a range of JPGs to batch download:
http://example.com/manga/manga_06p01.jpg
http://example.com/manga/manga_06p02.jpg
...
http://example.com/manga/manga_06p99.jpg
one approach using wget would be:
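$ wget 'http://example.com/manga/manga_06p'{01..99}'.jpg'
Zero-padded ranges such as {01..99} require Bash 4 or later; Bash 3 expands the range but drops the leading zeros. Because the expansion is performed by the shell rather than by wget, the expression is not expanded at all when read from a file (e.g., $ wget -i urls.txt).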
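
One way around both limitations is to let the shell write the fully expanded list to urls.txt first. The following is a minimal sketch, assuming seq is installed (it ships with GNU coreutils and with BSD/macOS, but is not part of POSIX) and that the pages are numbered 01 through 99 as in the listing above:

```shell
#!/bin/sh
# Generate one URL per line so that `wget -i urls.txt` needs no shell expansion.
# printf reuses its format string for every argument that seq emits,
# and %02d zero-pads each number to two digits.
printf 'http://example.com/manga/manga_06p%02d.jpg\n' $(seq 1 99) > urls.txt
```

The resulting file can then be passed to wget with $ wget -i urls.txt, and the zero padding no longer depends on the Bash version.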

curl, on the other hand, expands bracketed ranges itself and does not depend on Bash for sequencing (quote the whole URL so that the shell leaves the brackets alone):
$ curl -O 'http://example.com/manga/manga_06p[01-99].jpg'
so it will correctly parse sequences in a properly formatted config file (e.g., urls.txt):
url = "http://example.com/manga/manga_06p[01-99].jpg"
url = "http://example.com/manga/manga_07p[01-99].jpg"
url = "http://example.com/manga/manga_08p[01-99].jpg"
like so:
$ curl --remote-name-all -K urls.txt
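
When many volumes are involved, the config file itself can be generated rather than typed by hand. This is only a sketch: write_curl_config is a hypothetical helper name, and the volume numbers and URL pattern are assumptions for illustration. The bracketed range is written out literally, since curl (not the shell) expands it later.

```shell
#!/bin/sh
# Hypothetical helper: print one curl config line per volume number given.
# The [01-99] range is left untouched here; curl expands it when it reads the file.
write_curl_config() {
    for vol in "$@"; do
        printf 'url = "http://example.com/manga/manga_%sp[01-99].jpg"\n' "$vol"
    done
}

# Volume numbers are assumptions chosen for this example.
write_curl_config 06 07 08 > urls.txt
```

The generated urls.txt is then used exactly as above, with $ curl --remote-name-all -K urls.txt.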

UPDATE: For sites that require a login, cookie.txt export makes it easy to export the necessary cookies.txt file from Google Chrome. Then it's simply a matter of:

$ wget --user-agent="" --force-directories --load-cookies cookies.txt 'https://example.com/upvoted?id=miles&comments=t&p='{1..10}

or,

$ curl --header "User-Agent:" --cookie ./cookies.txt --remote-name 'https://example.com/upvoted?id=miles&comments=t&p=[1-10]'

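If filenames need to be made unique (e.g., the downloaded files share the same name and would otherwise overwrite one another), curl can substitute the current sequence value into the output filename: #1 in the --output argument is replaced by the current value of the first bracketed range:

$ curl --output "index_#1.html" 'https://example.org/blog/[2012-2015]/index.html'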
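
wget does not expand bracketed ranges or offer a #1 placeholder, so the same renaming needs a shell loop. The sketch below is a dry run that only prints the wget commands; plan_downloads is a hypothetical helper name, and seq is again assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one wget command per year,
# naming each output file after the year, as curl's #1 would.
plan_downloads() {
    first=$1
    last=$2
    for year in $(seq "$first" "$last"); do
        printf 'wget -O index_%s.html https://example.org/blog/%s/index.html\n' "$year" "$year"
    done
}

# Preview the commands for the same range used in the curl example.
plan_downloads 2012 2015
```

Once the printed commands look right, piping them to a shell ($ plan_downloads 2012 2015 | sh) would carry out the downloads.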


/nix | Sep 12, 2011

