Sequence expressions in wget and curl

Given a range of JPGs to batch download:

http://example.com/manga/manga_06p01.jpg
http://example.com/manga/manga_06p02.jpg
...
http://example.com/manga/manga_06p99.jpg

one approach using wget would be:

wget 'http://example.com/manga/manga_06p'{01..99}'.jpg'

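Zero-padded ranges such as this one require Bash 4 or later; plain numeric ranges like {1..99} already work in Bash 3. Because the shell, not wget, performs the expansion, the expression is not expanded at all when the URL is read from a file (e.g., wget -i urls.txt).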
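
One possible workaround is to have the shell write the fully expanded list to a file first and then pass that file to wget -i. The sketch below uses only POSIX shell features, so it does not need Bash 4. The file name urls.txt and the 01-99 page range follow the example above, and the download command is left commented out so the snippet can be tried without touching the network.

```shell
#!/bin/sh
# Write one fully expanded URL per line; wget -i reads such a list as-is.
url_list="urls.txt"
: > "$url_list"                      # start with an empty file

page=1
while [ "$page" -le 99 ]; do
    # %02d zero-pads the page number: 1 -> 01, 2 -> 02, ...
    printf 'http://example.com/manga/manga_06p%02d.jpg\n' "$page" >> "$url_list"
    page=$((page + 1))
done

# Then download everything listed in the file:
# wget -i "$url_list"
```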

curl, on the other hand, does not depend on Bash for sequencing:

curl -O 'http://example.com/manga/manga_06p[01-99].jpg'

so it also expands sequences read from a config file (e.g., urls.txt) that holds one url entry per line:

url = "http://example.com/manga/manga_06p[01-99].jpg"
url = "http://example.com/manga/manga_07p[01-99].jpg"
url = "http://example.com/manga/manga_08p[01-99].jpg"

like so:

curl --remote-name-all -K urls.txt
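
When there are many volumes, the config file itself can be generated. The following is a minimal sketch that assumes volumes 06 through 08 and the file naming from the listing at the top. It only writes the url lines; curl expands each range itself when it reads the file, and the download command is left commented out.

```shell
#!/bin/sh
# Build a curl config file with one globbed URL per volume.
config="urls.txt"
: > "$config"                        # start with an empty file

for volume in 06 07 08; do
    # The [01-99] range is written literally; curl, not the shell, expands it.
    printf 'url = "http://example.com/manga/manga_%sp[01-99].jpg"\n' "$volume" >> "$config"
done

cat "$config"

# Then fetch every page of every volume:
# curl --remote-name-all -K "$config"
```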

Update 1: Cookies

For sites that require a login, a browser extension such as cookie.txt export makes it easy to export the necessary cookies.txt file from Google Chrome. Then it's simply a matter of:

wget --user-agent="" --force-directories --load-cookies cookies.txt 'https://example.com/upvoted?id=miles&comments=t&p='{1..10}

or

curl --header "User-Agent:" --cookie ./cookies.txt --remote-name 'https://example.com/upvoted?id=miles&comments=t&p=[1-10]'
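
wget expects, and curl accepts, the Netscape cookie file format: one cookie per line with seven tab-separated fields (domain, subdomain flag, path, secure flag, expiry, name, value). A quick sanity check before a long batch might look like the sketch below. The sample file name and the cookie in it are placeholders, not real data.

```shell
#!/bin/sh
# Create a sample cookie file; the cookie below is made up for illustration.
cookie_file="cookies.sample.txt"
{
    printf '# Netscape HTTP Cookie File\n'
    printf 'example.com\tFALSE\t/\tTRUE\t2147483647\tsession\tabc123\n'
} > "$cookie_file"

# Count lines that are neither comments nor empty and do not have 7 fields.
bad_lines=$(awk -F '\t' '!/^#/ && NF > 0 && NF != 7 { n++ } END { print n + 0 }' "$cookie_file")

if [ "$bad_lines" -eq 0 ]; then
    echo "$cookie_file looks well-formed"
else
    echo "$cookie_file has $bad_lines malformed line(s)" >&2
fi
```

To check a real export, point cookie_file at it and skip the two printf lines.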

If filenames need to be made unique (e.g., the downloaded files share a name and would otherwise overwrite each other), curl can substitute the current sequence value into the output name via #1:

curl --output "index_#1.html" 'https://example.org/blog/[2012-2015]/index.html'
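
In --output, #1 stands for the current value of the first sequence in the URL, #2 for the second, and so on. The loop below is only a preview of the names curl would write for the range above; it does not call curl, and the variable names are illustrative.

```shell
#!/bin/sh
# Preview the output names that "index_#1.html" yields for [2012-2015].
preview=""
year=2012
while [ "$year" -le 2015 ]; do
    # curl substitutes the current range value for #1.
    preview="$preview index_$year.html"
    year=$((year + 1))
done
preview=${preview# }                 # drop the leading space
echo "$preview"
```

With two sequences in one URL, the same idea extends to an output name that uses both #1 and #2.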

Update 2: Headers

Omit the User-Agent header in curl via -H 'User-Agent:' and in wget with --user-agent="".


/nix | Sep 12, 2011

