Given a range of JPGs to batch download:
http://example.com/manga/manga_06p01.jpg http://example.com/manga/manga_06p02.jpg ... http://example.com/manga/manga_06p99.jpg
one approach using wget would be:
wget 'http://example.com/manga/manga_06p'{00..99}'.jpg'
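To see exactly which URLs the shell will hand to wget, without downloading anything, you can print the expansion instead of running wget. This is a minimal sketch that assumes Bash 4 or later. The helper name preview_urls is hypothetical, and the range is shortened to {00..02} to keep the output small:

```shell
#!/usr/bin/env bash
# Print the URLs the brace expansion produces instead of fetching them.
# Zero-padded ranges like {00..02} need Bash 4+; older Bash prints 0 1 2.
# preview_urls is a hypothetical helper used only for illustration.
preview_urls() {
  printf '%s\n' 'http://example.com/manga/manga_06p'{00..02}'.jpg'
}
preview_urls
```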
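Zero-padded ranges like {00..99} require Bash 4 or later (Bash 3 expands them without the padding). Unfortunately, brace expansion is done by the shell, not by wget, so the expression is not expanded at all when wget reads it from a file (e.g., wget -i urls.txt).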
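wget -i reads each line as a literal URL. One workaround is to let the shell do the expansion first and write the result to the file. This is a minimal sketch that assumes Bash and the p01 to p99 range listed at the top. printf's %02d provides the zero padding, so the {1..99} range works even on Bash versions older than 4:

```shell
#!/usr/bin/env bash
# wget -i does no brace expansion, so expand the range in the shell first
# and write one URL per line. printf repeats its format for every argument,
# and %02d zero-pads 1..9, giving p01 ... p99.
printf 'http://example.com/manga/manga_06p%02d.jpg\n' {1..99} > urls.txt

# Then hand the list to wget (left commented out so nothing is fetched here):
# wget -i urls.txt
```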
curl, on the other hand, does not depend on Bash for sequencing:
curl -O 'http://example.com/manga/manga_06p[00-99].jpg'
so it also expands sequences read from a properly formatted config file (e.g., urls.txt):
url = "http://example.com/manga/manga_06p[00-99].jpg"
url = "http://example.com/manga/manga_07p[00-99].jpg"
url = "http://example.com/manga/manga_08p[00-99].jpg"
like so:
curl --remote-name-all -K urls.txt
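You don't have to type each url = line by hand. A short loop can generate the config file. This sketch assumes that chapters 06 to 08 use the same naming pattern as the page URLs listed at the top. The loop only writes the file, and curl still performs the [00-99] expansion itself:

```shell
#!/usr/bin/env bash
# Build the curl config file: one url = line per chapter.
# The [00-99] glob is written literally; curl expands it when reading -K.
for chapter in 06 07 08; do
  printf 'url = "http://example.com/manga/manga_%sp[00-99].jpg"\n' "$chapter"
done > urls.txt

# Then: curl --remote-name-all -K urls.txt
```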
For sites that require a login, cookie.txt export makes it easy to save the necessary cookies.txt file from Google Chrome. Then it's simply a matter of:
wget --user-agent="" --force-directories --load-cookies cookies.txt 'https://example.com/upvoted?id=miles&comments=t&p='{1..10}
or
curl --header "User-Agent:" --cookie ./cookies.txt --remote-name 'https://example.com/upvoted?id=miles&comments=t&p=[1-10]'
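Both wget --load-cookies and curl --cookie read cookies.txt in the Netscape cookie-file format. The format is simple enough to inspect or write by hand: a comment header, then one cookie per line with seven tab-separated fields. This is a minimal sketch. The cookie name session and the value abc123 are hypothetical placeholders, not the names any particular site uses:

```shell
#!/usr/bin/env bash
# Netscape cookie-file format, one cookie per line, tab-separated:
#   domain  include-subdomains  path  secure  expiry(epoch secs)  name  value
# "session" / "abc123" are hypothetical placeholders for a real login cookie.
{
  printf '# Netscape HTTP Cookie File\n'
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    '.example.com' 'TRUE' '/' 'TRUE' '2147483647' 'session' 'abc123'
} > cookies.txt
```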
If filenames need to be made unique (e.g., downloaded files share the same name and would otherwise overwrite each other), curl can insert each glob's current value into the output filename: #1 stands for the value of the first glob in the URL.
curl --output "index_#1.html" 'https://example.org/blog/[2012-2015]/index.html'
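To check the naming before downloading anything, a dry run can print the URL and file pairs that the [2012-2015] glob should produce. This is a sketch only. glob_output_names is a hypothetical helper, and the loop imitates curl's expansion rather than calling curl:

```shell
#!/usr/bin/env bash
# Dry run: list the URL -> output file pairs that [2012-2015] together with
# --output "index_#1.html" should yield. No network access.
# glob_output_names is a hypothetical helper for illustration.
glob_output_names() {
  local year
  for year in {2012..2015}; do
    printf '%s -> %s\n' "https://example.org/blog/$year/index.html" "index_$year.html"
  done
}
glob_output_names
```

When the URL contains more than one glob, #2, #3 and so on refer to the second, third and later globs.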
Omit the User-Agent header in curl via -H 'User-Agent:' and in wget with --user-agent="".
Thor-Erik Rødland's answer to "How to get past the login page with Wget?"
cmlndz's answer to "How can I remove default headers that cURL sends?"
/nix | Sep 12, 2011