Wednesday, October 2, 2013

Best Way of Running Parallel Wgets

What is the best way of running parallel wgets? So far I've discovered two methods (are there more?), but I'm unsure of the pros and cons of each.

method 1: 

urllist.txt 
--http-user=user1 --http-password=pass1 -O file1 https://site1.com
--http-user=user2 --http-password=pass2 -O file2 https://site2.com
--http-user=user3 --http-password=pass3 -O file3 https://site3.com
--http-user=user4 --http-password=pass4 -O file4 https://site4.com
--http-user=user5 --http-password=pass5 -O file5 https://site5.com
--http-user=user6 --http-password=pass6 -O file6 https://site6.com


#!/bin/sh 
URL_LIST=`cat urllist.txt` 
echo $URL_LIST | xargs -n 1 -P 80 wget -q 


method 2: 

urllist.txt 
wget --http-user=user1 --http-password=pass1 -O file1 https://site1.com
wget --http-user=user2 --http-password=pass2 -O file2 https://site2.com
wget --http-user=user3 --http-password=pass3 -O file3 https://site3.com
wget --http-user=user4 --http-password=pass4 -O file4 https://site4.com
wget --http-user=user5 --http-password=pass5 -O file5 https://site5.com
wget --http-user=user6 --http-password=pass6 -O file6 https://site6.com


#!/bin/sh
while read line
do
    eval "(${line}) &"
done < urllist.txt


Does anyone have an opinion on which method is best (speed, resources, contention, etc.)?
Method number one is sequential and method number two is parallel.

My testing shows that for a large number of requests, the first method, using xargs, is better.
It allows you to set the number of parallel processes that will run at any given time (I wouldn't use the 80 you've specified; I'd use maybe 20), queuing the others.
If a wget is waiting on an HTTP response, the operating system can de-schedule it until more input becomes available, instead of it eating up the CPU.
This stops your system being overloaded, and potentially thrashing.
The second method just launches them all at once, load be damned.
If I were worried about load, memory, and making the system behave, the xargs method plays much nicer. Better yet, put nice in front of each wget to be really nice if you launch a lot of processes.
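
For illustration, here is a rough, untested sketch of throttling method 2 without xargs; the batch size of 20 and the nice prefix are just examples following the advice above, not something from the original post:

#!/bin/sh
# run the commands from urllist.txt in batches of at most 20
count=0
while read line
do
    eval "nice ${line} &"
    count=$((count + 1))
    if [ "$count" -ge 20 ]; then
        wait        # let the current batch finish
        count=0
    fi
done < urllist.txt
wait                # catch the final partial batch

This is coarser than xargs -P, because each batch waits for its slowest download before the next batch starts, but it needs nothing beyond a POSIX shell.
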
The complaint about a pipe slowing down the launch of 80 processes is a bit odd. However, xargs does not have to read from a pipe; it reads stdin. So:

URL_LIST=`cat urllist.txt` 
echo $URL_LIST | xargs -n 1 -P 80 wget -q 

is exactly equivalent to: 

xargs < urllist.txt -n 1 -P 80 wget -q 

Actually, it is not equivalent. echo $URL_LIST puts the whole set of commands onto one line before xargs ever sees it, so the resultant commands do not work. Did anybody test that?


It is a bit odd (but perfectly valid) to put the < near the front of the command. But it does remind you that xargs is reading the file, not wget.
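
Putting the pieces together, a sketch of one way to make the xargs approach actually work, assuming GNU xargs: by default xargs splits its input on any whitespace, newlines included, so -n 1 hands each wget a single word; -L 1 instead runs one command per input line, which is what the per-line option file from method 1 needs (the -P 20 limit follows the earlier suggestion):

xargs -L 1 -P 20 wget -q < urllist.txt

Each line of urllist.txt then becomes one wget -q invocation with that line's options and URL.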
