Wednesday, October 2, 2013

Best Way of Running Parallel Wgets

What is the best way of running parallel wgets? So far I've discovered two methods (are there more?), but I'm unsure of the pros and cons of each. 

method 1: 

--http-user=user1 --http-password=pass1 -O file1 
--http-user=user2 --http-password=pass2 -O file2 
--http-user=user3 --http-password=pass3 -O file3 
--http-user=user4 --http-password=pass4 -O file4 
--http-user=user5 --http-password=pass5 -O file5 
--http-user=user6 --http-password=pass6 -O file6 

URL_LIST=`cat urllist.txt` 
echo $URL_LIST | xargs -n 1 -P 80 wget -q 

method 2: 

wget --http-user=user1 --http-password=pass1 -O file1 
wget --http-user=user2 --http-password=pass2 -O file2 
wget --http-user=user3 --http-password=pass3 -O file3 
wget --http-user=user4 --http-password=pass4 -O file4 
wget --http-user=user5 --http-password=pass5 -O file5 
wget --http-user=user6 --http-password=pass6 -O file6 

while read line 
do 
    eval "(${line})&" 
done <urllist.txt 

Does anyone have an opinion on which is the best method (speed, resources, contention, etc)?
Method number one is throttled (xargs caps it at 80 parallel processes) and method number two is fully parallel, launching everything at once.
My testing shows that for a large number of requests, the first method, using xargs, is better. 
It allows you to set the number of parallel processes that run at any given time (I wouldn't use the 80 you've specified; I'd use maybe 20), queuing the others. 
This allows the operating system to de-schedule processes that are waiting on an HTTP response until more input becomes available, to avoid eating up the CPU. 
This stops your system from being overloaded, and potentially thrashing. 
The second method just launches them all at once, load be damned. 
If I were worried about load, memory, and making the system behave, the xargs method plays much nicer. Better yet, put nice in front of each wget to be really nice if you launch a lot of processes.

The complaint about a pipe slowing down the launch of 80 processes is a bit odd. 
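The capped-and-niced approach described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical urllist.txt with one plain URL per line (the per-URL credentials and -O targets from the question are left out to keep it short), and echo is used as a dry-run stand-in so the generated commands are visible without touching the network:

```shell
#!/bin/sh
# Hypothetical urllist.txt with one URL per line.
printf '%s\n' 'http://host/a' 'http://host/b' > urllist.txt

# At most 20 wgets at once (-P 20); nice lowers their CPU priority so a
# large batch stays polite. echo is a dry-run stand-in here -- delete it
# to fetch for real.
xargs -n 1 -P 20 nice -n 10 echo wget -q < urllist.txt
```

With the real wget in place, xargs keeps 20 downloads in flight and starts the next one as each finishes, so the queue drains itself without ever spiking the load.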
However, xargs does not have to read a pipe. It reads stdin. So: 

URL_LIST=`cat urllist.txt` 
echo $URL_LIST | xargs -n 1 -P 80 wget -q 

is exactly equivalent to: 

xargs < urllist.txt -n 1 -P 80 wget -q 

Actually, it is not equivalent. `echo $URL_LIST` collapses the whole file onto one line, and the unquoted expansion word-splits it, so xargs -n 1 hands each wget a single token (one option or one filename) rather than a complete command line. The resultant commands do not work. Did anybody test that? 

It is a bit odd (but perfectly valid) to put the < near the front of the command. But it does remind you that xargs is reading the file, not wget.
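If each line of urllist.txt really is a full set of wget arguments (as in method 1), one way to keep lines intact is GNU xargs' -d option, which makes each line a single unit, plus an inner sh to re-split it into separate options. This is a sketch under those assumptions (the urllist.txt contents are hypothetical, -d is GNU-specific, and echo is again a dry-run stand-in):

```shell
#!/bin/sh
# Hypothetical urllist.txt where each line is a full set of wget
# arguments, as in method 1.
printf '%s\n' \
  '--http-user=user1 --http-password=pass1 -O file1 http://host/a' \
  '--http-user=user2 --http-password=pass2 -O file2 http://host/b' \
  > urllist.txt

# GNU xargs: -d '\n' passes each line through as one unbroken item, and
# the inner sh re-splits it into separate wget options. Only do this if
# you trust the contents of urllist.txt, since each line is interpolated
# into a shell command. echo is a dry-run stand-in -- delete it to fetch
# for real.
xargs -d '\n' -P 20 -I{} sh -c 'echo wget -q {}' < urllist.txt
```

Note that simply adding -d '\n' to the original -n 1 pipeline is not enough: wget would then receive the whole line as one opaque argument, which is why the inner sh is needed to split it back into options.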


