Monday, October 14, 2013

Best Way of Running Parallel Wgets

What is the best way of running parallel wgets? So far I've discovered two methods (are there more?), but I'm unsure of the pros and cons of each. 

method 1: 

--http-user=user1 --http-password=pass1 -O file1 
--http-user=user2 --http-password=pass2 -O file2 
--http-user=user3 --http-password=pass3 -O file3 
--http-user=user4 --http-password=pass4 -O file4 
--http-user=user5 --http-password=pass5 -O file5 
--http-user=user6 --http-password=pass6 -O file6 

URL_LIST=`cat urllist.txt` 
echo "$URL_LIST" | xargs -L 1 -P 80 wget -q 
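One subtlety with xargs worth knowing: -n 1 splits on words, not lines, so an options line like the ones above would be torn into separate wget calls, while -L 1 keeps each line together. The demo below uses printf and echo as harmless stand-ins for the list file and wget, and /tmp/urllist-demo.txt is a throwaway file name I made up:

```shell
# Two stand-in "option lines", as they might appear in urllist.txt.
printf '%s\n' '--http-user=user1 -O file1' '--http-user=user2 -O file2' > /tmp/urllist-demo.txt

# -n 1 hands ONE WORD per invocation, so each option becomes its own call:
xargs -n 1 echo RUN: < /tmp/urllist-demo.txt

# -L 1 hands one whole line per invocation, so the options stay together:
xargs -L 1 echo RUN: < /tmp/urllist-demo.txt
```

With real wget you would combine this with -P to cap concurrency, along the lines of `xargs -L 1 -P 20 wget -q < urllist.txt`.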

method 2: 

wget --http-user=user1 --http-password=pass1 -O file1 
wget --http-user=user2 --http-password=pass2 -O file2 
wget --http-user=user3 --http-password=pass3 -O file3 
wget --http-user=user4 --http-password=pass4 -O file4 
wget --http-user=user5 --http-password=pass5 -O file5 
wget --http-user=user6 --http-password=pass6 -O file6 

while read -r line 
do 
    eval "(${line}) &" 
done < urllist.txt 
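If you like method 2's one-command-per-line file but want to keep the load bounded, a middle ground is to launch in batches and wait between them. This is only a sketch under the same urllist.txt assumption; the batch size of 20 is an illustrative value:

```shell
#!/bin/sh
# Launch at most MAX background jobs from urllist.txt, then wait for
# the batch to drain before starting the next one.
MAX=20
count=0
while read -r line; do
    eval "(${line}) &"
    count=$((count + 1))
    if [ "$count" -ge "$MAX" ]; then
        wait            # block until the whole batch has finished
        count=0
    fi
done < urllist.txt
wait                    # catch the final, possibly partial, batch
```

Note that `wait` with no arguments blocks until every background child has exited, so one slow download can stall a whole batch; bash 4.3+ also offers `wait -n` to refill slots one at a time.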

Does anyone have an opinion on which is the best method (speed, resources, contention, etc.)?

My testing shows that for a large number of requests, the first method using xargs is better. 
It lets you set the number of parallel processes (I wouldn't use the 80 you've specified; I'd use maybe 20) that run at any given time, queuing the rest. 
While a wget is waiting on an HTTP response, the operating system can de-schedule it until data arrives, rather than letting it eat up CPU. 
This stops your system from being overloaded and potentially thrashing. 
The second method just launches them all at once, load be damned. 
If I were worried about load, memory, and making the system behave, the xargs method plays much nicer. Better yet, put nice in front of each wget to be extra considerate when you launch a lot of processes.
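Putting those suggestions together (a modest -P limit plus nice), a sketch under the same urllist.txt assumption as above; the niceness of 10 and the limit of 20 are just illustrative values:

```shell
# One wget per line of urllist.txt, at most 20 in flight,
# each running at reduced CPU priority (niceness +10).
xargs -L 1 -P 20 nice -n 10 wget -q < urllist.txt
```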
To be clear, both methods download in parallel: method one bounds the parallelism through xargs -P, while method two spawns an unbounded number of background processes.


