running parallel bash tasks on OS X

How often have you needed to process a huge number of small files, where each single task uses only a little CPU and memory?
Well, today I needed a script which does exactly this.

I have a MySQL table which contains the filenames of the files located on my hard drive.
I wrote a little script which processes a single file in under 3 seconds. Unfortunately, for 10,000+ files this would take more than 8 hours.
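
The per-file script itself is not the interesting part here. Just to make the one-liner at the end concrete: a minimal sketch of such a processFile.sh, assuming it gets the filename as its first argument, could look like this (the gzip step is only a stand-in for the real work):

#!/bin/bash
# processFile.sh -- sketch of a per-file worker (hypothetical, not my real script)
# the file to work on is passed as the first argument
file="$1"
# stand-in work step: compress the file -- replace this with your actual processing
gzip -c "$file" > "$file.gz"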

So what if I could run them in parallel, with a maximum of 10 tasks executing at once? Since each file takes under 3 seconds, 10 parallel workers should ideally cut the 8+ hours down to well under an hour. This would really speed up the computation!

Luckily, in 2005 Ole Tange from GNU merged the command line tools xxargs and parallel into a single tool, 'parallel'.
With this great tool there is no need to write any complicated script to accomplish such tasks.
First you need to install it using Homebrew:

brew install parallel

After that I had to add its bin directory to the PATH in my .profile:

PATH=$PATH:/usr/local/Cellar/parallel/20110822/bin
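
The version directory (20110822 in my case) will of course differ depending on when you install it. A quick way to check that the shell actually picks up the tool:

 $> parallel --version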

Here’s the basic usage:

 $> echo -ne "1\n2\n3\n" | parallel -j2 "echo the number is {.}"

This echoes the numbers 1, 2 and 3 to stdout, with a maximum of 2 echo commands running in parallel. The {.} placeholder is replaced by the input line (with any file extension stripped), so for plain numbers {} would work just as well.
Here’s the output:

the number is 1
the number is 3
the number is 2

As you can see, printing the 3 can finish before printing the 2 😉
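
By the way: if you need the output in the same order as the input, parallel can buffer and reorder it for you with the -k (--keep-order) switch:

 $> echo -ne "1\n2\n3\n" | parallel -k -j2 "echo the number is {.}"

Now the lines always come back as 1, 2, 3, no matter which job finishes first.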

So here is my one-liner to process all my files:

 $> mysql -uroot -p[secretPW] my_database < \
    <(echo "SELECT filename FROM files") \
    | grep -v 'filename' | parallel -j10 "./processFile.sh {.}"

With this one-liner it took only 37 minutes to process my 10,000+ files 🙂
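
As a side note, the grep -v 'filename' is only there to strip the column header from the mysql output. The same thing should be possible with mysql's -N (skip column names) and -e (execute statement) options, although I haven't timed this variant:

 $> mysql -uroot -p[secretPW] -N -e "SELECT filename FROM files" my_database \
    | parallel -j10 "./processFile.sh {.}"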
