Scripting big jobs for multi-core processors

What is the best way to solve the following problem? Say you have a long list of individual tasks, each computationally intensive. For example, you need to run one hundred photoionization models covering a grid of parameter space. Or you want to perform PSF-fitting photometry on a thousand separate images. Your computer has multiple CPUs with multiple cores. You want your computer to work all night, so you can walk in tomorrow morning with a cup of coffee and find the answers waiting for you.

The solution I've found so far is "xjobs". Just give xjobs a list of commands and the number of cores you want it to use. Xjobs works down your list, sending each command to a free core, reporting the time each command took, and dispatching new commands as old ones finish. Here's how to run that photoionization grid with Cloudy:

> xjobs -j 3 -s script.txt

where -j 3 is the number of CPU cores to use, and -s tells it to work through a script file. In this example, j is smaller than my N=4 cores, so one core stays free for me. The file script.txt is just a text file with a list of commands, in this case:

cloudy.exe < sim1.in > sim1.out

cloudy.exe < sim2.in > sim2.out

and so on.
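If the input files are named systematically you don't have to type that list by hand; a one-line shell loop can generate it. For instance, assuming the hundred inputs are called sim1.in through sim100.in:

for i in $(seq 1 100); do echo "cloudy.exe < sim${i}.in > sim${i}.out"; done > script.txt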

An alternative solution to this problem is GNU "parallel", which is similar but can handle remote machines as well. I haven't used it myself, though its documentation suggests it can reuse the same script.txt (see the untested example just below). Can others who have used it comment? What other methods are people using to solve this class of problem?
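> parallel -j 4 < script.txt

Here parallel reads one command per line from standard input, and -j sets the number of simultaneous jobs, as with xjobs.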

10 comments
  • Michael Aye Jan 12, 2011 @ 8:29

    I’m using Python with its ‘multiprocessing’ module quite successfully. It’s not as automatic as xjobs concerning the number of cores to use, but it’s not too hard to do, and one can do other fancy stuff to create the job command lines in the first place.

  • Alastair McKinstry Jan 12, 2011 @ 8:41

    At ICHEC (Irish Centre for High-end Computing) we have a script “taskfarm” that does this,
    http://www.ichec.ie/support/documentation/task_farming.php

    Basically it will work through a bunch of jobs, similar to xjobs above, running them on our supercomputer Stokes. It’s useful because you can use multiple nodes in our PBS system to do it, and run multiple single-core jobs simultaneously.

  • Matt D. Jan 12, 2011 @ 9:27

    I’ve used the Sun Grid Engine at work for this sort of thing. It basically leaves some queuing daemons running all the time; you submit jobs to the daemons, and it runs some number of them concurrently and queues the rest. SGE was open source when I implemented it, but things may have changed now.
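    For the Cloudy grid in the post, the SGE route would probably be an array job: a small wrapper script (call it run_sim.sh, the name is arbitrary) that uses the task index SGE sets for each task, submitted once for the whole grid. A rough sketch from memory, so check the flags against your installation:

    # run_sim.sh -- hypothetical wrapper script for one array task
    #$ -cwd    # run each task in the directory it was submitted from
    cloudy.exe < sim${SGE_TASK_ID}.in > sim${SGE_TASK_ID}.out

    Then submit all 100 tasks at once; the scheduler runs as many concurrently as there are free slots and queues the rest:

    > qsub -t 1-100 run_sim.sh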

  • Richard O'Shaughnessy Jan 12, 2011 @ 9:33

    Though it seems like overkill, I recommend Condor, particularly if you can recompile your code to checkpoint. Since it can throttle processes for you, even multi-day, multi-week, multi-CPU jobs can be managed efficiently. Or, for lightweight problems, use ‘make -jN’ with N the number of cores; I prefer it to the direct command line, since the makefile provides automatic reproducibility.

  • brunetto Jan 12, 2011 @ 11:22

    I’m trying to use Python + mpi4py, which is the “Python version” of the MPI specification. That way my code’s design stays close to what it would be in Fortran or C/C++, which makes it simpler to rewrite the code in a different language if needed! It’s also integrated in Sage (http://www.sagemath.org/), so you can use it without needing to install anything… unfortunately the mpi4py version in Sage doesn’t work for me, so I had to install it myself!

  • Tom K Jan 12, 2011 @ 13:38

    Hi All,

    There is a very simple way of doing this with a shell script and the Unix `wait` command. For example, the following script will loop over all the jobs and set them running in batches of N_cpu, waiting for each batch to finish before starting the next (the comment box didn’t like the shell script’s > and < symbols, so I wrote it here on a cloud-notepad): http://notepad.cc/nakaro99
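    In outline it looks something like this (a rough sketch, assuming bash and the sim*.in inputs from the post; the real script is at the link above):

    #!/bin/bash
    # run the jobs in batches of N_cpu, waiting for each batch to finish
    N_cpu=4
    i=0
    for infile in sim*.in; do
        cloudy.exe < "$infile" > "${infile%.in}.out" &   # launch in the background
        i=$((i+1))
        if [ $((i % N_cpu)) -eq 0 ]; then
            wait   # block until the current batch of N_cpu jobs is done
        fi
    done
    wait   # catch any leftover jobs in the final partial batch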

  • Grant K Jan 12, 2011 @ 15:48

    I use Perl’s Parallel::ForkManager to parallelise my IDL SED fitting. It works nicely on my Macs and other multi-core machines when I’m impatient (i.e. all the time). Getopt::Long lets me pass arguments nicely so I can set the number of parallel jobs depending on where it’s running.

  • Marshall Perrin Jan 13, 2011 @ 16:19

    Apple’s Xgrid package provides this functionality on Macs, including dispatching jobs to remote servers. The user interface is sort of clunky so there’s a bit of a learning curve, but it works nicely once you know the tricks.

    It seems like everyone has their own preferred solution for this. So what I’d like to know is, why did you pick the job manager that you did? What features does a given package have that distinguish it from all the others?

  • Jane Rigby Aug 13, 2013 @ 10:51

    The call for parallel:
    > cat script.txt | nohup parallel

    On my Mac Pro, “parallel” uses all 8 hyperthreaded cores, whereas I can only get xjobs to use 4 cores.

  • Bruce Berriman Aug 13, 2013 @ 11:55
