Archive for October, 2006

GridSweeper getting there…

Saturday, October 28th, 2006

Today, amidst a torrent of schoolwork and music, I got around to getting a little closer to a finished GridSweeper. (Sometimes it’s easiest to work on something when you’re using it to avoid working on something else.)

I worked through the fundamental problems of why neither Xgrid nor Sun Grid Engine wanted to run any of the jobs I was giving them. They turned out to be mostly trivial things:

  • On the CSCS machine with SGE installed, the wrong version of Java was being run to execute jobs. I hard-coded a fix for this for now; I still need to put a check in the grunner shell script to support an optional GRIDSWEEPER_JAVA environment variable.
  • SGE doesn’t support the file-transfer mode attribute of DRMAA, so things were grinding to a halt because of that too. I just surrounded the line that set the file-transfer mode with a try/catch block; this is fine since SGE transfers all the files by default anyway. (I also changed the default behavior of XgridDRMAA to match SGE.)
  • I was trying to run jobs with XgridDRMAA from my home directory. But Xgrid jobs (with my non-fancy XgridLite setup) run as the user “nobody”, so the jobs couldn’t access the executable. Solution: just set up a GridSweeper root directory where “nobody” can get to it. (I also added a line to XgridDRMAA to actually record the error returned by execve()…in case this happens again…)
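
For the Java version problem, the check I have in mind is just "use GRIDSWEEPER_JAVA if it's set, otherwise fall back to whatever java is on the PATH". The real check belongs in the grunner shell script; what follows is only a rough sketch of the same logic in Java. The class and method names (JavaCommandResolver, resolve) are made up for illustration and are not part of GridSweeper, and treating a blank value as "not set" is my assumption.

```java
import java.util.Map;

// Hypothetical helper, not part of GridSweeper: illustrates the fallback
// logic that the grunner shell script would need to implement.
final class JavaCommandResolver {
    // Name of the optional override variable mentioned above.
    static final String OVERRIDE_VAR = "GRIDSWEEPER_JAVA";

    // Returns the Java executable to launch: the override if it is set and
    // non-blank (assumption: blank counts as unset), otherwise plain "java",
    // which the caller would resolve through PATH.
    static String resolve(Map<String, String> env) {
        String override = env.get(OVERRIDE_VAR);
        if (override != null && !override.trim().isEmpty()) {
            return override.trim();
        }
        return "java";
    }

    public static void main(String[] args) {
        // Uses the real process environment when run directly.
        System.out.println("Using Java executable: " + resolve(System.getenv()));
    }
}
```

Passing the environment in as a map, rather than calling System.getenv() inside resolve, keeps the logic easy to check without touching the real environment.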

Ah, the joy of debugging. Anyway, GridSweeper now actually runs jobs, which is pretty cool. It doesn’t monitor the jobs yet, or correctly extract output data from the output files they produce, but that will be easy. (And won’t have any bugs, right?)