Almost there…
June 28th, 2006
I’m very close to a full DRMAA implementation for Xgrid (still just in Objective-C), or as full an implementation as is currently possible with Xgrid. The only major missing feature right now is bulk jobs.
The biggest hurdle has been the fact that Xgrid doesn’t support a number of things needed by the specification. The most important of those are: (1) setting the working directory, and (2) actually getting useful information about job execution, exit status, etc.
The only way I saw to do this was to wrap each and every Xgrid job in a proxy executable, xgrid_drmaa_proxy. This proxy sets the environment, arguments, and stdin for the command being run; runs it; and retrieves resource usage data using the wait4 system call.
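To make the idea concrete, here is a minimal sketch in C of the core of such a wrapper: change into a working directory, launch the command, and collect exit status and resource usage with wait4. It is not the actual xgrid_drmaa_proxy source. The function name run_and_measure and the 126/127/128+signal exit-code conventions are assumptions borrowed from common shell practice, and stdin redirection is left out.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes wait4() on glibc; not needed on Mac OS X */
#endif
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not from the real proxy): run argv[0] inside
 * `workdir` (or the current directory if workdir is NULL), wait for it,
 * and report its exit code and resource usage.
 * Returns 0 if the child was launched and reaped, -1 otherwise. */
int run_and_measure(const char *workdir, char *const argv[],
                    int *exit_code, struct rusage *usage)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: set the working directory, then become the command. */
        if (workdir != NULL && chdir(workdir) != 0)
            _exit(126);              /* assumed code: could not chdir */
        execve(argv[0], argv, environ);
        _exit(127);                  /* assumed code: exec failed */
    }

    /* Parent: wait4 reaps the child and fills in its rusage in one call. */
    int status = 0;
    if (wait4(pid, &status, 0, usage) < 0)
        return -1;

    if (WIFEXITED(status))
        *exit_code = WEXITSTATUS(status);
    else if (WIFSIGNALED(status))
        *exit_code = 128 + WTERMSIG(status);
    else
        *exit_code = -1;
    return 0;
}
```

After a successful call, usage->ru_utime and usage->ru_stime hold the child's user and system CPU time, which is the sort of job information the DRMAA spec asks for.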
Some interesting and frustrating things I learned along the way:
- I knew that NSTask is a great class for running other processes. Makes things so easy. But you can’t use wait4() on that process to get usage info. Apparently NSTask is doing funny things on another thread that interfere.
- The combination of fork(), dup2(), execve() and wait() is very powerful, as long as you remember the following: (1) close one end of each of the redirected pipes; and (2) manually set argv[0] to contain the launch path.
- Running an NSRunLoop recursively from something called back by running the run loop works, until you start dealing with finicky networking code to download files from Xgrid. Re-trying calls with -[NSObject performSelector:withObject:afterDelay:] is far more effective. I plan to switch all my recursive running of run loops to this model (or, if easier, condition-waits with NSConditionLock).
- My biggest annoyance: XgridFoundation will accept @"YES" and @"NO" as values for whether a submitted file is executable or not, but not, say, [NSNumber numberWithBool:YES]. That’s stupid. Consider this the first (second?) in a long series of rants (and bug reports to Apple) about XgridFoundation. This one took me a *long* time (and a trip to Charles’s GridStuffer source code) to figure out.
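The two reminders in the fork()/dup2()/execve() item can be sketched in C roughly as follows. This is an illustration only, not the proxy's code: the helper name run_filter, the MAX_ARGS limit, and the 127 exit code are assumptions, and waitpid() stands in for plain wait(). It writes all input before reading any output, so it is only safe for small inputs that fit in the pipe buffer.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

#define MAX_ARGS 16 /* assumed limit, for the sketch only */

/* Hypothetical helper: run `path` with the NULL-terminated `extra` args,
 * feed `input` to its stdin, capture up to cap-1 bytes of its stdout in
 * `out`. Returns the child's exit code, or -1 on error. */
int run_filter(const char *path, const char *const extra[],
               const char *input, char *out, size_t cap)
{
    int in_pipe[2], out_pipe[2];
    if (cap == 0 || pipe(in_pipe) != 0)
        return -1;
    if (pipe(out_pipe) != 0) {
        close(in_pipe[0]); close(in_pipe[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(in_pipe[0]); close(in_pipe[1]);
        close(out_pipe[0]); close(out_pipe[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wire the pipes to stdin/stdout, then drop every
         * original pipe descriptor so no stray ends stay open. */
        dup2(in_pipe[0], STDIN_FILENO);
        dup2(out_pipe[1], STDOUT_FILENO);
        close(in_pipe[0]); close(in_pipe[1]);
        close(out_pipe[0]); close(out_pipe[1]);

        /* Reminder (2): argv[0] must be set by hand to the launch path. */
        char *argv[MAX_ARGS + 2];
        int n = 0;
        argv[n++] = (char *)path;
        for (int i = 0; extra != NULL && i < MAX_ARGS && extra[i] != NULL; i++)
            argv[n++] = (char *)extra[i];
        argv[n] = NULL;

        execve(path, argv, environ);
        _exit(127); /* assumed code: exec failed */
    }

    /* Reminder (1): the parent closes the ends it does not use. If it
     * kept out_pipe[1] open, the read loop below would never see EOF. */
    close(in_pipe[0]);
    close(out_pipe[1]);

    size_t len = strlen(input), sent = 0;
    while (sent < len) {
        ssize_t w = write(in_pipe[1], input + sent, len - sent);
        if (w <= 0)
            break;
        sent += (size_t)w;
    }
    close(in_pipe[1]); /* signals EOF on the child's stdin */

    size_t used = 0;
    ssize_t r;
    while (used + 1 < cap &&
           (r = read(out_pipe[0], out + used, cap - 1 - used)) > 0)
        used += (size_t)r;
    out[used] = '\0';
    close(out_pipe[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Forgetting either close() in the parent is the classic way to make this hang: the child never sees end-of-file on stdin, or the parent never sees it on the captured output.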
I’m going to take a break from this until Monday—work on some eco-stuff. Come Monday, bulk jobs, C bindings, and Java bindings will be the only things left (aside from a few detailed loose ends). Hopefully a release with installer early next week!