Almost there…
I’m very close to a full DRMAA implementation for Xgrid (still just in Objective-C), or as full an implementation as is currently possible with Xgrid. The only major missing feature right now is bulk jobs.
The biggest hurdle has been the fact that Xgrid doesn’t support a number of things needed by the specification. The most important of those are: (1) setting the working directory, and (2) actually getting useful information about job execution, exit status, etc.
The only way I saw to do this was to wrap each and every Xgrid job in a proxy executable, xgrid_drmaa_proxy. This proxy sets the environment, arguments, and stdin for the command being run; runs it; and retrieves resource usage data using the wait4 system call.
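Here is a minimal C sketch of what the core of such a proxy might involve. It is not the xgrid_drmaa_proxy source. The function run_job(), struct job_result, the parameter layout, and the use of exit code 127 for setup failures (borrowed from the shell convention) are all assumptions made for illustration. It covers the pieces Xgrid lacks: changing into a working directory, supplying the environment, arguments, and stdin, and collecting the exit status together with resource usage from wait4().

```c
#define _DEFAULT_SOURCE   /* exposes wait4() in glibc; harmless on macOS */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* How one wrapped job ended, plus the kernel's resource-usage report. */
struct job_result {
    int exited;          /* nonzero if the job exited normally */
    int exit_status;     /* exit code when exited, else -1 */
    int term_signal;     /* terminating signal when killed, else 0 */
    struct rusage usage; /* CPU times, max RSS, etc. filled in by wait4() */
};

/*
 * Hypothetical proxy core: run argv[0] (a full path) in workdir with the
 * given environment and optional stdin file, then collect status + usage.
 * Returns 0 on success, -1 if fork() or wait4() failed.
 */
int run_job(const char *workdir, const char *stdin_path,
            char *const argv[], char *const envp[], struct job_result *out)
{
    assert(argv != NULL && argv[0] != NULL && envp != NULL && out != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child. Xgrid cannot set a working directory, so do it here. */
        if (workdir != NULL && chdir(workdir) != 0) {
            perror("chdir");
            _exit(127);
        }
        /* Optionally replace stdin with the contents of a file. */
        if (stdin_path != NULL) {
            int fd = open(stdin_path, O_RDONLY);
            if (fd < 0 || dup2(fd, STDIN_FILENO) < 0) {
                perror("stdin");
                _exit(127);
            }
            if (fd != STDIN_FILENO)
                close(fd);
        }
        execve(argv[0], argv, envp);
        perror("execve");   /* reached only if execve() failed */
        _exit(127);
    }

    /* Parent. wait4() reports the exit status AND the child's rusage. */
    int status;
    pid_t r;
    do {
        r = wait4(pid, &status, 0, &out->usage);
    } while (r < 0 && errno == EINTR);
    if (r < 0)
        return -1;

    out->exited = WIFEXITED(status);
    out->exit_status = out->exited ? WEXITSTATUS(status) : -1;
    out->term_signal = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    return 0;
}
```

A caller would put the full path of the real command in argv[0], followed by its arguments, and then read out.exit_status along with fields such as out.usage.ru_utime and out.usage.ru_stime to report back to the DRMAA layer. Because the proxy forks and reaps the child itself, it does not rely on NSTask for the wait4() call.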
Some interesting and frustrating things I learned along the way:
- I knew that NSTask is a great class for running other processes. It makes things so easy. But you can’t use wait4() on that process to get usage info; apparently NSTask is doing funny things on another thread that interfere.
- The combination of fork(), dup2(), execve() and wait() is very powerful, as long as you remember the following: (1) close one end of each of the redirected pipes; and (2) manually set argv[0] to contain the launch path.
- Running an NSRunLoop recursively, from inside a callback that the run loop itself invoked, works until you start dealing with finicky networking code to download files from Xgrid. Re-trying calls with -[NSObject performSelector:withObject:afterDelay:] is far more effective. I plan to switch all my recursive running of run loops to this model (or, if easier, to condition-waits with NSConditionLock).
- My biggest annoyance: XgridFoundation will accept @"YES" and @"NO" as values for whether a submitted file is executable or not, but not, say, [NSNumber numberWithBool:YES]. That’s stupid. Consider this the first (second?) in a long series of rants (and bug reports to Apple) about XgridFoundation. This one took me a *long* time, and a trip to Charles’s GridStuffer source code, to figure out.
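The fork()/dup2()/execve() pattern from the list above, including both gotchas, might look like this in plain C. This is a sketch, not code from the project: run_capture() is a hypothetical helper that mimics NSTask's split between a launch path and an argument list, captures the child's stdout through a pipe, and uses waitpid() on the specific child in place of plain wait().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the current process's environment */

/*
 * Hypothetical NSTask-like helper: run launch_path with args (args does NOT
 * include argv[0]; it is NULL-terminated). The child's stdout is copied into
 * buf, truncated to cap - 1 bytes and NUL-terminated. Returns the raw wait
 * status (inspect with WIFEXITED/WEXITSTATUS), or -1 on error.
 */
int run_capture(const char *launch_path, const char *const args[],
                char *buf, size_t cap)
{
    assert(launch_path != NULL && args != NULL && buf != NULL);
    if (cap == 0)
        return -1;

    size_t nargs = 0;
    while (args[nargs] != NULL)
        nargs++;

    /* Gotcha 2: execve() does not fill in argv[0]; set it to the launch path. */
    char **argv = malloc((nargs + 2) * sizeof *argv);
    if (argv == NULL)
        return -1;
    argv[0] = (char *)launch_path;
    for (size_t i = 0; i < nargs; i++)
        argv[i + 1] = (char *)args[i];
    argv[nargs + 1] = NULL;

    int fds[2];                        /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) != 0) {
        free(argv);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        free(argv);
        return -1;
    }
    if (pid == 0) {
        /* Child. Gotcha 1: close the end we don't use (the read end). */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        if (fds[1] != STDOUT_FILENO)
            close(fds[1]);             /* stdout now refers to the pipe */
        execve(launch_path, argv, environ);
        _exit(127);                    /* reached only if execve() failed */
    }

    /* Parent. Gotcha 1 again: close the write end, or read() never sees EOF. */
    close(fds[1]);
    free(argv);

    size_t len = 0;
    char chunk[512];
    for (;;) {
        ssize_t n = read(fds[0], chunk, sizeof chunk);
        if (n == 0)
            break;                     /* EOF: child closed its stdout */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        /* Keep draining even when buf is full so the child never blocks. */
        size_t room = cap - 1 - len;
        size_t take = (size_t)n < room ? (size_t)n : room;
        memcpy(buf + len, chunk, take);
        len += take;
    }
    buf[len] = '\0';
    close(fds[0]);

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

Two details carry the weight here. If the parent kept its own copy of the pipe's write end open, read() would never return 0 and the loop would hang. And because execve() hands argv to the new program untouched, leaving out argv[0] would make the child treat its first real argument as its program name.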
I’m going to take a break from this until Monday to work on some eco-stuff. Come Monday, bulk jobs, C bindings, and Java bindings will be the only things left (aside from a few small loose ends). Hopefully a release with an installer early next week!
June 28th, 2006 at 9:40 am
I myself banged my head on a number of other things.
Regarding the Yes/No, I seem to remember there was a little issue, but then I was lucky to figure it out quickly from the plist. A bool would show as or , while the xgrid plist was using the tags. Duh!
Nice pace, Ed, on the code. I wish I had more time to do more stuff. I will probably have a close look at the Keychain code, as I wanted to add this.
I just had an idea/feature request regarding your pref pane and global grid setting. Instead of a single global grid, you could let the user add a list of grids (with one set up as the default for now). In the future, XgridDRMAA could then add a scheduler layer when several grids are available, so that they look like one big grid from the outside.
Well, on the other hand, DRMAA is useful for building a metascheduler that makes a bunch of grids (Xgrid and otherwise) look like one big cluster, so the scheduler code would probably sit above XgridDRMAA. But, well, just an idea.
Finally, I really like the idea of perfecting a good general-purpose “xgrid_drmaa_proxy”. I have been using a number of ad hoc wrapping scripts, and I am curious to see what you came up with.
June 28th, 2006 at 9:42 am
OK, WordPress has eaten my XML tags, so you don’t see them. Let me try again:
A bool would show as \ or \, while the xgrid plist was using the tags \.
June 28th, 2006 at 9:44 am
argg. One more time:
A bool would show as <true/> or <false/>, while the xgrid plist was using the <string> tags.