Archive for the ‘XgridDRMAA’ Category

Submitting jobs

Tuesday, June 20th, 2006

Two pieces of good news today.

First, the honorable Charlotte W. Woolard deemed my predicament of having multiple large software projects, a Monday Summer of Code check-in deadline, and a San Francisco move-out date of August 1 worthy of getting excused from a jury.

Which made possible the second piece: job submission, at least in its basic form, is working. Jobs with arguments—but not with stdin or environment settings—can be successfully submitted via XgridDRMAA.

It wasn’t too complicated. In short, this is what happens:

  1. The client code constructs a job template, as per the DRMAA spec, which includes details of the job like what command to run, what arguments to pass the command, etc.
  2. The client code then calls -[DRMAASession runJobWithJobTemplate:error:].
  3. The runJobWithJobTemplate:error: method does the fancy work. It first asks the job template instance (which is actually an instance of the subclass XgridDRMAAJobTemplate) for an Xgrid-style job specification dictionary. It passes this on, via Distributed Objects, to a helper method.
  4. The helper method, submitXgridJobWithSpecification:, submits the job using -[XGController performSubmitJobActionWithJobSpecification:gridIdentifier:], getting back an XGActionMonitor instance. It runs the run loop on the secondary thread until either failure or success has been recorded. If successful, the job identifier is returned from the results dictionary; if not, an NSError instance is generated in the DRMAAError domain, encapsulating the BEEP error generated by Xgrid.

(For the record, sometimes I hate how long Objective-C method names are. When I use ObjC, I miss Java. And vice-versa.)

To see it in code, a super-simple OCUnit test (which assumes a working Xgrid setup and valid XgridDRMAA user defaults settings) looks like this:

- (void)testRunSimpleJob
{
	[_session begin:nil];

	DRMAAJobTemplate *jobTemplate = [_session jobTemplate];

	[jobTemplate setRemoteCommand:@"/bin/ps"];
	[jobTemplate setJobName:@"testRunSimpleJob"];

	NSError *error = nil;
	NSString *jobId = [_session runJobWithJobTemplate:jobTemplate error:&error];
	STAssertNotNil(jobId, @"job run failed");

	if(jobId)
	{
		NSLog(@"jobId: %@", jobId);
	}
	else
	{
		STAssertNotNil(error, @"no error generated on failure");
		if(error) NSLog(@"error code: %ld", (long)[error code]);
	}

	[_session end:nil];
}

Making Xgrid synchronous

Thursday, June 15th, 2006

I finally got around to making the DRMAA code actually, um, talk to Xgrid. Right now all it does is open and close connections, but everything else will follow pretty much the same pattern.

From the client code, it’s very easy to begin a DRMAA session:

DRMAASession *session = [[DRMAASystem system] session];
NSError *error = nil;
BOOL success = [session begin:&error];

That translates into a whole pile of XgridFoundation code which, in summary, does the following:

  1. First, the code spawns a second thread, passing it a couple of ports through which to send Distributed Objects messages, and setting up a DO proxy object on the main thread. The second thread runs a standard NSRunLoop loop, which does two things: (1) gives the XgridFoundation objects the opportunity to do their thing, and (2) receives DO messages from the first thread, removing the need for dealing with mutex locks.
  2. The Xgrid thread having started its run loop, the main thread sends it a message to establish a connection with the Xgrid controller. In the Xgrid thread, a connection is opened with the usual XgridFoundation calls (using settings stored in user defaults as described in the previous post). Instead of using the delegate method callbacks, which would mean I’d need to signal the first thread with a condition variable, I wrote the code to simply run the run loop until the Xgrid connection was in the “open” state (or the “closed” state on account of an error). I was surprised this worked: even though the method is being called as part of -[NSRunLoop run...] (since it’s a DO message), it can itself run the run loop.
  3. Once the connection has (hopefully) opened successfully, the second thread’s DO-called method returns, letting the main thread continue. No callbacks, no nothing—one call, and the main thread knows whether the connection succeeded or not.

(The session-closing code just calls a method via DO in similar fashion.)
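
The blocking trick in step 2 can be modeled without any Xgrid code at all. What follows is only a sketch in plain C (which also compiles as Objective-C), not the XgridDRMAA source: pump_once stands in for one pass of the run loop, and the connection type, its states, and the pass limit are hypothetical names made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical connection states, standing in for the real Xgrid ones. */
typedef enum { CONN_OPENING, CONN_OPEN, CONN_CLOSED } conn_state;

typedef struct {
    conn_state state;
    int passes_needed;  /* simulated run-loop passes before the outcome arrives */
    bool will_fail;     /* simulated outcome: true means the open attempt fails */
} connection;

/* Stand-in for one pass of the run loop: delivers the simulated
   outcome once enough passes have gone by. */
void pump_once(connection *c)
{
    if (c->state != CONN_OPENING)
        return;
    if (c->passes_needed > 0)
        c->passes_needed--;
    if (c->passes_needed == 0)
        c->state = c->will_fail ? CONN_CLOSED : CONN_OPEN;
}

/* Blocks the caller by pumping the loop until the state leaves
   CONN_OPENING or max_passes is used up. Returns true only if the
   connection ended up open. */
bool open_blocking(connection *c, int max_passes)
{
    c->state = CONN_OPENING;
    for (int i = 0; i < max_passes && c->state == CONN_OPENING; i++)
        pump_once(c);
    if (c->state == CONN_OPENING)
        c->state = CONN_CLOSED;  /* ran out of passes: treat as a failure */
    return c->state == CONN_OPEN;
}
```

The caller makes one call and gets one answer, which is the whole point: no callback, no condition variable. The pass limit is an addition of this sketch; the post does not say whether the real loop has a timeout.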

XgridDRMAA should make it really easy for procedural Objective-C (no, not necessarily an oxymoron) programs to use Xgrid. (C and Java too, of course, once the wrapping is done.) GridEZ is much better for real interactive Cocoa GUI apps—you don’t want to block your main thread waiting for a response from Xgrid, obviously—but for many scientists, this model is useful.

“Contact strings” and browsing for Xgrids

Tuesday, June 13th, 2006

The DRMAA specification includes the possibility of having the API return a list of “contacts” (identifiers for different grids). This seemed like a pretty natural place to return a list of controllers/grids discovered via Bonjour, so I went ahead and implemented the service browsing on a separate thread, managing it correctly with NSRunLoop, etc. When I was done, I realized that authentication would not work at all in the context of DRMAA unless no authentication was required. So I commented all that code out; I’ll bring snippets of the run-loop management back for the actual XgridFoundation code.

Instead, I have returned fully to the idea of just using the defaults database and a preference pane to store all this data. I’ve added string constants for all the data that will be needed to store a selected grid to XgridDRMAATypesAndConstants.h:

extern NSString *XgridDRMAAIdentificationMethod;
extern NSString *XgridDRMAANetServiceIdentificationMethod;
extern NSString *XgridDRMAAHostnameIdentificationMethod;

extern NSString *XgridDRMAANetServiceDomain;
extern NSString *XgridDRMAANetServiceName;

extern NSString *XgridDRMAAHostnameOrIP;
extern NSString *XgridDRMAAPortNumber;

extern NSString *XgridDRMAAGridName;

extern NSString *XgridDRMAAAuthenticationMethod;
extern NSString *XgridDRMAANoAuthenticationMethod;
extern NSString *XgridDRMAAPasswordAuthenticationMethod;
extern NSString *XgridDRMAAKerberosAuthenticationMethod;

extern NSString *XgridDRMAAUsername;
extern NSString *XgridDRMAAPassword;

Not all of these need to be set, obviously: for initial development, I’ve just set these values (Astor is my G5):

defaults write NSGlobalDomain XgridDRMAAIdentificationMethod XgridDRMAANetServiceIdentificationMethod
defaults write NSGlobalDomain XgridDRMAANetServiceName Astor
defaults write NSGlobalDomain XgridDRMAAAuthenticationMethod XgridDRMAANoAuthenticationMethod
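
As a rough illustration of “not all of these need to be set,” here is a small plain-C check of whether a set of stored values is enough to pick a grid. The key names in the comments come from the header above, and the method strings are the ones written by the defaults commands. Everything else is an assumption: the grid_settings struct, the function names, and above all the rules about which keys each method requires are guesses for the sake of the example, not the actual XgridDRMAA logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Values as they might be read from the defaults database; NULL means
   "not set". The struct itself is hypothetical. */
typedef struct {
    const char *identification_method; /* XgridDRMAAIdentificationMethod */
    const char *net_service_name;      /* XgridDRMAANetServiceName */
    const char *hostname_or_ip;        /* XgridDRMAAHostnameOrIP */
    const char *authentication_method; /* XgridDRMAAAuthenticationMethod */
    const char *password;              /* XgridDRMAAPassword */
} grid_settings;

/* True if value is set and equal to expected. */
bool setting_equals(const char *value, const char *expected)
{
    return value != NULL && strcmp(value, expected) == 0;
}

/* ASSUMPTION: the per-method requirements below are illustrative guesses. */
bool grid_settings_complete(const grid_settings *s)
{
    bool identified =
        (setting_equals(s->identification_method,
                        "XgridDRMAANetServiceIdentificationMethod")
         && s->net_service_name != NULL) ||
        (setting_equals(s->identification_method,
                        "XgridDRMAAHostnameIdentificationMethod")
         && s->hostname_or_ip != NULL);

    bool authenticated =
        setting_equals(s->authentication_method,
                       "XgridDRMAANoAuthenticationMethod") ||
        setting_equals(s->authentication_method,
                       "XgridDRMAAKerberosAuthenticationMethod") ||
        (setting_equals(s->authentication_method,
                        "XgridDRMAAPasswordAuthenticationMethod")
         && s->password != NULL);

    return identified && authenticated;
}
```

Under these assumed rules, the three development values above (Bonjour service name plus no authentication) would count as complete, while password authentication with no stored password would not.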

To respond to Charles’s comment about OCUnit:

For one, I’m right now using OCUnit as a way to automatically run “tests” that aren’t really unit tests, because there are no (or only trivial) OCUnit assertions—they’re just simple short programs with some debugging output. This just means I don’t have to create a separate executable target and manually run that.

As for real unit tests, I think I’m just going to make the assumption that a grid is selected properly via the user defaults mechanism (eventually, through the prefpane) before the tests are run. There’s no reason that someone building the code on their own machine needs to run the unit tests—they can just build the framework target by itself.

Minor developments & design notes

Saturday, June 10th, 2006

It’s been a little slow going the last few days with my sister and another friend in town, but I’ve added a few little touches:

OCUnit Testing: There’s really no complex code to test yet, but I added an OCUnit bundle target to the Xcode project. It’s pretty nice: tests automatically get run as part of the build process.

Xgrid Bonjour Browsing: Really the first part of the code that actually, well, does something (rather than being purely structural). On initialization, the XgridDRMAASystem class starts up a Bonjour service browser for _xgrid._tcp on a secondary thread, and an array gets updated as services get discovered. Pretty standard stuff. (It’s always fun to learn from sample code I wrote four years ago as an Apple Tech Pubs intern…)

This basic structure of having a secondary thread with an active NSRunLoop will carry over to actual Xgrid-to-DRMAA communication: from DRMAA’s perspective, everything’s nice and procedural; in parallel on another thread, an event-based run loop will be getting delegate messages from the Xgrid system and (in a thread-safe manner) updating the data structures read by the DRMAA method calls. It should be pretty straightforward, and pretty much what Charles suggested off the top of his head on the Xgrid list a few months ago.

A few notes on the class hierarchy of my Objective-C DRMAA bindings vs. the Java bindings:

The Java bindings mimic the C bindings quite closely, which makes sense—the closest thing to a reference implementation is a JNI wrapper around the SGE C implementation. One downside to this is that where the C bindings lack elegant object-orientation, so do the Java bindings.

Case in point: the getDrmaaImplementation(), getDrmSystem() and getContact() methods return different things depending on whether they’re called before or after init(): beforehand, they return a list of possibilities; afterwards, they return the choice selected. I set up a class relationship so that the possibilities are available at the appropriate level of representation, and the choices made are attached to an object corresponding to that choice.

From the top, you have class (“static” in Java parlance) methods of the DRMAASystem class: systems, which returns an array of available systems; as well as systemsString, implementationsString, and contactsString, included only to make the mapping to the standard C bindings a little easier. There will also be methods to retrieve a specific DRM system (so far just system, which returns the default Xgrid implementation; eventually this will be more configurable).

On the next level, once you have a specific DRM system, you can query that system for its specific systemString, implementationString, or contactStrings, and retrieve a particular DRMAASession object, which contains the methods for actually interacting with a session: begin:, end:, jobTemplate, controlJobId:withAction:error:, etc.

Now, it might seem like this adds unnecessary verbosity to the code, but for the default case, it’s really not that bad:

DRMAASession *session = [[DRMAASystem system] session];

etc.

XgridDRMAA ObjC interface

Monday, June 5th, 2006

I put together some headers for the Objective-C DRMAA interface. They follow the Java bindings pretty closely, with some name changes to match Cocoa naming conventions better, plus the use of NSError instead of exceptions. They also will reuse the constants defined in the C headers where relevant.

The most glaring DRMAA requirement I’ve noticed missing from Xgrid is the ability to change the working directory before running the command. For missing things like this, I think it will be best to simply leave the implementation incomplete, tell Apple, and hope that the feature appears in the next release. This won’t hinder my ability to write GridSweeper as a pure-DRMAA app, so I can live without it for now.

Browse the code here.

XgridDRMAA overview

Sunday, June 4th, 2006

Because GridSweeper implementation details will take longer to hash out, and because I’d love to get the system working with Xgrid as soon as possible (for selfish personal reasons), the first code I will write will be the Xgrid DRMAA implementation, affectionately and creatively called XgridDRMAA for short. Here’s an overview of the design.

Components

Cocoa DRMAA Implementation: Although there is no official Objective-C/Cocoa binding specification for DRMAA (for obvious reasons), XgridFoundation is a Cocoa API, so the DRMAA implementation will inevitably be Cocoa-based at some level. So, I thought, why not just create an Objective-C DRMAA interface? The structure will mirror the Java interface very closely. I’ll see if the DRMAA Working Group folks want to make this a standard binding: if so, great; if not, understandable (as probably only Xgrid people will be using it).

C DRMAA Implementation: Easy part #1: wrap the Cocoa implementation in C, as per the DRMAA C Bindings document. Use the SGE implementation as a supplemental reference.

Java DRMAA Implementation: Easy part #2: wrap the Cocoa implementation in Java, as per the DRMAA Java Bindings document. (Here is version 0.6.2; version 1.0 will be updated for JDK 1.5 and nice things like generics and typesafe enums.) This will actually be a more natural mapping, thanks to the stronger object-orientation. I predict lots of JNI calls to objc_msgSend(). Again, use the SGE implementation as a supplemental reference.

XgridDRMAA preference pane: For reasons described below, it makes a lot of sense to let each user choose his/her favorite grid, and have DRMAA automatically use that one unless special steps are taken to use something else. This would fit nicely in a preference pane. Addendum: Charles notes in the comments that there are environment variables for specifying a controller host. But there doesn’t seem to be one for specifying a specific grid on that host, so you might get the wrong one if there are multiple available grids.
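
To give a feel for the C wrapping mentioned under “C DRMAA Implementation,” here is a minimal sketch of the calling style the DRMAA C bindings use: the caller supplies the output buffers and their lengths, and the function returns an integer status. It is not the real wrapper. The sketch_ and backend_ names, the status values, and the made-up job identifier are all hypothetical, and backend_run_job merely stands in for the Objective-C call that returns either a job identifier or an error.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status codes; the real values come from the DRMAA C header. */
enum { SKETCH_SUCCESS = 0, SKETCH_INTERNAL_ERROR = 1 };

/* Hypothetical stand-in for the object-style back end: returns a job
   identifier, or NULL after pointing *error_message at a description. */
const char *backend_run_job(const char *command, const char **error_message)
{
    if (command == NULL || command[0] == '\0') {
        *error_message = "no remote command set";
        return NULL;
    }
    *error_message = NULL;
    return "42";  /* a made-up job identifier */
}

/* C-binding-style wrapper: caller-supplied buffers in, integer status out.
   Output is truncated to fit the buffers and always NUL-terminated. */
int sketch_run_job(char *job_id, size_t job_id_len,
                   const char *command,
                   char *error_diagnosis, size_t error_diag_len)
{
    const char *message = NULL;
    const char *identifier = backend_run_job(command, &message);

    if (identifier == NULL) {
        if (error_diagnosis != NULL && error_diag_len > 0)
            snprintf(error_diagnosis, error_diag_len, "%s", message);
        return SKETCH_INTERNAL_ERROR;
    }
    if (job_id != NULL && job_id_len > 0)
        snprintf(job_id, job_id_len, "%s", identifier);
    return SKETCH_SUCCESS;
}
```

A caller would declare something like char job_id[32] and char diagnosis[64], pass both with their sizes, and check the return value before reading either buffer.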

Packaging

XgridDRMAA.pkg: A standard Mac OS X installer package to install XgridDRMAA.framework (in /Library/Frameworks/) and XgridDRMAA.prefPane (in /Library/PreferencePanes/).

XgridDRMAA.framework: The three APIs will be packaged in a single Mac OS X umbrella framework, XgridDRMAA.framework, which will contain one “real” framework for each language binding.

XgridDRMAA-Cocoa.framework: The Cocoa/Objective-C DRMAA interface and implementation. This is the meat of the package, because this is where all the code interacting with XgridFoundation lives.

XgridDRMAA-C.framework: The C interfaces (wrapping the Objective-C code).

XgridDRMAA-Java.framework: The Java implementation (also wrapping the Objective-C code, via JNI), in the Java package com.edbaskerville.xgrid_drmaa. A version of Dan Templeton’s org.ggf.drmaa classes will also be included, modified to default to Xgrid rather than SGE, but still with the capability to select the SGE DRMAA at runtime.

XgridDRMAA.prefPane: The grid-selection preference pane (see notes below).

Why a Preference Pane

Sun Grid Engine has a very simple, effective method for selecting a grid/cell combination: the SGE_ROOT and SGE_CELL environment variables. These selections, nicely enough, carry over directly into DRMAA, so there is in fact no grid selection/authentication code whatsoever in the DRMAA API. Pretty nice.

Addendum, cont’d: Xgrid has the XGRID_CONTROLLER_HOSTNAME and XGRID_CONTROLLER_PASSWORD environment variables, which work if there’s only one grid on the controller. Inexplicably, there’s no XGRID_CONTROLLER_GRID, however (the equivalent to SGE_CELL). Furthermore, there’s no enforcement in the XgridFoundation API that applications use, or even default to, these settings.

The easy and simple solution: make a preference pane that lets the user select his/her grid of choice, and have DRMAA just use that one. The DRMAA-based application, then, won’t need to know anything about Xgrid grid selection or authentication. There might be good reasons, however, why different applications might want to use different grids, so I’ll also provide supplemental API to select a different grid before making any DRMAA calls. For most applications and people, though, I bet being able to select a standard grid on a per-user basis will be good enough.
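
The precedence described here (an application's explicit choice, if any, beats the per-user default) is simple enough to sketch in a few lines of plain C. The function name and the idea of passing the two choices as strings are assumptions for illustration only; the supplemental API has not been designed yet.

```c
#include <stddef.h>

/* Picks the grid to use: an application's explicit choice wins, otherwise
   the per-user default applies. Returns NULL if neither is set.
   Empty strings are treated the same as "not set". */
const char *resolve_grid(const char *app_override, const char *user_default)
{
    if (app_override != NULL && app_override[0] != '\0')
        return app_override;
    if (user_default != NULL && user_default[0] != '\0')
        return user_default;
    return NULL;
}
```

Most applications would always pass NULL for the override and simply get whatever the user picked in the preference pane.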

With all of this XgridDRMAA work, the hope is that Apple will bring the code, or at least the concepts, into Xgrid itself at some point in the future. Not for Leopard, I don’t imagine, but for whatever cat comes next perhaps, after the thing has been field-tested for a while.