Archive for June, 2006

XgridDRMAA overview

Sunday, June 4th, 2006

Because GridSweeper implementation details will take longer to hash out, and because I’d love to get the system working with Xgrid as soon as possible (for selfish personal reasons), the first code I will write will be the Xgrid DRMAA implementation, affectionately and creatively called XgridDRMAA for short. Here’s an overview of the design.

Components

Cocoa DRMAA Implementation
Although there is no official Objective-C/Cocoa binding specification for DRMAA (for obvious reasons), XgridFoundation is a Cocoa API, so the DRMAA implementation will inevitably be Cocoa-based at some level. So, I thought, why not just create an Objective-C DRMAA interface? The structure will mirror the Java interface very closely. I’ll see if the DRMAA Working Group folks want to make this a standard binding. If so, great; if not, understandable (as probably only Xgrid people will be using it).

C DRMAA Implementation
Easy part #1: wrap the Cocoa implementation in C, as per the DRMAA C Bindings document. Use the SGE implementation as a supplemental reference.

Java DRMAA Implementation
Easy part #2: wrap the Cocoa implementation in Java, as per the DRMAA Java Bindings document. (The version available as of this writing is 0.6.2; version 1.0 will be updated for JDK 1.5 and nice things like generics and typesafe enums.) This should actually be a more natural mapping than the C one, thanks to Java’s stronger object orientation. I predict lots of JNI calls to objc_msgSend(). Again, use the SGE implementation as a supplemental reference.

XgridDRMAA preference pane
For reasons described below, it makes a lot of sense to let each user choose his/her favorite grid and have DRMAA use that one automatically unless special steps are taken to use something else. A preference pane is a natural home for this setting. Addendum: Charles notes in the comments that there are environment variables for specifying a controller host. But there doesn’t seem to be one for selecting a particular grid on that host, so you might get the wrong one if the controller hosts multiple grids.

Packaging

XgridDRMAA.pkg
A standard Mac OS X installer package that installs XgridDRMAA.framework (in /Library/Frameworks/) and XgridDRMAA.prefPane (in /Library/PreferencePanes/).

XgridDRMAA.framework
The three APIs will be packaged in a single Mac OS X umbrella framework, XgridDRMAA.framework, which will contain one “real” framework for each language binding.

XgridDRMAA-Cocoa.framework
The Cocoa/Objective-C DRMAA interface and implementation. This is the meat of the package, because this is where all the code interacting with XgridFoundation lives.

XgridDRMAA-C.framework
The C interfaces (wrapping the Objective-C code).

XgridDRMAA-Java.framework
The Java implementation (also wrapping the Objective-C code, via JNI), in the Java package com.edbaskerville.xgrid_drmaa. A version of Dan Templeton’s org.ggf.drmaa classes will also be included, modified to default to Xgrid rather than SGE but still able to select the SGE DRMAA implementation at runtime.

XgridDRMAA.prefPane
The grid-selection preference pane (see the notes below).
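The modified org.ggf.drmaa classes need to default to Xgrid while still letting someone pick SGE at runtime. A minimal sketch of that selection logic, assuming the choice is driven by a Java system property, might look like this. Several names are assumptions: the property key org.ggf.drmaa.SessionFactory is what I believe the stock factory consults, com.sun.grid.drmaa.SessionFactoryImpl is what I believe SGE’s factory class is called (check your drmaa.jar), and com.edbaskerville.xgrid_drmaa.SessionFactoryImpl is a hypothetical name for the Xgrid factory.

```java
import java.util.Properties;

public class DrmaaImplSelector {
    // Hypothetical name for the XgridDRMAA factory class.
    static final String XGRID_FACTORY = "com.edbaskerville.xgrid_drmaa.SessionFactoryImpl";
    // Believed to be SGE's factory class name; verify against the installed drmaa.jar.
    static final String SGE_FACTORY = "com.sun.grid.drmaa.SessionFactoryImpl";
    // Assumed system property used to override the default at runtime.
    static final String FACTORY_PROPERTY = "org.ggf.drmaa.SessionFactory";

    /** Returns the factory class to load: Xgrid unless the property names something else. */
    static String chooseFactoryClass(Properties props) {
        String override = props.getProperty(FACTORY_PROPERTY);
        if (override == null || override.trim().isEmpty()) {
            return XGRID_FACTORY; // no override: default to Xgrid
        }
        return override.trim();   // explicit override, e.g. SGE_FACTORY
    }

    public static void main(String[] args) {
        // A real factory would now load and instantiate this class via reflection;
        // this sketch only reports which implementation would be chosen.
        System.out.println(chooseFactoryClass(System.getProperties()));
    }
}
```

Under these assumptions, switching a program back to SGE would be a matter of launching it with java -Dorg.ggf.drmaa.SessionFactory=com.sun.grid.drmaa.SessionFactoryImpl, with no code changes.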

Why a Preference Pane

Sun Grid Engine has a very simple, effective method for selecting a grid/cell combination: the SGE_ROOT and SGE_CELL environment variables. These selections, nicely enough, carry over directly into DRMAA, so there is in fact no grid selection/authentication code whatsoever in the DRMAA API. Pretty nice.

Addendum, cont’d: Xgrid has the XGRID_CONTROLLER_HOSTNAME and XGRID_CONTROLLER_PASSWORD environment variables, which work if there’s only one grid on the controller. Inexplicably, however, there’s no XGRID_CONTROLLER_GRID (the equivalent of SGE_CELL). Furthermore, nothing in the XgridFoundation API requires applications to use, or even default to, these settings.

The simple solution: make a preference pane that lets the user select his/her grid of choice, and have DRMAA just use that one. The DRMAA-based application then won’t need to know anything about Xgrid grid selection or authentication. There may, however, be good reasons for different applications to use different grids, so I’ll also provide a supplemental API for selecting a different grid before making any DRMAA calls. For most applications and people, though, I bet a standard per-user grid selection will be good enough.
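One way to reconcile the preference pane, the existing environment variables, and the supplemental API is a simple precedence rule. The sketch below assumes the order explicit API choice, then the per-user preference, then XGRID_CONTROLLER_HOSTNAME. The ordering, the GridChoice class, and the preference keys controllerHost and gridName are all hypothetical, and the real preference-pane settings are stood in for by a plain Map.

```java
import java.util.HashMap;
import java.util.Map;

public class GridChooser {
    /** A controller host plus an optional grid name (null means the controller's default grid). */
    static final class GridChoice {
        final String host;
        final String gridName;
        GridChoice(String host, String gridName) { this.host = host; this.gridName = gridName; }
        @Override public String toString() { return gridName == null ? host : host + "/" + gridName; }
    }

    // Hypothetical keys the preference pane would write.
    static final String PREF_HOST = "controllerHost";
    static final String PREF_GRID = "gridName";

    /** Resolves which grid DRMAA should use, or null if nothing is configured. */
    static GridChoice resolve(GridChoice explicit, Map<String, String> userPrefs, Map<String, String> env) {
        // 1. An application that called the supplemental API wins.
        if (explicit != null) return explicit;
        // 2. The per-user choice from the preference pane, including a grid name.
        String prefHost = userPrefs.get(PREF_HOST);
        if (prefHost != null) return new GridChoice(prefHost, userPrefs.get(PREF_GRID));
        // 3. The environment only names a host; there is no variable for the grid itself.
        String envHost = env.get("XGRID_CONTROLLER_HOSTNAME");
        if (envHost != null) return new GridChoice(envHost, null);
        return null;
    }

    public static void main(String[] args) {
        GridChoice choice = resolve(null, new HashMap<String, String>(), System.getenv());
        System.out.println(choice == null ? "no grid configured" : "using " + choice);
    }
}
```

Putting the preference ahead of the environment reflects the idea that the pane is where a user states a default; an application that needs a different grid would call the supplemental API instead.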

With all of this XgridDRMAA work, the hope is that Apple will bring the code, or at least the concepts, into Xgrid itself at some point in the future. Not for Leopard, I don’t imagine, but for whatever cat comes next perhaps, after the thing has been field-tested for a while.

DRMAA Java: first run

Friday, June 2nd, 2006

I got a basic DRMAA program running. It lets you execute any command + arguments via the grid.

The code is here.

Compile with:

javac -cp $SGE_ROOT/lib/drmaa.jar DrmaaTest.java

Run with:

java -cp .:$SGE_ROOT/lib/drmaa.jar DrmaaTest [command] [args]

On a shared-filesystem SGE setup, stdout and stderr will show up as files in the current directory. Pretty spiffy. DRMAA appears to be really simple to use, and should be pretty simple to implement for Xgrid. I’m highly optimistic!

File transfer

Friday, June 2nd, 2006

In a sophisticated network setup like a typical Sun Grid Engine installation, a GridSweeper user will have the luxury of a shared filesystem, a network home directory, etc., meaning that no files will need to be transferred as part of job submission. However, this isn’t always the case. With extra work, it’s apparently possible to set up SGE without a shared filesystem. And many Xgrid users, especially those installing Xgrid just to run GridSweeper on a simple Repast model across a network of four Macs, won’t have any shared filesystem at all.

Although Xgrid provides built-in facilities for transferring files, SGE and DRMAA do not; today, the typical user of these systems is on a well-managed network. But I want GridSweeper to be easy to set up for any Joe Repast modeler with a few computers. It might seem like too much network overhead to send a model’s executable code plus input data, and then retrieve output data on the other end. But compared to typical runtimes for ABMs, the time it takes to transfer a small executable is negligible. So providing a general, easy solution to this problem is, I think, vital.
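To put rough numbers on the overhead argument, here is a back-of-the-envelope calculation. The 5 MB payload, 100 Mbit/s link, and 10-minute run are illustrative assumptions, not measurements, and the estimate ignores latency and protocol overhead.

```java
public class TransferOverhead {
    /** Seconds to move `bytes` over a link of `bitsPerSecond`, ignoring latency and protocol overhead. */
    static double transferSeconds(long bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        long payloadBytes = 5_000_000L;   // assumed: model code plus input data
        double linkBitsPerSecond = 100e6; // assumed: 100 Mbit/s Ethernet
        double runSeconds = 10 * 60;      // assumed: a 10-minute model run
        double t = transferSeconds(payloadBytes, linkBitsPerSecond);
        System.out.println("transfer: " + t + " s, "
                + (100.0 * t / runSeconds) + "% of the run");
    }
}
```

Under these assumptions the transfer takes about 0.4 seconds, well under a tenth of a percent of the run.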

All the obvious solutions to the file transfer problem come down to requiring the user to set up some kind of file-transfer infrastructure—NFS, AFP, FTP, SCP, etc., etc. But this defeats the whole purpose of ease of use: now they have to set something up!

All roads, in my view, point to including a simple file server as part of GridSweeper. This daemon will typically run on the same machine as, say, the Xgrid controller or SGE qmaster. The client GUI and command line will provide tools to add files to the GridSweeper file daemon. Additionally, clients will be able to upload files on a per-batch basis. When a run starts on an agent/execution host*, it will first check whether it needs to download any files from the file daemon. (Timestamp checking and caching will ensure that if five runs on the same host all need the same file, it is downloaded only once.)
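The timestamp check described above can be sketched as a small cache consulted before each download. The FileDaemon interface and its methods are hypothetical stand-ins for whatever protocol the file daemon ends up speaking, and the in-memory maps stand in for what would really be an on-disk cache, so that separate runs (separate processes) on the same host can share it.

```java
import java.util.HashMap;
import java.util.Map;

public class FileCache {
    /** Hypothetical client view of the GridSweeper file daemon. */
    interface FileDaemon {
        long lastModified(String name); // daemon's timestamp for the file
        byte[] download(String name);   // fetch the file's contents
    }

    private final FileDaemon daemon;
    private final Map<String, Long> cachedStamps = new HashMap<>();
    private final Map<String, byte[]> cachedData = new HashMap<>();

    FileCache(FileDaemon daemon) { this.daemon = daemon; }

    /** Returns the file's contents, downloading only if the cached copy is missing or stale. */
    synchronized byte[] fetch(String name) {
        long remote = daemon.lastModified(name);
        Long local = cachedStamps.get(name);
        if (local == null || local < remote) {
            cachedData.put(name, daemon.download(name));
            cachedStamps.put(name, remote);
        }
        return cachedData.get(name);
    }

    public static void main(String[] args) {
        final int[] downloads = {0};
        FileCache cache = new FileCache(new FileDaemon() {
            public long lastModified(String name) { return 42L; }
            public byte[] download(String name) { downloads[0]++; return new byte[0]; }
        });
        for (int run = 0; run < 5; run++) {
            cache.fetch("model.jar"); // five runs on one host, same file
        }
        System.out.println("downloads: " + downloads[0]);
    }
}
```

The demo in main performs a single download for the five runs; bumping the daemon’s timestamp would trigger exactly one more.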

At the end of a run, the GridSweeper monitor will send any output files back to the file daemon. Using a monitoring tool/GUI, the user will be able to download any results.

* This terminology difference between SGE and Xgrid is really starting to get to me. I bet DRMAA has its own set of terms.

SGE basics

Friday, June 2nd, 2006

Sadly, things never quite worked right on my local install of SGE. But it turns out UM CSCS already has one set up that I can use. So that’s what I’m doing.

A summary of basic commands in SGE…

qsub
Submits a job in a shell script. If you try to submit an executable binary, it won’t work.

-m b|e|a|s|n
Tells qsub when to send mail: at the beginning, end, abort/rescheduling, or suspension of the job, or not at all.
qstat
Displays queue status.
qmon
A really ugly GUI for submitting, controlling, and monitoring jobs.
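Because qsub wants a shell script, one way to push an arbitrary command plus arguments through plain qsub is to generate a tiny wrapper script. The helper below is hypothetical, not part of SGE. It relies only on standard sh single-quote escaping and on SGE’s #$ convention for embedding qsub options in a script, here -S /bin/sh to choose the shell.

```java
import java.util.Arrays;
import java.util.List;

public class QsubWrapper {
    /** Quotes one argument for /bin/sh: wrap in single quotes, turning each ' into '\''. */
    static String shQuote(String s) {
        return "'" + s.replace("'", "'\\''") + "'";
    }

    /** Builds a minimal script that qsub will accept and that simply execs the command. */
    static String wrapperScript(String command, List<String> args) {
        StringBuilder sb = new StringBuilder("#!/bin/sh\n");
        sb.append("#$ -S /bin/sh\n"); // embedded qsub option: run the job with /bin/sh
        sb.append("exec ").append(shQuote(command));
        for (String a : args) {
            sb.append(' ').append(shQuote(a));
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java QsubWrapper command [args...]");
            return;
        }
        System.out.print(wrapperScript(args[0], Arrays.asList(args).subList(1, args.length)));
    }
}
```

Usage would be along the lines of java QsubWrapper /bin/hostname > job.sh followed by qsub job.sh; adding -m e to the qsub call would request mail when the job ends.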