Archive for the ‘GridSweeper’ Category

File transfer overview

Thursday, July 5th, 2007

GridSweeper’s file transfer mechanism is designed to allow transfer of input and output files in a grid system-independent way. In environments without a real IT infrastructure, such as my own ad-hoc Xgrid setup, a shared filesystem will not necessarily be available, so you need a way to stage and retrieve files.

The GridSweeper code is file transfer system-agnostic, providing a simple interface that can be implemented for a particular file transfer system (e.g., FTP, which is included as a working example). The interface requires just a few basic methods: connect(), disconnect(), uploadFile(), deleteFile(), makeDirectory(), removeDirectory(), list(), and isDirectory(), which all do pretty much what you’d expect. There is no notion of a working directory, so all paths are relative to the implicit root of the file system. Particular file transfer systems can define custom properties to affect setup; e.g., the FTP system provides properties for setting the hostname, username/password, root directory (so the GridSweeper root need not be the same as the FTP server’s default working directory), etc.
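To make the shape of that interface concrete, here is a minimal sketch. The method names come from the list above; the parameter types, the IOException-based error handling, and uploading from a byte array (a real implementation would presumably upload a local file) are my assumptions. InMemoryFileTransferSystem is a hypothetical toy stand-in for the FTP implementation, useful only for seeing how root-relative paths behave without a server.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: method names follow the post; all signatures are assumed.
interface FileTransferSystem {
    void connect() throws IOException;
    void disconnect() throws IOException;
    void uploadFile(byte[] contents, String path) throws IOException; // assumed: upload from bytes
    void deleteFile(String path) throws IOException;
    void makeDirectory(String path) throws IOException;
    void removeDirectory(String path) throws IOException;
    List<String> list(String path) throws IOException;
    boolean isDirectory(String path) throws IOException;
}

// Hypothetical in-memory implementation. There is no working directory,
// so every path is resolved against the implicit root "".
class InMemoryFileTransferSystem implements FileTransferSystem {
    private final Map<String, byte[]> files = new HashMap<>();
    private final Set<String> dirs = new HashSet<>(List.of(""));
    private boolean connected = false;

    // Strip leading/trailing slashes so "/a/b/" and "a/b" name the same entry.
    private static String norm(String path) {
        return path.replaceAll("^/+|/+$", "");
    }

    private static String parent(String p) {
        int i = p.lastIndexOf('/');
        return i < 0 ? "" : p.substring(0, i);
    }

    // Common checks: must be connected, and optionally the parent directory must exist.
    private String checked(String path, boolean parentMustExist) throws IOException {
        if (!connected) throw new IOException("not connected");
        String p = norm(path);
        if (parentMustExist && !dirs.contains(parent(p)))
            throw new IOException("no such directory: " + parent(p));
        return p;
    }

    public void connect() { connected = true; }
    public void disconnect() { connected = false; }

    public void uploadFile(byte[] contents, String path) throws IOException {
        files.put(checked(path, true), contents.clone());
    }

    public void deleteFile(String path) throws IOException {
        if (files.remove(checked(path, false)) == null)
            throw new IOException("no such file: " + path);
    }

    public void makeDirectory(String path) throws IOException {
        dirs.add(checked(path, true));
    }

    public void removeDirectory(String path) throws IOException {
        String p = checked(path, false);
        if (p.isEmpty() || !dirs.contains(p)) throw new IOException("cannot remove: " + path);
        if (!list(p).isEmpty()) throw new IOException("directory not empty: " + path);
        dirs.remove(p);
    }

    // Names (not full paths) of the immediate children of a directory, sorted.
    public List<String> list(String path) throws IOException {
        String p = checked(path, false);
        if (!dirs.contains(p)) throw new IOException("no such directory: " + path);
        List<String> names = new ArrayList<>();
        for (String entry : dirs) addChild(names, p, entry);
        for (String entry : files.keySet()) addChild(names, p, entry);
        names.sort(null);
        return names;
    }

    private static void addChild(List<String> names, String dir, String entry) {
        if (!entry.isEmpty() && parent(entry).equals(dir))
            names.add(dir.isEmpty() ? entry : entry.substring(dir.length() + 1));
    }

    public boolean isDirectory(String path) throws IOException {
        return dirs.contains(checked(path, false));
    }
}
```

Because paths are always root-relative, "/sir/input/" and "sir/input" refer to the same directory in this sketch; an FTP-backed version would additionally prefix the configured root directory.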

Here’s how a GridSweeper run interacts with the file system:

  1. The experiment setup data includes a list of input files, mapping absolute paths on the submit host’s local filesystem to paths relative to the working directory of the running job. When the job is submitted, those files are copied into an input-file directory on the file transfer system, at experimentName/submissionDate/submissionTime/input/.
  2. The experiment setup data also includes a list of output file paths, relative to the working directory of the running job. This list is part of the input data for a running GridSweeper job.
  3. The GridSweeperRunner tool, which is the process actually executed by the grid system, begins by transferring any files in the input directory on the file transfer system into the working directory as specified. After the run is complete, it copies the specified output files back to the file transfer system, to the location experimentName/submissionDate/submissionTime/caseDir/filename, where caseDir is a directory name representing the particular parameter settings for the run (“b=0.1-g=25”). If the filename includes the wildcard $gs_rn_ph$, it will be replaced by the current run number. If it does not, the run number will be appended as an extension (filename.runNumber).
  4. When each run is complete, the submit host, which is monitoring the activity, retrieves its output files to the local experiments directory. If the submit host stops monitoring, there should be a way to go back and retrieve files not yet retrieved; I haven’t designed this mechanism yet.
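The naming rules in step 3 can be sketched as a small helper. This assumes parameter settings arrive as a name-sorted map and that caseDir joins name=value pairs with hyphens, as in the “b=0.1-g=25” example; the class and method names are hypothetical, and the date and time strings are whatever the submit host chose.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.StringJoiner;

final class OutputPaths {
    // Placeholder from the post; String.replace treats it literally, so '$' needs no escaping.
    static final String RUN_NUMBER_WILDCARD = "$gs_rn_ph$";

    private OutputPaths() { }

    // caseDir such as "b=0.1-g=25": name=value pairs in name order, joined by '-' (assumed format).
    static String caseDir(SortedMap<String, String> settings) {
        StringJoiner joiner = new StringJoiner("-");
        for (Map.Entry<String, String> e : settings.entrySet()) joiner.add(e.getKey() + "=" + e.getValue());
        return joiner.toString();
    }

    // Substitute the run number for the wildcard, or append it as an extension.
    static String outputFileName(String fileName, int runNumber) {
        String run = Integer.toString(runNumber);
        return fileName.contains(RUN_NUMBER_WILDCARD)
                ? fileName.replace(RUN_NUMBER_WILDCARD, run)
                : fileName + "." + run;
    }

    // experimentName/submissionDate/submissionTime/caseDir/filename on the file transfer system.
    static String remoteOutputPath(String experimentName, String submissionDate, String submissionTime,
                                   SortedMap<String, String> settings, String fileName, int runNumber) {
        return String.join("/", experimentName, submissionDate, submissionTime,
                caseDir(settings), outputFileName(fileName, runNumber));
    }
}
```

For example, an output file out-$gs_rn_ph$.txt in run 3 becomes out-3.txt, while out.txt becomes out.txt.3.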

Note: as of 4:30 PM, July 5, 2007, this is not all implemented correctly.

Thought: it’s possible, though unlikely, that directory names on the file transfer system will collide when multiple people submit identically named experiments at the same time. I can imagine a lab class with people following the same tutorial instructions all submitting identically named jobs at the same time. So maybe it’s better to name these directories with unique hashes. Assuming no collisions, though, it doesn’t matter from the user perspective, so this can be changed later; the current naming scheme is nice for debugging.
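One low-effort variant, sketched here as an assumption rather than what GridSweeper does: keep the readable prefix for debugging and append a random UUID from java.util.UUID (a random identifier, not a hash), so identically named experiments submitted in the same second still land in different directories. The helper name is hypothetical.

```java
import java.util.UUID;

final class SubmissionDirs {
    private SubmissionDirs() { }

    // Readable prefix for debugging, random suffix for uniqueness.
    static String submissionRoot(String experimentName, String submissionDate, String submissionTime) {
        // A version 4 UUID carries 122 random bits, so accidental collisions are negligible.
        String suffix = UUID.randomUUID().toString();
        return experimentName + "/" + submissionDate + "/" + submissionTime + "-" + suffix;
    }
}
```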

If you have a shared filesystem, of course, none of this is necessary!

Preliminary Javadoc completed

Monday, June 25th, 2007

Between driving from San Francisco and selling furniture on Craigslist, this weekend I wrote preliminary Javadoc for all of last summer’s GridSweeper work. A very valuable exercise before diving into coding: it made me look through every single method I wrote and say something about it. It also brought a number of design flaws to my attention, duly noted in TODO comments.

High-level GridSweeper execution overview

Thursday, June 21st, 2007

The purpose of GridSweeper is to take a simple user-provided description of what parameter settings to run a model with, run the model on a grid, and return results to the user.

The user will be able to manipulate the parameter-sweep description in three ways: (1) using an XML specification file, (2) with command-line arguments, and (3) with a graphical user interface. These three mechanisms can be mixed: command-line arguments can augment or override XML as well as be saved back out to XML, and the GUI tool will serve to edit and save XML files as well.

Ultimately, user action will result in running the GridSweeper program, which turns parameter sweep specifications into job specifications for the grid system via DRMAA. Specifically, the program does the following:

  1. Parses the XML specification and command-line arguments to generate an Experiment object.
  2. Generates a list of ExperimentCase objects (parameter value settings) from the Experiment object.
  3. Sets up an output directory for the files generated by this experiment. If a shared filesystem is not present, this can be done via FTP or other file-transfer mechanism supported by a plugin implementing a subclass of FileTransferSystem.
  4. Starts a DRMAA session and submits a job for each experiment case, using an archived RunSetup object for each job’s standard input.
  5. Still unimplemented: monitors the results of jobs and reports status changes to the user.
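Step 4 archives a RunSetup object into each job’s standard input. Assuming “archived” means ordinary Java object serialization (the post doesn’t say), a minimal sketch of both ends looks like this; the RunSetup fields shown are hypothetical, and the DRMAA submission itself is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical contents of a RunSetup; the real class may differ.
class RunSetup implements Serializable {
    private static final long serialVersionUID = 1L;
    final String adapterClassName;
    final HashMap<String, String> parameters;
    final int runNumber;

    RunSetup(String adapterClassName, Map<String, String> parameters, int runNumber) {
        this.adapterClassName = adapterClassName;
        this.parameters = new HashMap<>(parameters);
        this.runNumber = runNumber;
    }
}

final class RunSetupArchiver {
    private RunSetupArchiver() { }

    // Submit side: bytes to write to a file that the job then reads as standard input.
    static byte[] archive(RunSetup setup) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(setup);
        }
        return bytes.toByteArray();
    }

    // Execution side: the runner would pass System.in here. The stream is not closed,
    // since it belongs to the caller.
    static RunSetup unarchive(InputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream objectIn = new ObjectInputStream(in);
        return (RunSetup) objectIn.readObject();
    }
}
```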

The way things are set up now, GridSweeper requires support on both the submission end and the execution end of the grid. The DRMAA job specification specifies that the execution host run not the model itself, but the GridSweeperRunner program, which takes input data and uses that to actually run the model. Specifically, it does the following:

  1. Unarchives the RunSetup object from standard input.
  2. If necessary, downloads input files via the file transfer mechanism.
  3. Actually runs the model using an instance of the Adapter class specified by the user (explicitly, or implicitly by using, e.g., gdrone for the Drone compatibility adapter). The Adapter object knows how to take a set of parameters and send it to a particular type of model executable.
  4. If necessary, uploads output files via the file transfer mechanism.
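The adapter indirection in step 3 amounts to loading a class by name and calling one method on it. Here is a minimal sketch, assuming a hypothetical Adapter interface with a single run method that receives the parameter map; GridSweeper’s actual Adapter signature isn’t given here, and EchoAdapter is a toy standing in for something like com.edbaskerville.gridsweeper.DroneAdapter.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical adapter contract: turn one set of parameter settings into a model run.
interface Adapter {
    int run(Map<String, String> parameters, int runNumber) throws Exception;
}

// Toy adapter that records what a real one would hand to the model executable.
class EchoAdapter implements Adapter {
    String lastInvocation;

    public int run(Map<String, String> parameters, int runNumber) {
        lastInvocation = "run " + runNumber + " with " + new TreeMap<>(parameters);
        return 0; // exit status of the pretend model
    }
}

final class AdapterLoader {
    private AdapterLoader() { }

    // Instantiate the user-specified adapter from its fully qualified class name;
    // the class needs an accessible no-argument constructor.
    static Adapter load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        if (!Adapter.class.isAssignableFrom(cls))
            throw new ClassCastException(className + " does not implement Adapter");
        return (Adapter) cls.getDeclaredConstructor().newInstance();
    }
}
```

Loading by name is what lets a shortcut script like gdrone simply pass a class name through to the runner.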

One problem with this mechanism is that it submits a separate job for every experiment case, bypassing DRMAA’s notion of batch jobs. DRMAA batch jobs let you submit a whole bunch of jobs at the same time by specifying that each job is the same except for an integer specifier, and that specifier can be used as a variable in command-line arguments. Because some systems may be faster at accepting batch jobs than a pile of individual jobs, it might be worth using the batch job mechanism.

One way to do this would be to defer the calculation of parameter assignments and random seeds to the execution host, but that makes it impossible to generate a file for reproducing the experiment as soon as it is submitted. A better way is to generate a series of input files in the experiment directory, named with the batch run index, and have the GridSweeperRunner tool read those files at runtime rather than reading an object from standard input.
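A minimal sketch of that second approach, assuming each case is written as a java.util.Properties text file named by batch index (the runs/ subdirectory and run-N.properties naming are hypothetical). With DRMAA bulk jobs, the index would reach the runner through the parametric-index placeholder (JobTemplate.PARAMETRIC_INDEX in the Java binding) on its command line; that part is not shown.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.Properties;

final class BatchInputs {
    private BatchInputs() { }

    // Hypothetical naming scheme: runs/run-<index>.properties under the experiment directory.
    static Path inputFileFor(Path experimentDir, int index) {
        return experimentDir.resolve("runs").resolve("run-" + index + ".properties");
    }

    // Submit side: write every case up front, so the experiment is fully recorded
    // before anything runs. Indices start at 1 here, an arbitrary choice.
    static void writeAll(Path experimentDir, List<Map<String, String>> cases) throws IOException {
        Files.createDirectories(experimentDir.resolve("runs"));
        for (int i = 0; i < cases.size(); i++) {
            Properties props = new Properties();
            props.putAll(cases.get(i));
            try (Writer w = Files.newBufferedWriter(inputFileFor(experimentDir, i + 1), StandardCharsets.UTF_8)) {
                props.store(w, "GridSweeper case " + (i + 1));
            }
        }
    }

    // Execution side: the runner receives its index as an argument and reads only its own case.
    static Properties readCase(Path experimentDir, int index) throws IOException {
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(inputFileFor(experimentDir, index), StandardCharsets.UTF_8)) {
            props.load(r);
        }
        return props;
    }
}
```

Text files also double as the human-readable record for reproducing the experiment.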

GridSweeper installation hierarchy

Thursday, June 21st, 2007

As currently conceived, GridSweeper will consist of a set of Java classes in JAR files, additional Java classes as plugins (plugin format to be determined, but will include a JAR file), and shell scripts to simplify this:

java -cp ${GRIDSWEEPER_ROOT}/classes/GridSweeper.jar \
    com.edbaskerville.gridsweeper.GridSweeper [args]

into this:

gsweep [args]

The top level of the hierarchy will be designated by the environment variable $GRIDSWEEPER_ROOT, within which the following tree will exist:

$GRIDSWEEPER_ROOT/
    bin/
        gsweep
            (main GridSweeper submission executable)
        gdrone
            (shortcut to gsweep -a com.edbaskerville.gridsweeper.DroneAdapter)
        grunner
            (wrapper to actually execute jobs on the agent machine)
        ...
            (other scripts to shortcut, e.g., the Repast adapter)
    classes/
        classes.jar
            (all classes except those with main methods)
        GridSweeper.jar
            (app/tool class)
        GridSweeperRunner.jar
            (class to actually run simulations on agents)
    plugins/
        (contains add-on adapters and file-transfer systems)
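Since the plugin format is still undecided, the following is only one plausible approach, assuming a plugin is simply a JAR dropped into $GRIDSWEEPER_ROOT/plugins/: resolve the root from the environment, collect those JARs, and put them on a URLClassLoader so adapter and file-transfer classes can be loaded by name. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PluginPaths {
    private PluginPaths() { }

    // Resolve the installation root from an environment map (e.g. System.getenv()).
    static Path root(Map<String, String> env) {
        String root = env.get("GRIDSWEEPER_ROOT");
        if (root == null || root.isEmpty())
            throw new IllegalStateException("GRIDSWEEPER_ROOT is not set");
        return Paths.get(root);
    }

    // All JARs directly inside $GRIDSWEEPER_ROOT/plugins/, sorted for a stable search order.
    static List<Path> pluginJars(Path root) throws IOException {
        List<Path> jars = new ArrayList<>();
        Path plugins = root.resolve("plugins");
        if (!Files.isDirectory(plugins)) return jars;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(plugins, "*.jar")) {
            for (Path jar : stream) jars.add(jar);
        }
        jars.sort(null);
        return jars;
    }

    // Class loader that sees the plugin JARs on top of the core classes.
    static URLClassLoader pluginLoader(Path root, ClassLoader parent) throws IOException {
        List<Path> jars = pluginJars(root);
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) urls[i] = jars.get(i).toUri().toURL();
        return new URLClassLoader(urls, parent);
    }
}
```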

Setting up the GridSweeper build environment

Thursday, June 21st, 2007

First things first: this post covers how to get the GridSweeper build environment set up on your machine. I’ve developed GridSweeper entirely with Eclipse, but the build process uses Ant, so it can be run from the command line as well (or, theoretically, any other Ant-compatible IDE).

To get GridSweeper building on your machine, you’ll need to get four things:

  1. The code distribution (trunk), checked out into your Eclipse workspace. Soon to be hosted at CSCS.
  2. An implementation of the Java Distributed Resource Management Application API (DRMAA). For CSCS/Linux, you should use the one provided by the Sun Grid Engine (in /appl/sge/drmaa.jar on CSCS machines). For building on my Mac, I’m using my XgridDRMAA implementation.
  3. Jakarta Commons Net (download page). This is for FTP file transfer, which won’t actually be relevant for CSCS—maybe I can modify the build system to make this optional.
  4. Jakarta ORO (download page), also for FTP. You won’t even realize you’re missing this until you get an obscure class not found error at runtime when using any of the FTP directory methods.

If you’re using Eclipse (recommended), open up the project in your workspace. Add the DRMAA, Jakarta Commons Net, and Jakarta ORO jar files (Project > Properties > Java Build Path > Add External JARs…), and, in theory, the project should build.

Next: evaluating the code.

GridSweeper code review

Thursday, June 21st, 2007

I’m beginning this year’s GridSweeper development by picking up all the pieces from last summer, documenting them, and considering design changes, then getting down to the business of finishing up the implementation. The next few posts will cover what I find.

Once there’s a wiki hosted by CSCS, I’ll organize this information there.

GridSweeper getting there…

Saturday, October 28th, 2006

Today, amidst a torrent of schoolwork and music, I got around to getting a little closer to a finished GridSweeper. (Sometimes it’s easiest to work on something when you’re using it to avoid working on something else.)

I worked through the fundamental problems of why neither Xgrid nor Sun Grid Engine wanted to run any of the jobs I was giving them. Turned out to be mostly trivial things:

  • On the CSCS machine with SGE installed, the wrong version of Java was being run to execute jobs. I hard-coded a fix for this; I need to put a check in the grunner shell script to support an optional GRIDSWEEPER_JAVA environment variable.
  • SGE doesn’t support the file-transfer mode attribute of DRMAA, so things were grinding to a halt because of that too. I just surrounded the line that set the file-transfer mode with a try/catch block; this is fine since SGE transfers all the files by default anyway. (I also changed the default behavior of XgridDRMAA to match SGE.)
  • I was trying to run jobs with XgridDRMAA from my home directory. But Xgrid jobs (with my non-fancy XgridLite setup) run as the user “nobody”, so they couldn’t access the executable. Solution: just set up a GridSweeper root directory where “nobody” can get to it. (I also added a line to XgridDRMAA to actually record the error returned by execve()…in case this happens again…)

Ah, the joy of debugging. Anyway, GridSweeper now actually runs jobs, which is pretty cool. It doesn’t monitor the jobs, or correctly extract output data from the output files it produces, but that will be easy. (And won’t have any bugs, right?)

Summer of Code wrap-up

Monday, August 21st, 2006

My mixed-up brain thought the end of Summer of Code was August 26; it’s actually right now, so it’s time to wrap things up for this program. The code will be architecturally complete in the next couple of days.

The complete list of items that should be done:

  • XgridDRMAA 0.1 (already released)
  • Blog entry introducing the use of XgridDRMAA for developers
  • GridSweeper 0.0.1, with the following features:
    • Support for Drone-compliant models, and a standard interface for adding additional types of models
    • Support for file transfer via FTP, and a standard interface for adding additional filesystems
    • Command-line interface for running batches

The following GridSweeper features will be implemented in a post-SoC release:

  • Direct support for Repast models
  • Graphical user interface
  • Full plug-in support and developer documentation for plug-in interfaces

Additionally, XgridDRMAA will improve with user feedback and additional testing.

From control files to experiment runs…

Thursday, July 20th, 2006

Here’s how a set of parameter sweeps will get translated into an actual experiment run…

  1. Generate a tree of Sweep objects from a control file and/or command-line arguments. (In the case of the GUI, the Sweep objects will be generated live as the model backing the view.)
  2. Get a list of parameter maps by calling generateMaps() on the top-level sweep.
  3. Convert that list of parameter maps into a list of jobs for the grid system, querying the plugin for the model system (Repast, Drone, etc.) along the way to generate the job submission data (and do things like stage files if there isn’t a shared filesystem).
  4. Submit the jobs to the grid.
  5. On the client/submit end, monitor progress of jobs using output from CLI tool or via GUI.
  6. On the agent end, run the jobs by passing the information to the plug-in. If required, stage files back to the FTP (or other) server when the job is done.

Sweeps

Thursday, July 20th, 2006

The basic model code for parameter sweeps is done. There’s a standard interface (Sweep) for all sweep types that contains a single method:

public List generateMaps()

The returned List is simply a sequence of parameter settings. Each item in the list is a ParameterMap object, which is just a subclass of HashMap with some convenience constructors.

Currently, there are six different concrete implementations of Sweep, plus a few abstract classes defining common elements.

SingleValueSweep Pretty simple: assigns a single value to a single parameter.

ListSweep The first nontrivial type: assigns a list of values to a single parameter.
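A minimal sketch of the interface and these two simplest sweeps, written with generics (List&lt;ParameterMap&gt; rather than the raw List above). ParameterMap’s single-assignment constructor and the sweep constructors are my assumptions about what the “convenience constructors” look like.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// A HashMap from parameter name to value, with a convenience constructor.
class ParameterMap extends HashMap<String, Object> {
    private static final long serialVersionUID = 1L;

    ParameterMap() { }

    // Hypothetical convenience constructor for a single assignment.
    ParameterMap(String name, Object value) { put(name, value); }
}

interface Sweep {
    List<ParameterMap> generateMaps();
}

// Assigns one value to one parameter: exactly one map.
class SingleValueSweep implements Sweep {
    private final String name;
    private final Object value;

    SingleValueSweep(String name, Object value) { this.name = name; this.value = value; }

    public List<ParameterMap> generateMaps() {
        List<ParameterMap> maps = new ArrayList<>();
        maps.add(new ParameterMap(name, value));
        return maps;
    }
}

// Assigns each value in a list to one parameter: one map per value, in order.
class ListSweep implements Sweep {
    private final String name;
    private final List<Object> values;

    ListSweep(String name, List<?> values) { this.name = name; this.values = new ArrayList<Object>(values); }

    public List<ParameterMap> generateMaps() {
        List<ParameterMap> maps = new ArrayList<>();
        for (Object v : values) maps.add(new ParameterMap(name, v));
        return maps;
    }
}
```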

RangeListSweep Probably the most useful type: lets you assign a range of values, defined with a start, end, and increment, to a single parameter. The values are represented using the arbitrary-precision BigDecimal class, so there’s no possibility of rounding error when adding values together.
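The core of a range sweep, sketched with BigDecimal; the method names and the inclusive end point are my assumptions. The double version is included only to show the rounding problem BigDecimal avoids: its fourth value is 0.30000000000000004 rather than 0.3.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

final class RangeValues {
    private RangeValues() { }

    // start, start+increment, ... up to and including end (inclusive end is assumed).
    static List<BigDecimal> values(BigDecimal start, BigDecimal end, BigDecimal increment) {
        if (increment.signum() <= 0) throw new IllegalArgumentException("increment must be positive");
        List<BigDecimal> values = new ArrayList<>();
        for (BigDecimal v = start; v.compareTo(end) <= 0; v = v.add(increment)) values.add(v);
        return values;
    }

    // The same loop with doubles: repeated addition accumulates binary rounding error.
    static List<Double> doubleValues(double start, double end, double increment) {
        List<Double> values = new ArrayList<>();
        for (double v = start; v <= end; v += increment) values.add(v);
        return values;
    }
}
```

With decimal increments such as 0.1, the double loop can also gain or lose an end point, whereas the BigDecimal loop lands on 1.0 exactly.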

LinearCombinationSweep Combines two other sweeps “linearly”—that is, in parallel, so the first parameter map in sweep 1’s list gets combined with the first parameter map in sweep 2’s list, and so on. For example, combining beta=0.1,0.2,0.3 with gamma=0.4,0.5,0.6 would result in a length-3 LinearCombinationSweep with (beta,gamma)=(0.1,0.4), (0.2,0.5), (0.3,0.6).

MultiplicativeCombinationSweep This is what most people want when varying multiple parameters: generate every combination of each parameter/value pair. So, to reuse the last example, combining beta=0.1,0.2,0.3 with gamma=0.4,0.5,0.6 results in a length-9 MultiplicativeCombinationSweep with (beta,gamma) = (0.1,0.4), (0.1,0.5), (0.1,0.6), (0.2,0.4), (0.2,0.5), (0.2,0.6), (0.3,0.4), (0.3,0.5), (0.3,0.6).
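Both combination rules reduce to simple operations on lists of parameter maps. This sketch uses plain Map&lt;String, Object&gt; in place of ParameterMap and static methods in place of the LinearCombinationSweep and MultiplicativeCombinationSweep classes; requiring equal lengths in the linear case is my assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Combinations {
    private Combinations() { }

    // "Linear": pair the i-th map of one list with the i-th map of the other.
    static List<Map<String, Object>> linear(List<Map<String, Object>> a, List<Map<String, Object>> b) {
        if (a.size() != b.size()) throw new IllegalArgumentException("sweeps must have equal length");
        List<Map<String, Object>> result = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) result.add(merge(a.get(i), b.get(i)));
        return result;
    }

    // "Multiplicative": every map of one list combined with every map of the other.
    static List<Map<String, Object>> multiplicative(List<Map<String, Object>> a, List<Map<String, Object>> b) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> x : a)
            for (Map<String, Object> y : b) result.add(merge(x, y));
        return result;
    }

    private static Map<String, Object> merge(Map<String, Object> x, Map<String, Object> y) {
        Map<String, Object> m = new HashMap<>(x);
        m.putAll(y);
        return m;
    }

    // Helper for examples: one map per value of a single parameter.
    static List<Map<String, Object>> values(String name, Object... vs) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Object v : vs) {
            Map<String, Object> m = new HashMap<>();
            m.put(name, v);
            result.add(m);
        }
        return result;
    }
}
```

Combining beta=0.1,0.2,0.3 with gamma=0.4,0.5,0.6 gives 3 maps linearly and 9 multiplicatively, in the same order as the list in the paragraph above (the first sweep varies slowest).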

UniformDoubleSweep The first in a series of stochastic sweeps (more to be written), this sweep generates a specified number of values uniformly distributed within a range, so a parameter space can be explored stochastically. If you’re exploring your parameter space from 0 to 1 in increments of 0.1, and it just so happens that interesting spikes happen at 0.15, 0.25, and 0.35, you’re not going to notice them unless you explore the space stochastically.
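A minimal sketch of the uniform stochastic sweep, assuming the caller supplies a seed for java.util.Random so the sampled values can be regenerated when reproducing an experiment; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class UniformSamples {
    private UniformSamples() { }

    // n values drawn uniformly between min and max; the same seed yields the same values.
    static List<Double> sample(double min, double max, int n, long seed) {
        if (!(min < max)) throw new IllegalArgumentException("min must be less than max");
        Random rng = new Random(seed);
        List<Double> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) values.add(min + (max - min) * rng.nextDouble());
        return values;
    }
}
```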