Ed Baskerville, 1 May 2006
Scientists who model complex systems on the computer often need to perform hundreds or thousands of simulation runs using different parameters. Modern grid computing systems have the potential to make such batches of simulations very easy, but no widely available software exists to integrate the two. As a software engineer familiar both with the needs of computer modelers and with the grid technologies available, I have thought in detail about the design of such software. The purpose of this document is to describe the software, called GridSweeper, and to propose that Google fund its development through the Summer of Code 2006 program.
Most computer models include various parameters that describe different aspects of the system being modeled, for example the rate of disease transmission in a model of epidemic spreading on a network. Broad categories of models with this property include:
To evaluate and analyze the model, scientists need to systematically vary parameters and see how the behavior changes. With stochastic models, simulations must also be run multiple times with different seeds for the random number generator. To make best use of time and computing resources, runs should be executed in parallel on multiple machines, but setting up a system to do this is non-trivial.
Recently, a number of grid computing systems have emerged, making it relatively easy to distribute computational tasks across a network of computers. Clients can submit lists of tasks to be executed, and the grid system (such as the Sun Grid Engine or Apple Xgrid) sends the tasks to available processors on the network. Many people write custom scripts to harness these systems for particular computer models. This works, but it’s an ad-hoc solution that needs to be recreated for every model.
An ideal system would make it easy to distribute batches of simulation runs across any grid system, using standard graphical and command-line tools and standard configuration files. The system would integrate with standard modeling tools and standard grid systems. It would support flexible “parameter sweeps” and have an intuitive way of expressing them. It would also make it easy to gather data from all simulation runs in an organized fashion.
I propose to develop a new open-source software system, called GridSweeper, to address these needs. GridSweeper will make it easy to set up several different kinds of parameter sweeps, including simple lists, ranges, stochastic sampling, and arbitrary combinations of sweeps. Setting up parameter sweeps will be easy, either through an intuitive graphical user interface or from the command line, with or without text configuration files. At the outset, GridSweeper will support Repast models, models designed for the Drone batch system, and any other model where parameters can be controlled from the command line. The software will also include a plug-in API for building support for other modeling toolkits.
GridSweeper will be implemented in Java using the industry-standard Distributed Resource Management Application API (DRMAA), making it possible to directly support the Sun Grid Engine and other grid systems that implement DRMAA. Additionally, I will write an implementation of DRMAA that interfaces with Apple Xgrid, opening up the system to people with ad-hoc grids of Macs.
I have a body of experience and knowledge that makes me uniquely suited to building this system. I have worked with researchers at the University of Michigan to build a number of agent-based models. Moreover, I have already implemented a more restricted version of GridSweeper called Xdrone, which worked with Drone-based models and a preview version of Apple Xgrid. The community is on board, as well. Having announced the proposal to several mailing lists, I have already received positive responses from several scientists building computer models and one of the architects of DRMAA; Rick Riolo, a fixture in the agent-based modeling community, has agreed to mentor me on this project.
The GridSweeper software will be useful to scientists in any field who build computer models and need to run them with different parameter settings. As a side effect, it will also provide an implementation of the DRMAA API for Apple Xgrid, benefiting yet another user community. Because the system will be built on an industry-standard API and will provide extensibility APIs, the software will be able to grow as computer modeling tools evolve. With the software being potentially useful to so many scientists, I strongly encourage Google to fund the development of GridSweeper through the Summer of Code 2006 program.
Eligibility I will be attending the University of California, Santa Barbara, this fall as a graduate student in the Department of Music. Documentation can be provided upon request.
Mentor Rick Riolo, Computer Laboratory Director for the Center for the Study of Complex Systems at the University of Michigan, has agreed to mentor me on this project.
Development Process GridSweeper will be hosted in a public Subversion repository at code.edbaskerville.com. Development notes will be kept in a blog at that site, so interested users will be able to get early releases, provide feedback, and generally stay in the loop.
Deliverables The final package delivered at the end of the summer will include:
This section provides more detail about the GridSweeper project, including feature descriptions and development methodology.
GridSweeper will include support for several types of parameter sweeps: lists of parameter values, range/increment lists, stochastic sampling, and arbitrary combinations of the first three types. Each of these features is described in more detail below. Example syntax is given to illustrate usage at the command line, and may change in the final implementation.
The user can set a parameter to an arbitrary list of values. For example, the user may want to look at the behavior of a model for parameter r equal to 0.1, 0.4, and 0.5:
r=<0.1, 0.4, 0.5>
These can be combined to match certain values of one parameter with certain values of others:
<r s>=<0.1 0.5, 0.4 0.8, 0.5 1.0>
Parameter lists can be real numbers, integers, boolean values, or strings.
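As a rough illustration of how the list syntax above might be expanded into individual runs, here is a minimal sketch. It is written in Python for brevity, although GridSweeper itself is proposed in Java. The function names `parse_value` and `parse_list_sweep`, and the rule for guessing value types, are assumptions made for this example and not part of the proposed design.

```python
def parse_value(text):
    """Interpret a token as a boolean, integer, or real; otherwise keep the string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def parse_list_sweep(expression):
    """Expand 'r=<a, b>' or '<r s>=<a1 b1, a2 b2>' into one dict per run."""
    names_part, values_part = expression.split("=", 1)
    names = names_part.strip().strip("<>").split()
    entries = values_part.strip().strip("<>").split(",")
    settings = []
    for entry in entries:
        tokens = entry.split()
        if len(tokens) != len(names):
            raise ValueError(f"expected {len(names)} values, got {entry!r}")
        settings.append({n: parse_value(t) for n, t in zip(names, tokens)})
    return settings


if __name__ == "__main__":
    for run in parse_list_sweep("<r s>=<0.1 0.5, 0.4 0.8, 0.5 1.0>"):
        print(run)
```

Each dictionary corresponds to one simulation run, so the matched form pairs the first value of r with the first value of s, and so on, rather than crossing them.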
Most parameter settings are not arbitrary lists of values, but are ranges of values stepped with a certain increment. For example, the user may want to set the value of r to all values between 0 and 100, inclusive, in increments of 5:
r=<0:5:100>
Range/increment lists can be real numbers or integers. Real-valued lists will be stored in an arbitrary-precision format, so that round-off error does not accumulate as the range is stepped.
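To show why the representation matters, the sketch below expands a start:step:end range using decimal arithmetic. Python's `decimal` module stands in here for whatever arbitrary-precision format the Java implementation ends up using, and the function name `expand_range` is hypothetical.

```python
from decimal import Decimal


def expand_range(start, step, end):
    """Return start, start+step, ... up to and including end (arguments are strings)."""
    start, step, end = Decimal(start), Decimal(step), Decimal(end)
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    index = 0
    # Multiply rather than add repeatedly, so each value is computed directly.
    while start + index * step <= end:
        values.append(start + index * step)
        index += 1
    return values


if __name__ == "__main__":
    print([str(v) for v in expand_range("0.1", "0.1", "0.9")])
    # Binary floating point, by contrast, cannot represent 0.1 exactly:
    print(0.1 + 0.1 + 0.1 == 0.3)
```

With binary floating point, stepping by 0.1 can overshoot or undershoot the end point and silently drop or add a run. With decimal values the inclusive end point is hit exactly.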
Sometimes, a researcher may want to do a stochastic sampling of parameter space to remove any chance of correlations created by sampling at regular intervals. The user can specify sampling from a variety of standard probability distributions, including (but not necessarily limited to):
For example, you could set the parameter r to ten different values drawn from the uniform distribution between 0.0 and 1.0:
r=<uniform:0.0:1.0:10>
This feature can also be used to set the random seed of a simulation to different values:
seed=<uniform:0:0x7fffffff:10>
Stochastic parameter values can be real numbers or integers, as indicated by the presence or absence of decimal points.
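A minimal sketch of uniform sampling follows, written in Python for brevity. It assumes the distribution:low:high:count layout of the uniform example above, and that the seed example is also meant as a uniform draw. The function names `sample_uniform` and `parse_sampling` are hypothetical, and only the uniform distribution is handled.

```python
import random


def sample_uniform(low, high, count, rng):
    """Draw `count` values between low and high; integers if no decimal point appears."""
    if "." in low or "." in high:
        return [rng.uniform(float(low), float(high)) for _ in range(count)]
    # Base 0 lets int() accept hexadecimal bounds such as 0x7fffffff.
    return [rng.randint(int(low, 0), int(high, 0)) for _ in range(count)]


def parse_sampling(expression, rng):
    """Expand an expression such as 'uniform:0.0:1.0:10' into sampled values."""
    name, low, high, count = expression.strip("<>").split(":")
    if name != "uniform":
        raise ValueError(f"unsupported distribution: {name}")
    return sample_uniform(low, high, int(count), rng)


if __name__ == "__main__":
    rng = random.Random(2006)  # fixed seed so the sweep itself is reproducible
    print(parse_sampling("<uniform:0.0:1.0:10>", rng))
    print(parse_sampling("<uniform:0:0x7fffffff:10>", rng))
```

Passing in an explicitly seeded generator is a design choice made for this sketch: it lets the same sampled sweep be regenerated later, which matters when results need to be traced back to their parameter settings.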
By default, the software will run the simulation for every combination of parameter values provided. For example, setting two different parameters to three different values each will result in nine different runs of the simulation. However, configuration files allow arbitrary flexibility, so that, for example, different sets of values for one parameter will be run against different values of another parameter. The syntax has not yet been formulated, but will likely resemble Repast’s.
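The default behavior described above is a Cartesian product of the per-parameter value lists. A minimal sketch, with the hypothetical function name `all_combinations` and in Python for brevity:

```python
from itertools import product


def all_combinations(sweeps):
    """Yield one dict of parameter settings for every combination of values."""
    names = list(sweeps)
    for values in product(*(sweeps[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    runs = list(all_combinations({"r": [0.1, 0.4, 0.5], "s": [1, 2, 3]}))
    print(len(runs))  # two parameters with three values each
    print(runs[0])
```

The number of runs is the product of the list lengths, so it grows quickly as parameters are added. This is one reason the more selective combinations mentioned above are useful.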
All of GridSweeper’s features will be available in a graphical user interface written using Java/Swing. The visual layout of the interface has not yet been designed.
The command-line interface will include the same features as the graphical interface. The user will be able to control simulations either via text configuration files or using single-line commands. For example, to control a model built with Drone, the user could run the following (the sweep expression is quoted so that the shell does not treat the angle brackets as redirection):
gridsweeper ./mymodel "-Dr=<0.1:0.1:0.9>"
GridSweeper will provide flexible management of data generated by simulation runs. Where a network filesystem is available, the user will be able to specify it as the destination for all collected data. If no network filesystem is available, transfer to a server will be supported via ftp, scp, or WebDAV, or directly to the originating client computer via a protocol yet to be chosen. Data will be organized in directories named by parameter settings. Parameter settings, errors, standard output/standard error, and all data files generated by runs will be collected in a standard directory structure.
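The exact directory layout has not been decided. As one possibility, the sketch below builds one directory level per parameter, in a fixed (sorted) order, followed by a run number for repeated stochastic runs. The function name `run_directory` and the `name=value` and `run-N` naming are assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def run_directory(root, settings, run_number):
    """Build a path such as root/r=0.1/s=0.5/run-3 from parameter settings."""
    path = PurePosixPath(root)
    # Sort by parameter name so the same settings always map to the same path.
    for name in sorted(settings):
        path /= f"{name}={settings[name]}"
    return path / f"run-{run_number}"


if __name__ == "__main__":
    print(run_directory("results", {"s": 0.5, "r": 0.1}, 3))
```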
Besides supporting any model that can be controlled using command-line parameters, GridSweeper will include native support for the Repast modeling toolkit, and for models built for the Drone batch system. Additionally, an API will be available to add support for other modeling toolkits. This section describes these features.
Any model that can be controlled via command-line parameters will be easily supported by GridSweeper. In addition to running models using one-line commands (see section 4.3), configuration files will be able to include format strings to define how parameters will be written out to the command line.
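The format-string mechanism might work roughly as follows. The placeholder names `{name}` and `{value}` and the function `build_command` are hypothetical, since the configuration file syntax has not been designed. The sketch is in Python for brevity.

```python
import shlex


def build_command(executable, template, settings):
    """Render one command-line argument per parameter using a format string."""
    args = [executable]
    for name, value in settings.items():
        args.append(template.format(name=name, value=value))
    return args


if __name__ == "__main__":
    # "-D{name}={value}" mimics the Drone-style flag shown earlier;
    # "--{name} {value}" or "{name}={value}" would suit other models.
    args = build_command("./mymodel", "-D{name}={value}", {"r": 0.1, "seed": 42})
    print(shlex.join(args))
```

Building an argument list first, and joining it into a shell string only for display, avoids quoting problems when parameter values contain spaces or shell metacharacters.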
The Repast modeling toolkit includes standard libraries for controlling model parameters. GridSweeper will directly interface with Repast by generating parameter files in Repast’s native format. GridSweeper will also support using Repast parameter files as GridSweeper configuration files, so Repast modelers will not have to rewrite files used for previous experiments, and can use the same configuration files for batch runs on single computers and grid runs using GridSweeper.
The Drone batch system performs a similar function to GridSweeper, but does not work with grid systems. GridSweeper will support models designed for Drone by automatically generating command-line parameters in the Drone format. GridSweeper will also include partial compatibility with Drone configuration files.
In order to make it easy to add support for other modeling toolkits, GridSweeper will include an API for developing plug-ins. The API will provide methods to generate command-line parameters or configuration files, or to directly override how the model is executed.
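The plug-in API has not been designed yet. The following sketch only suggests its general shape, in Python rather than the Java the real API would use. Every class and method name is hypothetical, and the example plug-in targets an imaginary toolkit that reads `name = value` files rather than any real format. A third hook for overriding execution entirely is omitted for brevity.

```python
from abc import ABC, abstractmethod


class ModelAdapter(ABC):
    """Hypothetical plug-in interface; every name here is illustrative."""

    @abstractmethod
    def command_line(self, executable, settings):
        """Return the argument list used to launch one run."""

    def config_files(self, settings):
        """Return {filename: contents} to write before the run (none by default)."""
        return {}


class KeyValueFileAdapter(ModelAdapter):
    """Example plug-in for an imaginary toolkit that reads 'name = value' files."""

    filename = "params.txt"

    def command_line(self, executable, settings):
        # Parameters travel in the file, so only its name goes on the command line.
        return [executable, self.filename]

    def config_files(self, settings):
        lines = [f"{name} = {value}" for name, value in sorted(settings.items())]
        return {self.filename: "\n".join(lines) + "\n"}


if __name__ == "__main__":
    adapter = KeyValueFileAdapter()
    print(adapter.command_line("./mymodel", {"r": 0.1}))
    print(adapter.config_files({"r": 0.1, "s": 2}))
```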
This section describes how GridSweeper will be implemented in more detail.
GridSweeper will be built on the Distributed Resource Management Application API (DRMAA), developed by the Global Grid Forum. DRMAA is intended to make it easy for applications to interface with grid infrastructures. DRMAA implementations are currently available for the Sun Grid Engine and the Globus Toolkit (via GridWay), and support for others is likely in the near future.
In order to facilitate interactions with the Java-based Repast, and for ease of porting, GridSweeper will be implemented in Java using the Java DRMAA bindings.
In order for GridSweeper to support Apple Xgrid, I could either include two implementations within GridSweeper (one for DRMAA, and one for the Xgrid Foundation API) or build an interface layer between DRMAA and Xgrid. The former might be easier in the short term, but the latter would be of great value to the grid computing community, so that is the approach I intend to pursue.
I have already been involved in some discussions on Apple’s xgrid-users mailing list about how such a system would be implemented. See this message and subsequent messages in the thread:
http://lists.apple.com/archives/xgrid-users/2006/Feb/msg00036.html
In order to prevent software defects, all the components of GridSweeper will be tested using unit tests. In Java, unit testing will be done using JUnit and the JUnit plug-in included with the Eclipse development environment. In order to bridge to Xgrid, some code will need to be written in Objective-C and C; for this code OCUnit and CUnit will be used, respectively.
GridSweeper development will take place fully in the open. All source code will be hosted live at code.edbaskerville.com, so users and interested developers will always have access to the latest version of the code. I will also track my development using a blog hosted at the same site, and users will be able to leave comments about the code as it is being developed. I will announce significant releases to interested parties, for example on the repast-interest, swarm-support, and xgrid-users mailing lists.