GridSweeper Proposal Draft Outline
Second Revision, 26 April 2006
Ed Baskerville
Foreword
Introduce the contents of the document:
- many types of computer models require large numbers of runs with different parameter settings
- grid systems have the potential to execute these runs efficiently in parallel
- no widely available system to handle parameter sweeps across standard grid systems
- here I propose such a system to be funded by Google Summer of Code
Summary
Background & the problem to be solved:
- background of needs for computer modeling:
- many types of computer models need to be run many times with different parameter settings and different random seeds: agent-based models (ABMs), differential equation models (e.g., in MATLAB), system dynamics models (e.g., in STELLA), and models used in genomics and structural equation modeling
- this takes a long time, so running jobs with different parameter settings/random seeds in parallel on many machines is very appealing
- standard ABM systems in use: RePast, Swarm, MASON, NetLogo
- grid computing background:
- no standard, integrated way to do flexible parameter sweeps on grid systems
- many computer modelers would benefit from such a system
What an ideal solution would do:
- integrate with popular modeling systems (RePast, Swarm, MASON, NetLogo, MATLAB, STELLA) and adapt to any kind of model needing parameter sweeps
- integrate with popular grid systems (Sun Grid Engine, Apple Xgrid) and be extensible to others
- provide flexible parameter sweeps
- gather data from all runs in an organized fashion
- be usable from both command line and graphical interface
Short version of the discussion of the GridSweeper system, with key points.
Why I’m a qualified implementor:
- eligibility: admitted to UC Santa Barbara for this fall and completed the Statement of Intent to Register; further documentation available on request
- BSE in CS at Michigan
- implemented similar system for a restricted domain (Xdrone for a preview version of Apple Xgrid)
- experience with ABM implementation, both using standard systems and written from the ground up (work with UM CSCS and EEB)
Explicit statement of proposal:
- the proposed GridSweeper system meets needs of computer modelers for integrating with grid systems
- can be completed in the timeframe of Google Summer of Code
- already has support from the community: Rick Riolo as mentor, interest from the DRMAA originator, and unanticipated applications in genomics and structural equation modeling
- deliverables:
- DRMAA-based GridSweeper implementation in Java, with features described above
- DRMAA-Xgrid interface layer implementation
- developer/code maintenance documentation
- user documentation
- the explicit proposal: Google, please fund me to do this project
Discussion
Description of the GridSweeper system and how it addresses those needs:
- Integration with systems:
- support native parameter control facilities in RePast and NetLogo
- support standard add-ons for Swarm and MASON (Drone, ParameterDatabase)
- support any command-line invocation with format strings
- provide API for models to tell GridSweeper about available parameters
- provide plugin API to integrate with other systems
- Integration with grid systems:
- use the emerging industry-standard DRMAA (Distributed Resource Management Application API) for job control, which should provide support for Sun Grid Engine and the Globus Toolkit (through GridWay)
- include new DRMAA adapter for Xgrid
- should extend to any new grid system that implements DRMAA
- Flexible parameter sweeps:
- simple lists of parameter values: r = 0.1, 0.4, 0.5
- linear increments: r = 0.1 to 0.9 by increments of 0.1
- stochastic sampling from standard distributions: r = 10 values from uniform(0,1)
- arbitrary (recursive) nesting of parameter settings
- API for controlling dynamic sweeps, where parameter search depends on results from previous runs
- Data gathering:
- where available, store data to network filesystem
- or transfer data back via FTP or other protocol
- organize data in directories labeled with parameter settings
- log all parameter settings, errors, stdout/stderr, as well as any files generated by runs
- Command-line interface:
- full control of grid selection, model selection, source configuration files, etc. via the command-line gridsweeper tool
- quick runs without configuration files using Matlab-like range syntax:
gridsweeper ./mymodel -Dp1=<0.1:0.1:0.9> -Dp2=<uniform:0:1:10>
- Graphical user interface:
- simple graphical setup of all kinds of parameter sweeps
- save/restore from human-editable configuration files
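A minimal sketch of the "command-line invocation with format strings" idea, written in Java to match the planned implementation language. The `${name}` placeholder syntax and the `CommandFormatter` class are hypothetical illustrations, not a settled GridSweeper design: each run's parameter values are substituted into a command-line template before the job is handed to the grid.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical: fills a template such as "./mymodel -r ${r} -seed ${seed}"
// with one run's parameter values.
final class CommandFormatter {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    private CommandFormatter() {}

    static String format(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("no value for parameter " + name);
            }
            // quoteReplacement keeps '$' or '\' in a value from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> run = new LinkedHashMap<>();
        run.put("r", "0.4");
        run.put("seed", "12345");
        // prints: ./mymodel -r 0.4 -seed 12345
        System.out.println(format("./mymodel -r ${r} -seed ${seed}", run));
    }
}
```

Failing loudly on a missing parameter, rather than leaving the placeholder in place, keeps a misconfigured sweep from silently submitting hundreds of broken jobs.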
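The Matlab-like range syntax in the command-line example (`-Dp1=<0.1:0.1:0.9> -Dp2=<uniform:0:1:10>`) can be sketched as a small expander. This hypothetical `SweepExpander` assumes `<start:step:end>` means linear increments and `<uniform:min:max:count>` means `count` samples from uniform(min, max). It covers only the simplest form of nesting, a full Cartesian product across parameters, not arbitrary recursive nesting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper: turns range expressions into value lists and
// combines them into one parameter map per run.
final class SweepExpander {
    private SweepExpander() {}

    /** Expands "<start:step:end>" or "<uniform:min:max:count>" into values. */
    static List<Double> expand(String spec, Random rng) {
        if (!spec.startsWith("<") || !spec.endsWith(">")) {
            throw new IllegalArgumentException("expected <...>: " + spec);
        }
        String[] parts = spec.substring(1, spec.length() - 1).split(":");
        List<Double> values = new ArrayList<>();
        if (parts.length == 4 && parts[0].equals("uniform")) {
            double min = Double.parseDouble(parts[1]);
            double max = Double.parseDouble(parts[2]);
            int count = Integer.parseInt(parts[3]);
            for (int i = 0; i < count; i++) {
                // nextDouble() is uniform on [0, 1), scaled here into [min, max).
                values.add(min + (max - min) * rng.nextDouble());
            }
        } else if (parts.length == 3) {
            double start = Double.parseDouble(parts[0]);
            double step = Double.parseDouble(parts[1]);
            double end = Double.parseDouble(parts[2]);
            if (step <= 0) {
                throw new IllegalArgumentException("step must be positive: " + spec);
            }
            // Count the values up front (with a small tolerance) and compute each
            // as start + i*step, so rounding error neither accumulates nor drops the endpoint.
            long count = (long) Math.floor((end - start) / step + 1e-9) + 1;
            for (long i = 0; i < count; i++) {
                values.add(start + i * step);
            }
        } else {
            throw new IllegalArgumentException("unrecognized range: " + spec);
        }
        return values;
    }

    /** Returns every combination of parameter values, one map per run. */
    static List<Map<String, Double>> cartesian(Map<String, List<Double>> sweeps) {
        List<Map<String, Double>> runs = new ArrayList<>();
        runs.add(new LinkedHashMap<>());
        for (Map.Entry<String, List<Double>> e : sweeps.entrySet()) {
            List<Map<String, Double>> next = new ArrayList<>();
            for (Map<String, Double> partial : runs) {
                for (double v : e.getValue()) {
                    Map<String, Double> run = new LinkedHashMap<>(partial);
                    run.put(e.getKey(), v);
                    next.add(run);
                }
            }
            runs = next;
        }
        return runs;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed so sampled values are reproducible
        Map<String, List<Double>> sweeps = new LinkedHashMap<>();
        sweeps.put("p1", expand("<0.1:0.1:0.9>", rng));
        sweeps.put("p2", expand("<uniform:0:1:10>", rng));
        // prints "90 runs" (9 linear values x 10 samples)
        System.out.println(cartesian(sweeps).size() + " runs");
    }
}
```

Seeding the random generator explicitly means a stochastic sweep can be regenerated exactly, which matters when the parameter settings are logged alongside each run's output.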
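One possible way to "organize data in directories labeled with parameter settings", again as a hypothetical sketch rather than a fixed GridSweeper layout: parameters are sorted by name so the same combination always maps to the same directory, and each random seed gets its own subdirectory under it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical layout: <root>/<name=value,...>/seed=<seed>
final class RunDirectories {
    private RunDirectories() {}

    static Path runDir(Path root, Map<String, String> params, long seed) {
        StringBuilder label = new StringBuilder();
        // TreeMap iterates in parameter-name order, so insertion order does not matter.
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            if (label.length() > 0) {
                label.append(',');
            }
            label.append(sanitize(e.getKey())).append('=').append(sanitize(e.getValue()));
        }
        return root.resolve(label.toString()).resolve("seed=" + seed);
    }

    // Replaces '/', '\' and ':' so a value cannot add directory levels.
    private static String sanitize(String s) {
        return s.replaceAll("[/\\\\:]", "_");
    }

    public static void main(String[] args) {
        Map<String, String> run = new LinkedHashMap<>();
        run.put("p2", "0.5");
        run.put("p1", "0.1");
        // On Linux or Mac OS X this prints output/p1=0.1,p2=0.5/seed=3
        System.out.println(runDir(Paths.get("output"), run, 3));
    }
}
```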
Implementation details and development methodology:
- implement in Java (easier integration with RePast, NetLogo, DRMAA)
- on Mac OS X, create DRMAA-Xgrid interface layer using JNI calls to C-wrapped Objective-C XgridFoundation code
- write unit tests with JUnit, CUnit, OCUnit
- blog development efforts and host live Subversion repository at code.edbaskerville.com
- get constant feedback from community via blog, starting with this proposal
- features that should be included within Summer of Code timeframe:
- support for RePast models, Drone-compatible models, and command-line format strings
- support for Sun Grid Engine and Apple Xgrid
- command-line interface
- graphical interface
- support for Linux and Mac OS X
- features that would need to be implemented later, some by other people:
- dynamic sweep API
- testing with Globus/GridWay and other systems
- direct integration with MATLAB, STELLA, MASON ParameterDatabase, NetLogo
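The dynamic sweep API is deferred, but a hedged sketch shows the kind of contract it might have. The `DynamicSweep` interface and the bisection strategy below are hypothetical: the controller proposes a parameter value, is told the scalar result of that run, and uses it to choose the next value. Bisection is strictly sequential; a controller meant to exploit a grid would more likely propose batches of runs at once.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical dynamic-sweep contract: propose a value, receive its result,
// and choose the next value based on results so far.
interface DynamicSweep {
    /** Returns the next parameter value to run, or null when finished. */
    Double next();

    /** Reports the scalar result of a run made with the given value. */
    void report(double value, double result);
}

// Example strategy: bisection on one parameter, assuming the model output
// increases with the parameter, to find where it crosses a target level.
final class BisectionSweep implements DynamicSweep {
    private double lo;
    private double hi;
    private final double target;
    private final double tolerance;

    BisectionSweep(double lo, double hi, double target, double tolerance) {
        this.lo = lo;
        this.hi = hi;
        this.target = target;
        this.tolerance = tolerance;
    }

    @Override
    public Double next() {
        if (hi - lo <= tolerance) {
            return null; // interval is narrow enough; stop proposing runs
        }
        return (lo + hi) / 2; // autoboxed to Double
    }

    @Override
    public void report(double value, double result) {
        // Keep the half-interval that still brackets the target crossing.
        if (result < target) {
            lo = value;
        } else {
            hi = value;
        }
    }

    double estimate() {
        return (lo + hi) / 2;
    }

    public static void main(String[] args) {
        // Stand-in for a real model run: output = r^2, target 0.25, so the crossing is r = 0.5.
        DoubleUnaryOperator model = r -> r * r;
        BisectionSweep sweep = new BisectionSweep(0.0, 1.0, 0.25, 1e-6);
        Double r;
        while ((r = sweep.next()) != null) {
            sweep.report(r, model.applyAsDouble(r));
        }
        System.out.println(sweep.estimate());
    }
}
```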