EcoGillespie: preliminary notes & code
Last fall, I implemented a model for Mercedes Pascual that, inspired by some models by David Alonso, uses an exact, “event-based” stochastic simulation of disease spreading on a network rather than the usual (and epistemologically unsatisfying) discrete-time approximation. Rather than representing time as a series of discrete time steps during each of which a whole bunch of things happen (how most models done in Repast, Swarm, etc. work), time is continuous (to the appropriate IEEE 754 approximation), and events occur akin to chemical reactions in the Gillespie algorithm.
To be clearer: before each event, you can think of the system as being comprised of a whole bunch of Poisson processes, each of which corresponds to one of the events that could be the next event to happen. As with failure rates or chemical events, you can draw from an exponential probability distribution to figure out how long it will be until the next event—the lambda parameter is simply the sum of the rates of all the “Poisson” processes (possible events). Then you can decide what the next event actually is by drawing from a discrete probability distribution, weighted with the individual rates for the individual processes. The tricky step here is that every event can change the
One important advantage of these models over discrete-time simulations is that they map perfectly onto differential equations. For example, say you’re starting with a basic SIR (Susceptible-Infected-Recovered) model of disease spreading. Assuming a well mixed system, you can set up a set of differential equations relating the rates of change of the three classes to the densities of the other three classes alone. If you want to relax the assumption of well-mixedness, start by setting up a Gillespie-type model with a fully connected network, so that any individual can be infected by any other individual, with rates proportional to the coefficients in the ODEs. Assuming proper scaling and all that, if you run the model a bunch of times and average the runs, the graph should look exactly like the ODE’s. So you can check your model against the ODE first, and then begin changing assumptions (like the network structure).
This is such a nice approach to building ecological models (or any other kind of agent-based model) that I thought I’d build a general software framework for others to use. The idea is to let the modeler write the software without dealing at all with the actual simulation algorithm, because that remains constant from model to model. All the modeler has to do is define two things: (1) the code for individuals to perform events; and (2) the relationships of individuals in a network structure.
When an event occurs, the affected individuals will have a chance to tell the framework that their list of possible events have changed. The framework then updates any necessary data structures, simulates the next event, and repeats. This makes it really easy to build models and know that you don’t have some subtle problem in the basic algorithm.
The framework approach also means that any improvements made to the core simulation algorithm will help any models implemented using the framework. If I figure out a way to parallelize the algorithm on two processors (or, hell, across a whole network), there’s no extra work for the modeler. If I implement a faster algorithm for drawing from discrete probability distributions, models will just get it for free.
The basic algorithm is as follows:
- Setup: create a discrete probability distribution based on the Poisson rates of all “reactions” (possible events) in the system.
- Calculate the time until the next event by drawing from an exponential distribution with the sum of all the individual Poisson rates as the parameter.
- Choose the event to occur by drawing the event from the distribution. This algorithm (Matias, Vitter, Ni) seems to be the state of the art in drawing from discrete probability distributions, but a modified version of this (Rajasekaran, Ross) that I came up with independently in the shower is all I’ve implemented so far.
- Perform the event, updating the probability distribution for the next event in the process. The update process is an important part of how these algorithms are fast, of course.
- Repeat from step 2.
My framework, which I’m uncreatively calling EcoGillespie, is based on what I think is a pretty elegant object structure. At the top level of the simulation is a Model object, which, you guessed it, represents the model of the whole system. The Model could be just a single object with a set of events that can happen to it at any point in time. However, it could also be a Network object, which is a subclass of Model that contains a network of individuals, each of which manages its own set of events at any point in time. (Conceivably, one could—and maybe I will—write Model subclasses specifically for lattices, or for continuous spaces.) Of course, each of those could in turn be networks, so you can have an infinite hierarchy of Model objects. No matter how many levels you have, you’ll never have to touch the details of scheduling events and drawing from probability distributions: all you have to do is say what events can happen, and at what rate, and the simulation kernel will handle the rest.
That is, it will work that way when it’s done. The code doesn’t do much yet, but the basic structure is slightly apparent from what I’ve written so far. I’m hosting my development live on this site:
http://code.edbaskerville.com/svn/ecogillespie/
I will track my progress on this blog.
May 9th, 2006 at 9:05 am
I would modify what you have a little bit. Actually I would add something before
the paragraph you had starting with: “As a consequence…”.
Quite generally, a natural system can be seen as a system that can access a number
of discrete states at rates depending on the state of the system. In this situations, event driven simulations based on Gillespie algorithm provide an exact implementation for the temporal evolution of such a system as a stochastic process in continous time.
Such stochastic processess are well studied and this have a number of advantages.
In particular, one important advantage of these models over discrete-time simulations is that they map perfectly onto differential equations. For example, say you’re starting with a basic SIR (Susceptible-Infected-Recovered) model of disease spreading. Assuming a well mixed system, you can set up a set of differential equations relating the rates of change of the three classes to the densities of the other three classes alone. If you want to relax the assumption of well-mixedness, start by setting up a Gillespie-type model with a fully connected network, so that any individual can be infected by any other individual, with rates proportional to the coefficients in the ODEs. Assuming proper scaling and all that, if you run the model a bunch of times and average the runs, the graph should look exactly like the ODE’s. So you can check your model against the ODE first, and then begin changing assumptions (like the network structure).
May 9th, 2006 at 9:40 am
Well, this is a blog, so I’ll just leave your comment in the comments section. Your paragraph in some form will be good to use for a more permanent document.