Main research

Scheduling Dynamic, Parallel Programs with MPI-2





MPI is the de facto standard for parallel programming, at least in HPC. It was devised in two phases, drawing on PVM: the MPI 1.2 standard offers most of what is required for parallel programming and enables high-performance codes on static, homogeneous architectures.

In spite of the success of MPI 1.2, one of PVM's features was long missed in the standard: the dynamic creation of processes. The success of Grid Computing, and the need to adapt the behavior of a parallel program to changing hardware during its execution, encouraged the MPI committee to define the MPI-2 standard.
My presentation in Bonn

MPI-2 includes the dynamic management of processes (creation, insertion in a communicator, communication with the newly created processes, etc.), Remote Memory Access and parallel I/O.

Although it was defined in 1997, MPI-2 took a long time to be implemented, and only recently have all major MPI distributions come to include it. The notable exception is LAM/MPI, which has provided an implementation for a few years. Currently, LAM is being replaced by Open MPI.

Neither MPI 1.2 nor MPI-2 defines a way to schedule the processes of an MPI program. The processor on which each process executes and the order in which the processes run are left to the MPI runtime implementation. Yet we noticed that the default behavior in the LAM distribution is rather clumsy: all the processes are spawned on the same node. Our group has devised a centralized scheduler that chooses a better placement for the processes. The technical details have been published in a few places, of which we recommend the Euro-PVM/MPI paper (Sept. 2006).
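To make the placement problem concrete, here is a minimal sketch in Python. It is a single-process simulation, not MPI code and not our scheduler: it only contrasts "everything on one node" with a cyclic placement decided in one central place. The round-robin policy, the node names and the names place_all_on_first_node and RoundRobinScheduler are assumptions made for illustration.

```python
from collections import Counter
from itertools import cycle


def place_all_on_first_node(num_processes, nodes):
    """Mimic the clumsy default: every spawned process lands on the same node."""
    return [nodes[0]] * num_processes


class RoundRobinScheduler:
    """Hypothetical centralized scheduler that hands out nodes in cyclic order."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # The cycle keeps its position between calls, so successive
        # spawn requests continue where the previous one stopped.
        self._next_node = cycle(nodes)

    def place(self, num_processes):
        """Return one target node per process to be spawned."""
        return [next(self._next_node) for _ in range(num_processes)]


nodes = ["node0", "node1", "node2"]  # assumed node names
default_load = Counter(place_all_on_first_node(6, nodes))
scheduler = RoundRobinScheduler(nodes)
balanced_load = Counter(scheduler.place(6))

print("default placement:    ", dict(default_load))
print("round-robin placement:", dict(balanced_load))
```

In a real MPI-2 program the chosen node would have to be passed to the runtime when the process is spawned; how that is done depends on the MPI distribution and is left out of this sketch.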


One of the nice things that can be done with a dynamic MPI is running such programs in dynamic environments such as Computational Grids. This is the subject of Elton Mathias's master's thesis, part of which he is doing at the French institute INRIA, in Nice, with Françoise Baude.


Of course, in order to use the spawning capabilities of MPI-2, one should use a programming model that supports the dynamic unfolding of parallelism. Divide & Conquer is a well-known framework for that purpose. Guilherme Pezzi is about to finish his Master's (by the end of 2006) on the use of D&C with MPI-2, and on how to use Workstealing to balance the load between nodes for such programs. If you speak Portuguese, you can have a look at his paper, published at the Brazilian national conference WSCAD'06. His site (just click on his name a few lines above) also contains much more information.
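The combination of Divide & Conquer and Workstealing can be sketched as follows. This is a sequential Python simulation under simplifying assumptions, not Guilherme's implementation: in an MPI-2 program each split would typically correspond to spawning processes, and a steal would be a message exchange between nodes. The function name work_stealing_sum, the threshold, and the policy of stealing from the busiest worker are all hypothetical choices.

```python
from collections import deque


def work_stealing_sum(values, num_workers=4, threshold=4):
    """Sum `values` by divide and conquer over simulated workers with work stealing.

    Returns (total, per-worker partial sums, number of steals).
    """
    queues = [deque() for _ in range(num_workers)]
    queues[0].append((0, len(values)))  # the whole problem starts on worker 0
    partial = [0] * num_workers
    steals = 0

    while any(queues):  # loop until every task queue is empty
        for worker in range(num_workers):
            if not queues[worker]:
                # Idle worker: steal the oldest task from the busiest victim.
                victim = max(range(num_workers), key=lambda w: len(queues[w]))
                if not queues[victim]:
                    continue  # nothing left to steal anywhere
                queues[worker].append(queues[victim].popleft())
                steals += 1
            lo, hi = queues[worker].pop()  # the owner takes its newest task
            if hi - lo <= threshold:
                # Conquer: the task is small enough to solve directly.
                partial[worker] += sum(values[lo:hi])
            else:
                # Divide: split into two subproblems kept on the local queue.
                mid = (lo + hi) // 2
                queues[worker].append((lo, mid))
                queues[worker].append((mid, hi))
    return sum(partial), partial, steals


total, partial, steals = work_stealing_sum(list(range(100)))
print("total:", total)
print("per-worker partial sums:", partial)
print("steals:", steals)
```

Owners take tasks from one end of their queue and thieves from the other, so a thief tends to receive an older, larger subproblem, which keeps the number of steals low.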

Marcia Cera is doing her PhD on integrating a scheduler into an MPI distribution. Ideally, it would provide a generic and transparent module that takes care of the placement of the processes and of their communication. In order to be efficient, the MPI implementation of the scheduler will have to be threaded. Part of Marcia's work will be to see how well MPI copes with threads. One special flavor of threads that could be used is the Kaapi implementation, developed at the ID laboratory in Grenoble.

We are working closely with some other collaborators. The Master's student Marcelo Veiga is studying the possibilities for migration in MPI, in order to checkpoint a process and restart it on another node. This would be handy to integrate into Marcia's scheduler. Two other Master's students are developing parallel MPI applications in collaboration with colleagues from UFRGS's Institute of Physics, one in Computational Fluid Dynamics and the other in Molecular Dynamics. Last but not least, two younger students are starting undergraduate studies on MPI and MPI-2.

If you are interested in these subjects, do not hesitate to send us an email (the address is at the bottom of this page). If you are a student who would like to contribute to these themes, here is a list (updated in Oct. 2006) of potential topics to think about:
Well, that's about it. Of course, this is ongoing work, so many things are being done that are not yet discussed here.
email
--
last update: 2006, Oct.