Fault Tolerant MPI: FT-MPI / OpenMPI
Fault Tolerant MPI (FT-MPI) is a full implementation of the MPI 1.2 specification that provides process-level fault tolerance at the MPI API level. FT-MPI is built on HARNESS, a fault-tolerant runtime system and framework for distributed computing. FT-MPI survives the crash of up to n-1 processes in an n-process job and, if required, can restart them. However, the application remains responsible for recovering the data structures and data held by the crashed processes.
FT-MPI is a high-performance MPI implementation, with performance comparable to that of MPICH2 or LAM/MPI. Its approach to fault tolerance lets users structure their applications so that the overhead of fault tolerance is kept to a minimum.
We have moved our active MPI development to the OpenMPI project, which combines technologies and resources from several earlier projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) with the aim of building the best MPI library available. OpenMPI is a completely new, MPI-2-compliant implementation that offers advantages for system and software vendors, application developers, and computer science researchers.