G Bosilca, C Coti, P Lemarinier, T Herault, and J Dongarra (2009)
Constructing Resiliant Communication Infrastructure for Runtime Environments
In: International Conference on Parallel Computing (ParCo 2009), Lyon, France.
In this paper, we present and analyze a self-stabilizing algorithm that transforms the underlying communication infrastructure provided by the launching service into a BMG and maintains it in spite of failures. We demonstrate that this algorithm is scalable, tolerates transient failures, and adapts itself to topology changes.
To appear