Fengguang Song, Asim YarKhan, and Jack Dongarra (2009)
Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems
In: SC’09 The International Conference for High Performance Computing, Networking, Storage and Analysis, Portland, OR.
Multicore systems have become increasingly important in both shared-memory and distributed-memory environments. This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems, whether shared- or distributed-memory. We use a task-based library to replace existing linear algebra subroutines such as PBLAS, transparently providing the same interface and computational functionality as the ScaLAPACK library. Linear algebra programs are written with the task-based library and executed by a dynamic runtime system. The design of our runtime system focuses mainly on performance scalability. We propose a distributed algorithm that resolves data dependences without requiring cooperation between processes. We have implemented the runtime system and applied it to three linear algebra algorithms: Cholesky factorization, LU factorization, and QR factorization. Our experiments on both shared-memory machines (16-core Intel Tigerton, 32-core IBM Power6) and distributed-memory machines (Cray XT4 using 1024 cores) demonstrate that our runtime system achieves good scalability. Furthermore, we provide an analysis that explains why the tiled algorithms are scalable and estimates their expected execution time.
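The abstract describes linear algebra programs written as tasks whose data dependences are resolved by a runtime rather than by the programmer. The Python sketch below illustrates that general idea for tiled Cholesky, the first of the three factorizations mentioned. It is a generic dependence analysis over tile read/update lists, assumed here for illustration; it is not the authors' runtime or their distributed algorithm. The kernel names follow the usual BLAS/LAPACK convention (POTRF, TRSM, SYRK, GEMM). The helpers `tiled_cholesky_tasks`, `infer_dependences` and `schedule_in_waves` are hypothetical. No arithmetic is performed: the sketch only builds the task graph and schedules it, assuming enough cores to run every ready task at once.

```python
from collections import defaultdict


def tiled_cholesky_tasks(nt):
    """Hypothetical helper: list the tasks of a right-looking tiled Cholesky
    factorization on an nt x nt grid of tiles, in sequential program order.

    Each task is (name, reads, updates): `reads` are tiles used only as input,
    `updates` are tiles modified in place (both read and written).
    """
    tasks = []
    for k in range(nt):
        tasks.append((f"POTRF({k})", [], [(k, k)]))
        for m in range(k + 1, nt):
            tasks.append((f"TRSM({m},{k})", [(k, k)], [(m, k)]))
        for n in range(k + 1, nt):
            tasks.append((f"SYRK({n},{k})", [(n, k)], [(n, n)]))
            for m in range(n + 1, nt):
                tasks.append((f"GEMM({m},{n},{k})", [(m, k), (n, k)], [(m, n)]))
    return tasks


def infer_dependences(tasks):
    """Derive read-after-write, write-after-write and write-after-read
    dependences from the tile access lists alone."""
    last_writer = {}             # tile -> index of the last task that updated it
    readers = defaultdict(list)  # tile -> tasks that read it since that update
    deps = [set() for _ in tasks]
    for i, (_, reads, updates) in enumerate(tasks):
        for t in reads:
            if t in last_writer:
                deps[i].add(last_writer[t])  # read after write
            readers[t].append(i)
        for t in updates:
            if t in last_writer:
                deps[i].add(last_writer[t])  # in-place update after write
            deps[i].update(r for r in readers[t] if r != i)  # write after read
            last_writer[t] = i
            readers[t] = []
    return deps


def schedule_in_waves(deps):
    """Data-driven execution: a task becomes ready once all of its predecessors
    have finished. Returns task indices grouped into waves, assuming enough
    cores to run every ready task simultaneously."""
    pending = [len(d) for d in deps]  # unfinished predecessors per task
    successors = defaultdict(list)
    for i, d in enumerate(deps):
        for j in d:
            successors[j].append(i)
    ready = [i for i, p in enumerate(pending) if p == 0]
    waves = []
    while ready:
        waves.append(ready)
        next_ready = []
        for i in ready:  # "finish" every task in this wave
            for s in successors[i]:
                pending[s] -= 1
                if pending[s] == 0:
                    next_ready.append(s)
        ready = next_ready
    return waves


if __name__ == "__main__":
    tasks = tiled_cholesky_tasks(3)
    waves = schedule_in_waves(infer_dependences(tasks))
    for w, wave in enumerate(waves):
        print(w, [tasks[i][0] for i in wave])
```

With nt = 3 the 10 tasks fall into 7 waves. In general, the wave count equals 3*nt - 2, the length of the POTRF, TRSM, SYRK chain along the diagonal, while the task count grows as O(nt^3). This offers one intuition for why tiled algorithms expose more parallelism as the matrix grows; it is not the paper's analytical model. Because the task sequence is generated deterministically, every process could in principle run the same analysis locally without exchanging messages. This is offered only as an illustration and is not a description of the algorithm the authors propose.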