Daniel Nurmi (2008)
Statistical Methods for Mitigating Resource Provisioning Dynamism in Large-Scale Batch-Scheduled Systems
Ph.D. Thesis, University of California, Santa Barbara.
Users of high-performance computing (HPC) systems generally rely on concurrency to achieve performance. Thanks to the ever-increasing quality of connecting software and networks, modern users can draw from a vast array of distributed resources. However, as the pool of resources available to users grows, so do resource heterogeneity and the dynamism of performance response.
Historically, users have requested access to a supercomputer’s resources by submitting their work and waiting until the system has enough free resources to satisfy the request. However, few facilities cater to the substantial class of users who require that their work be completed by a specific time, who require that their resources be available during a specific time interval, or who require simultaneous access to multiple systems.
In this dissertation, we discuss new statistical methodologies for managing resource performance dynamism, and abstractions that build upon these methodologies to hide resource heterogeneity. In particular, we show how we have developed the methodologies and abstractions necessary to manage and hide the provisioning delay of HPC resources.