John Brevik, Daniel Nurmi, and Rich Wolski (2003)
Modeling Machine Availability in Enterprise and Wide-area Distributed Computing Environments
University of California Santa Barbara, Department of Computer Science (CS2003-28), Santa Barbara, CA 93106.
In this paper, we consider the problem of modeling machine availability in enterprise-area and wide-area distributed computing settings. Using availability data gathered from three different environments, we detail the suitability of four potential statistical distributions for each data set: exponential, Pareto, Weibull, and hyperexponential. In each case, we use software we have developed to determine the necessary parameters automatically from each data collection. To gauge suitability, we present both graphical and statistical evaluations of the accuracy with which each distribution fits each data set. For all three data sets, we find that a hyperexponential model fits slightly more accurately than a Weibull model, but that both are substantially better choices than either an exponential or a Pareto model. We also test the independence of individual machine measurements and the stationarity of the underlying statistical process model for each data set. These results indicate that either a hyperexponential or a Weibull model effectively represents machine availability in enterprise and Internet computing environments.
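As an illustration of the kind of comparison the abstract describes, the following minimal Python sketch fits three of the four candidate distributions by maximum likelihood and scores each fit with a Kolmogorov-Smirnov distance. It is not the authors' software. The synthetic Weibull-distributed durations, the function names, and the choice of the KS distance as the goodness-of-fit measure are all assumptions made for this example; the abstract does not say which statistical tests the paper uses. The hyperexponential fit is omitted because it typically needs an iterative procedure such as expectation-maximization.

```python
import math
import random


def fit_exponential(xs):
    """MLE of the exponential rate: 1 / sample mean."""
    return len(xs) / sum(xs)


def fit_pareto(xs):
    """MLE for Pareto: scale = sample minimum, shape = n / sum(log(x / scale))."""
    xm = min(xs)
    alpha = len(xs) / sum(math.log(x / xm) for x in xs)
    return xm, alpha


def fit_weibull(xs, iters=100):
    """MLE for Weibull (shape, scale); the shape is found by bisection."""
    m = max(xs)
    ys = [x / m for x in xs]  # rescale to (0, 1] so powers cannot overflow
    mean_log = sum(math.log(y) for y in ys) / len(ys)

    def g(k):
        # Likelihood equation for the shape; increasing in k, root is the MLE.
        num = sum(y ** k * math.log(y) for y in ys)
        den = sum(y ** k for y in ys)
        return num / den - 1.0 / k - mean_log

    lo, hi = 0.01, 50.0  # assumed search bracket for the shape
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = m * (sum(y ** k for y in ys) / len(ys)) ** (1.0 / k)
    return k, scale


def ks_statistic(xs, cdf):
    """Largest gap between the empirical CDF and a model CDF."""
    s = sorted(xs)
    n = len(s)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(s))


# Synthetic availability durations (hypothetical; NOT the paper's traces).
random.seed(0)
data = [random.weibullvariate(1000.0, 0.5) for _ in range(2000)]
data = [x for x in data if x > 0]

rate = fit_exponential(data)
xm, alpha = fit_pareto(data)
shape, scale = fit_weibull(data)

ks_e = ks_statistic(data, lambda x: 1 - math.exp(-rate * x))
ks_p = ks_statistic(data, lambda x: 1 - (xm / x) ** alpha)
ks_w = ks_statistic(data, lambda x: 1 - math.exp(-((x / scale) ** shape)))

print("exponential KS:", ks_e)
print("Pareto KS:     ", ks_p)
print("Weibull KS:    ", ks_w, "shape:", shape, "scale:", scale)
```

A smaller KS distance indicates a closer fit. Because the sample here is drawn from a Weibull distribution, the Weibull model is expected to score best, so the output says nothing about how real availability traces behave.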