Yang Zhang, Anirban Mandal, Charles Koelbel, and Keith Cooper (2009)

Combined Fault Tolerance and Scheduling Techniques for Workflow Applications on Computational Grids

In: IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID 2009), Shanghai, China.

More and more complex scientific workflows are now executed on computational grids. Beyond the challenges of managing and scheduling these workflows, further reliability challenges arise because of the unreliable nature of large-scale grid infrastructure. Current grid application management systems address these reliability challenges with fault tolerance mechanisms such as over-provisioning and checkpoint-recovery. In this work, we propose new approaches that combine these fault tolerance techniques with existing workflow scheduling algorithms. We study the effectiveness of the combined approaches by analyzing their impact on the reliability of workflow execution, workflow performance, and resource usage under different reliability models, failure prediction accuracies, and workflow application types.

by Charles Koelbel last modified 2009-09-04 06:04