<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
         xmlns="http://purl.org/rss/1.0/">




    



<channel rdf:about="search_rss">
  <title>VGrADS at Rice University</title>
  <link>./</link>
  
  <description>These are the search results for the query, showing results 1 to 3.</description>
  
  
  
  
  <image rdf:resource="logo.jpg"/>

  <items>
    <rdf:Seq>
        
            <rdf:li rdf:resource="publications/inproceedingsreference.2009-09-03.4991512480"/>
        
        
            <rdf:li rdf:resource="publications/resilience-ccgrid08"/>
        
        
            <rdf:li rdf:resource="Members/anirban"/>
        
    </rdf:Seq>
  </items>

</channel>

    <item rdf:about="publications/inproceedingsreference.2009-09-03.4991512480">
        <title>Analysis of Application Heartbeats: Learning Structural and Temporal Features in Time Series Data for Identification of Performance Problems</title>
        <link>publications/inproceedingsreference.2009-09-03.4991512480</link>
        <description></description>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>anirban</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-09-04T03:40:34Z</dc:date>
        <dc:type>Inproceedings Reference</dc:type>
    </item>
    <item rdf:about="publications/resilience-ccgrid08">
        <title>Fault Tolerance and Recovery of Scientific Workflows on Computational Grids</title>
        <link>publications/resilience-ccgrid08</link>
        <description>In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our algorithms for over-provisioning and migration, which are our primary strategies for fault-tolerance. We consider application performance models, resource reliability models, network latency and bandwidth and queue wait times for batch-queues on compute resources for determining the correct fault-tolerance strategy. Our goal is to balance reliability and performance in the presence of soft real-time constraints like deadlines and expected success probabilities, and to do it in a way that is transparent to scientists. We have evaluated our strategies by developing a Fault-Tolerance and Recovery (FTR) service and deploying it as a part of the Linked Environments for Atmospheric Discovery (LEAD) production infrastructure. Results from real usage scenarios in LEAD show that the failure rate of individual steps in workflows decreases from about 30% to 5% by using our fault-tolerance strategies.</description>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>anirban</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2008-05-16T20:58:44Z</dc:date>
        <dc:type>Inproceedings Reference</dc:type>
    </item>
    <item rdf:about="Members/anirban">
        <title>anirban</title>
        <link>Members/anirban</link>
        <description></description>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>anirban</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2008-01-24T19:01:50Z</dc:date>
        <dc:type>Folder</dc:type>
    </item>



</rdf:RDF>
