Fault Tolerant Ideas

Jack Dongarra
University of Tennessee

Super-Scale Architectures for Clusters and Grids

- Widely deployed systems have 1,000 processors.
- Current tera-scale supercomputers have up to 10,000 processors.
- Next generation peta-scale systems will have 100,000 processors and more.
- Such machines may scale up beyond 100K processors in the next decade.
- On such a system, a failure is likely to be just a few hours, minutes, or seconds away.
- Application checkpoint/restart is today's typical fault tolerance method.
- A problem with MPI: the standard provides no recovery from faults.

MPI Implementations with Fault Tolerance

FT-MPI http://icl.cs.utk.edu/ft-mpi/

- Define the behavior of MPI in case an error occurs.
- FT-MPI is based on MPI 1.3, with a fault tolerant model similar to what was done in PVM.
- Give the application the possibility to recover from a node failure.
- A regular, non fault-tolerant MPI program will run using FT-MPI.
- Stick to the MPI-1 and MPI-2 specifications as closely as possible (e.g. no additional function calls).
- What FT-MPI does not do:
  - Recover user data (e.g. automatic checkpointing)
  - Provide transparent fault tolerance

Algorithm Based Fault Tolerance Using Diskless Check Pointing

- Not transparent; it has to be built into the algorithm.
- N processors execute the computation. Each processor maintains its own checkpoint locally.
- M (M << N) extra processors maintain coding information so that if 1 or more processors die, they can be replaced.
- Today looking at M = 1 (parity processor); can do more with Reed-Solomon coding.

How Diskless Check Pointing Works

- Similar to RAID for disks.
- If X = A XOR B then this is true:
  - X XOR B = A
  - A XOR X = B

Diskless Checkpointing

- The N application processors (4 in this case) each maintain their own checkpoints locally.
- M extra processors maintain coding information so that if 1 or more processors die, they can be replaced.
- Will describe for M = 1 (parity).
- If a single processor fails, then its state may be restored from the remaining live processors.

Algorithm Based

- Built into the algorithm
- Not transparent
- Allows for heterogeneity
- Developing prototype examples for ScaLAPACK and iterative methods for Ax=b

A Fault-Tolerant Parallel CG Solver

- Tightly coupled computation
- Do a "backup" (checkpoint) every k iterations.
- Can survive the failure of a single process.
- Dedicate an additional process for holding data, which can be used during the recovery operation.
- The work-communicator excludes the backup process.
- For surviving m process failures (m < np) you need m additional processes.

The Checkpoint Procedure

- 4 processes participate in the computation, plus one for checkpointing and recovery.
- If your application can survive one process failure at a time or [...]
- Implementation: a single reduce operation for a vector.
- Keep a copy of the vector v which you used for the backup.

The Recovery Procedure

- Rebuild the work-communicator and recover data.
- Say the process with rank 1 is lost and the checkpoint is in process 4: use the remaining processes 0, 2, and 3 along with the checkpoint in 4 to recover the data of process 1.
- Reset the iteration counter.
- On each process: copy the backup of vector v into the current version.

CG Data Storage

Parallel version

Diskless version

Preconditioned Conjugate Grad Performance

Futures

- Investigate ideas for 10K to 100K processors in a Grid context:
  - Processors hold backups of neighbors.
  - Unwind the computation to get back to the checkpoint.
  - Local checkpoint and restart algorithm.
  - Coordination of local checkpoints.
  - Middleware supported super-scale diskless checkpointing.
  - Development of a super-scalable fault-tolerant MPI implementation with localized recovery.
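The XOR relation on the "How Diskless Check Pointing Works" slide can be illustrated with a small single-process sketch. It is not FT-MPI code and makes no MPI calls: the four "processors" are just byte strings in a list, and the helper names (`xor_bytes`, `parity`, `recover`) are hypothetical, chosen for this illustration. It assumes M = 1 and that all local checkpoints have the same length.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("checkpoints must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(checkpoints):
    """Coding information held by the single extra (parity) processor."""
    return reduce(xor_bytes, checkpoints)


def recover(survivors, parity_block):
    """Rebuild the one missing checkpoint from the survivors and the parity."""
    return reduce(xor_bytes, survivors, parity_block)


# Four application "processors", each with a local checkpoint (toy data).
local = [b"state-00", b"state-01", b"state-02", b"state-03"]
parity_block = parity(local)

# Simulate losing processor 1 and restoring its state.
lost = 1
survivors = [c for i, c in enumerate(local) if i != lost]
restored = recover(survivors, parity_block)
print(restored == local[lost])
```

Because XOR is its own inverse, the same `xor_bytes` helper serves for both encoding and recovery; tolerating more than one simultaneous failure would need a different code, such as the Reed-Solomon coding the slides mention.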
[Figure, MPI Implementations with Fault Tolerance: classification of fault-tolerant MPI systems. Labels: Automatic, Semi-automatic; Checkpoint based, Log based (Optimistic, Causal, Pessimistic), Other; Framework, API, Comms layer. Systems shown: CoCheck, Starfish, Clip, LAM/MPI, MPICH-V/CL, Pruitt98, Send based Mesg. logging, Egida, Manetho, MPI/FT, MPI-FT, MPICH-V2, LA-MPI, FT-MPI.]

[Figure, Diskless Checkpointing: processors P0, P1, P2, P3 and P4. P4 takes on the identity of P1 and the computation continues.]

[Figure, The Checkpoint Procedure: Rank 0 + Rank 1 + Rank 2 + Rank 3 = Rank 4]

Rank 0    1   2   3   4   5
Rank 1    2   3   4   5   6
Rank 2    3   4   5   6   7
Rank 3    4   5   6   7   8
Rank 4   10  14  18  22  26

[Figure, The Recovery Procedure: the same five vectors combined with the operators +, -, + and =, so that the vector of rank 1 is rebuilt from the checkpoint in rank 4 and the vectors of ranks 0, 2 and 3.]

[Figure, CG Data Storage: "Think of the data like this": matrix A, vector b, 5 vectors.]
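The checkpoint and recovery procedures (a single reduce for the backup, then recovery of rank 1 from ranks 0, 2, 3 and the checkpoint in rank 4) can be sketched with the numbers from the figure. This is a single-process simulation under stated assumptions: the dictionary `work` stands in for the four working ranks, `reduce_sum` stands in for a sum reduction onto the checkpoint process, and both function names are hypothetical rather than part of FT-MPI.

```python
def reduce_sum(vectors):
    """Element-wise sum of several vectors, standing in for a sum reduction."""
    return [sum(column) for column in zip(*vectors)]


def recover_lost(checkpoint, survivors):
    """Rebuild the lost vector: checkpoint minus the sum of surviving vectors."""
    partial = reduce_sum(survivors)
    return [c - p for c, p in zip(checkpoint, partial)]


# Vectors held by the four working ranks (values taken from the figure).
work = {
    0: [1, 2, 3, 4, 5],
    1: [2, 3, 4, 5, 6],
    2: [3, 4, 5, 6, 7],
    3: [4, 5, 6, 7, 8],
}

# Checkpoint: the extra rank (rank 4 on the slides) stores the element-wise sum.
checkpoint = reduce_sum(work.values())

# Recovery: rank 1 is lost; ranks 0, 2 and 3 plus the checkpoint rebuild it.
lost_rank = 1
survivors = [vec for rank, vec in work.items() if rank != lost_rank]
rebuilt = recover_lost(checkpoint, survivors)
print(checkpoint, rebuilt)
```

With integer data the reconstruction is exact. With floating-point vectors, as in a CG solver, the subtraction can introduce rounding error, so the rebuilt vector may differ slightly from the lost one.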
[Figure, Parallel version: "Think of the data like this" (A, b, 5 vectors) next to "Think of the data like this on each processor" (A, b, 5 vectors on each processor, . . .). No need to checkpoint each iteration, say every k iterations.]

[Figure, Diskless version: the per-processor data for processors P0, P1, ...]

Table 1: PCG performance on 25 nodes of a dual Pentium 4 (2.4 GHz). 24 nodes are used for computation. 1 node is used for checkpoint. Checkpoint every 100 iterations (diagonal preconditioning).

Matrix (Size)          Mpich1.2.5  FT-MPI  FT-MPI w/      FT-MPI w/       Recovery  Ckpoint       Recovery
                       (sec)       (sec)   ckpoint (sec)  recovery (sec)  (sec)     Overhead (%)  Overhead (%)
bcsstk18.rsa (11948)   9.81        9.78    10.0           12.9            2.31      2.4           23.7
bcsstk17.rsa (10974)   27.5        27.2    27.5           30.5            2.48      1.1           9.1
nasasrb.rsa (54870)    577.        569.0   570.0          577.0           4.09      0.23          0.72
bcsstk35.rsa (30237)   860.0       858.0   859.0          872.0           3.17      0.12          0.37

[Figures: structure plots of bcsstk18 and bcsstk17.]
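The slides do not state how the overhead columns in Table 1 were computed. One reading that roughly reproduces the "Recovery Overhead (%)" column is recovery time divided by the plain FT-MPI run time; the sketch below checks that assumption against the reported values. The formula and the function name `recovery_overhead_percent` are assumptions made for illustration, not something the source confirms.

```python
def recovery_overhead_percent(ft_mpi_sec, recovery_sec):
    """Assumed definition: recovery time as a percentage of the FT-MPI run time."""
    return 100.0 * recovery_sec / ft_mpi_sec


# (FT-MPI time in sec, recovery time in sec, reported recovery overhead in %)
table1 = {
    "bcsstk18.rsa": (9.78, 2.31, 23.7),
    "bcsstk17.rsa": (27.2, 2.48, 9.1),
    "nasasrb.rsa": (569.0, 4.09, 0.72),
    "bcsstk35.rsa": (858.0, 3.17, 0.37),
}

differences = {}
for name, (ft_mpi_sec, recovery_sec, reported) in table1.items():
    computed = recovery_overhead_percent(ft_mpi_sec, recovery_sec)
    differences[name] = abs(computed - reported)
    print(f"{name}: computed {computed:.2f}%, reported {reported}%")
```

Under this reading the computed and reported figures agree to within about a tenth of a percentage point, with the small gaps plausibly due to the rounded times shown in the table. The pattern to note is that the relative cost of recovery shrinks as the run time grows.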