EMAN – Requirements and Key Performance Elements

What does the application compute, and what is it used for?

EMAN is a bio-imaging application. It is mainly used for 3D reconstruction of single particles from 2D electron micrographs. The compute-intensive portion of the EMAN software is the “Refinement” step, which we want to run on the Grid. “Refinement” builds a refined 3D model of the single particle from a coarser 3D model of the same particle, and iterates until a “good”-quality 3D model is obtained.

How is the application structured?

One iteration of “Refinement” is structured as a linear DAG. Some nodes in the DAG are inexpensive sequential components, while the interesting, computationally intensive ones are parallel components (essentially parameter sweeps). Some of the parallel components can be run across multiple clusters, while others may have to be run on a single cluster. For multiple iterations of “Refinement”, the DAG is repeated once per iteration. Please refer to the SC03 slides for a more detailed picture.

There are eight nodes in each EMAN chain of computation. Of these, four are parallel nodes and the rest are sequential.

Requirements: platform, software, etc.

We have primarily run EMAN “Refinement” on IA-32 and IA-64 Linux clusters in GrADS, generally on a heterogeneous set of Linux clusters. Most flavors of Linux are supported.
The software can also be run on SGI IRIX, among other platforms. EMAN also needs three libraries: FFTW (a library for computing FFTs) and GSL (the GNU Scientific Library) for “Refinement”, and Qt for visualization. These must be available on all target resources at compile time.

Key factors that determine performance

Flops and I/O appear to be the most important factors determining performance. In my estimate, the order of importance, highest first, is: flops, I/O, memory performance. Most of the computationally intensive operations are floating-point operations, and there are many reads and writes to a number of different files during a parallel phase.

With the test data set we are working on, it takes about eight runs of the DAG to obtain a refined 3D model; other data sets may require a different number of iterations. For this data set, the input data is about 200 MB and is replicated on all clusters. The size of the data exchanged between nodes in the DAG varies from 0.3 MB to 6 MB. I don't have exact figures for bytes versus flops for each node. To give an idea, the most compute-intensive node for this data set takes about 35 minutes on a single node and produces only 300 KB of data, which is why I consider flops the most important factor. Keep in mind that this is a small-to-medium data set. Most large experiments have larger input data sets (from a few GB up to 100 GB), which also implies larger intermediate data sizes.

What more general model is it representative of?

The current EMAN “Refinement” is representative of applications consisting of a sequence of components, some of which are parameter sweeps. In the future, each EMAN computation may evolve into a DAG described using high-level Python scripts, and multiple such DAGs may be submitted for Grid execution.
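As an illustration of the current “Refinement” structure (a linear chain of eight nodes, four of them parameter sweeps, repeated once per iteration), here is a minimal Python sketch that expands the chain into an ordered execution plan. The stage names, the positions of the parallel stages, and the sweep width are hypothetical, since the actual layout is only given in the SC03 slides; `Stage` and `plan_refinement` are illustrative names, not part of EMAN.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """One node of a hypothetical "Refinement" chain."""
    name: str
    parallel: bool  # True: a parameter sweep split into independent tasks


# Hypothetical layout: 8 stages, 4 of them parallel. The real stage
# names and their order are not given in this document.
REFINEMENT_CHAIN: List[Stage] = [
    Stage("seq_1", False),
    Stage("sweep_1", True),
    Stage("seq_2", False),
    Stage("sweep_2", True),
    Stage("sweep_3", True),
    Stage("seq_3", False),
    Stage("sweep_4", True),
    Stage("seq_4", False),
]


def plan_refinement(chain: List[Stage], iterations: int,
                    sweep_width: int) -> List[Tuple[int, str, int]]:
    """Expand the chain into an ordered list of (iteration, stage, n_tasks).

    A sequential stage is a single task; a parallel stage is split into
    `sweep_width` independent tasks (hypothetical parameter). Each stage
    depends only on the previous one, so the plan is a strict linear order,
    and the whole chain is repeated once per iteration.
    """
    if iterations < 1 or sweep_width < 1:
        raise ValueError("iterations and sweep_width must be >= 1")
    plan = []
    for it in range(1, iterations + 1):
        for stage in chain:
            n_tasks = sweep_width if stage.parallel else 1
            plan.append((it, stage.name, n_tasks))
    return plan


if __name__ == "__main__":
    # Eight iterations, as quoted for the test data set; width 16 is made up.
    plan = plan_refinement(REFINEMENT_CHAIN, iterations=8, sweep_width=16)
    print(len(plan), sum(n for _, _, n in plan))  # 64 544
```

The only parallelism in this model is inside a sweep stage; consecutive stages and consecutive iterations are strictly ordered, which is what makes the current application a chain rather than a general DAG.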
The more general model that future EMAN applications may represent is a mixed-parallel application with:

- A set of large input files (replicated or distributed).
- A set of software components arranged in a DAG (perhaps a simple chain, or a more complicated DAG described through a high-level scripting language), where each component may access the input data and/or data produced by preceding components. Some of the components may themselves be parameter sweeps.
- A set of such DAGs that must be processed, completing as many as possible, as quickly as possible.
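Once a DAG is no longer a simple chain, a scheduler needs a dependency-respecting order and some estimate of how quickly each DAG can finish. The minimal Python sketch below (assuming Python 3.9+ for the standard-library `graphlib` module) computes a DAG's critical path, the cost-weighted longest dependency path, which bounds its completion time even with unlimited resources. Ordering a batch of DAGs by this bound, shortest first, is shown only as one simple throughput heuristic, not as a method proposed in this document; the DAG names, component names, and costs are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, Set, Tuple

# deps: component -> set of components whose output it reads
# cost: component -> estimated run time (hypothetical units, e.g. minutes)
Deps = Dict[str, Set[str]]
Cost = Dict[str, float]


def critical_path(deps: Deps, cost: Cost) -> float:
    """Return the cost of the longest dependency path through the DAG.

    No schedule can finish the DAG faster than this, even with unlimited
    resources, because every component must wait for its predecessors.
    """
    finish: Dict[str, float] = {}
    # static_order() yields each component after all of its predecessors
    # (and raises graphlib.CycleError if the graph is not a DAG).
    for node in TopologicalSorter(deps).static_order():
        start = max((finish[p] for p in deps.get(node, ())), default=0.0)
        finish[node] = start + cost[node]
    return max(finish.values(), default=0.0)


# Two hypothetical DAGs: a diamond with one expensive sweep, and a short chain.
DAGS: Dict[str, Tuple[Deps, Cost]] = {
    "dag_A": (
        {"prep": set(), "sweep": {"prep"}, "classify": {"prep"},
         "build": {"sweep", "classify"}},
        {"prep": 5, "sweep": 30, "classify": 10, "build": 3},
    ),
    "dag_B": (
        {"a": set(), "b": {"a"}},
        {"a": 2, "b": 4},
    ),
}

# Shortest-critical-path-first: an illustrative heuristic for finishing
# many DAGs quickly, not a claim about how EMAN DAGs will be scheduled.
order = sorted(DAGS, key=lambda name: critical_path(*DAGS[name]))

if __name__ == "__main__":
    for name in order:
        print(name, critical_path(*DAGS[name]))
```

Here `dag_A` is bounded by prep, sweep, build (5 + 30 + 3 = 38), since the cheaper `classify` branch runs alongside the sweep. A real scheduler would also have to account for where the large input files live and for data movement between clusters, which this sketch ignores.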