Electron Micrograph Analysis (EMAN)
EMAN is a package for Electron Micrograph Analysis developed at the National Center for Macromolecular Imaging at Baylor College of Medicine by Steve Ludtke, a senior researcher in Dr. Wah Chiu's group. To determine a macromolecular structure, the package iteratively processes thousands to possibly tens of thousands of micrographs from electron microscopes. Most of the computation is throughput-oriented: it consists of fitting individual micrographs to a hypothesized 3-D structure. An improved hypothesized structure is then generated from those fits, a step that requires consolidating all of the fits in a "tightly coupled" computation, as shown in the following diagram:
For the purposes of VGrADS, EMAN is an excellent example of a workflow application.
One aspect of making applications Grid-aware is the creation of component performance models, which allow VGrADS to make informed decisions about resource selection and scheduling of the components. An early focus of our EMAN effort was the construction of performance models for EMAN components. The EMAN model that we now use combines analytic expressions (derived from knowledge of the computational methods) and empirical expressions (derived from benchmark runs). We also use a simple bandwidth and latency model to estimate communication time for data transmitted between components.
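The general shape of such a model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual VGrADS or EMAN model: the function names, the idea of an analytic operation count per component, and the single-coefficient fit are all hypothetical. The analytic part supplies an operation count, the empirical part fits a seconds-per-operation coefficient from benchmark runs, and communication is estimated as latency plus data size divided by bandwidth.

```python
def fit_seconds_per_op(benchmarks):
    """Empirical part: least-squares slope through the origin.

    `benchmarks` is a list of (op_count, measured_seconds) pairs taken from
    benchmark runs on one resource (hypothetical input format).
    """
    numerator = sum(ops * secs for ops, secs in benchmarks)
    denominator = sum(ops * ops for ops, _ in benchmarks)
    return numerator / denominator


def compute_time(op_count, seconds_per_op):
    """Analytic operation count scaled by the empirically fitted coefficient."""
    return op_count * seconds_per_op


def transfer_time(num_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple bandwidth and latency model for data sent between components."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


def component_time(op_count, seconds_per_op, input_bytes,
                   latency_s, bandwidth_bytes_per_s):
    """Estimated time to receive the input and then run the component."""
    return (transfer_time(input_bytes, latency_s, bandwidth_bytes_per_s)
            + compute_time(op_count, seconds_per_op))
```

A scheduler could evaluate `component_time` for each candidate resource and compare the estimates. A real model would likely need separate coefficients per component and per architecture.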
EMAN has also been a major test case for methods of scheduling workflow applications. Our original schedulers considered task-level scheduling of the component computations. Over time, we have also investigated more scalable schedulers, for example scheduling at the level of clusters rather than individual processor nodes. Experiments in this vein have shown significant improvements in running time when the scheduler can exploit accurate information about module execution times. The graph below shows one such experiment, comparing results from our original heuristic with a simple random mapping advocated by some other groups. In the same experiment, we tested both our most accurate performance model for EMAN and a simpler model based only on processor clock speed. In short, for this EMAN example, both the advanced scheduler and the advanced performance estimate improved the overall EMAN performance by about 50%.
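To make the contrast between a heuristic mapping and a random mapping concrete, here is a minimal Python sketch. The heuristic is not described in detail here, so the greedy minimum-completion-time rule below is a stand-in chosen for illustration, and all names are hypothetical. It assumes independent tasks, ignores communication, and treats `task_work[t] / speeds[r]` as the estimated run time of task `t` on resource `r`.

```python
import random


def makespan(mapping, task_work, speeds):
    """Finish time of the busiest resource under a task -> resource mapping."""
    load = [0.0] * len(speeds)
    for task, resource in mapping.items():
        load[resource] += task_work[task] / speeds[resource]
    return max(load)


def random_mapping(task_work, speeds, rng):
    """Assign every task to a uniformly random resource."""
    return {task: rng.randrange(len(speeds)) for task in task_work}


def min_completion_time_mapping(task_work, speeds):
    """Greedy stand-in heuristic: take tasks from largest to smallest and
    place each on the resource where it would finish earliest."""
    finish = [0.0] * len(speeds)
    mapping = {}
    for task in sorted(task_work, key=task_work.get, reverse=True):
        best = min(range(len(speeds)),
                   key=lambda r: finish[r] + task_work[task] / speeds[r])
        finish[best] += task_work[task] / speeds[best]
        mapping[task] = best
    return mapping


if __name__ == "__main__":
    work = {"a": 8.0, "b": 4.0, "c": 4.0}   # estimated work per task
    speeds = [2.0, 1.0]                      # relative resource speeds
    greedy = min_completion_time_mapping(work, speeds)
    rand = random_mapping(work, speeds, random.Random(0))
    print("greedy makespan:", makespan(greedy, work, speeds))
    print("random makespan:", makespan(rand, work, speeds))
```

The greedy rule is only as good as its run time estimates, which is why the accuracy of the performance model matters to the scheduler.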
We continue to use EMAN as an example application. Most recently, we demonstrated our batch queue scheduler at SC'05 using EMAN as the test application. This led to a paper accepted at SC'06 the next year.