LEAD - Linked Environments for Atmospheric Discovery
Linked Environments for Atmospheric Discovery (LEAD) is a collaborative effort to create an integrated, scalable framework for identifying, accessing, preparing, assimilating, predicting, managing, analyzing, mining, and visualizing a broad array of meteorological data and model output, independent of format and physical location. The meteorological components of LEAD are being developed under a separate ITR award. The VGrADS team is collaborating with the LEAD project to apply the resource selection, scheduling, and provisioning techniques developed in VGrADS research to LEAD workflows.
What distinguishes LEAD is its dynamic workflow orchestration and data management, which allow analysis tools, forecast models, and data repositories to operate not in fixed configurations or as static recipients of data, as is now the case, but as dynamically adaptive, on-demand, Grid-enabled systems. In other words, as the weather changes, the observation and computing systems should change in response. This vision will enable meteorology systems that can (a) change configuration rapidly and automatically in response to weather changes; (b) continually be steered by new data; (c) respond to decision-driven inputs from users; (d) initiate other processes automatically; and (e) steer remote observing technologies to optimize data collection for the problem at hand. Toward these goals, LEAD research is focused on creating a series of interconnected, heterogeneous virtual IT "Grid environments" that are linked at several levels to enable data transport, service chaining, interoperability, and distributed computation. An overview of the system follows.
We are working to understand the needs of LEAD workflows and are developing Virtual Grid (VG) specifications and scheduling methods for them. The definition of a workflow in LEAD differs subtly from the definition in EMAN. Every component of the LEAD architecture, including both the atomic application tasks and the resource- and instrument-monitoring agents that drive the workflow, is encapsulated as an individual service. Thus each LEAD workflow step is managed by a persistent service, whereas EMAN's tasks are not persistent. In addition, LEAD must manage streaming data, as opposed to the fixed data collections used in other VGrADS applications.
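To make the contrast between the two workflow models concrete, here is a minimal Python sketch. It assumes a workflow step can be reduced to a function applied to each data record; `run_batch_task`, `PersistentService`, and `chain`, as well as the toy quality-control and unit-conversion steps, are hypothetical illustrations and not part of the LEAD or EMAN software.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


def run_batch_task(step: Callable[[float], float], collection: List[float]) -> List[float]:
    # EMAN-style task (as assumed here): started on a fixed data collection,
    # returns its results, and keeps no state once it exits.
    return [step(x) for x in collection]


@dataclass
class PersistentService:
    # LEAD-style step (as assumed here): a long-lived service that consumes
    # records from a stream as they arrive and keeps state between requests.
    name: str
    step: Callable[[float], float]
    records_seen: int = 0  # survives across separate bursts of input

    def serve(self, stream: Iterable[float]) -> Iterator[float]:
        for record in stream:
            self.records_seen += 1
            yield self.step(record)


def chain(services: List[PersistentService], stream: Iterable[float]) -> Iterator[float]:
    # Service chaining: each service's output stream becomes the next one's input.
    for service in services:
        stream = service.serve(stream)
    return stream


# Usage: the same two services handle two separate bursts of observations.
qc = PersistentService("quality_control", lambda t: max(t, -90.0))  # toy clamp
to_kelvin = PersistentService("to_kelvin", lambda t: t + 273.15)
first_burst = list(chain([qc, to_kelvin], [20.0, -100.0]))
second_burst = list(chain([qc, to_kelvin], [5.0]))
batch = run_batch_task(lambda t: t + 273.15, [20.0, 5.0])
```

The design choice being illustrated is where state lives: the batch task sees only the collection it was launched with, while each persistent service accumulates state (here just a record count) across every burst of streaming input it processes.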
Our recent focus has been on enhancing the VGrADS components that will interact with the LEAD architecture. We are building a performance model of the entire LEAD workflow that estimates the approximate data sizes and running times of every stage in the workflow. The performance model will then be used together with the slot manager in vgES and with the tools for scheduling workflow stages into slots. As explained elsewhere, the slot manager can provide availability times for resources that may be controlled by batch queues or resource reservation systems. This resource provisioning service will be exposed as a web service in the LEAD framework.
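The report does not specify the form of the performance model or of the slot scheduler. Purely as an illustration, the sketch below assumes a linear per-stage model (a fixed cost plus a cost per megabyte of input, with output size proportional to input size) and a greedy earliest-finish-time placement of each stage into the availability windows a slot manager might report. `StageModel`, `Slot`, `estimate`, `schedule`, the stage names, and all coefficients are hypothetical and are not the vgES API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StageModel:
    # Hypothetical linear performance model for one workflow stage.
    name: str
    fixed_s: float       # fixed startup cost, seconds on a reference machine
    s_per_mb: float      # seconds per MB of input on the reference machine
    output_ratio: float  # MB of output produced per MB of input

    def runtime(self, input_mb: float) -> float:
        return self.fixed_s + self.s_per_mb * input_mb

    def output_mb(self, input_mb: float) -> float:
        return self.output_ratio * input_mb


@dataclass
class Slot:
    # Assumed shape of an availability window reported by a slot manager.
    resource: str
    start: float  # earliest availability, e.g. after a predicted batch-queue wait
    end: float    # time the window or reservation closes
    speed: float  # relative speed versus the reference machine


def estimate(stages: List[StageModel], input_mb: float) -> List[Tuple[str, float, float]]:
    # Propagate data sizes through a linear chain: (name, input_mb, runtime_s).
    plan = []
    for stage in stages:
        plan.append((stage.name, input_mb, stage.runtime(input_mb)))
        input_mb = stage.output_mb(input_mb)
    return plan


def schedule(stages: List[StageModel], input_mb: float, slots: List[Slot],
             release: float = 0.0) -> List[Tuple[str, str, float, float]]:
    # Greedy earliest-finish-time placement; mutates slot.start as slots fill.
    ready = release
    assignment = []
    for name, _, ref_runtime in estimate(stages, input_mb):
        best = None
        for slot in slots:
            begin = max(ready, slot.start)
            finish = begin + ref_runtime / slot.speed
            if finish <= slot.end and (best is None or finish < best[2]):
                best = (slot, begin, finish)
        if best is None:
            raise RuntimeError(f"no slot can fit stage {name!r}")
        slot, begin, finish = best
        slot.start = finish  # slot is busy until this stage completes
        assignment.append((name, slot.resource, begin, finish))
        ready = finish       # the next stage needs this stage's output
    return assignment


# Usage with made-up numbers: a dedicated cluster and a faster batch-queued one.
stages = [
    StageModel("preprocess", 60.0, 0.5, 1.0),
    StageModel("assimilate", 120.0, 1.0, 2.0),
    StageModel("forecast", 600.0, 2.0, 4.0),
    StageModel("visualize", 30.0, 0.1, 0.1),
]
slots = [Slot("cluster-A", 0.0, 10000.0, 1.0), Slot("cluster-B", 300.0, 5000.0, 2.0)]
plan = schedule(stages, 100.0, slots)
```

With these invented numbers, the short early stages run immediately on the slower cluster, while the long forecast waits out the batch-queue delay on the faster one because it still finishes sooner there. Availability times matter to the placement decision, not just raw speed.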
Project members from both teams met at Research Triangle Park in November 2005 to launch the collaboration, and a series of internal workshops has been held since then. We are currently working toward a major demonstration of a LEAD workflow running on VGrADS at the SC'06 conference. The demonstration will show most of the research thrusts described above, particularly scheduling in the presence of batch queues and reservations.