VGrADS SC'07 demo details
This document describes how to run the VGrADS SC'07 demonstration.
Steps to run VGrADS SC’07 demo
- Log in to kh0.renci.org as “drlead”
- Run /home/drlead/vgrads/my_proxy to authenticate. Use the password in /home/drlead/vgrads/pass
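A minimal sketch of these first two steps, assuming my_proxy prompts interactively for the password stored in the pass file:
ssh drlead@kh0.renci.org
cat /home/drlead/vgrads/pass          # read the proxy password
/home/drlead/vgrads/my_proxy          # enter that password when prompted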
- Run cleanup scripts:
- /home/drlead/cleanup to clean up kittyhawk and bowhead
- Check for other drlead processes on login-hg.ncsa.teragrid.org and login-w.ncsa.teragrid.org. If no other drlead processes exist, use /home/drlead/cleanup_mercury and /home/drlead/cleanup_tungsten
- If there are other drlead processes on login-hg and login-w, you have to find the vgrads-related drlead jobs and kill them manually with qdel. Use the following procedure to identify and clean up the vgrads drlead jobs on the TG machines (a sketch follows item iii below):
i. login-hg: qstat | grep drlead lists all drlead job IDs; qstat -f shows the output paths, which reveal whether a job is a vgrads drlead job or not
ii. login-w: bjobs -lp shows the output/error paths, which reveal whether a job is a vgrads drlead job or not, for example:
Job <1096806>, User , Project , Status , Queue , Command <#! /bin/sh;#;# LSF batch job script built by Globus 4.0.1-r3 Job Manager;#;#BSUB -P TGMCA07T022;#BSUB -W 142; #BSUB -i /dev/null;#BSUB -o /u/ac/drlead/vgrads/pglidin/instance/stdout.vges;#BSUB -e /u/ac/drlead/vgrads/pglidin/instance>
...
iii. delete ~/vgrads/pglidein/server_name on each TG machine
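A minimal sketch of this manual TG cleanup, using <job-id> as a placeholder for the IDs reported by qstat/bjobs (the commands are the ones named above and in step 12):
# on login-hg (PBS)
qstat | grep drlead                 # list all drlead job IDs
qstat -f <job-id> | grep vgrads     # a match means this is a vgrads drlead job
qdel <job-id>                       # kill the vgrads jobs
# on login-w (LSF)
bjobs -lp                           # look for vgrads in the output/error paths, as in the example above
canceljob <job-id>                  # kill the vgrads jobs (step 12 lists canceljob for tungsten)
# on each TG machine
rm -f ~/vgrads/pglidein/server_name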
- Make your own test directory, say <testdir>
- cp /home/drlead/vgrads/LEADEngine/vgrads.kh.tar.gz into <testdir> ; cd <testdir> ; tar -zxvf vgrads.kh.tar.gz
- cd <testdir>/vgrads ; chmod 755 demoTest.sh
- Run the demo using: ./demoTest.sh reschedule 0.9 kh1.renci.org bowhead.cs.ucsb.edu login-hg.ncsa.teragrid.org login-w.ncsa.teragrid.org true (duration) (jobMode) > OUT
jobMode is "real" or "unreal", for real or fake jobs respectively
- The output will be in <testdir>/vgrads/OUT and the vgES log will be in <testdir>/vgrads/vges.log; a consolidated sketch of these steps follows
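A consolidated sketch of the setup-and-run steps above, with <testdir> standing in for your test directory, <duration> for the run duration, and <jobMode> for real/unreal:
mkdir <testdir>
cp /home/drlead/vgrads/LEADEngine/vgrads.kh.tar.gz <testdir>
cd <testdir> ; tar -zxvf vgrads.kh.tar.gz
cd vgrads ; chmod 755 demoTest.sh
./demoTest.sh reschedule 0.9 kh1.renci.org bowhead.cs.ucsb.edu login-hg.ncsa.teragrid.org login-w.ncsa.teragrid.org true <duration> <jobMode> > OUT
# in another window, follow the vgES log while the demo runs:
tail -f <testdir>/vgrads/vges.log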
- Open http://vgdemo.cs.rice.edu:8080/vgdemo/index.jsp in Firefox (not IE) to look at the viz.
- Open 4 separate ssh windows: ssh drlead@kh1.renci.org, ssh drlead@bowhead.cs.ucsb.edu, ssh drlead@login-hg.ncsa.teragrid.org, ssh drlead@login-w.ncsa.teragrid.org
- To monitor each of these 4 machines,
- cd $HOME/vgrads/pglidein ; watch bin/qstat
- If the machine is bound, you will not see the message "Cannot open /home/drlead/vgrads/pglidein/server_name" (seeing that message means the resource is not yet bound). Once a job starts running on a bound resource, you will see its job ID. A sketch of this follows.
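A sketch of the monitoring loop on one of the four machines (kh1.renci.org shown; the same commands apply on the others):
ssh drlead@kh1.renci.org
cd $HOME/vgrads/pglidein
watch bin/qstat      # "Cannot open .../server_name" means not bound; a job ID appears once a job runs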
- To kill a running job on the resource,
- find the current running job using "cd $HOME/vgrads/pglidein ; watch bin/qstat", which will give the job ID (for example 0.compute-0…)
- cat $HOME/vgrads/pglidein/server_name will give something like compute-0-0.local:24946
- qdel <job-id>. In this case, it will be "qdel 0.compute-0-0.local:24946" (see the sketch below)
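A sketch of killing one running copy on a bound resource, following the three sub-steps above:
cd $HOME/vgrads/pglidein
bin/qstat                          # note the running job ID, e.g. 0.compute-0-0.local:24946
cat server_name                    # e.g. compute-0-0.local:24946
qdel 0.compute-0-0.local:24946     # the job ID is <number>.<server_name>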
- If you kill all copies of an application, rescheduling will be triggered and the viz. will reflect that. You can monitor again using the monitoring method described above (watch bin/qstat).
- To change a machine's BQP reservation status (from reserve mode to non-reserve mode), log in to bowhead.cs.ucsb.edu as user drlead and edit the file /home/nurmi/public_html/sc07reservations. Machines listed in this file are in reservation mode; simply comment them out (with a '#' character) if you wish to put them in regular (non-reservation) mode.
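A minimal sketch of that edit (editor choice is arbitrary):
ssh drlead@bowhead.cs.ucsb.edu
vi /home/nurmi/public_html/sc07reservations    # prefix a machine's line with '#' for non-reservation mode; remove the '#' to restore reservation mode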
11. To change performance model information, use the following database (a connection sketch follows the credentials):
Host: beluga.renci.org
Database: lead_adaptation
Username: lead_adaptation
Password: aiW2aeBo
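A sketch of connecting with these credentials, assuming the server speaks MySQL (the database engine is not stated above; substitute the appropriate client if it is something else):
mysql -h beluga.renci.org -u lead_adaptation -p lead_adaptation    # prompts for the password listed above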
12. Clean up tungsten and mercury manually
# show all drlead jobs
qstat -u drlead
or
bjobs (for tungsten)
# check if jobs are ours
# if something prints out, the job is ours.
qstat -f | grep vgrads
or
bjobs -l (for tungsten)
# delete job from queue
qdel <job-id>
or
canceljob <job-id> (for tungsten)
# kill all drlead processes belonging to vgrads
ps aux | grep vgrads | cut -b10-16 | xargs kill -9
# remove personal PBS and GRAM leftovers
rm -rf $HOME/vgrads/pglidin/instance/* $HOME/vgrads/pglidin/server_name