Type: New Feature
Resolution: Fixed
Priority: Critical
Affects Version/s: None
Component/s: Pegasus Planner
In the LIGO ihope workflows, some sub workflows are being executed in the same submit directory.
Relevant email:
While testing the S6 workflows, I just noticed that certain sub workflows share the same submit directory:
/usr1/vahi/work/ihope/s6/hm-hour-osg-itb/pegasus-submit-dir/H1L1V1-s6_lowmass_ihope_small-932255943-86400.3IF4OT/playground
[vahi@sugar playground]$ ls *dag
inspiral_hipe_playground_cat2_veto.PLAYGROUND_CAT_2_VETO-0.dag inspiral_hipe_playground.PLAYGROUND-0.dag
[vahi@sugar playground]$
I was under the impression that every sub workflow would have its own unique submit directory, and that is what we had agreed on.
I think this is problematic and can lead to errors that are hard to debug:
1) Pegasus ensures that submit file names are unique only within a single DAX. So if two DAXes are run with the same submit directory, clashes are bound to happen.
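The agreed layout, where every sub workflow gets its own submit directory, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Pegasus planner's actual logic: the helper `subworkflow_submit_dirs` and the `.dax` file names are hypothetical. It derives one directory per sub workflow from the DAX basename plus the DAX's position in the list, so even two DAXes with the same basename stay apart.

```python
import os


def subworkflow_submit_dirs(parent_dir, dax_files):
    """Return one submit directory per sub workflow DAX, in input order (hypothetical helper)."""
    dirs = []
    for index, dax in enumerate(dax_files):
        # Use the DAX basename without its extension as a readable prefix ...
        stem = os.path.splitext(os.path.basename(dax))[0]
        # ... and append the position so two DAXes with the same basename never share a directory.
        dirs.append(os.path.join(parent_dir, f"{stem}_{index}"))
    return dirs


# Hypothetical DAX names modeled on the two sub workflows in the playground directory.
for path in subworkflow_submit_dirs(
    "pegasus-submit-dir/playground",
    ["inspiral_hipe_playground.dax", "inspiral_hipe_playground_cat2_veto.dax"],
):
    print(path)
```

The position suffix is a deliberately simple choice; any scheme works as long as it is unique per sub workflow within the parent directory.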
In the OSG mode, this is already happening.
For example, stage_in_osg-itb_0.sub is shared across the two sub workflows, and whichever copy is written later wins.
I see similar clashes for the job output and error files:
merge_ligo-lalapps_thinca-1.0_PID1_ID2.err.000
merge_ligo-lalapps_thinca-1.0_PID1_ID2.out.001
If the workflows are not running in parallel, nothing scientifically incorrect is happening. However, things are going to go haywire in case of errors.
2) pegasus-analyzer will also be thrown off, as it assumes that each workflow has its own submit directory.
More generally, having workflows share the same submit directory makes debugging hard.
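One quick way to spot the clashes described in point 1 is to record which file names each sub workflow writes and flag any name claimed by more than one. The sketch below is a hypothetical helper, not part of Pegasus, and the per-workflow file lists are illustrative, assembled from the names quoted in this report rather than from a real planner run.

```python
from collections import defaultdict


def find_clashes(files_by_workflow):
    """Return {file name: [sub workflows]} for every name written by more than one sub workflow."""
    owners = defaultdict(list)
    for workflow, filenames in files_by_workflow.items():
        # set() so a name listed twice within one workflow is counted only once.
        for name in set(filenames):
            owners[name].append(workflow)
    # Keep only names owned by two or more sub workflows; sort owners for stable output.
    return {name: sorted(wfs) for name, wfs in owners.items() if len(wfs) > 1}


# Illustrative file lists for the two sub workflows sharing the playground directory.
files = {
    "inspiral_hipe_playground": [
        "inspiral_hipe_playground.PLAYGROUND-0.dag",
        "stage_in_osg-itb_0.sub",
        "merge_ligo-lalapps_thinca-1.0_PID1_ID2.err.000",
    ],
    "inspiral_hipe_playground_cat2_veto": [
        "inspiral_hipe_playground_cat2_veto.PLAYGROUND_CAT_2_VETO-0.dag",
        "stage_in_osg-itb_0.sub",
        "merge_ligo-lalapps_thinca-1.0_PID1_ID2.err.000",
    ],
}

for name, workflows in sorted(find_clashes(files).items()):
    print(f"{name}: {', '.join(workflows)}")
```

Each list is clash-free on its own, which matches the per-DAX uniqueness guarantee; the collisions only appear once both sub workflows write into the same directory.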