Type: Bug
Resolution: Fixed
Priority: Major
Affects Version/s: master, 4.7.0
Component/s: Pegasus Planner, Planner: hierarchal workflows
Hi Karan,
We have now verified that with the options:
pegasus.dir.submit.mapper=Flat
pegasus.dir.staging.mapper=Flat
the workflow runs correctly (including the sub-daxes that were failing
for Tom) in pegasus 4.7.1dev.
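For anyone wanting to reproduce the workaround, here is a minimal sketch of putting the two mapper options into a properties file. The file name `pegasus.properties` and the temporary directory are assumptions for illustration; only the two property lines come from our run.

```shell
#!/bin/bash
set -e
# Hypothetical location for the properties file; adjust to your run directory.
props_dir=$(mktemp -d)
props_file="$props_dir/pegasus.properties"

# The two mapper settings that made the sub-daxes plan correctly.
cat > "$props_file" <<'EOF'
pegasus.dir.submit.mapper=Flat
pegasus.dir.staging.mapper=Flat
EOF

# Show what would be handed to the planner.
cat "$props_file"
```

The file would then be given to pegasus-plan through its `--conf` option, which is the same option that appears in the DAGMan entry quoted further down.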
However, I think the reason that the workflow does not work if these
options are not given is a bug in sub-workflow handling in pegasus.
Let me try to explain. This might be a little long, but I hope what
I'm trying to say is clear:
The sub-daxes that failed for Tom are sub-daxes that are created
within the workflow. So we have a creation job:
"""
<job id="ID0000708" name="foreground_minifollowup-H1L1_ID94">
[lots of arguments, input files and stuff]
<uses
name="H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax"
link="output" register="false" transfer="true"/>
<uses
name="H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax.map"
link="output" register="false" transfer="true"/>
</job>
"""
That makes one of these sub-workflows, and then an entry for the dax
file itself:
"""
<dax id="ID0000709"
file="H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax">
<argument>--basename
H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600
-Dpegasus.dir.storage.mapper.replica.file=H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax.map
-Dpegasus.dir.storage.mapper.replica=File --cache
/.auto/home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/_reuse.cache
--output-site local --cleanup inplace --cluster label,horizontal
-vvv</argument>
</dax>
"""
As this dax file is created within the workflow, the path to the
H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax
file, once it is created, will follow the hashed directory path,
something like:
work/00/00/main_ID0000001/00/6F/H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax
under the local-site-scratch directory.
However, when the DAGMan PRE script runs to plan the sub-workflow, the
pre script reads:
"""
#!/bin/bash
set -e
cd /home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch/work/00/00/main_ID0000001
/.auto/home/spxiwh/opt/pegasus-4.7.1dev/bin/pegasus-plan $@
"""
and the full entry for this in the DAGMan file produced by pegasus looks like:
"""
SCRIPT PRE subdax_H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600_ID0000713
00/6F/subdax_H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600_ID0000713_pre.sh
-Dpegasus.log.*=/local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/subdax_H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600_ID0000713.pre.log
-Dpegasus.workflow.root.uuid=7cfc44cc-80e5-4e02-85ae-c521b736e1cf
-Dpegasus.dir.storage.mapper.replica=File
-Dpegasus.dir.storage.mapper.replica.file=H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600.dax.map
--conf /local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/pegasus.6052500788100706733.properties
--dir /local/spxiwh/pycbc-tmp.bZLizt8Xch --relative-dir
work/00/00/main_ID0000001/00/6F/H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600
--relative-submit-dir
work/00/00/./main_ID0000001.000/00/6F/./H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600
--basename H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600
--sites local --cache
/local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/main-0.cache,/.auto/home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/_reuse.cache
--inherited-rc-files
/local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/main-0.replica.store
--cluster label,horizontal --output-site local --cleanup none
--verbose --verbose --verbose --deferred --group pegasus --dax
H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600.dax
"""
Lots of stuff, but notice that both the .dax and .map files have no
path, just a filename.
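To illustrate why a bare filename matters here, the following sketch recreates the directory layout in a temporary directory and checks for the file from both working directories. The short name `sub.dax` and the temporary scratch location are stand-ins, not the real names from the run.

```shell
#!/bin/bash
set -e
# Stand-in for the local-site-scratch directory (hypothetical temp location).
scratch=$(mktemp -d)
wf_dir="$scratch/work/00/00/main_ID0000001"   # where the pre script cds today
hashed_dir="$wf_dir/00/6F"                    # where the generated dax lands
dax="sub.dax"                                 # short stand-in for the long dax name

mkdir -p "$hashed_dir"
touch "$hashed_dir/$dax"

# From the workflow directory the bare filename does not resolve.
cd "$wf_dir"
if [ -f "$dax" ]; then from_wf_dir=found; else from_wf_dir=missing; fi

# From the hashed subdirectory it does.
cd "$hashed_dir"
if [ -f "$dax" ]; then from_hashed_dir=found; else from_hashed_dir=missing; fi

echo "from workflow dir: $from_wf_dir"
echo "from hashed dir:   $from_hashed_dir"
```

A relative `--dax` argument is looked up against whatever directory the pre script changed into, so only the second case can find the file.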
So I think the problem is that the pre script changes to the wrong directory. The line:
"""
cd /home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch/work/00/00/main_ID0000001
"""
should have been:
"""
cd /home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch/work/00/00/main_ID0000001/00/6F
"""
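Put together, this is a sketch of what I would expect the generated pre script to look like once the hashed subdirectory is appended. It is not the planner's actual generator code: the helper `write_pre_script` and the output file name `subdax_pre.sh` are made up for illustration, and the script is only written to a temporary directory, not executed.

```shell
#!/bin/bash
set -e
# Hypothetical helper: write a pre script whose cd target includes the
# hashed subdirectory.
# Arguments: workflow directory, hashed relative directory, planner path, output file.
write_pre_script() {
    local wf_dir=$1 hashed_rel=$2 planner=$3 out=$4
    {
        echo '#!/bin/bash'
        echo 'set -e'
        echo "cd $wf_dir/$hashed_rel"
        echo "$planner \$@"
    } > "$out"
    chmod +x "$out"
}

out_dir=$(mktemp -d)
pre_script="$out_dir/subdax_pre.sh"   # hypothetical file name
write_pre_script \
    "/home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch/work/00/00/main_ID0000001" \
    "00/6F" \
    "/.auto/home/spxiwh/opt/pegasus-4.7.1dev/bin/pegasus-plan" \
    "$pre_script"

# Print the generated script for inspection.
cat "$pre_script"
```

With the `cd` pointing at the hashed subdirectory, the bare .dax and .map filenames in the DAGMan entry would resolve without needing the Flat mappers.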
Does that make sense?
Cheers
Ian