Hashed staging mapper doesn't work correctly with sub dax generation jobs

      Hi Karan,

      We have now verified that with the options:

      pegasus.dir.submit.mapper=Flat
      pegasus.dir.staging.mapper=Flat

      the workflow runs correctly (including the sub-daxes that were failing
      for Tom) in pegasus 4.7.1dev.

      However, I think the reason that the workflow does not work if these
      options are not given is a bug in sub-workflow handling in pegasus.

      Let me try to explain. This might be a little long, but I hope what
      I'm trying to say is clear:

      The sub-daxes that failed for Tom are sub-daxes that are created
      within the workflow. So we have a creation job:

      """
      <job id="ID0000708" name="foreground_minifollowup-H1L1_ID94">
      [lots of arguments, input files and stuff]
      <uses
      name="H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax"
      link="output" register="false" transfer="true"/>
      <uses
      name="H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax.map"
      link="output" register="false" transfer="true"/>
      </job>
      """

      That makes one of these sub-workflows, and then an entry for the dax
      file itself:

      """
      <dax id="ID0000709"
      file="H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax">
      <argument>--basename
      H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600
      -Dpegasus.dir.storage.mapper.replica.file=H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax.map
      -Dpegasus.dir.storage.mapper.replica=File --cache
      /.auto/home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/_reuse.cache
      --output-site local --cleanup inplace --cluster label,horizontal
      -vvv</argument>
      </dax>
      """

      As this dax file is created within the workflow the path to the
      H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax
      file, once it is created, will follow the hashed directory path,
      something like:

      work/00/00/main_ID0000001/00/6F/H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax

      under the local-site-scratch directory.

      However, the DAGMan PRE script that is run to plan the sub-workflow
      reads:

      """
      #!/bin/bash
      set -e
      cd /home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch/work/00/00/main_ID0000001
      /.auto/home/spxiwh/opt/pegasus-4.7.1dev/bin/pegasus-plan $@
      """

      and the full entry for this in the DAGMan file produced by pegasus looks like:

      """
      SCRIPT PRE subdax_H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600_ID0000713
      00/6F/subdax_H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600_ID0000713_pre.sh
      -Dpegasus.log.*=/local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/subdax_H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600_ID0000713.pre.log
      -Dpegasus.workflow.root.uuid=7cfc44cc-80e5-4e02-85ae-c521b736e1cf
      -Dpegasus.dir.storage.mapper.replica=File
      -Dpegasus.dir.storage.mapper.replica.file=H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600.dax.map
      --conf /local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/pegasus.6052500788100706733.properties
      --dir /local/spxiwh/pycbc-tmp.bZLizt8Xch --relative-dir
      work/00/00/main_ID0000001/00/6F/H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600
      --relative-submit-dir
      work/00/00/./main_ID0000001.000/00/6F/./H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600
      --basename H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600
      --sites local --cache
      /local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/main-0.cache,/.auto/home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/_reuse.cache
      --inherited-rc-files
      /local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/main-0.replica.store
      --cluster label,horizontal --output-site local --cleanup none
      --verbose --verbose --verbose --deferred --group pegasus --dax
      H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600.dax
      """

      Lots of stuff, but notice that both the .dax and .map files have no
      path, just a filename.

      So I think the problem is that the pre script changes into the wrong
      directory. The line:

      """
      cd /home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch/work/00/00/main_ID0000001
      """

      Should have been:

      """
      cd /home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch/work/00/00/main_ID0000001/00/6F
      """
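
The mismatch can be illustrated with a minimal sketch. It only models how a bare filename resolves against the working directory set by the pre script; it is not Pegasus code. The paths are copied from the snippets in this report, the hashed component `00/6F` is the example value quoted there, and `resolve` is a hypothetical helper introduced purely for illustration.

```python
from pathlib import PurePosixPath

# Paths copied from the report. "00/6F" is the example hashed component quoted there.
scratch = PurePosixPath(
    "/home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch"
)
workflow_dir = scratch / "work/00/00/main_ID0000001"
hashed_subdir = PurePosixPath("00/6F")
dax_name = (
    "H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1"
    "-1128299417-183600.dax"
)


def resolve(cwd, relative_name):
    """Hypothetical helper: resolve a bare filename against a working directory."""
    return cwd / relative_name


# Where the generated .dax ends up when the hashed staging mapper is in use.
actual_location = workflow_dir / hashed_subdir / dax_name

# Where the planner looks, given the `cd` in the current pre script.
looked_up_current = resolve(workflow_dir, dax_name)

# Where it would look if the pre script changed into the hashed subdirectory.
looked_up_proposed = resolve(workflow_dir / hashed_subdir, dax_name)

print(looked_up_current == actual_location)   # False
print(looked_up_proposed == actual_location)  # True
```

With the Flat mappers there is no hashed subdirectory, so in this model the two lookups coincide, which would be consistent with the workflow running correctly under those options.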

      Does that make sense?

      Cheers
      Ian

            Assignee:
            Karan Vahi
            Reporter:
            Duncan Brown
            Archiver:
            Rajiv Mayani
