Pegasus / PM-1132

Hashed staging mapper doesn't work correctly with sub dax generation jobs


      Hi Karan,

      We have now verified that with the options:

      pegasus.dir.submit.mapper=Flat
      pegasus.dir.staging.mapper=Flat

      the workflow runs correctly (including the sub-daxes that were failing
      for Tom) in pegasus 4.7.1dev.

      However, I think the reason that the workflow does not work if these
      options are not given is a bug in sub-workflow handling in pegasus.

      Let me try to explain. This might be a little long, but I hope what
      I'm trying to say is clear:

      The sub-daxes that failed for Tom are sub-daxes that are created
      within the workflow. So we have a creation job:

      """
      <job id="ID0000708" name="foreground_minifollowup-H1L1_ID94">
      [lots of arguments, input files and stuff]
      <uses
      name="H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax"
      link="output" register="false" transfer="true"/>
      <uses
      name="H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax.map"
      link="output" register="false" transfer="true"/>
      </job>
      """

      That job creates one of these sub-workflows, and then there is an
      entry for the dax file itself:

      """
      <dax id="ID0000709"
      file="H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax">
      <argument>--basename
      H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600
      -Dpegasus.dir.storage.mapper.replica.file=H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax.map
      -Dpegasus.dir.storage.mapper.replica=File --cache
      /.auto/home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/_reuse.cache
      --output-site local --cleanup inplace --cluster label,horizontal
      -vvv</argument>
      </dax>
      """

      As this dax file is created within the workflow the path to the
      H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax
      file, once it is created, will follow the hashed directory path,
      something like:

      work/00/00/main_ID0000001/00/6F/H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H-1128299417-183600.dax

      under the local-site-scratch directory.

      However, when running the dagman PRE script to plan the workflow the
      pre script reads:

      """
      #!/bin/bash
      set -e
      cd /home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch/work/00/00/main_ID0000001
      /.auto/home/spxiwh/opt/pegasus-4.7.1dev/bin/pegasus-plan $@
      """

      and the full entry for this in the DAGMAN produced by pegasus looks like:

      """
      SCRIPT PRE subdax_H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600_ID0000713
      00/6F/subdax_H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600_ID0000713_pre.sh
      -Dpegasus.log.*=/local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/subdax_H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600_ID0000713.pre.log
      -Dpegasus.workflow.root.uuid=7cfc44cc-80e5-4e02-85ae-c521b736e1cf
      -Dpegasus.dir.storage.mapper.replica=File
      -Dpegasus.dir.storage.mapper.replica.file=H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600.dax.map
      --conf /local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/pegasus.6052500788100706733.properties
      --dir /local/spxiwh/pycbc-tmp.bZLizt8Xch --relative-dir
      work/00/00/main_ID0000001/00/6F/H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600
      --relative-submit-dir
      work/00/00/./main_ID0000001.000/00/6F/./H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600
      --basename H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600
      --sites local --cache
      /local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/main-0.cache,/.auto/home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/_reuse.cache
      --inherited-rc-files
      /local/spxiwh/pycbc-tmp.bZLizt8Xch/work/00/00/./main_ID0000001.000/main-0.replica.store
      --cluster label,horizontal --output-site local --cleanup none
      --verbose --verbose --verbose --deferred --group pegasus --dax
      H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1-1128299417-183600.dax
      """

      Lots of stuff, but notice that both the .dax and .map files have no
      path, just a filename.

      So I think the problem is that the pre script did the wrong thing. The line:

      """
      cd /home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch/work/00/00/main_ID0000001
      """

      Should have been:

      """
      cd /home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/local-site-scratch/work/00/00/main_ID0000001/00/6F
      """
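
The mismatch described above can be illustrated with a small path-resolution sketch. It is not Pegasus code: the `resolve` helper is hypothetical, the directory and file names are copied from the snippets in this report, and it assumes the generated .dax lands under the same `00/6F` hashed subdirectory that appears in the DAGMan entry. It only shows that a bare filename resolved against the directory the PRE script changes into does not point at the hashed location, while the proposed `cd` target does.

```python
import posixpath

# Directory the generated PRE script changes into (copied from the report).
PRE_SCRIPT_CWD = (
    "/home/spxiwh/aLIGO/O1/analyses/inj_cutter/full_injcutter/run3/"
    "local-site-scratch/work/00/00/main_ID0000001"
)
# Hashed subdirectory seen in the DAGMan entry (assumed to hold the .dax too).
HASHED_SUBDIR = "00/6F"
# Bare filename passed to pegasus-plan via --dax (no directory component).
DAX_NAME = (
    "H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_BIN_1"
    "-1128299417-183600.dax"
)


def resolve(cwd, relative_name):
    """Hypothetical helper: resolve a relative name against a working directory."""
    return posixpath.normpath(posixpath.join(cwd, relative_name))


# Where the bare filename resolves with the current PRE script.
looked_up = resolve(PRE_SCRIPT_CWD, DAX_NAME)

# Where the file is expected to be staged under the hashed mapper.
actual = resolve(posixpath.join(PRE_SCRIPT_CWD, HASHED_SUBDIR), DAX_NAME)

# Where the bare filename would resolve if the PRE script used the proposed cd.
proposed_cwd = posixpath.join(PRE_SCRIPT_CWD, HASHED_SUBDIR)
fixed_lookup = resolve(proposed_cwd, DAX_NAME)

print("current lookup matches staged file: ", looked_up == actual)
print("proposed lookup matches staged file:", fixed_lookup == actual)
```

With the Flat mappers the hashed subdirectory disappears, so both lookups coincide, which would be consistent with the workflow running correctly under those options.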

      Does that make sense?

      Cheers
      Ian

            Assignee:
            vahi Karan Vahi
            Reporter:
            dbrown Duncan Brown