  1. Pegasus
  2. PM-973

locations in DAX file override entries in replica catalog


      Hi Karan,

      I’m following up on a problem we saw back when putting the XSEDE proposal together. I’m on sugar-dev3 in

      /home/lppekows/projects/XRAC_Jul2015/pool_test/962582415-962625615

      The gwf files have pfn mappings in the dax with local entries, such as

      <file name="96261/L-L1_LDAS_C02_L2-962613760-128.gwf">
      <pfn url="/frames/S6/LDAS_C02_L2/L1/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf" site="local"/>
      </file>

      but we also have a cache file with entries for both sites

      $ grep 96261/L-L1_LDAS_C02_L2-962613760-128.gwf /home/lppekows/projects/XRAC_Jul2015/pool_test/962582415-962625615/_reuse.cache
      96261/L-L1_LDAS_C02_L2-962613760-128.gwf /scratch/02750/stuart/frames/S6/LDAShoftC02/LLO/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf pool="stampede"
      96261/L-L1_LDAS_C02_L2-962613760-128.gwf /frames/S6/LDAS_C02_L2/L1/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf pool="local"
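To check which sites the cache file offers for a given LFN, the lines can be grouped by LFN. The following is only a minimal sketch: it assumes every cache line has the `LFN PFN pool="SITE"` layout seen in the grep output above, and the names `CACHE_LINE` and `sites_by_lfn` are made up for illustration and are not part of Pegasus.

```python
import re
from collections import defaultdict

# Assumed line layout, copied from the grep output: LFN PFN pool="SITE"
CACHE_LINE = re.compile(r'^(\S+)\s+(\S+)\s+pool="([^"]+)"$')


def sites_by_lfn(cache_lines):
    """Map each LFN to a {site: pfn} dict; lines that do not match are skipped."""
    table = defaultdict(dict)
    for line in cache_lines:
        match = CACHE_LINE.match(line.strip())
        if match:
            lfn, pfn, site = match.groups()
            table[lfn][site] = pfn
    return dict(table)


# The two entries quoted in this report.
example = [
    '96261/L-L1_LDAS_C02_L2-962613760-128.gwf /scratch/02750/stuart/frames/S6/LDAShoftC02/LLO/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf pool="stampede"',
    '96261/L-L1_LDAS_C02_L2-962613760-128.gwf /frames/S6/LDAS_C02_L2/L1/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf pool="local"',
]

for lfn, sites in sites_by_lfn(example).items():
    print(lfn, sorted(sites))
```

For the LFN above this lists both `local` and `stampede`, which is what the planner would be expected to choose between.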

      At the planning stage:

      $ pwd
      /usr1/lppekows/pycbc-tmp.vlVRykqsxb/work

      $ /usr1/lppekows/pycbc-tmp.vlVRykqsxb/work/subdax_main_ID0000001_pre.sh -Dpegasus.log.*=/usr1/lppekows/pycbc-tmp.vlVRykqsxb/work/subdax_main_ID0000001.pre.log -Dpegasus.workflow.root.uuid=60190396-09c2-4b12-ba2d-6ef1725a0437 -Dpegasus.dir.storage.mapper.replica=File -Dpegasus.dir.storage.mapper.replica.file=/home/lppekows/projects/XRAC_Jul2015/pool_test/962582415-962625615/main.map --conf /usr1/lppekows/pycbc-tmp.vlVRykqsxb/work/pegasus.3320546546350005354.properties --dir /usr1/lppekows/pycbc-tmp.vlVRykqsxb --relative-dir work/main_ID0000001 --relative-submit-dir work/./main_ID0000001 --sites local,stampede --cache /home/lppekows/projects/XRAC_Jul2015/pool_test/962582415-962625615/_reuse.cache,/usr1/lppekows/pycbc-tmp.vlVRykqsxb/work/weekly_ahope-0.cache --inherited-rc-files /usr1/lppekows/pycbc-tmp.vlVRykqsxb/work/weekly_ahope-0.replica.store --cluster label,horizontal --output-site local --cleanup none --deferred --group pegasus --dax /home/lppekows/projects/XRAC_Jul2015/pool_test/962582415-962625615/main.dax -vvv

      it seems that Pegasus doesn’t see the Stampede entries. The log has messages such as

      2015.08.14 14:02:04.490 EDT: [DEBUG] Selecting a pfn for lfn 96261/L-L1_LDAS_C02_L2-962613760-128.gwf
      amongst[(/frames/S6/LDAS_C02_L2/L1/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf,{site=local})]

      Consequently the frame files are unnecessarily transferred to Stampede. During the proposal we hacked around this by removing the entries from the dax, after which everything worked as expected.
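The workaround of removing the entries from the dax could be scripted roughly as below. This is a hedged sketch, not the procedure actually used during the proposal: it assumes the replica entries are `<file name="...">` elements sitting directly under the dax root, as in the snippet quoted earlier, and `strip_dax_replicas` and `local_name` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def strip_dax_replicas(dax_xml, lfns_in_cache):
    """Remove top-level <file> entries whose LFN is also listed in the cache.

    Returns the modified XML text and the list of LFNs that were removed.
    Assumption: replica entries are direct children of the root element.
    Note that ElementTree may rewrite namespace prefixes when serializing.
    """
    root = ET.fromstring(dax_xml)
    removed = []
    for child in list(root):  # copy the child list, since we mutate root
        if local_name(child.tag) == "file" and child.get("name") in lfns_in_cache:
            root.remove(child)
            removed.append(child.get("name"))
    return ET.tostring(root, encoding="unicode"), removed


# Tiny hand-written stand-in for a dax, for illustration only.
sample_dax = (
    '<adag>'
    '<file name="a.gwf"><pfn url="/frames/a.gwf" site="local"/></file>'
    '<job id="ID0000001"><uses name="a.gwf" link="input"/></job>'
    '</adag>'
)

new_dax, removed = strip_dax_replicas(sample_dax, {"a.gwf"})
print(removed)
```

Only the replica entries are dropped; the `<uses>` references inside jobs are left alone, so the planner has to resolve those LFNs from the cache file instead.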

      I’ve tried reproducing this problem with a small hand-written dax (in /home/lppekows/projects/pegasus) which I think has all the essential features: a local entry in the dax and two entries in the cache file. So far, though, the small test has not shown the problem. Either my test is missing something, or maybe the problem is only triggered when the dax or cache exceeds a certain size.

      Would you mind taking a look?

      Thanks,

      Larne

            Assignee:
            vahi Karan Vahi
            Reporter:
            dbrown Duncan Brown