Pegasus / PM-973

locations in DAX file override entries in replica catalog

      Description

      Hi Karan,

      I’m following up on a problem we saw back when putting the XSEDE proposal together. I’m on sugar-dev3 in

       /home/lppekows/projects/XRAC_Jul2015/pool_test/962582415-962625615

      The gwf files have PFN mappings in the DAX with local entries, such as

             <file name="96261/L-L1_LDAS_C02_L2-962613760-128.gwf">
                     <pfn url="/frames/S6/LDAS_C02_L2/L1/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf" site="local"/>
             </file>

      but we also have a cache file with entries for both sites

       $ grep 96261/L-L1_LDAS_C02_L2-962613760-128.gwf /home/lppekows/projects/XRAC_Jul2015/pool_test/962582415-962625615/_reuse.cache
       96261/L-L1_LDAS_C02_L2-962613760-128.gwf /scratch/02750/stuart/frames/S6/LDAShoftC02/LLO/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf pool="stampede"
       96261/L-L1_LDAS_C02_L2-962613760-128.gwf /frames/S6/LDAS_C02_L2/L1/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf pool="local"
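      To list which sites the cache file offers for each LFN, a short script along these lines could help. It is only a sketch: it assumes every cache line has the layout `LFN PFN pool="SITE"` seen in the grep output above, and `parse_cache` is a hypothetical helper name, not part of Pegasus.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the grep output above: LFN PFN pool="SITE"
CACHE_LINE = re.compile(r'^(\S+)\s+(\S+)\s+pool="([^"]+)"\s*$')

def parse_cache(lines):
    """Map each LFN to a {site: PFN} dict (hypothetical helper)."""
    entries = defaultdict(dict)
    for line in lines:
        match = CACHE_LINE.match(line.strip())
        if match:  # silently skip lines that do not fit the assumed layout
            lfn, pfn, site = match.groups()
            entries[lfn][site] = pfn
    return dict(entries)

# The two cache lines quoted in this ticket.
sample = [
    '96261/L-L1_LDAS_C02_L2-962613760-128.gwf /scratch/02750/stuart/frames/S6/LDAShoftC02/LLO/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf pool="stampede"',
    '96261/L-L1_LDAS_C02_L2-962613760-128.gwf /frames/S6/LDAS_C02_L2/L1/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf pool="local"',
]
cache = parse_cache(sample)
print(sorted(cache["96261/L-L1_LDAS_C02_L2-962613760-128.gwf"]))
```

      For a real run, the lines would come from `open(path)` on the `_reuse.cache` file instead of the inline sample.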

      At the planning stage:

       $ pwd
       /usr1/lppekows/pycbc-tmp.vlVRykqsxb/work

       $ /usr1/lppekows/pycbc-tmp.vlVRykqsxb/work/subdax_main_ID0000001_pre.sh -Dpegasus.log.*=/usr1/lppekows/pycbc-tmp.vlVRykqsxb/work/subdax_main_ID0000001.pre.log -Dpegasus.workflow.root.uuid=60190396-09c2-4b12-ba2d-6ef1725a0437 -Dpegasus.dir.storage.mapper.replica=File -Dpegasus.dir.storage.mapper.replica.file=/home/lppekows/projects/XRAC_Jul2015/pool_test/962582415-962625615/main.map --conf /usr1/lppekows/pycbc-tmp.vlVRykqsxb/work/pegasus.3320546546350005354.properties --dir /usr1/lppekows/pycbc-tmp.vlVRykqsxb --relative-dir work/main_ID0000001 --relative-submit-dir work/./main_ID0000001 --sites local,stampede --cache /home/lppekows/projects/XRAC_Jul2015/pool_test/962582415-962625615/_reuse.cache,/usr1/lppekows/pycbc-tmp.vlVRykqsxb/work/weekly_ahope-0.cache --inherited-rc-files /usr1/lppekows/pycbc-tmp.vlVRykqsxb/work/weekly_ahope-0.replica.store --cluster label,horizontal --output-site local --cleanup none --deferred --group pegasus --dax /home/lppekows/projects/XRAC_Jul2015/pool_test/962582415-962625615/main.dax -vvv


      it seems that Pegasus doesn’t see the Stampede entries. The log has messages such as

       2015.08.14 14:02:04.490 EDT: [DEBUG] Selecting a pfn for lfn 96261/L-L1_LDAS_C02_L2-962613760-128.gwf
        amongst[(/frames/S6/LDAS_C02_L2/L1/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf,{site=local})]
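      To see how many LFNs are affected, the candidate sites can be pulled out of the planner log with a sketch like the one below. It assumes the message format matches the two log lines quoted above (the LFN, then `amongst[...]` with `{site=...}` per candidate); `candidate_sites` is a hypothetical name.

```python
import re

# Assumed message format, based only on the log excerpt quoted above.
SELECT_RE = re.compile(r"Selecting a pfn for lfn (\S+)\s+amongst\[([^\]]*)\]")
SITE_RE = re.compile(r"\{site=([^}]+)\}")

def candidate_sites(log_text):
    """Return {lfn: [site, ...]} for every 'Selecting a pfn' message."""
    result = {}
    for match in SELECT_RE.finditer(log_text):
        lfn, candidates = match.groups()
        result[lfn] = SITE_RE.findall(candidates)
    return result

log_excerpt = (
    "2015.08.14 14:02:04.490 EDT: [DEBUG] Selecting a pfn for lfn "
    "96261/L-L1_LDAS_C02_L2-962613760-128.gwf\n"
    " amongst[(/frames/S6/LDAS_C02_L2/L1/L-L1_LDAS_C02_L2-9626/"
    "L-L1_LDAS_C02_L2-962613760-128.gwf,{site=local})]\n"
)
sites = candidate_sites(log_excerpt)
# LFNs for which the planner only considered the local copy.
local_only = [lfn for lfn, s in sites.items() if s == ["local"]]
print(local_only)
```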

      Consequently the frame files are unnecessarily transferred to Stampede. During the proposal we hacked around this by removing the entries from the dax, after which everything worked as expected.
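      The workaround of removing the entries from the dax could be scripted roughly as follows. This is a sketch, not the exact procedure we used: it assumes the `<file>` elements carry their `<pfn>` children directly, as in the excerpt above, and it wraps the sample in an `<adag>` root purely for illustration. `strip_pfns` and `local_name` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop any '{namespace}' prefix so the match works with or without one."""
    return tag.rsplit("}", 1)[-1]

def strip_pfns(root, lfns):
    """Remove <pfn> children from <file> elements whose name is in lfns."""
    removed = 0
    for elem in list(root.iter()):  # snapshot, since the tree is modified below
        if local_name(elem.tag) == "file" and elem.get("name") in lfns:
            for child in list(elem):
                if local_name(child.tag) == "pfn":
                    elem.remove(child)
                    removed += 1
    return removed

# Sample built from the DAX excerpt in this ticket; the <adag> wrapper is assumed.
sample = """<adag>
  <file name="96261/L-L1_LDAS_C02_L2-962613760-128.gwf">
    <pfn url="/frames/S6/LDAS_C02_L2/L1/L-L1_LDAS_C02_L2-9626/L-L1_LDAS_C02_L2-962613760-128.gwf" site="local"/>
  </file>
</adag>"""
root = ET.fromstring(sample)
removed = strip_pfns(root, {"96261/L-L1_LDAS_C02_L2-962613760-128.gwf"})
print(removed)
```

      Passing only the LFNs that also appear in the cache file keeps the DAX locations for any file the cache does not cover.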

      I’ve tried reproducing this problem with a small hand-written dax (in /home/lppekows/projects/pegasus) which I think has all the essential features: a local entry in the dax and two entries in the cache file. So far I haven’t been able to reproduce it. Either my test is missing something, or the problem is only triggered when the dax or cache exceeds a certain size.

      Would you mind taking a look?

      Thanks,

      - Larne

            People

            • Assignee: Karan Vahi (vahi)
            • Reporter: Duncan Brown (dbrown)