Pegasus / PM-1001

symlinking not working against datasets on compute site in nonsharedfs mode


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 4.6.0, 4.5.3
    • Affects Version/s: master, 4.5.2
    • Component/s: Pegasus Planner
    • Labels: None

      Hi Karan,

      I'm getting closer, but I'm still stuck. Even with Pegasus 4.5.3cvs, it won't see frames on orange-grid; it only sees the local site. I've tried:

      Remove the frame PFNs from the DAX and put the PFNs for local and orange-grid in the static PFN cache. This results in all jobs pulling from the local site:

      2015.10.29 17:08:38.956 EDT: [DEBUG] Job being traversed is calculate_psd-PART0-H1_ID52_ID0000185
      2015.10.29 17:08:38.956 EDT: [DEBUG] To be run at local
      2015.10.29 17:08:38.956 EDT: [DEBUG] Parents of job:{}
      2015.10.29 17:08:38.957 EDT: [DEBUG] Selecting a pfn for lfn 112861/H-H1_HOFT_C00-1128615936-4096.gwf
      amongst[(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128615936-4096.gwf,{site=local})]
      2015.10.29 17:08:38.957 EDT: [DEBUG] Selecting a pfn for lfn 112862/H-H1_HOFT_C00-1128620032-4096.gwf
      amongst[(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128620032-4096.gwf,{site=local})]
      2015.10.29 17:08:38.957 EDT: [DEBUG] Selecting a pfn for lfn 112862/H-H1_HOFT_C00-1128624128-4096.gwf
      amongst[(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128624128-4096.gwf,{site=local})]

      This is in /usr1/dbrown/pycbc-tmp.cOut6LpZC9/work and /home/dbrown/projects/osg/karan-test-1/output
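To check which sites the planner considers for each frame, the candidate sites can be tallied from the debug output quoted above. The following is a minimal sketch and not part of Pegasus: the function name `candidate_sites` and both regular expressions are assumptions derived only from the log lines in this report, and they tolerate the `{site=...}` part being wrapped onto a separate line.

```python
import re

# Matches the LFN in "Selecting a pfn for lfn <LFN>" debug lines.
LFN_RE = re.compile(r"Selecting a pfn for lfn (\S+)")
# Matches one "(<PFN>,{site=<SITE>})" candidate; whitespace and
# newlines between the parts are tolerated.
CANDIDATE_RE = re.compile(
    r"\(\s*([^,()\s]+)\s*,\s*\{\s*site=([^}\s]+)\s*\}\s*\)"
)


def candidate_sites(log_text):
    """Return {lfn: set of site names} found in planner debug output."""
    matches = list(LFN_RE.finditer(log_text))
    result = {}
    for i, match in enumerate(matches):
        # The candidates for this LFN run until the next "Selecting" line.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        segment = log_text[match.end():end]
        sites = {site for _pfn, site in CANDIDATE_RE.findall(segment)}
        result.setdefault(match.group(1), set()).update(sites)
    return result


sample = """\
2015.10.29 17:08:38.957 EDT: [DEBUG] Selecting a pfn for lfn 112861/H-H1_HOFT_C00-1128615936-4096.gwf
amongst[(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128615936-4096.gwf,{site=local})]
"""

for lfn, sites in candidate_sites(sample).items():
    print(lfn, sorted(sites))
# prints: 112861/H-H1_HOFT_C00-1128615936-4096.gwf ['local']
```

If the orange-grid entries from the static PFN cache were being picked up, each LFN would be expected to list both `local` and `orange-grid` here.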

      Add the orange grid PFNs to the DAX, along with the local PFNs. This gives me the error:

      2015.10.29 16:48:20.525 EDT: [DEBUG] Selecting a pfn for lfn 112861/H-H1_HOFT_C00-1128615936-4096.gwf
      amongst[(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128615936-4096.gwf,{site=orange-grid})]
      2015.10.29 16:48:20.525 EDT: [FATAL ERROR] java.lang.RuntimeException: Unable to select a Physical Filename (PFN) for file with logical filename (LFN) as 112861/H-H1_HOFT_C00-1128615936-4096.gwf for staging to site local amongst [(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128615936-4096.gwf,{site=orange-grid})]

      This is in /usr1/dbrown/pycbc-tmp.nPa73sBswj/work and /home/dbrown/projects/osg/karan-test-4/output
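The fatal error above names the site the file is being staged to (`local`) and the only candidate replica (tagged `orange-grid`). The sketch below pulls those fields out of the message so the mismatch is easy to spot across many errors. It is a hypothetical helper, not a Pegasus API: `explain_selection_failure` and its regular expressions are assumptions based solely on the message text in this report, and the `site_mismatch` flag is a plain string comparison rather than a model of the planner's selection logic.

```python
import re

# Pattern built from the error text quoted in this report.
ERROR_RE = re.compile(
    r"Unable to select a Physical Filename \(PFN\) for file with logical "
    r"filename \(LFN\) as (?P<lfn>\S+) for staging to site (?P<site>\S+) "
    r"amongst\s*\[(?P<candidates>.*?)\]",
    re.DOTALL,  # the candidate list may be wrapped across lines
)
SITE_RE = re.compile(r"\{\s*site=([^}\s]+)\s*\}")


def explain_selection_failure(message):
    """Extract the LFN, staging site, and candidate sites from the error."""
    match = ERROR_RE.search(message)
    if match is None:
        return None
    sites = sorted(set(SITE_RE.findall(match.group("candidates"))))
    return {
        "lfn": match.group("lfn"),
        "staging_site": match.group("site"),
        "candidate_sites": sites,
        # True when no candidate replica is tagged with the staging site.
        "site_mismatch": match.group("site") not in sites,
    }
```

Applied to the error in this report, the result would show `staging_site` as `local` and `candidate_sites` as `['orange-grid']`, i.e. no replica tagged with the site being staged to.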

      I've tried with and without pegasus.transfer.bypass.input.staging=true and the issue is the same in either case.

      So I'm stuck getting the orange grid jobs to see frame files locally. (I also just confirmed with Larne that he is still seeing the same issue on Stampede, so maybe 4.5.3 was never fixed?)

      Cheers,
      Duncan.

            Assignee: Karan Vahi (vahi)
            Reporter: Duncan Brown (dbrown)
            Watchers: 1
