- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version/s: master, 4.5.2
- Component/s: Pegasus Planner
- None
Hi Karan,
I'm getting closer, but I'm still stuck. Even with Pegasus 4.5.3cvs, the planner won't see the frames on orange-grid; it only sees the local site. I've tried two things:
First, I removed the frame PFNs from the DAX and put the PFNs for local and orange-grid in the static PFN cache. This results in all jobs pulling from the local site:
2015.10.29 17:08:38.956 EDT: [DEBUG] Job being traversed is calculate_psd-PART0-H1_ID52_ID0000185
2015.10.29 17:08:38.956 EDT: [DEBUG] To be run at local
2015.10.29 17:08:38.956 EDT: [DEBUG] Parents of job:{}
2015.10.29 17:08:38.957 EDT: [DEBUG] Selecting a pfn for lfn 112861/H-H1_HOFT_C00-1128615936-4096.gwf
amongst[(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128615936-4096.gwf,
2015.10.29 17:08:38.957 EDT: [DEBUG] Selecting a pfn for lfn 112862/H-H1_HOFT_C00-1128620032-4096.gwf
amongst[(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128620032-4096.gwf,{site=local}
)]
2015.10.29 17:08:38.957 EDT: [DEBUG] Selecting a pfn for lfn 112862/H-H1_HOFT_C00-1128624128-4096.gwf
amongst[(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128624128-4096.gwf,
)]
This is in /usr1/dbrown/pycbc-tmp.cOut6LpZC9/work and /home/dbrown/projects/osg/karan-test-1/output
Second, I added the orange-grid PFNs to the DAX, along with the local PFNs. This gives me the error:
2015.10.29 16:48:20.525 EDT: [DEBUG] Selecting a pfn for lfn 112861/H-H1_HOFT_C00-1128615936-4096.gwf
amongst[(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128615936-4096.gwf,
2015.10.29 16:48:20.525 EDT: [FATAL ERROR] java.lang.RuntimeException: Unable to select a Physical Filename (PFN) for file with logical filename (LFN) as 112861/H-H1_HOFT_C00-1128615936-4096.gwf for staging to site local amongst [(file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/H-H1_HOFT_C00-1128615936-4096.gwf,{site=orange-grid}
)]
This is in /usr1/dbrown/pycbc-tmp.nPa73sBswj/work and /home/dbrown/projects/osg/karan-test-4/output
I've tried with and without pegasus.transfer.bypass.input.staging=true and the issue is the same in either case.
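Both outcomes above are consistent with a selector that only accepts a file:// PFN when its site attribute matches the site being staged to. The following is a minimal sketch under that assumption, to make the failure mode concrete. It is not Pegasus's actual implementation: `Replica` and `select_pfn` are hypothetical names, and the preference order is a guess. Only the PFN and the site names are taken from the logs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    """One replica entry: a PFN plus the site it is registered at."""
    pfn: str
    site: str


def select_pfn(candidates: List[Replica], staging_site: str) -> Optional[Replica]:
    """Hypothetical selection rule, assumed for illustration only.

    1. Prefer any replica registered at the staging site.
    2. Otherwise accept a replica with a non-file URL, since a file:// URL
       is only readable on the site where it lives.
    3. Otherwise report that nothing could be selected.
    """
    same_site = [r for r in candidates if r.site == staging_site]
    if same_site:
        return same_site[0]
    remote = [r for r in candidates if not r.pfn.startswith("file://")]
    return remote[0] if remote else None


# PFN copied from the log lines above.
FRAME = ("file:///frames/O1/H1_HOFT_C00/H1/H-H1_HOFT_C00-1128/"
         "H-H1_HOFT_C00-1128615936-4096.gwf")

# First attempt: both sites are registered, so the local entry is picked.
both = [Replica(FRAME, "local"), Replica(FRAME, "orange-grid")]
picked_first = select_pfn(both, "local")

# Second attempt: the candidate list in the error holds only site=orange-grid,
# so nothing is selectable when staging to site local.
only_orange = [Replica(FRAME, "orange-grid")]
picked_second = select_pfn(only_orange, "local")
```

Under this assumed rule the orange-grid entry is only selectable when the staging site is itself orange-grid, which is the behaviour the jobs there would need.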
So I'm stuck: I can't get the orange-grid jobs to see the frame files locally. (I also just confirmed with Larne that he is still seeing the same issue on Stampede, so maybe this was never fixed in 4.5.3?)
Cheers,
Duncan.
- depends on: PM-1002 Support symlinking against compute site datasets in nonsharedfs mode with bypass of input file staging (Closed)