Type: Improvement
Resolution: Fixed
Priority: Major
Affects Version/s: None
Component/s: lalapps_ihope
The cache files generated by the lalapps_ihope code should contain gsiftp URLs.
Currently, they contain only file URLs. This works as long as the user is executing on the local site, because the files are registered with the pool attribute local.
It creates problems when planning the workflow for non-local sites (such as OSG).
Once the cache files contain gsiftp URLs, Pegasus should be able to resolve them correctly to file URLs so that symlinking can still happen when the user is submitting to the local Condor pool.
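A minimal sketch of the URL rewriting this request implies, for illustration only. The function names and the GridFTP host name are hypothetical; nothing here is taken from lalapps_ihope or Pegasus. The first function turns a file URL into a gsiftp URL for a given server, and the second shows the reverse mapping that would let a planner fall back to a file URL (for symlinking) when the gsiftp host is the local machine.

```python
from urllib.parse import urlsplit


def file_url_to_gsiftp(url, gridftp_host):
    """Rewrite a file:// URL as a gsiftp:// URL served by gridftp_host.

    gridftp_host is an assumed, site-specific value (hypothetical here).
    URLs that are not file URLs are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        return url
    return "gsiftp://%s%s" % (gridftp_host, parts.path)


def gsiftp_to_file_url(url, local_host):
    """Map a gsiftp:// URL back to a file:// URL if it points at local_host.

    This mimics, in a simplified way, resolving a gsiftp URL to a local
    path so that the file can be symlinked instead of transferred.
    """
    parts = urlsplit(url)
    if parts.scheme == "gsiftp" and parts.hostname == local_host:
        return "file://" + parts.path
    return url
```

For example, with a hypothetical host `gridftp.example.org`, the path from the emails below would become `gsiftp://gridftp.example.org/home/bdaudert/lm-test1/...`, and the same URL would map back to the original file URL when planning for that host.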
Relevant Emails
Hi Britta / Duncan,
It does not look like an RLS issue to me.
The location of the file
L1-SIRE_SUMMARY_FIRST_PLAYGROUND-871234214-86400.input is in the cache
file that is being passed to pegasus-plan in the prescript.
Can you confirm?
Here is my take on what is happening:
1) Duncan is passing a file URL in the cache file for the local site:
file:///home/bdaudert/lm-test1/871234214-871320614/playground/L1-SIRE_SUMMARY_FIRST_PLAYGROUND-871234214-86400.input
2) You are trying to run on site LIGO_CIT.
Since the URL is a file URL (that is, not a server-accessible URL) and
your compute site is LIGO_CIT, the replica selector will not pick up
the file URL passed for the local site.
The reason is that a file URL for the local site has no meaning on site
LIGO_CIT.
Hence, in this case the URLs in the cache file need to be gsiftp URLs.
Duncan, this will also be an issue on the LIGO Data Grid whenever a user
tries to run on a site other than the local site.
Karan
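The selection behaviour Karan describes can be modelled roughly as follows. This is a simplified sketch and not the actual Pegasus replica selector: the function name, the (pfn, pool) tuple layout, and the two-pass preference order are assumptions made for illustration. A file URL is usable only when its pool matches the compute site; any other URL scheme is treated as server accessible.

```python
from urllib.parse import urlsplit


def select_replica(lfn, replicas, compute_site):
    """Pick a PFN for lfn from a list of (pfn, pool) tuples.

    Simplified model (assumption, not the Pegasus implementation):
    1. Prefer a file:// URL registered for the compute site itself.
    2. Otherwise take the first URL that is not a file:// URL.
    3. If neither exists, fail, as in the error quoted in this thread.
    """
    for pfn, pool in replicas:
        if urlsplit(pfn).scheme == "file" and pool == compute_site:
            return pfn
    for pfn, pool in replicas:
        if urlsplit(pfn).scheme != "file":
            return pfn
    raise RuntimeError(
        "Unable to select any location from the list passed for lfn %s" % lfn
    )
```

Under this model, a single file URL registered for pool local cannot be selected for compute site LIGO_CIT, while adding a gsiftp URL for the same file makes selection succeed.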
On Sep 2, 2009, at 8:28 AM, Britta Daudert wrote:
Hi Karan, good news: the datafind dag ran without failing.
Bad news: all the other dags fail. I looked at two of them and they
appear to report the same kind of error. Something like:
[DEBUG] Selecting a pfn for lfn L1-SIRE_SUMMARY_FIRST_PLAYGROUND-871234214-86400.input amongst[(file:///home/bdaudert/lm-test1/871234214-871320614/playground/L1-SIRE_SUMMARY_FIRST_PLAYGROUND-871234214-86400.input,)]
2009.09.01 18:10:49.656 PDT: [FATAL ERROR]
[1] java.lang.RuntimeException: Unable to select any location from the list passed for lfn L1-SIRE_SUMMARY_FIRST_PLAYGROUND-871234214-86400.input at org.griphyn.cPlanner.selector.replica.Default.selectReplica(Default.java:155)
2009.09.01 18:10:49.656 PDT: [WARNING] Non-zero exit-code 1
I guess this means that data is missing from RLS?
I will re-populate the GPS times I am using first thing today and then
start re-populating the rest.
I checked another week's data yesterday and found that 2661 out of
10661 files were not registered. I had previously checked the same
data set and all the files were there. I think the crash yesterday did
some damage.
Regards, Britta