Cache files generated by ihope code for Pegasus use should have gsiftp URLs


    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: None
    • Component/s: lalapps_ihope

      The cache files generated by the lalapps_ihope code should contain gsiftp URLs.
      Currently, they contain only file URLs. This works as long as the user is executing on the local site, because the files are registered with the pool attribute local.

      This creates problems when planning the workflow for non-local sites (such as OSG).
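A minimal sketch of the kind of rewrite being requested, not the actual lalapps_ihope change. It assumes each cache line has the layout `LFN PFN [attributes...]`, and the GridFTP host name `gridftp.example.org` is a placeholder; both are assumptions for illustration only.

```python
from urllib.parse import urlsplit


def file_url_to_gsiftp(url, gridftp_host):
    """Rewrite file:///abs/path as gsiftp://<gridftp_host>/abs/path.

    URLs with any other scheme are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        return url
    return "gsiftp://" + gridftp_host + parts.path


def rewrite_cache_line(line, gridftp_host):
    """Rewrite the PFN column of one cache line.

    Assumed (hypothetical) layout: LFN PFN [attributes...], whitespace separated.
    Lines with fewer than two fields are returned unchanged.
    """
    fields = line.split()
    if len(fields) < 2:
        return line
    fields[1] = file_url_to_gsiftp(fields[1], gridftp_host)
    return " ".join(fields)


if __name__ == "__main__":
    example = 'example.input file:///home/user/run/example.input pool="local"'
    print(rewrite_cache_line(example, "gridftp.example.org"))
```

With server-accessible URLs in the cache file, the same entries stay usable when the compute site is not the local site.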

      Once the cache files contain gsiftp URLs, Pegasus should be able to resolve them correctly to file URLs so that symlinking can happen when the user is submitting to the local Condor pool.
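The resolution described above can be illustrated roughly as follows. This is a sketch of the idea only, not Pegasus's replica selector; the function name `resolve_for_site` and its arguments are hypothetical.

```python
from urllib.parse import urlsplit


def resolve_for_site(pfn, pfn_site, compute_site):
    """Pick the URL form to use for a replica (illustrative only).

    If the replica is a gsiftp URL registered for the same site the job
    runs on, return the equivalent file URL so the file can be symlinked.
    Otherwise return the URL unchanged so it can be transferred.
    """
    parts = urlsplit(pfn)
    if parts.scheme == "gsiftp" and pfn_site == compute_site:
        return "file://" + parts.path
    return pfn


if __name__ == "__main__":
    url = "gsiftp://gridftp.example.org/home/user/run/example.input"
    print(resolve_for_site(url, "local", "local"))      # same site: file URL
    print(resolve_for_site(url, "local", "LIGO_CIT"))   # other site: gsiftp URL kept
```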

      Relevant Emails

      Hi Britta / Duncan,

      This does not look like an RLS issue to me.
      The location of the file
      L1-SIRE_SUMMARY_FIRST_PLAYGROUND-871234214-86400.input is in the cache
      file that is being passed to pegasus-plan in the prescript.
      Can you confirm?

      Here is my take on what is happening:
      1) Duncan is passing a file url in the cache file for the local site.
      file:///home/bdaudert/lm-test1/871234214-871320614/playground/L1-SIRE_SUMMARY_FIRST_PLAYGROUND-871234214-86400.input

      2) You are trying to run on site LIGO_CIT.

      Since the URL is a file URL (that is, not a server-accessible URL) and
      your compute site is LIGO_CIT, the replica selector will not pick up
      the file URL passed for the local site.
      The reason is that a file URL for the local site has no meaning on site
      LIGO_CIT.

      Hence, in this case the URLs in the cache file need to be gsiftp URLs.

      Duncan, this will also be an issue in the LIGO Data Grid when a user
      tries to run on a site other than the local site.

      Karan

      On Sep 2, 2009, at 8:28 AM, Britta Daudert wrote:

      Hi Karan, good news: the datafind dag ran without fail.

      Bad news: all other dags fail. I looked at two and it looks like they
      say the same kind of thing. Something like:

      [DEBUG] Selecting a pfn for lfn
      L1-SIRE_SUMMARY_FIRST_PLAYGROUND-871234214-86400.input
      amongst[(file:///home/bdaudert/lm-test1/871234214-871320614/playground/L1-SIRE_SUMMARY_FIRST_PLAYGROUND-871234214-86400.input,{pool=local})]
      2009.09.01 18:10:49.656 PDT: [FATAL ERROR]
      [1] java.lang.RuntimeException: Unable to select any location from the
      list passed for lfn
      L1-SIRE_SUMMARY_FIRST_PLAYGROUND-871234214-86400.input at
      org.griphyn.cPlanner.selector.replica.Default.selectReplica(Default.java:155)
      2009.09.01 18:10:49.656 PDT: [WARNING] Non-zero exit-code 1

      I guess this means that data is missing from RLS?
      I will re-populate the GPS times I am using first thing today and then
      start repopulating the rest.

      I checked on another week's data yesterday and found that 2661 out of
      10661 files were not registered. I had previously checked the same
      data set and all files were there. I think that the crash yesterday did
      some damage.

      Regards Britta

            Assignee: Duncan Brown
            Reporter: Karan Vahi