wget errors because of network issues


      Hi Karan, Mats,

      Starting a few weeks ago, we’ve been running into a problem where jobs on Comet are unable to download the executable from Syracuse. Here’s an example:

      <https://sugar-dev2.phy.syr.edu/pegasus/u/amber.lenon/r/119/w/2/j/19376/ji/10327/stderr>

      The error is:

      2016-02-24 01:49:53,195 INFO: Tool found: wget Version: 1.12 Path: /usr/bin/wget
      2016-02-24 01:49:53,195 INFO: /usr/bin/wget -nv --no-cookies --no-check-certificate -O '/data1/condor_local/execute/dir_3185356/glide_d7pv6O/execute/dir_3234931/pegasus.te9xVU/inspiral-NSBH01_INJ-L1_ID372' 'http://code.pycbc.phy.syr.edu/pycbc-software/v1.3.6/x86_64/composer_xe_2015.0.090/pycbc_inspiral'
      2016-02-24 01:49:53,210 ERROR: Command exited with non-zero exit code (4): /usr/bin/wget …
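Wget has documented fixed exit statuses since version 1.12, which is the version used here. Status 4 is listed as "Network failure". That typically points to a failure at the network level, such as host lookup or connecting, rather than the server answering with an HTTP error, which would be status 8. The log timestamps show the command failed within about 15 ms, so it apparently did not wait for a timeout. The minimal Python sketch below decodes the status when annotating transfer logs. `describe_wget_exit` is a hypothetical helper and is not part of Pegasus.

```python
# Exit statuses as listed in the "Exit Status" section of the GNU Wget manual
# (reported consistently since Wget 1.12).
WGET_EXIT_STATUS = {
    0: "No problems occurred",
    1: "Generic error code",
    2: "Parse error",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol errors",
    8: "Server issued an error response",
}


def describe_wget_exit(code: int) -> str:
    """Return the documented meaning of a wget exit status (hypothetical helper)."""
    return WGET_EXIT_STATUS.get(code, f"Unknown exit status {code}")


if __name__ == "__main__":
    # The failing job above exited with status 4.
    print(describe_wget_exit(4))
```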

      From what I can tell from the Apache logs, these requests aren’t even reaching us. Edgar ran the same wget command from the command line on that machine, and it worked:

      [1139] ligo@comet-18 ~$ wget -nv --no-cookies --no-check-certificate -O '/tmp/ada' 'http://code.pycbc.phy.syr.edu/pycbc-software/v1.3.6/x86_64/composer_xe_2015.0.090/pycbc_inspiral'
      2016-02-24 11:40:03 URL:http://code.pycbc.phy.syr.edu/pycbc-software/v1.3.6/x86_64/composer_xe_2015.0.090/pycbc_inspiral [80055063/80055063] -> "/tmp/ada" [1]
      [1140] ligo@comet-18 ~$ ls -lh /tmp/ada
      -rw-rw-r--. 1 ligo ligo 77M Feb 3 12:02 /tmp/ada

      Brian was also able to run it from the command line after using condor_ssh_to_job to mimic the job environment as closely as possible.
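condor_ssh_to_job only approximates the environment of the running job. One way to narrow down the difference is to capture `env` output in two places, from inside a failing job and from the interactive shell, and then diff the two captures. How the job would be instrumented to do this, for example by running `env` in its wrapper, is an assumption. The following is a minimal Python sketch. `parse_env` and `diff_env` are hypothetical helpers, and the sample values are made up.

```python
from typing import Dict, Optional, Tuple


def parse_env(text: str) -> Dict[str, str]:
    """Parse `env` output (one KEY=VALUE per line) into a dict.

    Lines without '=' are skipped, so multi-line values are not handled.
    """
    env: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep and key:
            env[key] = value
    return env


def diff_env(
    job: Dict[str, str], shell: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each differing variable name to (job_value, shell_value).

    A variable missing from one side is reported as None; names are sorted.
    """
    names = sorted(set(job) | set(shell))
    return {
        name: (job.get(name), shell.get(name))
        for name in names
        if job.get(name) != shell.get(name)
    }


if __name__ == "__main__":
    # Illustrative, made-up captures; real ones would come from `env`.
    job_env = parse_env("PATH=/usr/bin\nTMPDIR=/scratch\n")
    shell_env = parse_env("PATH=/usr/bin:/bin\n")
    for name, (in_job, in_shell) in diff_env(job_env, shell_env).items():
        print(f"{name}: job={in_job!r} shell={in_shell!r}")
```

Any variable that appears only on the job side, or has a different value there, is a candidate for what Pegasus or the glidein changes.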

      Can either of you think of anything about the environment that Pegasus sets up that could cause this?

      Thanks,

            Assignee: Mats Rynge
            Reporter: Duncan Brown
            Watchers: 1
