  1. Pegasus
  2. PM-1147

pegasus-transfer should check that files exist before trying to transfer them

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.8.0
    • Fix Version/s: 4.8.0, 4.7.3
    • Component/s: pegasus-transfer
    • Labels:
      None

      Description

      After running the user job, kickstart always tries to transfer files back to the submit host even if these files do not exist.

      This leads to confusing error messages in the stderr file (see below). Users think there is some kind of Globus problem because they see "error: globus_xio", when the real cause is that the user job did not create the output that kickstart is trying to send back.

      Before initiating the file transfer, kickstart should check whether or not the output file exists. If it does not, kickstart should print a user-friendly error message saying that a target output file of the job does not exist and that the user should check their job to see why that file was not created.
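
      A minimal sketch of the requested check, not the actual fix that went into pegasus-transfer. The function names `find_missing_sources` and `missing_output_message` are hypothetical, and the sketch assumes transfers are available as (source URL, destination URL) pairs in which local sources appear either as plain paths or as file:// URLs. Remote sources are skipped because they cannot be checked with a local stat.

```python
import os
from urllib.parse import urlparse


def find_missing_sources(url_pairs):
    """Return the local source paths in url_pairs that do not exist.

    url_pairs is an iterable of (src_url, dst_url) tuples (an assumed
    layout, not the real pegasus-transfer data structure).
    """
    missing = []
    for src_url, _dst_url in url_pairs:
        parsed = urlparse(src_url)
        if parsed.scheme == "":
            path = src_url          # plain local path
        elif parsed.scheme == "file":
            path = parsed.path      # file:///abs/path -> /abs/path
        else:
            continue                # remote source: cannot be checked locally
        if not os.path.exists(path):
            missing.append(path)
    return missing


def missing_output_message(path):
    """Build a user-facing message for an output file that was never created."""
    return (
        "Expected local file does not exist: %s. "
        "The job probably did not create this output; "
        "check the job's stdout and stderr to see why." % path
    )
```

      A caller could run `find_missing_sources` once before the first transfer attempt, log `missing_output_message` for each returned path, and fail immediately instead of retrying, since a retry cannot make a missing local file appear.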


      ############################# staging out output files #############################
      2016-12-06 07:01:56,342 INFO: Reading URL pairs from stdin
      2016-12-06 07:01:56,343 INFO: 1 transfers loaded
      2016-12-06 07:01:56,343 INFO: PATH=/var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/pegasus-4.8.0dev/bin:/usr/local/bin:/usr/bin:/bin
      2016-12-06 07:01:56,343 INFO: LD_LIBRARY_PATH=
      2016-12-06 07:01:56,678 INFO: --------------------------------------------------------------------------------
      2016-12-06 07:01:56,678 INFO: Starting transfers - attempt 1
      2016-12-06 07:01:58,733 INFO: Tool found: globus-url-copy Version: 9.22 Path: /usr/bin/globus-url-copy
      2016-12-06 07:01:58,734 INFO: Grouped 1 similar gsiftp transfers together in temporary file /var/lib/condor/execute/dir_3379755/pegasus-transfer-L4hpaZ.lst
      2016-12-06 07:01:58,734 INFO: /usr/bin/globus-url-copy -r -create-dest -no-third-party-transfers -no-data-channel-authentication -f /var/lib/condor/execute/dir_3379755/pegasus-transfer-L4hpaZ.lst
      2016-12-06 07:01:58,906 INFO: error: globus_xio: Unable to open file /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/112834/H1-INSPIRAL_BBH01_INJ_JOB0-1128349884-1800.hdf
      globus_xio: System error in open: No such file or directory
      globus_xio: A system call failed: No such file or directory
      2016-12-06 07:01:58,907 ERROR: Command exited with non-zero exit code (1): /usr/bin/globus-url-copy ...
      2016-12-06 07:01:58,907 INFO: Grouped 1 similar gsiftp transfers together in temporary file /var/lib/condor/execute/dir_3379755/pegasus-transfer-CK6U6R.lst
      2016-12-06 07:01:58,907 INFO: /usr/bin/globus-url-copy -r -create-dest -no-third-party-transfers -no-data-channel-authentication -f /var/lib/condor/execute/dir_3379755/pegasus-transfer-CK6U6R.lst
      2016-12-06 07:01:59,023 INFO: error: globus_xio: Unable to open file /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/112834/H1-INSPIRAL_BBH01_INJ_JOB0-1128349884-1800.hdf
      globus_xio: System error in open: No such file or directory
      globus_xio: A system call failed: No such file or directory
      2016-12-06 07:01:59,142 ERROR: Command exited with non-zero exit code (1): /usr/bin/globus-url-copy ...
      2016-12-06 07:01:59,143 INFO: Grouped 1 similar gsiftp transfers together in temporary file /var/lib/condor/execute/dir_3379755/pegasus-transfer-o81W5s.lst
      2016-12-06 07:01:59,143 INFO: /usr/bin/globus-url-copy -r -create-dest -no-third-party-transfers -no-data-channel-authentication -f /var/lib/condor/execute/dir_3379755/pegasus-transfer-o81W5s.lst
      2016-12-06 07:01:59,259 INFO: error: globus_xio: Unable to open file /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/112834/H1-INSPIRAL_BBH01_INJ_JOB0-1128349884-1800.hdf
      globus_xio: System error in open: No such file or directory
      globus_xio: A system call failed: No such file or directory
      2016-12-06 07:01:59,259 ERROR: Command exited with non-zero exit code (1): /usr/bin/globus-url-copy ...
      2016-12-06 07:02:41,563 INFO: --------------------------------------------------------------------------------
      2016-12-06 07:02:41,563 INFO: Starting transfers - attempt 2
      2016-12-06 07:02:44,313 INFO: Grouped 1 similar gsiftp transfers together in temporary file /var/lib/condor/execute/dir_3379755/pegasus-transfer-nQYkRj.lst
      2016-12-06 07:02:44,313 INFO: /usr/bin/globus-url-copy -r -create-dest -no-third-party-transfers -no-data-channel-authentication -f /var/lib/condor/execute/dir_3379755/pegasus-transfer-nQYkRj.lst
      2016-12-06 07:03:41,388 INFO: error: globus_xio: Unable to open file /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/112834/H1-INSPIRAL_BBH01_INJ_JOB0-1128349884-1800.hdf
      globus_xio: System error in open: No such file or directory
      globus_xio: A system call failed: No such file or directory
      2016-12-06 07:03:42,607 ERROR: Command exited with non-zero exit code (1): /usr/bin/globus-url-copy ...
      2016-12-06 07:06:12,254 INFO: --------------------------------------------------------------------------------
      2016-12-06 07:06:12,254 INFO: Starting transfers - attempt 3
      2016-12-06 07:06:14,686 INFO: Grouped 1 similar gsiftp transfers together in temporary file /var/lib/condor/execute/dir_3379755/pegasus-transfer-wNPLIT.lst
      2016-12-06 07:06:14,686 INFO: /usr/bin/globus-url-copy -r -create-dest -no-third-party-transfers -no-data-channel-authentication -f /var/lib/condor/execute/dir_3379755/pegasus-transfer-wNPLIT.lst
      2016-12-06 07:06:44,426 INFO: error: globus_xio: Unable to open file /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/112834/H1-INSPIRAL_BBH01_INJ_JOB0-1128349884-1800.hdf
      globus_xio: System error in open: No such file or directory
      globus_xio: A system call failed: No such file or directory
      2016-12-06 07:06:44,779 ERROR: Command exited with non-zero exit code (1): /usr/bin/globus-url-copy ...
      2016-12-06 07:06:44,995 INFO: --------------------------------------------------------------------------------
      2016-12-06 07:06:45,039 INFO: Stats: Total 5 transfers, 0.0 B transferred in 289 seconds. Rate: 0.0 B/s (0.0 b/s)
      2016-12-06 07:06:45,039 INFO: Between sites osg->local : 5 transfers, 0.0 B transferred in 289 seconds. Rate: 0.0 B/s (0.0 b/s)
      2016-12-06 07:06:45,039 CRITICAL: Some transfers failed! See above, and possibly stderr.
      2016-12-06 07:06:59: Last command exited with 1
      2016-12-06 07:09:46: /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS cleaned up
      PegasusLite: exitcode 1

            People

            • Assignee:
              Mats Rynge (rynge)
            • Reporter:
              Duncan Brown (dbrown)
            • Watchers:
              3
