Pegasus / PM-1147

pegasus-transfer should check that files exist before trying to transfer them


After the user job finishes, kickstart always tries to transfer the output files back to the submit host, even if those files do not exist.

This produces confusing error messages in the stderr file (see below). Because users see "error: globus_xio", they assume there is some kind of Globus problem, when the real cause is that the user job did not create the output that kickstart is trying to send back.

Before initiating the transfer, kickstart should check whether each output file exists. If one does not, kickstart should print a user-friendly error message stating that an expected output file of the job does not exist and that the user should check their job to see why that file was not created.
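A minimal sketch of the requested check, in Python. It assumes the stage-out list is available as (source URL, destination URL) pairs and that the outputs to check are local file:// sources, as in the log below, where the .hdf file is read from the local execute directory. The function names and the message wording are hypothetical; they are not part of pegasus-transfer or kickstart.

```python
import os
import sys
from urllib.parse import unquote, urlparse


def local_path(url):
    """Return the local filesystem path for a file:// URL, or None otherwise."""
    parsed = urlparse(url)
    if parsed.scheme == "file":
        return unquote(parsed.path)
    return None


def find_missing_outputs(pairs):
    """Return the source URLs of (src, dst) pairs whose local source is missing.

    Only file:// sources are checked; remote sources are left to the
    transfer tool, since checking them would need a network round trip.
    """
    missing = []
    for src, _dst in pairs:
        path = local_path(src)
        if path is not None and not os.path.exists(path):
            missing.append(src)
    return missing


def check_outputs_or_exit(pairs, stream=sys.stderr):
    """Print a user-facing error and exit with status 1 if any output is missing."""
    missing = find_missing_outputs(pairs)
    if not missing:
        return None
    for src in missing:
        print(
            "ERROR: Expected output file %s does not exist. The job did not "
            "create it; check the job's own output and logs to see why."
            % local_path(src),
            file=stream,
        )
    sys.exit(1)
```

Running the check once, before the first transfer attempt, lets a missing output fail immediately with a clear message instead of surfacing as a globus_xio error after several retries.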

    ############################# staging out output files #############################
    2016-12-06 07:01:56,342 INFO: Reading URL pairs from stdin
    2016-12-06 07:01:56,343 INFO: 1 transfers loaded
    2016-12-06 07:01:56,343 INFO: PATH=/var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/pegasus-4.8.0dev/bin:/usr/local/bin:/usr/bin:/bin
    2016-12-06 07:01:56,343 INFO: LD_LIBRARY_PATH=
    2016-12-06 07:01:56,678 INFO: --------------------------------------------------------------------------------
    2016-12-06 07:01:56,678 INFO: Starting transfers - attempt 1
    2016-12-06 07:01:58,733 INFO: Tool found: globus-url-copy Version: 9.22 Path: /usr/bin/globus-url-copy
    2016-12-06 07:01:58,734 INFO: Grouped 1 similar gsiftp transfers together in temporary file /var/lib/condor/execute/dir_3379755/pegasus-transfer-L4hpaZ.lst
    2016-12-06 07:01:58,734 INFO: /usr/bin/globus-url-copy -r -create-dest -no-third-party-transfers -no-data-channel-authentication -f /var/lib/condor/execute/dir_3379755/pegasus-transfer-L4hpaZ.lst
    2016-12-06 07:01:58,906 INFO: error: globus_xio: Unable to open file /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/112834/H1-INSPIRAL_BBH01_INJ_JOB0-1128349884-1800.hdf
    globus_xio: System error in open: No such file or directory
    globus_xio: A system call failed: No such file or directory
    2016-12-06 07:01:58,907 ERROR: Command exited with non-zero exit code (1): /usr/bin/globus-url-copy ...
    2016-12-06 07:01:58,907 INFO: Grouped 1 similar gsiftp transfers together in temporary file /var/lib/condor/execute/dir_3379755/pegasus-transfer-CK6U6R.lst
    2016-12-06 07:01:58,907 INFO: /usr/bin/globus-url-copy -r -create-dest -no-third-party-transfers -no-data-channel-authentication -f /var/lib/condor/execute/dir_3379755/pegasus-transfer-CK6U6R.lst
    2016-12-06 07:01:59,023 INFO: error: globus_xio: Unable to open file /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/112834/H1-INSPIRAL_BBH01_INJ_JOB0-1128349884-1800.hdf
    globus_xio: System error in open: No such file or directory
    globus_xio: A system call failed: No such file or directory
    2016-12-06 07:01:59,142 ERROR: Command exited with non-zero exit code (1): /usr/bin/globus-url-copy ...
    2016-12-06 07:01:59,143 INFO: Grouped 1 similar gsiftp transfers together in temporary file /var/lib/condor/execute/dir_3379755/pegasus-transfer-o81W5s.lst
    2016-12-06 07:01:59,143 INFO: /usr/bin/globus-url-copy -r -create-dest -no-third-party-transfers -no-data-channel-authentication -f /var/lib/condor/execute/dir_3379755/pegasus-transfer-o81W5s.lst
    2016-12-06 07:01:59,259 INFO: error: globus_xio: Unable to open file /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/112834/H1-INSPIRAL_BBH01_INJ_JOB0-1128349884-1800.hdf
    globus_xio: System error in open: No such file or directory
    globus_xio: A system call failed: No such file or directory
    2016-12-06 07:01:59,259 ERROR: Command exited with non-zero exit code (1): /usr/bin/globus-url-copy ...
    2016-12-06 07:02:41,563 INFO: --------------------------------------------------------------------------------
    2016-12-06 07:02:41,563 INFO: Starting transfers - attempt 2
    2016-12-06 07:02:44,313 INFO: Grouped 1 similar gsiftp transfers together in temporary file /var/lib/condor/execute/dir_3379755/pegasus-transfer-nQYkRj.lst
    2016-12-06 07:02:44,313 INFO: /usr/bin/globus-url-copy -r -create-dest -no-third-party-transfers -no-data-channel-authentication -f /var/lib/condor/execute/dir_3379755/pegasus-transfer-nQYkRj.lst
    2016-12-06 07:03:41,388 INFO: error: globus_xio: Unable to open file /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/112834/H1-INSPIRAL_BBH01_INJ_JOB0-1128349884-1800.hdf
    globus_xio: System error in open: No such file or directory
    globus_xio: A system call failed: No such file or directory
    2016-12-06 07:03:42,607 ERROR: Command exited with non-zero exit code (1): /usr/bin/globus-url-copy ...
    2016-12-06 07:06:12,254 INFO: --------------------------------------------------------------------------------
    2016-12-06 07:06:12,254 INFO: Starting transfers - attempt 3
    2016-12-06 07:06:14,686 INFO: Grouped 1 similar gsiftp transfers together in temporary file /var/lib/condor/execute/dir_3379755/pegasus-transfer-wNPLIT.lst
    2016-12-06 07:06:14,686 INFO: /usr/bin/globus-url-copy -r -create-dest -no-third-party-transfers -no-data-channel-authentication -f /var/lib/condor/execute/dir_3379755/pegasus-transfer-wNPLIT.lst
    2016-12-06 07:06:44,426 INFO: error: globus_xio: Unable to open file /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS/112834/H1-INSPIRAL_BBH01_INJ_JOB0-1128349884-1800.hdf
    globus_xio: System error in open: No such file or directory
    globus_xio: A system call failed: No such file or directory
    2016-12-06 07:06:44,779 ERROR: Command exited with non-zero exit code (1): /usr/bin/globus-url-copy ...
    2016-12-06 07:06:44,995 INFO: --------------------------------------------------------------------------------
    2016-12-06 07:06:45,039 INFO: Stats: Total 5 transfers, 0.0 B transferred in 289 seconds. Rate: 0.0 B/s (0.0 b/s)
    2016-12-06 07:06:45,039 INFO: Between sites osg->local : 5 transfers, 0.0 B transferred in 289 seconds. Rate: 0.0 B/s (0.0 b/s)
    2016-12-06 07:06:45,039 CRITICAL: Some transfers failed! See above, and possibly stderr.
    2016-12-06 07:06:59: Last command exited with 1
    2016-12-06 07:09:46: /var/lib/condor/execute/dir_3379755/pegasus.sSSWUS cleaned up
    PegasusLite: exitcode 1
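The log above shows the same missing .hdf file being tried five times (three invocations in attempt 1, one each in attempts 2 and 3), taking about 289 seconds before the job fails. A missing source will still be missing on the next attempt, so one option is to treat "No such file or directory" as non-retryable. The Python sketch below assumes a hypothetical `do_transfer` callable that raises `FileNotFoundError` for a missing source; `transfer_with_retries`, `NonRetryableError`, `copy_to`, and the linear backoff are illustrative only and do not model pegasus-transfer's real retry logic.

```python
import shutil
import time


class NonRetryableError(Exception):
    """Raised when a retry cannot help, e.g. the source file does not exist."""


def transfer_with_retries(src_path, do_transfer, attempts=3, delay=0.0):
    """Call do_transfer(src_path) up to `attempts` times and return its result.

    FileNotFoundError stops immediately with a clear message; any other
    OSError is treated as possibly transient and retried after a delay
    that grows linearly with the attempt number (hypothetical policy).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return do_transfer(src_path)
        except FileNotFoundError as exc:
            # The source will still be missing next time, so do not retry.
            raise NonRetryableError(
                "output file %s does not exist; check why the job did not "
                "create it" % src_path
            ) from exc
        except OSError as exc:
            # Network, server, or credential problems may clear up on retry.
            last_error = exc
            if attempt < attempts:
                time.sleep(delay * attempt)
    raise last_error


def copy_to(dest_dir):
    """Hypothetical stand-in for the real transfer tool: a local file copy."""
    def do_transfer(src_path):
        # shutil.copy raises FileNotFoundError if src_path does not exist.
        return shutil.copy(src_path, dest_dir)
    return do_transfer
```

With this policy the job in the log would fail after the first invocation, with a message naming the missing file rather than a globus_xio error.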

Assignee: Mats Rynge (rynge)
Reporter: Duncan Brown (dbrown)
Watchers: 3
