Pegasus / PM-999

pegasus-transfer taking too long to finish in case of retries


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Fix Version/s: master, 4.6.0, 4.5.4
    • Affects Version/s: master, 4.5.2
    • Component/s: CLI: pegasus-transfer
    • Labels: None
    • Environment:
      Pegasus tutorial on workflow.isi.edu with training accounts

      In one of the examples, we made the stage-in job fail by removing its input file:
      pegtrain01@workflow:~/examples/split/pegtrain01/pegasus/split/run0002$ cat stage_in_remote_local_0_0.in

      # src 1 local prio 0
      file:///nfs/ccg3/ccg/home/pegtrain01/examples/split/input/pegasus.html
      # dst 1 local
      file:///nfs/ccg3/ccg/home/pegtrain01/run/pegtrain01/pegasus/split/run0002/pegasus.html
      pegtrain01@workflow:~/examples/split/pegtrain01/pegasus/split/run0002$
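The stage-in file pairs each source URL with a destination URL. As a minimal sketch of how that listing can be read, assuming each URL is preceded by a comment line whose first word is `src` or `dst` (e.g. `# src 1 local prio 0`; the tracker's markup renders such `#` lines as a numbered list), the hypothetical helper below extracts the pairs. The format is inferred from this single example, not from pegasus-transfer's documentation, and the extra words (`1`, `local`, `prio 0`) are ignored.

```python
from typing import List, Optional, Tuple


def parse_url_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (source_url, destination_url) pairs from a URL-pair listing.

    Assumed format (inferred, hypothetical): each URL follows a comment line
    whose first word is 'src' or 'dst', e.g. '# src 1 local prio 0'.
    """
    pairs: List[Tuple[str, str]] = []
    role: Optional[str] = None  # role announced by the last comment line
    src: Optional[str] = None   # source URL waiting for its destination
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            words = line.lstrip("#").split()
            role = words[0] if words else None
            continue
        if role == "src":
            src = line
        elif role == "dst" and src is not None:
            pairs.append((src, line))
            src = None
        else:
            raise ValueError(f"unexpected line: {line!r}")
        role = None
    return pairs


if __name__ == "__main__":
    # Sample shaped like stage_in_remote_local_0_0.in (hypothetical short paths).
    sample = (
        "# src 1 local prio 0\nfile:///tmp/input/pegasus.html\n"
        "# dst 1 local\nfile:///tmp/run0002/pegasus.html\n"
    )
    for src, dst in parse_url_pairs(sample):
        print(f"{src} -> {dst}")  # file:///tmp/input/pegasus.html -> file:///tmp/run0002/pegasus.html
```

Raising on a URL with no preceding `src`/`dst` line makes a malformed listing fail loudly instead of silently dropping a transfer.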

      The transfer job runs locally.

      pegtrain01@workflow:~/examples/split/pegtrain01/pegasus/split/run0002$ time pegasus-transfer -n 1 -f stage_in_remote_local_0_0.in
      2015-10-23 15:19:55,917 INFO: Reading URL pairs from stage_in_remote_local_0_0.in
      2015-10-23 15:19:55,917 INFO: PATH=/usr/bin:/usr/local/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
      2015-10-23 15:19:55,917 INFO: LD_LIBRARY_PATH=
      2015-10-23 15:19:55,918 INFO: 1 transfers loaded
      2015-10-23 15:19:55,919 INFO: Sorting the tranfers based on transfer type and source/destination
      2015-10-23 15:19:55,919 INFO: --------------------------------------------------------------------------------
      2015-10-23 15:19:55,919 INFO: Starting transfers - attempt 1
      2015-10-23 15:19:55,919 INFO: Using 1 threads for this round of transfers
      /bin/cp: cannot stat `/nfs/ccg3/ccg/home/pegtrain01/examples/split/input/pegasus.html': No such file or directory
      2015-10-23 15:19:57,929 ERROR: Command exited with non-zero exit code (1): /bin/cp -f -R -L '/nfs/ccg3/ccg/home/pegtrain01/examples/split/input/pegasus.html' '/nfs/ccg3/ccg/home/pegtrain01/run/pegtrain01/pegasus/split/run0002/pegasus.html'
      2015-10-23 15:20:33,966 INFO: --------------------------------------------------------------------------------
      2015-10-23 15:20:33,966 INFO: Starting transfers - attempt 2
      2015-10-23 15:20:33,966 INFO: Using 1 threads for this round of transfers
      /bin/cp: cannot stat `/nfs/ccg3/ccg/home/pegtrain01/examples/split/input/pegasus.html': No such file or directory
      2015-10-23 15:20:35,977 ERROR: Command exited with non-zero exit code (1): /bin/cp -f -R -L '/nfs/ccg3/ccg/home/pegtrain01/examples/split/input/pegasus.html' '/nfs/ccg3/ccg/home/pegtrain01/run/pegtrain01/pegasus/split/run0002/pegasus.html'
      2015-10-23 15:23:36,078 INFO: --------------------------------------------------------------------------------
      2015-10-23 15:23:36,078 INFO: Starting transfers - attempt 3
      2015-10-23 15:23:36,079 INFO: Using 1 threads for this round of transfers
      /bin/cp: cannot stat `/nfs/ccg3/ccg/home/pegtrain01/examples/split/input/pegasus.html': No such file or directory
      2015-10-23 15:23:38,091 ERROR: Command exited with non-zero exit code (1): /bin/cp -f -R -L '/nfs/ccg3/ccg/home/pegtrain01/examples/split/input/pegasus.html' '/nfs/ccg3/ccg/home/pegtrain01/run/pegtrain01/pegasus/split/run0002/pegasus.html'
      2015-10-23 15:23:38,091 INFO: --------------------------------------------------------------------------------
      2015-10-23 15:23:38,091 INFO: Stats: no local files in the transfer set
      2015-10-23 15:23:38,092 CRITICAL: Some transfers failed! See above, and possibly stderr.

      real 3m42.258s
      user 0m0.085s
      sys 0m0.022s

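      Looking at the timestamps, each failed copy returns in about two seconds, but pegasus-transfer then waits before retrying: about 36 seconds pass between the attempt 1 error and the start of attempt 2, and about 3 minutes between the attempt 2 error and the start of attempt 3. Nearly all of the 3m42s wall-clock time is spent in these waits.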
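The idle time before each retry can be read directly off the log timestamps. The sketch below is a hypothetical helper, not part of Pegasus: it measures the gap between each `ERROR` line and the next `Starting transfers - attempt N` line, using abridged lines from the run above (timestamps unchanged, long `/bin/cp` commands shortened to `...`).

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Abridged log lines from the run above: timestamps and levels unchanged,
# the long /bin/cp command text shortened to "...".
LOG = """\
2015-10-23 15:19:55,919 INFO: Starting transfers - attempt 1
2015-10-23 15:19:57,929 ERROR: Command exited with non-zero exit code (1): ...
2015-10-23 15:20:33,966 INFO: Starting transfers - attempt 2
2015-10-23 15:20:35,977 ERROR: Command exited with non-zero exit code (1): ...
2015-10-23 15:23:36,078 INFO: Starting transfers - attempt 3
2015-10-23 15:23:38,091 ERROR: Command exited with non-zero exit code (1): ...
"""

LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+): (.*)$")
ATTEMPT = re.compile(r"Starting transfers - attempt (\d+)")


def retry_gaps(log: str) -> List[Tuple[int, float]]:
    """Return (attempt, seconds) for the idle time between the previous
    ERROR line and the start of each retry attempt."""
    gaps: List[Tuple[int, float]] = []
    last_error: Optional[datetime] = None
    for line in log.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # e.g. the '/bin/cp: cannot stat ...' lines
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        level, message = m.group(2), m.group(3)
        if level == "ERROR":
            last_error = when
            continue
        attempt = ATTEMPT.search(message)
        if attempt and last_error is not None:
            gaps.append((int(attempt.group(1)), (when - last_error).total_seconds()))
    return gaps


if __name__ == "__main__":
    for attempt, seconds in retry_gaps(LOG):
        print(f"idle before attempt {attempt}: {seconds:.3f} s")
    # idle before attempt 2: 36.037 s
    # idle before attempt 3: 180.101 s
```

Pasting the full, unabridged output into `LOG` gives the same result, since lines without a leading timestamp are skipped.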

      Is this deliberate?
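If the growing delay turns out to be unintentional, one common alternative is exponential backoff with a hard cap, which fixes the worst-case total wait in advance. The sketch below is hypothetical: `retry_with_backoff` and its default values are illustrative assumptions and do not describe pegasus-transfer's actual retry code, which this report does not show.

```python
import time
from typing import Callable


def retry_with_backoff(
    attempt_fn: Callable[[], bool],
    max_attempts: int = 3,
    base_delay: float = 10.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt_fn until it returns True or max_attempts calls have failed.

    The wait after failed attempt k is base_delay * 2**(k - 1) seconds, capped
    at max_delay. Names and defaults are illustrative, not pegasus-transfer
    settings.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt_fn():
            return True
        if attempt < max_attempts:  # no pointless sleep after the last attempt
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    return False


if __name__ == "__main__":
    # Record the waits instead of sleeping, for a transfer that always fails.
    waits = []
    retry_with_backoff(lambda: False, sleep=waits.append)
    print(waits, sum(waits))  # [10.0, 20.0] 30.0
```

Passing `sleep` as a parameter lets a test record the waits instead of actually sleeping; with the defaults shown, a permanently failing transfer adds at most 30 seconds of waiting.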

            Assignee: Mats Rynge (rynge)
            Reporter: Karan Vahi (vahi)
            Watchers: 2
