pegasus-run fails to restart workflow in some cases

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: None
    • Component/s: CLI: pegasus-run

      My submit host got messed up and we had to reboot it. When I tried to resubmit some of the workflows, I got this error:

      $ pegasus-run $PWD
      Rescued /submit/kepler2012/work/gideon/pegasus/Run2-25/run0001/Run2-25-0.log as /submit/kepler2012/work/gideon/pegasus/Run2-25/run0001/Run2-25-0.log.000

      ERROR: "Run2-25-0.dag.condor.sub" already exists.
      ERROR: "Run2-25-0.dag.lib.out" already exists.
      ERROR: "Run2-25-0.dag.lib.err" already exists.
      ERROR: "Run2-25-0.dag.dagman.log" already exists.

      Some file(s) needed by condor_dagman already exist. Either rename them,
      use the "-f" option to force them to be overwritten, or use
      the "-update_submit" option to update the submit file and continue.
      ERROR: Running pegasus-submit-dag failed with exit code 1 at /usr/local/pegasus/bin/pegasus-run line 289.

      I removed the files listed in the ERROR messages and everything seemed to work.
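The manual workaround above can be sketched as a small shell function. This is a hypothetical helper, not part of Pegasus or HTCondor. It assumes the blocking files always follow the pattern shown in the error output (`<dag>.condor.sub`, `<dag>.lib.out`, `<dag>.lib.err`, `<dag>.dagman.log`) and, following the "rename them" suggestion in the condor_dagman message, moves them aside with an `.old` suffix instead of deleting them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pegasus): move aside the condor_dagman
# files that make a resubmit fail with '"<file>" already exists'.
# Usage: clear_dagman_files <submit-dir> <dag-file-name>
clear_dagman_files() {
    cdf_dir=$1
    cdf_dag=$2
    for cdf_suffix in condor.sub lib.out lib.err dagman.log; do
        cdf_file="$cdf_dir/$cdf_dag.$cdf_suffix"
        if [ -e "$cdf_file" ]; then
            # Rename instead of delete so the old DAGMan output is kept;
            # an existing .old copy from a previous run is overwritten.
            mv -- "$cdf_file" "$cdf_file.old" || return 1
            echo "Renamed $cdf_file -> $cdf_file.old"
        fi
    done
}
```

For the workflow in this report, it would be run from the submit directory before retrying, e.g. `cd /submit/kepler2012/work/gideon/pegasus/Run2-25/run0001`, then `clear_dagman_files "$PWD" Run2-25-0.dag`, then `pegasus-run $PWD`.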

      I tarred up the workflow and put it here: juve.isi.edu:/submit/kepler2012/work/gideon/pegasus/Run2-25/Run2-25.tar.gz

            Assignee:
            Gaurang Mehta (Inactive)
            Reporter:
            Gideon Juve
            Archiver:
            Rajiv Mayani
