Pegasus / PM-306

pegasus-run does not do rescue correctly when submit directory is on NFS


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • master
    • Affects Version/s: 3.0
    • Component/s: CLI: pegasus-run
    • None

      Scott ran into this issue.

      His submit directory is on NFS:
      /home/scec-02/tera3d/runs/TEST_PP_dax/dags/tera3d/pegasus/CyberShake_TEST/20110124T104757-0800

      When Pegasus planned the workflow, it created the Condor log in the submit directory as a symlink pointing to a file in /tmp:

      CyberShake_TEST-0.log -> /tmp/CyberShake_TEST-058331.log

      However, the workflow failed, and pegasus-run was used to submit a rescue DAG.

      In this case, pegasus-run took a backup of the Condor log in the submit directory.
      It moved

      CyberShake_TEST-0.log to CyberShake_TEST-0.log.000

      After that, we had:
      CyberShake_TEST-0.log.000 -> /tmp/CyberShake_TEST-058331.log

      Because the symlink itself was renamed, CyberShake_TEST-0.log no longer pointed to /tmp. For the rescue DAG, the log was therefore written to a regular file named CyberShake_TEST-0.log in the submit directory, which is on NFS.
      That caused the workflow to fail.
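
      The sequence above can be reproduced outside Pegasus. The following is a minimal Python sketch, not pegasus-run's own code: the function name and the file names are made up for illustration, and two temporary directories stand in for the NFS submit directory and /tmp. It shows that renaming a symlink moves the link itself, so the next writer creates a regular file under the original name.

```python
import os
import tempfile


def demo_rename_moves_link():
    """Rename a symlinked log and report what the next writer ends up with.

    Hypothetical stand-ins: submit_dir plays the NFS submit directory,
    local_dir plays /tmp.
    """
    with tempfile.TemporaryDirectory() as submit_dir, \
            tempfile.TemporaryDirectory() as local_dir:
        # Planner step: log in the submit directory is a symlink to local disk.
        target = os.path.join(local_dir, "workflow-local.log")
        open(target, "w").close()
        log = os.path.join(submit_dir, "workflow-0.log")
        os.symlink(target, log)

        # Backup step: rename() acts on the link, not on the file it points to.
        backup = log + ".000"
        os.rename(log, backup)

        # Rescue run: opening the original name now creates a new regular
        # file inside the submit directory.
        with open(log, "a") as fh:
            fh.write("rescue run\n")

        return {
            "backup_is_symlink": os.path.islink(backup),
            "log_is_symlink": os.path.islink(log),
            "log_is_regular_file": os.path.isfile(log) and not os.path.islink(log),
        }
```

      Calling demo_rename_moves_link() returns a dictionary describing the backup and the newly written log, which can be compared with the state described in this report.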

      The fix would be for pegasus-run to check whether the Condor log is a symlink when it creates the backup.
      If the Condor log is a symlink, pegasus-run should create a new symlink in the submit directory that points to a log file in /tmp, or in whatever directory held the earlier Condor log.
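
      One way the proposed behaviour could look is sketched below in Python. This is an illustration only and not the code that went into pegasus-run: the function name backup_condor_log, the three-digit backup suffix search, and the use of tempfile.mkstemp to pick the new log file name are all assumptions.

```python
import os
import tempfile


def backup_condor_log(log_path):
    """Rotate log_path to the next free .NNN name (hypothetical helper).

    If log_path was a symlink, create a fresh empty log file in the directory
    the old link pointed into and symlink log_path to it, so the rescue run
    keeps writing outside the submit directory.
    Returns the backup path, or None if there was nothing to back up.
    """
    if not os.path.lexists(log_path):
        return None

    target_dir = None
    if os.path.islink(log_path):
        target = os.readlink(log_path)
        if not os.path.isabs(target):
            # Relative link targets are resolved against the link's directory.
            target = os.path.join(os.path.dirname(log_path), target)
        target_dir = os.path.dirname(target)

    # Find the first unused backup suffix: .000, .001, ...
    n = 0
    while os.path.lexists("%s.%03d" % (log_path, n)):
        n += 1
    backup = "%s.%03d" % (log_path, n)
    os.rename(log_path, backup)  # moves the link itself if it is a symlink

    if target_dir is not None:
        stem, ext = os.path.splitext(os.path.basename(log_path))
        fd, new_target = tempfile.mkstemp(prefix=stem + "-", suffix=ext,
                                          dir=target_dir)
        os.close(fd)
        os.symlink(new_target, log_path)

    return backup
```

      With this approach a plain (non-symlink) log is only rotated, while a symlinked log is rotated and then replaced by a new symlink into the same local directory.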

            Assignee: gmehta Gaurang Mehta (Inactive)
            Reporter: vahi Karan Vahi
            Watchers: 3
