pegasus-run does not do rescue correctly when submit directory is on NFS


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • master
    • Affects Version/s: 3.0
    • Component/s: CLI: pegasus-run
    • None

      Scott ran into this issue.

      His submit directory is on NFS:
      /home/scec-02/tera3d/runs/TEST_PP_dax/dags/tera3d/pegasus/CyberShake_TEST/20110124T104757-0800

      When Pegasus planned the workflow, it created a Condor log in the submit directory as a symlink pointing to a file in /tmp:

      CyberShake_TEST-0.log -> /tmp/CyberShake_TEST-058331.log

      However, the workflow failed, and pegasus-run was used to submit a rescue DAG.

      In this case, pegasus-run took a backup of the Condor log in the submit directory by renaming the symlink itself.
      It moved

      CyberShake_TEST-0.log to CyberShake_TEST-0.log.000

      After that, we had
      CyberShake_TEST-0.log.000 -> /tmp/CyberShake_TEST-058331.log

      Because the symlink had been renamed, the rescue DAG's log ended up being written to the file CyberShake_TEST-0.log in the submit directory itself, which is on NFS.
      That caused the workflow to fail.
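
      The sequence above can be reproduced in isolation. The following is a minimal Python sketch, not pegasus-run's actual code: the helper names backup_log_naively and demo are hypothetical, two temporary directories stand in for the NFS submit directory and /tmp, and a plain open() stands in for the rescue run writing its log.

```python
import os
import tempfile


def backup_log_naively(log_path):
    """Rename the log to <name>.000 without checking whether it is a symlink."""
    backup = log_path + ".000"
    os.rename(log_path, backup)  # renames the link itself, not its target
    return backup


def demo():
    # Two temporary directories stand in for the NFS submit directory and /tmp.
    with tempfile.TemporaryDirectory() as submit_dir, \
            tempfile.TemporaryDirectory() as local_tmp:
        target = os.path.join(local_tmp, "CyberShake_TEST-058331.log")
        open(target, "w").close()

        log = os.path.join(submit_dir, "CyberShake_TEST-0.log")
        os.symlink(target, log)

        backup = backup_log_naively(log)

        # Stand-in for the rescue run appending to its log: the old link name
        # is gone, so this creates a regular file in the submit directory.
        with open(log, "a") as fh:
            fh.write("rescue run\n")

        return os.path.islink(backup), os.path.islink(log)
```

      Here demo() reports that the .000 backup is still a symlink while the newly written log is a regular file inside the submit directory, which matches the state described above.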

      The fix would be that when pegasus-run creates a backup of the Condor log, it checks whether the Condor log is a symlink.
      If the Condor log is a symlink, pegasus-run should create a new symlink in the submit directory that points to a log file in /tmp, or in whatever directory held the earlier Condor log.
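
      One way the proposed check could look is sketched below in Python. This is an illustration under stated assumptions, not the change that was committed: the function name backup_condor_log, the .000 suffix default, and the naming pattern of the fresh log file are all hypothetical.

```python
import os
import tempfile


def backup_condor_log(log_path, suffix=".000"):
    """Back up the Condor log; if it was a symlink, recreate a symlink.

    Returns the path of the new link target, or None if the log was not
    a symlink. Hypothetical helper, not part of pegasus-run.
    """
    backup = log_path + suffix

    if not os.path.islink(log_path):
        # Regular file (or nothing there): a plain rename is enough.
        if os.path.exists(log_path):
            os.rename(log_path, backup)
        return None

    # Work out which directory held the earlier log (e.g. /tmp).
    old_target = os.readlink(log_path)
    if not os.path.isabs(old_target):
        old_target = os.path.join(os.path.dirname(log_path), old_target)
    target_dir = os.path.dirname(old_target)

    # Keep the old link as the backup; it still points at the old log.
    os.rename(log_path, backup)

    # Create a fresh, uniquely named log file in that same directory
    # (the naming pattern is an assumption) and link to it.
    prefix = os.path.splitext(os.path.basename(log_path))[0] + "-"
    fd, new_target = tempfile.mkstemp(prefix=prefix, suffix=".log", dir=target_dir)
    os.close(fd)
    os.symlink(new_target, log_path)
    return new_target
```

      With this approach the rescue run would keep writing through CyberShake_TEST-0.log to a file outside NFS, while CyberShake_TEST-0.log.000 continues to point at the log of the failed run.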

            Assignee:
            Gaurang Mehta (Inactive)
            Reporter:
            Karan Vahi
            Archiver:
            Rajiv Mayani
