Type: Bug
Resolution: Fixed
Priority: Major
Affects Version/s: 3.0
Component/s: CLI: pegasus-run
Scott ran into this issue.
His submit directory is on NFS:
/home/scec-02/tera3d/runs/TEST_PP_dax/dags/tera3d/pegasus/CyberShake_TEST/20110124T104757-0800
When Pegasus planned the workflow, it created the Condor log in the submit directory as a symlink pointing to a file in /tmp:
CyberShake_TEST-0.log -> /tmp/CyberShake_TEST-058331.log
However, the workflow failed, and pegasus-run was used to submit a rescue DAG.
In this case, pegasus-run took a backup of the Condor log in the submit directory.
It moved the symlink
CyberShake_TEST-0.log to CyberShake_TEST-0.log.000
After that, only the backup pointed to the file in /tmp:
CyberShake_TEST-0.log.000 -> /tmp/CyberShake_TEST-058331.log
As a result, when the rescue DAG ran, the Condor log was written to a regular file, CyberShake_TEST-0.log, in the submit directory, which is on NFS.
That caused the workflow to fail.
The fix would be for pegasus-run to check whether the Condor log is a symlink when it creates the backup.
If it is, pegasus-run should create a new symlink in the submit directory that points to a log file in /tmp, or in whatever directory held the earlier Condor log.
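
A minimal sketch of the proposed behaviour, written in Python for illustration only. This is not pegasus-run's actual code: the function name backup_condor_log, the three-digit backup suffix scheme, and the way the new log file name is generated are all assumptions. It also assumes the directory that held the earlier log still exists.

```python
import os
import tempfile


def backup_condor_log(log_path: str) -> str:
    """Back up log_path to log_path.NNN and return the backup path.

    Hypothetical helper: if log_path is a symlink, a new symlink is left
    in its place so the next run keeps logging outside the submit dir.
    """
    # Pick the first unused backup suffix: .000, .001, ...
    # lexists() is used so that a dangling backup symlink still counts.
    n = 0
    while os.path.lexists(f"{log_path}.{n:03d}"):
        n += 1
    backup_path = f"{log_path}.{n:03d}"

    if os.path.islink(log_path):
        # Remember which directory held the earlier log (e.g. /tmp)
        # before the link is moved out of the way.
        target_dir = os.path.dirname(os.path.realpath(log_path))

        # rename() moves the link itself; the file it points to is untouched.
        os.rename(log_path, backup_path)

        # Create a fresh, uniquely named log file in that same directory
        # (the naming pattern here is an assumption).
        stem, ext = os.path.splitext(os.path.basename(log_path))
        fd, new_target = tempfile.mkstemp(
            prefix=stem + "-", suffix=ext, dir=target_dir
        )
        os.close(fd)

        # Recreate the symlink in the submit directory.
        os.symlink(new_target, log_path)
    else:
        # A regular file is simply moved to the backup name.
        os.rename(log_path, backup_path)

    return backup_path
```

With this approach, CyberShake_TEST-0.log.000 would still point to the old log, while CyberShake_TEST-0.log would again be a symlink to a new file outside the NFS submit directory.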