Pegasus / PM-222

Condor common log not copied after workflow finishes

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 3.0
    • Affects Version/s: master
    • Component/s: None
    • Labels: None

      While it is necessary and useful for the Condor common log <dagname>-0.log to be a symlink to a location in /tmp, I just stumbled over the following problem: nothing is responsible for copying the common log back into the work directory after the workflow is done. I rebooted my machine a couple of times, and of course /tmp was cleaned out between boots, taking the common log with it. This is not good.

      I see two possible ways around this:

      [1] Create a directory /tmp/<username> (do not fail on EEXIST) and put the common log in there. Hopefully, the /tmp/<username> directory does not get cleaned away between boots. This still leaves us having to clean up old logs ourselves, and if a user works on NFS, the file is still only visible on the submit machine.

      [2] Once DAGMan finishes (monitord will notice this), the common log should be copied back into the working directory. If the workflow finished successfully, removing the symlink and replacing it with the common log itself should not cause any problem whatsoever. It gets a little trickier for failed workflows, where a restart may want to append to the common log. But I still think that in the successful case, something in Pegasus should copy back the common log so that it does not get lost.
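
      A minimal sketch of the copy-back step in [2], assuming it runs once the workflow has finished successfully. The function name materialize_common_log is hypothetical and is not part of Pegasus or monitord; the sketch only uses the Python standard library. It copies the symlink target into a temporary file in the work directory and then renames that file over the symlink, so the log is never missing from the work directory in between.

```python
import os
import shutil
import tempfile


def materialize_common_log(link_path):
    """Replace a symlinked common log with a real copy of its target.

    Hypothetical helper, not an existing Pegasus function.
    Returns True if the symlink was replaced, False if nothing was done.
    """
    if not os.path.islink(link_path):
        return False  # already a regular file, or not present at all
    target = os.path.realpath(link_path)
    if not os.path.isfile(target):
        return False  # target is gone, e.g. /tmp was cleaned out

    work_dir = os.path.dirname(os.path.abspath(link_path))
    # Copy into a temporary file next to the symlink first ...
    fd, tmp_path = tempfile.mkstemp(dir=work_dir, prefix=".common-log-")
    os.close(fd)
    try:
        shutil.copy2(target, tmp_path)
        # ... then rename it over the symlink. The rename replaces the
        # link itself; it does not follow it to the file in /tmp.
        os.replace(tmp_path, link_path)
    except OSError:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

      The failed-workflow case is deliberately left out here: a restart may still want to append to the log in /tmp, so this sketch should only be called after a successful finish.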

      I like [2] more than [1], but [1] may be easier to implement for now.
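
      For comparison, the directory creation in [1] could look roughly like this. The function name per_user_log_dir and its parameters are hypothetical; os.makedirs with exist_ok=True gives the "do not fail on EEXIST" behaviour when the directory is already there.

```python
import getpass
import os
import tempfile


def per_user_log_dir(base=None, user=None):
    """Create <base>/<user> if needed and return its path.

    Hypothetical helper. base defaults to the system temp directory
    (usually /tmp), user to the current login name.
    """
    base = base or tempfile.gettempdir()
    user = user or getpass.getuser()
    path = os.path.join(base, user)
    # exist_ok=True: an already existing directory is not an error.
    os.makedirs(path, mode=0o700, exist_ok=True)
    return path
```

      This does nothing about the clean-up and NFS visibility concerns raised in [1]; it only covers creating the directory.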

            Assignee: Karan Vahi (vahi)
            Reporter: Jens Voeckler (voeckler)