monitord fails when trying to open a job error file in a workflow with Condor recovery

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 4.7.0, 4.6.1
    • Affects Version/s: master, 4.6.0
    • Component/s: Monitord
    • Environment:
      Large LIGO run by Amber at Syracuse

      LIGO has a large run in which the sub-workflow is evicted repeatedly. This causes out-of-order events in the DAGMan log for the sub-workflow, which trips up monitord: it fails when trying to open a job error file whose location it has not yet parsed from the job's submit file.
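      The failure mode above can be illustrated with a minimal sketch, assuming a simplified per-job record whose error-file path is only known after the submit file has been parsed. The names `JobState`, `parse_submit_file` and `read_job_stderr` are hypothetical and are not monitord's actual classes or the fix that was applied; the sketch only shows a defensive guard that skips reading stderr when an out-of-order event arrives before the path is known.

```python
import os
from typing import Optional


class JobState:
    """Hypothetical per-job record; not the real monitord data structure."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Stays None until the job's submit file has been parsed.
        self.error_file: Optional[str] = None

    def parse_submit_file(self, submit_text: str) -> None:
        # Condor submit files declare the stderr location as "error = <path>".
        for line in submit_text.splitlines():
            key, sep, value = line.partition("=")
            if sep and key.strip().lower() == "error":
                self.error_file = value.strip()


def read_job_stderr(job: JobState) -> Optional[str]:
    """Return the job's stderr contents, or None if they cannot be read yet.

    An out-of-order event (e.g. after repeated evictions) may reach this
    point before the submit file was parsed, so the path may be unknown.
    """
    if job.error_file is None:
        return None  # location not parsed yet: skip rather than fail
    if not os.path.exists(job.error_file):
        return None  # file not (yet) written on disk
    with open(job.error_file) as fh:
        return fh.read()
```

      With this guard, `read_job_stderr(JobState("sub_wf"))` simply returns `None` instead of raising, and the error file can be read later once the submit file has been parsed.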

            Assignee:
            Karan Vahi
            Reporter:
            Duncan Brown
            Watchers:
            1
