Pegasus / PM-1068

monitord fails when trying to open a job error file in a workflow with condor recovery


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 4.7.0, 4.6.1
    • Affects Version/s: master, 4.6.0
    • Component/s: Monitord
    • Labels: None
    • Environment:
      Large LIGO run by Amber at Syracuse

      LIGO has a large run in which the sub-workflow is evicted repeatedly. This causes out-of-order events in the DAGMan log for the sub-workflow, which trips up monitord: it fails when trying to open a job error file whose location it has not yet parsed from the submit file.
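
To illustrate the failure mode, here is a minimal sketch of a defensive read. It is not monitord's actual code or the fix that was committed. The names `Job`, `parse_error_path`, and `read_error_output` are hypothetical, and the sketch assumes that the error file location comes from the `error = ...` line of the Condor submit file and that a relative path resolves against the submit file's directory. If an event arrives before the submit file has been parsed, the path is resolved lazily instead of opening an unset location.

```python
import os
import re

# Hypothetical helper names; this is an illustration, not monitord's code.
_ERROR_RE = re.compile(r"^[ \t]*error[ \t]*=[ \t]*(.+?)[ \t]*$",
                       re.IGNORECASE | re.MULTILINE)


def parse_error_path(submit_text):
    """Return the value of the 'error = ...' line in a submit file, or None."""
    match = _ERROR_RE.search(submit_text)
    return match.group(1) if match else None


class Job:
    def __init__(self, name, submit_file):
        self.name = name
        self.submit_file = submit_file
        # Normally set when the job's SUBMIT event is processed. With
        # out-of-order events it may still be None when a later event arrives.
        self.error_file = None

    def read_error_output(self):
        """Return the job's stderr contents, or None if they cannot be read."""
        if self.error_file is None:
            # The location has not been parsed yet: parse the submit file now.
            try:
                with open(self.submit_file) as fh:
                    path = parse_error_path(fh.read())
            except OSError:
                return None
            if path is None:
                return None
            # Assumption: relative paths resolve against the submit directory.
            if not os.path.isabs(path):
                path = os.path.join(os.path.dirname(self.submit_file), path)
            self.error_file = path
        try:
            with open(self.error_file) as fh:
                return fh.read()
        except OSError:
            # A missing or unreadable error file should not stop monitoring.
            return None
```

With this shape, a caller handling a job-termination event can call `read_error_output()` without knowing whether the submit event was seen first, and a `None` result is treated as "no stderr available" rather than a fatal error.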

            Assignee:
            vahi Karan Vahi
            Reporter:
            dbrown Duncan Brown
            Watchers:
            1

              Created:
              Updated:
              Resolved: