monitord fails with a divide-by-zero error while computing average CPU utilization

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 5.1.0, 5.0.1
    • Affects Version/s: master, 5.0.0
    • Component/s: Monitord
    • Labels: None

      cat monitord.log
      2021-03-25 00:49:53,569:INFO:Pegasus.db.connection(737): Rotating sqlite db file /web/data/qc/runs/605bddcb2d8bf/dags/605bddcb2d8bf-0.stampede.db
      Traceback (most recent call last):
      File "/usr/lib64/python3.6/site-packages/Pegasus/cli/pegasus-monitord.py", line 1538, in <module>
      workflow_entry.wf, workflow_entry.ml_buffer[0:ml_pos]
      File "/usr/lib64/python3.6/site-packages/Pegasus/cli/pegasus-monitord.py", line 881, in process_dagman_out
      add(wf, my_jobid, "%s_SCRIPT_SUCCESS" % (my_script), status=0)
      File "/usr/lib64/python3.6/site-packages/Pegasus/cli/pegasus-monitord.py", line 732, in add
      jobid, sched_id, my_job_submit_seq, event, status, my_time, reason
      File "/usr/lib64/python3.6/site-packages/Pegasus/monitoring/workflow.py", line 2774, in update_job_state
      real_app_exitcode = self.parse_job_output(my_job, job_state)
      File "/usr/lib64/python3.6/site-packages/Pegasus/monitoring/workflow.py", line 2314, in parse_job_output
      self.db_send_task_end(my_job, "MAIN_JOB", my_task_id, record)
      File "/usr/lib64/python3.6/site-packages/Pegasus/monitoring/workflow.py", line 1939, in db_send_task_end
      kwargs["avg_cpu"] = kwargs["remote_cpu_time"] / float(kwargs["dur"])
      ZeroDivisionError: float division by zero
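
The traceback ends at the unguarded division in db_send_task_end, which raises when "dur" is zero. One way to avoid the crash is to compute avg_cpu only when the duration is a positive number. The sketch below illustrates that idea; it is not the change that was committed for this issue. The helper name with_avg_cpu and its standalone form are hypothetical; only the keys "remote_cpu_time", "dur" and "avg_cpu" come from the traceback.

```python
def with_avg_cpu(record):
    """Return a copy of record with "avg_cpu" added when it can be computed.

    Hypothetical helper for illustration; not part of Pegasus.
    "avg_cpu" is left out when "dur" is zero, negative, missing,
    or when either value cannot be converted to a float.
    """
    out = dict(record)
    try:
        cpu = float(out["remote_cpu_time"])
        dur = float(out["dur"])
    except (KeyError, TypeError, ValueError):
        # Missing or non-numeric input: skip the derived value.
        return out
    if dur > 0:
        out["avg_cpu"] = cpu / dur
    return out


if __name__ == "__main__":
    # A zero-duration record, as in the failing run, no longer raises.
    print(with_avg_cpu({"remote_cpu_time": 0.0, "dur": "0.0"}))
    print(with_avg_cpu({"remote_cpu_time": 1.5, "dur": 3}))
```

Omitting avg_cpu for a zero-duration task, rather than storing 0, is a design choice in this sketch: it keeps "not measurable" distinct from "no CPU used". Whether the database schema accepts a missing value is an assumption that would need checking.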

        1. test.out
          14 kB
          George Papadimitriou

            Assignee:
            Mats Rynge
            Reporter:
            Rajiv Mayani
            Watchers:
            4

              Created:
              Updated:
              Resolved: