Pegasus / PM-1737

monitord fails with a divide-by-zero error while computing average CPU utilization

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: master, 5.0.0
    • Fix Version/s: master, 5.1.0, 5.0.1
    • Component/s: Monitord
    • Labels: None

    Description

       cat monitord.log
      2021-03-25 00:49:53,569:INFO:Pegasus.db.connection(737): Rotating sqlite db file /web/data/qc/runs/605bddcb2d8bf/dags/605bddcb2d8bf-0.stampede.db
      Traceback (most recent call last):
        File "/usr/lib64/python3.6/site-packages/Pegasus/cli/pegasus-monitord.py", line 1538, in <module>
          workflow_entry.wf, workflow_entry.ml_buffer[0:ml_pos]
        File "/usr/lib64/python3.6/site-packages/Pegasus/cli/pegasus-monitord.py", line 881, in process_dagman_out
          add(wf, my_jobid, "%s_SCRIPT_SUCCESS" % (my_script), status=0)
        File "/usr/lib64/python3.6/site-packages/Pegasus/cli/pegasus-monitord.py", line 732, in add
          jobid, sched_id, my_job_submit_seq, event, status, my_time, reason
        File "/usr/lib64/python3.6/site-packages/Pegasus/monitoring/workflow.py", line 2774, in update_job_state
          real_app_exitcode = self.parse_job_output(my_job, job_state)
        File "/usr/lib64/python3.6/site-packages/Pegasus/monitoring/workflow.py", line 2314, in parse_job_output
          self.db_send_task_end(my_job, "MAIN_JOB", my_task_id, record)
        File "/usr/lib64/python3.6/site-packages/Pegasus/monitoring/workflow.py", line 1939, in db_send_task_end
          kwargs["avg_cpu"] = kwargs["remote_cpu_time"] / float(kwargs["dur"])
      ZeroDivisionError: float division by zero
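
The last frame shows the division failing when `kwargs["dur"]` evaluates to zero. A minimal sketch of one way to guard that computation follows. The helper name `compute_avg_cpu` is hypothetical, and this is not necessarily the fix that was committed for this issue; it only illustrates skipping `avg_cpu` when the duration is zero, missing, or non-numeric.

```python
def compute_avg_cpu(kwargs):
    """Set kwargs["avg_cpu"] only when the duration is a positive number.

    Hypothetical helper: it mirrors the expression from the traceback,
    kwargs["remote_cpu_time"] / float(kwargs["dur"]), with a guard added.
    """
    try:
        dur = float(kwargs["dur"])
        cpu = float(kwargs["remote_cpu_time"])
    except (KeyError, TypeError, ValueError):
        # Missing or non-numeric input: leave avg_cpu unset.
        return kwargs

    # A zero (or negative, or NaN) duration gives no meaningful average,
    # so avg_cpu is simply omitted instead of raising ZeroDivisionError.
    if dur > 0:
        kwargs["avg_cpu"] = cpu / dur
    return kwargs


# Usage: a zero-duration record no longer raises.
record = compute_avg_cpu({"remote_cpu_time": 0.0, "dur": "0.0"})
print("avg_cpu" in record)
```

Omitting the key rather than storing 0 is a design choice made here for illustration; whether downstream consumers expect a missing value or a default is an assumption that would need checking against the monitord schema.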

      Attachments

        1. test.out, 14 kB, George Papadimitriou

          People

            rynge Mats Rynge
            mayani Rajiv Mayani
            Watchers: 4
