Pegasus / PM-1332

monitord is failing on a dagman.out file

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 5.0.0, 4.9.1
    • Affects Version/s: master, 4.9.1
    • Component/s: Monitord
    • Labels: None

      Monitord is failing with the error below, so the dashboard is incorrect:

      2018-11-18 17:23:35,449:INFO:Pegasus.monitoring.workflow(170): Parsing DAG file /usr1/dbrown/pycbc-tmp.zFX4x9ZOC5/work/00/00/main_ID0000001.000/03/89/H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_FULL_DATA_FULL-1126051217-3331800.000/H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_FULL_DATA_FULL-1126051217-3331800.dag
      2018-11-18 17:23:35,455:INFO:Pegasus.monitoring.workflow(680): DAGMan starting with condor id 6505203.0
      Traceback (most recent call last):
        File "/usr/bin/pegasus-monitord", line 1259, in <module>
          process_output = process_dagman_out(workflow_entry.wf, workflow_entry.ml_buffer[0:ml_pos])
        File "/usr/bin/pegasus-monitord", line 691, in process_dagman_out
          add(wf, my_jobid, "%s_SCRIPT_SUCCESS" % (my_script), status=0)
        File "/usr/bin/pegasus-monitord", line 551, in add
          wf.update_job_state(jobid, sched_id, my_job_submit_seq, event, status, my_time, reason)
        File "/usr/lib64/python2.7/site-packages/Pegasus/monitoring/workflow.py", line 2351, in update_job_state
          real_app_exitcode = self.parse_job_output(my_job, job_state)
        File "/usr/lib64/python2.7/site-packages/Pegasus/monitoring/workflow.py", line 1938, in parse_job_output
          my_invocation_found = my_job.extract_job_info( my_output)
        File "/usr/lib64/python2.7/site-packages/Pegasus/monitoring/job.py", line 469, in extract_job_info
          task_output = self.split_task_output( my_record["stdout"])
        File "/usr/lib64/python2.7/site-packages/Pegasus/monitoring/job.py", line 695, in split_task_output
          task_data.write( task_output )
      UnicodeEncodeError: 'ascii' codec can't encode characters in position 1525-1527: ordinal not in range(128)
      2018-11-18 17:23:38,357:INFO:pegasus-monitord(185): DB flushing beginning
      2018-11-18 17:23:38,357:INFO:Pegasus.db.workflow_loader.WorkflowLoader(1100): Executing final flush
      2018-11-18 17:23:38,733:INFO:Pegasus.db.dashboard_loader.DashboardLoader(353): Executing final flush
      2018-11-18 17:23:38,734:INFO:pegasus-monitord(199): DB flushing ended
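
      The UnicodeEncodeError in the traceback above is the usual symptom of text containing non-ASCII characters being encoded with the ASCII codec. The sketch below is not the monitord code: it only reproduces that class of failure in isolation and shows one possible way to avoid it by encoding explicitly as UTF-8 before writing. The helper name write_task_output, the sample string, and the use of io.BytesIO are all assumptions made for illustration; what task_data actually is inside job.py has not been checked here.

```python
import io


def write_task_output(buffer, task_output):
    """Hypothetical helper: write text to a byte buffer as UTF-8.

    Encoding explicitly avoids relying on an implicit ASCII conversion,
    which is what fails when the text contains non-ASCII characters.
    """
    if isinstance(task_output, str):
        task_output = task_output.encode("utf-8")
    buffer.write(task_output)


# Made-up job stdout containing non-ASCII characters (not from the real run).
sample = "strain \u00b1 3\u03c3"

# Encoding with the ASCII codec fails the same way as in the traceback.
try:
    sample.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as err:
    ascii_failed = True
    print("ascii encode fails:", err)

# Encoding explicitly as UTF-8 succeeds and round-trips cleanly.
task_data = io.BytesIO()
write_task_output(task_data, sample)
print("bytes written:", len(task_data.getvalue()))
```

      Whether UTF-8 is the right encoding for every job's stdout is itself an assumption; output in another encoding would need a different codec or an error handler such as errors="replace".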

      The run is in

      /home/dbrown/projects/aligo/gw150914-fig4b/output/submitdir/work

      and I can reproduce this error with

      pegasus-monitord --replay gw150914-16day-c01-v1_3_2-0.dag.dagman.out
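
      To narrow down which job output triggers the failure during the replay, a small scan for non-ASCII bytes can help. This is a minimal sketch, not part of Pegasus: find_non_ascii is a hypothetical helper, and it reports byte offsets, which will not necessarily match the character positions 1525-1527 quoted in the error because those refer to a string that monitord extracted from the record.

```python
def find_non_ascii(data, context=20):
    """Return (first_offset, last_offset, snippet) for each run of non-ASCII bytes.

    `data` is a bytes object, for example the contents of a job's stdout file.
    `snippet` includes up to `context` bytes on either side of the run.
    """
    hits = []
    i = 0
    while i < len(data):
        if data[i] > 127:
            start = i
            # Extend the run over consecutive non-ASCII bytes.
            while i < len(data) and data[i] > 127:
                i += 1
            snippet = data[max(0, start - context):i + context]
            hits.append((start, i - 1, snippet))
        else:
            i += 1
    return hits


# Example on made-up in-memory data; for a real file, read it in binary mode:
#     with open(path, "rb") as handle:
#         hits = find_non_ascii(handle.read())
example = "ab\u00e9cd".encode("utf-8")
for first, last, snippet in find_non_ascii(example, context=1):
    print(first, last, snippet)
```

      Running this over the job output files in the submit directory should point at the file, and roughly the location, where the non-ASCII characters appear.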

      Dashboard page is at

      https://sugwg-scitokens.phy.syr.edu/pegasus/u/dbrown/r/643/w?wf_uuid=32b4c10a-5a67-4b6b-8b78-75f239fccaa5

            Assignee: Karan Vahi (vahi)
            Reporter: Duncan Brown (dbrown)
            Watchers: 3
