Pegasus / PM-1332

monitord is failing on a dagman.out file

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: master, 4.9.1
    • Fix Version/s: master, 5.0.0, 4.9.1
    • Component/s: Monitord
    • Labels:
      None

      Description

      pegasus-monitord fails with the UnicodeEncodeError below while parsing job output, so the dashboard is incorrect:

      2018-11-18 17:23:35,449:INFO:Pegasus.monitoring.workflow(170): Parsing DAG file /usr1/dbrown/pycbc-tmp.zFX4x9ZOC5/work/00/00/main_ID0000001.000/03/89/H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_FULL_DATA_FULL-1126051217-3331800.000/H1L1-FOREGROUND_MINIFOLLOWUP_FULL_DATA_FULL_CUMULATIVE_CAT_12H_FULL_DATA_FULL-1126051217-3331800.dag
      2018-11-18 17:23:35,455:INFO:Pegasus.monitoring.workflow(680): DAGMan starting with condor id 6505203.0
      Traceback (most recent call last):
        File "/usr/bin/pegasus-monitord", line 1259, in <module>
          process_output = process_dagman_out(workflow_entry.wf, workflow_entry.ml_buffer[0:ml_pos])
        File "/usr/bin/pegasus-monitord", line 691, in process_dagman_out
          add(wf, my_jobid, "%s_SCRIPT_SUCCESS" % (my_script), status=0)
        File "/usr/bin/pegasus-monitord", line 551, in add
          wf.update_job_state(jobid, sched_id, my_job_submit_seq, event, status, my_time, reason)
        File "/usr/lib64/python2.7/site-packages/Pegasus/monitoring/workflow.py", line 2351, in update_job_state
          real_app_exitcode = self.parse_job_output(my_job, job_state)
        File "/usr/lib64/python2.7/site-packages/Pegasus/monitoring/workflow.py", line 1938, in parse_job_output
          my_invocation_found = my_job.extract_job_info( my_output)
        File "/usr/lib64/python2.7/site-packages/Pegasus/monitoring/job.py", line 469, in extract_job_info
          task_output = self.split_task_output( my_record["stdout"])
        File "/usr/lib64/python2.7/site-packages/Pegasus/monitoring/job.py", line 695, in split_task_output
          task_data.write( task_output )
      UnicodeEncodeError: 'ascii' codec can't encode characters in position 1525-1527: ordinal not in range(128)
      2018-11-18 17:23:38,357:INFO:pegasus-monitord(185): DB flushing beginning
      2018-11-18 17:23:38,357:INFO:Pegasus.db.workflow_loader.WorkflowLoader(1100): Executing final flush
      2018-11-18 17:23:38,733:INFO:Pegasus.db.dashboard_loader.DashboardLoader(353): Executing final flush
      2018-11-18 17:23:38,734:INFO:pegasus-monitord(199): DB flushing ended
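
The traceback ends in `task_data.write( task_output )` under Python 2.7, which suggests that text containing non-ASCII characters was implicitly encoded as ASCII when written to a buffer. The sketch below reproduces the same class of error in isolation and shows one possible workaround, encoding explicitly as UTF-8. The function names, the sample string, and the use of `io.BytesIO` are assumptions for illustration. This is not the monitord code, and not necessarily the fix that was applied.

```python
import io


def write_task_output_ascii(task_output):
    """Mimic the failing path: encode text as ASCII before buffering it."""
    buf = io.BytesIO()
    # Raises UnicodeEncodeError if task_output has any non-ASCII character.
    buf.write(task_output.encode("ascii"))
    return buf.getvalue()


def write_task_output_utf8(task_output):
    """Possible workaround (an assumption): encode explicitly as UTF-8."""
    buf = io.BytesIO()
    buf.write(task_output.encode("utf-8"))
    return buf.getvalue()


# Hypothetical job stdout containing a non-ASCII character (micro sign).
sample = u"job stdout with a non-ASCII character: \u00b5s"

try:
    write_task_output_ascii(sample)
    ascii_failed = False
    error_message = ""
except UnicodeEncodeError as err:
    ascii_failed = True
    error_message = str(err)

# The UTF-8 path round-trips the text without raising.
utf8_bytes = write_task_output_utf8(sample)
```

Encoding explicitly keeps the choice of codec visible at the write site instead of relying on the interpreter's default.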

      The run is in

      /home/dbrown/projects/aligo/gw150914-fig4b/output/submitdir/work

      and I can reproduce this error with

      pegasus-monitord --replay gw150914-16day-c01-v1_3_2-0.dag.dagman.out
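
The error reports character positions (1525-1527) inside the job stdout string. A small helper like the one below can list where non-ASCII characters occur in a piece of text, which may help narrow down which job output triggers the failure. The helper name and the sample string are hypothetical and not part of Pegasus.

```python
def find_non_ascii(text):
    """Return (position, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


# Hypothetical stdout fragment containing typographic quotes.
record_stdout = u"x" * 5 + u"\u2018ok\u2019"
hits = find_non_ascii(record_stdout)
```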

      Dashboard page is at

      https://sugwg-scitokens.phy.syr.edu/pegasus/u/dbrown/r/643/w?wf_uuid=32b4c10a-5a67-4b6b-8b78-75f239fccaa5

    People

    • Assignee: Karan Vahi (vahi)
    • Reporter: Duncan Brown (dbrown)