pegasus-monitord is tripping over the quoting function it uses to quote the stdout section of the kickstart output. Gideon, can you take a look?
Recently I have noticed that, for a number of my workflows,
pegasus-analyzer starts reporting incorrect information after some
amount of time. As a knock-on effect, pegasus-plots and the dashboard
also report incorrect information. Typically pegasus-analyzer reports
that a workflow is only (say) 10% complete when it is really much
further along.
Two examples of this on sugar-dev3:
/usr1/spxiwh/log/spxiwh/pegasus/weekly_ahope/run0002/
/usr1/spxiwh/log/spxiwh/pegasus/weekly_ahope/run0005/
One recommendation from pegasus-plots was to run pegasus-monitord in
replay mode. So I tried this:
    pegasus-monitord --verbose --replay \
        /usr1/spxiwh/log/spxiwh/pegasus/weekly_ahope/run0002/*dag.dagman.out
After about 10 minutes that command failed with:
2014-05-22 09:42:58,265:workflow.py:parse_job_output:1705: INFO: Starting extraction of job_info from job output file /usr1/spxiwh/log/spxiwh/pegasus/weekly_ahope/run0002/pycbc_inspiral_ID022937.out.000
/usr/lib64/pegasus/python/Pegasus/tools/utils.py:75: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(mapping.__getitem__, s))
Traceback (most recent call last):
  File "/usr/bin/pegasus-monitord", line 1349, in <module>
    process_output = process_dagman_out(workflow_entry.wf, workflow_entry.ml_buffer[0:ml_pos])
  File "/usr/bin/pegasus-monitord", line 758, in process_dagman_out
    add(wf, my_jobid, "JOB_SUCCESS", sched_id=my_sched_id, status=0)
  File "/usr/bin/pegasus-monitord", line 589, in add
    wf.update_job_state(jobid, sched_id, my_job_submit_seq, event, status, my_time)
  File "/usr/lib64/pegasus/python/Pegasus/monitoring/workflow.py", line 1981, in update_job_state
    self.parse_job_output(my_job, job_state)
  File "/usr/lib64/pegasus/python/Pegasus/monitoring/workflow.py", line 1706, in parse_job_output
    my_invocation_found = my_job.extract_job_info(self._run_dir, my_output)
  File "/usr/lib64/pegasus/python/Pegasus/monitoring/job.py", line 331, in extract_job_info
    stdout_text_list.append(utils.quote(my_record["stdout"]))
  File "/usr/lib64/pegasus/python/Pegasus/tools/utils.py", line 75, in quote
    return ''.join(map(mapping.__getitem__, s))
KeyError: u'\xe2'
The only strange thing I notice about the listed .out file is that it
contains some warning messages from scipy compiling a function. These
messages contain non-ASCII characters:
/home/spxiwh/.python26_compiled/sc_fb2424d04c9b3822b33b4d49e59ccce70.cpp:670: warning: unused variable ‘Narr’
(specifically the quotes around Narr).
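
For what it is worth, the key in the KeyError fits those quotes: the UTF-8 encoding of ‘ (U+2018) is the three bytes E2 80 98, so \xe2 would be the first unexpected value a lookup table meets. The sketch below (Python 3) only illustrates that and one way a byte-oriented quoting helper could cope. The names build_quote_map, QUOTE_MAP and quote_bytes are hypothetical, and this is not the code in Pegasus/tools/utils.py.

```python
# Hypothetical stand-in for a byte-oriented quoting helper.
# This is NOT the Pegasus implementation, only an illustration.

def build_quote_map():
    """Map each of the 256 byte values to itself (if safe) or to %XX."""
    # Assumption: printable ASCII is passed through, except %, " and '.
    safe = set(range(0x20, 0x7F)) - {ord("%"), ord('"'), ord("'")}
    return {b: (chr(b) if b in safe else "%%%02X" % b) for b in range(256)}


QUOTE_MAP = build_quote_map()


def quote_bytes(data):
    """Percent-encode text or bytes one byte at a time."""
    if isinstance(data, str):
        # Encode first, so every lookup key is an int in range(256).
        data = data.encode("utf-8")
    return "".join(QUOTE_MAP[b] for b in data)


# The left curly quote from the compiler warning starts with byte 0xE2.
print("\u2018".encode("utf-8"))
print(quote_bytes("unused variable \u2018Narr\u2019"))
```

Because the table covers all 256 byte values and the input is encoded before the lookup, no key can be missing, whatever characters the job prints to stdout.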
Is it obvious what the problem is here? Is there anything I can do to
fix it? The workflows in question are being used to profile different
stages of ahope, and pegasus-plots is extremely useful in clearly
displaying which jobs are running the longest and where we need to
optimize.
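
To check whether other job output files in the run directory are affected, a small script along these lines could list every line that contains a non-ASCII byte. It is only a diagnostic sketch (Python 3). The function name non_ascii_lines is made up for this example, and the script does not change how pegasus-monitord behaves.

```python
import sys


def non_ascii_lines(path):
    """Yield (line_number, raw_line) for every line holding a byte >= 0x80."""
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            if any(byte >= 0x80 for byte in raw):
                yield number, raw.rstrip(b"\r\n")


if __name__ == "__main__":
    # Usage: pass one or more job output files as arguments.
    for path in sys.argv[1:]:
        for number, raw in non_ascii_lines(path):
            print("%s:%d: %r" % (path, number, raw))
```

Reading the files in binary mode means the script cannot itself fail on an unexpected encoding.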