Type: Bug
Resolution: Won't Fix
Priority: Major
Fix Version/s: master, 3.0.3, 3.1
Affects Version/s: 3.0.3
Component/s: Monitord
Labels: None

We did a LIGO run from the latest 3.0 branch (3.0.3cvs).

Looking at the outer-level workflow's dagman.out, I see:
[vahi@sugar H1L1-s6c_lowmass_ihope-956707143-86400.7xhgOx]$ tail *dagman.out
05/13 05:14:37 === === === === === === ===
05/13 05:14:37 86 0 0 0 0 0 0
05/13 05:14:37 Note: 26 total PRE script deferrals because of -MaxPre limit (1)
05/13 05:14:37 All jobs Completed!
05/13 05:14:37 Note: 0 total job deferrals because of -MaxJobs limit (5000)
05/13 05:14:37 Note: 0 total job deferrals because of -MaxIdle limit (2000)
05/13 05:14:37 Note: 0 total job deferrals because of node category throttles
05/13 05:14:37 Note: 26 total PRE script deferrals because of -MaxPre limit (1)
05/13 05:14:37 Note: 0 total POST script deferrals because of -MaxPost limit (20)
05/13 05:14:37 **** condor_scheduniv_exec.12055169.0 (condor_DAGMAN) pid 5025 EXITING WITH STATUS 0

DAGMan completed the workflow successfully (exit status 0).
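The success verdict comes from the final `EXITING WITH STATUS` line of dagman.out. A minimal sketch of that check, assuming only the line format visible in the excerpt above; `dagman_exit_status` is a hypothetical helper, not part of Pegasus or HTCondor:

```python
import re

# Matches the last DAGMan line, e.g.
# "... (condor_DAGMAN) pid 5025 EXITING WITH STATUS 0"
EXIT_RE = re.compile(r"EXITING WITH STATUS (\d+)")


def dagman_exit_status(lines):
    """Return the last exit status reported in dagman.out lines, or None."""
    status = None
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            status = int(match.group(1))
    return status
```

Fed the tail above, it returns 0; it returns None when the log has no exit line yet (DAGMan still running or killed).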

However, jobstate.log does not indicate that the workflow completed; its last entries are ordinary job events:
[vahi@sugar H1L1-s6c_lowmass_ihope-956707143-86400.7xhgOx]$ tail jobstate.log
1305277874 subdag_plot_hipe_bnslininj_summary_plots_cat_4_veto.BNSLININJ_SUMMARY_PLOTS_CAT_4_VETO_ID000053 EXECUTE 12154987.0 - - 74
1305277874 subdag_plot_hipe_bnslininj_summary_plots_cat_3_veto.BNSLININJ_SUMMARY_PLOTS_CAT_3_VETO_ID000052 EXECUTE 12154989.0 - - 75
1305277944 subdag_plot_hipe_allinj_summary_plots_cat_4_veto.ALLINJ_SUMMARY_PLOTS_CAT_4_VETO_ID000081 JOB_TERMINATED 12154955.0 - - 53
1305277944 subdag_plot_hipe_allinj_summary_plots_cat_4_veto.ALLINJ_SUMMARY_PLOTS_CAT_4_VETO_ID000081 JOB_SUCCESS 0 - - 53
1305277944 subdag_plot_hipe_full_data_summary_plots_cat_4_veto.FULL_DATA_SUMMARY_PLOTS_CAT_4_VETO_ID000049 JOB_TERMINATED 12154994.0 - - 77
1305277944 subdag_plot_hipe_full_data_summary_plots_cat_4_veto.FULL_DATA_SUMMARY_PLOTS_CAT_4_VETO_ID000049 JOB_SUCCESS 0 - - 77
1305277949 subdag_plot_hipe_allinj_summary_plots_cat_3_veto.ALLINJ_SUMMARY_PLOTS_CAT_3_VETO_ID000080 JOB_TERMINATED 12154956.0 - - 54
1305277949 subdag_plot_hipe_allinj_summary_plots_cat_3_veto.ALLINJ_SUMMARY_PLOTS_CAT_3_VETO_ID000080 JOB_SUCCESS 0 - - 54
1305277950 subdag_plot_hipe_full_data_slide_summary_plots_cat_4_veto.FULL_DATA_SLIDE_SUMMARY_PLOTS_CAT_4_VETO_ID000045 JOB_TERMINATED 12154998.0 - - 78
1305277950 subdag_plot_hipe_full_data_slide_summary_plots_cat_4_veto.FULL_DATA_SLIDE_SUMMARY_PLOTS_CAT_4_VETO_ID000045 JOB_SUCCESS 0 - - 78
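In this tail, the two BNSLININJ summary-plot subdags stop at EXECUTE and never get a JOB_TERMINATED/JOB_SUCCESS pair. A minimal sketch for listing such jobs, assuming whitespace-separated lines of the form `<timestamp> <job name> <event> ...` as above and treating JOB_SUCCESS and JOB_FAILURE as the terminal events (an assumption; `unfinished_jobs` is a hypothetical helper):

```python
# Events assumed to mark the end of a job in jobstate.log.
TERMINAL = {"JOB_SUCCESS", "JOB_FAILURE"}


def unfinished_jobs(lines):
    """Return job names whose last recorded event is not terminal.

    Only the first three whitespace-separated fields are used:
    timestamp, job name, event.
    """
    last_event = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        job, event = fields[1], fields[2]
        last_event[job] = event  # later events overwrite earlier ones
    # dicts keep first-insertion order, so jobs come out in log order
    return [job for job, event in last_event.items() if event not in TERMINAL]
```

Run over the ten lines above, it reports only the `..._CAT_4_VETO_ID000053` and `..._CAT_3_VETO_ID000052` BNSLININJ jobs.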

The monitord log shows this traceback:

[vahi@sugar H1L1-s6c_lowmass_ihope-956707143-86400.7xhgOx]$ more monitord.log
Traceback (most recent call last):
  File "/home/vahi/SOFTWARE/install/pegasus/default/bin/pegasus-monitord", line 2998, in ?
    new_dagman_out = process(workflow_entry.wf, workflow_entry.ml_buffer[0:ml_pos])
  File "/home/vahi/SOFTWARE/install/pegasus/default/bin/pegasus-monitord", line 2262, in process
    add(wf, my_jobid, my_event, condor_id=my_condor_id)
  File "/home/vahi/SOFTWARE/install/pegasus/default/bin/pegasus-monitord", line 2173, in add
    wf.update_job_state(jobid, my_job_submit_seq, event, status, my_time)
  File "/home/vahi/SOFTWARE/install/pegasus/default/bin/pegasus-monitord", line 1752, in update_job_state
    self.db_send_job_state(my_job)
  File "/home/vahi/SOFTWARE/install/pegasus/default/bin/pegasus-monitord", line 1379, in db_send_job_state
    self.output_to_db("job.state", kwargs)
  File "/home/vahi/SOFTWARE/install/pegasus/default/bin/pegasus-monitord", line 717, in output_to_db
    self._sink.send(event, kwargs)
  File "/home/vahi/SOFTWARE/install/pegasus/default/bin/pegasus-monitord", line 284, in send
    self._db.notify(d)
  File "/home/vahi/SOFTWARE/install/pegasus/default/lib/python/netlogger/analysis/modules/_base.py", line 251, in notify
    raise ProcessException(str(err))
netlogger.analysis.modules._base.ProcessException: New instance <Jobstate at 0x2b8841a5d0d0> with identity key (<class 'netlogger.analysis.schema.stampede_schema.Jobstate'>, (2798, 'EXECUTE', 1305277946.0, '2')) conflicts with persistent instance <Jobstate at 0x2b8841a5d6d0>
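The error means a second Jobstate object was added whose identity key matches a row the database session already holds. Reading the key `(2798, 'EXECUTE', 1305277946.0, '2')` as (job instance id, state, timestamp, submit sequence) is an assumption based on its shape. Because the timestamp has one-second resolution, one possible trigger is the same EXECUTE event being recorded twice for a job instance within one second; the log alone does not establish that. The sketch below mimics the check with a plain dictionary. `IdentityMap` and `IdentityConflict` are hypothetical stand-ins, not the netlogger or SQLAlchemy API:

```python
class IdentityConflict(Exception):
    """Raised when a new object reuses an identity key already held."""


class IdentityMap:
    """Toy stand-in for an ORM session's identity map (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def add(self, key, obj):
        existing = self._objects.get(key)
        # Re-adding the same object is harmless; a *different* object
        # with the same key mirrors "conflicts with persistent instance".
        if existing is not None and existing is not obj:
            raise IdentityConflict(
                f"new instance {obj!r} with identity key {key!r} "
                f"conflicts with persistent instance {existing!r}"
            )
        self._objects[key] = obj


session = IdentityMap()
key = (2798, "EXECUTE", 1305277946.0, "2")
session.add(key, {"state": "EXECUTE"})
try:
    session.add(key, {"state": "EXECUTE"})  # same event recorded twice
except IdentityConflict as err:
    print("conflict:", err)
```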

Attachments:
- jira-pm-387.bp.tar.bz2 (236 kB, 16/May/11 1:29 PM)
- ligo-jira-pm387.tar.bz2 (2.60 MB, 13/May/11 6:30 AM)