Pegasus / PM-447

monitord does not mark a job as terminated in jobstate.log

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • 3.0.3
    • Affects Version/s: master, 3.0.3
    • Component/s: Monitord
    • None

      For the SCEC run available at

      http://www.isi.edu/~vahi/work/scec/CyberShake_USC_5_dax_5.tgz

      the generated jobstate.log file is missing CONDOR_TERMINATED events for jobs.

      Here is a snapshot from jobstate.log for one such job:
      corbusier:CyberShake_USC_5_dax_5 vahi$ grep merge_scec-HighFrequency-1.0_PID3_ID12 jobstate.log
      1311820916 merge_scec-HighFrequency-1.0_PID3_ID12 SUBMIT 5437648.0 ranger - 212
      1311820963 merge_scec-HighFrequency-1.0_PID3_ID12 EXECUTE 5437648.0 ranger - 212
      1311820970 merge_scec-HighFrequency-1.0_PID3_ID12 IMAGE_SIZE 5437648.0 ranger - 212
      1311822690 merge_scec-HighFrequency-1.0_PID3_ID12 JOB_EVICTED 5437648.0 ranger - 212
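
Jobs in this state can be listed mechanically. The following is a minimal sketch, assuming every jobstate.log line starts with `timestamp jobname event` as in the snapshot above. The function names are hypothetical, and the default event name `JOB_TERMINATED` is an assumption taken from the dagman.out naming; adjust it if monitord writes a different name.

```python
from collections import defaultdict


def parse_jobstate_line(line):
    """Return (timestamp, job, event) for one jobstate.log line, or None if malformed."""
    parts = line.split()
    if len(parts) < 3 or not parts[0].isdigit():
        return None
    return int(parts[0]), parts[1], parts[2]


def jobs_missing_event(lines, terminal_events=("JOB_TERMINATED",)):
    """List jobs that appear in the log but never record a terminal event.

    terminal_events is an assumption about how monitord names the event.
    """
    events_by_job = defaultdict(list)
    for line in lines:
        parsed = parse_jobstate_line(line)
        if parsed is not None:
            _, job, event = parsed
            events_by_job[job].append(event)
    return sorted(
        job
        for job, events in events_by_job.items()
        if not any(event in terminal_events for event in events)
    )


# The four lines from the snapshot above.
snapshot = [
    "1311820916 merge_scec-HighFrequency-1.0_PID3_ID12 SUBMIT 5437648.0 ranger - 212",
    "1311820963 merge_scec-HighFrequency-1.0_PID3_ID12 EXECUTE 5437648.0 ranger - 212",
    "1311820970 merge_scec-HighFrequency-1.0_PID3_ID12 IMAGE_SIZE 5437648.0 ranger - 212",
    "1311822690 merge_scec-HighFrequency-1.0_PID3_ID12 JOB_EVICTED 5437648.0 ranger - 212",
]
print(jobs_missing_event(snapshot))
```

To run it against the whole file, pass `open("jobstate.log")` instead of the snapshot list.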

      The dagman.out file does have a JOB_TERMINATED event for the job, though:

      corbusier:CyberShake_USC_5_dax_5 vahi$ grep merge_scec-HighFrequency-1.0_PID3_ID12 CyberShake_USC_5-5.dag.dagman.out
      07/27/11 19:41:55 Submitting Condor Node merge_scec-HighFrequency-1.0_PID3_ID12 job(s)...
      07/27/11 19:41:55 submitting: condor_submit -a dag_node_name' '=' 'merge_scec-HighFrequency-1.0_PID3_ID12 -a +DAGManJobId' '=' '5431096 -a DAGManJobId' '=' '5431096 -a submit_event_notes' '=' 'DAG' 'Node:' 'merge_scec-HighFrequency-1.0_PID3_ID12 -a +DAGParentNodeNames' '=' '"create_dir_CyberShake_USC_5_5_ranger,stage_in_remote_ranger_0,merge_scec-srf2stoch-1.0_PID2_ID6" merge_scec-HighFrequency-1.0_PID3_ID12.sub
      07/27/11 19:41:56 Event: ULOG_SUBMIT for Condor Node merge_scec-HighFrequency-1.0_PID3_ID12 (5437648.0.0)
      07/27/11 19:42:43 Event: ULOG_EXECUTE for Condor Node merge_scec-HighFrequency-1.0_PID3_ID12 (5437648.0.0)
      07/27/11 19:42:50 Event: ULOG_IMAGE_SIZE for Condor Node merge_scec-HighFrequency-1.0_PID3_ID12 (5437648.0.0)
      07/27/11 20:11:30 Event: ULOG_JOB_EVICTED for Condor Node merge_scec-HighFrequency-1.0_PID3_ID12 (5437648.0.0)
      07/27/11 20:21:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 20:31:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 20:41:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 20:51:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 21:01:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 21:11:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 21:21:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 21:31:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 21:41:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 21:51:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 22:01:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 22:11:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 22:21:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 22:31:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 22:41:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 22:51:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 23:01:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 23:11:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 23:21:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 23:31:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 23:41:56 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/27/11 23:51:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 00:01:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 00:11:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 00:21:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 00:31:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 00:41:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 00:51:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 01:01:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 01:11:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 01:21:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 01:31:57 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 01:41:58 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 01:51:58 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 02:01:58 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 02:11:58 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 02:21:58 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 02:31:58 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 02:41:58 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 02:51:58 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 03:01:58 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 03:11:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 03:21:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 03:31:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 03:41:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 03:51:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 04:01:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 04:11:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 04:21:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 04:31:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 04:41:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 04:51:59 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 05:02:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 05:12:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 05:22:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 05:32:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 05:42:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 05:52:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 06:02:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 06:12:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 06:22:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 06:32:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 06:42:00 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 06:52:01 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 07:02:01 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 07:12:01 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 07:22:01 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 07:32:01 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 07:42:01 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 07:52:01 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 08:02:01 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 08:12:01 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 08:19:04 Event: ULOG_EXECUTE for Condor Node merge_scec-HighFrequency-1.0_PID3_ID12 (5437648.0.0)
      07/28/11 08:29:09 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 08:39:09 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 08:49:09 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 09:12:04 Node merge_scec-HighFrequency-1.0_PID3_ID12, Condor ID 5437648, status STATUS_SUBMITTED
      07/28/11 09:12:54 Event: ULOG_JOB_TERMINATED for Condor Node merge_scec-HighFrequency-1.0_PID3_ID12 (5437648.0.0)
      07/28/11 09:12:54 Node merge_scec-HighFrequency-1.0_PID3_ID12 job proc (5437648.0.0) completed successfully.
      07/28/11 09:12:54 Node merge_scec-HighFrequency-1.0_PID3_ID12 job completed
      07/28/11 09:12:54 Running POST script of Node merge_scec-HighFrequency-1.0_PID3_ID12...
      07/28/11 09:12:59 Event: ULOG_POST_SCRIPT_TERMINATED for Condor Node merge_scec-HighFrequency-1.0_PID3_ID12 (5437648.0.0)
      07/28/11 09:12:59 POST Script of Node merge_scec-HighFrequency-1.0_PID3_ID12 completed successfully.
      07/28/11 09:14:33 submitting: condor_submit -a dag_node_name' '=' 'merge_scec-MergeFrequency-1.0_PID4_ID6 -a +DAGManJobId' '=' '5431096 -a DAGManJobId' '=' '5431096 -a submit_event_notes' '=' 'DAG' 'Node:' 'merge_scec-MergeFrequency-1.0_PID4_ID6 -a +DAGParentNodeNames' '=' '"create_dir_CyberShake_USC_5_5_ranger,merge_scec-seismogram_synthesis-1.0_PID2_ID4,merge_scec-seismogram_synthesis-1.0_PID2_ID5,merge_scec-seismogram_synthesis-1.0_PID2_ID6,merge_scec-seismogram_synthesis-1.0_PID2_ID8,merge_scec-HighFrequency-1.0_PID3_ID12,merge_scec-HighFrequency-1.0_PID3_ID11" merge_scec-MergeFrequency-1.0_PID4_ID6.sub
      corbusier:CyberShake_USC_5_dax_5 vahi$
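
The comparison between the two files can also be scripted. This is a rough sketch, assuming that dagman.out event lines have the form `Event: ULOG_<NAME> for Condor Node <job> (...)` and that jobstate.log records the same `<NAME>` without the `ULOG_` prefix. That mapping holds for the four events in the snapshot above but is an assumption for any other event. All function names are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "Event: ULOG_EXECUTE for Condor Node some_job (5437648.0.0)"
EVENT_RE = re.compile(r"Event: ULOG_(\w+) for Condor Node (\S+) \(")


def dagman_events(lines):
    """Count (job, event) pairs reported in dagman.out lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[(match.group(2), match.group(1))] += 1
    return counts


def jobstate_events(lines):
    """Count (job, event) pairs recorded in jobstate.log lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[0].isdigit():
            counts[(parts[1], parts[2])] += 1
    return counts


def missing_in_jobstate(dagman_lines, jobstate_lines):
    """Return events seen by DAGMan more often than they appear in jobstate.log."""
    # Counter subtraction keeps only the positive differences.
    return dagman_events(dagman_lines) - jobstate_events(jobstate_lines)


if __name__ == "__main__":
    with open("CyberShake_USC_5-5.dag.dagman.out") as dagman, open("jobstate.log") as jobstate:
        missing = missing_in_jobstate(dagman, jobstate)
    for (job, event), count in sorted(missing.items()):
        print(job, event, count)
```

Counting occurrences instead of using a plain set matters here: the job is evicted and later executes a second time, so a repeated EXECUTE that never reaches jobstate.log would otherwise go unnoticed.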

            Assignee: Karan Vahi (vahi)
            Reporter: Karan Vahi (vahi)