Pegasus / PM-1374

Make monitord resilient to DAGMan logging the debug level in dagman.out


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: master, 4.9.1
    • Fix Version/s: master, 5.0.0, 4.9.2
    • Component/s: Monitord
    • Labels:
      None

      Description

      monitord's parsing of the dagman.out file breaks if DAGMan debug logging is enabled and the log level is recorded on each line of dagman.out.

      For example, in the snippet below every line carries the log level (D_ALWAYS) between the timestamp and the message:
      05/10/19 16:12:03 (D_ALWAYS) Adding a DAGMan workflow log /home/nu_vahi/tutorial/pop/submit/nu_vahi/pegasus/population/run0003/./population-0.dag.nodes.log
      05/10/19 16:12:03 (D_ALWAYS) Masking the events recorded in the DAGMAN workflow log
      05/10/19 16:12:03 (D_ALWAYS) Mask for workflow log is 0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36
      05/10/19 16:12:03 (D_ALWAYS) submitting: /usr/bin/condor_submit -a dag_node_name' '=' 'stage_in_remote_local_1_0 -a +DAGManJobId' '=' '73970 -a DAGManJobId' '=' '73970 -batch-name population-0.dag+73970 -a submit_event_notes' '=' 'DAG' 'Node:' 'stage_in_remote_local_1_0 -a dagman_log' '=' '/home/nu_vahi/tutorial/pop/submit/nu_vahi/pegasus/population/run0003/./population-0.dag.nodes.log -a +DAGManNodesMask' '=' '"0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36" -a priority=700 -a +DAGNodeRetry' '=' '0 -a DAG_STATUS' '=' '0 -a FAILED_COUNT' '=' '0 -a +KeepClaimIdle' '=' '20 -a notification' '=' 'never -a +DAGParentNodeNames' '=' '"create_dir_population_0_local" 00/00/stage_in_remote_local_1_0.sub
      05/10/19 16:12:03 (D_ALWAYS) From submit: Submitting job(s).
      05/10/19 16:12:03 (D_ALWAYS) From submit: 1 job(s) submitted to cluster 73973.
      05/10/19 16:12:03 (D_ALWAYS) From submit: WARNING: the line 'copy_to_spool = false' was unused by condor_submit. Is it a typo?
      05/10/19 16:12:03 (D_ALWAYS) assigned HTCondor ID (73973.0.0)
      05/10/19 16:12:03 (D_ALWAYS) Submitting HTCondor Node stage_in_remote_local_0_0 job(s)...
      05/10/19 16:12:03 (D_ALWAYS) Adding a DAGMan workflow log /home/nu_vahi/tutorial/pop/submit/nu_vahi/pegasus/population/run0003/./population-0.dag.nodes.log
      05/10/19 16:12:03 (D_ALWAYS) Masking the events recorded in the DAGMAN workflow log
      05/10/19 16:12:03 (D_ALWAYS) Mask for workflow log is 0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36
      05/10/19 16:12:03 (D_ALWAYS) submitting: /usr/bin/condor_submit -a dag_node_name' '=' 'stage_in_remote_local_0_0 -a +DAGManJobId' '=' '73970 -a DAGManJobId' '=' '73970 -batch-name population-0.dag+73970 -a submit_event_notes' '=' 'DAG' 'Node:' 'stage_in_remote_local_0_0 -a dagman_log' '=' '/home/nu_vahi/tutorial/pop/submit/nu_vahi/pegasus/population/run0003/./population-0.dag.nodes.log -a +DAGManNodesMask' '=' '"0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36" -a priority=700 -a +DAGNodeRetry' '=' '0 -a DAG_STATUS' '=' '0 -a FAILED_COUNT' '=' '0 -a +KeepClaimIdle' '=' '20 -a notification' '=' 'never -a +DAGParentNodeNames' '=' '"create_dir_population_0_local" 00/00/stage_in_remote_local_0_0.sub
      05/10/19 16:12:03 (D_ALWAYS) From submit: Submitting job(s).
      05/10/19 16:12:03 (D_ALWAYS) From submit: 1 job(s) submitted to cluster 73974.
      05/10/19 16:12:03 (D_ALWAYS) From submit: WARNING: the line 'copy_to_spool = false' was unused by condor_submit. Is it a typo?
      05/10/19 16:12:03 (D_ALWAYS) assigned HTCondor ID (73974.0.0)
      05/10/19 16:12:03 (D_ALWAYS) Submitting HTCondor Node stage_in_local_local_0_0 job(s)...
      05/10/19 16:12:03 (D_ALWAYS) Adding a DAGMan workflow log /home/nu_vahi/tutorial/pop/submit/nu_vahi/pegasus/pop


      As a result, the invocation and job instance tables are not populated.
      The parsing regexes should be updated to ignore the log level when it is present.
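A minimal sketch of the fix described above, in Python, assuming each dagman.out line is first split into timestamp, optional debug level, and message, and that event-specific patterns are then applied to the message only. All names here (LINE_RE, CONDOR_ID_RE, SUBMIT_NODE_RE, split_line, parse_condor_id, parse_submitted_node) are hypothetical and do not reproduce monitord's actual code. The exact set of level tags DAGMan can emit is also an assumption, so the level pattern is deliberately loose.

```python
import re

# Hypothetical patterns for illustration only; monitord's real regexes
# may be structured differently.

# HTCondor-style timestamp, e.g. "05/10/19 16:12:03" (month may be 1 or 2 digits).
TIMESTAMP = r"(?P<ts>\d{1,2}/\d{1,2}/\d{2,4} \d{1,2}:\d{2}:\d{2})"

# Optional debug-level tag, e.g. "(D_ALWAYS)". The trailing "?" makes the
# whole group optional, so lines with and without the tag both match.
# Assumption: other tags look like "D_..." possibly with ":" or "|" qualifiers.
LEVEL = r"(?:\((?P<level>D_[A-Z0-9_:|]+)\)\s+)?"

LINE_RE = re.compile(r"^" + TIMESTAMP + r"\s+" + LEVEL + r"(?P<msg>.*)$")

# Event patterns are matched against the message only, never the raw line.
CONDOR_ID_RE = re.compile(r"assigned HTCondor ID \((\d+)\.(\d+)\.(\d+)\)")
SUBMIT_NODE_RE = re.compile(r"Submitting HTCondor Node (\S+) job\(s\)")


def split_line(line):
    """Return (timestamp, level, message), or None if the line has no timestamp.

    level is None when DAGMan did not log a debug level on this line.
    """
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return m.group("ts"), m.group("level"), m.group("msg")


def parse_condor_id(line):
    """Return the HTCondor ID as a (cluster, proc, subproc) tuple of ints, or None."""
    parts = split_line(line)
    if parts is None:
        return None
    m = CONDOR_ID_RE.search(parts[2])
    return tuple(int(g) for g in m.groups()) if m else None


def parse_submitted_node(line):
    """Return the DAG node name from a 'Submitting HTCondor Node ...' line, or None."""
    parts = split_line(line)
    if parts is None:
        return None
    m = SUBMIT_NODE_RE.search(parts[2])
    return m.group(1) if m else None


if __name__ == "__main__":
    samples = [
        "05/10/19 16:12:03 (D_ALWAYS) assigned HTCondor ID (73973.0.0)",
        "05/10/19 16:12:03 assigned HTCondor ID (73973.0.0)",
        "05/10/19 16:12:03 (D_ALWAYS) Submitting HTCondor Node stage_in_remote_local_0_0 job(s)...",
    ]
    for s in samples:
        # Prints the level (or None), the parsed HTCondor ID, and the node name.
        print(split_line(s)[1], parse_condor_id(s), parse_submitted_node(s))
```

Stripping the optional level in one place means the event-specific patterns stay the same whether or not DAGMan logs the debug level, so older dagman.out files without the tag keep parsing as before.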


            People

            • Assignee:
              vahi Karan Vahi
              Reporter:
              vahi Karan Vahi
            • Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: