Pegasus / PM-1835

Upcoming changes to DAGMan output logging


    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 5.1.0, 5.0.2
    • Affects Version/s: master, 5.0.1
    • Component/s: Monitord
    • Labels: None

      We're about to release a big change in DAGMan that will affect the
      output written to the .dagman.out file. Currently, when DAGMan submits
      a job, it forks a new condor_submit process to do the submission, and
      that process's output is logged. In our new approach, DAGMan will
      submit the job directly to the condor_schedd without forking a new
      process. This will result in different log messages.

      Currently with the condor_submit forked process, the output looks
      something like this:

      01/03/22 10:26:28 submitting:
      /nobackup/condor/release_dir/bin/condor_submit -a dag_node_name' '='
      'SleepA -a +DAGManJobId' '=' '2331 -a DAGManJobId' '=' '2331
      -batch-name sleep.dag+2331 -batch-id 2331.0 -a submit_event_notes' '='
      'DAG' 'Node:' 'SleepA -a dagman_log' '='
      '/scratch/condor/sleep/./sleep.dag.nodes.log -a +DAGManNodesMask' '='
      '"0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36" -a JOB=SleepA -a
      DAG_STATUS' '=' '0 -a FAILED_COUNT' '=' '0 -a +KeepClaimIdle' '=' '20
      -a notification' '=' 'never -a +DAGParentNodeNames' '=' '"" sleep.sub
      01/03/22 10:26:28 From submit: Submitting job(s).
      01/03/22 10:26:28 From submit: 1 job(s) submitted to cluster 2332.
      01/03/22 10:26:28 From submit: WARNING: the line 'JOB = SleepA' was
      unused by condor_submit. Is it a typo?
      01/03/22 10:26:28 assigned HTCondor ID (2332.0.0)
      01/03/22 10:26:28 Just submitted 1 job this cycle...

      With the new direct submission method, the output will look like:

      01/03/22 11:58:05 Submitting node SleepA from file sleep.sub using
      direct job submission
      01/03/22 11:58:05 Submit warning: Submit:0:the line 'RETRY = 0' was
      unused by DAGMAN. Is it a typo?

      Submit:0:the line 'JOB = SleepA' was unused by DAGMAN. Is it a typo?
      01/03/22 11:58:05 assigned HTCondor ID (2334.0.0)
      01/03/22 11:58:05 Just submitted 1 job this cycle...
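
      For a log consumer such as monitord, the difference between the two
      samples above can be sketched as a small line classifier. This is only
      an illustration, not the actual monitord code: the function name
      classify and the three pattern names are hypothetical, and it assumes
      each log entry sits on one physical line in .dagman.out (the samples
      above are wrapped for display).

```python
import re

# Timestamp prefix shared by every entry in the samples (MM/DD/YY HH:MM:SS).
TS = r"(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "

# Old format: DAGMan logs the forked condor_submit command line; the node
# name appears as the value of the dag_node_name attribute.
FORKED_SUBMIT = re.compile(
    TS + r"submitting: .*?dag_node_name' '=' '(?P<node>\S+)"
)

# New format: a single line naming the node and the submit file.
DIRECT_SUBMIT = re.compile(
    TS + r"Submitting node (?P<node>\S+) from file (?P<file>\S+) "
    r"using direct job submission"
)

# This line is identical in both samples.
ASSIGNED_ID = re.compile(
    TS + r"assigned HTCondor ID \((?P<cluster>\d+)\.(?P<proc>\d+)\.(?P<sub>\d+)\)"
)

# Hypothetical names for the three kinds of line we care about.
PATTERNS = (
    ("forked_submit", FORKED_SUBMIT),
    ("direct_submit", DIRECT_SUBMIT),
    ("assigned_id", ASSIGNED_ID),
)


def classify(line):
    """Return (kind, fields) for a recognized line, or None otherwise."""
    for kind, pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            return kind, match.groupdict()
    return None
```

      A parser written this way picks up the node name from either format,
      while the "assigned HTCondor ID" line can be handled the same way as
      before. Lines such as "From submit: ..." are not matched and return
      None.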

      This change is not due to go live until our 9.6.0 release at the
      earliest, which is scheduled for late February. You can also easily
      revert to the old forked condor_submit approach by setting a
      configuration knob.

      I hope this isn't going to be a problem on your end. Please let me
      know if you have any questions or concerns.

      Mark

            Assignee: Karan Vahi
            Reporter: Karan Vahi