Project: Pegasus

+DAGNodeRetry in attrib=value assignment breaks on HTCondor 10.0.x when direct submission is disabled


      ames reported a Pegasus/HTCondor interaction issue on Slack. I am
      moving it to email so that the right people can join the
      discussion. The issue is described below.

      As far as I know, we on the Pegasus team have not seen this before,
      even though Pegasus has been adding that attribute since 2007. Could
      somebody add the HTCondor and Pegasus versions to this thread, and
      let us know whether any configuration changes were recently made to
      the access point?

      The problem is that condor_submit is complaining about an extra
      attribute added by Pegasus:

      05/02/23 11:18:07 From submit: condor_submit: invalid attribute name
      '+DAGNodeRetry' for attrib=value assigment
      05/02/23 11:18:07 failed while reading from pipe.

      In the dagman.out:

      05/02/23 11:18:38 submit command was: /usr/bin/condor_submit -a
      dag_node_name=create_dir_o3_sbbh2_0p985.dax_0_local -a
      My.DAGManJobId=68812570 -a DAGManJobId=68812570 -batch-name
      o3_sbbh2_0p985.dax-0.dag+68812570 -batch-id 68812570.0 -a
      submit_event_notes' '=' 'DAG' 'Node:'
      'create_dir_o3_sbbh2_0p985.dax_0_local -a
      dagman_log=/home/praveen.kumar/focused_search/pipeline/hlv/sbbh2_0p985/a1_2/output/pycbc-tmp_i_a0uf2l/work/./o3_sbbh2_0p985.dax-0.dag.nodes.log
      -a My.DAGManNodesMask="0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36"
      priority=800 JOB=create_dir_o3_sbbh2_0p985.dax_0_local +DAGNodeRetry=0
      DAG_STATUS=0 FAILED_COUNT=0 My.KeepClaimIdle=20 -a notification=never
      My.DAGParentNodeNames="" ./create_dir_o3_sbbh2_0p985.dax_0_local.sub

      And in the DAG itself:

      VARS create_dir_o3_sbbh2_0p985.dax_0_local +DAGNodeRetry="$(RETRY)"
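A minimal sketch of one possible workaround, not the fix adopted by Pegasus. It assumes (this is not confirmed in the report) that condor_submit accepts the `My.` prefix for this attribute in `-a` assignments, as it does for `My.DAGManJobId` and `My.KeepClaimIdle` in the submit command above. The helper names `plus_to_my` and `rewrite_dag` are hypothetical. The sketch rewrites `+Attr=` macro names on DAG VARS lines to `My.Attr=`.

```python
import re

# Matches a "+Name=" macro name preceded by whitespace, e.g. ' +DAGNodeRetry='.
# The lookbehind keeps the preceding space in place during substitution.
_PLUS_ATTR = re.compile(r'(?<=\s)\+([A-Za-z_][A-Za-z0-9_]*)=')


def plus_to_my(line: str) -> str:
    """Rewrite '+Attr=' to 'My.Attr=' on a VARS line; leave other lines alone.

    Hypothetical helper. Assumes the '+' form never appears inside a quoted
    value preceded by whitespace, which holds for the example DAG line.
    """
    # Compare the keyword case-insensitively to be safe.
    if not line.lstrip().upper().startswith("VARS "):
        return line
    return _PLUS_ATTR.sub(r'My.\1=', line)


def rewrite_dag(text: str) -> str:
    """Apply plus_to_my to every line of a DAG file's contents."""
    return "\n".join(plus_to_my(line) for line in text.split("\n"))
```

Applied to the VARS line above, this yields `VARS create_dir_o3_sbbh2_0p985.dax_0_local My.DAGNodeRetry="$(RETRY)"`. JOB and other lines pass through unchanged.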

            Assignee:
            Karan Vahi (vahi)
            Reporter:
            Mats Rynge (rynge)
