Pegasus / PM-553

DAG jobs in the DAX are launched via pegasus-dagman instead of condor_dagman


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: master, 3.1
    • Fix Version/s: master, 3.1.1, 4.0
    • Component/s: CLI: pegasus-run
    • Labels:
      None

      Description

      If a DAX references a DAG job, Pegasus creates the .dag file with a SUBDAG EXTERNAL entry that points to the user-specified DAG to be run as the sub-DAG.

      This sub-DAG should be launched via condor_dagman, since pegasus-dagman only launches the top-level DAG.
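      For reference, a DAGMan entry of this kind has the form SUBDAG EXTERNAL <JobName> <DagFileName>, optionally followed by DIR <directory>. The Python sketch below emits such a line; subdag_external_line and the example names are hypothetical illustrations, not Pegasus planner code, and the sketch assumes names contain no whitespace.

```python
from typing import Optional


def subdag_external_line(job_name: str, dag_file: str,
                         directory: Optional[str] = None) -> str:
    """Build a DAGMan 'SUBDAG EXTERNAL' entry for a user-supplied DAG file.

    Hypothetical helper for illustration only; not Pegasus planner code.
    """
    # DAG file lines are whitespace-separated tokens, so this sketch
    # rejects empty names and names containing whitespace.
    for value in (job_name, dag_file, directory):
        if value is not None and (not value or any(c.isspace() for c in value)):
            raise ValueError(f"invalid DAG token: {value!r}")
    parts = ["SUBDAG", "EXTERNAL", job_name, dag_file]
    if directory is not None:
        # Optional DIR clause: working directory for the sub-DAG node.
        parts += ["DIR", directory]
    return " ".join(parts)
```

      For example, subdag_external_line("ihope_subdag", "ihope.dag") returns SUBDAG EXTERNAL ihope_subdag ihope.dag.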

      However, because of the way the condor_submit file for the top-level DAG is created, condor_dagman launches the sub-DAG jobs using pegasus-dagman.

      For example:


      [vahi@sugar H1L1-s6c_lowmass_ihope-953078343-36000.NGvjWW]$ cat s6c_lowmass_ihope-0.dag.condor.sub
      # Filename: s6c_lowmass_ihope-0.dag.condor.sub
      # Generated by condor_submit_dag s6c_lowmass_ihope-0.dag
      universe = scheduler
      executable = /opt/pegasus/3.1cvs/bin/pegasus-dagman
      getenv = True
      output = s6c_lowmass_ihope-0.dag.lib.out
      error = s6c_lowmass_ihope-0.dag.lib.err
      log = s6c_lowmass_ihope-0.dag.dagman.log
      remove_kill_sig = SIGUSR1
      +OtherJobRemoveRequirements = "DAGManJobId == $(cluster)"
      # Note: default on_exit_remove expression:
      # ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
      # attempts to ensure that DAGMan is automatically
      # requeued by the schedd if it exits abnormally or
      # is killed (e.g., during a reboot).
      on_exit_remove = ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
      copy_to_spool = False
      arguments = "-f -l . -Lockfile s6c_lowmass_ihope-0.dag.lock -AutoRescue 1 -DoRescueFrom 0 -Dag s6c_lowmass_ihope-0.dag -MaxPre 20 -MaxPost 20 -CsdVersion $CondorVersion:' '7.6.4' 'Oct' '20' '2011' 'BuildID:' '379441' '$ -Notification NEVER -Dagman /opt/pegasus/3.1cvs/bin/pegasus-dagman"
      environment = _CONDOR_DAGMAN_LOG=s6c_lowmass_ihope-0.dag.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
      notification = NEVER
      +pegasus_wf_uuid="2a65eeaa-b38e-4aa9-8fb3-3514b8f15fdf"
      +pegasus_root_wf_uuid="2a65eeaa-b38e-4aa9-8fb3-3514b8f15fdf"
      +pegasus_wf_name="s6c_lowmass_ihope-0"
      +pegasus_wf_time="20120207T203925-0500"
      +pegasus_version="3.1.1cvs"
      +pegasus_job_class=11
      +pegasus_cluster_size=1
      +pegasus_site="local"
      +pegasus_wf_xformation="pegasus::dagman"
      queue

      Note the -Dagman option in the arguments: it refers to pegasus-dagman, but it should refer to condor_dagman.
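      A quick way to spot the problem in a generated submit file is to pull the -Dagman value out of the arguments line and check whether it names condor_dagman. The Python sketch below does that; the function names are hypothetical, and it assumes the single-line arguments = "..." form shown above and a -Dagman path without whitespace. It is a diagnostic only, not the fix that went into Pegasus.

```python
import os
import re
from typing import Optional

# Matches the single-line 'arguments = "..."' entry of a DAGMan submit file
# (submit-file keywords are case-insensitive).
ARGS_RE = re.compile(r'^\s*arguments\s*=\s*"(.*)"\s*$', re.IGNORECASE)
# Captures the token after -Dagman; assumes the path has no whitespace.
DAGMAN_RE = re.compile(r'(?:^|\s)-Dagman\s+(\S+)')


def dagman_path(submit_text: str) -> Optional[str]:
    """Return the executable passed via -Dagman, or None if there is none."""
    for line in submit_text.splitlines():
        args = ARGS_RE.match(line)
        if args:
            found = DAGMAN_RE.search(args.group(1))
            return found.group(1) if found else None
    return None


def is_condor_dagman(path: str) -> bool:
    """True if the path names the condor_dagman binary."""
    return os.path.basename(path) == "condor_dagman"
```

      Applied to the submit file above, dagman_path returns /opt/pegasus/3.1cvs/bin/pegasus-dagman, and is_condor_dagman on that path is False.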

            People

            • Assignee:
              gmehta Gaurang Mehta (Inactive)
              Reporter:
              vahi Karan Vahi
            • Watchers:
              5