Pegasus / PM-553

dag jobs in the DAX are launched via pegasus-dagman instead of condor_dagman

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 3.1.1, 4.0
    • Affects Version/s: master, 3.1
    • Component/s: CLI: pegasus-run
    • Labels: None

      If a DAX refers to a dag job, Pegasus creates the DAG file with a SUBDAG EXTERNAL entry that points to the user-specified DAG to be run as the subdag.
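      To see which subdags a generated DAG file refers to, something like the following can be used. This is only a minimal sketch: the function name subdag_entries is made up for illustration, and it assumes each entry sits on one line in the form "SUBDAG EXTERNAL <job name> <dag file>", matched case-insensitively.

```python
def subdag_entries(dag_text):
    """Return (job_name, dag_file) pairs for SUBDAG EXTERNAL lines.

    Hypothetical helper, not part of Pegasus or HTCondor.
    """
    entries = []
    for line in dag_text.splitlines():
        tokens = line.split()
        # Expect: SUBDAG EXTERNAL <job name> <dag file> [further options]
        if (len(tokens) >= 4
                and tokens[0].upper() == "SUBDAG"
                and tokens[1].upper() == "EXTERNAL"):
            entries.append((tokens[2], tokens[3]))
    return entries
```

      Any options after the DAG file name are ignored by this sketch.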

      This subdag should be launched via condor_dagman, because pegasus-dagman is only meant to launch the top-level DAG.

      However, because of the way the condor_submit file for the top-level DAG is created, condor_dagman launches the subdag jobs using pegasus-dagman.

      For example:

      [vahi@sugar H1L1-s6c_lowmass_ihope-953078343-36000.NGvjWW]$ cat s6c_lowmass_ihope-0.dag.condor.sub

        # Filename: s6c_lowmass_ihope-0.dag.condor.sub
        # Generated by condor_submit_dag s6c_lowmass_ihope-0.dag
        universe = scheduler
        executable = /opt/pegasus/3.1cvs/bin/pegasus-dagman
        getenv = True
        output = s6c_lowmass_ihope-0.dag.lib.out
        error = s6c_lowmass_ihope-0.dag.lib.err
        log = s6c_lowmass_ihope-0.dag.dagman.log
        remove_kill_sig = SIGUSR1
        +OtherJobRemoveRequirements = "DAGManJobId == $(cluster)"
        # Note: default on_exit_remove expression:
        # ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
        # attempts to ensure that DAGMan is automatically
        # requeued by the schedd if it exits abnormally or
        # is killed (e.g., during a reboot).
        on_exit_remove = ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
        copy_to_spool = False
        arguments = "-f -l . -Lockfile s6c_lowmass_ihope-0.dag.lock -AutoRescue 1 -DoRescueFrom 0 -Dag s6c_lowmass_ihope-0.dag -MaxPre 20 -MaxPost 20 -CsdVersion $CondorVersion:' '7.6.4' 'Oct' '20' '2011' 'BuildID:' '379441' '$ -Notification NEVER -Dagman /opt/pegasus/3.1cvs/bin/pegasus-dagman"
        environment = _CONDOR_DAGMAN_LOG=s6c_lowmass_ihope-0.dag.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
        notification = NEVER
        +pegasus_wf_uuid="2a65eeaa-b38e-4aa9-8fb3-3514b8f15fdf"
        +pegasus_root_wf_uuid="2a65eeaa-b38e-4aa9-8fb3-3514b8f15fdf"
        +pegasus_wf_name="s6c_lowmass_ihope-0"
        +pegasus_wf_time="20120207T203925-0500"
        +pegasus_version="3.1.1cvs"
        +pegasus_job_class=11
        +pegasus_cluster_size=1
        +pegasus_site="local"
        +pegasus_wf_xformation="pegasus::dagman"
        queue

      Note the -Dagman option in the arguments: it refers to pegasus-dagman, but it should refer to condor_dagman.
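
      A quick way to check a generated .dag.condor.sub file for this problem is sketched below. It is an illustration only: the helper names dagman_binary and uses_condor_dagman are hypothetical, and the sketch assumes the arguments line is a single double-quoted, whitespace-separated string like the one in the listing above.

```python
import os


def dagman_binary(submit_text):
    """Return the path that follows -Dagman on the 'arguments' line, or None."""
    for line in submit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "arguments":
            # Drop the surrounding double quotes, then split on whitespace.
            tokens = value.strip().strip('"').split()
            if "-Dagman" in tokens:
                idx = tokens.index("-Dagman")
                if idx + 1 < len(tokens):
                    return tokens[idx + 1]
    return None


def uses_condor_dagman(submit_text):
    """True if -Dagman is present and names a binary called condor_dagman."""
    path = dagman_binary(submit_text)
    return path is not None and os.path.basename(path) == "condor_dagman"
```

      For the submit file shown above, dagman_binary would return /opt/pegasus/3.1cvs/bin/pegasus-dagman, so uses_condor_dagman would be False.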

            Assignee: gmehta Gaurang Mehta (Inactive)
            Reporter: vahi Karan Vahi