- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version/s: master, 3.1
- Component/s: CLI: pegasus-run
- None
If a DAX refers to a DAG job, Pegasus creates the DAG file with the SUBDAG EXTERNAL keyword, which points to the user-specified DAG to be run as the subdag.
This subdag should be launched via condor_dagman, since pegasus-dagman is only meant to launch the top-level DAG.
However, because of the way the condor_submit file for the top-level DAG is created, condor_dagman launches the subdag jobs using pegasus-dagman.
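To see which subdags a generated DAG file hands off through SUBDAG EXTERNAL, the DAG file can be scanned for those lines. The following is a minimal sketch, not part of Pegasus: the function name `find_subdags` and the sample DAG content (job names and paths) are hypothetical and only illustrate the keyword described above.

```python
import re

# Matches lines such as: SUBDAG EXTERNAL <job-name> <dag-file> [DIR <dir>]
SUBDAG_RE = re.compile(
    r"^\s*SUBDAG\s+EXTERNAL\s+(?P<job>\S+)\s+(?P<dag>\S+)"
    r"(?:\s+DIR\s+(?P<dir>\S+))?",
    re.IGNORECASE,
)


def find_subdags(dag_text):
    """Return (job name, dag file, directory or None) per SUBDAG EXTERNAL line."""
    found = []
    for line in dag_text.splitlines():
        m = SUBDAG_RE.match(line)
        if m:
            found.append((m.group("job"), m.group("dag"), m.group("dir")))
    return found


# Hypothetical DAG content, for illustration only.
sample_dag = """\
JOB create_dir create_dir.sub
SUBDAG EXTERNAL inner_dag /path/to/user/inner.dag DIR /path/to/user
PARENT create_dir CHILD inner_dag
"""

subdags = find_subdags(sample_dag)
print(subdags)
```

Each tuple returned identifies a subdag that DAGMan will submit itself, which is where the choice of DAGMan executable matters.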
For example, here is the generated submit file for the top-level DAG:
[vahi@sugar H1L1-s6c_lowmass_ihope-953078343-36000.NGvjWW]$ cat s6c_lowmass_ihope-0.dag.condor.sub
# Filename: s6c_lowmass_ihope-0.dag.condor.sub
# Generated by condor_submit_dag s6c_lowmass_ihope-0.dag
universe = scheduler
executable = /opt/pegasus/3.1cvs/bin/pegasus-dagman
getenv = True
output = s6c_lowmass_ihope-0.dag.lib.out
error = s6c_lowmass_ihope-0.dag.lib.err
log = s6c_lowmass_ihope-0.dag.dagman.log
remove_kill_sig = SIGUSR1
+OtherJobRemoveRequirements = "DAGManJobId == $(cluster)"
# Note: default on_exit_remove expression:
# ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
# attempts to ensure that DAGMan is automatically
# requeued by the schedd if it exits abnormally or
# is killed (e.g., during a reboot).
on_exit_remove = ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
copy_to_spool = False
arguments = "-f -l . -Lockfile s6c_lowmass_ihope-0.dag.lock -AutoRescue 1 -DoRescueFrom 0 -Dag s6c_lowmass_ihope-0.dag -MaxPre 20 -MaxPost 20 -CsdVersion $CondorVersion:' '7.6.4' 'Oct' '20' '2011' 'BuildID:' '379441' '$ -Notification NEVER -Dagman /opt/pegasus/3.1cvs/bin/pegasus-dagman"
environment = _CONDOR_DAGMAN_LOG=s6c_lowmass_ihope-0.dag.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
notification = NEVER
+pegasus_wf_uuid="2a65eeaa-b38e-4aa9-8fb3-3514b8f15fdf"
+pegasus_root_wf_uuid="2a65eeaa-b38e-4aa9-8fb3-3514b8f15fdf"
+pegasus_wf_name="s6c_lowmass_ihope-0"
+pegasus_wf_time="20120207T203925-0500"
+pegasus_version="3.1.1cvs"
+pegasus_job_class=11
+pegasus_cluster_size=1
+pegasus_site="local"
+pegasus_wf_xformation="pegasus::dagman"
queue
Note the -Dagman option in the arguments line. It refers to pegasus-dagman, but it should refer to condor_dagman.
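A quick way to check a generated submit file for this problem is to pull the -Dagman value out of the arguments line and look at the executable name. This is only a sketch under stated assumptions: the helper `dagman_path` is hypothetical, and the sample text is an abridged copy of the submit file above, not the full file.

```python
import os
import re


def dagman_path(submit_text):
    """Return the value passed to -Dagman in the 'arguments' line, or None."""
    for line in submit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "arguments":
            # "-Dagman" followed by whitespace; will not match "-Dag <file>".
            m = re.search(r"-Dagman\s+([^\s\"]+)", value)
            return m.group(1) if m else None
    return None


# Abridged from the submit file in this report (most arguments omitted).
submit_text = '''universe = scheduler
executable = /opt/pegasus/3.1cvs/bin/pegasus-dagman
arguments = "-f -l . -Dag s6c_lowmass_ihope-0.dag -Dagman /opt/pegasus/3.1cvs/bin/pegasus-dagman"
queue
'''

path = dagman_path(submit_text)
is_condor = os.path.basename(path) == "condor_dagman"
print(path, is_condor)
```

For the submit file in this report the check comes back false, since -Dagman names pegasus-dagman rather than condor_dagman.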