- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version/s: 4.6.0
- Component/s: statistics visualization and debugging tools
When DAGMan cannot submit a job to Condor, the workflow fails, but pegasus-analyzer reports nothing useful about the cause:
$ pegasus-analyzer .
***********************************Summary************************************
Submit Directory : .
Total jobs : 13 (100.00%)
- jobs succeeded : 3 (23.08%)
- jobs failed : 1 (7.69%)
- jobs unsubmitted : 9 (69.23%)
*****************************Failed jobs' details*****************************
==============================preprocess_ID0000001==============================
last state: -
site: -
submit file: preprocess_ID0000001.sub
output file: -
error file: -
------------------------------Task #1 - Summary-------------------------------
site : -
hostname : -
executable : /tmp/tutorial/submit/juve/pegasus/diamond/run0001/preprocess_ID0000001.sh
arguments : -
exitcode : -1
working dir : -
In this case the cause is recorded in the dagman.out file, where condor_submit reports a missing directory:
02/18/16 08:03:00 Submitting Condor Node preprocess_ID0000001 job(s)...
02/18/16 08:03:00 Adding a DAGMan workflow log /private/tmp/tutorial/submit/juve/pegasus/diamond/run0001/./diamond-0.dag.nodes.log
02/18/16 08:03:00 Masking the events recorded in the DAGMAN workflow log
02/18/16 08:03:00 Mask for workflow log is 0,1,2,4,5,7,9,10,11,12,13,16,17,24,27
02/18/16 08:03:00 submitting: /usr/local/bin/condor_submit -a dag_node_name' '=' 'preprocess_ID0000001 -a +DAGManJobId' '=' '46 -a DAGManJobId' '=' '46 -a submit_event_notes' '=' 'DAG' 'Node:' 'preprocess_ID0000001 -a dagman_log' '=' '/private/tmp/tutorial/submit/juve/pegasus/diamond/run0001/./diamond-0.dag.nodes.log -a +DAGManNodesMask' '=' '"0,1,2,4,5,7,9,10,11,12,13,16,17,24,27" -a DAG_STATUS' '=' '0 -a FAILED_COUNT' '=' '0 -a +DAGParentNodeNames' '=' '"stage_in_remote_local_0_0" -a +KeepClaimIdle' '=' '20 -a notification' '=' 'never preprocess_ID0000001.sub
02/18/16 08:03:00 From submit: Submitting job(s)
02/18/16 08:03:00 From submit: ERROR: No such directory: /tmp/tutorial/juve/pegasus/diamond/run0001
02/18/16 08:03:00 failed while reading from pipe.
02/18/16 08:03:00 Read so far: Submitting job(s)ERROR: No such directory: /tmp/tutorial/juve/pegasus/diamond/run0001
02/18/16 08:03:00 ERROR: submit attempt failed
02/18/16 08:03:00 submit command was: /usr/local/bin/condor_submit -a dag_node_name' '=' 'preprocess_ID0000001 -a +DAGManJobId' '=' '46 -a DAGManJobId' '=' '46 -a submit_event_notes' '=' 'DAG' 'Node:' 'preprocess_ID0000001 -a dagman_log' '=' '/private/tmp/tutorial/submit/juve/pegasus/diamond/run0001/./diamond-0.dag.nodes.log -a +DAGManNodesMask' '=' '"0,1,2,4,5,7,9,10,11,12,13,16,17,24,27" -a DAG_STATUS' '=' '0 -a FAILED_COUNT' '=' '0 -a +DAGParentNodeNames' '=' '"stage_in_remote_local_0_0" -a +KeepClaimIdle' '=' '20 -a notification' '=' 'never preprocess_ID0000001.sub
02/18/16 08:03:00 Job submit try 1/6 failed, will try again in >= 1 second.
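The log excerpt above suggests one way the missing information could be recovered: scan dagman.out for failed submit attempts and collect the "From submit: ERROR" lines that precede them. The following is only a rough sketch of that idea, not the actual pegasus-analyzer fix. The function name `find_submit_failures` and the regular expressions are hypothetical, and the line patterns are assumed from the excerpt in this report rather than from any DAGMan specification.

```python
import re

# Assumed line layout, taken from the excerpt above: "MM/DD/YY HH:MM:SS <message>".
LINE_RE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (.*)$")
# Message announcing which node DAGMan is about to submit.
NODE_RE = re.compile(r"^Submitting (?:HT)?Condor Node (\S+) job\(s\)")
# Error output of condor_submit as relayed by DAGMan.
SUBMIT_ERR_RE = re.compile(r"^From submit: (ERROR.*)$")


def find_submit_failures(lines):
    """Map each node name to the distinct submit errors logged for it.

    `lines` is any iterable of dagman.out lines (a list or an open file).
    Hypothetical helper for illustration only.
    """
    failures = {}
    current_node = None
    pending = []  # errors seen since the last "Submitting ... Node" line
    for raw in lines:
        match = LINE_RE.match(raw.rstrip("\n"))
        if not match:
            continue
        msg = match.group(1)

        node = NODE_RE.match(msg)
        if node:
            current_node = node.group(1)
            pending = []
            continue

        err = SUBMIT_ERR_RE.match(msg)
        if err:
            pending.append(err.group(1))
            continue

        # Only attribute the errors once DAGMan confirms the attempt failed.
        if msg.startswith("ERROR: submit attempt failed") and current_node:
            errors = failures.setdefault(current_node, [])
            for e in pending:
                if e not in errors:  # retries repeat the same message
                    errors.append(e)
            pending = []
    return failures
```

Passing an open dagman.out file to `find_submit_failures` would return, for the log in this report, a single entry for preprocess_ID0000001 containing the "No such directory" error. Deduplicating the messages keeps the result short when DAGMan retries the same submit several times.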