- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version/s: master, 4.9.1
- Component/s: Pegasus Planner
We need to run pycbc_inference jobs using Condor I/O and without kickstart so that we can use Condor's native vanilla universe checkpointing. Pegasus' checkpointing mechanism causes too much badput with these jobs.
However, turning on NoGridStart disables the transfer of input and output files. Here is the .sub file for a job planned without NoGridStart:
[dbrown@sugwg-condor condor-checkpoint-sig]$ grep transfer test_condorio-main_ID0000001/testjob_j1.sub
should_transfer_files = YES
transfer_executable = true
transfer_input_files = /home/dbrown/projects/osg/condor-checkpoint-sig/my.input,/usr/share/pegasus/sh/pegasus-lite-common.sh,/home/dbrown/projects/osg/condor-checkpoint-sig/./test_condorio-main_ID0000001.000/pegasus-worker-4.9.1dev-x86_64_rhel_7.tar.gz
transfer_output_files = my.output,my.checkpoint,wrapper.log,wrapper.checkpoint,
when_to_transfer_output = ON_EXIT_OR_EVICT
When I add
profile pegasus "gridstart" "NoGridStart"
to the transformation catalog, the planner correctly generates only a .sub file and not a .sh file, but transfer_input_files and transfer_output_files are missing from the .sub file:
[dbrown@sugwg-condor condor-checkpoint-sig]$ grep transfer test_condorio-main_ID0000001/testjob_j1.sub
should_transfer_files = YES
transfer_executable = false
when_to_transfer_output = ON_EXIT_OR_EVICT
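For anyone who wants to check other generated .sub files for the same symptom, here is a minimal sketch of a checker. It is not part of Pegasus; the function names (`parse_submit`, `missing_transfer_keys`) are hypothetical, and it assumes the submit file uses plain `key = value` lines as in the two greps above, which are embedded as sample input.

```python
# Hypothetical helper, not part of Pegasus or HTCondor: reports which
# file-transfer keys are absent from a submit description.
REQUIRED_KEYS = ("transfer_input_files", "transfer_output_files")


def parse_submit(text):
    """Return a dict of key -> value for the 'key = value' lines in text."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment (e.g. "queue").
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Submit command names are case-insensitive, so normalise the key.
        entries[key.strip().lower()] = value.strip()
    return entries


def missing_transfer_keys(text):
    """List the required transfer keys that do not appear in text."""
    entries = parse_submit(text)
    return [key for key in REQUIRED_KEYS if key not in entries]


# Sample input copied from the grep output of the job planned without NoGridStart.
WITH_GRIDSTART = """\
should_transfer_files = YES
transfer_executable = true
transfer_input_files = /home/dbrown/projects/osg/condor-checkpoint-sig/my.input,/usr/share/pegasus/sh/pegasus-lite-common.sh,/home/dbrown/projects/osg/condor-checkpoint-sig/./test_condorio-main_ID0000001.000/pegasus-worker-4.9.1dev-x86_64_rhel_7.tar.gz
transfer_output_files = my.output,my.checkpoint,wrapper.log,wrapper.checkpoint,
when_to_transfer_output = ON_EXIT_OR_EVICT
"""

# Sample input copied from the grep output of the job planned with NoGridStart.
NO_GRIDSTART = """\
should_transfer_files = YES
transfer_executable = false
when_to_transfer_output = ON_EXIT_OR_EVICT
"""

if __name__ == "__main__":
    print(missing_transfer_keys(WITH_GRIDSTART))  # []
    print(missing_transfer_keys(NO_GRIDSTART))    # ['transfer_input_files', 'transfer_output_files']
```

To run it against a real file, pass the file contents instead of the embedded samples, e.g. `missing_transfer_keys(open("test_condorio-main_ID0000001/testjob_j1.sub").read())`.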
I couldn't find an obvious fix in the planner code, because the relevant logic is split between the SLS and GridStart classes, but Karan may know an easy one.