- Type: Bug
- Resolution: Won't Fix
- Priority: Major
- Affects Version/s: master, 4.9.0
- Component/s: Pegasus Planner
Pegasus job checkpointing is incompatible with setting pegasus.data.configuration=condorio.
With this setting, Condor always tries to transfer the checkpoint file on job startup, even if the file does not exist yet:
transfer_input_files = /home/daniel.finstad/projects/bh_spin_priors/gw150914/gw150914_inference_tf2.ini,H1L1V1-CREATE_INJECTIONS_0-1126259454-16.hdf,H1L1V1-INFERENCE_0-1126259454-16.hdf.checkpoint,/usr/share/pegasus/sh/pegasus-lite-common.sh,/usr1/dbrown/daniel/./test_condorio-main_ID0000001.000/pegasus-worker-4.8.4-x86_64_rhel_7.tar.gz
This results in the following Condor error:
Error from slot1@CRUSH-SUGWG-OSG-10-5-229-88: SHADOW at 128.230.190.43 failed to send file(s) to <128.230.11.10:22390>: error reading from /usr1/dbrown/daniel/./test_condorio-main_ID0000001.000/H1L1V1-INFERENCE_0-1126259454-16.hdf.checkpoint: (errno 2) No such file or directory
I'm not sure whether there is a way to tell Condor that it is OK for a transfer input/output file not to exist.
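
To make the failure easier to spot before submission, here is a minimal Python sketch of a pre-submit check. It takes the value of a transfer_input_files line and reports which entries are missing on disk. The function name missing_transfer_inputs and the submit_dir parameter are hypothetical, and the sketch assumes relative entries resolve against the directory passed in, as the path in the error above suggests. It is not part of Pegasus or Condor and does not fix the incompatibility itself.

```python
import os


def missing_transfer_inputs(transfer_input_files, submit_dir):
    """Return the entries of a comma-separated transfer_input_files
    value that do not exist on disk.

    Assumption: relative entries are resolved against submit_dir.
    """
    missing = []
    for entry in transfer_input_files.split(","):
        entry = entry.strip()
        if not entry:
            # Skip empty items, e.g. from a trailing comma.
            continue
        if os.path.isabs(entry):
            path = entry
        else:
            path = os.path.join(submit_dir, entry)
        if not os.path.exists(path):
            missing.append(entry)
    return missing
```

Run against the transfer_input_files value and submit directory from this report, a check like this would be expected to list the .hdf.checkpoint entry on the first run of the job, since that is the file the shadow fails to read.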