  1. Pegasus
  2. PM-1346

Pegasus job checkpointing is incompatible with condorio

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Fix Version/s: master, 5.0.0, 4.9.3
    • Affects Version/s: master, 4.9.0
    • Component/s: Pegasus Planner
    • Labels: None

      Pegasus job checkpointing is incompatible with setting

      pegasus.data.configuration=condorio

      With this setting, Condor always tries to transfer the checkpoint file on job startup:

      transfer_input_files = /home/daniel.finstad/projects/bh_spin_priors/gw150914/gw150914_inference_tf2.ini,H1L1V1-CREATE_INJECTIONS_0-1126259454-16.hdf,H1L1V1-INFERENCE_0-1126259454-16.hdf.checkpoint,/usr/share/pegasus/sh/pegasus-lite-common.sh,/usr1/dbrown/daniel/./test_condorio-main_ID0000001.000/pegasus-worker-4.8.4-x86_64_rhel_7.tar.gz

      which results in the Condor error:

      Error from slot1@CRUSH-SUGWG-OSG-10-5-229-88: SHADOW at 128.230.190.43 failed to send file(s) to <128.230.11.10:22390>: error reading from /usr1/dbrown/daniel/./test_condorio-main_ID0000001.000/H1L1V1-INFERENCE_0-1126259454-16.hdf.checkpoint: (errno 2) No such file or directory
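
The missing file can be detected before submission. The following is a minimal Python sketch, not part of Pegasus or HTCondor: it splits a `transfer_input_files` value on commas and reports the entries that do not exist on disk. It assumes that relative names are resolved against the submit directory, which is what the path in the error message above suggests. The function name `missing_transfer_inputs` is hypothetical.

```python
import os


def missing_transfer_inputs(transfer_input_files, submit_dir):
    """Return the entries of a transfer_input_files value that do not exist."""
    missing = []
    for entry in transfer_input_files.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Assumption: relative names are resolved against the submit directory.
        if os.path.isabs(entry):
            path = entry
        else:
            path = os.path.join(submit_dir, entry)
        if not os.path.exists(path):
            missing.append(entry)
    return missing
```

Called with the value from the submit file above and the workflow submit directory, this would be expected to list the `.checkpoint` entry on a first run, since that file has not been written yet.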

      I'm not sure whether there is a way to tell Condor that it's OK if a transfer input/output file does not exist.
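
One possible workaround, offered only as an untested sketch, is to create an empty placeholder at the checkpoint path before the job is submitted, so that the input transfer has a file to send. This is not a Pegasus or HTCondor feature. It assumes the job treats an empty checkpoint file the same as no checkpoint, which is application-specific and would need to be verified. The helper name `ensure_placeholder` is hypothetical.

```python
from pathlib import Path


def ensure_placeholder(checkpoint_path):
    """Create an empty file at checkpoint_path if nothing exists there yet.

    Returns True if a placeholder was created, False if a file already existed.
    """
    path = Path(checkpoint_path)
    if path.exists():
        # Never overwrite a real checkpoint from an earlier run.
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()
    return True
```

Because the helper leaves an existing file untouched, it should be safe to call on every submission, including retries after a checkpoint has been written.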

            Assignee:
            vahi Karan Vahi
            Reporter:
            dbrown Duncan Brown
            Watchers:
            3
