Pegasus / PM-1347

Pegasus will always try to transfer output when a code has checkpointed

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.9.0
    • Fix Version/s: None
    • Component/s: pegasus-plan
    • Labels: None

      Description

      The job wrapper script created by pegasus-plan always tries to move the output files of the job (and will fail if they don't exist). This seems inconsistent with checkpointing a job, where the output file(s) might not exist until the job has actually finished.

      It should at least be documented that the output must be written when checkpointing is used, but I would argue that it is better to let the user set a property that says "only expect output if the job has not been killed by a checkpoint."
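One possible reading of the proposal is that pegasus-plan would look up a user property and, only when it is set, generate an output stage-out that tolerates missing files after a checkpoint. The sketch below shows just that lookup, assuming a Java-style `key = value` properties file. The property name `pegasus.hypothetical.outputs.optional.on.checkpoint` and both functions are invented for illustration; they are not part of Pegasus 4.9.0.

```shell
#!/usr/bin/env bash
# Sketch only: the property name below is hypothetical; Pegasus 4.9.0 defines no such key.
PROP_KEY="pegasus.hypothetical.outputs.optional.on.checkpoint"

# Print the value of key $2 in the Java-style "key = value" file $1.
# The last assignment wins; prints an empty line if the key is absent.
read_property() {
    awk -v k="$2" '
        index($0, "=") {
            name = substr($0, 1, index($0, "=") - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) val = substr($0, index($0, "=") + 1)
        }
        END { gsub(/^[ \t]+|[ \t]+$/, "", val); print val }
    ' "$1"
}

# "tolerant": a missing output after a checkpoint is not an error (the proposal).
# "strict":   any missing output fails the job (the behaviour reported here).
output_stageout_mode() {
    case "$(read_property "$1" "$PROP_KEY")" in
        true | True | TRUE) echo tolerant ;;
        *) echo strict ;;
    esac
}
```

For example, `output_stageout_mode pegasus.properties` prints `strict` unless the file contains `pegasus.hypothetical.outputs.optional.on.checkpoint = true`. Defaulting to `strict` keeps the current behaviour for workflows that do not opt in.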

      echo -e "\n###################[Pegasus Lite] Staging out checkpoint files ###################" 1>&2
      # stage out checkpoint files
      set +e
      pegasus-transfer --threads 1 1>&2 << 'EOF'
      [
       { "type": "transfer",
         "id": 1,
         "src_urls": [
           { "site_label": "local", "url": "file://$PWD/H1L1V1-INFERENCE_0-1126259454-16.hdf.checkpoint", "checkpoint": "true" }
         ],
         "dest_urls": [
           { "site_label": "local", "url": "file:///home/daniel.finstad/projects/bh_spin_priors/gw150914/test_condorio/local-site-scratch/./test_condorio-main_ID0000001/./H1L1V1-INFERENCE_0-1126259454-16.hdf.checkpoint" }
         ] }
      ]
      EOF
      ec=$?
      set -e
      if [ $ec -ne 0 ]; then
          echo " Ignoring failure while transferring chkpoint files. Exitcode was $ec" 1>&2
      fi

      echo -e "\n#####################[Pegasus Lite] Staging out output files #####################" 1>&2
      # stage out
      pegasus-transfer --threads 1 1>&2 << 'EOF'
      [
       { "type": "transfer",
         "id": 1,
         "src_urls": [
           { "site_label": "local", "url": "file://$PWD/H1L1V1-INFERENCE_0-1126259454-16.hdf", "checkpoint": "false" }
         ],
         "dest_urls": [
           { "site_label": "local", "url": "file:///home/daniel.finstad/projects/bh_spin_priors/gw150914/test_condorio/local-site-scratch/./test_condorio-main_ID0000001/./H1L1V1-INFERENCE_0-1126259454-16.hdf" }
         ] }
      ]
      EOF
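In the excerpt above, a failed checkpoint stage-out is ignored (`set +e`, then `ec=$?`), but a failed output stage-out runs under `set -e` and aborts the job. A minimal sketch of a guard for the output stage-out follows. It assumes the wrapper knows the user job's exit code and that a job stopping at a checkpoint exits with a dedicated code; `CHECKPOINT_EXIT_CODE` (85 here, an arbitrary choice), `stage_out`, and `stage_out_outputs` are hypothetical names. `stage_out` only stands in for the generated `pegasus-transfer` heredoc so that the logic can run on its own.

```shell
#!/usr/bin/env bash
# Sketch only: CHECKPOINT_EXIT_CODE is an assumed convention (a job that stops at a
# checkpoint exits with this code); it is not a Pegasus or PegasusLite variable.
CHECKPOINT_EXIT_CODE=85

# Stand-in for the generated pegasus-transfer call: it fails when the
# source file is missing, which is the behaviour reported in this issue.
stage_out() {
    if [ ! -e "$1" ]; then
        echo "missing output: $1" 1>&2
        return 1
    fi
    echo "staged $1"
}

# $1: exit code of the user job; $2: output file the planner expects.
# Skip the stage-out only when the job stopped at a checkpoint AND the
# output is absent; a job that finished normally must still produce it.
stage_out_outputs() {
    local job_ec=$1 output=$2
    if [ "$job_ec" -eq "$CHECKPOINT_EXIT_CODE" ] && [ ! -e "$output" ]; then
        echo "job stopped at a checkpoint; not staging out $output" 1>&2
        return 0
    fi
    stage_out "$output"
}
```

With this guard, `stage_out_outputs 85 H1L1V1-INFERENCE_0-1126259454-16.hdf` succeeds even if the file has not been written yet. `stage_out_outputs 0 ...` still fails when the output is missing, so a genuinely broken job is not masked. The checkpoint file itself would still be staged out with failures ignored, as the generated script already does.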

    People

    • Assignee: Karan Vahi (vahi)
    • Reporter: Duncan Brown (dbrown)
    • Watchers: 1
