Project: Pegasus

Integrity logic when doing 3rd party staging

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: None
    • Component/s: None

      If you have a nonsharedfs workflow whose inputs are on, for example, s3:// and whose intermediate storage is also on s3://, Pegasus adds a stage-in job, and pegasus-transfer performs it as a 3rd party transfer. Example:

      /usr/bin/pegasus-s3 cp -f -c s3://aws/input/foo.txt s3://aws/intermediate/foo.txt

      The problem is that the planner assumes it will get checksums for those files, but pegasus-transfer cannot generate a checksum because the file never touches the submit host. Subsequent jobs then fail because the file checksums are missing from the meta files passed to them.

      One solution would be to make the planner aware of 3rd party transfers, but whether a transfer is done as a 3rd party transfer is usually decided at runtime by pegasus-transfer.

      Another solution would be to make the checksums optional and skip integrity checking if they are missing. I think we discussed this in the past, but I can't remember what we decided.
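A minimal sketch of what "optional checksums" could look like on the verifying side. This is not the Pegasus integrity code; the function name `verify_integrity`, the use of SHA-256, and the return values are all assumptions made for illustration.

```python
import hashlib
from typing import Optional


def verify_integrity(path: str, expected_sha256: Optional[str]) -> str:
    """Hypothetical helper: check a file against an optional checksum.

    Returns "skipped" when no checksum was recorded (for example because
    the file was moved by a 3rd party transfer), "ok" when the checksum
    matches, and raises ValueError on a mismatch.
    """
    if expected_sha256 is None:
        # Nothing to compare against: skip instead of failing the job.
        return "skipped"

    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files are not loaded into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)

    actual = digest.hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
    return "ok"
```

The trade-off is that a file with no recorded checksum is silently unprotected, so a real implementation would probably want to log each skip.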

      A third option would be for pegasus-transfer to pull down a copy and compute the checksum, but that would negate the benefits of 3rd party transfers.
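To make the cost of the third option concrete, here is a rough sketch of checksumming an object as a stream. The function name `sha256_of_stream` is hypothetical, and how the chunks are fetched from S3 is deliberately left out. The sketch only shows that the copy need not be stored on disk, yet every byte still has to pass through the host running pegasus-transfer.

```python
import hashlib
from typing import Iterable


def sha256_of_stream(chunks: Iterable[bytes]) -> str:
    """Hypothetical helper: hash a file delivered as a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        # Each chunk is downloaded to this host just to be hashed,
        # which is the bandwidth cost a 3rd party transfer avoids.
        digest.update(chunk)
    return digest.hexdigest()
```

Usage would be along the lines of `sha256_of_stream(download_chunks(url))`, where `download_chunks` stands in for whatever client reads the object.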

            Assignee:
            vahi Karan Vahi
            Reporter:
            rynge Mats Rynge
            Watchers:
            1

              Created:
              Updated: