- Type: Improvement
- Resolution: Fixed
- Priority: Major
- Affects Version/s: master
- Component/s: Pegasus Planner
We need to add support so that users can set up their workflows such that datasets are staged directly to the worker node, rather than being passed through the staging site.
This behavior is enabled by a boolean property:
pegasus.transfer.bypass.input.staging
This is useful in the S3 case, where data may already reside in an S3 bucket, and in any PegasusLite mode where the data is directly accessible on the worker nodes.
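In a pegasus.properties file this might look like the following (a minimal illustration; the property name comes from this ticket, and the standard boolean value form is assumed):

```properties
# Let jobs pull inputs directly from the locations in the
# replica catalog, bypassing the staging site
pegasus.transfer.bypass.input.staging = true
```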
We need to make sure that this is supported in both PegasusLite modes:
- condorio (using Condor to transfer the required files)
For the condorio mode, we can only bypass files whose URLs end with the LFN name, since Condor file transfer places a file in the job sandbox under its basename. So, for example, with executable staging turned on, the executables themselves cannot be bypassed most of the time. In Montage, for instance, the FITS files do not share the naming scheme of the LFNs in the DAX. Only file URLs that exist on the submit host with the pool attribute can be staged directly in Condor IO mode.
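The bypass-eligibility rule for condorio can be sketched as follows (a hedged illustration; the function name and set-up are hypothetical, not the actual planner code):

```python
from urllib.parse import urlparse
import posixpath

def can_bypass_condorio(url: str, lfn: str) -> bool:
    """Condor file transfer delivers a file into the job sandbox
    under its basename, so a URL can bypass the staging site only
    if its last path component already matches the LFN."""
    basename = posixpath.basename(urlparse(url).path)
    return basename == lfn

# A URL whose basename matches the LFN is eligible for bypass;
# a file renamed by the DAX (e.g. a Montage FITS input) is not.
```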
- nonsharedfs (using pegasus-transfer to transfer the files)
In this case we also need to make sure that the cleanup algorithm does not delete the original input files mentioned in the replica catalog.
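The required cleanup change can be sketched as a simple set difference (an illustrative model only; the function name and set-based representation are assumptions, not the planner's actual cleanup code):

```python
def cleanup_candidates(files_on_site: set[str],
                       replica_catalog_lfns: set[str]) -> set[str]:
    """With bypass enabled, files listed in the replica catalog are
    the original inputs rather than workflow-created copies, so they
    must be excluded from the set of files eligible for cleanup."""
    return files_on_site - replica_catalog_lfns
```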