- Type: Improvement
- Resolution: Fixed
- Priority: Major
- Affects Version/s: master, 5.0.3
- Component/s: Planner: hierarchal workflows, Workflow API Libraries
With the 5.0 release, we allow users to specify inputs for their sub workflow jobs, as in this example:
https://github.com/pegasus-isi/hierarichal-sample-wf
- type: pegasusWorkflow
  file: inner_diamond_workflow.yml
  id: diamond_subworkflow
  arguments:
    - --conf
    - inner_diamond_workflow.pegasus.properties
    - --output-sites
    - local
    - -vvv
    - --basename
    - inner
  uses:
    - lfn: pegasus.html
      type: input
    - lfn: f.d
      type: output
      stageOut: true
      registerReplica: true
    - lfn: sites.yml
      type: input
    - lfn: inner_diamond_workflow_tc.yml
      type: input
    - lfn: inner_diamond_workflow.yml
      type: input
    - lfn: inner_diamond_workflow.pegasus.properties
      type: input
The diamond sub workflow has 4 input dependencies:
1. pegasus.html
2. inner_diamond_workflow_tc.yml
3. inner_diamond_workflow.yml
4. inner_diamond_workflow.pegasus.properties
Of these dependencies, pegasus.html is an input file required by a job in the diamond sub workflow while it runs, whereas the remaining files are required to plan the sub workflow.
In Pegasus, planning of the sub workflow happens only on the submit host, so the planning files need to be brought down to the submit host.
The pegasus.html file, on the other hand, is required by a compute job in the diamond sub workflow and should be pulled down only when that compute job runs.
Currently, since there is no way to differentiate these files, the planner always sets up all the inputs of the sub workflow to be pulled down to the submit host, even though a file such as pegasus.html needs to be pulled down only when the sub workflow runs. This duplicate transfer to the submit host can create problems on two fronts:
- Extra files that are not required are downloaded to the submit host.
- When Globus Online is used for workflows, there is usually no local endpoint for the submit host, and the workflow ends up failing.
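As a rough illustration of the behavior described above, the following Python sketch models the planner staging every declared input of the sub workflow job to the submit host. It is not Pegasus code: the function names are hypothetical, and treating sites.yml as a planning file is an assumption, since it is not enumerated in the dependency list above.

```python
# Inputs declared under "uses" for the sub workflow job in the example.
SUBWORKFLOW_INPUTS = [
    "pegasus.html",
    "sites.yml",
    "inner_diamond_workflow_tc.yml",
    "inner_diamond_workflow.yml",
    "inner_diamond_workflow.pegasus.properties",
]

# Files the planner actually reads on the submit host.
# Assumption: sites.yml is a planning file as well.
PLANNER_INPUTS = {
    "sites.yml",
    "inner_diamond_workflow_tc.yml",
    "inner_diamond_workflow.yml",
    "inner_diamond_workflow.pegasus.properties",
}


def submit_host_transfers(inputs):
    """Hypothetical model of the current behavior: stage every input."""
    return list(inputs)


def unneeded_on_submit_host(inputs, planner_inputs):
    """Return staged files that the planner never reads."""
    staged = submit_host_transfers(inputs)
    return [lfn for lfn in staged if lfn not in planner_inputs]


extra = unneeded_on_submit_host(SUBWORKFLOW_INPUTS, PLANNER_INPUTS)
print(extra)
```

In this model, pegasus.html is the only file staged to the submit host without being needed there.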
To avoid this, we need a way in the API to differentiate the files that are for planner use in the sub workflow.
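One possible shape for such a distinction is a per-file flag on the sub workflow job's inputs. The sketch below is only an illustration in plain Python. The `Use` class, the `for_planning` field, and `split_inputs` are hypothetical names and do not describe the Pegasus API. Marking sites.yml as a planning file is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    """One entry of a job's "uses" list (hypothetical model)."""
    lfn: str
    type: str = "input"
    for_planning: bool = False  # hypothetical flag: file is read by the planner


def split_inputs(uses):
    """Partition inputs into (submit-host planning files, runtime files)."""
    planner, runtime = [], []
    for use in uses:
        if use.type != "input":
            continue  # outputs are not staged in before the job runs
        (planner if use.for_planning else runtime).append(use.lfn)
    return planner, runtime


uses = [
    Use("pegasus.html"),
    Use("f.d", type="output"),
    Use("sites.yml", for_planning=True),  # assumption
    Use("inner_diamond_workflow_tc.yml", for_planning=True),
    Use("inner_diamond_workflow.yml", for_planning=True),
    Use("inner_diamond_workflow.pegasus.properties", for_planning=True),
]

planner_files, runtime_files = split_inputs(uses)
print("stage to submit host:", planner_files)
print("stage when the compute job runs:", runtime_files)
```

With a flag like this, only the files in the first list would be pulled down to the submit host, and pegasus.html would be transferred when the compute job in the sub workflow runs.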