File dependencies for sub workflow jobs - differentiate inputs for planner use and those for sub workflow

      With the 5.0 release, we allow users to specify inputs for their sub workflow jobs.

      For example, see the sample workflow at
      https://github.com/pegasus-isi/hierarichal-sample-wf

      - type: pegasusWorkflow
        file: inner_diamond_workflow.yml
        id: diamond_subworkflow
        arguments:
          - --conf
          - inner_diamond_workflow.pegasus.properties
          - --output-sites
          - local
          - -vvv
          - --basename
          - inner
        uses:
          - lfn: pegasus.html
            type: input
          - lfn: f.d
            type: output
            stageOut: true
            registerReplica: true
          - lfn: sites.yml
            type: input
          - lfn: inner_diamond_workflow_tc.yml
            type: input
          - lfn: inner_diamond_workflow.yml
            type: input
          - lfn: inner_diamond_workflow.pegasus.properties
            type: input

      The diamond sub workflow job has 5 input dependencies:

      1. pegasus.html
      2. sites.yml
      3. inner_diamond_workflow_tc.yml
      4. inner_diamond_workflow.yml
      5. inner_diamond_workflow.pegasus.properties

      Of these dependencies, pegasus.html is an input file required by a job in the diamond sub workflow while it runs, whereas the remaining files are required to plan the sub workflow.

      In Pegasus, a sub workflow is planned only on the submit host, so the files needed for planning must be brought down to the submit host.

      The pegasus.html file, on the other hand, is required by a compute job in the diamond sub workflow and should be pulled down only when that compute job runs.

      Currently, since there is no way to differentiate these files, the planner always sets up all the inputs for the sub workflow to be pulled down to the submit host, even though some files (pegasus.html in this case) need to be pulled down only when the sub workflow runs. This duplicate transfer to the submit host can create problems on 2 fronts:

      • Extra files that are not required there are downloaded to the submit host.
      • If using Globus Online for workflows, there is usually no local endpoint for the submit host, and the workflow ends up failing.
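
The behaviour described above can be modelled with a small, self-contained sketch. It is not Pegasus code: the `Use` class, the `submit_host_transfers` function, and the `submit_host_has_endpoint` parameter are hypothetical names chosen for illustration, and the file list is copied from the example job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    """One entry from the job's `uses` list (illustrative model only)."""
    lfn: str
    link: str  # "input" or "output"


# The `uses` entries of the diamond sub workflow job in the example.
USES = [
    Use("pegasus.html", "input"),
    Use("f.d", "output"),
    Use("sites.yml", "input"),
    Use("inner_diamond_workflow_tc.yml", "input"),
    Use("inner_diamond_workflow.yml", "input"),
    Use("inner_diamond_workflow.pegasus.properties", "input"),
]


def submit_host_transfers(uses, submit_host_has_endpoint=True):
    """Model the reported behaviour: every input is pulled to the submit host.

    Raises RuntimeError when files must be fetched but the submit host has
    no transfer endpoint (the Globus Online situation described above).
    """
    to_fetch = [u.lfn for u in uses if u.link == "input"]
    if to_fetch and not submit_host_has_endpoint:
        raise RuntimeError(
            "no transfer endpoint on the submit host for: " + ", ".join(to_fetch)
        )
    return to_fetch
```

In this model pegasus.html ends up in the submit host transfer list even though only a compute job needs it, and a submit host without an endpoint fails outright.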

      To avoid this, we need a way in the API to mark a file as being for planner use in the sub workflow.
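
One possible shape for such a marker is sketched below, purely as an assumption. The `for_planning` attribute, the `Use` class, and the `split_inputs` function are hypothetical and are not taken from the Pegasus API. The sketch only shows how a per-file flag would let the planner separate files it needs on the submit host from files that should be staged when the sub workflow's jobs run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    """One entry from the job's `uses` list (illustrative model only)."""
    lfn: str
    link: str                   # "input" or "output"
    for_planning: bool = False  # hypothetical marker: needed only to plan


# Example job inputs, with the planner-only files flagged by hand.
USES = [
    Use("pegasus.html", "input"),
    Use("f.d", "output"),
    Use("sites.yml", "input", for_planning=True),
    Use("inner_diamond_workflow_tc.yml", "input", for_planning=True),
    Use("inner_diamond_workflow.yml", "input", for_planning=True),
    Use("inner_diamond_workflow.pegasus.properties", "input", for_planning=True),
]


def split_inputs(uses):
    """Return (planner_inputs, runtime_inputs) as two lists of LFNs."""
    inputs = [u for u in uses if u.link == "input"]
    planner = [u.lfn for u in inputs if u.for_planning]
    runtime = [u.lfn for u in inputs if not u.for_planning]
    return planner, runtime


planner_inputs, runtime_inputs = split_inputs(USES)
```

With this split, only `planner_inputs` would need to reach the submit host, and pegasus.html would stay in `runtime_inputs` until a compute job asks for it.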

      Assignee: Karan Vahi
      Reporter: Karan Vahi
      Archiver: Rajiv Mayani
