Pegasus / PM-1800

enable inplace cleanup for hierarchical workflows


Details

    Description

      Hi Karan,

      Thanks again for the advice. With the change you suggested, the code at:

      https://github.com/spxiwh/pegasus_subflow_example/blob/main/gen.py

      runs fine. I have now been able to hook this up to our more complicated example, and things are working. However, there is one thing I am still a little confused about, and I'd like to understand it better.

      To illustrate the problem, we can add the pegasus.file.cleanup.scope = 'deferred' option (which I do want to include in our big workflows), as in:

      https://github.com/spxiwh/pegasus_subflow_example/blob/main/genA.py

      This example fails because the workflow tries to stage k1.txt in to subworkflow-2 *directly* from subworkflow-1's scratch space. Subworkflow-1's scratch space has already been deleted by then, so the stage-in fails.
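The `deferred` cleanup scope is an ordinary Pegasus property. Below is a minimal standard-library sketch that writes it into a Java-properties-style `pegasus.properties` file. It assumes the properties reach the planner through such a file; genA.py may set them differently, for example through the Pegasus Python API. `write_pegasus_properties` is a hypothetical helper and is not part of Pegasus.

```python
import tempfile
from pathlib import Path


def write_pegasus_properties(path: Path, props: dict) -> None:
    """Hypothetical helper: write one 'key = value' line per property."""
    lines = [f"{key} = {value}" for key, value in sorted(props.items())]
    path.write_text("\n".join(lines) + "\n")


# Write into a throwaway directory so no real run directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    props_file = Path(tmp) / "pegasus.properties"
    write_pegasus_properties(props_file, {"pegasus.file.cleanup.scope": "deferred"})
    text = props_file.read_text()

print(text, end="")
```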

      I know you've said that we have to turn off cleanup jobs when using subworkflows, but I don't see why that should be necessary in this example. To explain this more clearly, let me follow the file k1.txt through the workflow:

       * k1.txt is initially created within subworkflow1.
       * subworkflow1 stages k1.txt out to its location defined in my output mapper file.
       * subworkflow1 *also* stages k1.txt out to the main workflow's scratch directory.
       * subworkflow1 then deletes all its scratch space and temporary files because I have cleanup enabled.
       * Subworkflow1 finishes and we go back to the main workflow to start planning subworkflow2.
       * At the same time the main workflow *also* stages k1.txt out to the location defined in my output mapper file (duplicating this operation).
       * When planning subworkflow2, the planner is given a custom `--output-map` with the location of k1.txt in the main workflow's scratch directory, where the file is guaranteed to exist. It is also given a `--cache` entry from subworkflow1 with the location the file *was* in within subworkflow1's scratch space, where the file will not exist if cleanup is enabled. The cache location is preferred over the output map, so the later stage-in job only works if subworkflow1's scratch space still exists.
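The replica choice in the last step can be modelled in a few lines. This is a hypothetical sketch of the behaviour observed in this report, not Pegasus's planner code. `Replica`, `select_replica` and the `source` labels are invented for illustration, and the paths are the two locations quoted below. Ranking the `--cache` entry first selects the deleted subworkflow-1 path. Ranking the output map first, or skipping locations that no longer exist, selects the copy in the main workflow's scratch directory.

```python
from dataclasses import dataclass

# Main workflow scratch directory, copied from the locations quoted in this issue.
MAIN_SCRATCH = (
    "file:////home/ian.harry/lscsoft_git/src/pycbc/examples/search/"
    "pegasus_test/ian.harry/pegasus/root/run0001/././wf-scratch/LOCAL/"
    "ian.harry/pegasus/root/run0001"
)


@dataclass(frozen=True)
class Replica:
    lfn: str     # logical file name, e.g. "k1.txt"
    pfn: str     # physical location the stage-in job would copy from
    site: str
    source: str  # hypothetical label: "cache" (--cache) or "output-map" (--output-map)


def always_exists(pfn: str) -> bool:
    """Default check: assume every location is valid (no existence check)."""
    return True


def select_replica(candidates, priority, exists=always_exists):
    """Return the first candidate by source priority that passes exists(), else None."""
    ranked = sorted(candidates, key=lambda r: priority.index(r.source))
    for replica in ranked:
        if exists(replica.pfn):
            return replica
    return None


candidates = [
    Replica("k1.txt", f"{MAIN_SCRATCH}/k1.txt", "local", "output-map"),
    Replica("k1.txt", f"{MAIN_SCRATCH}/subworkflow-1_subwf1/k1.txt", "local", "cache"),
]

# After subworkflow-1's cleanup, nothing under its scratch directory exists.
DELETED_PREFIX = f"{MAIN_SCRATCH}/subworkflow-1_subwf1/"


def still_exists(pfn: str) -> bool:
    return not pfn.startswith(DELETED_PREFIX)


observed = select_replica(candidates, ["cache", "output-map"])  # cache preferred
expected = select_replica(candidates, ["output-map", "cache"])  # output map preferred
checked = select_replica(candidates, ["cache", "output-map"], exists=still_exists)
print(observed.pfn)
print(expected.pfn)
```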

      So this raises one important question, and one not-so-important question.

       * I think that subworkflow2 *should not* be trying to stage in files from subworkflow1's scratch space when it has been explicitly told which files are inputs, and when those files are staged in by the `pegasus-plan_subwf2.pre.sh` script before the planner runs. The location

      k1.txt file:////home/ian.harry/lscsoft_git/src/pycbc/examples/search/pegasus_test/ian.harry/pegasus/root/run0001/././wf-scratch/LOCAL/ian.harry/pegasus/root/run0001/k1.txt site="local"

      is guaranteed to exist at this stage. The location it is actually given:

      k1.txt file:////home/ian.harry/lscsoft_git/src/pycbc/examples/search/pegasus_test/ian.harry/pegasus/root/run0001/././wf-scratch/LOCAL/ian.harry/pegasus/root/run0001/subworkflow-1_subwf1/k1.txt site="local"

      is guaranteed *not* to exist with cleanup enabled.
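The "guaranteed to exist" versus "guaranteed not to exist" claim follows from the order of the steps listed earlier. Below is a hypothetical filesystem simulation using a temporary directory in place of the real scratch space. `run_subworkflow1` is invented for illustration. It assumes, as the two quoted paths suggest, that subworkflow-1's scratch directory sits inside the main workflow's scratch directory.

```python
import shutil
import tempfile
from pathlib import Path


def run_subworkflow1(main_scratch: Path, cleanup: bool) -> None:
    """Hypothetical model of subworkflow-1's handling of k1.txt."""
    sub_scratch = main_scratch / "subworkflow-1_subwf1"
    sub_scratch.mkdir(parents=True)
    produced = sub_scratch / "k1.txt"
    produced.write_text("k1 contents\n")               # k1.txt created in subworkflow-1
    shutil.copy(produced, main_scratch / "k1.txt")     # staged out to the main scratch dir
    if cleanup:
        shutil.rmtree(sub_scratch)                     # subworkflow-1 removes its scratch space


with tempfile.TemporaryDirectory() as tmp:
    main_scratch = Path(tmp)
    run_subworkflow1(main_scratch, cleanup=True)
    parent_copy_exists = (main_scratch / "k1.txt").exists()
    cached_copy_exists = (main_scratch / "subworkflow-1_subwf1" / "k1.txt").exists()

print(parent_copy_exists, cached_copy_exists)  # True False
```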


          People

            Karan Vahi (vahi)
            Duncan Brown (dbrown)