  1. Pegasus
  2. PM-1800

enable inplace cleanup for hierarchical workflows


      Hi Karan,

      Thanks again for the advice. With the change you suggested, the code:

      https://github.com/spxiwh/pegasus_subflow_example/blob/main/gen.py

      runs fine. This is now at the point where I've been able to hook this up to our more complicated example and things are working. However, there is one thing I am still a little confused about here, which I'd like to try to understand better.

      To illustrate the problem we can add the pegasus.file.cleanup.scope = 'deferred' option (which I do want to include in our big workflows) as in:

      https://github.com/spxiwh/pegasus_subflow_example/blob/main/genA.py
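As a minimal sketch of the setting involved (not copied from genA.py, which may well set it through the Pegasus Python API instead), the property can be written to a properties file as a plain `key = value` line. The helper name `write_cleanup_properties` and the file name used below are hypothetical.

```python
from pathlib import Path


def write_cleanup_properties(path, scope="deferred"):
    """Write a one-line properties file that sets the cleanup scope.

    Hypothetical helper for illustration only. It overwrites `path`, so it
    should not be pointed at a properties file that already has other keys.
    """
    line = f"pegasus.file.cleanup.scope = {scope}\n"
    Path(path).write_text(line)
    return line
```

Calling `write_cleanup_properties("pegasus.properties")` produces a file containing the single line `pegasus.file.cleanup.scope = deferred`. Note that the value is unquoted in the properties file; the quotes only appear when the value is written as a Python string.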

      This example fails because the code tries to stage k1.txt in to subworkflow-2 directly from subworkflow-1's scratch space, which has already been deleted by that point.

      I know you've said that we have to turn off cleanup jobs when using subworkflows, but I don't see why that should be the case in this example. To explain this more clearly, I can follow the file k1.txt around:

      • k1.txt is initially created within subworkflow1.
      • subworkflow1 stages k1.txt out to its location defined in my output mapper file.
      • subworkflow1 also stages k1.txt out to the main workflow's scratch directory.
      • subworkflow1 then deletes all its scratch space and temporary files because I have cleanup enabled.
      • Subworkflow1 finishes and we go back to the main workflow to start planning subworkflow2.
      • At the same time the main workflow also stages k1.txt out to the location defined in my output mapper file (duplicating this operation).
      • When planning subworkflow2, the planner is provided with a custom `--output-map` giving the location of k1.txt in the main workflow's scratch directory. The file is guaranteed to exist in this location. The planner is also given a `--cache` entry from subworkflow1 giving the location the file had in subworkflow1's scratch space. The file will not exist in this location if cleanup is enabled. The cache location is preferred over the output map, so the later stage-in job only works if subworkflow1's scratch space still exists.
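The sequence above can be reproduced in miniature with ordinary directories. The sketch below is a toy model and does not call Pegasus at all: the directory names are made up, the stage-out to the output mapper location is omitted, and the "cache entry is preferred" rule is simply hard-coded to mirror the behaviour described in this report.

```python
import shutil
import tempfile
from pathlib import Path


def simulate(cleanup_enabled):
    """Toy model of k1.txt moving between scratch directories."""
    root = Path(tempfile.mkdtemp())
    main_scratch = root / "main-scratch"                 # hypothetical name
    sub1_scratch = main_scratch / "subworkflow-1_subwf1"
    sub1_scratch.mkdir(parents=True)

    # subworkflow1 creates k1.txt in its own scratch space ...
    produced = sub1_scratch / "k1.txt"
    produced.write_text("data")
    # ... and stages it out to the main workflow's scratch directory.
    shutil.copy(produced, main_scratch / "k1.txt")

    # With cleanup enabled, subworkflow1 deletes its scratch space.
    if cleanup_enabled:
        shutil.rmtree(sub1_scratch)

    # Two candidate locations are handed to the planner for subworkflow2.
    cache_entry = sub1_scratch / "k1.txt"        # from subworkflow1's cache
    output_map_entry = main_scratch / "k1.txt"   # from the output map

    # Assumption taken from the report: the cache entry wins.
    chosen = cache_entry
    result = {
        "stage_in_ok": chosen.exists(),
        "output_map_exists": output_map_entry.exists(),
    }
    shutil.rmtree(root)
    return result
```

In this model `simulate(True)` reports that the chosen location is missing while the output-map location still exists, and `simulate(False)` reports that the stage-in would succeed, which is the asymmetry the bullets describe.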

      So this raises one important question, and one not-so-important question.

      • I think that subworkflow2 should not be trying to stage in files from subworkflow1's scratch space, when it has explicitly been told what files are inputs, and when those files are being staged-in in the `pegasus-plan_subwf2.pre.sh` script before the planner runs. The location that looks like:

      k1.txt file:////home/ian.harry/lscsoft_git/src/pycbc/examples/search/pegasus_test/ian.harry/pegasus/root/run0001/././wf-scratch/LOCAL/ian.harry/pegasus/root/run0001/k1.txt site="local"

      is guaranteed to exist at this stage. The location it actually gets:

      k1.txt file:////home/ian.harry/lscsoft_git/src/pycbc/examples/search/pegasus_test/ian.harry/pegasus/root/run0001/././wf-scratch/LOCAL/ian.harry/pegasus/root/run0001/subworkflow-1_subwf1/k1.txt site="local"

      is guaranteed not to exist with cleanup enabled.
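A quick way to check which of these entries still point at real files is to parse them and test each local path. The sketch below assumes entries have exactly the shape shown above (logical name, `file://` URL, then `key="value"` attributes, all separated by whitespace); real cache or output-map files may carry other attributes or URL schemes, and the function names are hypothetical.

```python
import os
from urllib.parse import unquote, urlparse


def parse_replica_line(line):
    """Split 'lfn pfn site="..."' into (lfn, pfn, site)."""
    lfn, pfn, *attrs = line.split()
    site = None
    for attr in attrs:
        key, _, value = attr.partition("=")
        if key == "site":
            site = value.strip('"')
    return lfn, pfn, site


def pfn_to_local_path(pfn):
    """Return the filesystem path for a file:// URL, or None otherwise."""
    parsed = urlparse(pfn)
    if parsed.scheme != "file":
        return None
    # Entries such as file:////home/... carry extra leading slashes.
    return "/" + unquote(parsed.path).lstrip("/")


def missing_local_replicas(lines):
    """List (lfn, path) pairs whose local file does not exist."""
    missing = []
    for line in lines:
        if not line.strip():
            continue
        lfn, pfn, _site = parse_replica_line(line)
        path = pfn_to_local_path(pfn)
        if path is not None and not os.path.exists(path):
            missing.append((lfn, path))
    return missing
```

Run against the two entries above after subworkflow1 has cleaned up, a check like this would be expected to flag only the `subworkflow-1_subwf1` location. It is purely a diagnostic and does not change which location the planner selects.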

            Assignee:
            vahi Karan Vahi
            Reporter:
            dbrown Duncan Brown