Exposing a knob for fine-tuning the number of cleanup jobs generated per level of the workflow


      Hi Karan, I've been running a gigantic workflow with >250K jobs (I split it into 6 sub-workflows, though, because it took forever, >12 hours, for Pegasus to plan the workflow).

      So now the pegasus.file.cleanup.clusters.num=1 setup has finally started to hurt me, because cleanup won't start until all the jobs on the next level have finished. As a result, the workflows were aborted 3 times after running out of storage.

      I will increase it to pegasus.file.cleanup.clusters.num=25 for future runs. However, I think a parameter that makes more sense is:

      pegasus.file.cleanup.clusters.fraction

      which is a value in the range (0,1]. It equals the number of cleanup jobs divided by the number of jobs on that level. So if it's 1, there is one cleanup job for each computing job. If it's 0.2, there is one cleanup job for every 5 computing jobs, and so on. You could set the default to 0.1 or something similar.

      In this way, a level with lots of computing jobs would get the same rate of cleanup as a level with very few computing jobs. In some sense, this parameter controls the rate of cleanup.
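To make the proposal concrete, here is a minimal sketch of how the fraction could map to a per-level cleanup job count. It is not Pegasus code: the function name `cleanup_jobs_for_level`, the choice to round up, and the floor of one cleanup job per non-empty level are all assumptions made for illustration.

```python
import math


def cleanup_jobs_for_level(num_compute_jobs: int, fraction: float) -> int:
    """Hypothetical helper: cleanup jobs for one level, given a fraction in (0, 1]."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    if num_compute_jobs <= 0:
        return 0
    # round() guards against float noise such as 0.1 * 3 == 0.30000000000000004
    wanted = math.ceil(round(num_compute_jobs * fraction, 9))
    # Assumption: at least one cleanup job, and never more than the compute jobs.
    return min(num_compute_jobs, max(1, wanted))


if __name__ == "__main__":
    for level_size in (5, 100, 50000):
        print(level_size, cleanup_jobs_for_level(level_size, 0.2))
```

With a fraction of 0.2, a level of 5 computing jobs gets one cleanup job, matching the "one cleanup for 5 computing jobs" case described above.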

      The pegasus.file.cleanup.clusters.num parameter doesn't really achieve this. A level with lots of computing jobs would probably get cleaned up a lot more slowly than a level with very few jobs.
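The difference between the two knobs can be sketched by counting how many computing jobs each cleanup job has to wait on. This is only an illustration under assumptions: it treats pegasus.file.cleanup.clusters.num as a fixed cap on cleanup jobs per level, spreads computing jobs evenly across cleanup jobs, and uses hypothetical function names that are not part of Pegasus.

```python
import math


def compute_jobs_per_cleanup_fixed(level_size: int, clusters_num: int) -> float:
    """Assumed model: a level gets clusters_num cleanup jobs, capped at its size."""
    cleanup_jobs = min(level_size, clusters_num)
    return level_size / cleanup_jobs


def compute_jobs_per_cleanup_fraction(level_size: int, fraction: float) -> float:
    """Assumed model: cleanup jobs = ceil(level_size * fraction), at least one."""
    cleanup_jobs = max(1, math.ceil(round(level_size * fraction, 9)))
    return level_size / cleanup_jobs


if __name__ == "__main__":
    for level_size in (10, 1000, 50000):
        fixed = compute_jobs_per_cleanup_fixed(level_size, 25)
        frac = compute_jobs_per_cleanup_fraction(level_size, 0.1)
        print(f"{level_size:>6} jobs: fixed num=25 -> {fixed:.0f} per cleanup, "
              f"fraction=0.1 -> {frac:.0f} per cleanup")
```

Under these assumptions, a fixed count makes each cleanup job on a large level wait for far more computing jobs than on a small level, while the fraction keeps that ratio roughly constant across levels.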

      What do you think? You can give the parameter whatever name you like.

            Assignee:
            Karan Vahi
            Reporter:
            Karan Vahi