Expose a knob for fine-tuning the number of cleanup jobs generated per level of the workflow

      Hi Karan, I've been running a gigantic workflow with >250K jobs (I split it into 6 sub-workflows, though, because it took forever, >12 hours, for Pegasus to plan the workflow).

      The pegasus.file.cleanup.clusters.num=1 setting is now finally catching up with me, because cleanup won't start until all the jobs on the next level have finished. As a result, workflows have aborted 3 times after running out of storage.

      I will increase it to pegasus.file.cleanup.clusters.num=25 for future runs. However, I think a parameter that makes more sense is:

      pegasus.file.cleanup.clusters.fraction

      which is a value in the range (0,1]. It equals the number of cleanup jobs divided by the number of jobs on that level. So if it's 1, there is one cleanup job for each computing job. If it's 0.2, there is one cleanup job for every 5 computing jobs, and so on. The default could be 0.1 or something similar.
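      A minimal sketch of how such a fraction could map to a cleanup job count per level. This is only an illustration of the proposal, not Pegasus code: the function name cleanup_jobs_for_level is hypothetical, and rounding up (so that a small level still gets one cleanup job) is an assumption the proposal does not spell out.

```python
import math


def cleanup_jobs_for_level(num_compute_jobs: int, fraction: float) -> int:
    """Hypothetical helper: cleanup jobs for one level under the proposed fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    if num_compute_jobs <= 0:
        return 0
    # round() guards against floating-point noise before taking the ceiling.
    # Rounding up is an assumption: it keeps at least one cleanup job per level.
    wanted = math.ceil(round(num_compute_jobs * fraction, 9))
    # Never create more cleanup jobs than there are computing jobs.
    return min(num_compute_jobs, wanted)


if __name__ == "__main__":
    for size in (5, 50, 50000):
        print(size, cleanup_jobs_for_level(size, 0.2))
```

      With a fraction of 0.2, a level of 5 computing jobs gets 1 cleanup job, and with a fraction of 1 every computing job gets its own.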

      In this way, a level with lots of computing jobs would get the same rate of cleanup as a level with very few computing jobs. In some sense, this parameter controls the rate of cleanup.

      The pegasus.file.cleanup.clusters.num parameter doesn't really achieve this. A level with lots of computing jobs would probably get cleaned up a lot more slowly than a level with very few jobs.
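      To illustrate the difference, here is a rough comparison of how many computing jobs each cleanup job would have to cover under the two settings. It assumes that pegasus.file.cleanup.clusters.num produces at most that many cleanup jobs on every level regardless of level size, and that the proposed fraction is rounded up. Both function names are hypothetical.

```python
import math


def jobs_per_cleanup_fixed(level_size: int, clusters_num: int) -> float:
    """Computing jobs per cleanup job with a fixed count per level (assumed behavior)."""
    cleanup_jobs = min(level_size, clusters_num)
    return level_size / cleanup_jobs


def jobs_per_cleanup_fraction(level_size: int, fraction: float) -> float:
    """Computing jobs per cleanup job with the proposed fraction (hypothetical)."""
    # round() guards against floating-point noise before taking the ceiling.
    cleanup_jobs = max(1, math.ceil(round(level_size * fraction, 9)))
    return level_size / cleanup_jobs


if __name__ == "__main__":
    for size in (50, 50000):
        fixed = jobs_per_cleanup_fixed(size, 25)
        proportional = jobs_per_cleanup_fraction(size, 0.1)
        print(f"level of {size}: fixed={fixed:.0f}, fraction={proportional:.0f}")
```

      Under these assumptions, a fixed count of 25 means each cleanup job covers 2 computing jobs on a level of 50 but 2000 on a level of 50000, while a fraction of 0.1 keeps the ratio at 10 on both levels.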

      What do you think? You can give the parameter whatever name you like.

            Assignee: Karan Vahi
            Reporter: Karan Vahi
            Archiver: Rajiv Mayani

              Created:
              Updated:
              Resolved:
              Archived: