Kickstart should kill a job gracefully before maxwalltime

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 5.1.0
    • Affects Version/s: master, 5.0.8
    • Component/s: None
    • Labels: None

      On some sites and batch systems, pegasus.maxwalltime is translated into the batch system walltime, and because of the site configuration the batch system issues a SIGKILL when that limit is reached. As a result, both the job and kickstart get hard killed, and stdout/stderr are lost.

      I suggest we use the existing TERM/KILL functionality in kickstart to kill the job gracefully, right before the walltime runs out.

      Something like converting pegasus.maxwalltime to seconds, subtracting 300 (5 minutes) to get the TERM timeout, and then sending the hard kill 60 seconds after that:

      pegasus-kickstart -k [totalsecs-300] -K 60
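
      The arithmetic behind that command line can be sketched as follows. This is only an illustration of the proposal, not the planner's actual code: the function name kickstart_timeout_args is hypothetical, the sketch assumes pegasus.maxwalltime is given in minutes, and skipping the flags when the walltime is shorter than the 300-second margin is my own assumption about how a too-short walltime might be handled.

```python
def kickstart_timeout_args(maxwalltime_minutes, term_margin_secs=300, kill_delay_secs=60):
    """Build the -k/-K arguments proposed in this ticket (hypothetical helper).

    Assumes maxwalltime is expressed in minutes. The 300 s margin and the
    60 s TERM-to-KILL delay are the values suggested in the ticket.
    """
    # Convert the walltime to seconds.
    total_secs = int(maxwalltime_minutes) * 60

    # Send TERM this many seconds after the job starts.
    term_after = total_secs - term_margin_secs

    # Assumption: if the walltime is too short to leave any margin,
    # add no timeout flags rather than pass a non-positive value.
    if term_after <= 0:
        return []

    return ["-k", str(term_after), "-K", str(kill_delay_secs)]


if __name__ == "__main__":
    # A 60-minute walltime gives TERM after 3300 s and KILL 60 s later.
    print(" ".join(["pegasus-kickstart"] + kickstart_timeout_args(60)))
```

      With a 60-minute walltime this yields "pegasus-kickstart -k 3300 -K 60", which leaves 240 seconds between the hard kill and the batch system limit for kickstart to write its record.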

       

            Assignee:
            Karan Vahi
            Reporter:
            Mats Rynge
            Watchers:
            2

              Created:
              Updated:
              Resolved: