Type: Improvement
Resolution: Fixed
Priority: Major
Fix Version/s: master, 5.1.0
Affects Version/s: master, 5.0.8
Component/s: None
Labels: None

On some sites/batch systems, pegasus.maxwalltime is translated to the batch system walltime, and, depending on the site configuration, a SIGKILL is issued when that walltime is exceeded. The result is that both the job and kickstart get hard killed, and stdout/stderr are lost.

I suggest we use the existing TERM/KILL functionality in kickstart to kill the job gracefully, right before the walltime runs out.

Something like converting pegasus.maxwalltime to seconds, subtracting 300 (5 minutes) to get the TERM timeout, and then hard killing 60 seconds after that:

pegasus-kickstart -k [totalsecs-300] -K 60
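
The arithmetic above can be sketched as follows. This is only an illustration, not the planner's actual implementation: the function name `kickstart_timeout_args` is hypothetical, the sketch assumes pegasus.maxwalltime is expressed in minutes, and skipping the flags when the walltime is shorter than the safety margin is an assumed policy that the issue does not specify.

```python
def kickstart_timeout_args(maxwalltime_minutes, margin_secs=300, kill_delay_secs=60):
    """Build the -k/-K arguments proposed in this issue.

    Assumption: maxwalltime_minutes is the pegasus.maxwalltime value in minutes.
    """
    total_secs = int(maxwalltime_minutes) * 60

    # Send TERM a safety margin before the batch system walltime is reached.
    term_after = total_secs - margin_secs

    # Assumed policy: if the walltime is not longer than the margin,
    # add no timeout flags rather than pass a zero or negative value.
    if term_after <= 0:
        return []

    # -k: seconds until TERM; -K: seconds to wait after TERM before KILL.
    return ["-k", str(term_after), "-K", str(kill_delay_secs)]


if __name__ == "__main__":
    args = kickstart_timeout_args(60)
    print(" ".join(["pegasus-kickstart"] + args))
```

With a 60 minute walltime this yields a TERM after 3300 seconds and a KILL 60 seconds later, which leaves roughly 4 minutes for kickstart to write its record before the batch system limit.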

Assignee: Karan Vahi
Reporter: Mats Rynge

Created: 05/Sep/24 5:08 PM
Updated: 11/Sep/24 9:46 PM
Resolved: 11/Sep/24 9:46 PM
Archived: 14/Dec/24 10:43 PM
