Details
Sub-task
Resolution: Fixed
Major
master
None
Description
Pegasus has a notion of a checkpoint file that users can designate for their jobs, for example:
<job id="j1" namespace="pegasus" name="checkpoint" version="4.0">
  <argument>-o <file name="f.b1"/> -o <file name="f.b2"/></argument>
  <uses name="f.a" link="input"/>
  <uses name="f.b1" link="output" transfer="true" register="true"/>
  <uses name="f.b2" link="output" transfer="true" register="true"/>
  <uses name="test.checkpoint" link="checkpoint" transfer="true" register="true"/>
</job>
The semantics of this file are that, whenever the job is retried, the most recently updated copy of the checkpoint file is made available to it.
When the application code succeeds, it is expected to delete the checkpoint file.
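The application side of this contract could look roughly like the sketch below. It is only an illustration under stated assumptions: the JSON checkpoint format, the `run` function, and the `fail_at` parameter are hypothetical and not part of Pegasus; only the file name `test.checkpoint` comes from the job description above.

```python
import json
import os

# File name taken from the <uses ... link="checkpoint"> entry above.
CHECKPOINT = "test.checkpoint"


def run(total_steps, checkpoint=CHECKPOINT, fail_at=None):
    """Sum the step indices 0..total_steps-1, resuming from a checkpoint if present.

    `fail_at` is a hypothetical knob that simulates a crash at a given step.
    """
    start, acc = 0, 0
    if os.path.exists(checkpoint):
        # This is a retry: continue from the last saved state.
        with open(checkpoint) as fh:
            state = json.load(fh)
        start, acc = state["next_step"], state["acc"]

    for step in range(start, total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        acc += step
        # Write to a temporary file and rename it, so a crash mid-write
        # does not leave a truncated checkpoint behind.
        tmp = checkpoint + ".tmp"
        with open(tmp, "w") as fh:
            json.dump({"next_step": step + 1, "acc": acc}, fh)
        os.replace(tmp, checkpoint)

    # On success the application removes its own checkpoint file.
    if os.path.exists(checkpoint):
        os.remove(checkpoint)
    return acc
```

If a first attempt fails partway through, the checkpoint stays on disk and a retry resumes from the saved step; once the run completes, the file is gone.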