One way to do this would be to create pegasus-hold and pegasus-release scripts to allow users to hold and release workflows. They would be wrappers around condor_hold and condor_release that know how to identify the DAGMan job(s) associated with a Pegasus workflow. They may need extra logic to handle hierarchical workflows.
SCEC mentioned this feature in relation to the Ensemble Manager. They have cases where they would like to hold their workflows so that the workflows stop failing repeatedly while they resolve an issue with their infrastructure.
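A rough sketch of what the shared core of the proposed pegasus-hold and pegasus-release wrappers might look like, in Python. It only builds and runs the condor_hold or condor_release command line for a list of DAGMan cluster IDs. The function names `build_command` and `run` are hypothetical, and the step that maps a Pegasus workflow (including any hierarchical sub-workflows) to its DAGMan job IDs is deliberately left out, since how to do that is the open question here.

```python
import subprocess

# Maps the wrapper action to the HTCondor tool it would call.
ACTIONS = {"hold": "condor_hold", "release": "condor_release"}


def build_command(action, cluster_ids):
    """Return the argv list for holding or releasing the given DAGMan jobs.

    cluster_ids is assumed to be supplied by a separate lookup step
    (hypothetical, not shown) that finds the DAGMan job(s) of a workflow.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not cluster_ids:
        raise ValueError("no DAGMan job IDs given")
    return [ACTIONS[action]] + [str(c) for c in cluster_ids]


def run(action, cluster_ids):
    """Invoke the HTCondor tool; raises CalledProcessError if it fails."""
    subprocess.run(build_command(action, cluster_ids), check=True)
```

For example, `build_command("hold", [1234])` yields `["condor_hold", "1234"]`. Keeping command construction separate from execution would make the wrappers easy to test without a running HTCondor pool.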
- blocks PM-876 Add support for 'halt' to ensemble manager (Open)