Pegasus / PM-927

pegasus-exitcode ignores errors when it gets "-r 0"




      Vickie had a problem with pegasus-exitcode today. She had a lot of jobs that should have been failures, but weren't marked as such by pegasus-exitcode. They had output looking like this:

      insufficient allocation - please contact your PI

       + --------------------------------------------------------------------------
       + Job name: STDIN
       + Job Id: 9230114.hopque01
       + System: hopper
       + Queued Time: Fri May 15 20:30:06 2015
       + Start Time: Sat May 16 11:12:19 2015
       + Completion Time: Sat May 16 11:12:19 2015
       + User: vlynch
       + MOM Host: nid05416
       + Queue: reg_long
       + Req. Resources: mppnodect=34,mppnppn=24,mppwidth=800,walltime=96:00:00
       + Used Resources:
       + Acct String: m1503
       + PBS_O_WORKDIR: /scratch/scratchdirs/vlynch/lynchve/pegasus/refinement/run0001
       + Submit Args:
       + --------------------------------------------------------------------------

      My first thought was that this should have been a GRAM failure, but I will have to follow up with NERSC about that to see where that error message is coming from.

      The other issue is that this job (and similar jobs) was configured with a success message of "End of program", and that message wasn't present in the output, so the job should have been marked as failed. The postscript command looked like this:

      /usr/bin/pegasus-exitcode -s End+of+program -r $RETURN /ccg/home/lynchve/SNS-Nanodiamond/8ND300Kscan2/submit/lynchve/pegasus/refinement/run0001/namd_ID0000003.out

      The problem is the "-r $RETURN". I assume that is added whenever Kickstart is not used. Is that correct?

      The question is: what should we do if pegasus-exitcode gets "-r 0"? Currently it ignores the other tests, including the success message check, if it gets "-r 0". Maybe we should modify it so that it only ignores the invocation record tests if it gets "-r"? Or maybe it shouldn't ignore anything, and we should have a different flag for "no invocation record(s) expected"?
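To make the first option concrete, here is a rough sketch of what "only skip the invocation record tests" could look like. It is not the actual pegasus-exitcode code: the function name `check_job` and its parameters are made up for illustration, and it assumes the success and failure messages have already been decoded into plain strings.

```python
def check_job(exit_status, stdout_text, success_messages=(), failure_messages=()):
    """Hypothetical check for a job that has no invocation record.

    Returns None if the job looks successful, otherwise a string
    describing why it should be treated as failed.
    """
    # The exit status passed via "-r" is still checked first.
    if exit_status != 0:
        return "non-zero exit status: %d" % exit_status

    # A zero exit status does NOT short-circuit the message checks.
    for msg in failure_messages:
        if msg in stdout_text:
            return "failure message found: %r" % msg

    for msg in success_messages:
        if msg not in stdout_text:
            return "success message not found: %r" % msg

    # Only the invocation record tests are skipped in this path.
    return None


if __name__ == "__main__":
    output = "insufficient allocation - please contact your PI\n"
    # Exit status 0, but the required success message is missing.
    print(check_job(0, output, success_messages=["End of program"]))
```

With this logic, the job above would be flagged as failed even with "-r 0", because "End of program" never appears in its output.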




            gideon Gideon Juve (Inactive)