pegasus-exitcode ignores errors when it gets "-r 0"


      Vickie had a problem with pegasus-exitcode today. She had a lot of jobs that should have been failures, but weren't marked as such by pegasus-exitcode. They had output looking like this:

      insufficient allocation - please contact your PI

      + --------------------------------------------------------------------------
      + Job name: STDIN
      + Job Id: 9230114.hopque01
      + System: hopper
      + Queued Time: Fri May 15 20:30:06 2015
      + Start Time: Sat May 16 11:12:19 2015
      + Completion Time: Sat May 16 11:12:19 2015
      + User: vlynch
      + MOM Host: nid05416
      + Queue: reg_long
      + Req. Resources: mppnodect=34,mppnppn=24,mppwidth=800,walltime=96:00:00
      + Used Resources:
      + Acct String: m1503
      + PBS_O_WORKDIR: /scratch/scratchdirs/vlynch/lynchve/pegasus/refinement/run0001
      + Submit Args:
      + --------------------------------------------------------------------------

      My first thought was that this should have been a GRAM failure, but I will have to follow up with NERSC about that to see where that error message is coming from.

      The other issue is that this job (and similar jobs) was configured with a success message of "End of program", and that message wasn't present in the output. The postscript command looked like this:

      /usr/bin/pegasus-exitcode -s End+of+program -r $RETURN /ccg/home/lynchve/SNS-Nanodiamond/8ND300Kscan2/submit/lynchve/pegasus/refinement/run0001/namd_ID0000003.out

      The problem is the "-r $RETURN". I assume that is added whenever Kickstart is not used. Is that correct?

      The question is: what should we do if pegasus-exitcode gets "-r 0"? Currently pegasus-exitcode skips all of its other tests when it gets "-r 0". Maybe we should modify it so that "-r" only disables the invocation record tests? Or maybe it shouldn't skip anything, and we should add a different flag meaning "no invocation record(s) expected"?
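To make the first option concrete, here is a minimal sketch of the check order it implies. This is not the actual pegasus-exitcode code: the function name `job_failed` and its parameters are hypothetical, and it only models the "-r" value and the "-s" success messages, leaving out invocation record parsing entirely.

```python
def job_failed(output, exitcode=None, success_msgs=()):
    """Return True if the job should be marked as failed.

    Hypothetical helper for illustration only; it is not part of
    pegasus-exitcode. `exitcode` stands in for the value passed via "-r",
    and `success_msgs` for the strings passed via "-s".
    """
    # A nonzero exit code from "-r" is always a failure.
    if exitcode is not None and exitcode != 0:
        return True

    # Even with "-r 0", every required success message must still
    # appear in the job output.
    for msg in success_msgs:
        if msg not in output:
            return True

    return False


# The output from this report: exit code 0, but no "End of program".
bad_output = "insufficient allocation - please contact your PI\n"
print(job_failed(bad_output, exitcode=0, success_msgs=["End of program"]))
```

With this ordering, "-r 0" only means that the exit code itself was fine; the success message test still runs, so the jobs above would have been marked as failures.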

            Assignee:
            Gideon Juve (Inactive)
            Reporter:
            Gideon Juve (Inactive)
            Watchers:
            2
