Pegasus / PM-966

Incorrect (malformed) rescue DAG gets submitted if the planner dies because of a memory-related issue

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: master, 4.5.0
    • Fix Version/s: master, 4.6.0, 4.5.1
    • Component/s: Pegasus Planner
    • Labels: None

      Description

      Hi Larne,

      The dag file:

      /local/user/lppekows/pycbc-tmp.dRq2USWe1Y/work/main_ID0000001/main-0.dag

      on atlas2 is invalid. If you look at the end of the file you can see a
      partial entry, which suggests that the process writing the dag
      terminated ... as the PARENT...CHILD entries are written last, this
      file has none of them. If you look in:

      /local/user/lppekows/pycbc-tmp.dRq2USWe1Y/work/subdax_main_ID0000001.pre.log.000

      this theory seems to be confirmed: A failure message for the dag
      writing process is given (out of memory!).

      However when this runs a second time in:

      /local/user/lppekows/pycbc-tmp.dRq2USWe1Y/work/subdax_main_ID0000001.pre.log.001

      it sees the existing .dag file and just tries to submit it. That looks
      like a bug in pegasus.

      Cheers
      Ian

      On 24 July 2015 at 04:06, Larne Pekowsky <lppekows@syr.edu> wrote:
      Hi all,

      I have a workflow on atlas, started from


      /home/lppekows/projects/cbc/pycbc1.1_review/analysis8_ahope-same-harm-exact-nomax-nosubbank/962582415-963187215

      and running in

       /local/user/lppekows/pycbc-tmp.dRq2USWe1Y/work

      It looks like none of the inspiral jobs were scheduled. They’re in
      main-0.dag, but there are no inspiral*out* or inspiral*err* files; the
      workflow seems to have jumped directly to the llwadd jobs.

      Has anyone seen anything like this before?

      Thanks,

      - Larne
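
      Note: the failure mode described above (a writer dies partway through,
      and a later run finds the partial .dag and submits it) can be avoided by
      never writing directly to the final path. The sketch below is only an
      illustration in Python and is not the fix that went into the Pegasus
      planner; the function names write_dag_atomically and
      has_dependency_section are hypothetical. It assumes the DAG is produced
      as a sequence of text lines and that a complete file contains at least
      one PARENT ... CHILD line, which would not hold for a single-job
      workflow.

```python
import os
import tempfile


def write_dag_atomically(final_path, lines):
    """Write lines to a temp file in the same directory, then rename it.

    If the writer dies before the rename, final_path is left untouched,
    so a retry cannot mistake a partial file for a finished DAG.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            for line in lines:
                handle.write(line + "\n")
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def has_dependency_section(dag_path):
    """Heuristic check: True if a complete PARENT ... CHILD line exists.

    Assumption: dependency lines are written last, so their absence hints
    at a truncated file. A one-job workflow would fail this check.
    """
    with open(dag_path) as handle:
        for line in handle:
            tokens = line.split()
            # Require PARENT <name...> CHILD <name...> with a child present.
            if tokens and tokens[0] == "PARENT" and "CHILD" in tokens[2:-1]:
                return True
    return False
```

      A retry step could call has_dependency_section on an existing file and
      regenerate the DAG when the check fails, rather than submitting whatever
      is on disk.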

            People

            • Assignee: Karan Vahi (vahi)
            • Reporter: Duncan Brown (dbrown)