Pegasus / PM-1051

Error missing when nodes, cores, and ppn are all specified

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.6.0
    • Fix Version/s: 4.7.0, 4.6.1
    • Component/s: pegasus-plan
    • Labels: None

      Description

      If nodes, cores, and ppn are all specified for a job, planning fails and the planner prints only this:

      2016.02.02 15:12:29.646 EST: [DEBUG] Postscript constructed is
      /autofs/nccs-svm1_sw/redhat6/pegasus/4.6.0/bin/pegasus-exitcode
      2016.02.02 15:12:29.649 EST: [DEBUG] Written Submit file :
      chmod_acme-run_ID0000002_0.sub
      2016.02.02 15:12:29.649 EST: [DEBUG] Applying priority of 800 to
      chmod_acme-setup_ID0000001_0
      2016.02.02 15:12:29.652 EST: [DEBUG] Trying to get TCEntries for
      pegasus::kickstart on resource local-pbs-titan of type INSTALLED
      2016.02.02 15:12:29.652 EST: [DEBUG] Postscript constructed is
      /autofs/nccs-svm1_sw/redhat6/pegasus/4.6.0/bin/pegasus-exitcode
      2016.02.02 15:12:29.654 EST: [DEBUG] Written Submit file :
      chmod_acme-setup_ID0000001_0.sub
      2016.02.02 15:12:29.654 EST: [DEBUG] Applying priority of 800 to
      chmod_acme-output_ID0000003_0
      2016.02.02 15:12:29.655 EST: [DEBUG] Trying to get TCEntries for
      pegasus::kickstart on resource local-pbs-titan of type INSTALLED
      2016.02.02 15:12:29.655 EST: [DEBUG] Postscript constructed is
      /autofs/nccs-svm1_sw/redhat6/pegasus/4.6.0/bin/pegasus-exitcode
      2016.02.02 15:12:29.656 EST: [DEBUG] Written Submit file :
      chmod_acme-output_ID0000003_0.sub
      2016.02.02 15:12:29.656 EST: [DEBUG] Applying priority of 30 to
      acme-setup_ID0000001
      2016.02.02 15:12:29.658 EST: [DEBUG] Trying to get TCEntries for
      pegasus::kickstart on resource local-pbs-titan of type INSTALLED
      2016.02.02 15:12:29.658 EST: [DEBUG] Postscript constructed is
      /autofs/nccs-svm1_sw/redhat6/pegasus/4.6.0/bin/pegasus-exitcode
      2016.02.02 15:12:29.658 EST: [INFO] event.pegasus.code.generation dax.id
      acme-20160202T180009Z_0 (0.059 seconds) - FINISHED
      2016.02.02 15:12:29.659 EST: [FATAL ERROR] Unable to generate code
      2016.02.02 15:12:29.669 EST: [DEBUG] Sending Planner Metrics to [1 of 1]
      http://metrics.pegasus.isi.edu/metrics
      2016.02.02 15:12:30.097 EST: [DEBUG] Metrics succesfully sent to the
      server
      2016.02.02 15:12:30.098 EST: [DEBUG] Exiting with non-zero exit-code 1
      2016.02.02 15:12:30.098 EST: [INFO] event.pegasus.code.generation dax.id
      acme-20160202T180009Z_0 (0.503 seconds) - FINISHED

      Note that the cause of the RuntimeException is not printed, even with multiple -v flags.

      Here is the actual exception from the metrics server:

      java.lang.RuntimeException: Unable to generate code
      at edu.isi.pegasus.planner.client.CPlanner.executeCommand(CPlanner.java:680)
      at edu.isi.pegasus.planner.client.CPlanner.executeCommand(CPlanner.java:365)
      at edu.isi.pegasus.planner.client.CPlanner.main(CPlanner.java:245)
      Caused by: edu.isi.pegasus.planner.code.generator.condor.CondorStyleException: Invalid combination of cores nodes ppn (1,1,1,) for job acme-setup_ID0000001
      at edu.isi.pegasus.planner.code.generator.condor.style.GLite.handleResourceRequirements(GLite.java:594)
      at edu.isi.pegasus.planner.code.generator.condor.style.GLite.getCERequirementsForJob(GLite.java:339)
      at edu.isi.pegasus.planner.code.generator.condor.style.GLite.apply(GLite.java:239)
      at edu.isi.pegasus.planner.code.generator.condor.CondorGenerator.applyStyle(CondorGenerator.java:1790)
      at edu.isi.pegasus.planner.code.generator.condor.CondorGenerator.generateCode(CondorGenerator.java:679)
      at edu.isi.pegasus.planner.code.generator.condor.CondorGenerator.generateCode(CondorGenerator.java:513)
      at edu.isi.pegasus.planner.client.CPlanner.executeCommand(CPlanner.java:677)
      ... 2 more
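
One way the planner could surface the missing cause is to walk the exception chain and include every link in the fatal log message. The following is only a minimal sketch of that idea: the class `CauseChain` and its `describe` method are hypothetical names introduced for illustration, not part of Pegasus.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper, not part of Pegasus: flattens an exception and
// all of its causes into a single loggable string.
final class CauseChain {

    static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        // Identity-based set guards against a cyclic cause chain.
        Set<Throwable> seen =
            Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (sb.length() > 0) {
                sb.append("\n  Caused by: ");
            }
            sb.append(cur.getClass().getName()).append(": ").append(cur.getMessage());
        }
        return sb.toString();
    }
}
```

Logging `CauseChain.describe(e)` at the point where the "Unable to generate code" error is reported would put the CondorStyleException message in the planner output instead of leaving it visible only on the metrics server.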

      The error should actually be something like:

      "Only two of (nodes, cores, ppn) should be specified for job X"

      Moreover, if the values of nodes, cores, and ppn satisfy nodes * ppn = cores (for example, 1,1,1), specifying all three should not be an error at all, at most a warning. It should be an error only when the values violate that equation, for example nodes = 2, ppn = 2, cores = 2.

      Maybe the error should be something like:

      "The values of (nodes, ppn, cores) for job X (2,2,2) do not satisfy cores = nodes * ppn. Please specify only two of (nodes, cores, ppn)."
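
The proposed behavior can be sketched as a small consistency check. This is an illustration under assumptions, not the actual fix in GLite.java: the class `ResourceSpecCheck`, the `Result` enum, and the use of `null` for an unspecified value are all hypothetical.

```java
// Hypothetical helper, not part of Pegasus: sketches the check proposed above.
final class ResourceSpecCheck {

    /** OK: nothing to cross-check. REDUNDANT: all three given but consistent. */
    enum Result { OK, REDUNDANT }

    // A null argument stands for "not specified" (an assumption of this sketch).
    static Result check(String jobId, Integer nodes, Integer ppn, Integer cores) {
        // With at most two values given there is nothing to cross-check.
        if (nodes == null || ppn == null || cores == null) {
            return Result.OK;
        }
        // Multiply as long so that nodes * ppn cannot overflow int.
        long expected = nodes.longValue() * ppn.longValue();
        if (expected != cores.longValue()) {
            throw new IllegalArgumentException(String.format(
                "The values of (nodes, ppn, cores) for job %s (%d,%d,%d) do not satisfy "
                + "cores = nodes * ppn. Please specify only two of (nodes, cores, ppn).",
                jobId, nodes, ppn, cores));
        }
        // Consistent but over-specified: the caller may log a warning.
        return Result.REDUNDANT;
    }
}
```

With this sketch, `check("acme-setup_ID0000001", 1, 1, 1)` returns `REDUNDANT`, so the caller can warn and continue, while `check("X", 2, 2, 2)` throws with the message suggested above.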

            People

            • Assignee: Karan Vahi (vahi)
            • Reporter: Gideon Juve (gideon, inactive)
            • Watchers: 2
