turn off concurrency limits by default

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 4.6.0, 4.5.3
    • Affects Version/s: master, 4.5.2
    • Component/s: Pegasus Planner
    • Labels: None
    • Environment:
      HTCondor with partitionable slots

      For the 4.5 release, we started associating concurrency limits with jobs by default. However, this seems to have a side effect when there are partitionable slots.
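
      For context, these limits appear as an ordinary concurrency_limits line in the Condor submit files that Pegasus generates, and the fix for this issue makes them opt-in rather than on by default. A minimal sketch of opting back in, assuming the boolean property name pegasus.condor.concurrency.limits and an illustrative limit name:

      # properties file: re-enable the default concurrency limits
      # (property name is an assumption about the post-fix behavior)
      pegasus.condor.concurrency.limits = true

      # with it enabled, a generated .sub file carries a line such as
      # (the limit name below is illustrative)
      concurrency_limits = peg.transfer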

      Email from Mats below:
      Pegasus has recently started to add concurrency limits to certain jobs by default. The idea was to always label our jobs, so that a user could set the limits later if they felt they needed to control the jobs. A side effect seems to be that scheduling on partitionable slots has slowed down. We did a simple test on a single machine with 15 jobs:

      universe = vanilla
      requirements = Machine == "workflow.isi.edu"
      concurrency_limits = peg.foo
      executable = test.sh
      output = outputs/$(Cluster).$(Process).out
      error = outputs/$(Cluster).$(Process).err
      log = outputs/$(Cluster).$(Process).log
      should_transfer_files = YES
      when_to_transfer_output = ON_EXIT
      Notification = never
      queue 15

      With the concurrency limit, we get one job started per negotiation cycle, which means ~15 minutes for all of the jobs to start. Commenting out the concurrency line makes all 15 jobs start almost instantaneously. Is this expected behavior? Is there something wrong in our configuration?
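
      One way to observe the per-cycle behavior is with standard HTCondor commands; the 60-second value shown is only the stock NEGOTIATOR_INTERVAL default, not a value taken from our configuration:

      # submit the test (the submit file name is illustrative)
      $ condor_submit test.sub

      # seconds between negotiation cycles; 60s x 15 jobs ~= 15 minutes
      $ condor_config_val NEGOTIATOR_INTERVAL
      60

      # re-running this shows roughly one more job running per cycle
      $ condor_q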

      We have no limits configured for peg.foo, and CONCURRENCY_LIMIT_DEFAULT is left at its default:

      $ condor_config_val CONCURRENCY_LIMIT_DEFAULT
      2308032
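
      For reference, an actual cap is set in the negotiator's configuration with a <NAME>_LIMIT macro, and HTCondor also supports per-group defaults for dotted limit names; the values below are illustrative, and CONCURRENCY_LIMIT_DEFAULT_PEG is our reading of the group-default form for peg.* limits:

      # negotiator configuration; takes effect after a condor_reconfig
      # cap for a specific (undotted) limit name
      XSW_LIMIT = 2
      # default cap for any limit in the peg.* group
      CONCURRENCY_LIMIT_DEFAULT_PEG = 20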

      The HTCondor version is 8.2.9. I have attached our config dump.

      Thanks,

            Assignee: Karan Vahi (vahi)
            Reporter: Mats Rynge (rynge)
            Watchers: 2
