Support for SeqExec as a GridStart implementation

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Major
    • 3.0
    • Affects Version/s: None
    • Component/s: Pegasus Planner
    • None
    • Environment:
      Amazon EC2 cloud, using S3 to store data

      In a cloud environment, where there is no shared filesystem, we rely on staging data to S3 storage on the cloud and on enabling worker node execution in Pegasus.
      This results in SLS files being created in the submit directory, which are then staged to the S3 storage as part of the first-level staging.
      However, this adds to the data transfer time relative to how long the workflow executes.

      Since the S3 transfer implementation uses seqexec to execute multiple s3 commands in one job, an optimization is possible: the contents of all SLS files for a job can be coalesced into a single seqexec input file.
      Condor will then transfer this seqexec input file to the node when running the job.

      To achieve this, we require a new GridStart implementation in Pegasus that uses seqexec to launch jobs.
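
The coalescing step described above could look roughly like the following. This is only a minimal sketch, not the Pegasus implementation: the ticket does not specify the SLS file format, so the code assumes each SLS file is plain text with one entry per line, and that blank lines and lines starting with `#` can be skipped. The function name `coalesce_sls_files` and all file names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def coalesce_sls_files(sls_paths: Iterable[Path], output_path: Path) -> List[str]:
    """Merge the entries of several SLS files into one seqexec-style input file.

    Assumptions (not taken from the ticket): each non-blank line that does not
    start with '#' is one independent entry, and entries keep the order of the
    files they came from.
    """
    merged: List[str] = []
    for path in sls_paths:
        for line in path.read_text().splitlines():
            stripped = line.strip()
            # Skip blank lines and comment lines (assumed convention).
            if not stripped or stripped.startswith("#"):
                continue
            merged.append(stripped)

    # Write one entry per line; leave the file empty when nothing was merged.
    output_path.write_text("\n".join(merged) + ("\n" if merged else ""))
    return merged
```

With a single merged file per job, only one extra input file has to accompany the job, instead of one staged SLS file per transfer step.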

            Assignee:
            Karan Vahi
            Reporter:
            Karan Vahi
            Archiver:
            Rajiv Mayani

              Created:
              Updated:
              Resolved:
              Archived: