  1. Pegasus
  2. PM-1384

.sif Singularity images (naming issue?)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: master, 4.9.2
    • Fix Version/s: master, 4.9.2
    • Component/s: None
    • Labels:
      None

      Description

      I tried to submit a small Pegasus workflow to OSG with the Singularity image I created earlier, but I encountered some errors.
      I believe I am missing something, so I would appreciate it if you could give me some hints when you have time.


      1.
      The error I am getting is:
      --------------Job stderr file - 00/00/ex_sra_run_ID0000001.err.001--------------

      2019-08-01 01:17:52: PegasusLite: version 4.9.1
      2019-08-01 01:17:52: Executing on host hcc-5139558.0-red-c7107.unl.edu OSG_SITE_NAME=Red GLIDEIN_Site=Nebraska GLIDEIN_ResourceName=Nebraska

      ########################[Pegasus Lite] Setting up workdir ########################
      2019-08-01 01:17:52: Not creating a new work directory as it is already set to /var/lib/condor/execute/dir_307126/glide_AwvrJV/execute/dir_9774

      ##############[Pegasus Lite] Figuring out the worker package to use ##############
      2019-08-01 01:17:52: The job contained a Pegasus worker package
      2019-08-01 01:17:52: Warning: worker package pegasus-worker-4.9.1-x86_64_rhel_6.tar.gz does not seem to match the system x86_64_rhel_7
      2019-08-01 01:17:52: Using /cvmfs/oasis.opensciencegrid.org/osg/projects/pegasus/worker/4.9.1/x86_64_rhel_7 as worker package

      ########[Pegasus Lite] Writing out script to launch user task in container ########
      2019-08-01 01:17:52: Copied credential $X509_USER_PROXY to /var/lib/condor/execute/dir_307126/glide_AwvrJV/execute/dir_9774/myproxy
      2019-08-01 01:17:52: Set $X509_USER_PROXY to /scratch/myproxy (for inside the container)
      2019-08-01 01:17:52: container file is salmonella_ice
      pegasus-lite-common.sh: line 342: docker: command not found
      2019-08-01 01:17:52: Unable to load image from salmonella_ice
      2019-08-01 01:17:52: Last command exited with 1
      PegasusLite: exitcode 1

      In tc.txt I have:
      cont salmonella_ice {
      # type "singularity"
          image "https://workflow.isi.edu/scratch/rynge/ffh-workflow_latest.sif"
      }

      tr ex_sra_run {
          site condor_pool {
              type "INSTALLED"
              container "salmonella_ice"
              pfn "file:///opt/anaconda/bin/fastq-dump"
          }
      }

      And the sites are:
          <site handle="local" arch="x86_64" os="LINUX">
              <directory type="shared-scratch" path="${PWD}/scratch">
                  <file-server operation="all" url="file://${PWD}/scratch"/>
              </directory>
              <directory type="local-storage" path="${PWD}/outputs">
                  <file-server operation="all" url="file://${PWD}/outputs"/>
              </directory>
          </site>

          <site handle="local-hcc" arch="x86_64" os="LINUX">
              <directory type="shared-scratch" path="${PWD}/out">
                  <file-server operation="all" url="file://${PWD}/out"/>
              </directory>
              <profile namespace="pegasus" key="style">glite</profile>
              <profile namespace="condor" key="grid_resource">batch slurm</profile>
              <profile namespace="pegasus" key="queue">batch,tmp_anvil,devel</profile>
              <profile namespace="env" key="PEGASUS_HOME">/usr</profile>
              <profile namespace="env" key="PATH">/usr/bin:/bin:/sbin/</profile>
              <profile namespace="condor" key="request_memory"> ifthenelse(isundefined(DAGNodeRetry) || DAGNodeRetry == 2000, 4000, 6000) </profile>
          </site>

          <site handle="condor_pool" arch="x86_64" os="LINUX">
              <profile namespace="condor" key="requirements">HasSingularity == True</profile>
              <profile namespace="pegasus" key="style" >condor</profile>
              <profile namespace="condor" key="universe" >vanilla</profile>
              <profile namespace="condor" key="request_memory" >2 GB</profile>
              <profile namespace="condor" key="request_disk" >5 GB</profile>
          </site>

      Since the error is "pegasus-lite-common.sh: line 342: docker: command not found", is there something additional I need to specify in the requirements section?
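The log above shows `docker` being invoked while the `type` line in the `cont` block is commented out. One plausible reading, which is an assumption and not confirmed anywhere in this report, is that the missing type sent the job down the Docker code path. The sketch below is a hypothetical lint helper (`container_types` is not part of Pegasus) that flags `cont` blocks in a text catalog with no active `type` line.

```python
import re

# Hypothetical lint helper, not part of Pegasus. It only understands the
# simple `cont NAME { ... }` layout quoted in this report.
CONT_BLOCK = re.compile(r'\bcont\s+(\S+)\s*\{(.*?)\}', re.DOTALL)
TYPE_LINE = re.compile(r'^\s*type\s+"([^"]+)"', re.MULTILINE)


def container_types(tc_text):
    """Map each container name to its declared type, or None if absent."""
    result = {}
    for name, body in CONT_BLOCK.findall(tc_text):
        # Drop comment lines so `# type "singularity"` does not count.
        active = "\n".join(
            line for line in body.splitlines()
            if not line.lstrip().startswith("#")
        )
        match = TYPE_LINE.search(active)
        result[name] = match.group(1) if match else None
    return result


TC_TEXT = '''
cont salmonella_ice {
# type "singularity"
    image "https://workflow.isi.edu/scratch/rynge/ffh-workflow_latest.sif"
}
'''

# A value of None means the block declares no active container type.
print(container_types(TC_TEXT))
```

Running a check like this before planning would at least make a commented-out `type` visible; whether that is the actual cause of the `docker` call is left open here.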

      2.
      In tc.txt I initially had:
      cont salmonella_ice {
          type "singularity"
          image "docker://npavlovikj/ffh-workflow:latest"
      }
      However, I got the error "Unable to pull docker://npavlovikj/ffh-workflow:latest: While searching for mksquashfs: exec: "mksquashfs": executable file not found in $PATH" for the staging part. We do have "mksquashfs" in "/sbin/" on both the login and worker nodes, and I tried adding that directory to PATH in dax.py and sites.xml, but I couldn't override the PATH variable shown by the workflow, which only has "/usr/bin:/bin" in it.
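To illustrate why the lookup fails, here is a minimal sketch using Python's standard `shutil.which`, which resolves a command against an explicit search path. The two path strings mirror the ones described above; `find_tool` is a hypothetical helper name, and the result depends on where `mksquashfs` is installed on the machine running it.

```python
import shutil

# PATH as reported for the staging job, and the same PATH with /sbin added.
JOB_PATH = "/usr/bin:/bin"
EXTENDED_PATH = JOB_PATH + ":/sbin"


def find_tool(name, search_path):
    """Return the full path of `name` if found on `search_path`, else None."""
    return shutil.which(name, path=search_path)


for label, search_path in (("job PATH", JOB_PATH), ("with /sbin", EXTENDED_PATH)):
    # Prints None when the tool is not reachable through that search path.
    print(label, "->", find_tool("mksquashfs", search_path))
```

On a node where `mksquashfs` lives only in `/sbin`, the first lookup would be expected to return `None` and the second a full path, which matches the "executable file not found in $PATH" message.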

      3.
      Because the above didn't work, and I cannot push to "shub" in a straightforward manner, I decided to use the link you created for me last time, "https://workflow.isi.edu/scratch/rynge/ffh-workflow_latest.sif". When I use a URL, I cannot set the type to either "docker" or "singularity", because it complains about the .sif extension (if I leave the extension off, it tries to download "https://workflow.isi.edu/scratch/rynge/ffh-workflow_latest", which doesn't exist).
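The behaviour described above is consistent with an extension check on the image file name. The following sketch shows how such a check could reject a `.sif` URL; it is not Pegasus's actual code, and both the helper names and the `ASSUMED_SUFFIXES` list are illustrative assumptions.

```python
import posixpath
from urllib.parse import urlparse

# Illustrative list only; the suffixes Pegasus really accepts are not
# stated in this report.
ASSUMED_SUFFIXES = (".img", ".simg", ".tar", ".tar.gz", ".tar.bz2")

IMAGE_URL = "https://workflow.isi.edu/scratch/rynge/ffh-workflow_latest.sif"


def image_basename(url):
    """Return the last path component of an image URL."""
    return posixpath.basename(urlparse(url).path)


def has_known_suffix(url, suffixes):
    """Check whether the image file name ends with one of the given suffixes."""
    return image_basename(url).endswith(tuple(suffixes))


# With the assumed list, the .sif name is rejected ...
print(has_known_suffix(IMAGE_URL, ASSUMED_SUFFIXES))              # False
# ... and accepted once ".sif" is part of the list.
print(has_known_suffix(IMAGE_URL, ASSUMED_SUFFIXES + (".sif",)))  # True
```

If the real check works along these lines, the fix would be a matter of recognizing `.sif` as a valid Singularity image suffix.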


      I tried the "Population Modeling using Containers" tutorial example you provided, using Singularity and OSG, and that example worked fine. Therefore, I wonder whether the error I get is caused by the type of source I use for the container. If uploading the image to the CVMFS Singularity repository is easier and will fix this, then I can do that. The image I have now is not the final one, but as long as the OSG image is automatically updated when I modify mine, and I don't need to bother anyone to do that for me, I am OK with doing that.

            People

            • Assignee:
              vahi Karan Vahi
              Reporter:
              rynge Mats Rynge