pegasus lite jobs fail at CIT if there is a lost+found dir in the condor scratch dir



    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: master, 5.1.0, 5.0.8
    • Affects Version/s: master, 5.0.7
    • Component/s: Pegasus Lite
    • Labels: None

      I am seeing failures that I think are caused by a combination of an issue in Pegasus and configuration changes on CIT ... I think these failures would happen on older Pegasus versions as well (so we may have issues if ldas-osg works the same way as the ldas-pcdev4 testing submit machine).

      When the bash function `pegasus_lite_setup_work_dir` runs (which it does for all PyCBC jobs), all jobs fail. The failure is in the bash command:

                  # PM-968 if provided, copy lof files from the HTCondor iwd to the PegasusLite work dir
                  find $pegasus_lite_start_dir -name *.lof -exec cp {} $pegasus_lite_work_dir/ \; >/dev/null 2>&1

      which fails with "find: ‘/var/lib/condor/execute/dir_711539/lost+found’: Permission denied". The failure causes the job to stop and fail (even though I think it could be ignored). Some thoughts:

      • On the Pegasus end, I think you should only be searching in $pegasus_lite_start_dir itself, not in its subdirectories, for these files, so changing the command to this:

              find $pegasus_lite_start_dir -maxdepth 1 -name *.lof -exec cp {} $pegasus_lite_work_dir/ \; >/dev/null 2>&1

      fixes the issue for me (noting the same command appears also for *.meta two lines later, and needs to be changed there as well). If this change makes sense, could we get it into the 5.0.8 release?
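
      For anyone who wants to check the behaviour outside of a real job, here is a minimal sketch of the proposed change. It is an illustration only, not the actual PegasusLite code: the two temporary directories stand in for the HTCondor scratch dir and the PegasusLite work dir, the file names (a.lof, b.meta, nested.lof) are made up, and the unreadable lost+found directory is simulated with chmod. It also quotes the glob patterns, which the original command does not do. -maxdepth is a GNU/BSD find extension rather than POSIX.

```shell
#!/bin/bash
# Illustrative reproduction only: temporary directories stand in for the
# HTCondor scratch dir and the PegasusLite work dir.
pegasus_lite_start_dir=$(mktemp -d)
pegasus_lite_work_dir=$(mktemp -d)

# Top-level files that should be copied (hypothetical names).
touch "$pegasus_lite_start_dir/a.lof" "$pegasus_lite_start_dir/b.meta"

# An unreadable subdirectory, mimicking lost+found on the execute node.
mkdir "$pegasus_lite_start_dir/lost+found"
touch "$pegasus_lite_start_dir/lost+found/nested.lof"
chmod 000 "$pegasus_lite_start_dir/lost+found"

# -maxdepth 1 keeps find from descending into subdirectories; quoting the
# pattern keeps the shell from expanding it before find sees it.
find "$pegasus_lite_start_dir" -maxdepth 1 -name '*.lof' \
    -exec cp {} "$pegasus_lite_work_dir/" \; >/dev/null 2>&1
lof_status=$?

# The same change applied to the *.meta copy.
find "$pegasus_lite_start_dir" -maxdepth 1 -name '*.meta' \
    -exec cp {} "$pegasus_lite_work_dir/" \; >/dev/null 2>&1
meta_status=$?

# Record what ended up in the work dir before cleaning up.
copied=$(cd "$pegasus_lite_work_dir" && echo *)
echo "find exit codes: $lof_status $meta_status; copied: $copied"

# Restore permissions so the temporary directories can be removed.
chmod 700 "$pegasus_lite_start_dir/lost+found"
rm -rf "$pegasus_lite_start_dir" "$pegasus_lite_work_dir"
```

      With -maxdepth 1, find never tries to read lost+found, so both commands should exit cleanly and only the top-level files are copied.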

            Mats Rynge
            Ian Harry
            Rajiv Mayani
