  1. Pegasus
  2. PM-1055

Interleaved libinterpose records

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 4.6.0
    • Fix Version/s: 4.6.1
    • Component/s: pegasus-kickstart
    • Labels:
      None

      Description

      I usually run this from a cron job (in the minimal batch environment that cron jobs get), and last night I got a number of weird errors in my logs:

      2016-02-05T08:31:02-0800 release directory release/20160205
      2016-02-05T08:31:02-0800 list has 19 entries
      kickstart[19814]: mysystem.c[349]: Unrecognized libinterpose record: 1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[19813]: mysystem.c[349]: Unrecognized libinterpose record: b/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[19948]: mysystem.c[349]: Unrecognized libinterpose record: lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[19812]: mysystem.c[349]: Unrecognized libinterpose record: ib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20107]: mysystem.c[349]: Unrecognized libinterpose record: 5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20111]: mysystem.c[349]: Unrecognized libinterpose record: ib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20169]: mysystem.c[349]: Unrecognized libinterpose record: lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20173]: mysystem.c[349]: Unrecognized libinterpose record: 2.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20176]: mysystem.c[349]: Unrecognized libinterpose record: -5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[19811]: mysystem.c[349]: Unrecognized libinterpose record: /lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20059]: mysystem.c[349]: Unrecognized libinterpose record: /lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20186]: mysystem.c[349]: Unrecognized libinterpose record: .22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20183]: mysystem.c[349]: Unrecognized libinterpose record: -5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20181]: mysystem.c[349]: Unrecognized libinterpose record: -5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20210]: mysystem.c[349]: Unrecognized libinterpose record: -5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20213]: mysystem.c[349]: Unrecognized libinterpose record: l-5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[20105]: mysystem.c[349]: Unrecognized libinterpose record: /lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[19810]: mysystem.c[349]: Unrecognized libinterpose record: /lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      kickstart[19809]: mysystem.c[349]: Unrecognized libinterpose record: /lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
      2016-02-05 09:10:14.911 - done after 17833 seconds

      That is 19 occurrences, which matches the number of jobs passed to pegasus-cluster. I typically run pegasus-kickstart from within pegasus-cluster, using a dynamically generated cluster script in /tmp:

      /usr/local/pegasus/bin/pegasus-cluster -fn 6 -s /dev/null -R /dev/fd/2 /tmp/sldb-57OVHD

      The first line of the file in /tmp looks like this; all other lines are nearly identical except for the final basenames:

      /usr/local/pegasus/bin/pegasus-kickstart -Z -n turkey -S /hdfs/jvoeckler/Twitter/data/release/20160205/byLocation/turkey.info -o !/hdfs/jvoeckler/Twitter/data/release/20160205/log/sldb-turkey.log -e !/hdfs/jvoeckler/Twitter/data/release/20160205/log/sldb-turkey.log -l /hdfs/jvoeckler/Twitter/data/release/20160205/log/sldb-turkey.xml /opt/perl/bin/perl /hdfs/jvoeckler/Twitter/data/info2sldb.pl /hdfs/jvoeckler/Twitter/data/release/20160205/byLocation/turkey.info

      The kickstart record does, however, contain a <file ...> entry for the IO/Compress/Gzip/Constants.pm file that the warning was about.
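
To double-check counts like the 19 above, and to see how much of each path survived, the warnings can be tallied with a short script. This is only a sketch: the regular expression is derived from the log lines quoted in this report, and the function name `summarize_warnings` and the `anchor` parameter are made up for illustration, not part of Pegasus.

```python
import re
from collections import Counter

# Matches the warning lines as they appear in the log excerpt of this report.
WARNING = re.compile(
    r"kickstart\[(?P<pid>\d+)\]: mysystem\.c\[\d+\]: "
    r"Unrecognized libinterpose record: (?P<record>.*)$"
)


def summarize_warnings(lines, anchor="site_perl/"):
    """Return (pids, fragments) for the libinterpose warnings in `lines`.

    pids      -- set of kickstart PIDs that emitted a warning
    fragments -- Counter of whatever precedes the first `anchor` in each
                 record, i.e. the surviving tail of the truncated path prefix
    The default `anchor` is specific to this log (an assumption).
    """
    pids = set()
    fragments = Counter()
    for line in lines:
        match = WARNING.search(line)
        if match is None:
            continue  # not a libinterpose warning
        pids.add(int(match.group("pid")))
        record = match.group("record")
        prefix, sep, _ = record.partition(anchor)
        # If the anchor is missing, keep the whole record as the fragment.
        fragments[prefix if sep else record] += 1
    return pids, fragments
```

Passing an open log file (`summarize_warnings(open(path))`, with `path` being wherever the cron output is kept) yields the distinct PIDs and a histogram of fragment lengths, which shows at a glance that each record lost a different number of leading bytes.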

      Since the kickstart record no longer contains the environment by default, I cannot tell whether PEGASUS_HOME was actually set and seen. However, the driver script ensures that PEGASUS_HOME is set: if it is unset, the script points it to the worker node installation. The variable is typically not set in my login environment, so the script's setting takes effect. The kickstart is definitely the new one, and I am exporting KICKSTART_TRACE_ALL=1.

      Running a comparable command manually on the command-line (though I have to [1] escape the kickstart ! from the shell, and [2] add a separate output location to avoid overwriting my production files), I don't see any such errors.

      Or is it clustered options again, this time in pegasus-cluster?

      The perl-5.22.1 is self-compiled.

      You may be able to test the use-case - with its spartan environment - using Unix's "at" command.
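
As an alternative to "at", a sparse environment can be approximated by launching the command with an explicitly constructed environment. The sketch below rests on assumptions: the variable set only roughly mimics what cron typically provides, the PEGASUS_HOME default is guessed from the /usr/local/pegasus/bin paths above, and the helper names `minimal_env` and `run_sparse` are hypothetical. Only KICKSTART_TRACE_ALL=1 is taken directly from this report.

```python
import subprocess


def minimal_env(pegasus_home="/usr/local/pegasus"):
    """Build a sparse, cron-like environment (all values are assumptions)."""
    return {
        "PATH": "/usr/bin:/bin",        # typical cron default, may differ per system
        "SHELL": "/bin/sh",
        "HOME": "/tmp",                 # placeholder; cron would use the user's home
        "PEGASUS_HOME": pegasus_home,   # guessed from the install paths in this report
        "KICKSTART_TRACE_ALL": "1",     # exported by the reporter's driver script
    }


def run_sparse(argv, env=None):
    """Run argv with ONLY the given environment; nothing is inherited."""
    return subprocess.run(
        argv,
        env=minimal_env() if env is None else env,
        capture_output=True,  # collect stdout and stderr for inspection
        text=True,
        check=False,          # a non-zero exit status is reported, not raised
    )
```

Calling `run_sparse([...])` with the pegasus-cluster command line from above (and a separate output location) would then show on `result.stderr` whether the "Unrecognized libinterpose record" warnings also appear outside of cron.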

            People

            • Assignee:
              gideon Gideon Juve (Inactive)
              Reporter:
              voeckler Jens Voeckler
