- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Major
- Affects Version/s: 4.6.0
- Component/s: pegasus-kickstart
I run this as a cron job (with the minimal batch environment that cron jobs get), and last night I got a number of weird errors in my logs:
2016-02-05T08:31:02-0800 release directory release/20160205
2016-02-05T08:31:02-0800 list has 19 entries
kickstart[19814]: mysystem.c[349]: Unrecognized libinterpose record: 1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[19813]: mysystem.c[349]: Unrecognized libinterpose record: b/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[19948]: mysystem.c[349]: Unrecognized libinterpose record: lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[19812]: mysystem.c[349]: Unrecognized libinterpose record: ib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20107]: mysystem.c[349]: Unrecognized libinterpose record: 5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20111]: mysystem.c[349]: Unrecognized libinterpose record: ib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20169]: mysystem.c[349]: Unrecognized libinterpose record: lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20173]: mysystem.c[349]: Unrecognized libinterpose record: 2.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20176]: mysystem.c[349]: Unrecognized libinterpose record: -5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[19811]: mysystem.c[349]: Unrecognized libinterpose record: /lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20059]: mysystem.c[349]: Unrecognized libinterpose record: /lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20186]: mysystem.c[349]: Unrecognized libinterpose record: .22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20183]: mysystem.c[349]: Unrecognized libinterpose record: -5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20181]: mysystem.c[349]: Unrecognized libinterpose record: -5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20210]: mysystem.c[349]: Unrecognized libinterpose record: -5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20213]: mysystem.c[349]: Unrecognized libinterpose record: l-5.22.1/lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[20105]: mysystem.c[349]: Unrecognized libinterpose record: /lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[19810]: mysystem.c[349]: Unrecognized libinterpose record: /lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
kickstart[19809]: mysystem.c[349]: Unrecognized libinterpose record: /lib/site_perl/5.22.1/IO/Compress/Gzip/Constants.pm' 3903 3903 0 1 0 0 0
2016-02-05 09:10:14.911 - done after 17833 seconds
That is 19 occurrences, which matches the number of jobs passed to pegasus-cluster. I typically run pegasus-kickstart from within pegasus-cluster, using a dynamically generated cluster script in /tmp:
/usr/local/pegasus/bin/pegasus-cluster -fn 6 -s /dev/null -R /dev/fd/2 /tmp/sldb-57OVHD
where the file in /tmp looks like this (first line shown; all other lines look very similar except for the final basenames):
/usr/local/pegasus/bin/pegasus-kickstart -Z -n turkey -S /hdfs/jvoeckler/Twitter/data/release/20160205/byLocation/turkey.info -o !/hdfs/jvoeckler/Twitter/data/release/20160205/log/sldb-turkey.log -e !/hdfs/jvoeckler/Twitter/data/release/20160205/log/sldb-turkey.log -l /hdfs/jvoeckler/Twitter/data/release/20160205/log/sldb-turkey.xml /opt/perl/bin/perl /hdfs/jvoeckler/Twitter/data/info2sldb.pl /hdfs/jvoeckler/Twitter/data/release/20160205/byLocation/turkey.info
The kickstart record does, however, contain a <file ...> entry for IO/Compress/Gzip/Constants.pm, the file the warning was about.
Since the kickstart record no longer contains the environment by default, I cannot tell whether PEGASUS_HOME was indeed set and seen. However, the driver script does ensure that PEGASUS_HOME is set: if it is unset, the script points it to the worker node installation. The variable is typically not set in my login environment, so the script's setting is the one that applies. The kickstart is definitely the new one. And I am exporting KICKSTART_TRACE_ALL=1.
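All 19 fragments in the warnings above appear to be tail ends of one and the same record line for Constants.pm, each cut at a different offset. A minimal sketch to check this against a saved log follows; the function name check_tails is made up for illustration, and the only assumption about the log is the "Unrecognized libinterpose record: " prefix seen above.

```shell
# Reads log lines on stdin, keeps only the kickstart warnings, and reports
# how many records there are and how many are NOT a tail of the longest one.
# check_tails is a hypothetical helper name, not part of Pegasus.
check_tails() {
    sed -n 's/^kickstart\[[0-9]*\]: .*Unrecognized libinterpose record: //p' |
    awk '
        # Remember every record and track the longest one seen so far.
        { rec[NR] = $0; if (length($0) > length(longest)) longest = $0 }
        END {
            bad = 0
            for (i = 1; i <= NR; i++) {
                n = length(rec[i])
                # Compare the record against the last n characters of the longest.
                if (substr(longest, length(longest) - n + 1) != rec[i]) bad++
            }
            printf "%d records, %d not a tail of the longest\n", NR, bad
        }'
}
```

Feeding the cron log through it (for example `check_tails < cron.log`, with cron.log standing in for wherever the output was saved) should report zero mismatches if the fragments really are differently truncated copies of one line.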
When I run a comparable command manually on the command line, I don't see any such errors (though I have to [1] escape kickstart's ! from the shell, and [2] use a separate output location to avoid overwriting my production files).
Or is it clustered options again, this time in pegasus-cluster?
The perl-5.22.1 is self-compiled.
You may be able to test the use-case - with its spartan environment - using Unix's "at" command.
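As an alternative to "at", a near-empty environment can be approximated interactively with env -i. This is only a sketch: the function name run_spartan is invented here, and the four variables and their values are assumptions about what a typical cron sets, so they should be matched to the actual crontab.

```shell
# Run a command with almost no environment, roughly as cron would.
# run_spartan is a hypothetical helper; the values below are assumptions.
run_spartan() {
    env -i \
        HOME="${HOME:-/}" \
        LOGNAME="${LOGNAME:-$(id -un)}" \
        PATH=/usr/bin:/bin \
        SHELL=/bin/sh \
        "$@"
}

# Possible use with the command from this report; leading NAME=VALUE words
# are passed through to env as extra variables, and stdin is detached:
#   run_spartan KICKSTART_TRACE_ALL=1 /usr/local/pegasus/bin/pegasus-cluster \
#       -fn 6 -s /dev/null -R /dev/fd/2 /tmp/sldb-57OVHD </dev/null
```

Because env -i drops everything not listed, anything the driver script normally exports (PEGASUS_HOME, for instance) has to be passed explicitly in the same way as KICKSTART_TRACE_ALL.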