I'm seeing a problem with monitord not exiting cleanly after the condor_dagman process has exited. This has happened a few times, so it's not a one-off. A current example is job 16715729.0 on sugwg-osg.phy.syr.edu. The dagman process has exited:
11/28/18 12:24:25 **** condor_scheduniv_exec.16715729.0 (condor_DAGMAN) pid 2122725 EXITING WITH STATUS 1
However, monitord has not exited:
[dbrown@sugwg-osg ~]$ ps wwwwaux | grep o1-analysis-3-v1_13_0-LOSC_16_V1-0.dag.dagman.out
dbrown 2251055 10.4 0.2 494072 245656 ? S 11:50 17:39 /usr/bin/python2.7 /usr/bin/pegasus-monitord -N o1-analysis-3-v1_13_0-LOSC_16_V1-0.dag.dagman.out
and so pegasus-dagman has not exited:
dbrown 2122719 0.0 0.0 199784 11668 ? Ss 09:57 0:00 /usr/bin/python2.7 /usr/bin/pegasus-dagman -p 0 -f -l . -Lockfile o1-analysis-3-v1_13_0-LOSC_16_V1-0.dag.lock -AutoRescue 1 -DoRescueFrom 0 -Dag o1-analysis-3-v1_13_0-LOSC_16_V1-0.dag -MaxPre 1 -MaxPost 20 -Suppress_notification -CsdVersion $CondorVersion: 8.6.7 Oct 29 2017 BuildID: 422776 $ -Dagman /bin/condor_dagman
I'll leave this job in the queue so Karan can investigate.
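For anyone trying to spot the same state on another job, the check above can be sketched in Python. This is only a rough sketch under assumptions: the regular expression is modeled on the single dagman.out line quoted above (other HTCondor versions may format it differently), and the helper names parse_dagman_exit and pid_alive are hypothetical, not part of Pegasus or HTCondor. It assumes a POSIX host.

```python
import os
import re

# Pattern modeled on the one dagman.out exit line quoted in this report.
# Assumption: other HTCondor versions may format this line differently.
EXIT_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \*{4} "
    r"condor_scheduniv_exec\.(?P<job>\d+\.\d+) \(condor_DAGMAN\) "
    r"pid (?P<pid>\d+) EXITING WITH STATUS (?P<status>-?\d+)\s*$"
)


def parse_dagman_exit(line):
    """Return job id, pid and exit status from an exit line, else None."""
    match = EXIT_RE.match(line)
    if match is None:
        return None
    return {
        "job": match.group("job"),
        "pid": int(match.group("pid")),
        "status": int(match.group("status")),
    }


def pid_alive(pid):
    """Report whether a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

Running parse_dagman_exit over the lines of the .dag.dagman.out file gives the condor_dagman pid and exit status. If that line is present while pid_alive still returns True for the monitord pid taken from ps, the job is in the state described here.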