- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version/s: None
- Component/s: None
monitord's memory usage has blown up on me again. This was with a checkout from November 2.
This happened for a workflow of workflows. Interestingly, I had two of the workflows running, and monitord only had problems with one of them.
Some details I collected (also see attached memory graph):
Tasks: 555 total, 1 running, 554 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.8%us, 1.5%sy, 0.0%ni, 86.0%id, 3.5%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 8132440k total, 7898640k used, 233800k free, 18132k buffers
Swap: 23826424k total, 9477152k used, 14349272k free, 209320k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1428 rynge     20   0 14.3g 5.7g 1600 D    0 73.9   7:46.58 python
$ lsof | grep "python 1428" | wc -l
133
$ ls -lh gp-0.stampede.db
-rw-r--r-- 1 rynge rynge 379M Nov 4 00:57 gp-0.stampede.db
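The per-process figures above (resident memory from top, open-file count from lsof) could also be sampled directly from /proc, which avoids depending on lsof's column spacing in the grep pattern. The following is only a rough sketch, assuming a Linux /proc filesystem; `proc_snapshot` is a hypothetical helper name, not part of monitord or Pegasus. Note that the descriptor count from /proc/<pid>/fd will usually be lower than an lsof line count, because lsof also lists entries such as the working directory and memory-mapped files.

```shell
# Hypothetical helper (not part of monitord): print resident memory and
# open file descriptor count for a given PID. Assumes Linux /proc.
proc_snapshot() {
    pid="$1"
    # VmRSS in /proc/<pid>/status is the resident set size, in kB
    rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
    # Each entry in /proc/<pid>/fd is one open file descriptor
    fd_count=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    echo "pid=$pid rss_kb=$rss_kb open_fds=$fd_count"
}

# Example: snapshot the current shell. For the process in this report,
# the argument would be 1428 instead of $$.
proc_snapshot "$$"
```

Running this in a loop with a sleep between calls would give a rough memory-over-time trace comparable to the attached graph.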