On Tue 17-04-18 16:15:07, Pavlos Parissis wrote:
> On 17/04/2018 04:02 μμ, Jan Kara wrote:
> > On Tue 17-04-18 12:39:32, Pavlos Parissis wrote:
> > > In one of our production servers where we run kernel version 4.14.32, I noticed the following:
> >
> > OK, I was looking into this for some time and couldn't find a problem in 4.14.32 code. Can you try running a kernel with CONFIG_DEBUG_SLAB and CONFIG_DEBUG_PAGEALLOC enabled to hopefully catch the problem earlier? Thanks!
>
> I can certainly do that, but I can't reproduce that specific crash. So, it may take days before we get a similar crash. Having said that, the soft lockup issue, which I mentioned in another thread, happens once per day, so I hope running a kernel with those debug settings will help you.

Yes, the hope is debug checks will trigger faster than the actual data corruption (or softlockup) you've hit.
Honza
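
Before booting the debug kernel it may be worth confirming that both options named above actually ended up enabled in the build. The following is only a minimal sketch in Python: the helper names (enabled_options, missing_debug_options) are made up for illustration and are not part of any kernel tooling, and the script only looks at the text of a .config-style file. It does not check Kconfig dependencies (for example, which slab allocator the kernel was built with).

```python
import re

# The two options named in the thread; nothing else is assumed to be required.
WANTED = ("CONFIG_DEBUG_SLAB", "CONFIG_DEBUG_PAGEALLOC")


def enabled_options(config_text):
    """Return the set of CONFIG_* names set to 'y' or 'm' in .config-style text."""
    enabled = set()
    for line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_FOO is not set" and do not match.
        match = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=(y|m)$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def missing_debug_options(config_text, wanted=WANTED):
    """Return the wanted options that are not enabled, in the order given."""
    enabled = enabled_options(config_text)
    return [name for name in wanted if name not in enabled]
```

To use it, read the config of the kernel you built (the location varies by distribution; something like /boot/config-<version> is an assumption here) and pass its contents to missing_debug_options(); an empty list means both options are set.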