Hi again,
On Fri, Jun 26, 2020 at 05:50:00PM +0100, Steve McIntyre wrote:
On Fri, Jun 26, 2020 at 04:25:59PM +0200, Jann Horn wrote:
On Fri, Jun 26, 2020 at 3:41 PM Greg KH <gregkh@linuxfoundation.org> wrote:
On Fri, Jun 26, 2020 at 12:35:58PM +0100, Steve McIntyre wrote:
...
Considering I'm running strace build tests to provoke this bug, finding the failure in a commit talking about ptrace changes does look very suspicious...!
Annoyingly, I can't reproduce this on my disparate other machines here, suggesting it's maybe(?) timing related.
Does "hard lockup" mean that the HARDLOCKUP_DETECTOR infrastructure prints a warning to dmesg? If so, can you share that warning?
I mean the machine locks hard - X stops updating, the mouse/keyboard stop responding. No pings, etc. When I reboot, there's nothing in the logs.
If you don't have any way to see console output, and you don't have a working serial console setup or such, you may want to try re-running those tests while the kernel is booted with netconsole enabled to log to a different machine over UDP (see https://www.kernel.org/doc/Documentation/networking/netconsole.txt).
ACK, will try that now for you.
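For the receiving side of netconsole, a minimal sketch of a UDP listener on the second machine could look like the following. It assumes the sender uses netconsole's default target port 6666; the function names `open_listener` and `read_messages` are made up for illustration, and the sending machine still has to be configured as described in the netconsole documentation.

```python
import socket


def open_listener(port=6666, host="0.0.0.0"):
    """Bind a UDP socket for incoming netconsole datagrams.

    6666 is assumed here because it is netconsole's default target port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_messages(sock, max_messages=None, timeout=None):
    """Yield (sender_ip, text) for each datagram received on sock.

    Stops after max_messages datagrams, or when no datagram arrives
    within timeout seconds. With both left as None it runs until
    interrupted.
    """
    sock.settimeout(timeout)
    count = 0
    while max_messages is None or count < max_messages:
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            break
        count += 1
        # Kernel messages are normally ASCII; replace anything undecodable.
        yield addr[0], data.decode("utf-8", errors="replace")
```

To use it, open the listener before starting the tests, then loop over `read_messages(open_listener())` and append each received text to a file, so that the last lines before the lockup survive on the logging machine.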
You may want to try setting the sysctl kernel.sysrq=1, then when the system has locked up, press ALT+PRINT+L (to generate stack traces for all active CPUs from NMI context), and maybe also ALT+PRINT+T and ALT+PRINT+W (to collect more information about active tasks).
Nod.
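Setting that sysctl amounts to writing 1 to /proc/sys/kernel/sysrq as root. A minimal sketch, assuming a Linux system with procfs mounted at /proc; the helper name `enable_all_sysrq` is made up for illustration:

```python
from pathlib import Path

# procfs file backing the kernel.sysrq sysctl.
SYSRQ_SYSCTL = Path("/proc/sys/kernel/sysrq")


def enable_all_sysrq(path=SYSRQ_SYSCTL):
    """Enable all SysRq functions by writing 1; needs root on a real system.

    Returns the previous setting as a string so it can be restored later.
    """
    previous = path.read_text().strip()
    path.write_text("1\n")
    return previous
```

The setting does not persist across reboots, so it has to be applied again after each lockup and restart.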
(If you share stack traces from these things with us, it would be helpful if you could run them through scripts/decode_stacktrace.sh from the kernel tree first, to add line number information.)
ACK.
Output passed through scripts/decode_stacktrace.sh attached.
Just about to try John's suggestion next.