On Mon 2018-12-10 17:45:54, Feng Tang wrote:
Hi Sergey,
- lkml.
On Fri, Dec 07, 2018 at 06:50:04PM +0900, Sergey Senozhatsky wrote:
On (12/06/18 11:58), Feng Tang wrote:
Same here, I tried on several platforms and hardly get the sysrq magic key working, though it works while system is running.
And it make me wondering if those workqueue dependent led blinking code can still really work.
Also, IMHO, if we need a panic blink method, it should better be simple and robust with only HW registers access plus delay function, as I'm not sure if the scheduling can still work.
Anyway, can I propose to make the "local_irq_enable" conditional and off by default, and add a warning.
I'm not sure what to do about this. I think that the behaviour is platform specific. For instance, arm64 keeps secondary CPUs in a busy loop while (1) cpu_relax();
Yes, it's similar to x86's handling for non-panic CPU.
(masked out) and on panic_cpu disables only SDEI (interrupts from firmware, if I got it right); so it seems that arm64 can handle IRQs after panic. And if there are platforms that handle IRQ (including sysrq) after panic, then both options - making printk a noop or keeping local irqs off - maybe can cause some problems. Or maybe not. We better ask arch people.
Yes, this is very valid concern. And after Petr and you raised it, I did some experiments with 3 x86 platforms at my hand, one Apollolake IOT device with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key all works well before panic, and fails after panic. But I did remember the PageUp/PageDown key worked on some laptop years ago. And you actually raised a good question: what do we expect for the post-panic kernel?
I am not sure why it does not work. But it would be nice if sysrq worked.
For the v4 patch, my thought is, for experienced developers to make sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline, or runtime change it by "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on" while for normal user, they can by default see the clean panic call stack either on a screen or a serial console.
I see this problematic from two reasons:
First, the blinking and especially sysrq might be handy but they are disabled by default. It is too late to do anything when the system is panicing.
Second, new parameters or configure options should be avoided whenever possible. They are easy to implement but the are less easy to use. The current amount of them is already overhelming.
I still think that calming down printk() is acceptable when it can be restored from sysrq.
I think that only few people might be interested into debugging post-panic problems. We could print a warning for them about that printk() has got disabled.
Best Regards, Petr