+ Sasha
Thanks Petr and Sergey for the reviews.
On Tue, Dec 04, 2018 at 11:20:33AM +0100, Petr Mladek wrote:
On Mon 2018-12-03 23:15:31, Andrew Morton wrote:
From: Feng Tang feng.tang@intel.com Subject: panic: Avoid the extra noise dmesg
When kernel panic happens, it will first print the panic call stack, then the ending msg like:
[ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception [ 35.749975] ------------[ cut here ]------------
The above message are very useful for debugging.
But if system is configured to not reboot on panic, say the "panic_timeout" parameter equals 0, it will likely print out many noisy message like WARN() call stack for each and every CPU except the panic one, messages like below:
WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190 Call Trace:
<IRQ> try_to_wake_up default_wake_function autoremove_wake_function __wake_up_common __wake_up_common_lock __wake_up wake_up_klogd_work_func irq_work_run_list irq_work_tick update_process_times tick_sched_timer __hrtimer_run_queues hrtimer_interrupt smp_apic_timer_interrupt apic_timer_interrupt
I guess that it is a warning about migrating tasks to an offline CPU.
My v1 patch was trying to add some hacky code into architecture code to address several WARN()s directly, but that turns out to be very hacky and involve much code for many archs.
For people working in console mode, the screen will first show the panic call stack, but immediately overridded by these noisy extra messages, which makes debugging much more difficult, as the original context gets lost on screen.
Also these noisy messages will confuse some users, as I have seen many bug reporters posted the noisy message into bugzilla, instead of the real panic call stack and context.
Removing the "local_irq_enable" will avoid the noisy message.
The justification for the removing is: when code runs to this point, it means user has chosed to not reboot, or do any special handling by using the panic notifier method, no much point in re-enabling the interrupt.
Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.c... Signed-off-by: Feng Tang feng.tang@intel.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Kees Cook keescook@chromium.org Cc: Borislav Petkov bp@suse.de Cc: Petr Mladek pmladek@suse.com Cc: Sergey Senozhatsky sergey.senozhatsky@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org
--- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg +++ a/kernel/panic.c @@ -322,7 +322,6 @@ void panic(const char *fmt, ...) } #endif pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
- local_irq_enable(); for (i = 0; ; i += PANIC_TIMER_STEP) { touch_softlockup_watchdog(); if (i >= i_next) {
Hmm, this calls panic_blink(). It seems that it depends on workqueues and the scheduler:
- led_panic_blink()
- led_trigger_event()
- led_set_brightness()
- schedule_work(set_brightness_work)
I guess that blinking might be important in some situations and on some devices. On the other hand, we are interested only into blinking from this point on.
The easiest solution seems to be to make a noop from printk(). For example, we could add a global flag:
int panic_blinking;
and add the following into vprintk_func()
/* * Do not push away real panic() message by warnings from led * blinking code. */ if (panic_blinking) return 0;
How does that sound?
This should be able to achieve the same goal.
One thing I can think of is what mentioned by Sergey that some sysrq handler may want to print out something, but it should mostly be covered by 2 other panic debug print patches, which will print out task/mem/timer/lock/ftrace info runtime on demand.
Thanks, Feng