On Mon, Sep 29, 2025 at 04:13:06PM +0206, John Ogness wrote:
On 2025-09-26, Breno Leitao <leitao@debian.org> wrote:
My concern is when printk() is called with kmemleak_lock held. Something like:
raw_spin_lock_irqsave(&kmemleak_lock, flags); -> printk()
This is an instant deadlock when netconsole is enabled, given that netconsole tries to allocate memory when flushing. It is similar to commit 47b0f6d8f0d2be ("mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock").
Yes, it is a known problem that a caller must not hold any locks that are used during console printing. Locking the serial port lock (uart_port->lock) and calling printk() also leads to deadlock if that port is registered as a serial console.
This is properly fixed by converting to the new nbcon console API, which netconsole is currently working on. But until then something like Breno is suggesting will provide a functional workaround.
Note that printk_deferred_enter/exit() require migration to be disabled. If kmemleak_lock() is not always being called in such a context, it cannot enable deferring.
One option is to enable deferring after taking the lock:
void kmemleak_lock(unsigned long *flags)
{
	/* raw_spin_lock_irqsave() stores into an unsigned long lvalue, so dereference */
	raw_spin_lock_irqsave(&kmemleak_lock, *flags);
	printk_deferred_enter();
}
printk() always defers in NMI context, so there is no risk if an NMI jumped in between locking and deferring and then called printk().
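To make the pairing concrete, here is a minimal userspace model of the lock wrapper together with an unlock counterpart. It is only a sketch: every model_* function, the *_model wrappers, and the counters are hypothetical stand-ins for the kernel primitives, not kernel code. The model leaves deferral before dropping the lock, so the exit still runs in the context where interrupts are disabled and migration cannot happen.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
static int lock_held;       /* models the raw spinlock kmemleak_lock */
static int defer_depth;     /* models the printk deferral context */
static int inline_flushes;  /* messages flushed to consoles synchronously */
static int deferred_msgs;   /* messages queued for a later flush */

static void model_lock_irqsave(unsigned long *flags)
{
	assert(!lock_held);
	*flags = 1;             /* pretend saved interrupt state */
	lock_held = 1;
}

static void model_unlock_irqrestore(unsigned long flags)
{
	assert(lock_held);
	(void)flags;
	lock_held = 0;
}

static void model_deferred_enter(void) { defer_depth++; }

static void model_deferred_exit(void)
{
	assert(defer_depth > 0);
	defer_depth--;
}

static void model_printk(const char *msg)
{
	if (defer_depth > 0) {
		deferred_msgs++;    /* stored now, consoles flushed later */
		return;
	}
	/* An inline flush may allocate memory; with the lock held this
	 * is the deadlock described above, so the model asserts on it. */
	assert(!lock_held);
	inline_flushes++;
	printf("%s\n", msg);
}

static void kmemleak_lock_model(unsigned long *flags)
{
	model_lock_irqsave(flags);
	model_deferred_enter();     /* defer only after the lock is taken */
}

static void kmemleak_unlock_model(unsigned long flags)
{
	model_deferred_exit();      /* leave deferral before dropping the lock */
	model_unlock_irqrestore(flags);
}
```

In this model a model_printk() issued between kmemleak_lock_model() and kmemleak_unlock_model() only bumps deferred_msgs, while the same call outside the critical section flushes inline.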
The hack above would guarantee that all printk() calls inside the kmemleak_lock critical section are deferred, and not executed inline.
Yes, although I think netconsole is the only console that tries to allocate memory. So if this hack is used, it should at least be wrapped by an ifdef CONFIG_NETCONSOLE.
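One detail worth keeping in mind for such a guard: netconsole can be built as a module, and in that case Kconfig defines CONFIG_NETCONSOLE_MODULE rather than CONFIG_NETCONSOLE, so a plain ifdef would not see it. The sketch below simulates that configuration in userspace. KMEMLEAK_DEFER_PRINTK, PLAIN_IFDEF_SEES_NETCONSOLE, and the helper function are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Simulate a .config with CONFIG_NETCONSOLE=m: only the _MODULE
 * symbol is defined, CONFIG_NETCONSOLE itself is not. */
#define CONFIG_NETCONSOLE_MODULE 1

/* Guard that covers both built-in (=y) and modular (=m) netconsole. */
#if defined(CONFIG_NETCONSOLE) || defined(CONFIG_NETCONSOLE_MODULE)
#define KMEMLEAK_DEFER_PRINTK 1   /* hypothetical name */
#else
#define KMEMLEAK_DEFER_PRINTK 0
#endif

/* What a plain "#ifdef CONFIG_NETCONSOLE" would conclude. */
#ifdef CONFIG_NETCONSOLE
#define PLAIN_IFDEF_SEES_NETCONSOLE 1
#else
#define PLAIN_IFDEF_SEES_NETCONSOLE 0
#endif

static int defer_depth;   /* stand-in for the printk deferral context */

/* Hypothetical helper: enter deferral only when netconsole may exist. */
static void kmemleak_maybe_defer_enter(void)
{
	if (KMEMLEAK_DEFER_PRINTK)
		defer_depth++;
}
```

Inside the kernel, IS_ENABLED(CONFIG_NETCONSOLE) expresses the same built-in-or-module check in one expression.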
Although it would be preferable if netconsole did not need to allocate memory for flushing.
Most (all?) of the allocation is in the skb, where alloc_skb() is done.
In fact, netconsole maintains a pool of 32 skbs that is used when alloc_skb() fails. And I see cases when that is exhausted (when OOM causes a lot of messages to be flushed).
If we want to move alloc_skb() out of the TX path, then we probably need a bigger (configurable?) pool of SKBs.
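As a rough illustration of that idea, the userspace sketch below keeps allocation entirely out of the transmit side: the TX path only takes buffers from a preallocated pool, and refilling happens elsewhere. This is not the netconsole code. struct buf_pool, the pool_* functions, and BUF_SIZE are hypothetical, malloc() stands in for alloc_skb(), and POOL_CAPACITY simply reuses the figure of 32 mentioned above as the value one would make configurable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_CAPACITY 32    /* the value that would become configurable */
#define BUF_SIZE      1500  /* hypothetical buffer size */

struct buf_pool {
	void *bufs[POOL_CAPACITY];
	size_t count;           /* buffers currently available */
};

/* Refill side: runs outside the TX path and is allowed to allocate.
 * Returns the number of buffers in the pool afterwards. */
static size_t pool_refill(struct buf_pool *p)
{
	while (p->count < POOL_CAPACITY) {
		void *b = malloc(BUF_SIZE);

		if (!b)
			break;          /* keep whatever we managed to get */
		p->bufs[p->count++] = b;
	}
	return p->count;
}

/* TX side: never allocates. Returns NULL when the pool is exhausted,
 * in which case the caller has to drop or postpone the message. */
static void *pool_take(struct buf_pool *p)
{
	if (p->count == 0)
		return NULL;
	return p->bufs[--p->count];
}

/* Release all buffers still held by the pool. */
static void pool_destroy(struct buf_pool *p)
{
	while (p->count)
		free(p->bufs[--p->count]);
}
```

The trade-off is visible in pool_take(): once allocation is off the TX path, exhaustion means dropped or delayed output, so the pool size decides how large a burst (for example during OOM) can be absorbed.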