Our syztester reports a lockdep WARNING [1]. kmemleak_scan_thread() invokes scan_block(), which may invoke a normal printk() to print a warning message. This can cause a deadlock in the scenario reported below:
       CPU0                    CPU1
       ----                    ----
  lock(kmemleak_lock);
                               lock(&port->lock);
                               lock(kmemleak_lock);
  lock(console_owner);
To solve this problem, switch to printk_safe mode before printing the warning message. This redirects all printk()-s to a special per-CPU buffer, which is flushed later from a safe context (irq work), so the deadlock is avoided. The proper API to use is printk_deferred_enter()/printk_deferred_exit() if we want to defer the printing [2].
This patch also fixes other similar cases that need printk deferring [3].
[1] https://lore.kernel.org/all/20250730094914.566582-1-gubowen5@huawei.com/
[2] https://lore.kernel.org/all/5ca375cd-4a20-4807-b897-68b289626550@redhat.com/
[3] https://lore.kernel.org/all/aJCir5Wh362XzLSx@arm.com/
====================
Cc: stable@vger.kernel.org # 5.10
Signed-off-by: Gu Bowen <gubowen5@huawei.com>
---
 mm/kmemleak.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 4801751cb6b6..b9cb321c1cf3 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -390,9 +390,15 @@ static struct kmemleak_object *lookup_object(unsigned long ptr, int alias)
 		else if (object->pointer == ptr || alias)
 			return object;
 		else {
+			/*
+			 * Printk deferring due to the kmemleak_lock held.
+			 * This is done to avoid deadlock.
+			 */
+			printk_deferred_enter();
 			kmemleak_warn("Found object by alias at 0x%08lx\n",
 				      ptr);
 			dump_object_info(object);
+			printk_deferred_exit();
 			break;
 		}
 	}
@@ -632,6 +638,11 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
 		else if (parent->pointer + parent->size <= ptr)
 			link = &parent->rb_node.rb_right;
 		else {
+			/*
+			 * Printk deferring due to the kmemleak_lock held.
+			 * This is done to avoid deadlock.
+			 */
+			printk_deferred_enter();
 			kmemleak_stop("Cannot insert 0x%lx into the object search tree (overlaps existing)\n",
 				      ptr);
 			/*
@@ -639,6 +650,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
 			 * be freed while the kmemleak_lock is held.
 			 */
 			dump_object_info(parent);
+			printk_deferred_exit();
 			kmem_cache_free(object_cache, object);
 			object = NULL;
 			goto out;
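As a rough illustration of the deferral idea described in the commit message, the userspace sketch below models a print call that writes into a buffer while a "deferred" section is active and only reaches the output when flushed later. All `model_*` names, the buffer size, and the `console_writes` counter are hypothetical stand-ins invented for this sketch; they are not kernel APIs, and the real printk_deferred_enter()/printk_deferred_exit() differ in detail.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define DEFER_BUF_SIZE 512

/* Hypothetical userspace stand-ins; none of these are kernel symbols. */
static char defer_buf[DEFER_BUF_SIZE];	/* models the per-CPU buffer */
static size_t defer_len;		/* bytes currently buffered */
static int defer_depth;			/* > 0 while inside a deferred section */
static int console_writes;		/* counts writes that reached the output */

static void model_deferred_enter(void)
{
	defer_depth++;
}

static void model_deferred_exit(void)
{
	assert(defer_depth > 0);	/* exit must pair with an enter */
	defer_depth--;
}

/* Models printk(): buffer while deferred, otherwise write straight out. */
static void model_printk(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	if (defer_depth > 0) {
		size_t room = DEFER_BUF_SIZE - defer_len - 1;
		int n = vsnprintf(defer_buf + defer_len,
				  DEFER_BUF_SIZE - defer_len, fmt, ap);

		/* Account only for what actually fit (output may be truncated). */
		if (n > 0)
			defer_len += (size_t)n < room ? (size_t)n : room;
	} else {
		vprintf(fmt, ap);
		console_writes++;
	}
	va_end(ap);
}

/* Models the later flush from a safe context; returns bytes flushed. */
static size_t model_flush(void)
{
	size_t flushed = defer_len;

	if (flushed) {
		fwrite(defer_buf, 1, flushed, stdout);
		console_writes++;
	}
	defer_len = 0;
	defer_buf[0] = '\0';
	return flushed;
}
```

In this model, a caller holding a lock would wrap its warning in model_deferred_enter()/model_deferred_exit() and nothing touches the output path until model_flush() runs after the lock has been dropped.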
On Wed, Aug 13, 2025 at 04:53:10PM +0800, Gu Bowen wrote:
> Our syztester reports a lockdep WARNING [1]. kmemleak_scan_thread() invokes scan_block(), which may invoke a normal printk() to print a warning message. This can cause a deadlock in the scenario reported below:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(kmemleak_lock);
>                                lock(&port->lock);
>                                lock(kmemleak_lock);
>   lock(console_owner);
>
> To solve this problem, switch to printk_safe mode before printing the warning message. This redirects all printk()-s to a special per-CPU buffer, which is flushed later from a safe context (irq work), so the deadlock is avoided. The proper API to use is printk_deferred_enter()/printk_deferred_exit() if we want to defer the printing [2].
>
> This patch also fixes other similar cases that need printk deferring [3].
>
> [1] https://lore.kernel.org/all/20250730094914.566582-1-gubowen5@huawei.com/
> [2] https://lore.kernel.org/all/5ca375cd-4a20-4807-b897-68b289626550@redhat.com/
> [3] https://lore.kernel.org/all/aJCir5Wh362XzLSx@arm.com/
> ====================
> Cc: stable@vger.kernel.org # 5.10
> Signed-off-by: Gu Bowen <gubowen5@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
On Wed, 13 Aug 2025 16:53:10 +0800 Gu Bowen <gubowen5@huawei.com> wrote:
> Our syztester reports a lockdep WARNING [1]. kmemleak_scan_thread() invokes scan_block(), which may invoke a normal printk() to print a warning message. This can cause a deadlock in the scenario reported below:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(kmemleak_lock);
>                                lock(&port->lock);
>                                lock(kmemleak_lock);
>   lock(console_owner);
>
> To solve this problem, switch to printk_safe mode before printing the warning message. This redirects all printk()-s to a special per-CPU buffer, which is flushed later from a safe context (irq work), so the deadlock is avoided. The proper API to use is printk_deferred_enter()/printk_deferred_exit() if we want to defer the printing [2].
>
> This patch also fixes other similar cases that need printk deferring [3].
>
> ...
>
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
I'm not sure which kernel version this was against, but kmemleak.c has changed quite a lot.
Could we please see a patch against a latest kernel version? Linus mainline will suit.
Thanks.
On 8/14/2025 6:56 AM, Andrew Morton wrote:
> I'm not sure which kernel version this was against, but kmemleak.c has changed quite a lot.
> Could we please see a patch against a latest kernel version? Linus mainline will suit.
> Thanks.
I discovered this issue in kernel version 5.10. Afterwards, I reviewed the code of the mainline version and found that this deadlock path no longer exists due to the refactoring of console_lock in v6.2-rc1. For details on the refactoring, you can refer to this link: https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d.... Therefore, theoretically, this issue existed before the refactoring of console_lock.
Best Regards,
Guber
On Thu, Aug 14, 2025 at 10:33:56AM +0800, Gu Bowen wrote:
> On 8/14/2025 6:56 AM, Andrew Morton wrote:
> > I'm not sure which kernel version this was against, but kmemleak.c has changed quite a lot.
> > Could we please see a patch against a latest kernel version? Linus mainline will suit.
> > Thanks.
> I discovered this issue in kernel version 5.10. Afterwards, I reviewed the code of the mainline version and found that this deadlock path no longer exists due to the refactoring of console_lock in v6.2-rc1. For details on the refactoring, you can refer to this link: https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d.... Therefore, theoretically, this issue existed before the refactoring of console_lock.
Oh, so you can no longer hit this issue with mainline. This wasn't mentioned (or I missed it) in the commit log.
So this would be a stable-only fix that does not have a correspondent upstream. Adding Greg for his opinion.
On Thu, Aug 14, 2025 at 02:08:35PM +0100, Catalin Marinas wrote:
> On Thu, Aug 14, 2025 at 10:33:56AM +0800, Gu Bowen wrote:
> > On 8/14/2025 6:56 AM, Andrew Morton wrote:
> > > I'm not sure which kernel version this was against, but kmemleak.c has changed quite a lot.
> > > Could we please see a patch against a latest kernel version? Linus mainline will suit.
> > > Thanks.
> > I discovered this issue in kernel version 5.10. Afterwards, I reviewed the code of the mainline version and found that this deadlock path no longer exists due to the refactoring of console_lock in v6.2-rc1. For details on the refactoring, you can refer to this link: https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d.... Therefore, theoretically, this issue existed before the refactoring of console_lock.
> Oh, so you can no longer hit this issue with mainline. This wasn't mentioned (or I missed it) in the commit log.
> So this would be a stable-only fix that does not have a correspondent upstream. Adding Greg for his opinion.
Why not take the upstream changes instead?
On Thu, Aug 14, 2025 at 03:56:58PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Aug 14, 2025 at 02:08:35PM +0100, Catalin Marinas wrote:
> > On Thu, Aug 14, 2025 at 10:33:56AM +0800, Gu Bowen wrote:
> > > On 8/14/2025 6:56 AM, Andrew Morton wrote:
> > > > I'm not sure which kernel version this was against, but kmemleak.c has changed quite a lot.
> > > > Could we please see a patch against a latest kernel version? Linus mainline will suit.
> > > > Thanks.
> > > I discovered this issue in kernel version 5.10. Afterwards, I reviewed the code of the mainline version and found that this deadlock path no longer exists due to the refactoring of console_lock in v6.2-rc1. For details on the refactoring, you can refer to this link: https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d.... Therefore, theoretically, this issue existed before the refactoring of console_lock.
> > Oh, so you can no longer hit this issue with mainline. This wasn't mentioned (or I missed it) in the commit log.
> > So this would be a stable-only fix that does not have a correspondent upstream. Adding Greg for his opinion.
> Why not take the upstream changes instead?
Gu reckons there are 40 patches - https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d...
I haven't checked what ended in mainline and whether we could do with fewer backports.
On Thu, Aug 14, 2025 at 03:38:23PM +0100, Catalin Marinas wrote:
> On Thu, Aug 14, 2025 at 03:56:58PM +0200, Greg Kroah-Hartman wrote:
> > On Thu, Aug 14, 2025 at 02:08:35PM +0100, Catalin Marinas wrote:
> > > On Thu, Aug 14, 2025 at 10:33:56AM +0800, Gu Bowen wrote:
> > > > On 8/14/2025 6:56 AM, Andrew Morton wrote:
> > > > > I'm not sure which kernel version this was against, but kmemleak.c has changed quite a lot.
> > > > > Could we please see a patch against a latest kernel version? Linus mainline will suit.
> > > > > Thanks.
> > > > I discovered this issue in kernel version 5.10. Afterwards, I reviewed the code of the mainline version and found that this deadlock path no longer exists due to the refactoring of console_lock in v6.2-rc1. For details on the refactoring, you can refer to this link: https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d.... Therefore, theoretically, this issue existed before the refactoring of console_lock.
> > > Oh, so you can no longer hit this issue with mainline. This wasn't mentioned (or I missed it) in the commit log.
> > > So this would be a stable-only fix that does not have a correspondent upstream. Adding Greg for his opinion.
> > Why not take the upstream changes instead?
> Gu reckons there are 40 patches - https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d...
40 really isn't that much overall, we've taken way more for much smaller issues :)
> I haven't checked what ended in mainline and whether we could do with fewer backports.
I'll leave that all up to the people who are still wanting these older kernels.
thanks,
greg k-h
On Thu, Aug 14, 2025 at 04:54:33PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Aug 14, 2025 at 03:38:23PM +0100, Catalin Marinas wrote:
> > On Thu, Aug 14, 2025 at 03:56:58PM +0200, Greg Kroah-Hartman wrote:
> > > On Thu, Aug 14, 2025 at 02:08:35PM +0100, Catalin Marinas wrote:
> > > > On Thu, Aug 14, 2025 at 10:33:56AM +0800, Gu Bowen wrote:
> > > > > On 8/14/2025 6:56 AM, Andrew Morton wrote:
> > > > > > I'm not sure which kernel version this was against, but kmemleak.c has changed quite a lot.
> > > > > > Could we please see a patch against a latest kernel version? Linus mainline will suit.
> > > > > > Thanks.
> > > > > I discovered this issue in kernel version 5.10. Afterwards, I reviewed the code of the mainline version and found that this deadlock path no longer exists due to the refactoring of console_lock in v6.2-rc1. For details on the refactoring, you can refer to this link: https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d.... Therefore, theoretically, this issue existed before the refactoring of console_lock.
> > > > Oh, so you can no longer hit this issue with mainline. This wasn't mentioned (or I missed it) in the commit log.
> > > > So this would be a stable-only fix that does not have a correspondent upstream. Adding Greg for his opinion.
> > > Why not take the upstream changes instead?
> > Gu reckons there are 40 patches - https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d...
> 40 really isn't that much overall, we've taken way more for much smaller issues :)
TBH, I'm not sure it's worth it. That's a potential deadlock on a rare error condition (a kmemleak bug or something wrong with the sites calling the kmemleak API).
> > I haven't checked what ended in mainline and whether we could do with fewer backports.
> I'll leave that all up to the people who are still wanting these older kernels.
Good point. Thanks for the advice ;).
On 8/14/2025 9:08 PM, Catalin Marinas wrote:
> On Thu, Aug 14, 2025 at 10:33:56AM +0800, Gu Bowen wrote:
> > On 8/14/2025 6:56 AM, Andrew Morton wrote:
> > > I'm not sure which kernel version this was against, but kmemleak.c has changed quite a lot.
> > > Could we please see a patch against a latest kernel version? Linus mainline will suit.
> > > Thanks.
> > I discovered this issue in kernel version 5.10. Afterwards, I reviewed the code of the mainline version and found that this deadlock path no longer exists due to the refactoring of console_lock in v6.2-rc1. For details on the refactoring, you can refer to this link: https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d.... Therefore, theoretically, this issue existed before the refactoring of console_lock.
> Oh, so you can no longer hit this issue with mainline. This wasn't mentioned (or I missed it) in the commit log.
> So this would be a stable-only fix that does not have a correspondent upstream. Adding Greg for his opinion.
I have discovered that I made a mistake: this fix should be merged into mainline. We have identified two types of deadlock, the AA deadlock [1] and the ABBA deadlock [2]. The latter's deadlock path no longer exists in mainline due to the 40 patches that refactored console_lock. However, the AA deadlock persists, so I believe this fix should be applied to mainline.
[1] https://lore.kernel.org/all/20250731-kmemleak_lock-v1-1-728fd470198f@debian....
[2] https://lore.kernel.org/all/20250730094914.566582-1-gubowen5@huawei.com/
Best Regards,
Guber
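To make the AA versus ABBA distinction concrete, here is a small userspace sketch of a lock-order checker, loosely in the spirit of lockdep but far simpler. It flags AA when a context tries to take a lock it already holds, and ABBA when the new lock already leads back, through previously recorded orderings, to a lock the context holds. The enum names borrow the lock names from the report above; the `model_*` functions, the `ctx` structure, and the exact acquisition order used in the usage note are assumptions made for illustration, not a trace of the real kernel paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock names borrowed from the report; the model itself is hypothetical. */
enum lock_id { KMEMLEAK_LOCK, CONSOLE_OWNER, PORT_LOCK, NR_LOCKS };
enum verdict { LOCK_OK, LOCK_AA, LOCK_ABBA };

/* One execution context (for example, one CPU) and the locks it holds. */
struct ctx {
	bool held[NR_LOCKS];
};

/* order[a][b] is true once b has been taken while a was held. */
static bool order[NR_LOCKS][NR_LOCKS];

/* Depth-first search: can "to" be reached from "from" via recorded orderings? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (order[from][next] && !seen[next] && reaches(next, to, seen))
			return true;
	return false;
}

static enum verdict model_acquire(struct ctx *c, int lock)
{
	/* AA: this context already holds the very lock it is asking for. */
	if (c->held[lock])
		return LOCK_AA;

	/* ABBA: some other path already ordered "lock" before a held lock. */
	for (int h = 0; h < NR_LOCKS; h++) {
		bool seen[NR_LOCKS] = { false };

		if (c->held[h] && reaches(lock, h, seen))
			return LOCK_ABBA;
	}

	/* Safe so far: remember the new orderings and take the lock. */
	for (int h = 0; h < NR_LOCKS; h++)
		if (c->held[h])
			order[h][lock] = true;
	c->held[lock] = true;
	return LOCK_OK;
}

static void model_release(struct ctx *c, int lock)
{
	c->held[lock] = false;
}
```

As a usage example under those assumptions: if one context takes CONSOLE_OWNER, then PORT_LOCK, then KMEMLEAK_LOCK and releases them, a second context that holds KMEMLEAK_LOCK and then asks for CONSOLE_OWNER gets LOCK_ABBA, while asking for KMEMLEAK_LOCK again gets LOCK_AA. Removing the ordering between the console locks and kmemleak_lock would clear the first verdict but not the second, which matches the point that the AA case needs its own fix.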
On Mon, Aug 18, 2025 at 10:24:38AM +0800, Gu Bowen wrote:
> On 8/14/2025 9:08 PM, Catalin Marinas wrote:
> > On Thu, Aug 14, 2025 at 10:33:56AM +0800, Gu Bowen wrote:
> > > On 8/14/2025 6:56 AM, Andrew Morton wrote:
> > > > I'm not sure which kernel version this was against, but kmemleak.c has changed quite a lot.
> > > > Could we please see a patch against a latest kernel version? Linus mainline will suit.
> > > > Thanks.
> > > I discovered this issue in kernel version 5.10. Afterwards, I reviewed the code of the mainline version and found that this deadlock path no longer exists due to the refactoring of console_lock in v6.2-rc1. For details on the refactoring, you can refer to this link: https://lore.kernel.org/all/20221116162152.193147-1-john.ogness@linutronix.d.... Therefore, theoretically, this issue existed before the refactoring of console_lock.
> > Oh, so you can no longer hit this issue with mainline. This wasn't mentioned (or I missed it) in the commit log.
> > So this would be a stable-only fix that does not have a correspondent upstream. Adding Greg for his opinion.
> I have discovered that I made a mistake: this fix should be merged into mainline. We have identified two types of deadlock, the AA deadlock [1] and the ABBA deadlock [2]. The latter's deadlock path no longer exists in mainline due to the 40 patches that refactored console_lock. However, the AA deadlock persists, so I believe this fix should be applied to mainline.
> [1] https://lore.kernel.org/all/20250731-kmemleak_lock-v1-1-728fd470198f@debian....
> [2] https://lore.kernel.org/all/20250730094914.566582-1-gubowen5@huawei.com/
Please submit it as a normal patch then.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org