It is unsafe to dump vmalloc area information when attempting it from some contexts. Add a safer trylock version of the same lookup to find the vmap area on a best-effort basis, and use it from vmalloc_dump_obj().
[applied test robot feedback on unused function fix.] [applied Uladzislau feedback on locking.]
Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Cc: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/vmalloc.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..2c6a0e2ff404 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #ifdef CONFIG_PRINTK
 bool vmalloc_dump_obj(void *object)
 {
-	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+	const void *caller;
+	struct vm_struct *vm;
+	struct vmap_area *va;
+	unsigned long addr;
+	unsigned int nr_pages;

-	vm = find_vm_area(objp);
-	if (!vm)
+	if (!spin_trylock(&vmap_area_lock))
+		return false;
+	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
+	if (!va) {
+		spin_unlock(&vmap_area_lock);
 		return false;
+	}
+
+	vm = va->vm;
+	if (!vm) {
+		spin_unlock(&vmap_area_lock);
+		return false;
+	}
+	addr = (unsigned long)vm->addr;
+	caller = vm->caller;
+	nr_pages = vm->nr_pages;
+	spin_unlock(&vmap_area_lock);
 	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
-		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
+		nr_pages, addr, caller);
 	return true;
 }
 #endif
From: Zqiang qiang.zhang1211@gmail.com
Currently, a double invocation of call_rcu() dumps memory info for the rcu_head object. If the object was not allocated from the slab allocator, vmalloc_dump_obj() is invoked, which requires holding the vmap_area_lock spinlock. Since call_rcu() can be invoked from interrupt context, this opens up spinlock deadlock scenarios.
And in the PREEMPT_RT kernel, the rcutorture test also triggers the following lockdep warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
3 locks held by swapper/0/1:
 #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
irq event stamp: 565512
hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
Preemption disabled at:
[<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xb0
 dump_stack+0x14/0x20
 __might_resched+0x1aa/0x280
 ? __pfx_rcu_torture_err_cb+0x10/0x10
 rt_spin_lock+0x53/0x130
 ? find_vmap_area+0x1f/0x70
 find_vmap_area+0x1f/0x70
 vmalloc_dump_obj+0x20/0x60
 mem_dump_obj+0x22/0x90
 __call_rcu_common+0x5bf/0x940
 ? debug_smp_processor_id+0x1b/0x30
 call_rcu_hurry+0x14/0x20
 rcu_torture_init+0x1f82/0x2370
 ? __pfx_rcu_torture_leak_cb+0x10/0x10
 ? __pfx_rcu_torture_leak_cb+0x10/0x10
 ? __pfx_rcu_torture_init+0x10/0x10
 do_one_initcall+0x6c/0x300
 ? debug_smp_processor_id+0x1b/0x30
 kernel_init_freeable+0x2b9/0x540
 ? __pfx_kernel_init+0x10/0x10
 kernel_init+0x1f/0x150
 ret_from_fork+0x40/0x50
 ? __pfx_kernel_init+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
The previous patch fixes this by using the deadlock-safe, best-effort version of find_vm_area(). However, in case of failure, print the fact that the pointer was a vmalloc pointer so that we print at least something.
Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
Cc: stable@vger.kernel.org
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/util.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..406634f26918 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
 	if (vmalloc_dump_obj(object))
 		return;
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
 	else if (object == NULL)
 		type = "NULL pointer";
On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
From: Zqiang qiang.zhang1211@gmail.com
Currently, a double invocation of call_rcu() dumps memory info for the rcu_head object. If the object was not allocated from the slab allocator, vmalloc_dump_obj() is invoked, which requires holding the vmap_area_lock spinlock. Since call_rcu() can be invoked from interrupt context, this opens up spinlock deadlock scenarios.
And in the PREEMPT_RT kernel, the rcutorture test also triggers the following lockdep warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 RCU nest depth: 1, expected: 1 3 locks held by swapper/0/1: #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70 irq event stamp: 565512 hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940 hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370 softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170 softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0 Preemption disabled at: [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Call Trace:
<TASK> dump_stack_lvl+0x68/0xb0 dump_stack+0x14/0x20 __might_resched+0x1aa/0x280 ? __pfx_rcu_torture_err_cb+0x10/0x10 rt_spin_lock+0x53/0x130 ? find_vmap_area+0x1f/0x70 find_vmap_area+0x1f/0x70 vmalloc_dump_obj+0x20/0x60 mem_dump_obj+0x22/0x90 __call_rcu_common+0x5bf/0x940 ? debug_smp_processor_id+0x1b/0x30 call_rcu_hurry+0x14/0x20 rcu_torture_init+0x1f82/0x2370 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_init+0x10/0x10 do_one_initcall+0x6c/0x300 ? debug_smp_processor_id+0x1b/0x30 kernel_init_freeable+0x2b9/0x540 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x1f/0x150 ret_from_fork+0x40/0x50 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK>
The previous patch fixes this by using the deadlock-safe, best-effort version of find_vm_area(). However, in case of failure, print the fact that the pointer was a vmalloc pointer so that we print at least something.
Reported-by: Zhen Lei thunder.leizhen@huaweicloud.com Cc: Paul E. McKenney paulmck@kernel.org Cc: rcu@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory") Cc: stable@vger.kernel.org Signed-off-by: Zqiang qiang.zhang1211@gmail.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
mm/util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..406634f26918 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
 	if (vmalloc_dump_obj(object))
 		return;
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
I think you should update this to say non-slab/non-vmalloc memory (as much as that description sucks!) as this phrasing in the past meant to say 'non-slab or vmalloc memory' (already confusing phrasing) so better to be clear.
 	else if (object == NULL)
 		type = "NULL pointer";
--
2.42.0.283.g2d96d420d3-goog
On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
From: Zqiang qiang.zhang1211@gmail.com
Currently, a double invocation of call_rcu() dumps memory info for the rcu_head object. If the object was not allocated from the slab allocator, vmalloc_dump_obj() is invoked, which requires holding the vmap_area_lock spinlock. Since call_rcu() can be invoked from interrupt context, this opens up spinlock deadlock scenarios.
And in the PREEMPT_RT kernel, the rcutorture test also triggers the following lockdep warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 RCU nest depth: 1, expected: 1 3 locks held by swapper/0/1: #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70 irq event stamp: 565512 hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940 hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370 softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170 softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0 Preemption disabled at: [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Call Trace:
<TASK> dump_stack_lvl+0x68/0xb0 dump_stack+0x14/0x20 __might_resched+0x1aa/0x280 ? __pfx_rcu_torture_err_cb+0x10/0x10 rt_spin_lock+0x53/0x130 ? find_vmap_area+0x1f/0x70 find_vmap_area+0x1f/0x70 vmalloc_dump_obj+0x20/0x60 mem_dump_obj+0x22/0x90 __call_rcu_common+0x5bf/0x940 ? debug_smp_processor_id+0x1b/0x30 call_rcu_hurry+0x14/0x20 rcu_torture_init+0x1f82/0x2370 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_init+0x10/0x10 do_one_initcall+0x6c/0x300 ? debug_smp_processor_id+0x1b/0x30 kernel_init_freeable+0x2b9/0x540 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x1f/0x150 ret_from_fork+0x40/0x50 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK>
The previous patch fixes this by using the deadlock-safe, best-effort version of find_vm_area(). However, in case of failure, print the fact that the pointer was a vmalloc pointer so that we print at least something.
Reported-by: Zhen Lei thunder.leizhen@huaweicloud.com Cc: Paul E. McKenney paulmck@kernel.org Cc: rcu@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory") Cc: stable@vger.kernel.org Signed-off-by: Zqiang qiang.zhang1211@gmail.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
mm/util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..406634f26918 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
 	if (vmalloc_dump_obj(object))
 		return;
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
I think you should update this to say non-slab/non-vmalloc memory (as much as that description sucks!) as this phrasing in the past meant to say 'non-slab or vmalloc memory' (already confusing phrasing) so better to be clear.
True, though the issue you mentioned is in existing code; a respin of this patch could update it to say non-vmalloc. Good point, thanks for reviewing!
- Joel
On Tue, 5 Sept 2023 at 12:48, Joel Fernandes joel@joelfernandes.org wrote:
On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
From: Zqiang qiang.zhang1211@gmail.com
Currently, a double invocation of call_rcu() dumps memory info for the rcu_head object. If the object was not allocated from the slab allocator, vmalloc_dump_obj() is invoked, which requires holding the vmap_area_lock spinlock. Since call_rcu() can be invoked from interrupt context, this opens up spinlock deadlock scenarios.
And in the PREEMPT_RT kernel, the rcutorture test also triggers the following lockdep warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 RCU nest depth: 1, expected: 1 3 locks held by swapper/0/1: #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70 irq event stamp: 565512 hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940 hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370 softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170 softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0 Preemption disabled at: [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Call Trace:
<TASK> dump_stack_lvl+0x68/0xb0 dump_stack+0x14/0x20 __might_resched+0x1aa/0x280 ? __pfx_rcu_torture_err_cb+0x10/0x10 rt_spin_lock+0x53/0x130 ? find_vmap_area+0x1f/0x70 find_vmap_area+0x1f/0x70 vmalloc_dump_obj+0x20/0x60 mem_dump_obj+0x22/0x90 __call_rcu_common+0x5bf/0x940 ? debug_smp_processor_id+0x1b/0x30 call_rcu_hurry+0x14/0x20 rcu_torture_init+0x1f82/0x2370 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_init+0x10/0x10 do_one_initcall+0x6c/0x300 ? debug_smp_processor_id+0x1b/0x30 kernel_init_freeable+0x2b9/0x540 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x1f/0x150 ret_from_fork+0x40/0x50 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK>
The previous patch fixes this by using the deadlock-safe, best-effort version of find_vm_area(). However, in case of failure, print the fact that the pointer was a vmalloc pointer so that we print at least something.
Reported-by: Zhen Lei thunder.leizhen@huaweicloud.com Cc: Paul E. McKenney paulmck@kernel.org Cc: rcu@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory") Cc: stable@vger.kernel.org Signed-off-by: Zqiang qiang.zhang1211@gmail.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
mm/util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..406634f26918 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
 	if (vmalloc_dump_obj(object))
 		return;
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
I think you should update this to say non-slab/non-vmalloc memory (as much as that description sucks!) as this phrasing in the past meant to say 'non-slab or vmalloc memory' (already confusing phrasing) so better to be clear.
True, though the issue you mentioned is in existing code; a respin of this patch could update it to say non-vmalloc. Good point, thanks for reviewing!
No it's not, you're changing the meaning, because you changed the code that determines the output...
This has been merged now despite my outstanding comments (!) so I guess I'll have to send a follow up patch to address this.
- Joel
-- Lorenzo Stoakes https://ljs.io
On 9/6/23 21:18, Lorenzo Stoakes wrote:
On Tue, 5 Sept 2023 at 12:48, Joel Fernandes joel@joelfernandes.org wrote:
On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
From: Zqiang qiang.zhang1211@gmail.com
Currently, a double invocation of call_rcu() dumps memory info for the rcu_head object. If the object was not allocated from the slab allocator, vmalloc_dump_obj() is invoked, which requires holding the vmap_area_lock spinlock. Since call_rcu() can be invoked from interrupt context, this opens up spinlock deadlock scenarios.
And in the PREEMPT_RT kernel, the rcutorture test also triggers the following lockdep warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 RCU nest depth: 1, expected: 1 3 locks held by swapper/0/1: #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70 irq event stamp: 565512 hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940 hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370 softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170 softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0 Preemption disabled at: [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Call Trace:
<TASK> dump_stack_lvl+0x68/0xb0 dump_stack+0x14/0x20 __might_resched+0x1aa/0x280 ? __pfx_rcu_torture_err_cb+0x10/0x10 rt_spin_lock+0x53/0x130 ? find_vmap_area+0x1f/0x70 find_vmap_area+0x1f/0x70 vmalloc_dump_obj+0x20/0x60 mem_dump_obj+0x22/0x90 __call_rcu_common+0x5bf/0x940 ? debug_smp_processor_id+0x1b/0x30 call_rcu_hurry+0x14/0x20 rcu_torture_init+0x1f82/0x2370 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_init+0x10/0x10 do_one_initcall+0x6c/0x300 ? debug_smp_processor_id+0x1b/0x30 kernel_init_freeable+0x2b9/0x540 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x1f/0x150 ret_from_fork+0x40/0x50 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK>
The previous patch fixes this by using the deadlock-safe, best-effort version of find_vm_area(). However, in case of failure, print the fact that the pointer was a vmalloc pointer so that we print at least something.
Reported-by: Zhen Lei thunder.leizhen@huaweicloud.com Cc: Paul E. McKenney paulmck@kernel.org Cc: rcu@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory") Cc: stable@vger.kernel.org Signed-off-by: Zqiang qiang.zhang1211@gmail.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
mm/util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..406634f26918 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
 	if (vmalloc_dump_obj(object))
 		return;
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
I think you should update this to say non-slab/non-vmalloc memory (as much as that description sucks!) as this phrasing in the past meant to say 'non-slab or vmalloc memory' (already confusing phrasing) so better to be clear.
True, though the issue you mentioned is in existing code; a respin of this patch could update it to say non-vmalloc. Good point, thanks for reviewing!
No it's not, you're changing the meaning, because you changed the code that determines the output...
I think it has always meant (but clearly it's not unambiguously worded) "not slab && not vmalloc", that is before and after this patch. Only in case patch 1 is applied and patch 2 not, can the output be wrong in that a vmalloc pointer will (in case of trylock fail) be classified as "not slab && not vmalloc", but seems fine to me after patch 2.
I guess if we wanted, we could also rewrite it to be more like the kmem check in the beginning of mem_dump_obj(), so there would be:
if (is_vmalloc_addr(...)) { vmalloc_dump_obj(...); return; }
where vmalloc_dump_obj() itself would print at least "vmalloc memory" with no further details in case of trylock failure.
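A minimal sketch of the shape being proposed (hypothetical only, not a tested patch; elided parts are marked with ...):

```c
/* Hypothetical reshaping of mem_dump_obj() along the lines suggested
 * above: route every vmalloc address to vmalloc_dump_obj(), and let
 * vmalloc_dump_obj() itself print a bare "vmalloc memory" when the
 * trylock fails. */
void mem_dump_obj(void *object)
{
	...
	if (is_vmalloc_addr(object)) {
		vmalloc_dump_obj(object);	/* best effort; may print only "vmalloc memory" */
		return;
	}
	...
}
```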
that assumes is_vmalloc_addr() is guaranteed to be true for all addresses that __find_vmap_area resolves, otherwise it could miss something compared to current code. Is it guaranteed?
This has been merged now despite my outstanding comments (!) so I guess I'll have to send a follow up patch to address this.
- Joel
-- Lorenzo Stoakes https://ljs.io
On Thu, Sep 07, 2023 at 09:10:38AM +0200, Vlastimil Babka wrote:
On 9/6/23 21:18, Lorenzo Stoakes wrote:
On Tue, 5 Sept 2023 at 12:48, Joel Fernandes joel@joelfernandes.org wrote:
On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
From: Zqiang qiang.zhang1211@gmail.com
Currently, a double invocation of call_rcu() dumps memory info for the rcu_head object. If the object was not allocated from the slab allocator, vmalloc_dump_obj() is invoked, which requires holding the vmap_area_lock spinlock. Since call_rcu() can be invoked from interrupt context, this opens up spinlock deadlock scenarios.
And in the PREEMPT_RT kernel, the rcutorture test also triggers the following lockdep warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 RCU nest depth: 1, expected: 1 3 locks held by swapper/0/1: #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70 irq event stamp: 565512 hardirqs last enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940 hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370 softirqs last enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170 softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0 Preemption disabled at: [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.5.0-rc4-rt2-yocto-preempt-rt+ #15 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Call Trace:
<TASK> dump_stack_lvl+0x68/0xb0 dump_stack+0x14/0x20 __might_resched+0x1aa/0x280 ? __pfx_rcu_torture_err_cb+0x10/0x10 rt_spin_lock+0x53/0x130 ? find_vmap_area+0x1f/0x70 find_vmap_area+0x1f/0x70 vmalloc_dump_obj+0x20/0x60 mem_dump_obj+0x22/0x90 __call_rcu_common+0x5bf/0x940 ? debug_smp_processor_id+0x1b/0x30 call_rcu_hurry+0x14/0x20 rcu_torture_init+0x1f82/0x2370 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_leak_cb+0x10/0x10 ? __pfx_rcu_torture_init+0x10/0x10 do_one_initcall+0x6c/0x300 ? debug_smp_processor_id+0x1b/0x30 kernel_init_freeable+0x2b9/0x540 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x1f/0x150 ret_from_fork+0x40/0x50 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK>
The previous patch fixes this by using the deadlock-safe, best-effort version of find_vm_area(). However, in case of failure, print the fact that the pointer was a vmalloc pointer so that we print at least something.
Reported-by: Zhen Lei thunder.leizhen@huaweicloud.com Cc: Paul E. McKenney paulmck@kernel.org Cc: rcu@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory") Cc: stable@vger.kernel.org Signed-off-by: Zqiang qiang.zhang1211@gmail.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
mm/util.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..406634f26918 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
 	if (vmalloc_dump_obj(object))
 		return;
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
I think you should update this to say non-slab/non-vmalloc memory (as much as that description sucks!) as this phrasing in the past meant to say 'non-slab or vmalloc memory' (already confusing phrasing) so better to be clear.
True, though the issue you mentioned is in existing code; a respin of this patch could update it to say non-vmalloc. Good point, thanks for reviewing!
No it's not, you're changing the meaning, because you changed the code that determines the output...
I think it has always meant (but clearly it's not unambiguously worded) "not slab && not vmalloc", that is before and after this patch. Only in case patch 1 is applied and patch 2 not, can the output be wrong in that a vmalloc pointer will (in case of trylock fail) be classified as "not slab && not vmalloc", but seems fine to me after patch 2.
I guess if we wanted, we could also rewrite it to be more like the kmem check in the beginning of mem_dump_obj(), so there would be:
if (is_vmalloc_addr(...)) { vmalloc_dump_obj(...); return; }
where vmalloc_dump_obj() itself would print at least "vmalloc memory" with no further details in case of trylock failure.
that assumes is_vmalloc_addr() is guaranteed to be true for all addresses that __find_vmap_area resolves, otherwise it could miss something compared to current code. Is it guaranteed?
It is guaranteed, based on my reading of the code. But it might aid additional vmalloc-internals debugging if, for some reason, the address of the object stored in the vmalloc data structures were out of bounds and the lookup still succeeded. That's a hypothetical situation, though, and I don't think it can actually happen.
thanks,
- Joel
On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
It is unsafe to dump vmalloc area information when attempting it from some contexts. Add a safer trylock version of the same lookup to find the vmap area on a best-effort basis, and use it from vmalloc_dump_obj().
It'd be nice to have more details as to precisely which contexts and what this resolves.
[applied test robot feedback on unused function fix.] [applied Uladzislau feedback on locking.]
Reported-by: Zhen Lei thunder.leizhen@huaweicloud.com Cc: Paul E. McKenney paulmck@kernel.org Cc: rcu@vger.kernel.org Cc: Zqiang qiang.zhang1211@gmail.com Reviewed-by: Uladzislau Rezki (Sony) urezki@gmail.com Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory") Cc: stable@vger.kernel.org Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
mm/vmalloc.c | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..2c6a0e2ff404 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #ifdef CONFIG_PRINTK
 bool vmalloc_dump_obj(void *object)
 {
-	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+	const void *caller;
+	struct vm_struct *vm;
+	struct vmap_area *va;
+	unsigned long addr;
+	unsigned int nr_pages;

-	vm = find_vm_area(objp);
-	if (!vm)
+	if (!spin_trylock(&vmap_area_lock))
+		return false;
It'd be good to have a comment here explaining why we must trylock here. I am also concerned that in the past this function would return false only if the address was not a vmalloc one, but now it might just return false due to lock contention and the user has no idea which it is?
I'd want to at least output "vmalloc region cannot lookup lock contention" vs. the below cannot find case.
Under heavy lock contention aren't you potentially breaking the ability to introspect vmalloc addresses? Wouldn't it be better to explicitly detect the contexts under which acquiring this spinlock is not appropriate?
+	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
+	if (!va) {
+		spin_unlock(&vmap_area_lock);
 		return false;
+	}
+
+	vm = va->vm;
+	if (!vm) {
+		spin_unlock(&vmap_area_lock);
+		return false;
+	}
+	addr = (unsigned long)vm->addr;
+	caller = vm->caller;
+	nr_pages = vm->nr_pages;
+	spin_unlock(&vmap_area_lock);
 	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
-		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
+		nr_pages, addr, caller);
 	return true;
 }
 #endif
2.42.0.283.g2d96d420d3-goog
On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
It is unsafe to dump vmalloc area information when attempting it from some contexts. Add a safer trylock version of the same lookup to find the vmap area on a best-effort basis, and use it from vmalloc_dump_obj().
It'd be nice to have more details as to precisely which contexts and what this resolves.
True. I was hoping the 'trylock' mention would be sufficient (e.g., a hardirq context interrupting a lock-held region), but you're right.
[applied test robot feedback on unused function fix.] [applied Uladzislau feedback on locking.]
Reported-by: Zhen Lei thunder.leizhen@huaweicloud.com Cc: Paul E. McKenney paulmck@kernel.org Cc: rcu@vger.kernel.org Cc: Zqiang qiang.zhang1211@gmail.com Reviewed-by: Uladzislau Rezki (Sony) urezki@gmail.com Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory") Cc: stable@vger.kernel.org Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
mm/vmalloc.c | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..2c6a0e2ff404 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #ifdef CONFIG_PRINTK
 bool vmalloc_dump_obj(void *object)
 {
-	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+	const void *caller;
+	struct vm_struct *vm;
+	struct vmap_area *va;
+	unsigned long addr;
+	unsigned int nr_pages;

-	vm = find_vm_area(objp);
-	if (!vm)
+	if (!spin_trylock(&vmap_area_lock))
+		return false;
It'd be good to have a comment here explaining why we must trylock here. I am also concerned that in the past this function would return false only if the address was not a vmalloc one, but now it might just return false due to lock contention and the user has no idea which it is?
I'd want to at least output something like "vmalloc region lookup failed: lock contention" vs. the cannot-find case below.
In the patch 2/2 we do print if the address looks like a vmalloc address even if the vmalloc look up fails.
Also the reporter's usecase is not a common one. We only attempt to dump information if there was a debug objects failure (example if somebody did a double call_rcu). In such a situation, the patch will prevent a deadlock and still print something about the address.
Under heavy lock contention aren't you potentially breaking the ability to introspect vmalloc addresses? Wouldn't it be better to explicitly detect the contexts under which acquiring this spinlock is not appropriate?
Yes this is a good point, but there's another case as well: PREEMPT_RT can sleep on lock contention (as spinlocks are sleeping) and we can't sleep from call_rcu() as it may be called in contexts that cannot sleep. So we handle that also using trylock.
Thanks for the review!
- Joel
On Tue, 5 Sept 2023 at 12:47, Joel Fernandes joel@joelfernandes.org wrote:
In the patch 2/2 we do print if the address looks like a vmalloc address even if the vmalloc look up fails.
No, you output exactly what was output before, only changing what it means and in no way differentiating between couldn't find vmalloc area/couldn't get lock.
Also the reporter's usecase is not a common one. We only attempt to dump information if there was a debug objects failure (example if somebody did a double call_rcu). In such a situation, the patch will prevent a deadlock and still print something about the address.
Right, but the function still purports to do X but does Y.
Yes this is a good point, but there's another case as well: PREEMPT_RT can sleep on lock contention (as spinlocks are sleeping) and we can't sleep from call_rcu() as it may be called in contexts that cannot sleep. So we handle that also using trylock.
Right so somebody now has to find this email to realise that. I hate implicit knowledge like this, it needs a comment. It also furthers the point that it'd be useful to differentiate between the two.
This got merged despite my outstanding comments so I guess I'll have to follow up with a patch.
This reads like another 'nice review and I agree but I won't change anything!'...
On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
This reads like another 'nice review and I agree but I won't change anything!'...
Sorry I actually wrote this unkind comment in a moment of annoyance then meant to delete it but of course forgot to :>) Disregard this bit.
Happy for pushback/disagreement, just feel like a few little touchups would have helped improve documentation/clarity of what this series does.
Obviously stability matters more so perhaps touch-ups best as a follow up series... though would be nice to have a comment to that effect.
Thanks!
On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
No, you output exactly what was output before, only changing what it means and in no way differentiating between couldn't find vmalloc area/couldn't get lock.
2/2 does this:

-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
This code is executed only if vmalloc_dump_obj() returns false, and the is_vmalloc_addr() check is newly added by 2/2, right?
You are right we are not differentiating between trylock failure and failure to find the vmalloc area. I was just saying, even though we don't differentiate, we do print "vmalloc memory" right? That wasn't being printed before.
Right so somebody now has to find this email to realise that. I hate implicit knowledge like this, it needs a comment. It also furthers the point that it'd be useful to differentiate between the two.
This is a valid point, and I acknowledged it in last email. A code comment could indeed be useful.
So I guess from an agreement standpoint, I agree:
1/2 could use an additional comment explaining why we need trylock (citing the RT sleeping-lock issue).
2/2 could update the existing code to convert "non-slab/vmalloc" to "non-slab/non-vmalloc". Note: that's an *existing* issue.
The issue in 2/2 is not a new one so that can certainly be a separate patch. And while at it, we could update the comment in that patch as well.
But the whole differentiating between trylock vs vmalloc area lookup failure is not that useful -- just my opinion fwiw! I honestly feel differentiating between trylock vs vmalloc area lookup failure complicates the code because it will require passing this information down from vmalloc_dump_obj() to the caller AFAICS and I am not sure if the person reading the debug will really care much. But I am OK with whatever the -mm community wants and I am happy to send out a new patch on top with the above that I agree on since Andrew took these 2 (but for the stuff I don't agree, I would appreciate if you could send a patch for review and I am happy to review it!).
As you mentioned, this series is a stability fix and we can put touch-ups on top of it if needed, and there is also plenty of time till the next merge window. Allow me a few days and I'll do the new patch on top (I'd say dont bother to spend your time on it, I'll do it).
thanks,
- Joel
-- Lorenzo Stoakes https://ljs.io
On Wed, Sep 06, 2023 at 10:46:08PM +0000, Joel Fernandes wrote:
This is a valid point, and I acknowledged it in last email. A code comment could indeed be useful.
Thanks, yeah this may seem trivial, but I am quite sensitive about things being added to the code base that are neither described in commit msg nor in a comment or elsewhere and become 'implicit' in a sense.
So just a simple comment here would be helpful, and I'm glad we're in agreement on that, will leave to you to do a follow up patch.
So I guess from an agreement standpoint, I agree:
1/2 could use an additional comment explaining why we need trylock (sighting the RT sleeping lock issue).
2/2 could update the existing code to convert "non-slab/vmalloc" to "non-slab/non-vmalloc". Note: that's an *existing* issue.
Yeah sorry this whole thing was rather confusing, it did indeed (unclearly) specify non-/non- in the past (on assumption dumping function would work), addition of vmalloc check now makes that correct again, the phrasing is the issue.
You can leave this as-is as yeah, you're right, this was a pre-existing issue.
virt_addr_valid() returns true for a slab addr, but kmem_valid_obj() is checked above so already been ruled out, now you ruled out vmalloc.
Just a bit tricksy.
Ah right, I think maybe I wasn't clear, all I meant to suggest is to output log output rather than feed anything back to caller, something like:-
if (!spin_trylock(&vmap_area_lock)) {
	pr_cont(" [couldn't acquire vmap lock]\n");
	...
}
My concern is that in the past this function would only return false if it couldn't find the address in a VA, now it returns false also if you happen to call it when the spinlock is locked, which might be confusing for somebody debugging this.
HOWEVER, since you now indicate that the address is vmalloc anyway, and you _absolutely cannot_ give any further details safely, perhaps this additional information is indeed not that useful.
My concern was just feeling antsy that we suddenly don't do something because a lock happens to be applied but as you say that cannot be helped in certain contexts.
So actually, leave this.
As you mentioned, this series is a stability fix and we can put touch-ups on top of it if needed, and there is also plenty of time till the next merge window. Allow me a few days and I'll do the new patch on top (I'd say dont bother to spend your time on it, I'll do it).
Ack, I was just a little frustrated we didn't reach a resolution on review (either deciding things could be deferred or having changes) before merge. Obviously fine to prioritise, but would be good to have that explicitly stated.
Anyway, so TL;DR:-
1. As we both agree, add a comment to explain why you need the spin trylock. (there are no further steps :P)
And I don't believe this actually needs any further changes after this discussion*, so if you fancy doing a follow up to that effect that will suffice for me thanks!
* Though I strongly feel vmalloc as a whole needs top-to-bottom refactoring, but that's another story...
On Thu, Sep 07, 2023 at 08:11:48AM +0100, Lorenzo Stoakes wrote:
For PREEMPT_RT kernels we are not allowed to use "vmap parts" in a non-sleepable context; this is just reality, because we use a sleeping type of spinlock.
I am not sure how urgent this fix is. But to me it looks like debugging and a corner case. Probably I am wrong and miss something. But if it is correct, I would just bail out for the RT kernel and rework it later in a more proper way, for example by implementing a safe way of RCU scan, but that is another story.
-- Uladzislau Rezki
On Thu, Sep 07, 2023 at 11:23:40AM +0200, Uladzislau Rezki wrote:
On Thu, Sep 07, 2023 at 08:11:48AM +0100, Lorenzo Stoakes wrote:
[..]
Anyway, so TL;DR:-
- As we both agree, add a comment to explain why you need the spin trylock.
(there are no further steps :P)
And I don't believe this actually needs any further changes after this discussion*, so if you fancy doing a follow up to that effect that will suffice for me thanks!
Thanks.
For PREEMPT_RT kernels we are not allowed to use "vmap parts" in a non-sleepable context; this is just reality, because we use a sleeping type of spinlock.
I am not sure how urgently we need this fix. To me it looks like debugging and a corner case. Probably I am wrong and am missing something. But if it is correct, I would just bail out for RT kernels and rework it later in a more proper way, for example by implementing a safe way of RCU scanning, but that is also another story.
Bailing out for RT kernels is insufficient, as we need the trylock() to avoid self-deadlock for !PREEMPT_RT as well. Plus, IIRC, in the past there was opposition to special-casing PREEMPT_RT in code. Admittedly those PREEMPT_RT cases were related to detecting preempt-disabled sections rather than lock-held sections, though.
We could optionally do a trylock() loop + bail out after a certain number of tries as well, but that would complicate the code a bit more and I am not sure it is worth it. Still, if you guys feel strongly about doing something like that, let me know and I can give it a try :).
thanks,
- Joel
Hi,
On 9/4/23 20:08, Joel Fernandes (Google) wrote:
It is unsafe to dump vmalloc area information when trying to do so from some contexts. Add a safer trylock version of the same function to do a best-effort VMA finding and use it from vmalloc_dump_obj().
I was a bit confused by the subject which suggests a new function is added, but it seems open-coded in its only caller. I assume it's due to evolution of the series. Something like:
mm/vmalloc: use trylock for vmap_area_lock in vmalloc_dump_obj()
?
I also notice it's trying hard to copy everything from "vm" to temporary variables before unlocking, presumably to prevent use-after-free, so should that be also mentioned in the changelog?
[applied test robot feedback on unused function fix.]
[applied Uladzislau feedback on locking.]

Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Cc: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/vmalloc.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..2c6a0e2ff404 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #ifdef CONFIG_PRINTK
 bool vmalloc_dump_obj(void *object)
 {
-	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+	const void *caller;
+	struct vm_struct *vm;
+	struct vmap_area *va;
+	unsigned long addr;
+	unsigned int nr_pages;
 
-	vm = find_vm_area(objp);
-	if (!vm)
+	if (!spin_trylock(&vmap_area_lock))
+		return false;
+	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
+	if (!va) {
+		spin_unlock(&vmap_area_lock);
 		return false;
+	}
+
+	vm = va->vm;
+	if (!vm) {
+		spin_unlock(&vmap_area_lock);
+		return false;
+	}
+	addr = (unsigned long)vm->addr;
+	caller = vm->caller;
+	nr_pages = vm->nr_pages;
+	spin_unlock(&vmap_area_lock);
 	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
-		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
+		nr_pages, addr, caller);
 	return true;
 }
 #endif
On Thu, Sep 07, 2023 at 08:53:14AM +0200, Vlastimil Babka wrote:
Hi,
On 9/4/23 20:08, Joel Fernandes (Google) wrote:
It is unsafe to dump vmalloc area information when trying to do so from some contexts. Add a safer trylock version of the same function to do a best-effort VMA finding and use it from vmalloc_dump_obj().
I was a bit confused by the subject which suggests a new function is added, but it seems open-coded in its only caller. I assume it's due to evolution of the series. Something like:
mm/vmalloc: use trylock for vmap_area_lock in vmalloc_dump_obj()
?
I also notice it's trying hard to copy everything from "vm" to temporary variables before unlocking, presumably to prevent use-after-free, so should that be also mentioned in the changelog?
Apologies for the less-than-ideal changelog. Andrew, would you mind replacing the merged patch with the one below instead? It contains only non-functional changes to the changelog plus an additional code comment and print. Thanks!
---8<-----------------------
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: [PATCH v3.1] mm/vmalloc: Add a safer inlined version of find_vm_area() for debug
It is unsafe to dump vmalloc area information when trying to do so from some contexts such as PREEMPT_RT or from an IRQ handler that interrupted a vmap_area_lock-held region. Add a safer and inlined trylock version of find_vm_area() to do a best-effort VMA finding and use it from vmalloc_dump_obj().
While the vmap_area_lock is held, copy interesting attributes from the vm_struct before unlocking.
[applied test robot feedback on unused function fix.]
[applied Uladzislau feedback on locking.]
[applied Vlastimil and Lorenzo feedback on changelog, comment and print improvements]

Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Cc: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/vmalloc.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..990a0d5efba8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4274,14 +4274,40 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #ifdef CONFIG_PRINTK
 bool vmalloc_dump_obj(void *object)
 {
-	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+	const void *caller;
+	struct vm_struct *vm;
+	struct vmap_area *va;
+	unsigned long addr;
+	unsigned int nr_pages;
+
+	/*
+	 * Use trylock as we don't want to contend since this is debug code and
+	 * we might run this code in contexts like PREEMPT_RT where spinlock
+	 * contention may result in sleeping, or from an IRQ handler which
+	 * might interrupt a vmap_area_lock-held critical section.
+	 */
+	if (!spin_trylock(&vmap_area_lock)) {
+		pr_cont(" [couldn't acquire vmap_area_lock]\n");
+		return false;
+	}
+	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
+	if (!va) {
+		spin_unlock(&vmap_area_lock);
+		return false;
+	}
 
-	vm = find_vm_area(objp);
-	if (!vm)
+	vm = va->vm;
+	if (!vm) {
+		spin_unlock(&vmap_area_lock);
 		return false;
+	}
+	addr = (unsigned long)vm->addr;
+	caller = vm->caller;
+	nr_pages = vm->nr_pages;
+	spin_unlock(&vmap_area_lock);
 	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
-		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
+		nr_pages, addr, caller);
 	return true;
 }
 #endif