Re: [PATCH v4] mm/memory_hotplug: Fix remove_memory() lockdep splat

11 Jan 2020

      On Sat, Jan 11, 2020 at 6:52 AM David Hildenbrand david@redhat.com wrote:
...
On 11.01.20 15:25, David Hildenbrand wrote:
...
...
Am 11.01.2020 um 14:56 schrieb Qian Cai cai@lca.pw:

...
On Jan 11, 2020, at 6:03 AM, David Hildenbrand david@redhat.com wrote:
So I just remember why I think this (and the previously reported done
for ACPI DIMMs) are false positives. The actual locking order is
onlining/offlining from user space:
kn->count -> device_hotplug_lock -> cpu_hotplug_lock -> mem_hotplug_lock
memory removal:
device_hotplug_lock -> cpu_hotplug_lock -> mem_hotplug_lock -> kn->count
This looks like a locking inversion - but it's not. Whenever we come via
user space we do a mutex_trylock(), which resolves this issue by backing
up. The device_hotplug_lock will prevent
I have no clue why the device_hotplug_lock does not pop up in the
lockdep report here. Sounds wrong to me.
I think this is a false positive and not stable material.
The point is that there are other paths does kn->count —> cpu_hotplug_lock without needing device_hotplug_lock to race with memory removal.
kmem_cache_shrink_all+0x50/0x100 (cpu_hotplug_lock.rw_sem/mem_hotplug_lock.rw_sem)
shrink_store+0x34/0x60
slab_attr_store+0x6c/0x170
sysfs_kf_write+0x70/0xb0
kernfs_fop_write+0x11c/0x270 ((kn->count)
__vfs_write+0x3c/0x70
vfs_write+0xcc/0x200
ksys_write+0x7c/0x140
system_call+0x5c/0x6
But not the lock of the memory devices, or am I missing something?
To clarify:
memory unplug will remove e.g., /sys/devices/system/memory/memoryX/,
which has a dedicated kn->count AFAIK
If you do a "echo 1 > /sys/kernel/slab/X/shrink", you would not lock the
kn->count of /sys/devices/system/memory/memoryX/, but the one of some
slab thingy.
The only scenario I could see is if remove_memory_block_devices() will
not only remove /sys/devices/system/memory/memoryX/, but also implicitly
e.g., /sys/kernel/slab/X/. If that is the case, then this is indeed not
a false positive, but something rather hard to trigger (which would
still classify as stable material).
Yes, already agreed to drop stable.
However, the trylock does not solve the race it just turns the
blocking wait to a spin wait, but the subsequent 5ms sleep does make
the theoretical race nearly impossible, Thanks for pointing that out.
The theoretical race is still a problem because it hides future
lockdep violations, but I otherwise can't point to whether the
kn->count in question is a false positive concern for an actual
deadlock or not. Tracking that down is possible, but not something I
have time for at present.

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v4] mm/memory_hotplug: Fix remove_memory() lockdep splat