On Sat, Jan 11, 2020 at 6:52 AM David Hildenbrand david@redhat.com wrote:
On 11.01.20 15:25, David Hildenbrand wrote:
Am 11.01.2020 um 14:56 schrieb Qian Cai cai@lca.pw:
On Jan 11, 2020, at 6:03 AM, David Hildenbrand david@redhat.com wrote:
So I just remember why I think this (and the previously reported done for ACPI DIMMs) are false positives. The actual locking order is
onlining/offlining from user space:
kn->count -> device_hotplug_lock -> cpu_hotplug_lock -> mem_hotplug_lock
memory removal:
device_hotplug_lock -> cpu_hotplug_lock -> mem_hotplug_lock -> kn->count
This looks like a locking inversion - but it's not. Whenever we come via user space we do a mutex_trylock(), which resolves this issue by backing up. The device_hotplug_lock will prevent
I have no clue why the device_hotplug_lock does not pop up in the lockdep report here. Sounds wrong to me.
I think this is a false positive and not stable material.
The point is that there are other paths does kn->count —> cpu_hotplug_lock without needing device_hotplug_lock to race with memory removal.
kmem_cache_shrink_all+0x50/0x100 (cpu_hotplug_lock.rw_sem/mem_hotplug_lock.rw_sem) shrink_store+0x34/0x60 slab_attr_store+0x6c/0x170 sysfs_kf_write+0x70/0xb0 kernfs_fop_write+0x11c/0x270 ((kn->count) __vfs_write+0x3c/0x70 vfs_write+0xcc/0x200 ksys_write+0x7c/0x140 system_call+0x5c/0x6
But not the lock of the memory devices, or am I missing something?
To clarify:
memory unplug will remove e.g., /sys/devices/system/memory/memoryX/, which has a dedicated kn->count AFAIK
If you do a "echo 1 > /sys/kernel/slab/X/shrink", you would not lock the kn->count of /sys/devices/system/memory/memoryX/, but the one of some slab thingy.
The only scenario I could see is if remove_memory_block_devices() will not only remove /sys/devices/system/memory/memoryX/, but also implicitly e.g., /sys/kernel/slab/X/. If that is the case, then this is indeed not a false positive, but something rather hard to trigger (which would still classify as stable material).
Yes, already agreed to drop stable.
However, the trylock does not solve the race it just turns the blocking wait to a spin wait, but the subsequent 5ms sleep does make the theoretical race nearly impossible, Thanks for pointing that out.
The theoretical race is still a problem because it hides future lockdep violations, but I otherwise can't point to whether the kn->count in question is a false positive concern for an actual deadlock or not. Tracking that down is possible, but not something I have time for at present.