On Mon, 7 Oct 2019 10:16:21 +0200 Michal Hocko <mhocko@kernel.org> wrote:
On Fri 04-10-19 14:57:01, Michal Hocko wrote:
On Fri 04-10-19 08:31:49, Qian Cai wrote:
A long time ago, a similar deadlock in show_slab_objects() was fixed [1]. However, apparently due to commits like 01fb58bcba63 ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path") and 03afc0e25f7f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back: just reading files in /sys/kernel/slab generates the lockdep splat below.
Since the "mem_hotplug_lock" here is only to obtain a stable online node mask while racing with NUMA node hotplug, in the worst case, the results may be miscalculated while doing NUMA node hotplug, but they shall be corrected by later reads of the same files.
I think it is important to mention that this doesn't expose the show_slab_objects to use-after-free. There is only a single path that might really race here and that is the slab hotplug notifier callback __kmem_cache_shrink (via slab_mem_going_offline_callback) but that path doesn't really destroy kmem_cache_node data structures.
Yes, I noted this during review. It's a bit subtle and is worthy of more than a changelog note, I think. How about this?
--- a/mm/slub.c~mm-slub-fix-a-deadlock-in-show_slab_objects-fix
+++ a/mm/slub.c
@@ -4851,6 +4851,10 @@ static ssize_t show_slab_objects(struct
 	 * already held which will conflict with an existing lock order:
 	 *
 	 * mem_hotplug_lock->slab_mutex->kernfs_mutex
+	 *
+	 * We don't really need mem_hotplug_lock (to hold off
+	 * slab_mem_going_offline_callback()) here because slab's memory hot
+	 * unplug code doesn't destroy the kmem_cache->node[] data.
 	 */

 #ifdef CONFIG_SLUB_DEBUG
_
Andrew, please add this to the changelog so that we do not have to scratch heads again when looking into that code.
I did that as well.
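
For anyone revisiting this later, here is a rough userspace sketch of the lock-order inversion discussed above. It is not kernel code and not how lockdep is implemented. The enum names, the order[][] table, and the helpers reset_order(), reaches() and acquire_while_holding() are all made up for illustration. It only records "B was taken while A was held" edges and reports when a new acquisition would close a cycle.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lock IDs for this sketch; these are not kernel identifiers. */
enum { MEM_HOTPLUG_LOCK, SLAB_MUTEX, KERNFS_MUTEX, NLOCKS };

/* order[a][b] != 0 means "b was acquired while a was held". */
static int order[NLOCKS][NLOCKS];

/* Forget every recorded ordering. */
void reset_order(void)
{
	memset(order, 0, sizeof(order));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
int reaches(int from, int to, int *seen)
{
	if (from == to)
		return 1;
	seen[from] = 1;
	for (int next = 0; next < NLOCKS; next++)
		if (order[from][next] && !seen[next] && reaches(next, to, seen))
			return 1;
	return 0;
}

/*
 * Record "acquire 'want' while holding 'held'".
 * Returns 1 if that would invert an already recorded order (a possible
 * deadlock), in which case nothing is recorded. Returns 0 otherwise.
 */
int acquire_while_holding(int held, int want)
{
	int seen[NLOCKS] = { 0 };

	assert(held >= 0 && held < NLOCKS);
	assert(want >= 0 && want < NLOCKS);

	if (reaches(want, held, seen))
		return 1;
	order[held][want] = 1;
	return 0;
}
```

Feeding it the existing order (mem_hotplug_lock, then slab_mutex, then kernfs_mutex) and then asking for mem_hotplug_lock while kernfs_mutex is held, as the sysfs read path did, makes acquire_while_holding() return 1. Not taking mem_hotplug_lock in that path means that last edge is never requested, which is the approach the patch takes.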