On Tue, Dec 02, 2025 at 07:16:26PM +0900, Harry Yoo wrote:
Currently, kvfree_rcu_barrier() flushes RCU sheaves across all slab caches when a cache is destroyed. This is unnecessary; only the RCU sheaves belonging to the cache being destroyed need to be flushed.
As suggested by Vlastimil Babka, introduce a weaker form of kvfree_rcu_barrier() that operates on a specific slab cache.
Factor out flush_rcu_sheaves_on_cache() from flush_all_rcu_sheaves() and call it from flush_all_rcu_sheaves() and kvfree_rcu_barrier_on_cache().
Call kvfree_rcu_barrier_on_cache() instead of kvfree_rcu_barrier() on cache destruction.
The performance benefit is evaluated on a 12 core 24 threads AMD Ryzen 5900X machine (1 socket), by loading slub_kunit module.
Before: Total calls: 19 Average latency (us): 18127 Total time (us): 344414
After: Total calls: 19 Average latency (us): 10066 Total time (us): 191264
Two performance regression have been reported:
- stress module loader test's runtime increases by 50-60% (Daniel)
- internal graphics test's runtime on Tegra23 increases by 35% (Jon)
^Tegra234
just a minor typo :)
They are fixed by this change.
Suggested-by: Vlastimil Babka vbabka@suse.cz Fixes: ec66e0d59952 ("slab: add sheaf support for batching kfree_rcu() operations") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-mm/1bda09da-93be-4737-aef0-d47f8c5c9301@suse.c... Reported-and-tested-by: Daniel Gomez da.gomez@samsung.com Closes: https://lore.kernel.org/linux-mm/0406562e-2066-4cf8-9902-b2b0616dd742@kernel... Reported-and-tested-by: Jon Hunter jonathanh@nvidia.com Closes: https://lore.kernel.org/linux-mm/e988eff6-1287-425e-a06c-805af5bbf262@nvidia... Signed-off-by: Harry Yoo harry.yoo@oracle.com