On 9/29/25 21:21, Greg Kroah-Hartman wrote:
On Sat, Sep 27, 2025 at 01:46:58AM +0800, Wen Yang wrote:
From: Pierre Gondois <pierre.gondois@arm.com>
commit 5944ce092b97caed5d86d961e963b883b5c44ee2 upstream.
commit 3fcbf1c77d08 ("arch_topology: Fix cache attributes detection in the CPU hotplug path") adds a call to detect_cache_attributes() to populate the cacheinfo before updating the siblings mask. detect_cache_attributes() allocates memory and can take the PPTT mutex (on ACPI platforms). On PREEMPT_RT kernels, on secondary CPUs, this triggers a: 'BUG: sleeping function called from invalid context' [1] as the code is executed with preemption and interrupts disabled.
The primary CPU was previously storing the cache information using the now removed (struct cpu_topology).llc_id: commit 5b8dc787ce4a ("arch_topology: Drop LLC identifier stash from the CPU topology")
allocate_cache_info() tries to build the cacheinfo from the primary CPU prior to the secondary CPUs booting, if the DT/ACPI description contains cache information. If allocate_cache_info() fails, then fall back to the current state for the cacheinfo allocation. [1] will be triggered in such a case.
When unplugging a CPU, the cacheinfo memory cannot be freed. If it was, then the memory would be allocated early by the re-plugged CPU and would trigger [1].
Note that populate_cache_leaves() might be called multiple times due to populate_leaves being moved up. This is required since detect_cache_attributes() might be called with per_cpu_cacheinfo(cpu) being allocated but not populated.
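The allocation scheme described above can be modelled in userspace as a rough sketch. This is not the kernel code: every name below (alloc_leaves(), early_alloc_all(), secondary_bringup(), cpu_unplug(), atomic_allocs) is hypothetical, and clearing the leaves on unplug is an assumption made only to show why populating can run more than once on memory that is kept.

```c
/* Userspace model only; all identifiers here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS   4
#define NR_LEAVES 3

struct leaf { int level; bool populated; };

struct cpu_cache {
    struct leaf *leaves;    /* NULL until allocated */
    int populate_calls;     /* how often this CPU was populated */
};

static struct cpu_cache cache[NR_CPUS];
static int atomic_allocs;   /* allocations done in "atomic" context */

/* Allocate the leaves of one CPU unless they already exist. */
static int alloc_leaves(int cpu, bool atomic_ctx)
{
    if (cache[cpu].leaves)
        return 0;           /* already allocated: nothing to do */
    if (atomic_ctx)
        atomic_allocs++;    /* this is the case that would hit [1] */
    cache[cpu].leaves = calloc(NR_LEAVES, sizeof(struct leaf));
    return cache[cpu].leaves ? 0 : -1;
}

/* Primary CPU, preemptible context: allocate for every CPU up front. */
static int early_alloc_all(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (alloc_leaves(cpu, false))
            return -1;
    return 0;
}

/* Secondary CPU bring-up, atomic context: allocate only as a fallback,
 * then (re)populate the leaves. */
static int secondary_bringup(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (alloc_leaves(cpu, true))
        return -1;
    for (int i = 0; i < NR_LEAVES; i++) {
        cache[cpu].leaves[i].level = i + 1;
        cache[cpu].leaves[i].populated = true;
    }
    cache[cpu].populate_calls++;
    return 0;
}

/* Unplug: keep the memory, only drop the populated state (assumption). */
static void cpu_unplug(int cpu)
{
    if (!cache[cpu].leaves)
        return;
    for (int i = 0; i < NR_LEAVES; i++)
        cache[cpu].leaves[i].populated = false;
}
```

In this model, calling early_alloc_all() first and then cycling a CPU through secondary_bringup(), cpu_unplug() and secondary_bringup() again leaves atomic_allocs at zero, because the memory is never freed. Skipping early_alloc_all() models the fallback path, where the allocation happens in the atomic context.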
[1]:
 | BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
 | in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/111
 | preempt_count: 1, expected: 0
 | RCU nest depth: 1, expected: 1
 | 3 locks held by swapper/111/0:
 |  #0:  (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x218/0x12c8
 |  #1:  (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x48/0xf0
 |  #2:  (&zone->lock){+.+.}-{3:3}, at: rmqueue_bulk+0x64/0xa80
 | irq event stamp: 0
 | hardirqs last enabled at (0): 0x0
 | hardirqs last disabled at (0): copy_process+0x5dc/0x1ab8
 | softirqs last enabled at (0): copy_process+0x5dc/0x1ab8
 | softirqs last disabled at (0): 0x0
 | Preemption disabled at:
 |  migrate_enable+0x30/0x130
 | CPU: 111 PID: 0 Comm: swapper/111 Tainted: G W 6.0.0-rc4-rt6-[...]
 | Call trace:
 |  __kmalloc+0xbc/0x1e8
 |  detect_cache_attributes+0x2d4/0x5f0
 |  update_siblings_masks+0x30/0x368
 |  store_cpu_topology+0x78/0xb8
 |  secondary_start_kernel+0xd0/0x198
 |  __secondary_switched+0xb0/0xb4
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230104183033.755668-7-pierre.gondois@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org # 6.1.x: c3719bd:cacheinfo: Use RISC-V's init_cache_level() as generic OF implementation
Cc: stable@vger.kernel.org # 6.1.x: 8844c3d:cacheinfo: Return error code in init_of_cache_level()
Cc: stable@vger.kernel.org # 6.1.x: de0df44:cacheinfo: Check 'cache-unified' property to count cache leaves
Cc: stable@vger.kernel.org # 6.1.x: fa4d566:ACPI: PPTT: Remove acpi_find_cache_levels()
Cc: stable@vger.kernel.org # 6.1.x: bd50036:ACPI: PPTT: Update acpi_find_last_cache_level() to acpi_get_cache_info()
Cc: stable@vger.kernel.org # 6.1.x
I do not understand, why do you want all of these applied as well? Can you just send the full series of commits?
Thanks for your comments, here is the original series: https://lore.kernel.org/all/167404285593.885445.6219705651301997538.b4-ty@ar...
Commit 3fcbf1c77d08 ("arch_topology: Fix cache attributes detection in the CPU hotplug path") introduced a bug, and this series fixes it.
Signed-off-by: Wen Yang <wen.yang@linux.dev>
Also, you have changed this commit a lot from the original one, please document what you did here.
Thanks for the reminder. We only want to cherry-pick them onto the 6.1 stable branch, without modifying the original commit. I checked again, as follows:
$ git cherry-pick c3719bd
$ git cherry-pick 8844c3d
$ git cherry-pick de0df44
$ git cherry-pick fa4d566
$ git cherry-pick bd50036
$ git cherry-pick 5944ce0
$ git format-patch HEAD -1
$ diff 0001-arch_topology-Build-cacheinfo-from-primary-CPU.patch 20250927_wen_yang_arch_topology_build_cacheinfo_from_primary_cpu.mbx
Consistent with the original commit.
Also, why not just use 6.6.y instead? What is forcing you to use 6.1.y for this platform? What caused this issue to just show up now?
Thank you for your suggestion. Our production environment has been using 6.1.y-rt for quite some time, so we can only migrate to 6.6.y gradually. Some recently added workloads involving power on/off have probably made this bug easier to trigger. We also hope it can be fixed in the upstream 6.1.y branch.
--
Best wishes,
Wen