On 1/22/25 02:59, Peter Zijlstra wrote:
On Wed, Jan 22, 2025 at 11:56:13AM +0100, Arnd Bergmann wrote:
On Wed, Jan 22, 2025, at 11:04, Naresh Kamboju wrote:
On Tue, 21 Jan 2025 at 23:28, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
0000000000000000
<4>[ 160.712071] Call trace:
<4>[ 160.712597] place_entity (kernel/sched/fair.c:5250 (discriminator 1))
<4>[ 160.713221] reweight_entity (kernel/sched/fair.c:3813)
<4>[ 160.713802] update_cfs_group (kernel/sched/fair.c:3975 (discriminator 1))
<4>[ 160.714277] dequeue_entities (kernel/sched/fair.c:7091)
<4>[ 160.714903] dequeue_task_fair (kernel/sched/fair.c:7144 (discriminator 1))
<4>[ 160.716502] move_queued_task.isra.0 (kernel/sched/core.c:2437 (discriminator 1))
I don't see anything that immediately sticks out as causing this, but I do see five scheduler patches backported in stable-rc on top of v6.12.8. These are the original commits:
66951e4860d3 ("sched/fair: Fix update_cfs_group() vs DELAY_DEQUEUE")
This one reworks reweight_entity(), but I've been running with it on top of 6.13-rc6 for a week or so and have not seen this.
The offending commit is 6d71a9c6160479899ee744d2c6d6602a191deb1f ("sched/fair: Fix EEVDF entity placement bug causing scheduling lag")
It works fine on 6.13, at least on RISC-V (which is the only arch I test).
It's already been reverted and 6.12.11-rc2 has been pushed out.