On 03.04.24 02:38, Tejun Heo wrote:
On Tue, Apr 02, 2024 at 10:08:11AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
Hi stable team, there is a report that the recent backport of 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") [from Tejun] to 6.6.y (as 5a70baec2294) broke hibernate for a user. 6.6.24-rc1 did not fix this problem; reverting the culprit does.
With kernel 6.6.23 hibernating usually hangs here: the display stays on but the mouse pointer does not move and the keyboard does not work. But SysRq REISUB does reboot. Sometimes it seems to hibernate: the computer powers down and can be waked up and the previous display comes visible, but it is stuck there.
See https://bugzilla.kernel.org/show_bug.cgi?id=218658 for details. Note, you have to use bugzilla to reach the reporter, as I sadly[1] can not CCed them in mails like this.
Side note: there is a mainline report about problems due to 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") as well, but it's about "nohz_full=0 prevents kernel from booting": https://bugzilla.kernel.org/show_bug.cgi?id=218665; will forward that separately to Tejun.
Sorry about late reply
No problem at all, really, thx for your reply!
but that commit is not for -stable. It does fix an undesirable behavior from an earlier commit, so it has the Fixes tag but it has a series of dependencies that need to be backported together
Not sure if you know, but the stable team apparently recently backported a bunch of other workqueue commits as well; I see 10 from a quick look here: https://lore.kernel.org/all/20240326223803.2647796-1-sashal@kernel.org/
$ git log --grep='workqueue:' --oneline v6.6.22..stable/linux-6.6.y -- include/linux/workqueue.h kernel/workqueue* 7df62b8cca38aa workqueue: Don't call cpumask_test_cpu() with -1 CPU in wq_update_node_max_active() 5a70baec2294e8 workqueue: Implement system-wide nr_active enforcement for unbound workqueues b522229a56941a workqueue: Introduce struct wq_node_nr_active bd31fb926dfa02 workqueue: RCU protect wq->dfl_pwq and implement accessors for it 5f99fee6f2dea1 workqueue: Make wq_adjust_max_active() round-robin pwqs while activating 4023a2d9507691 workqueue: Move nr_active handling into helpers 6c592f0bb96815 workqueue: Replace pwq_activate_inactive_work() with [__]pwq_activate_work() bad184d26a4f68 workqueue: Factor out pwq_is_empty() 82e098f5bed1ff workqueue: Move pwq->max_active to wq->max_active 43a181f8f41aca workqueue.c: Increase workqueue name length
And there is also "workqueue: Shorten events_freezable_power_efficient name" in the queue for the next 6.6.y release: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree...
That leads to the question: what is the better patch forward here: pick up what's is missing or back out?
Side note: I have no idea why the stable team backported those patches and no option on whether that was wise, just trying to help finding the best solution forward from the current state of things.
which would be far too invasive for -stable, thus no Cc: stable.
I didn't know a Fixes tag automatically triggers backport to -stable. I will keep that in mind for future.
/me fears that more and more developers due to situations like this will avoid Fixes: tags and wonders what consequences that might have for the kernel as a whole
Ciao, Thorsten