Hi stable team, there is a report that the recent backport of 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") [from Tejun] to 6.6.y (as 5a70baec2294) broke hibernate for a user. 6.6.24-rc1 did not fix this problem; reverting the culprit does.
With kernel 6.6.23 hibernating usually hangs here: the display stays on but the mouse pointer does not move and the keyboard does not work. But SysRq REISUB does reboot. Sometimes it seems to hibernate: the computer powers down and can be waked up and the previous display comes visible, but it is stuck there.
See https://bugzilla.kernel.org/show_bug.cgi?id=218658 for details. Note, you have to use bugzilla to reach the reporter, as I sadly[1] can not CCed them in mails like this.
Side note: there is a mainline report about problems due to 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") as well, but it's about "nohz_full=0 prevents kernel from booting": https://bugzilla.kernel.org/show_bug.cgi?id=218665; will forward that separately to Tejun.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their "email address will never be displayed to logged out users"
#regzbot introduced: 5a70baec2294e8a7d0fcc4558741c23e752dad #regzbot from: Petri Kaukasoina #regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218658 #regzbot title: workqueue: hubernate usually hangs when going to sleep #regzbot ignore-activity
Hello,
On Tue, Apr 02, 2024 at 10:08:11AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
Hi stable team, there is a report that the recent backport of 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") [from Tejun] to 6.6.y (as 5a70baec2294) broke hibernate for a user. 6.6.24-rc1 did not fix this problem; reverting the culprit does.
With kernel 6.6.23 hibernating usually hangs here: the display stays on but the mouse pointer does not move and the keyboard does not work. But SysRq REISUB does reboot. Sometimes it seems to hibernate: the computer powers down and can be waked up and the previous display comes visible, but it is stuck there.
See https://bugzilla.kernel.org/show_bug.cgi?id=218658 for details. Note, you have to use bugzilla to reach the reporter, as I sadly[1] can not CCed them in mails like this.
Side note: there is a mainline report about problems due to 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") as well, but it's about "nohz_full=0 prevents kernel from booting": https://bugzilla.kernel.org/show_bug.cgi?id=218665; will forward that separately to Tejun.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
Sorry about late reply but that commit is not for -stable. It does fix an undesirable behavior from an earlier commit, so it has the Fixes tag but it has a series of dependencies that need to be backported together which would be far too invasive for -stable, thus no Cc: stable. I didn't know a Fixes tag automatically triggers backport to -stable. I will keep that in mind for future.
Thanks.
On 03.04.24 02:38, Tejun Heo wrote:
On Tue, Apr 02, 2024 at 10:08:11AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
Hi stable team, there is a report that the recent backport of 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") [from Tejun] to 6.6.y (as 5a70baec2294) broke hibernate for a user. 6.6.24-rc1 did not fix this problem; reverting the culprit does.
With kernel 6.6.23 hibernating usually hangs here: the display stays on but the mouse pointer does not move and the keyboard does not work. But SysRq REISUB does reboot. Sometimes it seems to hibernate: the computer powers down and can be waked up and the previous display comes visible, but it is stuck there.
See https://bugzilla.kernel.org/show_bug.cgi?id=218658 for details. Note, you have to use bugzilla to reach the reporter, as I sadly[1] can not CCed them in mails like this.
Side note: there is a mainline report about problems due to 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") as well, but it's about "nohz_full=0 prevents kernel from booting": https://bugzilla.kernel.org/show_bug.cgi?id=218665; will forward that separately to Tejun.
Sorry about late reply
No problem at all, really, thx for your reply!
but that commit is not for -stable. It does fix an undesirable behavior from an earlier commit, so it has the Fixes tag but it has a series of dependencies that need to be backported together
Not sure if you know, but the stable team apparently recently backported a bunch of other workqueue commits as well; I see 10 from a quick look here: https://lore.kernel.org/all/20240326223803.2647796-1-sashal@kernel.org/
$ git log --grep='workqueue:' --oneline v6.6.22..stable/linux-6.6.y -- include/linux/workqueue.h kernel/workqueue* 7df62b8cca38aa workqueue: Don't call cpumask_test_cpu() with -1 CPU in wq_update_node_max_active() 5a70baec2294e8 workqueue: Implement system-wide nr_active enforcement for unbound workqueues b522229a56941a workqueue: Introduce struct wq_node_nr_active bd31fb926dfa02 workqueue: RCU protect wq->dfl_pwq and implement accessors for it 5f99fee6f2dea1 workqueue: Make wq_adjust_max_active() round-robin pwqs while activating 4023a2d9507691 workqueue: Move nr_active handling into helpers 6c592f0bb96815 workqueue: Replace pwq_activate_inactive_work() with [__]pwq_activate_work() bad184d26a4f68 workqueue: Factor out pwq_is_empty() 82e098f5bed1ff workqueue: Move pwq->max_active to wq->max_active 43a181f8f41aca workqueue.c: Increase workqueue name length
And there is also "workqueue: Shorten events_freezable_power_efficient name" in the queue for the next 6.6.y release: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree...
That leads to the question: what is the better patch forward here: pick up what's is missing or back out?
Side note: I have no idea why the stable team backported those patches and no option on whether that was wise, just trying to help finding the best solution forward from the current state of things.
which would be far too invasive for -stable, thus no Cc: stable.
I didn't know a Fixes tag automatically triggers backport to -stable. I will keep that in mind for future.
/me fears that more and more developers due to situations like this will avoid Fixes: tags and wonders what consequences that might have for the kernel as a whole
Ciao, Thorsten
On Wed, Apr 03, 2024 at 06:26:24AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
On 03.04.24 02:38, Tejun Heo wrote:
On Tue, Apr 02, 2024 at 10:08:11AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
Hi stable team, there is a report that the recent backport of 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") [from Tejun] to 6.6.y (as 5a70baec2294) broke hibernate for a user. 6.6.24-rc1 did not fix this problem; reverting the culprit does.
With kernel 6.6.23 hibernating usually hangs here: the display stays on but the mouse pointer does not move and the keyboard does not work. But SysRq REISUB does reboot. Sometimes it seems to hibernate: the computer powers down and can be waked up and the previous display comes visible, but it is stuck there.
See https://bugzilla.kernel.org/show_bug.cgi?id=218658 for details. Note, you have to use bugzilla to reach the reporter, as I sadly[1] can not CCed them in mails like this.
Side note: there is a mainline report about problems due to 5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues") as well, but it's about "nohz_full=0 prevents kernel from booting": https://bugzilla.kernel.org/show_bug.cgi?id=218665; will forward that separately to Tejun.
Sorry about late reply
No problem at all, really, thx for your reply!
but that commit is not for -stable. It does fix an undesirable behavior from an earlier commit, so it has the Fixes tag but it has a series of dependencies that need to be backported together
Not sure if you know, but the stable team apparently recently backported a bunch of other workqueue commits as well; I see 10 from a quick look here: https://lore.kernel.org/all/20240326223803.2647796-1-sashal@kernel.org/
$ git log --grep='workqueue:' --oneline v6.6.22..stable/linux-6.6.y -- include/linux/workqueue.h kernel/workqueue* 7df62b8cca38aa workqueue: Don't call cpumask_test_cpu() with -1 CPU in wq_update_node_max_active() 5a70baec2294e8 workqueue: Implement system-wide nr_active enforcement for unbound workqueues b522229a56941a workqueue: Introduce struct wq_node_nr_active bd31fb926dfa02 workqueue: RCU protect wq->dfl_pwq and implement accessors for it 5f99fee6f2dea1 workqueue: Make wq_adjust_max_active() round-robin pwqs while activating 4023a2d9507691 workqueue: Move nr_active handling into helpers 6c592f0bb96815 workqueue: Replace pwq_activate_inactive_work() with [__]pwq_activate_work() bad184d26a4f68 workqueue: Factor out pwq_is_empty() 82e098f5bed1ff workqueue: Move pwq->max_active to wq->max_active 43a181f8f41aca workqueue.c: Increase workqueue name length
And there is also "workqueue: Shorten events_freezable_power_efficient name" in the queue for the next 6.6.y release: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree...
That leads to the question: what is the better patch forward here: pick up what's is missing or back out?
Side note: I have no idea why the stable team backported those patches and no option on whether that was wise, just trying to help finding the best solution forward from the current state of things.
The Fixes: tag triggered it, that's why they were backported.
which would be far too invasive for -stable, thus no Cc: stable.
I didn't know a Fixes tag automatically triggers backport to -stable. I will keep that in mind for future.
/me fears that more and more developers due to situations like this will avoid Fixes: tags and wonders what consequences that might have for the kernel as a whole
The problem is that we have subsystems that only use Fixes: and not cc: stable which is why we need to pick these up as well. Fixes: is great, but if everyone were to do this "properly" then we wouldn't need to pick these other ones up, but instead, it's about 1/3 of our volume :(
I'll gladly revert the above series if they shouldn't have been backported to stable, but from reading them, it seemed like they were fixing an issue that was serious and should have been added to stable, which is why they were.
This is also why we have review cycles, so maintainers can let us know if they want us to drop them :)
thanks,
greg k-h
Hello,
On Wed, Apr 03, 2024 at 07:11:04AM +0200, Greg KH wrote:
Side note: I have no idea why the stable team backported those patches and no option on whether that was wise, just trying to help finding the best solution forward from the current state of things.
The Fixes: tag triggered it, that's why they were backported.
which would be far too invasive for -stable, thus no Cc: stable.
I didn't know a Fixes tag automatically triggers backport to -stable. I will keep that in mind for future.
/me fears that more and more developers due to situations like this will avoid Fixes: tags and wonders what consequences that might have for the kernel as a whole
The problem is that we have subsystems that only use Fixes: and not cc: stable which is why we need to pick these up as well. Fixes: is great, but if everyone were to do this "properly" then we wouldn't need to pick these other ones up, but instead, it's about 1/3 of our volume :(
I'll gladly revert the above series if they shouldn't have been backported to stable, but from reading them, it seemed like they were fixing an issue that was serious and should have been added to stable, which is why they were.
Oh, yeah, they're fixing an issue. It's just that the issue is relatively confined peformance degradation and the fix is really invasive, so not a great -stable candidate. At the very least, they'd need a log longer cooking time in mainline before being considered for -stable backport.
My intention w/ Fixes: wasn't triggering -stable backport at all, so it's a miscommunication. From now on, I'll keep in mind that Fixes: does trigger backports. I'm not too worried about not using it as the fixee commit can be mentioned in the commit message.
This is also why we have review cycles, so maintainers can let us know if they want us to drop them :)
Heh, sorry about that. This never caused any issues, so I just glide over the stable mails without thinking.
Thanks.
On Wed, Apr 03, 2024 at 05:22:17AM -1000, Tejun Heo wrote:
Hello,
On Wed, Apr 03, 2024 at 07:11:04AM +0200, Greg KH wrote:
Side note: I have no idea why the stable team backported those patches and no option on whether that was wise, just trying to help finding the best solution forward from the current state of things.
The Fixes: tag triggered it, that's why they were backported.
which would be far too invasive for -stable, thus no Cc: stable.
I didn't know a Fixes tag automatically triggers backport to -stable. I will keep that in mind for future.
/me fears that more and more developers due to situations like this will avoid Fixes: tags and wonders what consequences that might have for the kernel as a whole
The problem is that we have subsystems that only use Fixes: and not cc: stable which is why we need to pick these up as well. Fixes: is great, but if everyone were to do this "properly" then we wouldn't need to pick these other ones up, but instead, it's about 1/3 of our volume :(
I'll gladly revert the above series if they shouldn't have been backported to stable, but from reading them, it seemed like they were fixing an issue that was serious and should have been added to stable, which is why they were.
Oh, yeah, they're fixing an issue. It's just that the issue is relatively confined peformance degradation and the fix is really invasive, so not a great -stable candidate. At the very least, they'd need a log longer cooking time in mainline before being considered for -stable backport.
Ok, I'll go revert them all now. I did some test builds here with them reverted and they seem sane. I'll push out some -rcs with just the reverts to at least fix the regressions found in the 6.8.y tree now.
My intention w/ Fixes: wasn't triggering -stable backport at all, so it's a miscommunication. From now on, I'll keep in mind that Fixes: does trigger backports. I'm not too worried about not using it as the fixee commit can be mentioned in the commit message.
No worries, if you want, we can add any files/paths to our "ignore Fixes: tags, only take cc: stable ones" that we have for many parts of the kernel already, where maintainers are good and properly tag stuff. Just let me know.
thanks,
greg k-h
On 03.04.24 18:10, Greg KH wrote:
On Wed, Apr 03, 2024 at 05:22:17AM -1000, Tejun Heo wrote:
On Wed, Apr 03, 2024 at 07:11:04AM +0200, Greg KH wrote:
Side note: I have no idea why the stable team backported those patches and no option on whether that was wise, just trying to help finding the best solution forward from the current state of things.
The Fixes: tag triggered it, that's why they were backported.
Yeah, this is what I assumed.
which would be far too invasive for -stable, thus no Cc: stable.
I didn't know a Fixes tag automatically triggers backport to -stable. I will keep that in mind for future.
/me fears that more and more developers due to situations like this will avoid Fixes: tags and wonders what consequences that might have for the kernel as a whole
The problem is that we have subsystems that only use Fixes: and not cc: stable which is why we need to pick these up as well. Fixes: is great, but if everyone were to do this "properly" then we wouldn't need to pick these other ones up, but instead, it's about 1/3 of our volume :(
I'm also well aware of that and do not want to complain about it, as I think I grasped why the stable team works like that -- and even think given the circumstances it is round about the right approach. I also understand that mistakes will always happen.
Nevertheless this thread and the Bluetooth thing we had earlier this week[1] makes me fear that this approach could lead to developer avoiding Fixes: tags. And funny thing, that's something that is already happening, as I noticed by chance today: "'"A "Fixes" tag has been deliberately omitted to avoid potential test failures and subsequent regression issues that could arise from backporting."'"[2].
I wonder if that in the long term might be bad. But well, maybe it won't matter much. Still made me wonder if we should have a different solution for this kind of problem. Something like this?
Cc: stable@vger.kernel.org # DoNotBackport
Or something like this?
Cc: stable@vger.kernel.org # DoNotBackport - or only after 16 weeks in mainline [but I don't care]
Whatever, mainly thinking aloud and do no need a reply to this. :-D
[1] https://lore.kernel.org/all/84da1f26-0457-451c-b4fd-128cb9bd860d@leemhuis.in...
[2] saw that today here: https://lore.kernel.org/all/cover.1712226175.git.antony.antony@secunet.com/
I'll gladly revert the above series if they shouldn't have been backported to stable, but from reading them, it seemed like they were fixing an issue that was serious and should have been added to stable, which is why they were.
Oh, yeah, they're fixing an issue. It's just that the issue is relatively confined peformance degradation and the fix is really invasive, so not a great -stable candidate. At the very least, they'd need a log longer cooking time in mainline before being considered for -stable backport.
Ok, I'll go revert them all now. I did some test builds here with them reverted and they seem sane. I'll push out some -rcs with just the reverts to at least fix the regressions found in the 6.8.y tree now.
Great, thx for taking care of this!
Ciao, Thorsten
On Thu, Apr 04, 2024 at 05:36:42PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
On 03.04.24 18:10, Greg KH wrote:
On Wed, Apr 03, 2024 at 05:22:17AM -1000, Tejun Heo wrote:
On Wed, Apr 03, 2024 at 07:11:04AM +0200, Greg KH wrote:
Side note: I have no idea why the stable team backported those patches and no option on whether that was wise, just trying to help finding the best solution forward from the current state of things.
The Fixes: tag triggered it, that's why they were backported.
Yeah, this is what I assumed.
which would be far too invasive for -stable, thus no Cc: stable.
I didn't know a Fixes tag automatically triggers backport to -stable. I will keep that in mind for future.
/me fears that more and more developers due to situations like this will avoid Fixes: tags and wonders what consequences that might have for the kernel as a whole
The problem is that we have subsystems that only use Fixes: and not cc: stable which is why we need to pick these up as well. Fixes: is great, but if everyone were to do this "properly" then we wouldn't need to pick these other ones up, but instead, it's about 1/3 of our volume :(
I'm also well aware of that and do not want to complain about it, as I think I grasped why the stable team works like that -- and even think given the circumstances it is round about the right approach. I also understand that mistakes will always happen.
Nevertheless this thread and the Bluetooth thing we had earlier this week[1] makes me fear that this approach could lead to developer avoiding Fixes: tags. And funny thing, that's something that is already happening, as I noticed by chance today: "'"A "Fixes" tag has been deliberately omitted to avoid potential test failures and subsequent regression issues that could arise from backporting."'"[2].
I wonder if that in the long term might be bad. But well, maybe it won't matter much. Still made me wonder if we should have a different solution for this kind of problem. Something like this?
Cc: stable@vger.kernel.org # DoNotBackport
Or something like this?
Cc: stable@vger.kernel.org # DoNotBackport - or only after 16 weeks in mainline [but I don't care]
We do this today, with stuff like: Cc: stable stable@kernel.org # wait for -rc3 to be out
So if people want to do that, they can, the documentation says that you can give us hints and the like in the # area, and usually we notice them :)
thanks,
greg k-h
On 04.04.24 17:44, Greg KH wrote:
On Thu, Apr 04, 2024 at 05:36:42PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
On 03.04.24 18:10, Greg KH wrote:
On Wed, Apr 03, 2024 at 05:22:17AM -1000, Tejun Heo wrote:
On Wed, Apr 03, 2024 at 07:11:04AM +0200, Greg KH wrote:
Side note: I have no idea why the stable team backported those patches and no option on whether that was wise, just trying to help finding the best solution forward from the current state of things.
The Fixes: tag triggered it, that's why they were backported.
Yeah, this is what I assumed.
> which would > be far too invasive for -stable, thus no Cc: stable. > > I didn't know a Fixes > tag automatically triggers backport to -stable. I will keep that in mind for > future.
/me fears that more and more developers due to situations like this will avoid Fixes: tags and wonders what consequences that might have for the kernel as a whole
The problem is that we have subsystems that only use Fixes: and not cc: stable which is why we need to pick these up as well. Fixes: is great, but if everyone were to do this "properly" then we wouldn't need to pick these other ones up, but instead, it's about 1/3 of our volume :(
I'm also well aware of that and do not want to complain about it, as I think I grasped why the stable team works like that -- and even think given the circumstances it is round about the right approach. I also understand that mistakes will always happen.
Nevertheless this thread and the Bluetooth thing we had earlier this week[1] makes me fear that this approach could lead to developer avoiding Fixes: tags. And funny thing, that's something that is already happening, as I noticed by chance today: "'"A "Fixes" tag has been deliberately omitted to avoid potential test failures and subsequent regression issues that could arise from backporting."'"[2].
I wonder if that in the long term might be bad. But well, maybe it won't matter much. Still made me wonder if we should have a different solution for this kind of problem. Something like this?
Cc: stable@vger.kernel.org # DoNotBackport
Or something like this?
Cc: stable@vger.kernel.org # DoNotBackport - or only after 16 weeks in mainline [but I don't care]
We do this today, with stuff like: Cc: stable stable@kernel.org # wait for -rc3 to be out
So if people want to do that, they can, the documentation says that you can give us hints and the like in the # area, and usually we notice them :)
I know, as I wrote that (as you likely remember). ;-) But it seems it's not well known; and maybe making it explicit that this can be used to convey a "DoNotBackport" message is supported as well.
Guess I'll prepare a patch to do that then and we'll see how it goes from there.
Ciao, Thorsten
On Thu, Apr 04, 2024 at 05:56:58PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
I know, as I wrote that (as you likely remember). ;-) But it seems it's not well known; and maybe making it explicit that this can be used to convey a "DoNotBackport" message is supported as well.
Guess I'll prepare a patch to do that then and we'll see how it goes from there.
Maybe something like "ManualBackportOnly"instead? The basic idea is that it's not that the commit should *never* be backported, but only with human intervention where someone has specifically requested the backport, perhaps with qualification test.
(For example, the XFS file system has an implicit ManualBackportOnly on all commits, and the XFS stable maintainers are responsible for backporting identifying commits with Fixes: tags, and running QA before passing on a request to having them be backported.)
- Ted
P.S. I'd love to be able do something equivalent for ext4, if we could scare up the resources to do the same, in terms of having some company fund the necessary developer resources, or someone turn up to volunteer to do that for ext4. Gentle reader, if that's you or your company --- let's talk. :-)
On Thu, Apr 04, 2024 at 10:54:39PM -0400, Theodore Ts'o wrote:
On Thu, Apr 04, 2024 at 05:56:58PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
I know, as I wrote that (as you likely remember). ;-) But it seems it's not well known; and maybe making it explicit that this can be used to convey a "DoNotBackport" message is supported as well.
Guess I'll prepare a patch to do that then and we'll see how it goes from there.
Maybe something like "ManualBackportOnly"instead? The basic idea is that it's not that the commit should *never* be backported, but only with human intervention where someone has specifically requested the backport, perhaps with qualification test.
(For example, the XFS file system has an implicit ManualBackportOnly on all commits, and the XFS stable maintainers are responsible for backporting identifying commits with Fixes: tags, and running QA before passing on a request to having them be backported.)
For drivers/subsystems/files that no one wants to have backported at all UNLESS there is an explicit cc: stable tag, just email us at stable@vger and let us know and we will add you to the list of files that we ignore for this. We keep that list in the stable-queue repo for anyone to see if they are curious.
That's what xfs and bcachefs and kvm and mm and other subsystems have done, that way they can tag things with "Fixes:" to their hearts content and we just ignore them.
thanks,
greg k-h
On 05.04.24 04:54, Theodore Ts'o wrote:
On Thu, Apr 04, 2024 at 05:56:58PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
I know, as I wrote that (as you likely remember). ;-) But it seems it's not well known; and maybe making it explicit that this can be used to convey a "DoNotBackport" message is supported as well.
Guess I'll prepare a patch to do that then and we'll see how it goes from there.
Maybe something like "ManualBackportOnly"instead? The basic idea is that it's not that the commit should *never* be backported, but only with human intervention where someone has specifically requested the backport, perhaps with qualification test.
I liked the idea at first, as it was more from the positive side of things. But a CC stable with a "# ManualBackportOnly" might sound like "I want this backported and handle that" to some developers and not what they want to express.
After thinking about it for some time I went with "# no semi-automatic backport" for now. For details see: https://lore.kernel.org/all/c0a08b160b286e8c98549eedb37404c6e784cf8a.1712812...
Ciao, Thorsten
linux-stable-mirror@lists.linaro.org