Just did more testing here. I confirmed that the system hang is still there, but less frequent (6 out of 40 runs), with the patches from http://lkml.kernel.org/r/20250912103522.2935-1-jack@suse.cz applied on top of v6.17-rc7. In the bad runs, the kworker count still climbed to over 600 and the hang lasted more than 80 seconds.
So I think the patches didn't fully solve the issue.
On Wed, Sep 24, 2025 at 5:29 PM Chenglong Tang <chenglongtang@google.com> wrote:
Hello,
This is Chenglong from Google Container Optimized OS. I'm reporting a severe CPU hang regression that occurs after a high volume of file creation and subsequent cgroup cleanup.
Through bisection, the issue appears to be caused by a chain reaction between three commits related to writeback, unbound workqueues, and CPU-hogging detection. The issue is greatly alleviated on the latest mainline kernel but is not fully resolved, still occurring intermittently (~1 in 10 runs).
How to reproduce
Kernel v6.1 is good. The hang is reliably triggered (over 80% of runs) on v6.6 and v6.12, and intermittently on mainline (v6.17-rc7), with the following steps:
Environment: A machine with a fast SSD and a high core count (e.g., Google Cloud's N2-standard-128).
Workload: Concurrently generate a large number of files (e.g., 2 million) using multiple services managed by systemd-run; this creates significant I/O and cgroup churn (a rough sketch of the workload follows this list).
Trigger: After the file generation completes, terminate the systemd-run services.
Result: Shortly after the services are killed, CPU load spikes, a massive number of kworker/+inode_switch_wbs threads appear, and the system enters a hang/livelock in which the machine is unresponsive for 20-300 seconds.
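For reference, here is a rough sketch of the workload; the unit names, file counts, and paths below are illustrative rather than the exact reproducer we run:

    # Spread ~2M small file creations across transient systemd units.
    for i in $(seq 1 16); do
        mkdir -p "/var/tmp/filegen-$i"
        systemd-run --unit="filegen-$i" bash -c \
            "for j in \$(seq 1 125000); do echo data > /var/tmp/filegen-$i/f-\$j; done; sleep infinity"
    done

    # Once file generation has finished, terminate the services; removing
    # their cgroups is what triggers the hang.
    for i in $(seq 1 16); do
        systemctl stop "filegen-$i"
    done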
Analysis and Problematic Commits
- The initial commit: The process begins with a worker that can get
stuck busy-waiting on a spinlock.
Commit: ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Effect: This introduced the inode_switch_wbs_work_fn worker to clean up cgroup writeback structures. Under our test load, this worker appears to hit a highly contended wb->list_lock spinlock, causing it to burn 100% CPU without sleeping.
- The Kworker Explosion: A subsequent change misinterprets the
spinning worker from Stage 1, leading to a runaway feedback loop of worker creation.
Commit: 616db8779b1e ("workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE")
Effect: This logic sees the spinning worker, marks it CPU_INTENSIVE, and excludes it from concurrency management. To handle the work backlog, it spawns a new kworker, which then also gets stuck on the same lock, repeating the cycle. This directly causes the kworker count to explode from under 50 to 100-2000+ (see the diagnostic sketch after this list).
- The System-Wide Lockdown: The final piece allows this localized
worker explosion to saturate the entire system.
Commit: 8639ecebc9b1 ("workqueue: Implement non-strict affinity scope for unbound workqueues")
Effect: This change introduced non-strict affinity as the default. It allows the hundreds of kworkers created in Stage 2 to be spread by the scheduler across all available CPU cores, turning the problem into a system-wide hang.
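To confirm what is happening during a hang, we look at the kworkers with something like the following (a rough diagnostic sketch; dumping kernel stacks needs root). In the bad runs this counts hundreds of threads, and their stacks sit in inode_switch_wbs_work_fn():

    # Count kworkers currently running inode_switch_wbs work.
    ps -e -o pid=,comm= | grep -c inode_switch_wbs

    # Dump their kernel stacks to see where they are stuck.
    for pid in $(ps -e -o pid=,comm= | awk '/inode_switch_wbs/ {print $1}'); do
        echo "== kworker $pid =="
        cat "/proc/$pid/stack"
    done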
Current Status and Mitigation
Mainline Status: On the latest mainline kernel, the hang is far less frequent and the kworker counts are reduced back to normal (<50), suggesting other changes have partially mitigated the issue. However, the hang still occurs, and when it does, the kworker count still explodes (e.g., 300+), indicating the underlying feedback loop remains.
Workaround: A reliable mitigation is to revert to the old workqueue behavior by setting affinity_strict to 1. This contains the kworker proliferation to a single CPU pod, preventing the system-wide hang.
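One way to set this at runtime, for workqueues that are exported via sysfs, is sketched below; the exact workqueue to target depends on the setup, and not every workqueue exposes these attributes:

    # Workqueues created with WQ_SYSFS show up here.
    ls /sys/devices/virtual/workqueue/

    # Force strict affinity (the old, contained behavior) for one of them.
    echo 1 > /sys/devices/virtual/workqueue/<workqueue>/affinity_strict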
Questions
Given that the issue is not fully resolved, could you please provide some guidance?
- Is this a known issue, and are there patches in development that
might fully address the underlying spinlock contention or the kworker feedback loop?
- Is there a better long-term mitigation we can apply other than
forcing strict affinity?
Thank you for your time and help.
Best regards,
Chenglong