Re: [REGRESSION] workqueue/writeback: Severe CPU hang due to kworker proliferation during I/O flush and cgroup cleanup

25 Sep 2025

      On Wed, Sep 24, 2025 at 05:24:15PM -0700, Chenglong Tang wrote:
...
Kernel v6.1 is good. The hang is reliably triggered (over 80% chance) on
kernels v6.6 and v6.12, and intermittently on mainline (6.17-rc7), with the
following steps:

*Environment:* A machine with a fast SSD and a high core count (e.g.,
Google Cloud's N2-standard-128).

*Workload:* Concurrently generate a large number of files (e.g., 2 million)
using multiple services managed by systemd-run. This creates significant
I/O and cgroup churn.

*Trigger:* After the file generation completes, terminate the systemd-run
services.

*Result:* Shortly after the services are killed, the system's CPU load
spikes, leading to a massive number of kworker/+inode_switch_wbs threads
and a system-wide hang/livelock where the machine becomes unresponsive
(20s-300s).
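
For anyone trying to reproduce this, the steps above could be sketched
roughly as below. This is only an illustration, not the reporter's actual
script: the unit names (wb-repro-N), the base directory, the number of
services, the per-service file count (assuming the 2 million files are
split evenly), and the trailing "sleep infinity" that keeps each service
alive until it is stopped are all assumptions. The helper only builds the
systemd-run and systemctl command lines and does not execute anything.

```python
import shlex


def build_repro_commands(num_services=8, files_per_service=250_000,
                         base_dir="/mnt/ssd/repro"):
    """Build (start, stop) command lists for a hypothetical reproduction.

    All names and defaults are illustrative assumptions. Nothing is run
    here; the caller decides whether and how to execute the commands.
    """
    start, stop = [], []
    for i in range(num_services):
        unit = f"wb-repro-{i}"            # hypothetical unit name
        target = shlex.quote(f"{base_dir}/{unit}")
        # Create many small files, then stay alive so that the service
        # still exists when it is terminated in the "trigger" step.
        script = (
            f"mkdir -p {target} && "
            f"for n in $(seq 1 {files_per_service}); do "
            f"echo data > {target}/f$n; done; "
            "exec sleep infinity"
        )
        start.append(["systemd-run", f"--unit={unit}",
                      "/bin/sh", "-c", script])
        stop.append(["systemctl", "stop", f"{unit}.service"])
    return start, stop


if __name__ == "__main__":
    start_cmds, stop_cmds = build_repro_commands()
    # Workload: start all services concurrently.
    for cmd in start_cmds:
        print(shlex.join(cmd))
    # Trigger: once file generation has finished, stop the services.
    for cmd in stop_cmds:
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to
try on any machine; on a test box the lists can be passed to
subprocess.run() as root.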

Sounds like:
http://lkml.kernel.org/r/20250912103522.2935-1-jack@suse.cz
Can you see whether those patches resolve the problem?
Thanks.
-- 
tejun
