On Wed, Sep 24, 2025 at 05:24:15PM -0700, Chenglong Tang wrote:
Kernel v6.1 is good. The hang is reliably triggered (over 80% chance) on kernels v6.6 and v6.12, and intermittently on mainline (6.17-rc7), with the following steps:
*Environment:* A machine with a fast SSD and a high core count (e.g., Google Cloud's N2-standard-128).
*Workload:* Concurrently generate a large number of files (e.g., 2 million) using multiple services managed by systemd-run. This creates significant I/O and cgroup churn.
*Trigger:* After the file generation completes, terminate the systemd-run services.
*Result:* Shortly after the services are killed, the system's CPU load spikes, leading to a massive number of kworker/+inode_switch_wbs threads and a system-wide hang/livelock where the machine becomes unresponsive (20s - 300s).
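For anyone trying to reproduce this, the steps above could be approximated with a small driver along the following lines. This is only a sketch under assumptions: the unit names (wb-repro-N), the number of services, the 4 KiB file size, the .done marker file and the script layout are all hypothetical and do not come from the report. It only relies on systemd-run --unit= to start a transient service and systemctl stop to terminate it.

```python
import os
import signal
import subprocess
import sys
import time


def write_files(directory, count, size=4096):
    """Create `count` files of `size` bytes in `directory`; return the count."""
    os.makedirs(directory, exist_ok=True)
    payload = b"x" * size
    for i in range(count):
        with open(os.path.join(directory, f"f{i:07d}"), "wb") as fh:
            fh.write(payload)
    return count


def start_cmd(unit, script, directory, count):
    """Build the systemd-run command that starts one writer as a transient service."""
    return ["systemd-run", f"--unit={unit}", sys.executable, script,
            "worker", directory, str(count)]


def stop_cmd(unit):
    """Build the systemctl command that terminates the transient service."""
    return ["systemctl", "stop", unit]


def worker(directory, count):
    """Runs inside the service: write the files, then wait to be killed."""
    write_files(directory, count)
    # Hypothetical marker so the driver can tell that generation has finished.
    open(os.path.join(directory, ".done"), "w").close()
    signal.pause()  # keep the service (and its cgroup) alive until it is stopped


def run(script, base, services, total_files):
    """Driver: start the services, wait for generation to finish, then stop them."""
    per_service = total_files // services
    units = [f"wb-repro-{i}" for i in range(services)]  # assumed unit names
    dirs = [os.path.join(base, u) for u in units]
    for unit, directory in zip(units, dirs):
        subprocess.run(start_cmd(unit, script, directory, per_service), check=True)
    while not all(os.path.exists(os.path.join(d, ".done")) for d in dirs):
        time.sleep(1)
    # Trigger step from the report: terminate the services after generation.
    for unit in units:
        subprocess.run(stop_cmd(unit), check=True)


# Only act when an explicit subcommand is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("worker", "run"):
    if sys.argv[1] == "worker":
        worker(sys.argv[2], int(sys.argv[3]))
    else:
        run(os.path.abspath(sys.argv[0]), sys.argv[2],
            int(sys.argv[3]), int(sys.argv[4]))
```

Saved under a name of your choosing and run as root, e.g. "python3 repro.py run /mnt/test 16 2000000", this would start 16 writer services, wait until roughly 2 million files exist, and then stop the services. The target directory and the service count are placeholders to adjust for the machine under test.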
Sounds like:
http://lkml.kernel.org/r/20250912103522.2935-1-jack@suse.cz
Can you see whether those patches resolve the problem?
Thanks.