cc'ing GKE folks
On Fri, Sep 26, 2025 at 12:59 PM Tejun Heo <tj@kernel.org> wrote:
cc'ing Jan.
On Fri, Sep 26, 2025 at 12:54:29PM -0700, Chenglong Tang wrote:
Just did more testing here. Confirmed that the system hang is still there, but less frequent (6/40), with the patches http://lkml.kernel.org/r/20250912103522.2935-1-jack@suse.cz applied to v6.17-rc7. In the bad instances, the kworker count climbed to over 600 and caused a hang of more than 80 seconds.
So I think the patches didn't fully solve the issue.
I wonder how the number of workers still exploded to over 600. Are there that many cgroups being shut down? Does clamping down @max_active resolve the problem? There's no reason to have really high concurrency for this.
Thanks.
-- tejun