On Fri, Sep 26, 2025 at 3:43 AM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> Hello John, Matt,
>
> On 9/26/2025 5:35 AM, John Stultz wrote:
> > However, there are two spots where we might exit dequeue_entities() early when cfs_rq_throttled(rq), so maybe that's what's catching us here?
>
> That could very likely be it.
That tracks -- we're heavy users of cgroups and this particular issue only appeared on our Kubernetes nodes.
> Matt, if possible can you try the patch attached below to check if the bailout for throttled hierarchy is indeed the root cause. Thanks in advance.
I've been running our reproducer with this patch for the last few hours without any issues, so the fix looks good to me.
Tested-by: Matt Fleming <mfleming@cloudflare.com>