On Wed, Jan 8, 2025 at 11:14 AM Yosry Ahmed <yosryahmed@google.com> wrote:
On Tue, Jan 7, 2025 at 7:56 PM Nhat Pham <nphamcs@gmail.com> wrote:
I may have found a simpler "proper" fix than disabling migration, please see my suggestion in: https://lore.kernel.org/lkml/CAJD7tkYpNNsbTZZqFoRh-FkXDgxONZEUPKk1YQv7-TFMWW...
Discovered that thread just now - sorry, too many emails to catch up on :) Taking a look now.
Is this a frequently occurring problem in the wild? If so, we can disable migration to firefight, and then do the proper thing down the line.
I don't believe so. Actually, I think the deadlock introduced by the previous fix is more problematic than the UAF it fixes.
Andrew, could you please pick up patch 1 (the revert) while we figure out the alternative fix? It's important that it lands in v6.13 to avoid the possibility of deadlock. Figuring out an alternative fix is less important.
Agree. Let's revert the "fix" first. CPU offlining is a much rarer event than this deadlocking scenario discovered by syzbot.