The pmd_trans_huge() code in mfill_atomic() is wrong in three different ways depending on kernel version:

1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
   the right two race windows) - I've tested this in a kernel build with
   some extra mdelay() calls. See the commit message for a description of
   the race scenario.
   On older kernels (before 6.5), I think the same bug can even
   theoretically lead to accessing transhuge page contents as a page table
   if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
   detecting PMDs that don't point to page tables.
   On older kernels (before 6.5), you'd just have to win a single fairly
   wide race to hit this.
   I've tested this on 6.1 stable by racing migration (with a mdelay()
   patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86 VM,
   that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed to
   yank page tables out from under us (though I haven't tested that), so I
   think the BUG_ON() checks in mfill_atomic() are just wrong.
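
To illustrate why bug 2 matters, here is a minimal user-space sketch. It is not kernel code and is not taken from the patches: the enum, the function names and the set of entry kinds are all hypothetical, chosen only to model the idea that a PMD can hold something other than a page table or a transhuge mapping (for example a swap/migration entry or a devmap entry). It compares a check that only rejects transhuge PMDs with a check that accepts only page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of what a PMD entry can hold.
 * These are illustration-only names, not kernel types. */
enum pmd_kind {
    KIND_PAGE_TABLE,  /* points to a page table: the only safe case */
    KIND_TRANS_HUGE,  /* present transparent huge page mapping */
    KIND_DEVMAP,      /* huge devmap mapping */
    KIND_SWAP_ENTRY,  /* non-present entry, e.g. during migration */
    KIND_COUNT
};

typedef bool (*pmd_check_fn)(enum pmd_kind);

/* Models a check that only rejects transhuge PMDs. */
bool trans_huge_only_check(enum pmd_kind kind)
{
    return kind != KIND_TRANS_HUGE;
}

/* Models a stricter check: accept only PMDs that point to a page table. */
bool page_table_only_check(enum pmd_kind kind)
{
    return kind == KIND_PAGE_TABLE;
}

/* Count the entry kinds a check accepts even though they are not page tables. */
size_t count_unsafe_accepts(pmd_check_fn check)
{
    size_t unsafe = 0;

    for (int k = 0; k < KIND_COUNT; k++) {
        if (check((enum pmd_kind)k) && k != KIND_PAGE_TABLE)
            unsafe++;
    }
    return unsafe;
}

/* Self-check of the model: the weak check lets two non-page-table kinds through. */
void model_self_check(void)
{
    assert(count_unsafe_accepts(trans_huge_only_check) == 2);
    assert(count_unsafe_accepts(page_table_only_check) == 0);
}
```

In this model the transhuge-only check still lets the devmap and swap-entry kinds through to the page-table walk, which corresponds to the situation described in point 2. The real fix additionally has to deal with the entry changing concurrently, which this sketch does not model.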
I decided to write two separate fixes for these (one fix for bugs 1+2, one fix for bug 3), so that the first fix can be backported to kernels affected by bugs 1+2.

Signed-off-by: Jann Horn <jannh@google.com>
---
Changes in v2:
- in patch 1/2:
  - change title
  - get rid of redundant early pmd_trans_huge() check
  - also check for swap PMDs and devmap PMDs (Qi Zheng)
- Link to v1: https://lore.kernel.org/r/20240812-uffd-thp-flip-fix-v1-0-4fc1db7ccdd0@googl...

---
Jann Horn (2):
      userfaultfd: Fix checks for huge PMDs
      userfaultfd: Don't BUG_ON() if khugepaged yanks our page table

 mm/userfaultfd.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)
---
base-commit: d4560686726f7a357922f300fc81f5964be8df04
change-id: 20240812-uffd-thp-flip-fix-20f91f1151b9