Recently _pgd_alloc() was switched from using __get_free_pages() to pagetable_alloc_noprof(), which might return a compound page in case the allocation order is larger than 0.
On x86 this will be the case if CONFIG_MITIGATION_PAGE_TABLE_ISOLATION is set, even if PTI has been disabled at runtime.
When running as a Xen PV guest (which always disables PTI), using a compound page for a PGD will result in VM_BUG_ON_PGFLAGS being triggered when the Xen code tries to pin the PGD.
Fix the Xen issue, and also avoid the unneeded 8k PGD allocation when PTI is disabled, by storing the PGD allocation order in a variable when CONFIG_MITIGATION_PAGE_TABLE_ISOLATION is set.
Reported-by: Petr Vaněk <arkamar@atlas.cz>
Fixes: a9b3c355c2e6 ("asm-generic: pgalloc: provide generic __pgd_{alloc,free}")
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/pgalloc.h | 7 ++++++-
 arch/x86/mm/pgtable.c          | 4 ++++
 arch/x86/mm/pti.c              | 3 +++
 3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index a33147520044..754f95bddf98 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -34,8 +34,13 @@ static inline void paravirt_release_p4d(unsigned long pfn) {}
  * Instead of one PGD, we acquire two PGDs.  Being order-1, it is
  * both 8k in size and 8k-aligned.  That lets us just flip bit 12
  * in a pointer to swap between the two 4k halves.
+ *
+ * As PTI can be runtime disabled (either via boot parameter or due to
+ * running as a Xen PV guest), store the actually needed allocation
+ * order in a global variable.
  */
-#define PGD_ALLOCATION_ORDER 1
+#define PGD_ALLOCATION_ORDER pgd_allocation_order
+extern unsigned int pgd_allocation_order;
 #else
 #define PGD_ALLOCATION_ORDER 0
 #endif
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index a05fcddfc811..f61b2d6be311 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -12,6 +12,10 @@ phys_addr_t physical_mask __ro_after_init = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
 EXPORT_SYMBOL(physical_mask);
 #endif
 
+#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
+unsigned int pgd_allocation_order = 0;
+#endif
+
 pgtable_t pte_alloc_one(struct mm_struct *mm)
 {
 	return __pte_alloc_one(mm, GFP_PGTABLE_USER);
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 5f0d579932c6..44b7120c63e3 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -38,6 +38,7 @@
 #include <asm/desc.h>
 #include <asm/sections.h>
 #include <asm/set_memory.h>
+#include <asm/pgalloc.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt)     "Kernel/User page tables isolation: " fmt
@@ -97,6 +98,8 @@ void __init pti_check_boottime_disable(void)
 	if (pti_mode == PTI_AUTO && !boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
 		return;
 
+	pgd_allocation_order = 1;
+
 	setup_force_cpu_cap(X86_FEATURE_PTI);
 }
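For readers unfamiliar with why the PTI case needs an order-1 PGD at all, the comment in pgalloc.h can be illustrated with a small userspace sketch. This is not kernel code: alloc_pages_model() and other_half() are hypothetical names invented here, and aligned_alloc() merely stands in for an order-N page allocation that is aligned to its own size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/*
 * Hypothetical userspace stand-in for an order-N page allocation:
 * returns (PAGE_SIZE << order) bytes, aligned to that same size.
 */
static void *alloc_pages_model(unsigned int order)
{
	size_t size = PAGE_SIZE << order;

	return aligned_alloc(size, size);
}

/*
 * For an 8k-aligned, 8k-sized block, flipping bit 12 of a pointer
 * moves it between the two 4k halves and back again.
 */
static void *other_half(void *p)
{
	return (void *)((uintptr_t)p ^ PAGE_SIZE);
}
```

With order 0 the block is a single 4k page, so there is no second half to flip to. That is the case the patch wants whenever PTI is off at runtime.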
On 4/17/25 07:48, Juergen Gross wrote:
> -#define PGD_ALLOCATION_ORDER 1
> +#define PGD_ALLOCATION_ORDER pgd_allocation_order
> +extern unsigned int pgd_allocation_order;
>  #else
>  #define PGD_ALLOCATION_ORDER 0
>  #endif
Instead of hiding a variable behind a macro-looking name and a bunch of #ifdefs, can we please fix this properly?
static inline unsigned int pgd_allocation_order(void)
{
	if (cpu_feature_enabled(X86_FEATURE_PTI))
		return 1;
	return 0;
}
and then s/PGD_ALLOCATION_ORDER/pgd_allocation_order()/.
Wouldn't that be a billion times better?
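A rough userspace model of the helper-based approach, to make the difference concrete. The pti_enabled_model flag is a hypothetical stand-in for cpu_feature_enabled(X86_FEATURE_PTI), and pgd_allocation_size() is an illustrative helper that does not appear in the thread. The order is derived from the feature state on every call instead of being stored in a separate global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Hypothetical stand-in for cpu_feature_enabled(X86_FEATURE_PTI). */
static bool pti_enabled_model;

/* Order 1 (two pages) only when PTI is actually enabled, else order 0. */
static inline unsigned int pgd_allocation_order(void)
{
	return pti_enabled_model ? 1 : 0;
}

/* Illustrative helper: bytes allocated for a PGD at the current order. */
static inline size_t pgd_allocation_size(void)
{
	return PAGE_SIZE << pgd_allocation_order();
}
```

Because the order follows the feature flag directly, there is no second piece of state that has to be kept in sync when PTI gets disabled at boot.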
On Thu, Apr 17, 2025 at 04:48:08PM +0200, Juergen Gross wrote:
> Recently _pgd_alloc() was switched from using __get_free_pages() to pagetable_alloc_noprof(), which might return a compound page in case the allocation order is larger than 0.
> On x86 this will be the case if CONFIG_MITIGATION_PAGE_TABLE_ISOLATION is set, even if PTI has been disabled at runtime.
> When running as a Xen PV guest (which always disables PTI), using a compound page for a PGD will result in VM_BUG_ON_PGFLAGS being triggered when the Xen code tries to pin the PGD.
> Fix the Xen issue, and also avoid the unneeded 8k PGD allocation when PTI is disabled, by storing the PGD allocation order in a variable when CONFIG_MITIGATION_PAGE_TABLE_ISOLATION is set.
>
> Reported-by: Petr Vaněk <arkamar@atlas.cz>
> Fixes: a9b3c355c2e6 ("asm-generic: pgalloc: provide generic __pgd_{alloc,free}")
> Cc: stable@vger.kernel.org
> Signed-off-by: Juergen Gross <jgross@suse.com>
I have runtime tested this patch, and it fixes the reported issue. The following trailers can be appended to the commit message (as per [1]):
Closes: https://lore.kernel.org/lkml/202541612720-Z_-deOZTOztMXHBh-arkamar@atlas.cz/
Tested-by: Petr Vaněk <arkamar@atlas.cz>

Cheers,
Petr
[1] https://docs.kernel.org/process/5.Posting.html#patch-formatting-and-changelo...