From: David Woodhouse dwmw@amazon.co.uk
The set_p4d() and set_pgd() functions (in 4-level or 5-level page table setups respectively) assume that the root page table is actually a 8KiB allocation, with the userspace root immediately after the kernel root page table (so that the former can enforce NX on on all the subordinate page tables, which are actually shared).
However, users of the kernel_ident_mapping_init() code do not give it an 8KiB allocation for its PGD. Both swsusp_arch_resume() and acpi_mp_setup_reset() allocate only a single 4KiB page. The kexec code on x86_64 currently gets away with it purely by chance, because it allocates 8KiB for its "control code page" and then actually uses the first half for the PGD, then copies the actual trampoline code into the second half only after the identmap code has finished scribbling over it.
Fix this by defining a _PAGE_NOPTISHADOW bit (which can use the same bit as _PAGE_SAVED_DIRTY since one is only for the PGD/P4D root and the other is exclusively for leaf PTEs.). This instructs __pti_set_user_pgtbl() not to write to the userspace 'shadow' PGD.
Strictly, the _PAGE_NOPTISHADOW bit doesn't need to be written out to the actual page tables; since __pti_set_user_pgtbl() returns the value to be written to the kernel page table, it could be filtered out. But there seems to be no benefit to actually doing so.
Cc: stable@kernel.org Suggested-by: Dave Hansen dave.hansen@intel.com Signed-off-by: David Woodhouse dwmw@amazon.co.uk --- Split out from the kexec-debug series at https://lore.kernel.org/kexec/c16ed8f5c960c34f05b88b84a31f28a610f6a3cf.camel... because it looks like it's an actual bug fix for the hibernate and acpi_mp cases. It only bites kexec after I take away the extra 4KiB that we were letting it safely scribble on purely by chance.
Added Cc:stable for discussion but the simpler fix there might just be to allocate a full 8KiB for the PGD for the affected use cases?
arch/x86/include/asm/pgtable_types.h | 8 ++++++-- arch/x86/mm/ident_map.c | 6 +++--- arch/x86/mm/pti.c | 2 +- 3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 6f82e75b6149..4b804531b03c 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -36,10 +36,12 @@ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4
#ifdef CONFIG_X86_64 -#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit */ +#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit (leaf) */ +#define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW5 /* No PTI shadow (root PGD) */ #else /* Shared with _PAGE_BIT_UFFD_WP which is not supported on 32 bit */ -#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW2 /* Saved Dirty bit */ +#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW2 /* Saved Dirty bit (leaf) */ +#define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW2 /* No PTI shadow (root PGD) */ #endif
/* If _PAGE_BIT_PRESENT is clear, we use these: */ @@ -139,6 +141,8 @@
#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
+#define _PAGE_NOPTISHADOW (_AT(pteval_t, 1) << _PAGE_BIT_NOPTISHADOW) + /* * Set of bits not changed in pte_modify. The pte's * protection key is treated like _PAGE_RW, for diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c index 437e96fb4977..5ab7bd2f1983 100644 --- a/arch/x86/mm/ident_map.c +++ b/arch/x86/mm/ident_map.c @@ -174,7 +174,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page, if (result) return result;
- set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag)); + set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag | _PAGE_NOPTISHADOW)); }
return 0; @@ -218,14 +218,14 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page, if (result) return result; if (pgtable_l5_enabled()) { - set_pgd(pgd, __pgd(__pa(p4d) | info->kernpg_flag)); + set_pgd(pgd, __pgd(__pa(p4d) | info->kernpg_flag | _PAGE_NOPTISHADOW)); } else { /* * With p4d folded, pgd is equal to p4d. * The pgd entry has to point to the pud page table in this case. */ pud_t *pud = pud_offset(p4d, 0); - set_pgd(pgd, __pgd(__pa(pud) | info->kernpg_flag)); + set_pgd(pgd, __pgd(__pa(pud) | info->kernpg_flag | _PAGE_NOPTISHADOW)); } }
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 851ec8f1363a..5f0d579932c6 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -132,7 +132,7 @@ pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd) * Top-level entries added to init_mm's usermode pgd after boot * will not be automatically propagated to other mms. */ - if (!pgdp_maps_userspace(pgdp)) + if (!pgdp_maps_userspace(pgdp) || (pgd.pgd & _PAGE_NOPTISHADOW)) return pgd;
/*
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree. Subject: [PATCH] x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables Link: https://lore.kernel.org/stable/412c90a4df7aef077141d9f68d19cbe5602d6c6d.came...
linux-stable-mirror@lists.linaro.org