On Wed, 28 Jun 2023 00:54:05 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
Userspace is allowed to select any PAGE_SIZE-aligned hva to back guest memory. This is even the case with hugepages, although it is a rather suboptimal configuration, as PTE-level mappings are used at stage-2.
The arm64 page aging handlers assume that the specified range is exactly one page/block of memory, which in the aforementioned case is not necessarily true. Altogether, this leads to the WARN() in kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit at most a single leaf PTE. For hugepage-backed memory in a memslot that isn't hugepage-aligned, page aging entirely misses accesses to the hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new walker and drop the WARN that caught the issue in the first place. The implementation of this walker was inspired by the test_clear_young() implementation by Yu Zhao [*], but repurposed to address a bug in the existing aging implementation.
Cc: stable@vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
 arch/arm64/include/asm/kvm_pgtable.h | 26 ++++++------------
 arch/arm64/kvm/hyp/pgtable.c         | 41 ++++++++++++++++++++++------
 arch/arm64/kvm/mmu.c                 | 18 ++++++------
 3 files changed, 49 insertions(+), 36 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index dc3c072e862f..75f437e8cd15 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -556,22 +556,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
 kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
 
 /**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ *					   flag in a page-table entry.
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
  * @addr:	Intermediate physical address to identify the page-table entry.
+ * @size:	Size of the address range to visit.
+ * @mkold:	True if the access flag should be cleared.
  *
  * The offset of @addr within a page is ignored.
  *
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
  *
  * Note that it is the caller's responsibility to invalidate the TLB after
  * calling this function to ensure that the updated permissions are visible
  * to the CPUs.
  *
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
  */
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+					 u64 size, bool mkold);
 
 /**
  * kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -593,18 +597,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
 int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 				   enum kvm_pgtable_prot prot);
 
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- *				   access flag set.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr:	Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
 /**
  * kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
  *				      of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 5282cb9ca4cf..5d701e9adf5c 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1153,25 +1153,48 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
 	return pte;
 }
 
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+	bool	mkold;
+	bool	young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			     enum kvm_pgtable_walk_flags visit)
 {
-	kvm_pte_t pte = 0;
-	stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
-				 &pte, NULL, 0);
-	return pte;
+	kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+	struct stage2_age_data *data = ctx->arg;
+
+	if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+		return 0;
+
+	data->young = true;
+
+	if (data->mkold && !stage2_try_set_pte(ctx, new))
+		return -EAGAIN;
+
+	/*
+	 * "But where's the TLBI?!", you scream.
+	 * "Over in the core code", I sigh.
+	 *
+	 * See the '->clear_flush_young()' callback on the KVM mmu notifier.
+	 */
+	return 0;
 }
 
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+					 u64 size, bool mkold)
 {
-	kvm_pte_t pte = 0;
-	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
-	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+	struct stage2_age_data data = {
+		.mkold		= mkold,
+	};
+	struct kvm_pgtable_walker walker = {
+		.cb		= stage2_age_walker,
+		.arg		= &data,
+		.flags		= KVM_PGTABLE_WALK_LEAF,
+	};
+
+	WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
Do we really want a WARN_ON() here? From what I can tell, it can be (trivially?) triggered by the previous function returning -EAGAIN if the pte update fails in the case of a shared walk.
Otherwise, looks OK to me.
M.