From: "Tyler Hicks" <code(a)tyhicks.com>
The backport of commit 05c2224d4b04 ("KVM: selftests: Fix number of
pages for memory slot in memslot_modification_stress_test") broke the
build of the KVM selftest memslot_modification_stress_test.c source file
in two ways:
- Incorrectly assumed that max_t() was defined despite commit
5cf67a6051ea ("tools/include: Add _RET_IP_ and math definitions to
kernel.h") not being present
- Incorrectly assumed that kvm_vm struct members could be directly
accessed despite b530eba14c70 ("KVM: selftests: Get rid of
kvm_util_internal.h") not being present
Backport the first commit, as it is simple enough. Work around the lack
of the second commit by using the accessors to get to the kvm_vm struct
members.
Note that the linux-6.0.y backport of commit 05c2224d4b04 ("KVM:
selftests: Fix number of pages for memory slot in
memslot_modification_stress_test") is fine because the two prerequisite
commits, mentioned above, are both present in v6.0.
Tyler
Karolina Drobnik (1):
tools/include: Add _RET_IP_ and math definitions to kernel.h
Tyler Hicks (Microsoft) (1):
KVM: selftests: Fix build regression by using accessor function
tools/include/linux/kernel.h | 6 ++++++
.../selftests/kvm/memslot_modification_stress_test.c | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
--
2.34.1
The patch titled
Subject: mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-shmem-restore-shmem_huge_deny-precedence-over-madv_collapse.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Zach O'Keefe" <zokeefe(a)google.com>
Subject: mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
Date: Sat, 24 Dec 2022 00:20:35 -0800
SHMEM_HUGE_DENY is for emergency use by the admin, to disable allocation
of shmem huge pages if, for example, a dangerous bug is found in their
usage: see "deny" in Documentation/mm/transhuge.rst. An app using
madvise(,,MADV_COLLAPSE) should not be allowed to override it: restore its
precedence over shmem_huge_force.
Restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE.
Link: https://lkml.kernel.org/r/20221224082035.3197140-2-zokeefe@google.com
Fixes: 7c6c6cc4d3a2 ("mm/shmem: add flag to enforce shmem THP in hugepage_vma_check()")
Signed-off-by: Zach O'Keefe <zokeefe(a)google.com>
Suggested-by: Hugh Dickins <hughd(a)google.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/shmem.c~mm-shmem-restore-shmem_huge_deny-precedence-over-madv_collapse
+++ a/mm/shmem.c
@@ -478,12 +478,10 @@ bool shmem_is_huge(struct vm_area_struct
if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
return false;
- if (shmem_huge_force)
- return true;
- if (shmem_huge == SHMEM_HUGE_FORCE)
- return true;
if (shmem_huge == SHMEM_HUGE_DENY)
return false;
+ if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
+ return true;
switch (SHMEM_SB(inode->i_sb)->huge) {
case SHMEM_HUGE_ALWAYS:
_
Patches currently in -mm which might be from zokeefe(a)google.com are
mm-madv_collapse-dont-expand-collapse-when-vm_end-is-past-requested-end.patch
mm-shmem-restore-shmem_huge_deny-precedence-over-madv_collapse.patch
The patch titled
Subject: mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-madv_collapse-dont-expand-collapse-when-vm_end-is-past-requested-end.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Zach O'Keefe" <zokeefe(a)google.com>
Subject: mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
Date: Sat, 24 Dec 2022 00:20:34 -0800
MADV_COLLAPSE acts on one hugepage-aligned/sized region at a time, until
it has collapsed all eligible memory contained within the bounds supplied
by the user.
At the top of each hugepage iteration we (re)lock mmap_lock and
(re)validate the VMA for eligibility and update variables that might have
changed while mmap_lock was dropped. One thing that might occur is that
the VMA could be resized, and as such, we refetch vma->vm_end to make sure
we don't collapse past the end of the VMA's new end.
However, it's possible that when refetching vma->vm_end that we expand the
region acted on by MADV_COLLAPSE if vma->vm_end is greater than size+len
supplied by the user.
The consequence here is that we may attempt to collapse more memory than
requested, possibly yielding either "too much success" or "false failure"
user-visible results. An example of the former is if we MADV_COLLAPSE the
first 4MiB of a 2TiB mmap()'d file, the incorrect refetch would cause the
operation to block for much longer than anticipated as we attempt to
collapse the entire TiB region. An example of the latter is that applying
MADV_COLLPSE to a 4MiB file mapped to the start of a 6MiB VMA will
successfully collapse the first 4MiB, then incorrectly attempt to collapse
the last hugepage-aligned/sized region -- fail (since readahead/page cache
lookup will fail) -- and report a failure to the user.
Don't expand the acted-on region when refetching vma->vm_end.
Link: https://lkml.kernel.org/r/20221224082035.3197140-1-zokeefe@google.com
Fixes: 4d24de9425f7 ("mm: MADV_COLLAPSE: refetch vm_end after reacquiring mmap_lock")
Signed-off-by: Zach O'Keefe <zokeefe(a)google.com>
Reported-by: Hugh Dickins <hughd(a)google.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/khugepaged.c~mm-madv_collapse-dont-expand-collapse-when-vm_end-is-past-requested-end
+++ a/mm/khugepaged.c
@@ -2647,7 +2647,7 @@ int madvise_collapse(struct vm_area_stru
goto out_nolock;
}
- hend = vma->vm_end & HPAGE_PMD_MASK;
+ hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
}
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
_
Patches currently in -mm which might be from zokeefe(a)google.com are
mm-madv_collapse-dont-expand-collapse-when-vm_end-is-past-requested-end.patch
mm-shmem-restore-shmem_huge_deny-precedence-over-madv_collapse.patch
A recent development on the EFI front has resulted in guests having
their page tables baked in the firmware binary, and mapped into
the IPA space as part as a read-only memslot.
Not only this is legitimate, but it also results in added security,
so thumbs up. However, this clashes mildly with our handling of a S1PTW
as a write to correctly handle AF/DB updates to the S1 PTs, and results
in the guest taking an abort it won't recover from (the PTs mapping the
vectors will suffer freom the same problem...).
So clearly our handling is... wrong.
Instead, switch to a two-pronged approach:
- On S1PTW translation fault, handle the fault as a read
- On S1PTW permission fault, handle the fault as a write
This is of no consequence to SW that *writes* to its PTs (the write
will trigger a non-S1PTW fault), and SW that uses RO PTs will not
use AF/DB anyway, as that'd be wrong.
Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
fault on S1PTW permission fault on instruction fetch") do we end-up
with two back-to-back faults (page being evicted and faulted back).
I don't think this is a case worth optimising for.
Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
arch/arm64/include/asm/kvm_emulate.h | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 9bdba47f7e14..fd6ad8b21f85 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
{
- if (kvm_vcpu_abt_iss1tw(vcpu))
- return true;
+ if (kvm_vcpu_abt_iss1tw(vcpu)) {
+ /*
+ * Only a permission fault on a S1PTW should be
+ * considered as a write. Otherwise, page tables baked
+ * in a read-only memslot will result in an exception
+ * being delivered in the guest.
+ *
+ * The drawback is that we end-up fauling twice if the
+ * guest is using any of HW AF/DB: a translation fault
+ * to map the page containing the PT (read only at
+ * first), then a permission fault to allow the flags
+ * to be set.
+ */
+ switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
+ case ESR_ELx_FSC_PERM:
+ return true;
+ default:
+ return false;
+ }
+ }
if (kvm_vcpu_trap_is_iabt(vcpu))
return false;
--
2.34.1