From: Jia He <justin.he(a)arm.com>
When walking the page tables at a given level, and if the start
address for the range isn't aligned for that level, we propagate
the misalignment on each iteration at that level.
This results in the walker ignoring a number of entries (depending
on the original misalignment) on each subsequent iteration.
Properly aligning the address before the next iteration addresses
this issue.
Cc: stable(a)vger.kernel.org
Reported-by: Howard Zhang <Howard.Zhang(a)arm.com>
Acked-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Jia He <justin.he(a)arm.com>
Fixes: b1e57de62cfb ("KVM: arm64: Add stand-alone page-table walker infrastructure")
[maz: rewrite commit message]
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20210303024225.2591-1-justin.he@arm.com
---
arch/arm64/kvm/hyp/pgtable.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 4d177ce1d536..926fc07074f5 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -223,6 +223,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
goto out;
if (!table) {
+ data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
data->addr += kvm_granule_size(level);
goto out;
}
--
2.29.2
From: Will Deacon <will(a)kernel.org>
Commit 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest
context") tracks the currently running vCPU, clearing the pointer to
NULL on exit from a guest.
Unfortunately, the use of 'set_loaded_vcpu' clobbers x1 to point at the
kvm_hyp_ctxt instead of the vCPU context, causing the subsequent RAS
code to go off into the weeds when it saves the DISR assuming that the
CPU context is embedded in a struct vCPU.
Leave x1 alone and use x3 as a temporary register instead when clearing
the vCPU on the guest exit path.
Cc: Marc Zyngier <maz(a)kernel.org>
Cc: Andrew Scull <ascull(a)google.com>
Cc: <stable(a)vger.kernel.org>
Fixes: 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context")
Suggested-by: Quentin Perret <qperret(a)google.com>
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20210226181211.14542-1-will@kernel.org
---
arch/arm64/kvm/hyp/entry.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index b0afad7a99c6..0c66a1d408fd 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -146,7 +146,7 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
// Now restore the hyp regs
restore_callee_saved_regs x2
- set_loaded_vcpu xzr, x1, x2
+ set_loaded_vcpu xzr, x2, x3
alternative_if ARM64_HAS_RAS_EXTN
// If we have the RAS extensions we can consume a pending error
--
2.29.2
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
The nVHE KVM hyp drains and disables the SPE buffer, before
entering the guest, as the EL1&0 translation regime
is going to be loaded with that of the guest.
But this operation is performed way too late, because :
- The owning translation regime of the SPE buffer
is transferred to EL2. (MDCR_EL2_E2PB == 0)
- The guest Stage1 is loaded.
Thus the flush could use the host EL1 virtual address,
but use the EL2 translations instead of host EL1, for writing
out any cached data.
Fix this by moving the SPE buffer handling early enough.
The restore path is doing the right thing.
Fixes: 014c4c77aad7 ("KVM: arm64: Improve debug register save/restore flow")
Cc: stable(a)vger.kernel.org
Cc: Christoffer Dall <christoffer.dall(a)arm.com>
Cc: Marc Zyngier <maz(a)kernel.org>
Cc: Will Deacon <will(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Alexandru Elisei <alexandru.elisei(a)arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei(a)arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20210302120345.3102874-1-suzuki.poulose@arm.com
---
arch/arm64/include/asm/kvm_hyp.h | 5 +++++
arch/arm64/kvm/hyp/nvhe/debug-sr.c | 12 ++++++++++--
arch/arm64/kvm/hyp/nvhe/switch.c | 11 ++++++++++-
3 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index c0450828378b..385bd7dd3d39 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -83,6 +83,11 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt);
void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
void __debug_switch_to_host(struct kvm_vcpu *vcpu);
+#ifdef __KVM_NVHE_HYPERVISOR__
+void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu);
+void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
+#endif
+
void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 91a711aa8382..f401724f12ef 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -58,16 +58,24 @@ static void __debug_restore_spe(u64 pmscr_el1)
write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
}
-void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
+void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
/* Disable and flush SPE data generation */
__debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
+}
+
+void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
+{
__debug_switch_to_guest_common(vcpu);
}
-void __debug_switch_to_host(struct kvm_vcpu *vcpu)
+void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
{
__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
+}
+
+void __debug_switch_to_host(struct kvm_vcpu *vcpu)
+{
__debug_switch_to_host_common(vcpu);
}
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index f3d0e9eca56c..59aa1045fdaf 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -192,6 +192,14 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
__sysreg_save_state_nvhe(host_ctxt);
+ /*
+ * We must flush and disable the SPE buffer for nVHE, as
+ * the translation regime(EL1&0) is going to be loaded with
+ * that of the guest. And we must do this before we change the
+ * translation regime to EL2 (via MDCR_EL2_E2PB == 0) and
+ * before we load guest Stage1.
+ */
+ __debug_save_host_buffers_nvhe(vcpu);
__adjust_pc(vcpu);
@@ -234,11 +242,12 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
__fpsimd_save_fpexc32(vcpu);
+ __debug_switch_to_host(vcpu);
/*
* This must come after restoring the host sysregs, since a non-VHE
* system may enable SPE here and make use of the TTBRs.
*/
- __debug_switch_to_host(vcpu);
+ __debug_restore_host_buffers_nvhe(vcpu);
if (pmu_switch_needed)
__pmu_switch_to_host(host_ctxt);
--
2.29.2
Commit 384d87ef2c95 ("block: Do not discard buffers under a mounted
filesystem") made paths issuing discard or zeroout requests to the
underlying device try to grab block device in exclusive mode. If that
failed we returned EBUSY to userspace. This however caused unexpected
fallout in userspace where e.g. FUSE filesystems issue discard requests
from userspace daemons although the device is open exclusively by the
kernel. Also shrinking of logical volume by LVM issues discard requests
to a device which may be claimed exclusively because there's another LV
on the same PV. So to avoid these userspace regressions, fall back to
invalidate_inode_pages2_range() instead of returning EBUSY to userspace
and return EBUSY only of that call fails as well (meaning that there's
indeed someone using the particular device range we are trying to
discard).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=211167
Fixes: 384d87ef2c95 ("block: Do not discard buffers under a mounted filesystem")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/block_dev.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 235b5042672e..c33151020bcd 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -118,13 +118,22 @@ int truncate_bdev_range(struct block_device *bdev, fmode_t mode,
if (!(mode & FMODE_EXCL)) {
int err = bd_prepare_to_claim(bdev, truncate_bdev_range);
if (err)
- return err;
+ goto invalidate;
}
truncate_inode_pages_range(bdev->bd_inode->i_mapping, lstart, lend);
if (!(mode & FMODE_EXCL))
bd_abort_claiming(bdev, truncate_bdev_range);
return 0;
+
+invalidate:
+ /*
+ * Someone else has handle exclusively open. Try invalidating instead.
+ * The 'end' argument is inclusive so the rounding is safe.
+ */
+ return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
+ lstart >> PAGE_SHIFT,
+ lend >> PAGE_SHIFT);
}
EXPORT_SYMBOL(truncate_bdev_range);
--
2.26.2