Dear stable kernel maintainers,
Please consider merging the following patch series'. They enable
clang's integrated assembler to be used to assemble ARCH=arm kernels
back to linux-4.19.y. This is analogous to previous series' sent for
LLVM_IAS=1 support, but focused on ARM (32b).
Below is the list of commits in each series, in the form
<first tag of mainline containing sha> <sha12> <commit oneline>
For 5.10:
v5.11-rc1 3c9f5708b7ae ("ARM: 9029/1: Make iwmmxt.S support Clang's
integrated assembler")
v5.11-rc1 0b1674638a5c ("ARM: assembler: introduce adr_l, ldr_l and
str_l macros")
v5.11-rc1 67e3f828bd4b ("ARM: efistub: replace adrl pseudo-op with
adr_l macro invocation")
For 5.4:
v5.5-rc1 b4d0c0aad57a ("crypto: arm - use Kconfig based compiler
checks for crypto opcodes")
v5.5-rc1 9f1984c6ae30 ("ARM: 8929/1: use APSR_nzcv instead of r15 as
mrc operand")
v5.5-rc1 790756c7e022 ("ARM: 8933/1: replace Sun/Solaris style flag on
section directive")
v5.6-rc1 42d519e3d0c0 ("kbuild: Add support for 'as-instr' to be used
in Kconfig files")
v5.7-rc1 7548bf8c17d8 ("crypto: arm/ghash-ce - define fpu before fpu
registers are referenced")
v5.8-rc1 d85d5247885e ("ARM: OMAP2+: drop unnecessary adrl")
v5.8-rc1 a780e485b576 ("ARM: 8971/1: replace the sole use of a symbol
with its definition")
v5.8-rc1 b744b43f79cc ("kbuild: add CONFIG_LD_IS_LLD")
v5.9-rc1 a6c30873ee4a ("ARM: 8989/1: use .fpu assembler directives
instead of assembler arguments")
v5.9-rc1 ee440336e5ef ("ARM: 8990/1: use VFP assembler mnemonics in
register load/store macros")
v5.9-rc1 2cbd1cc3dcd3 ("ARM: 8991/1: use VFP assembler mnemonics if available")
v5.10-rc1 54781938ec34 ("crypto: arm/sha256-neon - avoid ADRL pseudo
instruction")
v5.10-rc1 0f5e8323777b ("crypto: arm/sha512-neon - avoid ADRL pseudo
instruction")
v5.11-rc1 28187dc8ebd9 ("ARM: 9025/1: Kconfig: CPU_BIG_ENDIAN depends
on !LD_IS_LLD")
Then 3c9f5708b7ae from the 5.10 series is applied (0b1674638a5c and
67e3f828bd4b were not necessary). 28187dc8ebd9 had previously been
picked up into 5.10 automatically. There was a minor conflict in
2cbd1cc3dcd3 due to 5.10 missing 8a90a3228b6a ("arm: Unplug KVM from
the build system").
b744b43f79cc and 28187dc8ebd9 are more specifically for allmodconfig
support than strictly LLVM_IAS=1.
For 4.19:
v4.20-rc1 d3c61619568c ("ARM: 8788/1: ftrace: remove old mcount support")
v4.20-rc1 f9b58e8c7d03 ("ARM: 8800/1: use choice for kernel unwinders")
v5.1-rc1 baf2df8e15be ("ARM: 8827/1: fix argument count to match macro
definition")
v5.1-rc1 32fdb046ac43 ("ARM: 8828/1: uaccess: use unified assembler
language syntax")
v5.1-rc1 eb7ff9023e4f ("ARM: 8829/1: spinlock: use unified assembler
language syntax")
v5.1-rc1 a216376add73 ("ARM: 8841/1: use unified assembler in macros")
v5.1-rc1 e44fc38818ed ("ARM: 8844/1: use unified assembler in assembly files")
v5.2-rc1 fe09d9c641f2 ("ARM: 8852/1: uaccess: use unified assembler
language syntax")
v5.2-rc1 3ab2b5fdd1d8 ("ARM: mvebu: drop unnecessary label")
v5.2-rc1 969ad77c14ab ("ARM: mvebu: prefix coprocessor operand with p")
v5.3-rc1 3fe1ee40b2a2 ("ARM: use arch_extension directive instead of
arch argument")
v5.4-rc3 3aa6d4abd4eb ("crypto: arm/aes-ce - build for v8 architecture
explicitly")
Then the entire 5.4 series is applied on top. 3fe1ee40b2a2 had a minor
conflict due to 4.19 missing 2997520c2d4e ("ARM: exynos: Set MCPM as
mandatory for Exynos542x/5800 SoCs").
I plan to send some follow ups; I need to do another pass to find what
we may need in addition when setting CONFIG_THUMB2_KERNEL=y
(non-default), there are two patches working their way through the ARM
maintainer's tree needed for allmodconfig
(https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=9061/1
and https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=9062/1)
and v4.19.y has one more issue I need to look into
(https://github.com/ClangBuiltLinux/linux/issues/1329) that has been
cleaned up by a 7 patch series that landed in v5.2-rc1, but on first
glance I suspect might be an assembler bug for us to fix.
These series will be used in Android and ChromeOS. We're also ready to
wire up CI coverage for LLVM_IAS=1 ARCH=arm for these branches.
--
Thanks,
~Nick Desaulniers
Backport 2 patches that are required to make KASAN+LKDTM work
with recent clang (patch 2/2 has a complete description).
Tested on our chromeos-4.19 branch.
Also compile tested on x86-64 and arm64 with gcc this time
around.
Patch 1/2 adds a guard around noinstr that matches upstream,
to prevent a build issue, and has some minor context conflicts.
Patch 2/2 is a clean backport.
These patches have been merged to 5.4 stable already. We might
need to backport to older stable branches, but this is what I
could test for now.
Changes in v2:
- Guard noinstr macro by __KERNEL__ && !__ASSEMBLY__ to prevent
expansion in linker script and match upstream.
Mark Rutland (1):
lkdtm: don't move ctors to .rodata
Thomas Gleixner (1):
vmlinux.lds.h: Create section for protection against instrumentation
arch/powerpc/kernel/vmlinux.lds.S | 1 +
drivers/misc/lkdtm/Makefile | 2 +-
drivers/misc/lkdtm/rodata.c | 2 +-
include/asm-generic/sections.h | 3 ++
include/asm-generic/vmlinux.lds.h | 10 ++++++
include/linux/compiler.h | 54 +++++++++++++++++++++++++++++++
include/linux/compiler_types.h | 6 ++++
scripts/mod/modpost.c | 2 +-
8 files changed, 77 insertions(+), 3 deletions(-)
--
2.31.0.rc2.261.g7f71774620-goog
From: Jacob Keller <jacob.e.keller(a)intel.com>
commit 6704a3abf4cf4181a1ee64f5db4969347b88ca1d upstream.
On hardware which supports timestamping all packets, the timestamps are
recorded in the packet buffer, and the driver no longer uses or reads
the registers. This makes the logic for checking and clearing Rx
timestamp hangs meaningless.
If we run the ixgbe_ptp_rx_hang() function in this case, then the driver
will continuously spam the log output with "Clearing Rx timestamp hang".
These messages are spurious, and confusing to end users.
The original code in commit a9763f3cb54c ("ixgbe: Update PTP to support
X550EM_x devices", 2015-12-03) did have a flag PTP_RX_TIMESTAMP_IN_REGISTER
which was intended to be used to avoid the Rx timestamp hang check,
however it did not actually check the flag before calling the function.
Do so now in order to stop the checks and prevent the spurious log
messages.
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices", 2015-12-03)
Signed-off-by: Jacob Keller <jacob.e.keller(a)intel.com>
Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
Cc: <stable(a)vger.kernel.org> # 4.9.x: 622a2ef538fb: ixgbe: check for Tx timestamp timeouts during watchdog
Cc: <stable(a)vger.kernel.org> # 4.9.x
Signed-off-by: Wen Yang <wenyang(a)linux.alibaba.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 66b1cc02..36d73bf 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7257,7 +7257,8 @@ static void ixgbe_service_task(struct work_struct *work)
if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state)) {
ixgbe_ptp_overflow_check(adapter);
- ixgbe_ptp_rx_hang(adapter);
+ if (adapter->flags & IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER)
+ ixgbe_ptp_rx_hang(adapter);
ixgbe_ptp_tx_hang(adapter);
}
--
1.8.3.1
I noticed that bpf speculative execution fixes are already queued for
4.14.y except for f232326f6966 ("bpf: Prohibit alu ops for pointer
types not defining ptr_limit").
It is important that for all patches from this series to be applied
together, so we avoid introducing a new vulnerability.
For the missing patch, I see conflicting lines in the context diffs
due to API change that apparently caused import to fail.
I'm attaching a copy of the patch that is backported to 4.14.y. The
only change comparing with version queued for newer version is that
"verbose" API does not take "env" parameter.
Please queue or let me know how to proceed.
Thanks,
Piotr
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b318e8decf6b9ef1bcf4ca06fae6d6a2cb5d5c5c Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 16 Mar 2021 11:44:33 -0700
Subject: [PATCH] KVM: x86: Protect userspace MSR filter with SRCU, and set
atomically-ish
Fix a plethora of issues with MSR filtering by installing the resulting
filter as an atomic bundle instead of updating the live filter one range
at a time. The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
the hardware MSR bitmaps won't be updated until the next VM-Enter, but
the relevant software struct is atomically updated, which is what KVM
really needs.
Similar to the approach used for modifying memslots, make arch.msr_filter
a SRCU-protected pointer, do all the work configuring the new filter
outside of kvm->lock, and then acquire kvm->lock only when the new filter
has been vetted and created. That way vCPU readers either see the old
filter or the new filter in their entirety, not some half-baked state.
Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a
TOCTOU bug, but that's just the tip of the iceberg...
- Nothing is __rcu annotated, making it nigh impossible to audit the
code for correctness.
- kvm_add_msr_filter() has an unpaired smp_wmb(). Violation of kernel
coding style aside, the lack of a smb_rmb() anywhere casts all code
into doubt.
- kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
count before taking the lock.
- kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug.
The entire approach of updating the live filter is also flawed. While
installing a new filter is inherently racy if vCPUs are running, fixing
the above issues also makes it trivial to ensure certain behavior is
deterministic, e.g. KVM can provide deterministic behavior for MSRs with
identical settings in the old and new filters. An atomic update of the
filter also prevents KVM from getting into a half-baked state, e.g. if
installing a filter fails, the existing approach would leave the filter
in a half-baked state, having already committed whatever bits of the
filter were already processed.
[*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
Cc: stable(a)vger.kernel.org
Cc: Alexander Graf <graf(a)amazon.com>
Reported-by: Yuan Yao <yaoyuan0329os(a)gmail.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210316184436.2544875-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 38e327d4b479..2898d3e86b08 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4806,8 +4806,10 @@ If an MSR access is not permitted through the filtering, it generates a
allows user space to deflect and potentially handle various MSR accesses
into user space.
-If a vCPU is in running state while this ioctl is invoked, the vCPU may
-experience inconsistent filtering behavior on MSR accesses.
+Note, invoking this ioctl with a vCPU is running is inherently racy. However,
+KVM does guarantee that vCPUs will see either the previous filter or the new
+filter, e.g. MSRs with identical settings in both the old and new filter will
+have deterministic behavior.
4.127 KVM_XEN_HVM_SET_ATTR
--------------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e1b6e2edc828..3768819693e5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -948,6 +948,12 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
};
+struct kvm_x86_msr_filter {
+ u8 count;
+ bool default_allow:1;
+ struct msr_bitmap_range ranges[16];
+};
+
#define APICV_INHIBIT_REASON_DISABLE 0
#define APICV_INHIBIT_REASON_HYPERV 1
#define APICV_INHIBIT_REASON_NESTED 2
@@ -1042,16 +1048,11 @@ struct kvm_arch {
bool guest_can_read_msr_platform_info;
bool exception_payload_enabled;
+ bool bus_lock_detection_enabled;
+
/* Deflect RDMSR and WRMSR to user space when they trigger a #GP */
u32 user_space_msr_mask;
-
- struct {
- u8 count;
- bool default_allow:1;
- struct msr_bitmap_range ranges[16];
- } msr_filter;
-
- bool bus_lock_detection_enabled;
+ struct kvm_x86_msr_filter __rcu *msr_filter;
struct kvm_pmu_event_filter __rcu *pmu_event_filter;
struct task_struct *nx_lpage_recovery_thread;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a5c5b38735e1..a04e78b89637 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1526,35 +1526,44 @@ EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);
bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
{
+ struct kvm_x86_msr_filter *msr_filter;
+ struct msr_bitmap_range *ranges;
struct kvm *kvm = vcpu->kvm;
- struct msr_bitmap_range *ranges = kvm->arch.msr_filter.ranges;
- u32 count = kvm->arch.msr_filter.count;
- u32 i;
- bool r = kvm->arch.msr_filter.default_allow;
+ bool allowed;
int idx;
+ u32 i;
- /* MSR filtering not set up or x2APIC enabled, allow everything */
- if (!count || (index >= 0x800 && index <= 0x8ff))
+ /* x2APIC MSRs do not support filtering. */
+ if (index >= 0x800 && index <= 0x8ff)
return true;
- /* Prevent collision with set_msr_filter */
idx = srcu_read_lock(&kvm->srcu);
- for (i = 0; i < count; i++) {
+ msr_filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);
+ if (!msr_filter) {
+ allowed = true;
+ goto out;
+ }
+
+ allowed = msr_filter->default_allow;
+ ranges = msr_filter->ranges;
+
+ for (i = 0; i < msr_filter->count; i++) {
u32 start = ranges[i].base;
u32 end = start + ranges[i].nmsrs;
u32 flags = ranges[i].flags;
unsigned long *bitmap = ranges[i].bitmap;
if ((index >= start) && (index < end) && (flags & type)) {
- r = !!test_bit(index - start, bitmap);
+ allowed = !!test_bit(index - start, bitmap);
break;
}
}
+out:
srcu_read_unlock(&kvm->srcu, idx);
- return r;
+ return allowed;
}
EXPORT_SYMBOL_GPL(kvm_msr_allowed);
@@ -5354,25 +5363,34 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
return r;
}
-static void kvm_clear_msr_filter(struct kvm *kvm)
+static struct kvm_x86_msr_filter *kvm_alloc_msr_filter(bool default_allow)
+{
+ struct kvm_x86_msr_filter *msr_filter;
+
+ msr_filter = kzalloc(sizeof(*msr_filter), GFP_KERNEL_ACCOUNT);
+ if (!msr_filter)
+ return NULL;
+
+ msr_filter->default_allow = default_allow;
+ return msr_filter;
+}
+
+static void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter)
{
u32 i;
- u32 count = kvm->arch.msr_filter.count;
- struct msr_bitmap_range ranges[16];
- mutex_lock(&kvm->lock);
- kvm->arch.msr_filter.count = 0;
- memcpy(ranges, kvm->arch.msr_filter.ranges, count * sizeof(ranges[0]));
- mutex_unlock(&kvm->lock);
- synchronize_srcu(&kvm->srcu);
+ if (!msr_filter)
+ return;
+
+ for (i = 0; i < msr_filter->count; i++)
+ kfree(msr_filter->ranges[i].bitmap);
- for (i = 0; i < count; i++)
- kfree(ranges[i].bitmap);
+ kfree(msr_filter);
}
-static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user_range)
+static int kvm_add_msr_filter(struct kvm_x86_msr_filter *msr_filter,
+ struct kvm_msr_filter_range *user_range)
{
- struct msr_bitmap_range *ranges = kvm->arch.msr_filter.ranges;
struct msr_bitmap_range range;
unsigned long *bitmap = NULL;
size_t bitmap_size;
@@ -5406,11 +5424,9 @@ static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user
goto err;
}
- /* Everything ok, add this range identifier to our global pool */
- ranges[kvm->arch.msr_filter.count] = range;
- /* Make sure we filled the array before we tell anyone to walk it */
- smp_wmb();
- kvm->arch.msr_filter.count++;
+ /* Everything ok, add this range identifier. */
+ msr_filter->ranges[msr_filter->count] = range;
+ msr_filter->count++;
return 0;
err:
@@ -5421,10 +5437,11 @@ static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user
static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp)
{
struct kvm_msr_filter __user *user_msr_filter = argp;
+ struct kvm_x86_msr_filter *new_filter, *old_filter;
struct kvm_msr_filter filter;
bool default_allow;
- int r = 0;
bool empty = true;
+ int r = 0;
u32 i;
if (copy_from_user(&filter, user_msr_filter, sizeof(filter)))
@@ -5437,25 +5454,32 @@ static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp)
if (empty && !default_allow)
return -EINVAL;
- kvm_clear_msr_filter(kvm);
-
- kvm->arch.msr_filter.default_allow = default_allow;
+ new_filter = kvm_alloc_msr_filter(default_allow);
+ if (!new_filter)
+ return -ENOMEM;
- /*
- * Protect from concurrent calls to this function that could trigger
- * a TOCTOU violation on kvm->arch.msr_filter.count.
- */
- mutex_lock(&kvm->lock);
for (i = 0; i < ARRAY_SIZE(filter.ranges); i++) {
- r = kvm_add_msr_filter(kvm, &filter.ranges[i]);
- if (r)
- break;
+ r = kvm_add_msr_filter(new_filter, &filter.ranges[i]);
+ if (r) {
+ kvm_free_msr_filter(new_filter);
+ return r;
+ }
}
+ mutex_lock(&kvm->lock);
+
+ /* The per-VM filter is protected by kvm->lock... */
+ old_filter = srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1);
+
+ rcu_assign_pointer(kvm->arch.msr_filter, new_filter);
+ synchronize_srcu(&kvm->srcu);
+
+ kvm_free_msr_filter(old_filter);
+
kvm_make_all_cpus_request(kvm, KVM_REQ_MSR_FILTER_CHANGED);
mutex_unlock(&kvm->lock);
- return r;
+ return 0;
}
long kvm_arch_vm_ioctl(struct file *filp,
@@ -10636,8 +10660,6 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm)
void kvm_arch_destroy_vm(struct kvm *kvm)
{
- u32 i;
-
if (current->mm == kvm->mm) {
/*
* Free memory regions allocated on behalf of userspace,
@@ -10653,8 +10675,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
mutex_unlock(&kvm->slots_lock);
}
static_call_cond(kvm_x86_vm_destroy)(kvm);
- for (i = 0; i < kvm->arch.msr_filter.count; i++)
- kfree(kvm->arch.msr_filter.ranges[i].bitmap);
+ kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
kvm_pic_destroy(kvm);
kvm_ioapic_destroy(kvm);
kvm_free_vcpus(kvm);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b318e8decf6b9ef1bcf4ca06fae6d6a2cb5d5c5c Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 16 Mar 2021 11:44:33 -0700
Subject: [PATCH] KVM: x86: Protect userspace MSR filter with SRCU, and set
atomically-ish
Fix a plethora of issues with MSR filtering by installing the resulting
filter as an atomic bundle instead of updating the live filter one range
at a time. The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
the hardware MSR bitmaps won't be updated until the next VM-Enter, but
the relevant software struct is atomically updated, which is what KVM
really needs.
Similar to the approach used for modifying memslots, make arch.msr_filter
a SRCU-protected pointer, do all the work configuring the new filter
outside of kvm->lock, and then acquire kvm->lock only when the new filter
has been vetted and created. That way vCPU readers either see the old
filter or the new filter in their entirety, not some half-baked state.
Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a
TOCTOU bug, but that's just the tip of the iceberg...
- Nothing is __rcu annotated, making it nigh impossible to audit the
code for correctness.
- kvm_add_msr_filter() has an unpaired smp_wmb(). Violation of kernel
coding style aside, the lack of a smb_rmb() anywhere casts all code
into doubt.
- kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
count before taking the lock.
- kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug.
The entire approach of updating the live filter is also flawed. While
installing a new filter is inherently racy if vCPUs are running, fixing
the above issues also makes it trivial to ensure certain behavior is
deterministic, e.g. KVM can provide deterministic behavior for MSRs with
identical settings in the old and new filters. An atomic update of the
filter also prevents KVM from getting into a half-baked state, e.g. if
installing a filter fails, the existing approach would leave the filter
in a half-baked state, having already committed whatever bits of the
filter were already processed.
[*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
Cc: stable(a)vger.kernel.org
Cc: Alexander Graf <graf(a)amazon.com>
Reported-by: Yuan Yao <yaoyuan0329os(a)gmail.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210316184436.2544875-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 38e327d4b479..2898d3e86b08 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4806,8 +4806,10 @@ If an MSR access is not permitted through the filtering, it generates a
allows user space to deflect and potentially handle various MSR accesses
into user space.
-If a vCPU is in running state while this ioctl is invoked, the vCPU may
-experience inconsistent filtering behavior on MSR accesses.
+Note, invoking this ioctl with a vCPU is running is inherently racy. However,
+KVM does guarantee that vCPUs will see either the previous filter or the new
+filter, e.g. MSRs with identical settings in both the old and new filter will
+have deterministic behavior.
4.127 KVM_XEN_HVM_SET_ATTR
--------------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e1b6e2edc828..3768819693e5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -948,6 +948,12 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
};
+struct kvm_x86_msr_filter {
+ u8 count;
+ bool default_allow:1;
+ struct msr_bitmap_range ranges[16];
+};
+
#define APICV_INHIBIT_REASON_DISABLE 0
#define APICV_INHIBIT_REASON_HYPERV 1
#define APICV_INHIBIT_REASON_NESTED 2
@@ -1042,16 +1048,11 @@ struct kvm_arch {
bool guest_can_read_msr_platform_info;
bool exception_payload_enabled;
+ bool bus_lock_detection_enabled;
+
/* Deflect RDMSR and WRMSR to user space when they trigger a #GP */
u32 user_space_msr_mask;
-
- struct {
- u8 count;
- bool default_allow:1;
- struct msr_bitmap_range ranges[16];
- } msr_filter;
-
- bool bus_lock_detection_enabled;
+ struct kvm_x86_msr_filter __rcu *msr_filter;
struct kvm_pmu_event_filter __rcu *pmu_event_filter;
struct task_struct *nx_lpage_recovery_thread;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a5c5b38735e1..a04e78b89637 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1526,35 +1526,44 @@ EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);
bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
{
+ struct kvm_x86_msr_filter *msr_filter;
+ struct msr_bitmap_range *ranges;
struct kvm *kvm = vcpu->kvm;
- struct msr_bitmap_range *ranges = kvm->arch.msr_filter.ranges;
- u32 count = kvm->arch.msr_filter.count;
- u32 i;
- bool r = kvm->arch.msr_filter.default_allow;
+ bool allowed;
int idx;
+ u32 i;
- /* MSR filtering not set up or x2APIC enabled, allow everything */
- if (!count || (index >= 0x800 && index <= 0x8ff))
+ /* x2APIC MSRs do not support filtering. */
+ if (index >= 0x800 && index <= 0x8ff)
return true;
- /* Prevent collision with set_msr_filter */
idx = srcu_read_lock(&kvm->srcu);
- for (i = 0; i < count; i++) {
+ msr_filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);
+ if (!msr_filter) {
+ allowed = true;
+ goto out;
+ }
+
+ allowed = msr_filter->default_allow;
+ ranges = msr_filter->ranges;
+
+ for (i = 0; i < msr_filter->count; i++) {
u32 start = ranges[i].base;
u32 end = start + ranges[i].nmsrs;
u32 flags = ranges[i].flags;
unsigned long *bitmap = ranges[i].bitmap;
if ((index >= start) && (index < end) && (flags & type)) {
- r = !!test_bit(index - start, bitmap);
+ allowed = !!test_bit(index - start, bitmap);
break;
}
}
+out:
srcu_read_unlock(&kvm->srcu, idx);
- return r;
+ return allowed;
}
EXPORT_SYMBOL_GPL(kvm_msr_allowed);
@@ -5354,25 +5363,34 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
return r;
}
-static void kvm_clear_msr_filter(struct kvm *kvm)
+static struct kvm_x86_msr_filter *kvm_alloc_msr_filter(bool default_allow)
+{
+ struct kvm_x86_msr_filter *msr_filter;
+
+ msr_filter = kzalloc(sizeof(*msr_filter), GFP_KERNEL_ACCOUNT);
+ if (!msr_filter)
+ return NULL;
+
+ msr_filter->default_allow = default_allow;
+ return msr_filter;
+}
+
+static void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter)
{
u32 i;
- u32 count = kvm->arch.msr_filter.count;
- struct msr_bitmap_range ranges[16];
- mutex_lock(&kvm->lock);
- kvm->arch.msr_filter.count = 0;
- memcpy(ranges, kvm->arch.msr_filter.ranges, count * sizeof(ranges[0]));
- mutex_unlock(&kvm->lock);
- synchronize_srcu(&kvm->srcu);
+ if (!msr_filter)
+ return;
+
+ for (i = 0; i < msr_filter->count; i++)
+ kfree(msr_filter->ranges[i].bitmap);
- for (i = 0; i < count; i++)
- kfree(ranges[i].bitmap);
+ kfree(msr_filter);
}
-static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user_range)
+static int kvm_add_msr_filter(struct kvm_x86_msr_filter *msr_filter,
+ struct kvm_msr_filter_range *user_range)
{
- struct msr_bitmap_range *ranges = kvm->arch.msr_filter.ranges;
struct msr_bitmap_range range;
unsigned long *bitmap = NULL;
size_t bitmap_size;
@@ -5406,11 +5424,9 @@ static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user
goto err;
}
- /* Everything ok, add this range identifier to our global pool */
- ranges[kvm->arch.msr_filter.count] = range;
- /* Make sure we filled the array before we tell anyone to walk it */
- smp_wmb();
- kvm->arch.msr_filter.count++;
+ /* Everything ok, add this range identifier. */
+ msr_filter->ranges[msr_filter->count] = range;
+ msr_filter->count++;
return 0;
err:
@@ -5421,10 +5437,11 @@ static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user
static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp)
{
struct kvm_msr_filter __user *user_msr_filter = argp;
+ struct kvm_x86_msr_filter *new_filter, *old_filter;
struct kvm_msr_filter filter;
bool default_allow;
- int r = 0;
bool empty = true;
+ int r = 0;
u32 i;
if (copy_from_user(&filter, user_msr_filter, sizeof(filter)))
@@ -5437,25 +5454,32 @@ static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp)
if (empty && !default_allow)
return -EINVAL;
- kvm_clear_msr_filter(kvm);
-
- kvm->arch.msr_filter.default_allow = default_allow;
+ new_filter = kvm_alloc_msr_filter(default_allow);
+ if (!new_filter)
+ return -ENOMEM;
- /*
- * Protect from concurrent calls to this function that could trigger
- * a TOCTOU violation on kvm->arch.msr_filter.count.
- */
- mutex_lock(&kvm->lock);
for (i = 0; i < ARRAY_SIZE(filter.ranges); i++) {
- r = kvm_add_msr_filter(kvm, &filter.ranges[i]);
- if (r)
- break;
+ r = kvm_add_msr_filter(new_filter, &filter.ranges[i]);
+ if (r) {
+ kvm_free_msr_filter(new_filter);
+ return r;
+ }
}
+ mutex_lock(&kvm->lock);
+
+ /* The per-VM filter is protected by kvm->lock... */
+ old_filter = srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1);
+
+ rcu_assign_pointer(kvm->arch.msr_filter, new_filter);
+ synchronize_srcu(&kvm->srcu);
+
+ kvm_free_msr_filter(old_filter);
+
kvm_make_all_cpus_request(kvm, KVM_REQ_MSR_FILTER_CHANGED);
mutex_unlock(&kvm->lock);
- return r;
+ return 0;
}
long kvm_arch_vm_ioctl(struct file *filp,
@@ -10636,8 +10660,6 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm)
void kvm_arch_destroy_vm(struct kvm *kvm)
{
- u32 i;
-
if (current->mm == kvm->mm) {
/*
* Free memory regions allocated on behalf of userspace,
@@ -10653,8 +10675,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
mutex_unlock(&kvm->slots_lock);
}
static_call_cond(kvm_x86_vm_destroy)(kvm);
- for (i = 0; i < kvm->arch.msr_filter.count; i++)
- kfree(kvm->arch.msr_filter.ranges[i].bitmap);
+ kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
kvm_pic_destroy(kvm);
kvm_ioapic_destroy(kvm);
kvm_free_vcpus(kvm);
I'm announcing the release of the 5.11.8 kernel.
All users of the 5.11 kernel series must upgrade.
The updated 5.11.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.11.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm64/include/asm/el2_setup.h | 4
arch/x86/crypto/aesni-intel_asm.S | 115 +++++++++-------
arch/x86/crypto/aesni-intel_glue.c | 25 +--
arch/x86/kvm/mmu/mmu_internal.h | 13 +
drivers/infiniband/ulp/srp/ib_srp.c | 110 ++++++---------
drivers/net/dsa/b53/b53_common.c | 18 ++
drivers/net/dsa/b53/b53_regs.h | 1
drivers/net/dsa/bcm_sf2.c | 15 --
drivers/regulator/pca9450-regulator.c | 30 ++++
fs/fuse/fuse_i.h | 1
fs/gfs2/ops_fstype.c | 33 ++--
fs/gfs2/recovery.c | 8 -
fs/gfs2/super.c | 45 ------
fs/gfs2/util.c | 58 +++++++-
fs/gfs2/util.h | 3
fs/io_uring.c | 84 ++++++-----
fs/locks.c | 3
fs/nfsd/nfs4state.c | 53 +------
include/linux/regulator/pca9450.h | 10 +
kernel/bpf/verifier.c | 33 ++--
net/mptcp/pm.c | 5
net/mptcp/pm_netlink.c | 23 ++-
net/mptcp/protocol.c | 20 ++
net/mptcp/protocol.h | 5
tools/testing/selftests/bpf/verifier/bounds_deduction.c | 27 ++-
tools/testing/selftests/bpf/verifier/map_ptr.c | 4
tools/testing/selftests/bpf/verifier/unpriv.c | 15 +-
tools/testing/selftests/bpf/verifier/value_ptr_arith.c | 23 +++
29 files changed, 461 insertions(+), 325 deletions(-)
Amir Goldstein (1):
fuse: fix live lock in fuse_iget()
Ard Biesheuvel (1):
crypto: x86/aes-ni-xts - use direct calls to and 4-way stride
Bob Peterson (3):
gfs2: Add common helper for holding and releasing the freeze glock
gfs2: move freeze glock outside the make_fs_rw and _ro functions
gfs2: bypass signal_our_withdraw if no journal
Florian Fainelli (1):
net: dsa: b53: Support setting learning on port
Florian Westphal (2):
mptcp: pm: add lockdep assertions
mptcp: dispose initial struct socket when its subflow is closed
Frieder Schrempf (3):
regulator: pca9450: Add SD_VSEL GPIO for LDO5
regulator: pca9450: Enable system reset on WDOG_B assertion
regulator: pca9450: Clear PRESET_EN bit to fix BUCK1/2/3 voltage setting
Geliang Tang (1):
mptcp: send ack for every add_addr
Greg Kroah-Hartman (1):
Linux 5.11.8
J. Bruce Fields (2):
Revert "nfsd4: remove check_conflicting_opens warning"
Revert "nfsd4: a client's own opens needn't prevent delegations"
Jens Axboe (3):
io_uring: don't attempt IO reissue from the ring exit path
io_uring: don't keep looping for more events if we can't flush overflow
io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
Nicolas Morey-Chaisemartin (1):
RDMA/srp: Fix support for unpopulated and unbalanced NUMA nodes
Pavel Begunkov (3):
io_uring: refactor scheduling in io_cqring_wait
io_uring: refactor io_cqring_wait
io_uring: simplify do_read return parsing
Piotr Krysiuk (5):
bpf: Prohibit alu ops for pointer types not defining ptr_limit
bpf: Fix off-by-one for area size in creating mask to left
bpf: Simplify alu_limit masking for pointer arithmetic
bpf: Add sanity check for upper ptr_limit
bpf, selftests: Fix up some test_verifier cases for unprivileged
Sean Christopherson (2):
KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect()
KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled
Vladimir Murzin (1):
arm64: Unconditionally set virtual cpu id registers