svm_update_lbrv() is called when MSR_IA32_DEBUGCTLMSR is updated, and on
nested transitions where LBRV is used. It checks whether LBRV enablement
needs to be changed in the current VMCB, and if it does, it also
recalculates the intercepts for the LBR MSRs.
However, there are cases where the intercepts need to be updated even when
LBRV enablement doesn't change. Example scenario:
- L1 has MSR_IA32_DEBUGCTLMSR cleared.
- L1 runs L2 without LBR_CTL_ENABLE (no LBRV).
- L2 sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, svm_update_lbrv()
sets LBR_CTL_ENABLE in VMCB02 and disables the intercepts for the LBR MSRs.
- L2 exits to L1, svm_update_lbrv() is not called on this transition.
- L1 clears MSR_IA32_DEBUGCTLMSR, svm_update_lbrv() finds that
LBR_CTL_ENABLE is already cleared in VMCB01 and does nothing.
- Intercepts remain disabled, so L1 reads of the LBR MSRs return the host's
MSR values.
Fix it by always recalculating intercepts in svm_update_lbrv().
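For clarity, the resulting flow in svm_update_lbrv() looks roughly like this
(a sketch assembled from the hunks below; the unchanged computation of
enable_lbrv/current_enable_lbrv is elided):

	if (enable_lbrv && !current_enable_lbrv)
		__svm_enable_lbrv(vcpu);
	else if (!enable_lbrv && current_enable_lbrv)
		__svm_disable_lbrv(vcpu);

	/* Intercepts may need updating even when LBR_CTL_ENABLE is unchanged. */
	svm_recalc_lbr_msr_intercepts(vcpu);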
Fixes: 1d5a1b5860ed ("KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
arch/x86/kvm/svm/svm.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d25c56b30b4e2..26ab75ecf1c67 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -806,25 +806,29 @@ void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
vmcb_mark_dirty(to_vmcb, VMCB_LBR);
}
-void svm_enable_lbrv(struct kvm_vcpu *vcpu)
+static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
- svm_recalc_lbr_msr_intercepts(vcpu);
/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
if (is_guest_mode(vcpu))
svm_copy_lbrs(svm->vmcb, svm->vmcb01.ptr);
}
-static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
+void svm_enable_lbrv(struct kvm_vcpu *vcpu)
+{
+ __svm_enable_lbrv(vcpu);
+ svm_recalc_lbr_msr_intercepts(vcpu);
+}
+
+static void __svm_disable_lbrv(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm);
svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
- svm_recalc_lbr_msr_intercepts(vcpu);
/*
* Move the LBR msrs back to the vmcb01 to avoid copying them
@@ -853,13 +857,18 @@ void svm_update_lbrv(struct kvm_vcpu *vcpu)
(is_guest_mode(vcpu) && guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));
- if (enable_lbrv == current_enable_lbrv)
- return;
+ if (enable_lbrv && !current_enable_lbrv)
+ __svm_enable_lbrv(vcpu);
+ else if (!enable_lbrv && current_enable_lbrv)
+ __svm_disable_lbrv(vcpu);
- if (enable_lbrv)
- svm_enable_lbrv(vcpu);
- else
- svm_disable_lbrv(vcpu);
+ /*
+ * During nested transitions, it is possible that the current VMCB has
+ * LBR_CTL set, but the previous LBR_CTL had it cleared (or vice versa).
+ * In this case, even though LBR_CTL does not need an update, intercepts
+ * do, so always recalculate the intercepts here.
+ */
+ svm_recalc_lbr_msr_intercepts(vcpu);
}
void disable_nmi_singlestep(struct vcpu_svm *svm)
--
2.51.2.1041.gc1ab5b90ca-goog
On Fri, 11 Oct 2024 22:38:53 +0800 Zheng Yejian <zhengyejian(a)huaweicloud.com> wrote:
>
> Currently when the length of a symbol is longer than 0x7f characters,
> its type shown in /proc/kallsyms can be incorrect.
>
> I found this issue when reading the code, but it can be reproduced by
> following steps:
>
> 1. Define a function whose symbol name is 130 characters long:
>
> #define X13(x) x##x##x##x##x##x##x##x##x##x##x##x##x
> static noinline void X13(x123456789)(void)
> {
> printk("hello world\n");
> }
>
> 2. The type in vmlinux is 't':
>
> $ nm vmlinux | grep x123456
> ffffffff816290f0 t x123456789x123456789x123456789x12[...]
>
> 3. Then boot the kernel, the type shown in /proc/kallsyms becomes 'g'
> instead of the expected 't':
>
> # cat /proc/kallsyms | grep x123456
> ffffffff816290f0 g x123456789x123456789x123456789x12[...]
>
> The root cause is that, after commit 73bbb94466fd ("kallsyms: support
> "big" kernel symbols"), ULEB128 was used to encode the symbol name length.
> That is, for "big" kernel symbols whose names are longer than 0x7f
> characters, the length is encoded in 2 bytes.
>
> kallsyms_get_symbol_type() expects to read the first char of the
> symbol name which indicates the symbol type. However, due to the
> "big" symbol case not being handled, the symbol type read from
> /proc/kallsyms may be wrong, so handle it properly.
>
> Cc: stable(a)vger.kernel.org
> Fixes: 73bbb94466fd ("kallsyms: support "big" kernel symbols")
> Signed-off-by: Zheng Yejian <zhengyejian(a)huaweicloud.com>
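For reference, the two-byte length encoding described above can be sketched
in userspace like this (an illustrative snippet, not the kernel's exact code;
it shows why the symbol-type byte sits one position further in for "big"
symbols):

#include <stdio.h>

/* Decode a kallsyms-style ULEB128 length limited to two bytes. */
static unsigned int decode_len(const unsigned char *data, unsigned int *skip)
{
	unsigned int len = data[0];

	*skip = 1;
	if (len & 0x80) {	/* "big" symbol: a second length byte follows */
		len = (len & 0x7f) | ((unsigned int)data[1] << 7);
		*skip = 2;
	}
	return len;
}

int main(void)
{
	unsigned char small[] = { 0x05 };	/* length 5 */
	unsigned char big[] = { 0x82, 0x01 };	/* length 130, as in the example above */
	unsigned int skip;

	printf("len=%u, name starts at offset %u\n", decode_len(small, &skip), skip);
	printf("len=%u, name starts at offset %u\n", decode_len(big, &skip), skip);
	return 0;
}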
Gary made me aware of this thread (thanks!) -- we are coming from:
https://lore.kernel.org/all/aQjua6zkEHYNVN3X@x1/
For which I sent this patch without knowing about this one:
https://lore.kernel.org/rust-for-linux/20251107050414.511648-1-ojeda@kernel…
This has been seen now by Arnaldo (Cc'ing) in a real system, so I think
we should take this one since it was first, with:
Cc: stable(a)vger.kernel.org
Thanks!
Cheers,
Miguel
css_task_iter_next() pins and returns a task, but the task can do whatever
between that and cgroup_procs_show() being called, including dying and
losing its PID. When that happens, task_pid_vnr() returns 0.
d245698d727a ("cgroup: Defer task cgroup unlink until after the task is
done switching out") makes this more likely as tasks now stay iterable with
css_task_iter_next() until the last schedule is complete, which can be
after the task has lost its PID.
Showing "0" in cgroup.procs or cgroup.threads is confusing and can lead to
surprising outcomes. For example, if a user tries to kill PID 0, it kills
all processes in the current process group.
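To make the hazard concrete, a minimal userspace sketch (hypothetical, not
taken from a report): a pid of 0 parsed out of cgroup.procs and passed
straight to kill() signals the caller's entire process group, per kill(2)
semantics.

#include <signal.h>
#include <sys/types.h>

int main(void)
{
	pid_t pid = 0;		/* bogus value read from cgroup.procs */

	/* pid == 0 means "every process in the caller's process group" */
	return kill(pid, SIGTERM);
}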
Skip entries with PID 0 by returning SEQ_SKIP.
Cc: stable(a)vger.kernel.org
Signed-off-by: Tejun Heo <tj(a)kernel.org>
---
kernel/cgroup/cgroup.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5287,6 +5287,17 @@ static void *cgroup_procs_start(struct s
static int cgroup_procs_show(struct seq_file *s, void *v)
{
+ pid_t pid = task_pid_vnr(v);
+
+ /*
+ * css_task_iter_next() could have visited a task which has already lost
+ * its PID but is not dead yet or the task could have been unhashed
+ * since css_task_iter_next(). In such cases, $pid would be 0 here.
+ * Don't confuse userspace with it.
+ */
+ if (unlikely(!pid))
+ return SEQ_SKIP;
+
seq_printf(s, "%d\n", task_pid_vnr(v));
return 0;
}
The patch titled
Subject: mm, swap: fix potential UAF issue for VMA readahead
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-swap-fix-potential-uaf-issue-for-vma-readahead.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Kairui Song <kasong(a)tencent.com>
Subject: mm, swap: fix potential UAF issue for VMA readahead
Date: Tue, 11 Nov 2025 21:36:08 +0800
Since commit 78524b05f1a3 ("mm, swap: avoid redundant swap device
pinning"), the common helper for allocating and preparing a folio in the
swap cache layer no longer takes a swap device reference internally,
because all callers of __read_swap_cache_async() already hold a reference
to the swap device of the entry being swapped in, so pinning the same
device again would be redundant.
The VMA readahead path also holds a reference to the target entry's swap
device, but it walks the page table, so it might encounter swap entries
from other devices and call __read_swap_cache_async() for another device
without holding a reference to it.
So a use-after-free is possible when swapoff of device A races with swapin
on device B and VMA readahead tries to read swap entries from device A.
It's not easy to trigger, but in theory it could cause real issues.
Make VMA readahead take a reference to the device first if it is different
from the target entry's device.
Link: https://lkml.kernel.org/r/20251111-swap-fix-vma-uaf-v1-1-41c660e58562@tence…
Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
Suggested-by: Huang Ying <ying.huang(a)linux.alibaba.com>
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Kemeng Shi <shikemeng(a)huaweicloud.com>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swap_state.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
--- a/mm/swap_state.c~mm-swap-fix-potential-uaf-issue-for-vma-readahead
+++ a/mm/swap_state.c
@@ -748,6 +748,8 @@ static struct folio *swap_vma_readahead(
blk_start_plug(&plug);
for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
+ struct swap_info_struct *si = NULL;
+
if (!pte++) {
pte = pte_offset_map(vmf->pmd, addr);
if (!pte)
@@ -761,8 +763,19 @@ static struct folio *swap_vma_readahead(
continue;
pte_unmap(pte);
pte = NULL;
+ /*
+ * Readahead entry may come from a device that we are not
+ * holding a reference to, try to grab a reference, or skip.
+ */
+ if (swp_type(entry) != swp_type(targ_entry)) {
+ si = get_swap_device(entry);
+ if (!si)
+ continue;
+ }
folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
&page_allocated, false);
+ if (si)
+ put_swap_device(si);
if (!folio)
continue;
if (page_allocated) {
_
Patches currently in -mm which might be from kasong(a)tencent.com are
mm-swap-fix-potential-uaf-issue-for-vma-readahead.patch
mm-swap-do-not-perform-synchronous-discard-during-allocation.patch
mm-swap-rename-helper-for-setup-bad-slots.patch
mm-swap-cleanup-swap-entry-allocation-parameter.patch
mm-migrate-swap-drop-usage-of-folio_index.patch
mm-swap-remove-redundant-argument-for-isolating-a-cluster.patch
revert-mm-swap-avoid-redundant-swap-device-pinning.patch
The sysfs buffer passed to alarms_store() is allocated with 'size + 1'
bytes and a NUL terminator is appended. However, the 'size' argument
does not account for this extra byte. The original code then allocated
'size' bytes and used strcpy() to copy 'buf', which always writes one
byte past the allocated buffer since strcpy() copies until the NUL
terminator at index 'size'.
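The off-by-one can be reproduced in a small userspace sketch (hypothetical
sizes, mirroring what kernfs and the old driver code do):

#include <stdlib.h>
#include <string.h>

int main(void)
{
	const char *input = "20 30";
	size_t size = strlen(input);	/* 5: the 'size' the callback is told */

	char *buf = malloc(size + 1);	/* what kernfs actually builds */
	memcpy(buf, input, size);
	buf[size] = '\0';		/* NUL appended at index 'size' */

	char *p_args = malloc(size);	/* old code: only 'size' bytes */
	strcpy(p_args, buf);		/* copies size + 1 bytes, writing
					   p_args[size], one past the end */
	free(p_args);
	free(buf);
	return 0;
}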
Fix this by parsing the 'buf' parameter directly using simple_strtoll()
without any intermediate allocation or string copy. This removes the
overflow and simplifies the code.
Cc: stable(a)vger.kernel.org
Fixes: e2c94d6f5720 ("w1_therm: adding alarm sysfs entry")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
Compile-tested only.
Changes in v4:
- Use simple_strtoll because kstrtoint also parses long long internally
- Return -ERANGE in addition to -EINVAL to match kstrtoint's behavior
- Remove any changes unrelated to fixing the buffer overflow (Krzysztof)
while maintaining the same behavior and return values as before
- Link to v3: https://lore.kernel.org/lkml/20251030155614.447905-1-thorsten.blum@linux.de…
Changes in v3:
- Add integer range check for 'temp' to match kstrtoint() behavior
- Explicitly cast 'temp' to int when calling int_to_short()
- Link to v2: https://lore.kernel.org/lkml/20251029130045.70127-2-thorsten.blum@linux.dev/
Changes in v2:
- Fix buffer overflow instead of truncating the copy using strscpy()
- Parse buffer directly using simple_strtol() as suggested by David
- Update patch subject and description
- Link to v1: https://lore.kernel.org/lkml/20251017170047.114224-2-thorsten.blum@linux.de…
---
drivers/w1/slaves/w1_therm.c | 64 ++++++++++++------------------------
1 file changed, 21 insertions(+), 43 deletions(-)
diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 9ccedb3264fb..5707fa34e804 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -1836,55 +1836,36 @@ static ssize_t alarms_store(struct device *device,
struct w1_slave *sl = dev_to_w1_slave(device);
struct therm_info info;
u8 new_config_register[3]; /* array of data to be written */
- int temp, ret;
- char *token = NULL;
+ long long temp;
+ int ret = 0;
s8 tl, th; /* 1 byte per value + temp ring order */
- char *p_args, *orig;
-
- p_args = orig = kmalloc(size, GFP_KERNEL);
- /* Safe string copys as buf is const */
- if (!p_args) {
- dev_warn(device,
- "%s: error unable to allocate memory %d\n",
- __func__, -ENOMEM);
- return size;
- }
- strcpy(p_args, buf);
-
- /* Split string using space char */
- token = strsep(&p_args, " ");
-
- if (!token) {
- dev_info(device,
- "%s: error parsing args %d\n", __func__, -EINVAL);
- goto free_m;
- }
-
- /* Convert 1st entry to int */
- ret = kstrtoint (token, 10, &temp);
+ const char *p = buf;
+ char *endp;
+
+ temp = simple_strtoll(p, &endp, 10);
+ if (p == endp || *endp != ' ')
+ ret = -EINVAL;
+ else if (temp < INT_MIN || temp > INT_MAX)
+ ret = -ERANGE;
if (ret) {
dev_info(device,
"%s: error parsing args %d\n", __func__, ret);
- goto free_m;
+ goto err;
}
tl = int_to_short(temp);
- /* Split string using space char */
- token = strsep(&p_args, " ");
- if (!token) {
- dev_info(device,
- "%s: error parsing args %d\n", __func__, -EINVAL);
- goto free_m;
- }
- /* Convert 2nd entry to int */
- ret = kstrtoint (token, 10, &temp);
+ p = endp + 1;
+ temp = simple_strtoll(p, &endp, 10);
+ if (p == endp)
+ ret = -EINVAL;
+ else if (temp < INT_MIN || temp > INT_MAX)
+ ret = -ERANGE;
if (ret) {
dev_info(device,
"%s: error parsing args %d\n", __func__, ret);
- goto free_m;
+ goto err;
}
-
/* Prepare to cast to short by eliminating out of range values */
th = int_to_short(temp);
@@ -1905,7 +1886,7 @@ static ssize_t alarms_store(struct device *device,
dev_info(device,
"%s: error reading from the slave device %d\n",
__func__, ret);
- goto free_m;
+ goto err;
}
/* Write data in the device RAM */
@@ -1913,7 +1894,7 @@ static ssize_t alarms_store(struct device *device,
dev_info(device,
"%s: Device not supported by the driver %d\n",
__func__, -ENODEV);
- goto free_m;
+ goto err;
}
ret = SLAVE_SPECIFIC_FUNC(sl)->write_data(sl, new_config_register);
@@ -1922,10 +1903,7 @@ static ssize_t alarms_store(struct device *device,
"%s: error writing to the slave device %d\n",
__func__, ret);
-free_m:
- /* free allocated memory */
- kfree(orig);
-
+err:
return size;
}
--
2.51.1
commit 44e8241c51f762aafa50ed116da68fd6ecdcc954 upstream.
On big endian arm kernels, the arm optimized Curve25519 code produces
incorrect outputs and fails the Curve25519 test. This has been true
ever since this code was added.
It seems that hardly anyone (or even no one?) actually uses big endian
arm kernels. But as long as they're ostensibly supported, we should
disable this code on them so that it's not accidentally used.
Note: for future-proofing, use !CPU_BIG_ENDIAN instead of
CPU_LITTLE_ENDIAN. Both of these are arch-specific options that could
get removed in the future if big endian support gets dropped.
Fixes: d8f1308a025f ("crypto: arm/curve25519 - wire up NEON implementation")
Cc: stable(a)vger.kernel.org
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
---
arch/arm/crypto/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index c46c05548080..c5d676e7f16b 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -145,10 +145,10 @@ config CRYPTO_NHPOLY1305_NEON
depends on KERNEL_MODE_NEON
select CRYPTO_NHPOLY1305
config CRYPTO_CURVE25519_NEON
tristate "NEON accelerated Curve25519 scalar multiplication library"
- depends on KERNEL_MODE_NEON
+ depends on KERNEL_MODE_NEON && !CPU_BIG_ENDIAN
select CRYPTO_LIB_CURVE25519_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CURVE25519
endif
base-commit: df70e44fa05b01476a78d0f6a210354784ff0992
--
2.51.2
commit 44e8241c51f762aafa50ed116da68fd6ecdcc954 upstream.
On big endian arm kernels, the arm optimized Curve25519 code produces
incorrect outputs and fails the Curve25519 test. This has been true
ever since this code was added.
It seems that hardly anyone (or even no one?) actually uses big endian
arm kernels. But as long as they're ostensibly supported, we should
disable this code on them so that it's not accidentally used.
Note: for future-proofing, use !CPU_BIG_ENDIAN instead of
CPU_LITTLE_ENDIAN. Both of these are arch-specific options that could
get removed in the future if big endian support gets dropped.
Fixes: d8f1308a025f ("crypto: arm/curve25519 - wire up NEON implementation")
Cc: stable(a)vger.kernel.org
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
---
arch/arm/crypto/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 149a5bd6b88c..d3d318df0e38 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -164,10 +164,10 @@ config CRYPTO_NHPOLY1305_NEON
depends on KERNEL_MODE_NEON
select CRYPTO_NHPOLY1305
config CRYPTO_CURVE25519_NEON
tristate "NEON accelerated Curve25519 scalar multiplication library"
- depends on KERNEL_MODE_NEON
+ depends on KERNEL_MODE_NEON && !CPU_BIG_ENDIAN
select CRYPTO_LIB_CURVE25519_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CURVE25519
endif
base-commit: cc5ec87693063acebb60f587e8a019ba9b94ae0e
--
2.51.2
commit 44e8241c51f762aafa50ed116da68fd6ecdcc954 upstream.
On big endian arm kernels, the arm optimized Curve25519 code produces
incorrect outputs and fails the Curve25519 test. This has been true
ever since this code was added.
It seems that hardly anyone (or even no one?) actually uses big endian
arm kernels. But as long as they're ostensibly supported, we should
disable this code on them so that it's not accidentally used.
Note: for future-proofing, use !CPU_BIG_ENDIAN instead of
CPU_LITTLE_ENDIAN. Both of these are arch-specific options that could
get removed in the future if big endian support gets dropped.
Fixes: d8f1308a025f ("crypto: arm/curve25519 - wire up NEON implementation")
Cc: stable(a)vger.kernel.org
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
---
arch/arm/crypto/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 3858c4d4cb98..f6323b84631f 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -2,11 +2,11 @@
menu "Accelerated Cryptographic Algorithms for CPU (arm)"
config CRYPTO_CURVE25519_NEON
tristate "Public key crypto: Curve25519 (NEON)"
- depends on KERNEL_MODE_NEON
+ depends on KERNEL_MODE_NEON && !CPU_BIG_ENDIAN
select CRYPTO_LIB_CURVE25519_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CURVE25519
help
Curve25519 algorithm
base-commit: f6e38ae624cf7eb96fb444a8ca2d07caa8d9c8fe
--
2.51.2