- Linux-stable-mirror - lists.linaro.org

by Matthew Auld

Daniele noticed that the fix in commit 2d2be279f1ca ("drm/xe: fix UAF around queue destruction") looks to have been unintentionally removed as part of handling a conflict in some past merge commit. Add it back. Fixes: ac44ff7cec33 ("Merge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes") Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio(a)intel.com> Signed-off-by: Matthew Auld <matthew.auld(a)intel.com> Cc: Matthew Brost <matthew.brost(a)intel.com> Cc: <stable(a)vger.kernel.org> # v6.12+ --- drivers/gpu/drm/xe/xe_guc_submit.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 80f748baad3f..2b61d017eeca 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -229,6 +229,17 @@ static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q) static void guc_submit_fini(struct drm_device *drm, void *arg) { struct xe_guc *guc = arg; + struct xe_device *xe = guc_to_xe(guc); + struct xe_gt *gt = guc_to_gt(guc); + int ret; + + ret = wait_event_timeout(guc->submission_state.fini_wq, + xa_empty(&guc->submission_state.exec_queue_lookup), + HZ * 5); + + drain_workqueue(xe->destroy_wq); + + xe_gt_assert(gt, ret); xa_destroy(&guc->submission_state.exec_queue_lookup); } -- 2.49.0

1 month, 1 week

2
1
0 0

[for-linus][PATCH 1/5] ftrace: Fix UAF when lookup kallsym after ftrace disabled

by Steven Rostedt

From: Ye Bin <yebin10(a)huawei.com> The following issue happens with a buggy module: BUG: unable to handle page fault for address: ffffffffc05d0218 PGD 1bd66f067 P4D 1bd66f067 PUD 1bd671067 PMD 101808067 PTE 0 Oops: Oops: 0000 [#1] SMP KASAN PTI Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS RIP: 0010:sized_strscpy+0x81/0x2f0 RSP: 0018:ffff88812d76fa08 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffffffc0601010 RCX: dffffc0000000000 RDX: 0000000000000038 RSI: dffffc0000000000 RDI: ffff88812608da2d RBP: 8080808080808080 R08: ffff88812608da2d R09: ffff88812608da68 R10: ffff88812608d82d R11: ffff88812608d810 R12: 0000000000000038 R13: ffff88812608da2d R14: ffffffffc05d0218 R15: fefefefefefefeff FS: 00007fef552de740(0000) GS:ffff8884251c7000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffc05d0218 CR3: 00000001146f0000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ftrace_mod_get_kallsym+0x1ac/0x590 update_iter_mod+0x239/0x5b0 s_next+0x5b/0xa0 seq_read_iter+0x8c9/0x1070 seq_read+0x249/0x3b0 proc_reg_read+0x1b0/0x280 vfs_read+0x17f/0x920 ksys_read+0xf3/0x1c0 do_syscall_64+0x5f/0x2e0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The above issue may happen as follows: (1) Add kprobe tracepoint; (2) insmod test.ko; (3) Module triggers ftrace disabled; (4) rmmod test.ko; (5) cat /proc/kallsyms; --> Will trigger UAF as test.ko already removed; ftrace_mod_get_kallsym() ... strscpy(module_name, mod_map->mod->name, MODULE_NAME_LEN); ... The problem is when a module triggers an issue with ftrace and sets ftrace_disable. The ftrace_disable is set when an anomaly is discovered and to prevent any more damage, ftrace stops all text modification. The issue that happened was that the ftrace_disable stops more than just the text modification. When a module is loaded, its init functions can also be traced. Because kallsyms deletes the init functions after a module has loaded, ftrace saves them when the module is loaded and function tracing is enabled. This allows the output of the function trace to show the init function names instead of just their raw memory addresses. When a module is removed, ftrace_release_mod() is called, and if ftrace_disable is set, it just returns without doing anything more. The problem here is that it leaves the mod_list still around and if kallsyms is called, it will call into this code and access the module memory that has already been freed as it will return: strscpy(module_name, mod_map->mod->name, MODULE_NAME_LEN); Where the "mod" no longer exists and triggers a UAF bug. Link: https://lore.kernel.org/all/20250523135452.626d8dcd@gandalf.local.home/ Cc: stable(a)vger.kernel.org Fixes: aba4b5c22cba ("ftrace: Save module init functions kallsyms symbols for tracing") Link: https://lore.kernel.org/20250529111955.2349189-2-yebin@huaweicloud.com Signed-off-by: Ye Bin <yebin10(a)huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- kernel/trace/ftrace.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 1af952cba48d..84fd2f8263fa 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -7438,9 +7438,10 @@ void ftrace_release_mod(struct module *mod) mutex_lock(&ftrace_lock); - if (ftrace_disabled) - goto out_unlock; - + /* + * To avoid the UAF problem after the module is unloaded, the + * 'mod_map' resource needs to be released unconditionally. + */ list_for_each_entry_safe(mod_map, n, &ftrace_mod_maps, list) { if (mod_map->mod == mod) { list_del_rcu(&mod_map->list); @@ -7449,6 +7450,9 @@ void ftrace_release_mod(struct module *mod) } } + if (ftrace_disabled) + goto out_unlock; + /* * Each module has its own ftrace_pages, remove * them from the list. -- 2.47.2

1 month, 1 week

1
0
0 0

[PATCH 6.14 00/73] 6.14.10-rc1 review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 6.14.10 release. There are 73 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Wed, 04 Jun 2025 13:42:20 +0000. Anything received after that time might be too late. The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.14.10-rc… or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.14.y and the diffstat can be found below. thanks, greg k-h ------------- Pseudo-Shortlog of commits: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Linux 6.14.10-rc1 Namjae Jeon <linkinjeon(a)kernel.org> ksmbd: use list_first_entry_or_null for opinfo_get_list() Nishanth Menon <nm(a)ti.com> net: ethernet: ti: am65-cpsw: Lower random mac address error print to info Mark Pearson <mpearson-lenovo(a)squebb.ca> platform/x86: thinkpad_acpi: Ignore battery threshold change event notification Kailang Yang <kailang(a)realtek.com> ALSA: hda/realtek - restore auto-mute mode for Dell Chrome platform Valtteri Koskivuori <vkoskiv(a)gmail.com> platform/x86: fujitsu-laptop: Support Lifebook S2110 hotkeys Trond Myklebust <trond.myklebust(a)hammerspace.com> NFS: Avoid flushing data while holding directory locks in nfs_rename() Purva Yeshi <purvayeshi550(a)gmail.com> char: tpm: tpm-buf: Add sanity check fallback in read helpers Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com> drm/xe: Save the gt pointer in lrc and drop the tile Aradhya Bhatia <aradhya.bhatia(a)intel.com> drm/xe/xe2hpg: Add Wa_22021007897 Ilya Guterman <amfernusus(a)gmail.com> nvme-pci: add NVME_QUIRK_NO_DEEPEST_PS quirk for SOLIDIGM P44 Pro Alan Adamson <alan.adamson(a)oracle.com> nvme: all namespaces in a subsystem must adhere to a common atomic write size Alessandro Grassi <alessandro.grassi(a)mailbox.org> spi: spi-sun4i: fix early activation Algea Cao <algea.cao(a)rock-chips.com> phy: phy-rockchip-samsung-hdptx: Fix PHY PLL output 50.25MHz error Hal Feng <hal.feng(a)starfivetech.com> phy: starfive: jh7110-usb: Fix USB 2.0 host occasional detection failure Alan Adamson <alan.adamson(a)oracle.com> nvme: multipath: enable BLK_FEAT_ATOMIC_WRITES for multipathing Aurabindo Pillai <aurabindo.pillai(a)amd.com> drm/amd/display: check stream id dml21 wrapper to get plane_id George Shen <george.shen(a)amd.com> drm/amd/display: fix link_set_dpms_off multi-display MST corner case Markus Burri <markus.burri(a)mt.com> gpio: virtuser: fix potential out-of-bound write Damien Le Moal <dlemoal(a)kernel.org> nvmet: pci-epf: cleanup nvmet_pci_epf_raise_irq() Masahiro Yamada <masahiroy(a)kernel.org> um: let 'make clean' properly clean underlying SUBARCH as well Sami Tolvanen <samitolvanen(a)google.com> kbuild: Require pahole <v1.28 or >v1.29 with GENDWARFKSYMS on X86 John Chau <johnchau(a)0atlas.com> platform/x86: thinkpad_acpi: Support also NEC Lavie X1475JAS Jeff Layton <jlayton(a)kernel.org> nfs: don't share pNFS DS connections between net namespaces Milton Barrera <miltonjosue2001(a)gmail.com> HID: quirks: Add ADATA XPG alpha wireless mouse support Mario Limonciello <mario.limonciello(a)amd.com> HID: amd_sfh: Avoid clearing reports for SRA sensor Purva Yeshi <purvayeshi550(a)gmail.com> dmaengine: idxd: cdev: Fix uninitialized use of sva in idxd_cdev_open Christian Brauner <brauner(a)kernel.org> coredump: hand a pidfd to the usermode coredump helper Christian Brauner <brauner(a)kernel.org> coredump: fix error handling for replace_fd() Robin Murphy <robin.murphy(a)arm.com> iommu: Handle yet another race around registration Robin Murphy <robin.murphy(a)arm.com> perf/arm-cmn: Add CMN S3 ACPI binding Robin Murphy <robin.murphy(a)arm.com> perf/arm-cmn: Initialise cmn->cpu earlier Robin Murphy <robin.murphy(a)arm.com> perf/arm-cmn: Fix REQ2/SNP2 mixup Pedro Tammela <pctammela(a)mojatatu.com> net_sched: hfsc: Address reentrant enqueue adding class to eltree twice Siddharth Vadapalli <s-vadapalli(a)ti.com> arm64: dts: ti: k3-j784s4-j742s2-main-common: Fix length of serdes_ln_ctrl Siddharth Vadapalli <s-vadapalli(a)ti.com> arm64: dts: ti: k3-j722s-main: Disable "serdes_wiz0" and "serdes_wiz1" Siddharth Vadapalli <s-vadapalli(a)ti.com> arm64: dts: ti: k3-j722s-evm: Enable "serdes_wiz0" and "serdes_wiz1" Yemike Abhilash Chandra <y-abhilashchandra(a)ti.com> arm64: dts: ti: k3-j721e-sk: Add requiried voltage supplies for IMX219 Yemike Abhilash Chandra <y-abhilashchandra(a)ti.com> arm64: dts: ti: k3-j721e-sk: Remove clock-names property from IMX219 overlay Yemike Abhilash Chandra <y-abhilashchandra(a)ti.com> arm64: dts: ti: k3-j721e-sk: Add DT nodes for power regulators Yemike Abhilash Chandra <y-abhilashchandra(a)ti.com> arm64: dts: ti: k3-am68-sk: Fix regulator hierarchy Judith Mendez <jm(a)ti.com> arm64: dts: ti: k3-am65-main: Add missing taps to sdhci0 Yemike Abhilash Chandra <y-abhilashchandra(a)ti.com> arm64: dts: ti: k3-am62x: Rename I2C switch to I2C mux in OV5640 overlay Yemike Abhilash Chandra <y-abhilashchandra(a)ti.com> arm64: dts: ti: k3-am62x: Rename I2C switch to I2C mux in IMX219 overlay Yemike Abhilash Chandra <y-abhilashchandra(a)ti.com> arm64: dts: ti: k3-am62x: Remove clock-names property from IMX219 overlay Judith Mendez <jm(a)ti.com> arm64: dts: ti: k3-am62p-j722s-common-main: Set eMMC clock parent to default Judith Mendez <jm(a)ti.com> arm64: dts: ti: k3-am62a-main: Set eMMC clock parent to default Judith Mendez <jm(a)ti.com> arm64: dts: ti: k3-am62-main: Set eMMC clock parent to default Abel Vesa <abel.vesa(a)linaro.org> arm64: dts: qcom: x1e80100: Fix PCIe 3rd controller DBI size Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: x1e80100: Add GPU cooling Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: x1e80100: Apply consistent critical thermal shutdown Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: x1e80100: Fix video thermal zone Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e80100-yoga-slim7x: mark l12b and l15b always-on Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e80100-qcp: mark l12b and l15b always-on Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: x1e80100-qcp: Fix vreg_l2j_1p2 voltage Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: x1e80100-lenovo-yoga-slim7x: Fix vreg_l2j_1p2 voltage Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e80100-hp-x14: mark l12b and l15b always-on Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: x1e80100-hp-omnibook-x14: Fix vreg_l2j_1p2 voltage Juerg Haefliger <juerg.haefliger(a)canonical.com> arm64: dts: qcom: x1e80100-hp-omnibook-x14: Enable SMB2360 0 and 1 Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e80100-dell-xps13-9345: mark l12b and l15b always-on Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: x1e80100-asus-vivobook-s15: Fix vreg_l2j_1p2 voltage Johan Hovold <johan+linaro(a)kernel.org> arm64: dts: qcom: x1e001de-devkit: mark l12b and l15b always-on Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: x1e001de-devkit: Fix vreg_l2j_1p2 voltage Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: sm8650: Add missing properties for cryptobam Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: sm8550: Add missing properties for cryptobam Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: sm8450: Add missing properties for cryptobam Alok Tiwari <alok.a.tiwari(a)oracle.com> arm64: dts: qcom: sm8350: Fix typo in pil_camera_mem node Karthik Sanagavarapu <quic_kartsana(a)quicinc.com> arm64: dts: qcom: sa8775p: Remove cdsp compute-cb@10 Ling Xu <quic_lxu5(a)quicinc.com> arm64: dts: qcom: sa8775p: Remove extra entries from the iommus property Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: sa8775p: Add missing properties for cryptobam Stephan Gerhold <stephan.gerhold(a)linaro.org> arm64: dts: qcom: ipq9574: Add missing properties for cryptobam Lukasz Czechowski <lukasz.czechowski(a)thaumatec.com> arm64: dts: rockchip: fix internal USB hub instability on RK3399 Puma Niravkumar L Rabara <niravkumar.l.rabara(a)intel.com> arm64: dts: socfpga: agilex5: fix gpio0 address Axel Forsman <axfo(a)kvaser.com> can: kvaser_pciefd: Force IRQ edge in case of nested IRQ ------------- Diffstat: Makefile | 4 +- arch/arm64/boot/dts/intel/socfpga_agilex5.dtsi | 4 +- arch/arm64/boot/dts/qcom/ipq9574.dtsi | 2 + arch/arm64/boot/dts/qcom/sa8775p.dtsi | 248 ++--------------- arch/arm64/boot/dts/qcom/sm8350.dtsi | 2 +- arch/arm64/boot/dts/qcom/sm8450.dtsi | 2 + arch/arm64/boot/dts/qcom/sm8550.dtsi | 2 + arch/arm64/boot/dts/qcom/sm8650.dtsi | 2 + arch/arm64/boot/dts/qcom/x1e001de-devkit.dts | 6 +- .../boot/dts/qcom/x1e80100-asus-vivobook-s15.dts | 4 +- .../boot/dts/qcom/x1e80100-dell-xps13-9345.dts | 2 + .../boot/dts/qcom/x1e80100-hp-omnibook-x14.dts | 14 +- .../boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts | 7 +- arch/arm64/boot/dts/qcom/x1e80100-qcp.dts | 6 +- arch/arm64/boot/dts/qcom/x1e80100.dtsi | 309 +++++++++++---------- arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi | 40 ++- arch/arm64/boot/dts/ti/k3-am62-main.dtsi | 2 - arch/arm64/boot/dts/ti/k3-am62a-main.dtsi | 2 - .../boot/dts/ti/k3-am62p-j722s-common-main.dtsi | 2 - .../arm64/boot/dts/ti/k3-am62x-sk-csi2-imx219.dtso | 3 +- .../arm64/boot/dts/ti/k3-am62x-sk-csi2-ov5640.dtso | 2 +- .../boot/dts/ti/k3-am62x-sk-csi2-tevi-ov5640.dtso | 2 +- arch/arm64/boot/dts/ti/k3-am65-main.dtsi | 2 + arch/arm64/boot/dts/ti/k3-am68-sk-base-board.dts | 13 +- .../boot/dts/ti/k3-j721e-sk-csi2-dual-imx219.dtso | 35 ++- arch/arm64/boot/dts/ti/k3-j721e-sk.dts | 31 +++ arch/arm64/boot/dts/ti/k3-j722s-evm.dts | 8 + arch/arm64/boot/dts/ti/k3-j722s-main.dtsi | 4 + .../boot/dts/ti/k3-j784s4-j742s2-main-common.dtsi | 2 +- arch/um/Makefile | 1 + drivers/char/tpm/tpm-buf.c | 6 +- drivers/dma/idxd/cdev.c | 4 +- drivers/gpio/gpio-virtuser.c | 12 +- .../dc/dml2/dml21/dml21_translation_helper.c | 20 +- drivers/gpu/drm/amd/display/dc/link/link_dpms.c | 13 +- drivers/gpu/drm/xe/regs/xe_gt_regs.h | 1 + drivers/gpu/drm/xe/xe_lrc.c | 4 +- drivers/gpu/drm/xe/xe_lrc_types.h | 4 +- drivers/gpu/drm/xe/xe_wa.c | 4 + drivers/hid/amd-sfh-hid/sfh1_1/amd_sfh_init.c | 5 + drivers/hid/hid-ids.h | 4 + drivers/hid/hid-quirks.c | 2 + drivers/iommu/iommu.c | 26 +- drivers/net/can/kvaser_pciefd.c | 83 +++--- drivers/net/ethernet/ti/am65-cpsw-nuss.c | 2 +- drivers/nvme/host/core.c | 30 +- drivers/nvme/host/multipath.c | 3 +- drivers/nvme/host/nvme.h | 3 +- drivers/nvme/host/pci.c | 2 + drivers/nvme/target/pci-epf.c | 10 +- drivers/perf/arm-cmn.c | 11 +- drivers/phy/rockchip/phy-rockchip-samsung-hdptx.c | 2 + drivers/phy/starfive/phy-jh7110-usb.c | 7 + drivers/platform/x86/fujitsu-laptop.c | 33 ++- drivers/platform/x86/thinkpad_acpi.c | 7 + drivers/spi/spi-sun4i.c | 5 +- fs/coredump.c | 65 ++++- fs/nfs/client.c | 2 + fs/nfs/dir.c | 15 +- fs/nfs/filelayout/filelayoutdev.c | 6 +- fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6 +- fs/nfs/pnfs.h | 4 +- fs/nfs/pnfs_nfs.c | 9 +- fs/smb/server/oplock.c | 7 +- include/linux/coredump.h | 1 + include/linux/iommu.h | 2 + include/linux/nfs_fs_sb.h | 12 +- kernel/module/Kconfig | 5 + net/sched/sch_hfsc.c | 9 +- sound/pci/hda/patch_realtek.c | 5 +- 70 files changed, 674 insertions(+), 540 deletions(-)

1 month, 1 week

8
80
0 0

[PATCH v3] mm: userfaultfd: fix race of userfaultfd_move and swap cache

by Kairui Song

From: Kairui Song <kasong(a)tencent.com> On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache lookup, and try to move the found folio to the faulting vma when. Currently, it relies on the PTE value check to ensure the moved folio still belongs to the src swap entry, which turns out is not reliable. While working and reviewing the swap table series with Barry, following existing race is observed and reproduced [1]: ( move_pages_pte is moving src_pte to dst_pte, where src_pte is a swap entry PTE holding swap entry S1, and S1 isn't in the swap cache.) CPU1 CPU2 userfaultfd_move move_pages_pte() entry = pte_to_swp_entry(orig_src_pte); // Here it got entry = S1 ... < Somehow interrupted> ... <swapin src_pte, alloc and use folio A> // folio A is just a new allocated folio // and get installed into src_pte <frees swap entry S1> // src_pte now points to folio A, S1 // has swap count == 0, it can be freed // by folio_swap_swap or swap // allocator's reclaim. <try to swap out another folio B> // folio B is a folio in another VMA. <put folio B to swap cache using S1 > // S1 is freed, folio B could use it // for swap out with no problem. ... folio = filemap_get_folio(S1) // Got folio B here !!! ... < Somehow interrupted again> ... <swapin folio B and free S1> // Now S1 is free to be used again. <swapout src_pte & folio A using S1> // Now src_pte is a swap entry pte // holding S1 again. folio_trylock(folio) move_swap_pte double_pt_lock is_pte_pages_stable // Check passed because src_pte == S1 folio_move_anon_rmap(...) // Moved invalid folio B here !!! The race window is very short and requires multiple collisions of multiple rare events, so it's very unlikely to happen, but with a deliberately constructed reproducer and increased time window, it can be reproduced [1]. It's also possible that folio (A) is swapped in, and swapped out again after the filemap_get_folio lookup, in such case folio (A) may stay in swap cache so it needs to be moved too. In this case we should also try again so kernel won't miss a folio move. Fix this by checking if the folio is the valid swap cache folio after acquiring the folio lock, and checking the swap cache again after acquiring the src_pte lock. SWP_SYNCRHONIZE_IO path does make the problem more complex, but so far we don't need to worry about that since folios only might get exposed to swap cache in the swap out path, and it's covered in this patch too by checking the swap cache again after acquiring src_pte lock. Testing with a simple C program to allocate and move several GB of memory did not show any observable performance change. Cc: <stable(a)vger.kernel.org> Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Closes: https://lore.kernel.org/linux-mm/CAMgjq7B1K=6OOrK2OUZ0-tqCzi+EJt+2_K97TPGoS… [1] Signed-off-by: Kairui Song <kasong(a)tencent.com> --- V1: https://lore.kernel.org/linux-mm/20250530201710.81365-1-ryncsn@gmail.com/ Changes: - Check swap_map instead of doing a filemap lookup after acquiring the PTE lock to minimize critical section overhead [ Barry Song, Lokesh Gidra ] V2: https://lore.kernel.org/linux-mm/20250601200108.23186-1-ryncsn@gmail.com/ Changes: - Move the folio and swap check inside move_swap_pte to avoid skipping the check and potential overhead [ Lokesh Gidra ] - Add a READ_ONCE for the swap_map read to ensure it reads a up to dated value. mm/userfaultfd.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index bc473ad21202..5dc05346e360 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1084,8 +1084,18 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma, pte_t orig_dst_pte, pte_t orig_src_pte, pmd_t *dst_pmd, pmd_t dst_pmdval, spinlock_t *dst_ptl, spinlock_t *src_ptl, - struct folio *src_folio) + struct folio *src_folio, + struct swap_info_struct *si, swp_entry_t entry) { + /* + * Check if the folio still belongs to the target swap entry after + * acquiring the lock. Folio can be freed in the swap cache while + * not locked. + */ + if (src_folio && unlikely(!folio_test_swapcache(src_folio) || + entry.val != src_folio->swap.val)) + return -EAGAIN; + double_pt_lock(dst_ptl, src_ptl); if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte, @@ -1102,6 +1112,15 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma, if (src_folio) { folio_move_anon_rmap(src_folio, dst_vma); src_folio->index = linear_page_index(dst_vma, dst_addr); + } else { + /* + * Check if the swap entry is cached after acquiring the src_pte + * lock. Or we might miss a new loaded swap cache folio. + */ + if (READ_ONCE(si->swap_map[swp_offset(entry)]) & SWAP_HAS_CACHE) { + double_pt_unlock(dst_ptl, src_ptl); + return -EAGAIN; + } } orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte); @@ -1412,7 +1431,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd, } err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte, orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval, - dst_ptl, src_ptl, src_folio); + dst_ptl, src_ptl, src_folio, si, entry); } out: -- 2.49.0

1 month, 1 week

4
6
0 0

[tip: x86/urgent] x86/iopl: Cure TIF_IO_BITMAP inconsistencies

by tip-bot2 for Thomas Gleixner

The following commit has been merged into the x86/urgent branch of tip: Commit-ID: 8b68e978718f14fdcb080c2a7791c52a0d09bc6d Gitweb: https://git.kernel.org/tip/8b68e978718f14fdcb080c2a7791c52a0d09bc6d Author: Thomas Gleixner <tglx(a)linutronix.de> AuthorDate: Wed, 26 Feb 2025 16:01:57 +01:00 Committer: Borislav Petkov (AMD) <bp(a)alien8.de> CommitterDate: Tue, 03 Jun 2025 15:56:39 +02:00 x86/iopl: Cure TIF_IO_BITMAP inconsistencies io_bitmap_exit() is invoked from exit_thread() when a task exists or when a fork fails. In the latter case the exit_thread() cleans up resources which were allocated during fork(). io_bitmap_exit() invokes task_update_io_bitmap(), which in turn ends up in tss_update_io_bitmap(). tss_update_io_bitmap() operates on the current task. If current has TIF_IO_BITMAP set, but no bitmap installed, tss_update_io_bitmap() crashes with a NULL pointer dereference. There are two issues, which lead to that problem: 1) io_bitmap_exit() should not invoke task_update_io_bitmap() when the task, which is cleaned up, is not the current task. That's a clear indicator for a cleanup after a failed fork(). 2) A task should not have TIF_IO_BITMAP set and neither a bitmap installed nor IOPL emulation level 3 activated. This happens when a kernel thread is created in the context of a user space thread, which has TIF_IO_BITMAP set as the thread flags are copied and the IO bitmap pointer is cleared. Other than in the failed fork() case this has no impact because kernel threads including IO workers never return to user space and therefore never invoke tss_update_io_bitmap(). Cure this by adding the missing cleanups and checks: 1) Prevent io_bitmap_exit() to invoke task_update_io_bitmap() if the to be cleaned up task is not the current task. 2) Clear TIF_IO_BITMAP in copy_thread() unconditionally. For user space forks it is set later, when the IO bitmap is inherited in io_bitmap_share(). For paranoia sake, add a warning into tss_update_io_bitmap() to catch the case, when that code is invoked with inconsistent state. Fixes: ea5f1cd7ab49 ("x86/ioperm: Remove bitmap if all permissions dropped") Reported-by: syzbot+e2b1803445d236442e54(a)syzkaller.appspotmail.com Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/87wmdceom2.ffs@tglx --- arch/x86/kernel/ioport.c | 13 +++++++++---- arch/x86/kernel/process.c | 6 ++++++ 2 files changed, 15 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c index 6290dd1..ff40f09 100644 --- a/arch/x86/kernel/ioport.c +++ b/arch/x86/kernel/ioport.c @@ -33,8 +33,9 @@ void io_bitmap_share(struct task_struct *tsk) set_tsk_thread_flag(tsk, TIF_IO_BITMAP); } -static void task_update_io_bitmap(struct task_struct *tsk) +static void task_update_io_bitmap(void) { + struct task_struct *tsk = current; struct thread_struct *t = &tsk->thread; if (t->iopl_emul == 3 || t->io_bitmap) { @@ -54,7 +55,12 @@ void io_bitmap_exit(struct task_struct *tsk) struct io_bitmap *iobm = tsk->thread.io_bitmap; tsk->thread.io_bitmap = NULL; - task_update_io_bitmap(tsk); + /* + * Don't touch the TSS when invoked on a failed fork(). TSS + * reflects the state of @current and not the state of @tsk. + */ + if (tsk == current) + task_update_io_bitmap(); if (iobm && refcount_dec_and_test(&iobm->refcnt)) kfree(iobm); } @@ -192,8 +198,7 @@ SYSCALL_DEFINE1(iopl, unsigned int, level) } t->iopl_emul = level; - task_update_io_bitmap(current); - + task_update_io_bitmap(); return 0; } diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index c1d2dac..704883c 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -176,6 +176,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) frame->ret_addr = (unsigned long) ret_from_fork_asm; p->thread.sp = (unsigned long) fork_frame; p->thread.io_bitmap = NULL; + clear_tsk_thread_flag(p, TIF_IO_BITMAP); p->thread.iopl_warn = 0; memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps)); @@ -464,6 +465,11 @@ void native_tss_update_io_bitmap(void) } else { struct io_bitmap *iobm = t->io_bitmap; + if (WARN_ON_ONCE(!iobm)) { + clear_thread_flag(TIF_IO_BITMAP); + native_tss_invalidate_io_bitmap(); + } + /* * Only copy bitmap data when the sequence number differs. The * update time is accounted to the incoming task.

1 month, 1 week

1
0
0 0

[PATCH v2 1/2] tty: Register device *after* creating the cdev for a tty

by Max Staudt

This change makes the tty device file available only after the tty's backing character device is ready. Since 6a7e6f78c235975cc14d4e141fa088afffe7062c, the class device is registered before the cdev is created, and userspace may pick it up, yet open() will fail because the backing cdev doesn't exist yet. Userspace is racing the bottom half of tty_register_device_attr() here, specifically the call to tty_cdev_add(). dev_set_uevent_suppress() was used to work around this, but this fails on embedded systems that rely on bare devtmpfs rather than udev. On such systems, the device file is created as part of device_add(), and userspace can pick it up via inotify, irrespective of uevent suppression. So let's undo the existing patch, and create the cdev first, and only afterwards register the class device in the kernel's device tree. However, this restores the original race of the cdev existing before the class device is registered, and an attempt to tty_[k]open() the chardev between these two steps will lead to tty->dev being assigned NULL by alloc_tty_struct(). This will be addressed in a second patch. Fixes: 6a7e6f78c235 ("tty: close race between device register and open") Signed-off-by: Max Staudt <max(a)enpas.org> Cc: <stable(a)vger.kernel.org> --- drivers/tty/tty_io.c | 54 +++++++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 23 deletions(-) diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index ca9b7d7bad2b..e922b84524d2 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -3245,6 +3245,7 @@ struct device *tty_register_device_attr(struct tty_driver *driver, struct ktermios *tp; struct device *dev; int retval; + bool cdev_added = false; if (index >= driver->num) { pr_err("%s: Attempt to register invalid tty line number (%d)\n", @@ -3257,24 +3258,6 @@ struct device *tty_register_device_attr(struct tty_driver *driver, else tty_line_name(driver, index, name); - dev = kzalloc(sizeof(*dev), GFP_KERNEL); - if (!dev) - return ERR_PTR(-ENOMEM); - - dev->devt = devt; - dev->class = &tty_class; - dev->parent = device; - dev->release = tty_device_create_release; - dev_set_name(dev, "%s", name); - dev->groups = attr_grp; - dev_set_drvdata(dev, drvdata); - - dev_set_uevent_suppress(dev, 1); - - retval = device_register(dev); - if (retval) - goto err_put; - if (!(driver->flags & TTY_DRIVER_DYNAMIC_ALLOC)) { /* * Free any saved termios data so that the termios state is @@ -3288,19 +3271,44 @@ struct device *tty_register_device_attr(struct tty_driver *driver, retval = tty_cdev_add(driver, devt, index, 1); if (retval) - goto err_del; + return ERR_PTR(retval); + + cdev_added = true; + } + + dev = kzalloc(sizeof(*dev), GFP_KERNEL); + if (!dev) { + retval = -ENOMEM; + goto err_del_cdev; } - dev_set_uevent_suppress(dev, 0); - kobject_uevent(&dev->kobj, KOBJ_ADD); + dev->devt = devt; + dev->class = &tty_class; + dev->parent = device; + dev->release = tty_device_create_release; + dev_set_name(dev, "%s", name); + dev->groups = attr_grp; + dev_set_drvdata(dev, drvdata); + + retval = device_register(dev); + if (retval) + goto err_put; return dev; -err_del: - device_del(dev); err_put: + /* + * device_register() calls device_add(), after which + * we must use put_device() instead of kfree(). + */ put_device(dev); +err_del_cdev: + if (cdev_added) { + cdev_del(driver->cdevs[index]); + driver->cdevs[index] = NULL; + } + return ERR_PTR(retval); } EXPORT_SYMBOL_GPL(tty_register_device_attr); -- 2.39.5

1 month, 1 week

4
5
0 0

[PATCH v2 4/7] LoongArch: KVM: Check validation of num_cpu from user space

by Bibo Mao

The maximum supported cpu number is EIOINTC_ROUTE_MAX_VCPUS about irqchip eiointc, here add validation about cpu number to avoid array pointer overflow. Cc: stable(a)vger.kernel.org Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao <maobibo(a)loongson.cn> --- arch/loongarch/kvm/intc/eiointc.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c index b48511f903b5..ed80bf290755 100644 --- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -798,7 +798,7 @@ static int kvm_eiointc_ctrl_access(struct kvm_device *dev, int ret = 0; unsigned long flags; unsigned long type = (unsigned long)attr->attr; - u32 i, start_irq; + u32 i, start_irq, val; void __user *data; struct loongarch_eiointc *s = dev->kvm->arch.eiointc; @@ -806,7 +806,12 @@ static int kvm_eiointc_ctrl_access(struct kvm_device *dev, spin_lock_irqsave(&s->lock, flags); switch (type) { case KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_NUM_CPU: - if (copy_from_user(&s->num_cpu, data, 4)) + if (copy_from_user(&val, data, 4) == 0) { + if (val < EIOINTC_ROUTE_MAX_VCPUS) + s->num_cpu = val; + else + ret = -EINVAL; + } else ret = -EFAULT; break; case KVM_DEV_LOONGARCH_EXTIOI_CTRL_INIT_FEATURE: @@ -835,7 +840,7 @@ static int kvm_eiointc_regs_access(struct kvm_device *dev, struct kvm_device_attr *attr, bool is_write) { - int addr, cpuid, offset, ret = 0; + int addr, cpu, offset, ret = 0; unsigned long flags; void *p = NULL; void __user *data; @@ -843,7 +848,7 @@ static int kvm_eiointc_regs_access(struct kvm_device *dev, s = dev->kvm->arch.eiointc; addr = attr->attr; - cpuid = addr >> 16; + cpu = addr >> 16; addr &= 0xffff; data = (void __user *)attr->addr; switch (addr) { @@ -868,8 +873,11 @@ static int kvm_eiointc_regs_access(struct kvm_device *dev, p = &s->isr.reg_u32[offset]; break; case EIOINTC_COREISR_START ... EIOINTC_COREISR_END: + if (cpu >= s->num_cpu) + return -EINVAL; + offset = (addr - EIOINTC_COREISR_START) / 4; - p = &s->coreisr.reg_u32[cpuid][offset]; + p = &s->coreisr.reg_u32[cpu][offset]; break; case EIOINTC_COREMAP_START ... EIOINTC_COREMAP_END: offset = (addr - EIOINTC_COREMAP_START) / 4; -- 2.39.3

1 month, 1 week

1
0
0 0

[PATCH v2 3/7] LoongArch: KVM: Disable update property num_cpu and feature with eiointc

by Bibo Mao

Property num_cpu and feature is read-only once eiointc is created, which is set with KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL attr group before device creation. Attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS is to update register and software state for migration and reset usage, property num_cpu and feature can not be update again if it is created already. Here discard write operation with property num_cpu and feature in attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL. Cc: stable(a)vger.kernel.org Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions") Signed-off-by: Bibo Mao <maobibo(a)loongson.cn> --- arch/loongarch/kvm/intc/eiointc.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c index 0b648c56b0c3..b48511f903b5 100644 --- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -910,9 +910,22 @@ static int kvm_eiointc_sw_status_access(struct kvm_device *dev, data = (void __user *)attr->addr; switch (addr) { case KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_NUM_CPU: + /* + * Property num_cpu and feature is read-only once eiointc is + * created with KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL group API + * + * Disable writing with KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS + * group API + */ + if (is_write) + return ret; + p = &s->num_cpu; break; case KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_FEATURE: + if (is_write) + return ret; + p = &s->features; break; case KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_STATE: -- 2.39.3

1 month, 1 week

1
0
0 0

[PATCH v2 2/7] LoongArch: KVM: Check interrupt route from physical cpu with eiointc

by Bibo Mao

With eiointc interrupt controller, physical cpu id is set for irq route. However function kvm_get_vcpu() is used to get destination vCPU when delivering irq. With API kvm_get_vcpu(), logical cpu is used. With API kvm_get_vcpu_by_cpuid(), vCPU can be searched from physical cpu id. Cc: stable(a)vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao <maobibo(a)loongson.cn> --- arch/loongarch/kvm/intc/eiointc.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c index d2c521b0e923..0b648c56b0c3 100644 --- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -9,7 +9,8 @@ static void eiointc_set_sw_coreisr(struct loongarch_eiointc *s) { - int ipnum, cpu, irq_index, irq_mask, irq; + int ipnum, cpu, irq_index, irq_mask, irq, cpuid; + struct kvm_vcpu *vcpu; for (irq = 0; irq < EIOINTC_IRQS; irq++) { ipnum = s->ipmap.reg_u8[irq / 32]; @@ -20,7 +21,12 @@ static void eiointc_set_sw_coreisr(struct loongarch_eiointc *s) irq_index = irq / 32; irq_mask = BIT(irq & 0x1f); - cpu = s->coremap.reg_u8[irq]; + cpuid = s->coremap.reg_u8[irq]; + vcpu = kvm_get_vcpu_by_cpuid(s->kvm, cpuid); + if (vcpu == NULL) + continue; + + cpu = vcpu->vcpu_id; if (!!(s->coreisr.reg_u32[cpu][irq_index] & irq_mask)) set_bit(irq, s->sw_coreisr[cpu][ipnum]); else @@ -68,17 +74,23 @@ static void eiointc_update_irq(struct loongarch_eiointc *s, int irq, int level) static inline void eiointc_update_sw_coremap(struct loongarch_eiointc *s, int irq, u64 val, u32 len, bool notify) { - int i, cpu; + int i, cpu, cpuid; + struct kvm_vcpu *vcpu; for (i = 0; i < len; i++) { - cpu = val & 0xff; + cpuid = val & 0xff; val = val >> 8; if (!(s->status & BIT(EIOINTC_ENABLE_CPU_ENCODE))) { - cpu = ffs(cpu) - 1; - cpu = (cpu >= 4) ? 0 : cpu; + cpuid = ffs(cpuid) - 1; + cpuid = (cpuid >= 4) ? 0 : cpuid; } + vcpu = kvm_get_vcpu_by_cpuid(s->kvm, cpuid); + if (vcpu == NULL) + continue; + + cpu = vcpu->vcpu_id; if (s->sw_coremap[irq + i] == cpu) continue; -- 2.39.3

1 month, 1 week

1
0
0 0

[PATCH v2 1/7] LoongArch: KVM: Fix interrupt route update with eiointc

by Bibo Mao

With function eiointc_update_sw_coremap(), there is forced assignment like val = *(u64 *)pvalue. Parameter pvalue may be pointer to char type or others, there is problem with forced assignment with u64 type. Here the detailed value is passed rather address pointer. Cc: stable(a)vger.kernel.org Fixes: 3956a52bc05b ("LoongArch: KVM: Add EIOINTC read and write functions") Signed-off-by: Bibo Mao <maobibo(a)loongson.cn> --- arch/loongarch/kvm/intc/eiointc.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c index f39929d7bf8a..d2c521b0e923 100644 --- a/arch/loongarch/kvm/intc/eiointc.c +++ b/arch/loongarch/kvm/intc/eiointc.c @@ -66,10 +66,9 @@ static void eiointc_update_irq(struct loongarch_eiointc *s, int irq, int level) } static inline void eiointc_update_sw_coremap(struct loongarch_eiointc *s, - int irq, void *pvalue, u32 len, bool notify) + int irq, u64 val, u32 len, bool notify) { int i, cpu; - u64 val = *(u64 *)pvalue; for (i = 0; i < len; i++) { cpu = val & 0xff; @@ -398,7 +397,7 @@ static int loongarch_eiointc_writeb(struct kvm_vcpu *vcpu, irq = offset - EIOINTC_COREMAP_START; index = irq; s->coremap.reg_u8[index] = data; - eiointc_update_sw_coremap(s, irq, (void *)&data, sizeof(data), true); + eiointc_update_sw_coremap(s, irq, data, sizeof(data), true); break; default: ret = -EINVAL; @@ -484,7 +483,7 @@ static int loongarch_eiointc_writew(struct kvm_vcpu *vcpu, irq = offset - EIOINTC_COREMAP_START; index = irq >> 1; s->coremap.reg_u16[index] = data; - eiointc_update_sw_coremap(s, irq, (void *)&data, sizeof(data), true); + eiointc_update_sw_coremap(s, irq, data, sizeof(data), true); break; default: ret = -EINVAL; @@ -570,7 +569,7 @@ static int loongarch_eiointc_writel(struct kvm_vcpu *vcpu, irq = offset - EIOINTC_COREMAP_START; index = irq >> 2; s->coremap.reg_u32[index] = data; - eiointc_update_sw_coremap(s, irq, (void *)&data, sizeof(data), true); + eiointc_update_sw_coremap(s, irq, data, sizeof(data), true); break; default: ret = -EINVAL; @@ -656,7 +655,7 @@ static int loongarch_eiointc_writeq(struct kvm_vcpu *vcpu, irq = offset - EIOINTC_COREMAP_START; index = irq >> 3; s->coremap.reg_u64[index] = data; - eiointc_update_sw_coremap(s, irq, (void *)&data, sizeof(data), true); + eiointc_update_sw_coremap(s, irq, data, sizeof(data), true); break; default: ret = -EINVAL; @@ -809,7 +808,7 @@ static int kvm_eiointc_ctrl_access(struct kvm_device *dev, for (i = 0; i < (EIOINTC_IRQS / 4); i++) { start_irq = i * 4; eiointc_update_sw_coremap(s, start_irq, - (void *)&s->coremap.reg_u32[i], sizeof(u32), false); + s->coremap.reg_u32[i], sizeof(u32), false); } break; default: -- 2.39.3

1 month, 1 week

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror