- Linux-stable-mirror - lists.linaro.org

[stable 6.12+] x86/pkeys: Simplify PKRU update in signal frame

by Ben Hutchings

Hi stable maintainers, Please apply commit d1e420772cd1 ("x86/pkeys: Simplify PKRU update in signal frame") to the stable branches for 6.12 and later. This fixes a regression introduced in 6.13 by commit ae6012d72fa6 ("x86/pkeys: Ensure updated PKRU value is XRSTOR'd"), which was also backported in 6.12.5. Ben. -- Ben Hutchings 73.46% of all statistics are made up.

5 days, 22 hours

2
3
0 0

[PATCH] x86/fpu: Delay instruction pointer fixup until after after warning

by Dave Hansen

From: Dave Hansen <dave.hansen(a)linux.intel.com> Right now, if XRSTOR fails a console message like this is be printed: Bad FPU state detected at restore_fpregs_from_fpstate+0x9a/0x170, reinitializing FPU registers. However, the text location (...+0x9a in this case) is the instruction *AFTER* the XRSTOR. The highlighted instruction in the "Code:" dump also points one instruction late. The reason is that the "fixup" moves RIP up to pass the bad XRSTOR and keep on running after returning from the #GP handler. But it does this fixup before warning. The resulting warning output is nonsensical because it looks like e non-FPU-related instruction is #GP'ing. Do not fix up RIP until after printing the warning. Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com> Fixes: d5c8028b4788 ("x86/fpu: Reinitialize FPU registers if restoring FPU state fails") Cc: stable(a)vger.kernel.org Cc: Eric Biggers <ebiggers(a)google.com> Cc: Rik van Riel <riel(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: Chang S. Bae <chang.seok.bae(a)intel.com> --- b/arch/x86/mm/extable.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff -puN arch/x86/mm/extable.c~fixup-fpu-gp-ip-later arch/x86/mm/extable.c --- a/arch/x86/mm/extable.c~fixup-fpu-gp-ip-later 2025-06-18 12:21:30.231719499 -0700 +++ b/arch/x86/mm/extable.c 2025-06-18 12:25:53.979954060 -0700 @@ -122,11 +122,11 @@ static bool ex_handler_sgx(const struct static bool ex_handler_fprestore(const struct exception_table_entry *fixup, struct pt_regs *regs) { - regs->ip = ex_fixup_addr(fixup); - WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.", (void *)instruction_pointer(regs)); + regs->ip = ex_fixup_addr(fixup); + fpu_reset_from_exception_fixup(); return true; } _

6 days

4
4
0 0

[tip: x86/urgent] x86/traps: Initialize DR6 by writing its architectural reset value

by tip-bot2 for Xin Li (Intel)

The following commit has been merged into the x86/urgent branch of tip: Commit-ID: 5f465c148c61e876b6d6eacd8e8e365f2d47758f Gitweb: https://git.kernel.org/tip/5f465c148c61e876b6d6eacd8e8e365f2d47758f Author: Xin Li (Intel) <xin(a)zytor.com> AuthorDate: Fri, 20 Jun 2025 16:15:03 -07:00 Committer: Dave Hansen <dave.hansen(a)linux.intel.com> CommitterDate: Tue, 24 Jun 2025 13:15:51 -07:00 x86/traps: Initialize DR6 by writing its architectural reset value Initialize DR6 by writing its architectural reset value to avoid incorrectly zeroing DR6 to clear DR6.BLD at boot time, which leads to a false bus lock detected warning. The Intel SDM says: 1) Certain debug exceptions may clear bits 0-3 of DR6. 2) BLD induced #DB clears DR6.BLD and any other debug exception doesn't modify DR6.BLD. 3) RTM induced #DB clears DR6.RTM and any other debug exception sets DR6.RTM. To avoid confusion in identifying debug exceptions, debug handlers should set DR6.BLD and DR6.RTM, and clear other DR6 bits before returning. The DR6 architectural reset value 0xFFFF0FF0, already defined as macro DR6_RESERVED, satisfies these requirements, so just use it to reinitialize DR6 whenever needed. Since clear_all_debug_regs() no longer zeros all debug registers, rename it to initialize_debug_regs() to better reflect its current behavior. Since debug_read_clear_dr6() no longer clears DR6, rename it to debug_read_reset_dr6() to better reflect its current behavior. Fixes: ebb1064e7c2e9 ("x86/traps: Handle #DB for bus lock") Reported-by: Sohil Mehta <sohil.mehta(a)intel.com> Suggested-by: H. Peter Anvin (Intel) <hpa(a)zytor.com> Signed-off-by: Xin Li (Intel) <xin(a)zytor.com> Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com> Reviewed-by: H. Peter Anvin (Intel) <hpa(a)zytor.com> Reviewed-by: Sohil Mehta <sohil.mehta(a)intel.com> Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Tested-by: Sohil Mehta <sohil.mehta(a)intel.com> Link: https://lore.kernel.org/lkml/06e68373-a92b-472e-8fd9-ba548119770c@intel.com/ Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250620231504.2676902-2-xin%40zytor.com --- arch/x86/include/uapi/asm/debugreg.h | 21 ++++++++++++++++- arch/x86/kernel/cpu/common.c | 24 +++++++------------ arch/x86/kernel/traps.c | 34 ++++++++++++++++----------- 3 files changed, 51 insertions(+), 28 deletions(-) diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h index 0007ba0..41da492 100644 --- a/arch/x86/include/uapi/asm/debugreg.h +++ b/arch/x86/include/uapi/asm/debugreg.h @@ -15,7 +15,26 @@ which debugging register was responsible for the trap. The other bits are either reserved or not of interest to us. */ -/* Define reserved bits in DR6 which are always set to 1 */ +/* + * Define bits in DR6 which are set to 1 by default. + * + * This is also the DR6 architectural value following Power-up, Reset or INIT. + * + * Note, with the introduction of Bus Lock Detection (BLD) and Restricted + * Transactional Memory (RTM), the DR6 register has been modified: + * + * 1) BLD flag (bit 11) is no longer reserved to 1 if the CPU supports + * Bus Lock Detection. The assertion of a bus lock could clear it. + * + * 2) RTM flag (bit 16) is no longer reserved to 1 if the CPU supports + * restricted transactional memory. #DB occurred inside an RTM region + * could clear it. + * + * Apparently, DR6.BLD and DR6.RTM are active low bits. + * + * As a result, DR6_RESERVED is an incorrect name now, but it is kept for + * compatibility. + */ #define DR6_RESERVED (0xFFFF0FF0) #define DR_TRAP0 (0x1) /* db0 */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 8feb8fd..0f6c280 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -2243,20 +2243,16 @@ EXPORT_PER_CPU_SYMBOL(__stack_chk_guard); #endif #endif -/* - * Clear all 6 debug registers: - */ -static void clear_all_debug_regs(void) +static void initialize_debug_regs(void) { - int i; - - for (i = 0; i < 8; i++) { - /* Ignore db4, db5 */ - if ((i == 4) || (i == 5)) - continue; - - set_debugreg(0, i); - } + /* Control register first -- to make sure everything is disabled. */ + set_debugreg(0, 7); + set_debugreg(DR6_RESERVED, 6); + /* dr5 and dr4 don't exist */ + set_debugreg(0, 3); + set_debugreg(0, 2); + set_debugreg(0, 1); + set_debugreg(0, 0); } #ifdef CONFIG_KGDB @@ -2417,7 +2413,7 @@ void cpu_init(void) load_mm_ldt(&init_mm); - clear_all_debug_regs(); + initialize_debug_regs(); dbg_restore_debug_regs(); doublefault_init_cpu_tss(); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index c5c897a..36354b4 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -1022,24 +1022,32 @@ static bool is_sysenter_singlestep(struct pt_regs *regs) #endif } -static __always_inline unsigned long debug_read_clear_dr6(void) +static __always_inline unsigned long debug_read_reset_dr6(void) { unsigned long dr6; + get_debugreg(dr6, 6); + dr6 ^= DR6_RESERVED; /* Flip to positive polarity */ + /* * The Intel SDM says: * - * Certain debug exceptions may clear bits 0-3. The remaining - * contents of the DR6 register are never cleared by the - * processor. To avoid confusion in identifying debug - * exceptions, debug handlers should clear the register before - * returning to the interrupted task. + * Certain debug exceptions may clear bits 0-3 of DR6. + * + * BLD induced #DB clears DR6.BLD and any other debug + * exception doesn't modify DR6.BLD. * - * Keep it simple: clear DR6 immediately. + * RTM induced #DB clears DR6.RTM and any other debug + * exception sets DR6.RTM. + * + * To avoid confusion in identifying debug exceptions, + * debug handlers should set DR6.BLD and DR6.RTM, and + * clear other DR6 bits before returning. + * + * Keep it simple: write DR6 with its architectural reset + * value 0xFFFF0FF0, defined as DR6_RESERVED, immediately. */ - get_debugreg(dr6, 6); set_debugreg(DR6_RESERVED, 6); - dr6 ^= DR6_RESERVED; /* Flip to positive polarity */ return dr6; } @@ -1239,13 +1247,13 @@ out: /* IST stack entry */ DEFINE_IDTENTRY_DEBUG(exc_debug) { - exc_debug_kernel(regs, debug_read_clear_dr6()); + exc_debug_kernel(regs, debug_read_reset_dr6()); } /* User entry, runs on regular task stack */ DEFINE_IDTENTRY_DEBUG_USER(exc_debug) { - exc_debug_user(regs, debug_read_clear_dr6()); + exc_debug_user(regs, debug_read_reset_dr6()); } #ifdef CONFIG_X86_FRED @@ -1264,7 +1272,7 @@ DEFINE_FREDENTRY_DEBUG(exc_debug) { /* * FRED #DB stores DR6 on the stack in the format which - * debug_read_clear_dr6() returns for the IDT entry points. + * debug_read_reset_dr6() returns for the IDT entry points. */ unsigned long dr6 = fred_event_data(regs); @@ -1279,7 +1287,7 @@ DEFINE_FREDENTRY_DEBUG(exc_debug) /* 32 bit does not have separate entry points. */ DEFINE_IDTENTRY_RAW(exc_debug) { - unsigned long dr6 = debug_read_clear_dr6(); + unsigned long dr6 = debug_read_reset_dr6(); if (user_mode(regs)) exc_debug_user(regs, dr6);

6 days

1
0
0 0

+ maple_tree-fix-mt_destroy_walk-on-root-leaf-node.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: maple_tree: fix mt_destroy_walk() on root leaf node has been added to the -mm mm-hotfixes-unstable branch. Its filename is maple_tree-fix-mt_destroy_walk-on-root-leaf-node.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Wei Yang <richard.weiyang(a)gmail.com> Subject: maple_tree: fix mt_destroy_walk() on root leaf node Date: Tue, 24 Jun 2025 15:18:40 -0400 On destroy, we should set each node dead. But current code miss this when the maple tree has only the root node. The reason is mt_destroy_walk() leverage mte_destroy_descend() to set node dead, but this is skipped since the only root node is a leaf. Fixes this by setting the node dead if it is a leaf. Link: https://lore.kernel.org/all/20250407231354.11771-1-richard.weiyang@gmail.co… Link: https://lkml.kernel.org/r/20250624191841.64682-1-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com> Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- lib/maple_tree.c | 1 + 1 file changed, 1 insertion(+) --- a/lib/maple_tree.c~maple_tree-fix-mt_destroy_walk-on-root-leaf-node +++ a/lib/maple_tree.c @@ -5319,6 +5319,7 @@ static void mt_destroy_walk(struct maple struct maple_enode *start; if (mte_is_leaf(enode)) { + mte_set_node_dead(enode); node->type = mte_node_type(enode); goto free_leaf; } _ Patches currently in -mm which might be from richard.weiyang(a)gmail.com are maple_tree-fix-mt_destroy_walk-on-root-leaf-node.patch maple_tree-assert-retrieving-new-value-on-a-tree-containing-just-a-leaf-node.patch

6 days, 1 hour

1
0
0 0

[PATCH v3] nvmem: imx-ocotp: fix MAC address byte length

by Steffen Bätz

The commit "13bcd440f2ff nvmem: core: verify cell's raw_len" caused an extension of the "mac-address" cell from 6 to 8 bytes due to word_size of 4 bytes. Thus, the required byte swap for the mac-address of the full buffer length, caused an trucation of the read mac-address. From the original address 70:B3:D5:14:E9:0E to 00:00:70:B3:D5:14 After swapping only the first 6 bytes, the mac-address is correctly passed to the upper layers. Fixes: 13bcd440f2ff ("nvmem: core: verify cell's raw_len") Cc: stable(a)vger.kernel.org Signed-off-by: Steffen Bätz <steffen(a)innosonix.de> --- v3: - replace magic number 6 with ETH_ALEN - Fix misleading indentation and properly group 'mac-address' statements v2: - Add Cc: stable(a)vger.kernel.org as requested by Greg KH's patch bot drivers/nvmem/imx-ocotp-ele.c | 6 +++++- drivers/nvmem/imx-ocotp.c | 6 +++++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/nvmem/imx-ocotp-ele.c b/drivers/nvmem/imx-ocotp-ele.c index ca6dd71d8a2e..9ef01c91dfa6 100644 --- a/drivers/nvmem/imx-ocotp-ele.c +++ b/drivers/nvmem/imx-ocotp-ele.c @@ -12,6 +12,7 @@ #include <linux/of.h> #include <linux/platform_device.h> #include <linux/slab.h> +#include <linux/if_ether.h> /* ETH_ALEN */ enum fuse_type { FUSE_FSB = BIT(0), @@ -118,9 +119,12 @@ static int imx_ocotp_cell_pp(void *context, const char *id, int index, int i; /* Deal with some post processing of nvmem cell data */ - if (id && !strcmp(id, "mac-address")) + if (id && !strcmp(id, "mac-address")) { + if (bytes > ETH_ALEN) + bytes = ETH_ALEN; for (i = 0; i < bytes / 2; i++) swap(buf[i], buf[bytes - i - 1]); + } return 0; } diff --git a/drivers/nvmem/imx-ocotp.c b/drivers/nvmem/imx-ocotp.c index 79dd4fda0329..1343cafc37cc 100644 --- a/drivers/nvmem/imx-ocotp.c +++ b/drivers/nvmem/imx-ocotp.c @@ -23,6 +23,7 @@ #include <linux/platform_device.h> #include <linux/slab.h> #include <linux/delay.h> +#include <linux/if_ether.h> /* ETH_ALEN */ #define IMX_OCOTP_OFFSET_B0W0 0x400 /* Offset from base address of the * OTP Bank0 Word0 @@ -227,9 +228,12 @@ static int imx_ocotp_cell_pp(void *context, const char *id, int index, int i; /* Deal with some post processing of nvmem cell data */ - if (id && !strcmp(id, "mac-address")) + if (id && !strcmp(id, "mac-address")) { + if (bytes > ETH_ALEN) + bytes = ETH_ALEN; for (i = 0; i < bytes / 2; i++) swap(buf[i], buf[bytes - i - 1]); + } return 0; } -- 2.43.0 -- *innosonix GmbH* Hauptstr. 35 96482 Ahorn central: +49 9561 7459980 www.innosonix.de <http://www.innosonix.de> innosonix GmbH Geschäftsführer: Markus Bätz, Steffen Bätz USt.-IdNr / VAT-Nr.: DE266020313 EORI-Nr.: DE240121536680271 HRB 5192 Coburg WEEE-Reg.-Nr. DE88021242

6 days, 2 hours

2
1
0 0

[PATCH v1] arm64: dts: ti: k3-am62-verdin: Enable pull-ups on I2C buses

by Emanuele Ghidoli

From: Emanuele Ghidoli <emanuele.ghidoli(a)toradex.com> Enable internal bias pull-ups on the SoC-side I2C buses that do not have external pull resistors populated on the SoM. This ensures proper default line levels. Cc: stable(a)vger.kernel.org Fixes: 316b80246b16 ("arm64: dts: ti: add verdin am62") Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli(a)toradex.com> --- arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi b/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi index 1ea8f64b1b3b..bc2289d74774 100644 --- a/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi +++ b/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi @@ -507,16 +507,16 @@ AM62X_IOPAD(0x01ec, PIN_INPUT_PULLUP, 0) /* (A17) I2C1_SDA */ /* SODIMM 12 */ /* Verdin I2C_2_DSI */ pinctrl_i2c2: main-i2c2-default-pins { pinctrl-single,pins = < - AM62X_IOPAD(0x00b0, PIN_INPUT, 1) /* (K22) GPMC0_CSn2.I2C2_SCL */ /* SODIMM 55 */ - AM62X_IOPAD(0x00b4, PIN_INPUT, 1) /* (K24) GPMC0_CSn3.I2C2_SDA */ /* SODIMM 53 */ + AM62X_IOPAD(0x00b0, PIN_INPUT_PULLUP, 1) /* (K22) GPMC0_CSn2.I2C2_SCL */ /* SODIMM 55 */ + AM62X_IOPAD(0x00b4, PIN_INPUT_PULLUP, 1) /* (K24) GPMC0_CSn3.I2C2_SDA */ /* SODIMM 53 */ >; }; /* Verdin I2C_4_CSI */ pinctrl_i2c3: main-i2c3-default-pins { pinctrl-single,pins = < - AM62X_IOPAD(0x01d0, PIN_INPUT, 2) /* (A15) UART0_CTSn.I2C3_SCL */ /* SODIMM 95 */ - AM62X_IOPAD(0x01d4, PIN_INPUT, 2) /* (B15) UART0_RTSn.I2C3_SDA */ /* SODIMM 93 */ + AM62X_IOPAD(0x01d0, PIN_INPUT_PULLUP, 2) /* (A15) UART0_CTSn.I2C3_SCL */ /* SODIMM 95 */ + AM62X_IOPAD(0x01d4, PIN_INPUT_PULLUP, 2) /* (B15) UART0_RTSn.I2C3_SDA */ /* SODIMM 93 */ >; }; @@ -786,8 +786,8 @@ AM62X_MCU_IOPAD(0x0010, PIN_INPUT, 7) /* (C9) MCU_SPI0_D1.MCU_GPIO0_4 */ /* SODI /* Verdin I2C_3_HDMI */ pinctrl_mcu_i2c0: mcu-i2c0-default-pins { pinctrl-single,pins = < - AM62X_MCU_IOPAD(0x0044, PIN_INPUT, 0) /* (A8) MCU_I2C0_SCL */ /* SODIMM 59 */ - AM62X_MCU_IOPAD(0x0048, PIN_INPUT, 0) /* (D10) MCU_I2C0_SDA */ /* SODIMM 57 */ + AM62X_MCU_IOPAD(0x0044, PIN_INPUT_PULLUP, 0) /* (A8) MCU_I2C0_SCL */ /* SODIMM 59 */ + AM62X_MCU_IOPAD(0x0048, PIN_INPUT_PULLUP, 0) /* (D10) MCU_I2C0_SDA */ /* SODIMM 57 */ >; }; -- 2.43.0

6 days, 2 hours

3
2
0 0

[PATCH 1/2] drm/xe: Move DSB l2 flush to a more sensible place

by Matthew Auld

From: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> Flushing l2 is only needed after all data has been written. Fixes: 01570b446939 ("drm/xe/bmg: implement Wa_16023588340") Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> Cc: Matthew Auld <matthew.auld(a)intel.com> Cc: <stable(a)vger.kernel.org> # v6.12+ Reviewed-by: Matthew Auld <matthew.auld(a)intel.com> Signed-off-by: Matthew Auld <matthew.auld(a)intel.com> --- drivers/gpu/drm/xe/display/xe_dsb_buffer.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c index f95375451e2f..9f941fc2e36b 100644 --- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c +++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c @@ -17,10 +17,7 @@ u32 intel_dsb_buffer_ggtt_offset(struct intel_dsb_buffer *dsb_buf) void intel_dsb_buffer_write(struct intel_dsb_buffer *dsb_buf, u32 idx, u32 val) { - struct xe_device *xe = dsb_buf->vma->bo->tile->xe; - iosys_map_wr(&dsb_buf->vma->bo->vmap, idx * 4, u32, val); - xe_device_l2_flush(xe); } u32 intel_dsb_buffer_read(struct intel_dsb_buffer *dsb_buf, u32 idx) @@ -30,12 +27,9 @@ u32 intel_dsb_buffer_read(struct intel_dsb_buffer *dsb_buf, u32 idx) void intel_dsb_buffer_memset(struct intel_dsb_buffer *dsb_buf, u32 idx, u32 val, size_t size) { - struct xe_device *xe = dsb_buf->vma->bo->tile->xe; - WARN_ON(idx > (dsb_buf->buf_size - size) / sizeof(*dsb_buf->cmd_buf)); iosys_map_memset(&dsb_buf->vma->bo->vmap, idx * 4, val, size); - xe_device_l2_flush(xe); } bool intel_dsb_buffer_create(struct intel_crtc *crtc, struct intel_dsb_buffer *dsb_buf, size_t size) @@ -74,9 +68,12 @@ void intel_dsb_buffer_cleanup(struct intel_dsb_buffer *dsb_buf) void intel_dsb_buffer_flush_map(struct intel_dsb_buffer *dsb_buf) { + struct xe_device *xe = dsb_buf->vma->bo->tile->xe; + /* * The memory barrier here is to ensure coherency of DSB vs MMIO, * both for weak ordering archs and discrete cards. */ - xe_device_wmb(dsb_buf->vma->bo->tile->xe); + xe_device_wmb(xe); + xe_device_l2_flush(xe); } -- 2.49.0

6 days, 3 hours

3
7
0 0

FAILED: patch "[PATCH] mm/hugetlb: unshare page tables during VMA split, not before" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 081056dc00a27bccb55ccc3c6f230a3d5fd3f7e0 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025062040-detection-sufferer-7865@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 081056dc00a27bccb55ccc3c6f230a3d5fd3f7e0 Mon Sep 17 00:00:00 2001 From: Jann Horn <jannh(a)google.com> Date: Tue, 27 May 2025 23:23:53 +0200 Subject: [PATCH] mm/hugetlb: unshare page tables during VMA split, not before Currently, __split_vma() triggers hugetlb page table unsharing through vm_ops->may_split(). This happens before the VMA lock and rmap locks are taken - which is too early, it allows racing VMA-locked page faults in our process and racing rmap walks from other processes to cause page tables to be shared again before we actually perform the split. Fix it by explicitly calling into the hugetlb unshare logic from __split_vma() in the same place where THP splitting also happens. At that point, both the VMA and the rmap(s) are write-locked. An annoying detail is that we can now call into the helper hugetlb_unshare_pmds() from two different locking contexts: 1. from hugetlb_split(), holding: - mmap lock (exclusively) - VMA lock - file rmap lock (exclusively) 2. hugetlb_unshare_all_pmds(), which I think is designed to be able to call us with only the mmap lock held (in shared mode), but currently only runs while holding mmap lock (exclusively) and VMA lock Backporting note: This commit fixes a racy protection that was introduced in commit b30c14cd6102 ("hugetlb: unshare some PMDs when splitting VMAs"); that commit claimed to fix an issue introduced in 5.13, but it should actually also go all the way back. [jannh(a)google.com: v2] Link: https://lkml.kernel.org/r/20250528-hugetlb-fixes-splitrace-v2-1-1329349bad1… Link: https://lkml.kernel.org/r/20250528-hugetlb-fixes-splitrace-v2-0-1329349bad1… Link: https://lkml.kernel.org/r/20250527-hugetlb-fixes-splitrace-v1-1-f4136f5ec58… Fixes: 39dde65c9940 ("[PATCH] shared page table for hugetlb page") Signed-off-by: Jann Horn <jannh(a)google.com> Cc: Liam Howlett <liam.howlett(a)oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reviewed-by: Oscar Salvador <osalvador(a)suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: <stable(a)vger.kernel.org> [b30c14cd6102: hugetlb: unshare some PMDs when splitting VMAs] Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 0598f36931de..42f374e828a2 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -279,6 +279,7 @@ bool is_hugetlb_entry_migration(pte_t pte); bool is_hugetlb_entry_hwpoisoned(pte_t pte); void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); void fixup_hugetlb_reservations(struct vm_area_struct *vma); +void hugetlb_split(struct vm_area_struct *vma, unsigned long addr); #else /* !CONFIG_HUGETLB_PAGE */ @@ -476,6 +477,8 @@ static inline void fixup_hugetlb_reservations(struct vm_area_struct *vma) { } +static inline void hugetlb_split(struct vm_area_struct *vma, unsigned long addr) {} + #endif /* !CONFIG_HUGETLB_PAGE */ #ifndef pgd_write diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f0b1d53079f9..7ba020d489d4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -121,7 +121,7 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma); static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma); static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma); static void hugetlb_unshare_pmds(struct vm_area_struct *vma, - unsigned long start, unsigned long end); + unsigned long start, unsigned long end, bool take_locks); static struct resv_map *vma_resv_map(struct vm_area_struct *vma); static void hugetlb_free_folio(struct folio *folio) @@ -5426,26 +5426,40 @@ static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr) { if (addr & ~(huge_page_mask(hstate_vma(vma)))) return -EINVAL; + return 0; +} +void hugetlb_split(struct vm_area_struct *vma, unsigned long addr) +{ /* * PMD sharing is only possible for PUD_SIZE-aligned address ranges * in HugeTLB VMAs. If we will lose PUD_SIZE alignment due to this * split, unshare PMDs in the PUD_SIZE interval surrounding addr now. + * This function is called in the middle of a VMA split operation, with + * MM, VMA and rmap all write-locked to prevent concurrent page table + * walks (except hardware and gup_fast()). */ + vma_assert_write_locked(vma); + i_mmap_assert_write_locked(vma->vm_file->f_mapping); + if (addr & ~PUD_MASK) { - /* - * hugetlb_vm_op_split is called right before we attempt to - * split the VMA. We will need to unshare PMDs in the old and - * new VMAs, so let's unshare before we split. - */ unsigned long floor = addr & PUD_MASK; unsigned long ceil = floor + PUD_SIZE; - if (floor >= vma->vm_start && ceil <= vma->vm_end) - hugetlb_unshare_pmds(vma, floor, ceil); + if (floor >= vma->vm_start && ceil <= vma->vm_end) { + /* + * Locking: + * Use take_locks=false here. + * The file rmap lock is already held. + * The hugetlb VMA lock can't be taken when we already + * hold the file rmap lock, and we don't need it because + * its purpose is to synchronize against concurrent page + * table walks, which are not possible thanks to the + * locks held by our caller. + */ + hugetlb_unshare_pmds(vma, floor, ceil, /* take_locks = */ false); + } } - - return 0; } static unsigned long hugetlb_vm_op_pagesize(struct vm_area_struct *vma) @@ -7885,9 +7899,16 @@ void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int re spin_unlock_irq(&hugetlb_lock); } +/* + * If @take_locks is false, the caller must ensure that no concurrent page table + * access can happen (except for gup_fast() and hardware page walks). + * If @take_locks is true, we take the hugetlb VMA lock (to lock out things like + * concurrent page fault handling) and the file rmap lock. + */ static void hugetlb_unshare_pmds(struct vm_area_struct *vma, unsigned long start, - unsigned long end) + unsigned long end, + bool take_locks) { struct hstate *h = hstate_vma(vma); unsigned long sz = huge_page_size(h); @@ -7911,8 +7932,12 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma, mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, start, end); mmu_notifier_invalidate_range_start(&range); - hugetlb_vma_lock_write(vma); - i_mmap_lock_write(vma->vm_file->f_mapping); + if (take_locks) { + hugetlb_vma_lock_write(vma); + i_mmap_lock_write(vma->vm_file->f_mapping); + } else { + i_mmap_assert_write_locked(vma->vm_file->f_mapping); + } for (address = start; address < end; address += PUD_SIZE) { ptep = hugetlb_walk(vma, address, sz); if (!ptep) @@ -7922,8 +7947,10 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma, spin_unlock(ptl); } flush_hugetlb_tlb_range(vma, start, end); - i_mmap_unlock_write(vma->vm_file->f_mapping); - hugetlb_vma_unlock_write(vma); + if (take_locks) { + i_mmap_unlock_write(vma->vm_file->f_mapping); + hugetlb_vma_unlock_write(vma); + } /* * No need to call mmu_notifier_arch_invalidate_secondary_tlbs(), see * Documentation/mm/mmu_notifier.rst. @@ -7938,7 +7965,8 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma, void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { hugetlb_unshare_pmds(vma, ALIGN(vma->vm_start, PUD_SIZE), - ALIGN_DOWN(vma->vm_end, PUD_SIZE)); + ALIGN_DOWN(vma->vm_end, PUD_SIZE), + /* take_locks = */ true); } /* diff --git a/mm/vma.c b/mm/vma.c index 1c6595f282e5..7ebc9eb608f4 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -539,7 +539,14 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, init_vma_prep(&vp, vma); vp.insert = new; vma_prepare(&vp); + + /* + * Get rid of huge pages and shared page tables straddling the split + * boundary. + */ vma_adjust_trans_huge(vma, vma->vm_start, addr, NULL); + if (is_vm_hugetlb_page(vma)) + hugetlb_split(vma, addr); if (new_below) { vma->vm_start = addr; diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h index 441feb21aa5a..4505b1c31be1 100644 --- a/tools/testing/vma/vma_internal.h +++ b/tools/testing/vma/vma_internal.h @@ -932,6 +932,8 @@ static inline void vma_adjust_trans_huge(struct vm_area_struct *vma, (void)next; } +static inline void hugetlb_split(struct vm_area_struct *, unsigned long) {} + static inline void vma_iter_free(struct vma_iterator *vmi) { mas_destroy(&vmi->mas);

6 days, 4 hours

3
3
0 0

[PATCH] PCI: Extend isolated function probing to LoongArch

by Huacai Chen

Like s390 and the jailhouse hypervisor, LoongArch's PCI architecture allows passing isolated PCI functions to a guest OS instance. So it is possible that there is a multi-function device without function 0 for the host or guest. Allow probing such functions by adding a IS_ENABLED(CONFIG_LOONGARCH) case in the hypervisor_isolated_pci_functions() helper. This is similar to commit 189c6c33ff421def040b9 ("PCI: Extend isolated function probing to s390"). Cc: <stable(a)vger.kernel.org> Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn> --- include/linux/hypervisor.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h index 9efbc54e35e5..be5417303ecf 100644 --- a/include/linux/hypervisor.h +++ b/include/linux/hypervisor.h @@ -37,6 +37,9 @@ static inline bool hypervisor_isolated_pci_functions(void) if (IS_ENABLED(CONFIG_S390)) return true; + if (IS_ENABLED(CONFIG_LOONGARCH)) + return true; + return jailhouse_paravirt(); } -- 2.47.1

6 days, 5 hours

2
1
0 0

[PATCH v2 1/2] rtc: pcf2127: fix SPI command byte for PCF2131

by Elena Popa

PCF2131 was not responding to read/write operations using SPI. PCF2131 has a different command byte definition, compared to PCF2127/29. Added the new command byte definition when PCF2131 is detected. Fixes: afc505bf9039 ("rtc: pcf2127: add support for PCF2131 RTC") Cc: stable(a)vger.kernel.org Signed-off-by: Elena Popa <elena.popa(a)nxp.com> Acked-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com> --- Changes for v2: - explicitly mentioned SPI --- drivers/rtc/rtc-pcf2127.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/rtc/rtc-pcf2127.c b/drivers/rtc/rtc-pcf2127.c index 31c7dca8f469..2c7917bc2a31 100644 --- a/drivers/rtc/rtc-pcf2127.c +++ b/drivers/rtc/rtc-pcf2127.c @@ -1538,6 +1538,11 @@ static int pcf2127_spi_probe(struct spi_device *spi) variant = &pcf21xx_cfg[type]; } + if (variant->type == PCF2131) { + config.read_flag_mask = 0x0; + config.write_flag_mask = 0x0; + } + config.max_register = variant->max_register, regmap = devm_regmap_init_spi(spi, &config); -- 2.34.1

6 days, 7 hours

2
3
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror