February 2025 - Linux-stable-mirror

[PATCH v3] PCI: fix reference leak in pci_alloc_child_bus()

by Ma Ke

When device_register(&child->dev) fails, we should call put_device() to explicitly release child->dev. As the comment of device_register() says, 'NOTE: _Never_ directly free @dev after calling this function, even if it returned an error! Always use put_device() to give up the reference initialized in this function instead.'

Found by code review.

Cc: stable(a)vger.kernel.org
Fixes: 4f535093cf8f ("PCI: Put pci_dev in device tree as early as possible")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v3:
- modified the description as suggested.
Changes in v2:
- added the bug description about the comment of device_add();
- fixed the patch as suggested;
- added Cc and Fixes tags.
---
 drivers/pci/probe.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2e81ab0f5a25..51b78fcda4eb 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1174,7 +1174,10 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
 add_dev:
 	pci_set_bus_msi_domain(child);
 	ret = device_register(&child->dev);
-	WARN_ON(ret < 0);
+	if (WARN_ON(ret < 0)) {
+		put_device(&child->dev);
+		return NULL;
+	}
 
 	pcibios_add_bus(child);
 
-- 
2.25.1
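The rule quoted from device_register() can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct toy_device`, `toy_register()`, `toy_put()` and `toy_alloc_child()` are hypothetical stand-ins for struct device, device_register(), put_device() and pci_alloc_child_bus(). The point it shows is that the reference taken at initialization is held even when registration fails, so the error path must drop it and let the release callback free the object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device: a refcount plus a release hook. */
struct toy_device {
	int refcount;
	bool registered;
	void (*release)(struct toy_device *dev);
};

static int toy_releases; /* how many objects release() has freed */

static void toy_release(struct toy_device *dev)
{
	toy_releases++;
	free(dev);
}

/* Models put_device(): dropping the last reference calls release(). */
static void toy_put(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}

/*
 * Models device_register(): initialize (refcount = 1), then "add".
 * 'fail_add' simulates the add step failing; the initial reference
 * is held either way, which is why the caller must put it.
 */
static int toy_register(struct toy_device *dev, bool fail_add)
{
	dev->refcount = 1;
	dev->release = toy_release;
	if (fail_add)
		return -1;
	dev->registered = true;
	return 0;
}

/* Caller pattern from the patch: on failure, put instead of free. */
static struct toy_device *toy_alloc_child(bool fail_add)
{
	struct toy_device *child = calloc(1, sizeof(*child));

	if (!child)
		return NULL;
	if (toy_register(child, fail_add) < 0) {
		toy_put(child); /* release() frees child; never free() it here */
		return NULL;
	}
	return child;
}
```

With `fail_add` set, `toy_alloc_child()` returns NULL and the release hook has run exactly once; calling `free()` directly instead would bypass whatever cleanup the release hook is responsible for.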

[PATCH v2 RESEND] PCI: fix reference leak in pci_register_host_bridge()

by Ma Ke

If device_register() fails, we should call put_device() to decrement the reference count for cleanup. Otherwise it could cause a memory leak.

device_register() includes device_add(). As the comment of device_add() says, 'if device_add() succeeds, you should call device_del() when you want to get rid of it. If device_add() has not succeeded, use only put_device() to drop the reference count'.

Found by code review.

Cc: stable(a)vger.kernel.org
Fixes: 37d6a0a6f470 ("PCI: Add pci_register_host_bridge() interface")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- modified the patch description.
---
 drivers/pci/probe.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 246744d8d268..7b1d7ce3a83e 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1018,8 +1018,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
 	name = dev_name(&bus->dev);
 
 	err = device_register(&bus->dev);
-	if (err)
+	if (err) {
+		put_device(&bus->dev);
 		goto unregister;
+	}
 
 	pcibios_add_bus(bus);
 
-- 
2.25.1
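The device_add() comment quoted above distinguishes two teardown paths. The following is a minimal toy model of that rule, assuming hypothetical helpers `toy_add()`, `toy_del()` and `toy_put()` in place of device_add(), device_del() and put_device(); it is not the driver core's implementation. After a successful add, teardown is del followed by put; after a failed add, only put is valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the two teardown paths described above. */
struct toy_dev {
	int refcount;  /* 1 after "initialize", like device_initialize() */
	bool added;    /* true only while a successful "add" is in effect */
	bool released; /* set when the last reference is dropped */
};

static int toy_add(struct toy_dev *d, bool fail)
{
	if (fail)
		return -1;
	d->added = true;
	return 0;
}

static void toy_del(struct toy_dev *d)
{
	assert(d->added); /* del is only valid after a successful add */
	d->added = false;
}

static void toy_put(struct toy_dev *d)
{
	if (--d->refcount == 0)
		d->released = true;
}

/* Register, then tear down the way the device_add() comment prescribes. */
static bool toy_register_then_teardown(bool fail_add)
{
	struct toy_dev d = { .refcount = 1 };
	int err = toy_add(&d, fail_add);

	if (!err)
		toy_del(&d); /* add succeeded: undo it first ... */
	toy_put(&d);         /* ... then, in both cases, drop the reference */
	return d.released && !d.added;
}
```

The patch covers the `fail_add` case: before it, the error path jumped to `unregister` without the `toy_put()` step, so the object was never released.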

[PATCH] Revert "drivers/card_reader/rtsx_usb: Restore interrupt based detection"

by Christian Heusel

This reverts commit 235b630eda072d7e7b102ab346d6b8a2c028a772.

This commit was found to be responsible for issues with SD card recognition: users had to re-insert their cards in the readers and wait for a while. Since some people's boot process involved the SD card, it also caused boot failures.

Cc: stable(a)vger.kernel.org
Link: https://bbs.archlinux.org/viewtopic.php?id=303321
Fixes: 235b630eda07 ("drivers/card_reader/rtsx_usb: Restore interrupt based detection")
Reported-by: qf <quintafeira(a)tutanota.com>
Closes: https://lore.kernel.org/all/1de87dfa-1e81-45b7-8dcb-ad86c21d5352@heusel.eu
Signed-off-by: Christian Heusel <christian(a)heusel.eu>
---
 drivers/misc/cardreader/rtsx_usb.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/drivers/misc/cardreader/rtsx_usb.c b/drivers/misc/cardreader/rtsx_usb.c
index e0174da5e9fc39ae96b70ce70d57a87dfaa2ebdb..77b0490a1b38d79134d48020bd49a9fa6f0df967 100644
--- a/drivers/misc/cardreader/rtsx_usb.c
+++ b/drivers/misc/cardreader/rtsx_usb.c
@@ -286,7 +286,6 @@ static int rtsx_usb_get_status_with_bulk(struct rtsx_ucr *ucr, u16 *status)
 int rtsx_usb_get_card_status(struct rtsx_ucr *ucr, u16 *status)
 {
 	int ret;
-	u8 interrupt_val = 0;
 	u16 *buf;
 
 	if (!status)
@@ -309,20 +308,6 @@ int rtsx_usb_get_card_status(struct rtsx_ucr *ucr, u16 *status)
 		ret = rtsx_usb_get_status_with_bulk(ucr, status);
 	}
 
-	rtsx_usb_read_register(ucr, CARD_INT_PEND, &interrupt_val);
-	/* Cross check presence with interrupts */
-	if (*status & XD_CD)
-		if (!(interrupt_val & XD_INT))
-			*status &= ~XD_CD;
-
-	if (*status & SD_CD)
-		if (!(interrupt_val & SD_INT))
-			*status &= ~SD_CD;
-
-	if (*status & MS_CD)
-		if (!(interrupt_val & MS_INT))
-			*status &= ~MS_CD;
-
 	/* usb_control_msg may return positive when success */
 	if (ret < 0)
 		return ret;

---
base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6
change-id: 20250224-revert-sdcard-patch-f7a7453d4d8a

Best regards,
-- 
Christian Heusel <christian(a)heusel.eu>
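The cross-check removed by this revert can be isolated as a pure function to show its effect. This is a sketch: the bit values below are hypothetical (the real XD_CD/SD_CD/MS_CD and XD_INT/SD_INT/MS_INT are defined in the driver headers), and `toy_cross_check()` is not a driver function. One plausible reading, not stated in the patch itself, is that a card which is already seated, with no fresh insertion interrupt pending, gets reported as absent. That would fit the report that users had to re-insert their cards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the real XD_CD/SD_CD/... live in the driver headers. */
#define TOY_SD_CD  0x01u
#define TOY_MS_CD  0x02u
#define TOY_XD_CD  0x04u
#define TOY_SD_INT 0x10u
#define TOY_MS_INT 0x20u
#define TOY_XD_INT 0x40u

/*
 * Same logic as the removed block: a "card present" bit survives only
 * if the matching interrupt-pending bit is also set.
 */
static uint16_t toy_cross_check(uint16_t status, uint8_t interrupt_val)
{
	if ((status & TOY_XD_CD) && !(interrupt_val & TOY_XD_INT))
		status &= (uint16_t)~TOY_XD_CD;
	if ((status & TOY_SD_CD) && !(interrupt_val & TOY_SD_INT))
		status &= (uint16_t)~TOY_SD_CD;
	if ((status & TOY_MS_CD) && !(interrupt_val & TOY_MS_INT))
		status &= (uint16_t)~TOY_MS_CD;
	return status;
}
```

For example, `toy_cross_check(TOY_SD_CD, 0)` yields 0: the presence bit reported by the hardware is discarded because no SD interrupt is pending.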

[PATCH] pinctrl: nomadik: Add error handling for find_nmk_gpio_from_pin

by Wentao Liang

When find_nmk_gpio_from_pin() fails to find a valid GPIO chip for the given pin, the bit variable remains uninitialized. This uninitialized value is then passed to __nmk_gpio_set_mode(), leading to undefined behavior and undesired address accesses.

To fix this, add error handling that checks the return value of find_nmk_gpio_from_pin(). If the function fails, log an error message indicating an invalid pin offset and return -EINVAL immediately.

Fixes: 75d270fda64d ("gpio: nomadik: request dynamic ID allocation")
Cc: stable(a)vger.kernel.org # 6.9+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
 drivers/pinctrl/nomadik/pinctrl-nomadik.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pinctrl/nomadik/pinctrl-nomadik.c b/drivers/pinctrl/nomadik/pinctrl-nomadik.c
index f4f10c60c1d2..4155137b0674 100644
--- a/drivers/pinctrl/nomadik/pinctrl-nomadik.c
+++ b/drivers/pinctrl/nomadik/pinctrl-nomadik.c
@@ -985,7 +985,7 @@ static int nmk_gpio_request_enable(struct pinctrl_dev *pctldev,
 				   unsigned int pin)
 {
 	struct nmk_pinctrl *npct = pinctrl_dev_get_drvdata(pctldev);
-	struct nmk_gpio_chip *nmk_chip;
+	struct nmk_gpio_chip *nmk_chip, *r;
 	struct gpio_chip *chip;
 	unsigned int bit;
 
@@ -1002,7 +1002,12 @@ static int nmk_gpio_request_enable(struct pinctrl_dev *pctldev,
 
 	dev_dbg(npct->dev, "enable pin %u as GPIO\n", pin);
 
-	find_nmk_gpio_from_pin(pin, &bit);
+	r = find_nmk_gpio_from_pin(pin, &bit);
+	if (!r) {
+		dev_err(npct->dev,
+			"invalid pin offset %d\n", pin);
+		return -EINVAL;
+	}
 
 	clk_enable(nmk_chip->clk);
 	/* There is no glitch when converting any pin to GPIO */
-- 
2.42.0.windows.2
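The bug is an instance of a general C rule: an out-parameter written only on success must not be read unless the call reports success. Below is a minimal sketch of that rule. `struct toy_chip`, the 32-pins-per-chip layout, `toy_find_chip()` and `toy_request_enable()` are all hypothetical and only loosely modeled on find_nmk_gpio_from_pin() and nmk_gpio_request_enable().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical chip table: each toy chip covers 32 consecutive pins. */
struct toy_chip {
	unsigned int base;
};

static struct toy_chip toy_chips[] = { { 0 }, { 32 } };

/* Lookup in the style of find_nmk_gpio_from_pin(): *bit is written only on success. */
static struct toy_chip *toy_find_chip(unsigned int pin, unsigned int *bit)
{
	for (size_t i = 0; i < sizeof(toy_chips) / sizeof(toy_chips[0]); i++) {
		if (pin >= toy_chips[i].base && pin < toy_chips[i].base + 32) {
			*bit = pin - toy_chips[i].base;
			return &toy_chips[i];
		}
	}
	return NULL; /* *bit is left untouched, i.e. possibly uninitialized */
}

/* Caller: check the return value before trusting the out-parameter. */
static int toy_request_enable(unsigned int pin, unsigned int *bit_out)
{
	unsigned int bit;

	if (!toy_find_chip(pin, &bit))
		return -EINVAL;
	*bit_out = bit;
	return 0;
}
```

For pin 40 the lookup succeeds with bit 8; for pin 100 it fails, and the caller returns -EINVAL without ever reading the stale `bit`.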

[tip: sched/urgent] sched/core: Prevent rescheduling when interrupts are disabled

by tip-bot2 for Thomas Gleixner

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     82c387ef7568c0d96a918a5a78d9cad6256cfa15
Gitweb:        https://git.kernel.org/tip/82c387ef7568c0d96a918a5a78d9cad6256cfa15
Author:        Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate:    Mon, 16 Dec 2024 14:20:56 +01:00
Committer:     Ingo Molnar <mingo(a)kernel.org>
CommitterDate: Thu, 27 Feb 2025 21:13:57 +01:00

sched/core: Prevent rescheduling when interrupts are disabled

David reported a warning observed while loop testing kexec jump:

  Interrupts enabled after irqrouter_resume+0x0/0x50
  WARNING: CPU: 0 PID: 560 at drivers/base/syscore.c:103 syscore_resume+0x18a/0x220
   kernel_kexec+0xf6/0x180
   __do_sys_reboot+0x206/0x250
   do_syscall_64+0x95/0x180

The corresponding interrupt flag trace:

  hardirqs last enabled at (15573): [<ffffffffa8281b8e>] __up_console_sem+0x7e/0x90
  hardirqs last disabled at (15580): [<ffffffffa8281b73>] __up_console_sem+0x63/0x90

That means __up_console_sem() was invoked with interrupts enabled. Further instrumentation revealed that in the interrupt disabled section of kexec jump one of the syscore_suspend() callbacks woke up a task, which set the NEED_RESCHED flag. A later callback in the resume path invoked cond_resched() which in turn led to the invocation of the scheduler:

  __cond_resched+0x21/0x60
  down_timeout+0x18/0x60
  acpi_os_wait_semaphore+0x4c/0x80
  acpi_ut_acquire_mutex+0x3d/0x100
  acpi_ns_get_node+0x27/0x60
  acpi_ns_evaluate+0x1cb/0x2d0
  acpi_rs_set_srs_method_data+0x156/0x190
  acpi_pci_link_set+0x11c/0x290
  irqrouter_resume+0x54/0x60
  syscore_resume+0x6a/0x200
  kernel_kexec+0x145/0x1c0
  __do_sys_reboot+0xeb/0x240
  do_syscall_64+0x95/0x180

This is a long-standing problem, which probably got more visible with the recent printk changes. Something does a task wakeup and the scheduler sets the NEED_RESCHED flag. cond_resched() sees it set and invokes schedule() from a completely bogus context. The scheduler enables interrupts after context switching, which causes the above warning at the end.

Quite a few of the code paths in syscore_suspend()/resume() can result in triggering a wakeup with exactly the same consequences. They might not have done so yet, but as they share a lot of code with normal operations it's just a question of time.

The problem only affects the PREEMPT_NONE and PREEMPT_VOLUNTARY scheduling models. Full preemption is not affected as cond_resched() is disabled and the preemption check preemptible() takes the interrupt disabled flag into account.

Cure the problem by adding a corresponding check into cond_resched().

Reported-by: David Woodhouse <dwmw(a)amazon.co.uk>
Suggested-by: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Tested-by: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/all/7717fe2ac0ce5f0a2c43fdab8b11f4483d54a2a4.camel@…
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9aecd91..6718990 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7285,7 +7285,7 @@ out_unlock:
 #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
 int __sched __cond_resched(void)
 {
-	if (should_resched(0)) {
+	if (should_resched(0) && !irqs_disabled()) {
 		preempt_schedule_common();
 		return 1;
 	}
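The effect of the one-line change can be modeled in userspace as a guard on two flags. This is a deliberately simplified sketch: `struct toy_cpu` and `toy_cond_resched()` are hypothetical, and the real should_resched() also compares preempt_count() against its argument, which is not modeled here. With interrupts disabled, the pending reschedule request is left set rather than acted on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for a toy model of __cond_resched(). */
struct toy_cpu {
	bool need_resched;  /* set by a wakeup, like the NEED_RESCHED flag */
	bool irqs_disabled; /* what irqs_disabled() would report */
	int schedules;      /* how often the toy "scheduler" ran */
};

/* The fixed condition: never enter the scheduler with interrupts disabled. */
static int toy_cond_resched(struct toy_cpu *cpu)
{
	if (cpu->need_resched && !cpu->irqs_disabled) {
		cpu->schedules++;
		cpu->need_resched = false; /* the switch consumes the request */
		return 1;
	}
	return 0;
}
```

In the syscore_resume() scenario from the report, interrupts are disabled, so the call returns 0 and leaves `need_resched` set. Once interrupts are enabled again, the same call reschedules.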

[tip: sched/urgent] sched/core: Prevent rescheduling when interrupts are disabled

by tip-bot2 for Thomas Gleixner

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     c092dc7d88c1214e109591790c9021a0f734677a
Gitweb:        https://git.kernel.org/tip/c092dc7d88c1214e109591790c9021a0f734677a
Author:        Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate:    Mon, 16 Dec 2024 14:20:56 +01:00
Committer:     Ingo Molnar <mingo(a)kernel.org>
CommitterDate: Thu, 27 Feb 2025 20:55:16 +01:00

sched/core: Prevent rescheduling when interrupts are disabled

David reported a warning observed while loop testing kexec jump:

  Interrupts enabled after irqrouter_resume+0x0/0x50
  WARNING: CPU: 0 PID: 560 at drivers/base/syscore.c:103 syscore_resume+0x18a/0x220
   kernel_kexec+0xf6/0x180
   __do_sys_reboot+0x206/0x250
   do_syscall_64+0x95/0x180

The corresponding interrupt flag trace:

  hardirqs last enabled at (15573): [<ffffffffa8281b8e>] __up_console_sem+0x7e/0x90
  hardirqs last disabled at (15580): [<ffffffffa8281b73>] __up_console_sem+0x63/0x90

That means __up_console_sem() was invoked with interrupts enabled. Further instrumentation revealed that in the interrupt disabled section of kexec jump one of the syscore_suspend() callbacks woke up a task, which set the NEED_RESCHED flag. A later callback in the resume path invoked cond_resched() which in turn led to the invocation of the scheduler:

  __cond_resched+0x21/0x60
  down_timeout+0x18/0x60
  acpi_os_wait_semaphore+0x4c/0x80
  acpi_ut_acquire_mutex+0x3d/0x100
  acpi_ns_get_node+0x27/0x60
  acpi_ns_evaluate+0x1cb/0x2d0
  acpi_rs_set_srs_method_data+0x156/0x190
  acpi_pci_link_set+0x11c/0x290
  irqrouter_resume+0x54/0x60
  syscore_resume+0x6a/0x200
  kernel_kexec+0x145/0x1c0
  __do_sys_reboot+0xeb/0x240
  do_syscall_64+0x95/0x180

This is a long-standing problem, which probably got more visible with the recent printk changes. Something does a task wakeup and the scheduler sets the NEED_RESCHED flag. cond_resched() sees it set and invokes schedule() from a completely bogus context. The scheduler enables interrupts after context switching, which causes the above warning at the end.

Quite a few of the code paths in syscore_suspend()/resume() can result in triggering a wakeup with exactly the same consequences. They might not have done so yet, but as they share a lot of code with normal operations it's just a question of time.

The problem only affects the PREEMPT_NONE and PREEMPT_VOLUNTARY scheduling models. Full preemption is not affected as cond_resched() is disabled and the preemption check preemptible() takes the interrupt disabled flag into account.

Cure the problem by adding a corresponding check into cond_resched().

Reported-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Tested-by: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/all/7717fe2ac0ce5f0a2c43fdab8b11f4483d54a2a4.camel@…
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9aecd91..6718990 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7285,7 +7285,7 @@ out_unlock:
 #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
 int __sched __cond_resched(void)
 {
-	if (should_resched(0)) {
+	if (should_resched(0) && !irqs_disabled()) {
 		preempt_schedule_common();
 		return 1;
 	}

[PATCH v6 2/3] x86/tdx: Fix arch_safe_halt() execution for TDX VMs

by Vishal Annapurve

Direct HLT instruction execution causes #VEs for TDX VMs, which are routed to the hypervisor via TDCALL. If HLT is executed in the STI-shadow, the resulting #VE handler will enable interrupts before the TDCALL is routed to the hypervisor, leading to missed wakeup events.

The current TDX spec doesn't expose interruptibility state information that would allow the #VE handler to selectively enable interrupts. To bypass this issue, TDX VMs need to replace "sti;hlt" execution with a direct TDCALL followed by an explicit interrupt flag update.

Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests") prevented the idle routines from executing the HLT instruction in the STI-shadow. But it missed the paravirt routine, which can be reached like this, for example:

  acpi_safe_halt() =>
  raw_safe_halt() =>
  arch_safe_halt() =>
  irq.safe_halt() =>
  pv_native_safe_halt()

To reliably handle arch_safe_halt() for TDX VMs, introduce an explicit dependency on CONFIG_PARAVIRT and override the paravirt halt()/safe_halt() routines with TDX-safe versions that execute a direct TDCALL and the needed interrupt flag updates. Executing a direct TDCALL brings the additional benefit of avoiding HLT-related #VEs altogether.

Cc: stable(a)vger.kernel.org
Fixes: bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
Signed-off-by: Vishal Annapurve <vannapurve(a)google.com>
---
 arch/x86/Kconfig           |  1 +
 arch/x86/coco/tdx/tdx.c    | 26 +++++++++++++++++++++++++-
 arch/x86/include/asm/tdx.h |  2 +-
 arch/x86/kernel/process.c  |  2 +-
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index be2c311f5118..933c046e8966 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -902,6 +902,7 @@ config INTEL_TDX_GUEST
 	depends on X86_64 && CPU_SUP_INTEL
 	depends on X86_X2APIC
 	depends on EFI_STUB
+	depends on PARAVIRT
 	select ARCH_HAS_CC_PLATFORM
 	select X86_MEM_ENCRYPT
 	select X86_MCE
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 32809a06dab4..6aad910d119d 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -14,6 +14,7 @@
 #include <asm/ia32.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
+#include <asm/paravirt_types.h>
 #include <asm/pgtable.h>
 #include <asm/set_memory.h>
 #include <asm/traps.h>
@@ -398,7 +399,7 @@ static int handle_halt(struct ve_info *ve)
 	return ve_instr_len(ve);
 }
 
-void __cpuidle tdx_safe_halt(void)
+void __cpuidle tdx_halt(void)
 {
 	const bool irq_disabled = false;
 
@@ -409,6 +410,16 @@ void __cpuidle tdx_safe_halt(void)
 		WARN_ONCE(1, "HLT instruction emulation failed\n");
 }
 
+static void __cpuidle tdx_safe_halt(void)
+{
+	tdx_halt();
+	/*
+	 * "__cpuidle" section doesn't support instrumentation, so stick
+	 * with raw_* variant that avoids tracing hooks.
+	 */
+	raw_local_irq_enable();
+}
+
 static int read_msr(struct pt_regs *regs, struct ve_info *ve)
 {
 	struct tdx_module_args args = {
@@ -1109,6 +1120,19 @@ void __init tdx_early_init(void)
 	x86_platform.guest.enc_kexec_begin = tdx_kexec_begin;
 	x86_platform.guest.enc_kexec_finish = tdx_kexec_finish;
 
+	/*
+	 * Avoid "sti;hlt" execution in TDX guests as HLT induces a #VE that
+	 * will enable interrupts before HLT TDCALL invocation if executed
+	 * in STI-shadow, possibly resulting in missed wakeup events.
+	 *
+	 * Modify all possible HLT execution paths to use TDX specific routines
+	 * that directly execute TDCALL and toggle the interrupt state as
+	 * needed after TDCALL completion. This also reduces HLT related #VEs
+	 * in addition to having a reliable halt logic execution.
+	 */
+	pv_ops.irq.safe_halt = tdx_safe_halt;
+	pv_ops.irq.halt = tdx_halt;
+
 	/*
 	 * TDX intercepts the RDMSR to read the X2APIC ID in the parallel
 	 * bringup low level code. That raises #VE which cannot be handled
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index b4b16dafd55e..393ee2dfaab1 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -58,7 +58,7 @@ void tdx_get_ve_info(struct ve_info *ve);
 
 bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
 
-void tdx_safe_halt(void);
+void tdx_halt(void);
 
 bool tdx_early_handle_ve(struct pt_regs *regs);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6da6769d7254..d11956a178df 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -934,7 +934,7 @@ void __init select_idle_routine(void)
 		static_call_update(x86_idle, mwait_idle);
 	} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
 		pr_info("using TDX aware idle routine\n");
-		static_call_update(x86_idle, tdx_safe_halt);
+		static_call_update(x86_idle, tdx_halt);
 	} else {
 		static_call_update(x86_idle, default_idle);
 	}
-- 
2.48.1.658.g4767266eb4-goog
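The design in this patch is an ops-table override: the halt hooks in a function-pointer table start with native defaults, and early TDX init swaps them for versions that issue the TDCALL first and enable interrupts afterwards. The sketch below is a userspace toy of that idea only. `struct toy_irq_ops`, the counters and every `toy_*` function are hypothetical. Real HLT, STI and TDCALL are simulated by counter and flag updates, and the sketch does not model the STI-shadow race itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a paravirt-style ops table; every name here is hypothetical. */
struct toy_irq_ops {
	void (*halt)(void);
	void (*safe_halt)(void);
};

static bool toy_irqs_enabled;
static int toy_hlt_count;    /* HLT executions, each of which would raise a #VE */
static int toy_tdcall_count; /* direct TDCALLs */

static void toy_native_halt(void) { toy_hlt_count++; }

static void toy_native_safe_halt(void)
{
	toy_irqs_enabled = true; /* "sti" ... */
	toy_hlt_count++;         /* ... then "hlt" inside the STI shadow */
}

/* TDX-style replacements: halt via a (simulated) direct TDCALL ... */
static void toy_tdx_halt(void) { toy_tdcall_count++; }

/* ... and enable interrupts only after the call has returned. */
static void toy_tdx_safe_halt(void)
{
	toy_tdx_halt();
	toy_irqs_enabled = true;
}

static struct toy_irq_ops toy_pv_irq = {
	.halt = toy_native_halt,
	.safe_halt = toy_native_safe_halt,
};

/* Models the pv_ops override performed in tdx_early_init(). */
static void toy_tdx_early_init(void)
{
	toy_pv_irq.safe_halt = toy_tdx_safe_halt;
	toy_pv_irq.halt = toy_tdx_halt;
}
```

After `toy_tdx_early_init()`, every call through `toy_pv_irq` reaches the TDCALL path and none reaches the simulated HLT. This mirrors the patch's claim that HLT-related #VEs are avoided altogether on these paths.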

[PATCH] LoongArch: Use polling play_dead() when resuming from hibernation

by Huacai Chen

When CONFIG_RANDOM_KMALLOC_CACHES or other randomization infrastructure is enabled, the idle_task's stack may differ between the booting kernel and the target kernel. So when resuming from hibernation, an ACTION_BOOT_CPU IPI wakes up the idle instruction in arch_cpu_idle_dead() and jumps to the interrupt handler. But since the stack pointer has changed, the interrupt handler cannot restore the correct context.

So rename the current arch_cpu_idle_dead() to idle_play_dead(), make it the default version of play_dead(), and have the new arch_cpu_idle_dead() call play_dead() directly. For hibernation, implement an arch-specific hibernate_resume_nonboot_cpu_disable() that selects the polling version of play_dead() (the idle instruction is replaced by nop, and irq is disabled), i.e. poll_play_dead(), to avoid the IPI handler corrupting the idle_task's stack when resuming from hibernation.

This solution is a little similar to commit 406f992e4a372dafbe3c ("x86 / hibernate: Use hlt_play_dead() when resuming from hibernation").

Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
 arch/loongarch/kernel/smp.c | 40 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index fbf747447f13..308478f29278 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -19,6 +19,7 @@
 #include <linux/smp.h>
 #include <linux/threads.h>
 #include <linux/export.h>
+#include <linux/suspend.h>
 #include <linux/syscore_ops.h>
 #include <linux/time.h>
 #include <linux/tracepoint.h>
@@ -423,7 +424,7 @@ void loongson_cpu_die(unsigned int cpu)
 	mb();
 }
 
-void __noreturn arch_cpu_idle_dead(void)
+static void __noreturn idle_play_dead(void)
 {
 	register uint64_t addr;
 	register void (*init_fn)(void);
@@ -447,6 +448,43 @@ void __noreturn arch_cpu_idle_dead(void)
 	BUG();
 }
 
+static void __noreturn poll_play_dead(void)
+{
+	register uint64_t addr;
+	register void (*init_fn)(void);
+
+	idle_task_exit();
+	__this_cpu_write(cpu_state, CPU_DEAD);
+
+	__smp_mb();
+	do {
+		__asm__ __volatile__("nop\n\t");
+		addr = iocsr_read64(LOONGARCH_IOCSR_MBUF0);
+	} while (addr == 0);
+
+	init_fn = (void *)TO_CACHE(addr);
+	iocsr_write32(0xffffffff, LOONGARCH_IOCSR_IPI_CLEAR);
+
+	init_fn();
+	BUG();
+}
+
+static void (*play_dead)(void) = idle_play_dead;
+
+void __noreturn arch_cpu_idle_dead(void)
+{
+	play_dead();
+	BUG(); /* play_dead() doesn't return */
+}
+
+#ifdef CONFIG_HIBERNATION
+int hibernate_resume_nonboot_cpu_disable(void)
+{
+	play_dead = poll_play_dead;
+	return suspend_disable_secondary_cpus();
+}
+#endif
+
 #endif
 
 /*
-- 
2.47.1
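The core of poll_play_dead() is a spin loop on a mailbox: no idle instruction and no IPI handler run while waiting, so nothing executes on the possibly stale stack. The entry point is called once a non-zero value shows up. Below is a single-threaded userspace sketch of that loop, under stated assumptions. The mailbox is simulated by the hypothetical `toy_mailbox_read()`, which reads as empty a fixed number of times before returning a posted function pointer; it stands in for iocsr_read64(LOONGARCH_IOCSR_MBUF0). Unlike the kernel function, the toy returns, and it reports the poll count so it can be checked.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*toy_entry_fn)(void);

/* Hypothetical mailbox: reads as empty until the boot CPU posts an entry point. */
static int toy_empty_reads_left = 3;
static toy_entry_fn toy_posted_entry;

static toy_entry_fn toy_mailbox_read(void)
{
	if (toy_empty_reads_left > 0) {
		toy_empty_reads_left--;
		return NULL;
	}
	return toy_posted_entry;
}

static int toy_entered;
static void toy_secondary_entry(void) { toy_entered = 1; }

/*
 * Polling variant: no idle instruction and no IPI handler, so nothing runs
 * on the (possibly stale) stack while waiting; just spin on the mailbox.
 * Returns the number of polls only so the sketch can be checked.
 */
static int toy_poll_play_dead(void)
{
	toy_entry_fn init_fn;
	int polls = 0;

	do {
		polls++;
		init_fn = toy_mailbox_read();
	} while (init_fn == NULL);

	init_fn(); /* the kernel version never returns from here */
	return polls;
}
```

With three empty reads configured, the loop polls four times and then enters `toy_secondary_entry()`. The patch pairs this loop with a `play_dead` function pointer, so the polling variant is used only on the hibernation-resume path.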

[PATCH] mm: fix finish_fault() handling for large folios

by Brian Geffon

When handling faults for anon shmem, finish_fault() will attempt to install ptes for the entire folio. Unfortunately, if it encounters a single non-pte_none entry in that range, it will bail, even if the pte that triggered the fault is still pte_none. When this happens, the fault will be retried endlessly, never making forward progress.

This patch fixes this behavior: if it detects that a pte in the range is not pte_none, it falls back to setting just the pte for the address that triggered the fault.

Cc: stable(a)vger.kernel.org
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Hugh Dickins <hughd(a)google.com>
Fixes: 43e027e41423 ("mm: memory: extend finish_fault() to support large folio")
Reported-by: Marek Maslanka <mmaslanka(a)google.com>
Signed-off-by: Brian Geffon <bgeffon(a)google.com>
---
 mm/memory.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index b4d3d4893267..32de626ec1da 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5258,9 +5258,22 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 		ret = VM_FAULT_NOPAGE;
 		goto unlock;
 	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
-		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
-		ret = VM_FAULT_NOPAGE;
-		goto unlock;
+		/*
+		 * We encountered a set pte, let's just try to install the
+		 * pte for the original fault if that pte is still pte none.
+		 */
+		pgoff_t idx = (vmf->address - addr) / PAGE_SIZE;
+
+		if (!pte_none(ptep_get_lockless(vmf->pte + idx))) {
+			update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
+			ret = VM_FAULT_NOPAGE;
+			goto unlock;
+		}
+
+		vmf->pte = vmf->pte + idx;
+		page = folio_page(folio, idx);
+		addr = vmf->address;
+		nr_pages = 1;
 	}
 
 	folio_ref_add(folio, nr_pages - 1);
-- 
2.48.1.711.g2feabab25a-goog
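The fallback added here can be modeled on a small array that stands in for the folio-sized pte range. This is a sketch under stated assumptions: `struct toy_fault`, `toy_finish_fault()`, the 4-page folio and the pfn values are all hypothetical. 0 plays the role of pte_none, and returning 0 plays the role of the VM_FAULT_NOPAGE bail-out. The index arithmetic matches the patch: `idx = (address - addr) / PAGE_SIZE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_NR_PAGES  4 /* hypothetical folio size, in pages */

/* Toy page-table slice: 0 plays the role of pte_none, non-zero is a mapping. */
struct toy_fault {
	unsigned long pte[TOY_NR_PAGES];
	unsigned long addr;    /* start of the folio-sized range ('addr' in the patch) */
	unsigned long address; /* faulting address ('vmf->address') */
};

static bool toy_range_none(const unsigned long *pte, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (pte[i] != 0)
			return false;
	return true;
}

/* Returns how many ptes were installed; 0 models the VM_FAULT_NOPAGE bail-out. */
static size_t toy_finish_fault(struct toy_fault *f, unsigned long pfn)
{
	size_t idx = (f->address - f->addr) / TOY_PAGE_SIZE;

	if (toy_range_none(f->pte, TOY_NR_PAGES)) {
		for (size_t i = 0; i < TOY_NR_PAGES; i++)
			f->pte[i] = pfn + i; /* map the whole folio */
		return TOY_NR_PAGES;
	}
	if (f->pte[idx] != 0)
		return 0; /* the faulting pte is already set: nothing to do */
	f->pte[idx] = pfn + idx; /* fallback: map just the faulting page */
	return 1;
}
```

Take a range where pte 1 is already set and the fault hits page 2. The sketch maps only page 2; a retry then finds it populated and returns 0. Without the two-line fallback, the first call would return 0 while leaving page 2 unmapped, and every retry would do the same. That matches the livelock described above.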

[PATCH] xen/pciback: Make missing GSI non-fatal

by Jason Andryuk

A PCI device may not have a legacy IRQ. In that case, do not fail assigning it to the pciback stub; instead, just skip xen_pvh_setup_gsi(). This will leave psdev->gsi == -1. In that case, return -ENOENT when the value is read via IOCTL_PRIVCMD_PCIDEV_GET_GSI. Userspace can use this to distinguish this case from other errors.

Fixes: b166b8ab4189 ("xen/pvh: Setup gsi for passthrough device")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
 drivers/xen/acpi.c                 |  4 ++--
 drivers/xen/xen-pciback/pci_stub.c | 17 ++++++++++-------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index d2ee605c5ca1..d6ab0cb3ba3f 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -101,7 +101,7 @@ int xen_acpi_get_gsi_info(struct pci_dev *dev,
 
 	pin = dev->pin;
 	if (!pin)
-		return -EINVAL;
+		return -ENOENT;
 
 	entry = acpi_pci_irq_lookup(dev, pin);
 	if (entry) {
@@ -116,7 +116,7 @@ int xen_acpi_get_gsi_info(struct pci_dev *dev,
 		gsi = -1;
 
 	if (gsi < 0)
-		return -EINVAL;
+		return -ENOENT;
 
 	*gsi_out = gsi;
 	*trigger_out = trigger;
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index b616b7768c3b..9715c2f70586 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -240,6 +240,9 @@ static int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
 	if (!psdev)
 		return -ENODEV;
 
+	if (psdev->gsi == -1)
+		return -ENOENT;
+
 	return psdev->gsi;
 }
 #endif
@@ -475,14 +478,14 @@ static int pcistub_init_device(struct pcistub_device *psdev)
 #ifdef CONFIG_XEN_ACPI
 	if (xen_initial_domain() && xen_pvh_domain()) {
 		err = xen_acpi_get_gsi_info(dev, &gsi, &trigger, &polarity);
-		if (err) {
-			dev_err(&dev->dev, "Fail to get gsi info!\n");
-			goto config_release;
+		if (err && err != -ENOENT) {
+			dev_err(&dev->dev, "Failed to get gsi info! %d\n", err);
+		} else if (!err) {
+			err = xen_pvh_setup_gsi(gsi, trigger, polarity);
+			if (err)
+				goto config_release;
+			psdev->gsi = gsi;
 		}
-		err = xen_pvh_setup_gsi(gsi, trigger, polarity);
-		if (err)
-			goto config_release;
-		psdev->gsi = gsi;
 	}
 #endif
 
-- 
2.34.1
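The new error contract can be sketched as two small functions: the init path treats a missing GSI as non-fatal and leaves the sentinel -1 in place, and the getter turns that sentinel into -ENOENT for userspace. Everything here is a hypothetical userspace model. `struct toy_psdev`, `toy_get_gsi_info()` (with its made-up pin-to-GSI mapping), `toy_init_device()` and `toy_get_gsi()` only mirror the control flow of xen_acpi_get_gsi_info(), pcistub_init_device() and pcistub_get_gsi_from_sbdf(). The real code calls xen_pvh_setup_gsi() before storing the GSI, and its failure is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Toy stand-in for struct pcistub_device; gsi == -1 means "no GSI". */
struct toy_psdev {
	int gsi;
};

/* Hypothetical lookup: -ENOENT when there is no legacy IRQ pin, else a made-up GSI. */
static int toy_get_gsi_info(int pin, int *gsi_out)
{
	if (!pin)
		return -ENOENT;
	*gsi_out = 16 + pin;
	return 0;
}

/* Mirrors the new control flow in pcistub_init_device(). */
static int toy_init_device(struct toy_psdev *psdev, int pin)
{
	int gsi;
	int err = toy_get_gsi_info(pin, &gsi);

	psdev->gsi = -1; /* assumption: the sentinel is set before the lookup */
	if (err && err != -ENOENT)
		fprintf(stderr, "Failed to get gsi info! %d\n", err); /* logged, not fatal */
	else if (!err)
		psdev->gsi = gsi; /* the real code calls xen_pvh_setup_gsi() first */
	return 0;
}

/* Mirrors pcistub_get_gsi_from_sbdf(): report a missing GSI as -ENOENT. */
static int toy_get_gsi(const struct toy_psdev *psdev)
{
	if (psdev->gsi == -1)
		return -ENOENT;
	return psdev->gsi;
}
```

A device without an interrupt pin still initializes successfully, and a later query returns -ENOENT. Userspace can tell that result apart from -ENODEV (no such device) and from a real GSI.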
