- Linux-stable-mirror - lists.linaro.org

[PATCH v2 1/2] PCI: Forcefully set the PCI_REASSIGN_ALL_BUS flag for Marvell CN96XX/CN10XXX boards

by Bo Sun

On our Marvell OCTEON CN96XX board, we observed the following panic on
the latest kernel:

  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080
  CPU: 22 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6 #20
  Hardware name: Marvell OcteonTX CN96XX board (DT)
  pc : of_pci_add_properties+0x278/0x4c8
  Call trace:
   of_pci_add_properties+0x278/0x4c8 (P)
   of_pci_make_dev_node+0xe0/0x158
   pci_bus_add_device+0x158/0x228
   pci_bus_add_devices+0x40/0x98
   pci_host_probe+0x94/0x118
   pci_host_common_probe+0x130/0x1b0
   platform_probe+0x70/0xf0

The dmesg logs indicated that the PCI bridge was scanning with an invalid
bus range:

  pci-host-generic 878020000000.pci: PCI host bridge to bus 0002:00
  pci_bus 0002:00: root bus resource [bus 00-ff]
  pci 0002:00:00.0: scanning [bus f9-f9] behind bridge, pass 0
  pci 0002:00:01.0: scanning [bus fa-fa] behind bridge, pass 0
  pci 0002:00:02.0: scanning [bus fb-fb] behind bridge, pass 0
  pci 0002:00:03.0: scanning [bus fc-fc] behind bridge, pass 0
  pci 0002:00:04.0: scanning [bus fd-fd] behind bridge, pass 0
  pci 0002:00:05.0: scanning [bus fe-fe] behind bridge, pass 0
  pci 0002:00:06.0: scanning [bus ff-ff] behind bridge, pass 0
  pci 0002:00:07.0: scanning [bus 00-00] behind bridge, pass 0
  pci 0002:00:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
  pci 0002:00:08.0: scanning [bus 01-01] behind bridge, pass 0
  pci 0002:00:09.0: scanning [bus 02-02] behind bridge, pass 0
  pci 0002:00:0a.0: scanning [bus 03-03] behind bridge, pass 0
  pci 0002:00:0b.0: scanning [bus 04-04] behind bridge, pass 0
  pci 0002:00:0c.0: scanning [bus 05-05] behind bridge, pass 0
  pci 0002:00:0d.0: scanning [bus 06-06] behind bridge, pass 0
  pci 0002:00:0e.0: scanning [bus 07-07] behind bridge, pass 0
  pci 0002:00:0f.0: scanning [bus 08-08] behind bridge, pass 0

This regression was introduced by commit 7246a4520b4b ("PCI: Use
preserve_config in place of pci_flags"). On our board, the 0002:00:07.0
bridge is misconfigured by the bootloader. Both its secondary and
subordinate bus numbers are initialized to 0, while its fixed secondary
bus number is set to 8. However, bus number 8 is also assigned to another
bridge (0002:00:0f.0).

Although this is a bootloader issue, before the change in commit
7246a4520b4b, the PCI_REASSIGN_ALL_BUS flag was set by default when
PCI_PROBE_ONLY was not enabled, ensuring that the bus numbers of all
these bridges were reassigned, avoiding any conflicts.

After the change introduced in commit 7246a4520b4b, the bus numbers
assigned by the bootloader are reused by all other bridges, except the
misconfigured 0002:00:07.0 bridge. The kernel attempts to reconfigure
0002:00:07.0 by reusing the fixed secondary bus number 8 assigned by the
bootloader. However, since a pci_bus has already been allocated for
bus 8 due to the probe of 0002:00:0f.0, no new pci_bus is allocated for
0002:00:07.0. This results in a PCI bridge device without a pci_bus
attached (pdev->subordinate == NULL). Consequently, accessing
pdev->subordinate in of_pci_prop_bus_range() leads to a NULL pointer
dereference.

To summarize, we need to set the PCI_REASSIGN_ALL_BUS flag when
PCI_PROBE_ONLY is not enabled in order to work around issues like the
one described above.

Cc: stable(a)vger.kernel.org
Fixes: 7246a4520b4b ("PCI: Use preserve_config in place of pci_flags")
Signed-off-by: Bo Sun <Bo.Sun.CN(a)windriver.com>
---
Changes in v2:
 - Added explicit comment about the quirk, as requested by Mani.
 - Made commit message more clear, as requested by Bjorn.

 drivers/pci/quirks.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 82b21e34c545..cec58c7479e1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6181,6 +6181,23 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
 
+/*
+ * Quirk for Marvell CN96XX/CN10XXX boards:
+ *
+ * Adds PCI_REASSIGN_ALL_BUS unless PCI_PROBE_ONLY is set, forcing bus number
+ * reassignment to avoid conflicts caused by bootloader misconfigured PCI bridges.
+ *
+ * This resolves a regression introduced by commit 7246a4520b4b ("PCI: Use
+ * preserve_config in place of pci_flags"), which removed this behavior.
+ */
+static void quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr(struct pci_dev *dev)
+{
+	if (!pci_has_flag(PCI_PROBE_ONLY))
+		pci_add_flags(PCI_REASSIGN_ALL_BUS);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_CAVIUM, 0xa002,
+			quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr);
+
 #ifdef CONFIG_PCIEASPM
 /*
  * Several Intel DG2 graphics devices advertise that they can only tolerate
-- 
2.48.1
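The failure mode described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every name below (model_bus, model_bridge, model_add_bus, model_scan) is hypothetical, and the two-pass scan is reduced to "claim the firmware number first, then retry unconfigured bridges with their fixed number". It shows how a bridge can end up with a NULL subordinate bus when its number is already taken, and how reassigning all bus numbers avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_BUS 256

/* Hypothetical stand-ins for struct pci_bus and a bridge's struct pci_dev. */
struct model_bus { int number; };

struct model_bridge {
	int fw_secondary;    /* 0 models "left unconfigured by the bootloader" */
	int fixed_secondary; /* bus number pinned for this bridge */
	struct model_bus *subordinate;
};

static struct model_bus model_buses[MODEL_MAX_BUS];
static bool model_bus_used[MODEL_MAX_BUS];

/* Returns a bus object, or NULL when the number is invalid or already claimed. */
static struct model_bus *model_add_bus(int nr)
{
	if (nr <= 0 || nr >= MODEL_MAX_BUS || model_bus_used[nr])
		return NULL;
	model_bus_used[nr] = true;
	model_buses[nr].number = nr;
	return &model_buses[nr];
}

static void model_scan(struct model_bridge *br, size_t n, bool reassign_all)
{
	int next = 1;
	size_t i;

	assert(br != NULL);
	memset(model_bus_used, 0, sizeof(model_bus_used));
	for (i = 0; i < n; i++)
		br[i].subordinate = NULL;

	if (reassign_all) {
		/* Models PCI_REASSIGN_ALL_BUS: ignore firmware numbers entirely. */
		for (i = 0; i < n; i++)
			br[i].subordinate = model_add_bus(next++);
		return;
	}

	/* Pass 0: bridges with a valid firmware configuration keep it. */
	for (i = 0; i < n; i++)
		if (br[i].fw_secondary)
			br[i].subordinate = model_add_bus(br[i].fw_secondary);

	/* Pass 1: unconfigured bridges retry with their fixed number. */
	for (i = 0; i < n; i++)
		if (!br[i].fw_secondary)
			br[i].subordinate = model_add_bus(br[i].fixed_secondary);
}
```

With one bridge modelled as { fw_secondary = 0, fixed_secondary = 8 } and another as { fw_secondary = 8 }, the non-reassigning scan leaves the first bridge's subordinate pointer NULL, which corresponds to the pointer that of_pci_prop_bus_range() dereferences in the report above.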


[merged mm-stable] mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm/hwpoison: do not send SIGBUS to processes with recovered clean pages
has been removed from the -mm tree.  Its filename was
     mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
Subject: mm/hwpoison: do not send SIGBUS to processes with recovered clean pages
Date: Wed, 12 Mar 2025 19:28:51 +0800

When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting an SRAR signature machine check when
the data is about to be consumed.

- Background: why *UN*corrected errors are tied to *C*MCI on Intel platforms [1]

Prior to Icelake, memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action
Optional) signature in the machine check bank.  This was overkill because
it's not an urgent problem that no core is on the verge of consuming that
bad data.  It was also found that multiple SRAO UCEs may cause nested MCE
interrupts and finally become an IERR.

Hence, Intel downgraded the machine check bank signature of patrol scrub
from SRAO to UCNA (Uncorrected, No Action required), and the signal
changed to #CMCI.  Just to add to the confusion, Linux does take an action
(in uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.

- Background: why #CMCI and #MCE race when poison is being consumed on Intel platforms [1]

Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too.  But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read.  So it will do CMCI/UCNA if
an error is found in any read.

Thus:

1) Core is clever and thinks address A is needed soon, issues a
   speculative read.
2) Core finds it is going to use address A soon after sending the read
   request.
3) The CMCI from the memory controller is in a race with MCE from the
   core that will soon try to retire the load from address A.

Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).

- Why the user process is killed for the instr case

Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tries to fix the noisy message "Memory error not
recovered" and skips duplicate SIGBUSs due to the race.  But it also
introduced a bug: kill_accessing_process() returns -EHWPOISON for the
instr case and, as a result, kill_me_maybe() sends a SIGBUS to the user
process.

If the CMCI wins that race, the page is marked poisoned when
uc_decode_notifier() calls memory_failure().  For dirty pages,
memory_failure() invokes try_to_unmap() with the TTU_HWPOISON flag,
converting the PTE to a hwpoison entry.  As a result,
kill_accessing_process():

- calls walk_page_range(), which returns 1 regardless of whether
  try_to_unmap() succeeds or fails,
- calls kill_proc() to make sure a SIGBUS is sent,
- returns -EHWPOISON to indicate that SIGBUS has already been sent to the
  process and kill_me_maybe() doesn't have to send it again.

However, for clean pages, the TTU_HWPOISON flag is cleared, leaving the
PTE unchanged and not converted to a hwpoison entry.  Consequently, for
clean pages where PTE entries are not marked as hwpoison,
kill_accessing_process() returns -EFAULT, causing kill_me_maybe() to send
a SIGBUS.

Console log looks like this:

    Memory failure: 0x827ca68: corrupted page was clean: dropped without side effects
    Memory failure: 0x827ca68: recovery action for clean LRU page: Recovered
    Memory failure: 0x827ca68: already hardware poisoned
    mce: Memory error not recovered

To fix it, return 0 for "corrupted page was clean", preventing an
unnecessary SIGBUS to the user process.

[1] https://lore.kernel.org/lkml/20250217063335.22257-1-xueshuai@linux.alibaba.…
Link: https://lkml.kernel.org/r/20250312112852.82415-3-xueshuai@linux.alibaba.com
Fixes: 046545a661af ("mm/hwpoison: fix error page recovered but reported "not recovered"")
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Tested-by: Tony Luck <tony.luck(a)intel.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ruidong Tian <tianruidong(a)linux.alibaba.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam(a)amd.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/memory-failure.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

--- a/mm/memory-failure.c~mm-hwpoison-do-not-send-sigbus-to-processes-with-recovered-clean-pages
+++ a/mm/memory-failure.c
@@ -881,12 +881,17 @@ static int kill_accessing_process(struct
 	mmap_read_lock(p->mm);
 	ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwpoison_walk_ops,
 			      (void *)&priv);
+	/*
+	 * ret = 1 when CMCI wins, regardless of whether try_to_unmap()
+	 * succeeds or fails, then kill the process with SIGBUS.
+	 * ret = 0 when poison page is a clean page and it's dropped, no
+	 * SIGBUS is needed.
+	 */
 	if (ret == 1 && priv.tk.addr)
 		kill_proc(&priv.tk, pfn, flags);
-	else
-		ret = 0;
 	mmap_read_unlock(p->mm);
-	return ret > 0 ? -EHWPOISON : -EFAULT;
+
+	return ret > 0 ? -EHWPOISON : 0;
 }
 
 /*
_

Patches currently in -mm which might be from xueshuai(a)linux.alibaba.com are
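The effect of the changed return value can be sketched as a tiny userspace model. This is an illustration under assumptions, not the kernel implementation: the function names are hypothetical, the page-table walk is reduced to its integer result, and the caller is simplified to "send SIGBUS unless the result is 0 or -EHWPOISON", which is how the commit message describes kill_me_maybe().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* asm-generic Linux value, used only if <errno.h> lacks it */
#endif

/* walk_ret: 1 = a hwpoison PTE was found, 0 = none (clean page was dropped). */
static int model_kap_old(int walk_ret)
{
	return walk_ret > 0 ? -EHWPOISON : -EFAULT;
}

static int model_kap_new(int walk_ret)
{
	return walk_ret > 0 ? -EHWPOISON : 0;
}

/* Simplified caller: 0 (recovered) and -EHWPOISON (already signalled) are quiet. */
static bool model_caller_sends_sigbus(int ret)
{
	return ret != 0 && ret != -EHWPOISON;
}

/* Usage: only the clean-page case changes behaviour. */
static void model_kap_demo(void)
{
	assert(model_caller_sends_sigbus(model_kap_old(0)));
	assert(!model_caller_sends_sigbus(model_kap_new(0)));
	assert(!model_caller_sends_sigbus(model_kap_old(1)));
	assert(!model_caller_sends_sigbus(model_kap_new(1)));
}
```

In this model the dirty-page path (walk result 1) is unchanged, since kill_proc() has already delivered the signal there; only the clean-page path stops producing a second, unnecessary SIGBUS.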


[merged mm-stable] x86-mce-use-is_copy_from_user-to-determine-copy-from-user-context.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: x86/mce: use is_copy_from_user() to determine copy-from-user context
has been removed from the -mm tree.  Its filename was
     x86-mce-use-is_copy_from_user-to-determine-copy-from-user-context.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
Subject: x86/mce: use is_copy_from_user() to determine copy-from-user context
Date: Wed, 12 Mar 2025 19:28:50 +0800

Patch series "mm/hwpoison: Fix regressions in memory failure handling",
v4.

## 1. What am I trying to do:

This patchset resolves two critical regressions related to memory failure
handling that have appeared in the upstream kernel since version 5.17, as
compared to 5.10 LTS.

    - copyin case: poison found in user page while kernel copying from user space
    - instr case: poison found while instruction fetching in user space

## 2. What is the expected outcome and why

- For copyin case:

The kernel can recover from poison found where the kernel is doing
get_user() or copy_from_user() if those places get an error return and
the kernel returns -EFAULT to the process instead of crashing.  More
specifically, the MCE handler checks the fixup handler type to decide
whether an in-kernel #MC can be recovered.  When EX_TYPE_UACCESS is found,
the PC jumps to the recovery code specified in _ASM_EXTABLE_FAULT() and
returns -EFAULT to user space.

- For instr case:

If poison is found while instruction fetching in user space, full
recovery is possible.  The user process takes #PF, Linux allocates a new
page and fills it by reading from storage.

## 3. What actually happens and why

- For copyin case: kernel panic since v5.17

Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the
extable fixup type for copy-from-user operations, changing it from
EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG.  This breaks the previous
EX_TYPE_UACCESS handling when poison is found in get_user() or
copy_from_user().

- For instr case: user process is killed by a SIGBUS signal due to #CMCI
  and #MCE race

When an uncorrected memory error is consumed there is a race between the
CMCI from the memory controller reporting an uncorrected error with a UCNA
signature, and the core reporting an SRAR signature machine check when
the data is about to be consumed.

### Background: why *UN*corrected errors are tied to *C*MCI on Intel platforms [1]

Prior to Icelake, memory controllers reported patrol scrub events that
detected a previously unseen uncorrected error in memory by signaling a
broadcast machine check with an SRAO (Software Recoverable Action
Optional) signature in the machine check bank.  This was overkill because
it's not an urgent problem that no core is on the verge of consuming that
bad data.  It was also found that multiple SRAO UCEs may cause nested MCE
interrupts and finally become an IERR.

Hence, Intel downgraded the machine check bank signature of patrol scrub
from SRAO to UCNA (Uncorrected, No Action required), and the signal
changed to #CMCI.  Just to add to the confusion, Linux does take an action
(in uc_decode_notifier()) to try to offline the page despite the UC*NA*
signature name.

### Background: why #CMCI and #MCE race when poison is being consumed on Intel platforms [1]

Having decided that CMCI/UCNA is the best action for patrol scrub errors,
the memory controller uses it for reads too.  But the memory controller is
executing asynchronously from the core, and can't tell the difference
between a "real" read and a speculative read.  So it will do CMCI/UCNA if
an error is found in any read.

Thus:

1) Core is clever and thinks address A is needed soon, issues a
   speculative read.
2) Core finds it is going to use address A soon after sending the read
   request.
3) The CMCI from the memory controller is in a race with MCE from the
   core that will soon try to retire the load from address A.

Quite often (because speculation has got better) the CMCI from the memory
controller is delivered before the core is committed to the instruction
reading address A, so the interrupt is taken, and Linux offlines the page
(marking it as poison).

### Why the user process is killed for the instr case

Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported
"not recovered"") tries to fix the noisy message "Memory error not
recovered" and skips duplicate SIGBUSs due to the race.  But it also
introduced a bug: kill_accessing_process() returns -EHWPOISON for the
instr case and, as a result, kill_me_maybe() sends a SIGBUS to the user
process.

## 4. The fix, in my opinion, should be:

- For copyin case:

The key point is whether the error context is in a read from user memory.
We do not care about the ex-type if we know it's a MOV reading from
userspace.

is_copy_from_user() returns true when both of the following checks are
true:

    - the current instruction is a copy
    - the source address is user memory

If copy_user is true, we set

    m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;

Then do_machine_check() will try fixup_exception() first.

- For instr case: let kill_accessing_process() return 0 to prevent a
  SIGBUS.

- For patch 3:

The return value of memory_failure() proved quite important while
discussing the instr case regression with Tony and Miaohe for patch 2, so
add a comment about the return value.


This patch (of 3):

Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new
extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the
extable fixup type for copy-from-user operations, changing it from
EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG.  Consequently, the error context
for copy-from-user operations no longer functions as an in-kernel
recovery context, resulting in kernel panics with the message: "Machine
check: Data load in unrecoverable area of kernel."

To address this, it is crucial to identify whether an error context
involves a read operation from user memory.  The function
is_copy_from_user() can be used to determine that:

    - the current operation is a copy
    - it is reading user memory

When these conditions are met, is_copy_from_user() will return true,
confirming that it is indeed a direct copy from user memory.  This check
is essential for correctly handling the context of errors in these
operations without relying on the extable fixup types that previously
allowed for in-kernel recovery.

So, use is_copy_from_user() to determine if a context is copy user
directly.

Link: https://lkml.kernel.org/r/20250312112852.82415-1-xueshuai@linux.alibaba.com
Link: https://lkml.kernel.org/r/20250312112852.82415-2-xueshuai@linux.alibaba.com
Fixes: 4c132d1d844a ("x86/futex: Remove .fixup usage")
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz(a)infradead.org>
Acked-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Tested-by: Tony Luck <tony.luck(a)intel.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Ruidong Tian <tianruidong(a)linux.alibaba.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam(a)amd.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 arch/x86/kernel/cpu/mce/severity.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/cpu/mce/severity.c~x86-mce-use-is_copy_from_user-to-determine-copy-from-user-context
+++ a/arch/x86/kernel/cpu/mce/severity.c
@@ -300,13 +300,12 @@ static noinstr int error_context(struct
 	copy_user  = is_copy_from_user(regs);
 	instrumentation_end();
 
-	switch (fixup_type) {
-	case EX_TYPE_UACCESS:
-		if (!copy_user)
-			return IN_KERNEL;
-		m->kflags |= MCE_IN_KERNEL_COPYIN;
-		fallthrough;
+	if (copy_user) {
+		m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
+		return IN_KERNEL_RECOV;
+	}
 
+	switch (fixup_type) {
 	case EX_TYPE_FAULT_MCE_SAFE:
 	case EX_TYPE_DEFAULT_MCE_SAFE:
 		m->kflags |= MCE_IN_KERNEL_RECOV;
_

Patches currently in -mm which might be from xueshuai(a)linux.alibaba.com are
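The change in decision order can be sketched as a small userspace model. It is a simplification under assumptions: the enum names are hypothetical stand-ins for the extable fixup types and error contexts, the kflags bookkeeping is omitted, and the default branch (anything else is treated as unrecoverable in-kernel context) is assumed because the hunk above does not show it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the extable fixup types named in the patch. */
enum model_fixup {
	MODEL_EX_UACCESS,
	MODEL_EX_EFAULT_REG,
	MODEL_EX_FAULT_MCE_SAFE,
	MODEL_EX_OTHER,
};

enum model_ctx { MODEL_IN_KERNEL, MODEL_IN_KERNEL_RECOV };

/* Before: a user copy is recoverable only when tagged EX_TYPE_UACCESS. */
static enum model_ctx model_ctx_old(enum model_fixup type, bool copy_user)
{
	switch (type) {
	case MODEL_EX_UACCESS:
		return copy_user ? MODEL_IN_KERNEL_RECOV : MODEL_IN_KERNEL;
	case MODEL_EX_FAULT_MCE_SAFE:
		return MODEL_IN_KERNEL_RECOV;
	default:
		return MODEL_IN_KERNEL; /* assumed default, not shown in the hunk */
	}
}

/* After: the copy-from-user check comes first, whatever the fixup type is. */
static enum model_ctx model_ctx_new(enum model_fixup type, bool copy_user)
{
	if (copy_user)
		return MODEL_IN_KERNEL_RECOV;

	switch (type) {
	case MODEL_EX_FAULT_MCE_SAFE:
		return MODEL_IN_KERNEL_RECOV;
	default:
		return MODEL_IN_KERNEL; /* assumed default, not shown in the hunk */
	}
}

/* Usage: a user copy tagged EX_TYPE_EFAULT_REG becomes recoverable again. */
static void model_ctx_demo(void)
{
	assert(model_ctx_old(MODEL_EX_EFAULT_REG, true) == MODEL_IN_KERNEL);
	assert(model_ctx_new(MODEL_EX_EFAULT_REG, true) == MODEL_IN_KERNEL_RECOV);
}
```

The design point the model tries to capture is that the result no longer depends on which fixup type the copy routine happens to use, only on whether the faulting instruction is a copy reading user memory.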


[merged mm-stable] mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm: add missing release barrier on PGDAT_RECLAIM_LOCKED unlock
has been removed from the -mm tree.  Its filename was
     mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Subject: mm: add missing release barrier on PGDAT_RECLAIM_LOCKED unlock
Date: Wed, 12 Mar 2025 10:10:13 -0400

The PGDAT_RECLAIM_LOCKED bit is used to provide mutual exclusion of node
reclaim for struct pglist_data using a single bit.

It is "locked" with a test_and_set_bit (similarly to a try lock) which
provides full ordering with respect to loads and stores done within
__node_reclaim().

It is "unlocked" with clear_bit(), which does not provide any ordering
with respect to loads and stores done before clearing the bit.

The lack of clear_bit() memory ordering with respect to stores within
__node_reclaim() can cause a subsequent CPU to fail to observe stores from
a prior node reclaim.  This is not an issue in practice on TSO (e.g.
x86), but it is an issue on weakly-ordered architectures (e.g. arm64).

Fix this by using clear_bit_unlock rather than clear_bit to clear
PGDAT_RECLAIM_LOCKED with a release memory ordering semantic.

This provides stronger memory ordering (release rather than relaxed).

Link: https://lkml.kernel.org/r/20250312141014.129725-1-mathieu.desnoyers@efficio…
Fixes: d773ed6b856a ("mm: test and set zone reclaim lock before starting reclaim")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea(a)gmail.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Jade Alglave <j.alglave(a)ucl.ac.uk>
Cc: Luc Maranget <luc.maranget(a)inria.fr>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/vmscan.c~mm-add-missing-release-barrier-on-pgdat_reclaim_locked-unlock
+++ a/mm/vmscan.c
@@ -7581,7 +7581,7 @@ int node_reclaim(struct pglist_data *pgd
 		return NODE_RECLAIM_NOSCAN;
 
 	ret = __node_reclaim(pgdat, gfp_mask, order);
-	clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
+	clear_bit_unlock(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
 
 	if (ret)
 		count_vm_event(PGSCAN_ZONE_RECLAIM_SUCCESS);
_

Patches currently in -mm which might be from mathieu.desnoyers(a)efficios.com are
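The locking pattern described above can be approximated in portable C11 atomics. This is a sketch, not the kernel's bitops: the struct and function names are hypothetical, test_and_set_bit() is approximated by a sequentially consistent fetch-or, and clear_bit_unlock() by a fetch-and with release ordering. The point it illustrates is that the unlock must be a release operation so that plain stores made while holding the bit are visible to the next holder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_RECLAIM_LOCKED 0x1u /* hypothetical stand-in for PGDAT_RECLAIM_LOCKED */

struct model_node {
	atomic_uint flags;
	int reclaim_state; /* plain data protected by the bit lock */
};

/* Try-lock: succeeds only if the bit was previously clear. */
static bool model_node_trylock(struct model_node *n)
{
	unsigned int old = atomic_fetch_or_explicit(&n->flags,
						    MODEL_RECLAIM_LOCKED,
						    memory_order_seq_cst);
	return !(old & MODEL_RECLAIM_LOCKED);
}

/* Unlock with release ordering, the analogue of clear_bit_unlock(). */
static void model_node_unlock(struct model_node *n)
{
	assert(atomic_load_explicit(&n->flags, memory_order_relaxed) &
	       MODEL_RECLAIM_LOCKED);
	atomic_fetch_and_explicit(&n->flags, ~MODEL_RECLAIM_LOCKED,
				  memory_order_release);
}

/* Mirrors the shape of node_reclaim(): skip if busy, else work and unlock. */
static bool model_node_reclaim(struct model_node *n)
{
	if (!model_node_trylock(n))
		return false;
	n->reclaim_state++; /* store that the release in unlock publishes */
	model_node_unlock(n);
	return true;
}
```

Replacing memory_order_release with memory_order_relaxed in model_node_unlock() would give the analogue of the pre-patch clear_bit(): still atomic, but with no guarantee that the reclaim_state store is ordered before the bit is seen as clear.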


[merged mm-stable] mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm/mremap: correctly handle partial mremap() of VMA starting at 0
has been removed from the -mm tree.  Its filename was
     mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: mm/mremap: correctly handle partial mremap() of VMA starting at 0
Date: Mon, 10 Mar 2025 20:50:34 +0000

Patch series "refactor mremap and fix bug", v3.

The existing mremap() logic has grown organically over a very long period
of time, resulting in code that is, in many parts, very difficult to
follow and full of subtleties and sources of confusion.

In addition, it is difficult to thread state through the operation
correctly, as function arguments have expanded, some parameters are
expected to be temporarily altered during the operation, others are
intended to remain static and some can be overridden.

This series completely refactors the mremap implementation, sensibly
separating functions, adding comments to explain the more subtle aspects
of the implementation and making use of small structs to thread state
through everything.

The reason for doing so is to lay the groundwork for planned future
changes to the mremap logic, changes which require the ability to easily
pass around state.

Additionally, it would be unhelpful to add yet more logic to code that is
already difficult to follow without first refactoring it like this.

The first patch in this series additionally fixes a bug when a VMA with
start address zero is partially remapped.

Tested on real hardware under heavy workload and all self tests are
passing.


This patch (of 3):

Consider the case of a partial mremap() (that results in a VMA split) of
an accountable VMA (i.e. which has the VM_ACCOUNT flag set) whose start
address is zero, with the MREMAP_MAYMOVE flag specified and a scenario
where a move does in fact occur:

       addr  end
        |     |
        v     v
    |-------------|
    |     vma     |
    |-------------|
    0

This move is effected by unmapping the range [addr, end).  In order to
prevent an incorrect decrement of accounted memory which has already been
determined, the mremap() code in move_vma() clears VM_ACCOUNT from the VMA
prior to doing so, before reestablishing it in each of the VMAs
post-split:

        addr  end
         |     |
         v     v
     |---|     |---|
     | A |     | B |
     |---|     |---|

Commit 6b73cff239e5 ("mm: change munmap splitting order and move_vma()")
changed this logic such as to determine whether there is a need to do so
by establishing account_start and account_end and, in the instance where
such an operation is required, assigning them to vma->vm_start and
vma->vm_end.

Later the code checks if the operation is required for 'A' referenced
above thusly:

	if (account_start) {
		...
	}

However, if the VMA described above has vma->vm_start == 0, which is now
assigned to account_start, this branch will not be executed.

As a result, the VMA 'A' above will remain stripped of its VM_ACCOUNT
flag, incorrectly.

The fix is to simply convert these variables to booleans and set them as
required.

Link: https://lkml.kernel.org/r/cover.1741639347.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/dc55cb6db25d97c3d9e460de4986a323fa959676.17416393…
Fixes: 6b73cff239e5 ("mm: change munmap splitting order and move_vma()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Harry Yoo <harry.yoo(a)oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/mremap.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/mm/mremap.c~mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0
+++ a/mm/mremap.c
@@ -705,8 +705,8 @@ static unsigned long move_vma(struct vm_
 	unsigned long vm_flags = vma->vm_flags;
 	unsigned long new_pgoff;
 	unsigned long moved_len;
-	unsigned long account_start = 0;
-	unsigned long account_end = 0;
+	bool account_start = false;
+	bool account_end = false;
 	unsigned long hiwater_vm;
 	int err = 0;
 	bool need_rmap_locks;
@@ -790,9 +790,9 @@ static unsigned long move_vma(struct vm_
 	if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
 		vm_flags_clear(vma, VM_ACCOUNT);
 		if (vma->vm_start < old_addr)
-			account_start = vma->vm_start;
+			account_start = true;
 		if (vma->vm_end > old_addr + old_len)
-			account_end = vma->vm_end;
+			account_end = true;
 	}
 
 	/*
@@ -832,7 +832,7 @@ static unsigned long move_vma(struct vm_
 		/* OOM: unable to split vma, just get accounts right */
 		if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
 			vm_acct_memory(old_len >> PAGE_SHIFT);
-		account_start = account_end = 0;
+		account_start = account_end = false;
 	}
 
 	if (vm_flags & VM_LOCKED) {
_

Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
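The bug class is easy to reproduce outside the kernel: a value that can legitimately be zero is used as its own "is set" flag. The following is a minimal sketch with hypothetical names (model_vma and both helper functions are illustrative, not kernel API) that contrasts the address-as-flag test with the boolean version.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Pre-fix logic: stores the start address and later tests it for non-zero. */
static bool model_restore_account_buggy(const struct model_vma *vma,
					unsigned long old_addr)
{
	unsigned long account_start = 0;

	assert(vma->vm_start < vma->vm_end);
	if (vma->vm_start < old_addr)
		account_start = vma->vm_start; /* 0 when the VMA starts at 0 */

	return account_start != 0; /* "if (account_start)" in the original */
}

/* Post-fix logic: records the decision itself, not the address. */
static bool model_restore_account_fixed(const struct model_vma *vma,
					unsigned long old_addr)
{
	bool account_start = false;

	assert(vma->vm_start < vma->vm_end);
	if (vma->vm_start < old_addr)
		account_start = true;

	return account_start;
}
```

For a VMA covering [0, 0x3000) with old_addr = 0x1000, the buggy helper reports that no accounting needs to be restored for the leading part 'A', while the fixed helper reports that it does; for any VMA with a non-zero start the two agree.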


[PATCH RESEND] agp: Fix a potential memory leak bug in agp_amdk7_probe()

by Haoxiang Li

Variable "bridge" is allocated by agp_alloc_bridge() and has to be
released by agp_put_bridge() if something goes wrong. In this patch, add
the missing call to agp_put_bridge() in agp_amdk7_probe() to prevent a
potential memory leak.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
---
 drivers/char/agp/amd-k7-agp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/char/agp/amd-k7-agp.c b/drivers/char/agp/amd-k7-agp.c
index 795c8c9ff680..40e1fc462dca 100644
--- a/drivers/char/agp/amd-k7-agp.c
+++ b/drivers/char/agp/amd-k7-agp.c
@@ -441,6 +441,7 @@ static int agp_amdk7_probe(struct pci_dev *pdev,
 			gfxcard = pci_get_class(PCI_CLASS_DISPLAY_VGA<<8, gfxcard);
 			if (!gfxcard) {
 				dev_info(&pdev->dev, "no AGP VGA controller\n");
+				agp_put_bridge(bridge);
 				return -ENODEV;
 			}
 			cap_ptr = pci_find_capability(gfxcard, PCI_CAP_ID_AGP);
-- 
2.25.1


[PATCH RESEND] pcmcia: fix a potential null pointer dereference in __iodyn_find_io_region()

by Haoxiang Li

Add a check for the return value of pcmcia_make_resource() to prevent a
NULL pointer dereference.

Fixes: 49b1153adfe1 ("pcmcia: move all pcmcia_resource_ops providers into one module")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
---
 drivers/pcmcia/rsrc_iodyn.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pcmcia/rsrc_iodyn.c b/drivers/pcmcia/rsrc_iodyn.c
index b04b16496b0c..2677b577c1f8 100644
--- a/drivers/pcmcia/rsrc_iodyn.c
+++ b/drivers/pcmcia/rsrc_iodyn.c
@@ -62,6 +62,9 @@ static struct resource *__iodyn_find_io_region(struct pcmcia_socket *s,
 	unsigned long min = base;
 	int ret;
 
+	if (!res)
+		return NULL;
+
 	data.mask = align - 1;
 	data.offset = base & data.mask;
 
-- 
2.25.1


[PATCH 2/3] mm/memblock: repeat setting reserved region nid if array is doubled

by Wei Yang

Commit 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") introduced
a way to set the nid on all reserved regions.

But there is a corner case in which some regions are left with an invalid
nid. When memblock_set_node() doubles the array of memblock.reserved, it
may create a new reserved region before the current position. The new
region will be left with an invalid node id.

Repeat the process when this is detected.

Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
CC: Mike Rapoport <rppt(a)kernel.org>
CC: Yajun Deng <yajun.deng(a)linux.dev>
CC: <stable(a)vger.kernel.org>
---
 mm/memblock.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/memblock.c b/mm/memblock.c
index 85442f1b7f14..302dd7bc622d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2184,7 +2184,10 @@ static void __init memmap_init_reserved_pages(void)
 	 * set nid on all reserved pages and also treat struct
 	 * pages for the NOMAP regions as PageReserved
 	 */
+repeat:
 	for_each_mem_region(region) {
+		unsigned long max = memblock.reserved.max;
+
 		nid = memblock_get_region_node(region);
 		start = region->base;
 		end = start + region->size;
@@ -2193,6 +2196,15 @@ static void __init memmap_init_reserved_pages(void)
 			reserve_bootmem_region(start, end, nid);
 
 		memblock_set_node(start, region->size, &memblock.reserved, nid);
+
+		/*
+		 * 'max' is changed means memblock.reserved has been doubled
+		 * its array, which may result a new reserved region before
+		 * current 'start'. Now we should repeat the procedure to set
+		 * its node id.
+		 */
+		if (max != memblock.reserved.max)
+			goto repeat;
 	}
 
 	/*
-- 
2.34.1
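The corner case can be sketched with a small userspace model. Everything here is hypothetical and heavily simplified: model_reserved is not memblock, regions are not kept sorted or split, and the array doubling is simulated by a test knob (grow_on_next_set) that appends one new reserved region with an invalid nid and bumps max, standing in for the reservation of the newly allocated array. The sketch only shows why restarting the walk when max changes lets the late-appearing region receive its node id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NID_INVALID (-1)
#define MODEL_CAP 8

struct model_region { unsigned long base, size; int nid; };

struct model_reserved {
	struct model_region r[MODEL_CAP];
	size_t cnt;
	size_t max;              /* changes when the array "doubles" */
	bool grow_on_next_set;   /* test knob: simulate one doubling */
	unsigned long grow_base; /* where the new array gets reserved */
};

/* Set nid on reserved regions starting inside [start, start + size). */
static void model_set_node(struct model_reserved *res, unsigned long start,
			   unsigned long size, int nid)
{
	size_t i;

	assert(res->cnt <= MODEL_CAP);
	for (i = 0; i < res->cnt; i++)
		if (res->r[i].base >= start && res->r[i].base < start + size)
			res->r[i].nid = nid;

	if (res->grow_on_next_set && res->cnt < MODEL_CAP) {
		res->grow_on_next_set = false;
		res->max *= 2;
		res->r[res->cnt++] = (struct model_region){
			res->grow_base, 0x100, MODEL_NID_INVALID };
	}
}

/* Walk memory regions; optionally restart when the reserved array grew. */
static void model_init_nids(const struct model_region *mem, size_t nmem,
			    struct model_reserved *res, bool repeat_on_growth)
{
	size_t i;

again:
	for (i = 0; i < nmem; i++) {
		size_t max = res->max;

		model_set_node(res, mem[i].base, mem[i].size, mem[i].nid);
		if (repeat_on_growth && max != res->max)
			goto again;
	}
}
```

In this model the restart happens at most once per growth, so the extra cost is bounded by the number of times the array is enlarged.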


[PATCH] m68k: Fix lost column on framebuffer debug console

by Finn Thain

When long lines are sent to the debug console on the framebuffer, the
right-most column is lost. Fix this by subtracting 1 from the column
count before comparing it with console_struct_cur_column, as the latter
counts from zero.

Linewrap is handled with a recursive call to console_putc, but this
alters the console_struct_cur_row global. Store the old value before
calling console_putc, so the right-most character gets rendered on the
correct line.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Finn Thain <fthain(a)linux-m68k.org>
---
 arch/m68k/kernel/head.S | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/kernel/head.S b/arch/m68k/kernel/head.S
index 852255cf60de..9c60047764d0 100644
--- a/arch/m68k/kernel/head.S
+++ b/arch/m68k/kernel/head.S
@@ -3583,11 +3583,16 @@ L(console_not_home):
 	movel	%a0@(Lconsole_struct_cur_column),%d0
 	addql	#1,%a0@(Lconsole_struct_cur_column)
 	movel	%a0@(Lconsole_struct_num_columns),%d1
+	subil	#1,%d1
 	cmpl	%d1,%d0
 	jcs	1f
-	console_putc	#'\n'	/* recursion is OK! */
+	/* recursion will alter console_struct so load d1 register first */
+	movel	%a0@(Lconsole_struct_cur_row),%d1
+	console_putc	#'\n'
+	jmp	2f
 1:
 	movel	%a0@(Lconsole_struct_cur_row),%d1
+2:
 
 	/*
 	 * At this point we make a shift in register usage
-- 
2.45.3


[PATCH net v2 7/7] net: dsa: mv88e6xxx: workaround RGMII transmit delay erratum for 6320 family

by Marek Behún

Implement the workaround for erratum
  3.3 RGMII timing may be out of spec when transmit delay is enabled
for the 6320 family, which says:

  When transmit delay is enabled via Port register 1 bit 14 = 1, duty
  cycle may be out of spec. Under very rare conditions this may cause
  the attached device receive CRC errors.

Signed-off-by: Marek Behún <kabel(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # 5.4.x
---
 drivers/net/dsa/mv88e6xxx/chip.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 88f479dc328c..901929f96b38 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3674,6 +3674,21 @@ static int mv88e6xxx_stats_setup(struct mv88e6xxx_chip *chip)
 	return mv88e6xxx_g1_stats_clear(chip);
 }
 
+static int mv88e6320_setup_errata(struct mv88e6xxx_chip *chip)
+{
+	u16 dummy;
+	int err;
+
+	/* Workaround for erratum
+	 *   3.3 RGMII timing may be out of spec when transmit delay is enabled
+	 */
+	err = mv88e6xxx_port_hidden_write(chip, 0, 0xf, 0x7, 0xe000);
+	if (err)
+		return err;
+
+	return mv88e6xxx_port_hidden_read(chip, 0, 0xf, 0x7, &dummy);
+}
+
 /* Check if the errata has already been applied. */
 static bool mv88e6390_setup_errata_applied(struct mv88e6xxx_chip *chip)
 {
@@ -5130,6 +5145,7 @@ static const struct mv88e6xxx_ops mv88e6290_ops = {
 
 static const struct mv88e6xxx_ops mv88e6320_ops = {
 	/* MV88E6XXX_FAMILY_6320 */
+	.setup_errata = mv88e6320_setup_errata,
 	.ieee_pri_map = mv88e6085_g1_ieee_pri_map,
 	.ip_pri_map = mv88e6085_g1_ip_pri_map,
 	.irl_init_all = mv88e6352_g2_irl_init_all,
@@ -5182,6 +5198,7 @@ static const struct mv88e6xxx_ops mv88e6320_ops = {
 
 static const struct mv88e6xxx_ops mv88e6321_ops = {
 	/* MV88E6XXX_FAMILY_6320 */
+	.setup_errata = mv88e6320_setup_errata,
 	.ieee_pri_map = mv88e6085_g1_ieee_pri_map,
 	.ip_pri_map = mv88e6085_g1_ip_pri_map,
 	.irl_init_all = mv88e6352_g2_irl_init_all,
-- 
2.48.1

