October 2024 - Linux-stable-mirror

[PATCH] mm: remove the newlines, which are added for unknown reasons and interfere with bug analysis

by Jeongjun Park

Looking at the source code links for mm/memory.c in the sample reports in the syzbot report links [1], the line numbers shown appear to be one greater than they should be. This may look like a problem with syzkaller or with the addr2line program that assigns the line numbers, but neither of them is at fault.

The earlier commit d61ea1cb0095 ("userfaultfd: UFFD_FEATURE_WP_ASYNC") added a stray line break at the very first line of mm/memory.c. However, the git.kernel.org site displays the source code with that line break removed, so even though addr2line has assigned the correct line number, it looks as if the line number has increased by 1.

This may seem trivial, but I think the added newline should be removed from the upstream and stable versions, as it is not only incorrect in terms of code style but also hinders bug analysis.

[1]
https://syzkaller.appspot.com/bug?extid=4145b11cdf925264bff4
https://syzkaller.appspot.com/bug?extid=fa43f1b63e3aa6f66329
https://syzkaller.appspot.com/bug?extid=890a1df7294175947697

Fixes: d61ea1cb0095 ("userfaultfd: UFFD_FEATURE_WP_ASYNC")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
---
 mm/memory.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 2366578015ad..7dffe8749014 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1,4 +1,3 @@
-
 // SPDX-License-Identifier: GPL-2.0-only
 /*
  * linux/mm/memory.c
--


+ lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled
     Subject: lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
     lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…

This patch will later appear in the mm-hotfixes-unstable branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Florian Westphal <fw(a)strlen.de>
Subject: lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
Date: Mon, 7 Oct 2024 22:52:24 +0200

Ben Greear reports the following splat:

 ------------[ cut here ]------------
 net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
 WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
 Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
 ...
 Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
 RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
  codetag_unload_module+0x19b/0x2a0
  ? codetag_load_module+0x80/0x80

The nf_nat module exit path calls kfree_rcu on those addresses, but the free operation is likely still pending by the time alloc_tag checks for leaks.

Waiting for outstanding kfree_rcu operations to complete before checking resolves this warning.

Reproducer:
 unshare -n iptables-nft -t nat -A PREROUTING -p tcp
 grep nf_nat /proc/allocinfo # will list 4 allocations
 rmmod nft_chain_nat
 rmmod nf_nat                # will WARN.

Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de
Fixes: a473573964e5 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Reported-by: Ben Greear <greearb(a)candelatech.com>
Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candela…
Cc: Uladzislau Rezki <urezki(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 lib/codetag.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls
+++ a/lib/codetag.c
@@ -228,6 +228,8 @@ bool codetag_unload_module(struct module
 	if (!mod)
 		return true;
 
+	kvfree_rcu_barrier();
+
 	mutex_lock(&codetag_lock);
 	list_for_each_entry(cttype, &codetag_types, link) {
 		struct codetag_module *found = NULL;
_

Patches currently in -mm which might be from fw(a)strlen.de are

lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls.patch
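
The ordering problem in this patch (a leak check that runs while deferred frees are still queued) can be modelled in userspace. What follows is only a minimal sketch under stated assumptions: tracked_alloc(), deferred_free(), drain_deferred() and leak_check() are hypothetical names invented for illustration, not kernel APIs. deferred_free() stands in for kfree_rcu() and drain_deferred() for the kvfree_rcu_barrier() call the patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_PENDING 16

/* Toy model: objects queued to be freed "after a grace period". */
static void *pending[MAX_PENDING];
static size_t npending;
static int live_allocations;	/* what a leak checker would count */

/* Hypothetical helper: allocate and account for the object. */
static void *tracked_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		live_allocations++;
	return p;
}

/* Stand-in for kfree_rcu(): the free is queued, not performed. */
static void deferred_free(void *p)
{
	assert(npending < MAX_PENDING);
	pending[npending++] = p;
}

/* Stand-in for kvfree_rcu_barrier(): run every queued free now. */
static void drain_deferred(void)
{
	for (size_t i = 0; i < npending; i++) {
		free(pending[i]);
		live_allocations--;
	}
	npending = 0;
}

/* Returns how many allocations a leak check would report. */
static int leak_check(int wait_for_pending)
{
	if (wait_for_pending)
		drain_deferred();
	return live_allocations;
}
```

In this model, calling leak_check(0) right after two deferred_free() calls still reports two live objects (a false leak), while leak_check(1) drains the queue first and reports none.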


+ mm-mremap-fix-move_normal_pmd-retract_page_tables-race.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled
     Subject: mm/mremap: fix move_normal_pmd/retract_page_tables race
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
     mm-mremap-fix-move_normal_pmd-retract_page_tables-race.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…

This patch will later appear in the mm-hotfixes-unstable branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/mremap: fix move_normal_pmd/retract_page_tables race
Date: Mon, 07 Oct 2024 23:42:04 +0200

In mremap(), move_page_tables() looks at the type of the PMD entry and the specified address range to figure out by which method the next chunk of page table entries should be moved.

At that point, the mmap_lock is held in write mode, but no rmap locks are held yet. For PMD entries that point to page tables and are fully covered by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called, which first takes rmap locks, then does move_normal_pmd(). move_normal_pmd() takes the necessary page table locks at source and destination, then moves an entire page table from the source to the destination.

The problem is: The rmap locks, which protect against concurrent page table removal by retract_page_tables() in the THP code, are only taken after the PMD entry has been read and it has been decided how to move it. So we can race as follows (with two processes that have mappings of the same tmpfs file that is stored on a tmpfs mount with huge=advise); note that process A accesses page tables through the MM while process B does it through the file rmap:

process A                      process B
=========                      =========
mremap
  mremap_to
    move_vma
      move_page_tables
        get_old_pmd
        alloc_new_pmd
                      *** PREEMPT ***
                               madvise(MADV_COLLAPSE)
                                 do_madvise
                                   madvise_walk_vmas
                                     madvise_vma_behavior
                                       madvise_collapse
                                         hpage_collapse_scan_file
                                           collapse_file
                                             retract_page_tables
                                               i_mmap_lock_read(mapping)
                                               pmdp_collapse_flush
                                               i_mmap_unlock_read(mapping)
        move_pgt_entry(NORMAL_PMD, ...)
          take_rmap_locks
          move_normal_pmd
          drop_rmap_locks

When this happens, move_normal_pmd() can end up creating bogus PMD entries in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`. The effect depends on arch-specific and machine-specific details; on x86, you can end up with physical page 0 mapped as a page table, which is likely exploitable for user->kernel privilege escalation.

Fix the race by letting process A recheck, in move_normal_pmd(), that the PMD still points to a page table after the rmap locks have been taken. Otherwise, we bail and let the caller fall back to the PTE-level copying path, which will then bail immediately at the pmd_none() check.

Bug reachability: Reaching this bug requires that you can create shmem/file THP mappings - anonymous THP uses different code that doesn't zap stuff under rmap locks. File THP is gated on an experimental config flag (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need shmem THP to hit this bug. As far as I know, getting shmem THP normally requires that you can mount your own tmpfs with the right mount flags, which would require creating your own user+mount namespace; though I don't know if some distros maybe enable shmem THP by default or something like that.

Bug impact: This issue can likely be used for user->kernel privilege escalation when it is reachable.

Link: https://lkml.kernel.org/r/20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5…
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Closes: https://project-zero.issues.chromium.org/371047675
Co-developed-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/mremap.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/mm/mremap.c~mm-mremap-fix-move_normal_pmd-retract_page_tables-race
+++ a/mm/mremap.c
@@ -238,6 +238,7 @@ static bool move_normal_pmd(struct vm_ar
 {
 	spinlock_t *old_ptl, *new_ptl;
 	struct mm_struct *mm = vma->vm_mm;
+	bool res = false;
 	pmd_t pmd;
 
 	if (!arch_supports_page_table_move())
@@ -277,19 +278,25 @@ static bool move_normal_pmd(struct vm_ar
 	if (new_ptl != old_ptl)
 		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
 
-	/* Clear the pmd */
 	pmd = *old_pmd;
+
+	/* Racing with collapse? */
+	if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
+		goto out_unlock;
+	/* Clear the pmd */
 	pmd_clear(old_pmd);
+	res = true;
 
 	VM_BUG_ON(!pmd_none(*new_pmd));
 
 	pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
 	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+out_unlock:
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
 	spin_unlock(old_ptl);
 
-	return true;
+	return res;
 }
 #else
 static inline bool move_normal_pmd(struct vm_area_struct *vma,
_

Patches currently in -mm which might be from jannh(a)google.com are

mm-enforce-a-minimal-stack-gap-even-against-inaccessible-vmas.patch
mm-mremap-fix-move_normal_pmd-retract_page_tables-race.patch
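
The shape of this fix (decide without the lock, then re-validate once the lock is held, and bail out if the state changed) can be shown with a small single-threaded model. This is a sketch under stated assumptions, not the kernel code: struct toy_pmd, enum pmd_kind and toy_move_normal_pmd() are hypothetical names, and the locks are imaginary. The "race" is simulated by handing the function an entry that has already been emptied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of what a PMD entry can be. */
enum pmd_kind { PMD_EMPTY, PMD_TABLE, PMD_HUGE_LEAF };

struct toy_pmd {
	enum pmd_kind kind;
	int table_id;	/* meaningful only when kind == PMD_TABLE */
};

/*
 * Model of the patched move: assume the caller holds the (imaginary)
 * rmap and page table locks. The entry is re-read under the lock and
 * moved only if it still points to a page table.
 */
static bool toy_move_normal_pmd(struct toy_pmd *old_pmd,
				struct toy_pmd *new_pmd)
{
	struct toy_pmd pmd = *old_pmd;	/* snapshot taken under the lock */

	/* Racing with collapse? The entry may now be empty or a leaf. */
	if (pmd.kind != PMD_TABLE)
		return false;

	/* Clear the source, then install the table at the destination. */
	old_pmd->kind = PMD_EMPTY;
	old_pmd->table_id = 0;
	assert(new_pmd->kind == PMD_EMPTY);
	*new_pmd = pmd;
	return true;
}
```

A false return corresponds to the fallback described in the commit message: the caller drops to the PTE-level path, which then sees an empty source entry and skips it.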


[to-be-updated] mm-mremap-prevent-racing-change-of-old-pmd-type.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm/mremap: prevent racing change of old pmd type
has been removed from the -mm tree. Its filename was
     mm-mremap-prevent-racing-change-of-old-pmd-type.patch

This patch was dropped because an updated version will be issued

------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/mremap: prevent racing change of old pmd type
Date: Wed, 02 Oct 2024 23:07:06 +0200

Prevent move_normal_pmd() in mremap() from racing with retract_page_tables() in MADV_COLLAPSE such that pmd_populate(mm, new_pmd, pmd_pgtable(pmd)) operates on an empty source pmd, causing creation of a new pmd which maps physical address 0 as a page table.

This bug is only reachable if either CONFIG_READ_ONLY_THP_FOR_FS is set or THP shmem is usable. (Unprivileged namespaces can be used to set up a tmpfs that can contain THP shmem pages with "huge=advise".)

If userspace triggers this bug *in multiple processes*, this could likely be used to create stale TLB entries pointing to freed pages or cause kernel UAF by breaking an invariant the rmap code relies on.

Fix it by moving the rmap locking up so that it covers the span from reading the PMD entry to moving the page table.
Link: https://lkml.kernel.org/r/20241002-move_normal_pmd-vs-collapse-fix-v1-1-782… Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock") Signed-off-by: Jann Horn <jannh(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/mremap.c | 68 +++++++++++++++++++++++++++----------------------- 1 file changed, 38 insertions(+), 30 deletions(-) --- a/mm/mremap.c~mm-mremap-prevent-racing-change-of-old-pmd-type +++ a/mm/mremap.c @@ -136,17 +136,17 @@ static pte_t move_soft_dirty_pte(pte_t p static int move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, unsigned long old_addr, unsigned long old_end, struct vm_area_struct *new_vma, pmd_t *new_pmd, - unsigned long new_addr, bool need_rmap_locks) + unsigned long new_addr) { struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; bool force_flush = false; unsigned long len = old_end - old_addr; - int err = 0; /* - * When need_rmap_locks is true, we take the i_mmap_rwsem and anon_vma + * When need_rmap_locks is true in the caller, we are holding the + * i_mmap_rwsem and anon_vma * locks to ensure that rmap will always observe either the old or the * new ptes. This is the easiest way to avoid races with * truncate_pagecache(), page migration, etc... @@ -163,23 +163,18 @@ static int move_ptes(struct vm_area_stru * serialize access to individual ptes, but only rmap traversal * order guarantees that we won't miss both the old and new ptes). */ - if (need_rmap_locks) - take_rmap_locks(vma); /* * We don't have to worry about the ordering of src and dst * pte locks because exclusive mmap_lock prevents deadlock. 
*/ old_pte = pte_offset_map_lock(mm, old_pmd, old_addr, &old_ptl); - if (!old_pte) { - err = -EAGAIN; - goto out; - } + if (!old_pte) + return -EAGAIN; new_pte = pte_offset_map_nolock(mm, new_pmd, new_addr, &new_ptl); if (!new_pte) { pte_unmap_unlock(old_pte, old_ptl); - err = -EAGAIN; - goto out; + return -EAGAIN; } if (new_ptl != old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); @@ -217,10 +212,7 @@ static int move_ptes(struct vm_area_stru spin_unlock(new_ptl); pte_unmap(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); -out: - if (need_rmap_locks) - drop_rmap_locks(vma); - return err; + return 0; } #ifndef arch_supports_page_table_move @@ -447,17 +439,14 @@ static __always_inline unsigned long get /* * Attempts to speedup the move by moving entry at the level corresponding to * pgt_entry. Returns true if the move was successful, else false. + * rmap locks are held by the caller. */ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma, unsigned long old_addr, unsigned long new_addr, - void *old_entry, void *new_entry, bool need_rmap_locks) + void *old_entry, void *new_entry) { bool moved = false; - /* See comment in move_ptes() */ - if (need_rmap_locks) - take_rmap_locks(vma); - switch (entry) { case NORMAL_PMD: moved = move_normal_pmd(vma, old_addr, new_addr, old_entry, @@ -483,9 +472,6 @@ static bool move_pgt_entry(enum pgt_entr break; } - if (need_rmap_locks) - drop_rmap_locks(vma); - return moved; } @@ -550,6 +536,7 @@ unsigned long move_page_tables(struct vm struct mmu_notifier_range range; pmd_t *old_pmd, *new_pmd; pud_t *old_pud, *new_pud; + int move_res; if (!len) return 0; @@ -573,6 +560,12 @@ unsigned long move_page_tables(struct vm old_addr, old_end); mmu_notifier_invalidate_range_start(&range); + /* + * Hold rmap locks to ensure the type of the old PUD/PMD entry doesn't + * change under us due to khugepaged or folio splitting. 
+ */ + take_rmap_locks(vma); + for (; old_addr < old_end; old_addr += extent, new_addr += extent) { cond_resched(); /* @@ -590,14 +583,14 @@ unsigned long move_page_tables(struct vm if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) { if (extent == HPAGE_PUD_SIZE) { move_pgt_entry(HPAGE_PUD, vma, old_addr, new_addr, - old_pud, new_pud, need_rmap_locks); + old_pud, new_pud); /* We ignore and continue on error? */ continue; } } else if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) { if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr, - old_pud, new_pud, true)) + old_pud, new_pud)) continue; } @@ -613,7 +606,7 @@ again: pmd_devmap(*old_pmd)) { if (extent == HPAGE_PMD_SIZE && move_pgt_entry(HPAGE_PMD, vma, old_addr, new_addr, - old_pmd, new_pmd, need_rmap_locks)) + old_pmd, new_pmd)) continue; split_huge_pmd(vma, old_pmd, old_addr); } else if (IS_ENABLED(CONFIG_HAVE_MOVE_PMD) && @@ -623,17 +616,32 @@ again: * moving at the PMD level if possible. */ if (move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr, - old_pmd, new_pmd, true)) + old_pmd, new_pmd)) continue; } if (pmd_none(*old_pmd)) continue; - if (pte_alloc(new_vma->vm_mm, new_pmd)) + + /* + * Temporarily drop the rmap locks while we do a potentially + * slow move_ptes() operation, unless move_ptes() wants them + * held (see comment inside there). 
+ */ + if (!need_rmap_locks) + drop_rmap_locks(vma); + if (pte_alloc(new_vma->vm_mm, new_pmd)) { + if (!need_rmap_locks) + take_rmap_locks(vma); break; - if (move_ptes(vma, old_pmd, old_addr, old_addr + extent, - new_vma, new_pmd, new_addr, need_rmap_locks) < 0) + } + move_res = move_ptes(vma, old_pmd, old_addr, old_addr + extent, + new_vma, new_pmd, new_addr); + if (!need_rmap_locks) + take_rmap_locks(vma); + if (move_res < 0) goto again; } + drop_rmap_locks(vma); mmu_notifier_invalidate_range_end(&range); _ Patches currently in -mm which might be from jannh(a)google.com are mm-enforce-a-minimal-stack-gap-even-against-inaccessible-vmas.patch mm-mremap-fix-move_normal_pmd-retract_page_tables-race.patch


[net PATCH v2] net: phy: Remove LED entry from LEDs list on unregister

by Christian Marangi

Commit c938ab4da0eb ("net: phy: Manual remove LEDs to ensure correct ordering") correctly fixed a problem with using devm_ but missed removing the LED entry from the LEDs list.

This causes a kernel panic in a specific scenario where the port for the PHY is torn down and brought up again and the kmod for the PHY is removed.

On setting the port down the first time, the associated LEDs are correctly unregistered. The associated kmod for the PHY is then removed. When the kmod is added again and the port is brought up, the associated LEDs are registered again. On putting the port down for the second time after these steps, the LED list has 4 elements: the first 2 were already unregistered previously, and the 2 new ones were registered again.

This causes a kernel panic, as the first 2 elements should have been removed.

Fix this by correctly removing the element when the LED is unregistered.

Reported-by: Daniel Golle <daniel(a)makrotopia.org>
Tested-by: Daniel Golle <daniel(a)makrotopia.org>
Cc: stable(a)vger.kernel.org
Fixes: c938ab4da0eb ("net: phy: Manual remove LEDs to ensure correct ordering")
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
---
Changes v2:
- Drop second patch
- Add Reviewed-by tag

 drivers/net/phy/phy_device.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 560e338b307a..499797646580 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -3326,10 +3326,11 @@ static __maybe_unused int phy_led_hw_is_supported(struct led_classdev *led_cdev,
 
 static void phy_leds_unregister(struct phy_device *phydev)
 {
-	struct phy_led *phyled;
+	struct phy_led *phyled, *tmp;
 
-	list_for_each_entry(phyled, &phydev->leds, list) {
+	list_for_each_entry_safe(phyled, tmp, &phydev->leds, list) {
 		led_classdev_unregister(&phyled->led_cdev);
+		list_del(&phyled->list);
 	}
 }
 
-- 
2.45.2
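
The reason the patch switches to the _safe iterator can be illustrated with a userspace list. The following is a minimal sketch, assuming a hand-rolled circular doubly linked list; struct node and the list_init(), list_add_tail(), list_del(), list_len() and unregister_all() helpers are simplified stand-ins written for this example, not the kernel's <linux/list.h>. The point is that the next pointer is saved before the current entry is unlinked, so removal during the walk is safe.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a list entry carrying some state. */
struct node {
	struct node *prev, *next;
	int registered;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;	/* stale links must not be followed */
}

static size_t list_len(const struct node *head)
{
	size_t n = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* "Safe" walk: remember the successor before unlinking the entry. */
static void unregister_all(struct node *head)
{
	struct node *cur, *tmp;

	for (cur = head->next, tmp = cur->next; cur != head;
	     cur = tmp, tmp = cur->next) {
		cur->registered = 0;
		list_del(cur);
	}
}
```

Because every entry is unlinked when it is unregistered, a later register/unregister cycle starts from an empty list instead of accumulating stale entries, which mirrors the 4-element list described in the commit message.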


+ mm-enforce-a-minimal-stack-gap-even-against-inaccessible-vmas.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled
     Subject: mm: enforce a minimal stack gap even against inaccessible VMAs
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
     mm-enforce-a-minimal-stack-gap-even-against-inaccessible-vmas.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…

This patch will later appear in the mm-hotfixes-unstable branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm: enforce a minimal stack gap even against inaccessible VMAs
Date: Tue, 08 Oct 2024 00:55:39 +0200

As explained in the comment block this change adds, we can't tell what userspace's intent is when the stack grows towards an inaccessible VMA.

I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc that mixes malloc(), pthread creation, and recursion in just the right way such that the main stack overflows into malloc() arena memory.

I don't know of any specific scenario where this is actually exploitable, but it seems like it could be a security problem for sufficiently unlucky userspace.

I believe we should ensure that, as long as code is compiled with something like -fstack-check, a stack overflow in it can never cause the main stack to overflow into adjacent heap memory.
My fix effectively reverts the behavior for !vma_is_accessible() VMAs to the behavior before commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas"), so I think it should be a fairly safe change even in case A. Link: https://lkml.kernel.org/r/20241008-stack-gap-inaccessible-v1-1-848d4d891f21… Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack") Signed-off-by: Jann Horn <jannh(a)google.com> Cc: Ben Hutchings <ben(a)decadent.org.uk> Cc: Helge Deller <deller(a)gmx.de> Cc: Hugh Dickins <hughd(a)google.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Oleg Nesterov <oleg(a)redhat.com> Cc: Rik van Riel <riel(a)redhat.com> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: Willy Tarreau <w(a)1wt.eu> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/mmap.c | 53 +++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 46 insertions(+), 7 deletions(-) --- a/mm/mmap.c~mm-enforce-a-minimal-stack-gap-even-against-inaccessible-vmas +++ a/mm/mmap.c @@ -1064,10 +1064,12 @@ static int expand_upwards(struct vm_area gap_addr = TASK_SIZE; next = find_vma_intersection(mm, vma->vm_end, gap_addr); - if (next && vma_is_accessible(next)) { - if (!(next->vm_flags & VM_GROWSUP)) + if (next && !(next->vm_flags & VM_GROWSUP)) { + /* see comments in expand_downwards() */ + if (vma_is_accessible(prev)) + return -ENOMEM; + if (address == next->vm_start) return -ENOMEM; - /* Check that both stack segments have the same anon_vma? */ } if (next) @@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_stru /* Enforce stack_guard_gap */ prev = vma_prev(&vmi); /* Check that both stack segments have the same anon_vma? 
*/ - if (prev) { - if (!(prev->vm_flags & VM_GROWSDOWN) && - vma_is_accessible(prev) && - (address - prev->vm_end < stack_guard_gap)) + if (prev && !(prev->vm_flags & VM_GROWSDOWN) && + (address - prev->vm_end < stack_guard_gap)) { + /* + * If the previous VMA is accessible, this is the normal case + * where the main stack is growing down towards some unrelated + * VMA. Enforce the full stack guard gap. + */ + if (vma_is_accessible(prev)) + return -ENOMEM; + + /* + * If the previous VMA is not accessible, we have a problem: + * We can't tell what userspace's intent is. + * + * Case A: + * Maybe userspace wants to use the previous VMA as a + * "guard region" at the bottom of the main stack, in which case + * userspace wants us to grow the stack until it is adjacent to + * the guard region. Apparently some Java runtime environments + * and Rust do that? + * That is kind of ugly, and in that case userspace really ought + * to ensure that the stack is fully expanded immediately, but + * we have to handle this case. + * + * Case B: + * But maybe the previous VMA is entirely unrelated to the stack + * and is only *temporarily* PROT_NONE. For example, glibc + * malloc arenas create a big PROT_NONE region and then + * progressively mark parts of it as writable. + * In that case, we must not let the stack become adjacent to + * the previous VMA. Otherwise, after the region later becomes + * writable, a stack overflow will cause the stack to grow into + * the previous VMA, and we won't have any stack gap to protect + * against this. + * + * As an ugly tradeoff, enforce a single-page gap. + * A single page will hopefully be small enough to not be + * noticed in case A, while providing the same level of + * protection in case B that normal userspace threads get. 
+ */ + if (address == prev->vm_end) return -ENOMEM; } _ Patches currently in -mm which might be from jannh(a)google.com are mm-mremap-prevent-racing-change-of-old-pmd-type.patch mm-enforce-a-minimal-stack-gap-even-against-inaccessible-vmas.patch
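
The decision that the patched expand_downwards() makes can be restated as a pure function. This is a minimal sketch under stated assumptions: toy_may_grow_down() and its parameters are hypothetical names for illustration, addresses are assumed page-aligned with address >= prev_end, and the real code returns -ENOMEM where this model returns false.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the stack guard gap check after the patch.
 * Returns true if the stack may grow down so that it starts at
 * `address`, given the end of the previous mapping.
 */
static bool toy_may_grow_down(unsigned long address, unsigned long prev_end,
			      bool prev_grows_down, bool prev_accessible,
			      unsigned long guard_gap)
{
	/* Another stack segment, or far enough away: nothing to enforce. */
	if (prev_grows_down || address - prev_end >= guard_gap)
		return true;

	/* Accessible neighbour: enforce the full guard gap. */
	if (prev_accessible)
		return false;

	/*
	 * Inaccessible neighbour: intent is unknown, so only refuse to
	 * become directly adjacent (which leaves a single-page gap when
	 * addresses are page-aligned).
	 */
	return address != prev_end;
}
```

With a 256-page guard gap, a stack one page above a PROT_NONE mapping may still grow to that point in this model, but it may not grow far enough to touch the mapping.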


[PATCH v2 1/8] net: explicitly clear the sk pointer, when pf->create fails

by Ignat Korchagin

We have recently noticed the exact same KASAN splat as in commit 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creation fails"). The problem is that that commit did not fully address the issue, as some pf->create implementations do not use sk_common_release in their error paths.

For example, we can use the same reproducer as in the above commit, but changing ping to arping. arping uses an AF_PACKET socket, and if packet_create fails, it will just sk_free the allocated sk object.

While we could chase all the pf->create implementations and make sure they NULL the freed sk object on error from the socket, we can't guarantee future protocols will not make the same mistake.

So it is easier to just explicitly NULL the sk pointer upon return from pf->create in __sock_create. We do know that pf->create always releases the allocated sk object on error, so if the pointer is not NULL, it is definitely dangling.

Fixes: 6cd4a78d962b ("net: do not leave a dangling sk pointer, when socket creation fails")
Signed-off-by: Ignat Korchagin <ignat(a)cloudflare.com>
Cc: stable(a)vger.kernel.org
---
 net/core/sock.c | 3 ---
 net/socket.c    | 7 ++++++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 039be95c40cf..e6e04081949c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3819,9 +3819,6 @@ void sk_common_release(struct sock *sk)
 
 	sk->sk_prot->unhash(sk);
 
-	if (sk->sk_socket)
-		sk->sk_socket->sk = NULL;
-
 	/*
 	 * In this point socket cannot receive new packets, but it is possible
 	 * that some packets are in flight because some CPU runs receiver and
diff --git a/net/socket.c b/net/socket.c
index 601ad74930ef..042451f01c65 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1574,8 +1574,13 @@ int __sock_create(struct net *net, int family, int type, int protocol,
 	rcu_read_unlock();
 
 	err = pf->create(net, sock, protocol, kern);
-	if (err < 0)
+	if (err < 0) {
+		/* ->create should release the allocated sock->sk object on error
+		 * but it may leave the dangling pointer
+		 */
+		sock->sk = NULL;
 		goto out_module_put;
+	}
 
 	/*
 	 * Now to bump the refcnt of the [loadable] module that owns this
-- 
2.39.5
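
The design choice here is to clear the pointer once, in the common caller, rather than trusting every callee's error path. A minimal userspace sketch of that idea follows; struct toy_sk, struct toy_socket, toy_create_fails() and toy_sock_create() are hypothetical names made up for this example and do not correspond to the real socket structures.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_sk { int dummy; };
struct toy_socket { struct toy_sk *sk; };

/*
 * Hypothetical protocol hook that fails after attaching an object:
 * it frees the object but forgets to clear the owner's pointer.
 */
static int toy_create_fails(struct toy_socket *sock)
{
	sock->sk = malloc(sizeof(*sock->sk));
	if (!sock->sk)
		return -1;
	free(sock->sk);		/* released, but sock->sk is left dangling */
	return -1;
}

/*
 * Common caller: on any failure, clear the pointer unconditionally.
 * This relies on the same contract the patch states: the hook has
 * already released whatever it allocated.
 */
static int toy_sock_create(struct toy_socket *sock,
			   int (*create)(struct toy_socket *))
{
	int err = create(sock);

	if (err < 0) {
		sock->sk = NULL;
		return err;
	}
	return 0;
}
```

After toy_sock_create() fails, the caller sees a NULL sk regardless of how careful the individual hook was, so later cleanup code cannot follow a stale pointer.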


FAILED: patch "[PATCH] nfsd: fix delegation_blocked() to block correctly for at" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id, to <stable(a)vger.kernel.org>.

To reproduce the conflict and resubmit, you may use the following commands:

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 45bb63ed20e02ae146336412889fe5450316a84f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100728-graves-septic-4380@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..

Possible dependencies:

45bb63ed20e0 ("nfsd: fix delegation_blocked() to block correctly for at least 30 seconds")
b3f255ef6bff ("nfsd: use ktime_get_seconds() for timestamps")

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 45bb63ed20e02ae146336412889fe5450316a84f Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.de>
Date: Mon, 9 Sep 2024 15:06:36 +1000
Subject: [PATCH] nfsd: fix delegation_blocked() to block correctly for at least 30 seconds

The pair of bloom filters used by delegation_blocked() was intended to block delegations on given filehandles for between 30 and 60 seconds. A new filehandle would be recorded in the "new" bit set. That would then be switched to the "old" bit set between 0 and 30 seconds later, and it would remain as the "old" bit set for 30 seconds.

Unfortunately the code intended to clear the old bit set once it reached 30 seconds old, preparing it to be the next new bit set, instead cleared the *new* bit set before switching it to be the old bit set. This means that the "old" bit set is always empty and delegations are blocked between 0 and 30 seconds.

This patch updates bd->new before clearing the set with that index, instead of afterwards.

Reported-by: Olga Kornievskaia <okorniev(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 6282cd565553 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.")
Signed-off-by: NeilBrown <neilb(a)suse.de>
Reviewed-by: Benjamin Coddington <bcodding(a)redhat.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index cb5a9ab451c5..ac1859c7cc9d 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1078,7 +1078,8 @@ static void nfs4_free_deleg(struct nfs4_stid *stid)
  * When a delegation is recalled, the filehandle is stored in the "new"
  * filter.
  * Every 30 seconds we swap the filters and clear the "new" one,
- * unless both are empty of course.
+ * unless both are empty of course. This results in delegations for a
+ * given filehandle being blocked for between 30 and 60 seconds.
  *
  * Each filter is 256 bits. We hash the filehandle to 32bit and use the
  * low 3 bytes as hash-table indices.
@@ -1107,9 +1108,9 @@ static int delegation_blocked(struct knfsd_fh *fh)
 		if (ktime_get_seconds() - bd->swap_time > 30) {
 			bd->entries -= bd->old_entries;
 			bd->old_entries = bd->entries;
+			bd->new = 1-bd->new;
 			memset(bd->set[bd->new], 0,
 			       sizeof(bd->set[0]));
-			bd->new = 1-bd->new;
 			bd->swap_time = ktime_get_seconds();
 		}
 		spin_unlock(&blocked_delegations_lock);
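
The effect of the two statement orders can be checked with a small model of the pair of bit sets. This is a sketch under simplifying assumptions: struct toy_filter and the toy_* helpers are hypothetical, each entry sets a single bit (the real filter derives three indices from the filehandle hash), and the timer is replaced by explicit swap calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct toy_filter {
	unsigned char set[2][32];	/* two 256-bit sets */
	int new_idx;			/* index of the current "new" set */
};

/* Record a hash in the "new" set. */
static void toy_block(struct toy_filter *f, unsigned int hash)
{
	hash &= 255;
	f->set[f->new_idx][hash / 8] |= (unsigned char)(1u << (hash % 8));
}

/* A hash is blocked if it is present in either set. */
static bool toy_blocked(const struct toy_filter *f, unsigned int hash)
{
	unsigned char bit;

	hash &= 255;
	bit = (unsigned char)(1u << (hash % 8));
	return (f->set[0][hash / 8] & bit) || (f->set[1][hash / 8] & bit);
}

/* Fixed order: flip first, so the set being cleared is the old one. */
static void toy_swap_fixed(struct toy_filter *f)
{
	f->new_idx = 1 - f->new_idx;
	memset(f->set[f->new_idx], 0, sizeof(f->set[0]));
}

/* Buggy order: clears the set that still holds the recent entries. */
static void toy_swap_buggy(struct toy_filter *f)
{
	memset(f->set[f->new_idx], 0, sizeof(f->set[0]));
	f->new_idx = 1 - f->new_idx;
}
```

With the fixed order an entry survives exactly one swap and is dropped by the second, which is the 30 to 60 second window the commit message describes; with the buggy order it is gone after the first swap.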


[PATCH stable 4.12] nfsd: fix delegation_blocked() to block correctly for at least 30 seconds

by NeilBrown

commit 45bb63ed20e02ae146336412889fe5450316a84f

The pair of bloom filters used by delegation_blocked() was intended to block delegations on given filehandles for between 30 and 60 seconds. A new filehandle would be recorded in the "new" bit set. That would then be switched to the "old" bit set between 0 and 30 seconds later, and it would remain as the "old" bit set for 30 seconds.

Unfortunately the code intended to clear the old bit set once it reached 30 seconds old, preparing it to be the next new bit set, instead cleared the *new* bit set before switching it to be the old bit set. This means that the "old" bit set is always empty and delegations are blocked between 0 and 30 seconds.

This patch updates bd->new before clearing the set with that index, instead of afterwards.

Reported-by: Olga Kornievskaia <okorniev(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 6282cd565553 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.")
Signed-off-by: NeilBrown <neilb(a)suse.de>
Reviewed-by: Benjamin Coddington <bcodding(a)redhat.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
 fs/nfsd/nfs4state.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 7ac644d64ab1..d45487d82d44 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -743,7 +743,8 @@ static void nfs4_free_deleg(struct nfs4_stid *stid)
  * When a delegation is recalled, the filehandle is stored in the "new"
  * filter.
  * Every 30 seconds we swap the filters and clear the "new" one,
- * unless both are empty of course.
+ * unless both are empty of course. This results in delegations for a
+ * given filehandle being blocked for between 30 and 60 seconds.
  *
  * Each filter is 256 bits. We hash the filehandle to 32bit and use the
  * low 3 bytes as hash-table indices.
@@ -772,9 +773,9 @@ static int delegation_blocked(struct knfsd_fh *fh)
 		if (seconds_since_boot() - bd->swap_time > 30) {
 			bd->entries -= bd->old_entries;
 			bd->old_entries = bd->entries;
+			bd->new = 1-bd->new;
 			memset(bd->set[bd->new], 0,
 			       sizeof(bd->set[0]));
-			bd->new = 1-bd->new;
 			bd->swap_time = seconds_since_boot();
 		}
 		spin_unlock(&blocked_delegations_lock);

base-commit: de2cffe297563c815c840cfa14b77a0868b61e53
-- 
2.46.0


[PATCH v5 1/5] tpm: Return on tpm2_create_null_primary() failure

by Jarkko Sakkinen

tpm2_sessions_init() does not act on the result of tpm2_create_null_primary(): on failure it only logs an error and carries on. Address this by returning -ENODEV to the caller.

Cc: stable(a)vger.kernel.org # v6.10+
Fixes: d2add27cf2b8 ("tpm: Add NULL primary creation")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v5:
- Do not print klog messages on error, as tpm2_save_context() already
  takes care of this.
v4:
- Fixed up stable version.
v3:
- Handle TPM and POSIX error separately and return -ENODEV always back
  to the caller.
v2:
- Refined the commit message.
---
 drivers/char/tpm/tpm2-sessions.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index d3521aadd43e..0f09ac33ae99 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -1338,7 +1338,8 @@ static int tpm2_create_null_primary(struct tpm_chip *chip)
 		tpm2_flush_context(chip, null_key);
 	}
 
-	return rc;
+	/* Map all errors to -ENODEV: */
+	return rc ? -ENODEV : rc;
 }
 
 /**
@@ -1354,7 +1355,7 @@ int tpm2_sessions_init(struct tpm_chip *chip)
 
 	rc = tpm2_create_null_primary(chip);
 	if (rc)
-		dev_err(&chip->dev, "TPM: security failed (NULL seed derivation): %d\n", rc);
+		return rc;
 
 	chip->auth = kmalloc(sizeof(*chip->auth), GFP_KERNEL);
 	if (!chip->auth)
-- 
2.46.1
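
The error handling this patch introduces has two parts: collapse every failure (a positive TPM response code or a negative errno) into -ENODEV, and stop initialisation as soon as that value is non-zero. A minimal sketch of that control flow, using hypothetical names (toy_map_error(), toy_sessions_init()) and an int flag in place of the real allocation:

```c
#include <assert.h>
#include <errno.h>

/* Map any non-zero result, positive or negative, to -ENODEV. */
static int toy_map_error(int rc)
{
	return rc ? -ENODEV : 0;
}

/*
 * Model of the patched init path: `create_rc` plays the role of the
 * raw result of the key creation step; later setup only runs if that
 * step succeeded.
 */
static int toy_sessions_init(int create_rc, int *auth_allocated)
{
	int rc = toy_map_error(create_rc);

	if (rc)
		return rc;	/* previously: log and continue */

	*auth_allocated = 1;
	return 0;
}
```

In this model a caller only ever has to handle 0 or -ENODEV from the creation step, whichever kind of error originally occurred.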
