This is the start of the stable review cycle for the 6.4.3 release. There are 8 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Tue, 11 Jul 2023 11:13:33 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.3-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.4.3-rc1
Suren Baghdasaryan surenb@google.com fork: lock VMAs of the parent process when forking, again
Suren Baghdasaryan surenb@google.com fork: lock VMAs of the parent process when forking
Liu Shixin liushixin2@huawei.com bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
Peter Collingbourne pcc@google.com mm: call arch_swap_restore() from do_swap_page()
Hugh Dickins hughd@google.com mm: lock newly mapped VMA with corrected ordering
Suren Baghdasaryan surenb@google.com mm: lock newly mapped VMA which can be modified after it becomes visible
Suren Baghdasaryan surenb@google.com mm: lock a vma before stack expansion
Suren Baghdasaryan surenb@google.com mm: disable CONFIG_PER_VMA_LOCK until its fixed
-------------
Diffstat:
Makefile | 4 ++-- include/linux/bootmem_info.h | 2 ++ kernel/fork.c | 7 +++++++ mm/Kconfig | 3 ++- mm/memory.c | 7 +++++++ mm/mmap.c | 6 ++++++ 6 files changed, 26 insertions(+), 3 deletions(-)
From: Suren Baghdasaryan surenb@google.com
commit f96c48670319d685d18d50819ed0c1ef751ed2ac upstream.
A memory corruption was reported in [1] with bisection pointing to the patch [2] enabling per-VMA locks for x86. Disable per-VMA locks config to prevent this issue until the fix is confirmed. This is expected to be a temporary measure.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624 [2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-3-surenb@google.com Reported-by: Jiri Slaby jirislaby@kernel.org Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/ Reported-by: Jacob Young jacobly.alt@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624 Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first") Signed-off-by: Suren Baghdasaryan surenb@google.com Cc: David Hildenbrand david@redhat.com Cc: Holger Hoffstätte holger@applied-asynchrony.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/Kconfig | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/Kconfig +++ b/mm/Kconfig @@ -1198,8 +1198,9 @@ config ARCH_SUPPORTS_PER_VMA_LOCK def_bool n
config PER_VMA_LOCK - def_bool y + bool "Enable per-vma locking during page fault handling." depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP + depends on BROKEN help Allow per-vma locking during page fault handling.
From: Suren Baghdasaryan surenb@google.com
commit c137381f71aec755fbf47cd4e9bd4dce752c054c upstream.
With recent changes necessitating mmap_lock to be held for write while expanding a stack, per-VMA locks should follow the same rules and be write-locked to prevent page faults into the VMA being expanded. Add the necessary locking.
Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan surenb@google.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mmap.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/mm/mmap.c +++ b/mm/mmap.c @@ -1975,6 +1975,8 @@ static int expand_upwards(struct vm_area return -ENOMEM; }
+ /* Lock the VMA before expanding to prevent concurrent page faults */ + vma_start_write(vma); /* * vma->vm_start/vm_end cannot change under us because the caller * is required to hold the mmap_lock in read mode. We need the @@ -2062,6 +2064,8 @@ int expand_downwards(struct vm_area_stru return -ENOMEM; }
+ /* Lock the VMA before expanding to prevent concurrent page faults */ + vma_start_write(vma); /* * vma->vm_start/vm_end cannot change under us because the caller * is required to hold the mmap_lock in read mode. We need the
From: Suren Baghdasaryan surenb@google.com
commit 33313a747e81af9f31d0d45de78c9397fa3655eb upstream.
mmap_region adds a newly created VMA into VMA tree and might modify it afterwards before dropping the mmap_lock. This poses a problem for page faults handled under per-VMA locks because they don't take the mmap_lock and can stumble on this VMA while it's still being modified. Currently this does not pose a problem since post-addition modifications are done only for file-backed VMAs, which are not handled under per-VMA lock. However, once support for handling file-backed page faults with per-VMA locks is added, this will become a race.
Fix this by write-locking the VMA before inserting it into the VMA tree. Other places where a new VMA is added into VMA tree do not modify it after the insertion, so do not need the same locking.
Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan surenb@google.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mmap.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/mm/mmap.c +++ b/mm/mmap.c @@ -2804,6 +2804,8 @@ cannot_expand: if (vma->vm_file) i_mmap_lock_write(vma->vm_file->f_mapping);
+ /* Lock the VMA since it is modified after insertion into VMA tree */ + vma_start_write(vma); vma_iter_store(&vmi, vma); mm->map_count++; if (vma->vm_file) {
From: Hugh Dickins hughd@google.com
commit 1c7873e3364570ec89343ff4877e0f27a7b21a61 upstream.
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f but task is already holding lock: (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible") Cc: stable@vger.kernel.org Signed-off-by: Hugh Dickins hughd@google.com Tested-by: Suren Baghdasaryan surenb@google.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/mmap.c +++ b/mm/mmap.c @@ -2801,11 +2801,11 @@ cannot_expand: if (vma_iter_prealloc(&vmi)) goto close_and_free_vma;
+ /* Lock the VMA since it is modified after insertion into VMA tree */ + vma_start_write(vma); if (vma->vm_file) i_mmap_lock_write(vma->vm_file->f_mapping);
- /* Lock the VMA since it is modified after insertion into VMA tree */ - vma_start_write(vma); vma_iter_store(&vmi, vma); mm->map_count++; if (vma->vm_file) {
From: Peter Collingbourne pcc@google.com
commit 6dca4ac6fc91fd41ea4d6c4511838d37f4e0eab2 upstream.
Commit c145e0b47c77 ("mm: streamline COW logic in do_swap_page()") moved the call to swap_free() before the call to set_pte_at(), which meant that the MTE tags could end up being freed before set_pte_at() had a chance to restore them. Fix it by adding a call to the arch_swap_restore() hook before the call to swap_free().
Link: https://lkml.kernel.org/r/20230523004312.1807357-2-pcc@google.com Link: https://linux-review.googlesource.com/id/I6470efa669e8bd2f841049b8c61020c510... Fixes: c145e0b47c77 ("mm: streamline COW logic in do_swap_page()") Signed-off-by: Peter Collingbourne pcc@google.com Reported-by: Qun-wei Lin Qun-wei.Lin@mediatek.com Closes: https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@m... Acked-by: David Hildenbrand david@redhat.com Acked-by: "Huang, Ying" ying.huang@intel.com Reviewed-by: Steven Price steven.price@arm.com Acked-by: Catalin Marinas catalin.marinas@arm.com Cc: stable@vger.kernel.org [6.1+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/mm/memory.c +++ b/mm/memory.c @@ -3933,6 +3933,13 @@ vm_fault_t do_swap_page(struct vm_fault }
/* + * Some architectures may have to restore extra metadata to the page + * when reading from swap. This metadata may be indexed by swap entry + * so this must be called before swap_free(). + */ + arch_swap_restore(entry, folio); + + /* * Remove the swap entry and conditionally try to free up the swapcache. * We're already holding a reference on the page but haven't mapped it * yet.
From: Liu Shixin liushixin2@huawei.com
commit 028725e73375a1ff080bbdf9fb503306d0116f28 upstream.
commit dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem") fix an overlaps existing problem of kmemleak. But the problem still existed when HAVE_BOOTMEM_INFO_NODE is disabled, because in this case, free_bootmem_page() will call free_reserved_page() directly.
Fix the problem by adding kmemleak_free_part() in free_bootmem_page() when HAVE_BOOTMEM_INFO_NODE is disabled.
Link: https://lkml.kernel.org/r/20230704101942.2819426-1-liushixin2@huawei.com Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page") Signed-off-by: Liu Shixin liushixin2@huawei.com Acked-by: Muchun Song songmuchun@bytedance.com Cc: Matthew Wilcox willy@infradead.org Cc: Mike Kravetz mike.kravetz@oracle.com Cc: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bootmem_info.h | 2 ++ 1 file changed, 2 insertions(+)
--- a/include/linux/bootmem_info.h +++ b/include/linux/bootmem_info.h @@ -3,6 +3,7 @@ #define __LINUX_BOOTMEM_INFO_H
#include <linux/mm.h> +#include <linux/kmemleak.h>
/* * Types for free bootmem stored in page->lru.next. These have to be in @@ -59,6 +60,7 @@ static inline void get_page_bootmem(unsi
static inline void free_bootmem_page(struct page *page) { + kmemleak_free_part(page_to_virt(page), PAGE_SIZE); free_reserved_page(page); } #endif
From: Suren Baghdasaryan surenb@google.com
commit 2b4f3b4987b56365b981f44a7e843efa5b6619b9 upstream.
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the patch [2] enabling per-VMA locks for x86. Based on the reproducer provided in [1] we suspect this is caused by the lack of VMA locking while forking a child process.
Patch 1/2 in the series implements proper VMA locking during fork. I tested the fix locally using the reproducer and was unable to reproduce the memory corruption problem.
This fix can potentially regress some fork-heavy workloads. Kernel build time did not show noticeable regression on a 56-core machine while a stress test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~7% regression. If such fork time regression is unacceptable, disabling CONFIG_PER_VMA_LOCK should restore its performance. Further optimizations are possible if this regression proves to be problematic.
Patch 2/2 disables per-VMA locks until the fix is tested and verified.
This patch (of 2):
When forking a child process, parent write-protects an anonymous page and COW-shares it with the child being forked using copy_present_pte(). Parent's TLB is flushed right before we drop the parent's mmap_lock in dup_mmap(). If we get a write-fault before that TLB flush in the parent, and we end up replacing that anonymous page in the parent process in do_wp_page() (because, COW-shared with the child), this might lead to some stale writable TLB entries targeting the wrong (old) page. Similar issue happened in the past with userfaultfd (see flush_tlb_page() call inside do_wp_page()).
Lock VMAs of the parent process when forking a child, which prevents concurrent page faults during fork operation and avoids this issue. This fix can potentially regress some fork-heavy workloads. Kernel build time did not show noticeable regression on a 56-core machine while a stress test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~7% regression. If such fork time regression is unacceptable, disabling CONFIG_PER_VMA_LOCK should restore its performance. Further optimizations are possible if this regression proves to be problematic.
Link: https://lkml.kernel.org/r/20230706011400.2949242-1-surenb@google.com Link: https://lkml.kernel.org/r/20230706011400.2949242-2-surenb@google.com Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first") Signed-off-by: Suren Baghdasaryan surenb@google.com Suggested-by: David Hildenbrand david@redhat.com Reported-by: Jiri Slaby jirislaby@kernel.org Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/ Reported-by: Holger Hoffstätte holger@applied-asynchrony.com Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asy... Reported-by: Jacob Young jacobly.alt@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=3D217624 Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com Acked-by: David Hildenbrand david@redhat.com Tested-by: Holger Hoffsttte holger@applied-asynchrony.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/fork.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -662,6 +662,12 @@ static __latent_entropy int dup_mmap(str retval = -EINTR; goto fail_uprobe_end; } +#ifdef CONFIG_PER_VMA_LOCK + /* Disallow any page faults before calling flush_cache_dup_mm */ + for_each_vma(old_vmi, mpnt) + vma_start_write(mpnt); + vma_iter_set(&old_vmi, 0); +#endif flush_cache_dup_mm(oldmm); uprobe_dup_mmap(oldmm, mm); /*
On 09.07.23 13:14, Greg Kroah-Hartman wrote:
From: Suren Baghdasaryan surenb@google.com
commit 2b4f3b4987b56365b981f44a7e843efa5b6619b9 upstream.
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the patch [2] enabling per-VMA locks for x86. Based on the reproducer provided in [1] we suspect this is caused by the lack of VMA locking while forking a child process. [...]
Question from someone that is neither a C nor a git expert -- and thus might say something totally stupid below (and thus maybe should not have sent this mail at all).
But I have to wonder: is adding this patch to stable necessary given patch 8/8?
FWIW, this change looks like this:
kernel/fork.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -662,6 +662,12 @@ static __latent_entropy int dup_mmap(str retval = -EINTR; goto fail_uprobe_end; } +#ifdef CONFIG_PER_VMA_LOCK
- /* Disallow any page faults before calling flush_cache_dup_mm */
- for_each_vma(old_vmi, mpnt)
vma_start_write(mpnt);
- vma_iter_set(&old_vmi, 0);
+#endif flush_cache_dup_mm(oldmm); uprobe_dup_mmap(oldmm, mm); /*
But when I look at kernel/fork.c in mainline I can't see this bit. I also only see Linus' change (e.g. patch 8/8 in this series) when I look at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/kerne...
What am I missing?
Ciao, Thorsten (who noticed this just by chance)
On Sun, Jul 09, 2023 at 02:39:00PM +0200, Thorsten Leemhuis wrote:
On 09.07.23 13:14, Greg Kroah-Hartman wrote:
From: Suren Baghdasaryan surenb@google.com
commit 2b4f3b4987b56365b981f44a7e843efa5b6619b9 upstream.
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the patch [2] enabling per-VMA locks for x86. Based on the reproducer provided in [1] we suspect this is caused by the lack of VMA locking while forking a child process. [...]
Question from someone that is neither a C nor a git expert -- and thus might say something totally stupid below (and thus maybe should not have sent this mail at all).
But I have to wonder: is adding this patch to stable necessary given patch 8/8?
FWIW, this change looks like this:
kernel/fork.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -662,6 +662,12 @@ static __latent_entropy int dup_mmap(str retval = -EINTR; goto fail_uprobe_end; } +#ifdef CONFIG_PER_VMA_LOCK
- /* Disallow any page faults before calling flush_cache_dup_mm */
- for_each_vma(old_vmi, mpnt)
vma_start_write(mpnt);
- vma_iter_set(&old_vmi, 0);
+#endif flush_cache_dup_mm(oldmm); uprobe_dup_mmap(oldmm, mm); /*
But when I look at kernel/fork.c in mainline I can't see this bit. I also only see Linus' change (e.g. patch 8/8 in this series) when I look at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/kerne...
Look at 946c6b59c56d ("Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
Where Linus manually dropped those #ifdefs.
Hm, I'll leave them for now in 6.4.y as that is "safer", but if Suren feels comfortable, I'll gladly take a patch from him to drop them in the 6.4.y tree as well.
thanks,
greg k-h
On Sun, Jul 9, 2023 at 6:32 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sun, Jul 09, 2023 at 02:39:00PM +0200, Thorsten Leemhuis wrote:
On 09.07.23 13:14, Greg Kroah-Hartman wrote:
From: Suren Baghdasaryan surenb@google.com
commit 2b4f3b4987b56365b981f44a7e843efa5b6619b9 upstream.
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the patch [2] enabling per-VMA locks for x86. Based on the reproducer provided in [1] we suspect this is caused by the lack of VMA locking while forking a child process. [...]
Question from someone that is neither a C nor a git expert -- and thus might say something totally stupid below (and thus maybe should not have sent this mail at all).
But I have to wonder: is adding this patch to stable necessary given patch 8/8?
FWIW, this change looks like this:
kernel/fork.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -662,6 +662,12 @@ static __latent_entropy int dup_mmap(str retval = -EINTR; goto fail_uprobe_end; } +#ifdef CONFIG_PER_VMA_LOCK
- /* Disallow any page faults before calling flush_cache_dup_mm */
- for_each_vma(old_vmi, mpnt)
vma_start_write(mpnt);
- vma_iter_set(&old_vmi, 0);
+#endif flush_cache_dup_mm(oldmm); uprobe_dup_mmap(oldmm, mm); /*
But when I look at kernel/fork.c in mainline I can't see this bit. I also only see Linus' change (e.g. patch 8/8 in this series) when I look at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/kerne...
Look at 946c6b59c56d ("Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
Where Linus manually dropped those #ifdefs.
Hm, I'll leave them for now in 6.4.y as that is "safer", but if Suren feels comfortable, I'll gladly take a patch from him to drop them in the 6.4.y tree as well.
Hi Greg, Give me a couple hours to get back to my computer. Linus took a different version of this patch and changed the description quite a bit. Once I'm home I can send you the patchset that was merged into his tree. Also let me know if you want to disable CONFIG_PER_VMA_LOCK in the stable branch (the patch called "[PATCH 6.4 1/8] mm: disable CONFIG_PER_VMA_LOCK until its fixed" which Linus did not take AFAIKT). Thanks, Suren.
thanks,
greg k-h
On Sun, Jul 09, 2023 at 09:04:09AM -0700, Suren Baghdasaryan wrote:
On Sun, Jul 9, 2023 at 6:32 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sun, Jul 09, 2023 at 02:39:00PM +0200, Thorsten Leemhuis wrote:
On 09.07.23 13:14, Greg Kroah-Hartman wrote:
From: Suren Baghdasaryan surenb@google.com
commit 2b4f3b4987b56365b981f44a7e843efa5b6619b9 upstream.
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the patch [2] enabling per-VMA locks for x86. Based on the reproducer provided in [1] we suspect this is caused by the lack of VMA locking while forking a child process. [...]
Question from someone that is neither a C nor a git expert -- and thus might say something totally stupid below (and thus maybe should not have sent this mail at all).
But I have to wonder: is adding this patch to stable necessary given patch 8/8?
FWIW, this change looks like this:
kernel/fork.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -662,6 +662,12 @@ static __latent_entropy int dup_mmap(str retval = -EINTR; goto fail_uprobe_end; } +#ifdef CONFIG_PER_VMA_LOCK
- /* Disallow any page faults before calling flush_cache_dup_mm */
- for_each_vma(old_vmi, mpnt)
vma_start_write(mpnt);
- vma_iter_set(&old_vmi, 0);
+#endif flush_cache_dup_mm(oldmm); uprobe_dup_mmap(oldmm, mm); /*
But when I look at kernel/fork.c in mainline I can't see this bit. I also only see Linus' change (e.g. patch 8/8 in this series) when I look at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/kerne...
Look at 946c6b59c56d ("Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
Where Linus manually dropped those #ifdefs.
Hm, I'll leave them for now in 6.4.y as that is "safer", but if Suren feels comfortable, I'll gladly take a patch from him to drop them in the 6.4.y tree as well.
Hi Greg, Give me a couple hours to get back to my computer. Linus took a different version of this patch and changed the description quite a bit. Once I'm home I can send you the patchset that was merged into his tree. Also let me know if you want to disable CONFIG_PER_VMA_LOCK in the stable branch (the patch called "[PATCH 6.4 1/8] mm: disable CONFIG_PER_VMA_LOCK until its fixed" which Linus did not take AFAIKT).
No rush, you can do this on Monday.
I took the patches that Linus added to his tree already into the stable 6.4.y tree, and it's in the -rc release I pushed out a few hours ago.
So if you want to look at the -rc release, that would be great, the full list of patches can be seen here: https://lore.kernel.org/r/20230709111345.297026264@linuxfoundation.org
thanks,
greg k-h
On Sun, Jul 9, 2023 at 7:48 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sun, Jul 09, 2023 at 09:04:09AM -0700, Suren Baghdasaryan wrote:
On Sun, Jul 9, 2023 at 6:32 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sun, Jul 09, 2023 at 02:39:00PM +0200, Thorsten Leemhuis wrote:
On 09.07.23 13:14, Greg Kroah-Hartman wrote:
From: Suren Baghdasaryan surenb@google.com
commit 2b4f3b4987b56365b981f44a7e843efa5b6619b9 upstream.
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the patch [2] enabling per-VMA locks for x86. Based on the reproducer provided in [1] we suspect this is caused by the lack of VMA locking while forking a child process. [...]
Question from someone that is neither a C nor a git expert -- and thus might say something totally stupid below (and thus maybe should not have sent this mail at all).
But I have to wonder: is adding this patch to stable necessary given patch 8/8?
FWIW, this change looks like this:
kernel/fork.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -662,6 +662,12 @@ static __latent_entropy int dup_mmap(str retval = -EINTR; goto fail_uprobe_end; } +#ifdef CONFIG_PER_VMA_LOCK
- /* Disallow any page faults before calling flush_cache_dup_mm */
- for_each_vma(old_vmi, mpnt)
vma_start_write(mpnt);
- vma_iter_set(&old_vmi, 0);
+#endif flush_cache_dup_mm(oldmm); uprobe_dup_mmap(oldmm, mm); /*
But when I look at kernel/fork.c in mainline I can't see this bit. I also only see Linus' change (e.g. patch 8/8 in this series) when I look at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/kerne...
Look at 946c6b59c56d ("Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
Where Linus manually dropped those #ifdefs.
Hm, I'll leave them for now in 6.4.y as that is "safer", but if Suren feels comfortable, I'll gladly take a patch from him to drop them in the 6.4.y tree as well.
Hi Greg, Give me a couple hours to get back to my computer. Linus took a different version of this patch and changed the description quite a bit. Once I'm home I can send you the patchset that was merged into his tree. Also let me know if you want to disable CONFIG_PER_VMA_LOCK in the stable branch (the patch called "[PATCH 6.4 1/8] mm: disable CONFIG_PER_VMA_LOCK until its fixed" which Linus did not take AFAIKT).
No rush, you can do this on Monday.
I took the patches that Linus added to his tree already into the stable 6.4.y tree, and it's in the -rc release I pushed out a few hours ago.
I just checked your stable master branch and it's perfectly in sync with Linus' tree.
So if you want to look at the -rc release, that would be great, the full list of patches can be seen here: https://lore.kernel.org/r/20230709111345.297026264@linuxfoundation.org
Let me sync git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git to see what's happening there. Thanks, Suren.
thanks,
greg k-h
On Sun, Jul 9, 2023 at 7:53 PM Suren Baghdasaryan surenb@google.com wrote:
On Sun, Jul 9, 2023 at 7:48 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sun, Jul 09, 2023 at 09:04:09AM -0700, Suren Baghdasaryan wrote:
On Sun, Jul 9, 2023 at 6:32 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sun, Jul 09, 2023 at 02:39:00PM +0200, Thorsten Leemhuis wrote:
On 09.07.23 13:14, Greg Kroah-Hartman wrote:
From: Suren Baghdasaryan surenb@google.com
commit 2b4f3b4987b56365b981f44a7e843efa5b6619b9 upstream.
Patch series "Avoid memory corruption caused by per-VMA locks", v4.
A memory corruption was reported in [1] with bisection pointing to the patch [2] enabling per-VMA locks for x86. Based on the reproducer provided in [1] we suspect this is caused by the lack of VMA locking while forking a child process. [...]
Question from someone that is neither a C nor a git expert -- and thus might say something totally stupid below (and thus maybe should not have sent this mail at all).
But I have to wonder: is adding this patch to stable necessary given patch 8/8?
FWIW, this change looks like this:
kernel/fork.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -662,6 +662,12 @@ static __latent_entropy int dup_mmap(str retval = -EINTR; goto fail_uprobe_end; } +#ifdef CONFIG_PER_VMA_LOCK
- /* Disallow any page faults before calling flush_cache_dup_mm */
- for_each_vma(old_vmi, mpnt)
vma_start_write(mpnt);
- vma_iter_set(&old_vmi, 0);
+#endif flush_cache_dup_mm(oldmm); uprobe_dup_mmap(oldmm, mm); /*
But when I look at kernel/fork.c in mainline I can't see this bit. I also only see Linus' change (e.g. patch 8/8 in this series) when I look at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/kerne...
Look at 946c6b59c56d ("Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
Where Linus manually dropped those #ifdefs.
Hm, I'll leave them for now in 6.4.y as that is "safer", but if Suren feels comfortable, I'll gladly take a patch from him to drop them in the 6.4.y tree as well.
Hi Greg, Give me a couple hours to get back to my computer. Linus took a different version of this patch and changed the description quite a bit. Once I'm home I can send you the patchset that was merged into his tree. Also let me know if you want to disable CONFIG_PER_VMA_LOCK in the stable branch (the patch called "[PATCH 6.4 1/8] mm: disable CONFIG_PER_VMA_LOCK until its fixed" which Linus did not take AFAIKT).
No rush, you can do this on Monday.
I took the patches that Linus added to his tree already into the stable 6.4.y tree, and it's in the -rc release I pushed out a few hours ago.
I just checked your stable master branch and it's perfectly in sync with Linus' tree.
So if you want to look at the -rc release, that would be great, the full list of patches can be seen here: https://lore.kernel.org/r/20230709111345.297026264@linuxfoundation.org
Let me sync git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git to see what's happening there.
Ok, I'm looking at the linux-6.4.y branch in git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
This patch is not needed (obsolete): 32d458fa68fe ("fork: lock VMAs of the parent process when forking")
Patch fb49c455323f ("fork: lock VMAs of the parent process when forking, again") can be renamed into "fork: lock VMAs of the parent process when forking" as its original.
Patch 11eaf9aa0699 ("mm: disable CONFIG_PER_VMA_LOCK until its fixed") was removed in Linus' tree, see comment in 946c6b59c56dc6e7d8364a8959cb36bf6d10bc37 saying: "The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since it was all hopefully fixed in mainline.". Unless you want to keep CONFIG_PER_VMA_LOCK disabled in the stable tree, that patch should also be dropped.
Everything else LGTM. Thanks, Suren.
Thanks, Suren.
thanks,
greg k-h
On Sun, Jul 09, 2023 at 08:24:29PM +0000, Suren Baghdasaryan wrote:
On Sun, Jul 9, 2023 at 7:53 PM Suren Baghdasaryan surenb@google.com wrote:
On Sun, Jul 9, 2023 at 7:48 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sun, Jul 09, 2023 at 09:04:09AM -0700, Suren Baghdasaryan wrote:
On Sun, Jul 9, 2023 at 6:32 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sun, Jul 09, 2023 at 02:39:00PM +0200, Thorsten Leemhuis wrote:
On 09.07.23 13:14, Greg Kroah-Hartman wrote: > From: Suren Baghdasaryan surenb@google.com > > commit 2b4f3b4987b56365b981f44a7e843efa5b6619b9 upstream. > > Patch series "Avoid memory corruption caused by per-VMA locks", v4. > > A memory corruption was reported in [1] with bisection pointing to the > patch [2] enabling per-VMA locks for x86. Based on the reproducer > provided in [1] we suspect this is caused by the lack of VMA locking while > forking a child process. > [...]
Question from someone that is neither a C nor a git expert -- and thus might say something totally stupid below (and thus maybe should not have sent this mail at all).
But I have to wonder: is adding this patch to stable necessary given patch 8/8?
FWIW, this change looks like this:
> --- > kernel/fork.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -662,6 +662,12 @@ static __latent_entropy int dup_mmap(str > retval = -EINTR; > goto fail_uprobe_end; > } > +#ifdef CONFIG_PER_VMA_LOCK > + /* Disallow any page faults before calling flush_cache_dup_mm */ > + for_each_vma(old_vmi, mpnt) > + vma_start_write(mpnt); > + vma_iter_set(&old_vmi, 0); > +#endif > flush_cache_dup_mm(oldmm); > uprobe_dup_mmap(oldmm, mm); > /*
But when I look at kernel/fork.c in mainline I can't see this bit. I also only see Linus' change (e.g. patch 8/8 in this series) when I look at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/kerne...
Look at 946c6b59c56d ("Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
Where Linus manually dropped those #ifdefs.
Hm, I'll leave them for now in 6.4.y as that is "safer", but if Suren feels comfortable, I'll gladly take a patch from him to drop them in the 6.4.y tree as well.
Hi Greg, Give me a couple hours to get back to my computer. Linus took a different version of this patch and changed the description quite a bit. Once I'm home I can send you the patchset that was merged into his tree. Also let me know if you want to disable CONFIG_PER_VMA_LOCK in the stable branch (the patch called "[PATCH 6.4 1/8] mm: disable CONFIG_PER_VMA_LOCK until its fixed" which Linus did not take AFAIKT).
No rush, you can do this on Monday.
I took the patches that Linus added to his tree already into the stable 6.4.y tree, and it's in the -rc release I pushed out a few hours ago.
I just checked your stable master branch and it's perfectly in sync with Linus' tree.
So if you want to look at the -rc release, that would be great, the full list of patches can be seen here: https://lore.kernel.org/r/20230709111345.297026264@linuxfoundation.org
Let me sync git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git to see what's happening there.
Ok, I'm looking at the linux-6.4.y branch in git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
This patch is not needed (obsolete): 32d458fa68fe ("fork: lock VMAs of the parent process when forking")
Ok, I can drop that.
Patch fb49c455323f ("fork: lock VMAs of the parent process when forking, again") can be renamed into "fork: lock VMAs of the parent process when forking" as its original.
Yes, I had to rename it, git doesn't like patches with identical names.
Patch 11eaf9aa0699 ("mm: disable CONFIG_PER_VMA_LOCK until its fixed") was removed in Linus' tree, see comment in 946c6b59c56dc6e7d8364a8959cb36bf6d10bc37 saying: "The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since it was all hopefully fixed in mainline.". Unless you want to keep CONFIG_PER_VMA_LOCK disabled in the stable tree, that patch should also be dropped.
Ok, let me drop this too.
I'll push out a -rc2 with these changes, let me go work on it now...
thanks,
gre gk-h
From: Suren Baghdasaryan surenb@google.com
commit fb49c455323ff8319a123dd312be9082c49a23a5 upstream.
When forking a child process, the parent write-protects anonymous pages and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they are being processed, as we expect both the vma and the pte's behind it to be stable. For example, the anon_vma_fork() expects the parents vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the source vma, defeating the anon_vma_clone() that wasn't done because the parent vma originally didn't have an anon_vma, but we now might end up copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed exclusion with concurrent page faults. But now we need to do a vma_start_write() to make sure no concurrent faults happen on this vma while it is being processed.
This fix can potentially regress some fork-heavy workloads. Kernel build time did not show noticeable regression on a 56-core machine while a stress test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~5% regression. If such fork time regression is unacceptable, disabling CONFIG_PER_VMA_LOCK should restore its performance. Further optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand david@redhat.com Reported-by: Jiri Slaby jirislaby@kernel.org Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/ Reported-by: Holger Hoffstätte holger@applied-asynchrony.com Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asy... Reported-by: Jacob Young jacobly.alt@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624 Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first") Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan surenb@google.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/fork.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -696,6 +696,7 @@ static __latent_entropy int dup_mmap(str for_each_vma(old_vmi, mpnt) { struct file *file;
+ vma_start_write(mpnt); if (mpnt->vm_flags & VM_DONTCOPY) { vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt)); continue;
linux-stable-mirror@lists.linaro.org