The patch titled
Subject: kasan, slub: fix HW_TAGS zeroing with slub_debug
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kasan-slub-fix-hw_tags-zeroing-with-slub_debug.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan, slub: fix HW_TAGS zeroing with slub_debug
Date: Wed, 5 Jul 2023 14:44:02 +0200
Commit 946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated
kmalloc space than requested") added precise kmalloc redzone poisoning to
the slub_debug functionality.
However, this commit didn't account for HW_TAGS KASAN fully initializing
the object via its built-in memory initialization feature. Even though
HW_TAGS KASAN's memory initialization contains special handling for when
slub_debug is enabled, it does not account for in-object slub_debug
redzones. As a result, HW_TAGS KASAN can overwrite these redzones and
cause false-positive slub_debug reports.
To fix the issue, avoid HW_TAGS KASAN memory initialization altogether
when slub_debug is enabled. Implement this by moving the
__slub_debug_enabled check to slab_post_alloc_hook. Common slab code
seems like a more appropriate place for a slub_debug check anyway.
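To make the interaction concrete, here is a minimal user-space sketch
(illustrative only: the 16-byte granule matches KASAN_GRANULE_SIZE, but the
kmalloc-64/40-byte layout and the redzone pattern are assumed for the
example, not taken from the kernel sources):

/*
 * Sketch: granule-aligned zeroing spills into the in-object redzone
 * that slub_debug places after the requested allocation size.
 */
#include <stdio.h>
#include <string.h>

#define GRANULE		16	/* stand-in for KASAN_GRANULE_SIZE */
#define OBJECT_SIZE	64	/* kmalloc-64 slab object */
#define ORIG_SIZE	40	/* kmalloc(40): bytes 40..63 redzoned */
#define REDZONE		0xcc	/* stand-in for the redzone pattern */

int main(void)
{
	unsigned char obj[OBJECT_SIZE];
	size_t rounded;

	/* slub_debug's in-object redzone covers orig_size..object_size */
	memset(obj + ORIG_SIZE, REDZONE, OBJECT_SIZE - ORIG_SIZE);

	/* granule-aligned init, as hw_set_mem_tag_range() does with init set */
	rounded = (ORIG_SIZE + GRANULE - 1) & ~(size_t)(GRANULE - 1);
	memset(obj, 0, rounded);

	/* bytes 40..47 are now zero: a redzone "corruption" */
	printf("byte 40: %#x (expected %#x)\n", obj[40], REDZONE);
	return 0;
}

With ORIG_SIZE == 40 the size rounds up to 48, so the first 8 redzone bytes
are zeroed. Zeroing with the precise size via the memset in
slab_post_alloc_hook instead is what avoids this.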
Link: https://lkml.kernel.org/r/678ac92ab790dba9198f9ca14f405651b97c8502.16885610…
Fixes: 946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated kmalloc space than requested")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Will Deacon <will@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: kasan-dev@googlegroups.com
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/kasan/kasan.h | 12 ------------
mm/slab.h | 16 ++++++++++++++--
2 files changed, 14 insertions(+), 14 deletions(-)
--- a/mm/kasan/kasan.h~kasan-slub-fix-hw_tags-zeroing-with-slub_debug
+++ a/mm/kasan/kasan.h
@@ -466,18 +466,6 @@ static inline void kasan_unpoison(const
if (WARN_ON((unsigned long)addr & KASAN_GRANULE_MASK))
return;
- /*
- * Explicitly initialize the memory with the precise object size to
- * avoid overwriting the slab redzone. This disables initialization in
- * the arch code and may thus lead to performance penalty. This penalty
- * does not affect production builds, as slab redzones are not enabled
- * there.
- */
- if (__slub_debug_enabled() &&
- init && ((unsigned long)size & KASAN_GRANULE_MASK)) {
- init = false;
- memzero_explicit((void *)addr, size);
- }
size = round_up(size, KASAN_GRANULE_SIZE);
hw_set_mem_tag_range((void *)addr, size, tag, init);
--- a/mm/slab.h~kasan-slub-fix-hw_tags-zeroing-with-slub_debug
+++ a/mm/slab.h
@@ -723,6 +723,7 @@ static inline void slab_post_alloc_hook(
unsigned int orig_size)
{
unsigned int zero_size = s->object_size;
+ bool kasan_init = init;
size_t i;
flags &= gfp_allowed_mask;
@@ -740,6 +741,17 @@ static inline void slab_post_alloc_hook(
zero_size = orig_size;
/*
+ * When slub_debug is enabled, avoid memory initialization integrated
+ * into KASAN and instead zero out the memory via the memset below with
+ * the proper size. Otherwise, KASAN might overwrite SLUB redzones and
+ * cause false-positive reports. This does not lead to a performance
+ * penalty on production builds, as slub_debug is not intended to be
+ * enabled there.
+ */
+ if (__slub_debug_enabled())
+ kasan_init = false;
+
+ /*
* As memory initialization might be integrated into KASAN,
* kasan_slab_alloc and initialization memset must be
* kept together to avoid discrepancies in behavior.
@@ -747,8 +759,8 @@ static inline void slab_post_alloc_hook(
* As p[i] might get tagged, memset and kmemleak hook come after KASAN.
*/
for (i = 0; i < size; i++) {
- p[i] = kasan_slab_alloc(s, p[i], flags, init);
- if (p[i] && init && !kasan_has_integrated_init())
+ p[i] = kasan_slab_alloc(s, p[i], flags, kasan_init);
+ if (p[i] && init && (!kasan_init || !kasan_has_integrated_init()))
memset(p[i], 0, zero_size);
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, flags);
_
Patches currently in -mm which might be from andreyknvl@google.com are
kasan-fix-type-cast-in-memory_is_poisoned_n.patch
kasan-slub-fix-hw_tags-zeroing-with-slub_debug.patch
The patch titled
Subject: mm: disable CONFIG_PER_VMA_LOCK until its fixed
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-disable-config_per_vma_lock-until-its-fixed.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb@google.com>
Subject: mm: disable CONFIG_PER_VMA_LOCK until its fixed
Date: Tue, 4 Jul 2023 23:37:11 -0700
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Disable the per-VMA locks config to
prevent this issue while the problem is being investigated. This is
expected to be a temporary measure.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
Link: https://lkml.kernel.org/r/20230705063711.2670599-3-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Chris Li <chriscli@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Holger Hoffstätte <holger@applied-asynchrony.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: <regressions@lists.linux.dev>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/Kconfig~mm-disable-config_per_vma_lock-until-its-fixed
+++ a/mm/Kconfig
@@ -1224,8 +1224,9 @@ config ARCH_SUPPORTS_PER_VMA_LOCK
def_bool n
config PER_VMA_LOCK
- def_bool y
+ bool "Enable per-vma locking during page fault handling."
depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
+ depends on BROKEN
help
Allow per-vma locking during page fault handling.
_
Patches currently in -mm which might be from surenb@google.com are
fork-lock-vmas-of-the-parent-process-when-forking.patch
mm-disable-config_per_vma_lock-until-its-fixed.patch
swap-remove-remnants-of-polling-from-read_swap_cache_async.patch
mm-add-missing-vm_fault_result_trace-name-for-vm_fault_completed.patch
mm-drop-per-vma-lock-when-returning-vm_fault_retry-or-vm_fault_completed.patch
mm-change-folio_lock_or_retry-to-use-vm_fault-directly.patch
mm-handle-swap-page-faults-under-per-vma-lock.patch
mm-handle-userfaults-under-vma-lock.patch
mm-disable-config_per_vma_lock-by-default-until-its-fixed.patch
The patch titled
Subject: fork: lock VMAs of the parent process when forking
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fork-lock-vmas-of-the-parent-process-when-forking.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb@google.com>
Subject: fork: lock VMAs of the parent process when forking
Date: Tue, 4 Jul 2023 23:37:10 -0700
Patch series "Avoid memory corruption caused by per-VMA locks", v2.
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86. Based on the reproducer
provided in [1], we suspect this is caused by the lack of VMA locking while
forking a child process.
Patch 1/2 in the series implements proper VMA locking during fork. I
tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show a noticeable regression on a 56-core machine, while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
showed a ~5% regression. If such a fork-time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore the previous performance.
Further optimizations are possible if this regression proves to be
problematic.
Patch 2/2 disables per-VMA locks until the fix is tested and verified.
This patch (of 2):
When forking a child process, the parent write-protects an anonymous page
and COW-shares it with the child being forked using copy_present_pte().
The parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap(). If we get a write fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because it is COW-shared with the child), this might lead to
some stale writable TLB entries targeting the wrong (old) page. A similar
issue happened in the past with userfaultfd (see the flush_tlb_page() call
inside do_wp_page()).
Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during the fork operation and avoids this issue.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show a noticeable regression on a 56-core machine, while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
showed a ~5% regression. If such a fork-time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore the previous performance.
Further optimizations are possible if this regression proves to be
problematic.
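As context for the one-line dup_mmap() hunk below, here is a simplified
sketch of the locking it relies on (condensed and renamed for illustration;
not the exact kernel code):

#include <linux/mm.h>

/* fork side: called under the parent's mmap_lock, as in dup_mmap() */
static void lock_parent_vma(struct vm_area_struct *mpnt)
{
	vma_start_write(mpnt);	/* fence off per-VMA-lock faults */
	/* ... write-protect and COW-share the anonymous pages ... */
}

/* fault side, condensed from lock_vma_under_rcu() */
static struct vm_area_struct *try_vma_locked_fault(struct vm_area_struct *vma)
{
	if (!vma_start_read(vma))	/* fails while fork write-holds it */
		return NULL;		/* caller retries under mmap_lock */
	return vma;
}

Write-locking every parent VMA before copy_present_pte() runs means a
concurrent fault can no longer slip in between the write-protect and the
TLB flush via the per-VMA-lock path.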
Link: https://lkml.kernel.org/r/20230705063711.2670599-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230705063711.2670599-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-as…
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: <regressions@lists.linux.dev>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chriscli@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
kernel/fork.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/fork.c~fork-lock-vmas-of-the-parent-process-when-forking
+++ a/kernel/fork.c
@@ -686,6 +686,7 @@ static __latent_entropy int dup_mmap(str
for_each_vma(old_vmi, mpnt) {
struct file *file;
+ vma_start_write(mpnt);
if (mpnt->vm_flags & VM_DONTCOPY) {
vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
continue;
_
Patches currently in -mm which might be from surenb@google.com are
fork-lock-vmas-of-the-parent-process-when-forking.patch
mm-disable-config_per_vma_lock-until-its-fixed.patch
swap-remove-remnants-of-polling-from-read_swap_cache_async.patch
mm-add-missing-vm_fault_result_trace-name-for-vm_fault_completed.patch
mm-drop-per-vma-lock-when-returning-vm_fault_retry-or-vm_fault_completed.patch
mm-change-folio_lock_or_retry-to-use-vm_fault-directly.patch
mm-handle-swap-page-faults-under-per-vma-lock.patch
mm-handle-userfaults-under-vma-lock.patch
mm-disable-config_per_vma_lock-by-default-until-its-fixed.patch
The soundwire subsystem uses two completion structures that allow
drivers to wait for a soundwire device to become enumerated on the bus
and initialised by its driver, respectively.
The code implementing the signalling is currently broken as it does not
signal all current and future waiters and also uses the wrong
reinitialisation function, which can potentially lead to memory
corruption if there are still waiters on the queue.
Not signalling future waiters specifically breaks sound card probe
deferrals, as codec drivers cannot tell that the soundwire device is
already attached when being reprobed. Some codec runtime PM
implementations suffer from similar problems, as waiting for enumeration
during resume can also time out despite the device already having been
enumerated.
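For reference, a minimal sketch of the completion semantics the fix
depends on (illustrative only, not the driver code; the function names
are made up for the example):

#include <linux/completion.h>

static DECLARE_COMPLETION(enumeration_complete);

/* device attached: release current and future waiters */
static void signal_attached(void)
{
	/*
	 * complete() wakes at most one waiter; complete_all() releases
	 * all current waiters and makes later wait_for_completion()
	 * calls return immediately, until the completion is re-armed.
	 */
	complete_all(&enumeration_complete);
}

/* device dropped off the bus: re-arm for the next attach */
static void rearm_for_next_attach(void)
{
	/*
	 * reinit_completion() only resets the done counter;
	 * init_completion() would also reinitialise the embedded wait
	 * queue, corrupting it if waiters are still queued on it.
	 */
	reinit_completion(&enumeration_complete);
}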
Fixes: fb9469e54fa7 ("soundwire: bus: fix race condition with enumeration_complete signaling")
Fixes: a90def068127 ("soundwire: bus: fix race condition with initialization_complete signaling")
Cc: stable@vger.kernel.org # 5.7
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Rander Wang <rander.wang@linux.intel.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
drivers/soundwire/bus.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
index 1ea6a64f8c4a..66e5dba919fa 100644
--- a/drivers/soundwire/bus.c
+++ b/drivers/soundwire/bus.c
@@ -908,8 +908,8 @@ static void sdw_modify_slave_status(struct sdw_slave *slave,
"initializing enumeration and init completion for Slave %d\n",
slave->dev_num);
- init_completion(&slave->enumeration_complete);
- init_completion(&slave->initialization_complete);
+ reinit_completion(&slave->enumeration_complete);
+ reinit_completion(&slave->initialization_complete);
} else if ((status == SDW_SLAVE_ATTACHED) &&
(slave->status == SDW_SLAVE_UNATTACHED)) {
@@ -917,7 +917,7 @@ static void sdw_modify_slave_status(struct sdw_slave *slave,
"signaling enumeration completion for Slave %d\n",
slave->dev_num);
- complete(&slave->enumeration_complete);
+ complete_all(&slave->enumeration_complete);
}
slave->status = status;
mutex_unlock(&bus->bus_lock);
@@ -1941,7 +1941,7 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
"signaling initialization completion for Slave %d\n",
slave->dev_num);
- complete(&slave->initialization_complete);
+ complete_all(&slave->initialization_complete);
/*
* If the manager became pm_runtime active, the peripherals will be
--
2.39.3