[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Chen Linxuan: backport same logic without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
v1 -> v2: use vma_is_secretmem() instead of directly checking
vma->vm_file->f_op == &secretmem_fops
---
lib/buildid.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..34315d09b544 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+ /* reject secretmem */
+ if (vma_is_secretmem(vma))
+ return -EFAULT;
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Chen Linxuan: backport same logic without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
v1 -> v2: use vma_is_secretmem() instead of directly checking
vma->vm_file->f_op == &secretmem_fops
v2 -> v3: keep original comment
---
lib/buildid.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..8d839ff5548e 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma_is_secretmem(vma))
+ return -EFAULT;
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
From: Waiman Long <longman(a)redhat.com>
[ Upstream commit 85b2b9c16d053364e2004883140538e73b333cdb ]
A circular lock dependency splat has been seen involving down_trylock():
======================================================
WARNING: possible circular locking dependency detected
6.12.0-41.el10.s390x+debug
------------------------------------------------------
dd/32479 is trying to acquire lock:
0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
but task is already holding lock:
000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
the existing dependency chain (in reverse order) is:
-> #4 (&zone->lock){-.-.}-{2:2}:
-> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
-> #2 (&rq->__lock){-.-.}-{2:2}:
-> #1 (&p->pi_lock){-.-.}-{2:2}:
-> #0 ((console_sem).lock){-.-.}-{2:2}:
The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the console_sem lock. This will also make the semaphore's
raw_spinlock become a terminal lock without taking any further locks
underneath it.
The hrtimer_bases.lock is a raw_spinlock while zone->lock is a
spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via
the debug_objects_fill_pool() helper function in the debugobjects code.
-> #4 (&zone->lock){-.-.}-{2:2}:
__lock_acquire+0xe86/0x1cc0
lock_acquire.part.0+0x258/0x630
lock_acquire+0xb8/0xe0
_raw_spin_lock_irqsave+0xb4/0x120
rmqueue_bulk+0xac/0x8f0
__rmqueue_pcplist+0x580/0x830
rmqueue_pcplist+0xfc/0x470
rmqueue.isra.0+0xdec/0x11b0
get_page_from_freelist+0x2ee/0xeb0
__alloc_pages_noprof+0x2c2/0x520
alloc_pages_mpol_noprof+0x1fc/0x4d0
alloc_pages_noprof+0x8c/0xe0
allocate_slab+0x320/0x460
___slab_alloc+0xa58/0x12b0
__slab_alloc.isra.0+0x42/0x60
kmem_cache_alloc_noprof+0x304/0x350
fill_pool+0xf6/0x450
debug_object_activate+0xfe/0x360
enqueue_hrtimer+0x34/0x190
__run_hrtimer+0x3c8/0x4c0
__hrtimer_run_queues+0x1b2/0x260
hrtimer_interrupt+0x316/0x760
do_IRQ+0x9a/0xe0
do_irq_async+0xf6/0x160
Normally a raw_spinlock to spinlock dependency is not legitimate
and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled,
but debug_objects_fill_pool() is an exception as it explicitly
allows this dependency for non-PREEMPT_RT kernel without causing
PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is
legitimate and not a bug.
Anyway, semaphore is the only locking primitive left that is still
using try_to_wake_up() to do wakeup inside critical section, all the
other locking primitives had been migrated to use wake_q to do wakeup
outside of the critical section. It is also possible that there are
other circular locking dependencies involving printk/console_sem or
other existing/new semaphores lurking somewhere which may show up in
the future. Let just do the migration now to wake_q to avoid headache
like this.
Reported-by: yzbot+ed801a886dfdbfe7136d(a)syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman(a)redhat.com>
Signed-off-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/r/20250307232717.1759087-3-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/semaphore.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 9ee381e4d2a4d..a26c915430ba0 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -29,6 +29,7 @@
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
#include <linux/semaphore.h>
#include <linux/spinlock.h>
#include <linux/ftrace.h>
@@ -37,7 +38,7 @@ static noinline void __down(struct semaphore *sem);
static noinline int __down_interruptible(struct semaphore *sem);
static noinline int __down_killable(struct semaphore *sem);
static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
+static noinline void __up(struct semaphore *sem, struct wake_q_head *wake_q);
/**
* down - acquire the semaphore
@@ -182,13 +183,16 @@ EXPORT_SYMBOL(down_timeout);
void up(struct semaphore *sem)
{
unsigned long flags;
+ DEFINE_WAKE_Q(wake_q);
raw_spin_lock_irqsave(&sem->lock, flags);
if (likely(list_empty(&sem->wait_list)))
sem->count++;
else
- __up(sem);
+ __up(sem, &wake_q);
raw_spin_unlock_irqrestore(&sem->lock, flags);
+ if (!wake_q_empty(&wake_q))
+ wake_up_q(&wake_q);
}
EXPORT_SYMBOL(up);
@@ -256,11 +260,12 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long timeout)
return __down_common(sem, TASK_UNINTERRUPTIBLE, timeout);
}
-static noinline void __sched __up(struct semaphore *sem)
+static noinline void __sched __up(struct semaphore *sem,
+ struct wake_q_head *wake_q)
{
struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
struct semaphore_waiter, list);
list_del(&waiter->list);
waiter->up = true;
- wake_up_process(waiter->task);
+ wake_q_add(wake_q, waiter->task);
}
--
2.39.5
From: Waiman Long <longman(a)redhat.com>
[ Upstream commit 85b2b9c16d053364e2004883140538e73b333cdb ]
A circular lock dependency splat has been seen involving down_trylock():
======================================================
WARNING: possible circular locking dependency detected
6.12.0-41.el10.s390x+debug
------------------------------------------------------
dd/32479 is trying to acquire lock:
0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
but task is already holding lock:
000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
the existing dependency chain (in reverse order) is:
-> #4 (&zone->lock){-.-.}-{2:2}:
-> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
-> #2 (&rq->__lock){-.-.}-{2:2}:
-> #1 (&p->pi_lock){-.-.}-{2:2}:
-> #0 ((console_sem).lock){-.-.}-{2:2}:
The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the console_sem lock. This will also make the semaphore's
raw_spinlock become a terminal lock without taking any further locks
underneath it.
The hrtimer_bases.lock is a raw_spinlock while zone->lock is a
spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via
the debug_objects_fill_pool() helper function in the debugobjects code.
-> #4 (&zone->lock){-.-.}-{2:2}:
__lock_acquire+0xe86/0x1cc0
lock_acquire.part.0+0x258/0x630
lock_acquire+0xb8/0xe0
_raw_spin_lock_irqsave+0xb4/0x120
rmqueue_bulk+0xac/0x8f0
__rmqueue_pcplist+0x580/0x830
rmqueue_pcplist+0xfc/0x470
rmqueue.isra.0+0xdec/0x11b0
get_page_from_freelist+0x2ee/0xeb0
__alloc_pages_noprof+0x2c2/0x520
alloc_pages_mpol_noprof+0x1fc/0x4d0
alloc_pages_noprof+0x8c/0xe0
allocate_slab+0x320/0x460
___slab_alloc+0xa58/0x12b0
__slab_alloc.isra.0+0x42/0x60
kmem_cache_alloc_noprof+0x304/0x350
fill_pool+0xf6/0x450
debug_object_activate+0xfe/0x360
enqueue_hrtimer+0x34/0x190
__run_hrtimer+0x3c8/0x4c0
__hrtimer_run_queues+0x1b2/0x260
hrtimer_interrupt+0x316/0x760
do_IRQ+0x9a/0xe0
do_irq_async+0xf6/0x160
Normally a raw_spinlock to spinlock dependency is not legitimate
and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled,
but debug_objects_fill_pool() is an exception as it explicitly
allows this dependency for non-PREEMPT_RT kernel without causing
PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is
legitimate and not a bug.
Anyway, semaphore is the only locking primitive left that is still
using try_to_wake_up() to do wakeup inside critical section, all the
other locking primitives had been migrated to use wake_q to do wakeup
outside of the critical section. It is also possible that there are
other circular locking dependencies involving printk/console_sem or
other existing/new semaphores lurking somewhere which may show up in
the future. Let just do the migration now to wake_q to avoid headache
like this.
Reported-by: yzbot+ed801a886dfdbfe7136d(a)syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman(a)redhat.com>
Signed-off-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/r/20250307232717.1759087-3-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/semaphore.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 34bfae72f2952..de9117c0e671e 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -29,6 +29,7 @@
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
#include <linux/semaphore.h>
#include <linux/spinlock.h>
#include <linux/ftrace.h>
@@ -38,7 +39,7 @@ static noinline void __down(struct semaphore *sem);
static noinline int __down_interruptible(struct semaphore *sem);
static noinline int __down_killable(struct semaphore *sem);
static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
+static noinline void __up(struct semaphore *sem, struct wake_q_head *wake_q);
/**
* down - acquire the semaphore
@@ -183,13 +184,16 @@ EXPORT_SYMBOL(down_timeout);
void __sched up(struct semaphore *sem)
{
unsigned long flags;
+ DEFINE_WAKE_Q(wake_q);
raw_spin_lock_irqsave(&sem->lock, flags);
if (likely(list_empty(&sem->wait_list)))
sem->count++;
else
- __up(sem);
+ __up(sem, &wake_q);
raw_spin_unlock_irqrestore(&sem->lock, flags);
+ if (!wake_q_empty(&wake_q))
+ wake_up_q(&wake_q);
}
EXPORT_SYMBOL(up);
@@ -269,11 +273,12 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long timeout)
return __down_common(sem, TASK_UNINTERRUPTIBLE, timeout);
}
-static noinline void __sched __up(struct semaphore *sem)
+static noinline void __sched __up(struct semaphore *sem,
+ struct wake_q_head *wake_q)
{
struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
struct semaphore_waiter, list);
list_del(&waiter->list);
waiter->up = true;
- wake_up_process(waiter->task);
+ wake_q_add(wake_q, waiter->task);
}
--
2.39.5
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 5daa0c35a1f0e7a6c3b8ba9cb721e7d1ace6e619
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031621-july-parkway-796f@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5daa0c35a1f0e7a6c3b8ba9cb721e7d1ace6e619 Mon Sep 17 00:00:00 2001
From: Matthew Maurer <mmaurer(a)google.com>
Date: Wed, 8 Jan 2025 23:35:08 +0000
Subject: [PATCH] rust: Disallow BTF generation with Rust + LTO
The kernel cannot currently self-parse BTF containing Rust debug
information. pahole uses the language of the CU to determine whether to
filter out debug information when generating the BTF. When LTO is
enabled, Rust code can cross CU boundaries, resulting in Rust debug
information in CUs labeled as C. This results in a system which cannot
parse its own BTF.
Signed-off-by: Matthew Maurer <mmaurer(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: c1177979af9c ("btf, scripts: Exclude Rust CUs with pahole")
Link: https://lore.kernel.org/r/20250108-rust-btf-lto-incompat-v1-1-60243ff6d820@…
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
diff --git a/init/Kconfig b/init/Kconfig
index d0d021b3fa3b..324c2886b2ea 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1973,7 +1973,7 @@ config RUST
depends on !MODVERSIONS || GENDWARFKSYMS
depends on !GCC_PLUGIN_RANDSTRUCT
depends on !RANDSTRUCT
- depends on !DEBUG_INFO_BTF || PAHOLE_HAS_LANG_EXCLUDE
+ depends on !DEBUG_INFO_BTF || (PAHOLE_HAS_LANG_EXCLUDE && !LTO)
depends on !CFI_CLANG || HAVE_CFI_ICALL_NORMALIZE_INTEGERS_RUSTC
select CFI_ICALL_NORMALIZE_INTEGERS if CFI_CLANG
depends on !CALL_PADDING || RUSTC_VERSION >= 108100
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5daa0c35a1f0e7a6c3b8ba9cb721e7d1ace6e619
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031622-rocker-narrow-b1e3@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5daa0c35a1f0e7a6c3b8ba9cb721e7d1ace6e619 Mon Sep 17 00:00:00 2001
From: Matthew Maurer <mmaurer(a)google.com>
Date: Wed, 8 Jan 2025 23:35:08 +0000
Subject: [PATCH] rust: Disallow BTF generation with Rust + LTO
The kernel cannot currently self-parse BTF containing Rust debug
information. pahole uses the language of the CU to determine whether to
filter out debug information when generating the BTF. When LTO is
enabled, Rust code can cross CU boundaries, resulting in Rust debug
information in CUs labeled as C. This results in a system which cannot
parse its own BTF.
Signed-off-by: Matthew Maurer <mmaurer(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: c1177979af9c ("btf, scripts: Exclude Rust CUs with pahole")
Link: https://lore.kernel.org/r/20250108-rust-btf-lto-incompat-v1-1-60243ff6d820@…
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
diff --git a/init/Kconfig b/init/Kconfig
index d0d021b3fa3b..324c2886b2ea 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1973,7 +1973,7 @@ config RUST
depends on !MODVERSIONS || GENDWARFKSYMS
depends on !GCC_PLUGIN_RANDSTRUCT
depends on !RANDSTRUCT
- depends on !DEBUG_INFO_BTF || PAHOLE_HAS_LANG_EXCLUDE
+ depends on !DEBUG_INFO_BTF || (PAHOLE_HAS_LANG_EXCLUDE && !LTO)
depends on !CFI_CLANG || HAVE_CFI_ICALL_NORMALIZE_INTEGERS_RUSTC
select CFI_ICALL_NORMALIZE_INTEGERS if CFI_CLANG
depends on !CALL_PADDING || RUSTC_VERSION >= 108100
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 9360dfe4cbd62ff1eb8217b815964931523b75b3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031611-flatly-revolving-8c7b@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9360dfe4cbd62ff1eb8217b815964931523b75b3 Mon Sep 17 00:00:00 2001
From: Andrea Righi <arighi(a)nvidia.com>
Date: Mon, 3 Mar 2025 18:51:59 +0100
Subject: [PATCH] sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()
If a BPF scheduler provides an invalid CPU (outside the nr_cpu_ids
range) as prev_cpu to scx_bpf_select_cpu_dfl() it can cause a kernel
crash.
To prevent this, validate prev_cpu in scx_bpf_select_cpu_dfl() and
trigger an scx error if an invalid CPU is specified.
Fixes: f0e1a0643a59b ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable(a)vger.kernel.org # v6.12+
Signed-off-by: Andrea Righi <arighi(a)nvidia.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 0f1da199cfc7..7b9dfee858e7 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6422,6 +6422,9 @@ static bool check_builtin_idle_enabled(void)
__bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
u64 wake_flags, bool *is_idle)
{
+ if (!ops_cpu_valid(prev_cpu, NULL))
+ goto prev_cpu;
+
if (!check_builtin_idle_enabled())
goto prev_cpu;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 9360dfe4cbd62ff1eb8217b815964931523b75b3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031614-broiler-overgrown-4bd4@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9360dfe4cbd62ff1eb8217b815964931523b75b3 Mon Sep 17 00:00:00 2001
From: Andrea Righi <arighi(a)nvidia.com>
Date: Mon, 3 Mar 2025 18:51:59 +0100
Subject: [PATCH] sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()
If a BPF scheduler provides an invalid CPU (outside the nr_cpu_ids
range) as prev_cpu to scx_bpf_select_cpu_dfl() it can cause a kernel
crash.
To prevent this, validate prev_cpu in scx_bpf_select_cpu_dfl() and
trigger an scx error if an invalid CPU is specified.
Fixes: f0e1a0643a59b ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable(a)vger.kernel.org # v6.12+
Signed-off-by: Andrea Righi <arighi(a)nvidia.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 0f1da199cfc7..7b9dfee858e7 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6422,6 +6422,9 @@ static bool check_builtin_idle_enabled(void)
__bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
u64 wake_flags, bool *is_idle)
{
+ if (!ops_cpu_valid(prev_cpu, NULL))
+ goto prev_cpu;
+
if (!check_builtin_idle_enabled())
goto prev_cpu;
Set the MMC_CAP_NEED_RSP_BUSY capability for the sdhci-pxav3 host to
prevent conversion of R1B responses to R1. Without this, the eMMC card
in the samsung,coreprimevelte smartphone using the Marvell PXA1908 SoC
with this mmc host doesn't probe with the ETIMEDOUT error originating in
__mmc_poll_for_busy.
Note that the other issues reported for this phone and host, namely
floods of "Tuning failed, falling back to fixed sampling clock" dmesg
messages for the eMMC and unstable SDIO are not mitigated by this
change.
Link: https://lore.kernel.org/r/20200310153340.5593-1-ulf.hansson@linaro.org/
Link: https://lore.kernel.org/r/D7204PWIGQGI.1FRFQPPIEE2P9@matfyz.cz/
Link: https://lore.kernel.org/r/20250115-pxa1908-lkml-v14-0-847d24f3665a@skole.hr/
Cc: Duje Mihanović <duje.mihanovic(a)skole.hr>
Cc: stable(a)vger.kernel.org
Signed-off-by: Karel Balej <balejk(a)matfyz.cz>
---
drivers/mmc/host/sdhci-pxav3.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/mmc/host/sdhci-pxav3.c b/drivers/mmc/host/sdhci-pxav3.c
index 990723a008ae..3fb56face3d8 100644
--- a/drivers/mmc/host/sdhci-pxav3.c
+++ b/drivers/mmc/host/sdhci-pxav3.c
@@ -399,6 +399,7 @@ static int sdhci_pxav3_probe(struct platform_device *pdev)
if (!IS_ERR(pxa->clk_core))
clk_prepare_enable(pxa->clk_core);
+ host->mmc->caps |= MMC_CAP_NEED_RSP_BUSY;
/* enable 1/8V DDR capable */
host->mmc->caps |= MMC_CAP_1_8V_DDR;
--
2.48.1
Add device awake calls in case of rproc boot and rproc shutdown path.
Currently, device awake call is only present in the recovery path
of remoteproc. If a user stops and starts rproc by using the sysfs
interface, then on pm suspension the firmware loading fails. Keep the
device awake in such a case just like it is done for the recovery path.
Fixes: a781e5aa59110 ("remoteproc: core: Prevent system suspend during remoteproc recovery")
Signed-off-by: Souradeep Chowdhury <quic_schowdhu(a)quicinc.com>
Cc: stable(a)vger.kernel.org
---
drivers/remoteproc/remoteproc_core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c2cf0d277729..908a7b8f6c7e 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1916,7 +1916,8 @@ int rproc_boot(struct rproc *rproc)
pr_err("invalid rproc handle\n");
return -EINVAL;
}
-
+
+ pm_stay_awake(rproc->dev.parent);
dev = &rproc->dev;
ret = mutex_lock_interruptible(&rproc->lock);
@@ -1961,6 +1962,7 @@ int rproc_boot(struct rproc *rproc)
atomic_dec(&rproc->power);
unlock_mutex:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_boot);
@@ -1991,6 +1993,7 @@ int rproc_shutdown(struct rproc *rproc)
struct device *dev = &rproc->dev;
int ret = 0;
+ pm_stay_awake(rproc->dev.parent);
ret = mutex_lock_interruptible(&rproc->lock);
if (ret) {
dev_err(dev, "can't lock rproc %s: %d\n", rproc->name, ret);
@@ -2027,6 +2030,7 @@ int rproc_shutdown(struct rproc *rproc)
rproc->table_ptr = NULL;
out:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_shutdown);
--
2.34.1
Add device awake calls in case of rproc boot and rproc shutdown path.
Currently, device awake call is only present in the recovery path
of remoteproc. If a user stops and starts rproc by using the sysfs
interface, then on pm suspension the firmware loading fails. Keep the
device awake in such a case just like it is done for the recovery path.
Signed-off-by: Souradeep Chowdhury <quic_schowdhu(a)quicinc.com>
---
drivers/remoteproc/remoteproc_core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c2cf0d277729..908a7b8f6c7e 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1916,7 +1916,8 @@ int rproc_boot(struct rproc *rproc)
pr_err("invalid rproc handle\n");
return -EINVAL;
}
-
+
+ pm_stay_awake(rproc->dev.parent);
dev = &rproc->dev;
ret = mutex_lock_interruptible(&rproc->lock);
@@ -1961,6 +1962,7 @@ int rproc_boot(struct rproc *rproc)
atomic_dec(&rproc->power);
unlock_mutex:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_boot);
@@ -1991,6 +1993,7 @@ int rproc_shutdown(struct rproc *rproc)
struct device *dev = &rproc->dev;
int ret = 0;
+ pm_stay_awake(rproc->dev.parent);
ret = mutex_lock_interruptible(&rproc->lock);
if (ret) {
dev_err(dev, "can't lock rproc %s: %d\n", rproc->name, ret);
@@ -2027,6 +2030,7 @@ int rproc_shutdown(struct rproc *rproc)
rproc->table_ptr = NULL;
out:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_shutdown);
--
2.34.1
Add device awake calls in case of rproc boot and rproc shutdown path.
Currently, device awake call is only present in the recovery path
of remoteproc. If a user stops and starts rproc by using the sysfs
interface, then on pm suspension the firmware loading fails. Keep the
device awake in such a case just like it is done for the recovery path.
Signed-off-by: Souradeep Chowdhury <quic_schowdhu(a)quicinc.com>
Cc: stable(a)vger.kernel.org
---
drivers/remoteproc/remoteproc_core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index c2cf0d277729..908a7b8f6c7e 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1916,7 +1916,8 @@ int rproc_boot(struct rproc *rproc)
pr_err("invalid rproc handle\n");
return -EINVAL;
}
-
+
+ pm_stay_awake(rproc->dev.parent);
dev = &rproc->dev;
ret = mutex_lock_interruptible(&rproc->lock);
@@ -1961,6 +1962,7 @@ int rproc_boot(struct rproc *rproc)
atomic_dec(&rproc->power);
unlock_mutex:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_boot);
@@ -1991,6 +1993,7 @@ int rproc_shutdown(struct rproc *rproc)
struct device *dev = &rproc->dev;
int ret = 0;
+ pm_stay_awake(rproc->dev.parent);
ret = mutex_lock_interruptible(&rproc->lock);
if (ret) {
dev_err(dev, "can't lock rproc %s: %d\n", rproc->name, ret);
@@ -2027,6 +2030,7 @@ int rproc_shutdown(struct rproc *rproc)
rproc->table_ptr = NULL;
out:
mutex_unlock(&rproc->lock);
+ pm_relax(rproc->dev.parent);
return ret;
}
EXPORT_SYMBOL(rproc_shutdown);
--
2.34.1
The quilt patch titled
Subject: sparc/mm: avoid calling arch_enter/leave_lazy_mmu() in set_ptes
has been removed from the -mm tree. Its filename was
sparc-mm-avoid-calling-arch_enter-leave_lazy_mmu-in-set_ptes.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: sparc/mm: avoid calling arch_enter/leave_lazy_mmu() in set_ptes
Date: Mon, 3 Mar 2025 14:15:38 +0000
With commit 1a10a44dfc1d ("sparc64: implement the new page table range
API") set_ptes was added to the sparc architecture. The implementation
included calling arch_enter/leave_lazy_mmu() calls.
The patch removes the usage of arch_enter/leave_lazy_mmu() since this
implies nesting of lazy mmu regions which is not supported. Without this
fix, lazy mmu mode is effectively disabled because we exit the mode after
the first set_ptes:
remap_pte_range()
-> arch_enter_lazy_mmu()
-> set_ptes()
-> arch_enter_lazy_mmu()
-> arch_leave_lazy_mmu()
-> arch_leave_lazy_mmu()
Powerpc suffered the same problem and fixed it in a corresponding way with
commit 47b8def9358c ("powerpc/mm: Avoid calling
arch_enter/leave_lazy_mmu() in set_ptes").
Link: https://lkml.kernel.org/r/20250303141542.3371656-5-ryan.roberts@arm.com
Fixes: 1a10a44dfc1d ("sparc64: implement the new page table range API")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Andreas Larsson <andreas(a)gaisler.com>
Acked-by: Juergen Gross <jgross(a)suse.com>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Juegren Gross <jgross(a)suse.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/sparc/include/asm/pgtable_64.h | 2 --
1 file changed, 2 deletions(-)
--- a/arch/sparc/include/asm/pgtable_64.h~sparc-mm-avoid-calling-arch_enter-leave_lazy_mmu-in-set_ptes
+++ a/arch/sparc/include/asm/pgtable_64.h
@@ -936,7 +936,6 @@ static inline void __set_pte_at(struct m
static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
{
- arch_enter_lazy_mmu_mode();
for (;;) {
__set_pte_at(mm, addr, ptep, pte, 0);
if (--nr == 0)
@@ -945,7 +944,6 @@ static inline void set_ptes(struct mm_st
pte_val(pte) += PAGE_SIZE;
addr += PAGE_SIZE;
}
- arch_leave_lazy_mmu_mode();
}
#define set_ptes set_ptes
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-use-ptep_get-instead-of-directly-dereferencing-pte_t.patch
The quilt patch titled
Subject: sparc/mm: disable preemption in lazy mmu mode
has been removed from the -mm tree. Its filename was
sparc-mm-disable-preemption-in-lazy-mmu-mode.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: sparc/mm: disable preemption in lazy mmu mode
Date: Mon, 3 Mar 2025 14:15:37 +0000
Since commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy
updates") it's been possible for arch_[enter|leave]_lazy_mmu_mode() to be
called without holding a page table lock (for the kernel mappings case),
and therefore it is possible that preemption may occur while in the lazy
mmu mode. The Sparc lazy mmu implementation is not robust to preemption
since it stores the lazy mode state in a per-cpu structure and does not
attempt to manage that state on task switch.
Powerpc had the same issue and fixed it by explicitly disabling preemption
in arch_enter_lazy_mmu_mode() and re-enabling in
arch_leave_lazy_mmu_mode(). See commit b9ef323ea168 ("powerpc/64s:
Disable preemption in hash lazy mmu mode").
Given Sparc's lazy mmu mode is based on powerpc's, let's fix it in the
same way here.
Link: https://lkml.kernel.org/r/20250303141542.3371656-4-ryan.roberts@arm.com
Fixes: 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy updates")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Andreas Larsson <andreas(a)gaisler.com>
Acked-by: Juergen Gross <jgross(a)suse.com>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Juegren Gross <jgross(a)suse.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/sparc/mm/tlb.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/sparc/mm/tlb.c~sparc-mm-disable-preemption-in-lazy-mmu-mode
+++ a/arch/sparc/mm/tlb.c
@@ -52,8 +52,10 @@ out:
void arch_enter_lazy_mmu_mode(void)
{
- struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);
+ struct tlb_batch *tb;
+ preempt_disable();
+ tb = this_cpu_ptr(&tlb_batch);
tb->active = 1;
}
@@ -64,6 +66,7 @@ void arch_leave_lazy_mmu_mode(void)
if (tb->tlb_nr)
flush_tlb_pending();
tb->active = 0;
+ preempt_enable();
}
static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr,
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-use-ptep_get-instead-of-directly-dereferencing-pte_t.patch
The quilt patch titled
Subject: mm: fix lazy mmu docs and usage
has been removed from the -mm tree. Its filename was
mm-fix-lazy-mmu-docs-and-usage.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: fix lazy mmu docs and usage
Date: Mon, 3 Mar 2025 14:15:35 +0000
Patch series "Fix lazy mmu mode", v2.
I'm planning to implement lazy mmu mode for arm64 to optimize vmalloc. As
part of that, I will extend lazy mmu mode to cover kernel mappings in
vmalloc table walkers. While lazy mmu mode is already used for kernel
mappings in a few places, this will extend it's use significantly.
Having reviewed the existing lazy mmu implementations in powerpc, sparc
and x86, it looks like there are a bunch of bugs, some of which may be
more likely to trigger once I extend the use of lazy mmu. So this series
attempts to clarify the requirements and fix all the bugs in advance of
that series. See patch #1 commit log for all the details.
This patch (of 5):
The docs, implementations and use of arch_[enter|leave]_lazy_mmu_mode() is
a bit of a mess (to put it politely). There are a number of issues
related to nesting of lazy mmu regions and confusion over whether the
task, when in a lazy mmu region, is preemptible or not. Fix all the
issues relating to the core-mm. Follow up commits will fix the
arch-specific implementations. 3 arches implement lazy mmu; powerpc,
sparc and x86.
When arch_[enter|leave]_lazy_mmu_mode() was first introduced by commit
6606c3e0da53 ("[PATCH] paravirt: lazy mmu mode hooks.patch"), it was
expected that lazy mmu regions would never nest and that the appropriate
page table lock(s) would be held while in the region, thus ensuring the
region is non-preemptible. Additionally lazy mmu regions were only used
during manipulation of user mappings.
Commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy
updates") started invoking the lazy mmu mode in apply_to_pte_range(),
which is used for both user and kernel mappings. For kernel mappings the
region is no longer protected by any lock so there is no longer any
guarantee about non-preemptibility. Additionally, for RT configs, the
holding the PTL only implies no CPU migration, it doesn't prevent
preemption.
Commit bcc6cc832573 ("mm: add default definition of set_ptes()") added
arch_[enter|leave]_lazy_mmu_mode() to the default implementation of
set_ptes(), used by x86. So after this commit, lazy mmu regions can be
nested. Additionally commit 1a10a44dfc1d ("sparc64: implement the new
page table range API") and commit 9fee28baa601 ("powerpc: implement the
new page table range API") did the same for the sparc and powerpc
set_ptes() overrides.
powerpc couldn't deal with preemption so avoids it in commit b9ef323ea168
("powerpc/64s: Disable preemption in hash lazy mmu mode"), which
explicitly disables preemption for the whole region in its implementation.
x86 can support preemption (or at least it could until it tried to add
support nesting; more on this below). Sparc looks to be totally broken in
the face of preemption, as far as I can tell.
powerpc can't deal with nesting, so avoids it in commit 47b8def9358c
("powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes"),
which removes the lazy mmu calls from its implementation of set_ptes().
x86 attempted to support nesting in commit 49147beb0ccb ("x86/xen: allow
nesting of same lazy mode") but as far as I can tell, this breaks its
support for preemption.
In short, it's all a mess; the semantics for
arch_[enter|leave]_lazy_mmu_mode() are not clearly defined and as a result
the implementations all have different expectations, sticking plasters and
bugs.
arm64 is aiming to start using these hooks, so let's clean everything up
before adding an arm64 implementation. Update the documentation to state
that lazy mmu regions can never be nested, must not be called in interrupt
context and preemption may or may not be enabled for the duration of the
region. And fix the generic implementation of set_ptes() to avoid
nesting.
arch-specific fixes to conform to the new spec will proceed this one.
These issues were spotted by code review and I have no evidence of issues
being reported in the wild.
Link: https://lkml.kernel.org/r/20250303141542.3371656-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20250303141542.3371656-2-ryan.roberts@arm.com
Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Juergen Gross <jgross(a)suse.com>
Cc: Andreas Larsson <andreas(a)gaisler.com>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Juegren Gross <jgross(a)suse.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pgtable.h | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
--- a/include/linux/pgtable.h~mm-fix-lazy-mmu-docs-and-usage
+++ a/include/linux/pgtable.h
@@ -222,10 +222,14 @@ static inline int pmd_dirty(pmd_t pmd)
* hazard could result in the direct mode hypervisor case, since the actual
* write to the page tables may not yet have taken place, so reads though
* a raw PTE pointer after it has been modified are not guaranteed to be
- * up to date. This mode can only be entered and left under the protection of
- * the page table locks for all page tables which may be modified. In the UP
- * case, this is required so that preemption is disabled, and in the SMP case,
- * it must synchronize the delayed page table writes properly on other CPUs.
+ * up to date.
+ *
+ * In the general case, no lock is guaranteed to be held between entry and exit
+ * of the lazy mode. So the implementation must assume preemption may be enabled
+ * and cpu migration is possible; it must take steps to be robust against this.
+ * (In practice, for user PTE updates, the appropriate page table lock(s) are
+ * held, but for kernel PTE updates, no lock is held). Nesting is not permitted
+ * and the mode cannot be used in interrupt context.
*/
#ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
#define arch_enter_lazy_mmu_mode() do {} while (0)
@@ -287,7 +291,6 @@ static inline void set_ptes(struct mm_st
{
page_table_check_ptes_set(mm, ptep, pte, nr);
- arch_enter_lazy_mmu_mode();
for (;;) {
set_pte(ptep, pte);
if (--nr == 0)
@@ -295,7 +298,6 @@ static inline void set_ptes(struct mm_st
ptep++;
pte = pte_next_pfn(pte);
}
- arch_leave_lazy_mmu_mode();
}
#endif
#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-use-ptep_get-instead-of-directly-dereferencing-pte_t.patch
The quilt patch titled
Subject: mm: make page_mapped_in_vma() hugetlb walk aware
has been removed from the -mm tree. Its filename was
mm-make-page_mapped_in_vma-hugetlb-walk-aware.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jane Chu <jane.chu(a)oracle.com>
Subject: mm: make page_mapped_in_vma() hugetlb walk aware
Date: Mon, 24 Feb 2025 14:14:45 -0700
When a process consumes a UE in a page, the memory failure handler
attempts to collect information for a potential SIGBUS. If the page is an
anonymous page, page_mapped_in_vma(page, vma) is invoked in order to
1. retrieve the vaddr from the process' address space,
2. verify that the vaddr is indeed mapped to the poisoned page,
where 'page' is the precise small page with UE.
It's been observed that when injecting poison to a non-head subpage of an
anonymous hugetlb page, no SIGBUS shows up, while injecting to the head
page produces a SIGBUS. The cause is that, though hugetlb_walk() returns
a valid pmd entry (on x86), but check_pte() detects mismatch between the
head page per the pmd and the input subpage. Thus the vaddr is considered
not mapped to the subpage and the process is not collected for SIGBUS
purpose. This is the calling stack:
collect_procs_anon
page_mapped_in_vma
page_vma_mapped_walk
hugetlb_walk
huge_pte_lock
check_pte
check_pte() header says that it
"check if [pvmw->pfn, @pvmw->pfn + @pvmw->nr_pages) is mapped at the @pvmw->pte"
but practically works only if pvmw->pfn is the head page pfn at pvmw->pte.
Hindsight acknowledging that some pvmw->pte could point to a hugepage of
some sort such that it makes sense to make check_pte() work for hugepage.
Link: https://lkml.kernel.org/r/20250224211445.2663312-1-jane.chu@oracle.com
Signed-off-by: Jane Chu <jane.chu(a)oracle.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kirill A. Shuemov <kirill.shutemov(a)linux.intel.com>
Cc: linmiaohe <linmiaohe(a)huawei.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
--- a/mm/page_vma_mapped.c~mm-make-page_mapped_in_vma-hugetlb-walk-aware
+++ a/mm/page_vma_mapped.c
@@ -84,6 +84,7 @@ again:
* mapped at the @pvmw->pte
* @pvmw: page_vma_mapped_walk struct, includes a pair pte and pfn range
* for checking
+ * @pte_nr: the number of small pages described by @pvmw->pte.
*
* page_vma_mapped_walk() found a place where pfn range is *potentially*
* mapped. check_pte() has to validate this.
@@ -100,7 +101,7 @@ again:
* Otherwise, return false.
*
*/
-static bool check_pte(struct page_vma_mapped_walk *pvmw)
+static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
{
unsigned long pfn;
pte_t ptent = ptep_get(pvmw->pte);
@@ -132,7 +133,11 @@ static bool check_pte(struct page_vma_ma
pfn = pte_pfn(ptent);
}
- return (pfn - pvmw->pfn) < pvmw->nr_pages;
+ if ((pfn + pte_nr - 1) < pvmw->pfn)
+ return false;
+ if (pfn > (pvmw->pfn + pvmw->nr_pages - 1))
+ return false;
+ return true;
}
/* Returns true if the two ranges overlap. Careful to not overflow. */
@@ -207,7 +212,7 @@ bool page_vma_mapped_walk(struct page_vm
return false;
pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
- if (!check_pte(pvmw))
+ if (!check_pte(pvmw, pages_per_huge_page(hstate)))
return not_found(pvmw);
return true;
}
@@ -290,7 +295,7 @@ restart:
goto next_pte;
}
this_pte:
- if (check_pte(pvmw))
+ if (check_pte(pvmw, 1))
return true;
next_pte:
do {
_
Patches currently in -mm which might be from jane.chu(a)oracle.com are
The quilt patch titled
Subject: mm/damon: avoid applying DAMOS action to same entity multiple times
has been removed from the -mm tree. Its filename was
mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon: avoid applying DAMOS action to same entity multiple times
Date: Fri, 7 Feb 2025 13:20:33 -0800
'paddr' DAMON operations set can apply a DAMOS scheme's action to a large
folio multiple times in single DAMOS-regions-walk if the folio is laid on
multiple DAMON regions. Add a field for DAMOS scheme object that can be
used by the underlying ops to know what was the last entity that the
scheme's action has applied. The core layer unsets the field when each
DAMOS-regions-walk is done for the given scheme. And update 'paddr' ops
to use the infrastructure to avoid the problem.
Link: https://lkml.kernel.org/r/20250207212033.45269-3-sj@kernel.org
Fixes: 57223ac29584 ("mm/damon/paddr: support the pageout scheme")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: Usama Arif <usamaarif642(a)gmail.com>
Closes: https://lore.kernel.org/20250203225604.44742-3-usamaarif642@gmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/damon.h | 11 +++++++++++
mm/damon/core.c | 1 +
mm/damon/paddr.c | 39 +++++++++++++++++++++++++++------------
3 files changed, 39 insertions(+), 12 deletions(-)
--- a/include/linux/damon.h~mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times
+++ a/include/linux/damon.h
@@ -432,6 +432,7 @@ struct damos_access_pattern {
* @wmarks: Watermarks for automated (in)activation of this scheme.
* @target_nid: Destination node if @action is "migrate_{hot,cold}".
* @filters: Additional set of &struct damos_filter for &action.
+ * @last_applied: Last @action applied ops-managing entity.
* @stat: Statistics of this scheme.
* @list: List head for siblings.
*
@@ -454,6 +455,15 @@ struct damos_access_pattern {
* implementation could check pages of the region and skip &action to respect
* &filters
*
+ * The minimum entity that @action can be applied depends on the underlying
+ * &struct damon_operations. Since it may not be aligned with the core layer
+ * abstract, namely &struct damon_region, &struct damon_operations could apply
+ * @action to same entity multiple times. Large folios that underlying on
+ * multiple &struct damon region objects could be such examples. The &struct
+ * damon_operations can use @last_applied to avoid that. DAMOS core logic
+ * unsets @last_applied when each regions walking for applying the scheme is
+ * finished.
+ *
* After applying the &action to each region, &stat_count and &stat_sz is
* updated to reflect the number of regions and total size of regions that the
* &action is applied.
@@ -482,6 +492,7 @@ struct damos {
int target_nid;
};
struct list_head filters;
+ void *last_applied;
struct damos_stat stat;
struct list_head list;
};
--- a/mm/damon/core.c~mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times
+++ a/mm/damon/core.c
@@ -1856,6 +1856,7 @@ static void kdamond_apply_schemes(struct
s->next_apply_sis = c->passed_sample_intervals +
(s->apply_interval_us ? s->apply_interval_us :
c->attrs.aggr_interval) / sample_interval;
+ s->last_applied = NULL;
}
}
--- a/mm/damon/paddr.c~mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times
+++ a/mm/damon/paddr.c
@@ -254,6 +254,17 @@ static bool damos_pa_filter_out(struct d
return false;
}
+static bool damon_pa_invalid_damos_folio(struct folio *folio, struct damos *s)
+{
+ if (!folio)
+ return true;
+ if (folio == s->last_applied) {
+ folio_put(folio);
+ return true;
+ }
+ return false;
+}
+
static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s,
unsigned long *sz_filter_passed)
{
@@ -261,6 +272,7 @@ static unsigned long damon_pa_pageout(st
LIST_HEAD(folio_list);
bool install_young_filter = true;
struct damos_filter *filter;
+ struct folio *folio;
/* check access in page level again by default */
damos_for_each_filter(filter, s) {
@@ -279,9 +291,8 @@ static unsigned long damon_pa_pageout(st
addr = r->ar.start;
while (addr < r->ar.end) {
- struct folio *folio = damon_get_folio(PHYS_PFN(addr));
-
- if (!folio) {
+ folio = damon_get_folio(PHYS_PFN(addr));
+ if (damon_pa_invalid_damos_folio(folio, s)) {
addr += PAGE_SIZE;
continue;
}
@@ -307,6 +318,7 @@ put_folio:
damos_destroy_filter(filter);
applied = reclaim_pages(&folio_list);
cond_resched();
+ s->last_applied = folio;
return applied * PAGE_SIZE;
}
@@ -315,12 +327,12 @@ static inline unsigned long damon_pa_mar
unsigned long *sz_filter_passed)
{
unsigned long addr, applied = 0;
+ struct folio *folio;
addr = r->ar.start;
while (addr < r->ar.end) {
- struct folio *folio = damon_get_folio(PHYS_PFN(addr));
-
- if (!folio) {
+ folio = damon_get_folio(PHYS_PFN(addr));
+ if (damon_pa_invalid_damos_folio(folio, s)) {
addr += PAGE_SIZE;
continue;
}
@@ -339,6 +351,7 @@ put_folio:
addr += folio_size(folio);
folio_put(folio);
}
+ s->last_applied = folio;
return applied * PAGE_SIZE;
}
@@ -482,12 +495,12 @@ static unsigned long damon_pa_migrate(st
{
unsigned long addr, applied;
LIST_HEAD(folio_list);
+ struct folio *folio;
addr = r->ar.start;
while (addr < r->ar.end) {
- struct folio *folio = damon_get_folio(PHYS_PFN(addr));
-
- if (!folio) {
+ folio = damon_get_folio(PHYS_PFN(addr));
+ if (damon_pa_invalid_damos_folio(folio, s)) {
addr += PAGE_SIZE;
continue;
}
@@ -506,6 +519,7 @@ put_folio:
}
applied = damon_pa_migrate_pages(&folio_list, s->target_nid);
cond_resched();
+ s->last_applied = folio;
return applied * PAGE_SIZE;
}
@@ -523,15 +537,15 @@ static unsigned long damon_pa_stat(struc
{
unsigned long addr;
LIST_HEAD(folio_list);
+ struct folio *folio;
if (!damon_pa_scheme_has_filter(s))
return 0;
addr = r->ar.start;
while (addr < r->ar.end) {
- struct folio *folio = damon_get_folio(PHYS_PFN(addr));
-
- if (!folio) {
+ folio = damon_get_folio(PHYS_PFN(addr));
+ if (damon_pa_invalid_damos_folio(folio, s)) {
addr += PAGE_SIZE;
continue;
}
@@ -541,6 +555,7 @@ static unsigned long damon_pa_stat(struc
addr += folio_size(folio);
folio_put(folio);
}
+ s->last_applied = folio;
return 0;
}
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-sysfs-schemes-let-damon_sysfs_scheme_set_filters-be-used-for-different-named-directories.patch
mm-damon-sysfs-schemes-implement-core_filters-and-ops_filters-directories.patch
mm-damon-sysfs-schemes-commit-filters-in-coreops_filters-directories.patch
mm-damon-core-expose-damos_filter_for_ops-to-damon-kernel-api-callers.patch
mm-damon-sysfs-schemes-record-filters-of-which-layer-should-be-added-to-the-given-filters-directory.patch
mm-damon-sysfs-schemes-return-error-when-for-attempts-to-install-filters-on-wrong-sysfs-directory.patch
docs-abi-damon-document-coreops_filters-directories.patch
docs-admin-guide-mm-damon-usage-update-for-coreops_filters-directories.patch
mm-damon-sysfs-validate-user-inputs-from-damon_sysfs_commit_input.patch
mm-damon-core-invoke-kdamond_call-after-merging-is-done-if-possible.patch
mm-damon-core-make-damon_set_attrs-be-safe-to-be-called-from-damon_call.patch
mm-damon-sysfs-handle-commit-command-using-damon_call.patch
mm-damon-sysfs-remove-damon_sysfs_cmd_request-code-from-damon_sysfs_handle_cmd.patch
mm-damon-sysfs-remove-damon_sysfs_cmd_request_callback-and-its-callers.patch
mm-damon-sysfs-remove-damon_sysfs_cmd_request-and-its-readers.patch
mm-damon-sysfs-schemes-remove-obsolete-comment-for-damon_sysfs_schemes_clear_regions.patch
mm-damon-remove-damon_callback-private.patch
mm-damon-remove-before_start-of-damon_callback.patch
mm-damon-remove-damon_callback-after_sampling.patch
mm-damon-remove-damon_callback-before_damos_apply.patch
mm-damon-remove-damon_operations-reset_aggregated.patch
mm-damon-sysfs-schemes-avoid-wformat-security-warning-on-damon_sysfs_access_pattern_add_range_dir.patch
mm-madvise-use-is_memory_failure-from-madvise_do_behavior.patch
mm-madvise-split-out-populate-behavior-check-logic.patch
mm-madvise-deduplicate-madvise_do_behavior-skip-case-handlings.patch
mm-madvise-remove-len-parameter-of-madvise_do_behavior.patch
The quilt patch titled
Subject: mm/damon/ops: have damon_get_folio return folio even for tail pages
has been removed from the -mm tree. Its filename was
mm-damon-ops-have-damon_get_folio-return-folio-even-for-tail-pages.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Usama Arif <usamaarif642(a)gmail.com>
Subject: mm/damon/ops: have damon_get_folio return folio even for tail pages
Date: Fri, 7 Feb 2025 13:20:32 -0800
Patch series "mm/damon/paddr: fix large folios access and schemes handling".
DAMON operations set for physical address space, namely 'paddr', treats
tail pages as unaccessed always. It can also apply DAMOS action to a
large folio multiple times within single DAMOS' regions walking. As a
result, the monitoring output has poor quality and DAMOS works in
unexpected ways when large folios are being used. Fix those.
The patches were parts of Usama's hugepage_size DAMOS filter patch
series[1]. The first fix has collected from there with a slight commit
message change for the subject prefix. The second fix is re-written by SJ
and posted as an RFC before this series. The second one also got a slight
commit message change for the subject prefix.
[1] https://lore.kernel.org/20250203225604.44742-1-usamaarif642@gmail.com
[2] https://lore.kernel.org/20250206231103.38298-1-sj@kernel.org
This patch (of 2):
This effectively adds support for large folios in damon for paddr, as
damon_pa_mkold/young won't get a null folio from this function and won't
ignore it, hence access will be checked and reported. This also means
that larger folios will be considered for different DAMOS actions like
pageout, prioritization and migration. As these DAMOS actions will
consider larger folios, iterate through the region at folio_size and not
PAGE_SIZE intervals. This should not have an affect on vaddr, as
damon_young_pmd_entry considers pmd entries.
Link: https://lkml.kernel.org/r/20250207212033.45269-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250207212033.45269-2-sj@kernel.org
Fixes: a28397beb55b ("mm/damon: implement primitives for physical address space monitoring")
Signed-off-by: Usama Arif <usamaarif642(a)gmail.com>
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/ops-common.c | 2 +-
mm/damon/paddr.c | 24 ++++++++++++++++++------
2 files changed, 19 insertions(+), 7 deletions(-)
--- a/mm/damon/ops-common.c~mm-damon-ops-have-damon_get_folio-return-folio-even-for-tail-pages
+++ a/mm/damon/ops-common.c
@@ -26,7 +26,7 @@ struct folio *damon_get_folio(unsigned l
struct page *page = pfn_to_online_page(pfn);
struct folio *folio;
- if (!page || PageTail(page))
+ if (!page)
return NULL;
folio = page_folio(page);
--- a/mm/damon/paddr.c~mm-damon-ops-have-damon_get_folio-return-folio-even-for-tail-pages
+++ a/mm/damon/paddr.c
@@ -277,11 +277,14 @@ static unsigned long damon_pa_pageout(st
damos_add_filter(s, filter);
}
- for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) {
+ addr = r->ar.start;
+ while (addr < r->ar.end) {
struct folio *folio = damon_get_folio(PHYS_PFN(addr));
- if (!folio)
+ if (!folio) {
+ addr += PAGE_SIZE;
continue;
+ }
if (damos_pa_filter_out(s, folio))
goto put_folio;
@@ -297,6 +300,7 @@ static unsigned long damon_pa_pageout(st
else
list_add(&folio->lru, &folio_list);
put_folio:
+ addr += folio_size(folio);
folio_put(folio);
}
if (install_young_filter)
@@ -312,11 +316,14 @@ static inline unsigned long damon_pa_mar
{
unsigned long addr, applied = 0;
- for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) {
+ addr = r->ar.start;
+ while (addr < r->ar.end) {
struct folio *folio = damon_get_folio(PHYS_PFN(addr));
- if (!folio)
+ if (!folio) {
+ addr += PAGE_SIZE;
continue;
+ }
if (damos_pa_filter_out(s, folio))
goto put_folio;
@@ -329,6 +336,7 @@ static inline unsigned long damon_pa_mar
folio_deactivate(folio);
applied += folio_nr_pages(folio);
put_folio:
+ addr += folio_size(folio);
folio_put(folio);
}
return applied * PAGE_SIZE;
@@ -475,11 +483,14 @@ static unsigned long damon_pa_migrate(st
unsigned long addr, applied;
LIST_HEAD(folio_list);
- for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) {
+ addr = r->ar.start;
+ while (addr < r->ar.end) {
struct folio *folio = damon_get_folio(PHYS_PFN(addr));
- if (!folio)
+ if (!folio) {
+ addr += PAGE_SIZE;
continue;
+ }
if (damos_pa_filter_out(s, folio))
goto put_folio;
@@ -490,6 +501,7 @@ static unsigned long damon_pa_migrate(st
goto put_folio;
list_add(&folio->lru, &folio_list);
put_folio:
+ addr += folio_size(folio);
folio_put(folio);
}
applied = damon_pa_migrate_pages(&folio_list, s->target_nid);
_
Patches currently in -mm which might be from usamaarif642(a)gmail.com are
The quilt patch titled
Subject: mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
has been removed from the -mm tree. Its filename was
mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
Date: Mon, 10 Feb 2025 20:37:44 +0100
Even though FOLL_SPLIT_PMD on hugetlb now always fails with -EOPNOTSUPP,
let's add a safety net in case FOLL_SPLIT_PMD usage would ever be
reworked.
In particular, before commit 9cb28da54643 ("mm/gup: handle hugetlb in the
generic follow_page_mask code"), GUP(FOLL_SPLIT_PMD) would just have
returned a page. In particular, hugetlb folios that are not PMD-sized
would never have been prone to FOLL_SPLIT_PMD.
hugetlb folios can be anonymous, and page_make_device_exclusive_one() is
not really prepared for handling them at all. So let's spell that out.
Link: https://lkml.kernel.org/r/20250210193801.781278-3-david@redhat.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Tested-by: Alistair Popple <apopple(a)nvidia.com>
Cc: Alex Shi <alexs(a)kernel.org>
Cc: Danilo Krummrich <dakr(a)kernel.org>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Karol Herbst <kherbst(a)redhat.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Lyude <lyude(a)redhat.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Simona Vetter <simona.vetter(a)ffwll.ch>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yanteng Si <si.yanteng(a)linux.dev>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/rmap.c~mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive
+++ a/mm/rmap.c
@@ -2499,7 +2499,7 @@ static bool folio_make_device_exclusive(
* Restrict to anonymous folios for now to avoid potential writeback
* issues.
*/
- if (!folio_test_anon(folio))
+ if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
return false;
rmap_walk(folio, &rwc);
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-factor-out-large-folio-handling-from-folio_order-into-folio_large_order.patch
mm-factor-out-large-folio-handling-from-folio_nr_pages-into-folio_large_nr_pages.patch
mm-let-_folio_nr_pages-overlay-memcg_data-in-first-tail-page.patch
mm-let-_folio_nr_pages-overlay-memcg_data-in-first-tail-page-fix.patch
mm-move-hugetlb-specific-things-in-folio-to-page.patch
mm-move-_pincount-in-folio-to-page-on-32bit.patch
mm-move-_entire_mapcount-in-folio-to-page-on-32bit.patch
mm-rmap-pass-dst_vma-to-folio_dup_file_rmap_pte-and-friends.patch
mm-rmap-pass-vma-to-__folio_add_rmap.patch
mm-rmap-abstract-large-mapcount-operations-for-large-folios-hugetlb.patch
bit_spinlock-__always_inline-unlock-functions.patch
mm-rmap-use-folio_large_nr_pages-in-add-remove-functions.patch
mm-rmap-basic-mm-owner-tracking-for-large-folios-hugetlb.patch
mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch
mm-convert-folio_likely_mapped_shared-to-folio_maybe_mapped_shared.patch
mm-config_no_page_mapcount-to-prepare-for-not-maintain-per-page-mapcounts-in-large-folios.patch
fs-proc-page-remove-per-page-mapcount-dependency-for-proc-kpagecount-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-pm_mmap_exclusive-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-mapmax-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-smaps-smaps_rollup-config_no_page_mapcount.patch
mm-stop-maintaining-the-per-page-mapcount-of-large-folios-config_no_page_mapcount.patch
The quilt patch titled
Subject: mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
has been removed from the -mm tree. Its filename was
mm-gup-reject-foll_split_pmd-with-hugetlb-vmas.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
Date: Mon, 10 Feb 2025 20:37:43 +0100
Patch series "mm: fixes for device-exclusive entries (hmm)", v2.
Discussing the PageTail() call in make_device_exclusive_range() with
Willy, I recently discovered [1] that device-exclusive handling does not
properly work with THP, making the hmm-tests selftests fail if THPs are
enabled on the system.
Looking into more details, I found that hugetlb is not properly fenced,
and I realized that something that was bugging me for longer -- how
device-exclusive entries interact with mapcounts -- completely breaks
migration/swapout/split/hwpoison handling of these folios while they have
device-exclusive PTEs.
The program below can be used to allocate 1 GiB worth of pages and making
them device-exclusive on a kernel with CONFIG_TEST_HMM.
Once they are device-exclusive, these folios cannot get swapped out
(proc$pid/smaps_rollup will always indicate 1 GiB RSS no matter how much
one forces memory reclaim), and when having a memory block onlined to
ZONE_MOVABLE, trying to offline it will loop forever and complain about
failed migration of a page that should be movable.
# echo offline > /sys/devices/system/memory/memory136/state
# echo online_movable > /sys/devices/system/memory/memory136/state
# ./hmm-swap &
... wait until everything is device-exclusive
# echo offline > /sys/devices/system/memory/memory136/state
[ 285.193431][T14882] page: refcount:2 mapcount:0 mapping:0000000000000000
index:0x7f20671f7 pfn:0x442b6a
[ 285.196618][T14882] memcg:ffff888179298000
[ 285.198085][T14882] anon flags: 0x5fff0000002091c(referenced|uptodate|
dirty|active|owner_2|swapbacked|node=1|zone=3|lastcpupid=0x7ff)
[ 285.201734][T14882] raw: ...
[ 285.204464][T14882] raw: ...
[ 285.207196][T14882] page dumped because: migration failure
[ 285.209072][T14882] page_owner tracks the page as allocated
[ 285.210915][T14882] page last allocated via order 0, migratetype
Movable, gfp_mask 0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO),
id 14926, tgid 14926 (hmm-swap), ts 254506295376, free_ts 227402023774
[ 285.216765][T14882] post_alloc_hook+0x197/0x1b0
[ 285.218874][T14882] get_page_from_freelist+0x76e/0x3280
[ 285.220864][T14882] __alloc_frozen_pages_noprof+0x38e/0x2740
[ 285.223302][T14882] alloc_pages_mpol+0x1fc/0x540
[ 285.225130][T14882] folio_alloc_mpol_noprof+0x36/0x340
[ 285.227222][T14882] vma_alloc_folio_noprof+0xee/0x1a0
[ 285.229074][T14882] __handle_mm_fault+0x2b38/0x56a0
[ 285.230822][T14882] handle_mm_fault+0x368/0x9f0
...
This series fixes all issues I found so far. There is no easy way to fix
without a bigger rework/cleanup. I have a bunch of cleanups on top (some
previous sent, some the result of the discussion in v1) that I will send
out separately once this landed and I get to it.
I wish we could just use some special present PROT_NONE PTEs instead of
these (non-present, non-none) fake-swap entries; but that just results in
the same problem we keep having (lack of spare PTE bits), and staring at
other similar fake-swap entries, that ship has sailed.
With this series, make_device_exclusive() doesn't actually belong into
mm/rmap.c anymore, but I'll leave moving that for another day.
I only tested this series with the hmm-tests selftests due to lack of HW,
so I'd appreciate some testing, especially if the interaction between two
GPUs wanting a device-exclusive entry works as expected.
<program>
#include <stdio.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x05, struct hmm_dmirror_cmd)
struct hmm_dmirror_cmd {
__u64 addr;
__u64 ptr;
__u64 npages;
__u64 cpages;
__u64 faults;
};
const size_t size = 1 * 1024 * 1024 * 1024ul;
const size_t chunk_size = 2 * 1024 * 1024ul;
int main(void)
{
struct hmm_dmirror_cmd cmd;
size_t cur_size;
int fd, ret;
char *addr, *mirror;
fd = open("/dev/hmm_dmirror1", O_RDWR, 0);
if (fd < 0) {
perror("open failed\n");
exit(1);
}
addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap failed\n");
exit(1);
}
madvise(addr, size, MADV_NOHUGEPAGE);
memset(addr, 1, size);
mirror = malloc(chunk_size);
for (cur_size = 0; cur_size < size; cur_size += chunk_size) {
cmd.addr = (uintptr_t)addr + cur_size;
cmd.ptr = (uintptr_t)mirror;
cmd.npages = chunk_size / getpagesize();
ret = ioctl(fd, HMM_DMIRROR_EXCLUSIVE, &cmd);
if (ret) {
perror("ioctl failed\n");
exit(1);
}
}
pause();
return 0;
}
</program>
[1] https://lkml.kernel.org/r/25e02685-4f1d-47fa-be5b-01ff85bb0ce2@redhat.com
This patch (of 17):
We only have two FOLL_SPLIT_PMD users. While uprobe refuses hugetlb
early, make_device_exclusive_range() can end up getting called on hugetlb
VMAs.
Right now, this means that with a PMD-sized hugetlb page, we can end up
calling split_huge_pmd(), because pmd_trans_huge() also succeeds with
hugetlb PMDs.
For example, using a modified hmm-test selftest one can trigger:
[ 207.017134][T14945] ------------[ cut here ]------------
[ 207.018614][T14945] kernel BUG at mm/page_table_check.c:87!
[ 207.019716][T14945] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 207.021072][T14945] CPU: 3 UID: 0 PID: ...
[ 207.023036][T14945] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 207.024834][T14945] RIP: 0010:page_table_check_clear.part.0+0x488/0x510
[ 207.026128][T14945] Code: ...
[ 207.029965][T14945] RSP: 0018:ffffc9000cb8f348 EFLAGS: 00010293
[ 207.031139][T14945] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffffffff8249a0cd
[ 207.032649][T14945] RDX: ffff88811e883c80 RSI: ffffffff8249a357 RDI: ffff88811e883c80
[ 207.034183][T14945] RBP: ffff888105c0a050 R08: 0000000000000005 R09: 0000000000000000
[ 207.035688][T14945] R10: 00000000ffffffff R11: 0000000000000003 R12: 0000000000000001
[ 207.037203][T14945] R13: 0000000000000200 R14: 0000000000000001 R15: dffffc0000000000
[ 207.038711][T14945] FS: 00007f2783275740(0000) GS:ffff8881f4980000(0000) knlGS:0000000000000000
[ 207.040407][T14945] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 207.041660][T14945] CR2: 00007f2782c00000 CR3: 0000000132356000 CR4: 0000000000750ef0
[ 207.043196][T14945] PKRU: 55555554
[ 207.043880][T14945] Call Trace:
[ 207.044506][T14945] <TASK>
[ 207.045086][T14945] ? __die+0x51/0x92
[ 207.045864][T14945] ? die+0x29/0x50
[ 207.046596][T14945] ? do_trap+0x250/0x320
[ 207.047430][T14945] ? do_error_trap+0xe7/0x220
[ 207.048346][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.049535][T14945] ? handle_invalid_op+0x34/0x40
[ 207.050494][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.051681][T14945] ? exc_invalid_op+0x2e/0x50
[ 207.052589][T14945] ? asm_exc_invalid_op+0x1a/0x20
[ 207.053596][T14945] ? page_table_check_clear.part.0+0x1fd/0x510
[ 207.054790][T14945] ? page_table_check_clear.part.0+0x487/0x510
[ 207.055993][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.057195][T14945] ? page_table_check_clear.part.0+0x487/0x510
[ 207.058384][T14945] __page_table_check_pmd_clear+0x34b/0x5a0
[ 207.059524][T14945] ? __pfx___page_table_check_pmd_clear+0x10/0x10
[ 207.060775][T14945] ? __pfx___mutex_unlock_slowpath+0x10/0x10
[ 207.061940][T14945] ? __pfx___lock_acquire+0x10/0x10
[ 207.062967][T14945] pmdp_huge_clear_flush+0x279/0x360
[ 207.064024][T14945] split_huge_pmd_locked+0x82b/0x3750
...
Before commit 9cb28da54643 ("mm/gup: handle hugetlb in the generic
follow_page_mask code"), we would have ignored the flag; instead, let's
simply refuse the combination completely in check_vma_flags(): the caller
is likely not prepared to handle any hugetlb folios.
We'll teach make_device_exclusive_range() separately to ignore any hugetlb
folios as a future-proof safety net.
Link: https://lkml.kernel.org/r/20250210193801.781278-1-david@redhat.com
Link: https://lkml.kernel.org/r/20250210193801.781278-2-david@redhat.com
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Tested-by: Alistair Popple <apopple(a)nvidia.com>
Cc: Alex Shi <alexs(a)kernel.org>
Cc: Danilo Krummrich <dakr(a)kernel.org>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Karol Herbst <kherbst(a)redhat.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Lyude <lyude(a)redhat.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yanteng Si <si.yanteng(a)linux.dev>
Cc: Simona Vetter <simona.vetter(a)ffwll.ch>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/gup.c~mm-gup-reject-foll_split_pmd-with-hugetlb-vmas
+++ a/mm/gup.c
@@ -1283,6 +1283,9 @@ static int check_vma_flags(struct vm_are
if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
return -EOPNOTSUPP;
+ if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
+ return -EOPNOTSUPP;
+
if (vma_is_secretmem(vma))
return -EFAULT;
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-factor-out-large-folio-handling-from-folio_order-into-folio_large_order.patch
mm-factor-out-large-folio-handling-from-folio_nr_pages-into-folio_large_nr_pages.patch
mm-let-_folio_nr_pages-overlay-memcg_data-in-first-tail-page.patch
mm-let-_folio_nr_pages-overlay-memcg_data-in-first-tail-page-fix.patch
mm-move-hugetlb-specific-things-in-folio-to-page.patch
mm-move-_pincount-in-folio-to-page-on-32bit.patch
mm-move-_entire_mapcount-in-folio-to-page-on-32bit.patch
mm-rmap-pass-dst_vma-to-folio_dup_file_rmap_pte-and-friends.patch
mm-rmap-pass-vma-to-__folio_add_rmap.patch
mm-rmap-abstract-large-mapcount-operations-for-large-folios-hugetlb.patch
bit_spinlock-__always_inline-unlock-functions.patch
mm-rmap-use-folio_large_nr_pages-in-add-remove-functions.patch
mm-rmap-basic-mm-owner-tracking-for-large-folios-hugetlb.patch
mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch
mm-convert-folio_likely_mapped_shared-to-folio_maybe_mapped_shared.patch
mm-config_no_page_mapcount-to-prepare-for-not-maintain-per-page-mapcounts-in-large-folios.patch
fs-proc-page-remove-per-page-mapcount-dependency-for-proc-kpagecount-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-pm_mmap_exclusive-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-mapmax-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-smaps-smaps_rollup-config_no_page_mapcount.patch
mm-stop-maintaining-the-per-page-mapcount-of-large-folios-config_no_page_mapcount.patch
From: Eric Dumazet <edumazet(a)google.com>
tcp_abort() has the same issue than the one fixed in the prior patch
in tcp_write_err().
commit 5ce4645c23cf5f048eb8e9ce49e514bababdee85 upstream.
To apply commit bac76cf89816bff06c4ec2f3df97dc34e150a1c4,
this patch must be applied first.
In order to get consistent results from tcp_poll(), we must call
sk_error_report() after tcp_done().
We can use tcp_done_with_error() to centralize this logic.
Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Acked-by: Neal Cardwell <ncardwell(a)google.com>
Link: https://lore.kernel.org/r/20240528125253.1966136-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # v5.10+
[youngmin: Resolved minor conflict in net/ipv4/tcp.c]
Signed-off-by: Youngmin Nam <youngmin.nam(a)samsung.com>
---
net/ipv4/tcp.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7ad82be40f34..9fe164aa185c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4630,13 +4630,9 @@ int tcp_abort(struct sock *sk, int err)
bh_lock_sock(sk);
if (!sock_flag(sk, SOCK_DEAD)) {
- WRITE_ONCE(sk->sk_err, err);
- /* This barrier is coupled with smp_rmb() in tcp_poll() */
- smp_wmb();
- sk_error_report(sk);
if (tcp_need_reset(sk->sk_state))
tcp_send_active_reset(sk, GFP_ATOMIC);
- tcp_done(sk);
+ tcp_done_with_error(sk, err);
}
bh_unlock_sock(sk);
--
2.39.2
This fix is the deadline version of the change made to the rt scheduler
here:
https://lore.kernel.org/lkml/20250225180553.167995-1-harshit@nutanix.com/
Please go through the original change for more details on the issue.
In this fix we bail out or retry in the push_dl_task, if the task is no
longer at the head of pushable tasks list because this list changed
while trying to lock the runqueue of the other CPU.
Signed-off-by: Harshit Agarwal <harshit(a)nutanix.com>
Cc: stable(a)vger.kernel.org
---
kernel/sched/deadline.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 38e4537790af..c5048969c640 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2704,6 +2704,7 @@ static int push_dl_task(struct rq *rq)
{
struct task_struct *next_task;
struct rq *later_rq;
+ struct task_struct *task;
int ret = 0;
next_task = pick_next_pushable_dl_task(rq);
@@ -2734,15 +2735,30 @@ static int push_dl_task(struct rq *rq)
/* Will lock the rq it'll find */
later_rq = find_lock_later_rq(next_task, rq);
- if (!later_rq) {
- struct task_struct *task;
+ task = pick_next_pushable_dl_task(rq);
+ if (later_rq && (!task || task != next_task)) {
+ /*
+ * We must check all this again, since
+ * find_lock_later_rq releases rq->lock and it is
+ * then possible that next_task has migrated and
+ * is no longer at the head of the pushable list.
+ */
+ double_unlock_balance(rq, later_rq);
+ if (!task) {
+ /* No more tasks */
+ goto out;
+ }
+ put_task_struct(next_task);
+ next_task = task;
+ goto retry;
+ }
+ if (!later_rq) {
/*
* We must check all this again, since
* find_lock_later_rq releases rq->lock and it is
* then possible that next_task has migrated.
*/
- task = pick_next_pushable_dl_task(rq);
if (task == next_task) {
/*
* The task is still there. We don't try
@@ -2751,9 +2767,10 @@ static int push_dl_task(struct rq *rq)
goto out;
}
- if (!task)
+ if (!task) {
/* No more tasks */
goto out;
+ }
put_task_struct(next_task);
next_task = task;
--
2.22.3
Upon encountering errors during the HSIC pinctrl handling section the
regulator should be disabled.
Use devm_add_action_or_reset() to let the regulator-disabling routine be
handled by device resource management stack.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4d6141288c33 ("usb: chipidea: imx: pinctrl for HSIC is optional")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
v2: simplify the patch taking advantage of devm-helper (Frank Li)
It looks like devm_regulator_get_enable_optional() exists for this very
use case utilized in the driver but it's not present in old supported
stable kernels and I may say those dependency-patches wouldn't apply there
cleanly. So fix the problem first, further code simplification is a
subject to cleanup patch.
drivers/usb/chipidea/ci_hdrc_imx.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index 619779eef333..d942b3c72640 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -336,6 +336,13 @@ static int ci_hdrc_imx_notify_event(struct ci_hdrc *ci, unsigned int event)
return ret;
}
+static void ci_hdrc_imx_disable_regulator(void *arg)
+{
+ struct ci_hdrc_imx_data *data = arg;
+
+ regulator_disable(data->hsic_pad_regulator);
+}
+
static int ci_hdrc_imx_probe(struct platform_device *pdev)
{
struct ci_hdrc_imx_data *data;
@@ -394,6 +401,13 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
"Failed to enable HSIC pad regulator\n");
goto err_put;
}
+ ret = devm_add_action_or_reset(dev,
+ ci_hdrc_imx_disable_regulator, data);
+ if (ret) {
+ dev_err(dev,
+ "Failed to add regulator devm action\n");
+ goto err_put;
+ }
}
}
@@ -432,11 +446,11 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
ret = imx_get_clks(dev);
if (ret)
- goto disable_hsic_regulator;
+ goto qos_remove_request;
ret = imx_prepare_enable_clks(dev);
if (ret)
- goto disable_hsic_regulator;
+ goto qos_remove_request;
ret = clk_prepare_enable(data->clk_wakeup);
if (ret)
@@ -526,10 +540,7 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
clk_disable_unprepare(data->clk_wakeup);
err_wakeup_clk:
imx_disable_unprepare_clks(dev);
-disable_hsic_regulator:
- if (data->hsic_pad_regulator)
- /* don't overwrite original ret (cf. EPROBE_DEFER) */
- regulator_disable(data->hsic_pad_regulator);
+qos_remove_request:
if (pdata.flags & CI_HDRC_PMQOS)
cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
@@ -557,8 +568,6 @@ static void ci_hdrc_imx_remove(struct platform_device *pdev)
clk_disable_unprepare(data->clk_wakeup);
if (data->plat_data->flags & CI_HDRC_PMQOS)
cpu_latency_qos_remove_request(&data->pm_qos_req);
- if (data->hsic_pad_regulator)
- regulator_disable(data->hsic_pad_regulator);
}
if (data->usbmisc_data)
put_device(data->usbmisc_data->dev);
--
2.48.1
usbmisc is an optional device property so it is totally valid for the
corresponding data->usbmisc_data to have a NULL value.
Check that before dereferencing the pointer.
Found by Linux Verification Center (linuxtesting.org) with Svace static
analysis tool.
Fixes: 74adad500346 ("usb: chipidea: ci_hdrc_imx: decrement device's refcount in .remove() and in the error path of .probe()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
drivers/usb/chipidea/ci_hdrc_imx.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index 1a7fc638213e..619779eef333 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -534,7 +534,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
err_put:
- put_device(data->usbmisc_data->dev);
+ if (data->usbmisc_data)
+ put_device(data->usbmisc_data->dev);
return ret;
}
@@ -559,7 +560,8 @@ static void ci_hdrc_imx_remove(struct platform_device *pdev)
if (data->hsic_pad_regulator)
regulator_disable(data->hsic_pad_regulator);
}
- put_device(data->usbmisc_data->dev);
+ if (data->usbmisc_data)
+ put_device(data->usbmisc_data->dev);
}
static void ci_hdrc_imx_shutdown(struct platform_device *pdev)
--
2.48.1
The quilt patch titled
Subject: mm/page_alloc: fix memory accept before watermarks gets initialized
has been removed from the -mm tree. Its filename was
mm-page_alloc-fix-memory-accept-before-watermarks-gets-initialized.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/page_alloc: fix memory accept before watermarks gets initialized
Date: Mon, 10 Mar 2025 10:28:55 +0200
Watermarks are initialized during the postcore initcall. Until then, all
watermarks are set to zero. This causes cond_accept_memory() to
incorrectly skip memory acceptance because a watermark of 0 is always met.
This can lead to a premature OOM on boot.
To ensure progress, accept one MAX_ORDER page if the watermark is zero.
Link: https://lkml.kernel.org/r/20250310082855.2587122-1-kirill.shutemov@linux.in…
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Tested-by: Farrah Chen <farrah.chen(a)intel.com>
Reported-by: Farrah Chen <farrah.chen(a)intel.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Pankaj Gupta <pankaj.gupta(a)amd.com>
Cc: Ashish Kalra <ashish.kalra(a)amd.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe(a)intel.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: "Mike Rapoport (IBM)" <rppt(a)kernel.org>
Cc: Thomas Lendacky <thomas.lendacky(a)amd.com>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-fix-memory-accept-before-watermarks-gets-initialized
+++ a/mm/page_alloc.c
@@ -7004,7 +7004,7 @@ static inline bool has_unaccepted_memory
static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
- long to_accept;
+ long to_accept, wmark;
bool ret = false;
if (!has_unaccepted_memory())
@@ -7013,8 +7013,18 @@ static bool cond_accept_memory(struct zo
if (list_empty(&zone->unaccepted_pages))
return false;
+ wmark = promo_wmark_pages(zone);
+
+ /*
+ * Watermarks have not been initialized yet.
+ *
+ * Accepting one MAX_ORDER page to ensure progress.
+ */
+ if (!wmark)
+ return try_to_accept_memory_one(zone);
+
/* How much to accept to get to promo watermark? */
- to_accept = promo_wmark_pages(zone) -
+ to_accept = wmark -
(zone_page_state(zone, NR_FREE_PAGES) -
__zone_watermark_unusable_free(zone, order, 0) -
zone_page_state(zone, NR_UNACCEPTED));
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
The quilt patch titled
Subject: memcg: drain obj stock on cpu hotplug teardown
has been removed from the -mm tree. Its filename was
memcg-drain-obj-stock-on-cpu-hotplug-teardown.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Shakeel Butt <shakeel.butt(a)linux.dev>
Subject: memcg: drain obj stock on cpu hotplug teardown
Date: Mon, 10 Mar 2025 16:09:34 -0700
Currently on cpu hotplug teardown, only memcg stock is drained but we
need to drain the obj stock as well otherwise we will miss the stats
accumulated on the target cpu as well as the nr_bytes cached. The stats
include MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B. In
addition we are leaking reference to struct obj_cgroup object.
Link: https://lkml.kernel.org/r/20250310230934.2913113-1-shakeel.butt@linux.dev
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/mm/memcontrol.c~memcg-drain-obj-stock-on-cpu-hotplug-teardown
+++ a/mm/memcontrol.c
@@ -1921,9 +1921,18 @@ void drain_all_stock(struct mem_cgroup *
static int memcg_hotplug_cpu_dead(unsigned int cpu)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old;
+ unsigned long flags;
stock = &per_cpu(memcg_stock, cpu);
+
+ /* drain_obj_stock requires stock_lock */
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
+ old = drain_obj_stock(stock);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+
drain_stock(stock);
+ obj_cgroup_put(old);
return 0;
}
_
Patches currently in -mm which might be from shakeel.butt(a)linux.dev are
memcg-add-hierarchical-effective-limits-for-v2.patch
memcg-dont-call-propagate_protected_usage-for-v1.patch
page_counter-track-failcnt-only-for-legacy-cgroups.patch
page_counter-reduce-struct-page_counter-size.patch
memcg-bypass-root-memcg-check-for-skmem-charging.patch
memcg-avoid-refill_stock-for-root-memcg.patch
memcg-move-do_memsw_account-to-config_memcg_v1.patch
The quilt patch titled
Subject: mm/huge_memory: drop beyond-EOF folios with the right number of refs
has been removed from the -mm tree. Its filename was
mm-huge_memory-drop-beyond-eof-folios-with-the-right-number-of-refs.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/huge_memory: drop beyond-EOF folios with the right number of refs
Date: Mon, 10 Mar 2025 11:57:27 -0400
When an after-split folio is large and needs to be dropped due to EOF,
folio_put_refs(folio, folio_nr_pages(folio)) should be used to drop all
page cache refs. Otherwise, the folio will not be freed, causing memory
leak.
This leak would happen on a filesystem with blocksize > page_size and a
truncate is performed, where the blocksize makes folios split to >0 order
ones, causing truncated folios not being freed.
Link: https://lkml.kernel.org/r/20250310155727.472846-1-ziy@nvidia.com
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: Hugh Dickins <hughd(a)google.com>
Closes: https://lore.kernel.org/all/fcbadb7f-dd3e-21df-f9a7-2853b53183c4@google.com/
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov(a)linux.intel.com>
Cc: Luis Chamberalin <mcgrof(a)kernel.org>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Pankaj Raghav <p.raghav(a)samsung.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Yang Shi <yang(a)os.amperecomputing.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-huge_memory-drop-beyond-eof-folios-with-the-right-number-of-refs
+++ a/mm/huge_memory.c
@@ -3304,7 +3304,7 @@ static void __split_huge_page(struct pag
folio_account_cleaned(tail,
inode_to_wb(folio->mapping->host));
__filemap_remove_folio(tail, NULL);
- folio_put(tail);
+ folio_put_refs(tail, folio_nr_pages(tail));
} else if (!folio_test_anon(folio)) {
__xa_store(&folio->mapping->i_pages, tail->index,
tail, 0);
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
selftests-mm-make-file-backed-thp-split-work-by-writing-pmd-size-data.patch
mm-huge_memory-allow-split-shmem-large-folio-to-any-lower-order.patch
selftests-mm-test-splitting-file-backed-thp-to-any-lower-order.patch
xarray-add-xas_try_split-to-split-a-multi-index-entry.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split-fix.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split-fix-2.patch
mm-huge_memory-move-folio-split-common-code-to-__folio_split.patch
mm-huge_memory-add-buddy-allocator-like-non-uniform-folio_split.patch
mm-huge_memory-remove-the-old-unused-__split_huge_page.patch
mm-huge_memory-add-folio_split-to-debugfs-testing-interface.patch
mm-truncate-use-folio_split-in-truncate-operation.patch
selftests-mm-add-tests-for-folio_split-buddy-allocator-like-split.patch
mm-filemap-use-xas_try_split-in-__filemap_add_folio.patch
mm-shmem-use-xas_try_split-in-shmem_split_large_entry.patch
The quilt patch titled
Subject: selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation
has been removed from the -mm tree. Its filename was
selftests-mm-run_vmtestssh-fix-half_ufd_size_mb-calculation.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Rafael Aquini <raquini(a)redhat.com>
Subject: selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation
Date: Tue, 18 Feb 2025 14:22:51 -0500
We noticed that uffd-stress test was always failing to run when invoked
for the hugetlb profiles on x86_64 systems with a processor count of 64 or
bigger:
...
# ------------------------------------
# running ./uffd-stress hugetlb 128 32
# ------------------------------------
# ERROR: invalid MiB (errno=9, @uffd-stress.c:459)
...
# [FAIL]
not ok 3 uffd-stress hugetlb 128 32 # exit=1
...
The problem boils down to how run_vmtests.sh (mis)calculates the size of
the region it feeds to uffd-stress. The latter expects to see an amount
of MiB while the former is just giving out the number of free hugepages
halved down. This measurement discrepancy ends up violating uffd-stress'
assertion on number of hugetlb pages allocated per CPU, causing it to bail
out with the error above.
This commit fixes that issue by adjusting run_vmtests.sh's
half_ufd_size_MB calculation so it properly renders the region size in
MiB, as expected, while maintaining all of its original constraints in
place.
Link: https://lkml.kernel.org/r/20250218192251.53243-1-aquini@redhat.com
Fixes: 2e47a445d7b3 ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation")
Signed-off-by: Rafael Aquini <raquini(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/run_vmtests.sh | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/run_vmtests.sh~selftests-mm-run_vmtestssh-fix-half_ufd_size_mb-calculation
+++ a/tools/testing/selftests/mm/run_vmtests.sh
@@ -304,7 +304,9 @@ uffd_stress_bin=./uffd-stress
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} anon 20 16
# Hugetlb tests require source and destination huge pages. Pass in half
# the size of the free pages we have, which is used for *each*.
-half_ufd_size_MB=$((freepgs / 2))
+# uffd-stress expects a region expressed in MiB, so we adjust
+# half_ufd_size_MB accordingly.
+half_ufd_size_MB=$(((freepgs * hpgsize_KB) / 1024 / 2))
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb "$half_ufd_size_MB" 32
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb-private "$half_ufd_size_MB" 32
CATEGORY="userfaultfd" run_test ${uffd_stress_bin} shmem 20 16
_
Patches currently in -mm which might be from raquini(a)redhat.com are
The quilt patch titled
Subject: mm: fix error handling in __filemap_get_folio() with FGP_NOWAIT
has been removed from the -mm tree. Its filename was
mm-fix-error-handling-in-__filemap_get_folio-with-fgp_nowait.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Raphael S. Carvalho" <raphaelsc(a)scylladb.com>
Subject: mm: fix error handling in __filemap_get_folio() with FGP_NOWAIT
Date: Mon, 24 Feb 2025 11:37:00 -0300
original report:
https://lore.kernel.org/all/CAKhLTr1UL3ePTpYjXOx2AJfNk8Ku2EdcEfu+CH1sf3Asr=…
When doing buffered writes with FGP_NOWAIT, under memory pressure, the
system returned ENOMEM despite there being plenty of available memory, to
be reclaimed from page cache. The user space used io_uring interface,
which in turn submits I/O with FGP_NOWAIT (the fast path).
retsnoop pointed to iomap_get_folio:
00:34:16.180612 -> 00:34:16.180651 TID/PID 253786/253721
(reactor-1/combined_tests):
entry_SYSCALL_64_after_hwframe+0x76
do_syscall_64+0x82
__do_sys_io_uring_enter+0x265
io_submit_sqes+0x209
io_issue_sqe+0x5b
io_write+0xdd
xfs_file_buffered_write+0x84
iomap_file_buffered_write+0x1a6
32us [-ENOMEM] iomap_write_begin+0x408
iter=&{.inode=0xffff8c67aa031138,.len=4096,.flags=33,.iomap={.addr=0xffffffffffffffff,.length=4096,.type=1,.flags=3,.bdev=0x���
pos=0 len=4096 foliop=0xffffb32c296b7b80
! 4us [-ENOMEM] iomap_get_folio
iter=&{.inode=0xffff8c67aa031138,.len=4096,.flags=33,.iomap={.addr=0xffffffffffffffff,.length=4096,.type=1,.flags=3,.bdev=0x���
pos=0 len=4096
This is likely a regression caused by 66dabbb65d67 ("mm: return an ERR_PTR
from __filemap_get_folio"), which moved error handling from
io_map_get_folio() to __filemap_get_folio(), but broke FGP_NOWAIT
handling, so ENOMEM is being escaped to user space. Had it correctly
returned -EAGAIN with NOWAIT, either io_uring or user space itself would
be able to retry the request.
It's not enough to patch io_uring since the iomap interface is the one
responsible for it, and pwritev2(RWF_NOWAIT) and AIO interfaces must
return the proper error too.
The patch was tested with scylladb test suite (its original reproducer),
and the tests all pass now when memory is pressured.
Link: https://lkml.kernel.org/r/20250224143700.23035-1-raphaelsc@scylladb.com
Fixes: 66dabbb65d67 ("mm: return an ERR_PTR from __filemap_get_folio")
Signed-off-by: Raphael S. Carvalho <raphaelsc(a)scylladb.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Dave Chinner <dchinner(a)redhat.com>
Cc: "Darrick J. Wong" <djwong(a)kernel.org>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
--- a/mm/filemap.c~mm-fix-error-handling-in-__filemap_get_folio-with-fgp_nowait
+++ a/mm/filemap.c
@@ -1986,8 +1986,19 @@ no_page:
if (err == -EEXIST)
goto repeat;
- if (err)
+ if (err) {
+ /*
+ * When NOWAIT I/O fails to allocate folios this could
+ * be due to a nonblocking memory allocation and not
+ * because the system actually is out of memory.
+ * Return -EAGAIN so that there caller retries in a
+ * blocking fashion instead of propagating -ENOMEM
+ * to the application.
+ */
+ if ((fgp_flags & FGP_NOWAIT) && err == -ENOMEM)
+ err = -EAGAIN;
return ERR_PTR(err);
+ }
/*
* filemap_add_folio locks the page, and for mmap
* we expect an unlocked page.
_
Patches currently in -mm which might be from raphaelsc(a)scylladb.com are
The quilt patch titled
Subject: mm/migrate: fix shmem xarray update during migration
has been removed from the -mm tree. Its filename was
mm-migrate-fix-shmem-xarray-update-during-migration.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/migrate: fix shmem xarray update during migration
Date: Wed, 5 Mar 2025 15:04:03 -0500
A shmem folio can be either in page cache or in swap cache, but not at the
same time. Namely, once it is in swap cache, folio->mapping should be
NULL, and the folio is no longer in a shmem mapping.
In __folio_migrate_mapping(), to determine the number of xarray entries to
update, folio_test_swapbacked() is used, but that conflates shmem in page
cache case and shmem in swap cache case. It leads to xarray multi-index
entry corruption, since it turns a sibling entry to a normal entry during
xas_store() (see [1] for a userspace reproduction). Fix it by only using
folio_test_swapcache() to determine whether xarray is storing swap cache
entries or not to choose the right number of xarray entries to update.
[1] https://lore.kernel.org/linux-mm/Z8idPCkaJW1IChjT@casper.infradead.org/
Note:
In __split_huge_page(), folio_test_anon() && folio_test_swapcache() is
used to get swap_cache address space, but that ignores the shmem folio in
swap cache case. It could lead to NULL pointer dereferencing when a
in-swap-cache shmem folio is split at __xa_store(), since
!folio_test_anon() is true and folio->mapping is NULL. But fortunately,
its caller split_huge_page_to_list_to_order() bails out early with EBUSY
when folio->mapping is NULL. So no need to take care of it here.
Link: https://lkml.kernel.org/r/20250305200403.2822855-1-ziy@nvidia.com
Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: Liu Shixin <liushixin2(a)huawei.com>
Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
Suggested-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Charan Teja Kalla <quic_charante(a)quicinc.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Lance Yang <ioworker0(a)gmail.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
--- a/mm/migrate.c~mm-migrate-fix-shmem-xarray-update-during-migration
+++ a/mm/migrate.c
@@ -518,15 +518,13 @@ static int __folio_migrate_mapping(struc
if (folio_test_anon(folio) && folio_test_large(folio))
mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
folio_ref_add(newfolio, nr); /* add cache reference */
- if (folio_test_swapbacked(folio)) {
+ if (folio_test_swapbacked(folio))
__folio_set_swapbacked(newfolio);
- if (folio_test_swapcache(folio)) {
- folio_set_swapcache(newfolio);
- newfolio->private = folio_get_private(folio);
- }
+ if (folio_test_swapcache(folio)) {
+ folio_set_swapcache(newfolio);
+ newfolio->private = folio_get_private(folio);
entries = nr;
} else {
- VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
entries = 1;
}
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
selftests-mm-make-file-backed-thp-split-work-by-writing-pmd-size-data.patch
mm-huge_memory-allow-split-shmem-large-folio-to-any-lower-order.patch
selftests-mm-test-splitting-file-backed-thp-to-any-lower-order.patch
xarray-add-xas_try_split-to-split-a-multi-index-entry.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split-fix.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split-fix-2.patch
mm-huge_memory-move-folio-split-common-code-to-__folio_split.patch
mm-huge_memory-add-buddy-allocator-like-non-uniform-folio_split.patch
mm-huge_memory-remove-the-old-unused-__split_huge_page.patch
mm-huge_memory-add-folio_split-to-debugfs-testing-interface.patch
mm-truncate-use-folio_split-in-truncate-operation.patch
selftests-mm-add-tests-for-folio_split-buddy-allocator-like-split.patch
mm-filemap-use-xas_try_split-in-__filemap_add_folio.patch
mm-shmem-use-xas_try_split-in-shmem_split_large_entry.patch
fpga_mgr_test_img_load_sgt() allocates memory for sgt using
kunit_kzalloc() however it does not check if the allocation failed.
It then passes sgt to sg_alloc_table(), which passes it to
__sg_alloc_table(). This function calls memset() on sgt in an attempt to
zero it out. If the allocation fails then sgt will be NULL and the
memset will trigger a NULL pointer dereference.
Fix this by checking the allocation with KUNIT_ASSERT_NOT_ERR_OR_NULL().
Fixes: ccbc1c302115 ("fpga: add an initial KUnit suite for the FPGA Manager")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
---
drivers/fpga/tests/fpga-mgr-test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/fpga/tests/fpga-mgr-test.c b/drivers/fpga/tests/fpga-mgr-test.c
index 9cb37aefbac4..1902ebf5a298 100644
--- a/drivers/fpga/tests/fpga-mgr-test.c
+++ b/drivers/fpga/tests/fpga-mgr-test.c
@@ -263,6 +263,7 @@ static void fpga_mgr_test_img_load_sgt(struct kunit *test)
img_buf = init_test_buffer(test, IMAGE_SIZE);
sgt = kunit_kzalloc(test, sizeof(*sgt), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sgt);
ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
KUNIT_ASSERT_EQ(test, ret, 0);
sg_init_one(sgt->sgl, img_buf, IMAGE_SIZE);
--
2.39.5
Currently, RCU boost testing in rcutorture is broken because it relies on
having RT throttling disabled. This means the test will always pass (or
rarely fail). This occurs because recently, RT throttling was replaced
by DL server which boosts CFS tasks even when rcutorture tried to
disable throttling (see rcu_torture_disable_rt_throttle()). However, the
systctl_sched_rt_runtime variable is not considered thus still allowing
RT tasks to be preempted by CFS tasks.
Therefore this patch prevents DL server from starting when RCU torture
sets the sysctl_sched_rt_runtime to -1.
With this patch, boosting in TREE09 fails reliably if RCU_BOOST=n.
Steven also mentioned that this could fix RT usecases where users do not
want DL server to be interfering.
Cc: stable(a)vger.kernel.org
Cc: Paul E. McKenney <paulmck(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Fixes: cea5a3472ac4 ("sched/fair: Cleanup fair_server")
Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
v1->v2:
Updated Fixes tag (Steven)
Moved the stoppage of DL server to fair (Juri)
kernel/sched/fair.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1c0ef435a7aa..d7ba333393f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1242,7 +1242,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
* against fair_server such that it can account for this time
* and possibly avoid running this period.
*/
- if (dl_server_active(&rq->fair_server))
+ if (dl_server_active(&rq->fair_server) && rt_bandwidth_enabled())
dl_server_update(&rq->fair_server, delta_exec);
}
@@ -5957,7 +5957,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
sub_nr_running(rq, queued_delta);
/* Stop the fair server if throttling resulted in no runnable tasks */
- if (rq_h_nr_queued && !rq->cfs.h_nr_queued)
+ if (rq_h_nr_queued && !rq->cfs.h_nr_queued && dl_server_active(&rq->fair_server))
dl_server_stop(&rq->fair_server);
done:
/*
@@ -6056,7 +6056,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
}
/* Start the fair server if un-throttling resulted in new runnable tasks */
- if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
+ if (!rq_h_nr_queued && rq->cfs.h_nr_queued && rt_bandwidth_enabled())
dl_server_start(&rq->fair_server);
/* At this point se is NULL and we are at root level*/
@@ -7005,9 +7005,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (!rq_h_nr_queued && rq->cfs.h_nr_queued) {
/* Account for idle runtime */
- if (!rq->nr_running)
+ if (!rq->nr_running && rt_bandwidth_enabled())
dl_server_update_idle_time(rq, rq->curr);
- dl_server_start(&rq->fair_server);
+
+ if (rt_bandwidth_enabled())
+ dl_server_start(&rq->fair_server);
}
/* At this point se is NULL and we are at root level*/
@@ -7134,7 +7136,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
sub_nr_running(rq, h_nr_queued);
- if (rq_h_nr_queued && !rq->cfs.h_nr_queued)
+ if (rq_h_nr_queued && !rq->cfs.h_nr_queued && dl_server_active(&rq->fair_server))
dl_server_stop(&rq->fair_server);
/* balance early to pull high priority tasks */
--
2.43.0
Hi,
As an exhibiting company at The Intec International Trade Fair for Machine Tools, Manufacturing and Automation (INTEC 2025) we have access to the fully updated attendee list, including last-minute registrants who attended the show.
This exclusive list can help you:
✅ Connect with high-intent leads
✅ Follow up with attendees who visited (or missed) your booth
✅ Maximize your ROI from the event
If you're Interested, I’d love to share more details—and I can also provide pricing information.
Looking forward to your thoughts.
Regards,
Jennifer Martin
Sr. Marketing Manager
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x d4234d131b0a3f9e65973f1cdc71bb3560f5d14b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031635-spider-hurricane-4475@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4234d131b0a3f9e65973f1cdc71bb3560f5d14b Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Tue, 4 Mar 2025 15:27:00 +0800
Subject: [PATCH] arm64: mm: Populate vmemmap at the page level if not section
aligned
On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The related page struct which vmemmap
points to is 2M then.
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
vmemmap to populate at the PMD section level which was suitable
initially since hot plug granule is always one section(128M). However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot added, populating the entire
section is wasteful.
The next problem is if we hotplug something that spans part of 128 MiB
section (subsections, let's call it memblock1), and then hotplug something
that spans another part of a 128 MiB section(subsections, let's call it
memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
the entire PMD entry which also supports memblock2 even though memblock2
is still active.
Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
fix similar to x86-64: populate to pages levels if start/end is not aligned
with section boundary.
Cc: stable(a)vger.kernel.org # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Link: https://lore.kernel.org/r/20250304072700.3405036-1-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..1dfe1a8efdbe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,8 +1177,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+ /* [start, end] should be within one section */
+ WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d4234d131b0a3f9e65973f1cdc71bb3560f5d14b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031628-percolate-sulphate-3806@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4234d131b0a3f9e65973f1cdc71bb3560f5d14b Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Tue, 4 Mar 2025 15:27:00 +0800
Subject: [PATCH] arm64: mm: Populate vmemmap at the page level if not section
aligned
On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The related page struct which vmemmap
points to is 2M then.
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
vmemmap to populate at the PMD section level which was suitable
initially since hot plug granule is always one section(128M). However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot added, populating the entire
section is wasteful.
The next problem is if we hotplug something that spans part of 128 MiB
section (subsections, let's call it memblock1), and then hotplug something
that spans another part of a 128 MiB section(subsections, let's call it
memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
the entire PMD entry which also supports memblock2 even though memblock2
is still active.
Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
fix similar to x86-64: populate to pages levels if start/end is not aligned
with section boundary.
Cc: stable(a)vger.kernel.org # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Link: https://lore.kernel.org/r/20250304072700.3405036-1-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..1dfe1a8efdbe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,8 +1177,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+ /* [start, end] should be within one section */
+ WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d4234d131b0a3f9e65973f1cdc71bb3560f5d14b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031627-theorize-manned-7263@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4234d131b0a3f9e65973f1cdc71bb3560f5d14b Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Tue, 4 Mar 2025 15:27:00 +0800
Subject: [PATCH] arm64: mm: Populate vmemmap at the page level if not section
aligned
On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The related page struct which vmemmap
points to is 2M then.
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
vmemmap to populate at the PMD section level which was suitable
initially since hot plug granule is always one section(128M). However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot added, populating the entire
section is wasteful.
The next problem is if we hotplug something that spans part of 128 MiB
section (subsections, let's call it memblock1), and then hotplug something
that spans another part of a 128 MiB section(subsections, let's call it
memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
the entire PMD entry which also supports memblock2 even though memblock2
is still active.
Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
fix similar to x86-64: populate to pages levels if start/end is not aligned
with section boundary.
Cc: stable(a)vger.kernel.org # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Link: https://lore.kernel.org/r/20250304072700.3405036-1-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..1dfe1a8efdbe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,8 +1177,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+ /* [start, end] should be within one section */
+ WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d4234d131b0a3f9e65973f1cdc71bb3560f5d14b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031627-capture-pouring-5da4@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4234d131b0a3f9e65973f1cdc71bb3560f5d14b Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Tue, 4 Mar 2025 15:27:00 +0800
Subject: [PATCH] arm64: mm: Populate vmemmap at the page level if not section
aligned
On the arm64 platform with 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The related page struct which vmemmap
points to is 2M then.
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
vmemmap to populate at the PMD section level which was suitable
initially since hot plug granule is always one section(128M). However,
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M(SUBSECTION_SIZE) hot plug granule, which disrupted the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot added, populating the entire
section is wasteful.
The next problem is if we hotplug something that spans part of 128 MiB
section (subsections, let's call it memblock1), and then hotplug something
that spans another part of a 128 MiB section(subsections, let's call it
memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
the entire PMD entry which also supports memblock2 even though memblock2
is still active.
Assuming hotplug/unplug sizes are guaranteed to be symmetric. Do the
fix similar to x86-64: populate to pages levels if start/end is not aligned
with section boundary.
Cc: stable(a)vger.kernel.org # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Link: https://lore.kernel.org/r/20250304072700.3405036-1-quic_zhenhuah@quicinc.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..1dfe1a8efdbe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,8 +1177,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+ /* [start, end] should be within one section */
+ WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 91cf42c63f2d8a9c1bcdfe923218e079b32e1a69
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031631-sensuous-cataract-f3f3@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91cf42c63f2d8a9c1bcdfe923218e079b32e1a69 Mon Sep 17 00:00:00 2001
From: Conor Dooley <conor.dooley(a)microchip.com>
Date: Mon, 3 Mar 2025 10:47:40 +0000
Subject: [PATCH] spi: microchip-core: prevent RX overflows when transmit size
> FIFO size
When the size of a transfer exceeds the size of the FIFO (32 bytes), RX
overflows will be generated and receive data will be corrupted and
warnings will be produced. For example, here's an error generated by a
transfer of 36 bytes:
spi_master spi0: mchp_corespi_interrupt: RX OVERFLOW: rxlen: 4, txlen: 0
The driver is currently split between handling receiving in the
interrupt handler, and sending outside of it. Move all handling out of
the interrupt handling, and explicitly link the number of bytes read of
of the RX FIFO to the number written into the TX one. This both resolves
the overflow problems as well as simplifying the flow of the driver.
CC: stable(a)vger.kernel.org
Fixes: 9ac8d17694b6 ("spi: add support for microchip fpga spi controllers")
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
Link: https://patch.msgid.link/20250303-veal-snooper-712c1dfad336@wendy
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/spi-microchip-core.c b/drivers/spi/spi-microchip-core.c
index 5b6af55855ef..62ba0bd9cbb7 100644
--- a/drivers/spi/spi-microchip-core.c
+++ b/drivers/spi/spi-microchip-core.c
@@ -70,8 +70,7 @@
#define INT_RX_CHANNEL_OVERFLOW BIT(2)
#define INT_TX_CHANNEL_UNDERRUN BIT(3)
-#define INT_ENABLE_MASK (CONTROL_RX_DATA_INT | CONTROL_TX_DATA_INT | \
- CONTROL_RX_OVER_INT | CONTROL_TX_UNDER_INT)
+#define INT_ENABLE_MASK (CONTROL_RX_OVER_INT | CONTROL_TX_UNDER_INT)
#define REG_CONTROL (0x00)
#define REG_FRAME_SIZE (0x04)
@@ -133,10 +132,15 @@ static inline void mchp_corespi_disable(struct mchp_corespi *spi)
mchp_corespi_write(spi, REG_CONTROL, control);
}
-static inline void mchp_corespi_read_fifo(struct mchp_corespi *spi)
+static inline void mchp_corespi_read_fifo(struct mchp_corespi *spi, int fifo_max)
{
- while (spi->rx_len >= spi->n_bytes && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_EMPTY)) {
- u32 data = mchp_corespi_read(spi, REG_RX_DATA);
+ for (int i = 0; i < fifo_max; i++) {
+ u32 data;
+
+ while (mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_EMPTY)
+ ;
+
+ data = mchp_corespi_read(spi, REG_RX_DATA);
spi->rx_len -= spi->n_bytes;
@@ -211,11 +215,10 @@ static inline void mchp_corespi_set_xfer_size(struct mchp_corespi *spi, int len)
mchp_corespi_write(spi, REG_FRAMESUP, len);
}
-static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi)
+static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi, int fifo_max)
{
- int fifo_max, i = 0;
+ int i = 0;
- fifo_max = DIV_ROUND_UP(min(spi->tx_len, FIFO_DEPTH), spi->n_bytes);
mchp_corespi_set_xfer_size(spi, fifo_max);
while ((i < fifo_max) && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_TXFIFO_FULL)) {
@@ -413,19 +416,6 @@ static irqreturn_t mchp_corespi_interrupt(int irq, void *dev_id)
if (intfield == 0)
return IRQ_NONE;
- if (intfield & INT_TXDONE)
- mchp_corespi_write(spi, REG_INT_CLEAR, INT_TXDONE);
-
- if (intfield & INT_RXRDY) {
- mchp_corespi_write(spi, REG_INT_CLEAR, INT_RXRDY);
-
- if (spi->rx_len)
- mchp_corespi_read_fifo(spi);
- }
-
- if (!spi->rx_len && !spi->tx_len)
- finalise = true;
-
if (intfield & INT_RX_CHANNEL_OVERFLOW) {
mchp_corespi_write(spi, REG_INT_CLEAR, INT_RX_CHANNEL_OVERFLOW);
finalise = true;
@@ -512,9 +502,14 @@ static int mchp_corespi_transfer_one(struct spi_controller *host,
mchp_corespi_write(spi, REG_SLAVE_SELECT, spi->pending_slave_select);
- while (spi->tx_len)
- mchp_corespi_write_fifo(spi);
+ while (spi->tx_len) {
+ int fifo_max = DIV_ROUND_UP(min(spi->tx_len, FIFO_DEPTH), spi->n_bytes);
+ mchp_corespi_write_fifo(spi, fifo_max);
+ mchp_corespi_read_fifo(spi, fifo_max);
+ }
+
+ spi_finalize_current_transfer(host);
return 1;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 91cf42c63f2d8a9c1bcdfe923218e079b32e1a69
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031631-outskirts-catfight-a453@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91cf42c63f2d8a9c1bcdfe923218e079b32e1a69 Mon Sep 17 00:00:00 2001
From: Conor Dooley <conor.dooley(a)microchip.com>
Date: Mon, 3 Mar 2025 10:47:40 +0000
Subject: [PATCH] spi: microchip-core: prevent RX overflows when transmit size
> FIFO size
When the size of a transfer exceeds the size of the FIFO (32 bytes), RX
overflows will be generated and receive data will be corrupted and
warnings will be produced. For example, here's an error generated by a
transfer of 36 bytes:
spi_master spi0: mchp_corespi_interrupt: RX OVERFLOW: rxlen: 4, txlen: 0
The driver is currently split between handling receiving in the
interrupt handler, and sending outside of it. Move all handling out of
the interrupt handling, and explicitly link the number of bytes read of
of the RX FIFO to the number written into the TX one. This both resolves
the overflow problems as well as simplifying the flow of the driver.
CC: stable(a)vger.kernel.org
Fixes: 9ac8d17694b6 ("spi: add support for microchip fpga spi controllers")
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
Link: https://patch.msgid.link/20250303-veal-snooper-712c1dfad336@wendy
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/spi-microchip-core.c b/drivers/spi/spi-microchip-core.c
index 5b6af55855ef..62ba0bd9cbb7 100644
--- a/drivers/spi/spi-microchip-core.c
+++ b/drivers/spi/spi-microchip-core.c
@@ -70,8 +70,7 @@
#define INT_RX_CHANNEL_OVERFLOW BIT(2)
#define INT_TX_CHANNEL_UNDERRUN BIT(3)
-#define INT_ENABLE_MASK (CONTROL_RX_DATA_INT | CONTROL_TX_DATA_INT | \
- CONTROL_RX_OVER_INT | CONTROL_TX_UNDER_INT)
+#define INT_ENABLE_MASK (CONTROL_RX_OVER_INT | CONTROL_TX_UNDER_INT)
#define REG_CONTROL (0x00)
#define REG_FRAME_SIZE (0x04)
@@ -133,10 +132,15 @@ static inline void mchp_corespi_disable(struct mchp_corespi *spi)
mchp_corespi_write(spi, REG_CONTROL, control);
}
-static inline void mchp_corespi_read_fifo(struct mchp_corespi *spi)
+static inline void mchp_corespi_read_fifo(struct mchp_corespi *spi, int fifo_max)
{
- while (spi->rx_len >= spi->n_bytes && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_EMPTY)) {
- u32 data = mchp_corespi_read(spi, REG_RX_DATA);
+ for (int i = 0; i < fifo_max; i++) {
+ u32 data;
+
+ while (mchp_corespi_read(spi, REG_STATUS) & STATUS_RXFIFO_EMPTY)
+ ;
+
+ data = mchp_corespi_read(spi, REG_RX_DATA);
spi->rx_len -= spi->n_bytes;
@@ -211,11 +215,10 @@ static inline void mchp_corespi_set_xfer_size(struct mchp_corespi *spi, int len)
mchp_corespi_write(spi, REG_FRAMESUP, len);
}
-static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi)
+static inline void mchp_corespi_write_fifo(struct mchp_corespi *spi, int fifo_max)
{
- int fifo_max, i = 0;
+ int i = 0;
- fifo_max = DIV_ROUND_UP(min(spi->tx_len, FIFO_DEPTH), spi->n_bytes);
mchp_corespi_set_xfer_size(spi, fifo_max);
while ((i < fifo_max) && !(mchp_corespi_read(spi, REG_STATUS) & STATUS_TXFIFO_FULL)) {
@@ -413,19 +416,6 @@ static irqreturn_t mchp_corespi_interrupt(int irq, void *dev_id)
if (intfield == 0)
return IRQ_NONE;
- if (intfield & INT_TXDONE)
- mchp_corespi_write(spi, REG_INT_CLEAR, INT_TXDONE);
-
- if (intfield & INT_RXRDY) {
- mchp_corespi_write(spi, REG_INT_CLEAR, INT_RXRDY);
-
- if (spi->rx_len)
- mchp_corespi_read_fifo(spi);
- }
-
- if (!spi->rx_len && !spi->tx_len)
- finalise = true;
-
if (intfield & INT_RX_CHANNEL_OVERFLOW) {
mchp_corespi_write(spi, REG_INT_CLEAR, INT_RX_CHANNEL_OVERFLOW);
finalise = true;
@@ -512,9 +502,14 @@ static int mchp_corespi_transfer_one(struct spi_controller *host,
mchp_corespi_write(spi, REG_SLAVE_SELECT, spi->pending_slave_select);
- while (spi->tx_len)
- mchp_corespi_write_fifo(spi);
+ while (spi->tx_len) {
+ int fifo_max = DIV_ROUND_UP(min(spi->tx_len, FIFO_DEPTH), spi->n_bytes);
+ mchp_corespi_write_fifo(spi, fifo_max);
+ mchp_corespi_read_fifo(spi, fifo_max);
+ }
+
+ spi_finalize_current_transfer(host);
return 1;
}
On Sat, Mar 15, 2025 at 07:57:55PM +0100, proxy0(a)tutamail.com wrote:
> Mar 15, 2025, 9:28 PM by lukas(a)wunner.de:
> > After dwelling on this for a while, I'm thinking that it may re-introduce
> > the issue fixed by commit f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock
> > between reset_lock and device_lock"):
> >
> > Looking at the second and third stack trace in its commit message,
> > down_write(reset_lock) in pciehp_reset_slot() is basically equivalent
> > to synchronize_irq() and we're holding device_lock() at that point,
> > hindering progress of pciehp_ist().
> >
> > So I think I have guided you in the wrong direction and I apologize
> > for that.
> >
> > However it seems to me that this should be solvable with the small
> > patch below. Am I missing something?
> >
> > @Joel Mathew Thomas, could you give the below patch a spin and see
> > if it helps?
>
> I've tested the patch series along with the additional patch provided.
>
> Kernel: 6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix
>
> Patches applied:
> - [PATCH 1/4] PCI/hotplug: Disable HPIE over reset
> - [PATCH 2/4] PCI/hotplug: Clearing HPIE for the duration of reset is enough
> - [PATCH 3/4] PCI/hotplug: reset_lock is not required synchronizing with irq thread
> - [PATCH 4/4] PCI/hotplug: Don't enable HPIE in poll mode
> - The latest patch from you:
> + /* Ignore events masked by pciehp_reset_slot(). */
> + events &= ctrl->slot_ctrl;
> + if (!events)
> + return IRQ_HANDLED;
Could you test *only* the quoted diff, i.e. without patches [1/4] - [4/4],
on top of a recent kernel?
Sorry for not having been clear about this.
I believe that patch [1/4] will re-introduce a deadlock we've
already fixed two years ago, so the small diff above seeks to
replace it with a simpler approach that will hopefully avoid
the issue as well.
Thanks,
Lukas
Hi,
I prepared this series about 6 months ago, but never got around to
sending it in. In mainline, we got rid of remap_pfn_range() on the
io_uring side, and this just backports it to 6.6-stable as well.
This eliminates issues with fragmented memory, and hence it'd
be nice to have it in 6.6 stable as well. 6.6-stable also has
mmap support for provided ring buffers, and since we've had an
issue related to remap_pfn_range() there, let's bring it to 6.6
stable as well as a precaution.
--
Jens Axboe
From: Andrew Davis <afd(a)ti.com>
[ Upstream commit 5ab90f40121a9f6a9b368274cd92d0f435dc7cfa ]
The syscon helper device_node_to_regmap() is used to fetch a regmap
registered to a device node. It also currently creates this regmap
if the node did not already have a regmap associated with it. This
should only be used on "syscon" nodes. This driver is not such a
device and instead uses device_node_to_regmap() on its own node as
a hacky way to create a regmap for itself.
This will not work going forward and so we should create our regmap
the normal way by defining our regmap_config, fetching our memory
resource, then using the normal regmap_init_mmio() function.
Signed-off-by: Andrew Davis <afd(a)ti.com>
Tested-by: Nishanth Menon <nm(a)ti.com>
Link: https://lore.kernel.org/r/20250123182234.597665-1-afd@ti.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/phy/ti/phy-gmii-sel.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/phy/ti/phy-gmii-sel.c b/drivers/phy/ti/phy-gmii-sel.c
index e0ca59ae31531..ff5d5e29629fa 100644
--- a/drivers/phy/ti/phy-gmii-sel.c
+++ b/drivers/phy/ti/phy-gmii-sel.c
@@ -424,6 +424,12 @@ static int phy_gmii_sel_init_ports(struct phy_gmii_sel_priv *priv)
return 0;
}
+static const struct regmap_config phy_gmii_sel_regmap_cfg = {
+ .reg_bits = 32,
+ .val_bits = 32,
+ .reg_stride = 4,
+};
+
static int phy_gmii_sel_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
@@ -468,7 +474,14 @@ static int phy_gmii_sel_probe(struct platform_device *pdev)
priv->regmap = syscon_node_to_regmap(node->parent);
if (IS_ERR(priv->regmap)) {
- priv->regmap = device_node_to_regmap(node);
+ void __iomem *base;
+
+ base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(base))
+ return dev_err_probe(dev, PTR_ERR(base),
+ "failed to get base memory resource\n");
+
+ priv->regmap = regmap_init_mmio(dev, base, &phy_gmii_sel_regmap_cfg);
if (IS_ERR(priv->regmap))
return dev_err_probe(dev, PTR_ERR(priv->regmap),
"Failed to get syscon\n");
--
2.39.5
From: Zhang Lixu <lixu.zhang(a)intel.com>
[ Upstream commit 4b54ae69197b9f416baa0fceadff7e89075f8454 ]
The timestamps in the Firmware log and HID sensor samples are incorrect.
They show 1970-01-01 because the current IPC driver only uses the first
8 bytes of bootup time when synchronizing time with the firmware. The
firmware converts the bootup time to UTC time, which results in the
display of 1970-01-01.
In write_ipc_from_queue(), when sending the MNG_SYNC_FW_CLOCK message,
the clock is updated according to the definition of ipc_time_update_msg.
However, in _ish_sync_fw_clock(), the message length is specified as the
size of uint64_t when building the doorbell. As a result, the firmware
only receives the first 8 bytes of struct ipc_time_update_msg.
This patch corrects the length in the doorbell to ensure the entire
ipc_time_update_msg is sent, fixing the timestamp issue.
Signed-off-by: Zhang Lixu <lixu.zhang(a)intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hid/intel-ish-hid/ipc/ipc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/hid/intel-ish-hid/ipc/ipc.c b/drivers/hid/intel-ish-hid/ipc/ipc.c
index dd5fc60874ba1..b1a41c90c5741 100644
--- a/drivers/hid/intel-ish-hid/ipc/ipc.c
+++ b/drivers/hid/intel-ish-hid/ipc/ipc.c
@@ -577,14 +577,14 @@ static void fw_reset_work_fn(struct work_struct *unused)
static void _ish_sync_fw_clock(struct ishtp_device *dev)
{
static unsigned long prev_sync;
- uint64_t usec;
+ struct ipc_time_update_msg time = {};
if (prev_sync && time_before(jiffies, prev_sync + 20 * HZ))
return;
prev_sync = jiffies;
- usec = ktime_to_us(ktime_get_boottime());
- ipc_send_mng_msg(dev, MNG_SYNC_FW_CLOCK, &usec, sizeof(uint64_t));
+ /* The fields of time would be updated while sending message */
+ ipc_send_mng_msg(dev, MNG_SYNC_FW_CLOCK, &time, sizeof(time));
}
/**
--
2.39.5
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Linxuan: perform an equivalent direct check without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
Some previous discussions can be found in the following links:
https://lore.kernel.org/stable/05D0A9F7DE394601+20250311100555.310788-2-che…
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..6249bd47fb0b 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Linxuan: perform an equivalent direct check without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
Some previous discussions can be found in the following links:
https://lore.kernel.org/stable/05D0A9F7DE394601+20250311100555.310788-2-che…
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..6249bd47fb0b 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
[ Upstream commit 5ac9b4e935dfc6af41eee2ddc21deb5c36507a9f ]
>From memfd_secret(2) manpage:
The memory areas backing the file created with memfd_secret(2) are
visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables and only the
page tables of the processes holding the file descriptor map the
corresponding physical memory. (Thus, the pages in the region can't be
accessed by the kernel itself, so that, for example, pointers to the
region can't be passed to system calls.)
We need to handle this special case gracefully in build ID fetching
code. Return -EFAULT whenever secretmem file is passed to build_id_parse()
family of APIs. Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
Fixes: de3ec364c3c3 ("lib/buildid: add single folio-based file reader abstraction")
Reported-by: Yi Lai <yi1.lai(a)intel.com>
Suggested-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Link: https://lore.kernel.org/bpf/20241017175431.6183-A-hca@linux.ibm.com
Link: https://lore.kernel.org/bpf/20241017174713.2157873-1-andrii@kernel.org
[ Linxuan: perform an equivalent direct check without folio-based changes ]
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
Some previous discussions can be found in the following links:
https://lore.kernel.org/stable/05D0A9F7DE394601+20250311100555.310788-2-che…
---
lib/buildid.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..6249bd47fb0b 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -5,6 +5,7 @@
#include <linux/elf.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>
+#include <linux/secretmem.h>
#define BUILD_ID 3
@@ -157,6 +158,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
The retain_ff bit should be updated for a GDSC when it is under SW
control and ON. The current sequence needs to be fixed as the GDSC
needs to update retention and is moved to HW control which does not
guarantee the GDSC to be in enabled state.
During the GDSC FSM state, the GDSC hardware waits for an ACK and the
timeout for the ACK is 2000us as per design requirements.
Signed-off-by: Taniya Das <quic_tdas(a)quicinc.com>
---
Taniya Das (2):
clk: qcom: gdsc: Set retain_ff before moving to HW CTRL
clk: qcom: gdsc: Update the status poll timeout for GDSC
drivers/clk/qcom/gdsc.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
---
base-commit: c674aa7c289e51659e40dda0f954886ef7f80042
change-id: 20250212-gdsc_fixes-77e8b8e27e2f
Best regards,
--
Taniya Das <quic_tdas(a)quicinc.com>
From: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
DSP expects the periods to be aligned to fragment sizes, currently
setting up to hw constriants on periods bytes is not going to work
correctly as we can endup with periods sizes aligned to 32 bytes however
not aligned to fragment size.
Update the constriants to use fragment size, and also set at step of
10ms for period size to accommodate DSP requirements of 10ms latency.
Fixes: 9b4fe0f1cd79 ("ASoC: qdsp6: audioreach: add q6apm-dai support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
---
sound/soc/qcom/qdsp6/q6apm-dai.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/sound/soc/qcom/qdsp6/q6apm-dai.c b/sound/soc/qcom/qdsp6/q6apm-dai.c
index 90cb24947f31..180ff24041bf 100644
--- a/sound/soc/qcom/qdsp6/q6apm-dai.c
+++ b/sound/soc/qcom/qdsp6/q6apm-dai.c
@@ -385,13 +385,14 @@ static int q6apm_dai_open(struct snd_soc_component *component,
}
}
- ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_BYTES, 32);
+ /* setup 10ms latency to accommodate DSP restrictions */
+ ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, 480);
if (ret < 0) {
dev_err(dev, "constraint for period bytes step ret = %d\n", ret);
goto err;
}
- ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_BYTES, 32);
+ ret = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_SIZE, 480);
if (ret < 0) {
dev_err(dev, "constraint for buffer bytes step ret = %d\n", ret);
goto err;
--
2.39.5
Hi there,
I wanted to check in to see if you've had a chance to review my previous message.
Your feedback would be greatly appreciated.
Best regards,
Kevin
________________________________
From: Kevin Martin
Sent: 12 March 2025 08:08
To: linux-stable-mirror(a)lists.linaro.org<mailto:linux-stable-mirror@lists.linaro.org>
Subject: Get premium contact lists
Hi there,
Hope you're having a great day!
Would you be interested in a recently verified list of NetApp clients to support your outreach?
Let me know, and I'll be happy to share the details.
Best regards,
Kevin Martin
Demand Consultant
If you wish to stop receiving emails, reply with Abolish.
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: bd50b1f5b6f3 ("arm64: dts: qcom: x1e80100: Add Compute Reference Device")
Cc: stable(a)vger.kernel.org # 6.8
Cc: Abel Vesa <abel.vesa(a)linaro.org>
Cc: Rajendra Nayak <quic_rjendra(a)quicinc.com>
Cc: Sibi Sankar <quic_sibis(a)quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1-crd.dtsi | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1-crd.dtsi b/arch/arm64/boot/dts/qcom/x1-crd.dtsi
index 9e587dc57532..22ebcbe54e24 100644
--- a/arch/arm64/boot/dts/qcom/x1-crd.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1-crd.dtsi
@@ -610,6 +610,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -631,6 +632,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: af16b00578a7 ("arm64: dts: qcom: Add base X1E80100 dtsi and the QCP dts")
Cc: stable(a)vger.kernel.org # 6.8
Cc: Rajendra Nayak <quic_rjendra(a)quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-qcp.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts b/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
index ec594628304a..8f366bf61bbd 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
@@ -437,6 +437,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -458,6 +459,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 45247fe17db2 ("arm64: dts: qcom: x1e80100: add Lenovo Thinkpad Yoga slim 7x devicetree")
Cc: stable(a)vger.kernel.org # 6.11
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts b/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts
index a3d53f2ba2c3..9d4ba9728355 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-lenovo-yoga-slim7x.dts
@@ -290,6 +290,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l14b_3p0: ldo14 {
@@ -304,8 +305,8 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
-
};
regulators-1 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 6f18b8d4142c ("arm64: dts: qcom: x1e80100-hp-x14: dt for HP Omnibook X Laptop 14")
Cc: stable(a)vger.kernel.org # 6.14
Cc: Jens Glathe <jens.glathe(a)oldschoolsolutions.biz>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts b/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts
index cd860a246c45..ab5addb33b7a 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-hp-omnibook-x14.dts
@@ -633,6 +633,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -654,6 +655,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Note that these supplies currently have no consumers described in
mainline.
Fixes: f5b788d0e8cd ("arm64: dts: qcom: Add support for X1-based Dell XPS 13 9345")
Cc: stable(a)vger.kernel.org # 6.13
Reviewed-by: Aleksandrs Vinarskis <alex.vinarskis(a)gmail.com>
Tested-by: Aleksandrs Vinarskis <alex.vinarskis(a)gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts b/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
index 86e87f03b0ec..90f588ed7d63 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
@@ -359,6 +359,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -380,6 +381,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l17b_2p5: ldo17 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 7b8a31e82b87 ("arm64: dts: qcom: Add X1E001DE Snapdragon Devkit for Windows")
Cc: stable(a)vger.kernel.org # 6.14
Cc: Sibi Sankar <quic_sibis(a)quicinc.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e001de-devkit.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts b/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts
index 5e3970b26e2f..f92bda2d34f2 100644
--- a/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts
+++ b/arch/arm64/boot/dts/qcom/x1e001de-devkit.dts
@@ -507,6 +507,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -528,6 +529,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l16b_2p9: ldo16 {
--
2.48.1
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Fixes: 7d1cbe2f4985 ("arm64: dts: qcom: Add X1E78100 ThinkPad T14s Gen 6")
Cc: stable(a)vger.kernel.org # 6.12
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi b/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi
index eff0e73640bc..160c052db1ec 100644
--- a/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e78100-lenovo-thinkpad-t14s.dtsi
@@ -456,6 +456,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -477,6 +478,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l17b_2p5: ldo17 {
--
2.48.1
From: Haibo Chen <haibo.chen(a)nxp.com>
After a suspend/resume cycle on a down interface, it will come up as
ERROR-ACTIVE.
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state STOPPED (berr-counter tx 0 rx 0) restart-ms 1000
$ sudo systemctl suspend
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000
And only set CAN state to CAN_STATE_ERROR_ACTIVE when resume process
has no issue, otherwise keep in CAN_STATE_SLEEPING as suspend did.
Fixes: 4de349e786a3 ("can: flexcan: fix resume function")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haibo Chen <haibo.chen(a)nxp.com>
Link: https://patch.msgid.link/20250314110145.899179-1-haibo.chen@nxp.com
Reported-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
Closes: https://lore.kernel.org/all/20250314-married-polar-elephant-b15594-mkl@peng…
[mkl: add newlines]
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/flexcan/flexcan-core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/flexcan/flexcan-core.c b/drivers/net/can/flexcan/flexcan-core.c
index ac1a860986df..3a71fd235722 100644
--- a/drivers/net/can/flexcan/flexcan-core.c
+++ b/drivers/net/can/flexcan/flexcan-core.c
@@ -2266,8 +2266,9 @@ static int __maybe_unused flexcan_suspend(struct device *device)
}
netif_stop_queue(dev);
netif_device_detach(dev);
+
+ priv->can.state = CAN_STATE_SLEEPING;
}
- priv->can.state = CAN_STATE_SLEEPING;
return 0;
}
@@ -2278,7 +2279,6 @@ static int __maybe_unused flexcan_resume(struct device *device)
struct flexcan_priv *priv = netdev_priv(dev);
int err;
- priv->can.state = CAN_STATE_ERROR_ACTIVE;
if (netif_running(dev)) {
netif_device_attach(dev);
netif_start_queue(dev);
@@ -2298,6 +2298,8 @@ static int __maybe_unused flexcan_resume(struct device *device)
flexcan_chip_interrupts_enable(dev);
}
+
+ priv->can.state = CAN_STATE_ERROR_ACTIVE;
}
return 0;
--
2.47.2
From: Haibo Chen <haibo.chen(a)nxp.com>
After a suspend/resume cycle on a down interface, it will come up as
ERROR-ACTIVE.
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state STOPPED (berr-counter tx 0 rx 0) restart-ms 1000
$ sudo systemctl suspend
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000
And only set CAN state to CAN_STATE_ERROR_ACTIVE when resume process
has no issue, otherwise keep in CAN_STATE_SLEEPING as suspend did.
Fixes: 4de349e786a3 ("can: flexcan: fix resume function")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haibo Chen <haibo.chen(a)nxp.com>
---
Changes for v3:
- only handle priv->can.state when netif_running(dev) return true in PM.
---
drivers/net/can/flexcan/flexcan-core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/flexcan/flexcan-core.c b/drivers/net/can/flexcan/flexcan-core.c
index b347a1c93536..d4d342d8f490 100644
--- a/drivers/net/can/flexcan/flexcan-core.c
+++ b/drivers/net/can/flexcan/flexcan-core.c
@@ -2299,8 +2299,8 @@ static int __maybe_unused flexcan_suspend(struct device *device)
}
netif_stop_queue(dev);
netif_device_detach(dev);
+ priv->can.state = CAN_STATE_SLEEPING;
}
- priv->can.state = CAN_STATE_SLEEPING;
return 0;
}
@@ -2311,7 +2311,6 @@ static int __maybe_unused flexcan_resume(struct device *device)
struct flexcan_priv *priv = netdev_priv(dev);
int err;
- priv->can.state = CAN_STATE_ERROR_ACTIVE;
if (netif_running(dev)) {
netif_device_attach(dev);
netif_start_queue(dev);
@@ -2331,6 +2330,7 @@ static int __maybe_unused flexcan_resume(struct device *device)
flexcan_chip_interrupts_enable(dev);
}
+ priv->can.state = CAN_STATE_ERROR_ACTIVE;
}
return 0;
--
2.34.1
This patch series addresses 2 issues
1) Fix typo in pattern properties for R-Car V4M.
2) Fix page entries in the AFL list.
v2->v3:
* Collected tags.
* Dropped unused variables cfg and start from
rcar_canfd_configure_afl_rules().
v1->v2:
* Split fixes patches as separate series.
* Added Rb tag from Geert for binding patch.
* Added the tag Cc:stable@vger.kernel.org
Biju Das (2):
dt-bindings: can: renesas,rcar-canfd: Fix typo in pattern properties
for R-Car V4M
can: rcar_canfd: Fix page entries in the AFL list
.../bindings/net/can/renesas,rcar-canfd.yaml | 2 +-
drivers/net/can/rcar/rcar_canfd.c | 28 ++++++++-----------
2 files changed, 12 insertions(+), 18 deletions(-)
--
2.43.0
Prepare vPMC registers for user-initiated changes after first run. This
is important specifically for debugging Windows on QEMU with GDB; QEMU
tries to write back all visible registers when resuming the VM execution
with GDB, corrupting the PMU state. Windows always uses the PMU so this
can cause adverse effects on that particular OS.
This series also contains patch "KVM: arm64: PMU: Set raw values from
user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}", which reverts semantic
changes made for the mentioned registers in the past. It is necessary
to migrate the PMU state properly on Firecracker, QEMU, and crosvm.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v4:
- Reverted changes for functions implementing ioctls in patch
"KVM: arm64: PMU: Assume PMU presence in pmu-emul.c".
- Removed kvm_pmu_vcpu_reset().
- Reordered function calls in kvm_vcpu_reload_pmu() for better style.
- Link to v3: https://lore.kernel.org/r/20250312-pmc-v3-0-0411cab5dc3d@daynix.com
Changes in v3:
- Added patch "KVM: arm64: PMU: Assume PMU presence in pmu-emul.c".
- Added an explanation of this path series' motivation to each patch.
- Explained why userspace register writes and register reset should be
covered in patch "KVM: arm64: PMU: Reload when user modifies
registers".
- Marked patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}" for stable.
- Reoreded so that patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}" would come first.
- Added patch "KVM: arm64: PMU: Call kvm_pmu_handle_pmcr() after masking
PMCNTENSET_EL0".
- Added patch "KVM: arm64: Reload PMCNTENSET_EL0".
- Link to v2: https://lore.kernel.org/r/20250307-pmc-v2-0-6c3375a5f1e4@daynix.com
Changes in v2:
- Changed to utilize KVM_REQ_RELOAD_PMU as suggested by Oliver Upton.
- Added patch "KVM: arm64: PMU: Reload when user modifies registers"
to cover more registers.
- Added patch "KVM: arm64: PMU: Set raw values from user to
PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}".
- Link to v1: https://lore.kernel.org/r/20250302-pmc-v1-1-caff989093dc@daynix.com
---
Akihiko Odaki (7):
KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
KVM: arm64: PMU: Reload when user modifies registers
KVM: arm64: PMU: Call kvm_pmu_handle_pmcr() after masking PMCNTENSET_EL0
KVM: arm64: PMU: Reload PMCNTENSET_EL0
KVM: arm64: PMU: Reload when resetting
arch/arm64/kvm/arm.c | 8 ++++---
arch/arm64/kvm/pmu-emul.c | 60 ++++++++++++++---------------------------------
arch/arm64/kvm/reset.c | 3 ---
arch/arm64/kvm/sys_regs.c | 53 +++++++++++++++++++++++------------------
include/kvm/arm_pmu.h | 3 +--
5 files changed, 53 insertions(+), 74 deletions(-)
---
base-commit: da2f480cb24d39d480b1e235eda0dd2d01f8765b
change-id: 20250302-pmc-b90a86af945c
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
This is a note to let you know that I've just added the patch titled
iio: adc: ad7768-1: Fix conversion result sign
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 8236644f5ecb180e80ad92d691c22bc509b747bb Mon Sep 17 00:00:00 2001
From: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Date: Thu, 6 Mar 2025 18:00:29 -0300
Subject: iio: adc: ad7768-1: Fix conversion result sign
The ad7768-1 ADC output code is two's complement, meaning that the voltage
conversion result is a signed value.. Since the value is a 24 bit one,
stored in a 32 bit variable, the sign should be extended in order to get
the correct representation.
Also the channel description has been updated to signed representation,
to match the ADC specifications.
Fixes: a5f8c7da3dbe ("iio: adc: Add AD7768-1 ADC basic support")
Reviewed-by: David Lechner <dlechner(a)baylibre.com>
Reviewed-by: Marcelo Schmitt <marcelo.schmitt(a)analog.com>
Signed-off-by: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Signed-off-by: Jonathan Santos <Jonathan.Santos(a)analog.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://patch.msgid.link/505994d3b71c2aa38ba714d909a68e021f12124c.174126812…
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad7768-1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/adc/ad7768-1.c b/drivers/iio/adc/ad7768-1.c
index ea829c51e80b..09e7cccfd51c 100644
--- a/drivers/iio/adc/ad7768-1.c
+++ b/drivers/iio/adc/ad7768-1.c
@@ -142,7 +142,7 @@ static const struct iio_chan_spec ad7768_channels[] = {
.channel = 0,
.scan_index = 0,
.scan_type = {
- .sign = 'u',
+ .sign = 's',
.realbits = 24,
.storagebits = 32,
.shift = 8,
@@ -373,7 +373,7 @@ static int ad7768_read_raw(struct iio_dev *indio_dev,
iio_device_release_direct(indio_dev);
if (ret < 0)
return ret;
- *val = ret;
+ *val = sign_extend32(ret, chan->scan_type.realbits - 1);
return IIO_VAL_INT;
--
2.48.1
This is a note to let you know that I've just added the patch titled
iio: adc: ad7768-1: Fix conversion result sign
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 8236644f5ecb180e80ad92d691c22bc509b747bb Mon Sep 17 00:00:00 2001
From: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Date: Thu, 6 Mar 2025 18:00:29 -0300
Subject: iio: adc: ad7768-1: Fix conversion result sign
The ad7768-1 ADC output code is two's complement, meaning that the voltage
conversion result is a signed value.. Since the value is a 24 bit one,
stored in a 32 bit variable, the sign should be extended in order to get
the correct representation.
Also the channel description has been updated to signed representation,
to match the ADC specifications.
Fixes: a5f8c7da3dbe ("iio: adc: Add AD7768-1 ADC basic support")
Reviewed-by: David Lechner <dlechner(a)baylibre.com>
Reviewed-by: Marcelo Schmitt <marcelo.schmitt(a)analog.com>
Signed-off-by: Sergiu Cuciurean <sergiu.cuciurean(a)analog.com>
Signed-off-by: Jonathan Santos <Jonathan.Santos(a)analog.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://patch.msgid.link/505994d3b71c2aa38ba714d909a68e021f12124c.174126812…
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad7768-1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/adc/ad7768-1.c b/drivers/iio/adc/ad7768-1.c
index ea829c51e80b..09e7cccfd51c 100644
--- a/drivers/iio/adc/ad7768-1.c
+++ b/drivers/iio/adc/ad7768-1.c
@@ -142,7 +142,7 @@ static const struct iio_chan_spec ad7768_channels[] = {
.channel = 0,
.scan_index = 0,
.scan_type = {
- .sign = 'u',
+ .sign = 's',
.realbits = 24,
.storagebits = 32,
.shift = 8,
@@ -373,7 +373,7 @@ static int ad7768_read_raw(struct iio_dev *indio_dev,
iio_device_release_direct(indio_dev);
if (ret < 0)
return ret;
- *val = ret;
+ *val = sign_extend32(ret, chan->scan_type.realbits - 1);
return IIO_VAL_INT;
--
2.48.1
Currently on book3s-hv, the capability KVM_CAP_SPAPR_TCE_VFIO is only
available for KVM Guests running on PowerNV and not for the KVM guests
running on pSeries hypervisors. This prevents a pSeries L2 guest from
leveraging the in-kernel acceleration for H_PUT_TCE_INDIRECT and
H_STUFF_TCE hcalls that results in slow startup times for large memory
guests.
Support for VFIO on pSeries was restored in commit f431a8cde7f1
("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries"),
making it possible to re-enable this capability on pSeries hosts.
This change enables KVM_CAP_SPAPR_TCE_VFIO for nested PAPR guests on
pSeries, while maintaining the existing behavior on PowerNV. Booting an
L2 guest with 128GB of memory shows an average 11% improvement in
startup time.
Fixes: f431a8cde7f1 ("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries")
Cc: stable(a)vger.kernel.org
Reviewed-by: Vaibhav Jain <vaibhav(a)linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Amit Machhiwal <amachhiw(a)linux.ibm.com>
---
Changes since v2:
* Updated the patch description
* v2: https://lore.kernel.org/all/20250129094033.2265211-1-amachhiw@linux.ibm.com/
Changes since v1:
* Addressed review comments from Ritesh
* v1: https://lore.kernel.org/all/20250109132053.158436-1-amachhiw@linux.ibm.com/
arch/powerpc/kvm/powerpc.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index ce1d91eed231..a7138eb18d59 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -543,26 +543,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = !hv_enabled;
break;
#ifdef CONFIG_KVM_MPIC
case KVM_CAP_IRQ_MPIC:
r = 1;
break;
#endif
#ifdef CONFIG_PPC_BOOK3S_64
case KVM_CAP_SPAPR_TCE:
+ fallthrough;
case KVM_CAP_SPAPR_TCE_64:
- r = 1;
- break;
case KVM_CAP_SPAPR_TCE_VFIO:
- r = !!cpu_has_feature(CPU_FTR_HVMODE);
- break;
case KVM_CAP_PPC_RTAS:
case KVM_CAP_PPC_FIXUP_HCALL:
case KVM_CAP_PPC_ENABLE_HCALL:
#ifdef CONFIG_KVM_XICS
case KVM_CAP_IRQ_XICS:
#endif
case KVM_CAP_PPC_GET_CPU_CHAR:
r = 1;
break;
#ifdef CONFIG_KVM_XIVE
base-commit: 6537cfb395f352782918d8ee7b7f10ba2cc3cbf2
--
2.48.1
From: Thomas Weißschuh <linux(a)weissschuh.net>
commit fd53aa40e65f518453115b6f56183b0c201db26b upstream.
The ioctl and sysfs handlers unconditionally call the ->enable callback.
Not all drivers implement that callback, leading to NULL dereferences.
Example of affected drivers: ptp_s390.c, ptp_vclock.c and ptp_mock.c.
Instead use a dummy callback if no better was specified by the driver.
Fixes: d94ba80ebbea ("ptp: Added a brand new class driver for ptp clocks.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Acked-by: Richard Cochran <richardcochran(a)gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski(a)linux.intel.com>
Link: https://patch.msgid.link/20250123-ptp-enable-v1-1-b015834d3a47@weissschuh.n…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Conflict due to
42704b26b0f1 ("ptp: Add cycles support for virtual clocks")
not in the tree]
Signed-off-by: Abdelkareem Abdelsaamad <kareemem(a)amazon.com>
---
drivers/ptp/ptp_clock.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 4d775cd8ee3c..c895e26b1f17 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -188,6 +188,11 @@ static void ptp_clock_release(struct device *dev)
kfree(ptp);
}
+static int ptp_enable(struct ptp_clock_info *ptp, struct ptp_clock_request *request, int on)
+{
+ return -EOPNOTSUPP;
+}
+
static void ptp_aux_kworker(struct kthread_work *work)
{
struct ptp_clock *ptp = container_of(work, struct ptp_clock,
@@ -233,6 +238,9 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
mutex_init(&ptp->pincfg_mux);
init_waitqueue_head(&ptp->tsev_wq);
+ if (!ptp->info->enable)
+ ptp->info->enable = ptp_enable;
+
if (ptp->info->do_aux_work) {
kthread_init_delayed_work(&ptp->aux_work, ptp_aux_kworker);
ptp->kworker = kthread_create_worker(0, "ptp%d", ptp->index);
--
2.47.1
This patch series addresses a shutdown issue reported in [1].
This problem has been fixed on kernel 6.12 and later, kernel 6.6 is
the last kernel these upstream patches should go to because the Realtek
RTL8852BE chip supported by kernel since v6.2 is the only chip known to
have this problem.
[1] https://github.com/lwfinger/rtw89/issues/372
Zenm Chen (2):
wifi: rtw89: pci: add pre_deinit to be called after probe complete
wifi: rtw89: pci: disable PCIE wake bit when PCIE deinit
drivers/net/wireless/realtek/rtw89/core.c | 2 ++
drivers/net/wireless/realtek/rtw89/core.h | 6 ++++++
drivers/net/wireless/realtek/rtw89/pci.c | 10 ++++++++++
3 files changed, 18 insertions(+)
--
2.48.1
This series addresses GPU reset issues reported in [1], where running a
long compute job would trigger repeated GPU resets, leading to a UI
freeze.
Patches #1 and #2 prevent the same faulty job from being resubmitted in a
loop, mitigating the first cause of the issue.
However, the issue isn't entirely solved. Even with only a single GPU
reset, the UI still freezes on the Raspberry Pi 5, indicating a GPU hang.
Patches #3 to #6 address this by properly configuring the V3D_SMS
registers, which are required for power management and resets in V3D 7.1.
Patch #7 updates the DT maintainership, replacing Emma with the current
v3d driver maintainer.
[1] https://github.com/raspberrypi/linux/issues/6660
Best Regards,
- Maíra
---
v1 -> v2:
- [1/6, 2/6, 5/6] Add Iago's R-b (Iago Toral)
- [3/6] Use V3D_GEN_* macros consistently throughout the driver (Phil Elwell)
- [3/6] Don't add Iago's R-b in 3/6 due to changes in the patch
- [4/6] Add per-compatible restrictions to enforce per‐SoC register rules (Conor Dooley)
- [6/6] Add Emma's A-b, collected through IRC (Emma Anholt)
- [6/6] Add Rob's A-b (Rob Herring)
- Link to v1: https://lore.kernel.org/r/20250226-v3d-gpu-reset-fixes-v1-0-83a969fdd9c1@ig…
v2 -> v3:
- [3/7] Add Iago's R-b (Iago Toral)
- [4/7, 5/7] Separate the patches to ease the reviewing process -> Now,
PATCH 4/7 only adds the per-compatible rules and PATCH 5/7 adds the
SMS registers
- [4/7] `allOf` goes above `additionalProperties` (Krzysztof Kozlowski)
- [4/7, 5/7] Sync `reg` and `reg-names` items (Krzysztof Kozlowski)
- Link to v2: https://lore.kernel.org/r/20250308-v3d-gpu-reset-fixes-v2-0-2939c30f0cc4@ig…
v3 -> v4:
- [4/7] BCM2712 has an external reset controller, therefore the "bridge"
register is not needed (Krzysztof Kozlowski)
- [4/7] Remove the word "required" from the reg descriptions (Rob Herring)
- [5/7] Improve commit message (Rob Herring)
- Link to v3: https://lore.kernel.org/r/20250311-v3d-gpu-reset-fixes-v3-0-64f7a4247ec0@ig…
---
Maíra Canal (7):
drm/v3d: Don't run jobs that have errors flagged in its fence
drm/v3d: Set job pointer to NULL when the job's fence has an error
drm/v3d: Associate a V3D tech revision to all supported devices
dt-bindings: gpu: v3d: Add per-compatible register restrictions
dt-bindings: gpu: v3d: Add SMS register to BCM2712 compatible
drm/v3d: Use V3D_SMS registers for power on/off and reset on V3D 7.x
dt-bindings: gpu: Add V3D driver maintainer as DT maintainer
.../devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 77 ++++++++++---
drivers/gpu/drm/v3d/v3d_debugfs.c | 126 ++++++++++-----------
drivers/gpu/drm/v3d/v3d_drv.c | 62 +++++++++-
drivers/gpu/drm/v3d/v3d_drv.h | 22 +++-
drivers/gpu/drm/v3d/v3d_gem.c | 27 ++++-
drivers/gpu/drm/v3d/v3d_irq.c | 6 +-
drivers/gpu/drm/v3d/v3d_perfmon.c | 4 +-
drivers/gpu/drm/v3d/v3d_regs.h | 26 +++++
drivers/gpu/drm/v3d/v3d_sched.c | 29 ++++-
9 files changed, 279 insertions(+), 100 deletions(-)
---
base-commit: 10646ddac2917b31c985ceff0e4982c42a9c924b
change-id: 20250224-v3d-gpu-reset-fixes-2d21fc70711d
Smatch noticed that inode_getblk() can return 1 on successful mapping of
a block instead of expected 0 after commit b405c1e58b73 ("udf: refactor
udf_next_aext() to handle error"). This could confuse some of the
callers and lead to strange failures (although the one reported by
Smatch in udf_mkdir() is impossible to trigger in practice). Fix the
return value of inode_getblk().
Link: https://lore.kernel.org/all/cb514af7-bbe0-435b-934f-dd1d7a16d2cd@stanley.mo…
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Fixes: b405c1e58b73 ("udf: refactor udf_next_aext() to handle error")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/udf/inode.c | 1 +
1 file changed, 1 insertion(+)
I plan to merge this patch through my tree.
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 70c907fe8af9..4386dd845e40 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -810,6 +810,7 @@ static int inode_getblk(struct inode *inode, struct udf_map_rq *map)
}
map->oflags = UDF_BLK_MAPPED;
map->pblk = udf_get_lb_pblock(inode->i_sb, &eloc, offset);
+ ret = 0;
goto out_free;
}
--
2.43.0