The current max_pfn equals to zero. In this case, it caused users cannot
get some page information through /proc such as kpagecount. The following
message is displayed by stress-ng test suite with the command
"stress-ng --verbose --physpage 1 -t 1".
# stress-ng --verbose --physpage 1 -t 1
stress-ng: error: [1691] physpage: cannot read page count for address 0x134ac000 in /proc/kpagecount, errno=22 (Invalid argument)
stress-ng: error: [1691] physpage: cannot read page count for address 0x7ffff207c3a8 in /proc/kpagecount, errno=22 (Invalid argument)
stress-ng: error: [1691] physpage: cannot read page count for address 0x134b0000 in /proc/kpagecount, errno=22 (Invalid argument)
...
After applying this patch, the kernel can pass the test.
# stress-ng --verbose --physpage 1 -t 1
stress-ng: debug: [1701] physpage: [1701] started (instance 0 on CPU 3)
stress-ng: debug: [1701] physpage: [1701] exited (instance 0 on CPU 3)
stress-ng: debug: [1700] physpage: [1701] terminated (success)
Cc: stable(a)vger.kernel.org
Fixes: ff6c3d81f2e8 ("NUMA: optimize detection of memory with no node id assigned by firmware")
Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
arch/loongarch/kernel/setup.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index edcfdfcad7d2..a9c1184ab893 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -390,6 +390,7 @@ static void __init arch_mem_init(char **cmdline_p)
if (usermem)
pr_info("User-defined physical RAM map overwrite\n");
+ max_low_pfn = max_pfn = PHYS_PFN(memblock_end_of_DRAM());
check_kernel_sections_mem();
/*
base-commit: 848e076317446f9c663771ddec142d7c2eb4cb43
--
2.39.3
The quilt patch titled
Subject: rapidio: add check for rio_add_net() in rio_scan_alloc_net()
has been removed from the -mm tree. Its filename was
rapidio-add-check-for-rio_add_net-in-rio_scan_alloc_net.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Haoxiang Li <haoxiang_li2024(a)163.com>
Subject: rapidio: add check for rio_add_net() in rio_scan_alloc_net()
Date: Thu, 27 Feb 2025 12:11:31 +0800
The return value of rio_add_net() should be checked. If it fails,
put_device() should be called to free the memory and give up the reference
initialized in rio_add_net().
Link: https://lkml.kernel.org/r/20250227041131.3680761-1-haoxiang_li2024@163.com
Fixes: e6b585ca6e81 ("rapidio: move net allocation into core code")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
Cc: Alexandre Bounine <alex.bou9(a)gmail.com>
Cc: Matt Porter <mporter(a)kernel.crashing.org>
Cc: Dan Carpenter <dan.carpenter(a)linaro.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/rapidio/rio-scan.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/rapidio/rio-scan.c~rapidio-add-check-for-rio_add_net-in-rio_scan_alloc_net
+++ a/drivers/rapidio/rio-scan.c
@@ -871,7 +871,10 @@ static struct rio_net *rio_scan_alloc_ne
dev_set_name(&net->dev, "rnet_%d", net->id);
net->dev.parent = &mport->dev;
net->dev.release = rio_scan_release_dev;
- rio_add_net(net);
+ if (rio_add_net(net)) {
+ put_device(&net->dev);
+ net = NULL;
+ }
}
return net;
_
Patches currently in -mm which might be from haoxiang_li2024(a)163.com are
The quilt patch titled
Subject: rapidio: fix an API misues when rio_add_net() fails
has been removed from the -mm tree. Its filename was
rapidio-fix-an-api-misues-when-rio_add_net-fails.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Haoxiang Li <haoxiang_li2024(a)163.com>
Subject: rapidio: fix an API misues when rio_add_net() fails
Date: Thu, 27 Feb 2025 15:34:09 +0800
rio_add_net() calls device_register() and fails when device_register()
fails. Thus, put_device() should be used rather than kfree(). Add
"mport->net = NULL;" to avoid a use after free issue.
Link: https://lkml.kernel.org/r/20250227073409.3696854-1-haoxiang_li2024@163.com
Fixes: e8de370188d0 ("rapidio: add mport char device driver")
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Cc: Alexandre Bounine <alex.bou9(a)gmail.com>
Cc: Matt Porter <mporter(a)kernel.crashing.org>
Cc: Yang Yingliang <yangyingliang(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/rapidio/devices/rio_mport_cdev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/rapidio/devices/rio_mport_cdev.c~rapidio-fix-an-api-misues-when-rio_add_net-fails
+++ a/drivers/rapidio/devices/rio_mport_cdev.c
@@ -1742,7 +1742,8 @@ static int rio_mport_add_riodev(struct m
err = rio_add_net(net);
if (err) {
rmcd_debug(RDEV, "failed to register net, err=%d", err);
- kfree(net);
+ put_device(&net->dev);
+ mport->net = NULL;
goto cleanup;
}
}
_
Patches currently in -mm which might be from haoxiang_li2024(a)163.com are
The quilt patch titled
Subject: Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
has been removed from the -mm tree. Its filename was
revert-mm-page_allocc-dont-show-protection-in-zones-lowmem_reserve-for-empty-zone.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Gabriel Krisman Bertazi <krisman(a)suse.de>
Subject: Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
Date: Tue, 25 Feb 2025 22:22:58 -0500
Commit 96a5c186efff ("mm/page_alloc.c: don't show protection in zone's
->lowmem_reserve[] for empty zone") removes the protection of lower zones
from allocations targeting memory-less high zones. This had an unintended
impact on the pattern of reclaims because it makes the high-zone-targeted
allocation more likely to succeed in lower zones, which adds pressure to
said zones. I.e, the following corresponding checks in
zone_watermark_ok/zone_watermark_fast are less likely to trigger:
if (free_pages <= min + z->lowmem_reserve[highest_zoneidx])
return false;
As a result, we are observing an increase in reclaim and kswapd scans, due
to the increased pressure. This was initially observed as increased
latency in filesystem operations when benchmarking with fio on a machine
with some memory-less zones, but it has since been associated with
increased contention in locks related to memory reclaim. By reverting
this patch, the original performance was recovered on that machine.
The original commit was introduced as a clarification of the
/proc/zoneinfo output, so it doesn't seem there are usecases depending on
it, making the revert a simple solution.
For reference, I collected vmstat with and without this patch on a freshly
booted system running intensive randread io from an nvme for 5 minutes. I
got:
rpm-6.12.0-slfo.1.2 -> pgscan_kswapd 5629543865
Patched -> pgscan_kswapd 33580844
33M scans is similar to what we had in kernels predating this patch.
These numbers is fairly representative of the workload on this machine, as
measured in several runs. So we are talking about a 2-order of magnitude
increase.
Link: https://lkml.kernel.org/r/20250226032258.234099-1-krisman@suse.de
Fixes: 96a5c186efff ("mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone")
Signed-off-by: Gabriel Krisman Bertazi <krisman(a)suse.de>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Mel Gorman <mgorman(a)suse.de>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/page_alloc.c~revert-mm-page_allocc-dont-show-protection-in-zones-lowmem_reserve-for-empty-zone
+++ a/mm/page_alloc.c
@@ -5849,11 +5849,10 @@ static void setup_per_zone_lowmem_reserv
for (j = i + 1; j < MAX_NR_ZONES; j++) {
struct zone *upper_zone = &pgdat->node_zones[j];
- bool empty = !zone_managed_pages(upper_zone);
managed_pages += zone_managed_pages(upper_zone);
- if (clear || empty)
+ if (clear)
zone->lowmem_reserve[j] = 0;
else
zone->lowmem_reserve[j] = managed_pages / ratio;
_
Patches currently in -mm which might be from krisman(a)suse.de are
The quilt patch titled
Subject: mm: fix finish_fault() handling for large folios
has been removed from the -mm tree. Its filename was
mm-fix-finish_fault-handling-for-large-folios.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Brian Geffon <bgeffon(a)google.com>
Subject: mm: fix finish_fault() handling for large folios
Date: Wed, 26 Feb 2025 11:23:41 -0500
When handling faults for anon shmem finish_fault() will attempt to install
ptes for the entire folio. Unfortunately if it encounters a single
non-pte_none entry in that range it will bail, even if the pte that
triggered the fault is still pte_none. When this situation happens the
fault will be retried endlessly never making forward progress.
This patch fixes this behavior and if it detects that a pte in the range
is not pte_none it will fall back to setting a single pte.
[bgeffon(a)google.com: tweak whitespace]
Link: https://lkml.kernel.org/r/20250227133236.1296853-1-bgeffon@google.com
Link: https://lkml.kernel.org/r/20250226162341.915535-1-bgeffon@google.com
Fixes: 43e027e41423 ("mm: memory: extend finish_fault() to support large folio")
Signed-off-by: Brian Geffon <bgeffon(a)google.com>
Suggested-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reported-by: Marek Maslanka <mmaslanka(a)google.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickens <hughd(a)google.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
--- a/mm/memory.c~mm-fix-finish_fault-handling-for-large-folios
+++ a/mm/memory.c
@@ -5185,7 +5185,11 @@ vm_fault_t finish_fault(struct vm_fault
bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
!(vma->vm_flags & VM_SHARED);
int type, nr_pages;
- unsigned long addr = vmf->address;
+ unsigned long addr;
+ bool needs_fallback = false;
+
+fallback:
+ addr = vmf->address;
/* Did we COW the page? */
if (is_cow)
@@ -5224,7 +5228,8 @@ vm_fault_t finish_fault(struct vm_fault
* approach also applies to non-anonymous-shmem faults to avoid
* inflating the RSS of the process.
*/
- if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
+ if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma)) ||
+ unlikely(needs_fallback)) {
nr_pages = 1;
} else if (nr_pages > 1) {
pgoff_t idx = folio_page_idx(folio, page);
@@ -5260,9 +5265,9 @@ vm_fault_t finish_fault(struct vm_fault
ret = VM_FAULT_NOPAGE;
goto unlock;
} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
- update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
- ret = VM_FAULT_NOPAGE;
- goto unlock;
+ needs_fallback = true;
+ pte_unmap_unlock(vmf->pte, vmf->ptl);
+ goto fallback;
}
folio_ref_add(folio, nr_pages - 1);
_
Patches currently in -mm which might be from bgeffon(a)google.com are
The quilt patch titled
Subject: mm: don't skip arch_sync_kernel_mappings() in error paths
has been removed from the -mm tree. Its filename was
mm-dont-skip-arch_sync_kernel_mappings-in-error-paths.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: don't skip arch_sync_kernel_mappings() in error paths
Date: Wed, 26 Feb 2025 12:16:09 +0000
Fix callers that previously skipped calling arch_sync_kernel_mappings() if
an error occurred during a pgtable update. The call is still required to
sync any pgtable updates that may have occurred prior to hitting the error
condition.
These are theoretical bugs discovered during code review.
Link: https://lkml.kernel.org/r/20250226121610.2401743-1-ryan.roberts@arm.com
Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
Fixes: 0c95cba49255 ("mm: apply_to_pte_range warn and fail if a large pte is encountered")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Christop Hellwig <hch(a)infradead.org>
Cc: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 6 ++++--
mm/vmalloc.c | 4 ++--
2 files changed, 6 insertions(+), 4 deletions(-)
--- a/mm/memory.c~mm-dont-skip-arch_sync_kernel_mappings-in-error-paths
+++ a/mm/memory.c
@@ -3051,8 +3051,10 @@ static int __apply_to_page_range(struct
next = pgd_addr_end(addr, end);
if (pgd_none(*pgd) && !create)
continue;
- if (WARN_ON_ONCE(pgd_leaf(*pgd)))
- return -EINVAL;
+ if (WARN_ON_ONCE(pgd_leaf(*pgd))) {
+ err = -EINVAL;
+ break;
+ }
if (!pgd_none(*pgd) && WARN_ON_ONCE(pgd_bad(*pgd))) {
if (!create)
continue;
--- a/mm/vmalloc.c~mm-dont-skip-arch_sync_kernel_mappings-in-error-paths
+++ a/mm/vmalloc.c
@@ -586,13 +586,13 @@ static int vmap_small_pages_range_noflus
mask |= PGTBL_PGD_MODIFIED;
err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
if (err)
- return err;
+ break;
} while (pgd++, addr = next, addr != end);
if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
arch_sync_kernel_mappings(start, end);
- return 0;
+ return err;
}
/*
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-ioremap-pass-pgprot_t-to-ioremap_prot-instead-of-unsigned-long.patch
mm-fix-lazy-mmu-docs-and-usage.patch
fs-proc-task_mmu-reduce-scope-of-lazy-mmu-region.patch
sparc-mm-disable-preemption-in-lazy-mmu-mode.patch
sparc-mm-avoid-calling-arch_enter-leave_lazy_mmu-in-set_ptes.patch
revert-x86-xen-allow-nesting-of-same-lazy-mode.patch
The quilt patch titled
Subject: userfaultfd: fix PTE unmapping stack-allocated PTE copies
has been removed from the -mm tree. Its filename was
userfaultfd-fix-pte-unmapping-stack-allocated-pte-copies.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: userfaultfd: fix PTE unmapping stack-allocated PTE copies
Date: Wed, 26 Feb 2025 10:55:09 -0800
Current implementation of move_pages_pte() copies source and destination
PTEs in order to detect concurrent changes to PTEs involved in the move.
However these copies are also used to unmap the PTEs, which will fail if
CONFIG_HIGHPTE is enabled because the copies are allocated on the stack.
Fix this by using the actual PTEs which were kmap()ed.
Link: https://lkml.kernel.org/r/20250226185510.2732648-3-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-fix-pte-unmapping-stack-allocated-pte-copies
+++ a/mm/userfaultfd.c
@@ -1290,8 +1290,8 @@ retry:
spin_unlock(src_ptl);
if (!locked) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
folio_lock(src_folio);
@@ -1307,8 +1307,8 @@ retry:
/* at this point we have src_folio locked */
if (folio_test_large(src_folio)) {
/* split_folio() can block */
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
err = split_folio(src_folio);
if (err)
@@ -1333,8 +1333,8 @@ retry:
goto out;
}
if (!anon_vma_trylock_write(src_anon_vma)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
anon_vma_lock_write(src_anon_vma);
@@ -1352,8 +1352,8 @@ retry:
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
if (is_migration_entry(entry)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
migration_entry_wait(mm, src_pmd, src_addr);
err = -EAGAIN;
@@ -1396,8 +1396,8 @@ retry:
src_folio = folio;
src_folio_pte = orig_src_pte;
if (!folio_trylock(src_folio)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
put_swap_device(si);
si = NULL;
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-avoid-extra-mem_alloc_profiling_enabled-checks.patch
alloc_tag-uninline-code-gated-by-mem_alloc_profiling_key-in-slab-allocator.patch
alloc_tag-uninline-code-gated-by-mem_alloc_profiling_key-in-page-allocator.patch
mm-introduce-vma_start_read_locked_nested-helpers.patch
mm-move-per-vma-lock-into-vm_area_struct.patch
mm-mark-vma-as-detached-until-its-added-into-vma-tree.patch
mm-introduce-vma_iter_store_attached-to-use-with-attached-vmas.patch
mm-mark-vmas-detached-upon-exit.patch
types-move-struct-rcuwait-into-typesh.patch
mm-allow-vma_start_read_locked-vma_start_read_locked_nested-to-fail.patch
mm-move-mmap_init_lock-out-of-the-header-file.patch
mm-uninline-the-main-body-of-vma_start_write.patch
refcount-provide-ops-for-cases-when-objects-memory-can-be-reused.patch
refcount-provide-ops-for-cases-when-objects-memory-can-be-reused-fix.patch
refcount-introduce-__refcount_addinc_not_zero_limited_acquire.patch
mm-replace-vm_lock-and-detached-flag-with-a-reference-count.patch
mm-replace-vm_lock-and-detached-flag-with-a-reference-count-fix.patch
mm-move-lesser-used-vma_area_struct-members-into-the-last-cacheline.patch
mm-debug-print-vm_refcnt-state-when-dumping-the-vma.patch
mm-remove-extra-vma_numab_state_init-call.patch
mm-prepare-lock_vma_under_rcu-for-vma-reuse-possibility.patch
mm-make-vma-cache-slab_typesafe_by_rcu.patch
mm-make-vma-cache-slab_typesafe_by_rcu-fix.patch
docs-mm-document-latest-changes-to-vm_lock.patch
The quilt patch titled
Subject: userfaultfd: do not block on locking a large folio with raised refcount
has been removed from the -mm tree. Its filename was
userfaultfd-do-not-block-on-locking-a-large-folio-with-raised-refcount.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: userfaultfd: do not block on locking a large folio with raised refcount
Date: Wed, 26 Feb 2025 10:55:08 -0800
Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock
state when it goes into split_folio() with raised folio refcount.
split_folio() expects the reference count to be exactly mapcount +
num_pages_in_folio + 1 (see can_split_folio()) and fails with EAGAIN
otherwise.
If multiple processes are trying to move the same large folio, they raise
the refcount (all tasks succeed in that) then one of them succeeds in
locking the folio, while others will block in folio_lock() while keeping
the refcount raised. The winner of this race will proceed with calling
split_folio() and will fail returning EAGAIN to the caller and unlocking
the folio. The next competing process will get the folio locked and will
go through the same flow. In the meantime the original winner will be
retried and will block in folio_lock(), getting into the queue of waiting
processes only to repeat the same path. All this results in a livelock.
An easy fix would be to avoid waiting for the folio lock while holding
folio refcount, similar to madvise_free_huge_pmd() where folio lock is
acquired before raising the folio refcount. Since we lock and take a
refcount of the folio while holding the PTE lock, changing the order of
these operations should not break anything.
Modify move_pages_pte() to try locking the folio first and if that fails
and the folio is large then return EAGAIN without touching the folio
refcount. If the folio is single-page then split_folio() is not called,
so we don't have this issue. Lokesh has a reproducer [1] and I verified
that this change fixes the issue.
[1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock
[akpm(a)linux-foundation.org: reflow comment to 80 cols, s/end/end up/]
Link: https://lkml.kernel.org/r/20250226185510.2732648-2-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Lokesh Gidra <lokeshgidra(a)google.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Acked-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
--- a/mm/userfaultfd.c~userfaultfd-do-not-block-on-locking-a-large-folio-with-raised-refcount
+++ a/mm/userfaultfd.c
@@ -1250,6 +1250,7 @@ retry:
*/
if (!src_folio) {
struct folio *folio;
+ bool locked;
/*
* Pin the page while holding the lock to be sure the
@@ -1269,12 +1270,26 @@ retry:
goto out;
}
+ locked = folio_trylock(folio);
+ /*
+ * We avoid waiting for folio lock with a raised
+ * refcount for large folios because extra refcounts
+ * will result in split_folio() failing later and
+ * retrying. If multiple tasks are trying to move a
+ * large folio we can end up livelocking.
+ */
+ if (!locked && folio_test_large(folio)) {
+ spin_unlock(src_ptl);
+ err = -EAGAIN;
+ goto out;
+ }
+
folio_get(folio);
src_folio = folio;
src_folio_pte = orig_src_pte;
spin_unlock(src_ptl);
- if (!folio_trylock(src_folio)) {
+ if (!locked) {
pte_unmap(&orig_src_pte);
pte_unmap(&orig_dst_pte);
src_pte = dst_pte = NULL;
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-avoid-extra-mem_alloc_profiling_enabled-checks.patch
alloc_tag-uninline-code-gated-by-mem_alloc_profiling_key-in-slab-allocator.patch
alloc_tag-uninline-code-gated-by-mem_alloc_profiling_key-in-page-allocator.patch
mm-introduce-vma_start_read_locked_nested-helpers.patch
mm-move-per-vma-lock-into-vm_area_struct.patch
mm-mark-vma-as-detached-until-its-added-into-vma-tree.patch
mm-introduce-vma_iter_store_attached-to-use-with-attached-vmas.patch
mm-mark-vmas-detached-upon-exit.patch
types-move-struct-rcuwait-into-typesh.patch
mm-allow-vma_start_read_locked-vma_start_read_locked_nested-to-fail.patch
mm-move-mmap_init_lock-out-of-the-header-file.patch
mm-uninline-the-main-body-of-vma_start_write.patch
refcount-provide-ops-for-cases-when-objects-memory-can-be-reused.patch
refcount-provide-ops-for-cases-when-objects-memory-can-be-reused-fix.patch
refcount-introduce-__refcount_addinc_not_zero_limited_acquire.patch
mm-replace-vm_lock-and-detached-flag-with-a-reference-count.patch
mm-replace-vm_lock-and-detached-flag-with-a-reference-count-fix.patch
mm-move-lesser-used-vma_area_struct-members-into-the-last-cacheline.patch
mm-debug-print-vm_refcnt-state-when-dumping-the-vma.patch
mm-remove-extra-vma_numab_state_init-call.patch
mm-prepare-lock_vma_under_rcu-for-vma-reuse-possibility.patch
mm-make-vma-cache-slab_typesafe_by_rcu.patch
mm-make-vma-cache-slab_typesafe_by_rcu-fix.patch
docs-mm-document-latest-changes-to-vm_lock.patch
The quilt patch titled
Subject: mm: shmem: fix potential data corruption during shmem swapin
has been removed from the -mm tree. Its filename was
mm-shmem-fix-potential-data-corruption-during-shmem-swapin.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Subject: mm: shmem: fix potential data corruption during shmem swapin
Date: Tue, 25 Feb 2025 17:52:55 +0800
Alex and Kairui reported some issues (system hang or data corruption) when
swapping out or swapping in large shmem folios. This is especially easy
to reproduce when the tmpfs is mount with the 'huge=within_size'
parameter. Thanks to Kairui's reproducer, the issue can be easily
replicated.
The root cause of the problem is that swap readahead may asynchronously
swap in order 0 folios into the swap cache, while the shmem mapping can
still store large swap entries. Then an order 0 folio is inserted into
the shmem mapping without splitting the large swap entry, which overwrites
the original large swap entry, leading to data corruption.
When getting a folio from the swap cache, we should split the large swap
entry stored in the shmem mapping if the orders do not match, to fix this
issue.
Link: https://lkml.kernel.org/r/2fe47c557e74e9df5fe2437ccdc6c9115fa1bf70.17404769…
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reported-by: Alex Xu (Hello71) <alex_y_xu(a)yahoo.ca>
Reported-by: Kairui Song <ryncsn(a)gmail.com>
Closes: https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
Tested-by: Kairui Song <kasong(a)tencent.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Lance Yang <ioworker0(a)gmail.com>
Cc: Matthew Wilcow <willy(a)infradead.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
--- a/mm/shmem.c~mm-shmem-fix-potential-data-corruption-during-shmem-swapin
+++ a/mm/shmem.c
@@ -2253,7 +2253,7 @@ static int shmem_swapin_folio(struct ino
struct folio *folio = NULL;
bool skip_swapcache = false;
swp_entry_t swap;
- int error, nr_pages;
+ int error, nr_pages, order, split_order;
VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
swap = radix_to_swp_entry(*foliop);
@@ -2272,10 +2272,9 @@ static int shmem_swapin_folio(struct ino
/* Look it up and read it in.. */
folio = swap_cache_get_folio(swap, NULL, 0);
+ order = xa_get_order(&mapping->i_pages, index);
if (!folio) {
- int order = xa_get_order(&mapping->i_pages, index);
bool fallback_order0 = false;
- int split_order;
/* Or update major stats only when swapin succeeds?? */
if (fault_type) {
@@ -2339,6 +2338,29 @@ static int shmem_swapin_folio(struct ino
error = -ENOMEM;
goto failed;
}
+ } else if (order != folio_order(folio)) {
+ /*
+ * Swap readahead may swap in order 0 folios into swapcache
+ * asynchronously, while the shmem mapping can still stores
+ * large swap entries. In such cases, we should split the
+ * large swap entry to prevent possible data corruption.
+ */
+ split_order = shmem_split_large_entry(inode, index, swap, gfp);
+ if (split_order < 0) {
+ error = split_order;
+ goto failed;
+ }
+
+ /*
+ * If the large swap entry has already been split, it is
+ * necessary to recalculate the new swap entry based on
+ * the old order alignment.
+ */
+ if (split_order > 0) {
+ pgoff_t offset = index - round_down(index, 1 << split_order);
+
+ swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
+ }
}
alloced:
@@ -2346,7 +2368,8 @@ alloced:
folio_lock(folio);
if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
folio->swap.val != swap.val ||
- !shmem_confirm_swap(mapping, index, swap)) {
+ !shmem_confirm_swap(mapping, index, swap) ||
+ xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
error = -EEXIST;
goto unlock;
}
_
Patches currently in -mm which might be from baolin.wang(a)linux.alibaba.com are
mm-shmem-drop-the-unused-macro.patch
mm-shmem-remove-fadvise-comments.patch
mm-shmem-remove-duplicate-error-validation.patch
mm-shmem-change-the-return-value-of-shmem_find_swap_entries.patch
mm-shmem-factor-out-the-within_size-logic-into-a-new-helper.patch
maintainers-add-myself-as-shmem-reviewer.patch
The quilt patch titled
Subject: mm: fix kernel BUG when userfaultfd_move encounters swapcache
has been removed from the -mm tree. Its filename was
mm-fix-kernel-bug-when-userfaultfd_move-encounters-swapcache.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Barry Song <v-songbaohua(a)oppo.com>
Subject: mm: fix kernel BUG when userfaultfd_move encounters swapcache
Date: Wed, 26 Feb 2025 13:14:00 +1300
userfaultfd_move() checks whether the PTE entry is present or a
swap entry.
- If the PTE entry is present, move_present_pte() handles folio
migration by setting:
src_folio->index = linear_page_index(dst_vma, dst_addr);
- If the PTE entry is a swap entry, move_swap_pte() simply copies
the PTE to the new dst_addr.
This approach is incorrect because, even if the PTE is a swap entry,
it can still reference a folio that remains in the swap cache.
This creates a race window between steps 2 and 4.
1. add_to_swap: The folio is added to the swapcache.
2. try_to_unmap: PTEs are converted to swap entries.
3. pageout: The folio is written back.
4. Swapcache is cleared.
If userfaultfd_move() occurs in the window between steps 2 and 4,
after the swap PTE has been moved to the destination, accessing the
destination triggers do_swap_page(), which may locate the folio in
the swapcache. However, since the folio's index has not been updated
to match the destination VMA, do_swap_page() will detect a mismatch.
This can result in two critical issues depending on the system
configuration.
If KSM is disabled, both small and large folios can trigger a BUG
during the add_rmap operation due to:
page_pgoff(folio, page) != linear_page_index(vma, address)
[ 13.336953] page: refcount:6 mapcount:1 mapping:00000000f43db19c index:0xffffaf150 pfn:0x4667c
[ 13.337520] head: order:2 mapcount:1 entire_mapcount:0 nr_pages_mapped:1 pincount:0
[ 13.337716] memcg:ffff00000405f000
[ 13.337849] anon flags: 0x3fffc0000020459(locked|uptodate|dirty|owner_priv_1|head|swapbacked|node=0|zone=0|lastcpupid=0xffff)
[ 13.338630] raw: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
[ 13.338831] raw: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
[ 13.339031] head: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
[ 13.339204] head: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
[ 13.339375] head: 03fffc0000000202 fffffdffc0199f01 ffffffff00000000 0000000000000001
[ 13.339546] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
[ 13.339736] page dumped because: VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address))
[ 13.340190] ------------[ cut here ]------------
[ 13.340316] kernel BUG at mm/rmap.c:1380!
[ 13.340683] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 13.340969] Modules linked in:
[ 13.341257] CPU: 1 UID: 0 PID: 107 Comm: a.out Not tainted 6.14.0-rc3-gcf42737e247a-dirty #299
[ 13.341470] Hardware name: linux,dummy-virt (DT)
[ 13.341671] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 13.341815] pc : __page_check_anon_rmap+0xa0/0xb0
[ 13.341920] lr : __page_check_anon_rmap+0xa0/0xb0
[ 13.342018] sp : ffff80008752bb20
[ 13.342093] x29: ffff80008752bb20 x28: fffffdffc0199f00 x27: 0000000000000001
[ 13.342404] x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
[ 13.342575] x23: 0000ffffaf0d0000 x22: 0000ffffaf0d0000 x21: fffffdffc0199f00
[ 13.342731] x20: fffffdffc0199f00 x19: ffff000006210700 x18: 00000000ffffffff
[ 13.342881] x17: 6c203d2120296567 x16: 6170202c6f696c6f x15: 662866666f67705f
[ 13.343033] x14: 6567617028454741 x13: 2929737365726464 x12: ffff800083728ab0
[ 13.343183] x11: ffff800082996bf8 x10: 0000000000000fd7 x9 : ffff80008011bc40
[ 13.343351] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000829eebf8
[ 13.343498] x5 : c0000000fffff000 x4 : 0000000000000000 x3 : 0000000000000000
[ 13.343645] x2 : 0000000000000000 x1 : ffff0000062db980 x0 : 000000000000005f
[ 13.343876] Call trace:
[ 13.344045] __page_check_anon_rmap+0xa0/0xb0 (P)
[ 13.344234] folio_add_anon_rmap_ptes+0x22c/0x320
[ 13.344333] do_swap_page+0x1060/0x1400
[ 13.344417] __handle_mm_fault+0x61c/0xbc8
[ 13.344504] handle_mm_fault+0xd8/0x2e8
[ 13.344586] do_page_fault+0x20c/0x770
[ 13.344673] do_translation_fault+0xb4/0xf0
[ 13.344759] do_mem_abort+0x48/0xa0
[ 13.344842] el0_da+0x58/0x130
[ 13.344914] el0t_64_sync_handler+0xc4/0x138
[ 13.345002] el0t_64_sync+0x1ac/0x1b0
[ 13.345208] Code: aa1503e0 f000f801 910f6021 97ff5779 (d4210000)
[ 13.345504] ---[ end trace 0000000000000000 ]---
[ 13.345715] note: a.out[107] exited with irqs disabled
[ 13.345954] note: a.out[107] exited with preempt_count 2
If KSM is enabled, Peter Xu also discovered that do_swap_page() may
trigger an unexpected CoW operation for small folios because
ksm_might_need_to_copy() allocates a new folio when the folio index
does not match linear_page_index(vma, addr).
This patch also checks the swapcache when handling swap entries. If a
match is found in the swapcache, it processes it similarly to a present
PTE.
However, there are some differences. For example, the folio is no longer
exclusive because folio_try_share_anon_rmap_pte() is performed during
unmapping.
Furthermore, in the case of swapcache, the folio has already been
unmapped, eliminating the risk of concurrent rmap walks and removing the
need to acquire src_folio's anon_vma or lock.
Note that for large folios, in the swapcache handling path, we directly
return -EBUSY since split_folio() will return -EBUSY regardless if
the folio is under writeback or unmapped. This is not an urgent issue,
so a follow-up patch may address it separately.
[v-songbaohua(a)oppo.com: minor cleanup according to Peter Xu]
Link: https://lkml.kernel.org/r/20250226024411.47092-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20250226001400.9129-1-21cnbao@gmail.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: Nicolas Geoffray <ngeoffray(a)google.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: ZhangPeng <zhangpeng362(a)huawei.com>
Cc: Tangquan Zheng <zhengtangquan(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 74 ++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 66 insertions(+), 8 deletions(-)
--- a/mm/userfaultfd.c~mm-fix-kernel-bug-when-userfaultfd_move-encounters-swapcache
+++ a/mm/userfaultfd.c
@@ -18,6 +18,7 @@
#include <asm/tlbflush.h>
#include <asm/tlb.h>
#include "internal.h"
+#include "swap.h"
static __always_inline
bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
@@ -1076,16 +1077,14 @@ out:
return err;
}
-static int move_swap_pte(struct mm_struct *mm,
+static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
unsigned long dst_addr, unsigned long src_addr,
pte_t *dst_pte, pte_t *src_pte,
pte_t orig_dst_pte, pte_t orig_src_pte,
pmd_t *dst_pmd, pmd_t dst_pmdval,
- spinlock_t *dst_ptl, spinlock_t *src_ptl)
+ spinlock_t *dst_ptl, spinlock_t *src_ptl,
+ struct folio *src_folio)
{
- if (!pte_swp_exclusive(orig_src_pte))
- return -EBUSY;
-
double_pt_lock(dst_ptl, src_ptl);
if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
@@ -1094,6 +1093,16 @@ static int move_swap_pte(struct mm_struc
return -EAGAIN;
}
+ /*
+ * The src_folio resides in the swapcache, requiring an update to its
+ * index and mapping to align with the dst_vma, where a swap-in may
+ * occur and hit the swapcache after moving the PTE.
+ */
+ if (src_folio) {
+ folio_move_anon_rmap(src_folio, dst_vma);
+ src_folio->index = linear_page_index(dst_vma, dst_addr);
+ }
+
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
double_pt_unlock(dst_ptl, src_ptl);
@@ -1141,6 +1150,7 @@ static int move_pages_pte(struct mm_stru
__u64 mode)
{
swp_entry_t entry;
+ struct swap_info_struct *si = NULL;
pte_t orig_src_pte, orig_dst_pte;
pte_t src_folio_pte;
spinlock_t *src_ptl, *dst_ptl;
@@ -1322,6 +1332,8 @@ retry:
orig_dst_pte, orig_src_pte, dst_pmd,
dst_pmdval, dst_ptl, src_ptl, src_folio);
} else {
+ struct folio *folio = NULL;
+
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
if (is_migration_entry(entry)) {
@@ -1335,9 +1347,53 @@ retry:
goto out;
}
- err = move_swap_pte(mm, dst_addr, src_addr, dst_pte, src_pte,
- orig_dst_pte, orig_src_pte, dst_pmd,
- dst_pmdval, dst_ptl, src_ptl);
+ if (!pte_swp_exclusive(orig_src_pte)) {
+ err = -EBUSY;
+ goto out;
+ }
+
+ si = get_swap_device(entry);
+ if (unlikely(!si)) {
+ err = -EAGAIN;
+ goto out;
+ }
+ /*
+ * Verify the existence of the swapcache. If present, the folio's
+ * index and mapping must be updated even when the PTE is a swap
+ * entry. The anon_vma lock is not taken during this process since
+ * the folio has already been unmapped, and the swap entry is
+ * exclusive, preventing rmap walks.
+ *
+ * For large folios, return -EBUSY immediately, as split_folio()
+ * also returns -EBUSY when attempting to split unmapped large
+ * folios in the swapcache. This issue needs to be resolved
+ * separately to allow proper handling.
+ */
+ if (!src_folio)
+ folio = filemap_get_folio(swap_address_space(entry),
+ swap_cache_index(entry));
+ if (!IS_ERR_OR_NULL(folio)) {
+ if (folio_test_large(folio)) {
+ err = -EBUSY;
+ folio_put(folio);
+ goto out;
+ }
+ src_folio = folio;
+ src_folio_pte = orig_src_pte;
+ if (!folio_trylock(src_folio)) {
+ pte_unmap(&orig_src_pte);
+ pte_unmap(&orig_dst_pte);
+ src_pte = dst_pte = NULL;
+ put_swap_device(si);
+ si = NULL;
+ /* now we can block and wait */
+ folio_lock(src_folio);
+ goto retry;
+ }
+ }
+ err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
+ orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval,
+ dst_ptl, src_ptl, src_folio);
}
out:
@@ -1354,6 +1410,8 @@ out:
if (src_pte)
pte_unmap(src_pte);
mmu_notifier_invalidate_range_end(&range);
+ if (si)
+ put_swap_device(si);
return err;
}
_
Patches currently in -mm which might be from v-songbaohua(a)oppo.com are
mm-set-folio-swapbacked-iff-folios-are-dirty-in-try_to_unmap_one.patch
mm-support-tlbbatch-flush-for-a-range-of-ptes.patch
mm-support-batched-unmap-for-lazyfree-large-folios-during-reclamation.patch
mm-avoid-splitting-pmd-for-lazyfree-pmd-mapped-thp-in-try_to_unmap.patch
The quilt patch titled
Subject: selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries
has been removed from the -mm tree. Its filename was
selftests-damon-damon_nr_regions-sort-collected-regiosn-before-checking-with-min-max-boundaries.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries
Date: Tue, 25 Feb 2025 14:23:33 -0800
damon_nr_regions.py starts DAMON, periodically collect number of regions
in snapshots, and see if it is in the requested range. The check code
assumes the numbers are sorted on the collection list, but there is no
such guarantee. Hence this can result in false positive test success.
Sort the list before doing the check.
Link: https://lkml.kernel.org/r/20250225222333.505646-4-sj@kernel.org
Fixes: 781497347d1b ("selftests/damon: implement test for min/max_nr_regions")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/damon/damon_nr_regions.py | 1 +
1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/damon/damon_nr_regions.py~selftests-damon-damon_nr_regions-sort-collected-regiosn-before-checking-with-min-max-boundaries
+++ a/tools/testing/selftests/damon/damon_nr_regions.py
@@ -65,6 +65,7 @@ def test_nr_regions(real_nr_regions, min
test_name = 'nr_regions test with %d/%d/%d real/min/max nr_regions' % (
real_nr_regions, min_nr_regions, max_nr_regions)
+ collected_nr_regions.sort()
if (collected_nr_regions[0] < min_nr_regions or
collected_nr_regions[-1] > max_nr_regions):
print('fail %s' % test_name)
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-respect-core-layer-filters-allowance-decision-on-ops-layer.patch
mm-damon-core-initialize-damos-walk_completed-in-damon_new_scheme.patch
mm-madvise-split-out-mmap-locking-operations-for-madvise.patch
mm-madvise-split-out-madvise-input-validity-check.patch
mm-madvise-split-out-madvise-behavior-execution.patch
mm-madvise-remove-redundant-mmap_lock-operations-from-process_madvise.patch
mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times.patch
mm-damon-core-unset-damos-walk_completed-after-confimed-set.patch
mm-damon-core-do-not-call-damos_walk_control-walk-if-walk-is-completed.patch
mm-damon-core-do-damos-walking-in-entire-regions-granularity.patch
mm-damon-introduce-damos-filter-type-hugepage_size-fix.patch
docs-mm-damon-design-fix-typo-on-damos-filters-usage-doc-link.patch
docs-mm-damon-design-document-hugepage_size-filter.patch
docs-damon-move-damos-filter-type-names-and-meaning-to-design-doc.patch
docs-mm-damon-design-clarify-handling-layer-based-filters-evaluation-sequence.patch
docs-mm-damon-design-categorize-damos-filter-types-based-on-handling-layer.patch
mm-damon-implement-a-new-damos-filter-type-for-unmapped-pages.patch
docs-mm-damon-design-document-unmapped-damos-filter-type.patch
mm-damon-add-data-structure-for-monitoring-intervals-auto-tuning.patch
mm-damon-core-implement-intervals-auto-tuning.patch
mm-damon-sysfs-implement-intervals-tuning-goal-directory.patch
mm-damon-sysfs-commit-intervals-tuning-goal.patch
mm-damon-sysfs-implement-a-command-to-update-auto-tuned-monitoring-intervals.patch
docs-mm-damon-design-document-for-intervals-auto-tuning.patch
docs-mm-damon-design-document-for-intervals-auto-tuning-fix.patch
docs-abi-damon-document-intervals-auto-tuning-abi.patch
docs-admin-guide-mm-damon-usage-add-intervals_goal-directory-on-the-hierarchy.patch
mm-damon-core-introduce-damos-ops_filters.patch
mm-damon-paddr-support-ops_filters.patch
mm-damon-core-support-committing-ops_filters.patch
mm-damon-core-put-ops-handled-filters-to-damos-ops_filters.patch
mm-damon-paddr-support-only-damos-ops_filters.patch
mm-damon-add-default-allow-reject-behavior-fields-to-struct-damos.patch
mm-damon-core-set-damos_filter-default-allowance-behavior-based-on-installed-filters.patch
mm-damon-paddr-respect-ops_filters_default_reject.patch
docs-mm-damon-design-update-for-changed-filter-default-behavior.patch
mm-damon-sysfs-schemes-let-damon_sysfs_scheme_set_filters-be-used-for-different-named-directories.patch
mm-damon-sysfs-schemes-implement-core_filters-and-ops_filters-directories.patch
mm-damon-sysfs-schemes-commit-filters-in-coreops_filters-directories.patch
mm-damon-core-expose-damos_filter_for_ops-to-damon-kernel-api-callers.patch
mm-damon-sysfs-schemes-record-filters-of-which-layer-should-be-added-to-the-given-filters-directory.patch
mm-damon-sysfs-schemes-return-error-when-for-attempts-to-install-filters-on-wrong-sysfs-directory.patch
docs-abi-damon-document-coreops_filters-directories.patch
docs-admin-guide-mm-damon-usage-update-for-coreops_filters-directories.patch
The quilt patch titled
Subject: selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms
has been removed from the -mm tree. Its filename was
selftests-damon-damon_nr_regions-set-ops-update-for-merge-results-check-to-100ms.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms
Date: Tue, 25 Feb 2025 14:23:32 -0800
damon_nr_regions.py updates max_nr_regions to a number smaller than
expected number of real regions and confirms DAMON respect the harsh
limit. To give time for DAMON to make changes for the regions, 3
aggregation intervals (300 milliseconds) are given.
The internal mechanism works with not only the max_nr_regions, but also
sz_limit, though. It avoids merging region if that casn make region of
size larger than sz_limit. In the test, sz_limit is set too small to
achive the new max_nr_regions, unless it is updated for the new
min_nr_regions. But the update is done only once per operations set
update interval, which is one second by default.
Hence, the test randomly incurs false positive failures. Fix it by
setting the ops interval same to aggregation interval, to make sure
sz_limit is updated by the time of the check.
Link: https://lkml.kernel.org/r/20250225222333.505646-3-sj@kernel.org
Fixes: 8bf890c81612 ("selftests/damon/damon_nr_regions: test online-tuned max_nr_regions")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/damon/damon_nr_regions.py | 1 +
1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/damon/damon_nr_regions.py~selftests-damon-damon_nr_regions-set-ops-update-for-merge-results-check-to-100ms
+++ a/tools/testing/selftests/damon/damon_nr_regions.py
@@ -109,6 +109,7 @@ def main():
attrs = kdamonds.kdamonds[0].contexts[0].monitoring_attrs
attrs.min_nr_regions = 3
attrs.max_nr_regions = 7
+ attrs.update_us = 100000
err = kdamonds.kdamonds[0].commit()
if err is not None:
proc.terminate()
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-respect-core-layer-filters-allowance-decision-on-ops-layer.patch
mm-damon-core-initialize-damos-walk_completed-in-damon_new_scheme.patch
mm-madvise-split-out-mmap-locking-operations-for-madvise.patch
mm-madvise-split-out-madvise-input-validity-check.patch
mm-madvise-split-out-madvise-behavior-execution.patch
mm-madvise-remove-redundant-mmap_lock-operations-from-process_madvise.patch
mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times.patch
mm-damon-core-unset-damos-walk_completed-after-confimed-set.patch
mm-damon-core-do-not-call-damos_walk_control-walk-if-walk-is-completed.patch
mm-damon-core-do-damos-walking-in-entire-regions-granularity.patch
mm-damon-introduce-damos-filter-type-hugepage_size-fix.patch
docs-mm-damon-design-fix-typo-on-damos-filters-usage-doc-link.patch
docs-mm-damon-design-document-hugepage_size-filter.patch
docs-damon-move-damos-filter-type-names-and-meaning-to-design-doc.patch
docs-mm-damon-design-clarify-handling-layer-based-filters-evaluation-sequence.patch
docs-mm-damon-design-categorize-damos-filter-types-based-on-handling-layer.patch
mm-damon-implement-a-new-damos-filter-type-for-unmapped-pages.patch
docs-mm-damon-design-document-unmapped-damos-filter-type.patch
mm-damon-add-data-structure-for-monitoring-intervals-auto-tuning.patch
mm-damon-core-implement-intervals-auto-tuning.patch
mm-damon-sysfs-implement-intervals-tuning-goal-directory.patch
mm-damon-sysfs-commit-intervals-tuning-goal.patch
mm-damon-sysfs-implement-a-command-to-update-auto-tuned-monitoring-intervals.patch
docs-mm-damon-design-document-for-intervals-auto-tuning.patch
docs-mm-damon-design-document-for-intervals-auto-tuning-fix.patch
docs-abi-damon-document-intervals-auto-tuning-abi.patch
docs-admin-guide-mm-damon-usage-add-intervals_goal-directory-on-the-hierarchy.patch
mm-damon-core-introduce-damos-ops_filters.patch
mm-damon-paddr-support-ops_filters.patch
mm-damon-core-support-committing-ops_filters.patch
mm-damon-core-put-ops-handled-filters-to-damos-ops_filters.patch
mm-damon-paddr-support-only-damos-ops_filters.patch
mm-damon-add-default-allow-reject-behavior-fields-to-struct-damos.patch
mm-damon-core-set-damos_filter-default-allowance-behavior-based-on-installed-filters.patch
mm-damon-paddr-respect-ops_filters_default_reject.patch
docs-mm-damon-design-update-for-changed-filter-default-behavior.patch
mm-damon-sysfs-schemes-let-damon_sysfs_scheme_set_filters-be-used-for-different-named-directories.patch
mm-damon-sysfs-schemes-implement-core_filters-and-ops_filters-directories.patch
mm-damon-sysfs-schemes-commit-filters-in-coreops_filters-directories.patch
mm-damon-core-expose-damos_filter_for_ops-to-damon-kernel-api-callers.patch
mm-damon-sysfs-schemes-record-filters-of-which-layer-should-be-added-to-the-given-filters-directory.patch
mm-damon-sysfs-schemes-return-error-when-for-attempts-to-install-filters-on-wrong-sysfs-directory.patch
docs-abi-damon-document-coreops_filters-directories.patch
docs-admin-guide-mm-damon-usage-update-for-coreops_filters-directories.patch
The quilt patch titled
Subject: selftests/damon/damos_quota: make real expectation of quota exceeds
has been removed from the -mm tree. Its filename was
selftests-damon-damos_quota-make-real-expectation-of-quota-exceeds.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: selftests/damon/damos_quota: make real expectation of quota exceeds
Date: Tue, 25 Feb 2025 14:23:31 -0800
Patch series "selftests/damon: three fixes for false results".
Fix three DAMON selftest bugs that cause two and one false positive
failures and successes.
This patch (of 3):
damos_quota.py assumes the quota will always exceeded. But whether quota
will be exceeded or not depend on the monitoring results. Actually the
monitored workload has chaning access pattern and hence sometimes the
quota may not really be exceeded. As a result, false positive test
failures happen. Expect how much time the quota will be exceeded by
checking the monitoring results, and use it instead of the naive
assumption.
Link: https://lkml.kernel.org/r/20250225222333.505646-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250225222333.505646-2-sj@kernel.org
Fixes: 51f58c9da14b ("selftests/damon: add a test for DAMOS quota")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/damon/damos_quota.py | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
--- a/tools/testing/selftests/damon/damos_quota.py~selftests-damon-damos_quota-make-real-expectation-of-quota-exceeds
+++ a/tools/testing/selftests/damon/damos_quota.py
@@ -51,16 +51,19 @@ def main():
nr_quota_exceeds = scheme.stats.qt_exceeds
wss_collected.sort()
+ nr_expected_quota_exceeds = 0
for wss in wss_collected:
if wss > sz_quota:
print('quota is not kept: %s > %s' % (wss, sz_quota))
print('collected samples are as below')
print('\n'.join(['%d' % wss for wss in wss_collected]))
exit(1)
+ if wss == sz_quota:
+ nr_expected_quota_exceeds += 1
- if nr_quota_exceeds < len(wss_collected):
- print('quota is not always exceeded: %d > %d' %
- (len(wss_collected), nr_quota_exceeds))
+ if nr_quota_exceeds < nr_expected_quota_exceeds:
+ print('quota is exceeded less than expected: %d < %d' %
+ (nr_quota_exceeds, nr_expected_quota_exceeds))
exit(1)
if __name__ == '__main__':
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-respect-core-layer-filters-allowance-decision-on-ops-layer.patch
mm-damon-core-initialize-damos-walk_completed-in-damon_new_scheme.patch
mm-madvise-split-out-mmap-locking-operations-for-madvise.patch
mm-madvise-split-out-madvise-input-validity-check.patch
mm-madvise-split-out-madvise-behavior-execution.patch
mm-madvise-remove-redundant-mmap_lock-operations-from-process_madvise.patch
mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times.patch
mm-damon-core-unset-damos-walk_completed-after-confimed-set.patch
mm-damon-core-do-not-call-damos_walk_control-walk-if-walk-is-completed.patch
mm-damon-core-do-damos-walking-in-entire-regions-granularity.patch
mm-damon-introduce-damos-filter-type-hugepage_size-fix.patch
docs-mm-damon-design-fix-typo-on-damos-filters-usage-doc-link.patch
docs-mm-damon-design-document-hugepage_size-filter.patch
docs-damon-move-damos-filter-type-names-and-meaning-to-design-doc.patch
docs-mm-damon-design-clarify-handling-layer-based-filters-evaluation-sequence.patch
docs-mm-damon-design-categorize-damos-filter-types-based-on-handling-layer.patch
mm-damon-implement-a-new-damos-filter-type-for-unmapped-pages.patch
docs-mm-damon-design-document-unmapped-damos-filter-type.patch
mm-damon-add-data-structure-for-monitoring-intervals-auto-tuning.patch
mm-damon-core-implement-intervals-auto-tuning.patch
mm-damon-sysfs-implement-intervals-tuning-goal-directory.patch
mm-damon-sysfs-commit-intervals-tuning-goal.patch
mm-damon-sysfs-implement-a-command-to-update-auto-tuned-monitoring-intervals.patch
docs-mm-damon-design-document-for-intervals-auto-tuning.patch
docs-mm-damon-design-document-for-intervals-auto-tuning-fix.patch
docs-abi-damon-document-intervals-auto-tuning-abi.patch
docs-admin-guide-mm-damon-usage-add-intervals_goal-directory-on-the-hierarchy.patch
mm-damon-core-introduce-damos-ops_filters.patch
mm-damon-paddr-support-ops_filters.patch
mm-damon-core-support-committing-ops_filters.patch
mm-damon-core-put-ops-handled-filters-to-damos-ops_filters.patch
mm-damon-paddr-support-only-damos-ops_filters.patch
mm-damon-add-default-allow-reject-behavior-fields-to-struct-damos.patch
mm-damon-core-set-damos_filter-default-allowance-behavior-based-on-installed-filters.patch
mm-damon-paddr-respect-ops_filters_default_reject.patch
docs-mm-damon-design-update-for-changed-filter-default-behavior.patch
mm-damon-sysfs-schemes-let-damon_sysfs_scheme_set_filters-be-used-for-different-named-directories.patch
mm-damon-sysfs-schemes-implement-core_filters-and-ops_filters-directories.patch
mm-damon-sysfs-schemes-commit-filters-in-coreops_filters-directories.patch
mm-damon-core-expose-damos_filter_for_ops-to-damon-kernel-api-callers.patch
mm-damon-sysfs-schemes-record-filters-of-which-layer-should-be-added-to-the-given-filters-directory.patch
mm-damon-sysfs-schemes-return-error-when-for-attempts-to-install-filters-on-wrong-sysfs-directory.patch
docs-abi-damon-document-coreops_filters-directories.patch
docs-admin-guide-mm-damon-usage-update-for-coreops_filters-directories.patch
The quilt patch titled
Subject: NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
has been removed from the -mm tree. Its filename was
nfs-fix-nfs_release_folio-to-not-deadlock-via-kcompactd-writeback.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mike Snitzer <snitzer(a)kernel.org>
Subject: NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
Date: Mon, 24 Feb 2025 21:20:02 -0500
Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for it so
nfs_release_folio() can skip calling nfs_wb_folio() from kcompactd.
Otherwise NFS can deadlock waiting for kcompactd enduced writeback which
recurses back to NFS (which triggers writeback to NFSD via NFS loopback
mount on the same host, NFSD blocks waiting for XFS's call to
__filemap_get_folio):
6070.550357] INFO: task kcompactd0:58 blocked for more than 4435 seconds.
{---
[58] "kcompactd0"
[<0>] folio_wait_bit+0xe8/0x200
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] nfs_wb_folio+0x80/0x1b0 [nfs]
[<0>] nfs_release_folio+0x68/0x130 [nfs]
[<0>] split_huge_page_to_list_to_order+0x362/0x840
[<0>] migrate_pages_batch+0x43d/0xb90
[<0>] migrate_pages_sync+0x9a/0x240
[<0>] migrate_pages+0x93c/0x9f0
[<0>] compact_zone+0x8e2/0x1030
[<0>] compact_node+0xdb/0x120
[<0>] kcompactd+0x121/0x2e0
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x40
[<0>] ret_from_fork_asm+0x1a/0x30
---}
[akpm(a)linux-foundation.org: fix build]
Link: https://lkml.kernel.org/r/20250225022002.26141-1-snitzer@kernel.org
Fixes: 96780ca55e3c ("NFS: fix up nfs_release_folio() to try to release the page")
Signed-off-by: Mike Snitzer <snitzer(a)kernel.org>
Cc: Anna Schumaker <anna.schumaker(a)oracle.com>
Cc: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nfs/file.c | 3 ++-
include/linux/compaction.h | 5 +++++
include/linux/sched.h | 2 +-
mm/compaction.c | 3 +++
4 files changed, 11 insertions(+), 2 deletions(-)
--- a/fs/nfs/file.c~nfs-fix-nfs_release_folio-to-not-deadlock-via-kcompactd-writeback
+++ a/fs/nfs/file.c
@@ -29,6 +29,7 @@
#include <linux/pagemap.h>
#include <linux/gfp.h>
#include <linux/swap.h>
+#include <linux/compaction.h>
#include <linux/uaccess.h>
#include <linux/filelock.h>
@@ -457,7 +458,7 @@ static bool nfs_release_folio(struct fol
/* If the private flag is set, then the folio is not freeable */
if (folio_test_private(folio)) {
if ((current_gfp_context(gfp) & GFP_KERNEL) != GFP_KERNEL ||
- current_is_kswapd())
+ current_is_kswapd() || current_is_kcompactd())
return false;
if (nfs_wb_folio(folio->mapping->host, folio) < 0)
return false;
--- a/include/linux/compaction.h~nfs-fix-nfs_release_folio-to-not-deadlock-via-kcompactd-writeback
+++ a/include/linux/compaction.h
@@ -80,6 +80,11 @@ static inline unsigned long compact_gap(
return 2UL << order;
}
+static inline int current_is_kcompactd(void)
+{
+ return current->flags & PF_KCOMPACTD;
+}
+
#ifdef CONFIG_COMPACTION
extern unsigned int extfrag_for_order(struct zone *zone, unsigned int order);
--- a/include/linux/sched.h~nfs-fix-nfs_release_folio-to-not-deadlock-via-kcompactd-writeback
+++ a/include/linux/sched.h
@@ -1701,7 +1701,7 @@ extern struct pid *cad_pid;
#define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */
#define PF_USER_WORKER 0x00004000 /* Kernel thread cloned from userspace thread */
#define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */
-#define PF__HOLE__00010000 0x00010000
+#define PF_KCOMPACTD 0x00010000 /* I am kcompactd */
#define PF_KSWAPD 0x00020000 /* I am kswapd */
#define PF_MEMALLOC_NOFS 0x00040000 /* All allocations inherit GFP_NOFS. See memalloc_nfs_save() */
#define PF_MEMALLOC_NOIO 0x00080000 /* All allocations inherit GFP_NOIO. See memalloc_noio_save() */
--- a/mm/compaction.c~nfs-fix-nfs_release_folio-to-not-deadlock-via-kcompactd-writeback
+++ a/mm/compaction.c
@@ -3181,6 +3181,7 @@ static int kcompactd(void *p)
long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
long timeout = default_timeout;
+ current->flags |= PF_KCOMPACTD;
set_freezable();
pgdat->kcompactd_max_order = 0;
@@ -3237,6 +3238,8 @@ static int kcompactd(void *p)
pgdat->proactive_compact_trigger = false;
}
+ current->flags &= ~PF_KCOMPACTD;
+
return 0;
}
_
Patches currently in -mm which might be from snitzer(a)kernel.org are
The quilt patch titled
Subject: mm: abort vma_modify() on merge out of memory failure
has been removed from the -mm tree. Its filename was
mm-abort-vma_modify-on-merge-out-of-memory-failure.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: mm: abort vma_modify() on merge out of memory failure
Date: Sat, 22 Feb 2025 16:19:52 +0000
The remainder of vma_modify() relies upon the vmg state remaining pristine
after a merge attempt.
Usually this is the case, however in the one edge case scenario of a merge
attempt failing not due to the specified range being unmergeable, but
rather due to an out of memory error arising when attempting to commit the
merge, this assumption becomes untrue.
This results in vmg->start, end being modified, and thus the proceeding
attempts to split the VMA will be done with invalid start/end values.
Thankfully, it is likely practically impossible for us to hit this in
reality, as it would require a maple tree node pre-allocation failure that
would likely never happen due to it being 'too small to fail', i.e. the
kernel would simply keep retrying reclaim until it succeeded.
However, this scenario remains theoretically possible, and what we are
doing here is wrong so we must correct it.
The safest option is, when this scenario occurs, to simply give up the
operation. If we cannot allocate memory to merge, then we cannot allocate
memory to split either (perhaps moreso!).
Any scenario where this would be happening would be under very extreme
(likely fatal) memory pressure, so it's best we give up early.
So there is no doubt it is appropriate to simply bail out in this
scenario.
However, in general we must if at all possible never assume VMG state is
stable after a merge attempt, since merge operations update VMG fields.
As a result, additionally also make this clear by storing start, end in
local variables.
The issue was reported originally by syzkaller, and by Brad Spengler (via
an off-list discussion), and in both instances it manifested as a
triggering of the assert:
VM_WARN_ON_VMG(start >= end, vmg);
In vma_merge_existing_range().
It seems at least one scenario in which this is occurring is one in which
the merge being attempted is due to an madvise() across multiple VMAs
which looks like this:
start end
|<------>|
|----------|------|
| vma | next |
|----------|------|
When madvise_walk_vmas() is invoked, we first find vma in the above
(determining prev to be equal to vma as we are offset into vma), and then
enter the loop.
We determine the end of vma that forms part of the range we are
madvise()'ing by setting 'tmp' to this value:
/* Here vma->vm_start <= start < (end|vma->vm_end) */
tmp = vma->vm_end;
We then invoke the madvise() operation via visit(), letting prev get
updated to point to vma as part of the operation:
/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
error = visit(vma, &prev, start, tmp, arg);
Where the visit() function pointer in this instance is
madvise_vma_behavior().
As observed in syzkaller reports, it is ultimately madvise_update_vma()
that is invoked, calling vma_modify_flags_name() and vma_modify() in turn.
Then, in vma_modify(), we attempt the merge:
merged = vma_merge_existing_range(vmg);
if (merged)
return merged;
We invoke this with vmg->start, end set to start, tmp as such:
start tmp
|<--->|
|----------|------|
| vma | next |
|----------|------|
We find ourselves in the merge right scenario, but the one in which we
cannot remove the middle (we are offset into vma).
Here we have a special case where vmg->start, end get set to perhaps
unintuitive values - we intended to shrink the middle VMA and expand the
next.
This means vmg->start, end are set to... vma->vm_start, start.
Now the commit_merge() fails, and vmg->start, end are left like this.
This means we return to the rest of vma_modify() with vmg->start, end
(here denoted as start', end') set as:
start' end'
|<-->|
|----------|------|
| vma | next |
|----------|------|
So we now erroneously try to split accordingly. This is where the
unfortunate stuff begins.
We start with:
/* Split any preceding portion of the VMA. */
if (vma->vm_start < vmg->start) {
...
}
This doesn't trigger as we are no longer offset into vma at the start.
But then we invoke:
/* Split any trailing portion of the VMA. */
if (vma->vm_end > vmg->end) {
...
}
Which does get invoked. This leaves us with:
start' end'
|<-->|
|----|-----|------|
| vma| new | next |
|----|-----|------|
We then return ultimately to madvise_walk_vmas(). Here 'new' is unknown,
and putting back the values known in this function we are faced with:
start tmp end
| | |
|----|-----|------|
| vma| new | next |
|----|-----|------|
prev
Then:
start = tmp;
So:
start end
| |
|----|-----|------|
| vma| new | next |
|----|-----|------|
prev
The following code does not cause anything to happen:
if (prev && start < prev->vm_end)
start = prev->vm_end;
if (start >= end)
break;
And then we invoke:
if (prev)
vma = find_vma(mm, prev->vm_end);
Which is where a problem occurs - we don't know about 'new' so we
essentially look for the vma after prev, which is new, whereas we actually
intended to discover next!
So we end up with:
start end
| |
|----|-----|------|
|prev| vma | next |
|----|-----|------|
And we have successfully bypassed all of the checks madvise_walk_vmas()
has to ensure early exit should we end up moving out of range.
We loop around, and hit:
/* Here vma->vm_start <= start < (end|vma->vm_end) */
tmp = vma->vm_end;
Oh dear. Now we have:
tmp
start end
| |
|----|-----|------|
|prev| vma | next |
|----|-----|------|
We then invoke:
/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
error = visit(vma, &prev, start, tmp, arg);
Where start == tmp. That is, a zero range. This is not good.
We invoke visit() which is madvise_vma_behavior() which does not check the
range (for good reason, it assumes all checks have been done before it was
called), which in turn finally calls madvise_update_vma().
The madvise_update_vma() function calls vma_modify_flags_name() in turn,
which ultimately invokes vma_modify() with... start == end.
vma_modify() calls vma_merge_existing_range() and finally we hit:
VM_WARN_ON_VMG(start >= end, vmg);
Which triggers, as start == end.
While it might be useful to add some CONFIG_DEBUG_VM asserts in these
instances to catch this kind of error, since we have just eliminated any
possibility of that happening, we will add such asserts separately as to
reduce churn and aid backporting.
Link: https://lkml.kernel.org/r/20250222161952.41957-1-lorenzo.stoakes@oracle.com
Fixes: 2f1c6611b0a8 ("mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Tested-by: Brad Spengler <brad.spengler(a)opensrcsec.com>
Reported-by: Brad Spengler <brad.spengler(a)opensrcsec.com>
Reported-by: syzbot+46423ed8fa1f1148c6e4(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/6774c98f.050a0220.25abdd.0991.GAE@google.c…
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vma.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
--- a/mm/vma.c~mm-abort-vma_modify-on-merge-out-of-memory-failure
+++ a/mm/vma.c
@@ -1509,24 +1509,28 @@ int do_vmi_munmap(struct vma_iterator *v
static struct vm_area_struct *vma_modify(struct vma_merge_struct *vmg)
{
struct vm_area_struct *vma = vmg->vma;
+ unsigned long start = vmg->start;
+ unsigned long end = vmg->end;
struct vm_area_struct *merged;
/* First, try to merge. */
merged = vma_merge_existing_range(vmg);
if (merged)
return merged;
+ if (vmg_nomem(vmg))
+ return ERR_PTR(-ENOMEM);
/* Split any preceding portion of the VMA. */
- if (vma->vm_start < vmg->start) {
- int err = split_vma(vmg->vmi, vma, vmg->start, 1);
+ if (vma->vm_start < start) {
+ int err = split_vma(vmg->vmi, vma, start, 1);
if (err)
return ERR_PTR(err);
}
/* Split any trailing portion of the VMA. */
- if (vma->vm_end > vmg->end) {
- int err = split_vma(vmg->vmi, vma, vmg->end, 0);
+ if (vma->vm_end > end) {
+ int err = split_vma(vmg->vmi, vma, end, 0);
if (err)
return ERR_PTR(err);
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
mm-simplify-vma-merge-structure-and-expand-comments.patch
mm-further-refactor-commit_merge.patch
mm-eliminate-adj_start-parameter-from-commit_merge.patch
mm-make-vmg-target-consistent-and-further-simplify-commit_merge.patch
mm-completely-abstract-unnecessary-adj_start-calculation.patch
mm-madvise-split-out-mmap-locking-operations-for-madvise-fix.patch
mm-use-read-write_once-for-vma-vm_flags-on-migrate-mprotect.patch
mm-refactor-rmap_walk_file-to-separate-out-traversal-logic.patch
mm-provide-mapping_wrprotect_range-function.patch
fb_defio-do-not-use-deprecated-page-mapping-index-fields.patch
fb_defio-do-not-use-deprecated-page-mapping-index-fields-fix.patch
mm-allow-guard-regions-in-file-backed-and-read-only-mappings.patch
selftests-mm-rename-guard-pages-to-guard-regions.patch
selftests-mm-rename-guard-pages-to-guard-regions-fix.patch
tools-selftests-expand-all-guard-region-tests-to-file-backed.patch
tools-selftests-add-file-shmem-backed-mapping-guard-region-tests.patch
fs-proc-task_mmu-add-guard-region-bit-to-pagemap.patch
tools-selftests-add-guard-region-test-for-proc-pid-pagemap.patch
tools-selftests-add-guard-region-test-for-proc-pid-pagemap-fix.patch
mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0.patch
mm-mremap-refactor-mremap-system-call-implementation.patch
mm-mremap-introduce-and-use-vma_remap_struct-threaded-state.patch
mm-mremap-initial-refactor-of-move_vma.patch
mm-mremap-complete-refactor-of-move_vma.patch
mm-mremap-refactor-move_page_tables-abstracting-state.patch
mm-mremap-thread-state-through-move-page-table-operation.patch
The quilt patch titled
Subject: mm/hugetlb: wait for hugetlb folios to be freed
has been removed from the -mm tree. Its filename was
mm-hugetlb-wait-for-hugetlb-folios-to-be-freed.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ge Yang <yangge1116(a)126.com>
Subject: mm/hugetlb: wait for hugetlb folios to be freed
Date: Wed, 19 Feb 2025 11:46:44 +0800
Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
of huge pages if in non-task context"), which supports deferring the
freeing of hugetlb pages, the allocation of contiguous memory through
cma_alloc() may fail probabilistically.
In the CMA allocation process, if it is found that the CMA area is
occupied by in-use hugetlb folios, these in-use hugetlb folios need to be
migrated to another location. When there are no available hugetlb folios
in the free hugetlb pool during the migration of in-use hugetlb folios,
new folios are allocated from the buddy system. A temporary state is set
on the newly allocated folio. Upon completion of the hugetlb folio
migration, the temporary state is transferred from the new folios to the
old folios. Normally, when the old folios with the temporary state are
freed, it is directly released back to the buddy system. However, due to
the deferred freeing of hugetlb pages, the PageBuddy() check fails,
ultimately leading to the failure of cma_alloc().
Here is a simplified call trace illustrating the process:
cma_alloc()
->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
->unmap_and_move_huge_page()
->folio_putback_hugetlb() // Free old folios
->test_pages_isolated()
->__test_page_isolated_in_pageblock()
->PageBuddy(page) // Check if the page is in buddy
To resolve this issue, we have implemented a function named
wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
folios are properly released back to the buddy system after their
migration is completed. By invoking wait_for_freed_hugetlb_folios()
before calling PageBuddy(), we ensure that PageBuddy() will succeed.
Link: https://lkml.kernel.org/r/1739936804-18199-1-git-send-email-yangge1116@126.…
Fixes: c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
Reviewed-by: Muchun Song <muchun.song(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 5 +++++
mm/hugetlb.c | 8 ++++++++
mm/page_isolation.c | 10 ++++++++++
3 files changed, 23 insertions(+)
--- a/include/linux/hugetlb.h~mm-hugetlb-wait-for-hugetlb-folios-to-be-freed
+++ a/include/linux/hugetlb.h
@@ -682,6 +682,7 @@ struct huge_bootmem_page {
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_freed_hugetlb_folios(void);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, bool cow_from_owner);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1066,6 +1067,10 @@ static inline int replace_free_hugepage_
return 0;
}
+static inline void wait_for_freed_hugetlb_folios(void)
+{
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
bool cow_from_owner)
--- a/mm/hugetlb.c~mm-hugetlb-wait-for-hugetlb-folios-to-be-freed
+++ a/mm/hugetlb.c
@@ -2943,6 +2943,14 @@ int replace_free_hugepage_folios(unsigne
return ret;
}
+void wait_for_freed_hugetlb_folios(void)
+{
+ if (llist_empty(&hpage_freelist))
+ return;
+
+ flush_work(&free_hpage_work);
+}
+
typedef enum {
/*
* For either 0/1: we checked the per-vma resv map, and one resv
--- a/mm/page_isolation.c~mm-hugetlb-wait-for-hugetlb-folios-to-be-freed
+++ a/mm/page_isolation.c
@@ -608,6 +608,16 @@ int test_pages_isolated(unsigned long st
int ret;
/*
+ * Due to the deferred freeing of hugetlb folios, the hugepage folios may
+ * not immediately release to the buddy system. This can cause PageBuddy()
+ * to fail in __test_page_isolated_in_pageblock(). To ensure that the
+ * hugetlb folios are properly released back to the buddy system, we
+ * invoke the wait_for_freed_hugetlb_folios() function to wait for the
+ * release to complete.
+ */
+ wait_for_freed_hugetlb_folios();
+
+ /*
* Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
* pages are not aligned to pageblock_nr_pages.
* Then we just check migratetype first.
_
Patches currently in -mm which might be from yangge1116(a)126.com are
The quilt patch titled
Subject: hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
has been removed from the -mm tree. Its filename was
hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ma Wupeng <mawupeng1(a)huawei.com>
Subject: hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
Date: Mon, 17 Feb 2025 09:43:29 +0800
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined) add page poison checks in do_migrate_range in order to make
offline hwpoisoned page possible by introducing isolate_lru_page and
try_to_unmap for hwpoisoned page. However folio lock must be held before
calling try_to_unmap. Add it to fix this problem.
Warning will be produced if folio is not locked during unmap:
------------[ cut here ]------------
kernel BUG at ./include/linux/swapops.h:400!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
Tainted: [W]=WARN
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0xb08/0xd3c
lr : try_to_unmap_one+0x3dc/0xd3c
Call trace:
try_to_unmap_one+0xb08/0xd3c (P)
try_to_unmap_one+0x3dc/0xd3c (L)
rmap_walk_anon+0xdc/0x1f8
rmap_walk+0x3c/0x58
try_to_unmap+0x88/0x90
unmap_poisoned_folio+0x30/0xa8
do_migrate_range+0x4a0/0x568
offline_pages+0x5a4/0x670
memory_block_action+0x17c/0x374
memory_subsys_offline+0x3c/0x78
device_offline+0xa4/0xd0
state_store+0x8c/0xf0
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x3a8/0x4bc
ksys_write+0x6c/0xf8
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0xc8/0xcc
el0t_64_sync+0x198/0x19c
Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20250217014329.3610326-4-mawupeng1@huawei.com
Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory_hotplug.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/memory_hotplug.c~hwpoison-memory_hotplug-lock-folio-before-unmap-hwpoisoned-folio
+++ a/mm/memory_hotplug.c
@@ -1832,8 +1832,11 @@ static void do_migrate_range(unsigned lo
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
folio_isolate_lru(folio);
- if (folio_mapped(folio))
+ if (folio_mapped(folio)) {
+ folio_lock(folio);
unmap_poisoned_folio(folio, pfn, false);
+ folio_unlock(folio);
+ }
goto put_folio;
}
_
Patches currently in -mm which might be from mawupeng1(a)huawei.com are
The quilt patch titled
Subject: mm: memory-hotplug: check folio ref count first in do_migrate_range
has been removed from the -mm tree. Its filename was
mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ma Wupeng <mawupeng1(a)huawei.com>
Subject: mm: memory-hotplug: check folio ref count first in do_migrate_range
Date: Mon, 17 Feb 2025 09:43:28 +0800
If a folio has an increased reference count, folio_try_get() will acquire
it, perform necessary operations, and then release it. In the case of a
poisoned folio without an elevated reference count (which is unlikely for
memory-failure), folio_try_get() will simply bypass it.
Therefore, relocate the folio_try_get() function, responsible for checking
and acquiring this reference count at first.
Link: https://lkml.kernel.org/r/20250217014329.3610326-3-mawupeng1@huawei.com
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory_hotplug.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
--- a/mm/memory_hotplug.c~mm-memory-hotplug-check-folio-ref-count-first-in-do_migrate_range
+++ a/mm/memory_hotplug.c
@@ -1822,12 +1822,12 @@ static void do_migrate_range(unsigned lo
if (folio_test_large(folio))
pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
- /*
- * HWPoison pages have elevated reference counts so the migration would
- * fail on them. It also doesn't make any sense to migrate them in the
- * first place. Still try to unmap such a page in case it is still mapped
- * (keep the unmap as the catch all safety net).
- */
+ if (!folio_try_get(folio))
+ continue;
+
+ if (unlikely(page_folio(page) != folio))
+ goto put_folio;
+
if (folio_test_hwpoison(folio) ||
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
@@ -1835,14 +1835,8 @@ static void do_migrate_range(unsigned lo
if (folio_mapped(folio))
unmap_poisoned_folio(folio, pfn, false);
- continue;
- }
-
- if (!folio_try_get(folio))
- continue;
-
- if (unlikely(page_folio(page) != folio))
goto put_folio;
+ }
if (!isolate_folio_to_list(folio, &source)) {
if (__ratelimit(&migrate_rs)) {
_
Patches currently in -mm which might be from mawupeng1(a)huawei.com are
The quilt patch titled
Subject: mm: memory-failure: update ttu flag inside unmap_poisoned_folio
has been removed from the -mm tree. Its filename was
mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ma Wupeng <mawupeng1(a)huawei.com>
Subject: mm: memory-failure: update ttu flag inside unmap_poisoned_folio
Date: Mon, 17 Feb 2025 09:43:27 +0800
Patch series "mm: memory_failure: unmap poisoned folio during migrate
properly", v3.
Fix two bugs during folio migration if the folio is poisoned.
This patch (of 3):
Commit 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to
TTU_HWPOISON") introduce TTU_HWPOISON to replace TTU_IGNORE_HWPOISON in
order to stop send SIGBUS signal when accessing an error page after a
memory error on a clean folio. However during page migration, anon folio
must be set with TTU_HWPOISON during unmap_*(). For pagecache we need
some policy just like the one in hwpoison_user_mappings to set this flag.
So move this policy from hwpoison_user_mappings to unmap_poisoned_folio to
handle this warning properly.
Warning will be produced during unamp poison folio with the following log:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 365 at mm/rmap.c:1847 try_to_unmap_one+0x8fc/0xd3c
Modules linked in:
CPU: 1 UID: 0 PID: 365 Comm: bash Tainted: G W 6.13.0-rc1-00018-gacdb4bbda7ab #42
Tainted: [W]=WARN
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0x8fc/0xd3c
lr : try_to_unmap_one+0x3dc/0xd3c
Call trace:
try_to_unmap_one+0x8fc/0xd3c (P)
try_to_unmap_one+0x3dc/0xd3c (L)
rmap_walk_anon+0xdc/0x1f8
rmap_walk+0x3c/0x58
try_to_unmap+0x88/0x90
unmap_poisoned_folio+0x30/0xa8
do_migrate_range+0x4a0/0x568
offline_pages+0x5a4/0x670
memory_block_action+0x17c/0x374
memory_subsys_offline+0x3c/0x78
device_offline+0xa4/0xd0
state_store+0x8c/0xf0
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x3a8/0x4bc
ksys_write+0x6c/0xf8
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0xc8/0xcc
el0t_64_sync+0x198/0x19c
---[ end trace 0000000000000000 ]---
[mawupeng1(a)huawei.com: unmap_poisoned_folio(): remove shadowed local `mapping', per Miaohe]
Link: https://lkml.kernel.org/r/20250219060653.3849083-1-mawupeng1@huawei.com
Link: https://lkml.kernel.org/r/20250217014329.3610326-1-mawupeng1@huawei.com
Link: https://lkml.kernel.org/r/20250217014329.3610326-2-mawupeng1@huawei.com
Fixes: 6da6b1d4a7df ("mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON")
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Ma Wupeng <mawupeng1(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/internal.h | 5 ++-
mm/memory-failure.c | 63 ++++++++++++++++++++----------------------
mm/memory_hotplug.c | 3 +-
3 files changed, 36 insertions(+), 35 deletions(-)
--- a/mm/internal.h~mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio
+++ a/mm/internal.h
@@ -1115,7 +1115,7 @@ static inline int find_next_best_node(in
* mm/memory-failure.c
*/
#ifdef CONFIG_MEMORY_FAILURE
-void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu);
+int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill);
void shake_folio(struct folio *folio);
extern int hwpoison_filter(struct page *p);
@@ -1138,8 +1138,9 @@ unsigned long page_mapped_in_vma(const s
struct vm_area_struct *vma);
#else
-static inline void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
+static inline int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
{
+ return -EBUSY;
}
#endif
--- a/mm/memory-failure.c~mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio
+++ a/mm/memory-failure.c
@@ -1556,11 +1556,35 @@ static int get_hwpoison_page(struct page
return ret;
}
-void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
+int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
{
- if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
- struct address_space *mapping;
+ enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
+ struct address_space *mapping;
+
+ if (folio_test_swapcache(folio)) {
+ pr_err("%#lx: keeping poisoned page in swap cache\n", pfn);
+ ttu &= ~TTU_HWPOISON;
+ }
+
+ /*
+ * Propagate the dirty bit from PTEs to struct page first, because we
+ * need this to decide if we should kill or just drop the page.
+ * XXX: the dirty test could be racy: set_page_dirty() may not always
+ * be called inside page lock (it's recommended but not enforced).
+ */
+ mapping = folio_mapping(folio);
+ if (!must_kill && !folio_test_dirty(folio) && mapping &&
+ mapping_can_writeback(mapping)) {
+ if (folio_mkclean(folio)) {
+ folio_set_dirty(folio);
+ } else {
+ ttu &= ~TTU_HWPOISON;
+ pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
+ pfn);
+ }
+ }
+ if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
/*
* For hugetlb folios in shared mappings, try_to_unmap
* could potentially call huge_pmd_unshare. Because of
@@ -1572,7 +1596,7 @@ void unmap_poisoned_folio(struct folio *
if (!mapping) {
pr_info("%#lx: could not lock mapping for mapped hugetlb folio\n",
folio_pfn(folio));
- return;
+ return -EBUSY;
}
try_to_unmap(folio, ttu|TTU_RMAP_LOCKED);
@@ -1580,6 +1604,8 @@ void unmap_poisoned_folio(struct folio *
} else {
try_to_unmap(folio, ttu);
}
+
+ return folio_mapped(folio) ? -EBUSY : 0;
}
/*
@@ -1589,8 +1615,6 @@ void unmap_poisoned_folio(struct folio *
static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
unsigned long pfn, int flags)
{
- enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
- struct address_space *mapping;
LIST_HEAD(tokill);
bool unmap_success;
int forcekill;
@@ -1613,29 +1637,6 @@ static bool hwpoison_user_mappings(struc
if (!folio_mapped(folio))
return true;
- if (folio_test_swapcache(folio)) {
- pr_err("%#lx: keeping poisoned page in swap cache\n", pfn);
- ttu &= ~TTU_HWPOISON;
- }
-
- /*
- * Propagate the dirty bit from PTEs to struct page first, because we
- * need this to decide if we should kill or just drop the page.
- * XXX: the dirty test could be racy: set_page_dirty() may not always
- * be called inside page lock (it's recommended but not enforced).
- */
- mapping = folio_mapping(folio);
- if (!(flags & MF_MUST_KILL) && !folio_test_dirty(folio) && mapping &&
- mapping_can_writeback(mapping)) {
- if (folio_mkclean(folio)) {
- folio_set_dirty(folio);
- } else {
- ttu &= ~TTU_HWPOISON;
- pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
- pfn);
- }
- }
-
/*
* First collect all the processes that have the page
* mapped in dirty form. This has to be done before try_to_unmap,
@@ -1643,9 +1644,7 @@ static bool hwpoison_user_mappings(struc
*/
collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
- unmap_poisoned_folio(folio, ttu);
-
- unmap_success = !folio_mapped(folio);
+ unmap_success = !unmap_poisoned_folio(folio, pfn, flags & MF_MUST_KILL);
if (!unmap_success)
pr_err("%#lx: failed to unmap page (folio mapcount=%d)\n",
pfn, folio_mapcount(folio));
--- a/mm/memory_hotplug.c~mm-memory-failure-update-ttu-flag-inside-unmap_poisoned_folio
+++ a/mm/memory_hotplug.c
@@ -1833,7 +1833,8 @@ static void do_migrate_range(unsigned lo
if (WARN_ON(folio_test_lru(folio)))
folio_isolate_lru(folio);
if (folio_mapped(folio))
- unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
+ unmap_poisoned_folio(folio, pfn, false);
+
continue;
}
_
Patches currently in -mm which might be from mawupeng1(a)huawei.com are
The quilt patch titled
Subject: arm: pgtable: fix NULL pointer dereference issue
has been removed from the -mm tree. Its filename was
arm-pgtable-fix-null-pointer-dereference-issue.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Qi Zheng <zhengqi.arch(a)bytedance.com>
Subject: arm: pgtable: fix NULL pointer dereference issue
Date: Mon, 17 Feb 2025 10:49:24 +0800
When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
parameter is NULL, which will cause a NULL pointer dereference issue in
adjust_pte():
Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
Hardware name: Atmel AT91SAM9
PC is at update_mmu_cache_range+0x1e0/0x278
LR is at pte_offset_map_rw_nolock+0x18/0x2c
Call trace:
update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
remove_migration_pte from rmap_walk_file+0xcc/0x130
rmap_walk_file from remove_migration_ptes+0x90/0xa4
remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
migrate_pages_batch from migrate_pages+0x188/0x488
migrate_pages from compact_zone+0x56c/0x954
compact_zone from compact_node+0x90/0xf0
compact_node from kcompactd+0x1d4/0x204
kcompactd from kthread+0x120/0x12c
kthread from ret_from_fork+0x14/0x38
Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
To fix it, do not rely on whether 'ptl' is equal to decide whether to hold
the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS is
enabled. In addition, if two vmas map to the same PTE page, there is no
need to hold the pte lock again, otherwise a deadlock will occur. Just
add the need_lock parameter to let adjust_pte() know this information.
Link: https://lkml.kernel.org/r/20250217024924.57996-1-zhengqi.arch@bytedance.com
Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Reported-by: Ezra Buehler <ezra.buehler(a)husqvarnagroup.com>
Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0c…
Acked-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Ezra Buehler <ezra.buehler(a)husqvarnagroup.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Russel King <linux(a)armlinux.org.uk>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/arm/mm/fault-armv.c | 37 +++++++++++++++++++++++++------------
1 file changed, 25 insertions(+), 12 deletions(-)
--- a/arch/arm/mm/fault-armv.c~arm-pgtable-fix-null-pointer-dereference-issue
+++ a/arch/arm/mm/fault-armv.c
@@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_
}
static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
- unsigned long pfn, struct vm_fault *vmf)
+ unsigned long pfn, bool need_lock)
{
spinlock_t *ptl;
pgd_t *pgd;
@@ -99,12 +99,11 @@ again:
if (!pte)
return 0;
- /*
- * If we are using split PTE locks, then we need to take the page
- * lock here. Otherwise we are using shared mm->page_table_lock
- * which is already locked, thus cannot take it.
- */
- if (ptl != vmf->ptl) {
+ if (need_lock) {
+ /*
+ * Use nested version here to indicate that we are already
+ * holding one similar spinlock.
+ */
spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
pte_unmap_unlock(pte, ptl);
@@ -114,7 +113,7 @@ again:
ret = do_adjust_pte(vma, address, pfn, pte);
- if (ptl != vmf->ptl)
+ if (need_lock)
spin_unlock(ptl);
pte_unmap(pte);
@@ -123,9 +122,10 @@ again:
static void
make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep, unsigned long pfn,
- struct vm_fault *vmf)
+ unsigned long addr, pte_t *ptep, unsigned long pfn)
{
+ const unsigned long pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
+ const unsigned long pmd_end_addr = pmd_start_addr + PMD_SIZE;
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *mpnt;
unsigned long offset;
@@ -142,6 +142,14 @@ make_coherent(struct address_space *mapp
flush_dcache_mmap_lock(mapping);
vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
/*
+ * If we are using split PTE locks, then we need to take the pte
+ * lock. Otherwise we are using shared mm->page_table_lock which
+ * is already locked, thus cannot take it.
+ */
+ bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
+ unsigned long mpnt_addr;
+
+ /*
* If this VMA is not in our MM, we can ignore it.
* Note that we intentionally mask out the VMA
* that we are fixing up.
@@ -151,7 +159,12 @@ make_coherent(struct address_space *mapp
if (!(mpnt->vm_flags & VM_MAYSHARE))
continue;
offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
- aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
+ mpnt_addr = mpnt->vm_start + offset;
+
+ /* Avoid deadlocks by not grabbing the same PTE lock again. */
+ if (mpnt_addr >= pmd_start_addr && mpnt_addr < pmd_end_addr)
+ need_lock = false;
+ aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
}
flush_dcache_mmap_unlock(mapping);
if (aliases)
@@ -194,7 +207,7 @@ void update_mmu_cache_range(struct vm_fa
__flush_dcache_folio(mapping, folio);
if (mapping) {
if (cache_is_vivt())
- make_coherent(mapping, vma, addr, ptep, pfn, vmf);
+ make_coherent(mapping, vma, addr, ptep, pfn);
else if (vma->vm_flags & VM_EXEC)
__flush_icache_all();
}
_
Patches currently in -mm which might be from zhengqi.arch(a)bytedance.com are
mm-pgtable-make-generic-tlb_remove_table-use-struct-ptdesc.patch
mm-pgtable-change-pt-parameter-of-tlb_remove_ptdesc-to-struct-ptdesc.patch
mm-pgtable-convert-some-architectures-to-use-tlb_remove_ptdesc.patch
mm-pgtable-convert-some-architectures-to-use-tlb_remove_ptdesc-v2.patch
riscv-pgtable-unconditionally-use-tlb_remove_ptdesc.patch
x86-pgtable-convert-to-use-tlb_remove_ptdesc.patch
mm-pgtable-remove-tlb_remove_page_ptdesc.patch
The quilt patch titled
Subject: m68k: sun3: add check for __pgd_alloc()
has been removed from the -mm tree. Its filename was
m68k-sun3-add-check-for-__pgd_alloc.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Haoxiang Li <haoxiang_li2024(a)163.com>
Subject: m68k: sun3: add check for __pgd_alloc()
Date: Tue, 18 Feb 2025 00:00:17 +0800
Add check for the return value of __pgd_alloc() in pgd_alloc() to prevent
null pointer dereference.
Link: https://lkml.kernel.org/r/20250217160017.2375536-1-haoxiang_li2024@163.com
Fixes: a9b3c355c2e6 ("asm-generic: pgalloc: provide generic __pgd_{alloc,free}")
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
Reviewed-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Sam Creasey <sammy(a)sammy.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/m68k/include/asm/sun3_pgalloc.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/m68k/include/asm/sun3_pgalloc.h~m68k-sun3-add-check-for-__pgd_alloc
+++ a/arch/m68k/include/asm/sun3_pgalloc.h
@@ -44,8 +44,10 @@ static inline pgd_t * pgd_alloc(struct m
pgd_t *new_pgd;
new_pgd = __pgd_alloc(mm, 0);
- memcpy(new_pgd, swapper_pg_dir, PAGE_SIZE);
- memset(new_pgd, 0, (PAGE_OFFSET >> PGDIR_SHIFT));
+ if (likely(new_pgd != NULL)) {
+ memcpy(new_pgd, swapper_pg_dir, PAGE_SIZE);
+ memset(new_pgd, 0, (PAGE_OFFSET >> PGDIR_SHIFT));
+ }
return new_pgd;
}
_
Patches currently in -mm which might be from haoxiang_li2024(a)163.com are
The quilt patch titled
Subject: selftests/damon/damos_quota_goal: handle minimum quota that cannot be further reduced
has been removed from the -mm tree. Its filename was
selftests-damon-damos_quota_goal-handle-minimum-quota-that-cannot-be-further-reduced.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: selftests/damon/damos_quota_goal: handle minimum quota that cannot be further reduced
Date: Mon, 17 Feb 2025 10:23:04 -0800
damos_quota_goal.py selftest see if DAMOS quota goals tuning feature
increases or reduces the effective size quota for given score as expected.
The tuning feature sets the minimum quota size as one byte, so if the
effective size quota is already one, we cannot expect it further be
reduced. However the test is not aware of the edge case, and fails since
it shown no expected change of the effective quota. Handle the case by
updating the failure logic for no change to see if it was the case, and
simply skips to next test input.
Link: https://lkml.kernel.org/r/20250217182304.45215-1-sj@kernel.org
Fixes: f1c07c0a1662 ("selftests/damon: add a test for DAMOS quota goal")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202502171423.b28a918d-lkp@intel.com
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [6.10.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/damon/damos_quota_goal.py | 3 +++
1 file changed, 3 insertions(+)
--- a/tools/testing/selftests/damon/damos_quota_goal.py~selftests-damon-damos_quota_goal-handle-minimum-quota-that-cannot-be-further-reduced
+++ a/tools/testing/selftests/damon/damos_quota_goal.py
@@ -63,6 +63,9 @@ def main():
if last_effective_bytes != 0 else -1.0))
if last_effective_bytes == goal.effective_bytes:
+ # effective quota was already minimum that cannot be more reduced
+ if expect_increase is False and last_effective_bytes == 1:
+ continue
print('efective bytes not changed: %d' % goal.effective_bytes)
exit(1)
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-respect-core-layer-filters-allowance-decision-on-ops-layer.patch
mm-damon-core-initialize-damos-walk_completed-in-damon_new_scheme.patch
mm-madvise-split-out-mmap-locking-operations-for-madvise.patch
mm-madvise-split-out-madvise-input-validity-check.patch
mm-madvise-split-out-madvise-behavior-execution.patch
mm-madvise-remove-redundant-mmap_lock-operations-from-process_madvise.patch
mm-damon-avoid-applying-damos-action-to-same-entity-multiple-times.patch
mm-damon-core-unset-damos-walk_completed-after-confimed-set.patch
mm-damon-core-do-not-call-damos_walk_control-walk-if-walk-is-completed.patch
mm-damon-core-do-damos-walking-in-entire-regions-granularity.patch
mm-damon-introduce-damos-filter-type-hugepage_size-fix.patch
docs-mm-damon-design-fix-typo-on-damos-filters-usage-doc-link.patch
docs-mm-damon-design-document-hugepage_size-filter.patch
docs-damon-move-damos-filter-type-names-and-meaning-to-design-doc.patch
docs-mm-damon-design-clarify-handling-layer-based-filters-evaluation-sequence.patch
docs-mm-damon-design-categorize-damos-filter-types-based-on-handling-layer.patch
mm-damon-implement-a-new-damos-filter-type-for-unmapped-pages.patch
docs-mm-damon-design-document-unmapped-damos-filter-type.patch
mm-damon-add-data-structure-for-monitoring-intervals-auto-tuning.patch
mm-damon-core-implement-intervals-auto-tuning.patch
mm-damon-sysfs-implement-intervals-tuning-goal-directory.patch
mm-damon-sysfs-commit-intervals-tuning-goal.patch
mm-damon-sysfs-implement-a-command-to-update-auto-tuned-monitoring-intervals.patch
docs-mm-damon-design-document-for-intervals-auto-tuning.patch
docs-mm-damon-design-document-for-intervals-auto-tuning-fix.patch
docs-abi-damon-document-intervals-auto-tuning-abi.patch
docs-admin-guide-mm-damon-usage-add-intervals_goal-directory-on-the-hierarchy.patch
mm-damon-core-introduce-damos-ops_filters.patch
mm-damon-paddr-support-ops_filters.patch
mm-damon-core-support-committing-ops_filters.patch
mm-damon-core-put-ops-handled-filters-to-damos-ops_filters.patch
mm-damon-paddr-support-only-damos-ops_filters.patch
mm-damon-add-default-allow-reject-behavior-fields-to-struct-damos.patch
mm-damon-core-set-damos_filter-default-allowance-behavior-based-on-installed-filters.patch
mm-damon-paddr-respect-ops_filters_default_reject.patch
docs-mm-damon-design-update-for-changed-filter-default-behavior.patch
mm-damon-sysfs-schemes-let-damon_sysfs_scheme_set_filters-be-used-for-different-named-directories.patch
mm-damon-sysfs-schemes-implement-core_filters-and-ops_filters-directories.patch
mm-damon-sysfs-schemes-commit-filters-in-coreops_filters-directories.patch
mm-damon-core-expose-damos_filter_for_ops-to-damon-kernel-api-callers.patch
mm-damon-sysfs-schemes-record-filters-of-which-layer-should-be-added-to-the-given-filters-directory.patch
mm-damon-sysfs-schemes-return-error-when-for-attempts-to-install-filters-on-wrong-sysfs-directory.patch
docs-abi-damon-document-coreops_filters-directories.patch
docs-admin-guide-mm-damon-usage-update-for-coreops_filters-directories.patch
Currently on stable trees we have support for netmem/devmem RX but not
TX. It is not safe to forward/redirect an RX unreadable netmem packet
into the device's TX path, as the device may call dma-mapping APIs on
dma addrs that should not be passed to it.
Fix this by preventing the xmit of unreadable skbs.
Tested by configuring tc redirect:
sudo tc qdisc add dev eth1 ingress
sudo tc filter add dev eth1 ingress protocol ip prio 1 flower ip_proto \
tcp src_ip 192.168.1.12 action mirred egress redirect dev eth1
Before, I see unreadable skbs in the driver's TX path passed to dma
mapping APIs.
After, I don't see unreadable skbs in the driver's TX path passed to dma
mapping APIs.
Fixes: 65249feb6b3d ("net: add support for skbs with unreadable frags")
Suggested-by: Jakub Kicinski <kuba(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
---
net/core/dev.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 30da277c5a6f..63b31afacf84 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3914,6 +3914,9 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
skb = validate_xmit_xfrm(skb, features, again);
+ if (!skb_frags_readable(skb))
+ goto out_kfree_skb;
+
return skb;
out_kfree_skb:
base-commit: 3c6a041b317a9bb0c707343c0b99d2a29d523390
--
2.48.1.711.g2feabab25a-goog
According to [1], `NonNull<T>` and `#[repr(transparent)]` wrapper types
such as our custom `KBox<T>` have the null pointer optimization only if
`T: Sized`. Thus remove the `Zeroable` implementation for the unsized
case.
Link: https://doc.rust-lang.org/stable/std/option/index.html#representation [1]
Cc: stable(a)vger.kernel.org # v6.12+ (a custom patch will be needed for 6.6.y)
Fixes: 38cde0bd7b67 ("rust: init: add `Zeroable` trait and `init::zeroed` function")
Signed-off-by: Benno Lossin <benno.lossin(a)proton.me>
---
rust/kernel/init.rs | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/rust/kernel/init.rs b/rust/kernel/init.rs
index 7fd1ea8265a5..8bbd5e3398fc 100644
--- a/rust/kernel/init.rs
+++ b/rust/kernel/init.rs
@@ -1418,17 +1418,14 @@ macro_rules! impl_zeroable {
// SAFETY: `T: Zeroable` and `UnsafeCell` is `repr(transparent)`.
{<T: ?Sized + Zeroable>} UnsafeCell<T>,
- // SAFETY: All zeros is equivalent to `None` (option layout optimization guarantee).
+ // SAFETY: All zeros is equivalent to `None` (option layout optimization guarantee:
+ // https://doc.rust-lang.org/stable/std/option/index.html#representation).
Option<NonZeroU8>, Option<NonZeroU16>, Option<NonZeroU32>, Option<NonZeroU64>,
Option<NonZeroU128>, Option<NonZeroUsize>,
Option<NonZeroI8>, Option<NonZeroI16>, Option<NonZeroI32>, Option<NonZeroI64>,
Option<NonZeroI128>, Option<NonZeroIsize>,
-
- // SAFETY: All zeros is equivalent to `None` (option layout optimization guarantee).
- //
- // In this case we are allowed to use `T: ?Sized`, since all zeros is the `None` variant.
- {<T: ?Sized>} Option<NonNull<T>>,
- {<T: ?Sized>} Option<KBox<T>>,
+ {<T>} Option<NonNull<T>>,
+ {<T>} Option<KBox<T>>,
// SAFETY: `null` pointer is valid.
//
base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
--
2.48.1
In commit 392e34b6bc22 ("kbuild: rust: remove the `alloc` crate and
`GlobalAlloc`") we stopped using the upstream `alloc` crate.
Thus remove a few leftover mentions treewide.
Cc: stable(a)vger.kernel.org # Also to 6.12.y after the `alloc` backport lands
Fixes: 392e34b6bc22 ("kbuild: rust: remove the `alloc` crate and `GlobalAlloc`")
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
Documentation/rust/quick-start.rst | 2 +-
rust/kernel/lib.rs | 2 +-
scripts/rustdoc_test_gen.rs | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/rust/quick-start.rst b/Documentation/rust/quick-start.rst
index 4aa50e5fcb8c..6d2607870ba4 100644
--- a/Documentation/rust/quick-start.rst
+++ b/Documentation/rust/quick-start.rst
@@ -145,7 +145,7 @@ Rust standard library source
****************************
The Rust standard library source is required because the build system will
-cross-compile ``core`` and ``alloc``.
+cross-compile ``core``.
If ``rustup`` is being used, run::
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 398242f92a96..7697c60b2d1a 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -6,7 +6,7 @@
//! usage by Rust code in the kernel and is shared by all of them.
//!
//! In other words, all the rest of the Rust code in the kernel (e.g. kernel
-//! modules written in Rust) depends on [`core`], [`alloc`] and this crate.
+//! modules written in Rust) depends on [`core`] and this crate.
//!
//! If you need a kernel C API that is not ported or wrapped yet here, then
//! do so first instead of bypassing this crate.
diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index 5ebd42ae4a3f..76aaa8329413 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -15,8 +15,8 @@
//! - Test code should be able to define functions and call them, without having to carry
//! the context.
//!
-//! - Later on, we may want to be able to test non-kernel code (e.g. `core`, `alloc` or
-//! third-party crates) which likely use the standard library `assert*!` macros.
+//! - Later on, we may want to be able to test non-kernel code (e.g. `core` or third-party
+//! crates) which likely use the standard library `assert*!` macros.
//!
//! For this reason, instead of the passed context, `kunit_get_current_test()` is used instead
//! (i.e. `current->kunit_test`).
base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
--
2.48.1
The patch titled
Subject: mm/migrate: fix shmem xarray update during migration
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-migrate-fix-shmem-xarray-update-during-migration.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/migrate: fix shmem xarray update during migration
Date: Wed, 5 Mar 2025 15:04:03 -0500
A shmem folio can be either in page cache or in swap cache, but not at the
same time. Namely, once it is in swap cache, folio->mapping should be
NULL, and the folio is no longer in a shmem mapping.
In __folio_migrate_mapping(), to determine the number of xarray entries to
update, folio_test_swapbacked() is used, but that conflates shmem in page
cache case and shmem in swap cache case. It leads to xarray multi-index
entry corruption, since it turns a sibling entry to a normal entry during
xas_store() (see [1] for a userspace reproduction). Fix it by only using
folio_test_swapcache() to determine whether xarray is storing swap cache
entries or not to choose the right number of xarray entries to update.
[1] https://lore.kernel.org/linux-mm/Z8idPCkaJW1IChjT@casper.infradead.org/
Note:
In __split_huge_page(), folio_test_anon() && folio_test_swapcache() is
used to get swap_cache address space, but that ignores the shmem folio in
swap cache case. It could lead to NULL pointer dereferencing when a
in-swap-cache shmem folio is split at __xa_store(), since
!folio_test_anon() is true and folio->mapping is NULL. But fortunately,
its caller split_huge_page_to_list_to_order() bails out early with EBUSY
when folio->mapping is NULL. So no need to take care of it here.
Link: https://lkml.kernel.org/r/20250305200403.2822855-1-ziy@nvidia.com
Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
Reported-by: Liu Shixin <liushixin2(a)huawei.com>
Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
Suggested-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Charan Teja Kalla <quic_charante(a)quicinc.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Lance Yang <ioworker0(a)gmail.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
--- a/mm/migrate.c~mm-migrate-fix-shmem-xarray-update-during-migration
+++ a/mm/migrate.c
@@ -518,15 +518,13 @@ static int __folio_migrate_mapping(struc
if (folio_test_anon(folio) && folio_test_large(folio))
mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
folio_ref_add(newfolio, nr); /* add cache reference */
- if (folio_test_swapbacked(folio)) {
+ if (folio_test_swapbacked(folio))
__folio_set_swapbacked(newfolio);
- if (folio_test_swapcache(folio)) {
- folio_set_swapcache(newfolio);
- newfolio->private = folio_get_private(folio);
- }
+ if (folio_test_swapcache(folio)) {
+ folio_set_swapcache(newfolio);
+ newfolio->private = folio_get_private(folio);
entries = nr;
} else {
- VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
entries = 1;
}
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
mm-migrate-fix-shmem-xarray-update-during-migration.patch
selftests-mm-make-file-backed-thp-split-work-by-writing-pmd-size-data.patch
mm-huge_memory-allow-split-shmem-large-folio-to-any-lower-order.patch
selftests-mm-test-splitting-file-backed-thp-to-any-lower-order.patch
In 2020, there's been an unnoticed change which rightfully attempted to
report probe deferrals upon DMA absence by checking the return value of
dma_request_chan_by_mask(). By doing so, it also reported errors which
were simply ignored otherwise, likely on purpose.
This change actually turned a void return into an error code. Hence, not
only the -EPROBE_DEFER error codes but all error codes got reported to
the callers, now failing to probe in the absence of Rx DMA channel,
despite the fact that DMA seems to not be supported natively by many
implementations.
Looking at the history, this change probably led to:
ad2775dc3fc5 ("spi: cadence-quadspi: Disable the DAC for Intel LGM SoC")
f724c296f2f2 ("spi: cadence-quadspi: fix Direct Access Mode disable for SoCFPGA")
In my case, the AM62A LP SK core octo-SPI node from TI does not
advertise any DMA channel, hinting that there is likely no support for
it, but yet when the support for the am654 compatible was added, DMA
seemed to be used, so just discarding its use with the
CQSPI_DISABLE_DAC_MODE quirk for this compatible does not seem the
correct approach.
Let's get change the return condition back to:
- return a probe deferral error if we get one
- ignore the return value otherwise
The "error" log level was however likely too high for something that is
expected to fail, so let's lower it arbitrarily to the info level.
Fixes: 935da5e5100f ("mtd: spi-nor: cadence-quadspi: Handle probe deferral while requesting DMA channel")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/spi/spi-cadence-quadspi.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
index 0cd37a7436d5..c90462783b3f 100644
--- a/drivers/spi/spi-cadence-quadspi.c
+++ b/drivers/spi/spi-cadence-quadspi.c
@@ -1658,6 +1658,12 @@ static int cqspi_request_mmap_dma(struct cqspi_st *cqspi)
int ret = PTR_ERR(cqspi->rx_chan);
cqspi->rx_chan = NULL;
+ if (ret == -ENODEV) {
+ /* DMA support is not mandatory */
+ dev_info(&cqspi->pdev->dev, "No Rx DMA available\n");
+ return 0;
+ }
+
return dev_err_probe(&cqspi->pdev->dev, ret, "No Rx DMA available\n");
}
init_completion(&cqspi->rx_dma_complete);
--
2.48.1
Hello,
New build issue found on stable-rc/linux-6.6.y:
---
‘initrd_start_early’ undeclared (first use in this function); did you
mean ‘initrd_start’? in arch/x86/kernel/cpu/microcode/core.o
(arch/x86/kernel/cpu/microcode/core.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/issue/maestro:b19e4ff21824fec766ed87bd97bf9372369920…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 9f243f9dd2684b4b27f8be0cbed639052bc9b22e
Log excerpt:
=====================================================
arch/x86/kernel/cpu/microcode/core.c:198:25: error:
‘initrd_start_early’ undeclared (first use in this function); did you
mean ‘initrd_start’?
198 | start = initrd_start_early;
| ^~~~~~~~~~~~~~~~~~
| initrd_start
arch/x86/kernel/cpu/microcode/core.c:198:25: note: each undeclared
identifier is reported only once for each function it appears in
=====================================================
# Builds where the incident occurred:
## i386_defconfig+kselftest on (i386):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67c89ed518018371956c816d
#kernelci issue maestro:b19e4ff21824fec766ed87bd97bf937236992087
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-6.6.y:
---
variable 'equiv_id' is used uninitialized whenever 'if' condition is
false [-Werror,-Wsometimes-uninitialized] in
arch/x86/kernel/cpu/microcode/amd.o
(arch/x86/kernel/cpu/microcode/amd.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/issue/maestro:b7c225f752a4128f41d922c0181dcbac4f66eb…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 9f243f9dd2684b4b27f8be0cbed639052bc9b22e
Log excerpt:
=====================================================
arch/x86/kernel/cpu/microcode/amd.c:820:6: error: variable 'equiv_id'
is used uninitialized whenever 'if' condition is false
[-Werror,-Wsometimes-uninitialized]
820 | if (x86_family(bsp_cpuid_1_eax) < 0x17) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kernel/cpu/microcode/amd.c:826:31: note: uninitialized use occurs here
826 | return cache_find_patch(uci, equiv_id);
| ^~~~~~~~
arch/x86/kernel/cpu/microcode/amd.c:820:2: note: remove the 'if' if
its condition is always true
820 | if (x86_family(bsp_cpuid_1_eax) < 0x17) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kernel/cpu/microcode/amd.c:816:14: note: initialize the
variable 'equiv_id' to silence this warning
816 | u16 equiv_id;
| ^
| = 0
1 error generated.
=====================================================
# Builds where the incident occurred:
## x86_64_defconfig on (x86_64):
- compiler: clang-17
- dashboard: https://d.kernelci.org/build/maestro:67c89e7f18018371956c7fda
## x86_64_defconfig+allmodconfig on (x86_64):
- compiler: clang-17
- dashboard: https://d.kernelci.org/build/maestro:67c89e8318018371956c7fe7
#kernelci issue maestro:b7c225f752a4128f41d922c0181dcbac4f66eb98
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-6.12.y:
---
‘sz’ undeclared (first use in this function); did you mean ‘s8’? in
arch/arm64/mm/hugetlbpage.o (arch/arm64/mm/hugetlbpage.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/issue/maestro:6e4ee7f7604c92f861d1cb98d9b4e90c99eb55…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 43639cc57b2273fce42874dd7f6e0b872f4984c5
Log excerpt:
=====================================================
arch/arm64/mm/hugetlbpage.c:386:35: error: ‘sz’ undeclared (first use
in this function); did you mean ‘s8’?
386 | ncontig = num_contig_ptes(sz, &pgsize);
| ^~
| s8
arch/arm64/mm/hugetlbpage.c:386:35: note: each undeclared identifier
is reported only once for each function it appears in
=====================================================
# Builds where the incident occurred:
## defconfig+arm64-chromebook+kcidebug+lab-setup on (arm64):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67c89f6118018371956c856c
#kernelci issue maestro:6e4ee7f7604c92f861d1cb98d9b4e90c99eb5508
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
EARLY_WAKEUP_SW_TRIG_*_SET and EARLY_WAKEUP_SW_TRIG_*_CLEAR
registers are only writeable. Attempting to read these registers
during samsung_clk_save() causes a synchronous external abort.
Remove these 8 registers from cmu_top_clk_regs[] array so that
system suspend gets further.
Note: the code path can be exercised using the following command:
echo mem > /sys/power/state
Fixes: 2c597bb7d66a ("clk: samsung: clk-gs101: Add cmu_top, cmu_misc and cmu_apm support")
Signed-off-by: Peter Griffin <peter.griffin(a)linaro.org>
Cc: stable(a)vger.kernel.org
---
Note: to hit this clock driver issue you also need the CPU hotplug
series otherwise system fails earlier offlining CPUs
Link: https://lore.kernel.org/linux-arm-kernel/20241213-contrib-pg-cpu-hotplug-su…
---
drivers/clk/samsung/clk-gs101.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/drivers/clk/samsung/clk-gs101.c b/drivers/clk/samsung/clk-gs101.c
index 86b39edba122..08b867ae3ed9 100644
--- a/drivers/clk/samsung/clk-gs101.c
+++ b/drivers/clk/samsung/clk-gs101.c
@@ -382,17 +382,9 @@ static const unsigned long cmu_top_clk_regs[] __initconst = {
EARLY_WAKEUP_DPU_DEST,
EARLY_WAKEUP_CSIS_DEST,
EARLY_WAKEUP_SW_TRIG_APM,
- EARLY_WAKEUP_SW_TRIG_APM_SET,
- EARLY_WAKEUP_SW_TRIG_APM_CLEAR,
EARLY_WAKEUP_SW_TRIG_CLUSTER0,
- EARLY_WAKEUP_SW_TRIG_CLUSTER0_SET,
- EARLY_WAKEUP_SW_TRIG_CLUSTER0_CLEAR,
EARLY_WAKEUP_SW_TRIG_DPU,
- EARLY_WAKEUP_SW_TRIG_DPU_SET,
- EARLY_WAKEUP_SW_TRIG_DPU_CLEAR,
EARLY_WAKEUP_SW_TRIG_CSIS,
- EARLY_WAKEUP_SW_TRIG_CSIS_SET,
- EARLY_WAKEUP_SW_TRIG_CSIS_CLEAR,
CLK_CON_MUX_MUX_CLKCMU_BO_BUS,
CLK_CON_MUX_MUX_CLKCMU_BUS0_BUS,
CLK_CON_MUX_MUX_CLKCMU_BUS1_BUS,
---
base-commit: 480112512bd6e770fa1902d01173731d02377705
change-id: 20250303-clk-suspend-fix-81c5d63827e3
Best regards,
--
Peter Griffin <peter.griffin(a)linaro.org>
This patch series attempts to enable the use of xe DRM driver on non-4KiB
kernel page platforms. This involves fixing the ttm/bo interface, as well
as parts of the userspace API to make use of kernel `PAGE_SIZE' for
alignment instead of the assumed `SZ_4K', it also fixes incorrect usage of
`PAGE_SIZE' in the GuC and ring buffer interface code to make sure all
instructions/commands were aligned to 4KiB barriers (per the Programmer's
Manual for the GPUs covered by this DRM driver).
This issue was first discovered and reported by members of the LoongArch
user communities, whose hardware commonly ran on 16KiB-page kernels. The
patch series began on an unassuming branch of a downstream kernel tree
maintained by Shang Yatsen.[^1]
It worked well but remained sparsely documented, a lot of the work done
here relied on Shang Yatsen's original patch.
AOSC OS then picked it up[^2] to provide Intel Xe/Arc support for users of
its LoongArch port, for which I worked extensively on. After months of
positive user feedback and from encouragement from Kexy Biscuit, my
colleague at the community, I decided to examine its potential for
upstreaming, cross-reference kernel and Intel documentation to better
document and revise this patch.
Now that this series has been tested good (for boot up, OpenGL, and
playback of a standardised set of video samples[^3]... with the exception
of the Intel Arc B580, which seems to segfault at intel-media-driver -
iHD_drv_video.so, but strangely, hardware accelerated video playback works
well with Firefox?) on the following platforms (motherboard + GPU model):
- x86-64, 4KiB kernel page:
- MS-7D42 + Intel Arc A580
- LoongArch, 16KiB kernel page:
- XA61200 + GUNNIR DG1 Blue Halberd (Intel DG1)
- XA61200 + ASRock Arc A380 Challenger ITX OC (Intel Arc 380)
- XA61200 + Intel Arc 580
- XA61200 + GUNNIR Intel Arc A750 Photon 8G OC (Intel Arc A750)
- ASUS XC-LS3A6M + GUNNIR Intel Arc B580 INDEX 12G (Intel Arc B580)
On these platforms, basic functionalities tested good but the driver was
unstable with occasional resets (I do suspect however, that this platform
suffers from PCIe coherence issues, as instability only occurs under heavy
VRAM I/O load):
- AArch64, 4KiB/64KiB kernel pages:
- ERUN-FD3000 (Phytium D3000) + GUNNIR Intel Arc A750 Photon 8G OC
(Intel Arc A750)
I think that this patch series is now ready for your comment and review.
Please forgive me if I made any simple mistake or used wrong terminologies,
but I have never worked on a patch for the DRM subsystem and my experience
is still quite thin.
But anyway, just letting you all know that Intel Xe/Arc works on non-4KiB
kernel page platforms (and honestly, it's great to use, especially for
games and media playback)!
[^1]: https://github.com/FanFansfan/loongson-linux/tree/loongarch-xe
[^2]: We maintained Shang Yatsen's patch until our v6.13.3 tree, until
we decided to test and send this series upstream,
https://github.com/AOSC-Tracking/linux/tree/aosc/v6.13.3
[^3]: Delicious hot pot!
https://repo.aosc.io/ahvl/sample-videos-20250223.tar.zst
Suggested-by: Kexy Biscuit <kexybiscuit(a)aosc.io>
Co-developed-by: Shang Yatsen <429839446(a)qq.com>
Signed-off-by: Shang Yatsen <429839446(a)qq.com>
Signed-off-by: Mingcong Bai <jeffbai(a)aosc.io>
---
Mingcong Bai (5):
drm/xe/bo: fix alignment with non-4K kernel page sizes
drm/xe/guc: use SZ_4K for alignment
drm/xe/regs: fix RING_CTL_SIZE(size) calculation
drm/xe: use 4K alignment for cursor jumps
drm/xe/query: use PAGE_SIZE as the minimum page alignment
drivers/gpu/drm/xe/regs/xe_engine_regs.h | 3 +--
drivers/gpu/drm/xe/xe_bo.c | 8 ++++----
drivers/gpu/drm/xe/xe_guc.c | 4 ++--
drivers/gpu/drm/xe/xe_guc_ads.c | 32 ++++++++++++++++----------------
drivers/gpu/drm/xe/xe_guc_capture.c | 8 ++++----
drivers/gpu/drm/xe/xe_guc_ct.c | 2 +-
drivers/gpu/drm/xe/xe_guc_log.c | 4 ++--
drivers/gpu/drm/xe/xe_guc_pc.c | 4 ++--
drivers/gpu/drm/xe/xe_migrate.c | 4 ++--
drivers/gpu/drm/xe/xe_query.c | 2 +-
include/uapi/drm/xe_drm.h | 2 +-
11 files changed, 36 insertions(+), 37 deletions(-)
---
base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6
change-id: 20250226-xe-non-4k-fix-6b2eded0a564
Best regards,
--
Mingcong Bai <jeffbai(a)aosc.io>
The return values of `drm_edid_dup()` and `drm_edid_read_ddc()` must
be checked in `tegra_output_connector_get_modes()` to prevent NULL
pointer dereferences. If either function fails, the function should
immediately return 0, indicating that no display modes can be retrieved.
A proper implementation can be found in `vidi_get_modes()`, where the
return values are carefully validated, and the function returns 0 upon
failure.
Fixes: 98365ca74cbf ("drm/tegra: convert to struct drm_edid")
Cc: stable(a)vger.kernel.org # 6.12+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
drivers/gpu/drm/tegra/output.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/tegra/output.c b/drivers/gpu/drm/tegra/output.c
index 49e4f63a5550..360c4f83a4f8 100644
--- a/drivers/gpu/drm/tegra/output.c
+++ b/drivers/gpu/drm/tegra/output.c
@@ -39,6 +39,9 @@ int tegra_output_connector_get_modes(struct drm_connector *connector)
else if (output->ddc)
drm_edid = drm_edid_read_ddc(connector, output->ddc);
+ if (!drm_edid)
+ return 0;
+
drm_edid_connector_update(connector, drm_edid);
cec_notifier_set_phys_addr(output->cec,
connector->display_info.source_physical_address);
--
2.42.0.windows.2
Hello,
New build issue found on stable-rc/linux-6.13.y:
---
use of undeclared identifier 'sz' in arch/arm64/mm/hugetlbpage.o
(arch/arm64/mm/hugetlbpage.c) [logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/issue/maestro:729533e78f26d9eebe477a0e479e082cba03ba…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 1ff63493daf5412bf3df24d33c1687bd29c2a983
Log excerpt:
=====================================================
arch/arm64/mm/hugetlbpage.c:397:28: error: use of undeclared identifier 'sz'
397 | ncontig = num_contig_ptes(sz, &pgsize);
| ^
1 error generated.
=====================================================
# Builds where the incident occurred:
## defconfig+arm64-chromebook+kselftest on (arm64):
- compiler: clang-17
- dashboard: https://d.kernelci.org/build/maestro:67c8713c18018371956b7bef
#kernelci issue maestro:729533e78f26d9eebe477a0e479e082cba03bafe
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 166ce267ae3f96e439d8ccc838e8ec4d8b4dab73
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025022456-vintage-hunter-6136@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 166ce267ae3f96e439d8ccc838e8ec4d8b4dab73 Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak(a)intel.com>
Date: Fri, 14 Feb 2025 16:19:52 +0200
Subject: [PATCH] drm/i915/ddi: Fix HDMI port width programming in DDI_BUF_CTL
Fix the port width programming in the DDI_BUF_CTL register on MTLP+,
where this had an off-by-one error.
Cc: <stable(a)vger.kernel.org> # v6.5+
Fixes: b66a8abaa48a ("drm/i915/display/mtl: Fill port width in DDI_BUF_/TRANS_DDI_FUNC_/PORT_BUF_CTL for HDMI")
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250214142001.552916-3-imre.…
(cherry picked from commit b2ecdabe46d23db275f94cd7c46ca414a144818b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c
index acb986bc1f33..2b9240ab547d 100644
--- a/drivers/gpu/drm/i915/display/intel_ddi.c
+++ b/drivers/gpu/drm/i915/display/intel_ddi.c
@@ -3487,7 +3487,7 @@ static void intel_ddi_enable_hdmi(struct intel_atomic_state *state,
intel_de_rmw(dev_priv, XELPDP_PORT_BUF_CTL1(dev_priv, port),
XELPDP_PORT_WIDTH_MASK | XELPDP_PORT_REVERSAL, port_buf);
- buf_ctl |= DDI_PORT_WIDTH(lane_count);
+ buf_ctl |= DDI_PORT_WIDTH(crtc_state->lane_count);
if (DISPLAY_VER(dev_priv) >= 20)
buf_ctl |= XE2LPD_DDI_BUF_D2D_LINK_ENABLE;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 765e6c0528fb..786c727aea45 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3633,7 +3633,7 @@ enum skl_power_gate {
#define DDI_BUF_IS_IDLE (1 << 7)
#define DDI_BUF_CTL_TC_PHY_OWNERSHIP REG_BIT(6)
#define DDI_A_4_LANES (1 << 4)
-#define DDI_PORT_WIDTH(width) (((width) - 1) << 1)
+#define DDI_PORT_WIDTH(width) (((width) == 3 ? 4 : ((width) - 1)) << 1)
#define DDI_PORT_WIDTH_MASK (7 << 1)
#define DDI_PORT_WIDTH_SHIFT 1
#define DDI_INIT_DISPLAY_DETECTED (1 << 0)
From: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
Ensure the PHY reset and perst is asserted during power-off to
guarantee it is in a reset state upon repeated power-on calls. This
resolves an issue where the PHY may not properly initialize during
subsequent power-on cycles. Power-on will deassert the reset at the
appropriate time after tuning the PHY parameters.
During suspend/resume cycles, we observed that the PHY PLL failed to
lock during resume when the CPU temperature increased from 65C to 75C.
The observed errors were:
phy phy-32f00000.pcie-phy.3: phy poweron failed --> -110
imx6q-pcie 33800000.pcie: waiting for PHY ready timeout!
imx6q-pcie 33800000.pcie: PM: dpm_run_callback(): genpd_resume_noirq+0x0/0x80 returns -110
imx6q-pcie 33800000.pcie: PM: failed to resume noirq: error -110
This resulted in a complete CPU freeze, which is resolved by ensuring
the PHY is in reset during power-on, thus preventing PHY PLL failures.
Cc: stable(a)vger.kernel.org
Fixes: 1aa97b002258 ("phy: freescale: pcie: Initialize the imx8 pcie standalone phy driver")
Signed-off-by: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
---
drivers/phy/freescale/phy-fsl-imx8m-pcie.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
index 5b505e34ca364..7355d9921b646 100644
--- a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
+++ b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
@@ -156,6 +156,16 @@ static int imx8_pcie_phy_power_on(struct phy *phy)
return ret;
}
+static int imx8_pcie_phy_power_off(struct phy *phy)
+{
+ struct imx8_pcie_phy *imx8_phy = phy_get_drvdata(phy);
+
+ reset_control_assert(imx8_phy->reset);
+ reset_control_assert(imx8_phy->perst);
+
+ return 0;
+}
+
static int imx8_pcie_phy_init(struct phy *phy)
{
struct imx8_pcie_phy *imx8_phy = phy_get_drvdata(phy);
@@ -176,6 +186,7 @@ static const struct phy_ops imx8_pcie_phy_ops = {
.init = imx8_pcie_phy_init,
.exit = imx8_pcie_phy_exit,
.power_on = imx8_pcie_phy_power_on,
+ .power_off = imx8_pcie_phy_power_off,
.owner = THIS_MODULE,
};
--
2.45.2
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 879f70382ff3e92fc854589ada3453e3f5f5b601
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025022418-frostlike-congrats-bf0d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 879f70382ff3e92fc854589ada3453e3f5f5b601 Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak(a)intel.com>
Date: Fri, 14 Feb 2025 16:19:51 +0200
Subject: [PATCH] drm/i915/dsi: Use TRANS_DDI_FUNC_CTL's own port width macro
The format of the port width field in the DDI_BUF_CTL and the
TRANS_DDI_FUNC_CTL registers are different starting with MTL, where the
x3 lane mode for HDMI FRL has a different encoding in the two registers.
To account for this use the TRANS_DDI_FUNC_CTL's own port width macro.
Cc: <stable(a)vger.kernel.org> # v6.5+
Fixes: b66a8abaa48a ("drm/i915/display/mtl: Fill port width in DDI_BUF_/TRANS_DDI_FUNC_/PORT_BUF_CTL for HDMI")
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250214142001.552916-2-imre.…
(cherry picked from commit 76120b3a304aec28fef4910204b81a12db8974da)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/icl_dsi.c b/drivers/gpu/drm/i915/display/icl_dsi.c
index c977b74f82f0..82bf6c654de2 100644
--- a/drivers/gpu/drm/i915/display/icl_dsi.c
+++ b/drivers/gpu/drm/i915/display/icl_dsi.c
@@ -809,8 +809,8 @@ gen11_dsi_configure_transcoder(struct intel_encoder *encoder,
/* select data lane width */
tmp = intel_de_read(display,
TRANS_DDI_FUNC_CTL(display, dsi_trans));
- tmp &= ~DDI_PORT_WIDTH_MASK;
- tmp |= DDI_PORT_WIDTH(intel_dsi->lane_count);
+ tmp &= ~TRANS_DDI_PORT_WIDTH_MASK;
+ tmp |= TRANS_DDI_PORT_WIDTH(intel_dsi->lane_count);
/* select input pipe */
tmp &= ~TRANS_DDI_EDP_INPUT_MASK;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 2b90e7ace79774a3540ce569e000388f8d22c9e0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030422-depict-timing-18a8@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b90e7ace79774a3540ce569e000388f8d22c9e0 Mon Sep 17 00:00:00 2001
From: Peter Jones <pjones(a)redhat.com>
Date: Wed, 26 Feb 2025 15:18:39 -0500
Subject: [PATCH] efi: Don't map the entire mokvar table to determine its size
Currently, when validating the mokvar table, we (re)map the entire table
on each iteration of the loop, adding space as we discover new entries.
If the table grows over a certain size, this fails due to limitations of
early_memmap(), and we get a failure and traceback:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:139 __early_ioremap+0xef/0x220
...
Call Trace:
<TASK>
? __early_ioremap+0xef/0x220
? __warn.cold+0x93/0xfa
? __early_ioremap+0xef/0x220
? report_bug+0xff/0x140
? early_fixup_exception+0x5d/0xb0
? early_idt_handler_common+0x2f/0x3a
? __early_ioremap+0xef/0x220
? efi_mokvar_table_init+0xce/0x1d0
? setup_arch+0x864/0xc10
? start_kernel+0x6b/0xa10
? x86_64_start_reservations+0x24/0x30
? x86_64_start_kernel+0xed/0xf0
? common_startup_64+0x13e/0x141
</TASK>
---[ end trace 0000000000000000 ]---
mokvar: Failed to map EFI MOKvar config table pa=0x7c4c3000, size=265187.
Mapping the entire structure isn't actually necessary, as we don't ever
need more than one entry header mapped at once.
Changes efi_mokvar_table_init() to only map each entry header, not the
entire table, when determining the table size. Since we're not mapping
any data past the variable name, it also changes the code to enforce
that each variable name is NUL terminated, rather than attempting to
verify it in place.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Peter Jones <pjones(a)redhat.com>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
diff --git a/drivers/firmware/efi/mokvar-table.c b/drivers/firmware/efi/mokvar-table.c
index 5ed0602c2f75..d865cb1dbaad 100644
--- a/drivers/firmware/efi/mokvar-table.c
+++ b/drivers/firmware/efi/mokvar-table.c
@@ -103,7 +103,6 @@ void __init efi_mokvar_table_init(void)
void *va = NULL;
unsigned long cur_offset = 0;
unsigned long offset_limit;
- unsigned long map_size = 0;
unsigned long map_size_needed = 0;
unsigned long size;
struct efi_mokvar_table_entry *mokvar_entry;
@@ -134,48 +133,34 @@ void __init efi_mokvar_table_init(void)
*/
err = -EINVAL;
while (cur_offset + sizeof(*mokvar_entry) <= offset_limit) {
- mokvar_entry = va + cur_offset;
- map_size_needed = cur_offset + sizeof(*mokvar_entry);
- if (map_size_needed > map_size) {
- if (va)
- early_memunmap(va, map_size);
- /*
- * Map a little more than the fixed size entry
- * header, anticipating some data. It's safe to
- * do so as long as we stay within current memory
- * descriptor.
- */
- map_size = min(map_size_needed + 2*EFI_PAGE_SIZE,
- offset_limit);
- va = early_memremap(efi.mokvar_table, map_size);
- if (!va) {
- pr_err("Failed to map EFI MOKvar config table pa=0x%lx, size=%lu.\n",
- efi.mokvar_table, map_size);
- return;
- }
- mokvar_entry = va + cur_offset;
+ if (va)
+ early_memunmap(va, sizeof(*mokvar_entry));
+ va = early_memremap(efi.mokvar_table + cur_offset, sizeof(*mokvar_entry));
+ if (!va) {
+ pr_err("Failed to map EFI MOKvar config table pa=0x%lx, size=%zu.\n",
+ efi.mokvar_table + cur_offset, sizeof(*mokvar_entry));
+ return;
}
+ mokvar_entry = va;
/* Check for last sentinel entry */
if (mokvar_entry->name[0] == '\0') {
if (mokvar_entry->data_size != 0)
break;
err = 0;
+ map_size_needed = cur_offset + sizeof(*mokvar_entry);
break;
}
- /* Sanity check that the name is null terminated */
- size = strnlen(mokvar_entry->name,
- sizeof(mokvar_entry->name));
- if (size >= sizeof(mokvar_entry->name))
- break;
+ /* Enforce that the name is NUL terminated */
+ mokvar_entry->name[sizeof(mokvar_entry->name) - 1] = '\0';
/* Advance to the next entry */
- cur_offset = map_size_needed + mokvar_entry->data_size;
+ cur_offset += sizeof(*mokvar_entry) + mokvar_entry->data_size;
}
if (va)
- early_memunmap(va, map_size);
+ early_memunmap(va, sizeof(*mokvar_entry));
if (err) {
pr_err("EFI MOKvar config table is not valid\n");
return;
Hi Greg, hi Sasha
A while back the following regression after 4bce37a68ff8 ("mips/mm:
Convert to using lock_mm_and_find_vma()") was reported:
https://lore.kernel.org/all/75e9fd7b08562ad9b456a5bdaacb7cc220311cc9.camel@…
affecting mips64el. This was later on fixed by 8fa507083388
("mm/memory: Use exception ip to search exception tables") in 6.8-rc5
and which got backported to 6.7.6 and 6.6.18.
The breaking commit was part of a series covering a security fix
(CVE-2023-3269), and landed in 6.5-rc1 and backported to 6.4.1, 6.3.11
and 6.1.37.
So far 6.1.y remained unfixed and in fact in Debian we got reports
about this issue seen on the build infrastructure when building
various packages, details are in:
https://bugs.debian.org/1086028https://bugs.debian.org/1087809https://bugs.debian.org/1093200
The fix probably did not got backported as there is one dependency
missing which was not CC'ed for stable afaics.
Thus, can you please cherry-pick the following two commits please as
well for 6.1.y?
11ba1728be3e ("ptrace: Introduce exception_ip arch hook")
8fa507083388 ("mm/memory: Use exception ip to search exception tables")
Sergei Golovan confirmed as well by testing that this fixes the seen
issue as well in 6.1.y, cf. https://bugs.debian.org/1086028#95
Thanks in advance already.
Regards,
Salvatore
Two rtla commits that fix a bug in setting OSNOISE_WORKLOAD (see
the patches for details) were improperly backported to 6.6-stable,
referencing non-existent field params->kernel_workload.
Revert the broken backports and backport this properly, using
!params->user_hist and !params->user_top instead of the non-existent
params->user_workload.
The patchset was tested to build and fix the bug.
Tomas Glozar (4):
Revert "rtla/timerlat_top: Set OSNOISE_WORKLOAD for kernel threads"
Revert "rtla/timerlat_hist: Set OSNOISE_WORKLOAD for kernel threads"
rtla/timerlat_hist: Set OSNOISE_WORKLOAD for kernel threads
rtla/timerlat_top: Set OSNOISE_WORKLOAD for kernel threads
tools/tracing/rtla/src/timerlat_hist.c | 2 +-
tools/tracing/rtla/src/timerlat_top.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--
2.48.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 221cd51efe4565501a3dbf04cc011b537dcce7fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021036-footwork-entryway-f39c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 221cd51efe4565501a3dbf04cc011b537dcce7fb Mon Sep 17 00:00:00 2001
From: Ricardo Ribalda <ribalda(a)chromium.org>
Date: Tue, 3 Dec 2024 21:20:10 +0000
Subject: [PATCH] media: uvcvideo: Remove dangling pointers
When an async control is written, we copy a pointer to the file handle
that started the operation. That pointer will be used when the device is
done. Which could be anytime in the future.
If the user closes that file descriptor, its structure will be freed,
and there will be one dangling pointer per pending async control, that
the driver will try to use.
Clean all the dangling pointers during release().
To avoid adding a performance penalty in the most common case (no async
operation), a counter has been introduced with some logic to make sure
that it is properly handled.
Cc: stable(a)vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Link: https://lore.kernel.org/r/20241203-uvc-fix-async-v6-3-26c867231118@chromium…
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index b05b84887e51..4837d8df9c03 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1579,6 +1579,40 @@ static void uvc_ctrl_send_slave_event(struct uvc_video_chain *chain,
uvc_ctrl_send_event(chain, handle, ctrl, mapping, val, changes);
}
+static void uvc_ctrl_set_handle(struct uvc_fh *handle, struct uvc_control *ctrl,
+ struct uvc_fh *new_handle)
+{
+ lockdep_assert_held(&handle->chain->ctrl_mutex);
+
+ if (new_handle) {
+ if (ctrl->handle)
+ dev_warn_ratelimited(&handle->stream->dev->udev->dev,
+ "UVC non compliance: Setting an async control with a pending operation.");
+
+ if (new_handle == ctrl->handle)
+ return;
+
+ if (ctrl->handle) {
+ WARN_ON(!ctrl->handle->pending_async_ctrls);
+ if (ctrl->handle->pending_async_ctrls)
+ ctrl->handle->pending_async_ctrls--;
+ }
+
+ ctrl->handle = new_handle;
+ handle->pending_async_ctrls++;
+ return;
+ }
+
+ /* Cannot clear the handle for a control not owned by us.*/
+ if (WARN_ON(ctrl->handle != handle))
+ return;
+
+ ctrl->handle = NULL;
+ if (WARN_ON(!handle->pending_async_ctrls))
+ return;
+ handle->pending_async_ctrls--;
+}
+
void uvc_ctrl_status_event(struct uvc_video_chain *chain,
struct uvc_control *ctrl, const u8 *data)
{
@@ -1589,7 +1623,8 @@ void uvc_ctrl_status_event(struct uvc_video_chain *chain,
mutex_lock(&chain->ctrl_mutex);
handle = ctrl->handle;
- ctrl->handle = NULL;
+ if (handle)
+ uvc_ctrl_set_handle(handle, ctrl, NULL);
list_for_each_entry(mapping, &ctrl->info.mappings, list) {
s32 value = __uvc_ctrl_get_value(mapping, data);
@@ -1863,7 +1898,7 @@ static int uvc_ctrl_commit_entity(struct uvc_device *dev,
if (!rollback && handle &&
ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS)
- ctrl->handle = handle;
+ uvc_ctrl_set_handle(handle, ctrl, handle);
}
return 0;
@@ -2772,6 +2807,26 @@ int uvc_ctrl_init_device(struct uvc_device *dev)
return 0;
}
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle)
+{
+ struct uvc_entity *entity;
+
+ guard(mutex)(&handle->chain->ctrl_mutex);
+
+ if (!handle->pending_async_ctrls)
+ return;
+
+ list_for_each_entry(entity, &handle->chain->dev->entities, list) {
+ for (unsigned int i = 0; i < entity->ncontrols; ++i) {
+ if (entity->controls[i].handle != handle)
+ continue;
+ uvc_ctrl_set_handle(handle, &entity->controls[i], NULL);
+ }
+ }
+
+ WARN_ON(handle->pending_async_ctrls);
+}
+
/*
* Cleanup device controls.
*/
diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
index dee6feeba274..93c6cdb23881 100644
--- a/drivers/media/usb/uvc/uvc_v4l2.c
+++ b/drivers/media/usb/uvc/uvc_v4l2.c
@@ -671,6 +671,8 @@ static int uvc_v4l2_release(struct file *file)
uvc_dbg(stream->dev, CALLS, "%s\n", __func__);
+ uvc_ctrl_cleanup_fh(handle);
+
/* Only free resources if this is a privileged handle. */
if (uvc_has_privileges(handle))
uvc_queue_release(&stream->queue);
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 965a789ed03e..5690cfd61e23 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -338,7 +338,11 @@ struct uvc_video_chain {
struct uvc_entity *processing; /* Processing unit */
struct uvc_entity *selector; /* Selector unit */
- struct mutex ctrl_mutex; /* Protects ctrl.info */
+ struct mutex ctrl_mutex; /*
+ * Protects ctrl.info,
+ * ctrl.handle and
+ * uvc_fh.pending_async_ctrls
+ */
struct v4l2_prio_state prio; /* V4L2 priority state */
u32 caps; /* V4L2 chain-wide caps */
@@ -613,6 +617,7 @@ struct uvc_fh {
struct uvc_video_chain *chain;
struct uvc_streaming *stream;
enum uvc_handle_state state;
+ unsigned int pending_async_ctrls;
};
struct uvc_driver {
@@ -798,6 +803,8 @@ int uvc_ctrl_is_accessible(struct uvc_video_chain *chain, u32 v4l2_id,
int uvc_xu_ctrl_query(struct uvc_video_chain *chain,
struct uvc_xu_control_query *xqry);
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle);
+
/* Utility functions */
struct usb_host_endpoint *uvc_find_endpoint(struct usb_host_interface *alts,
u8 epaddr);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 221cd51efe4565501a3dbf04cc011b537dcce7fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021036-shrouded-exposable-96da@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 221cd51efe4565501a3dbf04cc011b537dcce7fb Mon Sep 17 00:00:00 2001
From: Ricardo Ribalda <ribalda(a)chromium.org>
Date: Tue, 3 Dec 2024 21:20:10 +0000
Subject: [PATCH] media: uvcvideo: Remove dangling pointers
When an async control is written, we copy a pointer to the file handle
that started the operation. That pointer will be used when the device is
done. Which could be anytime in the future.
If the user closes that file descriptor, its structure will be freed,
and there will be one dangling pointer per pending async control, that
the driver will try to use.
Clean all the dangling pointers during release().
To avoid adding a performance penalty in the most common case (no async
operation), a counter has been introduced with some logic to make sure
that it is properly handled.
Cc: stable(a)vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Link: https://lore.kernel.org/r/20241203-uvc-fix-async-v6-3-26c867231118@chromium…
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index b05b84887e51..4837d8df9c03 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1579,6 +1579,40 @@ static void uvc_ctrl_send_slave_event(struct uvc_video_chain *chain,
uvc_ctrl_send_event(chain, handle, ctrl, mapping, val, changes);
}
+static void uvc_ctrl_set_handle(struct uvc_fh *handle, struct uvc_control *ctrl,
+ struct uvc_fh *new_handle)
+{
+ lockdep_assert_held(&handle->chain->ctrl_mutex);
+
+ if (new_handle) {
+ if (ctrl->handle)
+ dev_warn_ratelimited(&handle->stream->dev->udev->dev,
+ "UVC non compliance: Setting an async control with a pending operation.");
+
+ if (new_handle == ctrl->handle)
+ return;
+
+ if (ctrl->handle) {
+ WARN_ON(!ctrl->handle->pending_async_ctrls);
+ if (ctrl->handle->pending_async_ctrls)
+ ctrl->handle->pending_async_ctrls--;
+ }
+
+ ctrl->handle = new_handle;
+ handle->pending_async_ctrls++;
+ return;
+ }
+
+ /* Cannot clear the handle for a control not owned by us.*/
+ if (WARN_ON(ctrl->handle != handle))
+ return;
+
+ ctrl->handle = NULL;
+ if (WARN_ON(!handle->pending_async_ctrls))
+ return;
+ handle->pending_async_ctrls--;
+}
+
void uvc_ctrl_status_event(struct uvc_video_chain *chain,
struct uvc_control *ctrl, const u8 *data)
{
@@ -1589,7 +1623,8 @@ void uvc_ctrl_status_event(struct uvc_video_chain *chain,
mutex_lock(&chain->ctrl_mutex);
handle = ctrl->handle;
- ctrl->handle = NULL;
+ if (handle)
+ uvc_ctrl_set_handle(handle, ctrl, NULL);
list_for_each_entry(mapping, &ctrl->info.mappings, list) {
s32 value = __uvc_ctrl_get_value(mapping, data);
@@ -1863,7 +1898,7 @@ static int uvc_ctrl_commit_entity(struct uvc_device *dev,
if (!rollback && handle &&
ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS)
- ctrl->handle = handle;
+ uvc_ctrl_set_handle(handle, ctrl, handle);
}
return 0;
@@ -2772,6 +2807,26 @@ int uvc_ctrl_init_device(struct uvc_device *dev)
return 0;
}
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle)
+{
+ struct uvc_entity *entity;
+
+ guard(mutex)(&handle->chain->ctrl_mutex);
+
+ if (!handle->pending_async_ctrls)
+ return;
+
+ list_for_each_entry(entity, &handle->chain->dev->entities, list) {
+ for (unsigned int i = 0; i < entity->ncontrols; ++i) {
+ if (entity->controls[i].handle != handle)
+ continue;
+ uvc_ctrl_set_handle(handle, &entity->controls[i], NULL);
+ }
+ }
+
+ WARN_ON(handle->pending_async_ctrls);
+}
+
/*
* Cleanup device controls.
*/
diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
index dee6feeba274..93c6cdb23881 100644
--- a/drivers/media/usb/uvc/uvc_v4l2.c
+++ b/drivers/media/usb/uvc/uvc_v4l2.c
@@ -671,6 +671,8 @@ static int uvc_v4l2_release(struct file *file)
uvc_dbg(stream->dev, CALLS, "%s\n", __func__);
+ uvc_ctrl_cleanup_fh(handle);
+
/* Only free resources if this is a privileged handle. */
if (uvc_has_privileges(handle))
uvc_queue_release(&stream->queue);
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 965a789ed03e..5690cfd61e23 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -338,7 +338,11 @@ struct uvc_video_chain {
struct uvc_entity *processing; /* Processing unit */
struct uvc_entity *selector; /* Selector unit */
- struct mutex ctrl_mutex; /* Protects ctrl.info */
+ struct mutex ctrl_mutex; /*
+ * Protects ctrl.info,
+ * ctrl.handle and
+ * uvc_fh.pending_async_ctrls
+ */
struct v4l2_prio_state prio; /* V4L2 priority state */
u32 caps; /* V4L2 chain-wide caps */
@@ -613,6 +617,7 @@ struct uvc_fh {
struct uvc_video_chain *chain;
struct uvc_streaming *stream;
enum uvc_handle_state state;
+ unsigned int pending_async_ctrls;
};
struct uvc_driver {
@@ -798,6 +803,8 @@ int uvc_ctrl_is_accessible(struct uvc_video_chain *chain, u32 v4l2_id,
int uvc_xu_ctrl_query(struct uvc_video_chain *chain,
struct uvc_xu_control_query *xqry);
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle);
+
/* Utility functions */
struct usb_host_endpoint *uvc_find_endpoint(struct usb_host_interface *alts,
u8 epaddr);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 221cd51efe4565501a3dbf04cc011b537dcce7fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021037-enactment-bartender-d80d@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 221cd51efe4565501a3dbf04cc011b537dcce7fb Mon Sep 17 00:00:00 2001
From: Ricardo Ribalda <ribalda(a)chromium.org>
Date: Tue, 3 Dec 2024 21:20:10 +0000
Subject: [PATCH] media: uvcvideo: Remove dangling pointers
When an async control is written, we copy a pointer to the file handle
that started the operation. That pointer will be used when the device is
done. Which could be anytime in the future.
If the user closes that file descriptor, its structure will be freed,
and there will be one dangling pointer per pending async control, that
the driver will try to use.
Clean all the dangling pointers during release().
To avoid adding a performance penalty in the most common case (no async
operation), a counter has been introduced with some logic to make sure
that it is properly handled.
Cc: stable(a)vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Link: https://lore.kernel.org/r/20241203-uvc-fix-async-v6-3-26c867231118@chromium…
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index b05b84887e51..4837d8df9c03 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1579,6 +1579,40 @@ static void uvc_ctrl_send_slave_event(struct uvc_video_chain *chain,
uvc_ctrl_send_event(chain, handle, ctrl, mapping, val, changes);
}
+static void uvc_ctrl_set_handle(struct uvc_fh *handle, struct uvc_control *ctrl,
+ struct uvc_fh *new_handle)
+{
+ lockdep_assert_held(&handle->chain->ctrl_mutex);
+
+ if (new_handle) {
+ if (ctrl->handle)
+ dev_warn_ratelimited(&handle->stream->dev->udev->dev,
+ "UVC non compliance: Setting an async control with a pending operation.");
+
+ if (new_handle == ctrl->handle)
+ return;
+
+ if (ctrl->handle) {
+ WARN_ON(!ctrl->handle->pending_async_ctrls);
+ if (ctrl->handle->pending_async_ctrls)
+ ctrl->handle->pending_async_ctrls--;
+ }
+
+ ctrl->handle = new_handle;
+ handle->pending_async_ctrls++;
+ return;
+ }
+
+ /* Cannot clear the handle for a control not owned by us.*/
+ if (WARN_ON(ctrl->handle != handle))
+ return;
+
+ ctrl->handle = NULL;
+ if (WARN_ON(!handle->pending_async_ctrls))
+ return;
+ handle->pending_async_ctrls--;
+}
+
void uvc_ctrl_status_event(struct uvc_video_chain *chain,
struct uvc_control *ctrl, const u8 *data)
{
@@ -1589,7 +1623,8 @@ void uvc_ctrl_status_event(struct uvc_video_chain *chain,
mutex_lock(&chain->ctrl_mutex);
handle = ctrl->handle;
- ctrl->handle = NULL;
+ if (handle)
+ uvc_ctrl_set_handle(handle, ctrl, NULL);
list_for_each_entry(mapping, &ctrl->info.mappings, list) {
s32 value = __uvc_ctrl_get_value(mapping, data);
@@ -1863,7 +1898,7 @@ static int uvc_ctrl_commit_entity(struct uvc_device *dev,
if (!rollback && handle &&
ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS)
- ctrl->handle = handle;
+ uvc_ctrl_set_handle(handle, ctrl, handle);
}
return 0;
@@ -2772,6 +2807,26 @@ int uvc_ctrl_init_device(struct uvc_device *dev)
return 0;
}
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle)
+{
+ struct uvc_entity *entity;
+
+ guard(mutex)(&handle->chain->ctrl_mutex);
+
+ if (!handle->pending_async_ctrls)
+ return;
+
+ list_for_each_entry(entity, &handle->chain->dev->entities, list) {
+ for (unsigned int i = 0; i < entity->ncontrols; ++i) {
+ if (entity->controls[i].handle != handle)
+ continue;
+ uvc_ctrl_set_handle(handle, &entity->controls[i], NULL);
+ }
+ }
+
+ WARN_ON(handle->pending_async_ctrls);
+}
+
/*
* Cleanup device controls.
*/
diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
index dee6feeba274..93c6cdb23881 100644
--- a/drivers/media/usb/uvc/uvc_v4l2.c
+++ b/drivers/media/usb/uvc/uvc_v4l2.c
@@ -671,6 +671,8 @@ static int uvc_v4l2_release(struct file *file)
uvc_dbg(stream->dev, CALLS, "%s\n", __func__);
+ uvc_ctrl_cleanup_fh(handle);
+
/* Only free resources if this is a privileged handle. */
if (uvc_has_privileges(handle))
uvc_queue_release(&stream->queue);
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 965a789ed03e..5690cfd61e23 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -338,7 +338,11 @@ struct uvc_video_chain {
struct uvc_entity *processing; /* Processing unit */
struct uvc_entity *selector; /* Selector unit */
- struct mutex ctrl_mutex; /* Protects ctrl.info */
+ struct mutex ctrl_mutex; /*
+ * Protects ctrl.info,
+ * ctrl.handle and
+ * uvc_fh.pending_async_ctrls
+ */
struct v4l2_prio_state prio; /* V4L2 priority state */
u32 caps; /* V4L2 chain-wide caps */
@@ -613,6 +617,7 @@ struct uvc_fh {
struct uvc_video_chain *chain;
struct uvc_streaming *stream;
enum uvc_handle_state state;
+ unsigned int pending_async_ctrls;
};
struct uvc_driver {
@@ -798,6 +803,8 @@ int uvc_ctrl_is_accessible(struct uvc_video_chain *chain, u32 v4l2_id,
int uvc_xu_ctrl_query(struct uvc_video_chain *chain,
struct uvc_xu_control_query *xqry);
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle);
+
/* Utility functions */
struct usb_host_endpoint *uvc_find_endpoint(struct usb_host_interface *alts,
u8 epaddr);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 221cd51efe4565501a3dbf04cc011b537dcce7fb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021037-broadcast-cradling-b8a6@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 221cd51efe4565501a3dbf04cc011b537dcce7fb Mon Sep 17 00:00:00 2001
From: Ricardo Ribalda <ribalda(a)chromium.org>
Date: Tue, 3 Dec 2024 21:20:10 +0000
Subject: [PATCH] media: uvcvideo: Remove dangling pointers
When an async control is written, we copy a pointer to the file handle
that started the operation. That pointer will be used when the device is
done. Which could be anytime in the future.
If the user closes that file descriptor, its structure will be freed,
and there will be one dangling pointer per pending async control, that
the driver will try to use.
Clean all the dangling pointers during release().
To avoid adding a performance penalty in the most common case (no async
operation), a counter has been introduced with some logic to make sure
that it is properly handled.
Cc: stable(a)vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Link: https://lore.kernel.org/r/20241203-uvc-fix-async-v6-3-26c867231118@chromium…
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index b05b84887e51..4837d8df9c03 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1579,6 +1579,40 @@ static void uvc_ctrl_send_slave_event(struct uvc_video_chain *chain,
uvc_ctrl_send_event(chain, handle, ctrl, mapping, val, changes);
}
+static void uvc_ctrl_set_handle(struct uvc_fh *handle, struct uvc_control *ctrl,
+ struct uvc_fh *new_handle)
+{
+ lockdep_assert_held(&handle->chain->ctrl_mutex);
+
+ if (new_handle) {
+ if (ctrl->handle)
+ dev_warn_ratelimited(&handle->stream->dev->udev->dev,
+ "UVC non compliance: Setting an async control with a pending operation.");
+
+ if (new_handle == ctrl->handle)
+ return;
+
+ if (ctrl->handle) {
+ WARN_ON(!ctrl->handle->pending_async_ctrls);
+ if (ctrl->handle->pending_async_ctrls)
+ ctrl->handle->pending_async_ctrls--;
+ }
+
+ ctrl->handle = new_handle;
+ handle->pending_async_ctrls++;
+ return;
+ }
+
+ /* Cannot clear the handle for a control not owned by us.*/
+ if (WARN_ON(ctrl->handle != handle))
+ return;
+
+ ctrl->handle = NULL;
+ if (WARN_ON(!handle->pending_async_ctrls))
+ return;
+ handle->pending_async_ctrls--;
+}
+
void uvc_ctrl_status_event(struct uvc_video_chain *chain,
struct uvc_control *ctrl, const u8 *data)
{
@@ -1589,7 +1623,8 @@ void uvc_ctrl_status_event(struct uvc_video_chain *chain,
mutex_lock(&chain->ctrl_mutex);
handle = ctrl->handle;
- ctrl->handle = NULL;
+ if (handle)
+ uvc_ctrl_set_handle(handle, ctrl, NULL);
list_for_each_entry(mapping, &ctrl->info.mappings, list) {
s32 value = __uvc_ctrl_get_value(mapping, data);
@@ -1863,7 +1898,7 @@ static int uvc_ctrl_commit_entity(struct uvc_device *dev,
if (!rollback && handle &&
ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS)
- ctrl->handle = handle;
+ uvc_ctrl_set_handle(handle, ctrl, handle);
}
return 0;
@@ -2772,6 +2807,26 @@ int uvc_ctrl_init_device(struct uvc_device *dev)
return 0;
}
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle)
+{
+ struct uvc_entity *entity;
+
+ guard(mutex)(&handle->chain->ctrl_mutex);
+
+ if (!handle->pending_async_ctrls)
+ return;
+
+ list_for_each_entry(entity, &handle->chain->dev->entities, list) {
+ for (unsigned int i = 0; i < entity->ncontrols; ++i) {
+ if (entity->controls[i].handle != handle)
+ continue;
+ uvc_ctrl_set_handle(handle, &entity->controls[i], NULL);
+ }
+ }
+
+ WARN_ON(handle->pending_async_ctrls);
+}
+
/*
* Cleanup device controls.
*/
diff --git a/drivers/media/usb/uvc/uvc_v4l2.c b/drivers/media/usb/uvc/uvc_v4l2.c
index dee6feeba274..93c6cdb23881 100644
--- a/drivers/media/usb/uvc/uvc_v4l2.c
+++ b/drivers/media/usb/uvc/uvc_v4l2.c
@@ -671,6 +671,8 @@ static int uvc_v4l2_release(struct file *file)
uvc_dbg(stream->dev, CALLS, "%s\n", __func__);
+ uvc_ctrl_cleanup_fh(handle);
+
/* Only free resources if this is a privileged handle. */
if (uvc_has_privileges(handle))
uvc_queue_release(&stream->queue);
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index 965a789ed03e..5690cfd61e23 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -338,7 +338,11 @@ struct uvc_video_chain {
struct uvc_entity *processing; /* Processing unit */
struct uvc_entity *selector; /* Selector unit */
- struct mutex ctrl_mutex; /* Protects ctrl.info */
+ struct mutex ctrl_mutex; /*
+ * Protects ctrl.info,
+ * ctrl.handle and
+ * uvc_fh.pending_async_ctrls
+ */
struct v4l2_prio_state prio; /* V4L2 priority state */
u32 caps; /* V4L2 chain-wide caps */
@@ -613,6 +617,7 @@ struct uvc_fh {
struct uvc_video_chain *chain;
struct uvc_streaming *stream;
enum uvc_handle_state state;
+ unsigned int pending_async_ctrls;
};
struct uvc_driver {
@@ -798,6 +803,8 @@ int uvc_ctrl_is_accessible(struct uvc_video_chain *chain, u32 v4l2_id,
int uvc_xu_ctrl_query(struct uvc_video_chain *chain,
struct uvc_xu_control_query *xqry);
+void uvc_ctrl_cleanup_fh(struct uvc_fh *handle);
+
/* Utility functions */
struct usb_host_endpoint *uvc_find_endpoint(struct usb_host_interface *alts,
u8 epaddr);
Getting / Setting the frame interval using the V4L2 subdev pad ops
get_frame_interval/set_frame_interval causes a deadlock, as the
subdev state is locked in the [1] but also in the driver itself.
In [2] it's described that the caller is responsible to acquire and
release the lock in this case. Therefore, acquiring the lock in the
driver is wrong.
Remove the lock acquisitions/releases from mt9m114_ifp_get_frame_interval()
and mt9m114_ifp_set_frame_interval().
[1] drivers/media/v4l2-core/v4l2-subdev.c - line 1129
[2] Documentation/driver-api/media/v4l2-subdev.rst
Fixes: 24d756e914fc ("media: i2c: Add driver for onsemi MT9M114 camera sensor")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mathis Foerst <mathis.foerst(a)mt.com>
---
drivers/media/i2c/mt9m114.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/drivers/media/i2c/mt9m114.c b/drivers/media/i2c/mt9m114.c
index 65b9124e464f..79c97ab19be9 100644
--- a/drivers/media/i2c/mt9m114.c
+++ b/drivers/media/i2c/mt9m114.c
@@ -1644,13 +1644,9 @@ static int mt9m114_ifp_get_frame_interval(struct v4l2_subdev *sd,
if (interval->which != V4L2_SUBDEV_FORMAT_ACTIVE)
return -EINVAL;
- mutex_lock(sensor->ifp.hdl.lock);
-
ival->numerator = 1;
ival->denominator = sensor->ifp.frame_rate;
- mutex_unlock(sensor->ifp.hdl.lock);
-
return 0;
}
@@ -1669,8 +1665,6 @@ static int mt9m114_ifp_set_frame_interval(struct v4l2_subdev *sd,
if (interval->which != V4L2_SUBDEV_FORMAT_ACTIVE)
return -EINVAL;
- mutex_lock(sensor->ifp.hdl.lock);
-
if (ival->numerator != 0 && ival->denominator != 0)
sensor->ifp.frame_rate = min_t(unsigned int,
ival->denominator / ival->numerator,
@@ -1684,8 +1678,6 @@ static int mt9m114_ifp_set_frame_interval(struct v4l2_subdev *sd,
if (sensor->streaming)
ret = mt9m114_set_frame_rate(sensor);
- mutex_unlock(sensor->ifp.hdl.lock);
-
return ret;
}
--
2.34.1
No upstream commit exists for this commit.
The issue was introduced with backporting upstream commit 091c1dd2d4df
("mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA
in a MM").
The backport incorrectly added unlock logic to a path where
mmap_read_lock() wasn't acquired, creating lock imbalance when no VMAs
are found.
This fixes the report:
WARNING: bad unlock balance detected!
6.6.79 #1 Not tainted
-------------------------------------
repro/9655 is trying to release lock (&mm->mmap_lock) at:
[<ffffffff81daa36f>] mmap_read_unlock include/linux/mmap_lock.h:173 [inline]
[<ffffffff81daa36f>] do_migrate_pages+0x59f/0x700 mm/mempolicy.c:1196
but there are no more locks to release!
other info that might help us debug this:
no locks held by repro/9655.
stack backtrace:
CPU: 1 PID: 9655 Comm: a Not tainted 6.6.79 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd5/0x1b0 lib/dump_stack.c:106
__lock_release kernel/locking/lockdep.c:5431 [inline]
lock_release+0x4b1/0x680 kernel/locking/lockdep.c:5774
up_read+0x12/0x20 kernel/locking/rwsem.c:1615
mmap_read_unlock include/linux/mmap_lock.h:173 [inline]
do_migrate_pages+0x59f/0x700 mm/mempolicy.c:1196
kernel_migrate_pages+0x59b/0x780 mm/mempolicy.c:1665
__do_sys_migrate_pages mm/mempolicy.c:1684 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1680 [inline]
__x64_sys_migrate_pages+0x92/0xf0 mm/mempolicy.c:1680
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: a13b2b9b0b0b ("mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM")
Signed-off-by: Alexey Panov <apanov(a)astralinux.ru>
---
mm/mempolicy.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 94c74c594d10..1eccbed2fd06 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1072,7 +1072,6 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
vma = find_vma(mm, 0);
if (unlikely(!vma)) {
- mmap_read_unlock(mm);
return 0;
}
--
2.30.2
Hi,
We are offering the visitors contact list The Intec International Trade Fair for Machine Tools, Manufacturing and Automation (INTEC 2025)
We currently have 17,207+ verified visitor contacts.
Each contact contains: Contact name, Job Title, Business Name, Physical Address, Phone Numbers, Official Email address and many more…
Let me know if you are Interested so that I can share the Pricing for the same.
Additionally, we can also provide the Exhibitors list upon request.
Kind Regards,
Davin Roos
Sr. Demand Generation
If you do not wish to receive this newsletter reply as “Unfollow”
The l12b and l15b supplies are used by components that are not (fully)
described (and some never will be) and must never be disabled.
Mark the regulators as always-on to prevent them from being disabled,
for example, when consumers probe defer or suspend.
Note that these supplies currently have no consumers described in
mainline.
Fixes: f5b788d0e8cd ("arm64: dts: qcom: Add support for X1-based Dell XPS 13 9345")
Cc: stable(a)vger.kernel.org # 6.13
Cc: Aleksandrs Vinarskis <alex.vinarskis(a)gmail.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts b/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
index 86e87f03b0ec..90f588ed7d63 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-dell-xps13-9345.dts
@@ -359,6 +359,7 @@ vreg_l12b_1p2: ldo12 {
regulator-min-microvolt = <1200000>;
regulator-max-microvolt = <1200000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l13b_3p0: ldo13 {
@@ -380,6 +381,7 @@ vreg_l15b_1p8: ldo15 {
regulator-min-microvolt = <1800000>;
regulator-max-microvolt = <1800000>;
regulator-initial-mode = <RPMH_REGULATOR_MODE_HPM>;
+ regulator-always-on;
};
vreg_l17b_2p5: ldo17 {
--
2.45.3
From: Mario Limonciello <mario.limonciello(a)amd.com>
[Why]
A slab-use-after-free is reported when HDCP is destroyed but the
property_validate_dwork queue is still running.
[How]
Cancel the delayed work when destroying workqueue.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4006
Fixes: da3fd7ac0bcf ("drm/amd/display: Update CP property based on HW query")
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Alex Hung <alex.hung(a)amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
index 8238cfd276be..6a4b5f4d8a9d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
@@ -455,6 +455,7 @@ void hdcp_destroy(struct kobject *kobj, struct hdcp_workqueue *hdcp_work)
for (i = 0; i < hdcp_work->max_link; i++) {
cancel_delayed_work_sync(&hdcp_work[i].callback_dwork);
cancel_delayed_work_sync(&hdcp_work[i].watchdog_timer_dwork);
+ cancel_delayed_work_sync(&hdcp_work[i].property_validate_dwork);
}
sysfs_remove_bin_file(kobj, &hdcp_work[0].attr);
--
2.34.1
From: Leo Li <sunpeng.li(a)amd.com>
[Why]
It seems HPD interrupts are enabled by default for all connectors, even
if the hpd source isn't valid. An eDP for example, does not have a valid
hpd source (but does have a valid hpdrx source; see construct_phy()).
Thus, eDPs should have their hpd interrupt disabled.
In the past, this wasn't really an issue. Although the driver gets
interrupted, then acks by writing to hw registers, there weren't any
subscribed handlers that did anything meaningful (see
register_hpd_handlers()).
But things changed with the introduction of IPS. s2idle requires that
the driver allows IPS for DMUB fw to put hw to sleep. Since register
access requires hw to be awake, the driver will block IPS entry to do
so. And no IPS means no hw sleep during s2idle.
This was the observation on DCN35 systems with an eDP. During suspend,
the eDP toggled its hpd pin as part of the panel power down sequence.
The driver was then interrupted, and acked by writing to registers,
blocking IPS entry.
[How]
Since DC marks eDP connections as having invalid hpd sources (see
construct_phy()), DM should disable them at the hw level. Do so in
amdgpu_dm_hpd_init() by disabling all hpd ints first, then selectively
enabling ones for connectors that have valid hpd sources.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Signed-off-by: Leo Li <sunpeng.li(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
---
.../drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c | 64 +++++++++++++------
1 file changed, 45 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
index 2b63cbab0e87..b61e210f6246 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
@@ -890,8 +890,16 @@ void amdgpu_dm_hpd_init(struct amdgpu_device *adev)
struct drm_device *dev = adev_to_drm(adev);
struct drm_connector *connector;
struct drm_connector_list_iter iter;
+ int irq_type;
int i;
+ /* First, clear all hpd and hpdrx interrupts */
+ for (i = DC_IRQ_SOURCE_HPD1; i <= DC_IRQ_SOURCE_HPD6RX; i++) {
+ if (!dc_interrupt_set(adev->dm.dc, i, false))
+ drm_err(dev, "Failed to clear hpd(rx) source=%d on init\n",
+ i);
+ }
+
drm_connector_list_iter_begin(dev, &iter);
drm_for_each_connector_iter(connector, &iter) {
struct amdgpu_dm_connector *amdgpu_dm_connector;
@@ -904,10 +912,31 @@ void amdgpu_dm_hpd_init(struct amdgpu_device *adev)
dc_link = amdgpu_dm_connector->dc_link;
+ /*
+ * Get a base driver irq reference for hpd ints for the lifetime
+ * of dm. Note that only hpd interrupt types are registered with
+ * base driver; hpd_rx types aren't. IOW, amdgpu_irq_get/put on
+ * hpd_rx isn't available. DM currently controls hpd_rx
+ * explicitly with dc_interrupt_set()
+ */
if (dc_link->irq_source_hpd != DC_IRQ_SOURCE_INVALID) {
- dc_interrupt_set(adev->dm.dc,
- dc_link->irq_source_hpd,
- true);
+ irq_type = dc_link->irq_source_hpd - DC_IRQ_SOURCE_HPD1;
+ /*
+ * TODO: There's a mismatch between mode_info.num_hpd
+ * and what bios reports as the # of connectors with hpd
+ * sources. Since the # of hpd source types registered
+ * with base driver == mode_info.num_hpd, we have to
+ * fallback to dc_interrupt_set for the remaining types.
+ */
+ if (irq_type < adev->mode_info.num_hpd) {
+ if (amdgpu_irq_get(adev, &adev->hpd_irq, irq_type))
+ drm_err(dev, "DM_IRQ: Failed get HPD for source=%d)!\n",
+ dc_link->irq_source_hpd);
+ } else {
+ dc_interrupt_set(adev->dm.dc,
+ dc_link->irq_source_hpd,
+ true);
+ }
}
if (dc_link->irq_source_hpd_rx != DC_IRQ_SOURCE_INVALID) {
@@ -917,12 +946,6 @@ void amdgpu_dm_hpd_init(struct amdgpu_device *adev)
}
}
drm_connector_list_iter_end(&iter);
-
- /* Update reference counts for HPDs */
- for (i = DC_IRQ_SOURCE_HPD1; i <= adev->mode_info.num_hpd; i++) {
- if (amdgpu_irq_get(adev, &adev->hpd_irq, i - DC_IRQ_SOURCE_HPD1))
- drm_err(dev, "DM_IRQ: Failed get HPD for source=%d)!\n", i);
- }
}
/**
@@ -938,7 +961,7 @@ void amdgpu_dm_hpd_fini(struct amdgpu_device *adev)
struct drm_device *dev = adev_to_drm(adev);
struct drm_connector *connector;
struct drm_connector_list_iter iter;
- int i;
+ int irq_type;
drm_connector_list_iter_begin(dev, &iter);
drm_for_each_connector_iter(connector, &iter) {
@@ -952,9 +975,18 @@ void amdgpu_dm_hpd_fini(struct amdgpu_device *adev)
dc_link = amdgpu_dm_connector->dc_link;
if (dc_link->irq_source_hpd != DC_IRQ_SOURCE_INVALID) {
- dc_interrupt_set(adev->dm.dc,
- dc_link->irq_source_hpd,
- false);
+ irq_type = dc_link->irq_source_hpd - DC_IRQ_SOURCE_HPD1;
+
+ /* TODO: See same TODO in amdgpu_dm_hpd_init() */
+ if (irq_type < adev->mode_info.num_hpd) {
+ if (amdgpu_irq_put(adev, &adev->hpd_irq, irq_type))
+ drm_err(dev, "DM_IRQ: Failed put HPD for source=%d!\n",
+ dc_link->irq_source_hpd);
+ } else {
+ dc_interrupt_set(adev->dm.dc,
+ dc_link->irq_source_hpd,
+ false);
+ }
}
if (dc_link->irq_source_hpd_rx != DC_IRQ_SOURCE_INVALID) {
@@ -964,10 +996,4 @@ void amdgpu_dm_hpd_fini(struct amdgpu_device *adev)
}
}
drm_connector_list_iter_end(&iter);
-
- /* Update reference counts for HPDs */
- for (i = DC_IRQ_SOURCE_HPD1; i <= adev->mode_info.num_hpd; i++) {
- if (amdgpu_irq_put(adev, &adev->hpd_irq, i - DC_IRQ_SOURCE_HPD1))
- drm_err(dev, "DM_IRQ: Failed put HPD for source=%d!\n", i);
- }
}
--
2.34.1
The total size calculated for EPC can overflow u64 given the added up page
for SECS. Further, the total size calculated for shmem can overflow even
when the EPC size stays within limits of u64, given that it adds the extra
space for 128 byte PCMD structures (one for each page).
Address this by pre-evaluating the micro-architectural requirement of
SGX: the address space size must be power of two. This is eventually
checked up by ECREATE but the pre-check has the additional benefit of
making sure that there is some space for additional data.
Cc: stable(a)vger.kernel.org # v5.11+
Fixes: 888d24911787 ("x86/sgx: Add SGX_IOC_ENCLAVE_CREATE")
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/linux-sgx/c87e01a0-e7dd-4749-a348-0980d3444f04@stan…
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v3: Updated the commit message according to Dave's suggestion and added
fixes tag.
v2: Simply check the micro-architetural requirement in order to address
Dave's comment:
https://lore.kernel.org/linux-sgx/45e68dea-af6a-4b2a-8249-420f14de3424@inte…
---
arch/x86/kernel/cpu/sgx/ioctl.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b65ab214bdf5..776a20172867 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -64,6 +64,13 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
struct file *backing;
long ret;
+ /*
+ * ECREATE would detect this too, but checking here also ensures
+ * that the 'encl_size' calculations below can never overflow.
+ */
+ if (!is_power_of_2(secs->size))
+ return -EINVAL;
+
va_page = sgx_encl_grow(encl, true);
if (IS_ERR(va_page))
return PTR_ERR(va_page);
--
2.48.1
From: Krister Johansen <kjlx(a)templeofstupid.com>
If multiple connection requests attempt to create an implicit mptcp
endpoint in parallel, more than one caller may end up in
mptcp_pm_nl_append_new_local_addr because none found the address in
local_addr_list during their call to mptcp_pm_nl_get_local_id. In this
case, the concurrent new_local_addr calls may delete the address entry
created by the previous caller. These deletes use synchronize_rcu, but
this is not permitted in some of the contexts where this function may be
called. During packet recv, the caller may be in a rcu read critical
section and have preemption disabled.
An example stack:
BUG: scheduling while atomic: swapper/2/0/0x00000302
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
dump_stack (lib/dump_stack.c:124)
__schedule_bug (kernel/sched/core.c:5943)
schedule_debug.constprop.0 (arch/x86/include/asm/preempt.h:33 kernel/sched/core.c:5970)
__schedule (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:29 kernel/sched/core.c:6621)
schedule (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:6804 kernel/sched/core.c:6818)
schedule_timeout (kernel/time/timer.c:2160)
wait_for_completion (kernel/sched/completion.c:96 kernel/sched/completion.c:116 kernel/sched/completion.c:127 kernel/sched/completion.c:148)
__wait_rcu_gp (include/linux/rcupdate.h:311 kernel/rcu/update.c:444)
synchronize_rcu (kernel/rcu/tree.c:3609)
mptcp_pm_nl_append_new_local_addr (net/mptcp/pm_netlink.c:966 net/mptcp/pm_netlink.c:1061)
mptcp_pm_nl_get_local_id (net/mptcp/pm_netlink.c:1164)
mptcp_pm_get_local_id (net/mptcp/pm.c:420)
subflow_check_req (net/mptcp/subflow.c:98 net/mptcp/subflow.c:213)
subflow_v4_route_req (net/mptcp/subflow.c:305)
tcp_conn_request (net/ipv4/tcp_input.c:7216)
subflow_v4_conn_request (net/mptcp/subflow.c:651)
tcp_rcv_state_process (net/ipv4/tcp_input.c:6709)
tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1934)
tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2334)
ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205 (discriminator 1))
ip_local_deliver_finish (include/linux/rcupdate.h:813 net/ipv4/ip_input.c:234)
ip_local_deliver (include/linux/netfilter.h:314 include/linux/netfilter.h:308 net/ipv4/ip_input.c:254)
ip_sublist_rcv_finish (include/net/dst.h:461 net/ipv4/ip_input.c:580)
ip_sublist_rcv (net/ipv4/ip_input.c:640)
ip_list_rcv (net/ipv4/ip_input.c:675)
__netif_receive_skb_list_core (net/core/dev.c:5583 net/core/dev.c:5631)
netif_receive_skb_list_internal (net/core/dev.c:5685 net/core/dev.c:5774)
napi_complete_done (include/linux/list.h:37 include/net/gro.h:449 include/net/gro.h:444 net/core/dev.c:6114)
igb_poll (drivers/net/ethernet/intel/igb/igb_main.c:8244) igb
__napi_poll (net/core/dev.c:6582)
net_rx_action (net/core/dev.c:6653 net/core/dev.c:6787)
handle_softirqs (kernel/softirq.c:553)
__irq_exit_rcu (kernel/softirq.c:588 kernel/softirq.c:427 kernel/softirq.c:636)
irq_exit_rcu (kernel/softirq.c:651)
common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
</IRQ>
This problem seems particularly prevalent if the user advertises an
endpoint that has a different external vs internal address. In the case
where the external address is advertised and multiple connections
already exist, multiple subflow SYNs arrive in parallel which tends to
trigger the race during creation of the first local_addr_list entries
which have the internal address instead.
Fix by skipping the replacement of an existing implicit local address if
called via mptcp_pm_nl_get_local_id.
Fixes: d045b9eb95a9 ("mptcp: introduce implicit endpoints")
Cc: stable(a)vger.kernel.org
Suggested-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Krister Johansen <kjlx(a)templeofstupid.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
net/mptcp/pm_netlink.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index c0e47f4f7b1aa2fedf615c44ea595c1f9d2528f9..7868207c4e9d9d7d4855ca3fadc02506637708ba 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -977,7 +977,7 @@ static void __mptcp_pm_release_addr_entry(struct mptcp_pm_addr_entry *entry)
static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
struct mptcp_pm_addr_entry *entry,
- bool needs_id)
+ bool needs_id, bool replace)
{
struct mptcp_pm_addr_entry *cur, *del_entry = NULL;
unsigned int addr_max;
@@ -1017,6 +1017,17 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
if (entry->addr.id)
goto out;
+ /* allow callers that only need to look up the local
+ * addr's id to skip replacement. This allows them to
+ * avoid calling synchronize_rcu in the packet recv
+ * path.
+ */
+ if (!replace) {
+ kfree(entry);
+ ret = cur->addr.id;
+ goto out;
+ }
+
pernet->addrs--;
entry->addr.id = cur->addr.id;
list_del_rcu(&cur->list);
@@ -1165,7 +1176,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc
entry->ifindex = 0;
entry->flags = MPTCP_PM_ADDR_FLAG_IMPLICIT;
entry->lsk = NULL;
- ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true);
+ ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true, false);
if (ret < 0)
kfree(entry);
@@ -1433,7 +1444,8 @@ int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info)
}
}
ret = mptcp_pm_nl_append_new_local_addr(pernet, entry,
- !mptcp_pm_has_addr_attr_id(attr, info));
+ !mptcp_pm_has_addr_attr_id(attr, info),
+ true);
if (ret < 0) {
GENL_SET_ERR_MSG_FMT(info, "too many addresses or duplicate one: %d", ret);
goto out_free;
---
base-commit: 64e6a754d33d31aa844b3ee66fb93ac84ca1565e
change-id: 20250303-net-mptcp-fix-sched-while-atomic-cec8ab2b9b8d
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Since the driver is broken in the case that src->freq_supported is not
NULL but src->freq_supported_num is 0, add an assertion for it.
Signed-off-by: Jiasheng Jiang <jiashengjiangcool(a)gmail.com>
---
Changelog:
v3 -> v4:
1. Add return after WARN_ON().
v2 -> v3:
1. Add "net-next" to the subject.
2. Remove the "Fixes" tag and "Cc: stable".
3. Replace BUG_ON with WARN_ON.
v1 -> v2:
1. Replace the check with an assertion.
---
drivers/dpll/dpll_core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
index 32019dc33cca..940c26b9dd53 100644
--- a/drivers/dpll/dpll_core.c
+++ b/drivers/dpll/dpll_core.c
@@ -443,8 +443,11 @@ static void dpll_pin_prop_free(struct dpll_pin_properties *prop)
static int dpll_pin_prop_dup(const struct dpll_pin_properties *src,
struct dpll_pin_properties *dst)
{
+ if (WARN_ON(src->freq_supported && !src->freq_supported_num))
+ return -EINVAL;
+
memcpy(dst, src, sizeof(*dst));
- if (src->freq_supported && src->freq_supported_num) {
+ if (src->freq_supported) {
size_t freq_size = src->freq_supported_num *
sizeof(*src->freq_supported);
dst->freq_supported = kmemdup(src->freq_supported,
--
2.25.1
Commit a8091f039c1e ("maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW
states") adds more status during maple tree walk. But it introduce a
typo on the status check during walk.
It expects to mean neither active nor start, we would restart the walk,
while current code means we would always restart the walk.
Fixes: a8091f039c1e ("maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
CC: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
CC: <stable(a)vger.kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
---
lib/maple_tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 0696e8d1c4e9..81970b3a6af7 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4895,7 +4895,7 @@ void *mas_walk(struct ma_state *mas)
{
void *entry;
- if (!mas_is_active(mas) || !mas_is_start(mas))
+ if (!mas_is_active(mas) && !mas_is_start(mas))
mas->status = ma_start;
retry:
entry = mas_state_walk(mas);
--
2.34.1
On destroy, we should set each node dead. But current code miss this
when the maple tree has only the root node.
The reason is mt_destroy_walk() leverage mte_destroy_descend() to set
node dead, but this is skipped since the only root node is a leaf.
Fixes this by setting the node dead if it is a leaf.
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
CC: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
---
v2:
* move the operation into mt_destroy_walk()
* adjust the title accordingly
---
lib/maple_tree.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 4bd5a5be1440..0696e8d1c4e9 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5284,6 +5284,7 @@ static void mt_destroy_walk(struct maple_enode *enode, struct maple_tree *mt,
struct maple_enode *start;
if (mte_is_leaf(enode)) {
+ mte_set_node_dead(enode);
node->type = mte_node_type(enode);
goto free_leaf;
}
--
2.34.1
The quilt patch titled
Subject: mm/migrate: fix shmem xarray update during migration
has been removed from the -mm tree. Its filename was
mm-migrate-fix-shmem-xarray-update-during-migration.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/migrate: fix shmem xarray update during migration
Date: Fri, 28 Feb 2025 12:49:53 -0500
Pagecache uses multi-index entries for large folio, so does shmem. Only
swap cache still stores multiple entries for a single large folio. Commit
fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
fixed swap cache but got shmem wrong by storing multiple entries for a
large shmem folio.
This results in a soft lockup as reported by Liu Shixin.
Fix it by storing a single entry for a shmem folio.
Link: https://lkml.kernel.org/r/20250228174953.2222831-1-ziy@nvidia.com
Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: Liu Shixin <liushixin2(a)huawei.com>
Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
Reviewed-by: Shivank Garg <shivankg(a)amd.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Charan Teja Kalla <quic_charante(a)quicinc.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickens <hughd(a)google.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Lance Yang <ioworker0(a)gmail.com>
Cc: Matthew Wilcow (Oracle) <willy(a)infradead.org>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-fix-shmem-xarray-update-during-migration
+++ a/mm/migrate.c
@@ -524,7 +524,11 @@ static int __folio_migrate_mapping(struc
folio_set_swapcache(newfolio);
newfolio->private = folio_get_private(folio);
}
- entries = nr;
+ /* shmem uses high-order entry */
+ if (!folio_test_anon(folio))
+ entries = 1;
+ else
+ entries = nr;
} else {
VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
entries = 1;
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
selftests-mm-make-file-backed-thp-split-work-by-writing-pmd-size-data.patch
mm-huge_memory-allow-split-shmem-large-folio-to-any-lower-order.patch
selftests-mm-test-splitting-file-backed-thp-to-any-lower-order.patch
mm-filemap-use-xas_try_split-in-__filemap_add_folio.patch
mm-shmem-use-xas_try_split-in-shmem_split_large_entry.patch
Syzkaller reports a NULL pointer dereference issue in input_event().
BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:68 [inline]
BUG: KASAN: null-ptr-deref in _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline]
BUG: KASAN: null-ptr-deref in is_event_supported drivers/input/input.c:67 [inline]
BUG: KASAN: null-ptr-deref in input_event+0x42/0xa0 drivers/input/input.c:395
Read of size 8 at addr 0000000000000028 by task syz-executor199/2949
CPU: 0 UID: 0 PID: 2949 Comm: syz-executor199 Not tainted 6.13.0-rc4-syzkaller-00076-gf097a36ef88d #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
kasan_report+0xd9/0x110 mm/kasan/report.c:602
check_region_inline mm/kasan/generic.c:183 [inline]
kasan_check_range+0xef/0x1a0 mm/kasan/generic.c:189
instrument_atomic_read include/linux/instrumented.h:68 [inline]
_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline]
is_event_supported drivers/input/input.c:67 [inline]
input_event+0x42/0xa0 drivers/input/input.c:395
input_report_key include/linux/input.h:439 [inline]
key_down drivers/hid/hid-appleir.c:159 [inline]
appleir_raw_event+0x3e5/0x5e0 drivers/hid/hid-appleir.c:232
__hid_input_report.constprop.0+0x312/0x440 drivers/hid/hid-core.c:2111
hid_ctrl+0x49f/0x550 drivers/hid/usbhid/hid-core.c:484
__usb_hcd_giveback_urb+0x389/0x6e0 drivers/usb/core/hcd.c:1650
usb_hcd_giveback_urb+0x396/0x450 drivers/usb/core/hcd.c:1734
dummy_timer+0x17f7/0x3960 drivers/usb/gadget/udc/dummy_hcd.c:1993
__run_hrtimer kernel/time/hrtimer.c:1739 [inline]
__hrtimer_run_queues+0x20a/0xae0 kernel/time/hrtimer.c:1803
hrtimer_run_softirq+0x17d/0x350 kernel/time/hrtimer.c:1820
handle_softirqs+0x206/0x8d0 kernel/softirq.c:561
__do_softirq kernel/softirq.c:595 [inline]
invoke_softirq kernel/softirq.c:435 [inline]
__irq_exit_rcu+0xfa/0x160 kernel/softirq.c:662
irq_exit_rcu+0x9/0x30 kernel/softirq.c:678
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
sysvec_apic_timer_interrupt+0x90/0xb0 arch/x86/kernel/apic/apic.c:1049
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
__mod_timer+0x8f6/0xdc0 kernel/time/timer.c:1185
add_timer+0x62/0x90 kernel/time/timer.c:1295
schedule_timeout+0x11f/0x280 kernel/time/sleep_timeout.c:98
usbhid_wait_io+0x1c7/0x380 drivers/hid/usbhid/hid-core.c:645
usbhid_init_reports+0x19f/0x390 drivers/hid/usbhid/hid-core.c:784
hiddev_ioctl+0x1133/0x15b0 drivers/hid/usbhid/hiddev.c:794
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl fs/ioctl.c:892 [inline]
__x64_sys_ioctl+0x190/0x200 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
This happens due to the malformed report items sent by the emulated device
which results in a report, that has no fields, being added to the report list.
Due to this appleir_input_configured() is never called, hidinput_connect()
fails which results in the HID_CLAIMED_INPUT flag is not being set. However,
it does not make appleir_probe() fail and lets the event callback to be
called without the associated input device.
Thus, add a check for the HID_CLAIMED_INPUT flag and leave the event hook
early if the driver didn't claim any input_dev for some reason. Moreover,
some other hid drivers accessing input_dev in their event callbacks do have
similar checks, too.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 9a4a5574ce42 ("HID: appleir: add support for Apple ir devices")
Cc: stable(a)vger.kernel.org
Signed-off-by: Daniil Dulov <d.dulov(a)aladdin.ru>
---
drivers/hid/hid-appleir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/hid-appleir.c b/drivers/hid/hid-appleir.c
index 8deded185725..c45e5aa569d2 100644
--- a/drivers/hid/hid-appleir.c
+++ b/drivers/hid/hid-appleir.c
@@ -188,7 +188,7 @@ static int appleir_raw_event(struct hid_device *hid, struct hid_report *report,
static const u8 flatbattery[] = { 0x25, 0x87, 0xe0 };
unsigned long flags;
- if (len != 5)
+ if (len != 5 || !(hid->claimed & HID_CLAIMED_INPUT))
goto out;
if (!memcmp(data, keydown, sizeof(keydown))) {
--
2.34.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c157d351460bcf202970e97e611cb6b54a3dd4a4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030416-straining-detergent-2e6d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c157d351460bcf202970e97e611cb6b54a3dd4a4 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 25 Feb 2025 23:37:08 +0100
Subject: [PATCH] intel_idle: Handle older CPUs, which stop the TSC in deeper C
states, correctly
The Intel idle driver is preferred over the ACPI processor idle driver,
but fails to implement the work around for Core2 generation CPUs, where
the TSC stops in C2 and deeper C-states. This causes stalls and boot
delays, when the clocksource watchdog does not catch the unstable TSC
before the CPU goes deep idle for the first time.
The ACPI driver marks the TSC unstable when it detects that the CPU
supports C2 or deeper and the CPU does not have a non-stop TSC.
Add the equivivalent work around to the Intel idle driver to cure that.
Fixes: 18734958e9bf ("intel_idle: Use ACPI _CST for processor models without C-state tables")
Reported-by: Fab Stz <fabstz-it(a)yahoo.fr>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Fab Stz <fabstz-it(a)yahoo.fr>
Cc: All applicable <stable(a)vger.kernel.org>
Closes: https://lore.kernel.org/all/10cf96aa-1276-4bd4-8966-c890377030c3@yahoo.fr
Link: https://patch.msgid.link/87bjupfy7f.ffs@tglx
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 118fe1d37c22..0fdb1d1316c4 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -56,6 +56,7 @@
#include <asm/intel-family.h>
#include <asm/mwait.h>
#include <asm/spec-ctrl.h>
+#include <asm/tsc.h>
#include <asm/fpu/api.h>
#define INTEL_IDLE_VERSION "0.5.1"
@@ -1799,6 +1800,9 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
if (intel_idle_state_needs_timer_stop(state))
state->flags |= CPUIDLE_FLAG_TIMER_STOP;
+ if (cx->type > ACPI_STATE_C1 && !boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+ mark_tsc_unstable("TSC halts in idle");
+
state->enter = intel_idle;
state->enter_s2idle = intel_idle_s2idle;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c157d351460bcf202970e97e611cb6b54a3dd4a4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030415-uncombed-drinking-e339@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c157d351460bcf202970e97e611cb6b54a3dd4a4 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 25 Feb 2025 23:37:08 +0100
Subject: [PATCH] intel_idle: Handle older CPUs, which stop the TSC in deeper C
states, correctly
The Intel idle driver is preferred over the ACPI processor idle driver,
but fails to implement the work around for Core2 generation CPUs, where
the TSC stops in C2 and deeper C-states. This causes stalls and boot
delays, when the clocksource watchdog does not catch the unstable TSC
before the CPU goes deep idle for the first time.
The ACPI driver marks the TSC unstable when it detects that the CPU
supports C2 or deeper and the CPU does not have a non-stop TSC.
Add the equivivalent work around to the Intel idle driver to cure that.
Fixes: 18734958e9bf ("intel_idle: Use ACPI _CST for processor models without C-state tables")
Reported-by: Fab Stz <fabstz-it(a)yahoo.fr>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Fab Stz <fabstz-it(a)yahoo.fr>
Cc: All applicable <stable(a)vger.kernel.org>
Closes: https://lore.kernel.org/all/10cf96aa-1276-4bd4-8966-c890377030c3@yahoo.fr
Link: https://patch.msgid.link/87bjupfy7f.ffs@tglx
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 118fe1d37c22..0fdb1d1316c4 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -56,6 +56,7 @@
#include <asm/intel-family.h>
#include <asm/mwait.h>
#include <asm/spec-ctrl.h>
+#include <asm/tsc.h>
#include <asm/fpu/api.h>
#define INTEL_IDLE_VERSION "0.5.1"
@@ -1799,6 +1800,9 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
if (intel_idle_state_needs_timer_stop(state))
state->flags |= CPUIDLE_FLAG_TIMER_STOP;
+ if (cx->type > ACPI_STATE_C1 && !boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+ mark_tsc_unstable("TSC halts in idle");
+
state->enter = intel_idle;
state->enter_s2idle = intel_idle_s2idle;
}
The userprog infrastructure links objects files through $(CC).
Either explicitly by manually calling $(CC) on multiple object files or
implicitly by directly compiling a source file to an executable.
The documentation at Documentation/kbuild/llvm.rst indicates that ld.lld
would be used for linking if LLVM=1 is specified.
However clang instead will use either a globally installed cross linker
from $PATH called ${target}-ld or fall back to the system linker, which
probably does not support crosslinking.
For the normal kernel build this is not an issue because the linker is
always executed directly, without the compiler being involved.
Explicitly pass --ld-path to clang so $(LD) is respected.
As clang 13.0.1 is required to build the kernel, this option is available.
Fixes: 7f3a59db274c ("kbuild: add infrastructure to build userspace programs")
Cc: stable(a)vger.kernel.org # needs wrapping in $(cc-option) for < 6.9
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Reproducer, using nolibc to avoid libc requirements for cross building:
$ tail -2 init/Makefile
userprogs-always-y += test-llvm
test-llvm-userccflags += -nostdlib -nolibc -static -isystem usr/ -include $(srctree)/tools/include/nolibc/nolibc.h
$ cat init/test-llvm.c
int main(void)
{
return 0;
}
$ make ARCH=arm64 LLVM=1 allnoconfig headers_install init/
Validate that init/test-llvm builds and has the correct binary format.
---
Changes in v2:
- Use --ld-path instead of -fuse-ld
- Drop already applied patch
- Link to v1: https://lore.kernel.org/r/20250213-kbuild-userprog-fixes-v1-0-f255fb477d98@…
---
Makefile | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/Makefile b/Makefile
index 96407c1d6be167b04ed5883e455686918c7a75ee..b41c164533d781d010ff8b2522e523164dc375d0 100644
--- a/Makefile
+++ b/Makefile
@@ -1123,6 +1123,11 @@ endif
KBUILD_USERCFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS))
KBUILD_USERLDFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS))
+# userspace programs are linked via the compiler, use the correct linker
+ifeq ($(CONFIG_CC_IS_CLANG)$(CONFIG_LD_IS_LLD),yy)
+KBUILD_USERLDFLAGS += --ld-path=$(LD)
+endif
+
# make the checker run with the right architecture
CHECKFLAGS += --arch=$(ARCH)
---
base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
change-id: 20250213-kbuild-userprog-fixes-4f07b62ae818
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
If a userptr vma subject to prefetching was already invalidated
or invalidated during the prefetch operation, the operation would
repeatedly return -EAGAIN which would typically cause an infinite
loop.
Validate the userptr to ensure this doesn't happen.
v2:
- Don't fallthrough from UNMAP to PREFETCH (Matthew Brost)
Fixes: 5bd24e78829a ("drm/xe/vm: Subclass userptr vmas")
Fixes: 617eebb9c480 ("drm/xe: Fix array of binds")
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.9+
Suggested-by: Matthew Brost <matthew.brost(a)intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 996000f2424e..6fdc17be619e 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2306,8 +2306,17 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
break;
}
case DRM_GPUVA_OP_UNMAP:
+ xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
+ break;
case DRM_GPUVA_OP_PREFETCH:
- /* FIXME: Need to skip some prefetch ops */
+ vma = gpuva_to_vma(op->base.prefetch.va);
+
+ if (xe_vma_is_userptr(vma)) {
+ err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
+ if (err)
+ return err;
+ }
+
xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask);
break;
default:
--
2.48.1
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 2b90e7ace79774a3540ce569e000388f8d22c9e0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030421-disgrace-wince-4786@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b90e7ace79774a3540ce569e000388f8d22c9e0 Mon Sep 17 00:00:00 2001
From: Peter Jones <pjones(a)redhat.com>
Date: Wed, 26 Feb 2025 15:18:39 -0500
Subject: [PATCH] efi: Don't map the entire mokvar table to determine its size
Currently, when validating the mokvar table, we (re)map the entire table
on each iteration of the loop, adding space as we discover new entries.
If the table grows over a certain size, this fails due to limitations of
early_memmap(), and we get a failure and traceback:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:139 __early_ioremap+0xef/0x220
...
Call Trace:
<TASK>
? __early_ioremap+0xef/0x220
? __warn.cold+0x93/0xfa
? __early_ioremap+0xef/0x220
? report_bug+0xff/0x140
? early_fixup_exception+0x5d/0xb0
? early_idt_handler_common+0x2f/0x3a
? __early_ioremap+0xef/0x220
? efi_mokvar_table_init+0xce/0x1d0
? setup_arch+0x864/0xc10
? start_kernel+0x6b/0xa10
? x86_64_start_reservations+0x24/0x30
? x86_64_start_kernel+0xed/0xf0
? common_startup_64+0x13e/0x141
</TASK>
---[ end trace 0000000000000000 ]---
mokvar: Failed to map EFI MOKvar config table pa=0x7c4c3000, size=265187.
Mapping the entire structure isn't actually necessary, as we don't ever
need more than one entry header mapped at once.
Changes efi_mokvar_table_init() to only map each entry header, not the
entire table, when determining the table size. Since we're not mapping
any data past the variable name, it also changes the code to enforce
that each variable name is NUL terminated, rather than attempting to
verify it in place.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Peter Jones <pjones(a)redhat.com>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
diff --git a/drivers/firmware/efi/mokvar-table.c b/drivers/firmware/efi/mokvar-table.c
index 5ed0602c2f75..d865cb1dbaad 100644
--- a/drivers/firmware/efi/mokvar-table.c
+++ b/drivers/firmware/efi/mokvar-table.c
@@ -103,7 +103,6 @@ void __init efi_mokvar_table_init(void)
void *va = NULL;
unsigned long cur_offset = 0;
unsigned long offset_limit;
- unsigned long map_size = 0;
unsigned long map_size_needed = 0;
unsigned long size;
struct efi_mokvar_table_entry *mokvar_entry;
@@ -134,48 +133,34 @@ void __init efi_mokvar_table_init(void)
*/
err = -EINVAL;
while (cur_offset + sizeof(*mokvar_entry) <= offset_limit) {
- mokvar_entry = va + cur_offset;
- map_size_needed = cur_offset + sizeof(*mokvar_entry);
- if (map_size_needed > map_size) {
- if (va)
- early_memunmap(va, map_size);
- /*
- * Map a little more than the fixed size entry
- * header, anticipating some data. It's safe to
- * do so as long as we stay within current memory
- * descriptor.
- */
- map_size = min(map_size_needed + 2*EFI_PAGE_SIZE,
- offset_limit);
- va = early_memremap(efi.mokvar_table, map_size);
- if (!va) {
- pr_err("Failed to map EFI MOKvar config table pa=0x%lx, size=%lu.\n",
- efi.mokvar_table, map_size);
- return;
- }
- mokvar_entry = va + cur_offset;
+ if (va)
+ early_memunmap(va, sizeof(*mokvar_entry));
+ va = early_memremap(efi.mokvar_table + cur_offset, sizeof(*mokvar_entry));
+ if (!va) {
+ pr_err("Failed to map EFI MOKvar config table pa=0x%lx, size=%zu.\n",
+ efi.mokvar_table + cur_offset, sizeof(*mokvar_entry));
+ return;
}
+ mokvar_entry = va;
/* Check for last sentinel entry */
if (mokvar_entry->name[0] == '\0') {
if (mokvar_entry->data_size != 0)
break;
err = 0;
+ map_size_needed = cur_offset + sizeof(*mokvar_entry);
break;
}
- /* Sanity check that the name is null terminated */
- size = strnlen(mokvar_entry->name,
- sizeof(mokvar_entry->name));
- if (size >= sizeof(mokvar_entry->name))
- break;
+ /* Enforce that the name is NUL terminated */
+ mokvar_entry->name[sizeof(mokvar_entry->name) - 1] = '\0';
/* Advance to the next entry */
- cur_offset = map_size_needed + mokvar_entry->data_size;
+ cur_offset += sizeof(*mokvar_entry) + mokvar_entry->data_size;
}
if (va)
- early_memunmap(va, map_size);
+ early_memunmap(va, sizeof(*mokvar_entry));
if (err) {
pr_err("EFI MOKvar config table is not valid\n");
return;
The pnfs that we obtain from hmm_range_fault() point to pages that
we don't have a reference on, and the guarantee that they are still
in the cpu page-tables is that the notifier lock must be held and the
notifier seqno is still valid.
So while building the sg table and marking the pages accesses / dirty
we need to hold this lock with a validated seqno.
However, the lock is reclaim tainted which makes
sg_alloc_table_from_pages_segment() unusable, since it internally
allocates memory.
Instead build the sg-table manually. For the non-iommu case
this might lead to fewer coalesces, but if that's a problem it can
be fixed up later in the resource cursor code. For the iommu case,
the whole sg-table may still be coalesced to a single contigous
device va region.
This avoids marking pages that we don't own dirty and accessed, and
it also avoid dereferencing struct pages that we don't own.
v2:
- Use assert to check whether hmm pfns are valid (Matthew Auld)
- Take into account that large pages may cross range boundaries
(Matthew Auld)
v3:
- Don't unnecessarily check for a non-freed sg-table. (Matthew Auld)
- Add a missing up_read() in an error path. (Matthew Auld)
Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr")
Cc: Oak Zeng <oak.zeng(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.10+
Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
---
drivers/gpu/drm/xe/xe_hmm.c | 112 +++++++++++++++++++++++++++---------
1 file changed, 86 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
index c56738fa713b..d33dd7bc3f41 100644
--- a/drivers/gpu/drm/xe/xe_hmm.c
+++ b/drivers/gpu/drm/xe/xe_hmm.c
@@ -42,6 +42,42 @@ static void xe_mark_range_accessed(struct hmm_range *range, bool write)
}
}
+static int xe_alloc_sg(struct xe_device *xe, struct sg_table *st,
+ struct hmm_range *range, struct rw_semaphore *notifier_sem)
+{
+ unsigned long i, npages, hmm_pfn;
+ unsigned long num_chunks = 0;
+ int ret;
+
+ /* HMM docs says this is needed. */
+ ret = down_read_interruptible(notifier_sem);
+ if (ret)
+ return ret;
+
+ if (mmu_interval_read_retry(range->notifier, range->notifier_seq)) {
+ up_read(notifier_sem);
+ return -EAGAIN;
+ }
+
+ npages = xe_npages_in_range(range->start, range->end);
+ for (i = 0; i < npages;) {
+ unsigned long len;
+
+ hmm_pfn = range->hmm_pfns[i];
+ xe_assert(xe, hmm_pfn & HMM_PFN_VALID);
+
+ len = 1UL << hmm_pfn_to_map_order(hmm_pfn);
+
+ /* If order > 0 the page may extend beyond range->start */
+ len -= (hmm_pfn & ~HMM_PFN_FLAGS) & (len - 1);
+ i += len;
+ num_chunks++;
+ }
+ up_read(notifier_sem);
+
+ return sg_alloc_table(st, num_chunks, GFP_KERNEL);
+}
+
/**
* xe_build_sg() - build a scatter gather table for all the physical pages/pfn
* in a hmm_range. dma-map pages if necessary. dma-address is save in sg table
@@ -50,6 +86,7 @@ static void xe_mark_range_accessed(struct hmm_range *range, bool write)
* @range: the hmm range that we build the sg table from. range->hmm_pfns[]
* has the pfn numbers of pages that back up this hmm address range.
* @st: pointer to the sg table.
+ * @notifier_sem: The xe notifier lock.
* @write: whether we write to this range. This decides dma map direction
* for system pages. If write we map it bi-diretional; otherwise
* DMA_TO_DEVICE
@@ -76,38 +113,41 @@ static void xe_mark_range_accessed(struct hmm_range *range, bool write)
* Returns 0 if successful; -ENOMEM if fails to allocate memory
*/
static int xe_build_sg(struct xe_device *xe, struct hmm_range *range,
- struct sg_table *st, bool write)
+ struct sg_table *st,
+ struct rw_semaphore *notifier_sem,
+ bool write)
{
+ unsigned long npages = xe_npages_in_range(range->start, range->end);
struct device *dev = xe->drm.dev;
- struct page **pages;
- u64 i, npages;
- int ret;
+ struct scatterlist *sgl;
+ struct page *page;
+ unsigned long i, j;
- npages = xe_npages_in_range(range->start, range->end);
- pages = kvmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
- if (!pages)
- return -ENOMEM;
+ lockdep_assert_held(notifier_sem);
- for (i = 0; i < npages; i++) {
- pages[i] = hmm_pfn_to_page(range->hmm_pfns[i]);
- xe_assert(xe, !is_device_private_page(pages[i]));
- }
+ i = 0;
+ for_each_sg(st->sgl, sgl, st->nents, j) {
+ unsigned long hmm_pfn, size;
- ret = sg_alloc_table_from_pages_segment(st, pages, npages, 0, npages << PAGE_SHIFT,
- xe_sg_segment_size(dev), GFP_KERNEL);
- if (ret)
- goto free_pages;
+ hmm_pfn = range->hmm_pfns[i];
+ page = hmm_pfn_to_page(hmm_pfn);
+ xe_assert(xe, !is_device_private_page(page));
+
+ size = 1UL << hmm_pfn_to_map_order(hmm_pfn);
+ size -= page_to_pfn(page) & (size - 1);
+ i += size;
- ret = dma_map_sgtable(dev, st, write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE,
- DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
- if (ret) {
- sg_free_table(st);
- st = NULL;
+ if (unlikely(j == st->nents - 1)) {
+ if (i > npages)
+ size -= (i - npages);
+ sg_mark_end(sgl);
+ }
+ sg_set_page(sgl, page, size << PAGE_SHIFT, 0);
}
+ xe_assert(xe, i == npages);
-free_pages:
- kvfree(pages);
- return ret;
+ return dma_map_sgtable(dev, st, write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING);
}
/**
@@ -235,16 +275,36 @@ int xe_hmm_userptr_populate_range(struct xe_userptr_vma *uvma,
if (ret)
goto free_pfns;
- ret = xe_build_sg(vm->xe, &hmm_range, &userptr->sgt, write);
+ ret = xe_alloc_sg(vm->xe, &userptr->sgt, &hmm_range, &vm->userptr.notifier_lock);
if (ret)
goto free_pfns;
+ ret = down_read_interruptible(&vm->userptr.notifier_lock);
+ if (ret)
+ goto free_st;
+
+ if (mmu_interval_read_retry(hmm_range.notifier, hmm_range.notifier_seq)) {
+ ret = -EAGAIN;
+ goto out_unlock;
+ }
+
+ ret = xe_build_sg(vm->xe, &hmm_range, &userptr->sgt,
+ &vm->userptr.notifier_lock, write);
+ if (ret)
+ goto out_unlock;
+
xe_mark_range_accessed(&hmm_range, write);
userptr->sg = &userptr->sgt;
userptr->notifier_seq = hmm_range.notifier_seq;
+ up_read(&vm->userptr.notifier_lock);
+ kvfree(pfns);
+ return 0;
+out_unlock:
+ up_read(&vm->userptr.notifier_lock);
+free_st:
+ sg_free_table(&userptr->sgt);
free_pfns:
kvfree(pfns);
return ret;
}
-
--
2.48.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 15b3b3254d1453a8db038b7d44b311a2d6c71f98
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030445-collected-spoken-1e75@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 15b3b3254d1453a8db038b7d44b311a2d6c71f98 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Sat, 15 Feb 2025 11:11:29 +0000
Subject: [PATCH] btrfs: do regular iput instead of delayed iput during extent
map shrinking
The extent map shrinker now runs in the system unbound workqueue and no
longer in kswapd context so it can directly do an iput() on inodes even
if that blocks or needs to acquire any lock (we aren't holding any locks
when requesting the delayed iput from the shrinker). So we don't need to
add a delayed iput, wake up the cleaner and delegate the iput() to the
cleaner, which also adds extra contention on the spinlock that protects
the delayed iputs list.
Reported-by: Ivan Shapovalov <intelfx(a)intelfx.name>
Tested-by: Ivan Shapovalov <intelfx(a)intelfx.name>
Link: https://lore.kernel.org/linux-btrfs/0414d690ac5680d0d77dfc930606cdc36e42e12…
CC: stable(a)vger.kernel.org # 6.12+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 8c6b85ffd18f..7f46abbd6311 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -1256,7 +1256,7 @@ static long btrfs_scan_root(struct btrfs_root *root, struct btrfs_em_shrink_ctx
min_ino = btrfs_ino(inode) + 1;
fs_info->em_shrinker_last_ino = btrfs_ino(inode);
- btrfs_add_delayed_iput(inode);
+ iput(&inode->vfs_inode);
if (ctx->scanned >= ctx->nr_to_scan || btrfs_fs_closing(fs_info))
break;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x fb8179ce2996bffaa36a04e2b6262843b01b7565
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030439-announcer-lard-995b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb8179ce2996bffaa36a04e2b6262843b01b7565 Mon Sep 17 00:00:00 2001
From: Rob Herring <robh(a)kernel.org>
Date: Mon, 4 Nov 2024 13:03:13 -0600
Subject: [PATCH] riscv: cacheinfo: Use of_property_present() for non-boolean
properties
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The use of of_property_read_bool() for non-boolean properties is
deprecated in favor of of_property_present() when testing for property
presence.
Signed-off-by: Rob Herring (Arm) <robh(a)kernel.org>
Reviewed-by: Clément Léger <cleger(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code")
Link: https://lore.kernel.org/r/20241104190314.270095-1-robh@kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
index 2d40736fc37c..26b085dbdd07 100644
--- a/arch/riscv/kernel/cacheinfo.c
+++ b/arch/riscv/kernel/cacheinfo.c
@@ -108,11 +108,11 @@ int populate_cache_leaves(unsigned int cpu)
if (!np)
return -ENOENT;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
prev = np;
@@ -125,11 +125,11 @@ int populate_cache_leaves(unsigned int cpu)
break;
if (level <= levels)
break;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
levels = level;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 564fc8eb6f78e01292ff10801f318feae6153fdd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030452-rural-foe-f342@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 564fc8eb6f78e01292ff10801f318feae6153fdd Mon Sep 17 00:00:00 2001
From: Yong-Xuan Wang <yongxuan.wang(a)sifive.com>
Date: Fri, 20 Dec 2024 16:39:24 +0800
Subject: [PATCH] riscv: signal: fix signal_minsigstksz
The init_rt_signal_env() funciton is called before the alternative patch
is applied, so using the alternative-related API to check the availability
of an extension within this function doesn't have the intended effect.
This patch reorders the init_rt_signal_env() and apply_boot_alternatives()
to get the correct signal_minsigstksz.
Fixes: e92f469b0771 ("riscv: signal: Report signal frame size to userspace via auxv")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang(a)sifive.com>
Reviewed-by: Zong Li <zong.li(a)sifive.com>
Reviewed-by: Andy Chiu <andybnac(a)gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20241220083926.19453-3-yongxuan.wang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index f1793630fc51..4fe45daa6281 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -322,8 +322,8 @@ void __init setup_arch(char **cmdline_p)
riscv_init_cbo_blocksizes();
riscv_fill_hwcap();
- init_rt_signal_env();
apply_boot_alternatives();
+ init_rt_signal_env();
if (IS_ENABLED(CONFIG_RISCV_ISA_ZICBOM) &&
riscv_isa_extension_available(NULL, ZICBOM))
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x fb8179ce2996bffaa36a04e2b6262843b01b7565
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030440-viscous-cucumber-2e7c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb8179ce2996bffaa36a04e2b6262843b01b7565 Mon Sep 17 00:00:00 2001
From: Rob Herring <robh(a)kernel.org>
Date: Mon, 4 Nov 2024 13:03:13 -0600
Subject: [PATCH] riscv: cacheinfo: Use of_property_present() for non-boolean
properties
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The use of of_property_read_bool() for non-boolean properties is
deprecated in favor of of_property_present() when testing for property
presence.
Signed-off-by: Rob Herring (Arm) <robh(a)kernel.org>
Reviewed-by: Clément Léger <cleger(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code")
Link: https://lore.kernel.org/r/20241104190314.270095-1-robh@kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
index 2d40736fc37c..26b085dbdd07 100644
--- a/arch/riscv/kernel/cacheinfo.c
+++ b/arch/riscv/kernel/cacheinfo.c
@@ -108,11 +108,11 @@ int populate_cache_leaves(unsigned int cpu)
if (!np)
return -ENOENT;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
prev = np;
@@ -125,11 +125,11 @@ int populate_cache_leaves(unsigned int cpu)
break;
if (level <= levels)
break;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
levels = level;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x fb8179ce2996bffaa36a04e2b6262843b01b7565
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030439-deskwork-ladder-b559@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb8179ce2996bffaa36a04e2b6262843b01b7565 Mon Sep 17 00:00:00 2001
From: Rob Herring <robh(a)kernel.org>
Date: Mon, 4 Nov 2024 13:03:13 -0600
Subject: [PATCH] riscv: cacheinfo: Use of_property_present() for non-boolean
properties
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The use of of_property_read_bool() for non-boolean properties is
deprecated in favor of of_property_present() when testing for property
presence.
Signed-off-by: Rob Herring (Arm) <robh(a)kernel.org>
Reviewed-by: Clément Léger <cleger(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code")
Link: https://lore.kernel.org/r/20241104190314.270095-1-robh@kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
index 2d40736fc37c..26b085dbdd07 100644
--- a/arch/riscv/kernel/cacheinfo.c
+++ b/arch/riscv/kernel/cacheinfo.c
@@ -108,11 +108,11 @@ int populate_cache_leaves(unsigned int cpu)
if (!np)
return -ENOENT;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
prev = np;
@@ -125,11 +125,11 @@ int populate_cache_leaves(unsigned int cpu)
break;
if (level <= levels)
break;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
levels = level;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x fb8179ce2996bffaa36a04e2b6262843b01b7565
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030440-graffiti-good-0bb6@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb8179ce2996bffaa36a04e2b6262843b01b7565 Mon Sep 17 00:00:00 2001
From: Rob Herring <robh(a)kernel.org>
Date: Mon, 4 Nov 2024 13:03:13 -0600
Subject: [PATCH] riscv: cacheinfo: Use of_property_present() for non-boolean
properties
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The use of of_property_read_bool() for non-boolean properties is
deprecated in favor of of_property_present() when testing for property
presence.
Signed-off-by: Rob Herring (Arm) <robh(a)kernel.org>
Reviewed-by: Clément Léger <cleger(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code")
Link: https://lore.kernel.org/r/20241104190314.270095-1-robh@kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
index 2d40736fc37c..26b085dbdd07 100644
--- a/arch/riscv/kernel/cacheinfo.c
+++ b/arch/riscv/kernel/cacheinfo.c
@@ -108,11 +108,11 @@ int populate_cache_leaves(unsigned int cpu)
if (!np)
return -ENOENT;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
prev = np;
@@ -125,11 +125,11 @@ int populate_cache_leaves(unsigned int cpu)
break;
if (level <= levels)
break;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
levels = level;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x fb8179ce2996bffaa36a04e2b6262843b01b7565
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030439-anointer-conceal-ab4c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb8179ce2996bffaa36a04e2b6262843b01b7565 Mon Sep 17 00:00:00 2001
From: Rob Herring <robh(a)kernel.org>
Date: Mon, 4 Nov 2024 13:03:13 -0600
Subject: [PATCH] riscv: cacheinfo: Use of_property_present() for non-boolean
properties
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The use of of_property_read_bool() for non-boolean properties is
deprecated in favor of of_property_present() when testing for property
presence.
Signed-off-by: Rob Herring (Arm) <robh(a)kernel.org>
Reviewed-by: Clément Léger <cleger(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code")
Link: https://lore.kernel.org/r/20241104190314.270095-1-robh@kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
index 2d40736fc37c..26b085dbdd07 100644
--- a/arch/riscv/kernel/cacheinfo.c
+++ b/arch/riscv/kernel/cacheinfo.c
@@ -108,11 +108,11 @@ int populate_cache_leaves(unsigned int cpu)
if (!np)
return -ENOENT;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
prev = np;
@@ -125,11 +125,11 @@ int populate_cache_leaves(unsigned int cpu)
break;
if (level <= levels)
break;
- if (of_property_read_bool(np, "cache-size"))
+ if (of_property_present(np, "cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_UNIFIED, level);
- if (of_property_read_bool(np, "i-cache-size"))
+ if (of_property_present(np, "i-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_INST, level);
- if (of_property_read_bool(np, "d-cache-size"))
+ if (of_property_present(np, "d-cache-size"))
ci_leaf_init(this_leaf++, CACHE_TYPE_DATA, level);
levels = level;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 599c44cd21f4967774e0acf58f734009be4aea9a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030425-shifty-dipper-3fc6@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 599c44cd21f4967774e0acf58f734009be4aea9a Mon Sep 17 00:00:00 2001
From: Andreas Schwab <schwab(a)suse.de>
Date: Mon, 3 Feb 2025 11:06:00 +0100
Subject: [PATCH] riscv/futex: sign extend compare value in atomic cmpxchg
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Make sure the compare value in the lr/sc loop is sign extended to match
what lr.w does. Fortunately, due to the compiler keeping the register
contents sign extended anyway the lack of the explicit extension didn't
result in wrong code so far, but this cannot be relied upon.
Fixes: b90edb33010b ("RISC-V: Add futex support.")
Signed-off-by: Andreas Schwab <schwab(a)suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Björn Töpel <bjorn(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/mvmfrkv2vhz.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/include/asm/futex.h b/arch/riscv/include/asm/futex.h
index 72be100afa23..90c86b115e00 100644
--- a/arch/riscv/include/asm/futex.h
+++ b/arch/riscv/include/asm/futex.h
@@ -93,7 +93,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %[r]) \
_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %[r]) \
: [r] "+r" (ret), [v] "=&r" (val), [u] "+m" (*uaddr), [t] "=&r" (tmp)
- : [ov] "Jr" (oldval), [nv] "Jr" (newval)
+ : [ov] "Jr" ((long)(int)oldval), [nv] "Jr" (newval)
: "memory");
__disable_user_access();
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 599c44cd21f4967774e0acf58f734009be4aea9a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030424-jawed-limes-5ba5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 599c44cd21f4967774e0acf58f734009be4aea9a Mon Sep 17 00:00:00 2001
From: Andreas Schwab <schwab(a)suse.de>
Date: Mon, 3 Feb 2025 11:06:00 +0100
Subject: [PATCH] riscv/futex: sign extend compare value in atomic cmpxchg
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Make sure the compare value in the lr/sc loop is sign extended to match
what lr.w does. Fortunately, due to the compiler keeping the register
contents sign extended anyway the lack of the explicit extension didn't
result in wrong code so far, but this cannot be relied upon.
Fixes: b90edb33010b ("RISC-V: Add futex support.")
Signed-off-by: Andreas Schwab <schwab(a)suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Björn Töpel <bjorn(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/mvmfrkv2vhz.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/include/asm/futex.h b/arch/riscv/include/asm/futex.h
index 72be100afa23..90c86b115e00 100644
--- a/arch/riscv/include/asm/futex.h
+++ b/arch/riscv/include/asm/futex.h
@@ -93,7 +93,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %[r]) \
_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %[r]) \
: [r] "+r" (ret), [v] "=&r" (val), [u] "+m" (*uaddr), [t] "=&r" (tmp)
- : [ov] "Jr" (oldval), [nv] "Jr" (newval)
+ : [ov] "Jr" ((long)(int)oldval), [nv] "Jr" (newval)
: "memory");
__disable_user_access();
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 599c44cd21f4967774e0acf58f734009be4aea9a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030419-satin-cogwheel-ae89@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 599c44cd21f4967774e0acf58f734009be4aea9a Mon Sep 17 00:00:00 2001
From: Andreas Schwab <schwab(a)suse.de>
Date: Mon, 3 Feb 2025 11:06:00 +0100
Subject: [PATCH] riscv/futex: sign extend compare value in atomic cmpxchg
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Make sure the compare value in the lr/sc loop is sign extended to match
what lr.w does. Fortunately, due to the compiler keeping the register
contents sign extended anyway the lack of the explicit extension didn't
result in wrong code so far, but this cannot be relied upon.
Fixes: b90edb33010b ("RISC-V: Add futex support.")
Signed-off-by: Andreas Schwab <schwab(a)suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Björn Töpel <bjorn(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/mvmfrkv2vhz.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/include/asm/futex.h b/arch/riscv/include/asm/futex.h
index 72be100afa23..90c86b115e00 100644
--- a/arch/riscv/include/asm/futex.h
+++ b/arch/riscv/include/asm/futex.h
@@ -93,7 +93,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %[r]) \
_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %[r]) \
: [r] "+r" (ret), [v] "=&r" (val), [u] "+m" (*uaddr), [t] "=&r" (tmp)
- : [ov] "Jr" (oldval), [nv] "Jr" (newval)
+ : [ov] "Jr" ((long)(int)oldval), [nv] "Jr" (newval)
: "memory");
__disable_user_access();
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1898300abf3508bca152e65b36cce5bf93d7e63e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030410-humbling-occupy-f10c@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1898300abf3508bca152e65b36cce5bf93d7e63e Mon Sep 17 00:00:00 2001
From: Andreas Schwab <schwab(a)suse.de>
Date: Thu, 30 Jan 2025 10:25:38 +0100
Subject: [PATCH] riscv/atomic: Do proper sign extension also for unsigned in
arch_cmpxchg
Sign extend also an unsigned compare value to match what lr.w is doing.
Otherwise try_cmpxchg may spuriously return true when used on a u32 value
that has the sign bit set, as it happens often in inode_set_ctime_current.
Do this in three conversion steps. The first conversion to long is needed
to avoid a -Wpointer-to-int-cast warning when arch_cmpxchg is used with a
pointer type. Then convert to int and back to long to always sign extend
the 32-bit value to 64-bit.
Fixes: 6c58f25e6938 ("riscv/atomic: Fix sign extension for RV64I")
Signed-off-by: Andreas Schwab <schwab(a)suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/mvmed0k4prh.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 4cadc56220fe..427c41dde643 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -231,7 +231,7 @@
__arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx, \
sc_prepend, sc_append, \
cas_prepend, cas_append, \
- __ret, __ptr, (long), __old, __new); \
+ __ret, __ptr, (long)(int)(long), __old, __new); \
break; \
case 8: \
__arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx, \
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1898300abf3508bca152e65b36cce5bf93d7e63e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030409-senior-devotion-4685@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1898300abf3508bca152e65b36cce5bf93d7e63e Mon Sep 17 00:00:00 2001
From: Andreas Schwab <schwab(a)suse.de>
Date: Thu, 30 Jan 2025 10:25:38 +0100
Subject: [PATCH] riscv/atomic: Do proper sign extension also for unsigned in
arch_cmpxchg
Sign extend also an unsigned compare value to match what lr.w is doing.
Otherwise try_cmpxchg may spuriously return true when used on a u32 value
that has the sign bit set, as it happens often in inode_set_ctime_current.
Do this in three conversion steps. The first conversion to long is needed
to avoid a -Wpointer-to-int-cast warning when arch_cmpxchg is used with a
pointer type. Then convert to int and back to long to always sign extend
the 32-bit value to 64-bit.
Fixes: 6c58f25e6938 ("riscv/atomic: Fix sign extension for RV64I")
Signed-off-by: Andreas Schwab <schwab(a)suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/mvmed0k4prh.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 4cadc56220fe..427c41dde643 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -231,7 +231,7 @@
__arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx, \
sc_prepend, sc_append, \
cas_prepend, cas_append, \
- __ret, __ptr, (long), __old, __new); \
+ __ret, __ptr, (long)(int)(long), __old, __new); \
break; \
case 8: \
__arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx, \
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1898300abf3508bca152e65b36cce5bf93d7e63e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030409-squirt-handprint-3e7c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1898300abf3508bca152e65b36cce5bf93d7e63e Mon Sep 17 00:00:00 2001
From: Andreas Schwab <schwab(a)suse.de>
Date: Thu, 30 Jan 2025 10:25:38 +0100
Subject: [PATCH] riscv/atomic: Do proper sign extension also for unsigned in
arch_cmpxchg
Sign extend also an unsigned compare value to match what lr.w is doing.
Otherwise try_cmpxchg may spuriously return true when used on a u32 value
that has the sign bit set, as it happens often in inode_set_ctime_current.
Do this in three conversion steps. The first conversion to long is needed
to avoid a -Wpointer-to-int-cast warning when arch_cmpxchg is used with a
pointer type. Then convert to int and back to long to always sign extend
the 32-bit value to 64-bit.
Fixes: 6c58f25e6938 ("riscv/atomic: Fix sign extension for RV64I")
Signed-off-by: Andreas Schwab <schwab(a)suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/mvmed0k4prh.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 4cadc56220fe..427c41dde643 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -231,7 +231,7 @@
__arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx, \
sc_prepend, sc_append, \
cas_prepend, cas_append, \
- __ret, __ptr, (long), __old, __new); \
+ __ret, __ptr, (long)(int)(long), __old, __new); \
break; \
case 8: \
__arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx, \
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1898300abf3508bca152e65b36cce5bf93d7e63e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030408-degraded-overpay-ac4a@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1898300abf3508bca152e65b36cce5bf93d7e63e Mon Sep 17 00:00:00 2001
From: Andreas Schwab <schwab(a)suse.de>
Date: Thu, 30 Jan 2025 10:25:38 +0100
Subject: [PATCH] riscv/atomic: Do proper sign extension also for unsigned in
arch_cmpxchg
Sign extend also an unsigned compare value to match what lr.w is doing.
Otherwise try_cmpxchg may spuriously return true when used on a u32 value
that has the sign bit set, as it happens often in inode_set_ctime_current.
Do this in three conversion steps. The first conversion to long is needed
to avoid a -Wpointer-to-int-cast warning when arch_cmpxchg is used with a
pointer type. Then convert to int and back to long to always sign extend
the 32-bit value to 64-bit.
Fixes: 6c58f25e6938 ("riscv/atomic: Fix sign extension for RV64I")
Signed-off-by: Andreas Schwab <schwab(a)suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/mvmed0k4prh.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 4cadc56220fe..427c41dde643 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -231,7 +231,7 @@
__arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx, \
sc_prepend, sc_append, \
cas_prepend, cas_append, \
- __ret, __ptr, (long), __old, __new); \
+ __ret, __ptr, (long)(int)(long), __old, __new); \
break; \
case 8: \
__arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx, \
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1898300abf3508bca152e65b36cce5bf93d7e63e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030408-python-steerable-a903@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1898300abf3508bca152e65b36cce5bf93d7e63e Mon Sep 17 00:00:00 2001
From: Andreas Schwab <schwab(a)suse.de>
Date: Thu, 30 Jan 2025 10:25:38 +0100
Subject: [PATCH] riscv/atomic: Do proper sign extension also for unsigned in
arch_cmpxchg
Sign extend also an unsigned compare value to match what lr.w is doing.
Otherwise try_cmpxchg may spuriously return true when used on a u32 value
that has the sign bit set, as it happens often in inode_set_ctime_current.
Do this in three conversion steps. The first conversion to long is needed
to avoid a -Wpointer-to-int-cast warning when arch_cmpxchg is used with a
pointer type. Then convert to int and back to long to always sign extend
the 32-bit value to 64-bit.
Fixes: 6c58f25e6938 ("riscv/atomic: Fix sign extension for RV64I")
Signed-off-by: Andreas Schwab <schwab(a)suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/mvmed0k4prh.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 4cadc56220fe..427c41dde643 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -231,7 +231,7 @@
__arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx, \
sc_prepend, sc_append, \
cas_prepend, cas_append, \
- __ret, __ptr, (long), __old, __new); \
+ __ret, __ptr, (long)(int)(long), __old, __new); \
break; \
case 8: \
__arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx, \
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 1898300abf3508bca152e65b36cce5bf93d7e63e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030407-tiny-slain-3c45@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1898300abf3508bca152e65b36cce5bf93d7e63e Mon Sep 17 00:00:00 2001
From: Andreas Schwab <schwab(a)suse.de>
Date: Thu, 30 Jan 2025 10:25:38 +0100
Subject: [PATCH] riscv/atomic: Do proper sign extension also for unsigned in
arch_cmpxchg
Sign extend also an unsigned compare value to match what lr.w is doing.
Otherwise try_cmpxchg may spuriously return true when used on a u32 value
that has the sign bit set, as it happens often in inode_set_ctime_current.
Do this in three conversion steps. The first conversion to long is needed
to avoid a -Wpointer-to-int-cast warning when arch_cmpxchg is used with a
pointer type. Then convert to int and back to long to always sign extend
the 32-bit value to 64-bit.
Fixes: 6c58f25e6938 ("riscv/atomic: Fix sign extension for RV64I")
Signed-off-by: Andreas Schwab <schwab(a)suse.de>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/mvmed0k4prh.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 4cadc56220fe..427c41dde643 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -231,7 +231,7 @@
__arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx, \
sc_prepend, sc_append, \
cas_prepend, cas_append, \
- __ret, __ptr, (long), __old, __new); \
+ __ret, __ptr, (long)(int)(long), __old, __new); \
break; \
case 8: \
__arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx, \
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 57a0ef02fefafc4b9603e33a18b669ba5ce59ba3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030437-anaerobic-valiant-b3b2@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 57a0ef02fefafc4b9603e33a18b669ba5ce59ba3 Mon Sep 17 00:00:00 2001
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Date: Tue, 4 Feb 2025 13:57:20 +0100
Subject: [PATCH] ima: Reset IMA_NONACTION_RULE_FLAGS after post_setattr
Commit 0d73a55208e9 ("ima: re-introduce own integrity cache lock")
mistakenly reverted the performance improvement introduced in commit
42a4c603198f0 ("ima: fix ima_inode_post_setattr"). The unused bit mask was
subsequently removed by commit 11c60f23ed13 ("integrity: Remove unused
macro IMA_ACTION_RULE_FLAGS").
Restore the performance improvement by introducing the new mask
IMA_NONACTION_RULE_FLAGS, equal to IMA_NONACTION_FLAGS without
IMA_NEW_FILE, which is not a rule-specific flag.
Finally, reset IMA_NONACTION_RULE_FLAGS instead of IMA_NONACTION_FLAGS in
process_measurement(), if the IMA_CHANGE_ATTR atomic flag is set (after
file metadata modification).
With this patch, new files for which metadata were modified while they are
still open, can be reopened before the last file close (when security.ima
is written), since the IMA_NEW_FILE flag is not cleared anymore. Otherwise,
appraisal fails because security.ima is missing (files with IMA_NEW_FILE
set are an exception).
Cc: stable(a)vger.kernel.org # v4.16.x
Fixes: 0d73a55208e9 ("ima: re-introduce own integrity cache lock")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 24d09ea91b87..a4f284bd846c 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -149,6 +149,9 @@ struct ima_kexec_hdr {
#define IMA_CHECK_BLACKLIST 0x40000000
#define IMA_VERITY_REQUIRED 0x80000000
+/* Exclude non-action flags which are not rule-specific. */
+#define IMA_NONACTION_RULE_FLAGS (IMA_NONACTION_FLAGS & ~IMA_NEW_FILE)
+
#define IMA_DO_MASK (IMA_MEASURE | IMA_APPRAISE | IMA_AUDIT | \
IMA_HASH | IMA_APPRAISE_SUBMASK)
#define IMA_DONE_MASK (IMA_MEASURED | IMA_APPRAISED | IMA_AUDITED | \
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index f2c9affa0c2a..28b8b0db6f9b 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -269,10 +269,13 @@ static int process_measurement(struct file *file, const struct cred *cred,
mutex_lock(&iint->mutex);
if (test_and_clear_bit(IMA_CHANGE_ATTR, &iint->atomic_flags))
- /* reset appraisal flags if ima_inode_post_setattr was called */
+ /*
+ * Reset appraisal flags (action and non-action rule-specific)
+ * if ima_inode_post_setattr was called.
+ */
iint->flags &= ~(IMA_APPRAISE | IMA_APPRAISED |
IMA_APPRAISE_SUBMASK | IMA_APPRAISED_SUBMASK |
- IMA_NONACTION_FLAGS);
+ IMA_NONACTION_RULE_FLAGS);
/*
* Re-evaulate the file if either the xattr has changed or the