The patch titled
Subject: mm: zswap: disable migration while using per-CPU acomp_ctx
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-zswap-disable-migration-while-using-per-cpu-acomp_ctx.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yosry Ahmed <yosryahmed(a)google.com>
Subject: mm: zswap: disable migration while using per-CPU acomp_ctx
Date: Tue, 7 Jan 2025 22:22:35 +0000
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
current CPU at the beginning of the operation is retrieved and used
throughout. However, since neither preemption nor migration is disabled,
it is possible that the operation continues on a different CPU.
If the original CPU is hotunplugged while the acomp_ctx is still in use,
we run into a UAF bug as the resources attached to the acomp_ctx are freed
during hotunplug in zswap_cpu_comp_dead().
The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use
crypto_acomp API for hardware acceleration") when the switch to the
crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
retrieved using get_cpu_ptr() which disables preemption and makes sure the
CPU cannot go away from under us. Preemption cannot be disabled with the
crypto_acomp API as a sleepable context is needed.
Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to
per-acomp_ctx") increased the UAF surface area by making the per-CPU
buffers dynamic, adding yet another resource that can be freed from under
zswap compression/decompression by CPU hotunplug.
This cannot be fixed by holding cpus_read_lock(), as it is possible for
code already holding the lock to fall into reclaim and enter zswap
(causing a deadlock). It also cannot be fixed by wrapping the usage of
acomp_ctx in an SRCU critical section and using synchronize_srcu() in
zswap_cpu_comp_dead(), because synchronize_srcu() is not allowed in
CPU-hotplug notifiers (see
Documentation/RCU/Design/Requirements/Requirements.rst).
This can be fixed by refcounting the acomp_ctx, but it involves complexity
in handling the race between the refcount dropping to zero in
zswap_[de]compress() and the refcount being re-initialized when the CPU is
onlined.
Keep things simple for now and just disable migration while using the
per-CPU acomp_ctx to block CPU hotunplug until the usage is over.
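As an illustration only, the get/put pairing can be modelled in user space.
This is a toy sketch, not kernel code: struct task, struct ctx, try_migrate()
and try_offline() are hypothetical stand-ins, and the counter merely imitates
the effect of migrate_disable()/migrate_enable() described above. The model
also ignores that a real hotunplug would first move unpinned tasks away.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4                       /* hypothetical CPU count for the model */

struct ctx { bool allocated; };         /* stand-in for the per-CPU acomp_ctx */
struct task { int cpu; int migration_disabled; };

static struct ctx per_cpu_ctx[NR_CPUS] = {
	{ true }, { true }, { true }, { true }
};

/* Model of acomp_ctx_get_cpu(): pin the task first, then look up its CPU's ctx. */
static struct ctx *ctx_get_cpu(struct task *t)
{
	t->migration_disabled++;
	return &per_cpu_ctx[t->cpu];
}

/* Model of acomp_ctx_put_cpu(): drop the pin taken by ctx_get_cpu(). */
static void ctx_put_cpu(struct task *t)
{
	assert(t->migration_disabled > 0);
	t->migration_disabled--;
}

/* Model scheduler: a pinned task stays on its CPU. */
static bool try_migrate(struct task *t, int new_cpu)
{
	if (t->migration_disabled)
		return false;
	t->cpu = new_cpu;
	return true;
}

/*
 * Model hotunplug: refuse while a pinned task sits on the CPU. Only once
 * the CPU can go away is its ctx released, which is the step that
 * zswap_cpu_comp_dead() performs in the real code.
 */
static bool try_offline(int cpu, const struct task *tasks, int nr_tasks)
{
	for (int i = 0; i < nr_tasks; i++)
		if (tasks[i].cpu == cpu && tasks[i].migration_disabled)
			return false;
	per_cpu_ctx[cpu].allocated = false;
	return true;
}
```

In this model a task that called ctx_get_cpu() on CPU 1 makes both
try_migrate() and try_offline(1, ...) fail until ctx_put_cpu() runs, so the
ctx it holds cannot be released underneath it.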
Link: https://lkml.kernel.org/r/20250107222236.2715883-2-yosryahmed@google.com
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Signed-off-by: Yosry Ahmed <yosryahmed(a)google.com>
Reported-by: Johannes Weiner <hannes(a)cmpxchg.org>
Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun <samsun1006219(a)gmail.com>
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tP…
Cc: Barry Song <baohua(a)kernel.org>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: syzbot <syzkaller(a)googlegroups.com>
Cc: Vitaly Wool <vitalywool(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
--- a/mm/zswap.c~mm-zswap-disable-migration-while-using-per-cpu-acomp_ctx
+++ a/mm/zswap.c
@@ -880,6 +880,18 @@ static int zswap_cpu_comp_dead(unsigned
return 0;
}
+/* Remain on the CPU while using its acomp_ctx to stop it from going offline */
+static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_ctx __percpu *acomp_ctx)
+{
+ migrate_disable();
+ return raw_cpu_ptr(acomp_ctx);
+}
+
+static void acomp_ctx_put_cpu(void)
+{
+ migrate_enable();
+}
+
static bool zswap_compress(struct page *page, struct zswap_entry *entry,
struct zswap_pool *pool)
{
@@ -893,8 +905,7 @@ static bool zswap_compress(struct page *
gfp_t gfp;
u8 *dst;
- acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
-
+ acomp_ctx = acomp_ctx_get_cpu(pool->acomp_ctx);
mutex_lock(&acomp_ctx->mutex);
dst = acomp_ctx->buffer;
@@ -950,6 +961,7 @@ unlock:
zswap_reject_alloc_fail++;
mutex_unlock(&acomp_ctx->mutex);
+ acomp_ctx_put_cpu();
return comp_ret == 0 && alloc_ret == 0;
}
@@ -960,7 +972,7 @@ static void zswap_decompress(struct zswa
struct crypto_acomp_ctx *acomp_ctx;
u8 *src;
- acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+ acomp_ctx = acomp_ctx_get_cpu(entry->pool->acomp_ctx);
mutex_lock(&acomp_ctx->mutex);
src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
@@ -990,6 +1002,7 @@ static void zswap_decompress(struct zswa
if (src != acomp_ctx->buffer)
zpool_unmap_handle(zpool, entry->handle);
+ acomp_ctx_put_cpu();
}
/*********************************
_
Patches currently in -mm which might be from yosryahmed(a)google.com are
revert-mm-zswap-fix-race-between-compression-and-cpu-hotunplug.patch
mm-zswap-disable-migration-while-using-per-cpu-acomp_ctx.patch
The patch titled
Subject: mm: clear uffd-wp PTE/PMD state on mremap()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-clear-uffd-wp-pte-pmd-state-on-mremap.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: clear uffd-wp PTE/PMD state on mremap()
Date: Tue, 7 Jan 2025 14:47:52 +0000
When mremap()ing a memory region previously registered with userfaultfd as
write-protected but without UFFD_FEATURE_EVENT_REMAP, an inconsistency in
flag clearing leads to a mismatch between the vma flags (which have
uffd-wp cleared) and the pte/pmd flags (which do not have uffd-wp
cleared). This mismatch causes a subsequent mprotect(PROT_WRITE) to
trigger a warning in page_table_check_pte_flags() due to setting the pte
to writable while uffd-wp is still set.
Fix this by always explicitly clearing the uffd-wp pte/pmd flags on any
such mremap() so that the values are consistent with the existing clearing
of VM_UFFD_WP. Be careful to clear the logical flag regardless of its
physical form; a PTE bit, a swap PTE bit, or a PTE marker. Cover PTE,
huge PMD and hugetlb paths.
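The three physical forms can be pictured with a small user-space model. This
is a sketch under stated assumptions: struct entry, enum entry_kind and
move_entry() are hypothetical names, while the real code operates on pte_t
and pmd_t with the helpers shown in the diff below. The model only mirrors the
decision order used there: a uffd-wp marker leaves the destination empty,
while present and swap entries have the flag cleared and are then installed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the forms a page table entry can take. */
enum entry_kind { ENTRY_NONE, ENTRY_PRESENT, ENTRY_SWAP, ENTRY_MARKER };

struct entry {
	enum entry_kind kind;
	bool uffd_wp;           /* the logical uffd-wp flag, whatever its form */
};

/* Return the entry to install at the destination of the move. */
static struct entry move_entry(struct entry e, bool need_clear_uffd_wp)
{
	if (!need_clear_uffd_wp)
		return e;

	/* A uffd-wp marker is not carried over: the destination stays empty. */
	if (e.kind == ENTRY_MARKER && e.uffd_wp)
		return (struct entry){ ENTRY_NONE, false };

	/* Present and swap entries keep their payload but lose the flag. */
	if (e.kind == ENTRY_PRESENT || e.kind == ENTRY_SWAP)
		e.uffd_wp = false;

	return e;
}
```

With need_clear_uffd_wp set, every form ends up without uffd-wp at the
destination, which matches the cleared VM_UFFD_WP on the vma.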
Link: https://lkml.kernel.org/r/20250107144755.1871363-2-ryan.roberts@arm.com
Co-developed-by: Mikołaj Lenczewski <miko.lenczewski(a)arm.com>
Signed-off-by: Mikołaj Lenczewski <miko.lenczewski(a)arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Closes: https://lore.kernel.org/linux-mm/810b44a8-d2ae-4107-b665-5a42eae2d948@arm.c…
Fixes: 63b2d4174c4a ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/userfaultfd_k.h | 12 ++++++++++++
mm/huge_memory.c | 12 ++++++++++++
mm/hugetlb.c | 14 +++++++++++++-
mm/mremap.c | 32 +++++++++++++++++++++++++++++++-
4 files changed, 68 insertions(+), 2 deletions(-)
--- a/include/linux/userfaultfd_k.h~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/include/linux/userfaultfd_k.h
@@ -247,6 +247,13 @@ static inline bool vma_can_userfault(str
vma_is_shmem(vma);
}
+static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
+{
+ struct userfaultfd_ctx *uffd_ctx = vma->vm_userfaultfd_ctx.ctx;
+
+ return uffd_ctx && (uffd_ctx->features & UFFD_FEATURE_EVENT_REMAP) == 0;
+}
+
extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
extern void dup_userfaultfd_complete(struct list_head *);
void dup_userfaultfd_fail(struct list_head *);
@@ -401,6 +408,11 @@ static inline bool userfaultfd_wp_async(
{
return false;
}
+
+static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
+{
+ return false;
+}
#endif /* CONFIG_USERFAULTFD */
--- a/mm/huge_memory.c~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/mm/huge_memory.c
@@ -2206,6 +2206,16 @@ static pmd_t move_soft_dirty_pmd(pmd_t p
return pmd;
}
+static pmd_t clear_uffd_wp_pmd(pmd_t pmd)
+{
+ if (pmd_present(pmd))
+ pmd = pmd_clear_uffd_wp(pmd);
+ else if (is_swap_pmd(pmd))
+ pmd = pmd_swp_clear_uffd_wp(pmd);
+
+ return pmd;
+}
+
bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
{
@@ -2244,6 +2254,8 @@ bool move_huge_pmd(struct vm_area_struct
pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
}
pmd = move_soft_dirty_pmd(pmd);
+ if (vma_has_uffd_without_event_remap(vma))
+ pmd = clear_uffd_wp_pmd(pmd);
set_pmd_at(mm, new_addr, new_pmd, pmd);
if (force_flush)
flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
--- a/mm/hugetlb.c~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/mm/hugetlb.c
@@ -5402,6 +5402,7 @@ static void move_huge_pte(struct vm_area
unsigned long new_addr, pte_t *src_pte, pte_t *dst_pte,
unsigned long sz)
{
+ bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
struct hstate *h = hstate_vma(vma);
struct mm_struct *mm = vma->vm_mm;
spinlock_t *src_ptl, *dst_ptl;
@@ -5418,7 +5419,18 @@ static void move_huge_pte(struct vm_area
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
- set_huge_pte_at(mm, new_addr, dst_pte, pte, sz);
+
+ if (need_clear_uffd_wp && pte_marker_uffd_wp(pte))
+ huge_pte_clear(mm, new_addr, dst_pte, sz);
+ else {
+ if (need_clear_uffd_wp) {
+ if (pte_present(pte))
+ pte = huge_pte_clear_uffd_wp(pte);
+ else if (is_swap_pte(pte))
+ pte = pte_swp_clear_uffd_wp(pte);
+ }
+ set_huge_pte_at(mm, new_addr, dst_pte, pte, sz);
+ }
if (src_ptl != dst_ptl)
spin_unlock(src_ptl);
--- a/mm/mremap.c~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/mm/mremap.c
@@ -138,6 +138,7 @@ static int move_ptes(struct vm_area_stru
struct vm_area_struct *new_vma, pmd_t *new_pmd,
unsigned long new_addr, bool need_rmap_locks)
{
+ bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
struct mm_struct *mm = vma->vm_mm;
pte_t *old_pte, *new_pte, pte;
pmd_t dummy_pmdval;
@@ -216,7 +217,18 @@ static int move_ptes(struct vm_area_stru
force_flush = true;
pte = move_pte(pte, old_addr, new_addr);
pte = move_soft_dirty_pte(pte);
- set_pte_at(mm, new_addr, new_pte, pte);
+
+ if (need_clear_uffd_wp && pte_marker_uffd_wp(pte))
+ pte_clear(mm, new_addr, new_pte);
+ else {
+ if (need_clear_uffd_wp) {
+ if (pte_present(pte))
+ pte = pte_clear_uffd_wp(pte);
+ else if (is_swap_pte(pte))
+ pte = pte_swp_clear_uffd_wp(pte);
+ }
+ set_pte_at(mm, new_addr, new_pte, pte);
+ }
}
arch_leave_lazy_mmu_mode();
@@ -278,6 +290,15 @@ static bool move_normal_pmd(struct vm_ar
if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
return false;
+ /* If this pmd belongs to a uffd vma with remap events disabled, we need
+ * to ensure that the uffd-wp state is cleared from all pgtables. This
+ * means recursing into lower page tables in move_page_tables(), and we
+ * can reuse the existing code if we simply treat the entry as "not
+ * moved".
+ */
+ if (vma_has_uffd_without_event_remap(vma))
+ return false;
+
/*
* We don't have to worry about the ordering of src and dst
* ptlocks because exclusive mmap_lock prevents deadlock.
@@ -333,6 +354,15 @@ static bool move_normal_pud(struct vm_ar
if (WARN_ON_ONCE(!pud_none(*new_pud)))
return false;
+ /* If this pud belongs to a uffd vma with remap events disabled, we need
+ * to ensure that the uffd-wp state is cleared from all pgtables. This
+ * means recursing into lower page tables in move_page_tables(), and we
+ * can reuse the existing code if we simply treat the entry as "not
+ * moved".
+ */
+ if (vma_has_uffd_without_event_remap(vma))
+ return false;
+
/*
* We don't have to worry about the ordering of src and dst
* ptlocks because exclusive mmap_lock prevents deadlock.
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-clear-uffd-wp-pte-pmd-state-on-mremap.patch
The patch titled
Subject: selftests/mm: virtual_address_range: avoid reading VVAR mappings
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-virtual_address_range-avoid-reading-vvar-mappings.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Subject: selftests/mm: virtual_address_range: avoid reading VVAR mappings
Date: Tue, 07 Jan 2025 16:14:46 +0100
The virtual_address_range selftest reads from the start of each mapping
listed in /proc/self/maps.
However, not all mappings are valid to be arbitrarily accessed. For
example, the vvar data used for virtual clocks on x86 can only be accessed
if 1) the kernel configuration enables virtual clocks and 2) the
hypervisor provided the data for it, which can only be determined by the VDSO
code itself.
Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into
dedicated mapping") the virtual clock data was split out into its own
mapping, triggering faulting accesses by virtual_address_range.
Skip the various vvar mappings in virtual_address_range to avoid errors.
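The parsing approach can be tried standalone. The sketch below reuses the
sscanf() format string from the diff; struct map_line, parse_maps_line() and
is_vvar() are hypothetical names introduced only for this illustration. The
%n conversion records how many characters were consumed, which gives the
offset of the pathname column without it counting towards the return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct map_line {
	unsigned long start, end;
	char prot[5];           /* "rwxp" plus terminating NUL */
	const char *path;       /* points into the parsed line; may be "" */
};

/* Parse one line in /proc/self/maps format; return false if it is malformed. */
static bool parse_maps_line(const char *line, struct map_line *out)
{
	int path_offset = 0;

	/* Three assigned fields; offset, device and inode are skipped via %*s. */
	if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
		   &out->start, &out->end, out->prot, &path_offset) != 3)
		return false;

	out->path = path_offset ? line + path_offset : "";
	return true;
}

/* Prefix match, so any mapping whose name starts with "[vvar" is caught. */
static bool is_vvar(const struct map_line *m)
{
	return strncmp(m->path, "[vvar", 5) == 0;
}
```

A caller would loop over fgets() on /proc/self/maps and continue past every
line for which is_vvar() returns true, as the patch does inline.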
Link: https://lkml.kernel.org/r/20250107-virtual_address_range-tests-v1-2-3834a2f…
Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping")
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/virtual_address_range.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/mm/virtual_address_range.c~selftests-mm-virtual_address_range-avoid-reading-vvar-mappings
+++ a/tools/testing/selftests/mm/virtual_address_range.c
@@ -116,10 +116,11 @@ static int validate_complete_va_space(vo
prev_end_addr = 0;
while (fgets(line, sizeof(line), file)) {
+ int path_offset = 0;
unsigned long hop;
- if (sscanf(line, "%lx-%lx %s[rwxp-]",
- &start_addr, &end_addr, prot) != 3)
+ if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
+ &start_addr, &end_addr, prot, &path_offset) != 3)
ksft_exit_fail_msg("cannot parse /proc/self/maps\n");
/* end of userspace mappings; ignore vsyscall mapping */
@@ -135,6 +136,10 @@ static int validate_complete_va_space(vo
if (prot[0] != 'r')
continue;
+ /* Only the VDSO can know if a VVAR mapping is really readable */
+ if (path_offset && !strncmp(line + path_offset, "[vvar", 5))
+ continue;
+
/*
* Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
* If write succeeds, no need to check MAP_CHUNK_SIZE - 1
_
Patches currently in -mm which might be from thomas.weissschuh(a)linutronix.de are
selftests-mm-virtual_address_range-fix-error-when-commitlimit-1gib.patch
selftests-mm-virtual_address_range-avoid-reading-vvar-mappings.patch
The patch titled
Subject: selftests/mm: virtual_address_range: fix error when CommitLimit < 1GiB
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-virtual_address_range-fix-error-when-commitlimit-1gib.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Subject: selftests/mm: virtual_address_range: fix error when CommitLimit < 1GiB
Date: Tue, 07 Jan 2025 16:14:45 +0100
If not enough physical memory is available, the kernel may fail mmap(); see
__vm_enough_memory() and vm_commit_limit(). In that case the logic in
validate_complete_va_space() does not make sense and will even incorrectly
fail. Instead, skip the test if no mmap() succeeded.
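The control flow can be sketched outside the selftest as follows. This is an
assumption-laden illustration: map_chunks(), verdict_for() and enum verdict
are hypothetical names, whereas the selftest itself reports the skip through
ksft_test_result_skip() and ksft_finished() as shown in the diff below.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

enum verdict { VERDICT_SKIP, VERDICT_RUN };

/* Map up to max_chunks anonymous chunks; return how many mmap() calls succeeded. */
static size_t map_chunks(void **ptrs, size_t max_chunks, size_t chunk_size)
{
	size_t i;

	assert(ptrs != NULL);
	for (i = 0; i < max_chunks; i++) {
		void *p = mmap(NULL, chunk_size, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		/* Stop at the first failure, e.g. when the commit limit is hit. */
		if (p == MAP_FAILED)
			break;
		ptrs[i] = p;
	}
	return i;
}

/* With zero mapped chunks there is nothing meaningful to validate: skip. */
static enum verdict verdict_for(size_t mapped_chunks)
{
	return mapped_chunks ? VERDICT_RUN : VERDICT_SKIP;
}
```

The point of the separate verdict step is that a zero count is treated as an
environmental limitation rather than as a test failure.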
Link: https://lkml.kernel.org/r/20250107-virtual_address_range-tests-v1-1-3834a2f…
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: kernel test robot <oliver.sang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/tools/testing/selftests/mm/virtual_address_range.c~selftests-mm-virtual_address_range-fix-error-when-commitlimit-1gib
+++ a/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@ int main(int argc, char *argv[])
validate_addr(ptr[i], 0);
}
lchunks = i;
+
+ if (!lchunks) {
+ ksft_test_result_skip("Not enough memory for a single chunk\n");
+ ksft_finished();
+ }
+
hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
if (hptr == NULL) {
ksft_test_result_skip("Memory constraint not fulfilled\n");
_
Changes in v9:
- Added patch to unwind pm subdomains in reverse order.
It would also be possible to squash this patch into patch #2, but
my own preference is for more granular patches like this instead of
"slipping in" functional changes in larger patches like #2. - bod
- Unwinding pm subdomain on error in patch #2.
To facilitate this change patch #1 was created - Vlad
- Drops Bjorn's RB on patch #2. The churn in this patch is small, but
enough that a reviewer might reasonably expect RB to be given again.
- Amends commit log for patch #3 further.
v8 added a lot to the commit log to provide further information, but it
is clear from the comments I received on the commit log that the added
verbiage was occlusive not elucidative.
Reduce down the commit log of patch #3 - especially Q&A item #1.
Sometimes less is more.
- Link to v8: https://lore.kernel.org/r/20241211-b4-linux-next-24-11-18-clock-multiple-po…
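For readers unfamiliar with the pattern, unwinding in reverse add order can
be sketched in plain C. This is a generic illustration, not the code from the
series: add_subdomain(), remove_subdomain() and register_subdomains() are
hypothetical stand-ins, and the removal log exists only so the order can be
observed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOG 16

static int removal_log[MAX_LOG];        /* records the order of removals */
static size_t removal_count;

/* Hypothetical stand-in for adding a subdomain; fails for id == fail_at. */
static int add_subdomain(int id, int fail_at)
{
	return id == fail_at ? -1 : 0;
}

/* Hypothetical stand-in for removing a subdomain; just logs the id. */
static void remove_subdomain(int id)
{
	assert(removal_count < MAX_LOG);
	removal_log[removal_count++] = id;
}

/* Add subdomains 0..n-1; on failure remove the ones already added, last first. */
static int register_subdomains(int n, int fail_at)
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = add_subdomain(i, fail_at);
		if (ret)
			goto err_unwind;
	}
	return 0;

err_unwind:
	/* i is the index that failed; walk back over the successful ones. */
	while (--i >= 0)
		remove_subdomain(i);
	return ret;
}
```

If the add for index 2 fails, the sketch removes index 1 and then index 0, so
teardown mirrors setup.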
Changes in v8:
- Picks up change I agreed with Vlad but failed to cherry-pick into my b4
tree - Vlad/Bod
- Rewords the commit log for patch #3. As I read it I decided I might
translate bits of it from thought-stream into English - Bod
- Link to v7: https://lore.kernel.org/r/20241211-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v7:
- Expand commit log in patch #3
I've discussed with Bjorn on IRC and video what to put into the log here
and captured most of what we discussed.
Mostly the point here is voting for voltages in the power-domain list
is up to the drivers to do with performance states/opp-tables not for the
GDSC code. - Bjorn/Bryan
- Link to v6: https://lore.kernel.org/r/20241129-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v6:
- Passes NULL to second parameter of devm_pm_domain_attach_list - Vlad
- Link to v5: https://lore.kernel.org/r/20241128-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v5:
- In-lines devm_pm_domain_attach_list() in probe() directly - Vlad
- Link to v4: https://lore.kernel.org/r/20241127-b4-linux-next-24-11-18-clock-multiple-po…
v4:
- Adds Bjorn's RB to first patch - Bjorn
- Drops the 'd' in "and int" - Bjorn
- Amends commit log of patch 3 to capture a number of open questions -
Bjorn
- Link to v3: https://lore.kernel.org/r/20241126-b4-linux-next-24-11-18-clock-multiple-po…
v3:
- Fixes commit log "per which" - Bryan
- Link to v2: https://lore.kernel.org/r/20241125-b4-linux-next-24-11-18-clock-multiple-po…
v2:
The main change in this version is Bjorn's pointing out that pm_runtime_*
inside of the gdsc_enable/gdsc_disable path would be recursive and cause a
lockdep splat. Dmitry alluded to this too.
Bjorn pointed to stuff being done lower in the gdsc_register() routine that
might be a starting point.
I iterated around that idea and came up with patch #3. When a gdsc has no
parent and the pd_list is non-NULL then attach that orphan GDSC to the
clock controller power-domain list.
Existing subdomain code in gdsc_register() will connect the parent GDSCs in
the clock-controller to the clock-controller subdomain, the new code here
does that same job for a list of power-domains the clock controller depends
on.
To Dmitry's point about MMCX and MCX dependencies for the registers inside
of the clock controller, I have switched off all references in a test dtsi
and confirmed that accessing the clock-controller regs themselves isn't
required.
On the second point, I also verified my test branch with lockdep enabled.
Lockdep was a concern with the pm_domain version of this solution, but I
wanted to cover it with the new approach as well, for completeness' sake.
Here's the item-by-item list of changes:
- Adds a patch to capture pm_genpd_add_subdomain() result code - Bryan
- Changes changelog of second patch to remove singleton and generally
to make the commit log easier to understand - Bjorn
- Uses devm_pm_domain_attach_list - Vlad
- Changes error check to if (ret < 0 && ret != -EEXIST) - Vlad
- Retains passing &pd_data instead of NULL - because NULL doesn't do
the same thing - Bryan/Vlad
- Retains standalone function qcom_cc_pds_attach() because the pd_data
enumeration looks neater in a standalone function - Bryan/Vlad
- Drops pm_runtime in favour of gdsc_add_subdomain_list() for each
power-domain in the pd_list.
The pd_list will be whatever is pointed to by power-domains = <>
in the dtsi - Bjorn
- Link to v1: https://lore.kernel.org/r/20241118-b4-linux-next-24-11-18-clock-multiple-po…
v1:
On x1e80100 and its SKUs the Camera Clock Controller - CAMCC has
multiple power-domains which power it. Usually with a single power-domain
the core platform code will automatically switch on the singleton
power-domain for you. If you have multiple power-domains for a device, in
this case the clock controller, you need to switch those power-domains
on/off yourself.
The clock controllers can also contain Global Distributed
Switch Controllers - GDSCs which themselves can be referenced from dtsi
nodes ultimately triggering a gdsc_en() in drivers/clk/qcom/gdsc.c.
As an example:
cci0: cci@ac4a000 {
power-domains = <&camcc TITAN_TOP_GDSC>;
};
This series adds the support to attach a power-domain list to the
clock-controllers and the GDSCs those controllers provide so that in the
case of the above example gdsc_toggle_logic() will trigger the power-domain
list with pm_runtime_resume_and_get() and pm_runtime_put_sync()
respectively.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (4):
clk: qcom: gdsc: Release pm subdomains in reverse add order
clk: qcom: gdsc: Capture pm_genpd_add_subdomain result code
clk: qcom: common: Add support for power-domain attachment
clk: qcom: Support attaching GDSCs to multiple parents
drivers/clk/qcom/common.c | 6 ++++
drivers/clk/qcom/gdsc.c | 75 +++++++++++++++++++++++++++++++++++++++--------
drivers/clk/qcom/gdsc.h | 1 +
3 files changed, 69 insertions(+), 13 deletions(-)
---
base-commit: 8155b4ef3466f0e289e8fcc9e6e62f3f4dceeac2
change-id: 20241118-b4-linux-next-24-11-18-clock-multiple-power-domains-a5f994dc452a
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
This is the start of the stable review cycle for the 6.6.68 release.
There are 116 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 27 Dec 2024 15:53:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.68-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.6.68-rc1
Michel Dänzer <mdaenzer(a)redhat.com>
drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update
Francesco Dolcini <francesco.dolcini(a)toradex.com>
net: fec: make PPS channel configurable
Francesco Dolcini <francesco.dolcini(a)toradex.com>
net: fec: refactor PPS channel configuration
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring/rw: avoid punting to io-wq directly
Jens Axboe <axboe(a)kernel.dk>
io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN
Jens Axboe <axboe(a)kernel.dk>
io_uring/rw: split io_read() into a helper
Xuewen Yan <xuewen.yan(a)unisoc.com>
epoll: Add synchronous wakeup support for ep_poll_callback
Max Kellermann <max.kellermann(a)ionos.com>
ceph: fix memory leaks in __ceph_sync_read()
Alex Markuze <amarkuze(a)redhat.com>
ceph: improve error handling and short/overflow-read logic in __ceph_sync_read()
Ilya Dryomov <idryomov(a)gmail.com>
ceph: validate snapdirname option length when mounting
Zijun Hu <quic_zijuhu(a)quicinc.com>
of: Fix refcount leakage for OF node returned by __of_get_dma_parent()
Herve Codina <herve.codina(a)bootlin.com>
of: Fix error path in of_parse_phandle_with_args_map()
Jann Horn <jannh(a)google.com>
udmabuf: also check for F_SEAL_FUTURE_WRITE
Edward Adam Davis <eadavis(a)qq.com>
nilfs2: prevent use of deleted inode
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix buffer head leaks in calls to truncate_inode_pages()
Zijun Hu <quic_zijuhu(a)quicinc.com>
of/irq: Fix using uninitialized variable @addr_len in API of_irq_parse_one()
Zijun Hu <quic_zijuhu(a)quicinc.com>
of/irq: Fix interrupt-map cell length check in of_irq_parse_imap_parent()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS/pnfs: Fix a live lock between recalled layouts and layoutget
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring: check if iowq is killed before queuing
Jann Horn <jannh(a)google.com>
io_uring: Fix registered ring file refcount leak
Tiezhu Yang <yangtiezhu(a)loongson.cn>
selftests/bpf: Use asm constraint "m" for LoongArch
Isaac J. Manjarres <isaacmanjarres(a)google.com>
selftests/memfd: run sysctl tests when PID namespace support is enabled
Steven Rostedt <rostedt(a)goodmis.org>
tracing: Add "%s" check in test_event_printk()
Steven Rostedt <rostedt(a)goodmis.org>
tracing: Add missing helper functions in event pointer dereference check
Steven Rostedt <rostedt(a)goodmis.org>
tracing: Fix test_event_printk() to process entire print argument
Enzo Matsumiya <ematsumiya(a)suse.de>
smb: client: fix TCP timers deadlock after rmmod
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Play nice with protected guests in complete_hypercall_exit()
Michael Kelley <mhklinux(a)outlook.com>
Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet
Qu Wenruo <wqu(a)suse.com>
btrfs: tree-checker: reject inline extent items with 0 ref count
Matthew Wilcox (Oracle) <willy(a)infradead.org>
vmalloc: fix accounting with i915
Kairui Song <kasong(a)tencent.com>
zram: fix uninitialized ZRAM not releasing backing device
Kairui Song <kasong(a)tencent.com>
zram: refuse to use zero sized block device as backing device
Murad Masimov <m.masimov(a)maxima.ru>
hwmon: (tmp513) Fix interpretation of values of Temperature Result and Limit Registers
Murad Masimov <m.masimov(a)maxima.ru>
hwmon: (tmp513) Fix Current Register value interpretation
Murad Masimov <m.masimov(a)maxima.ru>
hwmon: (tmp513) Fix interpretation of values of Shunt Voltage and Limit Registers
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
hwmon: (tmp513) Use SI constants from units.h
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
hwmon: (tmp513) Simplify with dev_err_probe()
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
hwmon: (tmp513) Don't use "proxy" headers
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
drm/amdgpu: don't access invalid sched
Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
i915/guc: Accumulate active runtime on gt reset
Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
i915/guc: Ensure busyness counter increases motonically
Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
i915/guc: Reset engine utilization buffer before registration
Yang Yingliang <yangyingliang(a)huawei.com>
drm/panel: novatek-nt35950: fix return value check in nt35950_probe()
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/modes: Avoid divide by zero harder in drm_mode_vrefresh()
Mika Westerberg <mika.westerberg(a)linux.intel.com>
thunderbolt: Improve redrive mode handling
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit FE910C04 rmnet compositions
Jack Wu <wojackbb(a)gmail.com>
USB: serial: option: add MediaTek T7XX compositions
Mank Wang <mank.wang(a)netprisma.com>
USB: serial: option: add Netprisma LCUK54 modules for WWAN Ready
Michal Hrusecky <michal.hrusecky(a)turris.com>
USB: serial: option: add MeiG Smart SLM770A
Daniel Swanemar <d.swanemar(a)gmail.com>
USB: serial: option: add TCL IK512 MBIM & ECM
Nathan Chancellor <nathan(a)kernel.org>
hexagon: Disable constant extender optimization for LLVM prior to 19.1.0
James Bottomley <James.Bottomley(a)HansenPartnership.com>
efivarfs: Fix error on non-existent file
Geert Uytterhoeven <geert+renesas(a)glider.be>
i2c: riic: Always round-up when calculating bus period
Dan Carpenter <dan.carpenter(a)linaro.org>
chelsio/chtls: prevent potential integer overflow on 32bit
Eric Dumazet <edumazet(a)google.com>
net: tun: fix tun_napi_alloc_frags()
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
Borislav Petkov (AMD) <bp(a)alien8.de>
EDAC/amd64: Simplify ECC check on unified memory controllers
Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
mmc: mtk-sd: disable wakeup in .remove() and in the error path of .probe()
Prathamesh Shete <pshete(a)nvidia.com>
mmc: sdhci-tegra: Remove SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC quirk
Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
net: mdiobus: fix an OF node reference leak
Adrian Moreno <amorenoz(a)redhat.com>
selftests: openvswitch: fix tcpdump execution
Phil Sutter <phil(a)nwl.cc>
netfilter: ipset: Fix for recursive locking warning
David Laight <David.Laight(a)ACULAB.COM>
ipvs: Fix clamp() of ip_vs_conn_tab on small memory systems
Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
net: ethernet: bgmac-platform: fix an OF node reference leak
Dan Carpenter <dan.carpenter(a)linaro.org>
net: hinic: Fix cleanup in create_rxqs/txqs()
Marios Makassikis <mmakassikis(a)freebox.fr>
ksmbd: fix broken transfers when exceeding max simultaneous operations
Marios Makassikis <mmakassikis(a)freebox.fr>
ksmbd: count all requests in req_running counter
Nikita Yushchenko <nikita.yoush(a)cogentembedded.com>
net: renesas: rswitch: rework ts tags management
Shannon Nelson <shannon.nelson(a)amd.com>
ionic: use ee->offset when returning sprom data
Brett Creeley <brett.creeley(a)amd.com>
ionic: Fix netdev notifier unregister on failure
Eric Dumazet <edumazet(a)google.com>
netdevsim: prevent bad user input in nsim_dev_health_break_write()
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: mscc: ocelot: fix incorrect IFH SRC_PORT field in ocelot_ifh_set_basic()
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: check return value of sock_recvmsg when draining clc data
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: check smcd_v2_ext_offset when receiving proposal msg
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: check v2_ext_offset/eid_cnt/ism_gid_cnt when receiving proposal msg
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: check iparea_offset and ipv6_prefixes_cnt when receiving proposal msg
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: check sndbuf_space again after NOSPACE flag is set in smc_poll
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: protect link down work from execute after lgr freed
Huaisheng Ye <huaisheng.ye(a)intel.com>
cxl/region: Fix region creation for greater than x2 switches
Davidlohr Bueso <dave(a)stgolabs.net>
cxl/pci: Fix potential bogus return value upon successful probing
Olaf Hering <olaf(a)aepfle.de>
tools: hv: change permissions of NetworkManager configuration file
Darrick J. Wong <djwong(a)kernel.org>
xfs: reset rootdir extent size hint after growfsrt
Darrick J. Wong <djwong(a)kernel.org>
xfs: take m_growlock when running growfsrt
Darrick J. Wong <djwong(a)kernel.org>
xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
Zizhi Wo <wozizhi(a)huawei.com>
xfs: Fix the owner setting issue for rmap query in xfs fsmap
Darrick J. Wong <djwong(a)kernel.org>
xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set
Darrick J. Wong <djwong(a)kernel.org>
xfs: attr forks require attr, not attr2
Julian Sun <sunjunchao2870(a)gmail.com>
xfs: remove unused parameter in macro XFS_DQUOT_LOGRES
Darrick J. Wong <djwong(a)kernel.org>
xfs: fix file_path handling in tracepoints
Chen Ni <nichen(a)iscas.ac.cn>
xfs: convert comma to semicolon
lei lu <llfamsec(a)gmail.com>
xfs: don't walk off the end of a directory data block
John Garry <john.g.garry(a)oracle.com>
xfs: Fix xfs_prepare_shift() range for RT
John Garry <john.g.garry(a)oracle.com>
xfs: Fix xfs_flush_unmap_range() range for RT
Darrick J. Wong <djwong(a)kernel.org>
xfs: create a new helper to return a file's allocation unit
Darrick J. Wong <djwong(a)kernel.org>
xfs: declare xfs_file.c symbols in xfs_file.h
Darrick J. Wong <djwong(a)kernel.org>
xfs: use consistent uid/gid when grabbing dquots for inodes
Darrick J. Wong <djwong(a)kernel.org>
xfs: verify buffer, inode, and dquot items every tx commit
Christoph Hellwig <hch(a)lst.de>
xfs: fix the contact address for the sysfs ABI documentation
Vladimir Riabchun <ferr.lambarginio(a)gmail.com>
i2c: pnx: Fix timeout in wait functions
Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
p2sb: Do not scan and remove the P2SB device when it is unhidden
Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
p2sb: Move P2SB hide and unhide code to p2sb_scan_and_cache()
Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
p2sb: Introduce the global flag p2sb_hidden_by_bios
Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
p2sb: Factor out p2sb_read_from_cache()
Hans de Goede <hdegoede(a)redhat.com>
platform/x86: p2sb: Make p2sb_get_devfn() return void
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
net: stmmac: fix TSO DMA API usage causing oops
Roger Quadros <rogerq(a)kernel.org>
usb: cdns3: Add quirk flag to enable suspend residency
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
PCI/AER: Disable AER service on suspend
Vidya Sagar <vidyas(a)nvidia.com>
PCI: Use preserve_config in place of pci_flags
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: Intel: sof_sdw: add quirk for Dell SKU 0B8C
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: Intel: sof_sdw: fix jack detection on ADL-N variant RVP
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
MIPS: Loongson64: DTS: Fix msi node for ls7a
Roger Quadros <rogerq(a)kernel.org>
usb: cdns3-ti: Add workaround for Errata i2409
Ajit Khaparde <ajit.khaparde(a)broadcom.com>
PCI: Add ACS quirk for Broadcom BCM5760X NIC
Jiwei Sun <sunjw10(a)lenovo.com>
PCI: vmd: Create domain symlink before pci_bus_add_devices()
Peng Hongchi <hongchi.peng(a)siengine.com>
usb: dwc2: gadget: Don't write invalid mapped sg entries into dma_desc with iommu enabled
Lion Ackermann <nnamrec(a)gmail.com>
net: sched: fix ordering of qlen adjustment
-------------
Diffstat:
Documentation/ABI/testing/sysfs-fs-xfs | 8 +-
Makefile | 4 +-
arch/hexagon/Makefile | 6 +
.../boot/dts/loongson/loongson64g_4core_ls7a.dts | 1 +
arch/x86/kvm/cpuid.c | 31 +++-
arch/x86/kvm/cpuid.h | 1 +
arch/x86/kvm/x86.c | 4 +-
drivers/block/zram/zram_drv.c | 15 +-
drivers/cxl/core/region.c | 25 ++-
drivers/cxl/pci.c | 3 +-
drivers/dma-buf/udmabuf.c | 2 +-
drivers/edac/amd64_edac.c | 32 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +-
drivers/gpu/drm/drm_modes.c | 11 +-
drivers/gpu/drm/i915/gt/intel_engine_types.h | 5 +
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 41 ++++-
drivers/gpu/drm/panel/panel-novatek-nt35950.c | 4 +-
drivers/hv/hv_kvp.c | 6 +
drivers/hv/hv_snapshot.c | 6 +
drivers/hv/hv_util.c | 9 +
drivers/hv/hyperv_vmbus.h | 2 +
drivers/hwmon/tmp513.c | 74 ++++----
drivers/i2c/busses/i2c-pnx.c | 4 +-
drivers/i2c/busses/i2c-riic.c | 2 +-
drivers/mmc/host/mtk-sd.c | 2 +
drivers/mmc/host/sdhci-tegra.c | 1 -
drivers/net/ethernet/broadcom/bgmac-platform.c | 5 +-
.../chelsio/inline_crypto/chtls/chtls_main.c | 5 +-
drivers/net/ethernet/freescale/fec_ptp.c | 11 +-
drivers/net/ethernet/huawei/hinic/hinic_main.c | 2 +
drivers/net/ethernet/mscc/ocelot.c | 2 +-
.../net/ethernet/pensando/ionic/ionic_ethtool.c | 4 +-
drivers/net/ethernet/pensando/ionic/ionic_lif.c | 4 +-
drivers/net/ethernet/renesas/rswitch.c | 68 +++----
drivers/net/ethernet/renesas/rswitch.h | 13 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +-
drivers/net/mdio/fwnode_mdio.c | 13 +-
drivers/net/netdevsim/health.c | 2 +
drivers/net/tun.c | 2 +-
drivers/of/address.c | 2 +-
drivers/of/base.c | 15 +-
drivers/of/irq.c | 2 +
drivers/pci/controller/pci-host-common.c | 4 -
drivers/pci/controller/vmd.c | 8 +-
drivers/pci/pcie/aer.c | 18 ++
drivers/pci/probe.c | 22 ++-
drivers/pci/quirks.c | 4 +
drivers/platform/x86/p2sb.c | 94 ++++++----
drivers/thunderbolt/tb.c | 41 +++++
drivers/usb/cdns3/cdns3-ti.c | 15 +-
drivers/usb/cdns3/core.h | 1 +
drivers/usb/cdns3/drd.c | 10 +-
drivers/usb/cdns3/drd.h | 3 +
drivers/usb/dwc2/gadget.c | 4 +-
drivers/usb/serial/option.c | 27 +++
fs/btrfs/tree-checker.c | 27 ++-
fs/ceph/file.c | 34 ++--
fs/ceph/super.c | 2 +
fs/efivarfs/inode.c | 2 +-
fs/efivarfs/internal.h | 1 -
fs/efivarfs/super.c | 3 -
fs/eventpoll.c | 5 +-
fs/nfs/pnfs.c | 2 +-
fs/nilfs2/btnode.c | 1 +
fs/nilfs2/gcinode.c | 2 +-
fs/nilfs2/inode.c | 13 +-
fs/nilfs2/namei.c | 5 +
fs/nilfs2/nilfs.h | 1 +
fs/smb/client/connect.c | 36 ++--
fs/smb/server/connection.c | 18 +-
fs/smb/server/connection.h | 1 -
fs/smb/server/server.c | 7 +-
fs/smb/server/server.h | 1 +
fs/smb/server/transport_ipc.c | 5 +-
fs/xfs/Kconfig | 12 ++
fs/xfs/libxfs/xfs_dir2_data.c | 31 +++-
fs/xfs/libxfs/xfs_dir2_priv.h | 7 +
fs/xfs/libxfs/xfs_quota_defs.h | 2 +-
fs/xfs/libxfs/xfs_trans_resv.c | 28 +--
fs/xfs/scrub/agheader_repair.c | 2 +-
fs/xfs/scrub/bmap.c | 8 +-
fs/xfs/scrub/trace.h | 10 +-
fs/xfs/xfs.h | 4 +
fs/xfs/xfs_bmap_util.c | 22 ++-
fs/xfs/xfs_buf_item.c | 32 ++++
fs/xfs/xfs_dquot_item.c | 31 ++++
fs/xfs/xfs_file.c | 29 ++-
fs/xfs/xfs_file.h | 15 ++
fs/xfs/xfs_fsmap.c | 6 +-
fs/xfs/xfs_inode.c | 29 ++-
fs/xfs/xfs_inode.h | 2 +
fs/xfs/xfs_inode_item.c | 32 ++++
fs/xfs/xfs_ioctl.c | 12 ++
fs/xfs/xfs_iops.c | 1 +
fs/xfs/xfs_iops.h | 3 -
fs/xfs/xfs_rtalloc.c | 78 ++++++--
fs/xfs/xfs_symlink.c | 8 +-
include/linux/hyperv.h | 1 +
include/linux/io_uring.h | 4 +-
include/linux/wait.h | 1 +
io_uring/io_uring.c | 15 +-
io_uring/io_uring.h | 1 -
io_uring/rw.c | 31 +++-
kernel/trace/trace_events.c | 199 ++++++++++++++++-----
mm/vmalloc.c | 6 +-
net/netfilter/ipset/ip_set_list_set.c | 3 +
net/netfilter/ipvs/ip_vs_conn.c | 4 +-
net/sched/sch_cake.c | 2 +-
net/sched/sch_choke.c | 2 +-
net/smc/af_smc.c | 18 +-
net/smc/smc_clc.c | 17 +-
net/smc/smc_clc.h | 22 ++-
net/smc/smc_core.c | 9 +-
sound/soc/intel/boards/sof_sdw.c | 18 ++
tools/hv/hv_set_ifconfig.sh | 2 +-
tools/testing/selftests/bpf/sdt.h | 2 +
tools/testing/selftests/memfd/memfd_test.c | 14 +-
.../selftests/net/openvswitch/openvswitch.sh | 6 +-
119 files changed, 1223 insertions(+), 441 deletions(-)
This is the start of the stable review cycle for the 6.1.122 release.
There are 83 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 27 Dec 2024 15:53:30 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.122-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.122-rc1
Michel Dänzer <mdaenzer(a)redhat.com>
drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update
Francesco Dolcini <francesco.dolcini(a)toradex.com>
dt-bindings: net: fec: add pps channel property
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring/rw: avoid punting to io-wq directly
Jens Axboe <axboe(a)kernel.dk>
io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN
Jens Axboe <axboe(a)kernel.dk>
io_uring/rw: split io_read() into a helper
Xuewen Yan <xuewen.yan(a)unisoc.com>
epoll: Add synchronous wakeup support for ep_poll_callback
Jan Kara <jack(a)suse.cz>
udf: Fix directory iteration for longer tail extents
Ilya Dryomov <idryomov(a)gmail.com>
ceph: validate snapdirname option length when mounting
Zijun Hu <quic_zijuhu(a)quicinc.com>
of: Fix refcount leakage for OF node returned by __of_get_dma_parent()
Herve Codina <herve.codina(a)bootlin.com>
of: Fix error path in of_parse_phandle_with_args_map()
Jann Horn <jannh(a)google.com>
udmabuf: also check for F_SEAL_FUTURE_WRITE
Edward Adam Davis <eadavis(a)qq.com>
nilfs2: prevent use of deleted inode
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix buffer head leaks in calls to truncate_inode_pages()
Zijun Hu <quic_zijuhu(a)quicinc.com>
of/irq: Fix using uninitialized variable @addr_len in API of_irq_parse_one()
Zijun Hu <quic_zijuhu(a)quicinc.com>
of/irq: Fix interrupt-map cell length check in of_irq_parse_imap_parent()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS/pnfs: Fix a live lock between recalled layouts and layoutget
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring: check if iowq is killed before queuing
Jann Horn <jannh(a)google.com>
io_uring: Fix registered ring file refcount leak
Tiezhu Yang <yangtiezhu(a)loongson.cn>
selftests/bpf: Use asm constraint "m" for LoongArch
Steven Rostedt <rostedt(a)goodmis.org>
tracing: Add "%s" check in test_event_printk()
Steven Rostedt <rostedt(a)goodmis.org>
tracing: Add missing helper functions in event pointer dereference check
Steven Rostedt <rostedt(a)goodmis.org>
tracing: Fix test_event_printk() to process entire print argument
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Play nice with protected guests in complete_hypercall_exit()
Michael Kelley <mhklinux(a)outlook.com>
Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet
Qu Wenruo <wqu(a)suse.com>
btrfs: tree-checker: reject inline extent items with 0 ref count
Kairui Song <kasong(a)tencent.com>
zram: fix uninitialized ZRAM not releasing backing device
Kairui Song <kasong(a)tencent.com>
zram: refuse to use zero sized block device as backing device
Geert Uytterhoeven <geert+renesas(a)glider.be>
sh: clk: Fix clk_enable() to return 0 on NULL clk
Murad Masimov <m.masimov(a)maxima.ru>
hwmon: (tmp513) Fix interpretation of values of Temperature Result and Limit Registers
Murad Masimov <m.masimov(a)maxima.ru>
hwmon: (tmp513) Fix Current Register value interpretation
Murad Masimov <m.masimov(a)maxima.ru>
hwmon: (tmp513) Fix interpretation of values of Shunt Voltage and Limit Registers
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
hwmon: (tmp513) Use SI constants from units.h
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
hwmon: (tmp513) Simplify with dev_err_probe()
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
hwmon: (tmp513) Don't use "proxy" headers
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
drm/amdgpu: don't access invalid sched
Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
i915/guc: Accumulate active runtime on gt reset
Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
i915/guc: Ensure busyness counter increases motonically
Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
i915/guc: Reset engine utilization buffer before registration
Yang Yingliang <yangyingliang(a)huawei.com>
drm/panel: novatek-nt35950: fix return value check in nt35950_probe()
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/modes: Avoid divide by zero harder in drm_mode_vrefresh()
Mika Westerberg <mika.westerberg(a)linux.intel.com>
thunderbolt: Improve redrive mode handling
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit FE910C04 rmnet compositions
Jack Wu <wojackbb(a)gmail.com>
USB: serial: option: add MediaTek T7XX compositions
Mank Wang <mank.wang(a)netprisma.com>
USB: serial: option: add Netprisma LCUK54 modules for WWAN Ready
Michal Hrusecky <michal.hrusecky(a)turris.com>
USB: serial: option: add MeiG Smart SLM770A
Daniel Swanemar <d.swanemar(a)gmail.com>
USB: serial: option: add TCL IK512 MBIM & ECM
Nathan Chancellor <nathan(a)kernel.org>
hexagon: Disable constant extender optimization for LLVM prior to 19.1.0
James Bottomley <James.Bottomley(a)HansenPartnership.com>
efivarfs: Fix error on non-existent file
Geert Uytterhoeven <geert+renesas(a)glider.be>
i2c: riic: Always round-up when calculating bus period
Dan Carpenter <dan.carpenter(a)linaro.org>
chelsio/chtls: prevent potential integer overflow on 32bit
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
Prathamesh Shete <pshete(a)nvidia.com>
mmc: sdhci-tegra: Remove SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC quirk
Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
net: mdiobus: fix an OF node reference leak
Phil Sutter <phil(a)nwl.cc>
netfilter: ipset: Fix for recursive locking warning
Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
net: ethernet: bgmac-platform: fix an OF node reference leak
Dan Carpenter <dan.carpenter(a)linaro.org>
net: hinic: Fix cleanup in create_rxqs/txqs()
Shannon Nelson <shannon.nelson(a)amd.com>
ionic: use ee->offset when returning sprom data
Brett Creeley <brett.creeley(a)amd.com>
ionic: Fix netdev notifier unregister on failure
Eric Dumazet <edumazet(a)google.com>
netdevsim: prevent bad user input in nsim_dev_health_break_write()
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: mscc: ocelot: fix incorrect IFH SRC_PORT field in ocelot_ifh_set_basic()
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: check return value of sock_recvmsg when draining clc data
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: check smcd_v2_ext_offset when receiving proposal msg
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: check iparea_offset and ipv6_prefixes_cnt when receiving proposal msg
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: check sndbuf_space again after NOSPACE flag is set in smc_poll
Guangguan Wang <guangguan.wang(a)linux.alibaba.com>
net/smc: protect link down work from execute after lgr freed
Huaisheng Ye <huaisheng.ye(a)intel.com>
cxl/region: Fix region creation for greater than x2 switches
Vladimir Riabchun <ferr.lambarginio(a)gmail.com>
i2c: pnx: Fix timeout in wait functions
Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
p2sb: Do not scan and remove the P2SB device when it is unhidden
Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
p2sb: Move P2SB hide and unhide code to p2sb_scan_and_cache()
Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
p2sb: Introduce the global flag p2sb_hidden_by_bios
Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
p2sb: Factor out p2sb_read_from_cache()
Hans de Goede <hdegoede(a)redhat.com>
platform/x86: p2sb: Make p2sb_get_devfn() return void
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
PCI: Introduce pci_resource_n()
Peng Hongchi <hongchi.peng(a)siengine.com>
usb: dwc2: gadget: Don't write invalid mapped sg entries into dma_desc with iommu enabled
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
MIPS: Loongson64: DTS: Fix msi node for ls7a
Ajit Khaparde <ajit.khaparde(a)broadcom.com>
PCI: Add ACS quirk for Broadcom BCM5760X NIC
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: Intel: sof_sdw: add quirk for Dell SKU 0B8C
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: Intel: sof_sdw: fix jack detection on ADL-N variant RVP
Roger Quadros <rogerq(a)kernel.org>
usb: cdns3: Add quirk flag to enable suspend residency
Jiwei Sun <sunjw10(a)lenovo.com>
PCI: vmd: Create domain symlink before pci_bus_add_devices()
Vidya Sagar <vidyas(a)nvidia.com>
PCI: Use preserve_config in place of pci_flags
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
PCI/AER: Disable AER service on suspend
Lion Ackermann <nnamrec(a)gmail.com>
net: sched: fix ordering of qlen adjustment
-------------
Diffstat:
Documentation/devicetree/bindings/net/fsl,fec.yaml | 7 +
Makefile | 4 +-
arch/hexagon/Makefile | 6 +
.../boot/dts/loongson/loongson64g_4core_ls7a.dts | 1 +
arch/x86/kvm/cpuid.c | 31 +++-
arch/x86/kvm/cpuid.h | 1 +
arch/x86/kvm/x86.c | 4 +-
drivers/block/zram/zram_drv.c | 15 +-
drivers/cxl/core/region.c | 25 ++-
drivers/dma-buf/udmabuf.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +-
drivers/gpu/drm/drm_modes.c | 11 +-
drivers/gpu/drm/i915/gt/intel_engine_types.h | 5 +
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 41 ++++-
drivers/gpu/drm/panel/panel-novatek-nt35950.c | 4 +-
drivers/hv/hv_kvp.c | 6 +
drivers/hv/hv_snapshot.c | 6 +
drivers/hv/hv_util.c | 9 +
drivers/hv/hyperv_vmbus.h | 2 +
drivers/hwmon/tmp513.c | 74 ++++----
drivers/i2c/busses/i2c-pnx.c | 4 +-
drivers/i2c/busses/i2c-riic.c | 2 +-
drivers/mmc/host/sdhci-tegra.c | 1 -
drivers/net/ethernet/broadcom/bgmac-platform.c | 5 +-
.../chelsio/inline_crypto/chtls/chtls_main.c | 5 +-
drivers/net/ethernet/huawei/hinic/hinic_main.c | 2 +
drivers/net/ethernet/mscc/ocelot.c | 2 +-
.../net/ethernet/pensando/ionic/ionic_ethtool.c | 4 +-
drivers/net/ethernet/pensando/ionic/ionic_lif.c | 4 +-
drivers/net/mdio/fwnode_mdio.c | 13 +-
drivers/net/netdevsim/health.c | 2 +
drivers/of/address.c | 2 +-
drivers/of/base.c | 15 +-
drivers/of/irq.c | 2 +
drivers/pci/controller/pci-host-common.c | 4 -
drivers/pci/controller/vmd.c | 8 +-
drivers/pci/pcie/aer.c | 18 ++
drivers/pci/probe.c | 22 ++-
drivers/pci/quirks.c | 4 +
drivers/platform/x86/p2sb.c | 94 ++++++----
drivers/sh/clk/core.c | 2 +-
drivers/thunderbolt/tb.c | 41 +++++
drivers/usb/cdns3/core.h | 1 +
drivers/usb/cdns3/drd.c | 10 +-
drivers/usb/cdns3/drd.h | 3 +
drivers/usb/dwc2/gadget.c | 4 +-
drivers/usb/serial/option.c | 27 +++
fs/btrfs/tree-checker.c | 27 ++-
fs/ceph/super.c | 2 +
fs/efivarfs/inode.c | 2 +-
fs/efivarfs/internal.h | 1 -
fs/efivarfs/super.c | 3 -
fs/eventpoll.c | 5 +-
fs/nfs/pnfs.c | 2 +-
fs/nilfs2/btnode.c | 1 +
fs/nilfs2/gcinode.c | 2 +-
fs/nilfs2/inode.c | 13 +-
fs/nilfs2/namei.c | 5 +
fs/nilfs2/nilfs.h | 1 +
fs/udf/directory.c | 2 +-
include/linux/hyperv.h | 1 +
include/linux/io_uring.h | 4 +-
include/linux/pci.h | 15 +-
include/linux/wait.h | 1 +
io_uring/io_uring.c | 13 +-
io_uring/io_uring.h | 1 -
io_uring/rw.c | 31 +++-
kernel/trace/trace_events.c | 199 ++++++++++++++++-----
net/netfilter/ipset/ip_set_list_set.c | 3 +
net/sched/sch_cake.c | 2 +-
net/sched/sch_choke.c | 2 +-
net/smc/af_smc.c | 15 +-
net/smc/smc_clc.c | 9 +
net/smc/smc_clc.h | 14 +-
net/smc/smc_core.c | 9 +-
sound/soc/intel/boards/sof_sdw.c | 18 ++
tools/testing/selftests/bpf/sdt.h | 2 +
78 files changed, 734 insertions(+), 236 deletions(-)
The patch titled
Subject: mm/hugetlb: fix avoid_reserve to allow taking folio from subpool
has been added to the -mm mm-unstable branch. Its filename is
mm-hugetlb-fix-avoid_reserve-to-allow-taking-folio-from-subpool.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix avoid_reserve to allow taking folio from subpool
Date: Tue, 7 Jan 2025 15:39:56 -0500
Patch series "mm/hugetlb: Refactor hugetlb allocation resv accounting",
v2.
This is a follow-up to Ackerley's series below, posted as a replacement:
https://lore.kernel.org/r/cover.1728684491.git.ackerleytng@google.com
The goal of this series is to clean up hugetlb resv accounting, especially
during folio allocation, to decouple a few things:
- Hugetlb folios vs. hugetlbfs: in other words, the hope is that in the
  future hugetlb folios can be allocated entirely without hugetlbfs.
- VMA vs. hugetlb folio allocations: allocating a hugetlb folio should not
  always require a hugetlbfs VMA. For example, a folio may be allocated at
  the inode level (see hugetlbfs_fallocate(), which uses a pseudo VMA for
  allocation), or it may be allocated by other kernel subsystems.
It paves the way for other users to allocate hugetlb folios out of either
system reservations or subpools (instead of hugetlbfs, as a file system).
In the longer term, this prepares hugetlb as a concept separate from
hugetlbfs, so that hugetlb folios can be allocated not only by hugetlbfs
but also by other users.
Tests I've done:
- Patch 1 includes a reproducer for the bug I found; it behaves correctly
  once patch 1, or the whole set, is applied.
- Hugetlb regression tests (on x86_64, 2MB pages), including:
  - All vmtests on hugetlbfs
  - The libhugetlbfs test suite (some tests may fail, but this series
    introduces no new failures; all such failures already occur without
    this series, so they should not be relevant).
This patch (of 7):
Commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a
process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed")
introduced avoid_reserve for a special case of CoW on hugetlb private
mappings: it applies only when the owner VMA is trying to allocate yet
another hugetlb folio that is not reserved within the private vma reserved
map.
Later, commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle
areas hole punched by fallocate") made alloc_huge_page() refuse to
consume any global reservation whenever avoid_reserve=true. This does not
look correct: although it forces the allocation not to use the global
reservation, it still tries to take one reservation from the spool (if a
subpool exists). Since the spool's reserved pages are themselves taken from
the global reservation, this also takes one reservation globally.
Logically, this can cause the global reservation count to go wrong.
The reproducer below triggers this special path. Every run of the program
causes the global reservation count to increase by one, until it reaches
the number of free pages:
  #define _GNU_SOURCE             /* See feature_test_macros(7) */
  #include <stdio.h>
  #include <fcntl.h>
  #include <errno.h>
  #include <unistd.h>
  #include <stdlib.h>
  #include <sys/mman.h>

  #define  MSIZE  (2UL << 20)

  int main(int argc, char *argv[])
  {
      const char *path;
      int *buf;
      int fd, ret;
      pid_t child;

      if (argc < 2) {
          printf("usage: %s <hugetlb_file>\n", argv[0]);
          return -1;
      }

      path = argv[1];

      fd = open(path, O_RDWR | O_CREAT, 0666);
      if (fd < 0) {
          perror("open failed");
          return -1;
      }

      ret = fallocate(fd, 0, 0, MSIZE);
      if (ret != 0) {
          perror("fallocate");
          return -1;
      }

      buf = mmap(NULL, MSIZE, PROT_READ|PROT_WRITE,
                 MAP_PRIVATE, fd, 0);
      if (buf == MAP_FAILED) {
          perror("mmap() failed");
          return -1;
      }

      /* Allocate a page */
      *buf = 1;

      child = fork();
      if (child == 0) {
          /* child doesn't need to do anything */
          exit(0);
      }

      /* Trigger CoW from owner */
      *buf = 2;

      munmap(buf, MSIZE);
      close(fd);
      unlink(path);

      return 0;
  }
The bug reproduces only with a sub-mount that has reserved pages on the
spool, for example:
  # sysctl vm.nr_hugepages=128
  # mkdir ./hugetlb-pool
  # mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool
Then run the reproducer on the mountpoint:
  # ./reproducer ./hugetlb-pool/test
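One way to watch the effect described above is to compare the HugePages_Rsvd
line of /proc/meminfo before and after each run of the reproducer. The
following is a minimal sketch, not part of the patch: the helper names
meminfo_field() and read_hugepages_rsvd() are hypothetical and exist only
for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): scan meminfo-style text from
 * fp and return the numeric value on the line that starts with "<key>:".
 * Returns -1 if the key is not found.
 */
long meminfo_field(FILE *fp, const char *key)
{
    char line[256];
    size_t klen = strlen(key);

    while (fgets(line, sizeof(line), fp)) {
        long val;

        if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
            sscanf(line + klen + 1, "%ld", &val) == 1)
            return val;
    }
    return -1;
}

/* Read the global hugetlb reservation count; returns -1 on failure. */
long read_hugepages_rsvd(void)
{
    FILE *fp = fopen("/proc/meminfo", "r");
    long val;

    if (!fp)
        return -1;
    val = meminfo_field(fp, "HugePages_Rsvd");
    fclose(fp);
    return val;
}
```

Calling read_hugepages_rsvd() before and after a run of the reproducer and
comparing the two values should show whether the count has drifted.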
Fix it by taking the reservation from the spool if available. In general,
avoid_reserve is IMHO more about "avoid the vma resv map", not the spool's
reservations.
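The accounting mismatch can be illustrated with a deliberately simplified
model. This is a sketch under stated assumptions, not the kernel's code:
the toy_* names and the struct are hypothetical, each subpool reservation
is assumed to be backed by one global reservation, and a global reservation
is assumed to be consumed exactly when gbl_chg == 0. The "patched" flag
selects whether the avoid_reserve override that this patch removes is
applied.

```c
#include <stdbool.h>

/* Toy state: NOT the kernel's struct hstate / struct hugepage_subpool. */
struct toy_state {
    long spool_rsv;   /* reservations held by the subpool */
    long global_rsv;  /* global reservation count */
};

/* Take one page from the subpool; 0 means a subpool reservation was used. */
long toy_subpool_get_page(struct toy_state *s)
{
    if (s->spool_rsv > 0) {
        s->spool_rsv--;
        return 0;
    }
    return 1;
}

/*
 * Model one allocation that has no VMA reservation.
 * Returns the gbl_chg value that the dequeue step would see.
 */
long toy_alloc(struct toy_state *s, bool avoid_reserve, bool patched)
{
    long gbl_chg = toy_subpool_get_page(s);

    /* Pre-patch behaviour: ignore the subpool reservation just taken. */
    if (!patched && avoid_reserve)
        gbl_chg = 1;

    /* gbl_chg == 0: a reservation exists, so consume it globally too. */
    if (gbl_chg == 0)
        s->global_rsv--;

    return gbl_chg;
}
```

In this toy model, the pre-patch path with avoid_reserve set gives up a
subpool reservation while leaving the global count untouched, so the two
counters drift apart; with the override removed, both are decremented
together.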
I copied stable; however, I do not intend to backport this if it is not a
clean cherry-pick, because a private hugetlb mapping followed by fork() is
too rare to hit.
Link: https://lkml.kernel.org/r/20250107204002.2683356-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20250107204002.2683356-2-peterx@redhat.com
Fixes: d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng(a)google.com>
Tested-by: Ackerley Tng <ackerleytng(a)google.com>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 22 +++-------------------
1 file changed, 3 insertions(+), 19 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-avoid_reserve-to-allow-taking-folio-from-subpool
+++ a/mm/hugetlb.c
@@ -1394,8 +1394,7 @@ static unsigned long available_huge_page
static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
struct vm_area_struct *vma,
- unsigned long address, int avoid_reserve,
- long chg)
+ unsigned long address, long chg)
{
struct folio *folio = NULL;
struct mempolicy *mpol;
@@ -1411,10 +1410,6 @@ static struct folio *dequeue_hugetlb_fol
if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
goto err;
- /* If reserves cannot be used, ensure enough pages are in the pool */
- if (avoid_reserve && !available_huge_pages(h))
- goto err;
-
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
@@ -1430,7 +1425,7 @@ static struct folio *dequeue_hugetlb_fol
folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
nid, nodemask);
- if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+ if (folio && vma_has_reserves(vma, chg)) {
folio_set_hugetlb_restore_reserve(folio);
h->resv_huge_pages--;
}
@@ -3047,17 +3042,6 @@ struct folio *alloc_hugetlb_folio(struct
gbl_chg = hugepage_subpool_get_pages(spool, 1);
if (gbl_chg < 0)
goto out_end_reservation;
-
- /*
- * Even though there was no reservation in the region/reserve
- * map, there could be reservations associated with the
- * subpool that can be used. This would be indicated if the
- * return value of hugepage_subpool_get_pages() is zero.
- * However, if avoid_reserve is specified we still avoid even
- * the subpool reservations.
- */
- if (avoid_reserve)
- gbl_chg = 1;
}
/* If this allocation is not consuming a reservation, charge it now.
@@ -3080,7 +3064,7 @@ struct folio *alloc_hugetlb_folio(struct
* from the global free pool (global change). gbl_chg == 0 indicates
* a reservation exists for the allocation.
*/
- folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+ folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
if (!folio) {
spin_unlock_irq(&hugetlb_lock);
folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-hugetlb-fix-avoid_reserve-to-allow-taking-folio-from-subpool.patch
mm-hugetlb-stop-using-avoid_reserve-flag-in-fork.patch
mm-hugetlb-rename-avoid_reserve-to-cow_from_owner.patch
mm-hugetlb-clean-up-map-global-resv-accounting-when-allocate.patch
mm-hugetlb-simplify-vma_has_reserves.patch
mm-hugetlb-drop-vma_has_reserves.patch
mm-hugetlb-unify-restore-reserve-accounting-for-new-allocations.patch