- Linux-stable-mirror - lists.linaro.org


[PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE

by Yosry Ahmed

KVM currently fails a nested VMRUN and injects VMEXIT_INVALID (aka
SVM_EXIT_ERR) if L1 sets NP_ENABLE and the host does not support NPTs.
At first glance, it seems like the check should actually be for
guest_cpu_cap_has(X86_FEATURE_NPT) instead, as it is possible for the
host to support NPTs but the guest CPUID to not advertise it.

However, the consistency check is not architectural to begin with. The
APM does not mention VMEXIT_INVALID if NP_ENABLE is set on a processor
that does not have X86_FEATURE_NPT. Hence, NP_ENABLE should be ignored if
X86_FEATURE_NPT is not available for L1. Apart from the consistency
check, this is currently the case because NP_ENABLE is actually copied
from VMCB01 to VMCB02, not from VMCB12.

On the other hand, the APM does mention two other consistency checks for
NP_ENABLE, both of which are missing (paraphrased):

In Volume #2, 15.25.3 (24593—Rev. 3.42—March 2024):

  If VMRUN is executed with hCR0.PG cleared to zero and NP_ENABLE set to
  1, VMRUN terminates with #VMEXIT(VMEXIT_INVALID)

In Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):

  When VMRUN is executed with nested paging enabled (NP_ENABLE = 1), the
  following conditions are considered illegal state combinations, in
  addition to those mentioned in “Canonicalization and Consistency
  Checks”:
    • Any MBZ bit of nCR3 is set.
    • Any G_PAT.PA field has an unsupported type encoding or any
      reserved field in G_PAT has a nonzero value.

Replace the existing consistency check with consistency checks on
hCR0.PG and nCR3. Only perform the consistency checks if L1 has
X86_FEATURE_NPT and NP_ENABLE is set in VMCB12. The G_PAT consistency
check will be addressed separately.

As it is now possible for an L1 to run L2 with NP_ENABLE set but
ignored, also check that L1 has X86_FEATURE_NPT in nested_npt_enabled().

Pass L1's CR0 to __nested_vmcb_check_controls(). In
nested_vmcb_check_controls(), L1's CR0 is available through
kvm_read_cr0(), as vcpu->arch.cr0 is not updated to L2's CR0 until later
through nested_vmcb02_prepare_save() -> svm_set_cr0().

In svm_set_nested_state(), L1's CR0 is available in the captured save
area, as svm_get_nested_state() captures L1's save area when running L2,
and L1's CR0 is stashed in VMCB01 on nested VMRUN (in
nested_svm_vmrun()).

Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
 arch/x86/kvm/svm/nested.c | 21 ++++++++++++++++-----
 arch/x86/kvm/svm/svm.h    |  3 ++-
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 74211c5c68026..87bcc5eff96e8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -325,7 +325,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
 }

 static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
-					 struct vmcb_ctrl_area_cached *control)
+					 struct vmcb_ctrl_area_cached *control,
+					 unsigned long l1_cr0)
 {
 	if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
 		return false;
@@ -333,8 +334,12 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 	if (CC(control->asid == 0))
 		return false;

-	if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
-		return false;
+	if (nested_npt_enabled(to_svm(vcpu))) {
+		if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
+			return false;
+		if (CC(!(l1_cr0 & X86_CR0_PG)))
+			return false;
+	}

 	if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
 					   MSRPM_SIZE)))
@@ -400,7 +405,12 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;

-	return __nested_vmcb_check_controls(vcpu, ctl);
+	/*
+	 * Make sure we did not enter guest mode yet, in which case
+	 * kvm_read_cr0() could return L2's CR0.
+	 */
+	WARN_ON_ONCE(is_guest_mode(vcpu));
+	return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
 }

 static
@@ -1831,7 +1841,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,

 	ret = -EINVAL;
 	__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
-	if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
+	/* 'save' contains L1 state saved from before VMRUN */
+	if (!__nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
 		goto out_free;

 	/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f6fb70ddf7272..3e805a43ffcdb 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)

 static inline bool nested_npt_enabled(struct vcpu_svm *svm)
 {
-	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
+	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
+	       svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
 }

 static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
-- 
2.51.2.1041.gc1ab5b90ca-goog
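
The decision order described in the patch above (ignore NP_ENABLE when L1 was not offered NPT, otherwise check nCR3 and hCR0.PG) can be modeled in a small standalone C sketch. This is not KVM code: the struct, the function names, and the address-width test (a rough stand-in for kvm_vcpu_is_legal_gpa()) are assumptions made purely for illustration. Only the bit positions of CR0.PG (bit 31) and NP_ENABLE (bit 0 of the nested control field) are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PG           (1ULL << 31) /* CR0.PG: paging enabled */
#define NESTED_CTL_NP_EN (1ULL << 0)  /* NP_ENABLE in the VMCB nested control field */

/* Hypothetical, simplified view of the state the checks look at. */
struct nested_vmrun_state {
	bool     l1_has_npt;  /* does L1's CPUID advertise NPT? */
	uint64_t nested_ctl;  /* nested control field from VMCB12 */
	uint64_t nested_cr3;  /* nCR3 from VMCB12 */
	uint64_t l1_cr0;      /* hCR0, i.e. L1's CR0 at the time of VMRUN */
	unsigned maxphyaddr;  /* guest physical address width in bits */
};

/* NP_ENABLE only takes effect if L1 was offered NPT at all. */
static bool nested_npt_in_use(const struct nested_vmrun_state *s)
{
	return s->l1_has_npt && (s->nested_ctl & NESTED_CTL_NP_EN);
}

/* Returns true if VMRUN may proceed, false where VMEXIT_INVALID is expected. */
static bool nested_npt_checks_pass(const struct nested_vmrun_state *s)
{
	if (!nested_npt_in_use(s))
		return true; /* NP_ENABLE is ignored, nothing to check */

	/* Stand-in for the MBZ check: no nCR3 bits above the address width. */
	if (s->maxphyaddr < 64 && (s->nested_cr3 >> s->maxphyaddr))
		return false;

	/* hCR0.PG clear with NP_ENABLE set is an illegal combination. */
	if (!(s->l1_cr0 & CR0_PG))
		return false;

	return true;
}
```

Note that in this model a guest without NPT in its CPUID passes regardless of NP_ENABLE, which mirrors the commit message's point that the bit should be ignored rather than rejected.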

7 hours, 32 minutes


[merged mm-nonmm-stable] ocfs2-fix-kernel-bug-in-ocfs2_find_victim_chain.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: ocfs2: fix kernel BUG in ocfs2_find_victim_chain
has been removed from the -mm tree.  Its filename was
     ocfs2-fix-kernel-bug-in-ocfs2_find_victim_chain.patch

This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Prithvi Tambewagh <activprithvi(a)gmail.com>
Subject: ocfs2: fix kernel BUG in ocfs2_find_victim_chain
Date: Mon, 1 Dec 2025 18:37:11 +0530

syzbot reported a kernel BUG in ocfs2_find_victim_chain() because the
`cl_next_free_rec` field of the allocation chain list (next free slot in
the chain list) is 0, triggering the BUG_ON(!cl->cl_next_free_rec)
condition in ocfs2_find_victim_chain() and panicking the kernel.

To fix this, an if condition is introduced in ocfs2_claim_suballoc_bits(),
just before calling ocfs2_find_victim_chain(). Its body is executed when
either of the following conditions is true:

1. `cl_next_free_rec` is equal to 0, indicating that there are no free
   chains in the allocation chain list
2. `cl_next_free_rec` is greater than `cl_count` (the total number of
   chains in the allocation chain list)

Either condition being true indicates that there are no chains left to
use.

This is handled with ocfs2_error(), which logs the error for debugging
purposes rather than panicking the kernel.

Link: https://lkml.kernel.org/r/20251201130711.143900-1-activprithvi@gmail.com
Signed-off-by: Prithvi Tambewagh <activprithvi(a)gmail.com>
Reported-by: syzbot+96d38c6e1655c1420a72(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96d38c6e1655c1420a72
Tested-by: syzbot+96d38c6e1655c1420a72(a)syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Heming Zhao <heming.zhao(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/ocfs2/suballoc.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/fs/ocfs2/suballoc.c~ocfs2-fix-kernel-bug-in-ocfs2_find_victim_chain
+++ a/fs/ocfs2/suballoc.c
@@ -1993,6 +1993,16 @@ static int ocfs2_claim_suballoc_bits(str
 	}

 	cl = (struct ocfs2_chain_list *) &fe->id2.i_chain;
+	if (!le16_to_cpu(cl->cl_next_free_rec) ||
+	    le16_to_cpu(cl->cl_next_free_rec) > le16_to_cpu(cl->cl_count)) {
+		status = ocfs2_error(ac->ac_inode->i_sb,
+				     "Chain allocator dinode %llu has invalid next "
+				     "free chain record %u, but only %u total\n",
+				     (unsigned long long)le64_to_cpu(fe->i_blkno),
+				     le16_to_cpu(cl->cl_next_free_rec),
+				     le16_to_cpu(cl->cl_count));
+		goto bail;
+	}

 	victim = ocfs2_find_victim_chain(cl);
 	ac->ac_chain = victim;
_

Patches currently in -mm which might be from activprithvi(a)gmail.com are
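
The validity condition from the patch above can be restated as a tiny standalone C predicate. This is an illustrative sketch only: struct chain_list_model and chain_list_is_sane() are hypothetical names, and the model keeps the two fields in host byte order, whereas the real on-disk fields are little-endian and go through le16_to_cpu().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the two chain-list fields involved. */
struct chain_list_model {
	uint16_t cl_count;         /* total number of chain records */
	uint16_t cl_next_free_rec; /* next free slot in the chain list */
};

/*
 * Mirrors the new check: the list is only usable when the next-free
 * index is nonzero and does not exceed the total record count.
 */
static bool chain_list_is_sane(const struct chain_list_model *cl)
{
	return cl->cl_next_free_rec != 0 &&
	       cl->cl_next_free_rec <= cl->cl_count;
}
```

A caller would bail out with an error when the predicate is false instead of proceeding to the victim-chain search, which is the behavior change the commit message describes.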

8 hours, 1 minute


[merged mm-stable] mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
has been removed from the -mm tree.  Its filename was
     mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: "David Hildenbrand (Red Hat)" <david(a)kernel.org>
Subject: mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
Date: Fri, 5 Dec 2025 22:35:58 +0100

As reported, ever since commit 1013af4f585f ("mm/hugetlb: fix
huge_pmd_unshare() vs GUP-fast race") we can end up in some situations
where we perform so many IPI broadcasts when unsharing hugetlb PMD page
tables that it severely regresses some workloads.

In particular, when we fork()+exit(), or when we munmap() a large
area backed by many shared PMD tables, we perform one IPI broadcast per
unshared PMD table.

There are two optimizations to be had:

(1) When we process (unshare) multiple such PMD tables, such as during
    exit(), it is sufficient to send a single IPI broadcast (as long as
    we respect locking rules) instead of one per PMD table.

    Locking prevents any of these PMD tables from being reused before we
    drop the lock.

(2) When we are not the last sharer (> 2 users including us), there is
    no need to send the IPI broadcast. The shared PMD tables cannot
    become exclusive (fully unshared) before an IPI is broadcast by the
    last sharer.

    Concurrent GUP-fast could walk into a PMD table just before we
    unshared it. It could then succeed in grabbing a page from the
    shared page table even after munmap() etc succeeded (and suppressed
    an IPI). But there is no difference compared to GUP-fast just
    sleeping for a while after grabbing the page and re-enabling IRQs.

    Most importantly, GUP-fast will never walk into page tables that are
    no longer shared, because the last sharer will issue an IPI
    broadcast.

    (if ever required, checking whether the PUD changed in GUP-fast
     after grabbing the page like we do in the PTE case could handle
     this)

So let's rework PMD sharing TLB flushing + IPI sync to use the mmu_gather
infrastructure so we can implement these optimizations and demystify the
code at least a bit. Extend the mmu_gather infrastructure to be able to
deal with our special hugetlb PMD table sharing implementation.

We'll consolidate the handling for (full) unsharing of PMD tables in
tlb_unshare_pmd_ptdesc() and tlb_flush_unshared_tables(), and track
in "struct mmu_gather" whether we had (full) unsharing of PMD tables.

Because locking is very special (concurrent unsharing+reuse must be
prevented), we disallow deferring flushing to tlb_finish_mmu() and instead
require an explicit earlier call to tlb_flush_unshared_tables().

From hugetlb code, we call huge_pmd_unshare_flush() where we make sure
that the expected lock protecting us from concurrent unsharing+reuse is
still held.

Check with a VM_WARN_ON_ONCE() in tlb_finish_mmu() that
tlb_flush_unshared_tables() was properly called earlier.

Document it all properly.

Notes about tlb_remove_table_sync_one() interaction with unsharing:

There are two fairly tricky things:

(1) tlb_remove_table_sync_one() is a NOP on architectures without
    CONFIG_MMU_GATHER_RCU_TABLE_FREE.

    Here, the assumption is that the previous TLB flush would send an
    IPI to all relevant CPUs. Careful: some architectures like x86 only
    send IPIs to all relevant CPUs when tlb->freed_tables is set.

    The relevant architectures should be selecting
    MMU_GATHER_RCU_TABLE_FREE, but x86 might not do that in stable
    kernels and it might have been problematic before this patch.

    Also, the arch flushing behavior (independent of IPIs) is different
    when tlb->freed_tables is set. Do we have to enlighten them to also
    take care of tlb->unshared_tables? So far we didn't care, so
    hopefully we are fine. Of course, we could be setting
    tlb->freed_tables as well, but that might then unnecessarily flush
    too much, because the semantics of tlb->freed_tables are a bit
    fuzzy.

    This patch changes nothing in this regard.

(2) tlb_remove_table_sync_one() is not a NOP on architectures with
    CONFIG_MMU_GATHER_RCU_TABLE_FREE that actually don't need a sync.

    Take x86 as an example: in the common case (!pv, !X86_FEATURE_INVLPGB)
    we still issue IPIs during TLB flushes and don't actually need the
    second tlb_remove_table_sync_one().

    This optimization can be implemented on top of this, by checking
    e.g., in tlb_remove_table_sync_one() whether we really need IPIs.
    But as described in (1), it really must honor tlb->freed_tables then
    to send IPIs to all relevant CPUs.

Further note that the ptdesc_pmd_pts_dec() in huge_pmd_share() is not a
concern, as we are holding the i_mmap_lock the whole time, preventing
concurrent unsharing. That ptdesc_pmd_pts_dec() usage will be removed
separately as a cleanup later.

There are plenty more cleanups to be had, but they have to wait until
this is fixed.

Link: https://lkml.kernel.org/r/20251205213558.2980480-5-david@kernel.org
Fixes: 1013af4f585f ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race")
Signed-off-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
Reported-by: "Uschakow, Stanislav" <suschako(a)amazon.de>
Closes: https://lore.kernel.org/all/4d3878531c76479d9f8ca9789dc6485d@amazon.de/
Tested-by: Laurence Oberman <loberman(a)redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)kernel.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Liu Shixin <liushixin2(a)huawei.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Prakash Sangappa <prakash.sangappa(a)oracle.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Will Deacon <will(a)kernel.org>
Cc: Lance Yang <lance.yang(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 include/asm-generic/tlb.h |   69 ++++++++++++++++++++
 include/linux/hugetlb.h   |   19 +++--
 mm/hugetlb.c              |  121 ++++++++++++++++++++----------------
 mm/mmu_gather.c           |    6 +
 mm/mprotect.c             |    2
 mm/rmap.c                 |   25 +++++--
 6 files changed, 173 insertions(+), 69 deletions(-)

--- a/include/asm-generic/tlb.h~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/include/asm-generic/tlb.h
@@ -364,6 +364,17 @@ struct mmu_gather {
 	unsigned int vma_huge : 1;
 	unsigned int vma_pfn : 1;

+	/*
+	 * Did we unshare (unmap) any shared page tables?
+	 */
+	unsigned int unshared_tables : 1;
+
+	/*
+	 * Did we unshare any page tables such that they are now exclusive
+	 * and could get reused+modified by the new owner?
+	 */
+	unsigned int fully_unshared_tables : 1;
+
 	unsigned int batch_count;

 #ifndef CONFIG_MMU_GATHER_NO_GATHER
@@ -400,6 +411,7 @@ static inline void __tlb_reset_range(str
 	tlb->cleared_pmds = 0;
 	tlb->cleared_puds = 0;
 	tlb->cleared_p4ds = 0;
+	tlb->unshared_tables = 0;
 	/*
 	 * Do not reset mmu_gather::vma_* fields here, we do not
 	 * call into tlb_start_vma() again to set them if there is an
@@ -484,7 +496,7 @@ static inline void tlb_flush_mmu_tlbonly
 	 * these bits.
 	 */
 	if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
-	      tlb->cleared_puds || tlb->cleared_p4ds))
+	      tlb->cleared_puds || tlb->cleared_p4ds || tlb->unshared_tables))
 		return;

 	tlb_flush(tlb);
@@ -773,6 +785,61 @@ static inline bool huge_pmd_needs_flush(
 }
 #endif

+#ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
+static inline void tlb_unshare_pmd_ptdesc(struct mmu_gather *tlb, struct ptdesc *pt,
+					  unsigned long addr)
+{
+	/*
+	 * The caller must make sure that concurrent unsharing + exclusive
+	 * reuse is impossible until tlb_flush_unshared_tables() was called.
+	 */
+	VM_WARN_ON_ONCE(!ptdesc_pmd_is_shared(pt));
+	ptdesc_pmd_pts_dec(pt);
+
+	/* Clearing a PUD pointing at a PMD table with PMD leaves. */
+	tlb_flush_pmd_range(tlb, addr & PUD_MASK, PUD_SIZE);
+
+	/*
+	 * If the page table is now exclusively owned, we fully unshared
+	 * a page table.
+	 */
+	if (!ptdesc_pmd_is_shared(pt))
+		tlb->fully_unshared_tables = true;
+	tlb->unshared_tables = true;
+}
+
+static inline void tlb_flush_unshared_tables(struct mmu_gather *tlb)
+{
+	/*
+	 * As soon as the caller drops locks to allow for reuse of
+	 * previously-shared tables, these tables could get modified and
+	 * even reused outside of hugetlb context. So flush the TLB now.
+	 *
+	 * Note that we cannot defer the flush to a later point even if we are
+	 * not the last sharer of the page table.
+	 */
+	if (tlb->unshared_tables)
+		tlb_flush_mmu_tlbonly(tlb);
+
+	/*
+	 * Similarly, we must make sure that concurrent GUP-fast will not
+	 * walk previously-shared page tables that are getting modified+reused
+	 * elsewhere. So broadcast an IPI to wait for any concurrent GUP-fast.
+	 *
+	 * We only perform this when we are the last sharer of a page table,
+	 * as the IPI will reach all CPUs: any GUP-fast.
+	 *
+	 * Note that on configs where tlb_remove_table_sync_one() is a NOP,
+	 * the expectation is that the tlb_flush_mmu_tlbonly() would have issued
+	 * required IPIs already for us.
+	 */
+	if (tlb->fully_unshared_tables) {
+		tlb_remove_table_sync_one();
+		tlb->fully_unshared_tables = false;
+	}
+}
+#endif /* CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING */
+
 #endif /* CONFIG_MMU */

 #endif /* _ASM_GENERIC__TLB_H */
--- a/include/linux/hugetlb.h~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/include/linux/hugetlb.h
@@ -240,8 +240,9 @@ pte_t *huge_pte_alloc(struct mm_struct *
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, unsigned long sz);
 unsigned long hugetlb_mask_last_page(struct hstate *h);
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
-				unsigned long addr, pte_t *ptep);
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep);
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end);

@@ -271,7 +272,7 @@ void hugetlb_vma_unlock_write(struct vm_
 int hugetlb_vma_trylock_write(struct vm_area_struct *vma);
 void hugetlb_vma_assert_locked(struct vm_area_struct *vma);
 void hugetlb_vma_lock_release(struct kref *kref);
-long hugetlb_change_protection(struct vm_area_struct *vma,
+long hugetlb_change_protection(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		unsigned long address, unsigned long end, pgprot_t newprot,
 		unsigned long cp_flags);
 void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
@@ -300,13 +301,17 @@ static inline struct address_space *huge
 	return NULL;
 }

-static inline int huge_pmd_unshare(struct mm_struct *mm,
-					struct vm_area_struct *vma,
-					unsigned long addr, pte_t *ptep)
+static inline int huge_pmd_unshare(struct mmu_gather *tlb,
+		struct vm_area_struct *vma, unsigned long addr, pte_t *ptep)
 {
 	return 0;
 }

+static inline void huge_pmd_unshare_flush(struct mmu_gather *tlb,
+		struct vm_area_struct *vma)
+{
+}
+
 static inline void adjust_range_if_pmd_sharing_possible(
 				struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end)
@@ -432,7 +437,7 @@ static inline void move_hugetlb_state(st
 {
 }

-static inline long hugetlb_change_protection(
+static inline long hugetlb_change_protection(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long address,
 			unsigned long end, pgprot_t newprot,
 			unsigned long cp_flags)
--- a/mm/hugetlb.c~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/mm/hugetlb.c
@@ -5096,8 +5096,9 @@ int move_hugetlb_page_tables(struct vm_a
 	unsigned long last_addr_mask;
 	pte_t *src_pte, *dst_pte;
 	struct mmu_notifier_range range;
-	bool shared_pmd = false;
+	struct mmu_gather tlb;

+	tlb_gather_mmu(&tlb, vma->vm_mm);
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, old_addr,
 				old_end);
 	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
@@ -5122,12 +5123,12 @@ int move_hugetlb_page_tables(struct vm_a
 		if (huge_pte_none(huge_ptep_get(mm, old_addr, src_pte)))
 			continue;

-		if (huge_pmd_unshare(mm, vma, old_addr, src_pte)) {
-			shared_pmd = true;
+		if (huge_pmd_unshare(&tlb, vma, old_addr, src_pte)) {
 			old_addr |= last_addr_mask;
 			new_addr |= last_addr_mask;
 			continue;
 		}
+		tlb_remove_huge_tlb_entry(h, &tlb, src_pte, old_addr);

 		dst_pte = huge_pte_alloc(mm, new_vma, new_addr, sz);
 		if (!dst_pte)
@@ -5136,13 +5137,13 @@ int move_hugetlb_page_tables(struct vm_a
 		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte, sz);
 	}

-	if (shared_pmd)
-		flush_hugetlb_tlb_range(vma, range.start, range.end);
-	else
-		flush_hugetlb_tlb_range(vma, old_end - len, old_end);
+	tlb_flush_mmu_tlbonly(&tlb);
+	huge_pmd_unshare_flush(&tlb, vma);
+
 	mmu_notifier_invalidate_range_end(&range);
 	i_mmap_unlock_write(mapping);
 	hugetlb_vma_unlock_write(vma);
+	tlb_finish_mmu(&tlb);

 	return len + old_addr - old_end;
 }
@@ -5161,7 +5162,6 @@ void __unmap_hugepage_range(struct mmu_g
 	unsigned long sz = huge_page_size(h);
 	bool adjust_reservation;
 	unsigned long last_addr_mask;
-	bool force_flush = false;

 	WARN_ON(!is_vm_hugetlb_page(vma));
 	BUG_ON(start & ~huge_page_mask(h));
@@ -5184,10 +5184,8 @@ void __unmap_hugepage_range(struct mmu_g
 		}

 		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+		if (huge_pmd_unshare(tlb, vma, address, ptep)) {
 			spin_unlock(ptl);
-			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
-			force_flush = true;
 			address |= last_addr_mask;
 			continue;
 		}
@@ -5303,14 +5301,7 @@ void __unmap_hugepage_range(struct mmu_g
 	}
 	tlb_end_vma(tlb, vma);

-	/*
-	 * There is nothing protecting a previously-shared page table that we
-	 * unshared through huge_pmd_unshare() from getting freed after we
-	 * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare()
-	 * succeeded, flush the range corresponding to the pud.
-	 */
-	if (force_flush)
-		tlb_flush_mmu_tlbonly(tlb);
+	huge_pmd_unshare_flush(tlb, vma);
 }

 void __hugetlb_zap_begin(struct vm_area_struct *vma,
@@ -6399,7 +6390,7 @@ out_release_nounlock:
 }
 #endif /* CONFIG_USERFAULTFD */

-long hugetlb_change_protection(struct vm_area_struct *vma,
+long hugetlb_change_protection(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		unsigned long address, unsigned long end,
 		pgprot_t newprot, unsigned long cp_flags)
 {
@@ -6409,7 +6400,6 @@ long hugetlb_change_protection(struct vm
 	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
 	long pages = 0, psize = huge_page_size(h);
-	bool shared_pmd = false;
 	struct mmu_notifier_range range;
 	unsigned long last_addr_mask;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
@@ -6452,7 +6442,7 @@ long hugetlb_change_protection(struct vm
 			}
 		}
 		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+		if (huge_pmd_unshare(tlb, vma, address, ptep)) {
 			/*
 			 * When uffd-wp is enabled on the vma, unshare
 			 * shouldn't happen at all. Warn about it if it
@@ -6461,7 +6451,6 @@ long hugetlb_change_protection(struct vm
 			WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
 			pages++;
 			spin_unlock(ptl);
-			shared_pmd = true;
 			address |= last_addr_mask;
 			continue;
 		}
@@ -6522,22 +6511,16 @@ long hugetlb_change_protection(struct vm
 				pte = huge_pte_clear_uffd_wp(pte);
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
+			tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
 		}

 next:
 		spin_unlock(ptl);
 		cond_resched();
 	}
-	/*
-	 * There is nothing protecting a previously-shared page table that we
-	 * unshared through huge_pmd_unshare() from getting freed after we
-	 * release i_mmap_rwsem, so flush the TLB now. If huge_pmd_unshare()
-	 * succeeded, flush the range corresponding to the pud.
-	 */
-	if (shared_pmd)
-		flush_hugetlb_tlb_range(vma, range.start, range.end);
-	else
-		flush_hugetlb_tlb_range(vma, start, end);
+
+	tlb_flush_mmu_tlbonly(tlb);
+	huge_pmd_unshare_flush(tlb, vma);
 	/*
 	 * No need to call mmu_notifier_arch_invalidate_secondary_tlbs() we are
 	 * downgrading page table protection not changing it to point to a new
@@ -6904,18 +6887,27 @@ out:
 	return pte;
 }

-/*
- * unmap huge page backed by shared pte.
+/**
+ * huge_pmd_unshare - Unmap a pmd table if it is shared by multiple users
+ * @tlb: the current mmu_gather.
+ * @vma: the vma covering the pmd table.
+ * @addr: the address we are trying to unshare.
+ * @ptep: pointer into the (pmd) page table.
+ *
+ * Called with the page table lock held, the i_mmap_rwsem held in write mode
+ * and the hugetlb vma lock held in write mode.
  *
- * Called with page table lock held.
+ * Note: The caller must call huge_pmd_unshare_flush() before dropping the
+ * i_mmap_rwsem.
  *
- * returns: 1 successfully unmapped a shared pte page
- *	    0 the underlying pte page is not shared, or it is the last user
+ * Returns: 1 if it was a shared PMD table and it got unmapped, or 0 if it
+ * was not a shared PMD table.
  */
-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
-					unsigned long addr, pte_t *ptep)
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep)
 {
 	unsigned long sz = huge_page_size(hstate_vma(vma));
+	struct mm_struct *mm = vma->vm_mm;
 	pgd_t *pgd = pgd_offset(mm, addr);
 	p4d_t *p4d = p4d_offset(pgd, addr);
 	pud_t *pud = pud_offset(p4d, addr);
@@ -6927,18 +6919,36 @@ int huge_pmd_unshare(struct mm_struct *m
 	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
 	hugetlb_vma_assert_locked(vma);
 	pud_clear(pud);
-	/*
-	 * Once our caller drops the rmap lock, some other process might be
-	 * using this page table as a normal, non-hugetlb page table.
-	 * Wait for pending gup_fast() in other threads to finish before letting
-	 * that happen.
-	 */
-	tlb_remove_table_sync_one();
-	ptdesc_pmd_pts_dec(virt_to_ptdesc(ptep));
+
+	tlb_unshare_pmd_ptdesc(tlb, virt_to_ptdesc(ptep), addr);
+
 	mm_dec_nr_pmds(mm);
 	return 1;
 }

+/*
+ * huge_pmd_unshare_flush - Complete a sequence of huge_pmd_unshare() calls
+ * @tlb: the current mmu_gather.
+ * @vma: the vma covering the pmd table.
+ *
+ * Perform necessary TLB flushes or IPI broadcasts to synchronize PMD table
+ * unsharing with concurrent page table walkers (TLB, GUP-fast, etc.).
+ *
+ * This function must be called after a sequence of huge_pmd_unshare()
+ * calls while still holding the i_mmap_rwsem.
+ */
+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+	/*
+	 * We must synchronize page table unsharing such that nobody will
+	 * try reusing a previously-shared page table while it might still
+	 * be in use by previous sharers (TLB, GUP_fast).
+	 */
+	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
+
+	tlb_flush_unshared_tables(tlb);
+}
+
 #else /* !CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING */

 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
@@ -6947,12 +6957,16 @@ pte_t *huge_pmd_share(struct mm_struct *
 	return NULL;
 }

-int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
-				unsigned long addr, pte_t *ptep)
+int huge_pmd_unshare(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep)
 {
 	return 0;
 }

+void huge_pmd_unshare_flush(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+}
+
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end)
 {
@@ -7219,6 +7233,7 @@ static void hugetlb_unshare_pmds(struct
 	unsigned long sz = huge_page_size(h);
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_notifier_range range;
+	struct mmu_gather tlb;
 	unsigned long address;
 	spinlock_t *ptl;
 	pte_t *ptep;
@@ -7229,6 +7244,7 @@ static void hugetlb_unshare_pmds(struct
 	if (start >= end)
 		return;

+	tlb_gather_mmu(&tlb, mm);
 	flush_cache_range(vma, start, end);
 	/*
 	 * No need to call adjust_range_if_pmd_sharing_possible(), because
@@ -7248,10 +7264,10 @@ static void hugetlb_unshare_pmds(struct
 		if (!ptep)
 			continue;
 		ptl = huge_pte_lock(h, mm, ptep);
-		huge_pmd_unshare(mm, vma, address, ptep);
+		huge_pmd_unshare(&tlb, vma, address, ptep);
 		spin_unlock(ptl);
 	}
-	flush_hugetlb_tlb_range(vma, start, end);
+	huge_pmd_unshare_flush(&tlb, vma);
 	if (take_locks) {
 		i_mmap_unlock_write(vma->vm_file->f_mapping);
 		hugetlb_vma_unlock_write(vma);
@@ -7261,6 +7277,7 @@ static void hugetlb_unshare_pmds(struct
 	 * Documentation/mm/mmu_notifier.rst.
 	 */
 	mmu_notifier_invalidate_range_end(&range);
+	tlb_finish_mmu(&tlb);
 }

 /*
--- a/mm/mmu_gather.c~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/mm/mmu_gather.c
@@ -469,6 +469,12 @@ void tlb_gather_mmu_fullmm(struct mmu_ga
 void tlb_finish_mmu(struct mmu_gather *tlb)
 {
 	/*
+	 * We expect an earlier huge_pmd_unshare_flush() call to sort this out,
+	 * due to complicated locking requirements with page table unsharing.
+	 */
+	VM_WARN_ON_ONCE(tlb->fully_unshared_tables);
+
+	/*
 	 * If there are parallel threads are doing PTE changes on same range
 	 * under non-exclusive lock (e.g., mmap_lock read-side) but defer TLB
 	 * flush by batching, one thread may end up seeing inconsistent PTEs
--- a/mm/mprotect.c~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/mm/mprotect.c
@@ -652,7 +652,7 @@ long change_protection(struct mmu_gather
 #endif

 	if (is_vm_hugetlb_page(vma))
-		pages = hugetlb_change_protection(vma, start, end, newprot,
+		pages = hugetlb_change_protection(tlb, vma, start, end, newprot,
 						  cp_flags);
 	else
 		pages = change_protection_range(tlb, vma, start, end, newprot,
--- a/mm/rmap.c~mm-hugetlb-fix-excessive-ipi-broadcasts-when-unsharing-pmd-tables-using-mmu_gather
+++ a/mm/rmap.c
@@ -76,7 +76,7 @@
 #include <linux/mm_inline.h>
 #include <linux/oom.h>

-#include <asm/tlbflush.h>
+#include <asm/tlb.h>

 #define CREATE_TRACE_POINTS
 #include <trace/events/migrate.h>
@@ -2008,13 +2008,17 @@ static bool try_to_unmap_one(struct foli
 			 * if unsuccessful.
 			 */
 			if (!anon) {
+				struct mmu_gather tlb;
+
 				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
 				if (!hugetlb_vma_trylock_write(vma))
 					goto walk_abort;
-				if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+
+				tlb_gather_mmu(&tlb, mm);
+				if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) {
 					hugetlb_vma_unlock_write(vma);
-					flush_tlb_range(vma,
-						range.start, range.end);
+					huge_pmd_unshare_flush(&tlb, vma);
+					tlb_finish_mmu(&tlb);
 					/*
 					 * The PMD table was unmapped,
 					 * consequently unmapping the folio.
@@ -2022,6 +2026,7 @@ static bool try_to_unmap_one(struct foli
 					goto walk_done;
 				}
 				hugetlb_vma_unlock_write(vma);
+				tlb_finish_mmu(&tlb);
 			}
 			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
 			if (pte_dirty(pteval))
@@ -2398,17 +2403,20 @@ static bool try_to_migrate_one(struct fo
 			 * fail if unsuccessful.
*/ if (!anon) { + struct mmu_gather tlb; + VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); if (!hugetlb_vma_trylock_write(vma)) { page_vma_mapped_walk_done(&pvmw); ret = false; break; } - if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) { - hugetlb_vma_unlock_write(vma); - flush_tlb_range(vma, - range.start, range.end); + tlb_gather_mmu(&tlb, mm); + if (huge_pmd_unshare(&tlb, vma, address, pvmw.pte)) { + hugetlb_vma_unlock_write(vma); + huge_pmd_unshare_flush(&tlb, vma); + tlb_finish_mmu(&tlb); /* * The PMD table was unmapped, * consequently unmapping the folio. @@ -2417,6 +2425,7 @@ static bool try_to_migrate_one(struct fo break; } hugetlb_vma_unlock_write(vma); + tlb_finish_mmu(&tlb); } /* Nuke the hugetlb page table entry */ pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); _ Patches currently in -mm which might be from david(a)kernel.org are
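The call order that the hunks above establish (gather, unshare, unshare-flush, finish) can be illustrated with a small userspace model. This is only a sketch: `struct mock_gather` and every `mock_*` function are hypothetical stand-ins, not kernel APIs, and the assumption that the unshare-flush step clears the pending flag is inferred from the `VM_WARN_ON_ONCE(tlb->fully_unshared_tables)` check added to `tlb_finish_mmu()`.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct mmu_gather. */
struct mock_gather {
	bool fully_unshared_tables;	/* an unshare still awaits its flush */
	int flushes;			/* number of flushes issued so far */
};

static void mock_gather_mmu(struct mock_gather *tlb)
{
	tlb->fully_unshared_tables = false;
	tlb->flushes = 0;
}

/* Returns true when a shared table was actually unshared. */
static bool mock_pmd_unshare(struct mock_gather *tlb, bool was_shared)
{
	if (!was_shared)
		return false;
	tlb->fully_unshared_tables = true;
	return true;
}

/* Assumed behavior: one flush covers all pending unshares, then clears the flag. */
static void mock_pmd_unshare_flush(struct mock_gather *tlb)
{
	if (tlb->fully_unshared_tables) {
		tlb->flushes++;
		tlb->fully_unshared_tables = false;
	}
}

/* Returns false in the situation where the kernel check would warn. */
static bool mock_finish_mmu(struct mock_gather *tlb)
{
	return !tlb->fully_unshared_tables;
}

/* Unshare every table in a range without flushing; returns how many were shared. */
static int unshare_all(struct mock_gather *tlb, const bool *shared, int n)
{
	int unshared = 0;

	for (int i = 0; i < n; i++)
		if (mock_pmd_unshare(tlb, shared[i]))
			unshared++;
	return unshared;
}
```

In this model, unsharing several tables and then calling `mock_pmd_unshare_flush()` once yields a single flush, which is the batching the series is after; skipping the flush before `mock_finish_mmu()` is reported as an error.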

8 hours, 8 minutes


[merged mm-stable] mm-hugetlb-fix-hugetlb_pmd_shared.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm/hugetlb: fix hugetlb_pmd_shared()
has been removed from the -mm tree.  Its filename was
     mm-hugetlb-fix-hugetlb_pmd_shared.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: "David Hildenbrand (Red Hat)" <david(a)kernel.org>
Subject: mm/hugetlb: fix hugetlb_pmd_shared()
Date: Fri, 5 Dec 2025 22:35:55 +0100

Patch series "mm/hugetlb: fixes for PMD table sharing (incl. using
mmu_gather)".

One functional fix, one performance regression fix, and two related
comment fixes.

I cleaned up my prototype I recently shared [1] for the performance fix,
deferring most of the cleanups I had in the prototype to a later point.
While doing that I identified the other things.

The goal of this patch set is to be backported to stable trees "fairly"
easily. At least patch #1 and #4.

Patch #1 fixes hugetlb_pmd_shared() not detecting any sharing
Patch #2 + #3 are simple comment fixes that patch #4 interacts with.
Patch #4 is a fix for the reported performance regression due to excessive
IPI broadcasts during fork()+exit().

The last patch is all about TLB flushes, IPIs and mmu_gather.
Read: complicated

I added as much comments + description that I possibly could, and I am
hoping for review from Jann.

There are plenty of cleanups in the future to be had + one reasonable
optimization on x86. But that's all out of scope for this series.


This patch (of 4):

We switched from (wrongly) using the page count to an independent shared
count.  Now, shared page tables have a refcount of 1 (excluding
speculative references) and instead use ptdesc->pt_share_count to identify
sharing.

We didn't convert hugetlb_pmd_shared(), so right now, we would never
detect a shared PMD table as such, because sharing/unsharing no longer
touches the refcount of a PMD table.

Page migration, like mbind() or migrate_pages() would allow for migrating
folios mapped into such shared PMD tables, even though the folios are not
exclusive.  In smaps we would account them as "private" although they are
"shared", and we would be wrongly setting the PM_MMAP_EXCLUSIVE in the
pagemap interface.

Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared().

Link: https://lkml.kernel.org/r/20251205213558.2980480-1-david@kernel.org
Link: https://lkml.kernel.org/r/20251205213558.2980480-2-david@kernel.org
Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [1]
Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count")
Signed-off-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
Tested-by: Laurence Oberman <loberman(a)redhat.com>
Reviewed-by: Rik van Riel <riel(a)surriel.com>
Reviewed-by: Lance Yang <lance.yang(a)linux.dev>
Cc: Liu Shixin <liushixin2(a)huawei.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)kernel.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Prakash Sangappa <prakash.sangappa(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Will Deacon <will(a)kernel.org>
Cc: "Uschakow, Stanislav" <suschako(a)amazon.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 include/linux/hugetlb.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/hugetlb.h~mm-hugetlb-fix-hugetlb_pmd_shared
+++ a/include/linux/hugetlb.h
@@ -1326,7 +1326,7 @@ static inline __init void hugetlb_cma_re
 #ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
 static inline bool hugetlb_pmd_shared(pte_t *pte)
 {
-	return page_count(virt_to_page(pte)) > 1;
+	return ptdesc_pmd_is_shared(virt_to_ptdesc(pte));
 }
 #else
 static inline bool hugetlb_pmd_shared(pte_t *pte)
_

Patches currently in -mm which might be from david(a)kernel.org are
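Why the old check stopped detecting sharing can be shown with a simplified userspace model. This is a sketch under stated assumptions: `struct mock_ptdesc` and the helpers below are hypothetical, and the assumption that a table counts as shared whenever its share count is nonzero is mine, not taken from the patch text.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a page-table descriptor. */
struct mock_ptdesc {
	int refcount;		/* stays at 1 whether or not the table is shared */
	int pt_share_count;	/* number of additional sharers */
};

/* Sharing no longer touches the refcount, only the share count. */
static void mock_share(struct mock_ptdesc *pt)
{
	pt->pt_share_count++;
}

/* Old-style check: looks at the refcount, so it can never fire. */
static bool shared_by_refcount(const struct mock_ptdesc *pt)
{
	return pt->refcount > 1;
}

/* New-style check (assumed semantics): any sharer means "shared". */
static bool shared_by_share_count(const struct mock_ptdesc *pt)
{
	return pt->pt_share_count > 0;
}
```

After one `mock_share()` call the refcount-based check still reports "not shared" while the share-count check reports "shared", which mirrors the misdetection described in the commit message.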

8 hours, 8 minutes


[merged mm-stable] powerpc-pseries-cmm-adjust-balloon_migrate-when-migrating-pages.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages
has been removed from the -mm tree.  Its filename was
     powerpc-pseries-cmm-adjust-balloon_migrate-when-migrating-pages.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages
Date: Tue, 21 Oct 2025 12:06:06 +0200

Let's properly adjust BALLOON_MIGRATE like the other drivers.

Note that the INFLATE/DEFLATE events are triggered from the core when
enqueueing/dequeueing pages.

This was found by code inspection.

Link: https://lkml.kernel.org/r/20251021100606.148294-3-david@redhat.com
Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Madhavan Srinivasan <maddy(a)linux.ibm.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 arch/powerpc/platforms/pseries/cmm.c |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/powerpc/platforms/pseries/cmm.c~powerpc-pseries-cmm-adjust-balloon_migrate-when-migrating-pages
+++ a/arch/powerpc/platforms/pseries/cmm.c
@@ -532,6 +532,7 @@ static int cmm_migratepage(struct balloo
 
 	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
 	balloon_page_insert(b_dev_info, newpage);
+	__count_vm_event(BALLOON_MIGRATE);
 	b_dev_info->isolated_pages--;
 	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
 
_

Patches currently in -mm which might be from david(a)redhat.com are

8 hours, 9 minutes


[merged mm-stable] powerpc-pseries-cmm-call-balloon_devinfo_init-also-without-config_balloon_compaction.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION
has been removed from the -mm tree.  Its filename was
     powerpc-pseries-cmm-call-balloon_devinfo_init-also-without-config_balloon_compaction.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: powerpc/pseries/cmm: call balloon_devinfo_init() also without CONFIG_BALLOON_COMPACTION
Date: Tue, 21 Oct 2025 12:06:05 +0200

Patch series "powerpc/pseries/cmm: two smaller fixes".

Two smaller fixes identified while doing a bigger rework.


This patch (of 2):

We always have to initialize the balloon_dev_info, even when compaction is
not configured in: otherwise the containing list and the lock are left
uninitialized.

Likely not many such configs exist in practice, but let's CC stable to
be sure.

This was found by code inspection.

Link: https://lkml.kernel.org/r/20251021100606.148294-1-david@redhat.com
Link: https://lkml.kernel.org/r/20251021100606.148294-2-david@redhat.com
Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Madhavan Srinivasan <maddy(a)linux.ibm.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 arch/powerpc/platforms/pseries/cmm.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/platforms/pseries/cmm.c~powerpc-pseries-cmm-call-balloon_devinfo_init-also-without-config_balloon_compaction
+++ a/arch/powerpc/platforms/pseries/cmm.c
@@ -550,7 +550,6 @@ static int cmm_migratepage(struct balloo
 
 static void cmm_balloon_compaction_init(void)
 {
-	balloon_devinfo_init(&b_dev_info);
 	b_dev_info.migratepage = cmm_migratepage;
 }
 #else /* CONFIG_BALLOON_COMPACTION */
@@ -572,6 +571,7 @@ static int cmm_init(void)
 	if (!firmware_has_feature(FW_FEATURE_CMO) && !simulate)
 		return -EOPNOTSUPP;
 
+	balloon_devinfo_init(&b_dev_info);
 	cmm_balloon_compaction_init();
 
 	rc = register_oom_notifier(&cmm_oom_nb);
_

Patches currently in -mm which might be from david(a)redhat.com are
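The effect of moving the initialization out of the compaction-only helper can be sketched in userspace. Everything here is hypothetical: `struct mock_balloon_dev_info`, the `mock_*` functions, and the `compaction_enabled` flag (standing in for the compile-time CONFIG_BALLOON_COMPACTION switch) are illustrative names, not the driver's code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the balloon device info structure. */
struct mock_balloon_dev_info {
	bool list_ready;	/* page list has been initialized */
	bool lock_ready;	/* lock has been initialized */
	bool has_migratepage;	/* migration callback installed */
};

/* Base initialization that must always run. */
static void mock_devinfo_init(struct mock_balloon_dev_info *info)
{
	info->list_ready = true;
	info->lock_ready = true;
}

/* Compaction-only setup; does nothing when compaction is disabled. */
static void mock_compaction_init(struct mock_balloon_dev_info *info,
				 bool compaction_enabled)
{
	if (compaction_enabled)
		info->has_migratepage = true;
}

/* Fixed ordering: base init runs unconditionally, then the optional part. */
static void mock_cmm_init(struct mock_balloon_dev_info *info,
			  bool compaction_enabled)
{
	mock_devinfo_init(info);
	mock_compaction_init(info, compaction_enabled);
}
```

With this ordering the list and lock are ready even when compaction is disabled, which is the property the patch restores.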

8 hours, 9 minutes


[PATCH] Revert "drm/amd/display: Fix pbn to kbps Conversion"

by Mario Limonciello

Deeply daisy chained DP/MST displays are no longer able to light up.
This reverts commit 1788ef30725da53face7e311cdf62ad65fababcd.

Cc: Jerry Zuo <jerry.zuo(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.17+
Reported-by: nat(a)nullable.se
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4756
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 59 +++++++++++--------
 1 file changed, 36 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index dbd1da4d85d3..5e92eaa67aa3 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -884,28 +884,26 @@ struct dsc_mst_fairness_params {
 };
 
 #if defined(CONFIG_DRM_AMD_DC_FP)
-static uint64_t kbps_to_pbn(int kbps, bool is_peak_pbn)
+static uint16_t get_fec_overhead_multiplier(struct dc_link *dc_link)
 {
-	uint64_t effective_kbps = (uint64_t)kbps;
+	u8 link_coding_cap;
+	uint16_t fec_overhead_multiplier_x1000 = PBN_FEC_OVERHEAD_MULTIPLIER_8B_10B;
 
-	if (is_peak_pbn) {	// add 0.6% (1006/1000) overhead into effective kbps
-		effective_kbps *= 1006;
-		effective_kbps = div_u64(effective_kbps, 1000);
-	}
+	link_coding_cap = dc_link_dp_mst_decide_link_encoding_format(dc_link);
+	if (link_coding_cap == DP_128b_132b_ENCODING)
+		fec_overhead_multiplier_x1000 = PBN_FEC_OVERHEAD_MULTIPLIER_128B_132B;
 
-	return (uint64_t) DIV64_U64_ROUND_UP(effective_kbps * 64, (54 * 8 * 1000));
+	return fec_overhead_multiplier_x1000;
 }
 
-static uint32_t pbn_to_kbps(unsigned int pbn, bool with_margin)
+static int kbps_to_peak_pbn(int kbps, uint16_t fec_overhead_multiplier_x1000)
 {
-	uint64_t pbn_effective = (uint64_t)pbn;
-
-	if (with_margin)	// deduct 0.6% (994/1000) overhead from effective pbn
-		pbn_effective *= (1000000 / PEAK_FACTOR_X1000);
-	else
-		pbn_effective *= 1000;
+	u64 peak_kbps = kbps;
 
-	return DIV_U64_ROUND_UP(pbn_effective * 8 * 54, 64);
+	peak_kbps *= 1006;
+	peak_kbps *= fec_overhead_multiplier_x1000;
+	peak_kbps = div_u64(peak_kbps, 1000 * 1000);
+	return (int) DIV64_U64_ROUND_UP(peak_kbps * 64, (54 * 8 * 1000));
 }
 
 static void set_dsc_configs_from_fairness_vars(struct dsc_mst_fairness_params *params,
@@ -976,7 +974,7 @@ static int bpp_x16_from_pbn(struct dsc_mst_fairness_params param, int pbn)
 	dc_dsc_get_default_config_option(param.sink->ctx->dc, &dsc_options);
 	dsc_options.max_target_bpp_limit_override_x16 = drm_connector->display_info.max_dsc_bpp * 16;
 
-	kbps = pbn_to_kbps(pbn, false);
+	kbps = div_u64((u64)pbn * 994 * 8 * 54, 64);
 	dc_dsc_compute_config(
 			param.sink->ctx->dc->res_pool->dscs[0],
 			&param.sink->dsc_caps.dsc_dec_caps,
@@ -1005,11 +1003,12 @@ static int increase_dsc_bpp(struct drm_atomic_state *state,
 	int link_timeslots_used;
 	int fair_pbn_alloc;
 	int ret = 0;
+	uint16_t fec_overhead_multiplier_x1000 = get_fec_overhead_multiplier(dc_link);
 
 	for (i = 0; i < count; i++) {
 		if (vars[i + k].dsc_enabled) {
 			initial_slack[i] =
-			kbps_to_pbn(params[i].bw_range.max_kbps, false) - vars[i + k].pbn;
+			kbps_to_peak_pbn(params[i].bw_range.max_kbps, fec_overhead_multiplier_x1000) - vars[i + k].pbn;
 			bpp_increased[i] = false;
 			remaining_to_increase += 1;
 		} else {
@@ -1105,6 +1104,7 @@ static int try_disable_dsc(struct drm_atomic_state *state,
 	int next_index;
 	int remaining_to_try = 0;
 	int ret;
+	uint16_t fec_overhead_multiplier_x1000 = get_fec_overhead_multiplier(dc_link);
 	int var_pbn;
 
 	for (i = 0; i < count; i++) {
@@ -1137,7 +1137,7 @@ static int try_disable_dsc(struct drm_atomic_state *state,
 
 		DRM_DEBUG_DRIVER("MST_DSC index #%d, try no compression\n", next_index);
 		var_pbn = vars[next_index].pbn;
-		vars[next_index].pbn = kbps_to_pbn(params[next_index].bw_range.stream_kbps, true);
+		vars[next_index].pbn = kbps_to_peak_pbn(params[next_index].bw_range.stream_kbps, fec_overhead_multiplier_x1000);
 		ret = drm_dp_atomic_find_time_slots(state,
 						    params[next_index].port->mgr,
 						    params[next_index].port,
@@ -1197,6 +1197,7 @@ static int compute_mst_dsc_configs_for_link(struct drm_atomic_state *state,
 	int count = 0;
 	int i, k, ret;
 	bool debugfs_overwrite = false;
+	uint16_t fec_overhead_multiplier_x1000 = get_fec_overhead_multiplier(dc_link);
 	struct drm_connector_state *new_conn_state;
 
 	memset(params, 0, sizeof(params));
@@ -1277,7 +1278,7 @@ static int compute_mst_dsc_configs_for_link(struct drm_atomic_state *state,
 	DRM_DEBUG_DRIVER("MST_DSC Try no compression\n");
 	for (i = 0; i < count; i++) {
 		vars[i + k].aconnector = params[i].aconnector;
-		vars[i + k].pbn = kbps_to_pbn(params[i].bw_range.stream_kbps, false);
+		vars[i + k].pbn = kbps_to_peak_pbn(params[i].bw_range.stream_kbps, fec_overhead_multiplier_x1000);
 		vars[i + k].dsc_enabled = false;
 		vars[i + k].bpp_x16 = 0;
 		ret = drm_dp_atomic_find_time_slots(state, params[i].port->mgr, params[i].port,
@@ -1299,7 +1300,7 @@ static int compute_mst_dsc_configs_for_link(struct drm_atomic_state *state,
 	DRM_DEBUG_DRIVER("MST_DSC Try max compression\n");
 	for (i = 0; i < count; i++) {
 		if (params[i].compression_possible && params[i].clock_force_enable != DSC_CLK_FORCE_DISABLE) {
-			vars[i + k].pbn = kbps_to_pbn(params[i].bw_range.min_kbps, false);
+			vars[i + k].pbn = kbps_to_peak_pbn(params[i].bw_range.min_kbps, fec_overhead_multiplier_x1000);
 			vars[i + k].dsc_enabled = true;
 			vars[i + k].bpp_x16 = params[i].bw_range.min_target_bpp_x16;
 			ret = drm_dp_atomic_find_time_slots(state, params[i].port->mgr,
@@ -1307,7 +1308,7 @@ static int compute_mst_dsc_configs_for_link(struct drm_atomic_state *state,
 			if (ret < 0)
 				return ret;
 		} else {
-			vars[i + k].pbn = kbps_to_pbn(params[i].bw_range.stream_kbps, false);
+			vars[i + k].pbn = kbps_to_peak_pbn(params[i].bw_range.stream_kbps, fec_overhead_multiplier_x1000);
 			vars[i + k].dsc_enabled = false;
 			vars[i + k].bpp_x16 = 0;
 			ret = drm_dp_atomic_find_time_slots(state, params[i].port->mgr,
@@ -1762,6 +1763,18 @@ int pre_validate_dsc(struct drm_atomic_state *state,
 	return ret;
 }
 
+static uint32_t kbps_from_pbn(unsigned int pbn)
+{
+	uint64_t kbps = (uint64_t)pbn;
+
+	kbps *= (1000000 / PEAK_FACTOR_X1000);
+	kbps *= 8;
+	kbps *= 54;
+	kbps /= 64;
+
+	return (uint32_t)kbps;
+}
+
 static bool is_dsc_common_config_possible(struct dc_stream_state *stream,
 					  struct dc_dsc_bw_range *bw_range)
 {
@@ -1860,7 +1873,7 @@ enum dc_status dm_dp_mst_is_port_support_mode(
 			dc_link_get_highest_encoding_format(stream->link));
 	cur_link_settings = stream->link->verified_link_cap;
 	root_link_bw_in_kbps = dc_link_bandwidth_kbps(aconnector->dc_link, &cur_link_settings);
-	virtual_channel_bw_in_kbps = pbn_to_kbps(aconnector->mst_output_port->full_pbn, true);
+	virtual_channel_bw_in_kbps = kbps_from_pbn(aconnector->mst_output_port->full_pbn);
 
 	/* pick the end to end bw bottleneck */
 	end_to_end_bw_in_kbps = min(root_link_bw_in_kbps, virtual_channel_bw_in_kbps);
@@ -1913,7 +1926,7 @@ enum dc_status dm_dp_mst_is_port_support_mode(
 		immediate_upstream_port = aconnector->mst_output_port->parent->port_parent;
 
 		if (immediate_upstream_port) {
-			virtual_channel_bw_in_kbps = pbn_to_kbps(immediate_upstream_port->full_pbn, true);
+			virtual_channel_bw_in_kbps = kbps_from_pbn(immediate_upstream_port->full_pbn);
 			virtual_channel_bw_in_kbps = min(root_link_bw_in_kbps, virtual_channel_bw_in_kbps);
 		} else {
 			/* For topology LCT 1 case - only one mstb*/
-- 
2.51.2
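The arithmetic that the revert restores can be tried outside the kernel. The sketch below reimplements `kbps_to_peak_pbn()` and `kbps_from_pbn()` with plain C integer math in place of `div_u64()` and `DIV64_U64_ROUND_UP()`. The three constant values are assumptions on my part (they are not shown in the diff): 1006 for `PEAK_FACTOR_X1000`, which agrees with the 994 used in `bpp_x16_from_pbn()`, and 1031 / 1000 for the 8b/10b and 128b/132b FEC multipliers.

```c
#include <stdint.h>

/* Assumed values; the diff references these names but does not define them. */
#define PEAK_FACTOR_X1000			1006
#define FEC_OVERHEAD_MULTIPLIER_8B_10B_X1000	1031
#define FEC_OVERHEAD_MULTIPLIER_128B_132B_X1000	1000

/* Userspace replacement for the kernel's rounding-up 64-bit division. */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* kbps -> PBN with 0.6% margin and FEC overhead, rounded up. */
static int kbps_to_peak_pbn(int kbps, uint16_t fec_overhead_multiplier_x1000)
{
	uint64_t peak_kbps = (uint64_t)kbps;

	peak_kbps *= 1006;
	peak_kbps *= fec_overhead_multiplier_x1000;
	peak_kbps /= 1000 * 1000;
	return (int)div_round_up_u64(peak_kbps * 64, 54 * 8 * 1000);
}

/* PBN -> kbps with the margin deducted, rounded down. */
static uint32_t kbps_from_pbn(unsigned int pbn)
{
	uint64_t kbps = (uint64_t)pbn;

	kbps *= (1000000 / PEAK_FACTOR_X1000);	/* integer division: 994 */
	kbps *= 8;
	kbps *= 54;
	kbps /= 64;

	return (uint32_t)kbps;
}
```

Under these assumed constants, a 1,000,000 kbps stream needs 154 PBN on an 8b/10b link and 150 PBN on a 128b/132b link, and 154 PBN converts back to 1,033,263 kbps. The two directions are deliberately not exact inverses: one rounds up with overhead added, the other rounds down with margin removed.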

9 hours, 56 minutes


[PATCH] net: nfc: nci: Fix parameter validation for packet data

by Michael Thalmeier

Since commit 8fcc7315a10a ("net: nfc: nci: Add parameter validation for
packet data") communication with nci nfc chips is not working any more.

The mentioned commit tries to fix access of uninitialized data, but
failed to understand that in some cases the data packet is of variable
length and can therefore not be compared to the maximum packet length
given by the sizeof(struct).

For these cases it is only possible to check for minimum packet length.

Fixes: 8fcc7315a10a ("net: nfc: nci: Add parameter validation for packet data")
Cc: stable(a)vger.kernel.org
Signed-off-by: Michael Thalmeier <michael.thalmeier(a)hale.at>
---
 net/nfc/nci/ntf.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
index 418b84e2b260..5161e94f067f 100644
--- a/net/nfc/nci/ntf.c
+++ b/net/nfc/nci/ntf.c
@@ -58,7 +58,8 @@ static int nci_core_conn_credits_ntf_packet(struct nci_dev *ndev,
 	struct nci_conn_info *conn_info;
 	int i;
 
-	if (skb->len < sizeof(struct nci_core_conn_credit_ntf))
+	/* Minimal packet size for num_entries=1 is 1 x __u8 + 1 x conn_credit_entry */
+	if (skb->len < (sizeof(__u8) + sizeof(struct conn_credit_entry)))
 		return -EINVAL;
 
 	ntf = (struct nci_core_conn_credit_ntf *)skb->data;
@@ -364,7 +365,8 @@ static int nci_rf_discover_ntf_packet(struct nci_dev *ndev,
 	const __u8 *data;
 	bool add_target = true;
 
-	if (skb->len < sizeof(struct nci_rf_discover_ntf))
+	/* Minimal packet size is 5 if rf_tech_specific_params_len=0 */
+	if (skb->len < (5 * sizeof(__u8)))
 		return -EINVAL;
 
 	data = skb->data;
@@ -596,7 +598,10 @@ static int nci_rf_intf_activated_ntf_packet(struct nci_dev *ndev,
 	const __u8 *data;
 	int err = NCI_STATUS_OK;
 
-	if (skb->len < sizeof(struct nci_rf_intf_activated_ntf))
+	/* Minimal packet size is 11 if
+	 * f_tech_specific_params_len=0 and activation_params_len=0
+	 */
+	if (skb->len < (11 * sizeof(__u8)))
 		return -EINVAL;
 
 	data = skb->data;
-- 
2.52.0
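The difference between checking against a maximum-sized struct and checking a minimum length can be sketched in userspace. `struct mock_credit_entry` and `validate_credits_ntf()` are hypothetical names, and the two-byte entry layout is an assumption. The second check, against the entry count declared in the packet, is an extra step added for this illustration and is not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-byte credit entry (assumed layout). */
struct mock_credit_entry {
	uint8_t conn_id;
	uint8_t credits;
};

/*
 * Validate a variable-length notification: one count byte followed by
 * that many entries.  Returns 0 when the length is acceptable, -1 otherwise.
 */
static int validate_credits_ntf(const uint8_t *data, size_t len)
{
	/* Minimum length: the count byte plus a single entry. */
	const size_t min_len = sizeof(uint8_t) + sizeof(struct mock_credit_entry);
	size_t num_entries;

	if (len < min_len)
		return -1;

	/* Illustration only: also require room for every declared entry. */
	num_entries = data[0];
	if (len < sizeof(uint8_t) + num_entries * sizeof(struct mock_credit_entry))
		return -1;

	return 0;
}
```

A three-byte packet declaring one entry passes here, whereas a comparison against the size of a struct holding the maximum number of entries would reject it, which is the failure mode the commit message describes.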

11 hours, 3 minutes


[6.12.60 lts] [amdgpu]: regression: broken multi-monitor USB4 dock on Ryzen 7840U

by Péter Bohner

Upgrading from 6.12.59 to 6.12.60 broke video output through my USB4 dock (Dynabook Thunderbolt 4 Dock) with my Framework 13 (AMD Ryzen 7840U / Radeon 780M iGPU).
With two monitors plugged in, only one of them works; the other (always the one on the 'video 2' output) remains blank (but receives signal).

Relevant dmesg [note: tainted by ZFS] (full output at: https://gist.github.com/x-zvf/128d45d028230438b8777c40759fa997):

[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
------------[ cut here ]------------
WARNING: CPU: 15 PID: 3064 at drivers/gpu/drm/amd/amdgpu/../display/dc/link/hwss/link_hwss_dpia.c:49 update_dpia_stream_allocation_table+0xf2/0x100 [amdgpu]
Modules linked in: hid_logitech_hidpp hid_logitech_dj snd_seq_midi snd_seq_midi_event uvcvideo videobuf2_vmalloc uvc videobuf2_memops snd_usb_audio videobuf2_v4l2 videobuf2_common snd_usbmidi_lib snd_ump videodev snd_rawmidi mc cdc_ether usbnet mii uas usb_storage ccm snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device tun ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_multiport xt_cgroup xt_mark xt_owner xt_tcpudp ip6table_raw iptable_raw ip6table_mangle iptable_mangle ip6table_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crc32c_generic ip6table_filter ip6_tables iptable_filter uhid cmac algif_hash algif_skcipher af_alg bnep vfat fat amd_atl intel_rapl_msr intel_rapl_common snd_sof_amd_acp70 snd_sof_amd_acp63 snd_soc_acpi_amd_match snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof mt7921e snd_sof_utils mt7921_common snd_pci_ps mt792x_lib snd_hda_codec_realtek snd_amd_sdw_acpi soundwire_amd kvm_amd mt76_connac_lib snd_hda_codec_generic soundwire_generic_allocation snd_hda_scodec_component snd_hda_codec_hdmi mousedev mt76 soundwire_bus snd_hda_intel kvm snd_soc_core snd_intel_dspcfg irqbypass snd_intel_sdw_acpi mac80211 snd_compress ac97_bus crct10dif_pclmul hid_sensor_als snd_pcm_dmaengine snd_hda_codec crc32_pclmul hid_sensor_trigger crc32c_intel snd_rpl_pci_acp6x industrialio_triggered_buffer snd_acp_pci polyval_clmulni kfifo_buf snd_hda_core snd_acp_legacy_common polyval_generic libarc4 hid_sensor_iio_common industrialio ghash_clmulni_intel leds_cros_ec cros_ec_sysfs cros_ec_hwmon cros_kbd_led_backlight cros_charge_control led_class_multicolor gpio_cros_ec cros_ec_chardev cros_ec_debugfs sha512_ssse3 snd_hwdep snd_pci_acp6x hid_multitouch joydev spd5118 hid_sensor_hub cros_ec_dev sha256_ssse3 snd_pcm btusb cfg80211 sha1_ssse3 btrtl aesni_intel snd_pci_acp5x btintel snd_timer snd_rn_pci_acp3x sp5100_tco gf128mul ucsi_acpi crypto_simd btbcm snd_acp_config snd amd_pmf typec_ucsi cryptd snd_soc_acpi i2c_piix4 btmtk bluetooth rapl wmi_bmof pcspkr typec k10temp thunderbolt amdtee soundcore ccp snd_pci_acp3x i2c_smbus rfkill roles cros_ec_lpcs i2c_hid_acpi amd_sfh cros_ec platform_profile i2c_hid tee amd_pmc mac_hid i2c_dev crypto_user dm_mod loop nfnetlink bpf_preload ip_tables x_tables hid_generic usbhid amdgpu zfs(POE) crc16 amdxcp spl(OE) i2c_algo_bit drm_ttm_helper ttm serio_raw drm_exec atkbd gpu_sched libps2 vivaldi_fmap drm_suballoc_helper nvme drm_buddy i8042 drm_display_helper nvme_core video serio cec nvme_auth wmi
CPU: 15 UID: 1000 PID: 3064 Comm: kwin_wayland Tainted: P OE 6.12.60-1-lts #1 9b11292f14ae477e878a6bb6a5b5efc27ccf021d
Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.16 07/25/2025
RIP: 0010:update_dpia_stream_allocation_table+0xf2/0x100 [amdgpu]
Code: d0 0f 1f 00 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 75 1a 48 83 c4 10 5b 5d 41 5c 41 5d e9 10 ec e3 d9 31 db e9 6f ff ff ff <0f> 0b eb 8a e8 05 09 c3 d9 0f 1f 44 00 00 90 90 90 90 90 90 90 90
RSP: 0018:ffffd26fe3473248 EFLAGS: 00010282
RAX: 00000000ffffffff RBX: 0000000000000025 RCX: 0000000000001140
RDX: 00000000ffffffff RSI: ffffd26fe34731f0 RDI: ffff8bb78c7bb608
RBP: ffff8bb7982c3b88 R08: 00000000ffffffff R09: 0000000000001100
R10: ffffd27000ef9900 R11: ffff8bb78c7bb400 R12: ffff8bb7982ed600
R13: ffff8bb7982c3800 R14: ffff8bb984e402a8 R15: ffff8bb7982c38c8
FS: 000073883c086b80(0000) GS:ffff8bc51e180000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002020005ba004 CR3: 000000014396e000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? link_set_dpms_on+0x7a5/0xc70 [amdgpu d75f7e51e39957084964278ab74da83065554c01]
 link_set_dpms_on+0x806/0xc70 [amdgpu d75f7e51e39957084964278ab74da83065554c01]
 dce110_apply_single_controller_ctx_to_hw+0x300/0x480 [amdgpu d75f7e51e39957084964278ab74da83065554c01]
 dce110_apply_ctx_to_hw+0x24c/0x2e0 [amdgpu d75f7e51e39957084964278ab74da83065554c01]
 ? dcn10_setup_stereo+0x160/0x170 [amdgpu d75f7e51e39957084964278ab74da83065554c01]
 dc_commit_state_no_check+0x63d/0xeb0 [amdgpu d75f7e51e39957084964278ab74da83065554c01]
 dc_commit_streams+0x296/0x490 [amdgpu d75f7e51e39957084964278ab74da83065554c01]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? schedule_timeout+0x133/0x170
 amdgpu_dm_atomic_commit_tail+0x6a1/0x3a10 [amdgpu d75f7e51e39957084964278ab74da83065554c01]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? psi_task_switch+0x113/0x2a0
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? schedule+0x27/0xf0
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? schedule_timeout+0x133/0x170
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? dma_fence_default_wait+0x8b/0x230
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? wait_for_completion_timeout+0x12e/0x180
 commit_tail+0xae/0x140
 drm_atomic_helper_commit+0x13c/0x180
 drm_atomic_commit+0xa6/0xe0
 ? __pfx___drm_printfn_info+0x10/0x10
 drm_mode_atomic_ioctl+0xa60/0xcd0
 ? sock_poll+0x51/0x110
 ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
 drm_ioctl_kernel+0xad/0x100
 drm_ioctl+0x286/0x500
 ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
 amdgpu_drm_ioctl+0x4a/0x80 [amdgpu d75f7e51e39957084964278ab74da83065554c01]
 __x64_sys_ioctl+0x91/0xd0
 do_syscall_64+0x7b/0x190
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __x64_sys_ppoll+0xf8/0x180
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? syscall_exit_to_user_mode+0x37/0x1c0
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? do_syscall_64+0x87/0x190
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? do_syscall_64+0x87/0x190
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? do_syscall_64+0x87/0x190
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? do_syscall_64+0x87/0x190
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? do_syscall_64+0x87/0x190
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? irqentry_exit_to_user_mode+0x2c/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x738842d9b70d
Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
RSP: 002b:00007ffe3c7ed230 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000634abd49c210 RCX: 0000738842d9b70d
RDX: 00007ffe3c7ed320 RSI: 00000000c03864bc RDI: 0000000000000013
RBP: 00007ffe3c7ed280 R08: 0000634abc4049bc R09: 0000634abce43e80
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3c7ed320
R13: 00000000c03864bc R14: 0000000000000013 R15: 0000634abc404840
 </TASK>
---[ end trace 0000000000000000 ]---

regards,
~ Peter

11 hours, 44 minutes
