Linux-stable-mirror April 2025

linux-stable-mirror@lists.linaro.org


Re: [PATCH v3 1/1] iommu: Allow attaching static domains in iommu_attach_device_pasid()

by Dave Jiang

On 4/24/25 3:59 PM, Jack Vogel wrote:
>
>
>> On Apr 24, 2025, at 15:40, Dave Jiang <dave.jiang(a)intel.com> wrote:
>>
>>
>>
>> On 4/24/25 3:34 PM, Jack Vogel wrote:
>>> I am having test issues with this patch, test system is running OL9, basically RHEL 9.5, the kernel boots ok, and the dmesg is clean… but the tests in accel-config dont pass. Specifically the crypto tests, this is due to vfio_pci_core not loading. Right now I’m not sure if any of this is my mistake, but at least it’s something I need to keep looking at.
>>>
>>> Also, since I saw that issue on the latest I did a backport to our UEK8 kernel which is 6.12.23, and on that kernel it still exhibited these messages on boot:
>>>
>>> *idxd*0000:6a:01.0: enabling device (0144 -> 0146)
>>>
>>> [ 21.112733] *idxd*0000:6a:01.0: failed to attach device pasid 1, domain type 4
>>>
>>> [ 21.120770] *idxd*0000:6a:01.0: No in-kernel DMA with PASID. -95
>>>
>>>
>>> Again, maybe an issue in my backporting… however I’d like to be sure.
>>
>> Can you verify against latest upstream kernel plus the patch and see if you still see the error?
>>
>> DJ
>
> Yes, the kernel was build from the tip this morning. Like I said, it got no messages booting up, all looked fine. But when running the actual test suite in the accel-config tarball specifically the iaa crypt tests, they failed and the dmesg was from vfio_pci_core failed to load with an unknown symbol.

I'm not sure what the test consists of (haven't worked on this device for almost 2 years). But usually the device is either bound to the idxd driver or the vfio_pci driver. Not both. And if the idxd driver didn't emit any errors while loading, then the test failure may be something else...

Another way to verify is to set CONFIG_IOMMU_DEFAULT_DMA_LAZY vs PASSTHROUGH. If the tests still fail then it's something else.

DJ

>
> This sounds like the module was wrong, but i would think it was installed with the v6.15 kernel…..
>
> Jack
>
>>
>>>
>>> Cheers,
>>>
>>> Jack
>>>
>>>
>>>> On Apr 23, 2025, at 20:41, Lu Baolu <baolu.lu(a)linux.intel.com> wrote:
>>>>
>>>> The idxd driver attaches the default domain to a PASID of the device to
>>>> perform kernel DMA using that PASID. The domain is attached to the
>>>> device's PASID through iommu_attach_device_pasid(), which checks if the
>>>> domain->owner matches the iommu_ops retrieved from the device. If they
>>>> do not match, it returns a failure.
>>>>
>>>>         if (ops != domain->owner || pasid == IOMMU_NO_PASID)
>>>>                 return -EINVAL;
>>>>
>>>> The static identity domain implemented by the intel iommu driver doesn't
>>>> specify the domain owner. Therefore, kernel DMA with PASID doesn't work
>>>> for the idxd driver if the device translation mode is set to passthrough.
>>>>
>>>> Generally the owner field of static domains are not set because they are
>>>> already part of iommu ops. Add a helper domain_iommu_ops_compatible()
>>>> that checks if a domain is compatible with the device's iommu ops. This
>>>> helper explicitly allows the static blocked and identity domains associated
>>>> with the device's iommu_ops to be considered compatible.
>>>>
>>>> Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain")
>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220031
>>>> Cc: stable(a)vger.kernel.org
>>>> Suggested-by: Jason Gunthorpe <jgg(a)nvidia.com>
>>>> Link: https://lore.kernel.org/linux-iommu/20250422191554.GC1213339@ziepe.ca/
>>>> Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
>>>> Reviewed-by: Dave Jiang <dave.jiang(a)intel.com>
>>>> Reviewed-by: Robin Murphy <robin.murphy(a)arm.com>
>>>> ---
>>>> drivers/iommu/iommu.c | 21 ++++++++++++++++++---
>>>> 1 file changed, 18 insertions(+), 3 deletions(-)
>>>>
>>>> Change log:
>>>> v3:
>>>> - Convert all places checking domain->owner to the new helper.
>>>> v2: https://lore.kernel.org/linux-iommu/20250423021839.2189204-1-baolu.lu@linux…
>>>> - Make the solution generic for all static domains as suggested by
>>>>   Jason.
>>>> v1: https://lore.kernel.org/linux-iommu/20250422075422.2084548-1-baolu.lu@linux…
>>>>
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index 4f91a740c15f..b26fc3ed9f01 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -2204,6 +2204,19 @@ static void *iommu_make_pasid_array_entry(struct iommu_domain *domain,
>>>>  	return xa_tag_pointer(domain, IOMMU_PASID_ARRAY_DOMAIN);
>>>>  }
>>>>
>>>> +static bool domain_iommu_ops_compatible(const struct iommu_ops *ops,
>>>> +					struct iommu_domain *domain)
>>>> +{
>>>> +	if (domain->owner == ops)
>>>> +		return true;
>>>> +
>>>> +	/* For static domains, owner isn't set. */
>>>> +	if (domain == ops->blocked_domain || domain == ops->identity_domain)
>>>> +		return true;
>>>> +
>>>> +	return false;
>>>> +}
>>>> +
>>>>  static int __iommu_attach_group(struct iommu_domain *domain,
>>>>  				struct iommu_group *group)
>>>>  {
>>>> @@ -2214,7 +2227,8 @@ static int __iommu_attach_group(struct iommu_domain *domain,
>>>>  		return -EBUSY;
>>>>
>>>>  	dev = iommu_group_first_dev(group);
>>>> -	if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner)
>>>> +	if (!dev_has_iommu(dev) ||
>>>> +	    !domain_iommu_ops_compatible(dev_iommu_ops(dev), domain))
>>>>  		return -EINVAL;
>>>>
>>>>  	return __iommu_group_set_domain(group, domain);
>>>> @@ -3435,7 +3449,8 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
>>>>  	    !ops->blocked_domain->ops->set_dev_pasid)
>>>>  		return -EOPNOTSUPP;
>>>>
>>>> -	if (ops != domain->owner || pasid == IOMMU_NO_PASID)
>>>> +	if (!domain_iommu_ops_compatible(ops, domain) ||
>>>> +	    pasid == IOMMU_NO_PASID)
>>>>  		return -EINVAL;
>>>>
>>>>  	mutex_lock(&group->mutex);
>>>> @@ -3511,7 +3526,7 @@ int iommu_replace_device_pasid(struct iommu_domain *domain,
>>>>  	if (!domain->ops->set_dev_pasid)
>>>>  		return -EOPNOTSUPP;
>>>>
>>>> -	if (dev_iommu_ops(dev) != domain->owner ||
>>>> +	if (!domain_iommu_ops_compatible(dev_iommu_ops(dev), domain) ||
>>>>  	    pasid == IOMMU_NO_PASID || !handle)
>>>>  		return -EINVAL;
>>>>
>>>> --
>>>> 2.43.0
>


Re: [PATCH v3 1/1] iommu: Allow attaching static domains in iommu_attach_device_pasid()

by Dave Jiang

On 4/24/25 3:34 PM, Jack Vogel wrote:
> I am having test issues with this patch, test system is running OL9, basically RHEL 9.5, the kernel boots ok, and the dmesg is clean… but the tests in accel-config dont pass. Specifically the crypto tests, this is due to vfio_pci_core not loading. Right now I’m not sure if any of this is my mistake, but at least it’s something I need to keep looking at.
>
> Also, since I saw that issue on the latest I did a backport to our UEK8 kernel which is 6.12.23, and on that kernel it still exhibited these messages on boot:
>
> *idxd*0000:6a:01.0: enabling device (0144 -> 0146)
>
> [ 21.112733] *idxd*0000:6a:01.0: failed to attach device pasid 1, domain type 4
>
> [ 21.120770] *idxd*0000:6a:01.0: No in-kernel DMA with PASID. -95
>
>
> Again, maybe an issue in my backporting… however I’d like to be sure.

Can you verify against latest upstream kernel plus the patch and see if you still see the error?

DJ

>
> Cheers,
>
> Jack
>
>
>> On Apr 23, 2025, at 20:41, Lu Baolu <baolu.lu(a)linux.intel.com> wrote:
>>
>> The idxd driver attaches the default domain to a PASID of the device to
>> perform kernel DMA using that PASID. The domain is attached to the
>> device's PASID through iommu_attach_device_pasid(), which checks if the
>> domain->owner matches the iommu_ops retrieved from the device. If they
>> do not match, it returns a failure.
>>
>>         if (ops != domain->owner || pasid == IOMMU_NO_PASID)
>>                 return -EINVAL;
>>
>> The static identity domain implemented by the intel iommu driver doesn't
>> specify the domain owner. Therefore, kernel DMA with PASID doesn't work
>> for the idxd driver if the device translation mode is set to passthrough.
>>
>> Generally the owner field of static domains are not set because they are
>> already part of iommu ops. Add a helper domain_iommu_ops_compatible()
>> that checks if a domain is compatible with the device's iommu ops. This
>> helper explicitly allows the static blocked and identity domains associated
>> with the device's iommu_ops to be considered compatible.
>>
>> Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain")
>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220031
>> Cc: stable(a)vger.kernel.org
>> Suggested-by: Jason Gunthorpe <jgg(a)nvidia.com>
>> Link: https://lore.kernel.org/linux-iommu/20250422191554.GC1213339@ziepe.ca/
>> Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
>> Reviewed-by: Dave Jiang <dave.jiang(a)intel.com>
>> Reviewed-by: Robin Murphy <robin.murphy(a)arm.com>
>> ---
>> drivers/iommu/iommu.c | 21 ++++++++++++++++++---
>> 1 file changed, 18 insertions(+), 3 deletions(-)
>>
>> Change log:
>> v3:
>> - Convert all places checking domain->owner to the new helper.
>> v2: https://lore.kernel.org/linux-iommu/20250423021839.2189204-1-baolu.lu@linux…
>> - Make the solution generic for all static domains as suggested by
>>   Jason.
>> v1: https://lore.kernel.org/linux-iommu/20250422075422.2084548-1-baolu.lu@linux…
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 4f91a740c15f..b26fc3ed9f01 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2204,6 +2204,19 @@ static void *iommu_make_pasid_array_entry(struct iommu_domain *domain,
>>  	return xa_tag_pointer(domain, IOMMU_PASID_ARRAY_DOMAIN);
>>  }
>>
>> +static bool domain_iommu_ops_compatible(const struct iommu_ops *ops,
>> +					struct iommu_domain *domain)
>> +{
>> +	if (domain->owner == ops)
>> +		return true;
>> +
>> +	/* For static domains, owner isn't set. */
>> +	if (domain == ops->blocked_domain || domain == ops->identity_domain)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>  static int __iommu_attach_group(struct iommu_domain *domain,
>>  				struct iommu_group *group)
>>  {
>> @@ -2214,7 +2227,8 @@ static int __iommu_attach_group(struct iommu_domain *domain,
>>  		return -EBUSY;
>>
>>  	dev = iommu_group_first_dev(group);
>> -	if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner)
>> +	if (!dev_has_iommu(dev) ||
>> +	    !domain_iommu_ops_compatible(dev_iommu_ops(dev), domain))
>>  		return -EINVAL;
>>
>>  	return __iommu_group_set_domain(group, domain);
>> @@ -3435,7 +3449,8 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
>>  	    !ops->blocked_domain->ops->set_dev_pasid)
>>  		return -EOPNOTSUPP;
>>
>> -	if (ops != domain->owner || pasid == IOMMU_NO_PASID)
>> +	if (!domain_iommu_ops_compatible(ops, domain) ||
>> +	    pasid == IOMMU_NO_PASID)
>>  		return -EINVAL;
>>
>>  	mutex_lock(&group->mutex);
>> @@ -3511,7 +3526,7 @@ int iommu_replace_device_pasid(struct iommu_domain *domain,
>>  	if (!domain->ops->set_dev_pasid)
>>  		return -EOPNOTSUPP;
>>
>> -	if (dev_iommu_ops(dev) != domain->owner ||
>> +	if (!domain_iommu_ops_compatible(dev_iommu_ops(dev), domain) ||
>>  	    pasid == IOMMU_NO_PASID || !handle)
>>  		return -EINVAL;
>>
>> --
>> 2.43.0
>>
>
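
Editor's sketch: the comparison that the patch introduces can be tried outside the kernel with stand-in types. The `mock_` structs and the sample objects below are hypothetical and model only the fields the check reads; the helper body mirrors domain_iommu_ops_compatible() from the diff, and nothing here is the real IOMMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct iommu_domain / struct iommu_ops. */
struct mock_iommu_ops;

struct mock_iommu_domain {
	const struct mock_iommu_ops *owner;	/* left NULL for static domains */
};

struct mock_iommu_ops {
	struct mock_iommu_domain *blocked_domain;
	struct mock_iommu_domain *identity_domain;
};

/* Old behaviour: only the owner pointer is compared. */
static bool mock_old_check(const struct mock_iommu_ops *ops,
			   const struct mock_iommu_domain *domain)
{
	return domain->owner == ops;
}

/* Same logic as domain_iommu_ops_compatible() in the patch. */
static bool mock_ops_compatible(const struct mock_iommu_ops *ops,
				const struct mock_iommu_domain *domain)
{
	if (domain->owner == ops)
		return true;

	/* For static domains, owner isn't set. */
	if (domain == ops->blocked_domain || domain == ops->identity_domain)
		return true;

	return false;
}

/* Sample objects: one driver with static domains, one unrelated driver. */
static struct mock_iommu_domain mock_identity = { .owner = NULL };
static struct mock_iommu_domain mock_blocked = { .owner = NULL };
static struct mock_iommu_ops mock_ops_a = {
	.blocked_domain = &mock_blocked,
	.identity_domain = &mock_identity,
};
static struct mock_iommu_ops mock_ops_b = { NULL, NULL };
static struct mock_iommu_domain mock_paging = { .owner = &mock_ops_a };

static void mock_self_check(void)
{
	/* The static identity domain fails the old check ... */
	assert(!mock_old_check(&mock_ops_a, &mock_identity));
	/* ... but is accepted by the helper, for its own driver only. */
	assert(mock_ops_compatible(&mock_ops_a, &mock_identity));
	assert(!mock_ops_compatible(&mock_ops_b, &mock_identity));
}
```

Calling `mock_self_check()` from any `main()` exercises the passthrough case described in the commit message: the identity domain has no owner, so only the second comparison lets the attach proceed.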


[alternative-merged] smaps-fix-crash-in-smaps_hugetlb_range-for-non-present-hugetlb-entries.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: smaps: fix crash in smaps_hugetlb_range for non-present hugetlb entries
has been removed from the -mm tree.  Its filename was
     smaps-fix-crash-in-smaps_hugetlb_range-for-non-present-hugetlb-entries.patch

This patch was dropped because an alternative patch was or shall be merged

------------------------------------------------------
From: Ming Wang <wangming01(a)loongson.cn>
Subject: smaps: fix crash in smaps_hugetlb_range for non-present hugetlb entries
Date: Wed, 23 Apr 2025 09:03:59 +0800

When reading /proc/pid/smaps for a process that has mapped a hugetlbfs
file with MAP_PRIVATE, the kernel might crash inside
pfn_swap_entry_to_page.  This occurs on LoongArch under specific
conditions.

The root cause involves several steps:

1. When the hugetlbfs file is mapped (MAP_PRIVATE), the initial PMD
   (or relevant level) entry is often populated by the kernel during
   mmap() with a non-present entry pointing to the architecture's
   invalid_pte_table.  On the affected LoongArch system, this address was
   observed to be 0x90000000031e4000.

2. The smaps walker (walk_hugetlb_range -> smaps_hugetlb_range) reads
   this entry.

3. The generic is_swap_pte() macro checks `!pte_present() &&
   !pte_none()`.  The entry (invalid_pte_table address) is not present.
   Crucially, the generic pte_none() check (`!(pte_val(pte) &
   ~_PAGE_GLOBAL)`) returns false because the invalid_pte_table address is
   non-zero.  Therefore, is_swap_pte() incorrectly returns true.

4. The code enters the `else if (is_swap_pte(...))` block.

5. Inside this block, it checks `is_pfn_swap_entry()`.  Due to a bit
   pattern coincidence in the invalid_pte_table address on LoongArch, the
   embedded generic `is_migration_entry()` check happens to return true
   (misinterpreting parts of the address as a migration type).

6. This leads to a call to pfn_swap_entry_to_page() with the bogus
   swap entry derived from the invalid table address.

7. pfn_swap_entry_to_page() extracts a meaningless PFN, finds an
   unrelated struct page, checks its lock status (unlocked), and hits the
   `BUG_ON(is_migration_entry(entry) && !PageLocked(p))` assertion.

The original code's intent in the `else if` block seems aimed at handling
potential migration entries, as indicated by the inner
`is_pfn_swap_entry()` check.  The issue arises because the outer
`is_swap_pte()` check incorrectly includes the invalid table pointer case
on LoongArch.

This patch fixes the issue by changing the condition in
smaps_hugetlb_range() from the broad `is_swap_pte()` to the specific
`is_hugetlb_entry_migration()`.

The `is_hugetlb_entry_migration()` helper function correctly handles this
by first checking `huge_pte_none()`.  Architectures like LoongArch can
provide an override for `huge_pte_none()` that specifically recognizes the
`invalid_pte_table` address as a "none" state for HugeTLB entries.  This
ensures `is_hugetlb_entry_migration()` returns false for the invalid
entry, preventing the code from entering the faulty block.

This change makes the code reflect the likely original intent (handling
migration) more accurately and leverages architecture-specific helpers
(`huge_pte_none`) to correctly interpret special PTE/PMD values in the
HugeTLB context, fixing the crash on LoongArch without altering the
generic is_swap_pte() behavior.

Link: https://lkml.kernel.org/r/20250423010359.2030576-1-wangming01@loongson.cn
Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
Co-developed-by: Hongchen Zhang <zhanghongchen(a)loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen(a)loongson.cn>
Signed-off-by: Ming Wang <wangming01(a)loongson.cn>
Cc: Andrii Nakryiko <andrii(a)kernel.org>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Huacai Chen <chenhuacai(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Joern Engel <joern(a)logfs.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.cz>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/proc/task_mmu.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/proc/task_mmu.c~smaps-fix-crash-in-smaps_hugetlb_range-for-non-present-hugetlb-entries
+++ a/fs/proc/task_mmu.c
@@ -1027,7 +1027,7 @@ static int smaps_hugetlb_range(pte_t *pt
 	if (pte_present(ptent)) {
 		folio = page_folio(pte_page(ptent));
 		present = true;
-	} else if (is_swap_pte(ptent)) {
+	} else if (is_hugetlb_entry_migration(ptent)) {
 		swp_entry_t swpent = pte_to_swp_entry(ptent);
 
 		if (is_pfn_swap_entry(swpent))
_

Patches currently in -mm which might be from wangming01(a)loongson.cn are
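
Editor's sketch: steps 3 and 4 of the analysis, and the effect of an architecture-specific `huge_pte_none()`, can be modelled on plain 64-bit integers. The bit positions chosen for the present and global flags are assumptions made for illustration, and the `mock_` functions are hypothetical; only the invalid table address and the shape of the two checks come from the changelog. The migration-type decoding of step 5 is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_PAGE_GLOBAL	(UINT64_C(1) << 6)	/* assumed bit position */
#define MOCK_PAGE_PRESENT	(UINT64_C(1) << 7)	/* assumed bit position */
/* Address of invalid_pte_table observed in the report. */
#define MOCK_INVALID_PTE_TABLE	UINT64_C(0x90000000031e4000)

static bool mock_pte_present(uint64_t pte)
{
	return (pte & MOCK_PAGE_PRESENT) != 0;
}

/* Generic check quoted in the changelog: !(pte_val(pte) & ~_PAGE_GLOBAL) */
static bool mock_pte_none(uint64_t pte)
{
	return !(pte & ~MOCK_PAGE_GLOBAL);
}

/* Generic is_swap_pte(): not present and not none. */
static bool mock_is_swap_pte(uint64_t pte)
{
	return !mock_pte_present(pte) && !mock_pte_none(pte);
}

/* Arch override as described: the invalid table pointer also counts as none. */
static bool mock_huge_pte_none(uint64_t pte)
{
	return mock_pte_none(pte) || pte == MOCK_INVALID_PTE_TABLE;
}

/* First gate of is_hugetlb_entry_migration(): bail out on none or present. */
static bool mock_may_be_migration(uint64_t pte)
{
	return !(mock_huge_pte_none(pte) || mock_pte_present(pte));
}

static void mock_self_check(void)
{
	/* The old condition lets the invalid table pointer into the branch. */
	assert(mock_is_swap_pte(MOCK_INVALID_PTE_TABLE));
	/* The new condition rejects it before any swap-entry decoding. */
	assert(!mock_may_be_migration(MOCK_INVALID_PTE_TABLE));
}
```

In this model an empty entry (0) is rejected by both conditions, so the only behavioural difference is the invalid table pointer, which is the case that crashed.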


[PATCH] btrfs: adjust subpage bit start based on sectorsize

by Josef Bacik

When running machines with 64k page size and a 16k nodesize we started
seeing tree log corruption in production.  This turned out to be because
we were not writing out dirty blocks sometimes, so this in fact affects
all metadata writes.

When writing out a subpage EB we scan the subpage bitmap for a dirty
range.  If the range isn't dirty we do

	bit_start++;

to move onto the next bit.  The problem is the bitmap is based on the
number of sectors that an EB has.  So in this case, we have a 64k
pagesize, 16k nodesize, but a 4k sectorsize.  This means our bitmap is 4
bits for every node.  With a 64k page size we end up with 4 nodes per
page.

To make this easier this is how everything looks

[0         16k       32k       48k     ] logical address
[0         4         8         12      ] radix tree offset
[               64k page               ] folio
[ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers
[ | | | |  | | | |   | | | |   | | | | ] bitmap

Now we use all of our addressing based on fs_info->sectorsize_bits, so
as you can see the above our 16k eb->start turns into radix entry 4.

When we find a dirty range for our eb, we correctly do bit_start +=
sectors_per_node, because if we start at bit 0, the next bit for the
next eb is 4, to correspond to eb->start 16k.

However if our range is clean, we will do bit_start++, which will now
put us offset from our radix tree entries.

In our case, assume that the first time we check the bitmap the block is
not dirty, we increment bit_start so now it == 1, and then we loop
around and check again.  This time it is dirty, and we go to find that
start using the following equation

	start = folio_start + bit_start * fs_info->sectorsize;

so in the case above, eb->start 0 is now dirty, and we calculate start
as

	0 + 1 * fs_info->sectorsize = 4096
	4096 >> 12 = 1

Now we're looking up the radix tree for 1, and we won't find an eb.
What's worse is now we're using bit_start == 1, so we do bit_start +=
sectors_per_node, which is now 5.  If that eb is dirty we will run into
the same thing, we will look at an offset that is not populated in the
radix tree, and now we're skipping the writeout of dirty extent buffers.

The best fix for this is to not use sectorsize_bits to address nodes,
but that's a larger change.  Since this is a fs corruption problem fix
it simply by always using sectors_per_node to increment the start bit.

cc: stable(a)vger.kernel.org
Fixes: c4aec299fa8f ("btrfs: introduce submit_eb_subpage() to submit a subpage metadata page")
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
- Further testing indicated that the page tagging theoretical race isn't getting
  hit in practice, so we're going to limit the "hotfix" to this specific patch,
  and then send subsequent patches to address the other issues we're hitting. My
  simplify metadata writeback patches are the more holistic fix.

 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5f08615b334f..6cfd286b8bbc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2034,7 +2034,7 @@ static int submit_eb_subpage(struct folio *folio, struct writeback_control *wbc)
 			      subpage->bitmaps)) {
 			spin_unlock_irqrestore(&subpage->lock, flags);
 			spin_unlock(&folio->mapping->i_private_lock);
-			bit_start++;
+			bit_start += sectors_per_node;
 			continue;
 		}
 
-- 
2.48.1
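
Editor's sketch: the misalignment walk-through above can be reproduced with a small user-space model of the scan loop. Everything named `mock_` is hypothetical. The radix tree lookup is reduced to "succeeds only when bit_start is a multiple of sectors_per_node", and the race from the changelog is modelled by a second mask of bits that become dirty right after the first bitmap test; the real submit_eb_subpage() does considerably more.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_SECTORS_PER_PAGE	16u	/* 64k page / 4k sector */
#define MOCK_SECTORS_PER_NODE	4u	/* 16k node / 4k sector */

/*
 * Walk the dirty bitmap the way the changelog describes and return a mask
 * of extent buffers (bit i = i-th eb in the page) whose lookup succeeded.
 * clean_step is the increment applied when the tested bit is clean:
 * 1 models the old code, MOCK_SECTORS_PER_NODE models the fix.
 */
static unsigned int mock_scan(unsigned int dirty, unsigned int late_dirty,
			      unsigned int clean_step)
{
	unsigned int found = 0;
	unsigned int bit_start = 0;
	bool first = true;

	while (bit_start < MOCK_SECTORS_PER_PAGE) {
		bool is_dirty = (dirty & (1u << bit_start)) != 0;

		/* Model the race: more bits turn dirty after the first test. */
		if (first) {
			dirty |= late_dirty;
			first = false;
		}

		if (!is_dirty) {
			bit_start += clean_step;
			continue;
		}

		/* Lookups only hit at node-aligned offsets (0, 4, 8, 12). */
		if (bit_start % MOCK_SECTORS_PER_NODE == 0)
			found |= 1u << (bit_start / MOCK_SECTORS_PER_NODE);

		bit_start += MOCK_SECTORS_PER_NODE;
	}
	return found;
}

static void mock_self_check(void)
{
	/* ebs 1-3 dirty up front, eb 0 turns dirty just after the first test */
	/* Old increment: the scan lands on bits 1, 5, 9, 13 and finds nothing. */
	assert(mock_scan(0xFFF0u, 0x000Fu, 1u) == 0x0u);
	/* Fixed increment: the scan stays aligned and finds ebs 1, 2 and 3. */
	assert(mock_scan(0xFFF0u, 0x000Fu, MOCK_SECTORS_PER_NODE) == 0xEu);
}
```

With the fixed increment, the eb that turned dirty late is simply missed in this pass and stays dirty for the next one, while the other three are written. With the old increment, a single late dirtying knocks the whole page out of alignment.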


[PATCH v2] x86/sev: Fix SNP guest kdump hang/softlockup/panic

by Ashish Kalra

From: Ashish Kalra <ashish.kalra(a)amd.com>

When kdump is running makedumpfile to generate vmcore and dumping SNP
guest memory it touches the VMSA page of the vCPU executing kdump which
then results in unrecoverable #NPF/RMP faults as the VMSA page is
marked busy/in-use when the vCPU is running.

This leads to guest softlockup/hang:

[ 117.111097] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [cp:318]
[ 117.111165] CPU: 0 UID: 0 PID: 318 Comm: cp Not tainted 6.14.0-next-20250328-snp-host-f2a41ff576cc-dirty #414 VOLUNTARY
[ 117.111171] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[ 117.111176] RIP: 0010:rep_movs_alternative+0x5b/0x70
[ 117.111200] Call Trace:
[ 117.111204] <TASK>
[ 117.111206] ? _copy_to_iter+0xc1/0x720
[ 117.111216] ? srso_return_thunk+0x5/0x5f
[ 117.111220] ? _raw_spin_unlock+0x27/0x40
[ 117.111234] ? srso_return_thunk+0x5/0x5f
[ 117.111236] ? find_vmap_area+0xd6/0xf0
[ 117.111251] ? srso_return_thunk+0x5/0x5f
[ 117.111253] ? __check_object_size+0x18d/0x2e0
[ 117.111268] __copy_oldmem_page.part.0+0x64/0xa0
[ 117.111281] copy_oldmem_page_encrypted+0x1d/0x30
[ 117.111285] read_from_oldmem.part.0+0xf4/0x200
[ 117.111306] read_vmcore+0x206/0x3c0
[ 117.111309] ? srso_return_thunk+0x5/0x5f
[ 117.111325] proc_reg_read_iter+0x59/0x90
[ 117.111334] vfs_read+0x26e/0x350

Additionally other APs may be halted in guest mode and their VMSA pages
are marked busy and touching these VMSA pages during guest memory dump
will also cause #NPF.

Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out
of guest mode and then clear the VMSA bit on their VMSA pages.

If the vCPU running kdump is an AP, mark its VMSA page as offline to
ensure that makedumpfile excludes that page while dumping guest memory.

Cc: stable(a)vger.kernel.org
Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec")
Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com>
---
 arch/x86/coco/sev/core.c | 129 ++++++++++++++++++++++++++++++---------
 1 file changed, 101 insertions(+), 28 deletions(-)

diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index dcfaa698d6cf..870f4994a13d 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -113,6 +113,8 @@ DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
 DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
 DEFINE_PER_CPU(u64, svsm_caa_pa);
 
+static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id);
+
 static __always_inline bool on_vc_stack(struct pt_regs *regs)
 {
 	unsigned long sp = regs->sp;
@@ -877,6 +879,42 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
 	set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
 }
 
+static int issue_vmgexit_ap_create_destroy(u64 event, struct sev_es_save_area *vmsa, u32 apic_id)
+{
+	struct ghcb_state state;
+	unsigned long flags;
+	struct ghcb *ghcb;
+	int ret = 0;
+
+	local_irq_save(flags);
+
+	ghcb = __sev_get_ghcb(&state);
+
+	vc_ghcb_invalidate(ghcb);
+	ghcb_set_rax(ghcb, vmsa->sev_features);
+	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
+	ghcb_set_sw_exit_info_1(ghcb,
+				((u64)apic_id << 32) |
+				((u64)snp_vmpl << 16) |
+				event);
+	ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
+
+	sev_es_wr_ghcb_msr(__pa(ghcb));
+	VMGEXIT();
+
+	if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
+	    lower_32_bits(ghcb->save.sw_exit_info_1)) {
+		pr_err("SNP AP %s error\n", (event == SVM_VMGEXIT_AP_CREATE ? "CREATE" : "DESTROY"));
+		ret = -EINVAL;
+	}
+
+	__sev_put_ghcb(&state);
+
+	local_irq_restore(flags);
+
+	return ret;
+}
+
 static void set_pte_enc(pte_t *kpte, int level, void *va)
 {
 	struct pte_enc_desc d = {
@@ -973,6 +1011,66 @@ void snp_kexec_begin(void)
 		pr_warn("Failed to stop shared<->private conversions\n");
 }
 
+/*
+ * Shutdown all APs except the one handling kexec/kdump and clearing
+ * the VMSA tag on AP's VMSA pages as they are not being used as
+ * VMSA page anymore.
+ */
+static void snp_shutdown_all_aps(void)
+{
+	struct sev_es_save_area *vmsa;
+	int apic_id, cpu;
+
+	/*
+	 * APs are already in HLT loop when kexec_finish() is invoked.
+	 */
+	for_each_present_cpu(cpu) {
+		vmsa = per_cpu(sev_vmsa, cpu);
+
+		/*
+		 * BSP does not have guest allocated VMSA, so it's in-use/busy
+		 * VMSA cannot touch a guest page and there is no need to clear
+		 * the VMSA tag for this page.
+		 */
+		if (!vmsa)
+			continue;
+
+		/*
+		 * Cannot clear the VMSA tag for the currently running vCPU.
+		 */
+		if (get_cpu() == cpu) {
+			unsigned long pa;
+			struct page *p;
+
+			pa = __pa(vmsa);
+			p = pfn_to_online_page(pa >> PAGE_SHIFT);
+			/*
+			 * Mark the VMSA page of the running vCPU as Offline
+			 * so that is excluded and not touched by makedumpfile
+			 * while generating vmcore during kdump boot.
+			 */
+			if (p)
+				__SetPageOffline(p);
+			put_cpu();
+			continue;
+		}
+		put_cpu();
+
+		apic_id = cpuid_to_apicid[cpu];
+
+		/*
+		 * Issue AP destroy on all APs (to ensure they are kicked out
+		 * of guest mode) to allow using RMPADJUST to remove the VMSA
+		 * tag on VMSA pages especially for guests that allow HLT to
+		 * not be intercepted.
+		 */
+
+		issue_vmgexit_ap_create_destroy(SVM_VMGEXIT_AP_DESTROY, vmsa, apic_id);
+
+		snp_cleanup_vmsa(vmsa, apic_id);
+	}
+}
+
 void snp_kexec_finish(void)
 {
 	struct sev_es_runtime_data *data;
@@ -987,6 +1085,8 @@ void snp_kexec_finish(void)
 	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
 		return;
 
+	snp_shutdown_all_aps();
+
 	unshare_all_memory();
 
 	/*
@@ -1098,10 +1198,7 @@ static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id)
 static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
 {
 	struct sev_es_save_area *cur_vmsa, *vmsa;
-	struct ghcb_state state;
 	struct svsm_ca *caa;
-	unsigned long flags;
-	struct ghcb *ghcb;
 	u8 sipi_vector;
 	int cpu, ret;
 	u64 cr4;
@@ -1215,31 +1312,7 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
 	}
 
 	/* Issue VMGEXIT AP Creation NAE event */
-	local_irq_save(flags);
-
-	ghcb = __sev_get_ghcb(&state);
-
-	vc_ghcb_invalidate(ghcb);
-	ghcb_set_rax(ghcb, vmsa->sev_features);
-	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION);
-	ghcb_set_sw_exit_info_1(ghcb,
-				((u64)apic_id << 32) |
-				((u64)snp_vmpl << 16) |
-				SVM_VMGEXIT_AP_CREATE);
-	ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa));
-
-	sev_es_wr_ghcb_msr(__pa(ghcb));
-	VMGEXIT();
-
-	if (!ghcb_sw_exit_info_1_is_valid(ghcb) ||
-	    lower_32_bits(ghcb->save.sw_exit_info_1)) {
-		pr_err("SNP AP Creation error\n");
-		ret = -EINVAL;
-	}
-
-	__sev_put_ghcb(&state);
-
-	local_irq_restore(flags);
+	ret = issue_vmgexit_ap_create_destroy(SVM_VMGEXIT_AP_CREATE, vmsa, apic_id);
 
 	/* Perform cleanup if there was an error */
 	if (ret) {
-- 
2.34.1
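
Editor's sketch: the new helper packs the APIC ID, the VMPL and the event into sw_exit_info_1 and treats a non-zero lower half on return as failure. The user-space model below shows only that bit layout. The `mock_` names are hypothetical, and the numeric event values are an assumption about how SVM_VMGEXIT_AP_CREATE and SVM_VMGEXIT_AP_DESTROY are defined; the shifts are the ones in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed event encodings; check the kernel headers before relying on them. */
#define MOCK_AP_CREATE	UINT64_C(1)
#define MOCK_AP_DESTROY	UINT64_C(2)

/* Layout used in the patch: APIC ID in bits 63:32, VMPL at bit 16, event low. */
static uint64_t mock_ap_exit_info_1(uint32_t apic_id, uint8_t vmpl,
				    uint64_t event)
{
	return ((uint64_t)apic_id << 32) | ((uint64_t)vmpl << 16) | event;
}

/* Mirrors the lower_32_bits() test applied to the returned exit_info_1. */
static bool mock_ap_request_failed(uint64_t returned_exit_info_1)
{
	return (uint32_t)returned_exit_info_1 != 0;
}

static void mock_self_check(void)
{
	/* Destroy request for APIC ID 3 at VMPL 0. */
	assert(mock_ap_exit_info_1(3, 0, MOCK_AP_DESTROY) ==
	       UINT64_C(0x0000000300000002));
	/* Create request for APIC ID 3 at VMPL 1. */
	assert(mock_ap_exit_info_1(3, 1, MOCK_AP_CREATE) ==
	       UINT64_C(0x0000000300010001));
}
```

Because the event is the only field that differs between creation and destruction, one helper parameterised on `event` can serve both callers, which is the refactoring the patch performs.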


[PATCH] x86/sev: Fix making shared pages private during kdump

by Ashish Kalra

From: Ashish Kalra <ashish.kalra(a)amd.com>

When the shared pages are being made private during kdump preparation
there are additional checks to handle shared GHCB pages.

These additional checks include handling the case of GHCB page being
contained within a 2MB page.

There is a bug in this additional check for GHCB page contained
within a 2MB page which causes any shared page just below the
per-cpu GHCB getting skipped from being transitioned back to private
before kdump preparation which subsequently causes a 0x404 #VC
exception when this shared page is accessed later while dumping guest
memory during vmcore generation via kdump.

Correct the detection and handling of GHCB pages contained within
a 2MB page.

Cc: stable(a)vger.kernel.org
Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec")
Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com>
---
 arch/x86/coco/sev/core.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 2c27d4b3985c..16d874f4dcd3 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -926,7 +926,13 @@ static void unshare_all_memory(void)
 				data = per_cpu(runtime_data, cpu);
 				ghcb = (unsigned long)&data->ghcb_page;
 
-				if (addr <= ghcb && ghcb <= addr + size) {
+				/* Handle the case of 2MB page containing the GHCB page */
+				if (level == PG_LEVEL_4K && addr == ghcb) {
+					skipped_addr = true;
+					break;
+				}
+				if (level > PG_LEVEL_4K && addr <= ghcb &&
+				    ghcb < addr + size) {
 					skipped_addr = true;
 					break;
 				}
@@ -1106,6 +1112,9 @@ void snp_kexec_finish(void)
 		ghcb = &data->ghcb_page;
 		pte = lookup_address((unsigned long)ghcb, &level);
 		size = page_level_size(level);
+		/* Handle the case of 2MB page containing the GHCB page */
+		if (level > PG_LEVEL_4K)
+			ghcb = (struct ghcb *)((unsigned long)ghcb & PMD_MASK);
 		set_pte_enc(pte, level, (void *)ghcb);
 		snp_set_memory_private((unsigned long)ghcb, (size / PAGE_SIZE));
 	}
-- 
2.34.1
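
Editor's sketch: the off-by-one in the old range test is easy to see with concrete addresses. The functions and addresses below are hypothetical user-space stand-ins; only the two conditions and the PMD alignment are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_SIZE_4K	UINT64_C(0x1000)
#define MOCK_SIZE_2M	UINT64_C(0x200000)

/* Old test: the inclusive upper bound also matches the page just below. */
static bool mock_old_skips(uint64_t addr, uint64_t size, uint64_t ghcb)
{
	return addr <= ghcb && ghcb <= addr + size;
}

/* New test: exact match for 4K mappings, half-open range for larger ones. */
static bool mock_new_skips(bool is_4k, uint64_t addr, uint64_t size,
			   uint64_t ghcb)
{
	if (is_4k)
		return addr == ghcb;
	return addr <= ghcb && ghcb < addr + size;
}

/* Second hunk: round the GHCB address down to its 2MB mapping. */
static uint64_t mock_pmd_align(uint64_t ghcb)
{
	return ghcb & ~(MOCK_SIZE_2M - 1);
}

static void mock_self_check(void)
{
	const uint64_t ghcb = UINT64_C(0x201000);	/* hypothetical address */

	/* The 4K page directly below the GHCB was wrongly skipped before. */
	assert(mock_old_skips(ghcb - MOCK_SIZE_4K, MOCK_SIZE_4K, ghcb));
	assert(!mock_new_skips(true, ghcb - MOCK_SIZE_4K, MOCK_SIZE_4K, ghcb));
	/* The GHCB page itself is still skipped. */
	assert(mock_new_skips(true, ghcb, MOCK_SIZE_4K, ghcb));
}
```

The same boundary problem exists for 2MB mappings: a GHCB sitting exactly at `addr + size` belongs to the next mapping, which the half-open comparison now reflects.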


[PATCH net] gve: Add adminq lock for creating and destroying multiple queues

by Harshitha Ramamurthy

From: Ziwei Xiao <ziweixiao(a)google.com>

The original adminq lock is only protecting the gve_adminq_execute_cmd
which is aimed for sending out single adminq command. However, there are
other callers of gve_adminq_kick_and_wait and gve_adminq_issue_cmd that
need to take the mutex lock for mutual exclusion between them, which are
creating and destroying rx/tx queues. Add the adminq lock for those
unprotected callers.

Also this patch cleans up the error handling code of
gve_adminq_destroy_tx_queue.

Cc: stable(a)vger.kernel.org
Fixes: 1108566ca509 ("gve: Add adminq mutex lock")
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: Ziwei Xiao <ziweixiao(a)google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy(a)google.com>
---
 drivers/net/ethernet/google/gve/gve_adminq.c | 54 ++++++++++++++------
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 3e8fc33cc11f..659460812276 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -442,6 +442,8 @@ static int gve_adminq_kick_and_wait(struct gve_priv *priv)
 	int tail, head;
 	int i;

+	lockdep_assert_held(&priv->adminq_lock);
+
 	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
 	head = priv->adminq_prod_cnt;

@@ -467,9 +469,6 @@ static int gve_adminq_kick_and_wait(struct gve_priv *priv)
 	return 0;
 }

-/* This function is not threadsafe - the caller is responsible for any
- * necessary locks.
- */
 static int gve_adminq_issue_cmd(struct gve_priv *priv,
 				union gve_adminq_command *cmd_orig)
 {
@@ -477,6 +476,8 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
 	u32 opcode;
 	u32 tail;

+	lockdep_assert_held(&priv->adminq_lock);
+
 	tail = ioread32be(&priv->reg_bar0->adminq_event_counter);

 	// Check if next command will overflow the buffer.
@@ -709,13 +710,19 @@ int gve_adminq_create_tx_queues(struct gve_priv *priv, u32 start_id, u32 num_que
 	int err;
 	int i;

+	mutex_lock(&priv->adminq_lock);
+
 	for (i = start_id; i < start_id + num_queues; i++) {
 		err = gve_adminq_create_tx_queue(priv, i);
 		if (err)
-			return err;
+			goto out;
 	}

-	return gve_adminq_kick_and_wait(priv);
+	err = gve_adminq_kick_and_wait(priv);
+
+out:
+	mutex_unlock(&priv->adminq_lock);
+	return err;
 }

 static void gve_adminq_get_create_rx_queue_cmd(struct gve_priv *priv,
@@ -788,19 +795,24 @@ int gve_adminq_create_rx_queues(struct gve_priv *priv, u32 num_queues)
 	int err;
 	int i;

+	mutex_lock(&priv->adminq_lock);
+
 	for (i = 0; i < num_queues; i++) {
 		err = gve_adminq_create_rx_queue(priv, i);
 		if (err)
-			return err;
+			goto out;
 	}

-	return gve_adminq_kick_and_wait(priv);
+	err = gve_adminq_kick_and_wait(priv);
+
+out:
+	mutex_unlock(&priv->adminq_lock);
+	return err;
 }

 static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
 {
 	union gve_adminq_command cmd;
-	int err;

 	memset(&cmd, 0, sizeof(cmd));
 	cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
@@ -808,11 +820,7 @@ static int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
 		.queue_id = cpu_to_be32(queue_index),
 	};

-	err = gve_adminq_issue_cmd(priv, &cmd);
-	if (err)
-		return err;
-
-	return 0;
+	return gve_adminq_issue_cmd(priv, &cmd);
 }

 int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 start_id, u32 num_queues)
@@ -820,13 +828,19 @@ int gve_adminq_destroy_tx_queues(struct gve_priv *priv, u32 start_id, u32 num_qu
 	int err;
 	int i;

+	mutex_lock(&priv->adminq_lock);
+
 	for (i = start_id; i < start_id + num_queues; i++) {
 		err = gve_adminq_destroy_tx_queue(priv, i);
 		if (err)
-			return err;
+			goto out;
 	}

-	return gve_adminq_kick_and_wait(priv);
+	err = gve_adminq_kick_and_wait(priv);
+
+out:
+	mutex_unlock(&priv->adminq_lock);
+	return err;
 }

 static void gve_adminq_make_destroy_rx_queue_cmd(union gve_adminq_command *cmd,
@@ -861,13 +875,19 @@ int gve_adminq_destroy_rx_queues(struct gve_priv *priv, u32 num_queues)
 	int err;
 	int i;

+	mutex_lock(&priv->adminq_lock);
+
 	for (i = 0; i < num_queues; i++) {
 		err = gve_adminq_destroy_rx_queue(priv, i);
 		if (err)
-			return err;
+			goto out;
 	}

-	return gve_adminq_kick_and_wait(priv);
+	err = gve_adminq_kick_and_wait(priv);
+
+out:
+	mutex_unlock(&priv->adminq_lock);
+	return err;
 }

 static void gve_set_default_desc_cnt(struct gve_priv *priv,
-- 
2.49.0.777.g153de2bbd5-goog
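
For readers who want to try the locking pattern outside the kernel, here is a rough userspace sketch of what the patch does: queue a batch of commands and kick the device under a single mutex, with one unlock path reached through `goto out`. It is not the gve code. Every `toy_` name is hypothetical, a pthread mutex stands in for the kernel mutex, and the `lock_held` flag is only a crude stand-in for `lockdep_assert_held()` (it does not check which thread owns the lock).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the driver's admin queue state. */
struct toy_adminq {
	pthread_mutex_t lock;
	bool lock_held;	/* crude lockdep substitute, set while locked */
	int queued;	/* commands issued but not yet kicked */
	int completed;	/* commands the pretend device has processed */
	int capacity;	/* issuing beyond this many per batch fails */
};

static void toy_adminq_init(struct toy_adminq *q, int capacity)
{
	pthread_mutex_init(&q->lock, NULL);
	q->lock_held = false;
	q->queued = 0;
	q->completed = 0;
	q->capacity = capacity;
}

static void toy_adminq_lock(struct toy_adminq *q)
{
	pthread_mutex_lock(&q->lock);
	q->lock_held = true;
}

static void toy_adminq_unlock(struct toy_adminq *q)
{
	q->lock_held = false;
	pthread_mutex_unlock(&q->lock);
}

/* Caller must hold q->lock, mirroring the assertion the patch adds. */
static int toy_adminq_issue_cmd(struct toy_adminq *q)
{
	assert(q->lock_held);
	if (q->queued >= q->capacity)
		return -1;
	q->queued++;
	return 0;
}

/* Caller must hold q->lock. Pretends the device drains the queue. */
static int toy_adminq_kick_and_wait(struct toy_adminq *q)
{
	assert(q->lock_held);
	q->completed += q->queued;
	q->queued = 0;
	return 0;
}

/* Issue a batch and kick once, all under one lock acquisition. */
int toy_adminq_create_queues(struct toy_adminq *q, int num_queues)
{
	int err = 0;
	int i;

	toy_adminq_lock(q);

	for (i = 0; i < num_queues; i++) {
		err = toy_adminq_issue_cmd(q);
		if (err)
			goto out;
	}

	err = toy_adminq_kick_and_wait(q);

out:
	toy_adminq_unlock(q);
	return err;
}
```

Holding the lock across the whole loop and the kick is the point of the sketch: a second caller cannot interleave its own commands between the issue and kick steps of the first.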

8 months, 3 weeks

[PATCH] x86/insn: Fix CTEST instruction decoding

by Kirill A. Shutemov

insn_decoder_test found a problem with decoding APX CTEST instruction:

	Found an x86 instruction decoder bug, please report this.
	ffffffff810021df 62 54 94 05 85 ff ctestneq
	objdump says 6 bytes, but insn_get_length() says 5

It happens because x86-opcode-map.txt doesn't specify arguments for the
instruction and the decoder doesn't expect to see ModRM byte.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: 690ca3a3067f ("x86/insn: Add support for APX EVEX instructions to the opcode map")
Cc: stable(a)vger.kernel.org # v6.10+
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
---
 arch/x86/lib/x86-opcode-map.txt       | 4 ++--
 tools/arch/x86/lib/x86-opcode-map.txt | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index caedb3ef6688..f5dd84eb55dc 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -996,8 +996,8 @@ AVXcode: 4
 83: Grp1 Ev,Ib (1A),(es)
 # CTESTSCC instructions are: CTESTB, CTESTBE, CTESTF, CTESTL, CTESTLE, CTESTNB, CTESTNBE, CTESTNL,
 # CTESTNLE, CTESTNO, CTESTNS, CTESTNZ, CTESTO, CTESTS, CTESTT, CTESTZ
-84: CTESTSCC (ev)
-85: CTESTSCC (es) | CTESTSCC (66),(es)
+84: CTESTSCC Eb,Gb (ev)
+85: CTESTSCC Ev,Gv (es) | CTESTSCC Ev,Gv (66),(es)
 88: POPCNT Gv,Ev (es) | POPCNT Gv,Ev (66),(es)
 8f: POP2 Bq,Rq (000),(11B),(ev)
 a5: SHLD Ev,Gv,CL (es) | SHLD Ev,Gv,CL (66),(es)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index caedb3ef6688..f5dd84eb55dc 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -996,8 +996,8 @@ AVXcode: 4
 83: Grp1 Ev,Ib (1A),(es)
 # CTESTSCC instructions are: CTESTB, CTESTBE, CTESTF, CTESTL, CTESTLE, CTESTNB, CTESTNBE, CTESTNL,
 # CTESTNLE, CTESTNO, CTESTNS, CTESTNZ, CTESTO, CTESTS, CTESTT, CTESTZ
-84: CTESTSCC (ev)
-85: CTESTSCC (es) | CTESTSCC (66),(es)
+84: CTESTSCC Eb,Gb (ev)
+85: CTESTSCC Ev,Gv (es) | CTESTSCC Ev,Gv (66),(es)
 88: POPCNT Gv,Ev (es) | POPCNT Gv,Ev (66),(es)
 8f: POP2 Bq,Rq (000),(11B),(ev)
 a5: SHLD Ev,Gv,CL (es) | SHLD Ev,Gv,CL (66),(es)
-- 
2.47.2
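
The 5-versus-6 byte mismatch in the report can be reproduced with a toy length calculation. The sketch below is not the kernel decoder: `toy_insn_length()` and `struct toy_opcode_attr` are made-up names, and the function handles only a 4-byte EVEX prefix, a one-byte opcode, and an optional register-direct ModRM byte. It assumes, as the commit message describes, that the decoder consumes a ModRM byte only when the opcode map entry lists operands that require one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EVEX_PREFIX_LEN 4	/* 0x62 plus three payload bytes */

/* Hypothetical per-opcode attribute, as derived from an opcode map entry. */
struct toy_opcode_attr {
	bool has_modrm;	/* true when the entry lists operands such as Ev,Gv */
};

/*
 * Return the decoded length in bytes, or 0 if the toy decoder cannot
 * handle the input. Only register-direct ModRM (mod == 3) is supported,
 * so no SIB or displacement bytes are ever consumed.
 */
static size_t toy_insn_length(const uint8_t *bytes, size_t len,
			      struct toy_opcode_attr attr)
{
	size_t pos;

	assert(bytes != NULL);

	if (len < TOY_EVEX_PREFIX_LEN + 1 || bytes[0] != 0x62)
		return 0;

	pos = TOY_EVEX_PREFIX_LEN;	/* skip the EVEX prefix */
	pos++;				/* consume the opcode byte */

	if (attr.has_modrm) {
		if (pos >= len)
			return 0;
		if ((bytes[pos] >> 6) != 3)
			return 0;	/* memory forms are out of scope here */
		pos++;			/* consume the ModRM byte */
	}

	return pos;
}
```

Fed the bytes from the report (62 54 94 05 85 ff), the sketch returns 5 when `has_modrm` is false and 6 when it is true, which matches the disagreement between insn_get_length() and objdump.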

8 months, 3 weeks

[PATCH v2] mm, slab: clean up slab->obj_exts always

by Zhenhua Huang

When memory allocation profiling is disabled at runtime or due to an
error, shutdown_mem_profiling() is called: slab->obj_exts which
previously allocated remains.
It won't be cleared by unaccount_slab() because of
mem_alloc_profiling_enabled() not true. It's incorrect, slab->obj_exts
should always be cleaned up in unaccount_slab() to avoid following error:

[...]BUG: Bad page state in process...
..
[...]page dumped because: page still charged to cgroup

Cc: stable(a)vger.kernel.org
Fixes: 21c690a349ba ("mm: introduce slabobj_ext to support slab object extensions")
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Harry Yoo <harry.yoo(a)oracle.com>
Tested-by: Harry Yoo <harry.yoo(a)oracle.com>
---
 mm/slub.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 566eb8b8282d..a98ce1426076 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2028,8 +2028,8 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
 	return 0;
 }

-/* Should be called only if mem_alloc_profiling_enabled() */
-static noinline void free_slab_obj_exts(struct slab *slab)
+/* Free only if slab_obj_exts(slab) */
+static inline void free_slab_obj_exts(struct slab *slab)
 {
 	struct slabobj_ext *obj_exts;

@@ -2601,8 +2601,12 @@ static __always_inline void account_slab(struct slab *slab, int order,
 static __always_inline void unaccount_slab(struct slab *slab, int order,
 					   struct kmem_cache *s)
 {
-	if (memcg_kmem_online() || need_slab_obj_ext())
-		free_slab_obj_exts(slab);
+	/*
+	 * The slab object extensions should now be freed regardless of
+	 * whether mem_alloc_profiling_enabled() or not because profiling
+	 * might have been disabled after slab->obj_exts got allocated.
+	 */
+	free_slab_obj_exts(slab);

 	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
 			    -(PAGE_SIZE << order));
-- 
2.34.1
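
The failure mode is easy to reproduce in miniature: state allocated while a feature was on is never released if the release path is gated on the feature still being on. The sketch below is a userspace illustration only. All `toy_` names are hypothetical, `calloc()`/`free()` stand in for the kernel allocator, and a plain global flag replaces `mem_alloc_profiling_enabled()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical runtime switch; it may flip after allocations were made. */
static bool toy_profiling_enabled = true;

/* Hypothetical stand-in for a slab carrying an extension vector. */
struct toy_slab {
	void *obj_exts;
};

static int toy_alloc_obj_exts(struct toy_slab *slab, size_t objects)
{
	assert(slab != NULL);
	slab->obj_exts = calloc(objects, sizeof(void *));
	return slab->obj_exts ? 0 : -1;
}

/* Free only if an extension vector is actually attached. */
static void toy_free_obj_exts(struct toy_slab *slab)
{
	assert(slab != NULL);
	if (!slab->obj_exts)
		return;
	free(slab->obj_exts);
	slab->obj_exts = NULL;
}

/* Gated teardown: skips the free when the switch is off at teardown time. */
static void toy_unaccount_slab_gated(struct toy_slab *slab)
{
	if (toy_profiling_enabled)
		toy_free_obj_exts(slab);
}

/* Unconditional teardown: the shape the patch moves to. */
static void toy_unaccount_slab(struct toy_slab *slab)
{
	toy_free_obj_exts(slab);
}
```

Calling the free helper unconditionally is safe because the helper itself checks whether there is anything to free, which is what the new comment on free_slab_obj_exts() in the patch states.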

8 months, 3 weeks

[PATCH v2 2/2] usb: typec: ucsi: displayport: Fix NULL pointer access

by Andrei Kuchynski

This patch ensures that the UCSI driver waits for all pending tasks in the
ucsi_displayport_work workqueue to finish executing before proceeding with
the partner removal.

Cc: stable(a)vger.kernel.org
Fixes: af8622f6a585 ("usb: typec: ucsi: Support for DisplayPort alt mode")
Signed-off-by: Andrei Kuchynski <akuchynski(a)chromium.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
---
 drivers/usb/typec/ucsi/displayport.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/typec/ucsi/displayport.c b/drivers/usb/typec/ucsi/displayport.c
index acd053d4e38c..8aae80b457d7 100644
--- a/drivers/usb/typec/ucsi/displayport.c
+++ b/drivers/usb/typec/ucsi/displayport.c
@@ -299,6 +299,8 @@ void ucsi_displayport_remove_partner(struct typec_altmode *alt)
 	if (!dp)
 		return;

+	cancel_work_sync(&dp->work);
+
 	dp->data.conf = 0;
 	dp->data.status = 0;
 	dp->initialized = false;
-- 
2.49.0.805.g082f7c87e0-goog
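
The ordering the patch enforces (wait for outstanding work, then reset the state that work depends on) can be sketched in userspace with a thread standing in for the work item. This is an analogy only: the `toy_` names are hypothetical, and `pthread_join()` merely waits for a running worker, whereas `cancel_work_sync()` in the kernel also cancels work that has not started yet.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-partner DisplayPort state. */
struct toy_dp {
	pthread_t worker;
	bool work_started;	/* a worker exists and has not been joined */
	bool initialized;
	int conf;
	int work_runs;		/* times the work body saw valid state */
};

/* Work body: relies on state that the teardown path resets. */
static void *toy_dp_work(void *arg)
{
	struct toy_dp *dp = arg;

	if (dp->initialized)
		dp->work_runs++;
	return NULL;
}

static int toy_dp_queue_work(struct toy_dp *dp)
{
	assert(!dp->work_started);
	if (pthread_create(&dp->worker, NULL, toy_dp_work, dp))
		return -1;
	dp->work_started = true;
	return 0;
}

/* Teardown: wait for the worker first, only then clear shared state. */
static void toy_dp_remove_partner(struct toy_dp *dp)
{
	if (!dp)
		return;

	if (dp->work_started) {
		pthread_join(dp->worker, NULL);
		dp->work_started = false;
	}

	dp->conf = 0;
	dp->initialized = false;
}
```

If the join were moved after the two resets, the worker could observe half-cleared state, which is the kind of window the added cancel_work_sync() call is meant to close.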

8 months, 3 weeks
