On 8/6/25 09:09, Jason Gunthorpe wrote:
You can't do this approach without also pushing the pages to be freed onto a list and deferring the free until the work runs. This is broadly what the normal mm user flow is doing.
FWIW, I think the simplest way to do this is to plop an unconditional schedule_work() in pte_free_kernel(). The work function will invalidate the IOTLBs and then free the page.
Keep the schedule_work() unconditional to keep it simple. The schedule_work() is way cheaper than all the system-wide TLB invalidation IPIs that have to be sent as well. No need to add complexity to optimize out something that's already in the noise.
That works too, but now you have to allocate memory or you are dead. Is that OK these days, and is it safe in this code, which seems fairly closely tied to memory management?
The MM side avoided this by putting the list and the rcu_head in the struct page.
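(These days that spot is described by struct ptdesc, which overlays struct page; roughly, paraphrasing the relevant union from include/linux/mm_types.h:)

struct ptdesc {
        unsigned long __page_flags;

        union {
                struct rcu_head pt_rcu_head;
                struct list_head pt_list;
                /* ... */
        };
        /* ... */
};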
I don't think you need to allocate memory. A little static structure that uses the page->list and has a lock should do. Logically something like this:
struct kernel_pgtable_work {
        struct list_head list;
        spinlock_t lock;
        struct work_struct work;
} kernel_pte_work;

void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
        struct page *page = ptdesc_magic();

        guard(spinlock)(&kernel_pte_work.lock);
        list_add(&page->list, &kernel_pte_work.list);
        schedule_work(&kernel_pte_work.work);
}

static void work_func(struct work_struct *work)
{
        LIST_HEAD(pages);
        struct page *page, *next;

        iommu_sva_invalidate_kva();

        /* Detach the whole batch so the frees happen outside the lock. */
        scoped_guard(spinlock, &kernel_pte_work.lock)
                list_splice_init(&kernel_pte_work.list, &pages);

        list_for_each_entry_safe(page, next, &pages, list)
                free_whatever(page);
}
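Splicing everything onto a local list under the lock keeps the actual frees out from under the spinlock and leaves kernel_pte_work.list empty and ready for the next batch.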
The only wrinkle is that pte_free_kernel() itself still has a pte and 'ptdesc', not a 'struct page'. But there is ptdesc->pt_list, which should be unused at this point, especially for non-pgd pages on x86.
So, either go over to the 'struct page' earlier (maybe by open-coding pagetable_dtor_free()?), or just use the ptdesc.
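If it stays on the ptdesc, a minimal sketch (assuming virt_to_ptdesc() is usable here and that the work function then does pagetable_free() on each entry after the IOTLB invalidation) would be something like:

void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
        struct ptdesc *ptdesc = virt_to_ptdesc(pte);

        /* The dtor half of pagetable_dtor_free(); the free itself is deferred. */
        pagetable_dtor(ptdesc);

        guard(spinlock)(&kernel_pte_work.lock);
        list_add(&ptdesc->pt_list, &kernel_pte_work.list);
        schedule_work(&kernel_pte_work.work);
}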