On Fri, 2025-01-03 at 23:11 +0100, Jann Horn wrote:
On Fri, Jan 3, 2025 at 10:55 PM Rik van Riel riel@surriel.com wrote:
On Fri, 2025-01-03 at 19:39 +0100, Jann Horn wrote:
02fc2aa06e9e0ecdba3fe948cafe5892b72e86c0..3da645139748538daac7016 6618d 8ad95116eb74 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -242,7 +242,7 @@ void flush_tlb_multi(const struct cpumask *cpumask, flush_tlb_mm_range((vma)->vm_mm, start, end, \ ((vma)->vm_flags & VM_HUGETLB) \ ? huge_page_shift(hstate_vma(vma)) \ - : PAGE_SHIFT, false) + : PAGE_SHIFT, true)
The code looks good, but should this macro get a comment indicating that code that only frees pages, but not page tables, should be calling flush_tlb() instead?
Documentation/core-api/cachetlb.rst seems to be the common place that's supposed to document the rules - the macro I'm touching is just the x86 implementation. (The arm64 implementation also has some fairly extensive comments that say flush_tlb_range() "also invalidates any walk-cache entries associated with translations for the specified address range" while flush_tlb_page() "only invalidates a single, last-level page-table entry and therefore does not affect any walk-caches".) I wouldn't want to add yet more documentation for this API inside the X86 code. I guess it would make sense to add pointers from the x86 code to the documentation (and copy the details about last-level TLBs from the arm64 code into the docs).
I don't see a function flush_tlb() outside of some (non-x86) arch code.
I see zap_pte_range() calling tlb_flush_mmu(), which calls tlb_flush_mmu_tlbonly() in include/asm-generic/tlb.h, which in turn calls tlb_flush().
The asm-generic version of tlb_flush() goes through flush_tlb_mm(), which on x86 would call flush_tlb_mm_range with flush_tables = true.
Luckily x86 seems to have its own implementation of tlb_flush(), which avoids that issue.
I don't know if it makes sense to tell developers to not use flush_tlb_range() for freeing pages. If the performance of flush_tlb_range() actually is an issue, I guess one fix would be to refactor this and add a parameter or something?
I don't know whether this is a real issue on architectures other than x86.
For now it looks like the code does the right thing when only pages are being freed, so we may not need that parameter.