On the following path, flush_tlb_range() can be used for zapping normal PMD entries (PMD entries that point to page tables) together with the PTE entries in the pointed-to page table:
  collapse_pte_mapped_thp
    pmdp_collapse_flush
      flush_tlb_range
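For context, the generic pmdp_collapse_flush() in mm/pgtable-generic.c
looks roughly like the following (simplified sketch; the exact code
differs between kernel versions):

pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
			  pmd_t *pmdp)
{
	pmd_t pmd;

	/* Clear the PMD entry that points to the PTE page table. */
	pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
	/*
	 * Flush the covered range; since a non-leaf (page table) entry
	 * was zapped, this flush must also invalidate paging-structure
	 * caches, not just last-level translations.
	 */
	flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
	return pmd;
}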
The arm64 version of flush_tlb_range() has a comment describing that it can be used for page table removal, and does not use any last-level invalidation optimizations. Fix the X86 version by making it behave the same way.
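(For comparison, arm64's __flush_tlb_range() selects the TLBI operation
based on a last_level flag; simplified sketch, the exact macros vary
across kernel versions:

	/* inside the per-page loop of arm64's __flush_tlb_range(): */
	if (last_level)
		__tlbi(vale1is, addr);	/* leaf entries only; walk caches survive */
	else
		__tlbi(vae1is, addr);	/* also invalidates walk-cache entries */

flush_tlb_range() passes last_level == false, so it is safe to use when
page tables are removed.)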
Currently, X86 only uses this information for the following two purposes, which I think means the issue doesn't have much impact:
- In native_flush_tlb_multi(), for checking whether lazy TLB CPUs need
  to be IPI'd to avoid issues with speculative page table walks (see
  the sketch below).
- In Hyper-V TLB paravirtualization, again for lazy TLB stuff.
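For illustration, the relevant logic in native_flush_tlb_multi() looks
roughly like this (simplified sketch; the helper names vary across
kernel versions):

static void native_flush_tlb_multi(const struct cpumask *cpumask,
				   const struct flush_tlb_info *info)
{
	/*
	 * If page tables were freed, CPUs in lazy TLB mode must be IPI'd
	 * too: they can still speculatively walk the page tables that are
	 * about to be (or have been) freed.
	 */
	if (info->freed_tables)
		on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
	else
		on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func,
				      (void *)info, 1, cpumask);
}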
The patch "x86/mm: only invalidate final translations with INVLPGB" which is currently under review (see https://lore.kernel.org/all/20241230175550.4046587-13-riel@surriel.com/) would probably be making the impact of this a lot worse.
Cc: stable@vger.kernel.org
Fixes: 016c4d92cd16 ("x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range")
Signed-off-by: Jann Horn <jannh@google.com>
---
 arch/x86/include/asm/tlbflush.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 02fc2aa06e9e0ecdba3fe948cafe5892b72e86c0..3da645139748538daac70166618d8ad95116eb74 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -242,7 +242,7 @@ void flush_tlb_multi(const struct cpumask *cpumask,
 	flush_tlb_mm_range((vma)->vm_mm, start, end,		\
 			((vma)->vm_flags & VM_HUGETLB)		\
 				? huge_page_shift(hstate_vma(vma))	\
-				: PAGE_SHIFT, false)
+				: PAGE_SHIFT, true)
 
 extern void flush_tlb_all(void);
 extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
---
base-commit: aa135d1d0902c49ed45bec98c61c1b4022652b7e
change-id: 20250103-x86-collapse-flush-fix-fa87ac4d5834
On Fri, 2025-01-03 at 19:39 +0100, Jann Horn wrote:
> index 02fc2aa06e9e0ecdba3fe948cafe5892b72e86c0..3da645139748538daac70166618d8ad95116eb74 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -242,7 +242,7 @@ void flush_tlb_multi(const struct cpumask *cpumask,
> 	flush_tlb_mm_range((vma)->vm_mm, start, end,		\
> 			((vma)->vm_flags & VM_HUGETLB)		\
> 				? huge_page_shift(hstate_vma(vma))	\
> -				: PAGE_SHIFT, false)
> +				: PAGE_SHIFT, true)

The code looks good, but should this macro get a comment indicating
that code that only frees pages, but not page tables, should be
calling flush_tlb() instead?
On Fri, Jan 3, 2025 at 10:55 PM Rik van Riel <riel@surriel.com> wrote:
> On Fri, 2025-01-03 at 19:39 +0100, Jann Horn wrote:
> > index 02fc2aa06e9e0ecdba3fe948cafe5892b72e86c0..3da645139748538daac70166618d8ad95116eb74 100644
> > --- a/arch/x86/include/asm/tlbflush.h
> > +++ b/arch/x86/include/asm/tlbflush.h
> > @@ -242,7 +242,7 @@ void flush_tlb_multi(const struct cpumask *cpumask,
> > 	flush_tlb_mm_range((vma)->vm_mm, start, end,		\
> > 			((vma)->vm_flags & VM_HUGETLB)		\
> > 				? huge_page_shift(hstate_vma(vma))	\
> > -				: PAGE_SHIFT, false)
> > +				: PAGE_SHIFT, true)
>
> The code looks good, but should this macro get a comment indicating
> that code that only frees pages, but not page tables, should be
> calling flush_tlb() instead?
Documentation/core-api/cachetlb.rst seems to be the common place that's
supposed to document the rules - the macro I'm touching is just the x86
implementation. (The arm64 implementation also has some fairly extensive
comments that say flush_tlb_range() "also invalidates any walk-cache
entries associated with translations for the specified address range"
while flush_tlb_page() "only invalidates a single, last-level page-table
entry and therefore does not affect any walk-caches".)

I wouldn't want to add yet more documentation for this API inside the
x86 code. I guess it would make sense to add pointers from the x86 code
to the documentation (and copy the details about last-level TLBs from
the arm64 code into the docs).
I don't see a function flush_tlb() outside of some (non-x86) arch code.
I don't know if it makes sense to tell developers to not use flush_tlb_range() for freeing pages. If the performance of flush_tlb_range() actually is an issue, I guess one fix would be to refactor this and add a parameter or something?
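Something like this, maybe (purely hypothetical sketch; flush_tlb_range_ex()
is a made-up name, not an existing kernel interface):

/* Hypothetical: let callers state whether page tables were freed. */
#define flush_tlb_range_ex(vma, start, end, freed_tables)		\
	flush_tlb_mm_range((vma)->vm_mm, start, end,			\
			((vma)->vm_flags & VM_HUGETLB)			\
				? huge_page_shift(hstate_vma(vma))	\
				: PAGE_SHIFT, freed_tables)

/* Plain flush_tlb_range() would then stay conservative: */
#define flush_tlb_range(vma, start, end)				\
	flush_tlb_range_ex(vma, start, end, true)

Callers that are known to only zap leaf PTEs could then pass
freed_tables=false explicitly.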
On Fri, 2025-01-03 at 23:11 +0100, Jann Horn wrote:
> On Fri, Jan 3, 2025 at 10:55 PM Rik van Riel <riel@surriel.com> wrote:
> > On Fri, 2025-01-03 at 19:39 +0100, Jann Horn wrote:
> > > index 02fc2aa06e9e0ecdba3fe948cafe5892b72e86c0..3da645139748538daac70166618d8ad95116eb74 100644
> > > --- a/arch/x86/include/asm/tlbflush.h
> > > +++ b/arch/x86/include/asm/tlbflush.h
> > > @@ -242,7 +242,7 @@ void flush_tlb_multi(const struct cpumask *cpumask,
> > > 	flush_tlb_mm_range((vma)->vm_mm, start, end,		\
> > > 			((vma)->vm_flags & VM_HUGETLB)		\
> > > 				? huge_page_shift(hstate_vma(vma))	\
> > > -				: PAGE_SHIFT, false)
> > > +				: PAGE_SHIFT, true)
> >
> > The code looks good, but should this macro get a comment indicating
> > that code that only frees pages, but not page tables, should be
> > calling flush_tlb() instead?
>
> Documentation/core-api/cachetlb.rst seems to be the common place
> that's supposed to document the rules - the macro I'm touching is just
> the x86 implementation. (The arm64 implementation also has some fairly
> extensive comments that say flush_tlb_range() "also invalidates any
> walk-cache entries associated with translations for the specified
> address range" while flush_tlb_page() "only invalidates a single,
> last-level page-table entry and therefore does not affect any
> walk-caches".) I wouldn't want to add yet more documentation for this
> API inside the x86 code. I guess it would make sense to add pointers
> from the x86 code to the documentation (and copy the details about
> last-level TLBs from the arm64 code into the docs).
>
> I don't see a function flush_tlb() outside of some (non-x86) arch code.
I see zap_pte_range() calling tlb_flush_mmu(), which calls tlb_flush_mmu_tlbonly() in include/asm-generic/tlb.h, which in turn calls tlb_flush().
The asm-generic version of tlb_flush() goes through flush_tlb_mm(),
which on x86 would call flush_tlb_mm_range() with freed_tables = true.
Luckily x86 seems to have its own implementation of tlb_flush(), which avoids that issue.
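For reference, the x86 tlb_flush() in arch/x86/include/asm/tlb.h is
roughly the following (simplified sketch; details differ between kernel
versions):

static inline void tlb_flush(struct mmu_gather *tlb)
{
	unsigned long start = 0UL, end = TLB_FLUSH_ALL;
	unsigned int stride_shift = tlb_get_unmap_shift(tlb);

	if (!tlb->fullmm && !tlb->need_flush_all) {
		start = tlb->start;
		end = tlb->end;
	}

	/* Pass the tracked state instead of unconditionally saying true. */
	flush_tlb_mm_range(tlb->mm, start, end, stride_shift,
			   tlb->freed_tables);
}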
> I don't know if it makes sense to tell developers to not use
> flush_tlb_range() for freeing pages. If the performance of
> flush_tlb_range() actually is an issue, I guess one fix would be to
> refactor this and add a parameter or something?
I don't know whether this is a real issue on architectures other than x86.
For now it looks like the code does the right thing when only pages are being freed, so we may not need that parameter.