Re: [RFC PATCH v3 13/15] context_tracking,x86: Add infrastructure to defer kernel TLBI

9 Dec 2024


On Mon, 09 Dec 2024 13:04:43 +0100
Valentin Schneider vschneid@redhat.com wrote:
...
On 05/12/24 18:31, Petr Tesarik wrote:
...
On Thu, 21 Nov 2024 16:30:16 +0100
Peter Zijlstra peterz@infradead.org wrote:
...
On Thu, Nov 21, 2024 at 07:07:44AM -0800, Dave Hansen wrote:
...
On 11/21/24 03:12, Peter Zijlstra wrote:
...
...
I see e.g. ds_clear_cea() clears PTEs that can have the _PAGE_GLOBAL flag,
and it correctly uses the non-deferrable flush_tlb_kernel_range().
I always forget what we use global pages for, dhansen might know, but
let me try and have a look.
I *think* we only have GLOBAL on kernel text, and that only sometimes.
I think you're remembering how _PAGE_GLOBAL gets used when KPTI is in play.
Yah, I suppose I am. That was the last time I had a good look at this
stuff :-)
...
Ignoring KPTI for a sec... We use _PAGE_GLOBAL for all kernel mappings.
Before PCIDs, global mappings let the kernel TLB entries live across CR3
writes. When PCIDs are in play, global mappings let two different ASIDs
share TLB entries.
Hurmph.. bah. That means we do need that horrible CR4 dance :/
In general, yes.
But I wonder what exactly was the original scenario encountered by
Valentin. I mean, if TLB entry invalidations were necessary to sync
changes to kernel text after flipping a static branch, then it might be
less overhead to make a list of affected pages and call INVLPG on them.
AFAIK there is currently no IPI function for doing that, but we
could add one, provided the list of invalidated global pages is
reasonably short.
Valentin, do you happen to know?
So from my experimentation (hackbench + kernel compilation on housekeeping
CPUs, dummy while(1) userspace loop on isolated CPUs), the TLB flushes only
occurred from vunmap() - mainly from all the hackbench threads coming and
going.
Static branch updates only seem to trigger the sync_core() IPI, at least on
x86.
Thank you, this is helpful.
So, these allocations span more than tlb_single_page_flush_ceiling
pages (default 33). Is THP enabled? If yes, we could possibly get below
that threshold by improving flushing of huge pages (cf. footnote [1] in
Documentation/arch/x86/tlb.rst).
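
To make the threshold argument concrete, here is a minimal userspace sketch
of the range-vs-full-flush decision as I understand it. It is not the kernel
code: wants_full_flush() is a hypothetical helper, PAGE_SHIFT is assumed to
be 12 (4 KiB base pages), and the ceiling is hard-coded to the default of 33
mentioned above, whereas the real value is tunable.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Default quoted in the thread; modelled here as a plain variable. */
static unsigned long tlb_single_page_flush_ceiling = 33;

/*
 * Hypothetical helper: returns true if flushing [start, end) would be
 * done as a full TLB flush rather than page by page, i.e. if the range
 * spans more bytes than the ceiling expressed in base pages.
 */
static inline bool wants_full_flush(unsigned long start, unsigned long end)
{
	assert(end >= start);
	return (end - start) > (tlb_single_page_flush_ceiling << PAGE_SHIFT);
}
```

Under these assumptions a 2 MiB range counts as 512 base pages and is far
above the ceiling, which is why accounting for huge pages differently could
bring such a flush back under the threshold.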
OTOH even though a series of INVLPG may reduce subsequent TLB misses,
it will not exactly improve latency, so it would go against the main
goal of this whole patch series.
Hmmm... I see, the CR4 dance is the best solution after all. :-|
Petr T
