Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

20 Feb 2025


      On 19/02/25 12:25, Dave Hansen wrote:
...
On 2/19/25 07:13, Valentin Schneider wrote:
...
...
Maybe I missed part of the discussion though. Is VMEMMAP your only
concern? I would have guessed that the more generic vmalloc()
functionality would be harder to pin down.
Urgh, that'll teach me to send emails that late - I did indeed mean the
vmalloc() range, not at all VMEMMAP. IIUC *neither* are present in the user
kPTI page table and AFAICT the page table swap is done before the actual vmap'd
stack (CONFIG_VMAP_STACK=y) gets used.
OK, so rewriting your question... ;)
...
So what if the vmalloc() range *isn't* in the CR3 tree when a CPU is
executing in userspace?
The LDT and maybe the PEBS buffers are the only implicit supervisor
accesses to vmalloc()'d memory that I can think of. But those are both
handled specially and shouldn't ever get zapped while in use. The LDT
replacement has its own IPIs separate from TLB flushing.
But I'm actually not all that worried about accesses while actually
running userspace. It's that "danger zone" in the kernel between entry
and when the TLB might have dangerous garbage in it.
So say we have kPTI, thus no vmalloc() mapped in CR3 when running
userspace, and do a full TLB flush right before switching to userspace -
could the TLB still end up with vmalloc()-range-related entries when we're
back in the kernel and going through the danger zone?
...
BTW, I hope this whole thing is turned off on 32-bit. There, we can
actually take and handle faults on the vmalloc() area. If you get one of
those faults in your "danger zone", it'll start running page fault code
which will branch out to god-knows-where and certainly isn't noinstr.
Sounds... Fun. Thanks for pointing out the landmines.

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs