On 19/02/25 12:25, Dave Hansen wrote:
> On 2/19/25 07:13, Valentin Schneider wrote:
>>> Maybe I missed part of the discussion though. Is VMEMMAP your only concern? I would have guessed that the more generic vmalloc() functionality would be harder to pin down.
>> Urgh, that'll teach me to send emails that late - I did indeed mean the vmalloc() range, not at all VMEMMAP. IIUC *neither* are present in the user kPTI page table and AFAICT the page table swap is done before the actual vmap'd stack (CONFIG_VMAP_STACK=y) gets used.
> OK, so rewriting your question... ;)
> So what if the vmalloc() range *isn't* in the CR3 tree when a CPU is executing in userspace?
> The LDT and maybe the PEBS buffers are the only implicit supervisor accesses to vmalloc()'d memory that I can think of. But those are both handled specially and shouldn't ever get zapped while in use. The LDT replacement has its own IPIs separate from TLB flushing.
> But I'm actually not all that worried about accesses while actually running userspace. It's that "danger zone" in the kernel between entry and when the TLB might have dangerous garbage in it.
So say we have kPTI, thus no vmalloc() mapped in CR3 when running userspace, and do a full TLB flush right before switching to userspace - could the TLB still end up with vmalloc()-range-related entries when we're back in the kernel and going through the danger zone?
> BTW, I hope this whole thing is turned off on 32-bit. There, we can actually take and handle faults on the vmalloc() area. If you get one of those faults in your "danger zone", it'll start running page fault code which will branch out to god-knows-where and certainly isn't noinstr.
Sounds... Fun. Thanks for pointing out the landmines.