On 20/02/25 09:38, Dave Hansen wrote:
> On 2/20/25 09:10, Valentin Schneider wrote:
>>> The LDT and maybe the PEBS buffers are the only implicit supervisor accesses to vmalloc()'d memory that I can think of. But those are both handled specially and shouldn't ever get zapped while in use. The LDT replacement has its own IPIs separate from TLB flushing.
>>> But I'm actually not all that worried about accesses while actually running userspace. It's that "danger zone" in the kernel between entry and when the TLB might have dangerous garbage in it.
>> So say we have kPTI, thus no vmalloc() mapped in CR3 when running userspace, and do a full TLB flush right before switching to userspace - could the TLB still end up with vmalloc()-range-related entries when we're back in the kernel and going through the danger zone?
> Yes, because the danger zone includes the switch back to the kernel CR3 with vmalloc() fully mapped. All bets are off about what's in the TLB the moment that CR3 write occurs.
> Actually, you could probably use that.
> If a mapping is in the PTI user page table, you can't defer the flushes for it. Basically the same rule for text poking in the danger zone.
> If there's a deferred flush pending, make sure that all of the SWITCH_TO_KERNEL_CR3's fully flush the TLB. You'd need something similar to user_pcid_flush_mask.
Right, that's what I (roughly) had in mind...
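To make the idea above a bit more concrete, here is a rough userspace model in C, not kernel code. Everything in it is hypothetical and only for illustration: struct cpu_model, the deferred_kernel_flush flag (loosely in the spirit of user_pcid_flush_mask), defer_kernel_flush() and switch_to_kernel_cr3() are made-up names, and CR3_NOFLUSH merely mirrors the x86 "don't flush on CR3 write" bit (bit 63). The model is single-threaded and ignores the ordering and atomicity a real implementation would need, and it treats "CR3 write without the no-flush bit" as standing in for whatever a full flush would actually require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS     4
/* Mirrors the x86 CR3 "no flush" bit (bit 63); used here only as a model. */
#define CR3_NOFLUSH (1ULL << 63)

/* Hypothetical per-CPU state; none of these fields exist in the kernel. */
struct cpu_model {
	bool deferred_kernel_flush;   /* a kernel-range flush was deferred */
	unsigned long full_flushes;   /* how many flushing CR3 writes we did */
	uint64_t cr3;                 /* last value "written" to CR3 */
};

static struct cpu_model cpus[NR_CPUS];

/* Remote side: instead of sending an IPI, record that a flush is owed. */
void defer_kernel_flush(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu].deferred_kernel_flush = true;
}

/*
 * Model of the entry-path CR3 switch: normally keep TLB contents
 * (no-flush bit set), but if a flush was deferred, clear the bit so
 * this CR3 write flushes, then consume the pending flag.
 */
uint64_t switch_to_kernel_cr3(int cpu, uint64_t kernel_cr3)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	struct cpu_model *c = &cpus[cpu];
	uint64_t val = kernel_cr3 | CR3_NOFLUSH;

	if (c->deferred_kernel_flush) {
		val &= ~CR3_NOFLUSH;
		c->deferred_kernel_flush = false;
		c->full_flushes++;
	}

	c->cr3 = val;
	return val;
}
```

The point of the sketch is only that the pending state is consumed at the CR3 switch itself, i.e. before anything in the danger zone can rely on stale vmalloc() translations. It says nothing about mappings that are present in the PTI user page table, for which the flush cannot be deferred at all.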
> But, honestly, I'm still not sure this is worth all the trouble. If folks want to avoid IPIs for TLB flushes, there are hardware features that *DO* that. Just get new hardware instead of adding this complicated pile of software that we have to maintain forever. In 10 years, we'll still have this software *and* 95% of our hardware has the hardware feature too.
... But yeah, it pretty much circumvents arch_context_tracking_work, or at the very least adds an early(er) flushing of the context tracking work... Urgh.
Thank you for grounding my wild ideas into reality. I'll try to think some more and see if I can find any other way out (other than "buy hardware that does what you want and ditch the one that doesn't").