On 14/01/25 19:16, Jann Horn wrote:
On Tue, Jan 14, 2025 at 6:51 PM Valentin Schneider <vschneid@redhat.com> wrote:
vunmap() calls issued from housekeeping CPUs are a relatively common source of interference for isolated NOHZ_FULL CPUs, as they are hit by the flush_tlb_kernel_range() IPIs.
Given that CPUs executing in userspace do not access data in the vmalloc range, these IPIs could be deferred until their next kernel entry.
Deferral vs early entry danger zone
This requires a guarantee that nothing in the vmalloc range can be vunmap'd and then accessed in early entry code.
In other words, it needs a guarantee that no vmalloc allocations that have been created in the vmalloc region while the CPU was idle can then be accessed during early entry, right?
I'm not sure if that would be a problem (not an mm expert, please do correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't deferred anyway.
So after vmapping something, I wouldn't expect isolated CPUs to have invalid TLB entries for the newly vmapped page.
However, upon vunmap'ing something, the TLB flush is deferred, and thus stale TLB entries can and will remain on isolated CPUs, up until they execute the deferred flush themselves (IOW for the entire duration of the "danger zone").
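To make that window concrete, here is a toy userspace model of the vunmap side. It is only a sketch, not kernel code: every name in it (toy_cpu, toy_vunmap(), toy_kernel_entry(), toy_run_deferred_work()) is made up for illustration, and a pair of booleans stands in for the real TLB state and the deferred-work machinery.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Hypothetical per-CPU state; none of this mirrors real kernel structures. */
struct toy_cpu {
	bool in_userspace;   /* isolated CPU currently running user code */
	bool flush_pending;  /* a kernel-range TLB flush was deferred */
	bool stale_tlb;      /* TLB may still translate an unmapped vmalloc address */
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

/* Stand-in for the CPU flushing its own kernel-range TLB entries. */
static void toy_local_flush(struct toy_cpu *c)
{
	c->stale_tlb = false;
	c->flush_pending = false;
}

/* Stand-in for a vunmap() issued from a housekeeping CPU. */
static void toy_vunmap(void)
{
	for (int i = 0; i < TOY_NR_CPUS; i++) {
		struct toy_cpu *c = &toy_cpus[i];

		c->stale_tlb = true;             /* the mapping is gone, the cached entry is not */
		if (c->in_userspace)
			c->flush_pending = true; /* defer: no IPI for this CPU */
		else
			toy_local_flush(c);      /* "IPI": flush right away */
	}
}

/* First instruction of kernel entry: the danger zone starts here. */
static void toy_kernel_entry(struct toy_cpu *c)
{
	c->in_userspace = false;
}

/* The point in entry code where deferred work runs: the danger zone ends here. */
static void toy_run_deferred_work(struct toy_cpu *c)
{
	if (c->flush_pending)
		toy_local_flush(c);
}

/* Returns true if the stale entry was visible inside the window and gone after it. */
static bool toy_danger_zone_demo(void)
{
	struct toy_cpu *hk = &toy_cpus[0], *iso = &toy_cpus[1];

	*hk = (struct toy_cpu){ .in_userspace = false };
	*iso = (struct toy_cpu){ .in_userspace = true };

	toy_vunmap();
	assert(!hk->stale_tlb);                       /* housekeeping CPU flushed at once */
	assert(iso->stale_tlb && iso->flush_pending); /* isolated CPU was left alone */

	toy_kernel_entry(iso);
	bool stale_in_window = iso->stale_tlb;        /* early entry code would see this */

	toy_run_deferred_work(iso);
	return stale_in_window && !iso->stale_tlb;
}
```

In this model, anything the isolated CPU touches in the vmalloc range between toy_kernel_entry() and toy_run_deferred_work() goes through a stale translation, which is the case the guarantee has to rule out.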
Does that make sense?