On Wed, Aug 17, 2022 at 03:41:16PM +1000, Alistair Popple wrote:
> My primary concern with batching is ensuring a CPU write after clearing a clean PTE but before flushing the TLB does the "right thing" (i.e. faults if the PTE is not present).
Fair enough. I have exactly the same concern. But I think Nadav replied to this very recently in the previous thread; quoting him [1]:
> I keep not remembering this erratum correctly. IIRC, the erratum says that the access/dirty bits might be set, but it does not mean that a write is possible after the PTE is cleared (i.e., the dirty/access bits might be set on the non-present PTE, but the access itself would fail). So it is not an issue in this case - losing A/D would not impact correctness since the access should fail.
I'm not really sure that's what he means, but I really do think the hardware should behave like that; otherwise I can't see how it could work correctly.
Let's assume the page can still be written after the PTE is cleared. Then, AFAICT, ptep_clear_flush() is not safe either, because fundamentally it is two operations happening in sequence: (1) ptep_get_and_clear(), and (2) flush_tlb_page(), done conditionally when needed.
If the page can be written through a cached TLB entry while the PTE is not present, what if some process writes to the memory between steps (1) and (2)? AFAIU that's the same question as using a raw ptep_get_and_clear() followed by a batched TLB flush.
IOW, I don't see how a batched TLB flush solution can be worse than using a per-PTE ptep_clear_flush(). It may enlarge the race window, but fundamentally (IIUC) they're the same thing here, as long as there's no atomic way to both "clear the PTE and flush the TLB".
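To make the argument above concrete, here is a toy userspace model in C. It is not kernel code: struct model, try_write(), run_per_pte(), run_batched() and the hw_rechecks_pte flag are all hypothetical names. It assumes every PTE starts present and clean with a cached TLB entry, treats each clear or flush as one step, and lets a concurrent writer try every page after each step. hw_rechecks_pte selects between the two hardware behaviors discussed above: faulting when a write through a clean cached entry finds the PTE gone, or letting the write go through the stale entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, NOT kernel code; all names are hypothetical.
 * Each page has one PTE and at most one cached TLB entry.
 * All PTEs start present and clean, and all are cached in the TLB.
 */
#define MAX_PAGES 16

struct model {
	size_t npages;
	bool pte_present[MAX_PAGES];
	bool tlb_valid[MAX_PAGES];
	/* Assumed hardware rule: a write through a cached clean entry
	 * re-checks the PTE and faults if it is no longer present. */
	bool hw_rechecks_pte;
	/* Writes that succeeded although the PTE was already cleared. */
	size_t stale_writes;
};

static void model_init(struct model *m, size_t npages, bool hw_rechecks_pte)
{
	assert(npages <= MAX_PAGES);
	m->npages = npages;
	m->hw_rechecks_pte = hw_rechecks_pte;
	m->stale_writes = 0;
	for (size_t i = 0; i < npages; i++) {
		m->pte_present[i] = true;
		m->tlb_valid[i] = true;
	}
}

/* Returns true if a CPU write to page i succeeds, false if it faults. */
static bool try_write(struct model *m, size_t i)
{
	if (m->pte_present[i])
		return true;            /* normal write */
	/* TLB miss: the walk finds no PTE; or the assumed re-check fails. */
	if (!m->tlb_valid[i] || m->hw_rechecks_pte)
		return false;           /* the "right thing": fault */
	m->stale_writes++;              /* write went through a stale entry */
	return true;
}

/* Concurrent writer: attempt a write to every page at this instant. */
static void concurrent_writes(struct model *m)
{
	for (size_t i = 0; i < m->npages; i++)
		(void)try_write(m, i);
}

/* Per-PTE: (1) clear then (2) flush for each page, as ptep_clear_flush() does. */
static size_t run_per_pte(size_t npages, bool hw_rechecks_pte)
{
	struct model m;

	model_init(&m, npages, hw_rechecks_pte);
	for (size_t i = 0; i < npages; i++) {
		m.pte_present[i] = false;   /* step (1): ptep_get_and_clear() */
		concurrent_writes(&m);      /* race window between (1) and (2) */
		m.tlb_valid[i] = false;     /* step (2): flush_tlb_page() */
		concurrent_writes(&m);
	}
	return m.stale_writes;
}

/* Batched: clear every PTE first, then do a single flush at the end. */
static size_t run_batched(size_t npages, bool hw_rechecks_pte)
{
	struct model m;

	model_init(&m, npages, hw_rechecks_pte);
	for (size_t i = 0; i < npages; i++) {
		m.pte_present[i] = false;   /* raw ptep_get_and_clear() */
		concurrent_writes(&m);      /* window stays open until the flush */
	}
	for (size_t i = 0; i < npages; i++)
		m.tlb_valid[i] = false;     /* one batched TLB flush */
	concurrent_writes(&m);
	return m.stale_writes;
}
```

For four pages, run_per_pte(4, false) returns 4 and run_batched(4, false) returns 10: if the hardware lets writes through stale entries, both schemes are broken, and batching only keeps each page's window open longer. With hw_rechecks_pte set, both return 0.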
[1] https://lore.kernel.org/lkml/E37036E0-566E-40C7-AD15-720CDB003227@gmail.com/