On Feb 20, 2022, at 10:23 PM, Peter Xu peterx@redhat.com wrote:
On Thu, Feb 17, 2022 at 08:00:12PM -0800, Nadav Amit wrote:
On Feb 17, 2022, at 6:23 PM, Nadav Amit nadav.amit@gmail.com wrote:
PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be:
if (dirty_accountable && pte_dirty(ptent) && (pte_soft_dirty(ptent) || (vma->vm_flags & VM_SOFTDIRTY))) { ptent = pte_mkwrite(ptent); }
I know it is off-topic (not directly related to my patch), but I tried to understand the logic - both of the existing code and of your suggested change - and I failed.
IIUC dirty_accountable (whose value is taken from vma_wants_writenotify()) means that the writes *should* be tracked, and therefore the page should remain read-only.
Right.
So basically the condition should have been based on !dirty_accountable, i.e. the inverted value of dirty_accountable.
The problem is that dirty_accountable also reflects VM_SOFTDIRTY considerations, so looking on the PTE does not tell you whether the PTE should remain write-protected even if it is dirty.
My understanding is that the dirty bits (especially if both set) means we've tracked dirty on this pte already so we don't need to, hence we can set the dirty bit here. E.g., continuous mprotect(RO), mprotect(RW) upon a full dirty pte.
When something wants to enable tracking again, it needs to clear the dirty bit, either the real one or soft-dirty one. So it's a pure performance enhancement to conditionally set write bit here, when we're sure we won't need any further tracking on this pte.
One thing to mention is that this path only applies to VM_SHARED|VM_WRITE, because that's what checked the first in vma_wants_writenotify():
/* If it was private or non-writable, the write bit is already clear */ if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) return 0;
IOW private mappings are not optimized in current tree yet.
Peter Collingbourne proposed a patch some time ago to optimize it but it didn't get merged somehow. Meanwhile even with his latest version it should still miss the thp case, so if to reference the private optimization Andrea's tree would be the best:
https://github.com/aagit/aa/commit/fadb5e04d94472614c76819acd979b2f60e4eff6
Hope it clarifies things a bit. Thanks,
Thanks for the clarification. That’s what I suspected - I did not encounter it since I only used private anonymous mappings. I will try to create a test-case and send an additional fix for this issue.
Regards, Nadav