On Thu, Sep 28, 2023 at 10:15 AM David Hildenbrand david@redhat.com wrote:
On 27.09.23 20:25, Suren Baghdasaryan wrote:
I have some cleanups pending for page_move_anon_rmap(), that moves the SetPageAnonExclusive hunk out. Here we should be using page_move_anon_rmap() [or rather, folio_move_anon_rmap() after my cleanups]
I'll send them out soonish.
Should I keep this as is in my next version until you post the cleanups? I can add a TODO comment to convert it to folio_move_anon_rmap() once it's ready.
You should just be able to use page_move_anon_rmap() and whatever gets in first cleans it up :)
Ack.
WRITE_ONCE(src_folio->index, linear_page_index(dst_vma,
dst_addr)); >> +
orig_src_pte = ptep_clear_flush(src_vma, src_addr, src_pte);
orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
orig_dst_pte = maybe_mkwrite(pte_mkdirty(orig_dst_pte),
dst_vma);
I think there's still a theoretical issue here that you could fix by checking for the AnonExclusive flag, similar to the huge page case.
Consider the following scenario:
- process P1 does a write fault in a private anonymous VMA, creating
and mapping a new anonymous page A1 2. process P1 forks and creates two children P2 and P3. afterwards, A1 is mapped in P1, P2 and P3 as a COW page, with mapcount 3. 3. process P1 removes its mapping of A1, dropping its mapcount to 2. 4. process P2 uses vmsplice() to grab a reference to A1 with get_user_pages() 5. process P2 removes its mapping of A1, dropping its mapcount to 1.
If at this point P3 does a write fault on its mapping of A1, it will still trigger copy-on-write thanks to the AnonExclusive mechanism; and this is necessary to avoid P3 mapping A1 as writable and writing data into it that will become visible to P2, if P2 and P3 are in different security contexts.
But if P3 instead moves its mapping of A1 to another address with remap_anon_pte() which only does a page mapcount check, the maybe_mkwrite() will directly make the mapping writable, circumventing the AnonExclusive mechanism.
Yes, can_change_pte_writable() contains the exact logic when we can turn something easily writable even if it wasn't writable before. which includes that PageAnonExclusive is set. (but with uffd-wp or softdirty tracking, there is more to consider)
For uffd_remap can_change_pte_writable() would fail it VM_WRITE is not set, but we want remapping to work for RO memory as well. Are you
In a VMA without VM_WRITE you certainly wouldn't want to make PTEs writable :) That's why that function just does a sanity check that it is not called in strange context. So one would only call it if VM_WRITE is set.
saying that a PageAnonExclusive() check alone would not be enough here?
There are some interesting questions to ask here:
- What happens if the old VMA has VM_SOFTDIRTY set but the new one not?
You most probably have to mark the PTE softdirty and not make it writable.
- VM_UFFD_WP requires similar care I assume? Peter might know.
Let me look closer into these cases. I'll also double-check if we need to support uffd_remap for R/O vmas. I assumed we do but I actually never checked. Thanks!
-- Cheers,
David / dhildenb