On Mon, Aug 8, 2022 at 12:32 AM David Hildenbrand david@redhat.com wrote:
So I've read through the patch several times, and it seems fine, but this function (and the pmd version of it) just read oddly to me.
+static inline bool can_follow_write_pte(pte_t pte, struct page *page,
struct vm_area_struct *vma,
unsigned int flags)
+{
if (pte_write(pte))
return true;
if (!(flags & FOLL_FORCE))
return false;
/*
* See check_vma_flags(): only COW mappings need that special
* "force" handling when they lack VM_WRITE.
*/
if (vma->vm_flags & VM_WRITE)
return false;
VM_BUG_ON(!is_cow_mapping(vma->vm_flags));
So apart from the VM_BUG_ON(), this code just looks really strange - even despite the comment. Just conceptually, the whole "if it's writable, return that you cannot follow it for a write" just looks so very very strange.
That doesn't make the code _wrong_, but considering how many times this has had subtle bugs, let's not write code that looks strange.
So I would suggest that to protect against future bugs, we try to make it be fairly clear and straightforward, and maybe even a bit overly protective.
For example, let's kill the "shared mapping that you don't have write permissions to" very explicitly and without any subtle code at all. The vm_flags tests are cheap and easy, and we could very easily just add some core ones to make any mistakes much less critical.
Now, making that 'is_cow_mapping()' check explicit at the very top of this would already go a long way:
/* FOLL_FORCE for writability only affects COW mappings */ if (!is_cow_mapping(vma->vm_flags)) return false;
but I'd actually go even further: in this case that "is_cow_mapping()" helper to some degree actually hides what is going on.
So I'd actually prefer for that function to be written something like
/* If the pte is writable, we can write to the page */ if (pte_write(pte)) return true;
/* Maybe FOLL_FORCE is set to override it? */ if (flags & FOLL_FORCE) return false;
/* But FOLL_FORCE has no effect on shared mappings */ if (vma->vm_flags & MAP_SHARED) return false;
/* .. or read-only private ones */ if (!(vma->vm_flags & MAP_MAYWRITE)) return false;
/* .. or already writable ones that just need to take a write fault */ if (vma->vm_flags & MAP_WRITE) return false;
and the two first vm_flags tests above are basically doing tat "is_cow_mapping()", and maybe we could even have a comment to that effect, but wouldn't it be nice to just write it out that way?
And after you've written it out like the above, now that
if (!page || !PageAnon(page) || !PageAnonExclusive(page)) return false;
makes you pretty safe from a data sharing perspective: it's most definitely not a shared page at that point.
So if you write it that way, the only remaining issues are the magic special soft-dirty and uffd ones, but at that point it's purely about the semantics of those features, no longer about any possible "oh, we fooled some shared page to be writable".
And I think the above is fairly legible without any subtle cases, and the one-liner comments make it all fairly clear that it's testing.
Is any of this in any _technical_ way different from what your patch did? No. It's literally just rewriting it to be a bit more explicit in what it is doing, I think, and it makes that odd "it's not writable if VM_WRITE is set" case a bit more explicit.
Hmm?
Linus