On 17.12.21 22:36, Linus Torvalds wrote:
On Fri, Dec 17, 2021 at 12:55 PM David Hildenbrand david@redhat.com wrote:
If we have a shared anonymous page we cannot have GUP references, not even R/O ones. Because GUP would have unshared and copied the page, resulting in a R/O mapped anonymous page.
Doing a GUP on an actual shared page is wrong to begin with.
You even know that, you try to use "page_mapcount() > 1" to disallow it.
GUP is incomaptible with shared anonymous pages, therefore it has to trigger unsharing, correct.
My point is that it's wrong regardless, and that "mapcount" is dubious, and that COW cannot - and must not - use mapcount, and that I think your shared case should strive to avoid it for the exact same reason.
For now I have not heard a compelling argument why the mapcount is dubious, I repeat:
* mapcount can only increase due to fork() * mapcount can decrease due to unmap / zap
We can protect from the transtition == 1 -> >1 using the mmap_lock.
For COW the mapcount is the only thing that matters *if we take GUP* out of the equation. And that's exactly what we
OTOH, take a look which issues resulted from the page_count changes. That's what I call dubious, sorry to say.
So, what I think should happen is:
(a) GUP makes sure that it only ever looks up pages that can be shared with this VM. This may in involve breaking COW early with any past fork().
Is that unsharing as we propose it?
(b) it marks such pages so that any future work will not cause them to COW either
Right, exactly. GUP before fork does not result in a page getting shared again.
Note that (a) is not necessarily "always COW and have to allocate and copy new page". In particular, if the page is already writable, you know you already have exclusive access to it and don't need to COW.
And if it isn't writable, then the other common case is "the cow has only one user, and it's us" - that's the "refcount == 1" case.
And (b) is what we do with that page_maybe_dma_pinned() logic for fork(), but also for things like swap cache creation (eg see commit feb889fb40fa: "mm: don't put pinned pages into the swap cache").
I fully agree with b). GUP before fork is a totally different set of problems than GUP after fork.
Note that this code all already exists, and already works - even without getting the (very expensive) mmap_sem. So it works with fast-GUP and it can race with concurrent forking by another thread, which is why we also have that seqcount thing.
I know, I studied it intensively :)
As far as I can tell, your "mapcount" logic fundamentally requires mmap_sem for the fork() race avoidance, for example.
Yes. Or any other more lightweight synchronization in the future. For now this is just perfect.
So this is why I don't like the mapcount games - I think they are very fragile, and not at all as logical as the two simple rules a/b above.
I don't really see anything fragile, really. I'm happy to learn as always.
I believe you can make mapcount games _work_ - we used to have something like that. It was incredibly fragile, and it had its own set of bugs, but with enough care it's doable.
We made it work, and it was comparatively simple.