On 17.12.21 22:50, Linus Torvalds wrote:
On Fri, Dec 17, 2021 at 1:47 PM David Hildenbrand david@redhat.com wrote:
For now I have not heard a compelling argument why the mapcount is dubious, I repeat:
- mapcount can only increase due to fork()
- mapcount can decrease due to unmap / zap
We can protect from the transtition == 1 -> >1 using the mmap_lock.
For COW the mapcount is the only thing that matters *if we take GUP* out of the equation. And that's exactly what we
What do you have against just doing what we already do in other parts, that a/b thing?
Let me put it that way: I just want to get all of this fixed for good. I don't particularly care how. *But* I will fight for something that is superior, logically makes sense (at least to me :) ) and not super complicated. And I call also avoiding unnecessary COW "superior".
I do know that what this series proposes fixes the CVE: GUP after fork. I do know that the part 2 we'll be sending out next year will fix everything else we discovered so far, and it will rely on this as a basis, to not reintroduce any other COW issues we've seen so far.
If someone can propose something comparable that makes all discovered problems go away I'll be *extremely* happy. We have reproducers for all issues, so it's easy to verify, and I'm planning on extending the selftests to cover even more corner cases.
So far, I am not convinced that using the mapcount is dubious or problematic, I just don't see how. COW is an about sharing pages between processes, each expressed in the mapcount. It's a pure optimization for exactly that purpose.
GUP is the problem, not COW, not the mapcount. To me the mapcount is the only thing that makes sense in COW+unsharing logic, and GUP has to be taught to identify it and resolve it -> unshare when it detects a shared anaonymous page.
Which avoids the whole mmap_sem issue. That was a big issue for the rdma people, afaik.
While I do care about future use cases, I cannot possibly see fork() not requiring the mmap_lock in the foreseeable future. Just so much depends on it as of now.
And after all, fixing everything what we discovered so far is more important to me than something like that for the future. We have other problems to solve in that regard.
----------------------------------------------------------------------
I didn't want to talk about hugetlb here but I will just because it's a good example why using the refocunt is just wrong -- because unnecessary COW are just absolutely problematic.
Assume you have a R/O mapped huge page that is only mapped into your process and you get a write fault. What should your COW logic do?
a) Rely on the mapcount? Yes, iff GUP has been taught to unshare properly, because then it expresses exactly what we want to know. mapcount == 1 -> reuse. mapcount > 1 -> COW.
b) Rely on the refocunt? If we have a speculative refrence on the page we would COW. As huge pages are a scarce resource we can easily just not have a free huge page anymore and crash the application. The app didn't do anything wrong.
So teaching the hugetlb COW code to rely on the refount would just be highly fragile.