On 17.12.21 23:18, Linus Torvalds wrote:
On Fri, Dec 17, 2021 at 1:47 PM David Hildenbrand david@redhat.com wrote:
For now I have not heard a compelling argument why the mapcount is dubious, I repeat:
- mapcount can only increase due to fork()
- mapcount can decrease due to unmap / zap
And to answer the "why is this dubious", let' sjust look at your actual code that I reacted to:
vmf->page = vm_normal_page(vmf->vma, vmf->address, vmf->orig_pte);
if (vmf->page && PageAnon(vmf->page) && !PageKsm(vmf->page) &&
page_mapcount(vmf->page) > 1) {
Note how you don't just check page_mapcount(). Why not? Because mapcount is completely immaterial if it's not a PageAnon page, so you test for that.
So even when you do the mapcount read as one atomic thing, it's one atomic thing that depends on _other_ things, and all these checks are not atomic.
But a PageAnon() page can actually become a swap-backed page, and as far as I can tell, your code doesn't have any locking to protect against that.
The pages stay PageAnon(). swap-backed pages simply set a bit IIRC. mapcount still applies.
So now you need not only the mmap_sem (to protect against fork), you also need the page lock (to protect against rmap changing the type of page).
No, I don't think so. But I'm happy to be proven wrong because I might just be missing something important.
I don't see you taking the page lock anywhere. Maybe the page table lock ends up serializing sufficiently with the rmap code that it ends up working
In the do_wp_page() path, we currently do those kinds of racy checks too, but then we do a trylock_page, and re-do them. And at any time there is any question about things, we fall back to copying - because a copy is always safe.
Yes, I studied that code in detail as well.
Well, it's always safe if we have the rule that "once we've pinned things, we don't cause them to be COW again".
We should also be handling FOLL_GET, but that's a completely different discussion.
But that "it's safe if" was exactly my (b) case.
That's why I much prefer the model I'm trying to push - it's conceptually quite simple. I can literally explain mine at a conceptual level with that "break pre-existing COW, make sure no future COW" model.
:)
We really might be talking about the same thing just that my point is that the mapcount is the right thing to use for making the discussion whether to break COW -> triger unsharing.
In contrast, I look at your page_mapcount() code, and I go "there is no conceptual rules here, and the actual implementation details look dodgy".
I personally like having clear conceptual rules - as opposed to random implementation details.
Oh, don't get me wrong, me to. But for me it just all makes perfect.
What we document is:
"The fault is an unsharing request to unshare a shared anonymous page (-> mapped R/O). Does not apply to KSM."
And the code checks for exactly that. And in that context the mapcount just expresses exactly what we want. Again, unless I am missing something important that you raise above.
Anyhow, it's late in Germany. thanks for the discussion Linus!