On Thu, Jun 20, 2024 at 04:45:08PM +0200, David Hildenbrand wrote:
> If we could disallow pinning any shared pages, that would make life a lot easier, but I think there were reasons for why we might require it. To convert shared->private, simply unmap that folio (only the shared parts could possibly be mapped) from all user page tables.
IMHO it should be reasonable to make it work like ZONE_MOVABLE and FOLL_LONGTERM. Making a shared page private is really no different from moving it.
And if you have built a VMM that uses VMA mapped shared pages and short-term pinning then you should really also ensure that the VM is aware when the pins go away. For instance if you are doing some virtio thing with O_DIRECT pinning then the guest will know the pins are gone when it observes virtio completions.
In this way making private is just like moving: we unmap the page, drive the refcount to zero, and then move it.
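As a rough illustration of that sequence, here is a toy userspace model, not kernel code. The struct and every toy_* name are made up for this sketch, and a single pincount stands in for the extra references held by short-term pins. The conversion unmaps first and then refuses to proceed until the pins have drained, which is roughly how a migration attempt behaves when it finds unexpected references.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: none of these names are real kernel APIs. */
struct toy_folio {
	int mapcount;    /* user page table mappings of the shared folio */
	int pincount;    /* outstanding short-term pins, e.g. O_DIRECT I/O */
	bool is_private; /* true once the folio has been converted */
};

/* Step 1: remove the folio from all user page tables. */
static void toy_unmap_all(struct toy_folio *f)
{
	f->mapcount = 0;
}

/* A short-term pin going away, e.g. when the I/O completes. */
static void toy_unpin(struct toy_folio *f)
{
	assert(f->pincount > 0);
	f->pincount--;
}

/*
 * Attempt the shared->private conversion.
 * Returns 0 on success, or -EAGAIN while pins are still outstanding,
 * in which case the caller retries after the pins drain.
 */
static int toy_make_private(struct toy_folio *f)
{
	toy_unmap_all(f);
	if (f->pincount > 0)
		return -EAGAIN;
	f->is_private = true;
	return 0;
}
```

With one pin outstanding, toy_make_private() unmaps the folio and returns -EAGAIN. After toy_unpin() it succeeds. The guest-visible completion (a virtio completion in the example above) is what tells the VM that a retry can work.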
I'm kind of surprised the CC folks don't want the same thing for exactly the same reason. It is much easier to recover the huge mappings for the S2 in the presence of shared holes if you track it this way. Even CC will have this problem, to some degree, too.
> Precisely! RH (and therefore, me) is primarily interested in existing guest_memfd users at this point ("CC"), and I don't see an easy way to get that running with huge pages in the existing model reasonably well ...
IMHO it is an important topic so I'm glad you are thinking about it.
There is definitely some overlap here where if you do teach guest_memfd about huge pages then you must also provide a way to map the fragments of them that have become shared. I think there is little option here unless you double allocate and/or destroy the performance properties of the huge pages.
It is just the nature of our system that shared pages must be in VMAs and must be copy_to/from_user/GUP'able/etc.
Jason