On Thu, Jun 20, 2024 at 08:53:07PM +0200, David Hildenbrand wrote:
> On 20.06.24 18:36, Jason Gunthorpe wrote:
> > On Thu, Jun 20, 2024 at 04:45:08PM +0200, David Hildenbrand wrote:
> > > If we could disallow pinning any shared pages, that would make life a lot easier, but I think there were reasons why we might require it. To convert shared->private, simply unmap that folio (only the shared parts could possibly be mapped) from all user page tables.
> > IMHO it should be reasonable to make it work like ZONE_MOVABLE and FOLL_LONGTERM. Making a shared page private is really no different from moving it.
> > And if you have built a VMM that uses VMA-mapped shared pages and short-term pinning, then you should really also ensure that the VM is aware when the pins go away. For instance, if you are doing some virtio thing with O_DIRECT pinning, then the guest will know the pins are gone when it observes the virtio completions.
> > In this way, making a page private is just like moving it: we unmap the page and then drive the refcount to zero, then move it.
> Yes, but here is the catch: what if a single shared subpage of a large folio is (validly) longterm pinned and you want to convert another shared subpage to private?
When I wrote the above I was assuming option b was the choice.
> a) Disallow long-term pinning. That means we can, with a bit of waiting, always convert subpages shared->private after unmapping them and waiting for the short-term pins to go away. Not too bad, and we already have other mechanisms that disallow long-term pinning (especially writable fs ones!).
This seems reasonable, but you are trading a big hit to IO performance while doing shared/private operations.
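For completeness, the conversion flow under (a) is basically the usual unmap-and-drain pattern. Very roughly, glossing over details (gmem_mark_private() is a made-up placeholder, and a real version wouldn't poll like this):

static int gmem_convert_to_private(struct inode *inode, pgoff_t index)
{
        struct folio *folio = filemap_lock_folio(inode->i_mapping, index);

        if (IS_ERR(folio))
                return PTR_ERR(folio);

        /* Unmap the shared parts from all user page tables. */
        unmap_mapping_pages(inode->i_mapping, index,
                            folio_nr_pages(folio), false);

        /* Wait for any remaining short-term pins (e.g. O_DIRECT) to drain. */
        while (folio_maybe_dma_pinned(folio)) {
                folio_unlock(folio);
                schedule_timeout_uninterruptible(HZ / 100);
                folio_lock(folio);
        }

        gmem_mark_private(folio);       /* hypothetical */
        folio_unlock(folio);
        folio_put(folio);
        return 0;
}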
> b) Expose the large folio as multiple 4k folios to the core-mm.
And this trades off more VMM memory usage and micro-slower copy_to/from_user. I think this is probably the better choice.
IMHO the VMA does not need to map things at a huge-page granularity for these cases. The IO path on these VM types is already disastrously slow; optimizing with 1GB huge pages in the VMM to make copy_to/from_user very slightly faster doesn't seem worthwhile.
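To sketch what (b) might look like on the guest_memfd side: carve the gigantic allocation into order-0 folios and hand those to the page cache individually. None of this exists today, and there is no API to split a hugetlb folio down to order 0, so gmem_split_gigantic() below is purely illustrative:

static int gmem_install_gigantic(struct address_space *mapping,
                                 struct folio *gigantic, pgoff_t start)
{
        struct page *page = folio_page(gigantic, 0);
        unsigned long i, nr = folio_nr_pages(gigantic);
        int err;

        /* Hypothetical: break the gigantic folio into order-0 folios. */
        err = gmem_split_gigantic(gigantic);
        if (err)
                return err;

        for (i = 0; i < nr; i++) {
                struct folio *small = page_folio(page + i);

                /* Extra reference keeps each 4k folio unmovable. */
                folio_get(small);
                err = filemap_add_folio(mapping, small, start + i, GFP_KERNEL);
                if (err)
                        return err;     /* error unwinding omitted */
        }
        return 0;
}

guest_memfd would still have to remember internally that these nr folios came from one gigantic page so it can eventually be reassembled and returned to the hugetlb reserve.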
> b) would look as follows: we allocate a gigantic page from the (hugetlb) reserve into guest_memfd. Then, we break it down into individual 4k folios by splitting/demoting the folio. We make sure that all 4k folios are unmovable (raised refcount). We keep tracking internally that these 4k folios comprise a single gigantic page.
Yes, something like this. Or maybe they get converted to ZONE_DEVICE pages so that freeing them goes back to a pgmap callback in the guest_memfd, or something simple like that.
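Something in this direction, sketched below. None of the existing MEMORY_DEVICE_* types is an exact fit, so take the type as a placeholder, and the gmem_region struct is made up:

static void gmem_page_free(struct page *page)
{
        /* Last reference dropped: hand the 4k chunk back to guest_memfd. */
        gmem_reclaim_page(page);                /* hypothetical */
}

static const struct dev_pagemap_ops gmem_pgmap_ops = {
        .page_free = gmem_page_free,
};

static int gmem_setup_pgmap(struct gmem_region *r)      /* hypothetical */
{
        r->pgmap.type = MEMORY_DEVICE_COHERENT; /* more likely a new type */
        r->pgmap.range.start = r->phys_start;
        r->pgmap.range.end = r->phys_start + r->size - 1;
        r->pgmap.nr_range = 1;
        r->pgmap.ops = &gmem_pgmap_ops;

        r->vaddr = memremap_pages(&r->pgmap, NUMA_NO_NODE);
        return PTR_ERR_OR_ZERO(r->vaddr);
}

That also glosses over how the physical range gets taken away from the normal allocator in the first place.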
> The downside is that we won't benefit from vmemmap optimizations for large folios from hugetlb, and we'll have more tracking overhead when mapping individual pages into user page tables.
Yes, that too, but you are going to have some kind of per-4k tracking overhead in guest_memfd anyhow, no matter what you do. It would probably be less than the struct pages, though.
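For the tracking side, something as dumb as an xarray of per-4k state keyed by offset would do (just a sketch, names made up):

enum gmem_state {
        GMEM_SHARED  = 1,
        GMEM_PRIVATE = 2,
};

static int gmem_set_state(struct xarray *state, pgoff_t index,
                          enum gmem_state s)
{
        return xa_err(xa_store(state, index, xa_mk_value(s), GFP_KERNEL));
}

static enum gmem_state gmem_get_state(struct xarray *state, pgoff_t index)
{
        void *entry = xa_load(state, index);

        /* Absent entries default to shared here; purely illustrative. */
        return entry ? xa_to_value(entry) : GMEM_SHARED;
}

Still O(number of 4k pages) in the worst case, but far smaller than a struct page per 4k.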
There is also the interesting option to use a PFNMAP VMA, so there is no refcounting and we don't need to mess with the struct pages. The downside is that you totally lose GUP, so no O_DIRECT.
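i.e. something like the following, just the mmap/fault glue, with gmem_offset_to_pfn() being a made-up lookup helper:

static vm_fault_t gmem_pfnmap_fault(struct vm_fault *vmf)
{
        /* Raw PFN mapping: no struct page refcounting, no GUP. */
        unsigned long pfn = gmem_offset_to_pfn(vmf->vma->vm_file, vmf->pgoff);

        return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
}

static const struct vm_operations_struct gmem_pfnmap_vm_ops = {
        .fault = gmem_pfnmap_fault,
};

static int gmem_pfnmap_mmap(struct file *file, struct vm_area_struct *vma)
{
        vm_flags_set(vma, VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
        vma->vm_ops = &gmem_pfnmap_vm_ops;
        return 0;
}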
Jason