On Wed, Jun 19, 2024, Fuad Tabba wrote:
> Hi Jason,
>
> On Wed, Jun 19, 2024 at 12:51 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > On Wed, Jun 19, 2024 at 10:11:35AM +0100, Fuad Tabba wrote:
> > > To be honest, personally (speaking only for myself, not necessarily for Elliot and not for anyone else in the pKVM team), I still would prefer to use guest_memfd(). I think that having one solution for confidential computing that rules them all would be best. But we do need to be able to share memory in place, have a plan for supporting huge pages in the near future, and migration in the not-too-distant future.
> > I think using a FD to control this special lifetime stuff is dramatically better than trying to force the MM to do it with struct page hacks.
> >
> > If you can't agree with the guest_memfd people on how to get there then maybe you need a guest_memfd2 for this slightly different special stuff instead of intruding on the core mm so much. (though that would be sad)
> >
> > We really need to be thinking more about containing these special things and not just sprinkling them everywhere.
>
> I agree that we need to agree :) This discussion has been going on since before LPC last year, and the consensus from the guest_memfd() folks (if I understood it correctly) is that guest_memfd() is what it is: designed for a specific type of confidential computing, in the style of TDX and CCA perhaps, and that it cannot (or will not) perform the role of being a general solution for all confidential computing.
That isn't remotely accurate. I have stated multiple times that I want guest_memfd to be a vehicle for all VM types, i.e. not just CoCo VMs, and most definitely not just TDX/SNP/CCA VMs.
What I am staunchly against is piling features onto guest_memfd that will cause it to eventually become virtually indistinguishable from any other file-based backing store. I.e. while I want to make guest_memfd usable for all VM *types*, making guest_memfd the preferred backing store for all *VMs* and use cases is very much a non-goal.
From an earlier conversation[1]:
 : In other words, ditch the complexity for features that are well served by existing
 : general purpose solutions, so that guest_memfd can take on a bit of complexity to
 : serve use cases that are unique to KVM guests, without becoming an unmaintainable
 : mess due to cross-products.
> > > Also, since pin is already overloading the refcount, having the exclusive pin there helps in ensuring atomic accesses and avoiding races.
> >
> > Yeah, but every time someone does this and then links it to a uAPI it becomes utterly baked in concrete for the MM forever.
>
> I agree. But if we can't modify guest_memfd() to fit our needs (pKVM, Gunyah), then we don't really have that many other options.
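(Purely to illustrate the refcount-overloading idea being debated above: the scheme can be modeled as one bit of the counter word marking an exclusive pin. The bit position and helper names here are hypothetical, not anything in the kernel; the point is that because the pin shares a word with the count, a single atomic cmpxchg can check "no other references" and take the pin without racing.)

```python
# Illustrative model only (plain Python, not kernel code) of an exclusive
# pin overloading a page refcount. Names and bit layout are hypothetical.
EXCLUSIVE = 1 << 31          # high bit of the counter marks an exclusive pin

def try_exclusive_pin(refcount):
    """Return the new counter value, or None if other refs forbid the pin."""
    if refcount != 1:        # only the owner's reference may be present;
        return None          # in the kernel this would be a failed cmpxchg
    return 1 | EXCLUSIVE     # check and set happen on the same word

def try_get_ref(refcount):
    """An ordinary reference cannot be taken while the exclusive pin is held."""
    if refcount & EXCLUSIVE:
        return None
    return refcount + 1
```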
What _are_ your needs? There are multiple unanswered questions from our last conversation[2]. And by "needs" I don't mean "what changes do you want to make to guest_memfd?", I mean "what are the use cases, patterns, and scenarios that you want to support?".
 : What's "hypervisor-assisted page migration"? More specifically, what's the
 : mechanism that drives it?

 : Do you happen to have a list of exactly what you mean by "normal mm stuff"? I
 : am not at all opposed to supporting .mmap(), because long term I also want to
 : use guest_memfd for non-CoCo VMs. But I want to be very conservative with respect
 : to what is allowed for guest_memfd. E.g. host userspace can map guest_memfd,
 : and do operations that are directly related to its mapping, but that's about it.
That distinction matters, because as I have stated in that thread, I am not opposed to page migration itself:
 : I am not opposed to page migration itself, what I am opposed to is adding deep
 : integration with core MM to do some of the fancy/complex things that lead to page
 : migration.
I am generally aware of the core pKVM use cases, but AFAIK I haven't seen a complete picture of everything you want to do, and _why_.
E.g. if one of your requirements is that guest memory is managed by core-mm the same as all other memory in the system, then yeah, guest_memfd isn't for you. Integrating guest_memfd deeply into core-mm simply isn't realistic, at least not without *massive* changes to core-mm, as the whole point of guest_memfd is that it is guest-first memory, i.e. it is NOT memory that is managed by core-mm (primary MMU) and optionally mapped into KVM (secondary MMU).
Again from that thread, one of the most important aspects of guest_memfd is that VMAs are not required. Stating the obvious, lack of VMAs makes it really hard to drive swap, reclaim, migration, etc. from code that fundamentally operates on VMAs.
 : More broadly, no VMAs are required. The lack of stage-1 page tables are nice to
 : have; the lack of VMAs means that guest_memfd isn't playing second fiddle, e.g.
 : it's not subject to VMA protections, isn't restricted to host mapping size, etc.
[1] https://lore.kernel.org/all/Zfmpby6i3PfBEcCV@google.com
[2] https://lore.kernel.org/all/Zg3xF7dTtx6hbmZj@google.com