On Thu, Jun 20, 2024 at 01:30:29PM -0700, Sean Christopherson wrote:
> I.e. except for blatant bugs, e.g. use-after-free, we need to be able to guarantee with 100% accuracy that there are no outstanding mappings when converting a page from shared=>private. Crossing our fingers and hoping that short-term GUP will have gone away isn't enough.
To be clear, it is not crossing fingers. If the page refcount is 0 then there are no references to that memory anywhere at all. It is 100% certain.
It may take time to reach zero, but when it does it is safe.
Many things rely on this property, including FSDAX.
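As a rough illustration of that property, here is a toy userspace model. It is only a sketch and not kernel code: struct toy_page, toy_get(), toy_put() and toy_try_make_private() are made-up names, and the "take a reference only if the count is non-zero" rule is an assumption loosely modelled on the usual get-unless-zero idiom. The point it shows is that once the count has reached zero nobody can take a new reference, so a conversion that waits for zero is not guessing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace model of a page refcount; hypothetical, NOT kernel code. */
struct toy_page {
	atomic_int refcount;	/* number of outstanding references */
	bool is_private;	/* hypothetical shared/private state */
};

/* Take a reference only while the page is still live (refcount > 0). */
static bool toy_get(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old > 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;	/* count already hit zero: no new references */
}

/* Drop a reference; returns true if this was the last one. */
static bool toy_put(struct toy_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);	/* dropping below zero would be a bug */
	return old == 1;
}

/* Convert shared => private only when no references remain. */
static bool toy_try_make_private(struct toy_page *p)
{
	if (atomic_load(&p->refcount) != 0)
		return false;	/* still referenced; caller waits and retries */
	p->is_private = true;
	return true;
}
```

In this model a short-term holder only delays the conversion: toy_try_make_private() keeps failing until the last toy_put(), and after that toy_get() can no longer succeed.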
> For non-CoCo VMs, I expect we'll want to be much more permissive, but I think they'll be a complete non-issue because there is no shared vs. private to worry about. We can simply allow any and all userspace mappings for guest_memfd that is attached to a "regular" VM, because a misbehaving userspace only loses whatever hardening (or other benefits) was being provided by using guest_memfd. I.e. the kernel and system at-large isn't at risk.
It does seem to me like guest_memfd should really focus on the private aspect.
If we need normal memfd enhancements of some kind to work better with KVM then that may be a better option than turning guest_memfd into memfd.
Jason