On Fri, Jun 21, 2024 at 07:32:40AM +0000, Quentin Perret wrote:
No, I'm interested in what pKVM is doing that needs this to be so much different than the CC case.
The underlying technology for implementing CC is obviously very different (MMU-based for pKVM, encryption-based for the others + some extra bits but let's keep it simple). In-place conversion is inherently painful with encryption-based schemes, so it's not a surprise the approach taken in these cases is built around destructive conversions as a core construct.
I'm not sure I fully agree with this. CC can do non-destructive too (though the proprietary secure worlds may choose not to implement it). Even implementations like ARM's CC are much closer to how pKVM works: no encryption, just page table updates.
The only question that matters at all is how fast the private->shared conversion is. Is it fast enough that it can be used on the IO path instead of swiotlb?
TBH I'm willing to believe numbers showing that pKVM is fast enough, but I would like to see them before we consider major changes to the kernel :)
I'm not at all against starting with something simple and bouncing via swiotlb, that is totally fine. What is _not_ fine however would be to bake into the userspace API that conversions are not in-place and destructive (which in my mind equates to 'you can't mmap guest_memfd pages'). But I think that isn't really a point of disagreement these days, so hopefully we're aligned.
IMHO CC and pKVM should align here and provide a way for optional non-destructive private->shared conversion.
It's really only accesses via e.g. the linear map that are problematic, hence the exclusive GUP approach proposed in the series that tries to avoid that by construction.
I think as others have said, this is just too weird. Memory that is inaccessible and always faults the kernel doesn't make any sense. It shouldn't be mapped into VMAs.
If you really, really, want to do this then use your own FD and a PFN map. Copy to user will still work fine and you don't need to disrupt the mm.
Jason