On Dec 17, 2021, at 7:05 PM, Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Fri, Dec 17, 2021 at 05:53:45PM -0800, Linus Torvalds wrote:
>> But honestly, at least for the second case, if somebody does a GUP, and then starts playing mprotect games on the same virtual memory area that they did a GUP on, and are surprised when they get another COW fault that breaks their own connection with a page they did a GUP on earlier, that's their own fault.
> I've been told there are real workloads that do this.
> Something like qemu will use GUP with VFIO to insert PCI devices into the guest and GUP with RDMA to do fast network copy of VM memory during VM migration.
> qemu also uses the WP games to implement dirty tracking of VM memory during migration (and more? I'm not sure). It expects that during all of this nothing will COW the pages, as the two kinds of DMA must always go to the pages mapped to KVM.
> The big trouble here is this all worked before, so it is a userspace visible regression.
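To make the failure mode concrete, here is a toy model in Python. It is not kernel code: `Page`, `Pte`, `gup`, `write_protect`, `write_fault` and `process_write` are hypothetical names chosen for illustration. The model assumes a simplified reuse rule in which a write fault on a write-protected mapping copies the page whenever its reference count is above 1. Once that copy happens, the device holding the GUP reference keeps writing to the old page, while the mapping now points at the copy.

```python
from dataclasses import dataclass


@dataclass
class Page:
    data: str
    refcount: int = 1  # stands in for page_count()


@dataclass
class Pte:
    page: Page
    writable: bool = True


def gup(pte: Pte) -> Page:
    """Hypothetical GUP: take an extra reference and return the page itself."""
    pte.page.refcount += 1
    return pte.page


def write_protect(pte: Pte) -> None:
    """Stands in for mprotect(PROT_READ) or a uffd-wp request."""
    pte.writable = False


def write_fault(pte: Pte) -> None:
    """Simplified COW rule (assumption): reuse only if no one else holds a reference."""
    if not pte.writable and pte.page.refcount > 1:
        copy = Page(pte.page.data)  # fresh page with refcount 1
        pte.page.refcount -= 1      # the mapping drops its reference to the old page
        pte.page = copy             # the process now sees the copy
    pte.writable = True


def process_write(pte: Pte, value: str) -> None:
    """A CPU store through the mapping; faults first if the pte is read-only."""
    if not pte.writable:
        write_fault(pte)
    pte.page.data = value


pte = Pte(Page("guest"))
pinned = gup(pte)          # e.g. VFIO or RDMA pins the page for DMA
write_protect(pte)         # dirty tracking write-protects the range
process_write(pte, "cpu")  # the CPU write triggers a COW fault
pinned.data = "dma"        # the device keeps writing to the pinned page
print(pte.page.data, pinned.data, pte.page is pinned)
```

After the COW, the CPU write and the DMA write land on different pages. The model is deliberately simplified: it ignores mapcount, swap, and locking, and serves only to show why the pinned page and the mapped page can diverge.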
In such a case, I do think it makes sense to fail uffd-wp (when page_count() > 1), and in a prototype I am working on I do something like that. Otherwise, if the page is written and you use uffd for dirty tracking, what do you actually achieve?
You can return EAGAIN (which is documented and actually returned while “mmap_changing”) in such a case. This would not break userspace, but it is indeed still likely to cause a performance regression.
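A rough sketch of the proposed policy, again as a Python toy model rather than kernel code. The hypothetical `uffd_wp` helper refuses to write-protect a page whose reference count exceeds 1 and reports `errno.EAGAIN`. The hypothetical `track_page` caller retries a few times and then treats the page as always dirty. The retry limit and that fallback are assumptions for illustration only, not documented behavior of the real uffd-wp interface.

```python
import errno
from dataclasses import dataclass


@dataclass
class Page:
    refcount: int = 1              # stands in for page_count()
    write_protected: bool = False


def uffd_wp(page: Page) -> None:
    """Hypothetical uffd-wp: fail with EAGAIN if extra references exist."""
    if page.refcount > 1:
        raise OSError(errno.EAGAIN, "page has extra references (possibly pinned)")
    page.write_protected = True


def track_page(page: Page, retries: int = 3) -> str:
    """Hypothetical caller: retry on EAGAIN, then fall back to 'always dirty'."""
    for _ in range(retries):
        try:
            uffd_wp(page)
            return "tracked"
        except OSError as e:
            if e.errno != errno.EAGAIN:
                raise  # only EAGAIN is treated as retryable
    return "always-dirty"


print(track_page(Page()))            # an unpinned page can be write-protected
print(track_page(Page(refcount=2)))  # a pinned page exhausts the retries
```

In this model, retrying helps only when the extra reference is transient. A long-term pin exhausts the retries every time, so the caller must keep copying that page on every migration pass, which is one plausible source of the performance cost.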