On Thu, May 11, 2023 at 11:24 AM Axel Rasmussen <axelrasmussen@google.com> wrote:
So the basic way to use this new feature is:
- On the new host, the guest's memory is registered with userfaultfd, in either MISSING or MINOR mode (doesn't really matter for this purpose).
- On any first access, we get a userfaultfd event. At this point we can communicate with the old host to find out if the page was poisoned.
- If so, we can respond with a UFFDIO_SIGBUS - this places a swap marker so any future accesses will SIGBUS. Because the pte is now "present", future accesses won't generate more userfaultfd events, they'll just SIGBUS directly.
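For reference, the registration and event-reading half of that flow can be sketched with the userfaultfd API that exists in mainline today. This is only a minimal sketch: the helper names uffd_register_missing() and uffd_read_fault() are made up for illustration, the query to the old host is left out entirely, and the proposed UFFDIO_SIGBUS ioctl is only mentioned in a comment because it is not something I can assume is in anyone's headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: register [addr, addr + len) for MISSING faults.
 * Returns the userfaultfd file descriptor, or -1 with errno set (for
 * example EPERM when unprivileged userfaultfd is disabled).
 */
int uffd_register_missing(void *addr, size_t len)
{
    int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)addr, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };

    if (ioctl(fd, UFFDIO_API, &api) < 0 ||
        ioctl(fd, UFFDIO_REGISTER, &reg) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: read one event from a non-blocking userfaultfd.
 * Returns 1 and stores the faulting address for a page-fault event,
 * 0 if nothing is pending (or the event is not a page fault), -1 on error.
 * A VMM would ask the old host about *addr at this point and, if the page
 * was poisoned there, respond with the proposed UFFDIO_SIGBUS.
 */
int uffd_read_fault(int fd, unsigned long long *addr)
{
    struct uffd_msg msg;
    ssize_t n = read(fd, &msg, sizeof(msg));

    if (n < 0)
        return errno == EAGAIN ? 0 : -1;
    if (n != (ssize_t)sizeof(msg) || msg.event != UFFD_EVENT_PAGEFAULT)
        return 0;
    *addr = msg.arg.pagefault.address;
    return 1;
}
```

The descriptor is opened non-blocking so the event loop can poll() it alongside the migration channel; a blocking descriptor in a dedicated thread would work just as well.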
I want to clarify the SIGBUS mechanism here when KVM is involved, keeping in mind that we need to be able to inject an MCE into the guest for this to be useful.
1. vCPU gets an EPT violation --> KVM attempts GUP.
2. GUP finds a PTE_MARKER_UFFD_SIGBUS and returns VM_FAULT_SIGBUS.
3. KVM finds that GUP failed and returns -EFAULT.
This is different from the case where GUP finds poison: there, KVM will actually queue up a SIGBUS *containing the address of the fault*, and userspace can use it to inject an appropriate MCE into the guest. With UFFDIO_SIGBUS, we are missing the address!
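To make the "address in the signal" point concrete, here is a rough sketch of how userspace picks the faulting address out of a SIGBUS via siginfo. The names probe_for_sigbus() and is_mce_sigbus() are hypothetical, and this is not how any particular VMM does it. Real poison arrives with si_code BUS_MCEERR_AR (or BUS_MCEERR_AO) and the address in si_addr; since poisoning memory needs privileges, an easy stand-in for trying this out is touching a file mapping beyond EOF, which raises SIGBUS with BUS_ADRERR instead.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jmp;
static void *volatile bus_addr;
static volatile int bus_code;

/* SA_SIGINFO handler: record where and why the SIGBUS happened. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    bus_addr = info->si_addr;   /* faulting address */
    bus_code = info->si_code;   /* BUS_MCEERR_AR for consumed poison */
    siglongjmp(bus_jmp, 1);
}

/* True if the si_code says the SIGBUS came from a memory error. */
int is_mce_sigbus(int code)
{
    return code == BUS_MCEERR_AR || code == BUS_MCEERR_AO;
}

/*
 * Hypothetical helper: read one byte at p. Returns 0 if the access
 * succeeded, 1 if it raised SIGBUS (filling in the address and si_code),
 * or -1 if the handler could not be installed.
 */
int probe_for_sigbus(volatile char *p, void **fault_addr, int *code)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) < 0)
        return -1;

    if (sigsetjmp(bus_jmp, 1) != 0) {
        /* We got here from the handler. */
        *fault_addr = bus_addr;
        *code = bus_code;
        sigaction(SIGBUS, &old, NULL);
        return 1;
    }

    char c = *p;    /* may fault */
    (void)c;
    sigaction(SIGBUS, &old, NULL);
    return 0;
}
```

A VMM handling the poison case would translate the reported host address back to a guest physical address before injecting the MCE. With a bare -EFAULT from KVM_RUN there is no siginfo at all, so there is nothing to translate.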
I see three options:
1. Make KVM_RUN queue up a signal for any VM_FAULT_SIGBUS. I think this is pointless.
2. Don't have UFFDIO_SIGBUS install a PTE entry, but instead have a UFFDIO_WAKE_MODE_SIGBUS, where upon waking, we return VM_FAULT_SIGBUS instead of VM_FAULT_RETRY. We will keep getting userfaults on repeated accesses, just like how we get repeated signals for real poison.
3. Use this in conjunction with the additional KVM EFAULT info that Anish proposed (the first part of [1]).
I think option 3 is fine. :)
[1]: https://lore.kernel.org/kvm/20230412213510.1220557-1-amoorthy@google.com/
- James