On 09.08.22 20:48, Linus Torvalds wrote:
On Mon, Aug 8, 2022 at 12:32 AM David Hildenbrand <david@redhat.com> wrote:
For example, a write() via /proc/self/mem to a uffd-wp-protected range has to fail instead of silently granting write access and bypassing the userspace fault handler.
This, btw, just makes me go "uffd-wp is broken garbage" once more.
It also makes me go "if uffd-wp can disallow ptrace writes, then why doesn't regular write protect do it"?
I remember that it's not just uffd-wp, it's also ordinary userfaultfd if we have no page mapped, because we'd have to drop the mmap lock in order to notify user space about the fault and wait for a resolution.
IIUC, we cannot tell what exactly user space will do in response to a user write fault here (for example, QEMU VM snapshots have to copy page content away such that the VM snapshot remains consistent and we won't corrupt the snapshot), so we have to back off and fail the GUP. I'd say, for ptrace that's even the right thing to do because one might deadlock waiting on the user space thread that handles faults ... but that's a little off-topic for this fix here. I'm just trying to keep the semantics unchanged, as weird as they might be.
IOW, I don't think the patch is wrong (apart from the VM_BUG_ON's that absolutely must go away), but I get the strong feeling that we instead should try to get rid of FOLL_FORCE entirely.
I can resend v2 soonish, taking care of the VM_BUG_ON as you requested if there are no other comments.
If some other user action can stop FOLL_FORCE anyway, then why do we support it at all?
My humble opinion is that debugging userfaultfd-managed memory is a corner case and that we can hopefully stop using FOLL_FORCE outside of debugging contexts soon.
That said, I do enjoy having the uffd and uffd-wp feature available in user space (especially in QEMU). I don't always enjoy having to handle such corner cases in the kernel.