On 21.12.21 19:00, Linus Torvalds wrote:
On Tue, Dec 21, 2021 at 9:40 AM David Hildenbrand <david@redhat.com> wrote:
I do think the existing "maybe_pinned()" logic is fine for that. The "exclusive to this VM" bit can be used to *help* that decision - because only an exclusive page can be pinned - but I don't think it should _replace_ that logic.
The issue is that O_DIRECT uses FOLL_GET and unfortunately cannot easily be changed to FOLL_PIN. So I'm *trying* to make it more generic such that such corner cases can be handled correctly as well. But yeah, I'll see where this goes ... O_DIRECT has to be fixed one way or the other.
John H. mentioned that he wants to look into converting that to FOLL_PIN. So maybe that will work eventually.
I'd really prefer that as the plan.
What exactly is the issue with O_DIRECT? Is it purely that it uses "put_page()" instead of "unpin", or what?
I really think that if people look up pages and expect those pages to stay coherent with the VM they looked it up for, they _have_ to actively tell the VM layer - which means using FOLL_PIN.
Note that this is in absolutely no way a "new" issue. It has *always* been true. If some O_DIRECT path depends on pinning behavior, it has never worked correctly, and it is entirely on O_DIRECT, and not at all a VM issue. We've had people doing GUP games forever, and being burnt by those games not working reliably.
GUP (before we even had the notion of pinning) would always just take a reference to the page, but it would not guarantee that that exact page then kept an association with the VM.
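The difference between a plain reference and a pin can be sketched with a toy userspace model. This is not kernel code: every `toy_*` name is hypothetical, and the only detail borrowed from the kernel is the idea that a pin adds a large bias to the reference count (assumed here to be 1 << 10, matching GUP_PIN_COUNTING_BIAS for small pages), so that a "maybe pinned" check can tell pins apart from ordinary references.

```c
/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the kernel's GUP_PIN_COUNTING_BIAS for small pages. */
#define TOY_PIN_BIAS (1u << 10)

struct toy_page {
    unsigned int refcount;
};

/* FOLL_GET-style lookup: one ordinary reference, dropped with a plain put. */
static void toy_get(struct toy_page *p) { p->refcount += 1; }
static void toy_put(struct toy_page *p) { p->refcount -= 1; }

/* FOLL_PIN-style lookup: adds the bias, so the pin is visible to the check. */
static void toy_pin(struct toy_page *p)   { p->refcount += TOY_PIN_BIAS; }
static void toy_unpin(struct toy_page *p) { p->refcount -= TOY_PIN_BIAS; }

/* Heuristic: true if at least one pin (or very many plain refs) is held. */
static bool toy_maybe_pinned(const struct toy_page *p)
{
    return p->refcount >= TOY_PIN_BIAS;
}

static void toy_demo(void)
{
    /* Start with the single reference held by the mapping itself. */
    struct toy_page page = { .refcount = 1 };

    /* A plain reference keeps the page alive but is invisible to the check. */
    toy_get(&page);
    assert(!toy_maybe_pinned(&page));
    toy_put(&page);

    /* A pin is visible, so the "VM layer" in this model can act on it. */
    toy_pin(&page);
    assert(toy_maybe_pinned(&page));
    toy_unpin(&page);
    assert(page.refcount == 1);
}
```

In this model a FOLL_GET-style user looks exactly like any other reference holder, which is why nothing can be decided on its behalf. The check is only a "maybe": enough plain references would also cross the threshold.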
Now, in *practice* this all works if:
(a) the GUP user had always written to the page since the fork (either explicitly, or with FOLL_WRITE obviously acting as such)
(b) the GUP user never forks afterwards until the IO is done
(c) the GUP user plays no other VM games on that address
and it's also very possible that it has worked by pure luck (ie we've had a lot of random code that actively mis-used things and it would work in practice just because COW would happen to cut the right direction etc).
Is there some particular GUP user you happen to care about more than others? I think it's a valid option to try to fix things up one by one, even if you don't perhaps fix _all_ cases.
Yes, of course. The important part for me is to have a rough idea of how to tackle all the pieces and to have a reliable design/approach. Besides the security issue, the highest priority is getting R/W pins (FOLL_WRITE) right, including O_DIRECT, because that can silently break existing use cases.
Lower priority is getting R/O pins on anonymous memory right, because that never worked reliably. Lowest priority is getting R/O pins on MAP_PRIVATE file memory right.
I'd appreciate if someone could work on the O_DIRECT FOLL_PIN conversion while I struggle with PageAnonExclusive() and R/W pins :)
[noting that I'll not get too much done within the next 2 weeks]