On 21.12.21 19:30, Linus Torvalds wrote:
> On Tue, Dec 21, 2021 at 10:07 AM Jan Kara <jack@suse.cz> wrote:
>> For the record, we always intended (and still intend) to make O_DIRECT use FOLL_PIN. It is just tricky because some users mix pages pinned with GUP and pages acquired through get_page() (such as the zero page) in a single bio, so it is non-trivial to do the right thing on I/O completion (unpin or just put_page).
> Side note: the new "exclusive VM" bit wouldn't _solve_ this issue, but it might make it much easier to debug and catch.
> If we only set the exclusive VM bit on pages that get mapped into user space, and we guarantee that GUP only looks up such pages, then we can also add a debug test to the "unpin" case that the bit is still set.
> And that would catch anybody who ends up using other pages for unpin(), and you could have a WARN_ON() for it (obviously also triggering when the page count is too small to unpin).
It would also catch someone wrongly dropping the exclusive flag while there are still users (pins) relying on the page staying exclusive.
> That way, at least from a kernel debugging and development standpoint, it would be easy to see "ok, this unpinning got a page that wasn't pinned".
For that purpose, the pincount would already kind of work. It is not precise, but at least it is something ("this page cannot possibly have been pinned").