On 22.12.21 15:42, Jan Kara wrote:
On Wed 22-12-21 14:09:41, David Hildenbrand wrote:
IIUC, our COW logic makes sure that a shared anonymous page that might still be used by a R/O FOLL_GET cannot be modified, because any attempt to modify it would result in a copy.
Well, we defined FOLL_PIN to mean that the caller wants to access not only page state (for which FOLL_GET is enough, and there are some users, mostly inside mm, who need this) but also page data. Eventually, we even wanted to make FOLL_GET unavailable to broad areas of the kernel (and keep it internal to MM for its dirty deeds ;)) to reduce the misuse of GUP.
For file pages we need this data vs. no-data access distinction so that filesystems can detect when someone may be accessing page data even though the page is unmapped. Practically, filesystems care most about when someone may be *modifying* page data (we need to make sure data is stable, e.g. when writing back data to disk, doing data checksumming, or other operations). So using FOLL_GET when only wanting to read page data should be OK for filesystems. Honestly, though, I would be reluctant to break the rule of "use FOLL_PIN when wanting to access page data": keeping it makes things simple and reasonably easy to understand for parties such as filesystem developers or driver developers, who all need to interact with pinned pages...
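For readers following along, here is a rough userspace sketch of how the get vs. pin distinction can be made visible to something like a filesystem. It is a toy model, not kernel code: the struct and every toy_* name are made up for illustration. The only detail taken from the kernel is the bias value, which mirrors GUP_PIN_COUNTING_BIAS (1 << 10), and the ">= bias" check, which is modeled on what page_maybe_dma_pinned() does for small pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; all toy_* names are hypothetical. */
#define TOY_PIN_BIAS 1024	/* mirrors the kernel's GUP_PIN_COUNTING_BIAS */

struct toy_page {
	int refcount;
};

/* FOLL_GET-style reference: caller only needs page state. */
static void toy_get(struct toy_page *p)
{
	p->refcount += 1;
}

static void toy_put(struct toy_page *p)
{
	assert(p->refcount >= 1);
	p->refcount -= 1;
}

/* FOLL_PIN-style reference: caller may access page data. */
static void toy_pin(struct toy_page *p)
{
	p->refcount += TOY_PIN_BIAS;
}

static void toy_unpin(struct toy_page *p)
{
	assert(p->refcount >= TOY_PIN_BIAS);
	p->refcount -= TOY_PIN_BIAS;
}

/*
 * "Maybe" is deliberate: 1024 plain references look the same as one
 * pin, so false positives are possible, false negatives are not.
 */
static bool toy_maybe_pinned(const struct toy_page *p)
{
	return p->refcount >= TOY_PIN_BIAS;
}
```

In this model, a page with refcount 1 that sees one toy_get() still reports "not pinned", while a single toy_pin() flips toy_maybe_pinned() to true until the matching toy_unpin().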
Right, from an API perspective we really want people to use FOLL_PIN.
To optimize this case in particular it would help if we had the FOLL flags on the unpin path. Then we could just decide internally "well, short-term R/O FOLL_PIN can be really lightweight, we can treat this like a FOLL_GET instead". And we would need that as well if we were to keep different counters for R/O vs. R/W pins.
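A minimal sketch of that idea, again as a userspace toy and not an existing kernel interface (today's unpin_user_page() takes only the page). The sk_* names and the two flag bits are hypothetical. The point it tries to show is that the release side can only undo the right amount if it sees the same flags as the acquire side.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; all sk_* and SK_* names are hypothetical. */
#define SK_PIN_BIAS	1024	/* same value as GUP_PIN_COUNTING_BIAS */
#define SK_WRITE	0x1u	/* caller may modify page data */
#define SK_LONGTERM	0x2u	/* pin may be held indefinitely */

struct sk_page {
	int refcount;
};

/* Short-term R/O pins are the "lightweight" case from the mail. */
static bool sk_is_lightweight(unsigned int flags)
{
	return (flags & (SK_WRITE | SK_LONGTERM)) == 0;
}

/* How much a pin with these flags contributes to the refcount. */
static int sk_delta(unsigned int flags)
{
	return sk_is_lightweight(flags) ? 1 : SK_PIN_BIAS;
}

static void sk_pin(struct sk_page *p, unsigned int flags)
{
	p->refcount += sk_delta(flags);
}

/* Must receive the same flags as the matching sk_pin() call. */
static void sk_unpin(struct sk_page *p, unsigned int flags)
{
	int delta = sk_delta(flags);

	assert(p->refcount >= delta);
	p->refcount -= delta;
}

static bool sk_maybe_pinned(const struct sk_page *p)
{
	return p->refcount >= SK_PIN_BIAS;
}
```

With this shape, sk_pin(&page, 0) behaves like a plain get and leaves sk_maybe_pinned() false, whereas sk_pin(&page, SK_WRITE) makes the page look pinned. Calling sk_unpin() with flags that differ from the pin would corrupt the count, which is why the flags would have to travel to the unpin path.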
Well, I guess the question here is: Which GUP user needs only R/O access to page data and is so performance critical that it would be worth it to sacrifice API clarity for speed? I'm not aware of any but I was not looking really hard...
I'd be interested in examples as well. Maybe databases that use O_DIRECT after fork()?