On Tue, Aug 09, 2022 at 11:40:50AM -0700, Linus Torvalds wrote:
> On Mon, Aug 8, 2022 at 12:32 AM David Hildenbrand <david@redhat.com> wrote:
> > For example, a write() via /proc/self/mem to a uffd-wp-protected range has to fail instead of silently granting write access and bypassing the userspace fault handler. Note that FOLL_FORCE is not only used for debug access, but also triggered by applications without debug intentions, for example, when pinning pages via RDMA.
> So this made me go "Whaa?"
> I didn't even realize that the media drivers and rdma used FOLL_FORCE.
> That's just completely bogus.
> Why do they do that?
> It seems to be completely bogus, and seems to have no actual valid reason for it. Looking through the history, it goes back to the original code submission in 2006, and doesn't have a mention of why.
It is because of all this madness with COW.
Let's say an app does:
    buf = mmap(MAP_PRIVATE)
    rdma_pin_for_read(buf)
    buf[0] = 1
Then the store to buf[0] will COW the page and the pin will become decoherent.
The purpose of the FORCE is to force COW to happen early so this is avoided.
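To make the failure mode concrete, here is a rough userspace analogy, not the RDMA path itself. It assumes Linux and a writable /tmp. A read-only MAP_SHARED view of a temporary file stands in for the page a read-only pin would be holding, and a MAP_PRIVATE mapping of the same file plays the role of buf. The names cow_demo and struct cow_result are made up for this sketch, and nothing here calls GUP or any RDMA API.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* First byte as seen by the app (private mapping) and by the stand-in
 * for the pin (shared mapping), before and after the app's store. */
struct cow_result {
    unsigned char app_before, pin_before;
    unsigned char app_after, pin_after;
};

/* Hypothetical helper: returns 0 on success, -1 on any setup failure. */
int cow_demo(struct cow_result *r)
{
    char path[] = "/tmp/cow_demo_XXXXXX";
    long psz = sysconf(_SC_PAGESIZE);
    unsigned char *pinned, *buf;
    int fd;

    if (psz <= 0)
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file lives only as long as fd and the mappings */

    /* Give the backing page known content, then size the file to one page. */
    if (write(fd, "A", 1) != 1 || ftruncate(fd, psz) != 0) {
        close(fd);
        return -1;
    }

    /* Stand-in for the read-only pin: always refers to the original page. */
    pinned = mmap(NULL, psz, PROT_READ, MAP_SHARED, fd, 0);
    /* The app's buffer: private, so the first store triggers COW. */
    buf = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);
    if (pinned == MAP_FAILED || buf == MAP_FAILED)
        return -1;

    /* Before the store both views are backed by the same page. */
    r->app_before = buf[0];
    r->pin_before = pinned[0];

    buf[0] = 1; /* COW: buf now points at a fresh private copy */

    /* After the store the app sees its copy; the "pin" sees the old page. */
    r->app_after = buf[0];
    r->pin_after = pinned[0];

    munmap(buf, psz);
    munmap(pinned, psz);
    return 0;
}
```

After a successful call, the two "before" bytes match, while app_after holds the stored value and pin_after still holds the original file byte. That divergence is the decoherence described above; forcing the COW at pin time means the pin is taken on the private copy in the first place.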
Sadly there are real apps that end up working this way, e.g. because they are using a buffer in .data or something.
I've hoped David's new work on this provided some kind of path to a proper solution, as the need is very similar to all the other places where we need to ensure there is no possibility of future COW.
So, this usage can be interpreted as a FOLL flag we don't have - some kind of (FOLL_EXCLUSIVE | FOLL_READ) to match the PG_anon_exclusive sort of idea.
Jason