On 17.12.21 21:36, Linus Torvalds wrote:
On Fri, Dec 17, 2021 at 12:18 PM David Hildenbrand <david@redhat.com> wrote:
On 17.12.21 20:22, Linus Torvalds wrote:
On Fri, Dec 17, 2021 at 11:04 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
- get a "readonly" copy of a local private page using FAULT_FLAG_UNSHARE.
  This just increments the page count, because mapcount == 1.
- fork()
- unmap in the original
- child now has "mapcount == 1" on a page again, but refcount is
  elevated, and child HAS TO COW before writing.
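
The four steps above can be replayed on a toy model of the two counters. This is only a sketch: `struct toy_page` and the `toy_*` helpers are hypothetical names made up for illustration, not kernel code, and the model assumes that every mapping holds one reference and that a GUP reference raises only the refcount.

```c
#include <assert.h>

/* Hypothetical toy model of the two counters; this is not struct page. */
struct toy_page {
	int refcount;	/* all references: mappings plus GUP references */
	int mapcount;	/* page-table mappings only */
};

/* Assumption: a mapping takes one reference and one mapcount. */
static void toy_map(struct toy_page *p)
{
	p->refcount++;
	p->mapcount++;
}

static void toy_unmap(struct toy_page *p)
{
	assert(p->mapcount > 0 && p->refcount > 0);
	p->refcount--;
	p->mapcount--;
}

/* Assumption: a GUP reference raises only the refcount. */
static void toy_gup(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount++;
}

/* Replay the four steps and return the final counter state. */
static struct toy_page toy_replay_gup_fork_unmap(void)
{
	struct toy_page p = { 0, 0 };

	toy_map(&p);	/* private page mapped once: refcount 1, mapcount 1 */
	toy_gup(&p);	/* "readonly" GUP:           refcount 2, mapcount 1 */
	toy_map(&p);	/* fork(), child maps it:    refcount 3, mapcount 2 */
	toy_unmap(&p);	/* unmap in the original:    refcount 2, mapcount 1 */
	return p;
}
```

In this model the child ends up with mapcount == 1 while the refcount is still 2, so a "mapcount == 1" test alone would not notice the outstanding GUP reference.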
Hi Linus,
This is just GUP before fork(), which is in general problematic/incompatible with sharing.
Note that my example was not meant to be an example of a problem per se, but purely as an example of how meaningless 'mapcount' is, and how 'mapcount==1' isn't really a very meaningful test.
So it wasn't meant to show "look, GUP before fork is problematic". We have that problem already solved at least for regular pages.
It was purely meant to show how "mapcount==1" isn't a meaningful thing to test, and my worry about how you're adding that nonsensical test to the new code.
Let's just take a look at what refcount does *wrong*. Let's use an adjusted version of your example above, because it's a perfect fit:
- mem = mmap(pagesize, MAP_PRIVATE)
  -> refcount == 1
- memset(mem, 0, pagesize); /* Page is mapped R/W */
- fork() /* Page gets mapped R/O */
  -> refcount > 1
- child quits
  -> refcount == 1
- Take a R/O pin (RDMA, VFIO, ...)
  -> refcount > 1
- memset(mem, 0xff, pagesize);
  -> Write fault -> COW
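
The refcount arithmetic in the sequence above can be traced with a small model. It is a sketch under stated assumptions: `struct model_page` and the `model_*` functions are hypothetical, and the write-fault rule encoded here (reuse the page in place only when refcount == 1, copy otherwise) is the refcount-based behaviour being discussed, not a copy of the kernel's fault handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: only the reference count is tracked. */
struct model_page {
	int refcount;
};

static void model_get(struct model_page *p)
{
	p->refcount++;
}

static void model_put(struct model_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Assumed rule: a write fault copies unless the faulting mapping
 * holds the only reference to the page. */
static bool model_write_fault_copies(const struct model_page *p)
{
	return p->refcount != 1;
}

/* Replay the sequence; take_pin selects whether the R/O pin step runs.
 * Returns true if the final write fault ends in a COW copy. */
static bool model_replay(bool take_pin)
{
	struct model_page p = { 0 };

	model_get(&p);		/* mmap + first memset: refcount == 1 */
	model_get(&p);		/* fork():              refcount > 1  */
	model_put(&p);		/* child quits:         refcount == 1 */
	if (take_pin)
		model_get(&p);	/* R/O pin:             refcount > 1  */
	return model_write_fault_copies(&p);
}
```

Under these assumptions `model_replay(true)` reports a copy and `model_replay(false)` reports in-place reuse: the only difference between the two runs is the R/O pin.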
I do not believe this is actually a bug.
It's debatable if it's a BUG or not (I think it is one). It's for sure inconsistent.
You asked for a R/O pin, and you got one.
Then somebody else modified that page, and you got exactly what you asked for - a COW event. The original R/O pin has the original page that it asked for, and can read it just fine.
Where in the code did I ask for a COW event? I asked for a R/O pin, not any kind of memory protection.