On Tue, Dec 21, 2021 at 04:19:33PM +0100, David Hildenbrand wrote:
Note that I am also trying to make any kind of R/O pin on an anonymous page work as expected, to fix any kind of GUP after fork() and GUP before fork(). So taking a R/O pin on an !PageAnonExclusive() page similarly has to make sure that the page is exclusive -- even if it's mapped R/O (!).
Why? AFAIK we don't have bugs here. If the page is RO and has an elevated refcount it cannot be 'PageAnonExclusive' and so any place that wants to drop the WP just cannot. What is the issue?
But what I think you actually mean is if we want to get R/O pins right.
What I meant was a page that is GUP'd RO, is not PageAnonExclusive and has an elevated refcount. Those cannot be transformed to PageAnonExclusive, or re-used during COW, but they also don't have problems today. Such places are either like O_DIRECT read and tolerant of a false COW, or they are broken like VFIO and should be using FOLL_FORCE|FOLL_WRITE, which turns them into a WRITE and then we know they get PageAnonExclusive.
So, the swap issue is fixed directly with PageAnonExclusive and no change to READ GUP is required, at least in your #1 scenario, AFAICT..
There are 2 models, leaving FOLL_FAULT_UNSHARE out of the picture for now:
- Whenever mapping an anonymous page R/W (after COW, during ordinary fault, on swapin), we mark the page exclusive. We must never lose the PageAnonExclusive bit, not during migration, not during swapout.
I prefer this one as well.
It allows us to keep Linus's simple logic that refcount == 1 means always safe to re-use, no matter what.
And refcount != 1 goes on to consider the additional bit to decide what to do. The simple bit really means 'we know this page has one PTE so ignore the refcount for COW reuse decisions'.
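To make the decision above concrete, here is a rough userspace sketch of the COW reuse check. It is only a model of the idea, not kernel code: struct model_page, its fields and model_can_reuse_on_cow() are hypothetical names made up for illustration, and locking and page flags handling are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of an anonymous page; not a kernel struct. */
struct model_page {
	int refcount;		/* all references, including GUP pins */
	bool anon_exclusive;	/* stands in for PageAnonExclusive */
};

/* Returns true if a write fault may reuse the page in place. */
static bool model_can_reuse_on_cow(const struct model_page *page)
{
	/* A mapped page always holds at least one reference. */
	assert(page->refcount >= 1);

	/* refcount == 1: always safe to re-use, no matter what. */
	if (page->refcount == 1)
		return true;

	/*
	 * refcount != 1: the bit says the page has one PTE, so the
	 * extra references are ignored for the reuse decision.
	 */
	return page->anon_exclusive;
}
```

In this model a page with refcount 2 and the bit clear is copied, which is the "false COW" that something like an O_DIRECT read can tolerate.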
fork() will process the bit for each and every process, even if there was no GUP, and will copy if there are additional references.
Yes, just like it does today already for mapcount.
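The fork() side of model #1 could be sketched in the same userspace style. Again, every name here (struct model_page, enum fork_action, model_fork_one_page()) is hypothetical. Clearing the bit when the page becomes shared between parent and child is my assumption about how "exclusive" has to behave; it is not spelled out above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of an anonymous page; not a kernel struct. */
struct model_page {
	int refcount;		/* all references, including GUP pins */
	bool anon_exclusive;	/* stands in for PageAnonExclusive */
};

enum fork_action {
	FORK_SHARE,	/* child maps the same page, COW-shared */
	FORK_COPY,	/* child gets its own copy right away */
};

/* Decide what fork() does with one anonymous page of the parent. */
static enum fork_action model_fork_one_page(struct model_page *page)
{
	assert(page->refcount >= 1);

	/*
	 * Exclusive page with additional references (e.g. a GUP pin):
	 * copy for the child so the parent keeps its exclusive page.
	 */
	if (page->anon_exclusive && page->refcount > 1)
		return FORK_COPY;

	/*
	 * Assumption: sharing with the child means the page is no
	 * longer exclusive, so the bit is cleared here.
	 */
	page->anon_exclusive = false;
	page->refcount++;	/* the child's new mapping */
	return FORK_SHARE;
}
```

The check runs for every page of every forking process, whether or not GUP was ever involved.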
- Whenever GUP wants to pin/ref a page, we try marking it exclusive. We can lose the PageAnonExclusive bit during migration and swapout, because that can only happen when there are no additional references.
I haven't thought of a way this is achievable.
At least not without destroying GUP fast..
Idea #2 is really a "this page is GUP'd" flag with some sneaky logic to clear it. That comes along with all the races too because as an idea it is fundamentally about GUP which runs without locks.
Jason