On 28.09.23 19:21, Peter Xu wrote:
On Thu, Sep 28, 2023 at 07:05:40PM +0200, David Hildenbrand wrote:
As described as reply to v1, without fork() and KSM, the PAE bit should stick around. If that's not the case, we should investigate why.
If we ever support the post-fork case (which the comment above remap_pages() excludes) we'll need good motivation why we'd want to make this overly-complicated feature even more complicated.
The problem is DONTFORK is only a suggestion, but not yet restricted. If someone reaches on top of some !PAE page on src it'll never gonna proceed and keep failing, iiuc.
Yes. It won't work if you fork() and not use DONTFORK on the src VMA. We should document that as a limitation.
For example, you could return an error to the user that can just call UFFDIO_COPY. (or to the UFFDIO_COPY from inside uffd code, but that's probably ugly as well).
do_wp_page() doesn't have that issue of accuracy only because one round of CoW will just allocate a new page with PAE set guaranteed, which is pretty much self-heal and unnoticed.
Yes. But it might have to copy, at which point the whole optimization of remap is gone :)
So it'll be great if we can have similar self-heal way for PAE. If not, I think it's still fine we just always fail on !PAE src pages, but then maybe we should let the user know what's wrong, e.g., the user can just forgot to apply DONTFORK then forked. And then the user hits error and don't know what happened. Probably at least we should document it well in man pages.
Yes, exactly.
Another option can be we keep using folio_mapcount() for pte, and another helper (perhaps: _nr_pages_mapped==COMPOUND_MAPPED && _entire_mapcount==1) for thp. But I know that's not ideal either.
As long as we only set the pte writable if PAE is set, we're good from a CVE perspective. The other part is just simplicity of avoiding all these mapcount+swapcount games where possible.
(one day folio_mapcount() might be faster -- I'm still working on that patch in the bigger picture of handling PTE-mapped THP better)