On Thu, Apr 1, 2021 at 4:47 PM Peter Xu <peterx@redhat.com> wrote:
Hi, Suren,
On Thu, Apr 01, 2021 at 12:43:51PM -0700, Suren Baghdasaryan wrote:
On Thu, Apr 1, 2021 at 11:59 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:
We received a report that the copy-on-write issue reported by Jann Horn in https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer coded in vmsplice.c).
Gaah.
I confirmed this and also that the issue was not reproducible with the 5.10 kernel. I tracked the fix to the following patch introduced in 5.9 which changes the do_wp_page() logic:
09854ba94c6a 'mm: do_wp_page() simplification'
The problem here is that there's a _lot_ more patches than the few you found that fixed various other cases (THP etc).
I backported this patch (#2 in the series) along with 2 prerequisite patches (#1 and #4) that keep the backports clean and two followup fixes to the main patch (#3 and #5). I had to skip the following fix:
feb889fb40fa 'mm: don't put pinned pages into the swap cache'
because it uses page_maybe_dma_pinned() which does not exist in earlier kernels. Because pin_user_pages() does not exist there either, I *think* we can safely skip this fix on older kernels, but I would appreciate it if someone could confirm that claim.
Hmm. I think this means that swap activity can now break the connection to a GUP page (the whole pre-pinning model), but it probably isn't a new problem for 4.9/4.19.
I suspect the test there should be something like
        /* Single mapper, more references than us and the map? */
        if (page_mapcount(page) == 1 && page_count(page) > 2)
                goto keep_locked;
in the pre-pinning days.
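
For readers following along, the counting behind that suggested test can be sketched as a small userspace model. This is only an illustration, not kernel code: struct page_model and has_extra_refs() are hypothetical names, and it assumes, as the comment in the snippet suggests, that the reclaim path holds one reference and the single mapping holds another, so a count above 2 implies some other holder (for example a GUP user).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two counters of a page; not the kernel's struct page. */
struct page_model {
    int mapcount;   /* number of page-table mappings of the page */
    int refcount;   /* total references held on the page */
};

/*
 * Models the suggested check: exactly one mapper, and more references
 * than "us" (assumed: one reference held by reclaim) plus the mapping.
 * Returns true when the page should be kept rather than added to swap.
 */
static bool has_extra_refs(const struct page_model *p)
{
    assert(p != NULL);
    return p->mapcount == 1 && p->refcount > 2;
}
```

With these assumptions, a page with mapcount 1 and refcount 2 is not held back, while mapcount 1 and refcount 3 is. A page with more than one mapper falls outside this test entirely, whatever its reference count.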
But I really think that there are a number of other commits you're missing too, because we had a whole series for THP fixes for the same exact issue.
Added Peter Xu to the cc, because he probably tracked those issues better than I did.
So NAK on this for now, I think this limited patch-set likely introduces more problems than it fixes.
Thanks for confirming my worries. I'll be happy to add additional backports if Peter can point me to them.
For full alignment with current upstream, I can think of at least the series below:
Early cow for general pages: https://lore.kernel.org/lkml/20200925222600.6832-1-peterx@redhat.com/
A race fix for copy_page and gup-fast: https://lore.kernel.org/linux-mm/0-v4-908497cf359a+4782-gup_fork_jgg@nvidia....
Early cow for hugetlbfs (which is very recent): https://lore.kernel.org/lkml/20210217233547.93892-1-peterx@redhat.com/
But I believe they'll bring a number of dependencies too, like the page pinned work; so it does not seem easy.
Thanks Peter. Let me try backporting these and I'll see if it's doable.
Btw, AFAICT you don't need patch 4/5 in this series for 4.14/4.19, since it's only for uffd-wp, which doesn't exist until 5.7.
Got it. Will drop it from the next series. Thanks, Suren.
Thanks,
-- Peter Xu