Hi David,
On 8/4/2023 3:31 PM, David Hildenbrand wrote:
On 04.08.23 02:17, Yin, Fengwei wrote:
On 8/4/2023 7:38 AM, Yu Zhao wrote:
On Thu, Aug 3, 2023 at 5:27 PM Yin, Fengwei fengwei.yin@intel.com wrote:
On 8/4/2023 4:46 AM, Yu Zhao wrote:
On Wed, Aug 2, 2023 at 6:56 AM Yin, Fengwei fengwei.yin@intel.com wrote:
"
On 8/2/2023 8:49 PM, Ryan Roberts wrote: > On 02/08/2023 13:42, Yin, Fengwei wrote: >> >> >> On 8/2/2023 8:40 PM, Ryan Roberts wrote: >>> On 02/08/2023 13:35, Yin, Fengwei wrote: >>>> >>>> >>>> On 8/2/2023 6:27 PM, Ryan Roberts wrote: >>>>> On 28/07/2023 17:13, Yin Fengwei wrote: >>>>>> In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(), >>>>>> folio_mapcount() is used to check whether the folio is shared. But it's >>>>>> not correct as folio_mapcount() returns total mapcount of large folio. >>>>>> >>>>>> Use folio_estimated_sharers() here as the estimated number is enough. >>>>>> >>>>>> Yin Fengwei (2): >>>>>> madvise: don't use mapcount() against large folio for sharing check >>>>>> madvise: don't use mapcount() against large folio for sharing check >>>>>> >>>>>> mm/huge_memory.c | 2 +- >>>>>> mm/madvise.c | 6 +++--- >>>>>> 2 files changed, 4 insertions(+), 4 deletions(-) >>>>>> >>>>> >>>>> As a set of fixes, I agree this is definitely an improvement, so: >>>>> >>>>> Reviewed-By: Ryan Roberts >>>> Thanks. >>>> >>>>> >>>>> >>>>> But I have a couple of comments around further improvements; >>>>> >>>>> Once we have the scheme that David is working on to be able to provide precise >>>>> exclusive vs shared info, we will probably want to move to that. Although that >>>>> scheme will need access to the mm_struct of a process known to be mapping the >>>>> folio. We have that info, but its not passed to folio_estimated_sharers() so we >>>>> can't just reimplement folio_estimated_sharers() - we will need to rework these >>>>> call sites again. >>>> Yes. This could be extra work. Maybe should delay till David's work is done. >>> >>> What you have is definitely an improvement over what was there before. And is >>> probably the best we can do without David's scheme. So I wouldn't delay this. >>> Just pointing out that we will be able to make it even better later on (if >>> David's stuff goes in). >> Yes. I agree that we should wait for David's work ready and do fix based on that. > > I was suggesting the opposite - not waiting. Then we can do separate improvement > later. Let's wait for David's work ready.
Waiting is fine as long as we don't miss the next merge window -- we don't want these two bugs to get into another release. Also I think we should cc stable, since as David mentioned, they have been causing selftest failures.
Stable was CCed.
Need to add the "Cc: stable@vger.kernel.org" tag: Documentation/process/stable-kernel-rules.rst
OK. Thanks for clarification. I totally mis-understanded this. :).
I'd like to wait for answer from Andrew whether these patches are suitable for stable (I suppose you think so) branch.
Note that the COW test does not fail -- it skips -- but the behavir changed:
$ ./cow # [INFO] detected THP size: 2048 KiB # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB # [INFO] huge zeropage is enabled TAP version 13 1..190 # [INFO] Anonymous memory tests in private mappings # [RUN] Basic COW after fork() ... with base page ok 1 No leak from parent into child # [RUN] Basic COW after fork() ... with swapped out base page ok 2 No leak from parent into child # [RUN] Basic COW after fork() ... with THP ok 3 No leak from parent into child # [RUN] Basic COW after fork() ... with swapped-out THP ok 4 No leak from parent into child # [RUN] Basic COW after fork() ... with PTE-mapped THP ok 5 No leak from parent into child # [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP ok 6 # SKIP MADV_PAGEOUT did not work, is swap enabled? # [RUN] Basic COW after fork() ... with single PTE of THP ok 7 No leak from parent into child # [RUN] Basic COW after fork() ... with single PTE of swapped-out THP ok 8 No leak from parent into child # [RUN] Basic COW after fork() ... with partially mremap()'ed THP ok 9 No leak from parent into child # [RUN] Basic COW after fork() ... with partially shared THP ok 10 No leak from parent into child ...
Observe how patch #6 skips because the MADV_PAGEOUT was not effective (which might have happened due to other reasons as well, thus no failure).
The code that broke it is
commit 07e8c82b5eff8ef34b74210eacb8d9c4a2886b82 Author: Vishal Moola (Oracle) vishal.moola@gmail.com Date: Wed Dec 21 10:08:46 2022 -0800
madvise: convert madvise_cold_or_pageout_pte_range() to use folios This change removes a number of calls to compound_head(), and saves 1729 bytes of kernel text. Link: https://lkml.kernel.org/r/20221221180848.20774-3-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) vishal.moola@gmail.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Cc: SeongJae Park sj@kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org
Ever since v6.3.
The simplest way to fix it would be to revert the page_mapcount() -> folio_mapcount(), conversion.
Probably all that is information worth having in the patch description.
Thanks a lot for checking this. I will try this patchset to see whether it can restore the behavior (I suppose so from your broken commit info but want to confirm).
Regards Yin, Fengwei