On 04.08.23 02:17, Yin, Fengwei wrote:
On 8/4/2023 7:38 AM, Yu Zhao wrote:
On Thu, Aug 3, 2023 at 5:27 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
On 8/4/2023 4:46 AM, Yu Zhao wrote:
On Wed, Aug 2, 2023 at 6:56 AM Yin, Fengwei <fengwei.yin@intel.com> wrote:
On 8/2/2023 8:49 PM, Ryan Roberts wrote:
On 02/08/2023 13:42, Yin, Fengwei wrote:
>
>
> On 8/2/2023 8:40 PM, Ryan Roberts wrote:
>> On 02/08/2023 13:35, Yin, Fengwei wrote:
>>>
>>>
>>> On 8/2/2023 6:27 PM, Ryan Roberts wrote:
>>>> On 28/07/2023 17:13, Yin Fengwei wrote:
>>>>> In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
>>>>> folio_mapcount() is used to check whether the folio is shared. But it's
>>>>> not correct as folio_mapcount() returns total mapcount of large folio.
>>>>>
>>>>> Use folio_estimated_sharers() here as the estimated number is enough.
>>>>>
>>>>> Yin Fengwei (2):
>>>>>   madvise: don't use mapcount() against large folio for sharing check
>>>>>   madvise: don't use mapcount() against large folio for sharing check
>>>>>
>>>>>  mm/huge_memory.c | 2 +-
>>>>>  mm/madvise.c     | 6 +++---
>>>>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>
>>>> As a set of fixes, I agree this is definitely an improvement, so:
>>>>
>>>> Reviewed-By: Ryan Roberts
>>> Thanks.
>>>
>>>>
>>>>
>>>> But I have a couple of comments around further improvements;
>>>>
>>>> Once we have the scheme that David is working on to be able to provide precise
>>>> exclusive vs shared info, we will probably want to move to that. Although that
>>>> scheme will need access to the mm_struct of a process known to be mapping the
>>>> folio. We have that info, but its not passed to folio_estimated_sharers() so we
>>>> can't just reimplement folio_estimated_sharers() - we will need to rework these
>>>> call sites again.
>>> Yes. This could be extra work. Maybe should delay till David's work is done.
>>
>> What you have is definitely an improvement over what was there before. And is
>> probably the best we can do without David's scheme. So I wouldn't delay this.
>> Just pointing out that we will be able to make it even better later on (if
>> David's stuff goes in).
> Yes. I agree that we should wait for David's work ready and do fix based on that.
I was suggesting the opposite - not waiting. Then we can do separate improvement later.
Let's wait until David's work is ready.
Waiting is fine as long as we don't miss the next merge window -- we don't want these two bugs to get into another release. Also I think we should cc stable, since as David mentioned, they have been causing selftest failures.
Stable was CCed.
You need to add the "Cc: stable@vger.kernel.org" tag; see Documentation/process/stable-kernel-rules.rst
OK. Thanks for the clarification. I totally misunderstood this. :)
I'd like to wait for an answer from Andrew on whether these patches are suitable for the stable branch (I suppose you think so).
Note that the COW test does not fail -- it skips -- but the behavior changed:
$ ./cow
# [INFO] detected THP size: 2048 KiB
# [INFO] detected hugetlb page size: 2048 KiB
# [INFO] detected hugetlb page size: 1048576 KiB
# [INFO] huge zeropage is enabled
TAP version 13
1..190
# [INFO] Anonymous memory tests in private mappings
# [RUN] Basic COW after fork() ... with base page
ok 1 No leak from parent into child
# [RUN] Basic COW after fork() ... with swapped out base page
ok 2 No leak from parent into child
# [RUN] Basic COW after fork() ... with THP
ok 3 No leak from parent into child
# [RUN] Basic COW after fork() ... with swapped-out THP
ok 4 No leak from parent into child
# [RUN] Basic COW after fork() ... with PTE-mapped THP
ok 5 No leak from parent into child
# [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP
ok 6 # SKIP MADV_PAGEOUT did not work, is swap enabled?
# [RUN] Basic COW after fork() ... with single PTE of THP
ok 7 No leak from parent into child
# [RUN] Basic COW after fork() ... with single PTE of swapped-out THP
ok 8 No leak from parent into child
# [RUN] Basic COW after fork() ... with partially mremap()'ed THP
ok 9 No leak from parent into child
# [RUN] Basic COW after fork() ... with partially shared THP
ok 10 No leak from parent into child
...
Observe how test #6 skips because MADV_PAGEOUT was not effective (which might have happened for other reasons as well, hence a skip rather than a failure).
The code that broke it is
commit 07e8c82b5eff8ef34b74210eacb8d9c4a2886b82
Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Date:   Wed Dec 21 10:08:46 2022 -0800

    madvise: convert madvise_cold_or_pageout_pte_range() to use folios

    This change removes a number of calls to compound_head(), and saves
    1729 bytes of kernel text.

    Link: https://lkml.kernel.org/r/20221221180848.20774-3-vishal.moola@gmail.com
    Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ever since v6.3.
The simplest way to fix it would be to revert the page_mapcount() -> folio_mapcount() conversion.
Probably all that is information worth having in the patch description.
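
For anyone following along, here is a rough user-space model of why the sharing check misfires for a PTE-mapped THP. It is only a sketch and not kernel code: struct model_folio and all the model_*() / *_check_skips() helpers are hypothetical names made up for illustration. It assumes, as described in the cover letter, that folio_mapcount() returns the total mapcount of the large folio, and (as far as I understand the current implementation) that folio_estimated_sharers() looks only at the mapcount of the first subpage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 512	/* 2 MiB THP made of 4 KiB pages */

/* Hypothetical stand-in for a folio; NOT the kernel's struct folio. */
struct model_folio {
	size_t nr_pages;			/* subpages in the folio */
	int entire_mapcount;			/* PMD (whole-folio) mappings */
	int pte_mapcount[MODEL_MAX_PAGES];	/* PTE mappings per subpage */
};

/* Model one process mapping `count` subpages via PTEs, starting at `first`. */
void model_map_ptes(struct model_folio *f, size_t first, size_t count)
{
	assert(f->nr_pages <= MODEL_MAX_PAGES);
	assert(first + count <= f->nr_pages);
	for (size_t i = first; i < first + count; i++)
		f->pte_mapcount[i]++;
}

/* Roughly what page_mapcount() reports for one subpage. */
int model_page_mapcount(const struct model_folio *f, size_t idx)
{
	return f->entire_mapcount + f->pte_mapcount[idx];
}

/* Roughly what folio_mapcount() reports: the total over all subpages. */
int model_folio_mapcount(const struct model_folio *f)
{
	int total = f->entire_mapcount;

	for (size_t i = 0; i < f->nr_pages; i++)
		total += f->pte_mapcount[i];
	return total;
}

/* Assumed behavior of folio_estimated_sharers(): first subpage only. */
int model_estimated_sharers(const struct model_folio *f)
{
	return model_page_mapcount(f, 0);
}

/* Sharing check as it looks after the folio conversion. */
bool old_check_skips(const struct model_folio *f)
{
	return model_folio_mapcount(f) != 1;
}

/* Sharing check as proposed in this series. */
bool new_check_skips(const struct model_folio *f)
{
	return model_estimated_sharers(f) != 1;
}
```

With a 512-page folio that a single process maps entirely through PTEs, model_folio_mapcount() is 512, so the old check treats the folio as shared and skips it, while the estimate is 1 and the new check lets it through. In this model that is the situation test #6 above runs into. Once a second process maps the folio, both checks skip it.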