Re: [PATCH 6.1.y 6.6.y 0/3] mm/filemap: fix page cache corruption with large folios

23 Mar 2025

On Sat, Mar 22, 2025 at 11:54 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
...
> On Sat, 22 Mar 2025 at 05:17, Yafang Shao <laoar.shao@gmail.com> wrote:
...
> > At this point, XFS large folios appear to be unreliable in the 6.1.y
> > stable kernel.
>
> I suspect it's a bad idea to start using large folios on stable
> kernels.

It seems that way. Since the 6.1.y stable branch continues to enable
XFS large folios after the page cache corruption issue was resolved,
we considered it safe to keep the feature enabled. As a result, we did
not revert the problematic commit after applying this patch series.

...
> Even with the page cache corruption fix, 6.1 is old enough
> that I don't know what other fixes have happened since.
>
> It's not like the large folio code has been _hugely_ problematic, but
> there have definitely been various small fixes related to it, and maybe
> some of them have missed stable.
>
> So I think stable should revert the "turn on large folios" in general.

I will send a revert of commit 6795801366da ('xfs: Support large
folios') to the 6.1.y stable.

...
> That said:
...
> > We would appreciate any suggestions, such as adding debug messages to
> > the kernel source code, to help us diagnose the root cause.
>
> I think the first thing to do - if you can - is to make sure that a
> much more *current* kernel actually is ok.
>
> Without a consistent reproducer it's going to be hard to really bisect
> things, but the first step should be to make sure it's not some new
> kind of issue that happens to be unique to what you do.
>
> By "current" I don't necessarily mean "very latest" - 6.14 is going to
> be released this weekend - but certainly something much more recent
> than 6.1-stable.
>
> Because while the stable trees obviously collect modern fixes, subtler
> issues can easily fall through if people don't realize how important a
> particular fix was. Sometimes the "obvious cleanup patches" end up
> fixing things unintentionally just by making the code more
> straightforward and correcting something in the process.
>
> Without any real clues outside of "corruption", it's hard to even
> guess whether it's core MM or VFS code, or some XFS-specific thing.
> There has been large folio work in all three areas.
>
> > This issue is particularly challenging to diagnose because there are
> > no warnings in the kernel log, and the kernel continues to function
> > perfectly fine even after the application core dump occurs.
...
> So I suspect unless somebody has something in mind, "bisect it" to at
> least partially narrow it down would be the only thing to do.
>
> Bisecting to one particular commit obviously is the best scenario, but
> even narrowing it down to "the issue still happens in 6.12, but is
> gone in 6.13" might help give people more of a place to start looking.

Thank you for your suggestion. I will give it a try, though it might
take some time since we haven't yet found a reliable way to reproduce
the issue.

-- 
Regards
Yafang