On Tue, Feb 13, 2024 at 02:18:10PM +0530, Charan Teja Kalla wrote:
An anon THP page is first added to the swap cache before it is reclaimed. Initially, each tail page contains the proper swap entry value (stored in the ->private field), which is filled in by add_to_swap_cache(). After a THP page sitting in the swap cache is migrated, only the swap entry of the head page is filled in (see folio_migrate_mapping()).
Now, when an attempt is made to split this page (one case is when the page is migrated again, see migrate_pages()->try_split_thp()), the ->private fields of the tail pages do not hold proper swap entry values. When such a tail page is then freed, delete_from_swap_cache() is called as part of that. It operates on the wrong swap cache index and replaces the entry at that wrong index with a shadow/NULL value, after which the page is freed.
This leaves the swap cache containing a freed page. The issue can manifest in many forms; the most commonly observed one is an RCU stall during swapin (see mapping_get_entry()).
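The sequence described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every toy_* name, the 4-subpage "THP", the 16-slot array standing in for the swap cache, and the fill_tails switch are hypothetical, and the repaired variant simply shows what correct tail values would give, not necessarily how [2] achieves it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_SUBPAGES 4   /* hypothetical: a real THP has far more subpages */
#define TOY_CACHE_SLOTS 16  /* hypothetical stand-in for the swap cache */

struct toy_page {
	unsigned long private;  /* models the swap entry kept in ->private */
};

static struct toy_page *toy_swap_cache[TOY_CACHE_SLOTS];

/* Models add_to_swap_cache(): every subpage gets its own entry value. */
static void toy_add_to_swap_cache(struct toy_page *thp, unsigned long base)
{
	for (unsigned long i = 0; i < TOY_NR_SUBPAGES; i++) {
		thp[i].private = base + i;
		toy_swap_cache[base + i] = &thp[i];
	}
}

/*
 * Models migration. With fill_tails == 0 only the head's ->private is
 * carried over, which is the behaviour described in the report.
 */
static void toy_migrate(struct toy_page *dst, const struct toy_page *src,
			int fill_tails)
{
	unsigned long base = src[0].private;

	dst[0].private = base;
	for (unsigned long i = 0; i < TOY_NR_SUBPAGES; i++) {
		if (i > 0)
			dst[i].private = fill_tails ? base + i : 0;
		toy_swap_cache[base + i] = &dst[i];
	}
}

/* Models delete_from_swap_cache(): clears the slot named by ->private. */
static void toy_delete_from_swap_cache(const struct toy_page *page)
{
	toy_swap_cache[page->private] = NULL;
}

/* Returns how many slots still point at tail pages after they are "freed". */
int toy_run(int fill_tails)
{
	struct toy_page src[TOY_NR_SUBPAGES] = { { 0 } };
	struct toy_page dst[TOY_NR_SUBPAGES] = { { 0 } };
	const unsigned long base = 4;
	int stale = 0;

	assert(base + TOY_NR_SUBPAGES <= TOY_CACHE_SLOTS);
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++)
		toy_swap_cache[i] = NULL;

	toy_add_to_swap_cache(src, base);
	toy_migrate(dst, src, fill_tails);

	/* After a split, each tail page is freed on its own. */
	for (size_t i = 1; i < TOY_NR_SUBPAGES; i++)
		toy_delete_from_swap_cache(&dst[i]);

	for (size_t i = 1; i < TOY_NR_SUBPAGES; i++)
		if (toy_swap_cache[base + i] != NULL)
			stale++;

	/* Do not leave pointers to the local arrays behind. */
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++)
		toy_swap_cache[i] = NULL;

	return stale;
}
```

In this model, toy_run(0) leaves every tail slot pointing at a page that has already been freed, because each deletion clears slot 0 instead of the page's real slot, while toy_run(1) leaves none behind.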
On recent kernels, this issue is indirectly fixed by the series [1], specifically by [2].
Then why can we not take that series? Taking one-off patches almost ALWAYS causes future problems. What are you going to do to prevent that here (merge and logic problems)?
An attempt to backport this series ran into many merge conflicts, and the series also seems to depend on many other changes. As backporting it to the LTS branches is not trivial, a change similar to [2] is picked as the fix.
[1] https://lore.kernel.org/all/20230821160849.531668-1-david@redhat.com/
[2] https://lore.kernel.org/all/20230821160849.531668-5-david@redhat.com/
Again, please try to take the original series, ESPECIALLY for stuff in -mm which is tricky and likely to blow up in odd ways in the future.
So I will not take this unless the -mm maintainers agree it really is the only way forward.
thanks,
greg k-h