Re: [PATCH v2] mm/swap: fix race when skipping swapcache

19 Feb 2024


      Kairui Song ryncsn@gmail.com writes:
...
On Thu, Feb 8, 2024 at 2:36 PM Huang, Ying ying.huang@intel.com wrote:
...
Kairui Song ryncsn@gmail.com writes:
...
On Thu, Feb 8, 2024 at 2:31 AM Minchan Kim minchan@kernel.org wrote:
...
On Wed, Feb 07, 2024 at 12:06:15PM +0800, Kairui Song wrote:
[snip]
...
...
...
So I think the thing is, it's getting complex because this patch
wanted to make it simple and just reuse the swap cache flags.
I agree that a simple fix would be the important at this point.
Considering your description, here's my understanding of the other idea:
Other method, such as increasing the swap count, haven't proven effective
in your tests. The approach risk forcing racers to rely on the swap cache
again and the potential performance loss in race scenario.
While I understand that simplicity is important, and performance loss
in this case may be infrequent, I believe swap_count approach could be a
suitable solution. What do you think?
Hi Minchan
Yes, my main concern was about simplicity and performance.
Increasing swap_count here will also race with another process from
releasing swap_count to 0 (swapcache was able to sync callers in other
call paths but we skipped swapcache here).
What is the consequence of the race condition?
Hi Ying,
It will increase the swap count of an already freed entry, this race
with multiple swap free/alloc logic that checks if count ==
SWAP_HAS_CACHE or sets count to zero, or repeated free of an entry,
all result in random corruption of the swap map. This happens a lot
during stress testing.
You are right!  Thanks for explanation.
--
Best Regards,
Huang, Ying
...
...
...
So the right step is: 1. Lock the cluster/swap lock; 2. Check if still
have swap_count == 1, bail out if not; 3. Set it to 2;
__swap_duplicate can be modified to support this, it's similar to
existing logics for SWAP_HAS_CACHE.
And swap freeing path will do more things, swapcache clean up needs to
be handled even in the bypassing path since the racer may add it to
swapcache.
Reusing SWAP_HAS_CACHE seems to make it much simpler and avoided many
overhead, so I used that way in this patch, the only issue is
potentially repeated page faults now.
I'm currently trying to add a SWAP_MAP_LOCK (or SWAP_MAP_SYNC, I'm bad
at naming it) special value, so any racer can just spin on it to avoid
all the problems, how do you think about this?
Let's try some simpler method firstly.
Another simpler idea is, add a schedule() or
schedule_timeout_uninterruptible(1) in the swapcache_prepare failure
path before goto out (just like __read_swap_cache_async). I think this
should ensure in almost all cases, PTE is ready after it returns, also
yields more CPU.

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v2] mm/swap: fix race when skipping swapcache