On Tue, Nov 11, 2025 at 5:56 PM Huang, Ying <ying.huang@linux.alibaba.com> wrote:
Kairui Song <ryncsn@gmail.com> writes:
From: Kairui Song <kasong@tencent.com>
Since commit 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning"), the common helper for allocating and preparing a folio in the swap cache layer no longer tries to get a swap device reference internally, because all callers of __read_swap_cache_async are already holding a swap entry reference. The repeated swap device pinning isn't needed on the same swap device.
The caller of VMA readahead also holds a reference to the target entry's swap device, but VMA readahead walks the page table, so it might encounter swap entries from other devices and call __read_swap_cache_async on another device without holding a reference to it.
So a UAF is possible when swapoff of device A races with swapin on device B, and VMA readahead tries to read swap entries from device A. It's not easy to trigger, but in theory, it could cause real issues.
Make VMA readahead try to get the device reference first if the swap device is a different one from the target entry.
Cc: stable@vger.kernel.org
Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Sending as a new patch instead of V2 because the approach is very different.
Previous patch: https://lore.kernel.org/linux-mm/20251110-revert-78524b05f1a3-v1-1-88313f2b9...
 mm/swap_state.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 0cf9853a9232..da0481e163a4 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -745,6 +745,7 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,

 	blk_start_plug(&plug);
 	for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
+		struct swap_info_struct *si = NULL;
 		softleaf_t entry;

 		if (!pte++) {
@@ -759,8 +760,19 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 			continue;
 		pte_unmap(pte);
 		pte = NULL;
+		/*
+		 * Readahead entry may come from a device that we are not
+		 * holding a reference to, try to grab a reference, or skip.
+		 */
+		if (swp_type(entry) != swp_type(targ_entry)) {
+			si = get_swap_device(entry);
+			if (!si)
+				continue;
+		}
 		folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 						&page_allocated, false);
+		if (si)
+			put_swap_device(si);
 		if (!folio)
 			continue;
 		if (page_allocated) {

Personally, I prefer to call put_swap_device() after all swap operations on the swap entry, that is, after possible swap_read_folio() and folio_put() in the loop to make it easier to follow the get/put_swap_device() rule. But I understand that it will make

	if (!folio)
		continue;

to use 'goto' and introduce more change. So, it's up to you to decide whether to do that.
Personally, I prefer to keep put_swap_device() in its current location, closer to the matching get_swap_device(). To me that is simpler: I don't need to reason about the other branch-out conditions. Those error-handling branches are very error prone, and I have made enough mistakes with goto-based branch handling in the past. The si reference is only needed for __read_swap_cache_async() anyway.
Moving it to the end also works; it just takes more brain power to reason about.
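
For readers following along, the two placements being discussed can be compared in a small userspace sketch. This is not kernel code: struct pin, pin_get(), pin_put(), step_paired() and step_deferred() are hypothetical names made up for illustration, and a plain counter stands in for the swap device reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference counter standing in for a pinned swap device. */
struct pin { int refs; };

static void pin_get(struct pin *p) { p->refs++; }
static void pin_put(struct pin *p) { assert(p->refs > 0); p->refs--; }

/* Style A: drop the reference right after the only call that needs it. */
static bool step_paired(struct pin *p, bool need_pin, bool read_ok)
{
	bool pinned = false;

	if (need_pin) {
		pin_get(p);
		pinned = true;
	}
	/* stand-in for the __read_swap_cache_async() call */
	bool got_folio = read_ok;
	if (pinned)
		pin_put(p);
	if (!got_folio)
		return false;	/* early exit needs no cleanup */
	/* ... later work on the folio would go here ... */
	return true;
}

/* Style B: hold the reference to the end; every exit goes through a label. */
static bool step_deferred(struct pin *p, bool need_pin, bool read_ok)
{
	bool pinned = false;
	bool ret = false;

	if (need_pin) {
		pin_get(p);
		pinned = true;
	}
	/* stand-in for the __read_swap_cache_async() call */
	bool got_folio = read_ok;
	if (!got_folio)
		goto out;	/* a bare return here would leak the pin */
	ret = true;
out:
	if (pinned)
		pin_put(p);
	return ret;
}
```

Both variants leave the counter balanced on every path. The difference is only in how many exits have to be audited: style A has nothing to undo after the put, while style B requires each early exit to be routed through the label.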
Otherwise, LGTM, Thanks for doing this! Feel free to add my
Reviewed-by: Huang Ying ying.huang@linux.alibaba.com
Thank you for the review.
Chris
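
As a recap of what the patch does, here is a minimal userspace model of the conditional pinning, assuming a toy device table. struct mock_dev, mock_get(), mock_put() and mock_readahead() are hypothetical stand-ins and not kernel APIs; only the shape of the get/skip/put logic mirrors the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a swap device; not the kernel struct. */
struct mock_dev {
	bool alive;	/* false once "swapoff" has run */
	int refs;	/* outstanding references */
	int reads;	/* readahead reads issued against this device */
};

/* Mimics the contract of get_swap_device(): NULL if the device is gone. */
static struct mock_dev *mock_get(struct mock_dev *devs, int type)
{
	if (!devs[type].alive)
		return NULL;
	devs[type].refs++;
	return &devs[type];
}

static void mock_put(struct mock_dev *d)
{
	assert(d->refs > 0);
	d->refs--;
}

/*
 * Walk a list of entry device types. The caller already pins target_type;
 * any other device is pinned around the read, or the entry is skipped.
 * Returns the number of entries actually read.
 */
static int mock_readahead(struct mock_dev *devs, const int *types, size_t n,
			  int target_type)
{
	int done = 0;

	for (size_t i = 0; i < n; i++) {
		struct mock_dev *si = NULL;

		if (types[i] != target_type) {
			si = mock_get(devs, types[i]);
			if (!si)
				continue;	/* device went away, skip entry */
		}
		devs[types[i]].reads++;	/* stand-in for the actual read */
		done++;
		if (si)
			mock_put(si);
	}
	return done;
}
```

In this model, entries on a device that has already been swapped off are skipped rather than read, and the extra reference on a foreign device is held only for the duration of the read.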