From: Miaohe Lin linmiaohe@huawei.com
[ Upstream commit 2799e77529c2a25492a4395db93996e3dacd762d ]
When I was investigating the swap code, I found the below possible race window:
CPU 1 CPU 2 ----- ----- do_swap_page if (data_race(si->flags & SWP_SYNCHRONOUS_IO) swap_readpage if (data_race(sis->flags & SWP_FS_OPS)) { swapoff .. p->swap_file = NULL; .. struct file *swap_file = sis->swap_file; struct address_space *mapping = swap_file->f_mapping;[oops!]
Note that for the pages that are swapped in through swap cache, this isn't an issue. Because the page is locked, and the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() can not proceed until the page has been unlocked.
Fix this race by using get/put_swap_device() to guard against concurrent swapoff.
Link: https://lkml.kernel.org/r/20210426123316.806267-3-linmiaohe@huawei.com Fixes: 0bcac06f27d7 ("mm,swap: skip swapcache for swapin of synchronous device") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Alex Shi alexs@kernel.org Cc: David Hildenbrand david@redhat.com Cc: Dennis Zhou dennis@kernel.org Cc: Hugh Dickins hughd@google.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Matthew Wilcox willy@infradead.org Cc: Michal Hocko mhocko@suse.com Cc: Minchan Kim minchan@kernel.org Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Wei Yang richard.weiyang@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: Yu Zhao yuzhao@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/swap.h | 9 +++++++++ mm/memory.c | 11 +++++++++-- 2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h index 4cc6ec3bf0ab..7482f8b968ea 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -504,6 +504,15 @@ static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry) return NULL; }
+static inline struct swap_info_struct *get_swap_device(swp_entry_t entry) +{ + return NULL; +} + +static inline void put_swap_device(struct swap_info_struct *si) +{ +} + #define swap_address_space(entry) (NULL) #define get_nr_swap_pages() 0L #define total_swap_pages 0L diff --git a/mm/memory.c b/mm/memory.c index 36624986130b..e0073089bc9f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3310,6 +3310,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct page *page = NULL, *swapcache; + struct swap_info_struct *si = NULL; swp_entry_t entry; pte_t pte; int locked; @@ -3337,14 +3338,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) goto out; }
+ /* Prevent swapoff from happening to us. */ + si = get_swap_device(entry); + if (unlikely(!si)) + goto out;
delayacct_set_flag(DELAYACCT_PF_SWAPIN); page = lookup_swap_cache(entry, vma, vmf->address); swapcache = page;
if (!page) { - struct swap_info_struct *si = swp_swap_info(entry); - if (data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1) { /* skip swapcache */ @@ -3515,6 +3518,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) unlock: pte_unmap_unlock(vmf->pte, vmf->ptl); out: + if (si) + put_swap_device(si); return ret; out_nomap: pte_unmap_unlock(vmf->pte, vmf->ptl); @@ -3526,6 +3531,8 @@ out_release: unlock_page(swapcache); put_page(swapcache); } + if (si) + put_swap_device(si); return ret; }