Hello.
On pondělí 19. července 2021 13:12:58 CEST Miaohe Lin wrote:
On 2021/7/19 18:14, Boqun Feng wrote:
On Mon, Jul 19, 2021 at 03:43:00AM +0100, Matthew Wilcox wrote:
On Mon, Jul 19, 2021 at 10:24:18AM +0800, Zhouyi Zhou wrote:
Meanwhile, I examined the 5.12.17 by naked eye, and found a suspicious place that could possibly trigger that problem:
struct swap_info_struct *get_swap_device(swp_entry_t entry) {
struct swap_info_struct *si; unsigned long offset; if (!entry.val) goto out;
si = swp_swap_info(entry); if (!si) goto bad_nofile; rcu_read_lock(); if (data_race(!(si->flags & SWP_VALID))) goto unlock_out; offset = swp_offset(entry); if (offset >= si->max) goto unlock_out; return si;
bad_nofile: pr_err("%s: %s%08lx\n", __func__, Bad_file, entry.val);
out: return NULL;
unlock_out: rcu_read_unlock(); return NULL;
} I guess the function "return si" without a rcu_read_unlock.
Yes, but the caller is supposed to call put_swap_device() which calls rcu_read_unlock(). See commit eb085574a752.
Right, but we need to make sure there is no sleepable function called before put_swap_device() called, and the call trace showed the following
happened: do_swap_page(): si = get_swap_device(): rcu_read_lock(); lock_page_or_retry(): might_sleep(); // call a sleepable function inside RCU read-side
c.s.
__lock_page_or_retry(): wait_on_page_bit_common(): schedule(): rcu_note_context_switch(); // Warn here put_swap_device(); rcu_read_unlock();
, which introduced by commit 2799e77529c2a
When in the commit 2799e77529c2a, we're using the percpu_ref to serialize against concurrent swapoff, i.e. there's percpu_ref inside get_swap_device() instead of rcu_read_lock(). Please see commit 63d8620ecf93 ("mm/swapfile: use percpu_ref to serialize against concurrent swapoff") for detail.
The problem here is that 2799e77529c2a got pulled into stable, but 63d8620ecf93 was not pulled. Are you suggesting that 63d8620ecf93 should be pulled into the stable kernel as well?
Thanks.