On Mon, Jul 19, 2021 at 10:24 AM Zhouyi Zhou zhouzhouyi@gmail.com wrote:
On Mon, Jul 19, 2021 at 9:53 AM Paul E. McKenney paulmck@kernel.org wrote:
On Sun, Jul 18, 2021 at 11:51:36PM +0100, Matthew Wilcox wrote:
On Sun, Jul 18, 2021 at 02:59:14PM -0700, Paul E. McKenney wrote:
https://lore.kernel.org/lkml/CAK2bqVK0Q9YcpakE7_Rc6nr-E4e2GnMOgi5jJj=_Eh_1kEHLHA@mail.gmail.com/
But this one does show this warning in v5.12.17:
WARN_ON_ONCE(!preempt && rcu_preempt_depth() > 0);
This is in rcu_note_context_switch(), and could be caused by something like a schedule() within an RCU read-side critical section. This would of course be an RCU-usage bug, given that you are not permitted to block within an RCU read-side critical section.
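To make the condition behind that warning concrete, here is a minimal userspace sketch, not kernel code. The names model_rcu_read_lock(), model_rcu_read_unlock(), model_note_context_switch() and the model_depth counter are stand-ins invented for illustration; they only mimic the nesting count that rcu_preempt_depth() reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task RCU read-side nesting depth. */
static int model_depth;

static void model_rcu_read_lock(void)
{
	model_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_depth > 0);	/* an unlock needs a matching lock */
	model_depth--;
}

/*
 * Models the quoted check: returns true when the warning would fire,
 * that is, on a voluntary (non-preempt) context switch while still
 * inside a read-side critical section.
 */
static bool model_note_context_switch(bool preempt)
{
	return !preempt && model_depth > 0;
}
```

In this model, a voluntary switch (preempt == false) while the counter is nonzero trips the check, and a preemption does not. That matches the point above: blocking inside a reader is the bug, being preempted is not.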
I suggest checking the functions in the stack trace to see where the rcu_read_lock() is hiding. CONFIG_PROVE_LOCKING might also be helpful.
I'm not sure I see it in this stack trace.
Is it possible that there's something taking the rcu read lock in an interrupt handler, then returning from the interrupt handler without releasing the rcu lock? Do we have debugging that would fire if somebody did this?
Lockdep should complain, but in the absence of lockdep I don't know that anything would gripe in this situation.
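One way to picture the kind of check being asked about, purely as a userspace sketch and not as an existing kernel facility: snapshot the read-side nesting depth on handler entry and compare it on exit. Every name here (irq_model_depth, leaky_handler(), balanced_handler(), handler_leaks_rcu()) is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical read-side nesting counter. */
static int irq_model_depth;

static void irq_model_rcu_read_lock(void)
{
	irq_model_depth++;
}

static void irq_model_rcu_read_unlock(void)
{
	irq_model_depth--;
}

/* A handler that forgets to drop the read lock. */
static void leaky_handler(void)
{
	irq_model_rcu_read_lock();
}

/* A handler that pairs lock and unlock. */
static void balanced_handler(void)
{
	irq_model_rcu_read_lock();
	irq_model_rcu_read_unlock();
}

/*
 * Runs a handler and reports whether it returned with a different
 * nesting depth than it started with.
 */
static bool handler_leaks_rcu(void (*handler)(void))
{
	int before = irq_model_depth;

	handler();
	return irq_model_depth != before;
}
```

A leak of this sort would leave the interrupted task with a nonzero depth, so its next voluntary context switch would hit the warning even though the task itself never took the lock.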
I think Lockdep should complain. Meanwhile, I examined 5.12.17 by eye and found a suspicious place
I examined 5.13.2; the unpaired rcu_read_lock() is still there.
that could possibly trigger that problem:
struct swap_info_struct *get_swap_device(swp_entry_t entry)
{
	struct swap_info_struct *si;
	unsigned long offset;

	if (!entry.val)
		goto out;
	si = swp_swap_info(entry);
	if (!si)
		goto bad_nofile;

	rcu_read_lock();
	if (data_race(!(si->flags & SWP_VALID)))
		goto unlock_out;
	offset = swp_offset(entry);
	if (offset >= si->max)
		goto unlock_out;

	return si;
bad_nofile:
	pr_err("%s: %s%08lx\n", __func__, Bad_file, entry.val);
out:
	return NULL;
unlock_out:
	rcu_read_unlock();
	return NULL;
}

I guess the function can "return si" without an rcu_read_unlock().
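The observation can be reproduced in a small userspace model. This is a sketch only: struct model_si, model_get_device(), model_put_device() and the get_model_depth counter are made-up stand-ins. model_put_device() in particular is a hypothetical caller-side release, included only to show what would be needed to balance the success path.

```c
#include <stddef.h>

struct model_si {
	int valid;		/* stands in for the SWP_VALID flag */
	unsigned long max;
};

/* Hypothetical read-side nesting counter. */
static int get_model_depth;

/* Mirrors the control flow above: the success path returns with depth 1. */
static struct model_si *model_get_device(struct model_si *si,
					 unsigned long offset)
{
	if (!si)
		return NULL;

	get_model_depth++;		/* rcu_read_lock() in the original */
	if (!si->valid || offset >= si->max) {
		get_model_depth--;	/* the unlock_out path */
		return NULL;
	}
	return si;			/* no unlock on this path */
}

/* Hypothetical release that a caller would need after a successful get. */
static void model_put_device(struct model_si *si)
{
	(void)si;
	get_model_depth--;
}
```

In the model, the failure paths leave the counter at 0, while a successful call leaves it at 1 until the caller releases it. If any caller blocked before doing so, that would be the situation the WARN_ON_ONCE() describes.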
However, get_swap_device() has changed in the mainline tree; there is no rcu_read_lock() in it anymore.
Also, this is a preemptible kernel, so it is possible to trace __rcu_read_lock(), if that helps.
Thanx, Paul
Thanx Zhouyi