On Sat, Jan 09, 2021 at 03:05:32AM +0100, Frederic Weisbecker wrote:
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP kthread (rcuog) to be serviced.
Unfortunately the call to rcu_user_enter() is already past the last rescheduling opportunity before we resume to userspace or to guest mode. We may escape there with the woken task ignored.
The last resort to fix every callsite is to trigger a self-IPI (nohz_full depends on IRQ_WORK) that will trigger a reschedule on IRQ tail or guest exit.
Eventually every site that wants a saner treatment will need to carefully place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit need_resched() check upon resume.
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf)
Cc: stable@vger.kernel.org
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c        | 22 +++++++++++++++++++++-
 kernel/rcu/tree.h        |  2 +-
 kernel/rcu/tree_plugin.h | 25 ++++++++++++++++---------
 3 files changed, 38 insertions(+), 11 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b6e1377774e3..2920dfc9f58c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -676,6 +676,18 @@ void rcu_idle_enter(void)
 EXPORT_SYMBOL_GPL(rcu_idle_enter);

 #ifdef CONFIG_NO_HZ_FULL
+
+/*
+ * An empty function that will trigger a reschedule on
+ * IRQ tail once IRQs get re-enabled on userspace resume.
+ */
+static void late_wakeup_func(struct irq_work *work)
+{
+}
+
+static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
+	IRQ_WORK_INIT(late_wakeup_func);
+
 /**
  * rcu_user_enter - inform RCU that we are resuming userspace.
  *
@@ -692,9 +704,17 @@ noinstr void rcu_user_enter(void)
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);

 	lockdep_assert_irqs_disabled();
-	do_nocb_deferred_wakeup(rdp);
+	/*
+	 * We may be past the last rescheduling opportunity in the entry code.
+	 * Trigger a self IPI that will fire and reschedule once we resume to
+	 * user/guest mode.
+	 */
+	if (do_nocb_deferred_wakeup(rdp) && need_resched())
+		irq_work_queue(this_cpu_ptr(&late_wakeup_work));
+
 	rcu_eqs_enter(true);
 }
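
For readers following the thread, the control flow of the hunk above can be modelled in plain userspace C. This is only a toy sketch, not kernel code: struct cpu_model and every model_*() helper are hypothetical stand-ins for the per-CPU state, do_nocb_deferred_wakeup(), rcu_user_enter() and the IRQ tail. It also assumes that servicing the deferred wakeup always sets need_resched, which is not guaranteed in the real kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state. All names are hypothetical, for illustration only. */
struct cpu_model {
	bool irqs_disabled;           /* models the local IRQ state */
	bool deferred_wakeup_pending; /* an rcuog wakeup was deferred */
	bool need_resched;            /* models TIF_NEED_RESCHED */
	bool self_ipi_pending;        /* models the queued late_wakeup_work */
};

/* Models do_nocb_deferred_wakeup(): true if a wakeup was serviced. */
static bool model_deferred_wakeup(struct cpu_model *cpu)
{
	if (!cpu->deferred_wakeup_pending)
		return false;
	cpu->deferred_wakeup_pending = false;
	/* Assumption: the woken rcuog task should preempt the current one. */
	cpu->need_resched = true;
	return true;
}

/* Models the patched rcu_user_enter(), past the last resched check. */
static void model_rcu_user_enter(struct cpu_model *cpu)
{
	/* Stands in for lockdep_assert_irqs_disabled(). */
	assert(cpu->irqs_disabled);
	if (model_deferred_wakeup(cpu) && cpu->need_resched)
		cpu->self_ipi_pending = true; /* models irq_work_queue() */
}

/*
 * Models the resume to user/guest mode: IRQs get re-enabled, the self-IPI
 * fires and the IRQ tail reschedules. Returns true if a reschedule happened.
 */
static bool model_resume_user(struct cpu_model *cpu)
{
	cpu->irqs_disabled = false;
	if (!cpu->self_ipi_pending)
		return false;
	cpu->self_ipi_pending = false;
	cpu->need_resched = false; /* the scheduler ran on IRQ tail */
	return true;
}
```

In this model a CPU that enters with deferred_wakeup_pending set leaves model_rcu_user_enter() with self_ipi_pending set, and model_resume_user() then clears need_resched. A CPU with nothing pending never queues the self-IPI.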
Do we have the guarantee that every architecture that supports NOHZ_FULL implements arch_irq_work_raise()?
Also, can't you do the same thing you did earlier and do that wakeup before we complete exit_to_user_mode_prepare()?
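
To make the second question concrete, here is a rough userspace sketch of the alternative ordering, where the deferred wakeup is flushed before the last need_resched check so that no self-IPI is needed. It is a toy model under stated assumptions: struct exit_model and the model_*() helpers are hypothetical, model_flush_deferred_wakeup() only stands in for rcu_nocb_flush_deferred_wakeup(), and the loop is a heavily simplified picture of the exit-to-user work loop rather than the real exit_to_user_mode_prepare().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state for the exit path. All names are hypothetical. */
struct exit_model {
	bool deferred_wakeup_pending; /* an rcuog wakeup was deferred */
	bool need_resched;            /* models TIF_NEED_RESCHED */
	int schedule_calls;           /* how often the model rescheduled */
};

/* Stands in for rcu_nocb_flush_deferred_wakeup(). */
static void model_flush_deferred_wakeup(struct exit_model *m)
{
	if (m->deferred_wakeup_pending) {
		m->deferred_wakeup_pending = false;
		/* Assumption: the woken rcuog task should preempt us. */
		m->need_resched = true;
	}
}

/* Stands in for schedule(). */
static void model_schedule(struct exit_model *m)
{
	m->need_resched = false;
	m->schedule_calls++;
}

/*
 * Simplified exit-to-user loop: the flush happens before the final
 * need_resched check, so the woken task is noticed while a reschedule
 * is still possible.
 */
static void model_exit_to_user_prepare(struct exit_model *m)
{
	model_flush_deferred_wakeup(m);
	while (m->need_resched)
		model_schedule(m);
	/* Nothing is left over for a late self-IPI to handle. */
	assert(!m->deferred_wakeup_pending && !m->need_resched);
}
```

With a wakeup pending on entry, the model reschedules exactly once inside the loop and returns with both flags clear, which is the property the self-IPI otherwise has to provide after the fact.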