From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 62849a9612924a655c67cf6962920544aa5c20db ]
The kernel test robot triggered a warning with the following race: task-ctx A interrupt-ctx B worker -> process_one_work() -> work_item() -> schedule(); -> sched_submit_work() -> wq_worker_sleeping() -> ->sleeping = 1 atomic_dec_and_test(nr_running) __schedule(); *interrupt* async_page_fault() -> local_irq_enable(); -> schedule(); -> sched_submit_work() -> wq_worker_sleeping() -> if (WARN_ON(->sleeping)) return -> __schedule() -> sched_update_worker() -> wq_worker_running() -> atomic_inc(nr_running); -> ->sleeping = 0;
-> sched_update_worker() -> wq_worker_running() if (!->sleeping) return
In this context the warning is pointless everything is fine. An interrupt before wq_worker_sleeping() will perform the ->sleeping assignment (0 -> 1 > 0) twice. An interrupt after wq_worker_sleeping() will trigger the warning and nr_running will be decremented (by A) and incremented once (only by B, A will skip it). This is the case until the ->sleeping is zeroed again in wq_worker_running().
Remove the WARN statement because this condition may happen. Document that preemption around wq_worker_sleeping() needs to be disabled to protect ->sleeping and not just as an optimisation.
Fixes: 6d25be5782e48 ("sched/core, workqueues: Distangle worker accounting from rq lock") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Tejun Heo tj@kernel.org Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/core.c | 3 ++- kernel/workqueue.c | 6 ++++-- 2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 352239c411a44..79ce22de44095 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4199,7 +4199,8 @@ static inline void sched_submit_work(struct task_struct *tsk) * it wants to wake up a task to maintain concurrency. * As this function is called inside the schedule() context, * we disable preemption to avoid it calling schedule() again - * in the possible wakeup of a kworker. + * in the possible wakeup of a kworker and because wq_worker_sleeping() + * requires it. */ if (tsk->flags & PF_WQ_WORKER) { preempt_disable(); diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 1a0c224af6fb3..4aa268582a225 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -864,7 +864,8 @@ void wq_worker_running(struct task_struct *task) * @task: task going to sleep * * This function is called from schedule() when a busy worker is - * going to sleep. + * going to sleep. Preemption needs to be disabled to protect ->sleeping + * assignment. */ void wq_worker_sleeping(struct task_struct *task) { @@ -881,7 +882,8 @@ void wq_worker_sleeping(struct task_struct *task)
pool = worker->pool;
- if (WARN_ON_ONCE(worker->sleeping)) + /* Return if preempted before wq_worker_running() was reached */ + if (worker->sleeping) return;
worker->sleeping = 1;