From: Rik van Riel riel@surriel.com
commit cdea5459ce263fbc963657a7736762ae897a8ae6 upstream
The code in xlog_wait uses the spinlock to make adding the task to the wait queue, and setting the task state to UNINTERRUPTIBLE atomic with respect to the waker.
Doing the wakeup after releasing the spinlock opens up the following race condition:
Task 1 task 2 add task to wait queue wake up task set task state to UNINTERRUPTIBLE
This issue was found through code inspection as a result of kworkers being observed stuck in UNINTERRUPTIBLE state with an empty wait queue. It is rare and largely unreproducable.
Simply moving the spin_unlock to after the wake_up_all results in the waker not being able to see a task on the waitqueue before it has set its state to UNINTERRUPTIBLE.
This bug dates back to the conversion of this code to generic waitqueue infrastructure from a counting semaphore back in 2008 which didn't place the wakeups consistently w.r.t. to the relevant spin locks.
[dchinner: Also fix a similar issue in the shutdown path on xc_commit_wait. Update commit log with more details of the issue.]
Fixes: d748c62367eb ("[XFS] Convert l_flushsema to a sv_t") Reported-by: Chris Mason clm@fb.com Signed-off-by: Rik van Riel riel@surriel.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Cc: stable@vger.kernel.org # 4.9.x-4.19.x [modified for contextual change near xlog_state_do_callback()] Signed-off-by: Samuel Mendoza-Jonas samjonas@amazon.com Reviewed-by: Frank van der Linden fllinden@amazon.com Reviewed-by: Suraj Jitindar Singh surajjs@amazon.com Reviewed-by: Benjamin Herrenschmidt benh@amazon.com Reviewed-by: Anchal Agarwal anchalag@amazon.com --- This issue was fixed in v5.4 but didn't appear to make it to stable. The fixed commit goes back to v2.6, so backport this to stable kernels before v5.4. The only difference is a contextual change at xlog_state_do_callback(log, XFS_LI_ABORTED, NULL); Which in v5.4 is xlog_state_do_callback(log, true, NULL);
fs/xfs/xfs_log.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 360e32220f93..0c0f70e6c7d9 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -2684,7 +2684,6 @@ xlog_state_do_callback( int funcdidcallbacks; /* flag: function did callbacks */ int repeats; /* for issuing console warnings if * looping too many times */ - int wake = 0;
spin_lock(&log->l_icloglock); first_iclog = iclog = log->l_iclog; @@ -2886,11 +2885,9 @@ xlog_state_do_callback( #endif
if (log->l_iclog->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_IOERROR)) - wake = 1; - spin_unlock(&log->l_icloglock); - - if (wake) wake_up_all(&log->l_flush_wait); + + spin_unlock(&log->l_icloglock); }
@@ -4052,7 +4049,9 @@ xfs_log_force_umount( * item committed callback functions will do this again under lock to * avoid races. */ + spin_lock(&log->l_cilp->xc_push_lock); wake_up_all(&log->l_cilp->xc_commit_wait); + spin_unlock(&log->l_cilp->xc_push_lock); xlog_state_do_callback(log, XFS_LI_ABORTED, NULL);
#ifdef XFSERRORDEBUG
On Thu, Jul 30, 2020 at 02:35:07PM -0700, Samuel Mendoza-Jonas wrote:
From: Rik van Riel riel@surriel.com
commit cdea5459ce263fbc963657a7736762ae897a8ae6 upstream
The code in xlog_wait uses the spinlock to make adding the task to the wait queue, and setting the task state to UNINTERRUPTIBLE atomic with respect to the waker.
Doing the wakeup after releasing the spinlock opens up the following race condition:
Task 1 task 2 add task to wait queue wake up task set task state to UNINTERRUPTIBLE
This issue was found through code inspection as a result of kworkers being observed stuck in UNINTERRUPTIBLE state with an empty wait queue. It is rare and largely unreproducable.
Simply moving the spin_unlock to after the wake_up_all results in the waker not being able to see a task on the waitqueue before it has set its state to UNINTERRUPTIBLE.
This bug dates back to the conversion of this code to generic waitqueue infrastructure from a counting semaphore back in 2008 which didn't place the wakeups consistently w.r.t. to the relevant spin locks.
[dchinner: Also fix a similar issue in the shutdown path on xc_commit_wait. Update commit log with more details of the issue.]
Fixes: d748c62367eb ("[XFS] Convert l_flushsema to a sv_t") Reported-by: Chris Mason clm@fb.com Signed-off-by: Rik van Riel riel@surriel.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Cc: stable@vger.kernel.org # 4.9.x-4.19.x [modified for contextual change near xlog_state_do_callback()] Signed-off-by: Samuel Mendoza-Jonas samjonas@amazon.com Reviewed-by: Frank van der Linden fllinden@amazon.com Reviewed-by: Suraj Jitindar Singh surajjs@amazon.com Reviewed-by: Benjamin Herrenschmidt benh@amazon.com Reviewed-by: Anchal Agarwal anchalag@amazon.com
This issue was fixed in v5.4 but didn't appear to make it to stable. The fixed commit goes back to v2.6, so backport this to stable kernels before v5.4. The only difference is a contextual change at xlog_state_do_callback(log, XFS_LI_ABORTED, NULL); Which in v5.4 is xlog_state_do_callback(log, true, NULL);
Now queued up, thanks.
greg k-h
linux-stable-mirror@lists.linaro.org