From: Sarthak Garg <sartgarg@codeaurora.org>
Consider the following stack trace
-001|raw_spin_lock_irqsave
-002|mmc_blk_cqe_complete_rq
-003|__blk_mq_complete_request(inline)
-003|blk_mq_complete_request(rq)
-004|mmc_cqe_timed_out(inline)
-004|mmc_mq_timed_out
mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function then tries to acquire the same queue lock, resulting in recursive locking: the task spins on a lock it has already acquired, leading to a watchdog bark.
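To make the failure mode concrete, here is a minimal user-space sketch (illustrative only, not kernel code; the file name and the use of an ERRORCHECK pthread mutex are assumptions for the demo) in which a thread re-acquires a non-recursive lock it already holds, the same pattern as in the trace above:

    /* relock.c - illustrative sketch, not kernel code.  An ERRORCHECK
     * pthread mutex stands in for the non-recursive mq->lock spinlock,
     * so the relock is reported as EDEADLK instead of spinning forever
     * until the watchdog barks.
     * Build: gcc -o relock relock.c -lpthread
     */
    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>

    int main(void)
    {
            pthread_mutexattr_t attr;
            pthread_mutex_t lock;

            pthread_mutexattr_init(&attr);
            pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
            pthread_mutex_init(&lock, &attr);

            /* mmc_mq_timed_out() takes mq->lock ... */
            pthread_mutex_lock(&lock);

            /* ... then mmc_blk_cqe_complete_rq(), reached while the
             * lock is still held, tries to take the same lock again. */
            if (pthread_mutex_lock(&lock) == EDEADLK)
                    printf("recursive locking detected: EDEADLK\n");

            pthread_mutex_unlock(&lock);
            pthread_mutex_destroy(&lock);
            pthread_mutexattr_destroy(&attr);
            return 0;
    }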
Fix this issue by holding the lock only for the required critical section.
Cc: stable@vger.kernel.org # v4.19+
Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
---
 drivers/mmc/core/queue.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 25bee3d..72bef39 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
 	case MMC_ISSUE_DCMD:
 		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
 			if (recovery_needed)
-				__mmc_cqe_recovery_notifier(mq);
+				mmc_cqe_recovery_notifier(mrq);
 			return BLK_EH_RESET_TIMER;
 		}
 		/* No timeout (XXX: huh? comment doesn't make much sense) */
@@ -131,12 +131,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
 
 	spin_lock_irqsave(&mq->lock, flags);
 
-	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
+	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) {
 		ret = BLK_EH_RESET_TIMER;
-	else
+		spin_unlock_irqrestore(&mq->lock, flags);
+	} else {
+		spin_unlock_irqrestore(&mq->lock, flags);
 		ret = mmc_cqe_timed_out(req);
-
-	spin_unlock_irqrestore(&mq->lock, flags);
+	}
 
 	return ret;
 }
On 6/05/20 5:34 pm, Veerabhadrarao Badiganti wrote:
> From: Sarthak Garg <sartgarg@codeaurora.org>
> 
> Consider the following stack trace
> 
> -001|raw_spin_lock_irqsave
> -002|mmc_blk_cqe_complete_rq
> -003|__blk_mq_complete_request(inline)
> -003|blk_mq_complete_request(rq)
> -004|mmc_cqe_timed_out(inline)
> -004|mmc_mq_timed_out
> 
> mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function then tries to acquire the same queue lock, resulting in recursive locking: the task spins on a lock it has already acquired, leading to a watchdog bark.
> 
> Fix this issue by holding the lock only for the required critical section.
> 
> Cc: stable@vger.kernel.org # v4.19+
> Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
> ---
>  drivers/mmc/core/queue.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 25bee3d..72bef39 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>  	case MMC_ISSUE_DCMD:
>  		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
>  			if (recovery_needed)
> -				__mmc_cqe_recovery_notifier(mq);
> +				mmc_cqe_recovery_notifier(mrq);
>  			return BLK_EH_RESET_TIMER;
>  		}
>  		/* No timeout (XXX: huh? comment doesn't make much sense) */
> @@ -131,12 +131,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
>  
>  	spin_lock_irqsave(&mq->lock, flags);
>  
> -	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
> +	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) {
>  		ret = BLK_EH_RESET_TIMER;
> -	else
> +		spin_unlock_irqrestore(&mq->lock, flags);
> +	} else {
> +		spin_unlock_irqrestore(&mq->lock, flags);
>  		ret = mmc_cqe_timed_out(req);
> -
> -	spin_unlock_irqrestore(&mq->lock, flags);
> +	}
This looks good, but I think there needs to be another change also. I will send a patch for that, but in the meantime maybe you could straighten up the code flow through the spinlock, e.g.

	spin_lock_irqsave(&mq->lock, flags);
	ignore = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
	spin_unlock_irqrestore(&mq->lock, flags);

	return ignore ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
And add a fixes tag.
> 
>  	return ret;
>  }
First, it should be noted that the CQE timeout (60 seconds) is substantial, so a CQE request that times out is really stuck, and the race between timeout and completion is extremely unlikely. Nevertheless this patch fixes an issue with it.
Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout") preserved the existing functionality of completing the request. However, that had only been necessary because the block layer timeout handler had been marking the request to prevent it from being completed normally. That restriction was removed at the same time, so a request that has "gone" (i.e. already been completed by the normal path) will have been completed anyway. That is, the completion in the timeout handler became unnecessary.
At the time, the unnecessary completion was harmless because the block layer would ignore it, although that changed in kernel v5.0.
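To picture the hazard, here is a minimal user-space sketch (the names and structure are assumptions for illustration, not the block layer's actual code) of complete-once semantics: the first completion claims the request, and a later completion attempt from a timeout handler must be rejected as a bug:

    /* complete_once.c - illustrative sketch, not block layer code:
     * a request may be completed exactly once, so a timeout handler
     * must not complete a request that the normal completion path
     * has already claimed. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    enum req_state { REQ_IN_FLIGHT, REQ_COMPLETE };

    struct req_sketch {
            _Atomic enum req_state state;
    };

    /* Atomically claim the completion; only the first caller wins. */
    static bool complete_request_once(struct req_sketch *rq, const char *who)
    {
            enum req_state expected = REQ_IN_FLIGHT;

            if (atomic_compare_exchange_strong(&rq->state, &expected,
                                               REQ_COMPLETE)) {
                    printf("%s: completed the request\n", who);
                    return true;
            }
            printf("%s: request has gone already, must not complete again\n",
                   who);
            return false;
    }

    int main(void)
    {
            struct req_sketch rq = { REQ_IN_FLIGHT };

            complete_request_once(&rq, "normal completion"); /* wins */
            complete_request_once(&rq, "timeout handler");   /* gone */
            return 0;
    }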
Note for stable: this patch will not apply cleanly without the patch "mmc: core: Fix recursive locking issue in CQE recovery path".
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout")
Cc: stable@vger.kernel.org
---
This is the patch I alluded to when replying to "mmc: core: Fix recursive locking issue in CQE recovery path"
 drivers/mmc/core/queue.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 72bef39d7011..10ea67892b5f 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
 			mmc_cqe_recovery_notifier(mrq);
 			return BLK_EH_RESET_TIMER;
 		}
-		/* No timeout (XXX: huh? comment doesn't make much sense) */
-		blk_mq_complete_request(req);
+		/* The request has gone already */
 		return BLK_EH_DONE;
 	default:
 		/* Timeout is handled by mmc core */
On Thu, 7 May 2020 at 16:06, Adrian Hunter <adrian.hunter@intel.com> wrote:
> First, it should be noted that the CQE timeout (60 seconds) is substantial, so a CQE request that times out is really stuck, and the race between timeout and completion is extremely unlikely. Nevertheless this patch fixes an issue with it.
> 
> Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout") preserved the existing functionality of completing the request. However, that had only been necessary because the block layer timeout handler had been marking the request to prevent it from being completed normally. That restriction was removed at the same time, so a request that has "gone" (i.e. already been completed by the normal path) will have been completed anyway. That is, the completion in the timeout handler became unnecessary.
> 
> At the time, the unnecessary completion was harmless because the block layer would ignore it, although that changed in kernel v5.0.
> 
> Note for stable: this patch will not apply cleanly without the patch "mmc: core: Fix recursive locking issue in CQE recovery path".
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout")
> Cc: stable@vger.kernel.org
> ---
> 
> This is the patch I alluded to when replying to "mmc: core: Fix recursive locking issue in CQE recovery path"
Looks like the patch got corrupted. I was trying to fix it, but just couldn't figure it out.
Can you please re-format and do a repost?
Kind regards
Uffe
>  drivers/mmc/core/queue.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 72bef39d7011..10ea67892b5f 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>  			mmc_cqe_recovery_notifier(mrq);
>  			return BLK_EH_RESET_TIMER;
>  		}
> -		/* No timeout (XXX: huh? comment doesn't make much sense) */
> -		blk_mq_complete_request(req);
> +		/* The request has gone already */
>  		return BLK_EH_DONE;
>  	default:
>  		/* Timeout is handled by mmc core */
> --
> 2.17.1
From: Sarthak Garg <sartgarg@codeaurora.org>
Consider the following stack trace
-001|raw_spin_lock_irqsave
-002|mmc_blk_cqe_complete_rq
-003|__blk_mq_complete_request(inline)
-003|blk_mq_complete_request(rq)
-004|mmc_cqe_timed_out(inline)
-004|mmc_mq_timed_out
mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function then tries to acquire the same queue lock, resulting in recursive locking: the task spins on a lock it has already acquired, leading to a watchdog bark.
Fix this issue by holding the lock only for the required critical section.
Cc: stable@vger.kernel.org
Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
---
 drivers/mmc/core/queue.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 25bee3d..b5fd3bc 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
 	case MMC_ISSUE_DCMD:
 		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
 			if (recovery_needed)
-				__mmc_cqe_recovery_notifier(mq);
+				mmc_cqe_recovery_notifier(mrq);
 			return BLK_EH_RESET_TIMER;
 		}
 		/* No timeout (XXX: huh? comment doesn't make much sense) */
@@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
 	struct mmc_card *card = mq->card;
 	struct mmc_host *host = card->host;
 	unsigned long flags;
-	int ret;
+	bool ignore_tout;
 
 	spin_lock_irqsave(&mq->lock, flags);
-
-	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
-		ret = BLK_EH_RESET_TIMER;
-	else
-		ret = mmc_cqe_timed_out(req);
-
+	ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
 	spin_unlock_irqrestore(&mq->lock, flags);
 
-	return ret;
+	return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
 }
 
 static void mmc_mq_recovery_handler(struct work_struct *work)
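For reference, mmc_mq_timed_out() after this change should read roughly as follows (reconstructed from the hunks above; the "bool reserved" parameter and the leading declarations are assumed from the hunk context rather than spelled out in the diff):

    static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
                                                     bool reserved)
    {
            struct request_queue *q = req->q;
            struct mmc_queue *mq = q->queuedata;
            struct mmc_card *card = mq->card;
            struct mmc_host *host = card->host;
            unsigned long flags;
            bool ignore_tout;

            /* Sample the decision inputs under mq->lock ... */
            spin_lock_irqsave(&mq->lock, flags);
            ignore_tout = mq->recovery_needed || !mq->use_cqe ||
                          host->hsq_enabled;
            spin_unlock_irqrestore(&mq->lock, flags);

            /* ... but call mmc_cqe_timed_out(), which may complete the
             * request and re-take mq->lock, only after dropping it. */
            return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
    }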
On 7/05/20 7:15 pm, Veerabhadrarao Badiganti wrote:
> From: Sarthak Garg <sartgarg@codeaurora.org>
> 
> Consider the following stack trace
> 
> -001|raw_spin_lock_irqsave
> -002|mmc_blk_cqe_complete_rq
> -003|__blk_mq_complete_request(inline)
> -003|blk_mq_complete_request(rq)
> -004|mmc_cqe_timed_out(inline)
> -004|mmc_mq_timed_out
> 
> mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function then tries to acquire the same queue lock, resulting in recursive locking: the task spins on a lock it has already acquired, leading to a watchdog bark.
> 
> Fix this issue by holding the lock only for the required critical section.
> 
> Cc: stable@vger.kernel.org
> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
> Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  drivers/mmc/core/queue.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 25bee3d..b5fd3bc 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>  	case MMC_ISSUE_DCMD:
>  		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
>  			if (recovery_needed)
> -				__mmc_cqe_recovery_notifier(mq);
> +				mmc_cqe_recovery_notifier(mrq);
>  			return BLK_EH_RESET_TIMER;
>  		}
>  		/* No timeout (XXX: huh? comment doesn't make much sense) */
> @@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
>  	struct mmc_card *card = mq->card;
>  	struct mmc_host *host = card->host;
>  	unsigned long flags;
> -	int ret;
> +	bool ignore_tout;
>  
>  	spin_lock_irqsave(&mq->lock, flags);
> -
> -	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
> -		ret = BLK_EH_RESET_TIMER;
> -	else
> -		ret = mmc_cqe_timed_out(req);
> -
> +	ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
>  	spin_unlock_irqrestore(&mq->lock, flags);
>  
> -	return ret;
> +	return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
>  }
>  
>  static void mmc_mq_recovery_handler(struct work_struct *work)
On Thu, 7 May 2020 at 18:15, Veerabhadrarao Badiganti <vbadigan@codeaurora.org> wrote:
> From: Sarthak Garg <sartgarg@codeaurora.org>
> 
> Consider the following stack trace
> 
> -001|raw_spin_lock_irqsave
> -002|mmc_blk_cqe_complete_rq
> -003|__blk_mq_complete_request(inline)
> -003|blk_mq_complete_request(rq)
> -004|mmc_cqe_timed_out(inline)
> -004|mmc_mq_timed_out
> 
> mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function then tries to acquire the same queue lock, resulting in recursive locking: the task spins on a lock it has already acquired, leading to a watchdog bark.
> 
> Fix this issue by holding the lock only for the required critical section.
> 
> Cc: stable@vger.kernel.org
> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
> Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
Applied for fixes, thanks!
Kind regards
Uffe
> ---
>  drivers/mmc/core/queue.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 25bee3d..b5fd3bc 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>  	case MMC_ISSUE_DCMD:
>  		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
>  			if (recovery_needed)
> -				__mmc_cqe_recovery_notifier(mq);
> +				mmc_cqe_recovery_notifier(mrq);
>  			return BLK_EH_RESET_TIMER;
>  		}
>  		/* No timeout (XXX: huh? comment doesn't make much sense) */
> @@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
>  	struct mmc_card *card = mq->card;
>  	struct mmc_host *host = card->host;
>  	unsigned long flags;
> -	int ret;
> +	bool ignore_tout;
>  
>  	spin_lock_irqsave(&mq->lock, flags);
> -
> -	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
> -		ret = BLK_EH_RESET_TIMER;
> -	else
> -		ret = mmc_cqe_timed_out(req);
> -
> +	ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
>  	spin_unlock_irqrestore(&mq->lock, flags);
>  
> -	return ret;
> +	return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
>  }
>  
>  static void mmc_mq_recovery_handler(struct work_struct *work)
Hi
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag fixing commit: 1e8e55b67030 ("mmc: block: Add CQE support").
The bot has tested the following trees: v5.6.11, v5.4.39, v4.19.121.
v5.6.11: Failed to apply! Possible dependencies:
    511ce378e16f ("mmc: Add MMC host software queue support")

v5.4.39: Failed to apply! Possible dependencies:
    511ce378e16f ("mmc: Add MMC host software queue support")

v4.19.121: Failed to apply! Possible dependencies:
    310df020cdd7 ("mmc: stop abusing the request queue_lock pointer")
    511ce378e16f ("mmc: Add MMC host software queue support")
    b061b326287d ("mmc: simplify queue initialization")
    f5d72c5c55bc ("mmc: stop abusing the request queue_lock pointer")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
Hi
[This is an automated email]
This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: 4.19+
The bot has tested the following trees: v5.6.11, v5.4.39, v4.19.121.
v5.6.11: Failed to apply! Possible dependencies:
    511ce378e16f ("mmc: Add MMC host software queue support")

v5.4.39: Failed to apply! Possible dependencies:
    511ce378e16f ("mmc: Add MMC host software queue support")

v4.19.121: Failed to apply! Possible dependencies:
    310df020cdd7 ("mmc: stop abusing the request queue_lock pointer")
    511ce378e16f ("mmc: Add MMC host software queue support")
    b061b326287d ("mmc: simplify queue initialization")
    f5d72c5c55bc ("mmc: stop abusing the request queue_lock pointer")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?