From: Sarthak Garg <sartgarg@codeaurora.org>
Consider the following stack trace
-001|raw_spin_lock_irqsave
-002|mmc_blk_cqe_complete_rq
-003|__blk_mq_complete_request(inline)
-003|blk_mq_complete_request(rq)
-004|mmc_cqe_timed_out(inline)
-004|mmc_mq_timed_out
mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function then tries to acquire the same queue lock, resulting in recursive locking: the task spins on a lock it has already acquired, leading to a watchdog bark.
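To make the failure mode concrete, here is a minimal user-space sketch (illustrative only, not kernel code; the file name and the use of an ERRORCHECK pthread mutex are assumptions for the demo) in which a thread re-acquires a non-recursive lock it already holds, the same pattern as in the trace above:

    /* relock.c - illustrative sketch, not kernel code.  An ERRORCHECK
     * pthread mutex stands in for the non-recursive mq->lock spinlock,
     * so the relock is reported as EDEADLK instead of spinning forever
     * until the watchdog barks.
     * Build: gcc -o relock relock.c -lpthread
     */
    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>

    int main(void)
    {
            pthread_mutexattr_t attr;
            pthread_mutex_t lock;

            pthread_mutexattr_init(&attr);
            pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
            pthread_mutex_init(&lock, &attr);

            /* mmc_mq_timed_out() takes mq->lock ... */
            pthread_mutex_lock(&lock);

            /* ... then mmc_blk_cqe_complete_rq(), reached while the
             * lock is still held, tries to take the same lock again. */
            if (pthread_mutex_lock(&lock) == EDEADLK)
                    printf("recursive locking detected: EDEADLK\n");

            pthread_mutex_unlock(&lock);
            pthread_mutex_destroy(&lock);
            pthread_mutexattr_destroy(&attr);
            return 0;
    }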
Fix this issue by holding the lock only for the required critical section.
Cc: stable@vger.kernel.org # v4.19+
Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
---
 drivers/mmc/core/queue.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 25bee3d..72bef39 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
 	case MMC_ISSUE_DCMD:
 		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
 			if (recovery_needed)
-				__mmc_cqe_recovery_notifier(mq);
+				mmc_cqe_recovery_notifier(mrq);
 			return BLK_EH_RESET_TIMER;
 		}
 		/* No timeout (XXX: huh? comment doesn't make much sense) */
@@ -131,12 +131,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
 
 	spin_lock_irqsave(&mq->lock, flags);
 
-	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
+	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) {
 		ret = BLK_EH_RESET_TIMER;
-	else
+		spin_unlock_irqrestore(&mq->lock, flags);
+	} else {
+		spin_unlock_irqrestore(&mq->lock, flags);
 		ret = mmc_cqe_timed_out(req);
-
-	spin_unlock_irqrestore(&mq->lock, flags);
+	}
 
 	return ret;
 }
On 6/05/20 5:34 pm, Veerabhadrarao Badiganti wrote:
> From: Sarthak Garg <sartgarg@codeaurora.org>
> 
> Consider the following stack trace
> 
> -001|raw_spin_lock_irqsave
> -002|mmc_blk_cqe_complete_rq
> -003|__blk_mq_complete_request(inline)
> -003|blk_mq_complete_request(rq)
> -004|mmc_cqe_timed_out(inline)
> -004|mmc_mq_timed_out
> 
> mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function then tries to acquire the same queue lock, resulting in recursive locking: the task spins on a lock it has already acquired, leading to a watchdog bark.
> 
> Fix this issue by holding the lock only for the required critical section.
> 
> Cc: stable@vger.kernel.org # v4.19+
> Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
> ---
>  drivers/mmc/core/queue.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 25bee3d..72bef39 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>  	case MMC_ISSUE_DCMD:
>  		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
>  			if (recovery_needed)
> -				__mmc_cqe_recovery_notifier(mq);
> +				mmc_cqe_recovery_notifier(mrq);
>  			return BLK_EH_RESET_TIMER;
>  		}
>  		/* No timeout (XXX: huh? comment doesn't make much sense) */
> @@ -131,12 +131,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
>  
>  	spin_lock_irqsave(&mq->lock, flags);
>  
> -	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
> +	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) {
>  		ret = BLK_EH_RESET_TIMER;
> -	else
> +		spin_unlock_irqrestore(&mq->lock, flags);
> +	} else {
> +		spin_unlock_irqrestore(&mq->lock, flags);
>  		ret = mmc_cqe_timed_out(req);
> -
> -	spin_unlock_irqrestore(&mq->lock, flags);
> +	}
This looks good, but I think there needs to be another change also. I will send a patch for that, but in the meantime maybe you could straighten up the code flow through the spinlock, e.g.

	spin_lock_irqsave(&mq->lock, flags);
	ignore = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
	spin_unlock_irqrestore(&mq->lock, flags);

	return ignore ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
And add a fixes tag.
> 
>  	return ret;
>  }
First, it should be noted that the CQE timeout (60 seconds) is substantial, so a CQE request that times out is really stuck, and the race between timeout and completion is extremely unlikely. Nevertheless this patch fixes an issue with it.
Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout") preserved the existing functionality of completing the request. However, that had only been necessary because the block layer timeout handler had been marking the request to prevent it from being completed normally. That restriction was removed at the same time, so a request that has "gone" (i.e. already been completed by the normal path) will have been completed anyway. That is, the completion in the timeout handler became unnecessary.
At the time, the unnecessary completion was harmless because the block layer would ignore it, although that changed in kernel v5.0.
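To picture the hazard, here is a minimal user-space sketch (the names and structure are assumptions for illustration, not the block layer's actual code) of complete-once semantics: the first completion claims the request, and a later completion attempt from a timeout handler must be rejected as a bug:

    /* complete_once.c - illustrative sketch, not block layer code:
     * a request may be completed exactly once, so a timeout handler
     * must not complete a request that the normal completion path
     * has already claimed. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    enum req_state { REQ_IN_FLIGHT, REQ_COMPLETE };

    struct req_sketch {
            _Atomic enum req_state state;
    };

    /* Atomically claim the completion; only the first caller wins. */
    static bool complete_request_once(struct req_sketch *rq, const char *who)
    {
            enum req_state expected = REQ_IN_FLIGHT;

            if (atomic_compare_exchange_strong(&rq->state, &expected,
                                               REQ_COMPLETE)) {
                    printf("%s: completed the request\n", who);
                    return true;
            }
            printf("%s: request has gone already, must not complete again\n",
                   who);
            return false;
    }

    int main(void)
    {
            struct req_sketch rq = { REQ_IN_FLIGHT };

            complete_request_once(&rq, "normal completion"); /* wins */
            complete_request_once(&rq, "timeout handler");   /* gone */
            return 0;
    }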
Note for stable: this patch will not apply cleanly without the patch "mmc: core: Fix recursive locking issue in CQE recovery path".
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout")
Cc: stable@vger.kernel.org
---
This is the patch I alluded to when replying to "mmc: core: Fix recursive locking issue in CQE recovery path"
 drivers/mmc/core/queue.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 72bef39d7011..10ea67892b5f 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
 			mmc_cqe_recovery_notifier(mrq);
 			return BLK_EH_RESET_TIMER;
 		}
-		/* No timeout (XXX: huh? comment doesn't make much sense) */
-		blk_mq_complete_request(req);
+		/* The request has gone already */
 		return BLK_EH_DONE;
 	default:
 		/* Timeout is handled by mmc core */
On Thu, 7 May 2020 at 16:06, Adrian Hunter <adrian.hunter@intel.com> wrote:
> First, it should be noted that the CQE timeout (60 seconds) is substantial, so a CQE request that times out is really stuck, and the race between timeout and completion is extremely unlikely. Nevertheless this patch fixes an issue with it.
> 
> Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout") preserved the existing functionality of completing the request. However, that had only been necessary because the block layer timeout handler had been marking the request to prevent it from being completed normally. That restriction was removed at the same time, so a request that has "gone" (i.e. already been completed by the normal path) will have been completed anyway. That is, the completion in the timeout handler became unnecessary.
> 
> At the time, the unnecessary completion was harmless because the block layer would ignore it, although that changed in kernel v5.0.
> 
> Note for stable: this patch will not apply cleanly without the patch "mmc: core: Fix recursive locking issue in CQE recovery path".
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout")
> Cc: stable@vger.kernel.org
> ---
> 
> This is the patch I alluded to when replying to "mmc: core: Fix recursive locking issue in CQE recovery path"
Looks like the patch got corrupted. I was trying to fix it, but just couldn't figure it out.
Can you please re-format and do a repost?
Kind regards
Uffe
>  drivers/mmc/core/queue.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 72bef39d7011..10ea67892b5f 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -110,8 +110,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>  			mmc_cqe_recovery_notifier(mrq);
>  			return BLK_EH_RESET_TIMER;
>  		}
> -		/* No timeout (XXX: huh? comment doesn't make much sense) */
> -		blk_mq_complete_request(req);
> +		/* The request has gone already */
>  		return BLK_EH_DONE;
>  	default:
>  		/* Timeout is handled by mmc core */
> --
> 2.17.1
From: Sarthak Garg <sartgarg@codeaurora.org>
Consider the following stack trace
-001|raw_spin_lock_irqsave
-002|mmc_blk_cqe_complete_rq
-003|__blk_mq_complete_request(inline)
-003|blk_mq_complete_request(rq)
-004|mmc_cqe_timed_out(inline)
-004|mmc_mq_timed_out
mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function then tries to acquire the same queue lock, resulting in recursive locking: the task spins on a lock it has already acquired, leading to a watchdog bark.
Fix this issue by holding the lock only for the required critical section.
Cc: stable@vger.kernel.org
Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
---
 drivers/mmc/core/queue.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 25bee3d..b5fd3bc 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
 	case MMC_ISSUE_DCMD:
 		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
 			if (recovery_needed)
-				__mmc_cqe_recovery_notifier(mq);
+				mmc_cqe_recovery_notifier(mrq);
 			return BLK_EH_RESET_TIMER;
 		}
 		/* No timeout (XXX: huh? comment doesn't make much sense) */
@@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
 	struct mmc_card *card = mq->card;
 	struct mmc_host *host = card->host;
 	unsigned long flags;
-	int ret;
+	bool ignore_tout;
 
 	spin_lock_irqsave(&mq->lock, flags);
-
-	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
-		ret = BLK_EH_RESET_TIMER;
-	else
-		ret = mmc_cqe_timed_out(req);
-
+	ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
 	spin_unlock_irqrestore(&mq->lock, flags);
 
-	return ret;
+	return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
 }
 
 static void mmc_mq_recovery_handler(struct work_struct *work)
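For reference, mmc_mq_timed_out() after this change should read roughly as follows (reconstructed from the hunks above; the "bool reserved" parameter and the leading declarations are assumed from the hunk context rather than spelled out in the diff):

    static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
                                                     bool reserved)
    {
            struct request_queue *q = req->q;
            struct mmc_queue *mq = q->queuedata;
            struct mmc_card *card = mq->card;
            struct mmc_host *host = card->host;
            unsigned long flags;
            bool ignore_tout;

            /* Sample the decision inputs under mq->lock ... */
            spin_lock_irqsave(&mq->lock, flags);
            ignore_tout = mq->recovery_needed || !mq->use_cqe ||
                          host->hsq_enabled;
            spin_unlock_irqrestore(&mq->lock, flags);

            /* ... but call mmc_cqe_timed_out(), which may complete the
             * request and re-take mq->lock, only after dropping it. */
            return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
    }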
On 7/05/20 7:15 pm, Veerabhadrarao Badiganti wrote:
> From: Sarthak Garg <sartgarg@codeaurora.org>
> 
> Consider the following stack trace
> 
> -001|raw_spin_lock_irqsave
> -002|mmc_blk_cqe_complete_rq
> -003|__blk_mq_complete_request(inline)
> -003|blk_mq_complete_request(rq)
> -004|mmc_cqe_timed_out(inline)
> -004|mmc_mq_timed_out
> 
> mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function then tries to acquire the same queue lock, resulting in recursive locking: the task spins on a lock it has already acquired, leading to a watchdog bark.
> 
> Fix this issue by holding the lock only for the required critical section.
> 
> Cc: stable@vger.kernel.org
> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
> Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  drivers/mmc/core/queue.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 25bee3d..b5fd3bc 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>  	case MMC_ISSUE_DCMD:
>  		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
>  			if (recovery_needed)
> -				__mmc_cqe_recovery_notifier(mq);
> +				mmc_cqe_recovery_notifier(mrq);
>  			return BLK_EH_RESET_TIMER;
>  		}
>  		/* No timeout (XXX: huh? comment doesn't make much sense) */
> @@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
>  	struct mmc_card *card = mq->card;
>  	struct mmc_host *host = card->host;
>  	unsigned long flags;
> -	int ret;
> +	bool ignore_tout;
>  
>  	spin_lock_irqsave(&mq->lock, flags);
> -
> -	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
> -		ret = BLK_EH_RESET_TIMER;
> -	else
> -		ret = mmc_cqe_timed_out(req);
> -
> +	ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
>  	spin_unlock_irqrestore(&mq->lock, flags);
>  
> -	return ret;
> +	return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
>  }
>  
>  static void mmc_mq_recovery_handler(struct work_struct *work)
On Thu, 7 May 2020 at 18:15, Veerabhadrarao Badiganti <vbadigan@codeaurora.org> wrote:
> From: Sarthak Garg <sartgarg@codeaurora.org>
> 
> Consider the following stack trace
> 
> -001|raw_spin_lock_irqsave
> -002|mmc_blk_cqe_complete_rq
> -003|__blk_mq_complete_request(inline)
> -003|blk_mq_complete_request(rq)
> -004|mmc_cqe_timed_out(inline)
> -004|mmc_mq_timed_out
> 
> mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function then tries to acquire the same queue lock, resulting in recursive locking: the task spins on a lock it has already acquired, leading to a watchdog bark.
> 
> Fix this issue by holding the lock only for the required critical section.
> 
> Cc: stable@vger.kernel.org
> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
> Suggested-by: Sahitya Tummala <stummala@codeaurora.org>
> Signed-off-by: Sarthak Garg <sartgarg@codeaurora.org>
Applied for fixes, thanks!
Kind regards
Uffe
> ---
>  drivers/mmc/core/queue.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 25bee3d..b5fd3bc 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
>  	case MMC_ISSUE_DCMD:
>  		if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
>  			if (recovery_needed)
> -				__mmc_cqe_recovery_notifier(mq);
> +				mmc_cqe_recovery_notifier(mrq);
>  			return BLK_EH_RESET_TIMER;
>  		}
>  		/* No timeout (XXX: huh? comment doesn't make much sense) */
> @@ -127,18 +127,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
>  	struct mmc_card *card = mq->card;
>  	struct mmc_host *host = card->host;
>  	unsigned long flags;
> -	int ret;
> +	bool ignore_tout;
>  
>  	spin_lock_irqsave(&mq->lock, flags);
> -
> -	if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
> -		ret = BLK_EH_RESET_TIMER;
> -	else
> -		ret = mmc_cqe_timed_out(req);
> -
> +	ignore_tout = mq->recovery_needed || !mq->use_cqe || host->hsq_enabled;
>  	spin_unlock_irqrestore(&mq->lock, flags);
>  
> -	return ret;
> +	return ignore_tout ? BLK_EH_RESET_TIMER : mmc_cqe_timed_out(req);
>  }
>  
>  static void mmc_mq_recovery_handler(struct work_struct *work)
Hi
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag fixing commit: 1e8e55b67030 ("mmc: block: Add CQE support").
The bot has tested the following trees: v5.6.11, v5.4.39, v4.19.121.
v5.6.11: Failed to apply! Possible dependencies:
    511ce378e16f ("mmc: Add MMC host software queue support")

v5.4.39: Failed to apply! Possible dependencies:
    511ce378e16f ("mmc: Add MMC host software queue support")

v4.19.121: Failed to apply! Possible dependencies:
    310df020cdd7 ("mmc: stop abusing the request queue_lock pointer")
    511ce378e16f ("mmc: Add MMC host software queue support")
    b061b326287d ("mmc: simplify queue initialization")
    f5d72c5c55bc ("mmc: stop abusing the request queue_lock pointer")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
Hi
[This is an automated email]
This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: 4.19+
The bot has tested the following trees: v5.6.11, v5.4.39, v4.19.121.
v5.6.11: Failed to apply! Possible dependencies:
    511ce378e16f ("mmc: Add MMC host software queue support")

v5.4.39: Failed to apply! Possible dependencies:
    511ce378e16f ("mmc: Add MMC host software queue support")

v4.19.121: Failed to apply! Possible dependencies:
    310df020cdd7 ("mmc: stop abusing the request queue_lock pointer")
    511ce378e16f ("mmc: Add MMC host software queue support")
    b061b326287d ("mmc: simplify queue initialization")
    f5d72c5c55bc ("mmc: stop abusing the request queue_lock pointer")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?