From: Peter Wang peter.wang@mediatek.com
There is a deadlock when runtime suspend waits for the flush of RTC work, and the RTC work calls ufshcd_rpm_get_sync to wait for runtime resume.
Here is deadlock backtrace kworker/0:1 D 4892.876354 10 10971 4859 0x4208060 0x8 10 0 120 670730152367 ptr f0ffff80c2e40000 0 1 0x00000001 0x000000ff 0x000000ff 0x000000ff <ffffffee5e71ddb0> __switch_to+0x1a8/0x2d4 <ffffffee5e71e604> __schedule+0x684/0xa98 <ffffffee5e71ea60> schedule+0x48/0xc8 <ffffffee5e725f78> schedule_timeout+0x48/0x170 <ffffffee5e71fb74> do_wait_for_common+0x108/0x1b0 <ffffffee5e71efe0> wait_for_completion+0x44/0x60 <ffffffee5d6de968> __flush_work+0x39c/0x424 <ffffffee5d6decc0> __cancel_work_sync+0xd8/0x208 <ffffffee5d6dee2c> cancel_delayed_work_sync+0x14/0x28 <ffffffee5e2551b8> __ufshcd_wl_suspend+0x19c/0x480 <ffffffee5e255fb8> ufshcd_wl_runtime_suspend+0x3c/0x1d4 <ffffffee5dffd80c> scsi_runtime_suspend+0x78/0xc8 <ffffffee5df93580> __rpm_callback+0x94/0x3e0 <ffffffee5df90b0c> rpm_suspend+0x2d4/0x65c <ffffffee5df91448> __pm_runtime_suspend+0x80/0x114 <ffffffee5dffd95c> scsi_runtime_idle+0x38/0x6c <ffffffee5df912f4> rpm_idle+0x264/0x338 <ffffffee5df90f14> __pm_runtime_idle+0x80/0x110 <ffffffee5e24ce44> ufshcd_rtc_work+0x128/0x1e4 <ffffffee5d6e3a40> process_one_work+0x26c/0x650 <ffffffee5d6e65c8> worker_thread+0x260/0x3d8 <ffffffee5d6edec8> kthread+0x110/0x134 <ffffffee5d616b18> ret_from_fork+0x10/0x20
Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support") Cc: stable@vger.kernel.org 6.9.x
Signed-off-by: Peter Wang peter.wang@mediatek.com --- drivers/ufs/core/ufshcd-priv.h | 5 +++++ drivers/ufs/core/ufshcd.c | 5 ++++- 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h index f42d99ce5bf1..81d6f0cfb148 100644 --- a/drivers/ufs/core/ufshcd-priv.h +++ b/drivers/ufs/core/ufshcd-priv.h @@ -329,6 +329,11 @@ static inline int ufshcd_rpm_get_sync(struct ufs_hba *hba) return pm_runtime_get_sync(&hba->ufs_device_wlun->sdev_gendev); }
+static inline int ufshcd_rpm_get_if_active(struct ufs_hba *hba) +{ + return pm_runtime_get_if_active(&hba->ufs_device_wlun->sdev_gendev); +} + static inline int ufshcd_rpm_put_sync(struct ufs_hba *hba) { return pm_runtime_put_sync(&hba->ufs_device_wlun->sdev_gendev); diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 46433ecf0c4d..2e5adfa0f757 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -8171,7 +8171,10 @@ static void ufshcd_update_rtc(struct ufs_hba *hba) */ val = ts64.tv_sec - hba->dev_info.rtc_time_baseline;
- ufshcd_rpm_get_sync(hba); + /* Skip update RTC if RPM state is not RPM_ACTIVE */ + if (ufshcd_rpm_get_if_active(hba) <= 0) + return; + err = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_WRITE_ATTR, QUERY_ATTR_IDN_SECONDS_PASSED, 0, 0, &val); ufshcd_rpm_put_sync(hba);
On Mon, 2024-07-15 at 14:38 +0800, peter.wang@mediatek.com wrote:
- ufshcd_rpm_get_sync(hba); + /* Skip update RTC if RPM state is not RPM_ACTIVE */ + if (ufshcd_rpm_get_if_active(hba) <= 0) + return;
I understood your intention of this 'retun', but my understanding is you assume that __ufshcd_wl_resume() will schedule rtc update work, however, before the time that __ufshcd_wl_resume() completes, the RPM status is not RPM_ACTIVE until __ufshcd_wl_resume() completes and __update_runtime_status(dev, RPM_ACTIVE) is called.
If rtc update work is performed before __update_runtime_status(dev, RPM_ACTIVE), here you return, then no RTC work will be scheduled.
do you think it is possible?
On Mon, 2024-07-15 at 14:38 +0800, peter.wang@mediatek.com wrote:
@@ -8171,7 +8171,10 @@ static void ufshcd_update_rtc(struct ufs_hba *hba) */ val = ts64.tv_sec - hba->dev_info.rtc_time_baseline; - ufshcd_rpm_get_sync(hba); + /* Skip update RTC if RPM state is not RPM_ACTIVE */ + if (ufshcd_rpm_get_if_active(hba) <= 0) + return;
err = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_WRITE_ATTR, QUERY_ATTR_IDN_SECONDS_PASSED, 0, 0, &val); ufshcd_rpm_put_sync(hba);
My suggestion would be to not return here and just skip the update, but reschedule it for the next time that doesn't affect the suspend/resume flow you're worried about.
On Mon, 2024-07-15 at 14:38 +0800, peter.wang@mediatek.com wrote:
Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support") Cc: stable@vger.kernel.org 6.9.x
Signed-off-by: Peter Wang peter.wang@mediatek.com
ignore my previous two emails, I saw you have just skipped update, not skip schedule in this version.
Reviewed-by: Bean Huo beanhuo@micron.com
On Mon, 2024-07-15 at 11:37 +0200, Bean Huo wrote:
External email : Please do not click links or open attachments until you have verified the sender or the content. On Mon, 2024-07-15 at 14:38 +0800, peter.wang@mediatek.com wrote:
Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support") Cc: stable@vger.kernel.org 6.9.x
Signed-off-by: Peter Wang peter.wang@mediatek.com
ignore my previous two emails, I saw you have just skipped update, not skip schedule in this version.
Reviewed-by: Bean Huo beanhuo@micron.com
Hi Bean,
Yes, just skip update RTC, thanks for review.
Peter
On 7/14/24 11:38 PM, peter.wang@mediatek.com wrote:
There is a deadlock when runtime suspend waits for the flush of RTC work, and the RTC work calls ufshcd_rpm_get_sync to wait for runtime resume.
The above description is too brief - a description of how the fix works is missing. Please include a more detailed description in future patches.
Anyway:
Reviewed-by: Bart Van Assche bvanassche@acm.org
On Mon, 2024-07-15 at 10:17 -0700, Bart Van Assche wrote:
External email : Please do not click links or open attachments until you have verified the sender or the content. On 7/14/24 11:38 PM, peter.wang@mediatek.com wrote:
There is a deadlock when runtime suspend waits for the flush of RTC
work,
and the RTC work calls ufshcd_rpm_get_sync to wait for runtime
resume.
The above description is too brief - a description of how the fix works is missing. Please include a more detailed description in future patches.
Anyway:
Reviewed-by: Bart Van Assche bvanassche@acm.org
Hi Bart,
Will improve in future patches.
Thanks for review. Peter
On Mon, 15 Jul 2024 14:38:31 +0800, peter.wang@mediatek.com wrote:
There is a deadlock when runtime suspend waits for the flush of RTC work, and the RTC work calls ufshcd_rpm_get_sync to wait for runtime resume.
Here is deadlock backtrace kworker/0:1 D 4892.876354 10 10971 4859 0x4208060 0x8 10 0 120 670730152367 ptr f0ffff80c2e40000 0 1 0x00000001 0x000000ff 0x000000ff 0x000000ff <ffffffee5e71ddb0> __switch_to+0x1a8/0x2d4 <ffffffee5e71e604> __schedule+0x684/0xa98 <ffffffee5e71ea60> schedule+0x48/0xc8 <ffffffee5e725f78> schedule_timeout+0x48/0x170 <ffffffee5e71fb74> do_wait_for_common+0x108/0x1b0 <ffffffee5e71efe0> wait_for_completion+0x44/0x60 <ffffffee5d6de968> __flush_work+0x39c/0x424 <ffffffee5d6decc0> __cancel_work_sync+0xd8/0x208 <ffffffee5d6dee2c> cancel_delayed_work_sync+0x14/0x28 <ffffffee5e2551b8> __ufshcd_wl_suspend+0x19c/0x480 <ffffffee5e255fb8> ufshcd_wl_runtime_suspend+0x3c/0x1d4 <ffffffee5dffd80c> scsi_runtime_suspend+0x78/0xc8 <ffffffee5df93580> __rpm_callback+0x94/0x3e0 <ffffffee5df90b0c> rpm_suspend+0x2d4/0x65c <ffffffee5df91448> __pm_runtime_suspend+0x80/0x114 <ffffffee5dffd95c> scsi_runtime_idle+0x38/0x6c <ffffffee5df912f4> rpm_idle+0x264/0x338 <ffffffee5df90f14> __pm_runtime_idle+0x80/0x110 <ffffffee5e24ce44> ufshcd_rtc_work+0x128/0x1e4 <ffffffee5d6e3a40> process_one_work+0x26c/0x650 <ffffffee5d6e65c8> worker_thread+0x260/0x3d8 <ffffffee5d6edec8> kthread+0x110/0x134 <ffffffee5d616b18> ret_from_fork+0x10/0x20
[...]
Applied to 6.11/scsi-queue, thanks!
[1/1] ufs: core: fix deadlock when rtc update https://git.kernel.org/mkp/scsi/c/3911af778f20
linux-stable-mirror@lists.linaro.org