On 8/26/2025 8:20 AM, Bjorn Andersson wrote:
On Mon, Aug 25, 2025 at 05:22:02PM +0530, Kathiravan Thirumoorthy wrote:
From: Poovendhan Selvaraj <quic_poovendh@quicinc.com>
The pm ops are enabled before qmp phy create which causes a NULL pointer dereference when accessing qmp->phy->init_count in the qmp_usb_runtime_suspend.
How does that happen? Do we end up in the error path in between the devm_pm_runtime_enable()? Or does it happen by some other means?
qmp_usb_probe() is scheduled out per the below stack (collected from the RAM dump),
Stack trace of the kmodloader process:
[<0x408def88>] __schedule+0x348/0x55c
[<0x408df1f8>] schedule+0x5c/0x98
[<0x4052c318>] rpm_resume+0x150/0x404
[<0x4052d4e4>] pm_runtime_forbid+0x54/0x60
[<0x629c47f0>] qmp_usb_probe+0x3c4/0x5d0 [phy_qcom_qmp_usb.ko]
From the above snippet, we can see that the PHY has not been created yet: the probe is still in pm_runtime_forbid() and qmp->phy is NULL. Meanwhile, qmp_usb_runtime_suspend() is called, causing the NULL pointer dereference. Since the issue is not easily reproducible, we have not been able to find out who called the suspend or why.
This would be quite useful information for others to know if they hit the same or just a similar problem.
So if qmp->phy is NULL, bail out early in suspend / resume callbacks to avoid the NULL pointer dereference in qmp_usb_runtime_suspend and qmp_usb_runtime_resume.
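As a rough user-space illustration of the guard described above (not the driver code itself), the sketch below models the state seen in the RAM dump: the probe has not yet created the PHY, so the pointer is still NULL when the suspend callback runs. The names struct phy_stub, struct qmp_usb_stub and qmp_usb_runtime_suspend_stub() are hypothetical stand-ins, and the "suspended" flag is an assumption added only to make the outcome observable.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct phy; only the field the guard reads. */
struct phy_stub {
	int init_count;
};

/* Hypothetical stand-in for the driver state. */
struct qmp_usb_stub {
	struct phy_stub *phy;	/* stays NULL until the probe creates the PHY */
	int suspended;		/* assumption: marks that the suspend body ran */
};

/*
 * Same check as in the patch: test the pointer before dereferencing it,
 * and bail out if the PHY does not exist yet or was never initialized.
 */
static int qmp_usb_runtime_suspend_stub(struct qmp_usb_stub *qmp)
{
	if (!qmp->phy || !qmp->phy->init_count)
		return 0;	/* nothing to suspend */

	qmp->suspended = 1;	/* placeholder for the real suspend work */
	return 0;
}
```

Calling the stub with phy set to NULL returns 0 without touching init_count, which is the situation the RAM dump shows. Because || short-circuits, the second operand is only evaluated once the pointer is known to be non-NULL.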
Below is the stacktrace for reference:
[<818381a0>] (qmp_usb_runtime_suspend [phy_qcom_qmp_usb]) from [<4051d1d8>] (__rpm_callback+0x3c/0x110)
[<4051d1d8>] (__rpm_callback) from [<4051d2fc>] (rpm_callback+0x50/0x54)
[<4051d2fc>] (rpm_callback) from [<4051d940>] (rpm_suspend+0x23c/0x428)
[<4051d940>] (rpm_suspend) from [<4051e808>] (pm_runtime_work+0x74/0x8c)
[<4051e808>] (pm_runtime_work) from [<401311f4>] (process_scheduled_works+0x1d0/0x2c8)
[<401311f4>] (process_scheduled_works) from [<40131d48>] (worker_thread+0x260/0x2e4)
[<40131d48>] (worker_thread) from [<40138970>] (kthread+0x118/0x12c)
[<40138970>] (kthread) from [<4010013c>] (ret_from_fork+0x14/0x38)

Cc: stable@vger.kernel.org # v6.0
Fixes: 65753f38f530 ("phy: qcom-qmp-usb: drop multi-PHY support")
Has this been a reproducible issue for the last 3 years? I think the Fixes tag makes sense in that it introduced the indirection, but when did the issue actually show up?
After migrating the QSDK Linux from 5.4 to 6.6, we started seeing this issue randomly. We haven't had a chance to test other kernel versions.
Regards, Bjorn
Signed-off-by: Poovendhan Selvaraj <quic_poovendh@quicinc.com>
Signed-off-by: Kathiravan Thirumoorthy <kathiravan.thirumoorthy@oss.qualcomm.com>
---
 drivers/phy/qualcomm/phy-qcom-qmp-usb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-usb.c b/drivers/phy/qualcomm/phy-qcom-qmp-usb.c
index ed646a7e705ba3259708775ed5fedbbbada13735..cd04e8f22a0fe81b086b308d02713222aa95cae3 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-usb.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-usb.c
@@ -1940,7 +1940,7 @@ static int __maybe_unused qmp_usb_runtime_suspend(struct device *dev)
 
 	dev_vdbg(dev, "Suspending QMP phy, mode:%d\n", qmp->mode);
 
-	if (!qmp->phy->init_count) {
+	if (!qmp->phy || !qmp->phy->init_count) {
 		dev_vdbg(dev, "PHY not initialized, bailing out\n");
 		return 0;
 	}
@@ -1960,7 +1960,7 @@ static int __maybe_unused qmp_usb_runtime_resume(struct device *dev)
 
 	dev_vdbg(dev, "Resuming QMP phy, mode:%d\n", qmp->mode);
 
-	if (!qmp->phy->init_count) {
+	if (!qmp->phy || !qmp->phy->init_count) {
 		dev_vdbg(dev, "PHY not initialized, bailing out\n");
 		return 0;
 	}
--
2.34.1