Amit and Johan both reported a NULL pointer dereference in the pmic_glink client code during initialization, and Stephen Boyd pointed out the problem (race condition).
While investigating, and writing the fix, I noticed that ucsi_unregister() is called in atomic context but tries to sleep, and I also noticed that the condition for when to inform the pmic_glink client drivers when the remote has gone down is just wrong.
So, let's fix all three.
As mentioned in the commit message for the UCSI fix, I have a series in the works that makes the GLINK callback happen in a sleepable context, which would remove the need for the clients list to be protected by a spinlock, and removing the work scheduling. This is however not -rc material...
In addition to the NULL pointer dereference, there is the -ECANCELED issue reported here: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ Johan reports that these fixes do not address that issue.
Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com --- Changes in v2: - Refer to the correct commit in the ucsi_unregister() patch. - Updated wording in the same commit message about the new error message in the log. - Changed the data type of the introduced state variables, opted to go for a bool as we only represent two states (and I would like to further clean this up going forward) - Initialized the spinlock - Link to v1: https://lore.kernel.org/r/20240818-pmic-glink-v6-11-races-v1-0-f87c577e0bc9@...
--- Bjorn Andersson (3): soc: qcom: pmic_glink: Fix race during initialization usb: typec: ucsi: Move unregister out of atomic section soc: qcom: pmic_glink: Actually communicate with remote goes down
drivers/power/supply/qcom_battmgr.c | 16 ++++++++----- drivers/soc/qcom/pmic_glink.c | 40 ++++++++++++++++++++++---------- drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++----- drivers/usb/typec/ucsi/ucsi_glink.c | 43 ++++++++++++++++++++++++++--------- include/linux/soc/qcom/pmic_glink.h | 11 +++++---- 5 files changed, 87 insertions(+), 40 deletions(-) --- base-commit: 2fd613d27928293eaa87788b10e8befb6805cd42 change-id: 20240818-pmic-glink-v6-11-races-363f5964c339
Best regards,
As pointed out by Stephen Boyd it is possible that during initialization of the pmic_glink child drivers, the protection-domain notifiers fires, and the associated work is scheduled, before the client registration returns and as a result the local "client" pointer has been initialized.
The outcome of this is a NULL pointer dereference as the "client" pointer is blindly dereferenced.
Timeline provided by Stephen: CPU0 CPU1 ---- ---- ucsi->client = NULL; devm_pmic_glink_register_client() client->pdr_notify(client->priv, pg->client_state) pmic_glink_ucsi_pdr_notify() schedule_work(&ucsi->register_work) <schedule away> pmic_glink_ucsi_register() ucsi_register() pmic_glink_ucsi_read_version() pmic_glink_ucsi_read() pmic_glink_ucsi_read() pmic_glink_send(ucsi->client) <client is NULL BAD> ucsi->client = client // Too late!
This code is identical across the altmode, battery manager and usci child drivers.
Resolve this by splitting the allocation of the "client" object and the registration thereof into two operations.
This only happens if the protection domain registry is populated at the time of registration, which by the introduction of commit '1ebcde047c54 ("soc: qcom: add pd-mapper implementation")' became much more likely.
Reported-by: Amit Pundir amit.pundir@linaro.org Closes: https://lore.kernel.org/all/CAMi1Hd2_a7TjA7J9ShrAbNOd_CoZ3D87twmO5t+nZxC9sX1... Reported-by: Johan Hovold johan@kernel.org Closes: https://lore.kernel.org/all/ZqiyLvP0gkBnuekL@hovoldconsulting.com/ Reported-by: Stephen Boyd swboyd@chromium.org Closes: https://lore.kernel.org/all/CAE-0n52JgfCBWiFQyQWPji8cq_rCsviBpW-m72YitgNfdaE... Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Tested-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com --- drivers/power/supply/qcom_battmgr.c | 16 ++++++++++------ drivers/soc/qcom/pmic_glink.c | 28 ++++++++++++++++++---------- drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++++------ drivers/usb/typec/ucsi/ucsi_glink.c | 16 ++++++++++------ include/linux/soc/qcom/pmic_glink.h | 11 ++++++----- 5 files changed, 55 insertions(+), 33 deletions(-)
diff --git a/drivers/power/supply/qcom_battmgr.c b/drivers/power/supply/qcom_battmgr.c index 49bef4a5ac3f..df90a470c51a 100644 --- a/drivers/power/supply/qcom_battmgr.c +++ b/drivers/power/supply/qcom_battmgr.c @@ -1387,12 +1387,16 @@ static int qcom_battmgr_probe(struct auxiliary_device *adev, "failed to register wireless charing power supply\n"); }
- battmgr->client = devm_pmic_glink_register_client(dev, - PMIC_GLINK_OWNER_BATTMGR, - qcom_battmgr_callback, - qcom_battmgr_pdr_notify, - battmgr); - return PTR_ERR_OR_ZERO(battmgr->client); + battmgr->client = devm_pmic_glink_new_client(dev, PMIC_GLINK_OWNER_BATTMGR, + qcom_battmgr_callback, + qcom_battmgr_pdr_notify, + battmgr); + if (IS_ERR(battmgr->client)) + return PTR_ERR(battmgr->client); + + pmic_glink_register_client(battmgr->client); + + return 0; }
static const struct auxiliary_device_id qcom_battmgr_id_table[] = { diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 9ebc0ba35947..58ec91767d79 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -66,15 +66,14 @@ static void _devm_pmic_glink_release_client(struct device *dev, void *res) spin_unlock_irqrestore(&pg->client_lock, flags); }
-struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, - unsigned int id, - void (*cb)(const void *, size_t, void *), - void (*pdr)(void *, int), - void *priv) +struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev, + unsigned int id, + void (*cb)(const void *, size_t, void *), + void (*pdr)(void *, int), + void *priv) { struct pmic_glink_client *client; struct pmic_glink *pg = dev_get_drvdata(dev->parent); - unsigned long flags;
client = devres_alloc(_devm_pmic_glink_release_client, sizeof(*client), GFP_KERNEL); if (!client) @@ -85,6 +84,18 @@ struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, client->cb = cb; client->pdr_notify = pdr; client->priv = priv; + INIT_LIST_HEAD(&client->node); + + devres_add(dev, client); + + return client; +} +EXPORT_SYMBOL_GPL(devm_pmic_glink_new_client); + +void pmic_glink_register_client(struct pmic_glink_client *client) +{ + struct pmic_glink *pg = client->pg; + unsigned long flags;
mutex_lock(&pg->state_lock); spin_lock_irqsave(&pg->client_lock, flags); @@ -95,11 +106,8 @@ struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, spin_unlock_irqrestore(&pg->client_lock, flags); mutex_unlock(&pg->state_lock);
- devres_add(dev, client); - - return client; } -EXPORT_SYMBOL_GPL(devm_pmic_glink_register_client); +EXPORT_SYMBOL_GPL(pmic_glink_register_client);
int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { diff --git a/drivers/soc/qcom/pmic_glink_altmode.c b/drivers/soc/qcom/pmic_glink_altmode.c index 1e0808b3cb93..e4f5059256e5 100644 --- a/drivers/soc/qcom/pmic_glink_altmode.c +++ b/drivers/soc/qcom/pmic_glink_altmode.c @@ -520,12 +520,17 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev, return ret; }
- altmode->client = devm_pmic_glink_register_client(dev, - altmode->owner_id, - pmic_glink_altmode_callback, - pmic_glink_altmode_pdr_notify, - altmode); - return PTR_ERR_OR_ZERO(altmode->client); + altmode->client = devm_pmic_glink_new_client(dev, + altmode->owner_id, + pmic_glink_altmode_callback, + pmic_glink_altmode_pdr_notify, + altmode); + if (IS_ERR(altmode->client)) + return PTR_ERR(altmode->client); + + pmic_glink_register_client(altmode->client); + + return 0; }
static const struct auxiliary_device_id pmic_glink_altmode_id_table[] = { diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index 16c328497e0b..ac53a81c2a81 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -367,12 +367,16 @@ static int pmic_glink_ucsi_probe(struct auxiliary_device *adev, ucsi->port_orientation[port] = desc; }
- ucsi->client = devm_pmic_glink_register_client(dev, - PMIC_GLINK_OWNER_USBC, - pmic_glink_ucsi_callback, - pmic_glink_ucsi_pdr_notify, - ucsi); - return PTR_ERR_OR_ZERO(ucsi->client); + ucsi->client = devm_pmic_glink_new_client(dev, PMIC_GLINK_OWNER_USBC, + pmic_glink_ucsi_callback, + pmic_glink_ucsi_pdr_notify, + ucsi); + if (IS_ERR(ucsi->client)) + return PTR_ERR(ucsi->client); + + pmic_glink_register_client(ucsi->client); + + return 0; }
static void pmic_glink_ucsi_remove(struct auxiliary_device *adev) diff --git a/include/linux/soc/qcom/pmic_glink.h b/include/linux/soc/qcom/pmic_glink.h index fd124aa18c81..aedde76d7e13 100644 --- a/include/linux/soc/qcom/pmic_glink.h +++ b/include/linux/soc/qcom/pmic_glink.h @@ -23,10 +23,11 @@ struct pmic_glink_hdr {
int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len);
-struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, - unsigned int id, - void (*cb)(const void *, size_t, void *), - void (*pdr)(void *, int), - void *priv); +struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev, + unsigned int id, + void (*cb)(const void *, size_t, void *), + void (*pdr)(void *, int), + void *priv); +void pmic_glink_register_client(struct pmic_glink_client *client);
#endif
Hi,
On Mon, Aug 19, 2024 at 01:07:45PM GMT, Bjorn Andersson wrote:
As pointed out by Stephen Boyd it is possible that during initialization of the pmic_glink child drivers, the protection-domain notifiers fires, and the associated work is scheduled, before the client registration returns and as a result the local "client" pointer has been initialized.
The outcome of this is a NULL pointer dereference as the "client" pointer is blindly dereferenced.
Timeline provided by Stephen: CPU0 CPU1
ucsi->client = NULL; devm_pmic_glink_register_client() client->pdr_notify(client->priv, pg->client_state) pmic_glink_ucsi_pdr_notify() schedule_work(&ucsi->register_work) <schedule away> pmic_glink_ucsi_register() ucsi_register() pmic_glink_ucsi_read_version() pmic_glink_ucsi_read() pmic_glink_ucsi_read() pmic_glink_send(ucsi->client) <client is NULL BAD> ucsi->client = client // Too late!
This code is identical across the altmode, battery manager and usci child drivers.
Resolve this by splitting the allocation of the "client" object and the registration thereof into two operations.
This only happens if the protection domain registry is populated at the time of registration, which by the introduction of commit '1ebcde047c54 ("soc: qcom: add pd-mapper implementation")' became much more likely.
Reported-by: Amit Pundir amit.pundir@linaro.org Closes: https://lore.kernel.org/all/CAMi1Hd2_a7TjA7J9ShrAbNOd_CoZ3D87twmO5t+nZxC9sX1... Reported-by: Johan Hovold johan@kernel.org Closes: https://lore.kernel.org/all/ZqiyLvP0gkBnuekL@hovoldconsulting.com/ Reported-by: Stephen Boyd swboyd@chromium.org Closes: https://lore.kernel.org/all/CAE-0n52JgfCBWiFQyQWPji8cq_rCsviBpW-m72YitgNfdaE... Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Tested-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
I expect this to go through SOC tree:
Acked-by: Sebastian Reichel sebastian.reichel@collabora.com
-- Sebastian
drivers/power/supply/qcom_battmgr.c | 16 ++++++++++------ drivers/soc/qcom/pmic_glink.c | 28 ++++++++++++++++++---------- drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++++------ drivers/usb/typec/ucsi/ucsi_glink.c | 16 ++++++++++------ include/linux/soc/qcom/pmic_glink.h | 11 ++++++----- 5 files changed, 55 insertions(+), 33 deletions(-)
diff --git a/drivers/power/supply/qcom_battmgr.c b/drivers/power/supply/qcom_battmgr.c index 49bef4a5ac3f..df90a470c51a 100644 --- a/drivers/power/supply/qcom_battmgr.c +++ b/drivers/power/supply/qcom_battmgr.c @@ -1387,12 +1387,16 @@ static int qcom_battmgr_probe(struct auxiliary_device *adev, "failed to register wireless charing power supply\n"); }
- battmgr->client = devm_pmic_glink_register_client(dev,
PMIC_GLINK_OWNER_BATTMGR,
qcom_battmgr_callback,
qcom_battmgr_pdr_notify,
battmgr);
- return PTR_ERR_OR_ZERO(battmgr->client);
- battmgr->client = devm_pmic_glink_new_client(dev, PMIC_GLINK_OWNER_BATTMGR,
qcom_battmgr_callback,
qcom_battmgr_pdr_notify,
battmgr);
- if (IS_ERR(battmgr->client))
return PTR_ERR(battmgr->client);
- pmic_glink_register_client(battmgr->client);
- return 0;
} static const struct auxiliary_device_id qcom_battmgr_id_table[] = { diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 9ebc0ba35947..58ec91767d79 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -66,15 +66,14 @@ static void _devm_pmic_glink_release_client(struct device *dev, void *res) spin_unlock_irqrestore(&pg->client_lock, flags); } -struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv)
+struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv)
{ struct pmic_glink_client *client; struct pmic_glink *pg = dev_get_drvdata(dev->parent);
- unsigned long flags;
client = devres_alloc(_devm_pmic_glink_release_client, sizeof(*client), GFP_KERNEL); if (!client) @@ -85,6 +84,18 @@ struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, client->cb = cb; client->pdr_notify = pdr; client->priv = priv;
- INIT_LIST_HEAD(&client->node);
- devres_add(dev, client);
- return client;
+} +EXPORT_SYMBOL_GPL(devm_pmic_glink_new_client);
+void pmic_glink_register_client(struct pmic_glink_client *client) +{
- struct pmic_glink *pg = client->pg;
- unsigned long flags;
mutex_lock(&pg->state_lock); spin_lock_irqsave(&pg->client_lock, flags); @@ -95,11 +106,8 @@ struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, spin_unlock_irqrestore(&pg->client_lock, flags); mutex_unlock(&pg->state_lock);
- devres_add(dev, client);
- return client;
} -EXPORT_SYMBOL_GPL(devm_pmic_glink_register_client); +EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { diff --git a/drivers/soc/qcom/pmic_glink_altmode.c b/drivers/soc/qcom/pmic_glink_altmode.c index 1e0808b3cb93..e4f5059256e5 100644 --- a/drivers/soc/qcom/pmic_glink_altmode.c +++ b/drivers/soc/qcom/pmic_glink_altmode.c @@ -520,12 +520,17 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev, return ret; }
- altmode->client = devm_pmic_glink_register_client(dev,
altmode->owner_id,
pmic_glink_altmode_callback,
pmic_glink_altmode_pdr_notify,
altmode);
- return PTR_ERR_OR_ZERO(altmode->client);
- altmode->client = devm_pmic_glink_new_client(dev,
altmode->owner_id,
pmic_glink_altmode_callback,
pmic_glink_altmode_pdr_notify,
altmode);
- if (IS_ERR(altmode->client))
return PTR_ERR(altmode->client);
- pmic_glink_register_client(altmode->client);
- return 0;
} static const struct auxiliary_device_id pmic_glink_altmode_id_table[] = { diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index 16c328497e0b..ac53a81c2a81 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -367,12 +367,16 @@ static int pmic_glink_ucsi_probe(struct auxiliary_device *adev, ucsi->port_orientation[port] = desc; }
- ucsi->client = devm_pmic_glink_register_client(dev,
PMIC_GLINK_OWNER_USBC,
pmic_glink_ucsi_callback,
pmic_glink_ucsi_pdr_notify,
ucsi);
- return PTR_ERR_OR_ZERO(ucsi->client);
- ucsi->client = devm_pmic_glink_new_client(dev, PMIC_GLINK_OWNER_USBC,
pmic_glink_ucsi_callback,
pmic_glink_ucsi_pdr_notify,
ucsi);
- if (IS_ERR(ucsi->client))
return PTR_ERR(ucsi->client);
- pmic_glink_register_client(ucsi->client);
- return 0;
} static void pmic_glink_ucsi_remove(struct auxiliary_device *adev) diff --git a/include/linux/soc/qcom/pmic_glink.h b/include/linux/soc/qcom/pmic_glink.h index fd124aa18c81..aedde76d7e13 100644 --- a/include/linux/soc/qcom/pmic_glink.h +++ b/include/linux/soc/qcom/pmic_glink.h @@ -23,10 +23,11 @@ struct pmic_glink_hdr { int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len); -struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv);
+struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv);
+void pmic_glink_register_client(struct pmic_glink_client *client); #endif
-- 2.34.1
On Mon, Aug 19, 2024 at 01:07:45PM -0700, Bjorn Andersson wrote:
As pointed out by Stephen Boyd it is possible that during initialization of the pmic_glink child drivers, the protection-domain notifiers fires, and the associated work is scheduled, before the client registration returns and as a result the local "client" pointer has been initialized.
The outcome of this is a NULL pointer dereference as the "client" pointer is blindly dereferenced.
Timeline provided by Stephen: CPU0 CPU1
ucsi->client = NULL; devm_pmic_glink_register_client() client->pdr_notify(client->priv, pg->client_state) pmic_glink_ucsi_pdr_notify() schedule_work(&ucsi->register_work) <schedule away> pmic_glink_ucsi_register() ucsi_register() pmic_glink_ucsi_read_version() pmic_glink_ucsi_read() pmic_glink_ucsi_read() pmic_glink_send(ucsi->client) <client is NULL BAD> ucsi->client = client // Too late!
This code is identical across the altmode, battery manager and usci child drivers.
Resolve this by splitting the allocation of the "client" object and the registration thereof into two operations.
This only happens if the protection domain registry is populated at the time of registration, which by the introduction of commit '1ebcde047c54 ("soc: qcom: add pd-mapper implementation")' became much more likely.
Reported-by: Amit Pundir amit.pundir@linaro.org Closes: https://lore.kernel.org/all/CAMi1Hd2_a7TjA7J9ShrAbNOd_CoZ3D87twmO5t+nZxC9sX1... Reported-by: Johan Hovold johan@kernel.org Closes: https://lore.kernel.org/all/ZqiyLvP0gkBnuekL@hovoldconsulting.com/ Reported-by: Stephen Boyd swboyd@chromium.org Closes: https://lore.kernel.org/all/CAE-0n52JgfCBWiFQyQWPji8cq_rCsviBpW-m72YitgNfdaE... Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Tested-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 9ebc0ba35947..58ec91767d79 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -66,15 +66,14 @@ static void _devm_pmic_glink_release_client(struct device *dev, void *res) spin_unlock_irqrestore(&pg->client_lock, flags); } -struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv)
+struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev,
Please consider renaming this one
devm_pmic_glink_alloc_client()
(or devm_pmic_glink_client_alloc()) which is more conventional for kernel code than using "new".
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv)
{ struct pmic_glink_client *client; struct pmic_glink *pg = dev_get_drvdata(dev->parent);
- unsigned long flags;
client = devres_alloc(_devm_pmic_glink_release_client, sizeof(*client), GFP_KERNEL); if (!client)
Looks good otherwise:
Reviewed-by: Johan Hovold johan+linaro@kernel.org
Johan
Commit '635ce0db8956 ("soc: qcom: pmic_glink: don't traverse clients list without a lock")' moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which isn't feasible as it's expecting a sleepable context. An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference. This does however result in the user being informed about this error by the following entry in the kernel log:
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: 635ce0db8956 ("soc: qcom: pmic_glink: don't traverse clients list without a lock") Cc: stable@vger.kernel.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Tested-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com --- drivers/soc/qcom/pmic_glink.c | 10 +++++++++- drivers/usb/typec/ucsi/ucsi_glink.c | 27 ++++++++++++++++++++++----- 2 files changed, 31 insertions(+), 6 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 58ec91767d79..e4747f1d3da5 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -112,8 +112,16 @@ EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { struct pmic_glink *pg = client->pg; + int ret;
- return rpmsg_send(pg->ept, data, len); + mutex_lock(&pg->state_lock); + if (!pg->ept) + ret = -ECONNRESET; + else + ret = rpmsg_send(pg->ept, data, len); + mutex_unlock(&pg->state_lock); + + return ret; } EXPORT_SYMBOL_GPL(pmic_glink_send);
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index ac53a81c2a81..bb6244f21e0a 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -68,6 +68,9 @@ struct pmic_glink_ucsi {
struct work_struct notify_work; struct work_struct register_work; + spinlock_t state_lock; + bool ucsi_registered; + bool pd_running;
u8 read_buf[UCSI_BUF_SIZE]; }; @@ -244,8 +247,20 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work); + unsigned long flags; + bool pd_running;
- ucsi_register(ucsi->ucsi); + spin_lock_irqsave(&ucsi->state_lock, flags); + pd_running = ucsi->pd_running; + spin_unlock_irqrestore(&ucsi->state_lock, flags); + + if (!ucsi->ucsi_registered && pd_running) { + ucsi_register(ucsi->ucsi); + ucsi->ucsi_registered = true; + } else if (ucsi->ucsi_registered && !pd_running) { + ucsi_unregister(ucsi->ucsi); + ucsi->ucsi_registered = false; + } }
static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) @@ -269,11 +284,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) static void pmic_glink_ucsi_pdr_notify(void *priv, int state) { struct pmic_glink_ucsi *ucsi = priv; + unsigned long flags;
- if (state == SERVREG_SERVICE_STATE_UP) - schedule_work(&ucsi->register_work); - else if (state == SERVREG_SERVICE_STATE_DOWN) - ucsi_unregister(ucsi->ucsi); + spin_lock_irqsave(&ucsi->state_lock, flags); + ucsi->pd_running = state == SERVREG_SERVICE_STATE_UP; + spin_unlock_irqrestore(&ucsi->state_lock, flags); + schedule_work(&ucsi->register_work); }
static void pmic_glink_ucsi_destroy(void *data) @@ -320,6 +336,7 @@ static int pmic_glink_ucsi_probe(struct auxiliary_device *adev, INIT_WORK(&ucsi->register_work, pmic_glink_ucsi_register); init_completion(&ucsi->read_ack); init_completion(&ucsi->write_ack); + spin_lock_init(&ucsi->state_lock); mutex_init(&ucsi->lock);
ucsi->ucsi = ucsi_create(dev, &pmic_glink_ucsi_ops);
On Mon, Aug 19, 2024 at 01:07:46PM -0700, Bjorn Andersson wrote:
Commit '635ce0db8956 ("soc: qcom: pmic_glink: don't traverse clients
Looks like you copied the wrong SHA again. This should be
9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping")
as we discussed.
list without a lock")' moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which
And this should be "atomic context" as pdr notifications are done from a worker thread.
isn't feasible as it's expecting a sleepable context. An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference. This does however result in the user being informed about this error by the following entry in the kernel log:
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: 635ce0db8956 ("soc: qcom: pmic_glink: don't traverse clients list without a lock")
Fixes: 9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping")
Cc: stable@vger.kernel.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Tested-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
@@ -269,11 +284,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) static void pmic_glink_ucsi_pdr_notify(void *priv, int state) { struct pmic_glink_ucsi *ucsi = priv;
- unsigned long flags;
- if (state == SERVREG_SERVICE_STATE_UP)
schedule_work(&ucsi->register_work);
- else if (state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- spin_lock_irqsave(&ucsi->state_lock, flags);
- ucsi->pd_running = state == SERVREG_SERVICE_STATE_UP;
Add parentheses for readability?
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- schedule_work(&ucsi->register_work);
} static void pmic_glink_ucsi_destroy(void *data) @@ -320,6 +336,7 @@ static int pmic_glink_ucsi_probe(struct auxiliary_device *adev, INIT_WORK(&ucsi->register_work, pmic_glink_ucsi_register); init_completion(&ucsi->read_ack); init_completion(&ucsi->write_ack);
- spin_lock_init(&ucsi->state_lock); mutex_init(&ucsi->lock);
ucsi->ucsi = ucsi_create(dev, &pmic_glink_ucsi_ops);
Looks good otherwise:
Reviewed-by: Johan Hovold johan+linaro@kernel.org
Johan
When the pmic_glink state is UP and we either receive a protection- domain (PD) notification indicating that the PD is going down, or that the whole remoteproc is going down, it's expected that the pmic_glink client instances are notified that their function has gone DOWN.
This is not what the code does, which results in the client state either not updating, or being wrong in many cases. So let's fix the conditions.
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Tested-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com --- drivers/soc/qcom/pmic_glink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index e4747f1d3da5..cb202a37e8ab 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -191,7 +191,7 @@ static void pmic_glink_state_notify_clients(struct pmic_glink *pg) if (pg->pdr_state == SERVREG_SERVICE_STATE_UP && pg->ept) new_state = SERVREG_SERVICE_STATE_UP; } else { - if (pg->pdr_state == SERVREG_SERVICE_STATE_UP && pg->ept) + if (pg->pdr_state == SERVREG_SERVICE_STATE_DOWN || !pg->ept) new_state = SERVREG_SERVICE_STATE_DOWN; }
On Mon, Aug 19, 2024 at 01:07:47PM -0700, Bjorn Andersson wrote:
When the pmic_glink state is UP and we either receive a protection- domain (PD) notification indicating that the PD is going down, or that the whole remoteproc is going down, it's expected that the pmic_glink client instances are notified that their function has gone DOWN.
This is not what the code does, which results in the client state either not updating, or being wrong in many cases. So let's fix the conditions.
@@ -191,7 +191,7 @@ static void pmic_glink_state_notify_clients(struct pmic_glink *pg) if (pg->pdr_state == SERVREG_SERVICE_STATE_UP && pg->ept) new_state = SERVREG_SERVICE_STATE_UP; } else {
if (pg->pdr_state == SERVREG_SERVICE_STATE_UP && pg->ept)
}if (pg->pdr_state == SERVREG_SERVICE_STATE_DOWN || !pg->ept) new_state = SERVREG_SERVICE_STATE_DOWN;
I guess you could drop the outer conditional
if (pg->client_state != SERVREG_SERVICE_STATE_UP) {
} else {
}
here to make this a bit more readable, but that's for a separate patch.
Reviewed-by: Johan Hovold johan+linaro@kernel.org
Johan
On Tue, Aug 20, 2024 at 09:07:10AM +0200, Johan Hovold wrote:
On Mon, Aug 19, 2024 at 01:07:47PM -0700, Bjorn Andersson wrote:
When the pmic_glink state is UP and we either receive a protection- domain (PD) notification indicating that the PD is going down, or that the whole remoteproc is going down, it's expected that the pmic_glink client instances are notified that their function has gone DOWN.
This is not what the code does, which results in the client state either not updating, or being wrong in many cases. So let's fix the conditions.
And I believe you meant
s/with/when/
in the patch Subject.
Johan
On Mon, Aug 19, 2024 at 01:07:44PM -0700, Bjorn Andersson wrote:
Amit and Johan both reported a NULL pointer dereference in the pmic_glink client code during initialization, and Stephen Boyd pointed out the problem (race condition).
While investigating, and writing the fix, I noticed that ucsi_unregister() is called in atomic context but tries to sleep, and I also noticed that the condition for when to inform the pmic_glink client drivers when the remote has gone down is just wrong.
So, let's fix all three.
Changes in v2:
- Refer to the correct commit in the ucsi_unregister() patch.
- Updated wording in the same commit message about the new error message in the log.
- Changed the data type of the introduced state variables, opted to go for a bool as we only represent two states (and I would like to further clean this up going forward)
- Initialized the spinlock
- Link to v1: https://lore.kernel.org/r/20240818-pmic-glink-v6-11-races-v1-0-f87c577e0bc9@...
Bjorn Andersson (3): soc: qcom: pmic_glink: Fix race during initialization usb: typec: ucsi: Move unregister out of atomic section soc: qcom: pmic_glink: Actually communicate with remote goes down
Tested-by: Johan Hovold johan+linaro@kernel.org
linux-stable-mirror@lists.linaro.org