Amit and Johan both reported a NULL pointer dereference in the pmic_glink client code during initialization, and Stephen Boyd pointed out the problem (race condition).
While investigating, and writing the fix, I noticed that ucsi_unregister() is called in atomic context but tries to sleep, and I also noticed that the condition for when to inform the pmic_glink client drivers when the remote has gone down is just wrong.
So, let's fix all three.
As mentioned in the commit message for the UCSI fix, I have a series in the works that makes the GLINK callback happen in a sleepable context, which would remove the need for the clients list to be protected by a spinlock, and removing the work scheduling. This is however not -rc material...
In addition to the NULL pointer dereference, there is the -ECANCELED issue reported here: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ I have not yet been able to either reproduce this or convince myself that this is the same issue.
Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com --- Bjorn Andersson (3): soc: qcom: pmic_glink: Fix race during initialization usb: typec: ucsi: Move unregister out of atomic section soc: qcom: pmic_glink: Actually communicate with remote goes down
drivers/power/supply/qcom_battmgr.c | 16 ++++++++----- drivers/soc/qcom/pmic_glink.c | 40 +++++++++++++++++++++---------- drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++----- drivers/usb/typec/ucsi/ucsi_glink.c | 44 ++++++++++++++++++++++++++--------- include/linux/soc/qcom/pmic_glink.h | 11 +++++---- 5 files changed, 88 insertions(+), 40 deletions(-) --- base-commit: 296c871d2904cff2b4742702ef94512ab467a8e3 change-id: 20240818-pmic-glink-v6-11-races-363f5964c339
Best regards,
As pointed out by Stephen Boyd it is possible that during initialization of the pmic_glink child drivers, the protection-domain notifiers fires, and the associated work is scheduled, before the client registration returns and as a result the local "client" pointer has been initialized.
The outcome of this is a NULL pointer dereference as the "client" pointer is blindly dereferenced.
Timeline provided by Stephen: CPU0 CPU1 ---- ---- ucsi->client = NULL; devm_pmic_glink_register_client() client->pdr_notify(client->priv, pg->client_state) pmic_glink_ucsi_pdr_notify() schedule_work(&ucsi->register_work) <schedule away> pmic_glink_ucsi_register() ucsi_register() pmic_glink_ucsi_read_version() pmic_glink_ucsi_read() pmic_glink_ucsi_read() pmic_glink_send(ucsi->client) <client is NULL BAD> ucsi->client = client // Too late!
This code is identical across the altmode, battery manager and usci child drivers.
Resolve this by splitting the allocation of the "client" object and the registration thereof into two operations.
This only happens if the protection domain registry is populated at the time of registration, which by the introduction of commit '1ebcde047c54 ("soc: qcom: add pd-mapper implementation")' became much more likely.
Reported-by: Amit Pundir amit.pundir@linaro.org Closes: https://lore.kernel.org/all/CAMi1Hd2_a7TjA7J9ShrAbNOd_CoZ3D87twmO5t+nZxC9sX1... Reported-by: Johan Hovold johan@kernel.org Closes: https://lore.kernel.org/all/ZqiyLvP0gkBnuekL@hovoldconsulting.com/ Reported-by: Stephen Boyd swboyd@chromium.org Closes: https://lore.kernel.org/all/CAE-0n52JgfCBWiFQyQWPji8cq_rCsviBpW-m72YitgNfdaE... Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com --- drivers/power/supply/qcom_battmgr.c | 16 ++++++++++------ drivers/soc/qcom/pmic_glink.c | 28 ++++++++++++++++++---------- drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++++------ drivers/usb/typec/ucsi/ucsi_glink.c | 16 ++++++++++------ include/linux/soc/qcom/pmic_glink.h | 11 ++++++----- 5 files changed, 55 insertions(+), 33 deletions(-)
diff --git a/drivers/power/supply/qcom_battmgr.c b/drivers/power/supply/qcom_battmgr.c index 49bef4a5ac3f..df90a470c51a 100644 --- a/drivers/power/supply/qcom_battmgr.c +++ b/drivers/power/supply/qcom_battmgr.c @@ -1387,12 +1387,16 @@ static int qcom_battmgr_probe(struct auxiliary_device *adev, "failed to register wireless charing power supply\n"); }
- battmgr->client = devm_pmic_glink_register_client(dev, - PMIC_GLINK_OWNER_BATTMGR, - qcom_battmgr_callback, - qcom_battmgr_pdr_notify, - battmgr); - return PTR_ERR_OR_ZERO(battmgr->client); + battmgr->client = devm_pmic_glink_new_client(dev, PMIC_GLINK_OWNER_BATTMGR, + qcom_battmgr_callback, + qcom_battmgr_pdr_notify, + battmgr); + if (IS_ERR(battmgr->client)) + return PTR_ERR(battmgr->client); + + pmic_glink_register_client(battmgr->client); + + return 0; }
static const struct auxiliary_device_id qcom_battmgr_id_table[] = { diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 9ebc0ba35947..58ec91767d79 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -66,15 +66,14 @@ static void _devm_pmic_glink_release_client(struct device *dev, void *res) spin_unlock_irqrestore(&pg->client_lock, flags); }
-struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, - unsigned int id, - void (*cb)(const void *, size_t, void *), - void (*pdr)(void *, int), - void *priv) +struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev, + unsigned int id, + void (*cb)(const void *, size_t, void *), + void (*pdr)(void *, int), + void *priv) { struct pmic_glink_client *client; struct pmic_glink *pg = dev_get_drvdata(dev->parent); - unsigned long flags;
client = devres_alloc(_devm_pmic_glink_release_client, sizeof(*client), GFP_KERNEL); if (!client) @@ -85,6 +84,18 @@ struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, client->cb = cb; client->pdr_notify = pdr; client->priv = priv; + INIT_LIST_HEAD(&client->node); + + devres_add(dev, client); + + return client; +} +EXPORT_SYMBOL_GPL(devm_pmic_glink_new_client); + +void pmic_glink_register_client(struct pmic_glink_client *client) +{ + struct pmic_glink *pg = client->pg; + unsigned long flags;
mutex_lock(&pg->state_lock); spin_lock_irqsave(&pg->client_lock, flags); @@ -95,11 +106,8 @@ struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, spin_unlock_irqrestore(&pg->client_lock, flags); mutex_unlock(&pg->state_lock);
- devres_add(dev, client); - - return client; } -EXPORT_SYMBOL_GPL(devm_pmic_glink_register_client); +EXPORT_SYMBOL_GPL(pmic_glink_register_client);
int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { diff --git a/drivers/soc/qcom/pmic_glink_altmode.c b/drivers/soc/qcom/pmic_glink_altmode.c index 1e0808b3cb93..e4f5059256e5 100644 --- a/drivers/soc/qcom/pmic_glink_altmode.c +++ b/drivers/soc/qcom/pmic_glink_altmode.c @@ -520,12 +520,17 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev, return ret; }
- altmode->client = devm_pmic_glink_register_client(dev, - altmode->owner_id, - pmic_glink_altmode_callback, - pmic_glink_altmode_pdr_notify, - altmode); - return PTR_ERR_OR_ZERO(altmode->client); + altmode->client = devm_pmic_glink_new_client(dev, + altmode->owner_id, + pmic_glink_altmode_callback, + pmic_glink_altmode_pdr_notify, + altmode); + if (IS_ERR(altmode->client)) + return PTR_ERR(altmode->client); + + pmic_glink_register_client(altmode->client); + + return 0; }
static const struct auxiliary_device_id pmic_glink_altmode_id_table[] = { diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index 16c328497e0b..ac53a81c2a81 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -367,12 +367,16 @@ static int pmic_glink_ucsi_probe(struct auxiliary_device *adev, ucsi->port_orientation[port] = desc; }
- ucsi->client = devm_pmic_glink_register_client(dev, - PMIC_GLINK_OWNER_USBC, - pmic_glink_ucsi_callback, - pmic_glink_ucsi_pdr_notify, - ucsi); - return PTR_ERR_OR_ZERO(ucsi->client); + ucsi->client = devm_pmic_glink_new_client(dev, PMIC_GLINK_OWNER_USBC, + pmic_glink_ucsi_callback, + pmic_glink_ucsi_pdr_notify, + ucsi); + if (IS_ERR(ucsi->client)) + return PTR_ERR(ucsi->client); + + pmic_glink_register_client(ucsi->client); + + return 0; }
static void pmic_glink_ucsi_remove(struct auxiliary_device *adev) diff --git a/include/linux/soc/qcom/pmic_glink.h b/include/linux/soc/qcom/pmic_glink.h index fd124aa18c81..aedde76d7e13 100644 --- a/include/linux/soc/qcom/pmic_glink.h +++ b/include/linux/soc/qcom/pmic_glink.h @@ -23,10 +23,11 @@ struct pmic_glink_hdr {
int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len);
-struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, - unsigned int id, - void (*cb)(const void *, size_t, void *), - void (*pdr)(void *, int), - void *priv); +struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev, + unsigned int id, + void (*cb)(const void *, size_t, void *), + void (*pdr)(void *, int), + void *priv); +void pmic_glink_register_client(struct pmic_glink_client *client);
#endif
On 19/08/2024 01:17, Bjorn Andersson wrote:
As pointed out by Stephen Boyd it is possible that during initialization of the pmic_glink child drivers, the protection-domain notifiers fires, and the associated work is scheduled, before the client registration returns and as a result the local "client" pointer has been initialized.
The outcome of this is a NULL pointer dereference as the "client" pointer is blindly dereferenced.
Timeline provided by Stephen: CPU0 CPU1
ucsi->client = NULL; devm_pmic_glink_register_client() client->pdr_notify(client->priv, pg->client_state) pmic_glink_ucsi_pdr_notify() schedule_work(&ucsi->register_work) <schedule away> pmic_glink_ucsi_register() ucsi_register() pmic_glink_ucsi_read_version() pmic_glink_ucsi_read() pmic_glink_ucsi_read() pmic_glink_send(ucsi->client) <client is NULL BAD> ucsi->client = client // Too late!
This code is identical across the altmode, battery manager and usci child drivers.
Resolve this by splitting the allocation of the "client" object and the registration thereof into two operations.
This only happens if the protection domain registry is populated at the time of registration, which by the introduction of commit '1ebcde047c54 ("soc: qcom: add pd-mapper implementation")' became much more likely.
Reported-by: Amit Pundir amit.pundir@linaro.org Closes: https://lore.kernel.org/all/CAMi1Hd2_a7TjA7J9ShrAbNOd_CoZ3D87twmO5t+nZxC9sX1... Reported-by: Johan Hovold johan@kernel.org Closes: https://lore.kernel.org/all/ZqiyLvP0gkBnuekL@hovoldconsulting.com/ Reported-by: Stephen Boyd swboyd@chromium.org Closes: https://lore.kernel.org/all/CAE-0n52JgfCBWiFQyQWPji8cq_rCsviBpW-m72YitgNfdaE... Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
drivers/power/supply/qcom_battmgr.c | 16 ++++++++++------ drivers/soc/qcom/pmic_glink.c | 28 ++++++++++++++++++---------- drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++++------ drivers/usb/typec/ucsi/ucsi_glink.c | 16 ++++++++++------ include/linux/soc/qcom/pmic_glink.h | 11 ++++++----- 5 files changed, 55 insertions(+), 33 deletions(-)
diff --git a/drivers/power/supply/qcom_battmgr.c b/drivers/power/supply/qcom_battmgr.c index 49bef4a5ac3f..df90a470c51a 100644 --- a/drivers/power/supply/qcom_battmgr.c +++ b/drivers/power/supply/qcom_battmgr.c @@ -1387,12 +1387,16 @@ static int qcom_battmgr_probe(struct auxiliary_device *adev, "failed to register wireless charing power supply\n"); }
- battmgr->client = devm_pmic_glink_register_client(dev,
PMIC_GLINK_OWNER_BATTMGR,
qcom_battmgr_callback,
qcom_battmgr_pdr_notify,
battmgr);
- return PTR_ERR_OR_ZERO(battmgr->client);
- battmgr->client = devm_pmic_glink_new_client(dev, PMIC_GLINK_OWNER_BATTMGR,
qcom_battmgr_callback,
qcom_battmgr_pdr_notify,
battmgr);
- if (IS_ERR(battmgr->client))
return PTR_ERR(battmgr->client);
- pmic_glink_register_client(battmgr->client);
- return 0; }
static const struct auxiliary_device_id qcom_battmgr_id_table[] = { diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 9ebc0ba35947..58ec91767d79 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -66,15 +66,14 @@ static void _devm_pmic_glink_release_client(struct device *dev, void *res) spin_unlock_irqrestore(&pg->client_lock, flags); } -struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv)
+struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
{ struct pmic_glink_client *client; struct pmic_glink *pg = dev_get_drvdata(dev->parent);void *priv)
- unsigned long flags;
client = devres_alloc(_devm_pmic_glink_release_client, sizeof(*client), GFP_KERNEL); if (!client) @@ -85,6 +84,18 @@ struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, client->cb = cb; client->pdr_notify = pdr; client->priv = priv;
- INIT_LIST_HEAD(&client->node);
- devres_add(dev, client);
- return client;
+} +EXPORT_SYMBOL_GPL(devm_pmic_glink_new_client);
+void pmic_glink_register_client(struct pmic_glink_client *client) +{
- struct pmic_glink *pg = client->pg;
- unsigned long flags;
mutex_lock(&pg->state_lock); spin_lock_irqsave(&pg->client_lock, flags); @@ -95,11 +106,8 @@ struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, spin_unlock_irqrestore(&pg->client_lock, flags); mutex_unlock(&pg->state_lock);
- devres_add(dev, client);
- return client; }
-EXPORT_SYMBOL_GPL(devm_pmic_glink_register_client); +EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { diff --git a/drivers/soc/qcom/pmic_glink_altmode.c b/drivers/soc/qcom/pmic_glink_altmode.c index 1e0808b3cb93..e4f5059256e5 100644 --- a/drivers/soc/qcom/pmic_glink_altmode.c +++ b/drivers/soc/qcom/pmic_glink_altmode.c @@ -520,12 +520,17 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev, return ret; }
- altmode->client = devm_pmic_glink_register_client(dev,
altmode->owner_id,
pmic_glink_altmode_callback,
pmic_glink_altmode_pdr_notify,
altmode);
- return PTR_ERR_OR_ZERO(altmode->client);
- altmode->client = devm_pmic_glink_new_client(dev,
altmode->owner_id,
pmic_glink_altmode_callback,
pmic_glink_altmode_pdr_notify,
altmode);
- if (IS_ERR(altmode->client))
return PTR_ERR(altmode->client);
- pmic_glink_register_client(altmode->client);
- return 0; }
static const struct auxiliary_device_id pmic_glink_altmode_id_table[] = { diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index 16c328497e0b..ac53a81c2a81 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -367,12 +367,16 @@ static int pmic_glink_ucsi_probe(struct auxiliary_device *adev, ucsi->port_orientation[port] = desc; }
- ucsi->client = devm_pmic_glink_register_client(dev,
PMIC_GLINK_OWNER_USBC,
pmic_glink_ucsi_callback,
pmic_glink_ucsi_pdr_notify,
ucsi);
- return PTR_ERR_OR_ZERO(ucsi->client);
- ucsi->client = devm_pmic_glink_new_client(dev, PMIC_GLINK_OWNER_USBC,
pmic_glink_ucsi_callback,
pmic_glink_ucsi_pdr_notify,
ucsi);
- if (IS_ERR(ucsi->client))
return PTR_ERR(ucsi->client);
- pmic_glink_register_client(ucsi->client);
- return 0; }
static void pmic_glink_ucsi_remove(struct auxiliary_device *adev) diff --git a/include/linux/soc/qcom/pmic_glink.h b/include/linux/soc/qcom/pmic_glink.h index fd124aa18c81..aedde76d7e13 100644 --- a/include/linux/soc/qcom/pmic_glink.h +++ b/include/linux/soc/qcom/pmic_glink.h @@ -23,10 +23,11 @@ struct pmic_glink_hdr { int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len); -struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv);
+struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv);
+void pmic_glink_register_client(struct pmic_glink_client *client); #endif
Reviewed-by: Neil Armstrong neil.armstrong@linaro.org
On Sun, Aug 18, 2024 at 04:17:37PM -0700, Bjorn Andersson wrote:
As pointed out by Stephen Boyd it is possible that during initialization of the pmic_glink child drivers, the protection-domain notifiers fires, and the associated work is scheduled, before the client registration returns and as a result the local "client" pointer has been initialized.
The outcome of this is a NULL pointer dereference as the "client" pointer is blindly dereferenced.
Timeline provided by Stephen: CPU0 CPU1
ucsi->client = NULL; devm_pmic_glink_register_client() client->pdr_notify(client->priv, pg->client_state) pmic_glink_ucsi_pdr_notify() schedule_work(&ucsi->register_work) <schedule away> pmic_glink_ucsi_register() ucsi_register() pmic_glink_ucsi_read_version() pmic_glink_ucsi_read() pmic_glink_ucsi_read() pmic_glink_send(ucsi->client) <client is NULL BAD> ucsi->client = client // Too late!
This code is identical across the altmode, battery manager and usci child drivers.
Resolve this by splitting the allocation of the "client" object and the registration thereof into two operations.
This only happens if the protection domain registry is populated at the time of registration, which by the introduction of commit '1ebcde047c54 ("soc: qcom: add pd-mapper implementation")' became much more likely.
Reported-by: Amit Pundir amit.pundir@linaro.org Closes: https://lore.kernel.org/all/CAMi1Hd2_a7TjA7J9ShrAbNOd_CoZ3D87twmO5t+nZxC9sX1... Reported-by: Johan Hovold johan@kernel.org Closes: https://lore.kernel.org/all/ZqiyLvP0gkBnuekL@hovoldconsulting.com/ Reported-by: Stephen Boyd swboyd@chromium.org Closes: https://lore.kernel.org/all/CAE-0n52JgfCBWiFQyQWPji8cq_rCsviBpW-m72YitgNfdaE... Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com
drivers/power/supply/qcom_battmgr.c | 16 ++++++++++------ drivers/soc/qcom/pmic_glink.c | 28 ++++++++++++++++++---------- drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++++------ drivers/usb/typec/ucsi/ucsi_glink.c | 16 ++++++++++------ include/linux/soc/qcom/pmic_glink.h | 11 ++++++----- 5 files changed, 55 insertions(+), 33 deletions(-)
diff --git a/drivers/power/supply/qcom_battmgr.c b/drivers/power/supply/qcom_battmgr.c index 49bef4a5ac3f..df90a470c51a 100644 --- a/drivers/power/supply/qcom_battmgr.c +++ b/drivers/power/supply/qcom_battmgr.c @@ -1387,12 +1387,16 @@ static int qcom_battmgr_probe(struct auxiliary_device *adev, "failed to register wireless charing power supply\n"); }
- battmgr->client = devm_pmic_glink_register_client(dev,
PMIC_GLINK_OWNER_BATTMGR,
qcom_battmgr_callback,
qcom_battmgr_pdr_notify,
battmgr);
- return PTR_ERR_OR_ZERO(battmgr->client);
- battmgr->client = devm_pmic_glink_new_client(dev, PMIC_GLINK_OWNER_BATTMGR,
qcom_battmgr_callback,
qcom_battmgr_pdr_notify,
battmgr);
- if (IS_ERR(battmgr->client))
return PTR_ERR(battmgr->client);
- pmic_glink_register_client(battmgr->client);
- return 0;
} static const struct auxiliary_device_id qcom_battmgr_id_table[] = { diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 9ebc0ba35947..58ec91767d79 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -66,15 +66,14 @@ static void _devm_pmic_glink_release_client(struct device *dev, void *res) spin_unlock_irqrestore(&pg->client_lock, flags); } -struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv)
+struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv)
{ struct pmic_glink_client *client; struct pmic_glink *pg = dev_get_drvdata(dev->parent);
- unsigned long flags;
client = devres_alloc(_devm_pmic_glink_release_client, sizeof(*client), GFP_KERNEL); if (!client) @@ -85,6 +84,18 @@ struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, client->cb = cb; client->pdr_notify = pdr; client->priv = priv;
- INIT_LIST_HEAD(&client->node);
- devres_add(dev, client);
- return client;
+} +EXPORT_SYMBOL_GPL(devm_pmic_glink_new_client);
+void pmic_glink_register_client(struct pmic_glink_client *client) +{
- struct pmic_glink *pg = client->pg;
- unsigned long flags;
mutex_lock(&pg->state_lock); spin_lock_irqsave(&pg->client_lock, flags); @@ -95,11 +106,8 @@ struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev, spin_unlock_irqrestore(&pg->client_lock, flags); mutex_unlock(&pg->state_lock);
- devres_add(dev, client);
- return client;
} -EXPORT_SYMBOL_GPL(devm_pmic_glink_register_client); +EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { diff --git a/drivers/soc/qcom/pmic_glink_altmode.c b/drivers/soc/qcom/pmic_glink_altmode.c index 1e0808b3cb93..e4f5059256e5 100644 --- a/drivers/soc/qcom/pmic_glink_altmode.c +++ b/drivers/soc/qcom/pmic_glink_altmode.c @@ -520,12 +520,17 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev, return ret; }
- altmode->client = devm_pmic_glink_register_client(dev,
altmode->owner_id,
pmic_glink_altmode_callback,
pmic_glink_altmode_pdr_notify,
altmode);
- return PTR_ERR_OR_ZERO(altmode->client);
- altmode->client = devm_pmic_glink_new_client(dev,
altmode->owner_id,
pmic_glink_altmode_callback,
pmic_glink_altmode_pdr_notify,
altmode);
- if (IS_ERR(altmode->client))
return PTR_ERR(altmode->client);
- pmic_glink_register_client(altmode->client);
- return 0;
} static const struct auxiliary_device_id pmic_glink_altmode_id_table[] = { diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index 16c328497e0b..ac53a81c2a81 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -367,12 +367,16 @@ static int pmic_glink_ucsi_probe(struct auxiliary_device *adev, ucsi->port_orientation[port] = desc; }
- ucsi->client = devm_pmic_glink_register_client(dev,
PMIC_GLINK_OWNER_USBC,
pmic_glink_ucsi_callback,
pmic_glink_ucsi_pdr_notify,
ucsi);
- return PTR_ERR_OR_ZERO(ucsi->client);
- ucsi->client = devm_pmic_glink_new_client(dev, PMIC_GLINK_OWNER_USBC,
pmic_glink_ucsi_callback,
pmic_glink_ucsi_pdr_notify,
ucsi);
- if (IS_ERR(ucsi->client))
return PTR_ERR(ucsi->client);
- pmic_glink_register_client(ucsi->client);
- return 0;
} static void pmic_glink_ucsi_remove(struct auxiliary_device *adev) diff --git a/include/linux/soc/qcom/pmic_glink.h b/include/linux/soc/qcom/pmic_glink.h index fd124aa18c81..aedde76d7e13 100644 --- a/include/linux/soc/qcom/pmic_glink.h +++ b/include/linux/soc/qcom/pmic_glink.h @@ -23,10 +23,11 @@ struct pmic_glink_hdr { int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len); -struct pmic_glink_client *devm_pmic_glink_register_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv);
+struct pmic_glink_client *devm_pmic_glink_new_client(struct device *dev,
unsigned int id,
void (*cb)(const void *, size_t, void *),
void (*pdr)(void *, int),
void *priv);
+void pmic_glink_register_client(struct pmic_glink_client *client); #endif
Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")' moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which isn't feasible as it's expecting a sleepable context. An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference, but it becomes expecting to see a failing send upon shutting down the remote processor (e.g. during a restart following a firmware crash):
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com --- drivers/soc/qcom/pmic_glink.c | 10 +++++++++- drivers/usb/typec/ucsi/ucsi_glink.c | 28 +++++++++++++++++++++++----- 2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 58ec91767d79..e4747f1d3da5 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -112,8 +112,16 @@ EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { struct pmic_glink *pg = client->pg; + int ret;
- return rpmsg_send(pg->ept, data, len); + mutex_lock(&pg->state_lock); + if (!pg->ept) + ret = -ECONNRESET; + else + ret = rpmsg_send(pg->ept, data, len); + mutex_unlock(&pg->state_lock); + + return ret; } EXPORT_SYMBOL_GPL(pmic_glink_send);
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index ac53a81c2a81..a33056eec83d 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -68,6 +68,9 @@ struct pmic_glink_ucsi {
struct work_struct notify_work; struct work_struct register_work; + spinlock_t state_lock; + unsigned int pdr_state; + unsigned int new_pdr_state;
u8 read_buf[UCSI_BUF_SIZE]; }; @@ -244,8 +247,22 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work); + unsigned long flags; + unsigned int new_state; + + spin_lock_irqsave(&ucsi->state_lock, flags); + new_state = ucsi->new_pdr_state; + spin_unlock_irqrestore(&ucsi->state_lock, flags); + + if (ucsi->pdr_state != SERVREG_SERVICE_STATE_UP) { + if (new_state == SERVREG_SERVICE_STATE_UP) + ucsi_register(ucsi->ucsi); + } else { + if (new_state == SERVREG_SERVICE_STATE_DOWN) + ucsi_unregister(ucsi->ucsi); + }
- ucsi_register(ucsi->ucsi); + ucsi->pdr_state = new_state; }
static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) @@ -269,11 +286,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) static void pmic_glink_ucsi_pdr_notify(void *priv, int state) { struct pmic_glink_ucsi *ucsi = priv; + unsigned long flags;
- if (state == SERVREG_SERVICE_STATE_UP) - schedule_work(&ucsi->register_work); - else if (state == SERVREG_SERVICE_STATE_DOWN) - ucsi_unregister(ucsi->ucsi); + spin_lock_irqsave(&ucsi->state_lock, flags); + ucsi->new_pdr_state = state; + spin_unlock_irqrestore(&ucsi->state_lock, flags); + schedule_work(&ucsi->register_work); }
static void pmic_glink_ucsi_destroy(void *data)
On 19 August 2024 06:17:38 GMT+07:00, Bjorn Andersson quic_bjorande@quicinc.com wrote:
Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")' moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which isn't feasible as it's expecting a sleepable context. An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference, but it becomes expecting to see a failing send upon shutting down the remote processor (e.g. during a restart following a firmware crash):
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
drivers/soc/qcom/pmic_glink.c | 10 +++++++++- drivers/usb/typec/ucsi/ucsi_glink.c | 28 +++++++++++++++++++++++----- 2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 58ec91767d79..e4747f1d3da5 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -112,8 +112,16 @@ EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { struct pmic_glink *pg = client->pg;
- int ret;
- return rpmsg_send(pg->ept, data, len);
- mutex_lock(&pg->state_lock);
- if (!pg->ept)
ret = -ECONNRESET;
- else
ret = rpmsg_send(pg->ept, data, len);
- mutex_unlock(&pg->state_lock);
- return ret;
} EXPORT_SYMBOL_GPL(pmic_glink_send);
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index ac53a81c2a81..a33056eec83d 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -68,6 +68,9 @@ struct pmic_glink_ucsi {
struct work_struct notify_work; struct work_struct register_work;
spinlock_t state_lock;
unsigned int pdr_state;
unsigned int new_pdr_state;
u8 read_buf[UCSI_BUF_SIZE];
}; @@ -244,8 +247,22 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
- unsigned long flags;
- unsigned int new_state;
- spin_lock_irqsave(&ucsi->state_lock, flags);
- new_state = ucsi->new_pdr_state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- if (ucsi->pdr_state != SERVREG_SERVICE_STATE_UP) {
if (new_state == SERVREG_SERVICE_STATE_UP)
ucsi_register(ucsi->ucsi);
- } else {
if (new_state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- }
- ucsi_register(ucsi->ucsi);
- ucsi->pdr_state = new_state;
}
Is there a chance if a race condition if the firmware is restarted quickly, but the system is under heavy mist: - the driver gets DOWN event, updates the state and schedules the work, - the work starts to execute, reads the state, - the driver gets UP event, updates the state, but the work is not rescheduled as it is still executing - the worker finishes unregistering the UCSI.
static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) @@ -269,11 +286,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) static void pmic_glink_ucsi_pdr_notify(void *priv, int state) { struct pmic_glink_ucsi *ucsi = priv;
- unsigned long flags;
- if (state == SERVREG_SERVICE_STATE_UP)
schedule_work(&ucsi->register_work);
- else if (state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- spin_lock_irqsave(&ucsi->state_lock, flags);
- ucsi->new_pdr_state = state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- schedule_work(&ucsi->register_work);
}
static void pmic_glink_ucsi_destroy(void *data)
On Mon, Aug 19, 2024 at 08:16:25AM +0700, Dmitry Baryshkov wrote:
On 19 August 2024 06:17:38 GMT+07:00, Bjorn Andersson quic_bjorande@quicinc.com wrote:
Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")' moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which isn't feasible as it's expecting a sleepable context. An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference, but it becomes expecting to see a failing send upon shutting down the remote processor (e.g. during a restart following a firmware crash):
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
drivers/soc/qcom/pmic_glink.c | 10 +++++++++- drivers/usb/typec/ucsi/ucsi_glink.c | 28 +++++++++++++++++++++++----- 2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 58ec91767d79..e4747f1d3da5 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -112,8 +112,16 @@ EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { struct pmic_glink *pg = client->pg;
- int ret;
- return rpmsg_send(pg->ept, data, len);
- mutex_lock(&pg->state_lock);
- if (!pg->ept)
ret = -ECONNRESET;
- else
ret = rpmsg_send(pg->ept, data, len);
- mutex_unlock(&pg->state_lock);
- return ret;
} EXPORT_SYMBOL_GPL(pmic_glink_send);
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index ac53a81c2a81..a33056eec83d 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -68,6 +68,9 @@ struct pmic_glink_ucsi {
struct work_struct notify_work; struct work_struct register_work;
spinlock_t state_lock;
unsigned int pdr_state;
unsigned int new_pdr_state;
u8 read_buf[UCSI_BUF_SIZE];
}; @@ -244,8 +247,22 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
- unsigned long flags;
- unsigned int new_state;
- spin_lock_irqsave(&ucsi->state_lock, flags);
- new_state = ucsi->new_pdr_state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- if (ucsi->pdr_state != SERVREG_SERVICE_STATE_UP) {
if (new_state == SERVREG_SERVICE_STATE_UP)
ucsi_register(ucsi->ucsi);
- } else {
if (new_state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- }
- ucsi_register(ucsi->ucsi);
- ucsi->pdr_state = new_state;
}
Is there a chance if a race condition if the firmware is restarted quickly, but the system is under heavy mist:
- the driver gets DOWN event, updates the state and schedules the work,
- the work starts to execute, reads the state,
- the driver gets UP event, updates the state, but the work is not rescheduled as it is still executing
- the worker finishes unregistering the UCSI.
I was under the impression that if we reach the point where we start executing the worker, then a second schedule_work() would cause the worker to run again. But I might be mistaken here.
What I do expect though is that if we for some reason don't start executing the work before the state becomes UP again, the UCSI core wouldn't know that the firmware has been reset.
My proposal is to accept this risk for v6.11 (and get the benefit of things actually working) and then take a new swing at getting rid of all these workers for v6.12/13. Does that sound reasonable?
Regards, Bjorn
static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) @@ -269,11 +286,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) static void pmic_glink_ucsi_pdr_notify(void *priv, int state) { struct pmic_glink_ucsi *ucsi = priv;
- unsigned long flags;
- if (state == SERVREG_SERVICE_STATE_UP)
schedule_work(&ucsi->register_work);
- else if (state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- spin_lock_irqsave(&ucsi->state_lock, flags);
- ucsi->new_pdr_state = state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- schedule_work(&ucsi->register_work);
}
static void pmic_glink_ucsi_destroy(void *data)
On 19 August 2024 10:05:49 GMT+07:00, Bjorn Andersson quic_bjorande@quicinc.com wrote:
On Mon, Aug 19, 2024 at 08:16:25AM +0700, Dmitry Baryshkov wrote:
On 19 August 2024 06:17:38 GMT+07:00, Bjorn Andersson quic_bjorande@quicinc.com wrote:
Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")' moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which isn't feasible as it's expecting a sleepable context. An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference, but it becomes expecting to see a failing send upon shutting down the remote processor (e.g. during a restart following a firmware crash):
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
drivers/soc/qcom/pmic_glink.c | 10 +++++++++- drivers/usb/typec/ucsi/ucsi_glink.c | 28 +++++++++++++++++++++++----- 2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 58ec91767d79..e4747f1d3da5 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -112,8 +112,16 @@ EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { struct pmic_glink *pg = client->pg;
- int ret;
- return rpmsg_send(pg->ept, data, len);
- mutex_lock(&pg->state_lock);
- if (!pg->ept)
ret = -ECONNRESET;
- else
ret = rpmsg_send(pg->ept, data, len);
- mutex_unlock(&pg->state_lock);
- return ret;
} EXPORT_SYMBOL_GPL(pmic_glink_send);
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index ac53a81c2a81..a33056eec83d 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -68,6 +68,9 @@ struct pmic_glink_ucsi {
struct work_struct notify_work; struct work_struct register_work;
spinlock_t state_lock;
unsigned int pdr_state;
unsigned int new_pdr_state;
u8 read_buf[UCSI_BUF_SIZE];
}; @@ -244,8 +247,22 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
- unsigned long flags;
- unsigned int new_state;
- spin_lock_irqsave(&ucsi->state_lock, flags);
- new_state = ucsi->new_pdr_state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- if (ucsi->pdr_state != SERVREG_SERVICE_STATE_UP) {
if (new_state == SERVREG_SERVICE_STATE_UP)
ucsi_register(ucsi->ucsi);
- } else {
if (new_state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- }
- ucsi_register(ucsi->ucsi);
- ucsi->pdr_state = new_state;
}
Is there a chance if a race condition if the firmware is restarted quickly, but the system is under heavy mist:
- the driver gets DOWN event, updates the state and schedules the work,
- the work starts to execute, reads the state,
- the driver gets UP event, updates the state, but the work is not rescheduled as it is still executing
- the worker finishes unregistering the UCSI.
I was under the impression that if we reach the point where we start executing the worker, then a second schedule_work() would cause the worker to run again. But I might be mistaken here.
I don't have full source code at hand and the docs only speak about being queued, so it is perfectly possible that I am mistaken here.
What I do expect though is that if we for some reason don't start executing the work before the state becomes UP again, the UCSI core wouldn't know that the firmware has been reset.
My proposal is to accept this risk for v6.11 (and get the benefit of things actually working) and then take a new swing at getting rid of all these workers for v6.12/13. Does that sound reasonable?
Yes, makes sense to me.
Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
Regards, Bjorn
static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) @@ -269,11 +286,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) static void pmic_glink_ucsi_pdr_notify(void *priv, int state) { struct pmic_glink_ucsi *ucsi = priv;
- unsigned long flags;
- if (state == SERVREG_SERVICE_STATE_UP)
schedule_work(&ucsi->register_work);
- else if (state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- spin_lock_irqsave(&ucsi->state_lock, flags);
- ucsi->new_pdr_state = state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- schedule_work(&ucsi->register_work);
}
static void pmic_glink_ucsi_destroy(void *data)
On 19/08/2024 01:17, Bjorn Andersson wrote:
Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")' moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which isn't feasible as it's expecting a sleepable context. An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference, but it becomes expecting to see a failing send upon shutting down the remote processor (e.g. during a restart following a firmware crash):
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
drivers/soc/qcom/pmic_glink.c | 10 +++++++++- drivers/usb/typec/ucsi/ucsi_glink.c | 28 +++++++++++++++++++++++----- 2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 58ec91767d79..e4747f1d3da5 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -112,8 +112,16 @@ EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { struct pmic_glink *pg = client->pg;
- int ret;
- return rpmsg_send(pg->ept, data, len);
- mutex_lock(&pg->state_lock);
- if (!pg->ept)
ret = -ECONNRESET;
- else
ret = rpmsg_send(pg->ept, data, len);
- mutex_unlock(&pg->state_lock);
- return ret; } EXPORT_SYMBOL_GPL(pmic_glink_send);
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index ac53a81c2a81..a33056eec83d 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -68,6 +68,9 @@ struct pmic_glink_ucsi { struct work_struct notify_work; struct work_struct register_work;
- spinlock_t state_lock;
- unsigned int pdr_state;
- unsigned int new_pdr_state;
u8 read_buf[UCSI_BUF_SIZE]; }; @@ -244,8 +247,22 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
- unsigned long flags;
- unsigned int new_state;
- spin_lock_irqsave(&ucsi->state_lock, flags);
- new_state = ucsi->new_pdr_state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- if (ucsi->pdr_state != SERVREG_SERVICE_STATE_UP) {
if (new_state == SERVREG_SERVICE_STATE_UP)
ucsi_register(ucsi->ucsi);
- } else {
if (new_state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- }
- ucsi_register(ucsi->ucsi);
- ucsi->pdr_state = new_state; }
static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) @@ -269,11 +286,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) static void pmic_glink_ucsi_pdr_notify(void *priv, int state) { struct pmic_glink_ucsi *ucsi = priv;
- unsigned long flags;
- if (state == SERVREG_SERVICE_STATE_UP)
schedule_work(&ucsi->register_work);
- else if (state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- spin_lock_irqsave(&ucsi->state_lock, flags);
- ucsi->new_pdr_state = state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- schedule_work(&ucsi->register_work); }
static void pmic_glink_ucsi_destroy(void *data)
Looks good, I'll run it on the 8550/8650 platforms
Reviewed-by: Neil Armstrong neil.armstrong@linaro.org
On Sun, Aug 18, 2024 at 04:17:38PM -0700, Bjorn Andersson wrote:
Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")' moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which isn't feasible as it's expecting a sleepable context. An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference, but it becomes expecting to see a failing send upon shutting down the remote processor (e.g. during a restart following a firmware crash):
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com
drivers/soc/qcom/pmic_glink.c | 10 +++++++++- drivers/usb/typec/ucsi/ucsi_glink.c | 28 +++++++++++++++++++++++----- 2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 58ec91767d79..e4747f1d3da5 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -112,8 +112,16 @@ EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { struct pmic_glink *pg = client->pg;
- int ret;
- return rpmsg_send(pg->ept, data, len);
- mutex_lock(&pg->state_lock);
- if (!pg->ept)
ret = -ECONNRESET;
- else
ret = rpmsg_send(pg->ept, data, len);
- mutex_unlock(&pg->state_lock);
- return ret;
} EXPORT_SYMBOL_GPL(pmic_glink_send); diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index ac53a81c2a81..a33056eec83d 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -68,6 +68,9 @@ struct pmic_glink_ucsi { struct work_struct notify_work; struct work_struct register_work;
- spinlock_t state_lock;
- unsigned int pdr_state;
- unsigned int new_pdr_state;
u8 read_buf[UCSI_BUF_SIZE]; }; @@ -244,8 +247,22 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
- unsigned long flags;
- unsigned int new_state;
- spin_lock_irqsave(&ucsi->state_lock, flags);
- new_state = ucsi->new_pdr_state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- if (ucsi->pdr_state != SERVREG_SERVICE_STATE_UP) {
if (new_state == SERVREG_SERVICE_STATE_UP)
ucsi_register(ucsi->ucsi);
- } else {
if (new_state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- }
- ucsi_register(ucsi->ucsi);
- ucsi->pdr_state = new_state;
} static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) @@ -269,11 +286,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) static void pmic_glink_ucsi_pdr_notify(void *priv, int state) { struct pmic_glink_ucsi *ucsi = priv;
- unsigned long flags;
- if (state == SERVREG_SERVICE_STATE_UP)
schedule_work(&ucsi->register_work);
- else if (state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- spin_lock_irqsave(&ucsi->state_lock, flags);
- ucsi->new_pdr_state = state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- schedule_work(&ucsi->register_work);
} static void pmic_glink_ucsi_destroy(void *data)
-- 2.34.1
On Sun, Aug 18, 2024 at 04:17:38PM -0700, Bjorn Andersson wrote:
Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")'
This commit does not exist, but I think you really meant to refer to
9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping")
and possibly also
635ce0db8956 ("soc: qcom: pmic_glink: don't traverse clients list without a lock")
here.
moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which isn't feasible as it's expecting a sleepable context.
But this is not correct as you say above that the callback has always been made in IRQ context. Then this bug has been there since the introduction of the UCSI driver by commit
62b5412b1f4a ("usb: typec: ucsi: add PMIC Glink UCSI driver")
An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference, but it becomes expecting to see a failing send
Perhaps you can rephrase this bit ("becomes expecting to see").
upon shutting down the remote processor (e.g. during a restart following a firmware crash):
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")
So this should be
Fixes: 62b5412b1f4a ("usb: typec: ucsi: add PMIC Glink UCSI driver")
Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index ac53a81c2a81..a33056eec83d 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -68,6 +68,9 @@ struct pmic_glink_ucsi { struct work_struct notify_work; struct work_struct register_work;
- spinlock_t state_lock;
- unsigned int pdr_state;
- unsigned int new_pdr_state;
Should these be int to match the notify callback (and enum servreg_service_state)?
u8 read_buf[UCSI_BUF_SIZE]; }; @@ -244,8 +247,22 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
- unsigned long flags;
- unsigned int new_state;
Then int here too.
- spin_lock_irqsave(&ucsi->state_lock, flags);
- new_state = ucsi->new_pdr_state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- if (ucsi->pdr_state != SERVREG_SERVICE_STATE_UP) {
if (new_state == SERVREG_SERVICE_STATE_UP)
ucsi_register(ucsi->ucsi);
- } else {
if (new_state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
Do you risk a double deregistration (and UAF/double free) here?
- }
- ucsi_register(ucsi->ucsi);
- ucsi->pdr_state = new_state;
}
Johan
On Mon, Aug 19, 2024 at 05:06:58PM +0200, Johan Hovold wrote:
On Sun, Aug 18, 2024 at 04:17:38PM -0700, Bjorn Andersson wrote:
Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")'
This commit does not exist, but I think you really meant to refer to
9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping")
and possibly also
635ce0db8956 ("soc: qcom: pmic_glink: don't traverse clients list without a lock")
here.
Yeah, I copy-pasted the wrong SHA1. Prior to commit 9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping") the PDR notification happened from a worker with only mutexes held.
moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which isn't feasible as it's expecting a sleepable context.
But this is not correct as you say above that the callback has always been made in IRQ context. Then this bug has been there since the introduction of the UCSI driver by commit
No, I'm stating that commit 9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping") was needed because the client list is traversed under the separate glink callback, which has always been made in IRQ context.
62b5412b1f4a ("usb: typec: ucsi: add PMIC Glink UCSI driver")
An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference, but it becomes expecting to see a failing send
Perhaps you can rephrase this bit ("becomes expecting to see").
Sure.
upon shutting down the remote processor (e.g. during a restart following a firmware crash):
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")
So this should be
Fixes: 62b5412b1f4a ("usb: typec: ucsi: add PMIC Glink UCSI driver")
I think it should be:
9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping")
Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index ac53a81c2a81..a33056eec83d 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -68,6 +68,9 @@ struct pmic_glink_ucsi { struct work_struct notify_work; struct work_struct register_work;
- spinlock_t state_lock;
- unsigned int pdr_state;
- unsigned int new_pdr_state;
Should these be int to match the notify callback (and enum servreg_service_state)?
Ohh my. I made it unsigned because I made it unsigned in pmic_glink, when I wrote that. But as you point out, the type passed around is an enum servreg_service_state and it's mostly handled as a signed int.
That said, pmic_glink actually filters the value space down to UP/DOWN, so making this "bool pdr_up" (pd_running?) and "bool ucsi_registered" would make this cleaner...
u8 read_buf[UCSI_BUF_SIZE]; }; @@ -244,8 +247,22 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
- unsigned long flags;
- unsigned int new_state;
Then int here too.
Yes.
- spin_lock_irqsave(&ucsi->state_lock, flags);
- new_state = ucsi->new_pdr_state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- if (ucsi->pdr_state != SERVREG_SERVICE_STATE_UP) {
if (new_state == SERVREG_SERVICE_STATE_UP)
ucsi_register(ucsi->ucsi);
- } else {
if (new_state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
Do you risk a double deregistration (and UAF/double free) here?
I believe we're good.
Thank you, Bjorn
- }
- ucsi_register(ucsi->ucsi);
- ucsi->pdr_state = new_state;
}
Johan
On Mon, Aug 19, 2024 at 09:45:29AM -0700, Bjorn Andersson wrote:
On Mon, Aug 19, 2024 at 05:06:58PM +0200, Johan Hovold wrote:
On Sun, Aug 18, 2024 at 04:17:38PM -0700, Bjorn Andersson wrote:
Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")'
This commit does not exist, but I think you really meant to refer to
9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping")
and possibly also
635ce0db8956 ("soc: qcom: pmic_glink: don't traverse clients list without a lock")
here.
Yeah, I copy-pasted the wrong SHA1. Prior to commit 9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping") the PDR notification happened from a worker with only mutexes held.
moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which
^^^^^^^^^^^
isn't feasible as it's expecting a sleepable context.
But this is not correct as you say above that the callback has always been made in IRQ context. Then this bug has been there since the introduction of the UCSI driver by commit
No, I'm stating that commit 9329933699b3 ("soc: qcom: pmic_glink: Make client-lock non-sleeping") was needed because the client list is traversed under the separate glink callback, which has always been made in IRQ context.
Ok, got it. But then you meant "atomic context", not "IRQ context", in the paragraph above.
Johan
On Sun, Aug 18, 2024 at 04:17:38PM -0700, Bjorn Andersson wrote:
@@ -68,6 +68,9 @@ struct pmic_glink_ucsi { struct work_struct notify_work; struct work_struct register_work;
- spinlock_t state_lock;
You also never initialise this lock...
Lockdep would have let you know with a big splat.
Johan
When the pmic_glink state is UP and we either receive a protection- domain (PD) notifcation indicating that the PD is going down, or that the whole remoteproc is going down, it's expected that the pmic_glink client instances are notified that their function has gone DOWN.
This is not what the code does, which results in the client state either not updating, or being wrong in many cases. So let's fix the conditions.
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com --- drivers/soc/qcom/pmic_glink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index e4747f1d3da5..cb202a37e8ab 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -191,7 +191,7 @@ static void pmic_glink_state_notify_clients(struct pmic_glink *pg) if (pg->pdr_state == SERVREG_SERVICE_STATE_UP && pg->ept) new_state = SERVREG_SERVICE_STATE_UP; } else { - if (pg->pdr_state == SERVREG_SERVICE_STATE_UP && pg->ept) + if (pg->pdr_state == SERVREG_SERVICE_STATE_DOWN || !pg->ept) new_state = SERVREG_SERVICE_STATE_DOWN; }
On 19 August 2024 06:17:39 GMT+07:00, Bjorn Andersson quic_bjorande@quicinc.com wrote:
When the pmic_glink state is UP and we either receive a protection- domain (PD) notifcation indicating that the PD is going down, or that the whole remoteproc is going down, it's expected that the pmic_glink client instances are notified that their function has gone DOWN.
This is not what the code does, which results in the client state either not updating, or being wrong in many cases. So let's fix the conditions.
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
drivers/soc/qcom/pmic_glink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
On 19/08/2024 01:17, Bjorn Andersson wrote:
When the pmic_glink state is UP and we either receive a protection- domain (PD) notifcation indicating that the PD is going down, or that the whole remoteproc is going down, it's expected that the pmic_glink client instances are notified that their function has gone DOWN.
This is not what the code does, which results in the client state either not updating, or being wrong in many cases. So let's fix the conditions.
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
drivers/soc/qcom/pmic_glink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index e4747f1d3da5..cb202a37e8ab 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -191,7 +191,7 @@ static void pmic_glink_state_notify_clients(struct pmic_glink *pg) if (pg->pdr_state == SERVREG_SERVICE_STATE_UP && pg->ept) new_state = SERVREG_SERVICE_STATE_UP; } else {
if (pg->pdr_state == SERVREG_SERVICE_STATE_UP && pg->ept)
}if (pg->pdr_state == SERVREG_SERVICE_STATE_DOWN || !pg->ept) new_state = SERVREG_SERVICE_STATE_DOWN;
Good catch!
Reviewed-by: Neil Armstrong neil.armstrong@linaro.org
On Sun, Aug 18, 2024 at 04:17:39PM -0700, Bjorn Andersson wrote:
When the pmic_glink state is UP and we either receive a protection- domain (PD) notifcation indicating that the PD is going down, or that the whole remoteproc is going down, it's expected that the pmic_glink client instances are notified that their function has gone DOWN.
This is not what the code does, which results in the client state either not updating, or being wrong in many cases. So let's fix the conditions.
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com
drivers/soc/qcom/pmic_glink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index e4747f1d3da5..cb202a37e8ab 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -191,7 +191,7 @@ static void pmic_glink_state_notify_clients(struct pmic_glink *pg) if (pg->pdr_state == SERVREG_SERVICE_STATE_UP && pg->ept) new_state = SERVREG_SERVICE_STATE_UP; } else {
if (pg->pdr_state == SERVREG_SERVICE_STATE_UP && pg->ept)
}if (pg->pdr_state == SERVREG_SERVICE_STATE_DOWN || !pg->ept) new_state = SERVREG_SERVICE_STATE_DOWN;
On Mon, 19 Aug 2024 at 04:47, Bjorn Andersson quic_bjorande@quicinc.com wrote:
Amit and Johan both reported a NULL pointer dereference in the pmic_glink client code during initialization, and Stephen Boyd pointed out the problem (race condition).
While investigating, and writing the fix, I noticed that ucsi_unregister() is called in atomic context but tries to sleep, and I also noticed that the condition for when to inform the pmic_glink client drivers when the remote has gone down is just wrong.
So, let's fix all three.
As mentioned in the commit message for the UCSI fix, I have a series in the works that makes the GLINK callback happen in a sleepable context, which would remove the need for the clients list to be protected by a spinlock, and removing the work scheduling. This is however not -rc material...
In addition to the NULL pointer dereference, there is the -ECANCELED issue reported here: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ I have not yet been able to either reproduce this or convince myself that this is the same issue.
Thank you for the fixes Bjorn. I'm not able to reproduce that pmic_glink kernel panic on SM8550-HDK anymore.
Tested-by: Amit Pundir amit.pundir@linaro.org
Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
Bjorn Andersson (3): soc: qcom: pmic_glink: Fix race during initialization usb: typec: ucsi: Move unregister out of atomic section soc: qcom: pmic_glink: Actually communicate with remote goes down
drivers/power/supply/qcom_battmgr.c | 16 ++++++++----- drivers/soc/qcom/pmic_glink.c | 40 +++++++++++++++++++++---------- drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++----- drivers/usb/typec/ucsi/ucsi_glink.c | 44 ++++++++++++++++++++++++++--------- include/linux/soc/qcom/pmic_glink.h | 11 +++++---- 5 files changed, 88 insertions(+), 40 deletions(-)
base-commit: 296c871d2904cff2b4742702ef94512ab467a8e3 change-id: 20240818-pmic-glink-v6-11-races-363f5964c339
Best regards,
Bjorn Andersson quic_bjorande@quicinc.com
On Sun, Aug 18, 2024 at 04:17:36PM -0700, Bjorn Andersson wrote:
Amit and Johan both reported a NULL pointer dereference in the pmic_glink client code during initialization, and Stephen Boyd pointed out the problem (race condition).
While investigating, and writing the fix, I noticed that ucsi_unregister() is called in atomic context but tries to sleep, and I also noticed that the condition for when to inform the pmic_glink client drivers when the remote has gone down is just wrong.
So, let's fix all three.
As mentioned in the commit message for the UCSI fix, I have a series in the works that makes the GLINK callback happen in a sleepable context, which would remove the need for the clients list to be protected by a spinlock, and removing the work scheduling. This is however not -rc material...
In addition to the NULL pointer dereference, there is the -ECANCELED issue reported here: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ I have not yet been able to either reproduce this or convince myself that this is the same issue.
Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
What tree are these to go through? I can take them through mine, but if someone else wants to, feel free to route them some other way.
thanks,
greg k-h
On Mon, Aug 19, 2024 at 04:07:41PM +0200, Greg Kroah-Hartman wrote:
On Sun, Aug 18, 2024 at 04:17:36PM -0700, Bjorn Andersson wrote:
Amit and Johan both reported a NULL pointer dereference in the pmic_glink client code during initialization, and Stephen Boyd pointed out the problem (race condition).
While investigating, and writing the fix, I noticed that ucsi_unregister() is called in atomic context but tries to sleep, and I also noticed that the condition for when to inform the pmic_glink client drivers when the remote has gone down is just wrong.
So, let's fix all three.
As mentioned in the commit message for the UCSI fix, I have a series in the works that makes the GLINK callback happen in a sleepable context, which would remove the need for the clients list to be protected by a spinlock, and removing the work scheduling. This is however not -rc material...
In addition to the NULL pointer dereference, there is the -ECANCELED issue reported here: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ I have not yet been able to either reproduce this or convince myself that this is the same issue.
Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
What tree are these to go through? I can take them through mine, but if someone else wants to, feel free to route them some other way.
It's primarily soc/qcom content, so I can pick them through the qcom soc tree.
Regards, Bjorn
thanks,
greg k-h
On Sun, Aug 18, 2024 at 04:17:36PM -0700, Bjorn Andersson wrote:
Amit and Johan both reported a NULL pointer dereference in the pmic_glink client code during initialization, and Stephen Boyd pointed out the problem (race condition).
In addition to the NULL pointer dereference, there is the -ECANCELED issue reported here: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ I have not yet been able to either reproduce this or convince myself that this is the same issue.
I can confirm that I still see the -ECANCELED issue with this series applied:
[ 8.979329] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125) [ 9.004735] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
Johan
On Mon, Aug 19, 2024 at 05:48:26PM +0200, Johan Hovold wrote:
On Sun, Aug 18, 2024 at 04:17:36PM -0700, Bjorn Andersson wrote:
Amit and Johan both reported a NULL pointer dereference in the pmic_glink client code during initialization, and Stephen Boyd pointed out the problem (race condition).
In addition to the NULL pointer dereference, there is the -ECANCELED issue reported here: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ I have not yet been able to either reproduce this or convince myself that this is the same issue.
I can confirm that I still see the -ECANCELED issue with this series applied:
[ 8.979329] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125) [ 9.004735] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
Could you confirm that you're seeing a call to qcom_glink_handle_intent_req_ack() with granted == 0, leading to the transfer failing.
It would also be nice, just for completeness sake to rule out that you do not get a call to qcom_glink_intent_req_abort() here.
Regards, Bjorn
Johan
On Mon, Aug 19, 2024 at 09:53:09AM -0700, Bjorn Andersson wrote:
On Mon, Aug 19, 2024 at 05:48:26PM +0200, Johan Hovold wrote:
I can confirm that I still see the -ECANCELED issue with this series applied:
[ 8.979329] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125) [ 9.004735] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
Could you confirm that you're seeing a call to qcom_glink_handle_intent_req_ack() with granted == 0, leading to the transfer failing.
It appears so:
[ 9.539415] 30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0 [ 9.561750] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
[ 9.448945] 30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0 [ 9.461267] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125) [ 9.469241] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1 [ 9.478968] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
It would also be nice, just for completeness sake to rule out that you do not get a call to qcom_glink_intent_req_abort() here.
And I'm not seeing this function being called.
Johan
linux-stable-mirror@lists.linaro.org