On 19 August 2024 10:05:49 GMT+07:00, Bjorn Andersson quic_bjorande@quicinc.com wrote:
On Mon, Aug 19, 2024 at 08:16:25AM +0700, Dmitry Baryshkov wrote:
On 19 August 2024 06:17:38 GMT+07:00, Bjorn Andersson quic_bjorande@quicinc.com wrote:
Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")' moved the pmic_glink client list under a spinlock, as it is accessed by the rpmsg/glink callback, which in turn is invoked from IRQ context.
This means that ucsi_unregister() is now called from IRQ context, which isn't feasible as it's expecting a sleepable context. An effort is under way to get GLINK to invoke its callbacks in a sleepable context, but until then lets schedule the unregistration.
A side effect of this is that ucsi_unregister() can now happen after the remote processor, and thereby the communication link with it, is gone. pmic_glink_send() is amended with a check to avoid the resulting NULL pointer dereference, but it becomes expecting to see a failing send upon shutting down the remote processor (e.g. during a restart following a firmware crash):
ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
Fixes: caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson quic_bjorande@quicinc.com
drivers/soc/qcom/pmic_glink.c | 10 +++++++++- drivers/usb/typec/ucsi/ucsi_glink.c | 28 +++++++++++++++++++++++----- 2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 58ec91767d79..e4747f1d3da5 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -112,8 +112,16 @@ EXPORT_SYMBOL_GPL(pmic_glink_register_client); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { struct pmic_glink *pg = client->pg;
- int ret;
- return rpmsg_send(pg->ept, data, len);
- mutex_lock(&pg->state_lock);
- if (!pg->ept)
ret = -ECONNRESET;
- else
ret = rpmsg_send(pg->ept, data, len);
- mutex_unlock(&pg->state_lock);
- return ret;
} EXPORT_SYMBOL_GPL(pmic_glink_send);
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c index ac53a81c2a81..a33056eec83d 100644 --- a/drivers/usb/typec/ucsi/ucsi_glink.c +++ b/drivers/usb/typec/ucsi/ucsi_glink.c @@ -68,6 +68,9 @@ struct pmic_glink_ucsi {
struct work_struct notify_work; struct work_struct register_work;
spinlock_t state_lock;
unsigned int pdr_state;
unsigned int new_pdr_state;
u8 read_buf[UCSI_BUF_SIZE];
}; @@ -244,8 +247,22 @@ static void pmic_glink_ucsi_notify(struct work_struct *work) static void pmic_glink_ucsi_register(struct work_struct *work) { struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
- unsigned long flags;
- unsigned int new_state;
- spin_lock_irqsave(&ucsi->state_lock, flags);
- new_state = ucsi->new_pdr_state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- if (ucsi->pdr_state != SERVREG_SERVICE_STATE_UP) {
if (new_state == SERVREG_SERVICE_STATE_UP)
ucsi_register(ucsi->ucsi);
- } else {
if (new_state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- }
- ucsi_register(ucsi->ucsi);
- ucsi->pdr_state = new_state;
}
Is there a chance if a race condition if the firmware is restarted quickly, but the system is under heavy mist:
- the driver gets DOWN event, updates the state and schedules the work,
- the work starts to execute, reads the state,
- the driver gets UP event, updates the state, but the work is not rescheduled as it is still executing
- the worker finishes unregistering the UCSI.
I was under the impression that if we reach the point where we start executing the worker, then a second schedule_work() would cause the worker to run again. But I might be mistaken here.
I don't have full source code at hand and the docs only speak about being queued, so it is perfectly possible that I am mistaken here.
What I do expect though is that if we for some reason don't start executing the work before the state becomes UP again, the UCSI core wouldn't know that the firmware has been reset.
My proposal is to accept this risk for v6.11 (and get the benefit of things actually working) and then take a new swing at getting rid of all these workers for v6.12/13. Does that sound reasonable?
Yes, makes sense to me.
Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
Regards, Bjorn
static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) @@ -269,11 +286,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv) static void pmic_glink_ucsi_pdr_notify(void *priv, int state) { struct pmic_glink_ucsi *ucsi = priv;
- unsigned long flags;
- if (state == SERVREG_SERVICE_STATE_UP)
schedule_work(&ucsi->register_work);
- else if (state == SERVREG_SERVICE_STATE_DOWN)
ucsi_unregister(ucsi->ucsi);
- spin_lock_irqsave(&ucsi->state_lock, flags);
- ucsi->new_pdr_state = state;
- spin_unlock_irqrestore(&ucsi->state_lock, flags);
- schedule_work(&ucsi->register_work);
}
static void pmic_glink_ucsi_destroy(void *data)