On 2021-11-25 07:01 PM, Manivannan Sadhasivam wrote:
On Wed, Nov 24, 2021 at 06:52:21PM +0530, Manivannan Sadhasivam wrote:
Some devices tend to trigger SYS_ERR interrupt while the host handling SYS_ERR state of the device during power up. This creates a race condition and causes a failure in booting up the device.
The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR handling in mhi_async_power_up(). Once the host detects that the device is in SYS_ERR state, it issues MHI_RESET and waits for the device to process the reset request. During this time, the device triggers SYS_ERR interrupt to the host and host starts handling SYS_ERR execution.
Since device does not actually trigger a SYS_ERR interrupt, we could say "device triggers BHI interrupt acknowledging the MHI RESET sent by host but does not clear the MHI state to reset or ready and leaves it as SYS_ERR" or any such wording.
So by the time the device has completed reset, host starts SYS_ERR handling. This causes the race condition and the modem fails to boot.
Hence, register the IRQ handler only after handling the SYS_ERR check to avoid getting spurious IRQs from the device.
Cc: stable@vger.kernel.org Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up") Reported-by: Aleksander Morgado aleksander@aleksander.es Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
Hemant, Bhaumik, Jeff: Can you please do the review again?
Thanks, Mani
Changes in v4:
- Reverted the change that moved BHI_INTVEC as that was causing issue
as reported by Aleksander.
Changes in v3:
- Moved BHI_INTVEC setup after irq setup
- Used interval_us as the delay for the polling API
Changes in v2:
- Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
drivers/bus/mhi/core/pm.c | 33 +++++++++++---------------------- 1 file changed, 11 insertions(+), 22 deletions(-)
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c index fb99e3727155..21484a61bbed 100644 --- a/drivers/bus/mhi/core/pm.c +++ b/drivers/bus/mhi/core/pm.c @@ -1038,7 +1038,7 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) enum mhi_ee_type current_ee; enum dev_st_transition next_state; struct device *dev = &mhi_cntrl->mhi_dev->dev;
- u32 val;
- u32 interval_us = 25000; /* poll register field every 25
milliseconds */ int ret;
dev_info(dev, "Requested to power ON\n"); @@ -1055,10 +1055,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) mutex_lock(&mhi_cntrl->pm_mutex); mhi_cntrl->pm_state = MHI_PM_DISABLE;
- ret = mhi_init_irq_setup(mhi_cntrl);
- if (ret)
goto error_setup_irq;
- /* Setup BHI INTVEC */ write_lock_irq(&mhi_cntrl->pm_lock); mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
@@ -1072,7 +1068,7 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) dev_err(dev, "%s is not a valid EE for power on\n", TO_MHI_EXEC_STR(current_ee)); ret = -EIO;
goto error_async_power_up;
goto error_setup_irq;
We're using the label "error_setup_irq" even though there has not been any prior IRQ setup done. Can we use the label "error_async_power_up" instead?
}
state = mhi_get_mhi_state(mhi_cntrl); @@ -1081,20 +1077,12 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
if (state == MHI_STATE_SYS_ERR) { mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
ret = wait_event_timeout(mhi_cntrl->state_event,
MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state) ||
mhi_read_reg_field(mhi_cntrl,
mhi_cntrl->regs,
MHICTRL,
MHICTRL_RESET_MASK,
MHICTRL_RESET_SHIFT,
&val) ||
!val,
msecs_to_jiffies(mhi_cntrl->timeout_ms));
if (!ret) {
ret = -EIO;
ret = mhi_poll_reg_field(mhi_cntrl, mhi_cntrl->regs, MHICTRL,
MHICTRL_RESET_MASK, MHICTRL_RESET_SHIFT, 0,
interval_us);
if (ret) { dev_info(dev, "Failed to reset MHI due to syserr state\n");
goto error_async_power_up;
goto error_setup_irq;
Same here
} /*
@@ -1104,6 +1092,10 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0); }
- ret = mhi_init_irq_setup(mhi_cntrl);
- if (ret)
goto error_setup_irq;
And here
- /* Transition to next state */ next_state = MHI_IN_PBL(current_ee) ? DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;
@@ -1116,9 +1108,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
return 0;
-error_async_power_up:
- mhi_deinit_free_irq(mhi_cntrl);
error_setup_irq:
s/error_setup_irq/error_async_power_up
mhi_cntrl->pm_state = MHI_PM_DISABLE; mutex_unlock(&mhi_cntrl->pm_mutex); -- 2.25.1