Hey Mani,
Some devices tend to trigger a SYS_ERR interrupt while the host is handling the SYS_ERR state of the device during power up. This creates a race condition and causes the device to fail to boot.
The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR handling in mhi_async_power_up(). Once the host detects that the device is in SYS_ERR state, it issues MHI_RESET and waits for the device to process the reset request. During this time, the device triggers a SYS_ERR interrupt to the host, and the host starts its SYS_ERR handling.
So by the time the device has completed the reset, the host has started SYS_ERR handling. This causes the race condition and the modem fails to boot.
Hence, register the IRQ handler only after handling the SYS_ERR check to avoid getting spurious IRQs from the device.
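To make the ordering argument concrete, here is a minimal user-space sketch of the race. It is only a model under stated assumptions: struct model_dev, model_raise_syserr_irq(), model_reset() and model_power_up() are hypothetical names invented for illustration, not MHI core APIs, and the device is assumed to raise exactly one late SYS_ERR interrupt while it processes the reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the host/device state; not the real MHI structures. */
struct model_dev {
	bool in_syserr;      /* device currently reports SYS_ERR */
	bool irq_registered; /* host IRQ handler is installed */
	int syserr_handled;  /* times the host's SYS_ERR IRQ path ran */
};

/* The device raises SYS_ERR; the host only reacts if a handler is installed. */
static void model_raise_syserr_irq(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->irq_registered)
		dev->syserr_handled++;
}

/*
 * The host issues a reset. Assumption: the device fires one late SYS_ERR
 * interrupt while processing the reset, then leaves the SYS_ERR state.
 */
static void model_reset(struct model_dev *dev)
{
	model_raise_syserr_irq(dev);
	dev->in_syserr = false;
}

/*
 * Returns how many times the IRQ-driven SYS_ERR path ran while the
 * power-up path was still handling the reset (non-zero means a race).
 */
static int model_power_up(bool register_irq_first)
{
	struct model_dev dev = { .in_syserr = true };

	if (register_irq_first)
		dev.irq_registered = true;

	if (dev.in_syserr)
		model_reset(&dev);

	/* With the proposed ordering, the handler is installed only here. */
	dev.irq_registered = true;

	return dev.syserr_handled;
}
```

In this model, model_power_up(true) reports one competing SYS_ERR handling run, while model_power_up(false) reports none, which mirrors the reasoning for registering the IRQ handler after the SYS_ERR check.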
Cc: stable@vger.kernel.org
Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up")
Reported-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Changes in v3:
- Moved BHI_INTVEC setup after irq setup
- Used interval_us as the delay for the polling API
Changes in v2:
- Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
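For readers unfamiliar with the polling approach mentioned in the changelog, the following is a rough user-space sketch of polling a register field with a fixed interval and a timeout. It does not reproduce mhi_poll_reg_field() or its signature; poll_field(), struct fake_dev, fake_read(), fake_delay() and the FAKE_RESET_* constants are all hypothetical and exist only to illustrate the idea of using the interval as the per-iteration delay.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*read_reg_fn)(void *ctx);
typedef void (*delay_fn)(void *ctx, uint32_t usecs);

/*
 * Poll a register field until it equals 'want'. Sleeps 'interval_us'
 * between reads and gives up after roughly 'timeout_us'.
 * Returns 0 on success, -1 on timeout (a real driver would likely
 * return a proper errno value instead).
 */
static int poll_field(read_reg_fn read_reg, delay_fn delay, void *ctx,
		      uint32_t mask, uint32_t shift, uint32_t want,
		      uint32_t interval_us, uint32_t timeout_us)
{
	uint32_t retries;

	assert(read_reg != NULL && delay != NULL);
	assert(interval_us > 0);

	retries = timeout_us / interval_us;
	for (;;) {
		uint32_t val = (read_reg(ctx) & mask) >> shift;

		if (val == want)
			return 0;
		if (retries == 0)
			return -1;
		retries--;
		delay(ctx, interval_us);
	}
}

/* Hypothetical fake device: the reset bit stays set for 'busy_reads' reads. */
#define FAKE_RESET_MASK  0x2u
#define FAKE_RESET_SHIFT 1

struct fake_dev {
	int busy_reads;    /* reads that still report the reset bit set */
	uint32_t slept_us; /* total simulated delay */
};

static uint32_t fake_read(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->busy_reads > 0) {
		d->busy_reads--;
		return FAKE_RESET_MASK;
	}
	return 0;
}

/* Records the requested delay instead of actually sleeping. */
static void fake_delay(void *ctx, uint32_t usecs)
{
	struct fake_dev *d = ctx;

	d->slept_us += usecs;
}
```

Passing the read and delay functions as callbacks keeps the sketch testable without real hardware; the fake device simply counts how long the poller would have slept.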
I tried this v3 patch and I'm not sure if it's working properly in my setup; not all boots are successfully bringing the modem up.
Ouch!
Once I installed it, I kept getting this kind of log on every boot:
[ 7.030407] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600000fff 64bit]
[ 7.038984] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
[ 7.045814] mhi-pci-generic 0000:01:00.0: using shared MSI
[ 7.052191] mhi mhi0: Requested to power ON
[ 7.168042] mhi mhi0: Power on setup success
[ 7.168141] mhi mhi0: Wait for device to enter SBL or Mission mode
[ 15.687938] mhi-pci-generic 0000:01:00.0: failed to suspend device: -16
[...]
I didn't try the v1 or v2 patches (sorry!), so not sure if the issues come in this last iteration or in an earlier one. Do you want me to try with v1 and v2 as well?
Yes, please. Nothing changed other than moving the BHI_INTVEC programming.
Or if you want to do it quickly, please test the diff below:
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index ee0515a25e46..21484a61bbed 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -1055,7 +1055,9 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
 	mutex_lock(&mhi_cntrl->pm_mutex);
 	mhi_cntrl->pm_state = MHI_PM_DISABLE;
 
+	/* Setup BHI INTVEC */
 	write_lock_irq(&mhi_cntrl->pm_lock);
+	mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
 	mhi_cntrl->pm_state = MHI_PM_POR;
 	mhi_cntrl->ee = MHI_EE_MAX;
 	current_ee = mhi_get_exec_env(mhi_cntrl);
@@ -1094,9 +1096,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
 	if (ret)
 		goto error_setup_irq;
 
-	/* Setup BHI INTVEC */
-	mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
-
 	/* Transition to next state */
 	next_state = MHI_IN_PBL(current_ee) ?
 		DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;
I tested that additional diff on top of v3, and so far so good; I did 5 soft reboots and 5 hard boots and they were all successful.