On Tue, Apr 16, 2024 at 08:49:53AM +0200, Niklas Cassel wrote:
On Tue, Apr 16, 2024 at 11:12:35AM +0530, Manivannan Sadhasivam wrote:
Rockchip platforms use 'GPIO_ACTIVE_HIGH' flag in the devicetree definition for ep_gpio. This means, whatever the logical value set by the driver for the ep_gpio, physical line will output the same logic level.
For instance,
gpiod_set_value_cansleep(rockchip->ep_gpio, 0); --> Level low gpiod_set_value_cansleep(rockchip->ep_gpio, 1); --> Level high
But while requesting the ep_gpio, GPIOD_OUT_HIGH flag is currently used. Now, this also causes the physical line to output 'high' creating trouble for endpoint devices during host reboot.
When host reboot happens, the ep_gpio will initially output 'low' due to the GPIO getting reset to its POR value. Then during host controller probe, it will output 'high' due to GPIOD_OUT_HIGH flag. Then during rockchip_pcie_host_init_port(), it will first output 'low' and then 'high' indicating the completion of controller initialization.
On the endpoint side, each output 'low' of ep_gpio is accounted for PERST# assert and 'high' for PERST# deassert. With the above mentioned flow during host reboot, endpoint will witness below state changes for PERST#:
(1) PERST# assert - GPIO POR state (2) PERST# deassert - GPIOD_OUT_HIGH while requesting GPIO (3) PERST# assert - rockchip_pcie_host_init_port() (4) PERST# deassert - rockchip_pcie_host_init_port()
Now the time interval between (2) and (3) is very short as both happen during the driver probe(), and this results in a race in the endpoint. Because, before completing the PERST# deassertion in (2), endpoint got another PERST# assert in (3).
A proper way to fix this issue is to change the GPIOD_OUT_HIGH flag in (2) to GPIOD_OUT_LOW. Because the usual convention is to request the GPIO with a state corresponding to its 'initial/default' value and let the driver change the state of the GPIO when required.
As per that, the ep_gpio should be requested with GPIOD_OUT_LOW as it corresponds to the POR value of '0' (PERST# assert in the endpoint). Then the driver can change the state of the ep_gpio later in rockchip_pcie_host_init_port() as per the initialization sequence.
This fixes the firmware crash issue in Qcom based modems connected to Rockpro64 based board.
Cc: stable@vger.kernel.org # 4.9 Reported-by: Slark Xiao slark_xiao@163.com Closes: https://lore.kernel.org/mhi/20240402045647.GG2933@thinkpad/ Fixes: e77f847df54c ("PCI: rockchip: Add Rockchip PCIe controller support") Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
Reviewed-by: Niklas Cassel cassel@kernel.org
I sent a similar fix for the DWC-based rockchip driver a few weeks ago: https://lore.kernel.org/linux-pci/20240327152531.814392-1-cassel@kernel.org/
What a coincidence :)
If your fix is picked up, it would be nice if mine got picked up as well, such that both drivers get fixed.
I can see the same issue in drivers/pci/controller/dwc/pcie-histb.c but the severity is high in that. The driver assumes that the PERST# polarity is ACTIVE_LOW while poplar devicetree defines ACTIVE_HIGH [1]. And there is no external polarity inversion in the PCB.
I don't know if anyone ever validated PCIe on that board. I will check internally.
But this situation is not ideal IMO. The drivers and DTs are not consistent w.r.t PERST# and WAKE# handling.
- Mani
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch...