On Tue, Jan 23, 2024 at 06:25:52PM +0100, Johan Hovold wrote:
On Mon, Jan 22, 2024 at 12:26:15PM -0600, Bjorn Helgaas wrote:
On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote:
I never got a reply to this one so resending with updated Subject in case it got buried in your inbox.
I did see it, but decided it was better to fix the resume problem that caused an unintended reboot, even though that fix meant breaking lockdep again, since I don't think we have user reports of the potential deadlock that lockdep finds.
That may be because I fixed the previous regression in 6.7-rc1 before any users had a chance to hit the deadlock on Qualcomm platforms.
I can easily trigger a deadlock on the X13s by instrumenting 6.7-final with a delay to increase the race window.
And any user hitting this occasionally is likely not going to be able to track it down to this lock inversion (unless they have lockdep enabled).
I agree, it's a problem we need to fix.
08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a start at fixing other problems and also improving the ASPM style, so I hope somebody steps up to fix both it and the lockdep issue. I haven't looked at it enough to have a preference for *how* to fix it.
Ok, but since you were the one who introduced the locking regression in 6.7-final, shouldn't you look into fixing it?
Especially if there were alternatives to restoring the offending commit which would solve the underlying issue for the resume failure without breaking other platforms.
Did somebody propose an alternate patch? If so, I missed it, but we could look at it now.
I don't want to spend more time on this if the offending commit could simply be reverted.
I don't quite follow. By simply reverting, do you mean to revert f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")? IIUC that would break Michael's machine again.
Bjorn