[CCing Mario, who asked for the two suspected commits to be backported]
On 05.05.24 03:12, Micha Albert wrote:
> I have an AMD Radeon 6600 XT GPU in a cheap Thunderbolt eGPU board. In 6.8.7, this works as expected, and my Plymouth screen (including the LUKS password prompt) shows on my 2 monitors connected to the GPU as well as my main laptop screen. Upon entering the password, I'm put into userspace as expected. However, upon upgrading to 6.8.8, I will be greeted with the regular password prompt, but after entering my password and waiting for it to be accepted, my eGPU will reset and not function. I can tell that it resets since I can hear the click of my ATX power supply turning off and on again, and the status LED of the eGPU board goes from green to blue and back to green, all in less than a second.
> I talked to a friend, and we found out that the kernel parameter thunderbolt.host_reset=false fixes the issue. He also thinks that commits cc4c94 (59a54c upstream) and 11371c (ec8162 upstream) look suspicious. I've attached the output of dmesg when the error was occurring, since I'm still able to use my laptop normally when this happens, just not with my eGPU and its connected displays.
Thx for the report. Could you please test whether 6.9-rc6 (or a later snapshot, or -rc7, which should be out in about 18 hours) is affected as well? That would be really important to know.
It would also be great if you could try reverting the two patches you mentioned to see if they really are what's causing this. IIRC there are two more; you might need to revert some or all of them in the order they were applied.
Ciao, Thorsten
P.s.: To be sure the issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression tracking bot:
#regzbot ^introduced v6.8.7..v6.8.8
#regzbot title thunderbolt: eGPU disconnected during boot
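For anyone reproducing the workaround from the report, it can help to confirm that thunderbolt.host_reset=false actually reached the running kernel. The following is a minimal sketch, assuming the command line is read from /proc/cmdline; the helper name get_kernel_param is hypothetical, and values containing quoted spaces are not handled.

```python
def get_kernel_param(cmdline: str, name: str):
    """Return the value of `name` from a kernel command line string.

    Returns None if the parameter is absent, and "" for a bare flag
    given without "=value". If the parameter appears more than once,
    the last occurrence is returned (a choice made by this sketch).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


# Example use on a running system (path assumed to exist on Linux):
#   with open("/proc/cmdline", encoding="utf-8") as f:
#       print(get_kernel_param(f.read(), "thunderbolt.host_reset"))
```

If the helper returns None on an affected boot, the parameter was not passed, and a working eGPU on that boot would not say anything about the host reset.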
On 5/4/24 23:59, Linux regression tracking (Thorsten Leemhuis) wrote:
> [...]
There are two other things that I think would be good to check to understand this issue.
1) Is it related to trusted devices handling?
You can try applying this patch to either 6.8.y or 6.9-rc:
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=iom...
2) Is it because you have amdgpu in your initramfs but not thunderbolt?
If so, there's very likely an ordering issue.
[ 2.325788] [drm] GPU posting now...
[ 30.360701] ACPI: bus type thunderbolt registered
Can you remove amdgpu from your initramfs and wait for it to start up after you pivot rootfs? Does this still happen?
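The ordering question can be checked against a saved dmesg log. The following is a minimal sketch, assuming the log uses the default "[ seconds ] message" prefix and that the two messages quoted above are the relevant markers; the function names are hypothetical.

```python
import re

# Matches the "[   12.345678] message" prefix printed by dmesg.
_DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def first_timestamp(dmesg_text: str, needle: str):
    """Return the timestamp of the first line containing `needle`, or None."""
    for line in dmesg_text.splitlines():
        match = _DMESG_LINE.match(line.strip())
        if match and needle in match.group(2):
            return float(match.group(1))
    return None


def amdgpu_before_thunderbolt(dmesg_text: str):
    """True if GPU posting is logged before the thunderbolt bus registers.

    Returns None when either marker message is missing from the log.
    """
    gpu = first_timestamp(dmesg_text, "GPU posting now")
    tbt = first_timestamp(dmesg_text, "bus type thunderbolt registered")
    if gpu is None or tbt is None:
        return None
    return gpu < tbt
```

For the two lines quoted above, amdgpu posts the GPU at about 2.3 s and the thunderbolt bus registers at about 30.4 s, so the check reports that amdgpu came first.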
On 5/5/2024 07:37, Mario Limonciello wrote:
> [...]
One more thought. When you say it does "not function", is it authorized in thunderbolt sysfs?
See https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/thun...
Is it still showing up in lspci?
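The authorization state can be read from sysfs without extra tools. The following is a minimal sketch, assuming devices appear under /sys/bus/thunderbolt/devices with an "authorized" attribute as described in the admin guide; the function name is hypothetical, and entries without that attribute (such as the domain) are skipped.

```python
from pathlib import Path


def thunderbolt_authorization(sysfs_root: str = "/sys/bus/thunderbolt/devices"):
    """Map each Thunderbolt device name to the text of its `authorized` attribute.

    Returns an empty dict if the sysfs directory does not exist,
    for example when the thunderbolt driver is not loaded.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for device in sorted(root.iterdir()):
        attr = device / "authorized"
        if attr.is_file():
            result[device.name] = attr.read_text().strip()
    return result


# Example use on a running system:
#   for name, state in thunderbolt_authorization().items():
#       print(name, "authorized =", state)
```

A value of "0" for the eGPU would mean no PCIe tunnel has been set up for it, which would also explain it being absent from lspci.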
linux-stable-mirror@lists.linaro.org