Hi, this is your Linux kernel regression tracker speaking.
I noticed a regression report in bugzilla.kernel.org. As many (most?) kernel developers don't keep an eye on it, I decided to forward it by mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616 :
Andreas 2022-10-22 14:25:32 UTC
Created attachment 303074 (dmesg)
6.0.2 works.
On 6.0.3 the system is very sluggish, with graphical glitches all over the place in the KDE Plasma Desktop on X11 (no graphical glitches when using Wayland, but it is sluggish there too). SDDM works fine.
Hardware: Lenovo Legion 5 Pro 16ACH6H: AMD Ryzen 7 5800H "Cezanne", hybrid graphics with AMD "Green Sardine" (Vega 8 GCN 5.1, AMDGPU) and Nvidia GeForce RTX 3070 Mobile (GA104M; it doesn't work with nouveau, and I'm not using the proprietary nvidia driver).
Comment 1 Andreas 2022-10-22 14:27:15 UTC
Created attachment 303075 (my kernel .config for 6.0.3)
The only change is that CONFIG_HID_TOPRE was added in 6.0.3; otherwise it is identical to my .config for 6.0.2.
Comment 2 Andreas 2022-10-22 14:51:23 UTC
In /var/log/Xorg.0.log the only obvious difference is the last line:
---- snap ----
randr: falling back to unsynchronized pixmap sharing
---- snap ----
The line is present when I boot 6.0.3, but not when I boot 6.0.2.
(Obviously this is when I log in to KDE with X11 from SDDM, not with Wayland.)
Comment 3 Andreas 2022-10-22 22:10:19 UTC
I did a git bisect on the stable kernels, with 6.0.3 as bad and 6.0.2 as good; this is the result:

cfecfc98a78d97a49807531b5b224459bda877de is the first bad commit
commit cfecfc98a78d97a49807531b5b224459bda877de (HEAD, refs/bisect/bad)
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Mon Jul 18 09:23:18 2022 +0200

    video/aperture: Disable and unregister sysfb devices via aperture helpers

    [ Upstream commit 5e01376124309b4dbd30d413f43c0d9c2f60edea ]

    Call sysfb_disable() before removing conflicting devices in aperture
    helpers. Fixes sysfb state if fbdev has been disabled.

    Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
    Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
    Fixes: fb84efa28a48 ("drm/aperture: Run fbdev removal before internal helpers")
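The bisection in comment 3 is, in essence, a binary search over the ordered commits between a known-good and a known-bad release, which is why it needs only about log2(N) build-and-boot cycles. A minimal C sketch of that idea, assuming a purely linear history and that the regression, once introduced, stays present; `first_bad_commit()` and its `is_bad` array are hypothetical stand-ins for actually building, booting and testing each commit, and are not part of git.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of "git bisect" (hypothetical helper, not part of git).
 * is_bad[i] says whether a kernel built from commit i (ordered oldest
 * to newest) shows the regression; in real life each lookup is a
 * build + boot + test cycle.
 * Preconditions: commit 0 is known good, commit n - 1 is known bad,
 * and no commit after the first bad one is good again.
 * Returns the index of the first bad commit and stores the number of
 * commits that had to be tested in *tests (if tests is non-NULL).
 */
size_t first_bad_commit(const bool *is_bad, size_t n, size_t *tests)
{
    assert(n >= 2 && !is_bad[0] && is_bad[n - 1]);

    size_t good = 0;    /* e.g. the v6.0.2 tag */
    size_t bad = n - 1; /* e.g. the v6.0.3 tag */
    size_t count = 0;

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        count++;
        if (is_bad[mid])
            bad = mid;  /* regression already present: search earlier */
        else
            good = mid; /* still fine: search later */
    }
    if (tests != NULL)
        *tests = count;
    return bad;
}
```

With eight commits where the regression first appears at index 3, `first_bad_commit()` returns 3 after testing three commits. git itself additionally copes with merges and untestable commits (git bisect skip), which this toy model ignores.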
Comment 4 Andreas 2022-10-22 22:11:51 UTC
Link to the suspect patch:
https://patchwork.freedesktop.org/patch/msgid/20220718072322.8927-8-tzimmerm... (or https://patchwork.freedesktop.org/patch/494608/)
Comment 5 Andreas 2022-10-22 22:38:14 UTC
Okay, so I reverted v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch on stable 6.0.3 and the fault is gone.
I always logged out immediately, which worked (even though everything was very, very sluggish). Also, when I killed the X session within a couple of seconds (15 or so), no error was shown (I used "systemctl stop sddm" from another virtual console).
Noteworthy: I once compiled a kernel from within the Plasma Desktop while it was sluggish. The kernel compiled fine. When it had finished, I moved the mouse to reboot, at which point the system froze completely and I had to hard-reset it.
When the session kept running for more than 15 seconds, the fault looked like this (dmesg):
---- snap ----
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 7 jiffies s: 165 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x00000008
Call Trace:
 <TASK>
 ? commit_tail+0xd7/0x130
 ? drm_atomic_helper_commit+0x126/0x150
 ? drm_atomic_commit+0xa4/0xe0
 ? drm_plane_get_damage_clips.cold+0x1c/0x1c
 ? drm_atomic_helper_dirtyfb+0x19e/0x280
 ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? drm_ioctl_kernel+0xc4/0x150
 ? drm_ioctl+0x246/0x3f0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? __x64_sys_ioctl+0x91/0xd0
 ? do_syscall_64+0x60/0xd0
 ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
 </TASK>
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 29 jiffies s: 165 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x00000008
Call Trace:
 <TASK>
 ? commit_tail+0xd7/0x130
 ? drm_atomic_helper_commit+0x126/0x150
 ? drm_atomic_commit+0xa4/0xe0
 ? drm_plane_get_damage_clips.cold+0x1c/0x1c
 ? drm_atomic_helper_dirtyfb+0x19e/0x280
 ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? drm_ioctl_kernel+0xc4/0x150
 ? drm_ioctl+0x246/0x3f0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? __x64_sys_ioctl+0x91/0xd0
 ? do_syscall_64+0x60/0xd0
 ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
 </TASK>
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 8 jiffies s: 169 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x0000400e
Call Trace:
 <TASK>
 ? memcpy_toio+0x76/0xc0
 ? drm_fb_memcpy_toio+0x76/0xb0
 ? drm_fb_blit_toio+0x75/0x2b0
 ? simpledrm_simple_display_pipe_update+0x132/0x150
 ? drm_atomic_helper_commit_planes+0xb6/0x230
 ? drm_atomic_helper_commit_tail+0x44/0x80
 ? commit_tail+0xd7/0x130
 ? drm_atomic_helper_commit+0x126/0x150
 ? drm_atomic_commit+0xa4/0xe0
 ? drm_plane_get_damage_clips.cold+0x1c/0x1c
 ? drm_atomic_helper_dirtyfb+0x19e/0x280
 ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? drm_ioctl_kernel+0xc4/0x150
 ? drm_ioctl+0x246/0x3f0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? __x64_sys_ioctl+0x91/0xd0
 ? do_syscall_64+0x60/0xd0
 ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
 </TASK>
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 30 jiffies s: 169 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x0000400e
Call Trace:
 <TASK>
 ? memcpy_toio+0x76/0xc0
 ? memcpy_toio+0x1b/0xc0
 ? drm_fb_memcpy_toio+0x76/0xb0
 ? drm_fb_blit_toio+0x75/0x2b0
 ? simpledrm_simple_display_pipe_update+0x132/0x150
 ? drm_atomic_helper_commit_planes+0xb6/0x230
 ? drm_atomic_helper_commit_tail+0x44/0x80
 ? commit_tail+0xd7/0x130
 ? drm_atomic_helper_commit+0x126/0x150
 ? drm_atomic_commit+0xa4/0xe0
 ? drm_plane_get_damage_clips.cold+0x1c/0x1c
 ? drm_atomic_helper_dirtyfb+0x19e/0x280
 ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? drm_ioctl_kernel+0xc4/0x150
 ? drm_ioctl+0x246/0x3f0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? __x64_sys_ioctl+0x91/0xd0
 ? do_syscall_64+0x60/0xd0
 ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
 </TASK>
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 52 jiffies s: 169 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x0000400e
Call Trace:
 <TASK>
 ? memcpy_toio+0x76/0xc0
 ? memcpy_toio+0x1b/0xc0
 ? drm_fb_memcpy_toio+0x76/0xb0
 ? drm_fb_blit_toio+0x75/0x2b0
 ? simpledrm_simple_display_pipe_update+0x132/0x150
 ? drm_atomic_helper_commit_planes+0xb6/0x230
 ? drm_atomic_helper_commit_tail+0x44/0x80
 ? commit_tail+0xd7/0x130
 ? drm_atomic_helper_commit+0x126/0x150
 ? drm_atomic_commit+0xa4/0xe0
 ? drm_plane_get_damage_clips.cold+0x1c/0x1c
 ? drm_atomic_helper_dirtyfb+0x19e/0x280
 ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? drm_ioctl_kernel+0xc4/0x150
 ? drm_ioctl+0x246/0x3f0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? __x64_sys_ioctl+0x91/0xd0
 ? do_syscall_64+0x60/0xd0
 ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
 </TASK>
traps: avahi-ml[4447] general protection fault ip:7fdde6a37bc1 sp:7fdde07fc920 error:0 in module-zeroconf-publish.so[7fdde6a37000+3000]
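The later traces show the X server (task X) spending its time in memcpy_toio(), called from drm_fb_blit_toio() under simpledrm_simple_display_pipe_update(): on dirtyfb ioctls the CPU copies the damaged area of the framebuffer into the firmware framebuffer. A back-of-the-envelope C sketch of how much data that can be; the 2560x1600 resolution, 4 bytes per pixel (XRGB8888) and 60 full-screen updates per second are assumptions for illustration only and do not come from the report, and `blit_bytes()` / `blit_bytes_per_sec()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes copied by one blit of a width x height rectangle with cpp bytes
 * per pixel (e.g. 4 for XRGB8888). A full-screen damage clip means the
 * whole visible framebuffer is copied.
 */
uint64_t blit_bytes(uint32_t width, uint32_t height, uint32_t cpp)
{
    assert(cpp > 0);
    return (uint64_t)width * height * cpp;
}

/* Bytes per second if each of updates_per_sec updates is full-screen. */
uint64_t blit_bytes_per_sec(uint32_t width, uint32_t height, uint32_t cpp,
                            uint32_t updates_per_sec)
{
    return blit_bytes(width, height, cpp) * updates_per_sec;
}
```

With those assumed numbers, one full-screen update is 16,384,000 bytes (about 15.6 MiB), and 60 of them per second add up to 983,040,000 bytes (about 937.5 MiB) per second, all moved by the CPU.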
See the ticket for more details.
BTW, let me use this mail to also add the report to the list of tracked regressions to ensure it doesn't fall through the cracks:
#regzbot introduced: cfecfc98a78d9 https://bugzilla.kernel.org/show_bug.cgi?id=216616
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply; it's in everyone's interest to set the public record straight.
Hi
On 23.10.22 at 10:04, Thorsten Leemhuis wrote:
> Andreas 2022-10-22 14:25:32 UTC
> Created attachment 303074 (dmesg)
I've looked at the kernel log and found that simpledrm was loaded *after* amdgpu, which should never happen. The problematic patch was taken from a long series of refactoring patches for this code; no wonder it doesn't work as expected on its own.
Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and report on the results. It should fix the problem.
Best regards Thomas
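Thomas's diagnosis rests on an ordering invariant: once a native driver such as amdgpu has taken over the display hardware, sysfb is disabled and the generic simpledrm driver must not bind anymore. The following C sketch is a hypothetical, heavily simplified model of that invariant: `struct fb_model` and the `model_*` functions are invented for illustration and are not kernel APIs, and the `disable_sysfb` switch only demonstrates why the commit above calls sysfb_disable() first, not what actually went wrong in 6.0.3.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the firmware-framebuffer hand-over. None of
 * these names exist in the kernel; they only illustrate the ordering.
 */
struct fb_model {
    bool sysfb_disabled;       /* stand-in for sysfb_disable() having run */
    bool sysfb_dev_registered; /* generic framebuffer platform device exists */
    bool native_bound;         /* native driver (e.g. amdgpu) drives the GPU */
    bool generic_bound;        /* generic driver (e.g. simpledrm) is bound */
};

/*
 * Native driver takes over. With disable_sysfb set, sysfb is disabled and
 * its device unregistered first; existing generic users are then removed.
 */
void model_native_takeover(struct fb_model *m, bool disable_sysfb)
{
    if (disable_sysfb) {
        m->sysfb_disabled = true;
        m->sysfb_dev_registered = false;
    }
    m->generic_bound = false; /* remove conflicting generic users */
    m->native_bound = true;
}

/* A late-probing generic driver can only bind while its device exists. */
bool model_generic_probe(struct fb_model *m)
{
    /* A disabled sysfb must never have a registered device. */
    assert(!(m->sysfb_disabled && m->sysfb_dev_registered));
    if (!m->sysfb_dev_registered)
        return false;
    m->generic_bound = true;
    return true;
}

/* The invariant from the thread: never both drivers on the same display. */
bool model_invariant_holds(const struct fb_model *m)
{
    return !(m->native_bound && m->generic_bound);
}
```

When the takeover disables sysfb, a late generic probe is refused and the invariant holds; when it does not, the generic driver binds after the native one, which mirrors the "simpledrm loaded after amdgpu" symptom without claiming that this is the actual mechanism in 6.0.3.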
Hi! Thx for the reply.
On 24.10.22 12:26, Thomas Zimmermann wrote:
> Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and report on the results. It should fix the problem.
Greg, is that enough for you to pick this up? Or do you want Andreas to test first whether it really fixes the reported problem?
Ciao, Thorsten
On Mon, Oct 24, 2022 at 12:41:43PM +0200, Thorsten Leemhuis wrote:
> On 24.10.22 12:26, Thomas Zimmermann wrote:
>> Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and report on the results. It should fix the problem.
>
> Greg, is that enough for you to pick this up? Or do you want Andreas to test first whether it really fixes the reported problem?
This should be good enough. If this does NOT fix the issue, please let me know.
thanks,
greg k-h
Hi
On 24.10.22 at 13:27, Greg KH wrote:
> On Mon, Oct 24, 2022 at 12:41:43PM +0200, Thorsten Leemhuis wrote:
>> Greg, is that enough for you to pick this up? Or do you want Andreas to test first whether it really fixes the reported problem?
>
> This should be good enough. If this does NOT fix the issue, please let me know.
Thanks a lot. I think I can provide a dedicated fix if the proposed commit doesn't work.
Best regards Thomas
On 24.10.22 at 13:31, Thomas Zimmermann wrote:
> Thanks a lot. I think I can provide a dedicated fix if the proposed commit doesn't work.
Thanks... In short: the additional patch did NOT fix the problem.
I don't use git and I don't know how to /cherry-pick commit/ 9d69ef183815, but I found the patch here: https://patchwork.freedesktop.org/patch/494609/
I hope that's the right one. I re-applied v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch, also applied v2-04-11-fbdev-core-Remove-remove_conflicting_pci_framebuffers.patch, ran "make mrproper", and then compiled a clean new 6.0.3 kernel (same .config).
Now the system doesn't even boot to a console. The first boot got me an rcu_sched stall on CPUs/tasks, same as above, but this time with: Workqueue: btrfs-cache btrfs_work_helper
I booted a second time with the same kernel, and it got stuck after mounting the root btrfs filesystem. It looked like a total freeze, but since no RCU stall message appeared after ~2 minutes, I got impatient and wanted to see whether I had just busted my root filesystem...
I booted 6.0.2 and everything is fine. (I'm very glad! I definitely should update my backup right away!)
I will try 6.1-rc1 next, bear with...
Hi Andreas
On 24.10.22 at 18:19, Andreas Thalhammer wrote:
> Thanks... In short: the additional patch did NOT fix the problem.
Yeah, it's also part of a larger changeset. But I wouldn't want to backport all those changes either.
Attached is a simple patch for linux-stable that adds the necessary fix. If this still doesn't work, we should probably revert the problematic patch.
Please test the patch and let me know if it works.
Best regards Thomas
I don't use git and I don't know how to /cherry-pick commit/ 9d69ef183815, but I found the patch here: https://patchwork.freedesktop.org/patch/494609/
On 25.10.22 at 10:16, Thomas Zimmermann wrote:
> Please test the patch and let me know if it works.
Yes, this fixed the problem. I'm running 6.0.3 with your patch now, all fine.
Thanks! Andreas
Hi
On 25.10.22 at 10:45, Andreas Thalhammer wrote: [...]
> Yes, this fixed the problem. I'm running 6.0.3 with your patch now, all fine.
Thanks a lot for testing. If Greg doesn't already pick up the patch from this discussion, I'll send it to stable soonish; adding your Tested-by tag.
Best regards Thomas
On Tue, Oct 25, 2022 at 11:21:57AM +0200, Thomas Zimmermann wrote:
> Thanks a lot for testing. If Greg doesn't already pick up the patch from this discussion, I'll send it to stable soonish; adding your Tested-by tag.
Please send it as a real patch.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org