Hi, this is your Linux kernel regression tracker speaking.
I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developers don't keep an eye on it, I decided to forward it by
mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616 :
> Andreas 2022-10-22 14:25:32 UTC
>
> Created attachment 303074 [details]
> dmesg
>
> 6.0.2 works.
>
> On 6.0.3 the system is very sluggish with graphic glitches all over the place in KDE Plasma Desktop X11 (no graphic glitches when using Wayland, but also sluggish). SDDM works fine.
>
> Hardware: Lenovo Legion 5 Pro 16ACH6H: AMD Ryzen 7 5800H "Cezanne", hybrid graphics AMD "Green Sardine" (Vega 8 GCN 5.1, AMDGPU) and Nvidia GeForce RTX 3070 Mobile (GA104M, not working with nouveau, I'm not using the proprietary nvidia driver).
>
> [reply] [−] Comment 1 Andreas 2022-10-22 14:27:15 UTC
>
> Created attachment 303075 [details]
> my kernel .config for 6.0.3
>
> Only was CONFIG_HID_TOPRE added in 6.0.3, otherwise it is identical as my .config for 6.0.2.
>
> [reply] [−] Comment 2 Andreas 2022-10-22 14:51:23 UTC
>
> In /var/log/Xorg.0.log the only obvious difference is the last line:
> ---- snap
> randr: falling back to unsynchronized pixmap sharing
> ---- snap
> The line is present when I boot with 6.0.3, but isn't when I boot 6.0.2.
>
> (Obviously this is when I login to KDE with X11, not with Wayland, from SDDM.)
>
> [reply] [−] Comment 3 Andreas 2022-10-22 22:10:19 UTC
>
> I did a git bisect on stable kernels 5.0.3 as bad and 5.0.2 as good, this is the result:
>
> cfecfc98a78d97a49807531b5b224459bda877de is the first bad commit
> commit cfecfc98a78d97a49807531b5b224459bda877de (HEAD, refs/bisect/bad)
> Author: Thomas Zimmermann <tzimmermann(a)suse.de>
> Date: Mon Jul 18 09:23:18 2022 +0200
>
> video/aperture: Disable and unregister sysfb devices via aperture helpers
>
> [ Upstream commit 5e01376124309b4dbd30d413f43c0d9c2f60edea ]
>
> Call sysfb_disable() before removing conflicting devices in aperture
> helpers. Fixes sysfb state if fbdev has been disabled.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
> Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
> Fixes: fb84efa28a48 ("drm/aperture: Run fbdev removal before internal helpers")
>
> [reply] [−] Comment 4 Andreas 2022-10-22 22:11:51 UTC
>
> Link to the suspect patch:
>
> https://patchwork.freedesktop.org/patch/msgid/20220718072322.8927-8-tzimmer…
> (or https://patchwork.freedesktop.org/patch/494608/)
>
> [reply] [−] Comment 5 Andreas 2022-10-22 22:38:14 UTC
>
> Okay, so I reverted v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch on stable 5.0.3 and the fault is gone.
>
> I always logged out immediately, which worked (even though everything is very very sluggish). Also, when I killed the X session within a couple of seconds (15 or so), no error was shown (I used "systemctl stop sddm" from another virtual console).
>
> Noteworthy: I once compiled a kernel from within the Plasma Desktop, while it was sluggish. The kernel compiled alright. When it was finished I moved the mouse to reboot, at which point it completely froze and I had to hard-reset the system.
>
> While still running, after > 15 seconds, the fault looked like this (dmesg):
> ---- snap ----
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 7 jiffies s: 165 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x00000008
> Call Trace:
> <TASK>
> ? commit_tail+0xd7/0x130
> ? drm_atomic_helper_commit+0x126/0x150
> ? drm_atomic_commit+0xa4/0xe0
> ? drm_plane_get_damage_clips.cold+0x1c/0x1c
> ? drm_atomic_helper_dirtyfb+0x19e/0x280
> ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
> ? drm_ioctl_kernel+0xc4/0x150
> ? drm_ioctl+0x246/0x3f0
> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
> ? __x64_sys_ioctl+0x91/0xd0
> ? do_syscall_64+0x60/0xd0
> ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> </TASK>
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 29 jiffies s: 165 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x00000008
> Call Trace:
> <TASK>
> ? commit_tail+0xd7/0x130
> ? drm_atomic_helper_commit+0x126/0x150
> ? drm_atomic_commit+0xa4/0xe0
> ? drm_plane_get_damage_clips.cold+0x1c/0x1c
> ? drm_atomic_helper_dirtyfb+0x19e/0x280
> ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
> ? drm_ioctl_kernel+0xc4/0x150
> ? drm_ioctl+0x246/0x3f0
> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
> ? __x64_sys_ioctl+0x91/0xd0
> ? do_syscall_64+0x60/0xd0
> ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> </TASK>
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 8 jiffies s: 169 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x0000400e
> Call Trace:
> <TASK>
> ? memcpy_toio+0x76/0xc0
> ? drm_fb_memcpy_toio+0x76/0xb0
> ? drm_fb_blit_toio+0x75/0x2b0
> ? simpledrm_simple_display_pipe_update+0x132/0x150
> ? drm_atomic_helper_commit_planes+0xb6/0x230
> ? drm_atomic_helper_commit_tail+0x44/0x80
> ? commit_tail+0xd7/0x130
> ? drm_atomic_helper_commit+0x126/0x150
> ? drm_atomic_commit+0xa4/0xe0
> ? drm_plane_get_damage_clips.cold+0x1c/0x1c
> ? drm_atomic_helper_dirtyfb+0x19e/0x280
> ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
> ? drm_ioctl_kernel+0xc4/0x150
> ? drm_ioctl+0x246/0x3f0
> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
> ? __x64_sys_ioctl+0x91/0xd0
> ? do_syscall_64+0x60/0xd0
> ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> </TASK>
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 30 jiffies s: 169 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x0000400e
> Call Trace:
> <TASK>
> ? memcpy_toio+0x76/0xc0
> ? memcpy_toio+0x1b/0xc0
> ? drm_fb_memcpy_toio+0x76/0xb0
> ? drm_fb_blit_toio+0x75/0x2b0
> ? simpledrm_simple_display_pipe_update+0x132/0x150
> ? drm_atomic_helper_commit_planes+0xb6/0x230
> ? drm_atomic_helper_commit_tail+0x44/0x80
> ? commit_tail+0xd7/0x130
> ? drm_atomic_helper_commit+0x126/0x150
> ? drm_atomic_commit+0xa4/0xe0
> ? drm_plane_get_damage_clips.cold+0x1c/0x1c
> ? drm_atomic_helper_dirtyfb+0x19e/0x280
> ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
> ? drm_ioctl_kernel+0xc4/0x150
> ? drm_ioctl+0x246/0x3f0
> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
> ? __x64_sys_ioctl+0x91/0xd0
> ? do_syscall_64+0x60/0xd0
> ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> </TASK>
> rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 52 jiffies s: 169 root: 0x2000/.
> rcu: blocking rcu_node structures (internal RCU debug):
> Task dump for CPU 13:
> task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x0000400e
> Call Trace:
> <TASK>
> ? memcpy_toio+0x76/0xc0
> ? memcpy_toio+0x1b/0xc0
> ? drm_fb_memcpy_toio+0x76/0xb0
> ? drm_fb_blit_toio+0x75/0x2b0
> ? simpledrm_simple_display_pipe_update+0x132/0x150
> ? drm_atomic_helper_commit_planes+0xb6/0x230
> ? drm_atomic_helper_commit_tail+0x44/0x80
> ? commit_tail+0xd7/0x130
> ? drm_atomic_helper_commit+0x126/0x150
> ? drm_atomic_commit+0xa4/0xe0
> ? drm_plane_get_damage_clips.cold+0x1c/0x1c
> ? drm_atomic_helper_dirtyfb+0x19e/0x280
> ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
> ? drm_ioctl_kernel+0xc4/0x150
> ? drm_ioctl+0x246/0x3f0
> ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
> ? __x64_sys_ioctl+0x91/0xd0
> ? do_syscall_64+0x60/0xd0
> ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> </TASK>
> traps: avahi-ml[4447] general protection fault ip:7fdde6a37bc1 sp:7fdde07fc920 error:0 in module-zeroconf-publish.so[7fdde6a37000+3000]
>
See the ticket for more details.
BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:
#regzbot introduced: cfecfc98a78d9
https://bugzilla.kernel.org/show_bug.cgi?id=216616
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
From: Thomas Schmitt <scdbackup(a)gmx.net>
Change the return type of function iso_date() from int to time64_t,
to avoid truncating to the 1902..2038 date range.
After this patch, the reported timestamps should fall into the
range reported in the s_time_min/s_time_max fields.
Signed-off-by: Thomas Schmitt <scdbackup(a)gmx.net>
Cc: stable(a)vger.kernel.org
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800627
Fixes: 34be4dbf87fc ("isofs: fix timestamps beyond 2027")
Fixes: 5ad32b3acded ("isofs: Initialize filesystem timestamp ranges")
[arnd: expand changelog text slightly]
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
fs/isofs/isofs.h | 2 +-
fs/isofs/util.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/isofs/isofs.h b/fs/isofs/isofs.h
index dcdc191ed183..c3473ca3f686 100644
--- a/fs/isofs/isofs.h
+++ b/fs/isofs/isofs.h
@@ -106,7 +106,7 @@ static inline unsigned int isonum_733(u8 *p)
/* Ignore bigendian datum due to broken mastering programs */
return get_unaligned_le32(p);
}
-extern int iso_date(u8 *, int);
+extern time64_t iso_date(u8 *, int);
struct inode; /* To make gcc happy */
diff --git a/fs/isofs/util.c b/fs/isofs/util.c
index e88dba721661..348af786a8a4 100644
--- a/fs/isofs/util.c
+++ b/fs/isofs/util.c
@@ -16,10 +16,10 @@
* to GMT. Thus we should always be correct.
*/
-int iso_date(u8 *p, int flag)
+time64_t iso_date(u8 *p, int flag)
{
int year, month, day, hour, minute, second, tz;
- int crtime;
+ time64_t crtime;
year = p[0];
month = p[1];
--
2.29.2
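To illustrate the range limit that the iso_date() patch above removes, here is
a minimal userspace sketch. The helper names fits_in_int32() and
narrow_seconds() are hypothetical and are not part of fs/isofs; the sketch
only assumes that the old int return value was 32 bits wide, which limits
seconds-since-epoch values to 1901-12-13..2038-01-19 (UTC).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a seconds-since-epoch value can be stored
 * in a 32-bit signed integer without loss.
 */
static bool fits_in_int32(int64_t seconds)
{
	return seconds >= INT32_MIN && seconds <= INT32_MAX;
}

/*
 * Hypothetical helper: narrow only when the conversion is lossless.
 * The assert documents the precondition that a 32-bit return type
 * silently imposes on every timestamp.
 */
static int32_t narrow_seconds(int64_t seconds)
{
	assert(fits_in_int32(seconds));
	return (int32_t)seconds;
}
```

For example, 2040-01-01 00:00:00 UTC is 2208988800 seconds after the epoch,
which fits_in_int32() rejects, while a 64-bit type holds it unchanged.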
The iterator cannot be greater than ATC_MAX_DSCR_TRIALS, as the for loop
will stop when i == ATC_MAX_DSCR_TRIALS. While here, use the common "i"
name for the iterator.
Fixes: 93dce3a6434f ("dmaengine: at_hdmac: fix residue computation")
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
---
drivers/dma/at_hdmac.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 968a5aba47cd..afcbad3e1718 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -318,7 +318,8 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
struct at_desc *desc_first = atc_first_active(atchan);
struct at_desc *desc;
int ret;
- u32 ctrla, dscr, trials;
+ u32 ctrla, dscr;
+ unsigned int i;
/*
* If the cookie doesn't match to the currently running transfer then
@@ -388,7 +389,7 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
dscr = channel_readl(atchan, DSCR);
rmb(); /* ensure DSCR is read before CTRLA */
ctrla = channel_readl(atchan, CTRLA);
- for (trials = 0; trials < ATC_MAX_DSCR_TRIALS; ++trials) {
+ for (i = 0; i < ATC_MAX_DSCR_TRIALS; ++i) {
u32 new_dscr;
rmb(); /* ensure DSCR is read after CTRLA */
@@ -414,7 +415,7 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
rmb(); /* ensure DSCR is read before CTRLA */
ctrla = channel_readl(atchan, CTRLA);
}
- if (unlikely(trials >= ATC_MAX_DSCR_TRIALS))
+ if (unlikely(i == ATC_MAX_DSCR_TRIALS))
return -ETIMEDOUT;
/* for the first descriptor we can be more accurate */
--
2.25.1
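The loop-exit argument in the patch above can be checked with a small
standalone sketch. MAX_TRIALS and retry_loop() are hypothetical stand-ins for
ATC_MAX_DSCR_TRIALS and the retry loop in atc_get_bytes_left(); the success
condition is simulated by a parameter.

```c
#include <assert.h>

#define MAX_TRIALS 5 /* hypothetical stand-in for ATC_MAX_DSCR_TRIALS */

/*
 * Run a bounded retry loop and return the final counter value.
 * succeed_at is the iteration on which the simulated retry succeeds;
 * a negative value means it never succeeds.
 */
static unsigned int retry_loop(int succeed_at)
{
	unsigned int i;

	for (i = 0; i < MAX_TRIALS; ++i) {
		if (succeed_at >= 0 && i == (unsigned int)succeed_at)
			break;
	}

	/* The counter can equal the bound but never exceed it. */
	assert(i <= MAX_TRIALS);
	return i;
}
```

A return value equal to MAX_TRIALS means every trial was used up, which is
the only case the timeout check has to detect.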
at_hdmac uses __raw_writel for register writes. In the absence of a
barrier, the CPU may reorder the register operations.
Introduce a write memory barrier so that the CPU cannot reorder the
channel enable, and thus the start of the transfer, ahead of the writes
to the register fields that must be set up first.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
---
drivers/dma/at_hdmac.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 80eeb4fb88ef..968a5aba47cd 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -256,6 +256,8 @@ static void atc_dostart(struct at_dma_chan *atchan, struct at_desc *first)
ATC_SPIP_BOUNDARY(first->boundary));
channel_writel(atchan, DPIP, ATC_DPIP_HOLE(first->dst_hole) |
ATC_DPIP_BOUNDARY(first->boundary));
+ /* Don't allow CPU to reorder channel enable. */
+ wmb();
dma_writel(atdma, CHER, atchan->mask);
vdbg_dump_regs(atchan);
--
2.25.1
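As a rough userspace analogy for the ordering requirement described in the
patch above, the following sketch uses a C11 release fence. This is not the
kernel's wmb() and says nothing about MMIO semantics; struct fake_channel and
fake_start() are hypothetical, and the fence only orders the plain stores
before the enable store as seen by a reader that uses acquire ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a DMA channel's setup fields and enable bit. */
struct fake_channel {
	uint32_t saddr;
	uint32_t daddr;
	uint32_t ctrl;
	atomic_uint enable;
};

/* Program the setup fields, then publish the enable flag. */
static void fake_start(struct fake_channel *ch, uint32_t src, uint32_t dst)
{
	assert(ch != NULL);

	ch->saddr = src;
	ch->daddr = dst;
	ch->ctrl = 1;

	/* Keep the setup stores ordered before the enable store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ch->enable, 1, memory_order_relaxed);
}
```

The design point is the same as in the patch: the barrier sits directly
before the store that starts the transfer, not after it.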
In case the controller detected an error, the code took the chance to move
all the queued (submitted) descriptors to the active (issued) list. This
was wrong as if there were any descriptors in the submitted list they were
moved to the issued list without actually issuing them to the controller,
thus a completion could be raised without even firing the descriptor.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
---
drivers/dma/at_hdmac.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 9e5a30396c1c..80eeb4fb88ef 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -539,10 +539,6 @@ static void atc_handle_error(struct at_dma_chan *atchan)
bad_desc = atc_first_active(atchan);
list_del_init(&bad_desc->desc_node);
- /* As we are stopped, take advantage to push queued descriptors
- * in active_list */
- list_splice_init(&atchan->queue, atchan->active_list.prev);
-
/* Try to restart the controller */
if (!list_empty(&atchan->active_list)) {
desc = atc_first_queued(atchan);
--
2.25.1
As it was before, the descriptor was issued to the hardware without adding
it to the active (issued) list. This could result in the completion of
another descriptor and/or in the descriptor never being completed.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
---
drivers/dma/at_hdmac.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index b53a9fc15dd9..9e5a30396c1c 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -510,8 +510,11 @@ static void atc_advance_work(struct at_dma_chan *atchan)
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
- if (!list_empty(&atchan->active_list))
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list)) {
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ }
spin_unlock_irqrestore(&atchan->lock, flags);
}
@@ -523,6 +526,7 @@ static void atc_advance_work(struct at_dma_chan *atchan)
static void atc_handle_error(struct at_dma_chan *atchan)
{
struct at_desc *bad_desc;
+ struct at_desc *desc;
struct at_desc *child;
unsigned long flags;
@@ -540,8 +544,11 @@ static void atc_handle_error(struct at_dma_chan *atchan)
list_splice_init(&atchan->queue, atchan->active_list.prev);
/* Try to restart the controller */
- if (!list_empty(&atchan->active_list))
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list)) {
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ }
/*
* KERN_CRITICAL may seem harsh, but since this only happens
--
2.25.1
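The bookkeeping rule behind the patch above (move a descriptor to the issued
list before handing it to the hardware) can be sketched with a simplified
model. The fixed-size FIFO, struct chan_model and start_next() are
hypothetical; the driver itself uses list_head and list_move_tail().

```c
#include <assert.h>
#include <stddef.h>

#define CAP 8

/* Hypothetical fixed-size FIFO standing in for a kernel list_head. */
struct fifo {
	int items[CAP];
	size_t len;
};

struct chan_model {
	struct fifo queue;  /* submitted, not yet issued */
	struct fifo active; /* issued to the "hardware" */
	int running;        /* id handed to the hardware, -1 if none */
};

static int fifo_push(struct fifo *f, int id)
{
	if (f->len == CAP)
		return -1;
	f->items[f->len++] = id;
	return 0;
}

static int fifo_pop(struct fifo *f)
{
	int id;
	size_t i;

	if (f->len == 0)
		return -1;
	id = f->items[0];
	for (i = 1; i < f->len; i++)
		f->items[i - 1] = f->items[i];
	f->len--;
	return id;
}

/*
 * Move the first queued descriptor to the active list, then start it.
 * Returns the started id, or -1 if nothing could be started.
 */
static int start_next(struct chan_model *c)
{
	int id;

	assert(c != NULL);
	if (c->queue.len == 0 || c->active.len == CAP)
		return -1;

	id = fifo_pop(&c->queue);
	fifo_push(&c->active, id); /* record it as issued first */
	c->running = id;           /* only then "start the hardware" */
	return id;
}
```

With this ordering, whatever the hardware is running is always present in the
active list, so a later completion is matched to the right descriptor.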
The tasklet (atc_advance_work()) did not hold the channel lock when
retrieving the first active descriptor, causing concurrency problems if
issue_pending() was called in between. If issue_pending() was called
exactly after the lock was released in the tasklet (atc_advance_work()),
atc_chain_complete() could complete a descriptor for which the controller
has not yet raised an interrupt.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
---
drivers/dma/at_hdmac.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 0fb44f622d35..b53a9fc15dd9 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -462,8 +462,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
if (!atc_chan_is_cyclic(atchan))
dma_cookie_complete(txd);
- /* Remove transfer node from the active list. */
- list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
dma_descriptor_unmap(txd);
@@ -495,6 +493,7 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
*/
static void atc_advance_work(struct at_dma_chan *atchan)
{
+ struct at_desc *desc;
unsigned long flags;
dev_vdbg(chan2dev(&atchan->chan_common), "advance_work\n");
@@ -502,9 +501,12 @@ static void atc_advance_work(struct at_dma_chan *atchan)
spin_lock_irqsave(&atchan->lock, flags);
if (atc_chan_is_enabled(atchan) || list_empty(&atchan->active_list))
return spin_unlock_irqrestore(&atchan->lock, flags);
- spin_unlock_irqrestore(&atchan->lock, flags);
- atc_chain_complete(atchan, atc_first_active(atchan));
+ desc = atc_first_active(atchan);
+ /* Remove the transfer node from the active list. */
+ list_del_init(&desc->desc_node);
+ spin_unlock_irqrestore(&atchan->lock, flags);
+ atc_chain_complete(atchan, desc);
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
--
2.25.1
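The locking pattern the patch above moves to (pick and detach the first
active descriptor while the lock is held, complete it afterwards) can be
sketched in userspace with a pthread mutex. struct node, struct chan_sketch
and take_first_active() are hypothetical; the driver uses a spinlock with
spin_lock_irqsave() and a list_head instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical singly linked descriptor node. */
struct node {
	int id;
	struct node *next;
};

struct chan_sketch {
	pthread_mutex_t lock;
	struct node *active; /* head of the active list */
};

/*
 * Detach and return the first active node while holding the lock.
 * Returns NULL if the list is empty. The caller may then complete the
 * node without the lock, because no other path can still find it.
 */
static struct node *take_first_active(struct chan_sketch *c)
{
	struct node *n;

	assert(c != NULL);

	pthread_mutex_lock(&c->lock);
	n = c->active;
	if (n) {
		c->active = n->next;
		n->next = NULL;
	}
	pthread_mutex_unlock(&c->lock);

	return n;
}
```

Looking up the head only after dropping the lock would reopen the window in
which a concurrent caller changes the list between the check and the lookup.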