Hi Greg,
would you please queue up this upstream patch:
0a6b58c5cd0d ("lockdep: fix static memory detection even more")
to stable kernels 6.1, 6.4 and 6.5 ?
Thanks,
Helge
From 0a6b58c5cd0dfd7961e725212f0fc8dfc5d96195 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)gmx.de>
Date: Tue, 15 Aug 2023 00:31:09 +0200
Subject: [PATCH] lockdep: fix static memory detection even more
On the parisc architecture, lockdep reports for all static objects which
are in the __initdata section (e.g. "setup_done" in devtmpfs,
"kthreadd_done" in init/main.c) this warning:
INFO: trying to register non-static key.
The warning itself is wrong, because those objects are in the __initdata
section, but the section itself is on parisc outside of range from
_stext to _end, which is why the static_obj() functions returns a wrong
answer.
While fixing this issue, I noticed that the whole existing check can
be simplified a lot.
Instead of checking against the _stext and _end symbols (which include
code areas too) just check for the .data and .bss segments (since we check a
data object). This can be done with the existing is_kernel_core_data()
macro.
In addition objects in the __initdata section can be checked with
init_section_contains(), and is_kernel_rodata() allows keys to be in the
_ro_after_init section.
This partly reverts and simplifies commit bac59d18c701 ("x86/setup: Fix static
memory detection").
Link: https://lkml.kernel.org/r/ZNqrLRaOi/3wPAdp@p100
Fixes: bac59d18c701 ("x86/setup: Fix static memory detection")
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index a6e8373a5170..3fa87e5e11ab 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -2,8 +2,6 @@
#ifndef _ASM_X86_SECTIONS_H
#define _ASM_X86_SECTIONS_H
-#define arch_is_kernel_initmem_freed arch_is_kernel_initmem_freed
-
#include <asm-generic/sections.h>
#include <asm/extable.h>
@@ -18,20 +16,4 @@ extern char __end_of_kernel_reserve[];
extern unsigned long _brk_start, _brk_end;
-static inline bool arch_is_kernel_initmem_freed(unsigned long addr)
-{
- /*
- * If _brk_start has not been cleared, brk allocation is incomplete,
- * and we can not make assumptions about its use.
- */
- if (_brk_start)
- return 0;
-
- /*
- * After brk allocation is complete, space between _brk_end and _end
- * is available for allocation.
- */
- return addr >= _brk_end && addr < (unsigned long)&_end;
-}
-
#endif /* _ASM_X86_SECTIONS_H */
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 111607d91489..e85b5ad3e206 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -819,34 +819,26 @@ static int very_verbose(struct lock_class *class)
* Is this the address of a static object:
*/
#ifdef __KERNEL__
-/*
- * Check if an address is part of freed initmem. After initmem is freed,
- * memory can be allocated from it, and such allocations would then have
- * addresses within the range [_stext, _end].
- */
-#ifndef arch_is_kernel_initmem_freed
-static int arch_is_kernel_initmem_freed(unsigned long addr)
-{
- if (system_state < SYSTEM_FREEING_INITMEM)
- return 0;
-
- return init_section_contains((void *)addr, 1);
-}
-#endif
-
static int static_obj(const void *obj)
{
- unsigned long start = (unsigned long) &_stext,
- end = (unsigned long) &_end,
- addr = (unsigned long) obj;
+ unsigned long addr = (unsigned long) obj;
- if (arch_is_kernel_initmem_freed(addr))
- return 0;
+ if (is_kernel_core_data(addr))
+ return 1;
+
+ /*
+ * keys are allowed in the __ro_after_init section.
+ */
+ if (is_kernel_rodata(addr))
+ return 1;
/*
- * static variable?
+ * in initdata section and used during bootup only?
+ * NOTE: On some platforms the initdata section is
+ * outside of the _stext ... _end range.
*/
- if ((addr >= start) && (addr < end))
+ if (system_state < SYSTEM_FREEING_INITMEM &&
+ init_section_contains((void *)addr, 1))
return 1;
/*
From: "Paul E. McKenney" <paulmck(a)kernel.org>
[ Upstream commit 147f04b14adde831eb4a0a1e378667429732f9e8 ]
If an RCU expedited grace period starts just when a CPU is in the process
of going offline, so that the outgoing CPU has completed its pass through
stop-machine but has not yet completed its final dive into the idle loop,
RCU will attempt to enable that CPU's scheduling-clock tick via a call
to tick_dep_set_cpu(). For this to happen, that CPU has to have been
online when the expedited grace period completed its CPU-selection phase.
This is pointless: The outgoing CPU has interrupts disabled, so it cannot
take a scheduling-clock tick anyway. In addition, the tick_dep_set_cpu()
function's eventual call to irq_work_queue_on() will splat as follows:
smpboot: CPU 1 is now offline
WARNING: CPU: 6 PID: 124 at kernel/irq_work.c:95
+irq_work_queue_on+0x57/0x60
Modules linked in:
CPU: 6 PID: 124 Comm: kworker/6:2 Not tainted 5.15.0-rc1+ #3
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
+rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Workqueue: rcu_gp wait_rcu_exp_gp
RIP: 0010:irq_work_queue_on+0x57/0x60
Code: 8b 05 1d c7 ea 62 a9 00 00 f0 00 75 21 4c 89 ce 44 89 c7 e8
+9b 37 fa ff ba 01 00 00 00 89 d0 c3 4c 89 cf e8 3b ff ff ff eb ee <0f> 0b eb b7
+0f 0b eb db 90 48 c7 c0 98 2a 02 00 65 48 03 05 91
6f
RSP: 0000:ffffb12cc038fe48 EFLAGS: 00010282
RAX: 0000000000000001 RBX: 0000000000005208 RCX: 0000000000000020
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9ad01f45a680
RBP: 000000000004c990 R08: 0000000000000001 R09: ffff9ad01f45a680
R10: ffffb12cc0317db0 R11: 0000000000000001 R12: 00000000fffecee8
R13: 0000000000000001 R14: 0000000000026980 R15: ffffffff9e53ae00
FS: 0000000000000000(0000) GS:ffff9ad01f580000(0000)
+knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000000de0c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tick_nohz_dep_set_cpu+0x59/0x70
rcu_exp_wait_wake+0x54e/0x870
? sync_rcu_exp_select_cpus+0x1fc/0x390
process_one_work+0x1ef/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x28/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0x115/0x140
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
---[ end trace c5bf75eb6aa80bc6 ]---
This commit therefore avoids invoking tick_dep_set_cpu() on offlined
CPUs to limit both futility and false-positive splats.
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
kernel/rcu/tree_exp.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 401c1f331caf..07a284a18645 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -507,7 +507,10 @@ static void synchronize_rcu_expedited_wait(void)
if (rdp->rcu_forced_tick_exp)
continue;
rdp->rcu_forced_tick_exp = true;
- tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
+ preempt_disable();
+ if (cpu_online(cpu))
+ tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
+ preempt_enable();
}
}
j = READ_ONCE(jiffies_till_first_fqs);
--
2.42.0.rc2.253.gd59a3bf2b4-goog
Hi!
On 02.08.23 23:28, Olaf Skibbe wrote:
> Dear Maintainers,
>
> Hereby I would like to report an apparent bug in the nouveau driver in
> linux/6.1.38-2.
Thx for your report. Maybe your problem is caused by a incomplete
backport. I Cced the maintainers for the drivers (and the regressions
and the stable list), maybe one of them has an idea, as they know the
driver.
If they don't reply in the next few days, please check if the problem is
also present in mainline. If not, check if the latest 6.1.y. release
already fixes this. If not, try to check which of the four patches you
reverted to make things going is actually causing this (e.g. first only
revert the one that was applied last; then the two last ones; ...).
> Running a current debian stable on a Dell Latitude E6510 with a
> "NVIDIA Corporation GT218M" graphic card, the monitor turns black
> after the grub screen. Also switching to a console (Strg-Alt-F2) shows
> just a black screen. Access via ssh is possible.
>
> ~# uname -r
> 6.1.0-10-amd64
>
> demesg shows the following error message:
>
> [ 3.560153] WARNING: CPU: 0 PID: 176 at
> drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460
> nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [ 3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft
> cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi
> i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm
> scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci
> ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul
> crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core
> crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery
> video wmi button
> [ 3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted
> 6.1.0-10-amd64 #1 Debian 6.1.38-2
> [ 3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17
> 05/12/2017
> [ 3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
> [ 3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [ 3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37
> 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
> cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
> [ 3.560541] RSP: 0018:ffff9899c048bd60 EFLAGS: 00010246
> [ 3.560542] RAX: 0000000000041eb0 RBX: ffff88e0209d2600 RCX:
> 0000000000041eb0
> [ 3.560544] RDX: ffffffffc079f760 RSI: 0000000000000000 RDI:
> ffff9899c048bcf0
> [ 3.560545] RBP: 0000000000000001 R08: ffff9899c048bc64 R09:
> 0000000000005b76
> [ 3.560546] R10: 000000000000000d R11: ffff9899c048bde0 R12:
> 00000000ffffffea
> [ 3.560548] R13: ffff88e00b39e480 R14: 0000000000044d45 R15:
> 0000000000000000
> [ 3.560549] FS: 0000000000000000(0000) GS:ffff88e123c00000(0000)
> knlGS:0000000000000000
> [ 3.560551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3.560552] CR2: 00007f57f4e90451 CR3: 0000000181410000 CR4:
> 00000000000006f0
> [ 3.560554] Call Trace:
> [ 3.560558] <TASK>
> [ 3.560560] ? __warn+0x7d/0xc0
> [ 3.560566] ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [ 3.560671] ? report_bug+0xe6/0x170
> [ 3.560675] ? handle_bug+0x41/0x70
> [ 3.560679] ? exc_invalid_op+0x13/0x60
> [ 3.560681] ? asm_exc_invalid_op+0x16/0x20
> [ 3.560685] ? init_reset_begun+0x20/0x20 [nouveau]
> [ 3.560769] ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [ 3.560888] nv50_disp_super_2_2+0x70/0x430 [nouveau]
> [ 3.560997] nv50_disp_super+0x113/0x210 [nouveau]
> [ 3.561103] process_one_work+0x1c7/0x380
> [ 3.561109] worker_thread+0x4d/0x380
> [ 3.561113] ? rescuer_thread+0x3a0/0x3a0
> [ 3.561116] kthread+0xe9/0x110
> [ 3.561120] ? kthread_complete_and_exit+0x20/0x20
> [ 3.561122] ret_from_fork+0x22/0x30
> [ 3.561130] </TASK>
>
> Further information:
>
> $ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
> 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M]
> (rev a2) (prog-if 00 [VGA controller])
> Subsystem: Dell Latitude E6510
> Flags: bus master, fast devsel, latency 0, IRQ 27
> Memory at e2000000 (32-bit, non-prefetchable) [size=16M]
> Memory at d0000000 (64-bit, prefetchable) [size=256M]
> Memory at e0000000 (64-bit, prefetchable) [size=32M]
> I/O ports at 7000 [size=128]
> Expansion ROM at 000c0000 [disabled] [size=128K]
> Capabilities: <access denied>
> Kernel driver in use: nouveau
> Kernel modules: nouveau
>
> I reported this bug to debian already, see
> https://bugs.debian.org/1042753 for context.
>
> With support (thanks Diederik!) I managed to figure out that the cause
> was a regression between upstream kernel version 6.1.27 and 6.1.38.
>
> I build a new 6.1.38 kernel with these commits reverted:
>
> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
> 5a144bad3e75 nouveau: fix client work fence deletion race
>
> With that kernel the graphic works again.
>
> Please inform me if further tests are required.
FWIW, to be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, the Linux kernel regression tracking bot:
#regzbot ^introduced v6.1.27..v6.1.38
#regzbot title drm/nouveau: display stays black
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.