On Fri, Aug 4, 2023 at 2:02 PM Thorsten Leemhuis <regressions@leemhuis.info> wrote:
Hi!
On 02.08.23 23:28, Olaf Skibbe wrote:
Dear Maintainers,
Hereby I would like to report an apparent bug in the nouveau driver in linux/6.1.38-2.
Thx for your report. Maybe your problem is caused by an incomplete backport. I CCed the maintainers for the driver (and the regressions and the stable list); maybe one of them has an idea, as they know the driver.
If they don't reply in the next few days, please check if the problem is also present in mainline. If not, check if the latest 6.1.y release already fixes this. If not, try to check which of the four patches you reverted to get things going is actually causing this (e.g. first only revert the one that was applied last; then the two last ones; ...).
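The revert order suggested above can be written down as a small search loop. This is only a sketch of the procedure, not a tool anyone in this thread uses: the function name `find_culprit`, the callback `works_with_reverted`, and the placeholder commit names are all hypothetical, and in practice the "callback" is a manual kernel build plus a reboot to see whether the display comes up.

```python
from typing import Callable, FrozenSet, Optional, Sequence


def find_culprit(
    applied_in_order: Sequence[str],
    works_with_reverted: Callable[[FrozenSet[str]], bool],
) -> Optional[str]:
    """Revert the last-applied commit, then the last two, and so on.

    Returns the commit whose additional revert first makes the system
    work, or None if even reverting everything does not help.
    """
    reverted = []
    # Walk from the most recently applied commit back to the oldest one.
    for commit in reversed(applied_in_order):
        reverted.append(commit)
        # Hypothetical check: build and boot with this set reverted.
        if works_with_reverted(frozenset(reverted)):
            return commit
    return None


if __name__ == "__main__":
    # Placeholder names; the real order in which the four patches were
    # applied to 6.1.y is not stated in this thread.
    commits = ["A", "B", "C", "D"]
    # Pretend the display works whenever "C" is among the reverted commits.
    print(find_culprit(commits, lambda reverted: "C" in reverted))
```

Note that the loop keeps the later commits reverted while adding earlier ones, so it identifies the point at which the problem disappears; confirming that reverting that single commit alone is sufficient still needs one more test build.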
Running current Debian stable on a Dell Latitude E6510 with an "NVIDIA Corporation GT218M" graphics card, the monitor turns black after the grub screen. Switching to a console (Ctrl-Alt-F2) also shows just a black screen. Access via ssh is possible.
~# uname -r
6.1.0-10-amd64
dmesg shows the following error message:
[ 3.560153] WARNING: CPU: 0 PID: 176 at drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 [nouveau]
[ 3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery video wmi button
[ 3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted 6.1.0-10-amd64 #1 Debian 6.1.38-2
[ 3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17 05/12/2017
[ 3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
[ 3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
[ 3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
[ 3.560541] RSP: 0018:ffff9899c048bd60 EFLAGS: 00010246
[ 3.560542] RAX: 0000000000041eb0 RBX: ffff88e0209d2600 RCX: 0000000000041eb0
[ 3.560544] RDX: ffffffffc079f760 RSI: 0000000000000000 RDI: ffff9899c048bcf0
[ 3.560545] RBP: 0000000000000001 R08: ffff9899c048bc64 R09: 0000000000005b76
[ 3.560546] R10: 000000000000000d R11: ffff9899c048bde0 R12: 00000000ffffffea
[ 3.560548] R13: ffff88e00b39e480 R14: 0000000000044d45 R15: 0000000000000000
[ 3.560549] FS: 0000000000000000(0000) GS:ffff88e123c00000(0000) knlGS:0000000000000000
[ 3.560551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.560552] CR2: 00007f57f4e90451 CR3: 0000000181410000 CR4: 00000000000006f0
[ 3.560554] Call Trace:
[ 3.560558] <TASK>
[ 3.560560] ? __warn+0x7d/0xc0
[ 3.560566] ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[ 3.560671] ? report_bug+0xe6/0x170
[ 3.560675] ? handle_bug+0x41/0x70
[ 3.560679] ? exc_invalid_op+0x13/0x60
[ 3.560681] ? asm_exc_invalid_op+0x16/0x20
[ 3.560685] ? init_reset_begun+0x20/0x20 [nouveau]
[ 3.560769] ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[ 3.560888] nv50_disp_super_2_2+0x70/0x430 [nouveau]
[ 3.560997] nv50_disp_super+0x113/0x210 [nouveau]
[ 3.561103] process_one_work+0x1c7/0x380
[ 3.561109] worker_thread+0x4d/0x380
[ 3.561113] ? rescuer_thread+0x3a0/0x3a0
[ 3.561116] kthread+0xe9/0x110
[ 3.561120] ? kthread_complete_and_exit+0x20/0x20
[ 3.561122] ret_from_fork+0x22/0x30
[ 3.561130] </TASK>
Further information:
$ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: Dell Latitude E6510
        Flags: bus master, fast devsel, latency 0, IRQ 27
        Memory at e2000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at e0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 7000 [size=128]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nouveau
        Kernel modules: nouveau
I already reported this bug to Debian, see https://bugs.debian.org/1042753 for context.
With support (thanks Diederik!) I managed to figure out that the cause was a regression between upstream kernel version 6.1.27 and 6.1.38.
I built a new 6.1.38 kernel with these commits reverted:
62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
5a144bad3e75 nouveau: fix client work fence deletion race
Mind retrying with only fb725beca62d and 62aecf23f3d1 reverted? It would be weird if the other two commits were causing it. If that's the case, it's a bit worrying that reverting either of those causes issues, but maybe there is a good reason for it. Anyway, mind figuring out which of the two you need reverted to fix your issue? Thanks!
With that kernel the graphics work again.
Please inform me if further tests are required.
FWIW, to be sure the issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression tracking bot:
#regzbot ^introduced v6.1.27..v6.1.38
#regzbot title drm/nouveau: display stays black
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already discussed somewhere else? It was fixed already? You want to clarify when the regression started to happen? Or point out I got the title or something else totally wrong? Then just reply and tell me -- ideally while also telling regzbot about it, as explained by the page listed in the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing to the report (the parent of this mail). See page linked in footer for details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.