On 03.07.25 15:37, Thomas Zimmermann wrote:
Hi
On 03.07.25 at 13:59, Bert Karwatzki wrote:
When booting next-20250703 on my MSI Alpha 15 laptop running Debian sid (last updated 20250703), I get several warnings of the following kind:
[ 8.702999] [ T1628] ------------[ cut here ]------------
[ 8.703001] [ T1628] WARNING: drivers/gpu/drm/drm_gem.c:287 at drm_gem_object_handle_put_unlocked+0xaa/0xe0, CPU#14: Xorg/1628
Well, that didn't take long to blow up. Thanks for reporting the bug.
I have an idea how to fix this, but it would likely just trigger the next issue.
Christian, can we revert this patch, and also the other patches that switch from import_attach->dmabuf to ->dma_buf that caused the problem?
Sure we can, but I would rather vote for fixing this, at least for now. Those patches are not just cleanup; they fix rarely occurring real-world problems.
If we can't get it working in the next week or so we can still revert back to a working state.
What exactly is the issue? That cursors don't necessarily have GEM handles? If so, how do we grab/drop handle refs when we only have a DMA-buf?
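For illustration, the handle-reference question can be modelled in user space. This is only a minimal sketch, assuming the warning fires when a handle reference is dropped while the handle count is already zero (the thread does not quote the source line). The names toy_gem_object, toy_handle_get, toy_handle_put and toy_demo are hypothetical and are not kernel APIs.

```c
#include <stdio.h>

/* Hypothetical stand-in for a GEM object; NOT the kernel's struct drm_gem_object. */
struct toy_gem_object {
	unsigned int handle_count; /* handle references currently held */
	unsigned int warnings;     /* number of unbalanced puts observed */
};

/* Take one handle reference (e.g. when a framebuffer starts using the object). */
static void toy_handle_get(struct toy_gem_object *obj)
{
	obj->handle_count++;
}

/*
 * Drop one handle reference. A put without a matching get is reported
 * and ignored, mirroring the assumed behaviour of the kernel-side check.
 * Returns 0 on success, -1 on an unbalanced put.
 */
static int toy_handle_put(struct toy_gem_object *obj)
{
	if (obj->handle_count == 0) {
		obj->warnings++;
		fprintf(stderr, "WARNING: handle put with handle_count == 0\n");
		return -1;
	}
	obj->handle_count--;
	return 0;
}

/* Runs a balanced and an unbalanced sequence; returns the total warnings. */
static unsigned int toy_demo(void)
{
	struct toy_gem_object with_handle = { 0, 0 };
	struct toy_gem_object without_handle = { 0, 0 };

	/* Balanced: a reference is taken at creation and dropped at destroy. */
	toy_handle_get(&with_handle);
	toy_handle_put(&with_handle);

	/* Unbalanced: destroy drops a reference that was never taken. */
	toy_handle_put(&without_handle);

	return with_handle.warnings + without_handle.warnings;
}
```

Calling toy_demo() from a main() exercises both paths. Only the second object, which never received a get, reaches the warning branch. That is the situation the question above asks about, where a buffer reaches the destroy path without a handle reference having been taken for it.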
Regards, Christian.
Best regards Thomas
[ 8.703007] [ T1628] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device rfcomm bnep nls_ascii nls_cp437 vfat fat snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel btusb snd_intel_dspcfg btrtl btintel snd_hda_codec uvcvideo snd_soc_dmic snd_acp3x_pdm_dma btbcm snd_acp3x_rn btmtk snd_hwdep videobuf2_vmalloc snd_soc_core snd_hda_core videobuf2_memops snd_pcm_oss uvc videobuf2_v4l2 bluetooth snd_mixer_oss videodev snd_pcm snd_rn_pci_acp3x videobuf2_common snd_acp_config snd_timer msi_wmi ecdh_generic snd_soc_acpi ecc mc sparse_keymap snd wmi_bmof edac_mce_amd k10temp soundcore snd_pci_acp3x ccp ac battery button joydev hid_sensor_accel_3d hid_sensor_prox hid_sensor_als hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common amd_pmc evdev mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 mac80211 libarc4 cfg80211 rfkill msr fuse
[ 8.703056] [ T1628] nvme_fabrics efi_pstore configfs efivarfs autofs4 ext4 mbcache jbd2 usbhid amdgpu drm_client_lib i2c_algo_bit drm_ttm_helper ttm drm_panel_backlight_quirks drm_exec drm_suballoc_helper amdxcp drm_buddy xhci_pci gpu_sched xhci_hcd drm_display_helper hid_sensor_hub hid_multitouch mfd_core hid_generic drm_kms_helper psmouse i2c_hid_acpi nvme usbcore amd_sfh i2c_hid hid cec serio_raw nvme_core r8169 crc16 i2c_piix4 usb_common i2c_smbus i2c_designware_platform i2c_designware_core
[ 8.703082] [ T1628] CPU: 14 UID: 1000 PID: 1628 Comm: Xorg Not tainted 6.16.0-rc4-next-20250703-master #127 PREEMPT_{RT,(full)}
[ 8.703085] [ T1628] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
[ 8.703086] [ T1628] RIP: 0010:drm_gem_object_handle_put_unlocked+0xaa/0xe0
[ 8.703088] [ T1628] Code: c7 f6 8a ff 48 89 ef e8 94 d4 2e 00 eb d8 48 8b 43 08 48 8d b8 d8 06 00 00 e8 52 78 2b 00 c7 83 08 01 00 00 00 00 00 00 eb 98 <0f> 0b 5b 5d e9 98 f6 8a ff 48 8b 83 68 01 00 00 48 8b 00 48 85 c0
[ 8.703089] [ T1628] RSP: 0018:ffffb8e8c7fbfb00 EFLAGS: 00010246
[ 8.703091] [ T1628] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[ 8.703092] [ T1628] RDX: 0000000000000000 RSI: ffff94cdc062b478 RDI: ffff94ce71390448
[ 8.703093] [ T1628] RBP: ffff94ce14780010 R08: ffff94cdc062b618 R09: ffff94ce14780278
[ 8.703094] [ T1628] R10: 0000000000000001 R11: ffff94cdc062b478 R12: ffff94ce14780010
[ 8.703095] [ T1628] R13: 0000000000000007 R14: 0000000000000004 R15: ffff94ce14780010
[ 8.703096] [ T1628] FS: 00007fc164276b00(0000) GS:ffff94dcb49cf000(0000) knlGS:0000000000000000
[ 8.703097] [ T1628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.703098] [ T1628] CR2: 00005647ccd53008 CR3: 000000012533f000 CR4: 0000000000750ef0
[ 8.703099] [ T1628] PKRU: 55555554
[ 8.703100] [ T1628] Call Trace:
[ 8.703101] [ T1628] <TASK>
[ 8.703104] [ T1628] drm_gem_fb_destroy+0x27/0x50 [drm_kms_helper]
[ 8.703113] [ T1628] __drm_atomic_helper_plane_destroy_state+0x1a/0xa0 [drm_kms_helper]
[ 8.703119] [ T1628] drm_atomic_helper_plane_destroy_state+0x10/0x20 [drm_kms_helper]
[ 8.703124] [ T1628] drm_atomic_state_default_clear+0x1c0/0x2e0
[ 8.703127] [ T1628] __drm_atomic_state_free+0x6c/0xb0
[ 8.703129] [ T1628] drm_atomic_helper_disable_plane+0x92/0xe0 [drm_kms_helper]
[ 8.703135] [ T1628] drm_mode_cursor_universal+0xf2/0x2a0
[ 8.703140] [ T1628] drm_mode_cursor_common.part.0+0x9c/0x1e0
[ 8.703144] [ T1628] ? drm_mode_setplane+0x320/0x320
[ 8.703146] [ T1628] drm_mode_cursor_ioctl+0x8a/0xa0
[ 8.703148] [ T1628] drm_ioctl_kernel+0xa1/0xf0
[ 8.703151] [ T1628] drm_ioctl+0x26a/0x510
[ 8.703153] [ T1628] ? drm_mode_setplane+0x320/0x320
[ 8.703155] [ T1628] ? srso_alias_return_thunk+0x5/0xfbef5
[ 8.703157] [ T1628] ? rt_spin_unlock+0x12/0x40
[ 8.703159] [ T1628] ? do_setitimer+0x185/0x1d0
[ 8.703161] [ T1628] ? srso_alias_return_thunk+0x5/0xfbef5
[ 8.703164] [ T1628] amdgpu_drm_ioctl+0x46/0x90 [amdgpu]
[ 8.703283] [ T1628] __x64_sys_ioctl+0x91/0xe0
[ 8.703286] [ T1628] do_syscall_64+0x65/0xfc0
[ 8.703289] [ T1628] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 8.703291] [ T1628] RIP: 0033:0x7fc1645f78db
[ 8.703292] [ T1628] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 8.703294] [ T1628] RSP: 002b:00007ffd75bce430 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 8.703295] [ T1628] RAX: ffffffffffffffda RBX: 000056224e896ea0 RCX: 00007fc1645f78db
[ 8.703296] [ T1628] RDX: 00007ffd75bce4c0 RSI: 00000000c01c64a3 RDI: 000000000000000f
[ 8.703297] [ T1628] RBP: 00007ffd75bce4c0 R08: 0000000000000100 R09: 0000562210547ab0
[ 8.703298] [ T1628] R10: 000000000000004c R11: 0000000000000246 R12: 00000000c01c64a3
[ 8.703298] [ T1628] R13: 000000000000000f R14: 0000000000000000 R15: 000056224e5c1cd0
[ 8.703302] [ T1628] </TASK>
[ 8.703303] [ T1628] ---[ end trace 0000000000000000 ]---
As the warnings do not occur in next-20250702, I looked at the commits given by

$ git log --oneline next-20250702..next-20250703 drivers/gpu/drm

to search for a culprit. So I reverted the most likely candidate, commit 582111e630f5 ("drm/gem: Acquire references on GEM handles for framebuffers"), in next-20250703 and the warnings disappeared.

This is the hardware I used:

$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 15)
06:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. KC3000/FURY Renegade NVMe SSD [E18] (rev 01)
07:00.0 Non-Volatile memory controller: Micron/Crucial Technology P1 NVMe PCIe SSD[Frampton] (rev 03)
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Audio Coprocessor (rev 01)
08:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller
08:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub
Bert Karwatzki