Hello all,
please forgive me if this issue is already known; I could not find any reference to it for the 6.17.9 kernel. After updating from 6.17.8 to 6.17.9, the following error is logged on every boot (see the *ERROR* line near the end of the excerpt):
Dec 04 14:44:20 P14s kernel: amdgpu: Topology: Add dGPU node [0x1638:0x1002]
Dec 04 14:44:20 P14s kernel: kfd kfd: amdgpu: added device 1002:1638
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 8, active_cu_number 8
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 8
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 8
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 8
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 8
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: Runtime PM not available
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: amdgpu: [drm] Using custom brightness curve
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: [drm] Registered 4 planes with drm panic
Dec 04 14:44:20 P14s kernel: [drm] Initialized amdgpu 3.64.0 for 0000:07:00.0 on minor 1
Dec 04 14:44:20 P14s kernel: fbcon: amdgpudrmfb (fb0) is primary device
Dec 04 14:44:20 P14s kernel: amdgpu 0000:07:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Dec 04 14:44:21 P14s kernel: amdgpu 0000:07:00.0: [drm] fb0: amdgpudrmfb frame buffer device
Setup is as follows:
Hardware: ThinkPad P14s Gen 2 AMD
Processor: AMD Ryzen™ 7 PRO 5850U
OS: Arch Linux
AMD Firmware: linux-firmware-amdgpu 20251125
Running a bisection gives the following:
git bisect start
# status: waiting for both good and bad commits
# good: [8ac42a63c561a8b4cccfe84ed8b97bb057e6ffae] Linux 6.17.8
git bisect good 8ac42a63c561a8b4cccfe84ed8b97bb057e6ffae
# status: waiting for bad commit, 1 good commit known
# bad: [1bfd0faa78d09eb41b81b002e0292db0f3e75de0] Linux 6.17.9
git bisect bad 1bfd0faa78d09eb41b81b002e0292db0f3e75de0
# bad: [92ef36a75fbb56a02a16b141fe684f64fb2b1cb9] lib/crypto: arm/curve25519: Disable on CPU_BIG_ENDIAN
git bisect bad 92ef36a75fbb56a02a16b141fe684f64fb2b1cb9
# bad: [aaba523dd7b6106526c24b1fd9b5fc35e5aaa88d] sctp: prevent possible shift-out-of-bounds in sctp_transport_update_rto
git bisect bad aaba523dd7b6106526c24b1fd9b5fc35e5aaa88d
# bad: [b3b288206a1ea7e21472f8d1c7834ebface9bb33] drm/amdkfd: fix suspend/resume all calls in mes based eviction path
git bisect bad b3b288206a1ea7e21472f8d1c7834ebface9bb33
# good: [ac486718d6cc96e07bc67094221e682ba5ea6f76] drm/amd/pm: Use pm_display_cfg in legacy DPM (v2)
git bisect good ac486718d6cc96e07bc67094221e682ba5ea6f76
# bad: [1009f007b3afba93082599e263b3807d05177d53] RISC-V: clear hot-unplugged cores from all task mm_cpumasks to avoid rfence errors
git bisect bad 1009f007b3afba93082599e263b3807d05177d53
# bad: [ccd8af579101ca68f1fba8c9e055554202381cab] drm/amd: Disable ASPM on SI
git bisect bad ccd8af579101ca68f1fba8c9e055554202381cab
# bad: [e95425b6df29cc88fac7d0d77aa38a5a131dbf45] drm/amd/pm: Disable MCLK switching on SI at high pixel clocks
git bisect bad e95425b6df29cc88fac7d0d77aa38a5a131dbf45
# bad: [5ee434b55134c24df7ad426d40fe28c6542fab4d] drm/amd/display: Disable fastboot on DCE 6 too
git bisect bad 5ee434b55134c24df7ad426d40fe28c6542fab4d
# first bad commit: [5ee434b55134c24df7ad426d40fe28c6542fab4d] drm/amd/display: Disable fastboot on DCE 6 too
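For anyone who wants to reuse the bisection result, here is a minimal sketch that pulls the first bad commit hash out of a saved bisect log. It assumes the log was written by `git bisect log` with one entry per line; the function name `first_bad_commit` and the file name `bisect.log` are only illustrative and not part of my setup.

```shell
# Print the hash from the "# first bad commit: [<hash>] <subject>" line
# of a bisect log saved with: git bisect log > bisect.log
# (bisect.log is a hypothetical file name used for illustration)
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]*\)\].*/\1/p' "$1"
}

# Example use, run from a kernel tree:
#   first_bad_commit bisect.log
#   git bisect replay bisect.log   # replays the same good/bad decisions
```

The same saved log can be fed to `git bisect replay` if someone wants to repeat or extend the bisection on their own machine.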
The error still occurs in 6.18, but reverting the above bad commit removes it.
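To check whether a given kernel build shows the problem, something like the following sketch can be used. It only looks for the error string quoted in the log above; the function name `has_dmcub_error` is my own invention, and reading the kernel log via `journalctl -k -b` assumes a systemd-based system such as the Arch install described here.

```shell
# Exit status 0 if the kernel log read from stdin contains the DMCUB
# error line reported above, non-zero otherwise.
has_dmcub_error() {
    grep -q 'dc_dmub_srv_log_diagnostic_data: DMCUB error'
}

# Example use after booting the kernel under test:
#   journalctl -k -b | has_dmcub_error && echo "DMCUB error present"
#
# To build a test kernel without the suspect change, from the kernel tree:
#   git revert 5ee434b55134c24df7ad426d40fe28c6542fab4d
```

Each kernel has to be built, installed and booted by hand before running the check, so this does not automate the bisection itself.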
Although an error is reported, the system still boots to the graphical interface and appears to function normally. However, I have neither benchmarked graphics performance nor used the system for an extended period after the error was flagged.
Yours faithfully,
Neil Gammie