On Thu Apr 24, 2025 at 4:44 PM BST, Alex Deucher wrote:
On Tue, Apr 22, 2025 at 11:59 AM Alexey Klimov alexey.klimov@linaro.org wrote:
On Tue Apr 22, 2025 at 2:00 PM BST, Alex Deucher wrote:
On Mon, Apr 21, 2025 at 10:21 PM Alexey Klimov alexey.klimov@linaro.org wrote:
On Thu Apr 17, 2025 at 2:08 PM BST, Alex Deucher wrote:
On Wed, Apr 16, 2025 at 8:43 PM Fugang Duan fugang.duan@cixtech.com wrote:
From: Alex Deucher alexdeucher@gmail.com
Sent: April 16, 2025 22:49
>To: Alexey Klimov alexey.klimov@linaro.org
>On Wed, Apr 16, 2025 at 9:48 AM Alexey Klimov alexey.klimov@linaro.org wrote:
>>
>> On Wed Apr 16, 2025 at 4:12 AM BST, Fugang Duan wrote:
>> > From: Alexey Klimov alexey.klimov@linaro.org
>> > Sent: April 16, 2025 2:28
>> >>#regzbot introduced: v6.12..v6.13
>> >>The only change related to hdp_v5_0_flush_hdp() was
>> >>cf424020e040 drm/amdgpu/hdp5.0: do a posting read when flushing HDP
>> >>
>> >>Reverting that commit ^^ did help and resolved that problem. Before
[..]
OK, that patch won't change anything then. Can you try this patch instead?
The config I am using is basically defconfig with respect to memory parameters, so yes, I use 4k pages.
So I tested that patch, thank you, and some other different configurations -- nothing helped. Exactly the same behaviour with the same backtrace.
Did you test the first (4k check) or the second (don't remap on ARM) patch?
The second one. I think you mentioned that the first one won't help for 4k pages.
So it seems that it is a firmware problem after all?
There is no GPU firmware involved in this operation. It's just a posted write. E.g., we write to a register to flush the HDP write queue and then read the register back to make sure the write posted. If the second patch didn't help, then perhaps there is some issue with MMIO access on your platform?
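As a rough illustration of the write-then-read-back pattern described above, here is a minimal userspace sketch. It is not the amdgpu code: `fake_hdp_flush_reg`, `reg_write32()`, `reg_read32()` and `flush_with_posting_read()` are hypothetical names, and an ordinary volatile variable stands in for what would be an ioremap()ed MMIO register in a real driver.

```c
#include <stdint.h>

/* Hypothetical stand-in for a device register. In a real driver this would
 * be an address inside an ioremap()ed BAR, not ordinary memory. */
static volatile uint32_t fake_hdp_flush_reg = 0xdeadbeef;

/* Hypothetical accessor: store a 32-bit value to the register. */
static inline void reg_write32(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/* Hypothetical accessor: load a 32-bit value from the register. */
static inline uint32_t reg_read32(volatile uint32_t *reg)
{
	return *reg;
}

/* Write to the register, then read the same register back. On a real PCIe
 * device the write is posted (the CPU does not wait for it), while the read
 * has to wait for a reply, so it returns only after the earlier write has
 * reached the device. Here the read simply returns the value just stored. */
static uint32_t flush_with_posting_read(volatile uint32_t *reg)
{
	reg_write32(reg, 0);
	return reg_read32(reg);
}
```

Calling `flush_with_posting_read(&fake_hdp_flush_reg)` returns 0 in this simulation. On real hardware the returned value is usually ignored; the point of the read is only to force the preceding write out to the device.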
I didn't mean GPU firmware at all. I only had UEFI/EL3 firmware in mind.
Completely out of the blue, based on nothing, do you think that adding a delay or some memory barrier between the write and the read might help? I also wonder: if the host data path code is executed during ordinary desktop usage, why doesn't it break later? But yeah, I also think this is a problem with this motherboard. Thank you.
I think I found the problem. The previous patch wasn't doing what I expected. Please try this patch instead.
This one works!
[ 4.483750] [drm] amdgpu kernel modesetting enabled.
[ 4.491985] amdgpu: IO link not available for non x86 platforms
[ 4.497189] amdgpu: Virtual CRAT table created for CPU
[ 4.497559] amdgpu: Topology: Add CPU node
[ 4.509623] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 0 <nv_common>
[ 4.512905] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 1 <gmc_v10_0>
[ 4.513254] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 2 <navi10_ih>
[ 4.513595] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 3 <psp>
[ 4.513932] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 4 <smu>
[ 4.514278] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 5 <dm>
[ 4.514625] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 6 <gfx_v10_0>
[ 4.514980] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 7 <sdma_v5_2>
[ 4.515334] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 8 <vcn_v3_0>
[ 4.515699] amdgpu 0000:c3:00.0: amdgpu: detected ip block number 9 <jpeg_v3_0>
[ 4.516087] amdgpu 0000:c3:00.0: amdgpu: Fetched VBIOS from VFCT
[ 4.516466] amdgpu: ATOM BIOS: 113-V502MECH-0OC
[ 4.749748] amdgpu 0000:c3:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 4.777435] amdgpu 0000:c3:00.0: BAR 2 [mem 0x1810000000-0x18101fffff 64bit pref]: releasing
[ 4.793256] amdgpu 0000:c3:00.0: BAR 0 [mem 0x1800000000-0x180fffffff 64bit pref]: releasing
[ 4.844639] amdgpu 0000:c3:00.0: BAR 0 [mem 0x1800000000-0x19ffffffff 64bit pref]: assigned
[ 4.849774] amdgpu 0000:c3:00.0: BAR 2 [mem 0x1a00000000-0x1a001fffff 64bit pref]: assigned
[ 4.957411] amdgpu 0000:c3:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[ 4.967618] amdgpu 0000:c3:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 4.992963] [drm] amdgpu: 8176M of VRAM memory ready
[ 5.004032] [drm] amdgpu: 7888M of GTT memory ready.
[ 6.224159] amdgpu 0000:c3:00.0: amdgpu: STB initialized to 2048 entries
[ 6.284328] amdgpu 0000:c3:00.0: amdgpu: Found VCN firmware Version ENC: 1.33 DEC: 4 VEP: 0 Revision: 3
[ 6.361142] amdgpu 0000:c3:00.0: amdgpu: reserve 0xa00000 from 0x81fd000000 for PSP TMR
[ 6.471231] amdgpu 0000:c3:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 6.492967] amdgpu 0000:c3:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 6.492993] amdgpu 0000:c3:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b3100 (59.49.0)
[ 6.513659] amdgpu 0000:c3:00.0: amdgpu: SMU driver if version not matched
[ 6.513699] amdgpu 0000:c3:00.0: amdgpu: use vbios provided pptable
[ 6.588418] amdgpu 0000:c3:00.0: amdgpu: SMU is initialized successfully!
[ 6.800975] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 6.806709] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[ 6.813516] amdgpu: Virtual CRAT table created for GPU
[ 6.819229] amdgpu: Topology: Add dGPU node [0x73ff:0x1002]
[ 6.824865] kfd kfd: amdgpu: added device 1002:73ff
[ 6.829821] amdgpu 0000:c3:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 28
[ 6.838355] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 6.846007] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
[ 6.853658] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
[ 6.861398] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
[ 6.869137] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 6.876877] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 6.884615] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 6.892356] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 6.900094] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 6.907921] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 6.915748] amdgpu 0000:c3:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
[ 6.923663] amdgpu 0000:c3:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
[ 6.931050] amdgpu 0000:c3:00.0: amdgpu: ring sdma1 uses VM inv eng 14 on hub 0
[ 6.938439] amdgpu 0000:c3:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 6.946089] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[ 6.953916] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[ 6.961742] amdgpu 0000:c3:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[ 6.970485] amdgpu 0000:c3:00.0: amdgpu: Using BACO for runtime pm
[ 6.977167] [drm] Initialized amdgpu 3.63.0 for 0000:c3:00.0 on minor 0
[ 7.234638] amdgpu 0000:c3:00.0: [drm] fb0: amdgpudrmfb frame buffer device

root@orion:~ # uname -a
Linux orion 6.15.0-rc3test6+ #1 SMP Sun Apr 27 01:12:10 BST 2025 aarch64 GNU/Linux
Thank you for taking a look into this.
Best regards,
Alexey