This patch series attempts to enable the use of xe DRM driver on non-4KiB kernel page platforms. This involves fixing the ttm/bo interface, as well as parts of the userspace API to make use of kernel `PAGE_SIZE' for alignment instead of the assumed `SZ_4K', it also fixes incorrect usage of `PAGE_SIZE' in the GuC and ring buffer interface code to make sure all instructions/commands were aligned to 4KiB barriers (per the Programmer's Manual for the GPUs covered by this DRM driver).
This issue was first discovered and reported by members of the LoongArch user communities, whose hardware commonly ran on 16KiB-page kernels. The patch series began on an unassuming branch of a downstream kernel tree maintained by Shang Yatsen.[^1]
It worked well but remained sparsely documented, a lot of the work done here relied on Shang Yatsen's original patch.
AOSC OS then picked it up[^2] to provide Intel Xe/Arc support for users of its LoongArch port, for which I worked extensively on. After months of positive user feedback and from encouragement from Kexy Biscuit, my colleague at the community, I decided to examine its potential for upstreaming, cross-reference kernel and Intel documentation to better document and revise this patch.
Now that this series has been tested good (for boot up, OpenGL, and playback of a standardised set of video samples[^3]... with the exception of the Intel Arc B580, which seems to segfault at intel-media-driver - iHD_drv_video.so, but strangely, hardware accelerated video playback works well with Firefox?) on the following platforms (motherboard + GPU model):
- x86-64, 4KiB kernel page: - MS-7D42 + Intel Arc A580 - LoongArch, 16KiB kernel page: - XA61200 + GUNNIR DG1 Blue Halberd (Intel DG1) - XA61200 + ASRock Arc A380 Challenger ITX OC (Intel Arc 380) - XA61200 + Intel Arc 580 - XA61200 + GUNNIR Intel Arc A750 Photon 8G OC (Intel Arc A750) - ASUS XC-LS3A6M + GUNNIR Intel Arc B580 INDEX 12G (Intel Arc B580)
On these platforms, basic functionalities tested good but the driver was unstable with occasional resets (I do suspect however, that this platform suffers from PCIe coherence issues, as instability only occurs under heavy VRAM I/O load):
- AArch64, 4KiB/64KiB kernel pages: - ERUN-FD3000 (Phytium D3000) + GUNNIR Intel Arc A750 Photon 8G OC (Intel Arc A750)
I think that this patch series is now ready for your comment and review. Please forgive me if I made any simple mistake or used wrong terminologies, but I have never worked on a patch for the DRM subsystem and my experience is still quite thin.
But anyway, just letting you all know that Intel Xe/Arc works on non-4KiB kernel page platforms (and honestly, it's great to use, especially for games and media playback)!
[^1]: https://github.com/FanFansfan/loongson-linux/tree/loongarch-xe [^2]: We maintained Shang Yatsen's patch until our v6.13.3 tree, until we decided to test and send this series upstream, https://github.com/AOSC-Tracking/linux/tree/aosc/v6.13.3 [^3]: Delicious hot pot! https://repo.aosc.io/ahvl/sample-videos-20250223.tar.zst
Suggested-by: Kexy Biscuit kexybiscuit@aosc.io Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io --- Mingcong Bai (5): drm/xe/bo: fix alignment with non-4K kernel page sizes drm/xe/guc: use SZ_4K for alignment drm/xe/regs: fix RING_CTL_SIZE(size) calculation drm/xe: use 4K alignment for cursor jumps drm/xe/query: use PAGE_SIZE as the minimum page alignment
drivers/gpu/drm/xe/regs/xe_engine_regs.h | 3 +-- drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- drivers/gpu/drm/xe/xe_guc.c | 4 ++-- drivers/gpu/drm/xe/xe_guc_ads.c | 32 ++++++++++++++++---------------- drivers/gpu/drm/xe/xe_guc_capture.c | 8 ++++---- drivers/gpu/drm/xe/xe_guc_ct.c | 2 +- drivers/gpu/drm/xe/xe_guc_log.c | 4 ++-- drivers/gpu/drm/xe/xe_guc_pc.c | 4 ++-- drivers/gpu/drm/xe/xe_migrate.c | 4 ++-- drivers/gpu/drm/xe/xe_query.c | 2 +- include/uapi/drm/xe_drm.h | 2 +- 11 files changed, 36 insertions(+), 37 deletions(-) --- base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6 change-id: 20250226-xe-non-4k-fix-6b2eded0a564
Best regards,
From: Mingcong Bai jeffbai@aosc.io
The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages.
Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.
Without this fix, the kernel will hang at a kernel BUG():
[ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]---
Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error.
Cc: stable@vger.kernel.org Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io --- drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 3f5391d416d469c636d951dd6f0a2b3b5ae95dab..dd03c581441f352eff51d0eafe1298fca7d9653d 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1441,9 +1441,9 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, flags |= XE_BO_FLAG_INTERNAL_64K; alignment = align >> PAGE_SHIFT; } else { - aligned_size = ALIGN(size, SZ_4K); + aligned_size = ALIGN(size, PAGE_SIZE); flags &= ~XE_BO_FLAG_INTERNAL_64K; - alignment = SZ_4K >> PAGE_SHIFT; + alignment = PAGE_SIZE >> PAGE_SHIFT; }
if (type == ttm_bo_type_device && aligned_size != size) @@ -1457,7 +1457,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
bo->ccs_cleared = false; bo->tile = tile; - bo->size = size; + bo->size = aligned_size; bo->flags = flags; bo->cpu_caching = cpu_caching; bo->ttm.base.funcs = &xe_gem_object_funcs; @@ -1468,7 +1468,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, #endif INIT_LIST_HEAD(&bo->vram_userfault_link);
- drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size); + drm_gem_private_object_init(&xe->drm, &bo->ttm.base, aligned_size);
if (resv) { ctx.allow_res_evict = !(flags & XE_BO_FLAG_NO_RESV_EVICT);
On Wed, Feb 26, 2025 at 10:00:18AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages.
Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.
Without this fix, the kernel will hang at a kernel BUG():
[ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]---
Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error.
Cc: stable@vger.kernel.org Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 3f5391d416d469c636d951dd6f0a2b3b5ae95dab..dd03c581441f352eff51d0eafe1298fca7d9653d 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1441,9 +1441,9 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, flags |= XE_BO_FLAG_INTERNAL_64K; alignment = align >> PAGE_SHIFT; } else {
aligned_size = ALIGN(size, SZ_4K);
aligned_size = ALIGN(size, PAGE_SIZE);
in the very beginning of the driver we were set to use XE_PAGE_SIZE for things like this. It seems thing went side ways though.
Thanks for fixing these. XE_PAGE_SIZE is always 4k, but I think we should uxe XE_PAGE_SIZE for clarity. For others in Cc... any thoughts?
flags &= ~XE_BO_FLAG_INTERNAL_64K;
alignment = SZ_4K >> PAGE_SHIFT;
alignment = PAGE_SIZE >> PAGE_SHIFT;
}
if (type == ttm_bo_type_device && aligned_size != size)
@@ -1457,7 +1457,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
bo->ccs_cleared = false; bo->tile = tile;
- bo->size = size;
- bo->size = aligned_size;
the interface of this function is that the caller needs to pass the correct size, it's not really expected the function will adjust it and the check is there to gurantee to return the appropriate error. There are other places that would need some additional fixes leading to this function. Example:
xe_gem_create_ioctl() { ... if (XE_IOCTL_DBG(xe, args->size & ~PAGE_MASK)) return -EINVAL; ... }
Lucas De Marchi
bo->flags = flags; bo->cpu_caching = cpu_caching; bo->ttm.base.funcs = &xe_gem_object_funcs; @@ -1468,7 +1468,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, #endif INIT_LIST_HEAD(&bo->vram_userfault_link);
- drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
drm_gem_private_object_init(&xe->drm, &bo->ttm.base, aligned_size);
if (resv) { ctx.allow_res_evict = !(flags & XE_BO_FLAG_NO_RESV_EVICT);
-- 2.48.1
On Tue, Feb 25, 2025 at 09:13:09PM -0600, Lucas De Marchi wrote:
On Wed, Feb 26, 2025 at 10:00:18AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages.
Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.
Without this fix, the kernel will hang at a kernel BUG():
[ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]---
Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error.
Cc: stable@vger.kernel.org Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 3f5391d416d469c636d951dd6f0a2b3b5ae95dab..dd03c581441f352eff51d0eafe1298fca7d9653d 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1441,9 +1441,9 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, flags |= XE_BO_FLAG_INTERNAL_64K; alignment = align >> PAGE_SHIFT; } else {
} else if (type == ttm_bo_type_device) { new code /w PAGE_SIZE } else { old code /w SZ_4K (or maybe XE_PAGE_SIZE now)? }
See below for further explaination.
aligned_size = ALIGN(size, SZ_4K);
aligned_size = ALIGN(size, PAGE_SIZE);
in the very beginning of the driver we were set to use XE_PAGE_SIZE for things like this. It seems thing went side ways though.
Thanks for fixing these. XE_PAGE_SIZE is always 4k, but I think we should uxe XE_PAGE_SIZE for clarity. For others in Cc... any thoughts?
It looks like you have a typo here, Lucas. Could you please clarify?
However, XE_PAGE_SIZE should always be 4k, as it refers to the GPU page size, which is fixed.
I think using PAGE_SIZE makes sense in some cases. See my other comments.
flags &= ~XE_BO_FLAG_INTERNAL_64K;
alignment = SZ_4K >> PAGE_SHIFT;
alignment = PAGE_SIZE >> PAGE_SHIFT;
}
if (type == ttm_bo_type_device && aligned_size != size)
@@ -1457,7 +1457,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
bo->ccs_cleared = false; bo->tile = tile;
- bo->size = size;
- bo->size = aligned_size;
the interface of this function is that the caller needs to pass the correct size, it's not really expected the function will adjust it and the check is there to gurantee to return the appropriate error. There
Let me expand further on Lucas's comment. We reject user BOs that are unaligned here in ___xe_bo_create_locked.
1490 if (type == ttm_bo_type_device && aligned_size != size) 1491 return ERR_PTR(-EINVAL);
What we allow are kernel BOs (!= ttm_bo_type_device), which are never mapped to the CPU or the PPGTT (user GPU mappings), to be a smaller size. Examples of this include memory for GPU page tables, LRC state, etc. Memory for GPU page tables is always allocated in 4k blocks, so changing the allocation to the CPU page size of 16k or 64k would be wasteful.
AFAIK, kernel memory is always a VRAM allocation, so we don't have any CPU page size requirements. If this is not true (I haven't checked), or perhaps just to future-proof, change the snippet in my first comment to:
} else if (type == ttm_bo_type_device || flags & XE_BO_FLAG_SYSTEM)) { new code /w PAGE_SIZE } else { old code /w SZ_4K }
Then change BO assignment size too:
bo->size = flags & XE_BO_FLAG_SYSTEM ? aligned_size : size;
This should enable kernel VRAM allocations to be smaller than the CPU page size (I think). Can you try out this suggestion and see if the Xe boots with non-4k pages?
Also others in Cc... thoughts / double check my input?
are other places that would need some additional fixes leading to this function. Example:
xe_gem_create_ioctl() { ... if (XE_IOCTL_DBG(xe, args->size & ~PAGE_MASK)) return -EINVAL;
This actually looks right, the minimum allocation size for user BOs should be PAGE_SIZE aligned. The last patch in the series fixes the query for this.
Matt
... }
Lucas De Marchi
bo->flags = flags; bo->cpu_caching = cpu_caching; bo->ttm.base.funcs = &xe_gem_object_funcs; @@ -1468,7 +1468,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, #endif INIT_LIST_HEAD(&bo->vram_userfault_link);
- drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
drm_gem_private_object_init(&xe->drm, &bo->ttm.base, aligned_size);
if (resv) { ctx.allow_res_evict = !(flags & XE_BO_FLAG_NO_RESV_EVICT);
-- 2.48.1
On Tue, Feb 25, 2025 at 08:18:42PM -0800, Matthew Brost wrote:
On Tue, Feb 25, 2025 at 09:13:09PM -0600, Lucas De Marchi wrote:
On Wed, Feb 26, 2025 at 10:00:18AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages.
Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.
Without this fix, the kernel will hang at a kernel BUG():
[ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]---
Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error.
Cc: stable@vger.kernel.org Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 3f5391d416d469c636d951dd6f0a2b3b5ae95dab..dd03c581441f352eff51d0eafe1298fca7d9653d 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1441,9 +1441,9 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, flags |= XE_BO_FLAG_INTERNAL_64K; alignment = align >> PAGE_SHIFT; } else {
} else if (type == ttm_bo_type_device) { new code /w PAGE_SIZE } else { old code /w SZ_4K (or maybe XE_PAGE_SIZE now)? }
See below for further explaination.
aligned_size = ALIGN(size, SZ_4K);
aligned_size = ALIGN(size, PAGE_SIZE);
in the very beginning of the driver we were set to use XE_PAGE_SIZE for things like this. It seems thing went side ways though.
Thanks for fixing these. XE_PAGE_SIZE is always 4k, but I think we should uxe XE_PAGE_SIZE for clarity. For others in Cc... any thoughts?
It looks like you have a typo here, Lucas. Could you please clarify?
However, XE_PAGE_SIZE should always be 4k, as it refers to the GPU page size, which is fixed.
yes, that's what I meant: I think we should use XE_PAGE_SIZE (here and in other patches in this series) instead of SZ_4K. Even if both are the same, it makes things clearer.
Lucas De Marchi
On Tue, Feb 25, 2025 at 11:05:16PM -0600, Lucas De Marchi wrote:
On Tue, Feb 25, 2025 at 08:18:42PM -0800, Matthew Brost wrote:
On Tue, Feb 25, 2025 at 09:13:09PM -0600, Lucas De Marchi wrote:
On Wed, Feb 26, 2025 at 10:00:18AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages.
Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.
Without this fix, the kernel will hang at a kernel BUG():
[ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]---
Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error.
Cc: stable@vger.kernel.org Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 3f5391d416d469c636d951dd6f0a2b3b5ae95dab..dd03c581441f352eff51d0eafe1298fca7d9653d 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1441,9 +1441,9 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, flags |= XE_BO_FLAG_INTERNAL_64K; alignment = align >> PAGE_SHIFT; } else {
} else if (type == ttm_bo_type_device) { new code /w PAGE_SIZE } else { old code /w SZ_4K (or maybe XE_PAGE_SIZE now)? }
See below for further explaination.
aligned_size = ALIGN(size, SZ_4K);
aligned_size = ALIGN(size, PAGE_SIZE);
in the very beginning of the driver we were set to use XE_PAGE_SIZE for things like this. It seems thing went side ways though.
Thanks for fixing these. XE_PAGE_SIZE is always 4k, but I think we should uxe XE_PAGE_SIZE for clarity. For others in Cc... any thoughts?
It looks like you have a typo here, Lucas. Could you please clarify?
However, XE_PAGE_SIZE should always be 4k, as it refers to the GPU page size, which is fixed.
yes, that's what I meant: I think we should use XE_PAGE_SIZE (here and in other patches in this series) instead of SZ_4K. Even if both are the same, it makes things clearer.
I think, to some extent, it has to be considered in contextโfor example, Patch #2 in this series, where we are enforcing GuC memory requirements to 4K, makes SZ_4K appropriate. In contrast, in Patch #4, where we are operating in GPU page sizes, XE_PAGE_SIZE makes more sense.
Regardless, this series clearly shows that we have been carelessly mixing SZ_4K, XE_PAGE_SIZE, and PAGE_SIZE throughout the codeโprobably mostly my fault from the early Xe coding. I agree that we should try to clean this up.
Matt
Lucas De Marchi
Hi Matthew,
ๅจ 2025/2/26 13:43, Matthew Brost ๅ้:
On Tue, Feb 25, 2025 at 11:05:16PM -0600, Lucas De Marchi wrote:
On Tue, Feb 25, 2025 at 08:18:42PM -0800, Matthew Brost wrote:
On Tue, Feb 25, 2025 at 09:13:09PM -0600, Lucas De Marchi wrote:
On Wed, Feb 26, 2025 at 10:00:18AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages.
Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.
Without this fix, the kernel will hang at a kernel BUG():
[ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]---
Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error.
Cc: stable@vger.kernel.org Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 3f5391d416d469c636d951dd6f0a2b3b5ae95dab..dd03c581441f352eff51d0eafe1298fca7d9653d 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1441,9 +1441,9 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, flags |= XE_BO_FLAG_INTERNAL_64K; alignment = align >> PAGE_SHIFT; } else {
} else if (type == ttm_bo_type_device) { new code /w PAGE_SIZE } else { old code /w SZ_4K (or maybe XE_PAGE_SIZE now)? }
See below for further explaination.
aligned_size = ALIGN(size, SZ_4K);
aligned_size = ALIGN(size, PAGE_SIZE);
in the very beginning of the driver we were set to use XE_PAGE_SIZE for things like this. It seems thing went side ways though.
Thanks for fixing these. XE_PAGE_SIZE is always 4k, but I think we should uxe XE_PAGE_SIZE for clarity. For others in Cc... any thoughts?
It looks like you have a typo here, Lucas. Could you please clarify?
However, XE_PAGE_SIZE should always be 4k, as it refers to the GPU page size, which is fixed.
yes, that's what I meant: I think we should use XE_PAGE_SIZE (here and in other patches in this series) instead of SZ_4K. Even if both are the same, it makes things clearer.
I think, to some extent, it has to be considered in contextโfor example, Patch #2 in this series, where we are enforcing GuC memory requirements to 4K, makes SZ_4K appropriate. In contrast, in Patch #4, where we are operating in GPU page sizes, XE_PAGE_SIZE makes more sense.
Regardless, this series clearly shows that we have been carelessly mixing SZ_4K, XE_PAGE_SIZE, and PAGE_SIZE throughout the codeโprobably mostly my fault from the early Xe coding. I agree that we should try to clean this up.
This sounds good, I will revise as suggested in v2.
Best Regards, Mingcong Bai
Matt
Lucas De Marchi
Hi Matt,
ๅจ 2025/2/26 12:18, Matthew Brost ๅ้:
On Tue, Feb 25, 2025 at 09:13:09PM -0600, Lucas De Marchi wrote:
On Wed, Feb 26, 2025 at 10:00:18AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages.
Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.
Without this fix, the kernel will hang at a kernel BUG():
[ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]---
Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error.
Cc: stable@vger.kernel.org Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 3f5391d416d469c636d951dd6f0a2b3b5ae95dab..dd03c581441f352eff51d0eafe1298fca7d9653d 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1441,9 +1441,9 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, flags |= XE_BO_FLAG_INTERNAL_64K; alignment = align >> PAGE_SHIFT; } else {
} else if (type == ttm_bo_type_device) { new code /w PAGE_SIZE } else { old code /w SZ_4K (or maybe XE_PAGE_SIZE now)? }
See below for further explaination.
aligned_size = ALIGN(size, SZ_4K);
aligned_size = ALIGN(size, PAGE_SIZE);
in the very beginning of the driver we were set to use XE_PAGE_SIZE for things like this. It seems thing went side ways though.
Thanks for fixing these. XE_PAGE_SIZE is always 4k, but I think we should uxe XE_PAGE_SIZE for clarity. For others in Cc... any thoughts?
It looks like you have a typo here, Lucas. Could you please clarify?
However, XE_PAGE_SIZE should always be 4k, as it refers to the GPU page size, which is fixed.
I think using PAGE_SIZE makes sense in some cases. See my other comments.
flags &= ~XE_BO_FLAG_INTERNAL_64K;
alignment = SZ_4K >> PAGE_SHIFT;
alignment = PAGE_SIZE >> PAGE_SHIFT;
}
if (type == ttm_bo_type_device && aligned_size != size)
@@ -1457,7 +1457,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
bo->ccs_cleared = false; bo->tile = tile;
- bo->size = size;
- bo->size = aligned_size;
the interface of this function is that the caller needs to pass the correct size, it's not really expected the function will adjust it and the check is there to gurantee to return the appropriate error. There
Let me expand further on Lucas's comment. We reject user BOs that are unaligned here in ___xe_bo_create_locked.
1490 if (type == ttm_bo_type_device && aligned_size != size) 1491 return ERR_PTR(-EINVAL);
What we allow are kernel BOs (!= ttm_bo_type_device), which are never mapped to the CPU or the PPGTT (user GPU mappings), to be a smaller size. Examples of this include memory for GPU page tables, LRC state, etc. Memory for GPU page tables is always allocated in 4k blocks, so changing the allocation to the CPU page size of 16k or 64k would be wasteful.
AFAIK, kernel memory is always a VRAM allocation, so we don't have any CPU page size requirements. If this is not true (I haven't checked), or perhaps just to future-proof, change the snippet in my first comment to:
} else if (type == ttm_bo_type_device || flags & XE_BO_FLAG_SYSTEM)) { new code /w PAGE_SIZE } else { old code /w SZ_4K }
Then change BO assignment size too:
bo->size = flags & XE_BO_FLAG_SYSTEM ? aligned_size : size;
This should enable kernel VRAM allocations to be smaller than the CPU page size (I think). Can you try out this suggestion and see if the Xe boots with non-4k pages?
Also others in Cc... thoughts / double check my input?
Noted, I will try out this snippet on non-4KiB kernels and report back with results.
Best Regards, Mingcong Bai
are other places that would need some additional fixes leading to this function. Example:
xe_gem_create_ioctl() { ... if (XE_IOCTL_DBG(xe, args->size & ~PAGE_MASK)) return -EINVAL;
This actually looks right, the minimum allocation size for user BOs should be PAGE_SIZE aligned. The last patch in the series fixes the query for this.
Matt
... }
Lucas De Marchi
bo->flags = flags; bo->cpu_caching = cpu_caching; bo->ttm.base.funcs = &xe_gem_object_funcs; @@ -1468,7 +1468,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, #endif INIT_LIST_HEAD(&bo->vram_userfault_link);
- drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
drm_gem_private_object_init(&xe->drm, &bo->ttm.base, aligned_size);
if (resv) { ctx.allow_res_evict = !(flags & XE_BO_FLAG_NO_RESV_EVICT);
-- 2.48.1
On 26/02/2025 04:18, Matthew Brost wrote:
On Tue, Feb 25, 2025 at 09:13:09PM -0600, Lucas De Marchi wrote:
On Wed, Feb 26, 2025 at 10:00:18AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages.
Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.
Without this fix, the kernel will hang at a kernel BUG():
[ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]---
Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error.
Cc: stable@vger.kernel.org Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 3f5391d416d469c636d951dd6f0a2b3b5ae95dab..dd03c581441f352eff51d0eafe1298fca7d9653d 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1441,9 +1441,9 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, flags |= XE_BO_FLAG_INTERNAL_64K; alignment = align >> PAGE_SHIFT; } else {
} else if (type == ttm_bo_type_device) { new code /w PAGE_SIZE } else { old code /w SZ_4K (or maybe XE_PAGE_SIZE now)? }
See below for further explaination.
aligned_size = ALIGN(size, SZ_4K);
aligned_size = ALIGN(size, PAGE_SIZE);
in the very beginning of the driver we were set to use XE_PAGE_SIZE for things like this. It seems thing went side ways though.
Thanks for fixing these. XE_PAGE_SIZE is always 4k, but I think we should uxe XE_PAGE_SIZE for clarity. For others in Cc... any thoughts?
It looks like you have a typo here, Lucas. Could you please clarify?
However, XE_PAGE_SIZE should always be 4k, as it refers to the GPU page size, which is fixed.
I think using PAGE_SIZE makes sense in some cases. See my other comments.
flags &= ~XE_BO_FLAG_INTERNAL_64K;
alignment = SZ_4K >> PAGE_SHIFT;
alignment = PAGE_SIZE >> PAGE_SHIFT;
}
if (type == ttm_bo_type_device && aligned_size != size)
@@ -1457,7 +1457,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
bo->ccs_cleared = false; bo->tile = tile;
- bo->size = size;
- bo->size = aligned_size;
the interface of this function is that the caller needs to pass the correct size, it's not really expected the function will adjust it and the check is there to gurantee to return the appropriate error. There
Let me expand further on Lucas's comment. We reject user BOs that are unaligned here in ___xe_bo_create_locked.
1490 if (type == ttm_bo_type_device && aligned_size != size) 1491 return ERR_PTR(-EINVAL);
What we allow are kernel BOs (!= ttm_bo_type_device), which are never mapped to the CPU or the PPGTT (user GPU mappings), to be a smaller size. Examples of this include memory for GPU page tables, LRC state, etc. Memory for GPU page tables is always allocated in 4k blocks, so changing the allocation to the CPU page size of 16k or 64k would be wasteful.
AFAIK, kernel memory is always a VRAM allocation, so we don't have any CPU page size requirements. If this is not true (I haven't checked), or perhaps just to future-proof, change the snippet in my first comment to:
} else if (type == ttm_bo_type_device || flags & XE_BO_FLAG_SYSTEM)) { new code /w PAGE_SIZE } else { old code /w SZ_4K }
Then change BO assignment size too:
bo->size = flags & XE_BO_FLAG_SYSTEM ? aligned_size : size;
This should enable kernel VRAM allocations to be smaller than the CPU page size (I think). Can you try out this suggestion and see if the Xe boots with non-4k pages?
The vram allocator chunk size is PAGE_SIZE so that would also need some attention, I think.
But I think we also then need to deal with the assert in: https://elixir.bootlin.com/linux/v6.14-rc4/source/drivers/gpu/drm/drm_gem.c#....
Also others in Cc... thoughts / double check my input?
are other places that would need some additional fixes leading to this function. Example:
xe_gem_create_ioctl() { ... if (XE_IOCTL_DBG(xe, args->size & ~PAGE_MASK)) return -EINVAL;
This actually looks right, the minimum allocation size for user BOs should be PAGE_SIZE aligned. The last patch in the series fixes the query for this.
Matt
... }
Lucas De Marchi
bo->flags = flags; bo->cpu_caching = cpu_caching; bo->ttm.base.funcs = &xe_gem_object_funcs; @@ -1468,7 +1468,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, #endif INIT_LIST_HEAD(&bo->vram_userfault_link);
- drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
drm_gem_private_object_init(&xe->drm, &bo->ttm.base, aligned_size);
if (resv) { ctx.allow_res_evict = !(flags & XE_BO_FLAG_NO_RESV_EVICT);
-- 2.48.1
On Wed, Feb 26, 2025 at 10:38:40AM +0000, Matthew Auld wrote:
On 26/02/2025 04:18, Matthew Brost wrote:
On Tue, Feb 25, 2025 at 09:13:09PM -0600, Lucas De Marchi wrote:
On Wed, Feb 26, 2025 at 10:00:18AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages.
Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.
Without this fix, the kernel will hang at a kernel BUG():
[ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]---
Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error.
Cc: stable@vger.kernel.org Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 3f5391d416d469c636d951dd6f0a2b3b5ae95dab..dd03c581441f352eff51d0eafe1298fca7d9653d 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1441,9 +1441,9 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, flags |= XE_BO_FLAG_INTERNAL_64K; alignment = align >> PAGE_SHIFT; } else {
} else if (type == ttm_bo_type_device) { new code /w PAGE_SIZE } else { old code /w SZ_4K (or maybe XE_PAGE_SIZE now)? }
See below for further explaination.
aligned_size = ALIGN(size, SZ_4K);
aligned_size = ALIGN(size, PAGE_SIZE);
in the very beginning of the driver we were set to use XE_PAGE_SIZE for things like this. It seems thing went side ways though.
Thanks for fixing these. XE_PAGE_SIZE is always 4k, but I think we should uxe XE_PAGE_SIZE for clarity. For others in Cc... any thoughts?
It looks like you have a typo here, Lucas. Could you please clarify?
However, XE_PAGE_SIZE should always be 4k, as it refers to the GPU page size, which is fixed.
I think using PAGE_SIZE makes sense in some cases. See my other comments.
flags &= ~XE_BO_FLAG_INTERNAL_64K;
alignment = SZ_4K >> PAGE_SHIFT;
alignment = PAGE_SIZE >> PAGE_SHIFT;
}
if (type == ttm_bo_type_device && aligned_size != size)
@@ -1457,7 +1457,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
bo->ccs_cleared = false; bo->tile = tile;
- bo->size = size;
- bo->size = aligned_size;
the interface of this function is that the caller needs to pass the correct size, it's not really expected the function will adjust it and the check is there to gurantee to return the appropriate error. There
Let me expand further on Lucas's comment. We reject user BOs that are unaligned here in ___xe_bo_create_locked.
1490 if (type == ttm_bo_type_device && aligned_size != size) 1491 return ERR_PTR(-EINVAL);
What we allow are kernel BOs (!= ttm_bo_type_device), which are never mapped to the CPU or the PPGTT (user GPU mappings), to be a smaller size. Examples of this include memory for GPU page tables, LRC state, etc. Memory for GPU page tables is always allocated in 4k blocks, so changing the allocation to the CPU page size of 16k or 64k would be wasteful.
AFAIK, kernel memory is always a VRAM allocation, so we don't have any CPU page size requirements. If this is not true (I haven't checked), or perhaps just to future-proof, change the snippet in my first comment to:
} else if (type == ttm_bo_type_device || flags & XE_BO_FLAG_SYSTEM)) { new code /w PAGE_SIZE } else { old code /w SZ_4K }
Then change BO assignment size too:
bo->size = flags & XE_BO_FLAG_SYSTEM ? aligned_size : size;
This should enable kernel VRAM allocations to be smaller than the CPU page size (I think). Can you try out this suggestion and see if the Xe boots with non-4k pages?
The vram allocator chunk size is PAGE_SIZE so that would also need some attention, I think.
Agree. So I think __xe_ttm_vram_mgr_init should be called with s/PAGE_SIZE/SZ_4K?
But I think we also then need to deal with the assert in: https://elixir.bootlin.com/linux/v6.14-rc4/source/drivers/gpu/drm/drm_gem.c#....
Yep. I think that would need to be adjusted as well to be bypassed if we are never going to CPU map the BOโspecifically, CPU map it to user space or if the BO is not in VRAM. For kernel VRAM mapping, this resolves to an offset within an existing large PCIe BAR mapping, so allocations unaligned to PAGE_SIZE should work.
Maybe export __drm_gem_private_object_init, which skips the BUG_ON, and call this in Xe to avoid interfering with other drivers' expectations?
Matt
Also others in Cc... thoughts / double check my input?
are other places that would need some additional fixes leading to this function. Example:
xe_gem_create_ioctl() { ... if (XE_IOCTL_DBG(xe, args->size & ~PAGE_MASK)) return -EINVAL;
This actually looks right, the minimum allocation size for user BOs should be PAGE_SIZE aligned. The last patch in the series fixes the query for this.
Matt
... }
Lucas De Marchi
bo->flags = flags; bo->cpu_caching = cpu_caching; bo->ttm.base.funcs = &xe_gem_object_funcs; @@ -1468,7 +1468,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, #endif INIT_LIST_HEAD(&bo->vram_userfault_link);
- drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
drm_gem_private_object_init(&xe->drm, &bo->ttm.base, aligned_size);
if (resv) { ctx.allow_res_evict = !(flags & XE_BO_FLAG_NO_RESV_EVICT);
-- 2.48.1
On 26/02/2025 15:12, Matthew Brost wrote:
On Wed, Feb 26, 2025 at 10:38:40AM +0000, Matthew Auld wrote:
On 26/02/2025 04:18, Matthew Brost wrote:
On Tue, Feb 25, 2025 at 09:13:09PM -0600, Lucas De Marchi wrote:
On Wed, Feb 26, 2025 at 10:00:18AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
The bo/ttm interfaces with kernel memory mapping from dedicated GPU memory. It is not correct to assume that SZ_4K would suffice for page alignment as there are a few hardware platforms that commonly uses non-4K pages - for instance, currently, Loongson 3A5000/6000 devices (of the LoongArch architecture) commonly uses 16K kernel pages.
Per my testing Intel Xe/Arc families of GPUs works on at least Loongson 3A6000 platforms so long as "Above 4G Decoding" and "Resizable BAR" were enabled in the EFI firmware settings. I tested this patch series on my Loongson XA61200 (3A6000) motherboard with an Intel Arc A750 GPU.
Without this fix, the kernel will hang at a kernel BUG():
[ 7.425445] ------------[ cut here ]------------ [ 7.430032] kernel BUG at drivers/gpu/drm/drm_gem.c:181! [ 7.435330] Oops - BUG[#1]: [ 7.438099] CPU: 0 UID: 0 PID: 102 Comm: kworker/0:4 Tainted: G E 6.13.3-aosc-main-00336-g60829239b300-dirty #3 [ 7.449511] Tainted: [E]=UNSIGNED_MODULE [ 7.453402] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 7.467144] Workqueue: events work_for_cpu_fn [ 7.471472] pc 9000000001045fa4 ra ffff8000025331dc tp 90000001010c8000 sp 90000001010cb960 [ 7.479770] a0 900000012a3e8000 a1 900000010028c000 a2 000000000005d000 a3 0000000000000000 [ 7.488069] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000001 [ 7.496367] t0 0000000000001000 t1 9000000001045000 t2 0000000000000000 t3 0000000000000000 [ 7.504665] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 7.504667] t8 0000000000000000 u0 90000000029ea7d8 s9 900000012a3e9360 s0 900000010028c000 [ 7.504668] s1 ffff800002744000 s2 0000000000000000 s3 0000000000000000 s4 0000000000000001 [ 7.504669] s5 900000012a3e8000 s6 0000000000000001 s7 0000000000022022 s8 0000000000000000 [ 7.537855] ra: ffff8000025331dc ___xe_bo_create_locked+0x158/0x3b0 [xe] [ 7.544893] ERA: 9000000001045fa4 drm_gem_private_object_init+0xcc/0xd0 [ 7.551639] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 7.557785] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 7.562111] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 7.566870] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 7.571628] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 7.577163] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 7.583128] Modules linked in: xe(E+) drm_gpuvm(E) drm_exec(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) loongson(E) r8169(E) cec(E) rc_core(E) realtek(E) i2c_algo_bit(E) tpm_tis_spi(E) led_class(E) hid_generic(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) la_ow_syscall(E) i2c_dev(E) [ 7.613049] Process kworker/0:4 (pid: 102, threadinfo=00000000bc26ebd1, task=0000000055480707) [ 7.621606] Stack : 0000000000000000 3030303a6963702b 000000000005d000 0000000000000000 [ 7.629563] 0000000000000001 0000000000000000 0000000000000000 8e1bfae42b2f7877 [ 7.637519] 000000000005d000 900000012a3e8000 900000012a3e9360 0000000000000000 [ 7.645475] ffffffffffffffff 0000000000000000 0000000000022022 0000000000000000 [ 7.653431] 0000000000000001 ffff800002533660 0000000000022022 9000000000234470 [ 7.661386] 90000001010cba28 0000000000001000 0000000000000000 000000000005c300 [ 7.669342] 900000012a3e8000 0000000000000000 0000000000000001 900000012a3e8000 [ 7.677298] ffffffffffffffff 0000000000022022 900000012a3e9498 ffff800002533a14 [ 7.685254] 0000000000022022 0000000000000000 900000000209c000 90000000010589e0 [ 7.693209] 90000001010cbab8 ffff8000027c78c0 fffffffffffff000 900000012a3e8000 [ 7.701165] ... [ 7.703588] Call Trace: [ 7.703590] [<9000000001045fa4>] drm_gem_private_object_init+0xcc/0xd0 [ 7.712496] [<ffff8000025331d8>] ___xe_bo_create_locked+0x154/0x3b0 [xe] [ 7.719268] [<ffff80000253365c>] __xe_bo_create_locked+0x228/0x304 [xe] [ 7.725951] [<ffff800002533a10>] xe_bo_create_pin_map_at_aligned+0x70/0x1b0 [xe] [ 7.733410] [<ffff800002533c7c>] xe_managed_bo_create_pin_map+0x34/0xcc [xe] [ 7.740522] [<ffff800002533d58>] xe_managed_bo_create_from_data+0x44/0xb0 [xe] [ 7.747807] [<ffff80000258d19c>] xe_uc_fw_init+0x3ec/0x904 [xe] [ 7.753814] [<ffff80000254a478>] xe_guc_init+0x30/0x3dc [xe] [ 7.759553] [<ffff80000258bc04>] xe_uc_init+0x20/0xf0 [xe] [ 7.765121] [<ffff800002542abc>] xe_gt_init_hwconfig+0x5c/0xd0 [xe] [ 7.771461] [<ffff800002537204>] xe_device_probe+0x240/0x588 [xe] [ 7.777627] [<ffff800002575448>] xe_pci_probe+0x6c0/0xa6c [xe] [ 7.783540] [<9000000000e9828c>] local_pci_probe+0x4c/0xb4 [ 7.788989] [<90000000002aa578>] work_for_cpu_fn+0x20/0x40 [ 7.794436] [<90000000002aeb50>] process_one_work+0x1a4/0x458 [ 7.800143] [<90000000002af5a0>] worker_thread+0x304/0x3fc [ 7.805591] [<90000000002bacac>] kthread+0x114/0x138 [ 7.810520] [<9000000000241f64>] ret_from_kernel_thread+0x8/0xa4 [ 7.816489] [ 7.817961] Code: 4c000020 29c3e2f9 53ff93ff <002a0001> 0015002c 03400000 02ff8063 29c04077 001500f7 [ 7.827651] [ 7.829140] ---[ end trace 0000000000000000 ]---
Revise all instances of `SZ_4K' with `PAGE_SIZE' and revise the call to `drm_gem_private_object_init()' in `*___xe_bo_create_locked()' (last call before BUG()) to use `size_t aligned_size' calculated from `PAGE_SIZE' to fix the above error.
Cc: stable@vger.kernel.org Fixes: 4e03b584143e ("drm/xe/uapi: Reject bo creation of unaligned size") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_bo.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 3f5391d416d469c636d951dd6f0a2b3b5ae95dab..dd03c581441f352eff51d0eafe1298fca7d9653d 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1441,9 +1441,9 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, flags |= XE_BO_FLAG_INTERNAL_64K; alignment = align >> PAGE_SHIFT; } else {
} else if (type == ttm_bo_type_device) { new code /w PAGE_SIZE } else { old code /w SZ_4K (or maybe XE_PAGE_SIZE now)? }
See below for further explaination.
aligned_size = ALIGN(size, SZ_4K);
aligned_size = ALIGN(size, PAGE_SIZE);
in the very beginning of the driver we were set to use XE_PAGE_SIZE for things like this. It seems thing went side ways though.
Thanks for fixing these. XE_PAGE_SIZE is always 4k, but I think we should uxe XE_PAGE_SIZE for clarity. For others in Cc... any thoughts?
It looks like you have a typo here, Lucas. Could you please clarify?
However, XE_PAGE_SIZE should always be 4k, as it refers to the GPU page size, which is fixed.
I think using PAGE_SIZE makes sense in some cases. See my other comments.
flags &= ~XE_BO_FLAG_INTERNAL_64K;
alignment = SZ_4K >> PAGE_SHIFT;
alignment = PAGE_SIZE >> PAGE_SHIFT;
}
if (type == ttm_bo_type_device && aligned_size != size)
@@ -1457,7 +1457,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
bo->ccs_cleared = false; bo->tile = tile;
- bo->size = size;
- bo->size = aligned_size;
the interface of this function is that the caller needs to pass the correct size, it's not really expected the function will adjust it and the check is there to gurantee to return the appropriate error. There
Let me expand further on Lucas's comment. We reject user BOs that are unaligned here in ___xe_bo_create_locked.
1490 if (type == ttm_bo_type_device && aligned_size != size) 1491 return ERR_PTR(-EINVAL);
What we allow are kernel BOs (!= ttm_bo_type_device), which are never mapped to the CPU or the PPGTT (user GPU mappings), to be a smaller size. Examples of this include memory for GPU page tables, LRC state, etc. Memory for GPU page tables is always allocated in 4k blocks, so changing the allocation to the CPU page size of 16k or 64k would be wasteful.
AFAIK, kernel memory is always a VRAM allocation, so we don't have any CPU page size requirements. If this is not true (I haven't checked), or perhaps just to future-proof, change the snippet in my first comment to:
} else if (type == ttm_bo_type_device || flags & XE_BO_FLAG_SYSTEM)) { new code /w PAGE_SIZE } else { old code /w SZ_4K }
Then change BO assignment size too:
bo->size = flags & XE_BO_FLAG_SYSTEM ? aligned_size : size;
This should enable kernel VRAM allocations to be smaller than the CPU page size (I think). Can you try out this suggestion and see if the Xe boots with non-4k pages?
The vram allocator chunk size is PAGE_SIZE so that would also need some attention, I think.
Agree. So I think __xe_ttm_vram_mgr_init should be called with s/PAGE_SIZE/SZ_4K?
Should be fine from allocator pov. But also need to update the upper layers in the VRAM manager itself, I think.
But I think we also then need to deal with the assert in: https://elixir.bootlin.com/linux/v6.14-rc4/source/drivers/gpu/drm/drm_gem.c#....
Yep. I think that would need to be adjusted as well to be bypassed if we are never going to CPU map the BOโspecifically, CPU map it to user space or if the BO is not in VRAM. For kernel VRAM mapping, this resolves to an offset within an existing large PCIe BAR mapping, so allocations unaligned to PAGE_SIZE should work.
Yeah, agree. I thinks it's possible.
Maybe export __drm_gem_private_object_init, which skips the BUG_ON, and call this in Xe to avoid interfering with other drivers' expectations?
Some other places I spotted are the VRAM manager, and stuff like xe_bo_vmap() and then into TTM itself. So it might be quite widespread.
Matt
Also others in Cc... thoughts / double check my input?
are other places that would need some additional fixes leading to this function. Example:
xe_gem_create_ioctl() { ... if (XE_IOCTL_DBG(xe, args->size & ~PAGE_MASK)) return -EINVAL;
This actually looks right, the minimum allocation size for user BOs should be PAGE_SIZE aligned. The last patch in the series fixes the query for this.
Matt
... }
Lucas De Marchi
bo->flags = flags; bo->cpu_caching = cpu_caching; bo->ttm.base.funcs = &xe_gem_object_funcs; @@ -1468,7 +1468,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo, #endif INIT_LIST_HEAD(&bo->vram_userfault_link);
- drm_gem_private_object_init(&xe->drm, &bo->ttm.base, size);
drm_gem_private_object_init(&xe->drm, &bo->ttm.base, aligned_size);
if (resv) { ctx.allow_res_evict = !(flags & XE_BO_FLAG_NO_RESV_EVICT);
-- 2.48.1
From: Mingcong Bai jeffbai@aosc.io
Per the "Firmware" chapter in "drm/xe Intel GFX Driver", as well as "Volume 8: Command Stream Programming" in "Intelยฎ Arcโข A-Series Graphics and Intel Data Center GPU Flex Series Open-Source Programmer's Reference Manual For the discrete GPUs code named "Alchemist" and "Arctic Sound-M"" and "Intelยฎ Irisยฎ Xe MAX Graphics Open Source Programmer's Reference Manual For the 2020 Discrete GPU formerly named "DG1"":
"The RINGBUF register sets (defined in Memory Interface Registers) are used to specify the ring buffer memory areas. The ring buffer must start on a 4KB boundary and be allocated in linear memory. The length of any one ring buffer is limited to 2MB."
The Graphics micro (ฮผ) Controller (GuC) really expects command buffers aligned to 4K boundaries.
Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures after loading the GuC firmware:
[ 7.398317] xe 0000:09:00.0: [drm] Found dg2/g10 (device ID 56a1) display version 13.00 stepping C0 [ 7.410429] xe 0000:09:00.0: [drm] Using GuC firmware from i915/dg2_guc_70.bin version 70.36.0 [ 10.719989] xe 0000:09:00.0: [drm] *ERROR* GT0: load failed: status = 0x800001EC, time = 3297ms, freq = 2400MHz (req 2400MHz), done = 0 [ 10.732106] xe 0000:09:00.0: [drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x76, UKernel = 0x01, MIA = 0x00, Auth = 0x02 [ 10.744214] xe 0000:09:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:09:00.0 as wedged. Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new [ 10.828908] xe 0000:09:00.0: [drm] *ERROR* GT0: GuC mmio request 0x4100: no reply 0x4100
Correct this by revising all instances of `PAGE_SIZE' to `SZ_4K' and revise `PAGE_ALIGN()' calls to `ALIGN()' with `SZ_4K' as the second argument (overriding `PAGE_SIZE').
Cc: stable@vger.kernel.org Fixes: 84d15f426110 ("drm/xe/guc: Add capture size check in GuC log buffer") Fixes: 9c8c7a7e6f1f ("drm/xe/guc: Prepare GuC register list and update ADS size for error capture") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Co-developed-by: Kexy Biscuit kexybiscuit@aosc.io Signed-off-by: Kexy Biscuit kexybiscuit@aosc.io Signed-off-by: Mingcong Bai jeffbai@aosc.io --- drivers/gpu/drm/xe/xe_guc.c | 4 ++-- drivers/gpu/drm/xe/xe_guc_ads.c | 32 ++++++++++++++++---------------- drivers/gpu/drm/xe/xe_guc_capture.c | 8 ++++---- drivers/gpu/drm/xe/xe_guc_ct.c | 2 +- drivers/gpu/drm/xe/xe_guc_log.c | 4 ++-- drivers/gpu/drm/xe/xe_guc_pc.c | 4 ++-- 6 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 408365dfe4eed02336bbd208b60491aea27a8a6e..595873780a5774501f04b2f01ebdf8a45c7ac931 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -88,7 +88,7 @@ static u32 guc_ctl_feature_flags(struct xe_guc *guc)
static u32 guc_ctl_log_params_flags(struct xe_guc *guc) { - u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >> PAGE_SHIFT; + u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >> XE_PTE_SHIFT; u32 flags;
#if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0) @@ -141,7 +141,7 @@ static u32 guc_ctl_log_params_flags(struct xe_guc *guc)
static u32 guc_ctl_ads_flags(struct xe_guc *guc) { - u32 ads = guc_bo_ggtt_addr(guc, guc->ads.bo) >> PAGE_SHIFT; + u32 ads = guc_bo_ggtt_addr(guc, guc->ads.bo) >> XE_PTE_SHIFT; u32 flags = ads << GUC_ADS_ADDR_SHIFT;
return flags; diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c index fab259adc380be28c79fae5946e123427359ec60..65e88ad43e8adef752889300abd0197a0ac4a1a3 100644 --- a/drivers/gpu/drm/xe/xe_guc_ads.c +++ b/drivers/gpu/drm/xe/xe_guc_ads.c @@ -143,17 +143,17 @@ static size_t guc_ads_regset_size(struct xe_guc_ads *ads)
static size_t guc_ads_golden_lrc_size(struct xe_guc_ads *ads) { - return PAGE_ALIGN(ads->golden_lrc_size); + return ALIGN(ads->golden_lrc_size, SZ_4K); }
static u32 guc_ads_waklv_size(struct xe_guc_ads *ads) { - return PAGE_ALIGN(ads->ads_waklv_size); + return ALIGN(ads->ads_waklv_size, SZ_4K); }
static size_t guc_ads_capture_size(struct xe_guc_ads *ads) { - return PAGE_ALIGN(ads->capture_size); + return ALIGN(ads->capture_size, SZ_4K); }
static size_t guc_ads_um_queues_size(struct xe_guc_ads *ads) @@ -168,7 +168,7 @@ static size_t guc_ads_um_queues_size(struct xe_guc_ads *ads)
static size_t guc_ads_private_data_size(struct xe_guc_ads *ads) { - return PAGE_ALIGN(ads_to_guc(ads)->fw.private_data_size); + return ALIGN(ads_to_guc(ads)->fw.private_data_size, SZ_4K); }
static size_t guc_ads_regset_offset(struct xe_guc_ads *ads) @@ -183,7 +183,7 @@ static size_t guc_ads_golden_lrc_offset(struct xe_guc_ads *ads) offset = guc_ads_regset_offset(ads) + guc_ads_regset_size(ads);
- return PAGE_ALIGN(offset); + return ALIGN(offset, SZ_4K); }
static size_t guc_ads_waklv_offset(struct xe_guc_ads *ads) @@ -193,7 +193,7 @@ static size_t guc_ads_waklv_offset(struct xe_guc_ads *ads) offset = guc_ads_golden_lrc_offset(ads) + guc_ads_golden_lrc_size(ads);
- return PAGE_ALIGN(offset); + return ALIGN(offset, SZ_4K); }
static size_t guc_ads_capture_offset(struct xe_guc_ads *ads) @@ -203,7 +203,7 @@ static size_t guc_ads_capture_offset(struct xe_guc_ads *ads) offset = guc_ads_waklv_offset(ads) + guc_ads_waklv_size(ads);
- return PAGE_ALIGN(offset); + return ALIGN(offset, SZ_4K); }
static size_t guc_ads_um_queues_offset(struct xe_guc_ads *ads) @@ -213,7 +213,7 @@ static size_t guc_ads_um_queues_offset(struct xe_guc_ads *ads) offset = guc_ads_capture_offset(ads) + guc_ads_capture_size(ads);
- return PAGE_ALIGN(offset); + return ALIGN(offset, SZ_4K); }
static size_t guc_ads_private_data_offset(struct xe_guc_ads *ads) @@ -223,7 +223,7 @@ static size_t guc_ads_private_data_offset(struct xe_guc_ads *ads) offset = guc_ads_um_queues_offset(ads) + guc_ads_um_queues_size(ads);
- return PAGE_ALIGN(offset); + return ALIGN(offset, SZ_4K); }
static size_t guc_ads_size(struct xe_guc_ads *ads) @@ -276,7 +276,7 @@ static size_t calculate_golden_lrc_size(struct xe_guc_ads *ads) continue;
real_size = xe_gt_lrc_size(gt, class); - alloc_size = PAGE_ALIGN(real_size); + alloc_size = ALIGN(real_size, SZ_4K); total_size += alloc_size; }
@@ -612,12 +612,12 @@ static int guc_capture_prep_lists(struct xe_guc_ads *ads) offsetof(struct __guc_ads_blob, system_info));
/* first, set aside the first page for a capture_list with zero descriptors */ - total_size = PAGE_SIZE; + total_size = SZ_4K; if (!xe_guc_capture_getnullheader(guc, &ptr, &size)) xe_map_memcpy_to(ads_to_xe(ads), ads_to_map(ads), capture_offset, ptr, size);
null_ggtt = ads_ggtt + capture_offset; - capture_offset += PAGE_SIZE; + capture_offset += SZ_4K;
/* * Populate capture list : at this point adps is already allocated and @@ -681,10 +681,10 @@ static int guc_capture_prep_lists(struct xe_guc_ads *ads) } }
- if (ads->capture_size != PAGE_ALIGN(total_size)) + if (ads->capture_size != ALIGN(total_size, SZ_4K)) xe_gt_dbg(gt, "ADS capture alloc size changed from %d to %d\n", - ads->capture_size, PAGE_ALIGN(total_size)); - return PAGE_ALIGN(total_size); + ads->capture_size, ALIGN(total_size, SZ_4K)); + return ALIGN(total_size, SZ_4K); }
static void guc_mmio_regset_write_one(struct xe_guc_ads *ads, @@ -928,7 +928,7 @@ static void guc_populate_golden_lrc(struct xe_guc_ads *ads) xe_gt_assert(gt, gt->default_lrc[class]);
real_size = xe_gt_lrc_size(gt, class); - alloc_size = PAGE_ALIGN(real_size); + alloc_size = ALIGN(real_size, SZ_4K); total_size += alloc_size;
/* diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c index f6d523e4c5feb7f07d695af90f4c44c7a9072c2d..dac51f8720fc6c7d27baa31a1b5c567f560e8c1f 100644 --- a/drivers/gpu/drm/xe/xe_guc_capture.c +++ b/drivers/gpu/drm/xe/xe_guc_capture.c @@ -590,8 +590,8 @@ guc_capture_getlistsize(struct xe_guc *guc, u32 owner, u32 type, return -ENODATA;
if (size) - *size = PAGE_ALIGN((sizeof(struct guc_debug_capture_list)) + - (num_regs * sizeof(struct guc_mmio_reg))); + *size = ALIGN((sizeof(struct guc_debug_capture_list)) + + (num_regs * sizeof(struct guc_mmio_reg)), SZ_4K);
return 0; } @@ -738,7 +738,7 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc) * sequence, that is, during the pre-hwconfig phase before we have * the exact engine fusing info. */ - total_size = PAGE_SIZE; /* Pad a page in front for empty lists */ + total_size = SZ_4K; /* Pad a page in front for empty lists */ for (i = 0; i < GUC_CAPTURE_LIST_INDEX_MAX; i++) { for (j = 0; j < GUC_CAPTURE_LIST_CLASS_MAX; j++) { if (xe_guc_capture_getlistsize(guc, i, @@ -758,7 +758,7 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc) total_size += global_size; }
- return PAGE_ALIGN(total_size); + return ALIGN(total_size, SZ_4K); }
static int guc_capture_output_size_est(struct xe_guc *guc) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 72ad576fc18eb583110b44b118abeba4c6be936a..a58c58e599122f3e9ebd1e8374c17c3b4663a5ed 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -212,7 +212,7 @@ int xe_guc_ct_init(struct xe_guc_ct *ct) struct xe_bo *bo; int err;
- xe_gt_assert(gt, !(guc_ct_size() % PAGE_SIZE)); + xe_gt_assert(gt, !(guc_ct_size() % SZ_4K));
ct->g2h_wq = alloc_ordered_workqueue("xe-g2h-wq", WQ_MEM_RECLAIM); if (!ct->g2h_wq) diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c index 0ca3056d8bd3fa37bdb79a7a71ef671270771657..9975005732f645b4735f95fbae8ebe431e793ebe 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.c +++ b/drivers/gpu/drm/xe/xe_guc_log.c @@ -58,7 +58,7 @@ static size_t guc_log_size(void) * | Capture logs | * +===============================+ + CAPTURE_SIZE */ - return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + + return SZ_4K + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE; }
@@ -331,7 +331,7 @@ u32 xe_guc_get_log_buffer_size(struct xe_guc_log *log, enum guc_log_buffer_type u32 xe_guc_get_log_buffer_offset(struct xe_guc_log *log, enum guc_log_buffer_type type) { enum guc_log_buffer_type i; - u32 offset = PAGE_SIZE;/* for the log_buffer_states */ + u32 offset = SZ_4K; /* for the log_buffer_states */
for (i = GUC_LOG_BUFFER_CRASH_DUMP; i < GUC_LOG_BUFFER_TYPE_MAX; ++i) { if (i == type) diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c index df7f130fb663fc2fd170a94cc1b835b4b4cca167..0f97c6310a3a5696490aaa4827eb3aa0d45ea6d6 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.c +++ b/drivers/gpu/drm/xe/xe_guc_pc.c @@ -1000,7 +1000,7 @@ int xe_guc_pc_start(struct xe_guc_pc *pc) { struct xe_device *xe = pc_to_xe(pc); struct xe_gt *gt = pc_to_gt(pc); - u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data)); + u32 size = ALIGN(sizeof(struct slpc_shared_data), SZ_4K); unsigned int fw_ref; int ret;
@@ -1110,7 +1110,7 @@ int xe_guc_pc_init(struct xe_guc_pc *pc) struct xe_tile *tile = gt_to_tile(gt); struct xe_device *xe = gt_to_xe(gt); struct xe_bo *bo; - u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data)); + u32 size = ALIGN(sizeof(struct slpc_shared_data), SZ_4K); int err;
if (xe->info.skip_guc_pc)
On Wed, Feb 26, 2025 at 10:00:19AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
Per the "Firmware" chapter in "drm/xe Intel GFX Driver", as well as "Volume 8: Command Stream Programming" in "Intelยฎ Arcโข A-Series Graphics and Intel Data Center GPU Flex Series Open-Source Programmer's Reference Manual For the discrete GPUs code named "Alchemist" and "Arctic Sound-M"" and "Intelยฎ Irisยฎ Xe MAX Graphics Open Source Programmer's Reference Manual For the 2020 Discrete GPU formerly named "DG1"":
"The RINGBUF register sets (defined in Memory Interface Registers) are used to specify the ring buffer memory areas. The ring buffer must start on a 4KB boundary and be allocated in linear memory. The length of any one ring buffer is limited to 2MB."
The Graphics micro (ฮผ) Controller (GuC) really expects command buffers aligned to 4K boundaries.
Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures after loading the GuC firmware:
[ 7.398317] xe 0000:09:00.0: [drm] Found dg2/g10 (device ID 56a1) display version 13.00 stepping C0 [ 7.410429] xe 0000:09:00.0: [drm] Using GuC firmware from i915/dg2_guc_70.bin version 70.36.0 [ 10.719989] xe 0000:09:00.0: [drm] *ERROR* GT0: load failed: status = 0x800001EC, time = 3297ms, freq = 2400MHz (req 2400MHz), done = 0 [ 10.732106] xe 0000:09:00.0: [drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x76, UKernel = 0x01, MIA = 0x00, Auth = 0x02 [ 10.744214] xe 0000:09:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:09:00.0 as wedged. Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new [ 10.828908] xe 0000:09:00.0: [drm] *ERROR* GT0: GuC mmio request 0x4100: no reply 0x4100
Correct this by revising all instances of `PAGE_SIZE' to `SZ_4K' and revise `PAGE_ALIGN()' calls to `ALIGN()' with `SZ_4K' as the second argument (overriding `PAGE_SIZE').
Cc: stable@vger.kernel.org Fixes: 84d15f426110 ("drm/xe/guc: Add capture size check in GuC log buffer") Fixes: 9c8c7a7e6f1f ("drm/xe/guc: Prepare GuC register list and update ADS size for error capture") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Co-developed-by: Kexy Biscuit kexybiscuit@aosc.io Signed-off-by: Kexy Biscuit kexybiscuit@aosc.io Signed-off-by: Mingcong Bai jeffbai@aosc.io
Reviewd-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/xe/xe_guc.c | 4 ++-- drivers/gpu/drm/xe/xe_guc_ads.c | 32 ++++++++++++++++---------------- drivers/gpu/drm/xe/xe_guc_capture.c | 8 ++++---- drivers/gpu/drm/xe/xe_guc_ct.c | 2 +- drivers/gpu/drm/xe/xe_guc_log.c | 4 ++-- drivers/gpu/drm/xe/xe_guc_pc.c | 4 ++-- 6 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 408365dfe4eed02336bbd208b60491aea27a8a6e..595873780a5774501f04b2f01ebdf8a45c7ac931 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -88,7 +88,7 @@ static u32 guc_ctl_feature_flags(struct xe_guc *guc) static u32 guc_ctl_log_params_flags(struct xe_guc *guc) {
- u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >> PAGE_SHIFT;
- u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >> XE_PTE_SHIFT; u32 flags;
#if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0) @@ -141,7 +141,7 @@ static u32 guc_ctl_log_params_flags(struct xe_guc *guc) static u32 guc_ctl_ads_flags(struct xe_guc *guc) {
- u32 ads = guc_bo_ggtt_addr(guc, guc->ads.bo) >> PAGE_SHIFT;
- u32 ads = guc_bo_ggtt_addr(guc, guc->ads.bo) >> XE_PTE_SHIFT; u32 flags = ads << GUC_ADS_ADDR_SHIFT;
return flags; diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c index fab259adc380be28c79fae5946e123427359ec60..65e88ad43e8adef752889300abd0197a0ac4a1a3 100644 --- a/drivers/gpu/drm/xe/xe_guc_ads.c +++ b/drivers/gpu/drm/xe/xe_guc_ads.c @@ -143,17 +143,17 @@ static size_t guc_ads_regset_size(struct xe_guc_ads *ads) static size_t guc_ads_golden_lrc_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads->golden_lrc_size);
- return ALIGN(ads->golden_lrc_size, SZ_4K);
} static u32 guc_ads_waklv_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads->ads_waklv_size);
- return ALIGN(ads->ads_waklv_size, SZ_4K);
} static size_t guc_ads_capture_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads->capture_size);
- return ALIGN(ads->capture_size, SZ_4K);
} static size_t guc_ads_um_queues_size(struct xe_guc_ads *ads) @@ -168,7 +168,7 @@ static size_t guc_ads_um_queues_size(struct xe_guc_ads *ads) static size_t guc_ads_private_data_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads_to_guc(ads)->fw.private_data_size);
- return ALIGN(ads_to_guc(ads)->fw.private_data_size, SZ_4K);
} static size_t guc_ads_regset_offset(struct xe_guc_ads *ads) @@ -183,7 +183,7 @@ static size_t guc_ads_golden_lrc_offset(struct xe_guc_ads *ads) offset = guc_ads_regset_offset(ads) + guc_ads_regset_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_waklv_offset(struct xe_guc_ads *ads) @@ -193,7 +193,7 @@ static size_t guc_ads_waklv_offset(struct xe_guc_ads *ads) offset = guc_ads_golden_lrc_offset(ads) + guc_ads_golden_lrc_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_capture_offset(struct xe_guc_ads *ads) @@ -203,7 +203,7 @@ static size_t guc_ads_capture_offset(struct xe_guc_ads *ads) offset = guc_ads_waklv_offset(ads) + guc_ads_waklv_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_um_queues_offset(struct xe_guc_ads *ads) @@ -213,7 +213,7 @@ static size_t guc_ads_um_queues_offset(struct xe_guc_ads *ads) offset = guc_ads_capture_offset(ads) + guc_ads_capture_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_private_data_offset(struct xe_guc_ads *ads) @@ -223,7 +223,7 @@ static size_t guc_ads_private_data_offset(struct xe_guc_ads *ads) offset = guc_ads_um_queues_offset(ads) + guc_ads_um_queues_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_size(struct xe_guc_ads *ads) @@ -276,7 +276,7 @@ static size_t calculate_golden_lrc_size(struct xe_guc_ads *ads) continue; real_size = xe_gt_lrc_size(gt, class);
alloc_size = PAGE_ALIGN(real_size);
total_size += alloc_size; }alloc_size = ALIGN(real_size, SZ_4K);
@@ -612,12 +612,12 @@ static int guc_capture_prep_lists(struct xe_guc_ads *ads) offsetof(struct __guc_ads_blob, system_info)); /* first, set aside the first page for a capture_list with zero descriptors */
- total_size = PAGE_SIZE;
- total_size = SZ_4K; if (!xe_guc_capture_getnullheader(guc, &ptr, &size)) xe_map_memcpy_to(ads_to_xe(ads), ads_to_map(ads), capture_offset, ptr, size);
null_ggtt = ads_ggtt + capture_offset;
- capture_offset += PAGE_SIZE;
- capture_offset += SZ_4K;
/* * Populate capture list : at this point adps is already allocated and @@ -681,10 +681,10 @@ static int guc_capture_prep_lists(struct xe_guc_ads *ads) } }
- if (ads->capture_size != PAGE_ALIGN(total_size))
- if (ads->capture_size != ALIGN(total_size, SZ_4K)) xe_gt_dbg(gt, "ADS capture alloc size changed from %d to %d\n",
ads->capture_size, PAGE_ALIGN(total_size));
- return PAGE_ALIGN(total_size);
ads->capture_size, ALIGN(total_size, SZ_4K));
- return ALIGN(total_size, SZ_4K);
} static void guc_mmio_regset_write_one(struct xe_guc_ads *ads, @@ -928,7 +928,7 @@ static void guc_populate_golden_lrc(struct xe_guc_ads *ads) xe_gt_assert(gt, gt->default_lrc[class]); real_size = xe_gt_lrc_size(gt, class);
alloc_size = PAGE_ALIGN(real_size);
total_size += alloc_size;alloc_size = ALIGN(real_size, SZ_4K);
/* diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c index f6d523e4c5feb7f07d695af90f4c44c7a9072c2d..dac51f8720fc6c7d27baa31a1b5c567f560e8c1f 100644 --- a/drivers/gpu/drm/xe/xe_guc_capture.c +++ b/drivers/gpu/drm/xe/xe_guc_capture.c @@ -590,8 +590,8 @@ guc_capture_getlistsize(struct xe_guc *guc, u32 owner, u32 type, return -ENODATA; if (size)
*size = PAGE_ALIGN((sizeof(struct guc_debug_capture_list)) +
(num_regs * sizeof(struct guc_mmio_reg)));
*size = ALIGN((sizeof(struct guc_debug_capture_list)) +
(num_regs * sizeof(struct guc_mmio_reg)), SZ_4K);
return 0; } @@ -738,7 +738,7 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc) * sequence, that is, during the pre-hwconfig phase before we have * the exact engine fusing info. */
- total_size = PAGE_SIZE; /* Pad a page in front for empty lists */
- total_size = SZ_4K; /* Pad a page in front for empty lists */ for (i = 0; i < GUC_CAPTURE_LIST_INDEX_MAX; i++) { for (j = 0; j < GUC_CAPTURE_LIST_CLASS_MAX; j++) { if (xe_guc_capture_getlistsize(guc, i,
@@ -758,7 +758,7 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc) total_size += global_size; }
- return PAGE_ALIGN(total_size);
- return ALIGN(total_size, SZ_4K);
} static int guc_capture_output_size_est(struct xe_guc *guc) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 72ad576fc18eb583110b44b118abeba4c6be936a..a58c58e599122f3e9ebd1e8374c17c3b4663a5ed 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -212,7 +212,7 @@ int xe_guc_ct_init(struct xe_guc_ct *ct) struct xe_bo *bo; int err;
- xe_gt_assert(gt, !(guc_ct_size() % PAGE_SIZE));
- xe_gt_assert(gt, !(guc_ct_size() % SZ_4K));
ct->g2h_wq = alloc_ordered_workqueue("xe-g2h-wq", WQ_MEM_RECLAIM); if (!ct->g2h_wq) diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c index 0ca3056d8bd3fa37bdb79a7a71ef671270771657..9975005732f645b4735f95fbae8ebe431e793ebe 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.c +++ b/drivers/gpu/drm/xe/xe_guc_log.c @@ -58,7 +58,7 @@ static size_t guc_log_size(void) * | Capture logs | * +===============================+ + CAPTURE_SIZE */
- return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE +
- return SZ_4K + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
} @@ -331,7 +331,7 @@ u32 xe_guc_get_log_buffer_size(struct xe_guc_log *log, enum guc_log_buffer_type u32 xe_guc_get_log_buffer_offset(struct xe_guc_log *log, enum guc_log_buffer_type type) { enum guc_log_buffer_type i;
- u32 offset = PAGE_SIZE;/* for the log_buffer_states */
- u32 offset = SZ_4K; /* for the log_buffer_states */
for (i = GUC_LOG_BUFFER_CRASH_DUMP; i < GUC_LOG_BUFFER_TYPE_MAX; ++i) { if (i == type) diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c index df7f130fb663fc2fd170a94cc1b835b4b4cca167..0f97c6310a3a5696490aaa4827eb3aa0d45ea6d6 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.c +++ b/drivers/gpu/drm/xe/xe_guc_pc.c @@ -1000,7 +1000,7 @@ int xe_guc_pc_start(struct xe_guc_pc *pc) { struct xe_device *xe = pc_to_xe(pc); struct xe_gt *gt = pc_to_gt(pc);
- u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data));
- u32 size = ALIGN(sizeof(struct slpc_shared_data), SZ_4K); unsigned int fw_ref; int ret;
@@ -1110,7 +1110,7 @@ int xe_guc_pc_init(struct xe_guc_pc *pc) struct xe_tile *tile = gt_to_tile(gt); struct xe_device *xe = gt_to_xe(gt); struct xe_bo *bo;
- u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data));
- u32 size = ALIGN(sizeof(struct slpc_shared_data), SZ_4K); int err;
if (xe->info.skip_guc_pc)
-- 2.48.1
On Wed, Feb 26, 2025 at 10:00:19AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
Per the "Firmware" chapter in "drm/xe Intel GFX Driver", as well as "Volume 8: Command Stream Programming" in "Intelยฎ Arcโข A-Series Graphics and Intel Data Center GPU Flex Series Open-Source Programmer's Reference Manual For the discrete GPUs code named "Alchemist" and "Arctic Sound-M"" and "Intelยฎ Irisยฎ Xe MAX Graphics Open Source Programmer's Reference Manual For the 2020 Discrete GPU formerly named "DG1"":
"The RINGBUF register sets (defined in Memory Interface Registers) are used to specify the ring buffer memory areas. The ring buffer must start on a 4KB boundary and be allocated in linear memory. The length of any one ring buffer is limited to 2MB."
The Graphics micro (ฮผ) Controller (GuC) really expects command buffers aligned to 4K boundaries.
Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures after loading the GuC firmware:
[ 7.398317] xe 0000:09:00.0: [drm] Found dg2/g10 (device ID 56a1) display version 13.00 stepping C0 [ 7.410429] xe 0000:09:00.0: [drm] Using GuC firmware from i915/dg2_guc_70.bin version 70.36.0 [ 10.719989] xe 0000:09:00.0: [drm] *ERROR* GT0: load failed: status = 0x800001EC, time = 3297ms, freq = 2400MHz (req 2400MHz), done = 0 [ 10.732106] xe 0000:09:00.0: [drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x76, UKernel = 0x01, MIA = 0x00, Auth = 0x02 [ 10.744214] xe 0000:09:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:09:00.0 as wedged. Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new [ 10.828908] xe 0000:09:00.0: [drm] *ERROR* GT0: GuC mmio request 0x4100: no reply 0x4100
Correct this by revising all instances of `PAGE_SIZE' to `SZ_4K' and revise `PAGE_ALIGN()' calls to `ALIGN()' with `SZ_4K' as the second argument (overriding `PAGE_SIZE').
Cc: stable@vger.kernel.org Fixes: 84d15f426110 ("drm/xe/guc: Add capture size check in GuC log buffer") Fixes: 9c8c7a7e6f1f ("drm/xe/guc: Prepare GuC register list and update ADS size for error capture") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Co-developed-by: Kexy Biscuit kexybiscuit@aosc.io Signed-off-by: Kexy Biscuit kexybiscuit@aosc.io Signed-off-by: Mingcong Bai jeffbai@aosc.io
Typo in last reply: Reviewed-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/xe/xe_guc.c | 4 ++-- drivers/gpu/drm/xe/xe_guc_ads.c | 32 ++++++++++++++++---------------- drivers/gpu/drm/xe/xe_guc_capture.c | 8 ++++---- drivers/gpu/drm/xe/xe_guc_ct.c | 2 +- drivers/gpu/drm/xe/xe_guc_log.c | 4 ++-- drivers/gpu/drm/xe/xe_guc_pc.c | 4 ++-- 6 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 408365dfe4eed02336bbd208b60491aea27a8a6e..595873780a5774501f04b2f01ebdf8a45c7ac931 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -88,7 +88,7 @@ static u32 guc_ctl_feature_flags(struct xe_guc *guc) static u32 guc_ctl_log_params_flags(struct xe_guc *guc) {
- u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >> PAGE_SHIFT;
- u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >> XE_PTE_SHIFT; u32 flags;
#if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0) @@ -141,7 +141,7 @@ static u32 guc_ctl_log_params_flags(struct xe_guc *guc) static u32 guc_ctl_ads_flags(struct xe_guc *guc) {
- u32 ads = guc_bo_ggtt_addr(guc, guc->ads.bo) >> PAGE_SHIFT;
- u32 ads = guc_bo_ggtt_addr(guc, guc->ads.bo) >> XE_PTE_SHIFT; u32 flags = ads << GUC_ADS_ADDR_SHIFT;
return flags; diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c index fab259adc380be28c79fae5946e123427359ec60..65e88ad43e8adef752889300abd0197a0ac4a1a3 100644 --- a/drivers/gpu/drm/xe/xe_guc_ads.c +++ b/drivers/gpu/drm/xe/xe_guc_ads.c @@ -143,17 +143,17 @@ static size_t guc_ads_regset_size(struct xe_guc_ads *ads) static size_t guc_ads_golden_lrc_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads->golden_lrc_size);
- return ALIGN(ads->golden_lrc_size, SZ_4K);
} static u32 guc_ads_waklv_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads->ads_waklv_size);
- return ALIGN(ads->ads_waklv_size, SZ_4K);
} static size_t guc_ads_capture_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads->capture_size);
- return ALIGN(ads->capture_size, SZ_4K);
} static size_t guc_ads_um_queues_size(struct xe_guc_ads *ads) @@ -168,7 +168,7 @@ static size_t guc_ads_um_queues_size(struct xe_guc_ads *ads) static size_t guc_ads_private_data_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads_to_guc(ads)->fw.private_data_size);
- return ALIGN(ads_to_guc(ads)->fw.private_data_size, SZ_4K);
} static size_t guc_ads_regset_offset(struct xe_guc_ads *ads) @@ -183,7 +183,7 @@ static size_t guc_ads_golden_lrc_offset(struct xe_guc_ads *ads) offset = guc_ads_regset_offset(ads) + guc_ads_regset_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_waklv_offset(struct xe_guc_ads *ads) @@ -193,7 +193,7 @@ static size_t guc_ads_waklv_offset(struct xe_guc_ads *ads) offset = guc_ads_golden_lrc_offset(ads) + guc_ads_golden_lrc_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_capture_offset(struct xe_guc_ads *ads) @@ -203,7 +203,7 @@ static size_t guc_ads_capture_offset(struct xe_guc_ads *ads) offset = guc_ads_waklv_offset(ads) + guc_ads_waklv_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_um_queues_offset(struct xe_guc_ads *ads) @@ -213,7 +213,7 @@ static size_t guc_ads_um_queues_offset(struct xe_guc_ads *ads) offset = guc_ads_capture_offset(ads) + guc_ads_capture_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_private_data_offset(struct xe_guc_ads *ads) @@ -223,7 +223,7 @@ static size_t guc_ads_private_data_offset(struct xe_guc_ads *ads) offset = guc_ads_um_queues_offset(ads) + guc_ads_um_queues_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_size(struct xe_guc_ads *ads) @@ -276,7 +276,7 @@ static size_t calculate_golden_lrc_size(struct xe_guc_ads *ads) continue; real_size = xe_gt_lrc_size(gt, class);
alloc_size = PAGE_ALIGN(real_size);
total_size += alloc_size; }alloc_size = ALIGN(real_size, SZ_4K);
@@ -612,12 +612,12 @@ static int guc_capture_prep_lists(struct xe_guc_ads *ads) offsetof(struct __guc_ads_blob, system_info)); /* first, set aside the first page for a capture_list with zero descriptors */
- total_size = PAGE_SIZE;
- total_size = SZ_4K; if (!xe_guc_capture_getnullheader(guc, &ptr, &size)) xe_map_memcpy_to(ads_to_xe(ads), ads_to_map(ads), capture_offset, ptr, size);
null_ggtt = ads_ggtt + capture_offset;
- capture_offset += PAGE_SIZE;
- capture_offset += SZ_4K;
/* * Populate capture list : at this point adps is already allocated and @@ -681,10 +681,10 @@ static int guc_capture_prep_lists(struct xe_guc_ads *ads) } }
- if (ads->capture_size != PAGE_ALIGN(total_size))
- if (ads->capture_size != ALIGN(total_size, SZ_4K)) xe_gt_dbg(gt, "ADS capture alloc size changed from %d to %d\n",
ads->capture_size, PAGE_ALIGN(total_size));
- return PAGE_ALIGN(total_size);
ads->capture_size, ALIGN(total_size, SZ_4K));
- return ALIGN(total_size, SZ_4K);
} static void guc_mmio_regset_write_one(struct xe_guc_ads *ads, @@ -928,7 +928,7 @@ static void guc_populate_golden_lrc(struct xe_guc_ads *ads) xe_gt_assert(gt, gt->default_lrc[class]); real_size = xe_gt_lrc_size(gt, class);
alloc_size = PAGE_ALIGN(real_size);
total_size += alloc_size;alloc_size = ALIGN(real_size, SZ_4K);
/* diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c index f6d523e4c5feb7f07d695af90f4c44c7a9072c2d..dac51f8720fc6c7d27baa31a1b5c567f560e8c1f 100644 --- a/drivers/gpu/drm/xe/xe_guc_capture.c +++ b/drivers/gpu/drm/xe/xe_guc_capture.c @@ -590,8 +590,8 @@ guc_capture_getlistsize(struct xe_guc *guc, u32 owner, u32 type, return -ENODATA; if (size)
*size = PAGE_ALIGN((sizeof(struct guc_debug_capture_list)) +
(num_regs * sizeof(struct guc_mmio_reg)));
*size = ALIGN((sizeof(struct guc_debug_capture_list)) +
(num_regs * sizeof(struct guc_mmio_reg)), SZ_4K);
return 0; } @@ -738,7 +738,7 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc) * sequence, that is, during the pre-hwconfig phase before we have * the exact engine fusing info. */
- total_size = PAGE_SIZE; /* Pad a page in front for empty lists */
- total_size = SZ_4K; /* Pad a page in front for empty lists */ for (i = 0; i < GUC_CAPTURE_LIST_INDEX_MAX; i++) { for (j = 0; j < GUC_CAPTURE_LIST_CLASS_MAX; j++) { if (xe_guc_capture_getlistsize(guc, i,
@@ -758,7 +758,7 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc) total_size += global_size; }
- return PAGE_ALIGN(total_size);
- return ALIGN(total_size, SZ_4K);
} static int guc_capture_output_size_est(struct xe_guc *guc) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 72ad576fc18eb583110b44b118abeba4c6be936a..a58c58e599122f3e9ebd1e8374c17c3b4663a5ed 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -212,7 +212,7 @@ int xe_guc_ct_init(struct xe_guc_ct *ct) struct xe_bo *bo; int err;
- xe_gt_assert(gt, !(guc_ct_size() % PAGE_SIZE));
- xe_gt_assert(gt, !(guc_ct_size() % SZ_4K));
ct->g2h_wq = alloc_ordered_workqueue("xe-g2h-wq", WQ_MEM_RECLAIM); if (!ct->g2h_wq) diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c index 0ca3056d8bd3fa37bdb79a7a71ef671270771657..9975005732f645b4735f95fbae8ebe431e793ebe 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.c +++ b/drivers/gpu/drm/xe/xe_guc_log.c @@ -58,7 +58,7 @@ static size_t guc_log_size(void) * | Capture logs | * +===============================+ + CAPTURE_SIZE */
- return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE +
- return SZ_4K + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
} @@ -331,7 +331,7 @@ u32 xe_guc_get_log_buffer_size(struct xe_guc_log *log, enum guc_log_buffer_type u32 xe_guc_get_log_buffer_offset(struct xe_guc_log *log, enum guc_log_buffer_type type) { enum guc_log_buffer_type i;
- u32 offset = PAGE_SIZE;/* for the log_buffer_states */
- u32 offset = SZ_4K; /* for the log_buffer_states */
for (i = GUC_LOG_BUFFER_CRASH_DUMP; i < GUC_LOG_BUFFER_TYPE_MAX; ++i) { if (i == type) diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c index df7f130fb663fc2fd170a94cc1b835b4b4cca167..0f97c6310a3a5696490aaa4827eb3aa0d45ea6d6 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.c +++ b/drivers/gpu/drm/xe/xe_guc_pc.c @@ -1000,7 +1000,7 @@ int xe_guc_pc_start(struct xe_guc_pc *pc) { struct xe_device *xe = pc_to_xe(pc); struct xe_gt *gt = pc_to_gt(pc);
- u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data));
- u32 size = ALIGN(sizeof(struct slpc_shared_data), SZ_4K); unsigned int fw_ref; int ret;
@@ -1110,7 +1110,7 @@ int xe_guc_pc_init(struct xe_guc_pc *pc) struct xe_tile *tile = gt_to_tile(gt); struct xe_device *xe = gt_to_xe(gt); struct xe_bo *bo;
- u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data));
- u32 size = ALIGN(sizeof(struct slpc_shared_data), SZ_4K); int err;
if (xe->info.skip_guc_pc)
-- 2.48.1
Hi,
On Tue, Feb 25, 2025 at 09:00:49PM -0800, Matthew Brost wrote:
On Wed, Feb 26, 2025 at 10:00:19AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
Per the "Firmware" chapter in "drm/xe Intel GFX Driver", as well as "Volume 8: Command Stream Programming" in "Intelยฎ Arcโข A-Series Graphics and Intel Data Center GPU Flex Series Open-Source Programmer's Reference Manual For the discrete GPUs code named "Alchemist" and "Arctic Sound-M"" and "Intelยฎ Irisยฎ Xe MAX Graphics Open Source Programmer's Reference Manual For the 2020 Discrete GPU formerly named "DG1"":
"The RINGBUF register sets (defined in Memory Interface Registers) are used to specify the ring buffer memory areas. The ring buffer must start on a 4KB boundary and be allocated in linear memory. The length of any one ring buffer is limited to 2MB."
The Graphics micro (ฮผ) Controller (GuC) really expects command buffers aligned to 4K boundaries.
Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures after loading the GuC firmware:
[ 7.398317] xe 0000:09:00.0: [drm] Found dg2/g10 (device ID 56a1) display version 13.00 stepping C0 [ 7.410429] xe 0000:09:00.0: [drm] Using GuC firmware from i915/dg2_guc_70.bin version 70.36.0 [ 10.719989] xe 0000:09:00.0: [drm] *ERROR* GT0: load failed: status = 0x800001EC, time = 3297ms, freq = 2400MHz (req 2400MHz), done = 0 [ 10.732106] xe 0000:09:00.0: [drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x76, UKernel = 0x01, MIA = 0x00, Auth = 0x02 [ 10.744214] xe 0000:09:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:09:00.0 as wedged. Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new [ 10.828908] xe 0000:09:00.0: [drm] *ERROR* GT0: GuC mmio request 0x4100: no reply 0x4100
Correct this by revising all instances of `PAGE_SIZE' to `SZ_4K' and revise `PAGE_ALIGN()' calls to `ALIGN()' with `SZ_4K' as the second argument (overriding `PAGE_SIZE').
Cc: stable@vger.kernel.org Fixes: 84d15f426110 ("drm/xe/guc: Add capture size check in GuC log buffer") Fixes: 9c8c7a7e6f1f ("drm/xe/guc: Prepare GuC register list and update ADS size for error capture") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Co-developed-by: Kexy Biscuit kexybiscuit@aosc.io Signed-off-by: Kexy Biscuit kexybiscuit@aosc.io Signed-off-by: Mingcong Bai jeffbai@aosc.io
Typo in last reply: Reviewed-by: Matthew Brost matthew.brost@intel.com
Nit: in order to improve clarity and to limit potential mistakes in the future, you could add:
/* GuC really expects command buffers aligned to 4K boundaries. */ #define GUC_ALIGN SZ_4K
Then use s/SZ_4K/GUC_ALIGN/ in your changes. It would make clear this value does not come from PAGE_SIZE or XE_PAGE_SIZE.
Francois
drivers/gpu/drm/xe/xe_guc.c | 4 ++-- drivers/gpu/drm/xe/xe_guc_ads.c | 32 ++++++++++++++++---------------- drivers/gpu/drm/xe/xe_guc_capture.c | 8 ++++---- drivers/gpu/drm/xe/xe_guc_ct.c | 2 +- drivers/gpu/drm/xe/xe_guc_log.c | 4 ++-- drivers/gpu/drm/xe/xe_guc_pc.c | 4 ++-- 6 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 408365dfe4eed02336bbd208b60491aea27a8a6e..595873780a5774501f04b2f01ebdf8a45c7ac931 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -88,7 +88,7 @@ static u32 guc_ctl_feature_flags(struct xe_guc *guc) static u32 guc_ctl_log_params_flags(struct xe_guc *guc) {
- u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >> PAGE_SHIFT;
- u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >> XE_PTE_SHIFT; u32 flags;
#if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0) @@ -141,7 +141,7 @@ static u32 guc_ctl_log_params_flags(struct xe_guc *guc) static u32 guc_ctl_ads_flags(struct xe_guc *guc) {
- u32 ads = guc_bo_ggtt_addr(guc, guc->ads.bo) >> PAGE_SHIFT;
- u32 ads = guc_bo_ggtt_addr(guc, guc->ads.bo) >> XE_PTE_SHIFT; u32 flags = ads << GUC_ADS_ADDR_SHIFT;
return flags; diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c index fab259adc380be28c79fae5946e123427359ec60..65e88ad43e8adef752889300abd0197a0ac4a1a3 100644 --- a/drivers/gpu/drm/xe/xe_guc_ads.c +++ b/drivers/gpu/drm/xe/xe_guc_ads.c @@ -143,17 +143,17 @@ static size_t guc_ads_regset_size(struct xe_guc_ads *ads) static size_t guc_ads_golden_lrc_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads->golden_lrc_size);
- return ALIGN(ads->golden_lrc_size, SZ_4K);
} static u32 guc_ads_waklv_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads->ads_waklv_size);
- return ALIGN(ads->ads_waklv_size, SZ_4K);
} static size_t guc_ads_capture_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads->capture_size);
- return ALIGN(ads->capture_size, SZ_4K);
} static size_t guc_ads_um_queues_size(struct xe_guc_ads *ads) @@ -168,7 +168,7 @@ static size_t guc_ads_um_queues_size(struct xe_guc_ads *ads) static size_t guc_ads_private_data_size(struct xe_guc_ads *ads) {
- return PAGE_ALIGN(ads_to_guc(ads)->fw.private_data_size);
- return ALIGN(ads_to_guc(ads)->fw.private_data_size, SZ_4K);
} static size_t guc_ads_regset_offset(struct xe_guc_ads *ads) @@ -183,7 +183,7 @@ static size_t guc_ads_golden_lrc_offset(struct xe_guc_ads *ads) offset = guc_ads_regset_offset(ads) + guc_ads_regset_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_waklv_offset(struct xe_guc_ads *ads) @@ -193,7 +193,7 @@ static size_t guc_ads_waklv_offset(struct xe_guc_ads *ads) offset = guc_ads_golden_lrc_offset(ads) + guc_ads_golden_lrc_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_capture_offset(struct xe_guc_ads *ads) @@ -203,7 +203,7 @@ static size_t guc_ads_capture_offset(struct xe_guc_ads *ads) offset = guc_ads_waklv_offset(ads) + guc_ads_waklv_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_um_queues_offset(struct xe_guc_ads *ads) @@ -213,7 +213,7 @@ static size_t guc_ads_um_queues_offset(struct xe_guc_ads *ads) offset = guc_ads_capture_offset(ads) + guc_ads_capture_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_private_data_offset(struct xe_guc_ads *ads) @@ -223,7 +223,7 @@ static size_t guc_ads_private_data_offset(struct xe_guc_ads *ads) offset = guc_ads_um_queues_offset(ads) + guc_ads_um_queues_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} static size_t guc_ads_size(struct xe_guc_ads *ads) @@ -276,7 +276,7 @@ static size_t calculate_golden_lrc_size(struct xe_guc_ads *ads) continue; real_size = xe_gt_lrc_size(gt, class);
alloc_size = PAGE_ALIGN(real_size);
total_size += alloc_size; }alloc_size = ALIGN(real_size, SZ_4K);
@@ -612,12 +612,12 @@ static int guc_capture_prep_lists(struct xe_guc_ads *ads) offsetof(struct __guc_ads_blob, system_info)); /* first, set aside the first page for a capture_list with zero descriptors */
- total_size = PAGE_SIZE;
- total_size = SZ_4K; if (!xe_guc_capture_getnullheader(guc, &ptr, &size)) xe_map_memcpy_to(ads_to_xe(ads), ads_to_map(ads), capture_offset, ptr, size);
null_ggtt = ads_ggtt + capture_offset;
- capture_offset += PAGE_SIZE;
- capture_offset += SZ_4K;
/* * Populate capture list : at this point adps is already allocated and @@ -681,10 +681,10 @@ static int guc_capture_prep_lists(struct xe_guc_ads *ads) } }
- if (ads->capture_size != PAGE_ALIGN(total_size))
- if (ads->capture_size != ALIGN(total_size, SZ_4K)) xe_gt_dbg(gt, "ADS capture alloc size changed from %d to %d\n",
ads->capture_size, PAGE_ALIGN(total_size));
- return PAGE_ALIGN(total_size);
ads->capture_size, ALIGN(total_size, SZ_4K));
- return ALIGN(total_size, SZ_4K);
} static void guc_mmio_regset_write_one(struct xe_guc_ads *ads, @@ -928,7 +928,7 @@ static void guc_populate_golden_lrc(struct xe_guc_ads *ads) xe_gt_assert(gt, gt->default_lrc[class]); real_size = xe_gt_lrc_size(gt, class);
alloc_size = PAGE_ALIGN(real_size);
total_size += alloc_size;alloc_size = ALIGN(real_size, SZ_4K);
/* diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c index f6d523e4c5feb7f07d695af90f4c44c7a9072c2d..dac51f8720fc6c7d27baa31a1b5c567f560e8c1f 100644 --- a/drivers/gpu/drm/xe/xe_guc_capture.c +++ b/drivers/gpu/drm/xe/xe_guc_capture.c @@ -590,8 +590,8 @@ guc_capture_getlistsize(struct xe_guc *guc, u32 owner, u32 type, return -ENODATA; if (size)
*size = PAGE_ALIGN((sizeof(struct guc_debug_capture_list)) +
(num_regs * sizeof(struct guc_mmio_reg)));
*size = ALIGN((sizeof(struct guc_debug_capture_list)) +
(num_regs * sizeof(struct guc_mmio_reg)), SZ_4K);
return 0; } @@ -738,7 +738,7 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc) * sequence, that is, during the pre-hwconfig phase before we have * the exact engine fusing info. */
- total_size = PAGE_SIZE; /* Pad a page in front for empty lists */
- total_size = SZ_4K; /* Pad a page in front for empty lists */ for (i = 0; i < GUC_CAPTURE_LIST_INDEX_MAX; i++) { for (j = 0; j < GUC_CAPTURE_LIST_CLASS_MAX; j++) { if (xe_guc_capture_getlistsize(guc, i,
@@ -758,7 +758,7 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc) total_size += global_size; }
- return PAGE_ALIGN(total_size);
- return ALIGN(total_size, SZ_4K);
} static int guc_capture_output_size_est(struct xe_guc *guc) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 72ad576fc18eb583110b44b118abeba4c6be936a..a58c58e599122f3e9ebd1e8374c17c3b4663a5ed 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -212,7 +212,7 @@ int xe_guc_ct_init(struct xe_guc_ct *ct) struct xe_bo *bo; int err;
- xe_gt_assert(gt, !(guc_ct_size() % PAGE_SIZE));
- xe_gt_assert(gt, !(guc_ct_size() % SZ_4K));
ct->g2h_wq = alloc_ordered_workqueue("xe-g2h-wq", WQ_MEM_RECLAIM); if (!ct->g2h_wq) diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c index 0ca3056d8bd3fa37bdb79a7a71ef671270771657..9975005732f645b4735f95fbae8ebe431e793ebe 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.c +++ b/drivers/gpu/drm/xe/xe_guc_log.c @@ -58,7 +58,7 @@ static size_t guc_log_size(void) * | Capture logs | * +===============================+ + CAPTURE_SIZE */
- return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE +
- return SZ_4K + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
} @@ -331,7 +331,7 @@ u32 xe_guc_get_log_buffer_size(struct xe_guc_log *log, enum guc_log_buffer_type u32 xe_guc_get_log_buffer_offset(struct xe_guc_log *log, enum guc_log_buffer_type type) { enum guc_log_buffer_type i;
- u32 offset = PAGE_SIZE;/* for the log_buffer_states */
- u32 offset = SZ_4K; /* for the log_buffer_states */
for (i = GUC_LOG_BUFFER_CRASH_DUMP; i < GUC_LOG_BUFFER_TYPE_MAX; ++i) { if (i == type) diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c index df7f130fb663fc2fd170a94cc1b835b4b4cca167..0f97c6310a3a5696490aaa4827eb3aa0d45ea6d6 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.c +++ b/drivers/gpu/drm/xe/xe_guc_pc.c @@ -1000,7 +1000,7 @@ int xe_guc_pc_start(struct xe_guc_pc *pc) { struct xe_device *xe = pc_to_xe(pc); struct xe_gt *gt = pc_to_gt(pc);
- u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data));
- u32 size = ALIGN(sizeof(struct slpc_shared_data), SZ_4K); unsigned int fw_ref; int ret;
@@ -1110,7 +1110,7 @@ int xe_guc_pc_init(struct xe_guc_pc *pc) struct xe_tile *tile = gt_to_tile(gt); struct xe_device *xe = gt_to_xe(gt); struct xe_bo *bo;
- u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data));
- u32 size = ALIGN(sizeof(struct slpc_shared_data), SZ_4K); int err;
if (xe->info.skip_guc_pc)
-- 2.48.1
On Wed, 2025-02-26 at 10:03 +0100, Francois Dugast wrote:
Hi,
On Tue, Feb 25, 2025 at 09:00:49PM -0800, Matthew Brost wrote:
On Wed, Feb 26, 2025 at 10:00:19AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
Per the "Firmware" chapter in "drm/xe Intel GFX Driver", as well as "Volume 8: Command Stream Programming" in "Intelยฎ Arcโข A-Series Graphics and Intel Data Center GPU Flex Series Open-Source Programmer's Reference Manual For the discrete GPUs code named "Alchemist" and "Arctic Sound-M"" and "Intelยฎ Irisยฎ Xe MAX Graphics Open Source Programmer's Reference Manual For the 2020 Discrete GPU formerly named "DG1"":
"The RINGBUF register sets (defined in Memory Interface Registers) are ย used to specify the ring buffer memory areas. The ring buffer must start ย on a 4KB boundary and be allocated in linear memory. The length of any ย one ring buffer is limited to 2MB."
The Graphics micro (ฮผ) Controller (GuC) really expects command buffers aligned to 4K boundaries.
Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures after loading the GuC firmware:
[ย ย ย 7.398317] xe 0000:09:00.0: [drm] Found dg2/g10 (device ID 56a1) display version 13.00 stepping C0 [ย ย ย 7.410429] xe 0000:09:00.0: [drm] Using GuC firmware from i915/dg2_guc_70.bin version 70.36.0 [ย ย 10.719989] xe 0000:09:00.0: [drm] *ERROR* GT0: load failed: status = 0x800001EC, time = 3297ms, freq = 2400MHz (req 2400MHz), done = 0 [ย ย 10.732106] xe 0000:09:00.0: [drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x76, UKernel = 0x01, MIA = 0x00, Auth = 0x02 [ย ย 10.744214] xe 0000:09:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:09:00.0 as wedged. ย ย ย ย ย ย ย ย ย ย ย ย ย ย Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new [ย ย 10.828908] xe 0000:09:00.0: [drm] *ERROR* GT0: GuC mmio request 0x4100: no reply 0x4100
Correct this by revising all instances of `PAGE_SIZE' to `SZ_4K' and revise `PAGE_ALIGN()' calls to `ALIGN()' with `SZ_4K' as the second argument (overriding `PAGE_SIZE').
Cc: stable@vger.kernel.org Fixes: 84d15f426110 ("drm/xe/guc: Add capture size check in GuC log buffer") Fixes: 9c8c7a7e6f1f ("drm/xe/guc: Prepare GuC register list and update ADS size for error capture") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Co-developed-by: Kexy Biscuit kexybiscuit@aosc.io Signed-off-by: Kexy Biscuit kexybiscuit@aosc.io Signed-off-by: Mingcong Bai jeffbai@aosc.io
Typo in last reply: Reviewed-by: Matthew Brost matthew.brost@intel.com
Nit: in order to improve clarity and to limit potential mistakes in the future, you could add:
/* GuC really expects command buffers aligned to 4K boundaries. */ ย ย ย #define GUC_ALIGN SZ_4K
Then use s/SZ_4K/GUC_ALIGN/ in your changes. It would make clear this value does not come from PAGE_SIZE or XE_PAGE_SIZE.
I was thinking the same.
But on another topic, From a quick read of the commit message it's not clear why a 16K alignment would fail, since a 16K aligned bo would also be 4K aligned?
/Thomas
Francois
drivers/gpu/drm/xe/xe_guc.cย ย ย ย ย ย ย ย |ย 4 ++-- ย drivers/gpu/drm/xe/xe_guc_ads.cย ย ย ย | 32 ++++++++++++++++-------
drivers/gpu/drm/xe/xe_guc_capture.c |ย 8 ++++---- ย drivers/gpu/drm/xe/xe_guc_ct.cย ย ย ย ย |ย 2 +- ย drivers/gpu/drm/xe/xe_guc_log.cย ย ย ย |ย 4 ++-- ย drivers/gpu/drm/xe/xe_guc_pc.cย ย ย ย ย |ย 4 ++-- ย 6 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c index 408365dfe4eed02336bbd208b60491aea27a8a6e..595873780a5774501f04b2f 01ebdf8a45c7ac931 100644 --- a/drivers/gpu/drm/xe/xe_guc.c +++ b/drivers/gpu/drm/xe/xe_guc.c @@ -88,7 +88,7 @@ static u32 guc_ctl_feature_flags(struct xe_guc *guc) ย ย static u32 guc_ctl_log_params_flags(struct xe_guc *guc) ย {
- u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >>
PAGE_SHIFT;
- u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >>
XE_PTE_SHIFT; ย u32 flags; ย ย #if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0) @@ -141,7 +141,7 @@ static u32 guc_ctl_log_params_flags(struct xe_guc *guc) ย ย static u32 guc_ctl_ads_flags(struct xe_guc *guc) ย {
- u32 ads = guc_bo_ggtt_addr(guc, guc->ads.bo) >>
PAGE_SHIFT;
- u32 ads = guc_bo_ggtt_addr(guc, guc->ads.bo) >>
XE_PTE_SHIFT; ย u32 flags = ads << GUC_ADS_ADDR_SHIFT; ย ย return flags; diff --git a/drivers/gpu/drm/xe/xe_guc_ads.c b/drivers/gpu/drm/xe/xe_guc_ads.c index fab259adc380be28c79fae5946e123427359ec60..65e88ad43e8adef75288930 0abd0197a0ac4a1a3 100644 --- a/drivers/gpu/drm/xe/xe_guc_ads.c +++ b/drivers/gpu/drm/xe/xe_guc_ads.c @@ -143,17 +143,17 @@ static size_t guc_ads_regset_size(struct xe_guc_ads *ads) ย ย static size_t guc_ads_golden_lrc_size(struct xe_guc_ads *ads) ย {
- return PAGE_ALIGN(ads->golden_lrc_size);
- return ALIGN(ads->golden_lrc_size, SZ_4K);
} ย ย static u32 guc_ads_waklv_size(struct xe_guc_ads *ads) ย {
- return PAGE_ALIGN(ads->ads_waklv_size);
- return ALIGN(ads->ads_waklv_size, SZ_4K);
} ย ย static size_t guc_ads_capture_size(struct xe_guc_ads *ads) ย {
- return PAGE_ALIGN(ads->capture_size);
- return ALIGN(ads->capture_size, SZ_4K);
} ย ย static size_t guc_ads_um_queues_size(struct xe_guc_ads *ads) @@ -168,7 +168,7 @@ static size_t guc_ads_um_queues_size(struct xe_guc_ads *ads) ย ย static size_t guc_ads_private_data_size(struct xe_guc_ads *ads) ย {
- return PAGE_ALIGN(ads_to_guc(ads)-
fw.private_data_size);
- return ALIGN(ads_to_guc(ads)->fw.private_data_size,
SZ_4K); ย } ย ย static size_t guc_ads_regset_offset(struct xe_guc_ads *ads) @@ -183,7 +183,7 @@ static size_t guc_ads_golden_lrc_offset(struct xe_guc_ads *ads) ย offset = guc_ads_regset_offset(ads) + ย guc_ads_regset_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} ย ย static size_t guc_ads_waklv_offset(struct xe_guc_ads *ads) @@ -193,7 +193,7 @@ static size_t guc_ads_waklv_offset(struct xe_guc_ads *ads) ย offset = guc_ads_golden_lrc_offset(ads) + ย guc_ads_golden_lrc_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} ย ย static size_t guc_ads_capture_offset(struct xe_guc_ads *ads) @@ -203,7 +203,7 @@ static size_t guc_ads_capture_offset(struct xe_guc_ads *ads) ย offset = guc_ads_waklv_offset(ads) + ย guc_ads_waklv_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} ย ย static size_t guc_ads_um_queues_offset(struct xe_guc_ads *ads) @@ -213,7 +213,7 @@ static size_t guc_ads_um_queues_offset(struct xe_guc_ads *ads) ย offset = guc_ads_capture_offset(ads) + ย guc_ads_capture_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} ย ย static size_t guc_ads_private_data_offset(struct xe_guc_ads *ads) @@ -223,7 +223,7 @@ static size_t guc_ads_private_data_offset(struct xe_guc_ads *ads) ย offset = guc_ads_um_queues_offset(ads) + ย guc_ads_um_queues_size(ads);
- return PAGE_ALIGN(offset);
- return ALIGN(offset, SZ_4K);
} ย ย static size_t guc_ads_size(struct xe_guc_ads *ads) @@ -276,7 +276,7 @@ static size_t calculate_golden_lrc_size(struct xe_guc_ads *ads) ย continue; ย ย real_size = xe_gt_lrc_size(gt, class);
alloc_size = PAGE_ALIGN(real_size);
alloc_size = ALIGN(real_size, SZ_4K);
total_size += alloc_size; ย } ย @@ -612,12 +612,12 @@ static int guc_capture_prep_lists(struct xe_guc_ads *ads) ย offsetof(struct __guc_ads_blob, system_info)); ย ย /* first, set aside the first page for a capture_list with zero descriptors */
- total_size = PAGE_SIZE;
- total_size = SZ_4K;
if (!xe_guc_capture_getnullheader(guc, &ptr, &size)) ย xe_map_memcpy_to(ads_to_xe(ads), ads_to_map(ads), capture_offset, ptr, size); ย ย null_ggtt = ads_ggtt + capture_offset;
- capture_offset += PAGE_SIZE;
- capture_offset += SZ_4K;
/* ย * Populate capture list : at this point adps is already allocated and @@ -681,10 +681,10 @@ static int guc_capture_prep_lists(struct xe_guc_ads *ads) ย } ย }
- if (ads->capture_size != PAGE_ALIGN(total_size))
- if (ads->capture_size != ALIGN(total_size, SZ_4K))
xe_gt_dbg(gt, "ADS capture alloc size changed from %d to %d\n",
ย ads->capture_size,
PAGE_ALIGN(total_size));
- return PAGE_ALIGN(total_size);
ย ads->capture_size, ALIGN(total_size,
SZ_4K));
- return ALIGN(total_size, SZ_4K);
} ย ย static void guc_mmio_regset_write_one(struct xe_guc_ads *ads, @@ -928,7 +928,7 @@ static void guc_populate_golden_lrc(struct xe_guc_ads *ads) ย xe_gt_assert(gt, gt->default_lrc[class]); ย ย real_size = xe_gt_lrc_size(gt, class);
alloc_size = PAGE_ALIGN(real_size);
alloc_size = ALIGN(real_size, SZ_4K);
total_size += alloc_size; ย ย /* diff --git a/drivers/gpu/drm/xe/xe_guc_capture.c b/drivers/gpu/drm/xe/xe_guc_capture.c index f6d523e4c5feb7f07d695af90f4c44c7a9072c2d..dac51f8720fc6c7d27baa31 a1b5c567f560e8c1f 100644 --- a/drivers/gpu/drm/xe/xe_guc_capture.c +++ b/drivers/gpu/drm/xe/xe_guc_capture.c @@ -590,8 +590,8 @@ guc_capture_getlistsize(struct xe_guc *guc, u32 owner, u32 type, ย return -ENODATA; ย ย if (size)
*size = PAGE_ALIGN((sizeof(struct
guc_debug_capture_list)) +
ย ย (num_regs * sizeof(struct
guc_mmio_reg)));
*size = ALIGN((sizeof(struct
guc_debug_capture_list)) +
ย ย ย ย ย (num_regs * sizeof(struct
guc_mmio_reg)), SZ_4K); ย ย return 0; ย } @@ -738,7 +738,7 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc) ย * sequence, that is, during the pre-hwconfig phase before we have ย * the exact engine fusing info. ย */
- total_size = PAGE_SIZE; /* Pad a page in front
for empty lists */
- total_size = SZ_4K; /* Pad a page in front for empty
lists */ ย for (i = 0; i < GUC_CAPTURE_LIST_INDEX_MAX; i++) { ย for (j = 0; j < GUC_CAPTURE_LIST_CLASS_MAX; j++) { ย if (xe_guc_capture_getlistsize(guc, i, @@ -758,7 +758,7 @@ size_t xe_guc_capture_ads_input_worst_size(struct xe_guc *guc) ย total_size += global_size; ย }
- return PAGE_ALIGN(total_size);
- return ALIGN(total_size, SZ_4K);
} ย ย static int guc_capture_output_size_est(struct xe_guc *guc) diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 72ad576fc18eb583110b44b118abeba4c6be936a..a58c58e599122f3e9ebd1e8 374c17c3b4663a5ed 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -212,7 +212,7 @@ int xe_guc_ct_init(struct xe_guc_ct *ct) ย struct xe_bo *bo; ย int err;
- xe_gt_assert(gt, !(guc_ct_size() % PAGE_SIZE));
- xe_gt_assert(gt, !(guc_ct_size() % SZ_4K));
ct->g2h_wq = alloc_ordered_workqueue("xe-g2h-wq", WQ_MEM_RECLAIM); ย if (!ct->g2h_wq) diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c index 0ca3056d8bd3fa37bdb79a7a71ef671270771657..9975005732f645b4735f95f bae8ebe431e793ebe 100644 --- a/drivers/gpu/drm/xe/xe_guc_log.c +++ b/drivers/gpu/drm/xe/xe_guc_log.c @@ -58,7 +58,7 @@ static size_t guc_log_size(void) ย *ย |ย ย ย ย ย ย ย ย Capture logsย ย ย ย ย ย ย ย ย | ย *ย +===============================+ + CAPTURE_SIZE ย */
- return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE
- return SZ_4K + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE +
CAPTURE_BUFFER_SIZE; ย } ย @@ -331,7 +331,7 @@ u32 xe_guc_get_log_buffer_size(struct xe_guc_log *log, enum guc_log_buffer_type ย u32 xe_guc_get_log_buffer_offset(struct xe_guc_log *log, enum guc_log_buffer_type type) ย { ย enum guc_log_buffer_type i;
- u32 offset = PAGE_SIZE;/* for the log_buffer_states */
- u32 offset = SZ_4K; /* for the log_buffer_states */
for (i = GUC_LOG_BUFFER_CRASH_DUMP; i < GUC_LOG_BUFFER_TYPE_MAX; ++i) { ย if (i == type) diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c index df7f130fb663fc2fd170a94cc1b835b4b4cca167..0f97c6310a3a5696490aaa4 827eb3aa0d45ea6d6 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.c +++ b/drivers/gpu/drm/xe/xe_guc_pc.c @@ -1000,7 +1000,7 @@ int xe_guc_pc_start(struct xe_guc_pc *pc) ย { ย struct xe_device *xe = pc_to_xe(pc); ย struct xe_gt *gt = pc_to_gt(pc);
- u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data));
- u32 size = ALIGN(sizeof(struct slpc_shared_data),
SZ_4K); ย unsigned int fw_ref; ย int ret; ย @@ -1110,7 +1110,7 @@ int xe_guc_pc_init(struct xe_guc_pc *pc) ย struct xe_tile *tile = gt_to_tile(gt); ย struct xe_device *xe = gt_to_xe(gt); ย struct xe_bo *bo;
- u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data));
- u32 size = ALIGN(sizeof(struct slpc_shared_data),
SZ_4K); ย int err; ย ย if (xe->info.skip_guc_pc)
-- 2.48.1
From: Mingcong Bai jeffbai@aosc.io
Similar to the preceding patch for GuC (and with the same references), Intel DG1 and DG2 GPUs expects command buffers to align to 4K boundaries.
Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures during boot up:
[ 14.018975] ------------[ cut here ]------------ [ 14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out [ 14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) d rm_gpuvm(E) drm_buddy(E) gpu_sched(E) [ 14.041369] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E) [ 14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7 [ 14.165325] Tainted: [E]=UNSIGNED_MODULE [ 14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched] [ 14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0 [ 14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000 [ 14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000 [ 14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000 [ 14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640 [ 14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028 [ 14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8 [ 14.255975] ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.263379] ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.270777] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 14.276927] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 14.281258] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 14.286024] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 14.296329] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7 [ 14.302302] Tainted: [E]=UNSIGNED_MODULE [ 14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched] [ 14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000 [ 14.302310] 900000012f153900 0000000000000000 900000012f153908 9000000001c31c70 [ 14.302313] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 14.302315] 0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000 [ 14.302318] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 14.302320] 0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48 [ 14.302323] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004 [ 14.302325] 0000000000000004 0000000000000000 000000000000137e 0000000000000001 [ 14.302328] 900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98 [ 14.302331] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d [ 14.302333] ... [ 14.302335] Call Trace: [ 14.302336] [<9000000000244174>] show_stack+0x3c/0x16c [ 14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0 [ 14.302346] [<9000000000288208>] __warn+0x8c/0x174 [ 14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c [ 14.302354] [<90000000017f66e8>] do_bp+0x280/0x344 [ 14.302359] [ 14.302360] ---[ end trace 0000000000000000 ]---
Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the aforementioned issue.
Cc: stable@vger.kernel.org Fixes: b79e8fd954c4 ("drm/xe: Remove dependency on intel_engine_regs.h") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io --- drivers/gpu/drm/xe/regs/xe_engine_regs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h index d86219dedde2a6dcd8701c7bf2e90d95ec7244e2..e48bf80d144a1a954f6d9f5d405e1d759b86134f 100644 --- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h @@ -52,8 +52,7 @@ #define RING_START(base) XE_REG((base) + 0x38)
#define RING_CTL(base) XE_REG((base) + 0x3c) -#define RING_CTL_SIZE(size) ((size) - PAGE_SIZE) /* in bytes -> pages */ -#define RING_CTL_SIZE(size) ((size) - PAGE_SIZE) /* in bytes -> pages */ +#define RING_CTL_SIZE(size) ((size) - SZ_4K) /* in bytes -> pages */
#define RING_START_UDW(base) XE_REG((base) + 0x48)
On Wed, Feb 26, 2025 at 10:00:20AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
Similar to the preceding patch for GuC (and with the same references), Intel DG1 and DG2 GPUs expects command buffers to align to 4K boundaries.
Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures during boot up:
[ 14.018975] ------------[ cut here ]------------ [ 14.023562] xe 0000:09:00.0: [drm] GT0: Kernel-submitted job timed out [ 14.030084] WARNING: CPU: 3 PID: 564 at drivers/gpu/drm/xe/xe_guc_submit.c:1181 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.041300] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) nls_iso8859_1(E) snd_hda_intel(E) snd_intel_dspcfg(E) qrtr(E) nls_cp437(E) snd_hda_codec(E) spi_loongson_pci(E) rtc_efi(E) snd_hda_core(E) loongson3_cpufreq(E) spi_loongson_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) gpio_loongson_64bit(E) input_leds(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) d rm_gpuvm(E) drm_buddy(E) gpu_sched(E) [ 14.041369] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) realtek(E) led_class(E) loongson(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E) [ 14.153910] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7 [ 14.165325] Tainted: [E]=UNSIGNED_MODULE [ 14.169220] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 14.182970] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched] [ 14.189549] pc ffff8000024f3760 ra ffff8000024f3760 tp 900000012f150000 sp 900000012f153ca0 [ 14.197853] a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000 [ 14.206156] a4 0000000000000000 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000 [ 14.214458] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 0000000000000000 [ 14.222761] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 14.231064] t8 0000000000000000 u0 900000000195c0c8 s9 900000012e4dcf48 s0 90000001285f3640 [ 14.239368] s1 90000001004f8000 s2 ffff8000026ec000 s3 0000000000000000 s4 900000012e4dc028 [ 14.247672] s5 90000001009f5e00 s6 000000000000137e s7 0000000000000001 s8 900000012f153ce8 [ 14.255975] ra: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.263379] ERA: ffff8000024f3760 guc_exec_queue_timedout_job+0x1c0/0xacc [xe] [ 14.270777] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 14.276927] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 14.281258] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 14.286024] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 14.290790] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 14.296329] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 14.302299] CPU: 3 UID: 0 PID: 564 Comm: kworker/u32:2 Tainted: G E 6.14.0-rc4-aosc-main-gbad70b1cd8b0-dirty #7 [ 14.302302] Tainted: [E]=UNSIGNED_MODULE [ 14.302302] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 14.302304] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched] [ 14.302307] Stack : 900000012f153928 d84a6232d48f1ac7 900000000023eb34 900000012f150000 [ 14.302310] 900000012f153900 0000000000000000 900000012f153908 9000000001c31c70 [ 14.302313] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 14.302315] 0000000000000000 d84a6232d48f1ac7 0000000000000000 0000000000000000 [ 14.302318] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 14.302320] 0000000000000000 0000000000000000 00000000072b4000 900000012e4dcf48 [ 14.302323] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004 [ 14.302325] 0000000000000004 0000000000000000 000000000000137e 0000000000000001 [ 14.302328] 900000012f153ce8 9000000001c31c70 9000000000244174 0000555581840b98 [ 14.302331] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d [ 14.302333] ... [ 14.302335] Call Trace: [ 14.302336] [<9000000000244174>] show_stack+0x3c/0x16c [ 14.302341] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0 [ 14.302346] [<9000000000288208>] __warn+0x8c/0x174 [ 14.302350] [<90000000017c1918>] report_bug+0x1c0/0x22c [ 14.302354] [<90000000017f66e8>] do_bp+0x280/0x344 [ 14.302359] [ 14.302360] ---[ end trace 0000000000000000 ]---
Revise calculation of `RING_CTL_SIZE(size)' to use `SZ_4K' to fix the aforementioned issue.
Cc: stable@vger.kernel.org Fixes: b79e8fd954c4 ("drm/xe: Remove dependency on intel_engine_regs.h") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
This conflict with previously submited patch [1] but this one LGTM too, with that: Reviewed-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/xe/regs/xe_engine_regs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/regs/xe_engine_regs.h b/drivers/gpu/drm/xe/regs/xe_engine_regs.h index d86219dedde2a6dcd8701c7bf2e90d95ec7244e2..e48bf80d144a1a954f6d9f5d405e1d759b86134f 100644 --- a/drivers/gpu/drm/xe/regs/xe_engine_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_engine_regs.h @@ -52,8 +52,7 @@ #define RING_START(base) XE_REG((base) + 0x38) #define RING_CTL(base) XE_REG((base) + 0x3c) -#define RING_CTL_SIZE(size) ((size) - PAGE_SIZE) /* in bytes -> pages */ -#define RING_CTL_SIZE(size) ((size) - PAGE_SIZE) /* in bytes -> pages */ +#define RING_CTL_SIZE(size) ((size) - SZ_4K) /* in bytes -> pages */ #define RING_START_UDW(base) XE_REG((base) + 0x48)
-- 2.48.1
From: Mingcong Bai jeffbai@aosc.io
It appears that the xe_res_cursor also assumes 4K alignment.
Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures during boot up:
[ 23.242757] ------------[ cut here ]------------ [ 23.247363] WARNING: CPU: 0 PID: 2036 at drivers/gpu/drm/xe/xe_res_cursor.h:182 emit_pte+0x394/0x3b0 [xe] [ 23.256962] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) rfkill(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) nls_iso8859_1(E) qrtr(E) nls_cp437(E) snd_hda_core(E) loongson3_cpufreq(E) rtc_efi(E) snd_hwdep(E) snd_pcm(E) spi_loongson_pci(E) snd_timer(E) snd(E) spi_loongson_core(E) soundcore(E) gpio_loongson_64bit(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) input_leds(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) d rm_gpuvm(E) drm_buddy(E) gpu_sched(E) [ 23.257034] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) loongson(E) i2c_algo_bit(E) realtek(E) drm_ttm_helper(E) led_class(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E) [ 23.369697] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8 [ 23.381640] Tainted: [E]=UNSIGNED_MODULE [ 23.385534] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 23.399319] pc ffff80000251efc0 ra ffff80000251eddc tp 900000011fe3c000 sp 900000011fe3f7e0 [ 23.407632] a0 0000000000000001 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000 [ 23.415938] a4 0000000000000000 a5 0000000000000000 a6 0000000000060000 a7 900000010c947b00 [ 23.424240] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 900000012e456230 [ 23.432543] t4 0000000000000035 t5 0000000000004000 t6 00000001fbc40403 t7 0000000000004000 [ 23.440845] t8 9000000100e688a8 u0 5cc06cee8ef0edee s9 9000000100024420 s0 0000000000000047 [ 23.449147] s1 0000000000004000 s2 0000000000000001 s3 900000012adba000 s4 ffffffffffffc000 [ 23.457450] s5 9000000108939428 s6 0000000000000000 s7 0000000000000000 s8 900000011fe3f8e0 [ 23.465851] ra: ffff80000251eddc emit_pte+0x1b0/0x3b0 [xe] [ 23.471761] ERA: ffff80000251efc0 emit_pte+0x394/0x3b0 [xe] [ 23.477557] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 23.483732] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 23.488068] EUEN: 00000003 (+FPE +SXE -ASXE -BTE) [ 23.492832] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 23.497594] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 23.503133] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 23.509164] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8 [ 23.509168] Tainted: [E]=UNSIGNED_MODULE [ 23.509168] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 23.509170] Stack : ffffffffffffffff ffffffffffffffff 900000000023eb34 900000011fe3c000 [ 23.509176] 900000011fe3f440 0000000000000000 900000011fe3f448 9000000001c31c70 [ 23.509181] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 23.509185] 0000000000000000 5cc06cee8ef0edee 0000000000000000 0000000000000000 [ 23.509190] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 23.509193] 0000000000000000 0000000000000000 00000000066b4000 9000000100024420 [ 23.509197] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004 [ 23.509202] 0000000000000004 0000000000000000 0000000000000000 0000000000000000 [ 23.509206] 900000011fe3f8e0 9000000001c31c70 9000000000244174 00007fffac097534 [ 23.509211] 00000000000000b0 0000000000000004 0000000000000003 0000000000071c1d [ 23.509216] ... [ 23.509218] Call Trace: [ 23.509220] [<9000000000244174>] show_stack+0x3c/0x16c [ 23.509226] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0 [ 23.509230] [<9000000000288208>] __warn+0x8c/0x174 [ 23.509234] [<90000000017c1918>] report_bug+0x1c0/0x22c [ 23.509238] [<90000000017f66e8>] do_bp+0x280/0x344 [ 23.509243] [<90000000002428a0>] handle_bp+0x120/0x1c0 [ 23.509247] [<ffff80000251efc0>] emit_pte+0x394/0x3b0 [xe] [ 23.509295] [<ffff800002520d38>] xe_migrate_clear+0x2d8/0xa54 [xe] [ 23.509341] [<ffff8000024e6c38>] xe_bo_move+0x324/0x930 [xe] [ 23.509387] [<ffff800002209468>] ttm_bo_handle_move_mem+0xd0/0x194 [ttm] [ 23.509392] [<ffff800002209ebc>] ttm_bo_validate+0xd4/0x1cc [ttm] [ 23.509396] [<ffff80000220a138>] ttm_bo_init_reserved+0x184/0x1dc [ttm] [ 23.509399] [<ffff8000024e7840>] ___xe_bo_create_locked+0x1e8/0x3d4 [xe] [ 23.509445] [<ffff8000024e7cf8>] __xe_bo_create_locked+0x2cc/0x390 [xe] [ 23.509489] [<ffff8000024e7e98>] xe_bo_create_user+0x34/0xe4 [xe] [ 23.509533] [<ffff8000024e875c>] xe_gem_create_ioctl+0x154/0x4d8 [xe] [ 23.509578] [<9000000001062784>] drm_ioctl_kernel+0xe0/0x14c [ 23.509582] [<9000000001062c10>] drm_ioctl+0x420/0x5f4 [ 23.509585] [<ffff8000024ea778>] xe_drm_ioctl+0x64/0xac [xe] [ 23.509630] [<9000000000653504>] sys_ioctl+0x2b8/0xf98 [ 23.509634] [<90000000017f684c>] do_syscall+0xa0/0x140 [ 23.509637] [<9000000000241e38>] handle_syscall+0xb8/0x158 [ 23.509640] [ 23.509644] ---[ end trace 0000000000000000 ]---
Revise calls to `xe_res_dma()' and `xe_res_cursor()' to use `XE_PTE_MASK' (12) and `SZ_4K' to fix this potentially confused use of `PAGE_SIZE' in relevant code.
Cc: stable@vger.kernel.org Fixes: e89b384cde62 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io --- drivers/gpu/drm/xe/xe_migrate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index 278bc96cf593d8a0b01003df26297c5a92a71c78..dd5d95a45b976010e718e074dd988c84253fb9fb 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -593,7 +593,7 @@ static void emit_pte(struct xe_migrate *m, u64 addr, flags = 0; bool devmem = false;
- addr = xe_res_dma(cur) & PAGE_MASK; + addr = xe_res_dma(cur) & ~XE_PTE_MASK; if (is_vram) { if (vm->flags & XE_VM_FLAG_64K) { u64 va = cur_ofs * XE_PAGE_SIZE / 8; @@ -614,7 +614,7 @@ static void emit_pte(struct xe_migrate *m, bb->cs[bb->len++] = lower_32_bits(addr); bb->cs[bb->len++] = upper_32_bits(addr);
- xe_res_next(cur, min_t(u32, size, PAGE_SIZE)); + xe_res_next(cur, min_t(u32, size, SZ_4K)); cur_ofs += 8; } }
On Wed, Feb 26, 2025 at 10:00:21AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
It appears that the xe_res_cursor also assumes 4K alignment.
Current code uses `PAGE_SIZE' as an assumed alignment reference but 4K kernel page sizes is by no means a guarantee. On 16K-paged kernels, this causes driver failures during boot up:
[ 23.242757] ------------[ cut here ]------------ [ 23.247363] WARNING: CPU: 0 PID: 2036 at drivers/gpu/drm/xe/xe_res_cursor.h:182 emit_pte+0x394/0x3b0 [xe] [ 23.256962] Modules linked in: nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) rfkill(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) nls_iso8859_1(E) qrtr(E) nls_cp437(E) snd_hda_core(E) loongson3_cpufreq(E) rtc_efi(E) snd_hwdep(E) snd_pcm(E) spi_loongson_pci(E) snd_timer(E) snd(E) spi_loongson_core(E) soundcore(E) gpio_loongson_64bit(E) rtc_loongson(E) i2c_ls2x(E) mousedev(E) input_leds(E) sch_fq_codel(E) fuse(E) nfnetlink(E) dmi_sysfs(E) ip_tables(E) x_tables(E) xe(E) d rm_gpuvm(E) drm_buddy(E) gpu_sched(E) [ 23.257034] drm_exec(E) drm_suballoc_helper(E) drm_display_helper(E) cec(E) rc_core(E) hid_generic(E) tpm_tis_spi(E) r8169(E) loongson(E) i2c_algo_bit(E) realtek(E) drm_ttm_helper(E) led_class(E) ttm(E) drm_client_lib(E) drm_kms_helper(E) sunrpc(E) i2c_dev(E) [ 23.369697] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8 [ 23.381640] Tainted: [E]=UNSIGNED_MODULE [ 23.385534] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 23.399319] pc ffff80000251efc0 ra ffff80000251eddc tp 900000011fe3c000 sp 900000011fe3f7e0 [ 23.407632] a0 0000000000000001 a1 0000000000000000 a2 0000000000000000 a3 0000000000000000 [ 23.415938] a4 0000000000000000 a5 0000000000000000 a6 0000000000060000 a7 900000010c947b00 [ 23.424240] t0 0000000000000000 t1 0000000000000000 t2 0000000000000000 t3 900000012e456230 [ 23.432543] t4 0000000000000035 t5 0000000000004000 t6 00000001fbc40403 t7 0000000000004000 [ 23.440845] t8 9000000100e688a8 u0 5cc06cee8ef0edee s9 9000000100024420 s0 0000000000000047 [ 23.449147] s1 0000000000004000 s2 0000000000000001 s3 900000012adba000 s4 ffffffffffffc000 [ 23.457450] s5 9000000108939428 s6 0000000000000000 s7 0000000000000000 s8 900000011fe3f8e0 [ 23.465851] ra: ffff80000251eddc emit_pte+0x1b0/0x3b0 [xe] [ 23.471761] ERA: ffff80000251efc0 emit_pte+0x394/0x3b0 [xe] [ 23.477557] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 23.483732] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 23.488068] EUEN: 00000003 (+FPE +SXE -ASXE -BTE) [ 23.492832] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 23.497594] ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) [ 23.503133] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000-HV) [ 23.509164] CPU: 0 UID: 1000 PID: 2036 Comm: QSGRenderThread Tainted: G E 6.14.0-rc4-aosc-main-g7cc07e6e50b0-dirty #8 [ 23.509168] Tainted: [E]=UNSIGNED_MODULE [ 23.509168] Hardware name: Loongson Loongson-3A6000-HV-7A2000-1w-V0.1-EVB/Loongson-3A6000-HV-7A2000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V4.0.05756-prestab [ 23.509170] Stack : ffffffffffffffff ffffffffffffffff 900000000023eb34 900000011fe3c000 [ 23.509176] 900000011fe3f440 0000000000000000 900000011fe3f448 9000000001c31c70 [ 23.509181] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 23.509185] 0000000000000000 5cc06cee8ef0edee 0000000000000000 0000000000000000 [ 23.509190] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 23.509193] 0000000000000000 0000000000000000 00000000066b4000 9000000100024420 [ 23.509197] 9000000001eb8000 0000000000000000 9000000001c31c70 0000000000000004 [ 23.509202] 0000000000000004 0000000000000000 0000000000000000 0000000000000000 [ 23.509206] 900000011fe3f8e0 9000000001c31c70 9000000000244174 00007fffac097534 [ 23.509211] 00000000000000b0 0000000000000004 0000000000000003 0000000000071c1d [ 23.509216] ... [ 23.509218] Call Trace: [ 23.509220] [<9000000000244174>] show_stack+0x3c/0x16c [ 23.509226] [<900000000023eb30>] dump_stack_lvl+0x84/0xe0 [ 23.509230] [<9000000000288208>] __warn+0x8c/0x174 [ 23.509234] [<90000000017c1918>] report_bug+0x1c0/0x22c [ 23.509238] [<90000000017f66e8>] do_bp+0x280/0x344 [ 23.509243] [<90000000002428a0>] handle_bp+0x120/0x1c0 [ 23.509247] [<ffff80000251efc0>] emit_pte+0x394/0x3b0 [xe] [ 23.509295] [<ffff800002520d38>] xe_migrate_clear+0x2d8/0xa54 [xe] [ 23.509341] [<ffff8000024e6c38>] xe_bo_move+0x324/0x930 [xe] [ 23.509387] [<ffff800002209468>] ttm_bo_handle_move_mem+0xd0/0x194 [ttm] [ 23.509392] [<ffff800002209ebc>] ttm_bo_validate+0xd4/0x1cc [ttm] [ 23.509396] [<ffff80000220a138>] ttm_bo_init_reserved+0x184/0x1dc [ttm] [ 23.509399] [<ffff8000024e7840>] ___xe_bo_create_locked+0x1e8/0x3d4 [xe] [ 23.509445] [<ffff8000024e7cf8>] __xe_bo_create_locked+0x2cc/0x390 [xe] [ 23.509489] [<ffff8000024e7e98>] xe_bo_create_user+0x34/0xe4 [xe] [ 23.509533] [<ffff8000024e875c>] xe_gem_create_ioctl+0x154/0x4d8 [xe] [ 23.509578] [<9000000001062784>] drm_ioctl_kernel+0xe0/0x14c [ 23.509582] [<9000000001062c10>] drm_ioctl+0x420/0x5f4 [ 23.509585] [<ffff8000024ea778>] xe_drm_ioctl+0x64/0xac [xe] [ 23.509630] [<9000000000653504>] sys_ioctl+0x2b8/0xf98 [ 23.509634] [<90000000017f684c>] do_syscall+0xa0/0x140 [ 23.509637] [<9000000000241e38>] handle_syscall+0xb8/0x158 [ 23.509640] [ 23.509644] ---[ end trace 0000000000000000 ]---
Revise calls to `xe_res_dma()' and `xe_res_cursor()' to use `XE_PTE_MASK' (12) and `SZ_4K' to fix this potentially confused use of `PAGE_SIZE' in relevant code.
Cc: stable@vger.kernel.org Fixes: e89b384cde62 ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_migrate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index 278bc96cf593d8a0b01003df26297c5a92a71c78..dd5d95a45b976010e718e074dd988c84253fb9fb 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -593,7 +593,7 @@ static void emit_pte(struct xe_migrate *m, u64 addr, flags = 0; bool devmem = false;
addr = xe_res_dma(cur) & PAGE_MASK;
addr = xe_res_dma(cur) & ~XE_PTE_MASK; if (is_vram) { if (vm->flags & XE_VM_FLAG_64K) { u64 va = cur_ofs * XE_PAGE_SIZE / 8;
@@ -614,7 +614,7 @@ static void emit_pte(struct xe_migrate *m, bb->cs[bb->len++] = lower_32_bits(addr); bb->cs[bb->len++] = upper_32_bits(addr);
xe_res_next(cur, min_t(u32, size, PAGE_SIZE));
xe_res_next(cur, min_t(u32, size, SZ_4K));
Nit: s/SZ_4K/XE_PAGE_SIZE
Otherwise LGTM.
Also, I believe someone at Intel will need to submit this series for CI, as I don't think you are on the approved list to trigger our CI (new WIP polciy). I'll take care of that and see if we can get you added to the list.
Matt
cur_ofs += 8; }
}
-- 2.48.1
From: Mingcong Bai jeffbai@aosc.io
As this component hooks into userspace API, it should be assumed that it will play well with non-4K/64K pages.
Use `PAGE_SIZE' as the final reference for page alignment instead.
Cc: stable@vger.kernel.org Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: 801989b08aff ("drm/xe/uapi: Make constant comments visible in kernel doc") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io --- drivers/gpu/drm/xe/xe_query.c | 2 +- include/uapi/drm/xe_drm.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c index c059639613f7b548c168f808b7b7b354f1cf3c94..8a017c526942d1f2b401e8b9a4244e6083d7b1e5 100644 --- a/drivers/gpu/drm/xe/xe_query.c +++ b/drivers/gpu/drm/xe/xe_query.c @@ -336,7 +336,7 @@ static int query_config(struct xe_device *xe, struct drm_xe_device_query *query) config->info[DRM_XE_QUERY_CONFIG_FLAGS] = DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM; config->info[DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT] = - xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : SZ_4K; + xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : PAGE_SIZE; config->info[DRM_XE_QUERY_CONFIG_VA_BITS] = xe->info.va_bits; config->info[DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY] = xe_exec_queue_device_get_max_priority(xe); diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index f62689ca861a4673b885629460c11d6f3bc6523d..db7cf904926ebd6789a29d620161ac051e59f13f 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -394,7 +394,7 @@ struct drm_xe_query_mem_regions { * - %DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM - Flag is set if the device * has usable VRAM * - %DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT - Minimal memory alignment - * required by this device, typically SZ_4K or SZ_64K + * required by this device, typically PAGE_SIZE. * - %DRM_XE_QUERY_CONFIG_VA_BITS - Maximum bits of a virtual address * - %DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY - Value of the highest * available exec queue priority
On Wed, Feb 26, 2025 at 10:00:22AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
As this component hooks into userspace API, it should be assumed that it will play well with non-4K/64K pages.
Use `PAGE_SIZE' as the final reference for page alignment instead.
Cc: stable@vger.kernel.org Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: 801989b08aff ("drm/xe/uapi: Make constant comments visible in kernel doc") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_query.c | 2 +- include/uapi/drm/xe_drm.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c index c059639613f7b548c168f808b7b7b354f1cf3c94..8a017c526942d1f2b401e8b9a4244e6083d7b1e5 100644 --- a/drivers/gpu/drm/xe/xe_query.c +++ b/drivers/gpu/drm/xe/xe_query.c @@ -336,7 +336,7 @@ static int query_config(struct xe_device *xe, struct drm_xe_device_query *query) config->info[DRM_XE_QUERY_CONFIG_FLAGS] = DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM; config->info[DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT] =
xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : SZ_4K;
xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : PAGE_SIZE;
We should probably assert or build a bug somewhere to ensure SZ_64K >= PAGE_SIZE for future-proofing. Otherwise, I think the patch makes sense. One more comment below.
config->info[DRM_XE_QUERY_CONFIG_VA_BITS] = xe->info.va_bits; config->info[DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY] = xe_exec_queue_device_get_max_priority(xe); diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index f62689ca861a4673b885629460c11d6f3bc6523d..db7cf904926ebd6789a29d620161ac051e59f13f 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -394,7 +394,7 @@ struct drm_xe_query_mem_regions {
- %DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM - Flag is set if the device
has usable VRAM
- %DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT - Minimal memory alignment
- required by this device, typically SZ_4K or SZ_64K
- required by this device, typically PAGE_SIZE.
So I think the kernel doc needs bit more updating here, how about:
Minimal memory alignment required by this device and the CPU. The minimum page size for the device is usually SZ_4K or SZ_64K, while for the CPU, it is PAGE_SIZE. This value is calculated by max(min_gpu_page_size, PAGE_SIZE). This alignment is enforced on buffer object allocations and VM binds.
Again welcome others CC'd suggestion on this updated kernel doc.
Matt
- %DRM_XE_QUERY_CONFIG_VA_BITS - Maximum bits of a virtual address
- %DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY - Value of the highest
- available exec queue priority
-- 2.48.1
ๅจ 2025/2/26 12:43, Matthew Brost ๅ้:
On Wed, Feb 26, 2025 at 10:00:22AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
As this component hooks into userspace API, it should be assumed that it will play well with non-4K/64K pages.
Use `PAGE_SIZE' as the final reference for page alignment instead.
Cc: stable@vger.kernel.org Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: 801989b08aff ("drm/xe/uapi: Make constant comments visible in kernel doc") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_query.c | 2 +- include/uapi/drm/xe_drm.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c index c059639613f7b548c168f808b7b7b354f1cf3c94..8a017c526942d1f2b401e8b9a4244e6083d7b1e5 100644 --- a/drivers/gpu/drm/xe/xe_query.c +++ b/drivers/gpu/drm/xe/xe_query.c @@ -336,7 +336,7 @@ static int query_config(struct xe_device *xe, struct drm_xe_device_query *query) config->info[DRM_XE_QUERY_CONFIG_FLAGS] = DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM; config->info[DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT] =
xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : SZ_4K;
xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : PAGE_SIZE;
We should probably assert or build a bug somewhere to ensure SZ_64K >= PAGE_SIZE for future-proofing. Otherwise, I think the patch makes sense. One more comment below.
Hmm, >= 64KiB kernel pages don't seem to be a thing yet but this does make sense for the sake of completeness. Will change in v2.
config->info[DRM_XE_QUERY_CONFIG_VA_BITS] = xe->info.va_bits; config->info[DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY] = xe_exec_queue_device_get_max_priority(xe); diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index f62689ca861a4673b885629460c11d6f3bc6523d..db7cf904926ebd6789a29d620161ac051e59f13f 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -394,7 +394,7 @@ struct drm_xe_query_mem_regions {
- %DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM - Flag is set if the device
has usable VRAM
- %DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT - Minimal memory alignment
- required by this device, typically SZ_4K or SZ_64K
- required by this device, typically PAGE_SIZE.
So I think the kernel doc needs bit more updating here, how about:
Minimal memory alignment required by this device and the CPU. The minimum page size for the device is usually SZ_4K or SZ_64K, while for the CPU, it is PAGE_SIZE. This value is calculated by max(min_gpu_page_size, PAGE_SIZE). This alignment is enforced on buffer object allocations and VM binds.
Again welcome others CC'd suggestion on this updated kernel doc.
Looks good to me, will revise in v2.
Best Regards, Mingcong Bai
Matt
- %DRM_XE_QUERY_CONFIG_VA_BITS - Maximum bits of a virtual address
- %DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY - Value of the highest
- available exec queue priority
-- 2.48.1
Hi Matt,
ๅจ 2025/2/26 12:43, Matthew Brost ๅ้:
On Wed, Feb 26, 2025 at 10:00:22AM +0800, Mingcong Bai via B4 Relay wrote:
From: Mingcong Bai jeffbai@aosc.io
As this component hooks into userspace API, it should be assumed that it will play well with non-4K/64K pages.
Use `PAGE_SIZE' as the final reference for page alignment instead.
Cc: stable@vger.kernel.org Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: 801989b08aff ("drm/xe/uapi: Make constant comments visible in kernel doc") Tested-by: Mingcong Bai jeffbai@aosc.io Tested-by: Haien Liang 27873200@qq.com Tested-by: Shirong Liu lsr1024@qq.com Tested-by: Haofeng Wu s2600cw2@126.com Link: https://github.com/FanFansfan/loongson-linux/commit/22c55ab3931c32410a077b3d... Co-developed-by: Shang Yatsen 429839446@qq.com Signed-off-by: Shang Yatsen 429839446@qq.com Signed-off-by: Mingcong Bai jeffbai@aosc.io
drivers/gpu/drm/xe/xe_query.c | 2 +- include/uapi/drm/xe_drm.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c index c059639613f7b548c168f808b7b7b354f1cf3c94..8a017c526942d1f2b401e8b9a4244e6083d7b1e5 100644 --- a/drivers/gpu/drm/xe/xe_query.c +++ b/drivers/gpu/drm/xe/xe_query.c @@ -336,7 +336,7 @@ static int query_config(struct xe_device *xe, struct drm_xe_device_query *query) config->info[DRM_XE_QUERY_CONFIG_FLAGS] = DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM; config->info[DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT] =
xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : SZ_4K;
xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : PAGE_SIZE;
We should probably assert or build a bug somewhere to ensure SZ_64K >= PAGE_SIZE for future-proofing. Otherwise, I think the patch makes sense. One more comment below.
config->info[DRM_XE_QUERY_CONFIG_VA_BITS] = xe->info.va_bits; config->info[DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY] = xe_exec_queue_device_get_max_priority(xe); diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index f62689ca861a4673b885629460c11d6f3bc6523d..db7cf904926ebd6789a29d620161ac051e59f13f 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -394,7 +394,7 @@ struct drm_xe_query_mem_regions {
- %DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM - Flag is set if the device
has usable VRAM
- %DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT - Minimal memory alignment
- required by this device, typically SZ_4K or SZ_64K
- required by this device, typically PAGE_SIZE.
So I think the kernel doc needs bit more updating here, how about:
Minimal memory alignment required by this device and the CPU. The minimum page size for the device is usually SZ_4K or SZ_64K, while for the CPU, it is PAGE_SIZE. This value is calculated by max(min_gpu_page_size, PAGE_SIZE). This alignment is enforced on buffer object allocations and VM binds.
I will revise as such, taking into account other comments sent in before I get my next revision together. Thanks!
Best Regards, Mingcong Bai
Again welcome others CC'd suggestion on this updated kernel doc.
Matt
- %DRM_XE_QUERY_CONFIG_VA_BITS - Maximum bits of a virtual address
- %DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY - Value of the highest
- available exec queue priority
-- 2.48.1
linux-stable-mirror@lists.linaro.org