Linaro-mm-sig

linaro-mm-sig@lists.linaro.org

17 participants
3001 discussions

[PATCH v2] drm/amdgpu: Pin buffers while vmap'ing exported dma-buf objects

by Thomas Zimmermann

Current dma-buf vmap semantics require that the mapped buffer remains in place until the corresponding vunmap has completed. For GEM-SHMEM, this used to be guaranteed by a pin operation while creating an S/G table in import. GEM-SHMEN can now import dma-buf objects without creating the S/G table, so the pin is missing. Leads to page-fault errors, such as the one shown below. [ 102.101726] BUG: unable to handle page fault for address: ffffc90127000000 [...] [ 102.157102] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl] [...] [ 102.243250] Call Trace: [ 102.245695] <TASK> [ 102.2477V95] ? validate_chain+0x24e/0x5e0 [ 102.251805] ? __lock_acquire+0x568/0xae0 [ 102.255807] udl_render_hline+0x165/0x341 [udl] [ 102.260338] ? __pfx_udl_render_hline+0x10/0x10 [udl] [ 102.265379] ? local_clock_noinstr+0xb/0x100 [ 102.269642] ? __lock_release.isra.0+0x16c/0x2e0 [ 102.274246] ? mark_held_locks+0x40/0x70 [ 102.278177] udl_primary_plane_helper_atomic_update+0x43e/0x680 [udl] [ 102.284606] ? __pfx_udl_primary_plane_helper_atomic_update+0x10/0x10 [udl] [ 102.291551] ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170 [ 102.297208] ? lockdep_hardirqs_on+0x88/0x130 [ 102.301554] ? _raw_spin_unlock_irq+0x24/0x50 [ 102.305901] ? wait_for_completion_timeout+0x2bb/0x3a0 [ 102.311028] ? drm_atomic_helper_calc_timestamping_constants+0x141/0x200 [ 102.317714] ? drm_atomic_helper_commit_planes+0x3b6/0x1030 [ 102.323279] drm_atomic_helper_commit_planes+0x3b6/0x1030 [ 102.328664] drm_atomic_helper_commit_tail+0x41/0xb0 [ 102.333622] commit_tail+0x204/0x330 [...] [ 102.529946] ---[ end trace 0000000000000000 ]--- [ 102.651980] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl] In this stack strace, udl (based on GEM-SHMEM) imported and vmap'ed a dma-buf from amdgpu. Amdgpu relocated the buffer, thereby invalidating the mapping. Provide a custom dma-buf vmap method in amdgpu that pins the object before mapping it's buffer's pages into kernel address space. Do the opposite in vunmap. Note that dma-buf vmap differs from GEM vmap in how it handles relocation. While dma-buf vmap keeps the buffer in place, GEM vmap requires the caller to keep the buffer in place. Hence, this fix is in amdgpu's dma-buf code instead of its GEM code. A discussion of various approaches to solving the problem is available at [1]. v2: - only use mapable domains (Christian) - try pinning to domains in prefered order Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de> Fixes: 660cd44659a0 ("drm/shmem-helper: Import dmabuf without mapping its sg_table") Reported-by: Thomas Zimmermann <tzimmermann(a)suse.de> Closes: https://lore.kernel.org/dri-devel/ba1bdfb8-dbf7-4372-bdcb-df7e0511c702@suse… Cc: Shixiong Ou <oushixiong(a)kylinos.cn> Cc: Thomas Zimmermann <tzimmermann(a)suse.de> Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> Cc: Maxime Ripard <mripard(a)kernel.org> Cc: David Airlie <airlied(a)gmail.com> Cc: Simona Vetter <simona(a)ffwll.ch> Cc: Sumit Semwal <sumit.semwal(a)linaro.org> Cc: "Christian König" <christian.koenig(a)amd.com> Cc: dri-devel(a)lists.freedesktop.org Cc: linux-media(a)vger.kernel.org Cc: linaro-mm-sig(a)lists.linaro.org Link: https://lore.kernel.org/dri-devel/9792c6c3-a2b8-4b2b-b5ba-fba19b153e21@suse… # [1] Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de> --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 41 ++++++++++++++++++++- 1 file changed, 39 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index 5743ebb2f1b7..471b41bd3e29 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -285,6 +285,43 @@ static int amdgpu_dma_buf_begin_cpu_access(struct dma_buf *dma_buf, return ret; } +static int amdgpu_dma_buf_vmap(struct dma_buf *dma_buf, struct iosys_map *map) +{ + struct drm_gem_object *obj = dma_buf->priv; + struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj); + int ret; + + /* + * Pin to keep buffer in place while it's vmap'ed. The actual + * domain is not that important as long as it's mapable. Using + * GTT should be compatible with most use cases. VRAM and CPU + * are the fallbacks if the buffer has already been pinned there. + */ + ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT); + if (ret) { + ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_VRAM); + if (ret) { + ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_CPU); + if (ret) + return ret; + } + } + ret = drm_gem_dmabuf_vmap(dma_buf, map); + if (ret) + amdgpu_bo_unpin(bo); + + return ret; +} + +static void amdgpu_dma_buf_vunmap(struct dma_buf *dma_buf, struct iosys_map *map) +{ + struct drm_gem_object *obj = dma_buf->priv; + struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj); + + drm_gem_dmabuf_vunmap(dma_buf, map); + amdgpu_bo_unpin(bo); +} + const struct dma_buf_ops amdgpu_dmabuf_ops = { .attach = amdgpu_dma_buf_attach, .pin = amdgpu_dma_buf_pin, @@ -294,8 +331,8 @@ const struct dma_buf_ops amdgpu_dmabuf_ops = { .release = drm_gem_dmabuf_release, .begin_cpu_access = amdgpu_dma_buf_begin_cpu_access, .mmap = drm_gem_dmabuf_mmap, - .vmap = drm_gem_dmabuf_vmap, - .vunmap = drm_gem_dmabuf_vunmap, + .vmap = amdgpu_dma_buf_vmap, + .vunmap = amdgpu_dma_buf_vunmap, }; /** -- 2.50.1

5 hours, 22 minutes

Re: [PATCH 2/2] dma-buf: add warning when dma_fence is signaled from IOCTL

by kernel test robot

Hello, kernel test robot noticed "WARNING:at_drivers/dma-buf/dma-fence.c:#dma_fence_signal" on: commit: 409db68e04bdf052bc03f620e70339764b598ade ("[PATCH 2/2] dma-buf: add warning when dma_fence is signaled from IOCTL") url: https://github.com/intel-lab-lkp/linux/commits/Christian-K-nig/dma-buf-add-… base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 53e760d8949895390e256e723e7ee46618310361 patch link: https://lore.kernel.org/all/20250812143402.8619-2-christian.koenig@amd.com/ patch subject: [PATCH 2/2] dma-buf: add warning when dma_fence is signaled from IOCTL in testcase: ltp version: ltp-x86_64-9f512c1d8-1_20250809 with following parameters: test: syscalls-ipc-msgstress config: x86_64-rhel-9.4-ltp compiler: gcc-12 test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell) with 16G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang(a)intel.com> | Closes: https://lore.kernel.org/oe-lkp/202508200843.8b006132-lkp@intel.com [ 51.636268][ T218] ------------[ cut here ]------------ [ 51.636273][ T218] WARNING: CPU: 3 PID: 218 at drivers/dma-buf/dma-fence.c:420 dma_fence_signal (drivers/dma-buf/dma-fence.c:420 drivers/dma-buf/dma-fence.c:502) [ 51.636292][ T218] Modules linked in: coretemp sd_mod snd_hda_codec_realtek_lib snd_hda_codec_generic sg ipmi_devintf kvm_intel snd_hda_intel ipmi_msghandler platform_profile i915(+) kvm snd_hda_codec intel_gtt dell_wmi snd_hda_core drm_buddy binfmt_misc dell_smbios snd_intel_dspcfg ttm dell_wmi_descriptor snd_intel_sdw_acpi snd_hwdep mei_wdt sparse_keymap irqbypass drm_display_helper ahci ghash_clmulni_intel snd_pcm libahci rfkill cec mei_me rapl intel_cstate dcdbas snd_timer drm_client_lib libata intel_uncore mei snd drm_kms_helper i2c_i801 i2c_smbus pcspkr lpc_ich soundcore video wmi fuse drm loop dm_mod [ 51.636385][ T218] CPU: 3 UID: 0 PID: 218 Comm: (udev-worker) Not tainted 6.17.0-rc1-00006-g409db68e04bd #1 PREEMPT(voluntary) [ 51.636395][ T218] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013 [ 51.636399][ T218] RIP: 0010:dma_fence_signal (drivers/dma-buf/dma-fence.c:420 drivers/dma-buf/dma-fence.c:502) [ 51.636415][ T218] Code: 00 fc ff df 80 3c 02 00 75 36 48 8b 3b 4c 89 e6 e8 10 33 27 01 89 e8 5b 5d 41 5c c3 cc cc cc cc e8 b0 2e 77 fe 48 85 c0 75 bc <0f> 0b eb b8 0f 0b bd ea ff ff ff 5b 89 e8 5d 41 5c c3 cc cc cc cc All code ======== 0: 00 fc add %bh,%ah 2: ff (bad) 3: df 80 3c 02 00 75 filds 0x7500023c(%rax) 9: 36 48 8b 3b ss mov (%rbx),%rdi d: 4c 89 e6 mov %r12,%rsi 10: e8 10 33 27 01 call 0x1273325 15: 89 e8 mov %ebp,%eax 17: 5b pop %rbx 18: 5d pop %rbp 19: 41 5c pop %r12 1b: c3 ret 1c: cc int3 1d: cc int3 1e: cc int3 1f: cc int3 20: e8 b0 2e 77 fe call 0xfffffffffe772ed5 25: 48 85 c0 test %rax,%rax 28: 75 bc jne 0xffffffffffffffe6 2a:* 0f 0b ud2 <-- trapping instruction 2c: eb b8 jmp 0xffffffffffffffe6 2e: 0f 0b ud2 30: bd ea ff ff ff mov $0xffffffea,%ebp 35: 5b pop %rbx 36: 89 e8 mov %ebp,%eax 38: 5d pop %rbp 39: 41 5c pop %r12 3b: c3 ret 3c: cc int3 3d: cc int3 3e: cc int3 3f: cc int3 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: eb b8 jmp 0xffffffffffffffbc 4: 0f 0b ud2 6: bd ea ff ff ff mov $0xffffffea,%ebp b: 5b pop %rbx c: 89 e8 mov %ebp,%eax e: 5d pop %rbp f: 41 5c pop %r12 11: c3 ret 12: cc int3 13: cc int3 14: cc int3 15: cc int3 [ 51.636420][ T218] RSP: 0018:ffffc90000a9ed30 EFLAGS: 00010046 [ 51.636428][ T218] RAX: 0000000000000000 RBX: ffff88811750fc00 RCX: 0000000000000018 [ 51.636437][ T218] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88810691512c [ 51.636440][ T218] RBP: 0000000be56b1408 R08: 0000000000000001 R09: fffff52000153d9a [ 51.636445][ T218] R10: 0000000000000003 R11: ffff888108145000 R12: 0000000000000246 [ 51.636452][ T218] R13: ffffffffc1c9b060 R14: ffff88810406ba0c R15: 1ffff92000153dc2 [ 51.636455][ T218] FS: 00007efd90c038c0(0000) GS:ffff8883e4077000(0000) knlGS:0000000000000000 [ 51.636459][ T218] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 51.636462][ T218] CR2: 00007f5bd8238c20 CR3: 000000040ed8a005 CR4: 00000000001726f0 [ 51.636466][ T218] Call Trace: [ 51.636469][ T218] <TASK> [ 51.636477][ T218] fence_work (include/linux/dma-fence.h:272 drivers/gpu/drm/i915/i915_sw_fence_work.c:23) i915 [ 51.637304][ T218] fence_notify (drivers/gpu/drm/i915/i915_sw_fence_work.c:39) i915 [ 51.637827][ T218] __i915_sw_fence_complete (drivers/gpu/drm/i915/i915_sw_fence.c:201) i915 [ 51.638300][ T218] i915_vma_pin_ww (drivers/gpu/drm/i915/i915_vma.c:1601) i915 [ 51.638763][ T218] ? __pfx_i915_vma_pin_ww (drivers/gpu/drm/i915/i915_vma.c:1434) i915 [ 51.639218][ T218] ? i915_gem_object_make_unshrinkable (include/linux/list.h:215 include/linux/list.h:287 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c:500) i915 [ 51.639648][ T218] ? i915_vma_make_unshrinkable (drivers/gpu/drm/i915/i915_vma.c:2292) i915 [ 51.640091][ T218] ? intel_ring_pin (drivers/gpu/drm/i915/gt/intel_ring.c:73) i915 [ 51.640505][ T218] intel_ring_submission_setup (drivers/gpu/drm/i915/gt/intel_ring_submission.c:1290 drivers/gpu/drm/i915/gt/intel_ring_submission.c:1421) i915 [ 51.640918][ T218] ? __pfx_intel_ring_submission_setup (drivers/gpu/drm/i915/gt/intel_ring_submission.c:1349) i915 [ 51.641232][ T65] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 [ 51.641321][ T218] ? intel_engine_init_whitelist (drivers/gpu/drm/i915/gt/intel_workarounds.c:2104) i915 [ 51.641735][ T218] ? __intel_wakeref_init (arch/x86/include/asm/atomic.h:28 include/linux/atomic/atomic-arch-fallback.h:503 include/linux/atomic/atomic-instrumented.h:68 drivers/gpu/drm/i915/intel_wakeref.c:109) i915 [ 51.642126][ T218] intel_engines_init (drivers/gpu/drm/i915/gt/intel_engine_cs.c:1514) i915 [ 51.642521][ T218] ? i915_gem_object_make_unshrinkable (arch/x86/include/asm/atomic.h:93 include/linux/atomic/atomic-arch-fallback.h:667 include/linux/atomic/atomic-arch-fallback.h:1119 include/linux/atomic/atomic-instrumented.h:524 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c:498) i915 [ 51.642929][ T218] ? __pfx_intel_ring_submission_setup (drivers/gpu/drm/i915/gt/intel_ring_submission.c:1349) i915 [ 51.643331][ T218] intel_gt_init (drivers/gpu/drm/i915/gt/intel_gt.c:719) i915 [ 51.643728][ T218] i915_gem_init (drivers/gpu/drm/i915/i915_gem.c:1191) i915 [ 51.644140][ T218] i915_driver_probe (drivers/gpu/drm/i915/i915_driver.c:831) i915 [ 51.644524][ T218] ? __pfx_i915_driver_probe (drivers/gpu/drm/i915/i915_driver.c:780) i915 [ 51.644903][ T218] ? drm_privacy_screen_get (drivers/gpu/drm/drm_privacy_screen.c:169) drm [ 51.645047][ T218] ? intel_display_driver_probe_defer (drivers/gpu/drm/i915/display/intel_display_driver.c:84) i915 [ 51.645483][ T218] ? i915_pci_probe (drivers/gpu/drm/i915/i915_pci.c:995) i915 [ 51.645876][ T218] ? __pfx_i915_pci_probe (drivers/gpu/drm/i915/i915_pci.c:956) i915 [ 51.646267][ T218] local_pci_probe (drivers/pci/pci-driver.c:324) [ 51.646283][ T218] pci_call_probe (drivers/pci/pci-driver.c:392) [ 51.646295][ T218] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) [ 51.646308][ T218] ? __pfx_pci_call_probe (drivers/pci/pci-driver.c:352) [ 51.646321][ T218] ? kernfs_add_one (fs/kernfs/dir.c:834) [ 51.646337][ T218] ? pci_assign_irq (drivers/pci/irq.c:149) [ 51.646350][ T218] ? pci_match_device (drivers/pci/pci-driver.c:159 (discriminator 1)) [ 51.646362][ T218] ? kernfs_put (arch/x86/include/asm/atomic.h:67 (discriminator 1) include/linux/atomic/atomic-arch-fallback.h:2278 (discriminator 1) include/linux/atomic/atomic-instrumented.h:1384 (discriminator 1) fs/kernfs/dir.c:569 (discriminator 1)) [ 51.646368][ T218] pci_device_probe (drivers/pci/pci-driver.c:452) [ 51.646377][ T218] really_probe (drivers/base/dd.c:581 drivers/base/dd.c:659) [ 51.646391][ T218] __driver_probe_device (drivers/base/dd.c:801) [ 51.646404][ T218] driver_probe_device (drivers/base/dd.c:831) [ 51.646416][ T218] __driver_attach (drivers/base/dd.c:1218) [ 51.646424][ T218] ? __pfx___driver_attach (drivers/base/dd.c:1158) [ 51.646428][ T218] bus_for_each_dev (drivers/base/bus.c:369) [ 51.646441][ T218] ? __pfx_bus_for_each_dev (drivers/base/bus.c:358) [ 51.646444][ T218] ? __kmalloc_cache_noprof (arch/x86/include/asm/jump_label.h:46 include/linux/memcontrol.h:1714 mm/slub.c:2210 mm/slub.c:4190 mm/slub.c:4229 mm/slub.c:4391) [ 51.646456][ T218] ? klist_add_tail (include/linux/list.h:150 include/linux/list.h:183 lib/klist.c:104 lib/klist.c:137) [ 51.646468][ T218] bus_add_driver (drivers/base/bus.c:678) [ 51.646482][ T218] driver_register (drivers/base/driver.c:249) [ 51.646490][ T218] i915_init (drivers/gpu/drm/i915/i915_driver.c:1428) i915 [ 51.646891][ T218] ? __pfx_i915_init (drivers/gpu/drm/i915/i915_config.c:13) i915 [ 51.647101][ T67] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 51.647277][ T218] do_one_initcall (init/main.c:1269) [ 51.647292][ T218] ? kfree (mm/slub.c:4680 mm/slub.c:4879) [ 51.647304][ T218] ? __pfx_do_one_initcall (init/main.c:1260) [ 51.647315][ T218] ? kasan_unpoison (mm/kasan/shadow.c:156 mm/kasan/shadow.c:182) [ 51.647327][ T218] ? __kasan_slab_alloc (mm/kasan/common.c:329 mm/kasan/common.c:356) [ 51.647340][ T218] ? __kmalloc_cache_noprof (mm/slub.c:4180 mm/slub.c:4229 mm/slub.c:4391) [ 51.647352][ T218] ? kasan_save_track (arch/x86/include/asm/current.h:25 mm/kasan/common.c:60 mm/kasan/common.c:69) [ 51.647365][ T218] ? kasan_unpoison (mm/kasan/shadow.c:156 mm/kasan/shadow.c:182) [ 51.647377][ T218] do_init_module (kernel/module/main.c:3039) [ 51.647388][ T218] ? __pfx_do_init_module (kernel/module/main.c:3011) [ 51.647402][ T218] ? kfree (mm/slub.c:4680 mm/slub.c:4879) [ 51.647414][ T218] ? klp_module_coming (kernel/livepatch/core.c:1317) [ 51.647426][ T218] ? load_module (kernel/module/main.c:2468 kernel/module/main.c:2463 kernel/module/main.c:3504) [ 51.647441][ T218] load_module (kernel/module/main.c:3509) [ 51.647449][ T218] ? ima_post_read_file (security/integrity/ima/ima_main.c:896 security/integrity/ima/ima_main.c:878) [ 51.647466][ T218] ? __pfx_load_module (kernel/module/main.c:3353) [ 51.647478][ T218] ? __pfx_kernel_read_file (fs/kernel_read_file.c:38) [ 51.647489][ T218] ? init_module_from_file (kernel/module/main.c:3701) [ 51.647499][ T218] init_module_from_file (kernel/module/main.c:3701) [ 51.647514][ T218] ? __pfx_init_module_from_file (kernel/module/main.c:3677) [ 51.647525][ T218] ? mm_get_unmapped_area (arch/x86/include/asm/bitops.h:206 arch/x86/include/asm/bitops.h:238 include/asm-generic/bitops/instrumented-non-atomic.h:142 mm/mmap.c:805 mm/mmap.c:871) [ 51.647540][ T218] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) [ 51.647547][ T218] ? __pfx__raw_spin_lock (kernel/locking/spinlock.c:153) [ 51.647560][ T218] idempotent_init_module (kernel/module/main.c:3713) [ 51.647573][ T218] ? __pfx_idempotent_init_module (kernel/module/main.c:3705) [ 51.647582][ T218] ? __pfx___seccomp_filter (kernel/seccomp.c:1244) [ 51.647590][ T218] ? fdget (include/linux/atomic/atomic-arch-fallback.h:479 include/linux/atomic/atomic-instrumented.h:50 fs/file.c:1167 fs/file.c:1181) [ 51.647607][ T218] ? security_capable (security/security.c:1142) [ 51.647615][ T218] __x64_sys_finit_module (include/linux/file.h:62 include/linux/file.h:83 kernel/module/main.c:3736 kernel/module/main.c:3723 kernel/module/main.c:3723) [ 51.647627][ T218] ? syscall_trace_enter (kernel/entry/syscall-common.c:44) [ 51.647640][ T218] do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) [ 51.647657][ T218] ? fput (arch/x86/include/asm/atomic64_64.h:79 include/linux/atomic/atomic-arch-fallback.h:2913 include/linux/atomic/atomic-arch-fallback.h:3364 include/linux/atomic/atomic-long.h:698 include/linux/atomic/atomic-instrumented.h:3767 include/linux/file_ref.h:157 fs/file_table.c:544) [ 51.647668][ T218] ? ksys_mmap_pgoff (mm/mmap.c:609) [ 51.647682][ T218] ? do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) [ 51.647694][ T218] ? from_kgid_munged (kernel/user_namespace.c:535) [ 51.647708][ T218] ? _copy_to_user (arch/x86/include/asm/uaccess_64.h:126 arch/x86/include/asm/uaccess_64.h:147 include/linux/uaccess.h:197 lib/usercopy.c:26) [ 51.647722][ T218] ? cp_new_stat (fs/stat.c:471) [ 51.647732][ T218] ? __pfx_cp_new_stat (fs/stat.c:471) The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20250820/202508200843.8b006132-lkp@… -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

6 hours, 59 minutes

Re: [PATCH net-next 5/6] net: ti: icssg-prueth: Add AF_XDP zero copy for RX

by Jakub Kicinski

On Mon, 18 Aug 2025 16:54:23 +0530 Meghana Malladi wrote: > @@ -1332,6 +1350,13 @@ static int prueth_xsk_wakeup(struct net_device *ndev, u32 qid, u32 flags) > } > } > > + if (flags & XDP_WAKEUP_RX) { > + if (!napi_if_scheduled_mark_missed(&emac->napi_rx)) { > + if (likely(napi_schedule_prep(&emac->napi_rx))) > + __napi_schedule(&emac->napi_rx); > + } > + } > + > return 0; I suspect this series is generated against old source or there's another conflicting series in flight, because git ends up applying this chunk to prueth_xsk_pool_disable() :S Before you proceed with AF_XDP could you make this driver build under COMPILE_TEST on x86? This is very easy to miss, luckily we got an off list report but its pure luck. And obviously much more effort for the maintainers to investigate than if it was caught by the CI. -- pw-bot: cr

23 hours, 51 minutes

Re: [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks

by Christian König

On 19.08.25 11:04, Janusz Krzysztofik wrote: > On Monday, 18 August 2025 16:42:56 CEST Christian König wrote: >> On 18.08.25 16:30, Janusz Krzysztofik wrote: >>> Hi Christian, >>> >>> On Thursday, 14 August 2025 14:24:29 CEST Christian König wrote: >>>> >>>> On 14.08.25 10:16, Janusz Krzysztofik wrote: >>>>> When first user starts waiting on a not yet signaled fence of a chain >>>>> link, a dma_fence_chain callback is added to a user fence of that link. >>>>> When the user fence of that chain link is then signaled, the chain is >>>>> traversed in search for a first not signaled link and the callback is >>>>> rearmed on a user fence of that link. >>>>> >>>>> Since chain fences may be exposed to user space, e.g. over drm_syncobj >>>>> IOCTLs, users may start waiting on any link of the chain, then many links >>>>> of a chain may have signaling enabled and their callbacks added to their >>>>> user fences. Once an arbitrary user fence is signaled, all >>>>> dma_fence_chain callbacks added to it so far must be rearmed to another >>>>> user fence of the chain. In extreme scenarios, when all N links of a >>>>> chain are awaited and then signaled in reverse order, the dma_fence_chain >>>>> callback may be called up to N * (N + 1) / 2 times (an arithmetic series). >>>>> >>>>> To avoid that potential excessive accumulation of dma_fence_chain >>>>> callbacks, rearm a trimmed-down, signal only callback version to the base >>>>> fence of a previous link, if not yet signaled, otherwise just signal the >>>>> base fence of the current link instead of traversing the chain in search >>>>> for a first not signaled link and moving all callbacks collected so far to >>>>> a user fence of that link. >>>> >>>> Well clear NAK to that! You can easily overflow the kernel stack with that! >>> >>> I'll be happy to propose a better solution, but for that I need to understand >>> better your message. Could you please point out an exact piece of the >>> proposed code and/or describe a scenario where you can see the risk of stack >>> overflow? >> >> The sentence "rearm .. to the base fence of a previous link" sounds like you are trying to install a callback on the signaling to the previous chain element. >> >> That is exactly what I pointed out previously where you need to be super careful because when this chain signals the callbacks will execute recursively which means that you can trivially overflow the kernel stack if you have more than a handful of chain elements. >> >> In other words A waits for B, B waits for C, C waits for D etc.... when D finally signals it will call C which in turn calls B which in turn calls A. > > OK, maybe my commit description was not precise enough, however, I didn't > describe implementation details (how) intentionally. > When D signals then it doesn't call C directly, only it submits an irq work > that calls C. Then C doesn't just call B, only it submits another irq work > that calls B, and so on. > Doesn't that code pattern effectively break the recursion loop into separate > work items, each with its own separate stack? No, it's architecture dependent if the irq_work executes on a separate stack or not. You would need a work_struct to really separate the two and I would reject that because it adds additional latency to the signaling path. >> >> Even if the chain is a recursive data structure you absolutely can't use recursion for the handling of it. >> >> Maybe I misunderstood your textual description but reading a sentence like this rings all alarm bells here. Otherwise I can't see what the patch is supposed to be optimizing. > > OK, maybe I should start my commit description of this patch with a copy of > the first sentence from cover letter and also from patch 1/4 description that > informs about the problem as reported by CI. Maybe I should also provide a > comparison of measured signaling times from trybot executions [1][2][3]. > Here are example numbers from CI machine fi-bsw-n3050: Yeah and I've pointed out before that this is irrelevant. The problem is *not* the dma_fence_chain implementation, that one is doing exactly what is expected to do. The problem is that the test case is nonsense. I've pointed that out numerous times now! Regards, Christian. > > With signaling time reports only added to selftests (patch 1 of 4): > <6> [777.914451] dma-buf: Running dma_fence_chain/wait_forward > <6> [778.123516] wait_forward: 4096 signals in 21373487 ns > <6> [778.335709] dma-buf: Running dma_fence_chain/wait_backward > <6> [795.791546] wait_backward: 4096 signals in 17249051192 ns > <6> [795.859699] dma-buf: Running dma_fence_chain/wait_random > <6> [796.161375] wait_random: 4096 signals in 97386256 ns > > With dma_fence_enable_signaling() replaced in selftests with dma_fence_wait() > (patches 1-3 of 4): > <6> [782.505692] dma-buf: Running dma_fence_chain/wait_forward > <6> [784.609213] wait_forward: 4096 signals in 36513103 ns > <3> [784.837226] Reported -4 for kthread_stop_put(0)! > <6> [785.147643] dma-buf: Running dma_fence_chain/wait_backward > <6> [806.367763] wait_backward: 4096 signals in 18428009499 ns > <6> [807.175325] dma-buf: Running dma_fence_chain/wait_random > <6> [809.453942] wait_random: 4096 signals in 119761950 ns > > With the fix (patches 1-4 of 4): > <6> [731.519020] dma-buf: Running dma_fence_chain/wait_forward > <6> [733.623375] wait_forward: 4096 signals in 31890220 ns > <6> [734.258972] dma-buf: Running dma_fence_chain/wait_backward > <6> [736.267325] wait_backward: 4096 signals in 39007955 ns > <6> [736.700221] dma-buf: Running dma_fence_chain/wait_random > <6> [739.346706] wait_random: 4096 signals in 48384865 ns > > Signaling time in wait_backward selftest has been reduced from 17s to 39ms. > > [1] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152785v1/index.html? > [2] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152828v2/index.html? > [3] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152830v2/index.html? > >> >>>> >>>> Additional to this messing with the fence ops outside of the dma_fence code is an absolute no-go. >>> >>> Could you please explain what piece of code you are referring to when you say >>> "messing with the fence ops outside the dma_fence code"? If not this patch >>> then which particular one of this series did you mean? I'm assuming you >>> didn't mean drm_syncobj code that I mentioned in my commit descriptions. >> >> See below. >> >>> >>> Thanks, >>> Janusz >>> >>>> >>>> Regards, >>>> Christian. >>>> >>>>> >>>>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904 >>>>> Suggested-by: Chris Wilson <chris.p.wilson(a)linux.intel.com> >>>>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com> >>>>> --- >>>>> drivers/dma-buf/dma-fence-chain.c | 101 +++++++++++++++++++++++++----- >>>>> 1 file changed, 84 insertions(+), 17 deletions(-) >>>>> >>>>> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c >>>>> index a8a90acf4f34d..90eff264ee05c 100644 >>>>> --- a/drivers/dma-buf/dma-fence-chain.c >>>>> +++ b/drivers/dma-buf/dma-fence-chain.c >>>>> @@ -119,46 +119,113 @@ static const char *dma_fence_chain_get_timeline_name(struct dma_fence *fence) >>>>> return "unbound"; >>>>> } >>>>> >>>>> -static void dma_fence_chain_irq_work(struct irq_work *work) >>>>> +static void signal_irq_work(struct irq_work *work) >>>>> { >>>>> struct dma_fence_chain *chain; >>>>> >>>>> chain = container_of(work, typeof(*chain), work); >>>>> >>>>> - /* Try to rearm the callback */ >>>>> - if (!dma_fence_chain_enable_signaling(&chain->base)) >>>>> - /* Ok, we are done. No more unsignaled fences left */ >>>>> - dma_fence_signal(&chain->base); >>>>> + dma_fence_signal(&chain->base); >>>>> dma_fence_put(&chain->base); >>>>> } >>>>> >>>>> -static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb) >>>>> +static void signal_cb(struct dma_fence *f, struct dma_fence_cb *cb) >>>>> +{ >>>>> + struct dma_fence_chain *chain; >>>>> + >>>>> + chain = container_of(cb, typeof(*chain), cb); >>>>> + init_irq_work(&chain->work, signal_irq_work); >>>>> + irq_work_queue(&chain->work); >>>>> +} >>>>> + >>>>> +static void rearm_irq_work(struct irq_work *work) >>>>> +{ >>>>> + struct dma_fence_chain *chain; >>>>> + struct dma_fence *prev; >>>>> + >>>>> + chain = container_of(work, typeof(*chain), work); >>>>> + >>>>> + rcu_read_lock(); >>>>> + prev = rcu_dereference(chain->prev); >>>>> + if (prev && dma_fence_add_callback(prev, &chain->cb, signal_cb)) >>>>> + prev = NULL; >>>>> + rcu_read_unlock(); >>>>> + if (prev) >>>>> + return; >>>>> + >>>>> + /* Ok, we are done. No more unsignaled fences left */ >>>>> + signal_irq_work(work); >>>>> +} >>>>> + >>>>> +static inline bool fence_is_signaled__nested(struct dma_fence *fence) >>>>> +{ >>>>> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) >>>>> + return true; >>>>> + >> >>>>> + if (fence->ops->signaled && fence->ops->signaled(fence)) { >> >> Calling this outside of dma-fence.[ch] is a clear no-go. > > But this patch applies only to drivers/dma-buf/dma-fence-chain.c, not > outside of it. > > Thanks, > Janusz > >> >> Regards, >> Christian. >> >>>>> + unsigned long flags; >>>>> + >>>>> + spin_lock_irqsave_nested(fence->lock, flags, SINGLE_DEPTH_NESTING); >>>>> + dma_fence_signal_locked(fence); >>>>> + spin_unlock_irqrestore(fence->lock, flags); >>>>> + >>>>> + return true; >>>>> + } >>>>> + >>>>> + return false; >>>>> +} >>>>> + >>>>> +static bool prev_is_signaled(struct dma_fence_chain *chain) >>>>> +{ >>>>> + struct dma_fence *prev; >>>>> + bool result; >>>>> + >>>>> + rcu_read_lock(); >>>>> + prev = rcu_dereference(chain->prev); >>>>> + result = !prev || fence_is_signaled__nested(prev); >>>>> + rcu_read_unlock(); >>>>> + >>>>> + return result; >>>>> +} >>>>> + >>>>> +static void rearm_or_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb) >>>>> { >>>>> struct dma_fence_chain *chain; >>>>> >>>>> chain = container_of(cb, typeof(*chain), cb); >>>>> - init_irq_work(&chain->work, dma_fence_chain_irq_work); >>>>> + if (prev_is_signaled(chain)) { >>>>> + /* Ok, we are done. No more unsignaled fences left */ >>>>> + init_irq_work(&chain->work, signal_irq_work); >>>>> + } else { >>>>> + /* Try to rearm the callback */ >>>>> + init_irq_work(&chain->work, rearm_irq_work); >>>>> + } >>>>> + >>>>> irq_work_queue(&chain->work); >>>>> - dma_fence_put(f); >>>>> } >>>>> >>>>> static bool dma_fence_chain_enable_signaling(struct dma_fence *fence) >>>>> { >>>>> struct dma_fence_chain *head = to_dma_fence_chain(fence); >>>>> + int err = -ENOENT; >>>>> >>>>> - dma_fence_get(&head->base); >>>>> - dma_fence_chain_for_each(fence, &head->base) { >>>>> - struct dma_fence *f = dma_fence_chain_contained(fence); >>>>> + if (WARN_ON(!head)) >>>>> + return false; >>>>> >>>>> - dma_fence_get(f); >>>>> - if (!dma_fence_add_callback(f, &head->cb, dma_fence_chain_cb)) { >>>>> + dma_fence_get(fence); >>>>> + if (head->fence) >>>>> + err = dma_fence_add_callback(head->fence, &head->cb, rearm_or_signal_cb); >>>>> + if (err) { >>>>> + if (prev_is_signaled(head)) { >>>>> dma_fence_put(fence); >>>>> - return true; >>>>> + } else { >>>>> + init_irq_work(&head->work, rearm_irq_work); >>>>> + irq_work_queue(&head->work); >>>>> + err = 0; >>>>> } >>>>> - dma_fence_put(f); >>>>> } >>>>> - dma_fence_put(&head->base); >>>>> - return false; >>>>> + >>>>> + return !err; >>>>> } >>>>> >>>>> static bool dma_fence_chain_signaled(struct dma_fence *fence) >>>> >>>> >>> >>> >>> >>> >> >> > > > >

1 day, 5 hours

[PATCH] drm/amdgpu: Pin buffer while vmap'ing exported dma-buf objects

by Thomas Zimmermann

Current dma-buf vmap semantics require that the mapped buffer remains in place until the corresponding vunmap has completed. For GEM-SHMEM, this used to be guaranteed by a pin operation while creating an S/G table in import. GEM-SHMEN can now import dma-buf objects without creating the S/G table, so the pin is missing. Leads to page-fault errors, such as the one shown below. [ 102.101726] BUG: unable to handle page fault for address: ffffc90127000000 [...] [ 102.157102] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl] [...] [ 102.243250] Call Trace: [ 102.245695] <TASK> [ 102.2477V95] ? validate_chain+0x24e/0x5e0 [ 102.251805] ? __lock_acquire+0x568/0xae0 [ 102.255807] udl_render_hline+0x165/0x341 [udl] [ 102.260338] ? __pfx_udl_render_hline+0x10/0x10 [udl] [ 102.265379] ? local_clock_noinstr+0xb/0x100 [ 102.269642] ? __lock_release.isra.0+0x16c/0x2e0 [ 102.274246] ? mark_held_locks+0x40/0x70 [ 102.278177] udl_primary_plane_helper_atomic_update+0x43e/0x680 [udl] [ 102.284606] ? __pfx_udl_primary_plane_helper_atomic_update+0x10/0x10 [udl] [ 102.291551] ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170 [ 102.297208] ? lockdep_hardirqs_on+0x88/0x130 [ 102.301554] ? _raw_spin_unlock_irq+0x24/0x50 [ 102.305901] ? wait_for_completion_timeout+0x2bb/0x3a0 [ 102.311028] ? drm_atomic_helper_calc_timestamping_constants+0x141/0x200 [ 102.317714] ? drm_atomic_helper_commit_planes+0x3b6/0x1030 [ 102.323279] drm_atomic_helper_commit_planes+0x3b6/0x1030 [ 102.328664] drm_atomic_helper_commit_tail+0x41/0xb0 [ 102.333622] commit_tail+0x204/0x330 [...] [ 102.529946] ---[ end trace 0000000000000000 ]--- [ 102.651980] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl] In this stack strace, udl (based on GEM-SHMEM) imported and vmap'ed a dma-buf from amdgpu. Amdgpu relocated the buffer, thereby invalidating the mapping. Provide a custom dma-buf vmap method in amdgpu that pins the object before mapping it's buffer's pages into kernel address space. Do the opposite in vunmap. Note that dma-buf vmap differs from GEM vmap in how it handles relocation. While dma-buf vmap keeps the buffer in place, GEM vmap requires the caller to keep the buffer in place. Hence, this fix is in amdgpu's dma-buf code instead of its GEM code. A discussion of various approaches to solving the problem is available at [1]. Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de> Fixes: 660cd44659a0 ("drm/shmem-helper: Import dmabuf without mapping its sg_table") Reported-by: Thomas Zimmermann <tzimmermann(a)suse.de> Closes: https://lore.kernel.org/dri-devel/ba1bdfb8-dbf7-4372-bdcb-df7e0511c702@suse… Cc: Shixiong Ou <oushixiong(a)kylinos.cn> Cc: Thomas Zimmermann <tzimmermann(a)suse.de> Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> Cc: Maxime Ripard <mripard(a)kernel.org> Cc: David Airlie <airlied(a)gmail.com> Cc: Simona Vetter <simona(a)ffwll.ch> Cc: Sumit Semwal <sumit.semwal(a)linaro.org> Cc: "Christian König" <christian.koenig(a)amd.com> Cc: dri-devel(a)lists.freedesktop.org Cc: linux-media(a)vger.kernel.org Cc: linaro-mm-sig(a)lists.linaro.org Link: https://lore.kernel.org/dri-devel/9792c6c3-a2b8-4b2b-b5ba-fba19b153e21@suse… # [1] --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 36 +++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index 5743ebb2f1b7..5b33776eeece 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -285,6 +285,38 @@ static int amdgpu_dma_buf_begin_cpu_access(struct dma_buf *dma_buf, return ret; } +static int amdgpu_dma_buf_vmap(struct dma_buf *dma_buf, struct iosys_map *map) +{ + struct drm_gem_object *obj = dma_buf->priv; + struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj); + int ret; + + /* + * Pin to keep buffer in place while it's vmap'ed. The actual + * location is not important as long as it's mapable. + * + * This code is required for exporting to GEM-SHMEM without S/G table. + * Once GEM-SHMEM supports dynamic imports, it should be dropped. + */ + ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_MASK); + if (ret) + return ret; + ret = drm_gem_dmabuf_vmap(dma_buf, map); + if (ret) + amdgpu_bo_unpin(bo); + + return ret; +} + +static void amdgpu_dma_buf_vunmap(struct dma_buf *dma_buf, struct iosys_map *map) +{ + struct drm_gem_object *obj = dma_buf->priv; + struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj); + + drm_gem_dmabuf_vunmap(dma_buf, map); + amdgpu_bo_unpin(bo); +} + const struct dma_buf_ops amdgpu_dmabuf_ops = { .attach = amdgpu_dma_buf_attach, .pin = amdgpu_dma_buf_pin, @@ -294,8 +326,8 @@ const struct dma_buf_ops amdgpu_dmabuf_ops = { .release = drm_gem_dmabuf_release, .begin_cpu_access = amdgpu_dma_buf_begin_cpu_access, .mmap = drm_gem_dmabuf_mmap, - .vmap = drm_gem_dmabuf_vmap, - .vunmap = drm_gem_dmabuf_vunmap, + .vmap = amdgpu_dma_buf_vmap, + .vunmap = amdgpu_dma_buf_vunmap, }; /** -- 2.50.1

1 day, 23 hours

Re: [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks

by Christian König

On 18.08.25 16:30, Janusz Krzysztofik wrote: > Hi Christian, > > On Thursday, 14 August 2025 14:24:29 CEST Christian König wrote: >> >> On 14.08.25 10:16, Janusz Krzysztofik wrote: >>> When first user starts waiting on a not yet signaled fence of a chain >>> link, a dma_fence_chain callback is added to a user fence of that link. >>> When the user fence of that chain link is then signaled, the chain is >>> traversed in search for a first not signaled link and the callback is >>> rearmed on a user fence of that link. >>> >>> Since chain fences may be exposed to user space, e.g. over drm_syncobj >>> IOCTLs, users may start waiting on any link of the chain, then many links >>> of a chain may have signaling enabled and their callbacks added to their >>> user fences. Once an arbitrary user fence is signaled, all >>> dma_fence_chain callbacks added to it so far must be rearmed to another >>> user fence of the chain. In extreme scenarios, when all N links of a >>> chain are awaited and then signaled in reverse order, the dma_fence_chain >>> callback may be called up to N * (N + 1) / 2 times (an arithmetic series). >>> >>> To avoid that potential excessive accumulation of dma_fence_chain >>> callbacks, rearm a trimmed-down, signal only callback version to the base >>> fence of a previous link, if not yet signaled, otherwise just signal the >>> base fence of the current link instead of traversing the chain in search >>> for a first not signaled link and moving all callbacks collected so far to >>> a user fence of that link. >> >> Well clear NAK to that! You can easily overflow the kernel stack with that! > > I'll be happy to propose a better solution, but for that I need to understand > better your message. Could you please point out an exact piece of the > proposed code and/or describe a scenario where you can see the risk of stack > overflow? The sentence "rearm .. to the base fence of a previous link" sounds like you are trying to install a callback on the signaling to the previous chain element. That is exactly what I pointed out previously where you need to be super careful because when this chain signals the callbacks will execute recursively which means that you can trivially overflow the kernel stack if you have more than a handful of chain elements. In other words A waits for B, B waits for C, C waits for D etc.... when D finally signals it will call C which in turn calls B which in turn calls A. Even if the chain is a recursive data structure you absolutely can't use recursion for the handling of it. Maybe I misunderstood your textual description but reading a sentence like this rings all alarm bells here. Otherwise I can't see what the patch is supposed to be optimizing. >> >> Additional to this messing with the fence ops outside of the dma_fence code is an absolute no-go. > > Could you please explain what piece of code you are referring to when you say > "messing with the fence ops outside the dma_fence code"? If not this patch > then which particular one of this series did you mean? I'm assuming you > didn't mean drm_syncobj code that I mentioned in my commit descriptions. See below. > > Thanks, > Janusz > >> >> Regards, >> Christian. >> >>> >>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904 >>> Suggested-by: Chris Wilson <chris.p.wilson(a)linux.intel.com> >>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com> >>> --- >>> drivers/dma-buf/dma-fence-chain.c | 101 +++++++++++++++++++++++++----- >>> 1 file changed, 84 insertions(+), 17 deletions(-) >>> >>> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c >>> index a8a90acf4f34d..90eff264ee05c 100644 >>> --- a/drivers/dma-buf/dma-fence-chain.c >>> +++ b/drivers/dma-buf/dma-fence-chain.c >>> @@ -119,46 +119,113 @@ static const char *dma_fence_chain_get_timeline_name(struct dma_fence *fence) >>> return "unbound"; >>> } >>> >>> -static void dma_fence_chain_irq_work(struct irq_work *work) >>> +static void signal_irq_work(struct irq_work *work) >>> { >>> struct dma_fence_chain *chain; >>> >>> chain = container_of(work, typeof(*chain), work); >>> >>> - /* Try to rearm the callback */ >>> - if (!dma_fence_chain_enable_signaling(&chain->base)) >>> - /* Ok, we are done. No more unsignaled fences left */ >>> - dma_fence_signal(&chain->base); >>> + dma_fence_signal(&chain->base); >>> dma_fence_put(&chain->base); >>> } >>> >>> -static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb) >>> +static void signal_cb(struct dma_fence *f, struct dma_fence_cb *cb) >>> +{ >>> + struct dma_fence_chain *chain; >>> + >>> + chain = container_of(cb, typeof(*chain), cb); >>> + init_irq_work(&chain->work, signal_irq_work); >>> + irq_work_queue(&chain->work); >>> +} >>> + >>> +static void rearm_irq_work(struct irq_work *work) >>> +{ >>> + struct dma_fence_chain *chain; >>> + struct dma_fence *prev; >>> + >>> + chain = container_of(work, typeof(*chain), work); >>> + >>> + rcu_read_lock(); >>> + prev = rcu_dereference(chain->prev); >>> + if (prev && dma_fence_add_callback(prev, &chain->cb, signal_cb)) >>> + prev = NULL; >>> + rcu_read_unlock(); >>> + if (prev) >>> + return; >>> + >>> + /* Ok, we are done. No more unsignaled fences left */ >>> + signal_irq_work(work); >>> +} >>> + >>> +static inline bool fence_is_signaled__nested(struct dma_fence *fence) >>> +{ >>> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) >>> + return true; >>> + >>> + if (fence->ops->signaled && fence->ops->signaled(fence)) { Calling this outside of dma-fence.[ch] is a clear no-go. Regards, Christian. >>> + unsigned long flags; >>> + >>> + spin_lock_irqsave_nested(fence->lock, flags, SINGLE_DEPTH_NESTING); >>> + dma_fence_signal_locked(fence); >>> + spin_unlock_irqrestore(fence->lock, flags); >>> + >>> + return true; >>> + } >>> + >>> + return false; >>> +} >>> + >>> +static bool prev_is_signaled(struct dma_fence_chain *chain) >>> +{ >>> + struct dma_fence *prev; >>> + bool result; >>> + >>> + rcu_read_lock(); >>> + prev = rcu_dereference(chain->prev); >>> + result = !prev || fence_is_signaled__nested(prev); >>> + rcu_read_unlock(); >>> + >>> + return result; >>> +} >>> + >>> +static void rearm_or_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb) >>> { >>> struct dma_fence_chain *chain; >>> >>> chain = container_of(cb, typeof(*chain), cb); >>> - init_irq_work(&chain->work, dma_fence_chain_irq_work); >>> + if (prev_is_signaled(chain)) { >>> + /* Ok, we are done. No more unsignaled fences left */ >>> + init_irq_work(&chain->work, signal_irq_work); >>> + } else { >>> + /* Try to rearm the callback */ >>> + init_irq_work(&chain->work, rearm_irq_work); >>> + } >>> + >>> irq_work_queue(&chain->work); >>> - dma_fence_put(f); >>> } >>> >>> static bool dma_fence_chain_enable_signaling(struct dma_fence *fence) >>> { >>> struct dma_fence_chain *head = to_dma_fence_chain(fence); >>> + int err = -ENOENT; >>> >>> - dma_fence_get(&head->base); >>> - dma_fence_chain_for_each(fence, &head->base) { >>> - struct dma_fence *f = dma_fence_chain_contained(fence); >>> + if (WARN_ON(!head)) >>> + return false; >>> >>> - dma_fence_get(f); >>> - if (!dma_fence_add_callback(f, &head->cb, dma_fence_chain_cb)) { >>> + dma_fence_get(fence); >>> + if (head->fence) >>> + err = dma_fence_add_callback(head->fence, &head->cb, rearm_or_signal_cb); >>> + if (err) { >>> + if (prev_is_signaled(head)) { >>> dma_fence_put(fence); >>> - return true; >>> + } else { >>> + init_irq_work(&head->work, rearm_irq_work); >>> + irq_work_queue(&head->work); >>> + err = 0; >>> } >>> - dma_fence_put(f); >>> } >>> - dma_fence_put(&head->base); >>> - return false; >>> + >>> + return !err; >>> } >>> >>> static bool dma_fence_chain_signaled(struct dma_fence *fence) >> >> > > > >

1 day, 23 hours

[PATCH v2 0/3] udmabuf: Sync to attached devices

by Andrew Davis

Hello all, This series makes it so the udmabuf will sync the backing buffer with the set of attached devices as required for DMA-BUFs when doing {begin,end}_cpu_access. Thanks Andrew Changes for v2: - fix attachment table use-after-free - rebased on v6.17-rc1 Andrew Davis (3): udmabuf: Keep track current device mappings udmabuf: Sync buffer mappings for attached devices udmabuf: Use module_misc_device() to register this device drivers/dma-buf/udmabuf.c | 133 +++++++++++++++++++------------------- 1 file changed, 67 insertions(+), 66 deletions(-) -- 2.39.2

2 days, 3 hours

[PATCH] drm/gem-shmem: Pin and unpin buffers when importing w/o S/G table

by Thomas Zimmermann

Imported dma-buf objects need to be pinned while being vmap'ed into kernel address space. This used to be done before while creating an S/G table. GEM-SHMEN can import dma-buf objects without creating the S/G table, but the pin/unpin is now missing. Leads to page-mapping errors such as the one shown below. [ 102.101726] BUG: unable to handle page fault for address: ffffc90127000000 [...] [ 102.157102] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl] [...] [ 102.243250] Call Trace: [ 102.245695] <TASK> [ 102.2477V95] ? validate_chain+0x24e/0x5e0 [ 102.251805] ? __lock_acquire+0x568/0xae0 [ 102.255807] udl_render_hline+0x165/0x341 [udl] [ 102.260338] ? __pfx_udl_render_hline+0x10/0x10 [udl] [ 102.265379] ? local_clock_noinstr+0xb/0x100 [ 102.269642] ? __lock_release.isra.0+0x16c/0x2e0 [ 102.274246] ? mark_held_locks+0x40/0x70 [ 102.278177] udl_primary_plane_helper_atomic_update+0x43e/0x680 [udl] [ 102.284606] ? __pfx_udl_primary_plane_helper_atomic_update+0x10/0x10 [udl] [ 102.291551] ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170 [ 102.297208] ? lockdep_hardirqs_on+0x88/0x130 [ 102.301554] ? _raw_spin_unlock_irq+0x24/0x50 [ 102.305901] ? wait_for_completion_timeout+0x2bb/0x3a0 [ 102.311028] ? drm_atomic_helper_calc_timestamping_constants+0x141/0x200 [ 102.317714] ? drm_atomic_helper_commit_planes+0x3b6/0x1030 [ 102.323279] drm_atomic_helper_commit_planes+0x3b6/0x1030 [ 102.328664] drm_atomic_helper_commit_tail+0x41/0xb0 [ 102.333622] commit_tail+0x204/0x330 [...] [ 102.529946] ---[ end trace 0000000000000000 ]--- [ 102.651980] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl] Support pin/unpin in drm_buf_map_attachment() without creating an S/G table. Passing DMA_NONE for the DMA direction will only pin. Do the inverse for unmap_attachment(). Modify GEM-SHMEM accordingly, so that it pins the imported dma-buf. Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de> Fixes: 660cd44659a0 ("drm/shmem-helper: Import dmabuf without mapping its sg_table") Reported-by: Thomas Zimmermann <tzimmermann(a)suse.de> Closes: https://lore.kernel.org/dri-devel/ba1bdfb8-dbf7-4372-bdcb-df7e0511c702@suse… Cc: Shixiong Ou <oushixiong(a)kylinos.cn> Cc: Thomas Zimmermann <tzimmermann(a)suse.de> Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> Cc: Maxime Ripard <mripard(a)kernel.org> Cc: David Airlie <airlied(a)gmail.com> Cc: Simona Vetter <simona(a)ffwll.ch> Cc: Sumit Semwal <sumit.semwal(a)linaro.org> Cc: "Christian König" <christian.koenig(a)amd.com> Cc: dri-devel(a)lists.freedesktop.org Cc: linux-media(a)vger.kernel.org Cc: linaro-mm-sig(a)lists.linaro.org --- drivers/dma-buf/dma-buf.c | 16 +++++++++++++--- drivers/gpu/drm/drm_gem_shmem_helper.c | 11 ++++++++++- drivers/gpu/drm/drm_prime.c | 2 ++ 3 files changed, 25 insertions(+), 4 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 2bcf9ceca997..f1e1385ce630 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -1086,7 +1086,8 @@ EXPORT_SYMBOL_NS_GPL(dma_buf_unpin, "DMA_BUF"); * @direction: [in] direction of DMA transfer * * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR - * on error. May return -EINTR if it is interrupted by a signal. + * on error. May return -EINTR if it is interrupted by a signal. Returns NULL + * on success iff direction is DMA_NONE. * * On success, the DMA addresses and lengths in the returned scatterlist are * PAGE_SIZE aligned. @@ -1122,6 +1123,8 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach, if (ret) return ERR_PTR(ret); } + if (!valid_dma_direction(direction)) + return NULL; /* only pin; don't map */ sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction); if (!sg_table) @@ -1216,14 +1219,21 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach, { might_sleep(); - if (WARN_ON(!attach || !attach->dmabuf || !sg_table)) + if (WARN_ON(!attach || !attach->dmabuf)) return; dma_resv_assert_held(attach->dmabuf->resv); + if (!valid_dma_direction(direction)) + goto unpin; + + if (WARN_ON(!sg_table)) + return; + mangle_sg_table(sg_table); attach->dmabuf->ops->unmap_dma_buf(attach, sg_table, direction); +unpin: if (dma_buf_pin_on_map(attach)) attach->dmabuf->ops->unpin(attach); } @@ -1245,7 +1255,7 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach, { might_sleep(); - if (WARN_ON(!attach || !attach->dmabuf || !sg_table)) + if (WARN_ON(!attach || !attach->dmabuf)) return; dma_resv_lock(attach->dmabuf->resv, NULL); diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 5d1349c34afd..1b66501420d3 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -817,6 +817,7 @@ struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev, struct dma_buf *dma_buf) { struct dma_buf_attachment *attach; + struct sg_table *sgt; struct drm_gem_shmem_object *shmem; struct drm_gem_object *obj; size_t size; @@ -838,12 +839,18 @@ struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev, get_dma_buf(dma_buf); + sgt = dma_buf_map_attachment_unlocked(attach, DMA_NONE); + if (IS_ERR(sgt)) { + ret = PTR_ERR(sgt); + goto fail_detach; + } + size = PAGE_ALIGN(attach->dmabuf->size); shmem = __drm_gem_shmem_create(dev, size, true, NULL); if (IS_ERR(shmem)) { ret = PTR_ERR(shmem); - goto fail_detach; + goto fail_unmap; } drm_dbg_prime(dev, "size = %zu\n", size); @@ -853,6 +860,8 @@ struct drm_gem_object *drm_gem_shmem_prime_import_no_map(struct drm_device *dev, return &shmem->base; +fail_unmap: + dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_NONE); fail_detach: dma_buf_detach(dma_buf, attach); dma_buf_put(dma_buf); diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 43a10b4af43a..b3b070868e3b 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -1109,6 +1109,8 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg) attach = obj->import_attach; if (sg) dma_buf_unmap_attachment_unlocked(attach, sg, DMA_BIDIRECTIONAL); + else + dma_buf_unmap_attachment_unlocked(attach, NULL, DMA_NONE); dma_buf = attach->dmabuf; dma_buf_detach(attach->dmabuf, attach); /* remove the reference */ -- 2.50.1

2 days, 5 hours

[PATCH v2 0/2] accel: Add Arm Ethos-U NPU

by Rob Herring (Arm)

The Arm Ethos-U65/85 NPUs are designed for edge AI inference applications[0]. The driver works with Mesa Teflon. A merge request for Ethos support is here[1]. The UAPI should also be compatible with the downstream (open source) driver stack[2] and Vela compiler though that has not been implemented. Testing so far has been on i.MX93 boards with Ethos-U65. Support for U85 is still todo. Only minor changes on driver side will be needed for U85 support. A git tree is here[3]. Rob [0] https://www.arm.com/products/silicon-ip-cpu?families=ethos%20npus [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36699/ [2] https://gitlab.arm.com/artificial-intelligence/ethos-u/ [3] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git ethos-v2 Signed-off-by: Rob Herring (Arm) <robh(a)kernel.org> --- Changes in v2: - Rebase on v6.17-rc1 adapting to scheduler changes - scheduler: Drop the reset workqueue. According to the scheduler docs, we don't need it since we have a single h/w queue. - scheduler: Rework the timeout handling to continue running if we are making progress. Fixes timeouts on larger jobs. - Reset the NPU on resume so it's in a known state - Add error handling on clk_get() calls - Fix drm_mm splat on module unload. We were missing a put on the cmdstream BO in the scheduler clean-up. - Fix 0-day report needing explicit bitfield.h include - Link to v1: https://lore.kernel.org/r/20250722-ethos-v1-0-cc1c5a0cbbfb@kernel.org --- Rob Herring (Arm) (2): dt-bindings: npu: Add Arm Ethos-U65/U85 accel: Add Arm Ethos-U NPU driver .../devicetree/bindings/npu/arm,ethos.yaml | 79 +++ MAINTAINERS | 9 + drivers/accel/Kconfig | 1 + drivers/accel/Makefile | 1 + drivers/accel/ethos/Kconfig | 10 + drivers/accel/ethos/Makefile | 4 + drivers/accel/ethos/ethos_device.h | 181 ++++++ drivers/accel/ethos/ethos_drv.c | 418 ++++++++++++ drivers/accel/ethos/ethos_drv.h | 15 + drivers/accel/ethos/ethos_gem.c | 707 +++++++++++++++++++++ drivers/accel/ethos/ethos_gem.h | 46 ++ drivers/accel/ethos/ethos_job.c | 514 +++++++++++++++ drivers/accel/ethos/ethos_job.h | 41 ++ include/uapi/drm/ethos_accel.h | 262 ++++++++ 14 files changed, 2288 insertions(+) --- base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585 change-id: 20250715-ethos-3fdd39ef6f19 Best regards, -- Rob Herring (Arm) <robh(a)kernel.org>

5 days, 2 hours

Re: [PATCH v11 2/9] dma-buf: dma-heap: export declared functions

by T.J. Mercier

On Wed, Aug 13, 2025 at 11:13 PM Sumit Garg <sumit.garg(a)kernel.org> wrote: > > On Wed, Aug 13, 2025 at 08:02:51AM +0200, Jens Wiklander wrote: > > Export the dma-buf heap functions to allow them to be used by the OP-TEE > > driver. The OP-TEE driver wants to register and manage specific secure > > DMA heaps with it. > > > > Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org> > > Reviewed-by: Sumit Garg <sumit.garg(a)oss.qualcomm.com> > > --- > > drivers/dma-buf/dma-heap.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > Can we get an ack from DMAbuf maintainers here? With that we should be > able to queue this patch-set for linux-next targetting the 6.18 merge > window. > > -Sumit Reviewed-by: T.J. Mercier <tjmercier(a)google.com> Sorry I haven't been able to participate much upstream lately. > > > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c > > index 3cbe87d4a464..cdddf0e24dce 100644 > > --- a/drivers/dma-buf/dma-heap.c > > +++ b/drivers/dma-buf/dma-heap.c > > @@ -202,6 +202,7 @@ void *dma_heap_get_drvdata(struct dma_heap *heap) > > { > > return heap->priv; > > } > > +EXPORT_SYMBOL(dma_heap_get_drvdata); > > > > /** > > * dma_heap_get_name - get heap name > > @@ -214,6 +215,7 @@ const char *dma_heap_get_name(struct dma_heap *heap) > > { > > return heap->name; > > } > > +EXPORT_SYMBOL(dma_heap_get_name); > > > > /** > > * dma_heap_add - adds a heap to dmabuf heaps > > @@ -303,6 +305,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info) > > kfree(heap); > > return err_ret; > > } > > +EXPORT_SYMBOL(dma_heap_add); > > > > static char *dma_heap_devnode(const struct device *dev, umode_t *mode) > > { > > -- > > 2.43.0 > >

5 days, 14 hours

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig