On Tue, Jan 21, 2025 at 05:11:32PM +0100, Simona Vetter wrote:
> On Mon, Jan 20, 2025 at 03:48:04PM -0400, Jason Gunthorpe wrote:
> > On Mon, Jan 20, 2025 at 07:50:23PM +0100, Simona Vetter wrote:
> > > On Mon, Jan 20, 2025 at 01:59:01PM -0400, Jason Gunthorpe wrote:
> > > > On Mon, Jan 20, 2025 at 01:14:12PM +0100, Christian König wrote:
> > > > What is going wrong with your email? You replied to Simona, but
> > > > Simona Vetter <simona.vetter(a)ffwll.ch> is dropped from the To/CC
> > > > list??? I added the address back, but seems like a weird thing to
> > > > happen.
> > >
> > > Might also be funny mailing list stuff, depending how you get these. I
> > > read mails over lore and pretty much ignore cc (unless it's not also on
> > > any list, since those tend to be security issues) because I get cc'ed on
> > > way too much stuff for that to be a useful signal.
> >
> > Oh I see, you are sending a Mail-followup-to header that excludes your
> > address, so you don't get any emails at all.. My mutt is dropping you
> > as well.
> >
> > > Yeah I'm not worried about cpu mmap locking semantics. drm/ttm is a pretty
> > > clear example that you can implement dma-buf mmap with the rules we have,
> > > except the unmap_mapping_range might need a bit fudging with a separate
> > > address_space.
> >
> > From my perspective the mmap thing is a bit of a side/DRM-only thing
> > as nothing I'm interested in wants to mmap dmabuf into a VMA.
>
> I guess we could just skip mmap on these pfn exporters, but it also means
> a bit more boilerplate.
I have been assuming that dmabuf mmap remains unchanged, that
exporters will continue to implement that mmap() callback as today.
My main interest has been what data structure is produced in the
attach APIs.
E.g. today we have a struct dma_buf_attachment that returns an sg_table.
I'm expecting some kind of new data structure, let's call it "physical
list", that is some efficient coding of meta/addr/len tuples that works
well with the new DMA API. Matthew has been calling this thing phyr.
So, I imagine, struct dma_buf_attachment gaining an optional
feature negotiation and then we have in dma_buf_attachment:
union {
        struct sg_table *sgt;
        struct physical_list *phyr;
};
That's basically it, an alternative to scatterlist that has a clean
architecture.
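
To make that concrete, here is a rough sketch of what one entry of such a
"physical list" could carry - every name below is invented for illustration,
nothing like this exists today:

struct phys_range {
        phys_addr_t             addr;       /* start of the range */
        size_t                  len;        /* length in bytes */
        unsigned int            flags;      /* meta: P2P, encrypted, uncachable, ... */
        struct p2p_provider     *provider;  /* which device/bus owns this range */
};

struct physical_list {
        size_t                  nr_ranges;
        struct phys_range       ranges[];
};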
Now, if you are asking whether the current dmabuf mmap callback can be
improved with the above? Maybe. phyr should have the necessary
information inside it to populate a VMA - eventually even fully
correctly with all the right cacheable/encrypted/forbidden/etc flags.
So, you could imagine that exporters could just have one routine to
generate the phyr list and that goes into the attachment, goes into
some common code to fill VMA PTEs, and some other common code that
will convert it into the DMABUF scatterlist. If performance is not a
concern with these data structure conversions it could be an appealing
simplification.
And yes, I could imagine the meta information being descriptive enough
to support the private interconnect cases, the common code could
detect private meta information and just cleanly fail.
> At least the device mapping / dma_buf_attachment
> side should be doable with just the pfn and the new dma-api?
Yes, that would be my first goal post. Figure out some meta
information and a container data structure that allows struct
page-less P2P mapping through the new DMA API.
> > I'm hoping we can get to something where we describe not just how the
> > pfns should be DMA mapped, but also can describe how they should be
> > CPU mapped. For instance that this PFN space is always mapped
> > uncachable, in CPU and in IOMMU.
>
> I was pondering whether dma_mmap and friends would be a good place to
> prototype this and go for a fully generic implementation. But then even
> those have _wc/_uncached variants.
Given that the inability to correctly DMA map P2P MMIO without struct
page is a current pain point and current source of hacks in dmabuf
exporters, I wanted to make resolving that a priority.
However, if you mean what I described above for "fully generic [dmabuf
mmap] implementation", then we'd have the phyr datastructure as a
dependency to attempt that work.
phyr, and particularly the meta information, has a number of
stakeholders. I was thinking of going first with rdma's memory
registration flow because we are now pretty close to being able to do
such a big change, and it can demonstrate most of the requirements.
But that doesn't mean mmap couldn't go concurrently on the same agreed
datastructure if people are interested.
> > We also have current bugs in the iommu/vfio side where we are fudging
> > CC stuff, like assuming CPU memory is encrypted (not always true) and
> > that MMIO is non-encrypted (not always true)
>
> tbf CC pte flags I just don't grok at all. I've once tried to understand
> what current exporters and gpu drivers do and just gave up. But that's
> also a bit why I'm worried here because it's an enigma to me.
For CC, inside the secure world, is some information if each PFN
inside the VM is 'encrypted' or not. Any VM PTE (including the IOPTEs)
pointing at the PFN must match the secure world's view of
'encrypted'. The VM can ask the secure world to change its view at
runtime.
The way CC has been bolted onto the kernel so far largely hides this
from drivers, so it is hard to tell in driver code whether the PFN you
have is 'encrypted' or not. Right now the general rule (that is not
always true) is that struct page CPU memory is encrypted and
everything else is decrypted.
So right now, you can mostly ignore it and the above assumption
largely happens for you transparently.
However, soon we will have encrypted P2P MMIO which will stress this
hiding strategy.
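
As a toy illustration of the "PTE must match" rule above: a fault handler
that could query the per-PFN state might look like this, where
pfn_is_encrypted() is a made-up query (exactly the thing drivers cannot ask
today) while the pgprot helpers and vmf_insert_pfn_prot() already exist:

static vm_fault_t cc_aware_fault(struct vm_fault *vmf, unsigned long pfn)
{
        pgprot_t prot = vmf->vma->vm_page_prot;

        if (pfn_is_encrypted(pfn))              /* hypothetical query */
                prot = pgprot_encrypted(prot);  /* set the arch's encryption bit */
        else
                prot = pgprot_decrypted(prot);  /* clear it / mark shared */

        return vmf_insert_pfn_prot(vmf->vma, vmf->address, pfn, prot);
}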
> > > I thought iommuv2 (or whatever linux calls these) has full fault support
> > > and could support current move semantics. But yeah for iommu without
> > > fault support we need some kind of pin or a newly formalized revoke model.
> >
> > No, this is HW dependent, including PCI device, and I'm aware of no HW
> > that fully implements this in a way that could be useful to implement
> > arbitrary move semantics for VFIO..
>
> Hm I thought we've had at least prototypes floating around of device fault
> repair, but I guess that only works with ATS/pasid stuff and not general
> iommu traffic from devices. Definitely needs some device cooperation since
> the timeouts of a full fault are almost endless.
Yes, exactly. What all real devices I'm aware of have done is make a
subset of their traffic work with ATS and PRI, but not all their
traffic. Without *all* traffic you can't make any generic assumption
in the iommu that a transient non-present won't be fatal to the
device.
Stuff like dmabuf move semantics rely on transient non-present being
non-disruptive...
Jason
On Mon, Jan 20, 2025 at 07:50:23PM +0100, Simona Vetter wrote:
> On Mon, Jan 20, 2025 at 01:59:01PM -0400, Jason Gunthorpe wrote:
> > On Mon, Jan 20, 2025 at 01:14:12PM +0100, Christian König wrote:
> > What is going wrong with your email? You replied to Simona, but
> > Simona Vetter <simona.vetter(a)ffwll.ch> is dropped from the To/CC
> > list??? I added the address back, but seems like a weird thing to
> > happen.
>
> Might also be funny mailing list stuff, depending how you get these. I
> read mails over lore and pretty much ignore cc (unless it's not also on
> any list, since those tend to be security issues) because I get cc'ed on
> way too much stuff for that to be a useful signal.
Oh I see, you are sending a Mail-followup-to header that excludes your
address, so you don't get any emails at all.. My mutt is dropping you
as well.
> Yeah I'm not worried about cpu mmap locking semantics. drm/ttm is a pretty
> clear example that you can implement dma-buf mmap with the rules we have,
> except the unmap_mapping_range might need a bit fudging with a separate
> address_space.
From my perspective the mmap thing is a bit of a side/DRM-only thing
as nothing I'm interested in wants to mmap dmabuf into a VMA.
However, I think if you have locking rules that can fit into a VMA
fault path and link move_notify to unmap_mapping_range() then you've
got a pretty usable API.
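
A minimal sketch of that exporter-side pairing; struct my_buffer and its
fields are placeholders, and note that dma_buf_move_notify() has to be
called with the reservation lock held:

static void my_exporter_evict(struct my_buffer *buf)
{
        dma_resv_assert_held(buf->dmabuf->resv);

        /* zap all CPU PTEs so the next CPU access faults again */
        unmap_mapping_range(buf->mapping, 0, buf->size, 1);

        /* tell dynamic importers to drop and rebuild their DMA mappings */
        dma_buf_move_notify(buf->dmabuf);
}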
> For cpu mmaps I'm more worried about the arch bits in the pte, stuff like
> caching mode or encrypted memory bits and things like that. There's
> vma->vm_pgprot, but it's a mess. But maybe this all is an incentive to
> clean up that mess a bit.
I'm convinced we need meta-data along with pfns, there is too much
stuff that needs more information than just the address. Cachability,
CC encryption, exporting device, etc. This is a topic to partially
cross when we talk about how to fully remove struct page requirements
from the new DMA API.
I'm hoping we can get to something where we describe not just how the
pfns should be DMA mapped, but also can describe how they should be
CPU mapped. For instance that this PFN space is always mapped
uncachable, in CPU and in IOMMU.
We also have current bugs in the iommu/vfio side where we are fudging
CC stuff, like assuming CPU memory is encrypted (not always true) and
that MMIO is non-encrypted (not always true)
> I thought iommuv2 (or whatever linux calls these) has full fault support
> and could support current move semantics. But yeah for iommu without
> fault support we need some kind of pin or a newly formalized revoke model.
No, this is HW dependent, including PCI device, and I'm aware of no HW
that fully implements this in a way that could be useful to implement
arbitrary move semantics for VFIO..
Jason
On 17.01.25 at 15:42, Simona Vetter wrote:
> On Wed, Jan 15, 2025 at 11:06:53AM +0100, Christian König wrote:
>> [SNIP]
>>> Anything missing?
>> Well as far as I can see this use case is not a good fit for the DMA-buf
>> interfaces in the first place. DMA-buf deals with devices and buffer
>> exchange.
>>
>> What's necessary here instead is to give an importing VM full access on some
>> memory for their specific use case.
>>
>> That full access includes CPU and DMA mappings, modifying caching
>> attributes, potentially setting encryption keys for specific ranges etc....
>> etc...
>>
>> In other words we have a lot of things the importer here should be able to
>> do which we don't want most of the DMA-buf importers to do.
> This proposal isn't about forcing existing exporters to allow importers to
> do new stuff. That stays as-is, because it would break things.
>
> It's about adding yet another interface to get at the underlying data, and
> we have tons of those already. The only difference is that if we don't
> butcher the design, we'll be able to implement all the existing dma-buf
> interfaces on top of this new pfn interface, for some neat maximal
> compatibility.
That sounds like you missed my concern.
When an exporter and an importer agree that they want to exchange PFNs
instead of DMA addresses then that is perfectly fine.
The problems start when you define the semantics of how those PFNs, DMA
addresses, private bus addresses, whatever, are exchanged differently from
what we have documented for DMA-buf.
These semantics are very well defined for DMA-buf now, because that is
really important; otherwise things usually seem to work under testing
(e.g. without memory pressure) and then totally fall apart in production
environments.
In other words we have defined what lock you need to hold when calling
functions, what a DMA fence is, when exchanged addresses are valid etc...
> But fundamentally there's never been an expectation that you can take any
> arbitrary dma-buf and pass it to any arbitrary importer, and that it must
> work. The fundamental promise is that if it _does_ work, then
> - it's zero copy
> - and fast, or as fast as we can make it
>
> I don't see this as any different from all the much more specific proposals
> and existing code, where a subset of importers/exporters have special
> rules so that e.g. gpu interconnect or vfio uuid based sharing works.
> pfn-based sharing is just yet another flavor that exists to get the max
> amount of speed out of interconnects.
Please take another look at what is proposed here. The function is
called dma_buf_get_pfn_*unlocked* !
This is not following DMA-buf semantics for exchanging addresses and
keeping them valid, but rather something more like userptrs.
Inserting PFNs into CPU (or probably also IOMMU) page tables has
different semantics than what DMA-buf usually does, because as soon as
the address is written into the page table it is made public. So you
need some kind of mechanism to make sure that this address you made public
stays valid as long as it is public.
The usual I/O operations we encapsulate with DMA fences have
fundamentally different semantics, because we have the lock which
enforces that stuff stays valid and then a DMA fence which notes
how long the stuff needs to stay valid for an operation to complete.
Regards,
Christian.
>
> Cheers, Sima
>
>> The semantics for things like pin vs revocable vs dynamic/moveable seems
>> similar, but that's basically it.
>>
>> As far as I know the TEE subsystem also represents its allocations as file
>> descriptors. If I'm not completely mistaken this use case most likely fits
>> better there.
>>
>>> I feel like this is small enough that m-l archives is good enough. For
>>> some of the bigger projects we do in graphics we sometimes create entries
>>> in our kerneldoc with wip design consensus and things like that. But
>>> feels like overkill here.
>>>
>>>> My general desire is to move all of RDMA's MR process away from
>>>> scatterlist and work using only the new DMA API. This will save *huge*
>>>> amounts of memory in common workloads and be the basis for non-struct
>>>> page DMA support, including P2P.
>>> Yeah a more memory efficient structure than the scatterlist would be
>>> really nice. That would even benefit the very special dma-buf exporters
>>> where you cannot get a pfn and only the dma_addr_t, although most of those
>>> (all maybe even?) have contig buffers, so your scatterlist has only one
>>> entry. But it would definitely be nice from a design pov.
>> Completely agree on that part.
>>
>> Scatterlists have some design flaws, especially mixing the input and output
>> parameters of the DMA API into the same structure.
>>
>> Additionally, DMA addresses are basically missing information about which bus
>> they belong to and details of how the access should be made (e.g. snoop vs
>> no-snoop etc.).
>>
>>> Aside: A way to more efficiently create compressed scatterlists would be
>>> neat too, because a lot of drivers hand-roll that and it's a bit brittle
>>> and kinda silly to duplicate. With compressed I mean just a single entry
>>> for a contig range, in practice thanks to huge pages/folios and allocators
>>> trying to hand out contig ranges if there's plenty of memory that saves a
>>> lot of memory too. But currently it's a bit a pain to construct these
>>> efficiently, mostly it's just a two-pass approach and then trying to free
>>> surplus memory or krealloc to fit. Also I don't have good ideas here, but
>>> dma-api folks might have some from looking at too many things that create
>>> scatterlists.
>> I mailed with Christoph about that a while back as well and we both agreed
>> that it would probably be a good idea to start defining a data structure to
>> better encapsulate DMA addresses.
>>
>> It's just that nobody had time for that yet and/or I wasn't looped in in the
>> final discussion about it.
>>
>> Regards,
>> Christian.
>>
>>> -Sima
On 21.06.24 at 00:02, Xu Yilun wrote:
> On Thu, Jan 16, 2025 at 04:13:13PM +0100, Christian König wrote:
>> On 15.01.25 at 18:09, Jason Gunthorpe wrote:
>>
>> On Wed, Jan 15, 2025 at 05:34:23PM +0100, Christian König wrote:
>>
>> Granted, let me try to improve this.
>> Here is a real world example of one of the issues we ran into and why
>> CPU mappings of importers are redirected to the exporter.
>> We have a good bunch of different exporters who track the CPU mappings
>> of their backing store using address_space objects in one way or
>> another and then uses unmap_mapping_range() to invalidate those CPU
>> mappings.
>> But when importers get the PFNs of the backing store they can look
>> behind the curtain and directly insert this PFN into the CPU page
>> tables.
>> We had literally tons of cases like this where drivers developers cause
>> access after free issues because the importer created a CPU mappings on
>> their own without the exporter knowing about it.
>> This is just one example of what we ran into. Additional to that
>> basically the whole synchronization between drivers was overhauled as
>> well because we found that we can't trust importers to always do the
>> right thing.
>>
>> But this, fundamentally, is importers creating attachments and then
>> *ignoring the lifetime rules of DMABUF*. If you created an attachment,
>> got a move and *ignored the move* because you put the PFN in your own
>> VMA, then you are not following the attachment lifetime rules!
>>
>> Move notify is solely for informing the importer that they need to
>> re-fresh their DMA mappings and eventually block for ongoing DMA to end.
>>
>> These semantics don't work well for CPU mappings because you need to hold
>> the reservation lock to make sure that the information stays valid, and you
>> can't hold a lock while returning from a page fault.
> Dealing with CPU mapping and resource invalidation is a little hard, but it is
> resolvable by using other types of locks. And I guess for now dma-buf
> exporters should always handle this CPU mapping vs. invalidation contention if
> they support mmap().
>
> Since it is resolvable, with some invalidation notification a decent importer
> could also handle the contention well.
That doesn't work like this.
See, page table updates under DMA-buf work by using the same locking
approach for both the validation and invalidation side. In other words
we hold the same lock while inserting and removing entries into/from the
page tables.
That this here should be an unlocked API means you can only use it with
pre-allocated and hard-pinned memory without any chance to invalidate it
while running. Otherwise you can never be sure of the validity of the
address information you got from the exporter.
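
A minimal sketch of that locking pattern; only struct my_importer and the
importer_* page-table helpers are hypothetical, the dma-buf/dma-resv calls
are used as they exist today:

struct my_importer {
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;
};

static int importer_map(struct my_importer *imp)
{
        struct dma_resv *resv = imp->attach->dmabuf->resv;
        int ret = 0;

        dma_resv_lock(resv, NULL);
        imp->sgt = dma_buf_map_attachment(imp->attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(imp->sgt))
                ret = PTR_ERR(imp->sgt);
        else
                importer_fill_page_tables(imp);        /* hypothetical */
        dma_resv_unlock(resv);
        return ret;
}

/* the exporter calls this with the very same reservation lock already held */
static void importer_move_notify(struct dma_buf_attachment *attach)
{
        struct my_importer *imp = attach->importer_priv;

        dma_resv_assert_held(attach->dmabuf->resv);
        importer_zap_page_tables(imp);                 /* hypothetical */
        dma_buf_unmap_attachment(attach, imp->sgt, DMA_BIDIRECTIONAL);
        imp->sgt = NULL;
}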
> IIUC now the only concern is importer device drivers are easier to do
> something wrong, so move CPU mapping things to exporter. But most of the
> exporters are also device drivers, why they are smarter?
Exporters always use their invalidation code path no matter if they are
exporting their buffers for others to use or if they are standalone.
If you do the invalidation on the importer side you always need both
exporter and importer around to test it.
Additionally, we have many more importers than exporters. E.g. a
lot of simple drivers only import DMA-heap buffers and never export
anything.
> And there are increasing mapping needs: today exporters help handle the CPU primary
> mapping, tomorrow should they also help with all other mappings? Clearly that is
> not feasible. So maybe conditionally give trust to some importers.
Why should that be necessary? Exporters *must* know what somebody does
with their buffers.
If you have a use case the exporter doesn't support in its mapping
operation then that use case most likely doesn't work in the first place.
For example, direct I/O is enabled/disabled by exporters on their CPU
mappings based on whether that works correctly for them. An importer simply
doesn't know whether it should use vm_insert_pfn() or vm_insert_page().
We could of course implement that logic in each importer to choose
between the different approaches, but then each importer gains logic it
only exercises with a specific exporter. And that doesn't seem to be a
good idea at all.
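
A rough sketch of that decision in an exporter's own fault handler - struct
my_heap_buffer and its fields are invented for the example:

static vm_fault_t my_exporter_fault(struct vm_fault *vmf)
{
        struct my_heap_buffer *buf = vmf->vma->vm_private_data;

        if (vmf->pgoff >= buf->pagecount)
                return VM_FAULT_SIGBUS;

        if (buf->has_struct_pages)
                /* page-backed store: refcounted, GUP/direct I/O can work */
                return vmf_insert_page(vmf->vma, vmf->address,
                                       buf->pages[vmf->pgoff]);

        /* raw PFN (e.g. MMIO or carved-out memory): mapped as special */
        return vmf_insert_pfn(vmf->vma, vmf->address,
                              buf->pfns[vmf->pgoff]);
}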
Regards,
Christian.
>
> Thanks,
> Yilun
Hi Jyothi,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 55bcd2e0d04c1171d382badef1def1fd04ef66c5]
url: https://github.com/intel-lab-lkp/linux/commits/Jyothi-Kumar-Seerapu/dmaengi…
base: 55bcd2e0d04c1171d382badef1def1fd04ef66c5
patch link: https://lore.kernel.org/r/20250120095753.25539-3-quic_jseerapu%40quicinc.com
patch subject: [PATCH v5 2/2] i2c: i2c-qcom-geni: Add Block event interrupt support
config: arc-randconfig-001-20250120 (https://download.01.org/0day-ci/archive/20250120/202501202159.wLRVO16t-lkp@…)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250120/202501202159.wLRVO16t-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501202159.wLRVO16t-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/i2c/busses/i2c-qcom-geni.c:599: warning: Excess function parameter 'dev' description in 'geni_i2c_gpi_multi_desc_unmap'
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for OMAP2PLUS_MBOX
Depends on [n]: MAILBOX [=y] && (ARCH_OMAP2PLUS || ARCH_K3)
Selected by [m]:
- TI_K3_M4_REMOTEPROC [=m] && REMOTEPROC [=y] && (ARCH_K3 || COMPILE_TEST [=y])
vim +599 drivers/i2c/busses/i2c-qcom-geni.c
589
590 /**
591 * geni_i2c_gpi_multi_desc_unmap() - unmaps the buffers post multi message TX transfers
592 * @dev: pointer to the corresponding dev node
593 * @gi2c: i2c dev handle
594 * @msgs: i2c messages array
595 * @peripheral: pointer to the gpi_i2c_config
596 */
597 static void geni_i2c_gpi_multi_desc_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[],
598 struct gpi_i2c_config *peripheral)
> 599 {
600 u32 msg_xfer_cnt, wr_idx = 0;
601 struct geni_i2c_gpi_multi_desc_xfer *tx_multi_xfer = &gi2c->i2c_multi_desc_config;
602
603 /*
604 * In error case, need to unmap all messages based on the msg_idx_cnt.
605 * Non-error case unmap all the processed messages.
606 */
607 if (gi2c->err)
608 msg_xfer_cnt = tx_multi_xfer->msg_idx_cnt;
609 else
610 msg_xfer_cnt = tx_multi_xfer->irq_cnt * QCOM_I2C_GPI_NUM_MSGS_PER_IRQ;
611
612 /* Unmap the processed DMA buffers based on the received interrupt count */
613 for (; tx_multi_xfer->unmap_msg_cnt < msg_xfer_cnt; tx_multi_xfer->unmap_msg_cnt++) {
614 if (tx_multi_xfer->unmap_msg_cnt == gi2c->num_msgs)
615 break;
616 wr_idx = tx_multi_xfer->unmap_msg_cnt % QCOM_I2C_GPI_MAX_NUM_MSGS;
617 geni_i2c_gpi_unmap(gi2c, &msgs[tx_multi_xfer->unmap_msg_cnt],
618 tx_multi_xfer->dma_buf[wr_idx],
619 tx_multi_xfer->dma_addr[wr_idx],
620 NULL, (dma_addr_t)NULL);
621 tx_multi_xfer->freed_msg_cnt++;
622 }
623 }
624
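
A likely fix is simply dropping the stale @dev line from the kerneldoc,
since the function no longer takes a dev parameter, e.g.:

/**
 * geni_i2c_gpi_multi_desc_unmap() - unmaps the buffers post multi message TX transfers
 * @gi2c: i2c dev handle
 * @msgs: i2c messages array
 * @peripheral: pointer to the gpi_i2c_config
 */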
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Mon, Jun 24, 2024 at 03:59:53AM +0800, Xu Yilun wrote:
> > But it also seems to me that VFIO should be able to support putting
> > the device into the RUN state
>
> Firstly I think VFIO should support putting device into *LOCKED* state.
> From LOCKED to RUN, there are many evidence fetching and attestation
> things that only guest cares. I don't think VFIO needs to opt-in.
VFIO is not just about running VMs. If someone wants to run DPDK on
VFIO they should be able to get the device into a RUN state and work
with secure memory without requiring a KVM. Yes there are many steps
to this, but we should imagine how it can work.
> > without involving KVM or cVMs.
>
> It may not be feasible for all vendors.
It must be. A CC guest with an in-kernel driver can definitely get the
PCI device into RUN, so VFIO running in the guest should be able to as
well.
> I believe AMD would have one firmware call that requires cVM handle
> *AND* move device into LOCKED state. It really depends on firmware
> implementation.
IMHO, you would not use the secure firmware if you are not using VMs.
> Yes, the secure EPT is in the secure world and managed by TDX firmware.
> Now a SW Mirror Secure EPT is introduced in KVM and managed by KVM
> directly, and KVM will finally use firmware calls to propagate Mirror
> Secure EPT changes to secure EPT.
If the secure world managed it then the secure world can have rules
that work with the IOMMU as well..
Jason
On Fri, Jan 17, 2025 at 09:57:40AM +0800, Baolu Lu wrote:
> On 1/15/25 21:01, Jason Gunthorpe wrote:
> > On Wed, Jan 15, 2025 at 11:57:05PM +1100, Alexey Kardashevskiy wrote:
> > > On 15/1/25 00:35, Jason Gunthorpe wrote:
> > > > On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote:
> > > >
> > > > > > is needed so the secure world can prepare anything it needs prior to
> > > > > > starting the VM.
> > > > > OK. From Dan's patchset there are some touch point for vendor tsm
> > > > > drivers to do secure world preparation. e.g. pci_tsm_ops::probe().
> > > > >
> > > > > Maybe we could move to Dan's thread for discussion.
> > > > >
> > > > > https://lore.kernel.org/linux-
> > > > > coco/173343739517.1074769.13134786548545925484.stgit@dwillia2-
> > > > > xfh.jf.intel.com/
> > > > I think Dan's series is different, any uapi from that series should
> > > > not be used in the VMM case. We need proper vfio APIs for the VMM to
> > > > use. I would expect VFIO to be calling some of that infrastructure.
> > > Something like this experiment?
> > >
> > > https://github.com/aik/linux/commit/
> > > ce052512fb8784e19745d4cb222e23cabc57792e
> > Yeah, maybe, though I don't know which of vfio/iommufd/kvm should be
> > hosting those APIs, the above does seem to be a reasonable direction.
> >
> > When the various fds are closed I would expect the kernel to unbind
> > and restore the device back.
>
> I am curious about the value of tsm binding against an iommufd_vdevice
> instead of the physical iommufd_device.
Interesting question
> It is likely that the kvm pointer should be passed to iommufd during the
> creation of a viommu object.
Yes, I fully expect this
> If my recollection is correct, the arm
> smmu-v3 needs it to obtain the vmid to setup the userspace event queue:
Right now it will use a VMID unrelated to KVM. BTM support on ARM will
require syncing the VMID with KVM.
AMD and Intel may require the KVM for some reason as well.
For CC I'm expecting the KVM fd to be the handle for the cVM, so any
RPCs that want to call into the secure world need the KVM FD to get
the cVM's identifier. Ie a "bind to cVM" RPC will need the PCI
information and the cVM's handle.
From that perspective it does make sense that any cVM related APIs,
like "bind to cVM" would be against the VDEVICE where we have a link
to the VIOMMU which has the KVM. On the iommufd side the VIOMMU is
part of the object hierarchy, but does not necessarily have to force a
vIOMMU to appear in the cVM.
But it also seems to me that VFIO should be able to support putting
the device into the RUN state without involving KVM or cVMs.
> Intel TDX connect implementation also needs a reference to the kvm
> pointer to obtain the secure EPT information. This is crucial because
> the CPU's page table must be shared with the iommu.
I thought kvm folks were NAKing this sharing entirely? Or is the
secure EPT in the secure world and not directly managed by Linux?
AFAIK AMD is going to mirror the iommu page table like today.
ARM, I suspect, will not have an "EPT" under Linux control, so
whatever happens will be hidden in their secure world.
Jason
On 08.01.25 at 20:22, Xu Yilun wrote:
> On Wed, Jan 08, 2025 at 07:44:54PM +0100, Simona Vetter wrote:
>> On Wed, Jan 08, 2025 at 12:22:27PM -0400, Jason Gunthorpe wrote:
>>> On Wed, Jan 08, 2025 at 04:25:54PM +0100, Christian König wrote:
>>>> On 08.01.25 at 15:58, Jason Gunthorpe wrote:
>>>>> I have imagined a staged approach were DMABUF gets a new API that
>>>>> works with the new DMA API to do importer mapping with "P2P source
>>>>> information" and a gradual conversion.
>>>> To make it clear as maintainer of that subsystem I would reject such a step
>>>> with all I have.
>>> This is unexpected, so you want to just leave dmabuf broken? Do you
>>> have any plan to fix it, to fix the misuse of the DMA API, and all
>>> the problems I listed below? This is a big deal, it is causing real
>>> problems today.
>>>
>>> If it going to be like this I think we will stop trying to use dmabuf
>>> and do something simpler for vfio/kvm/iommufd :(
>> As the gal who help edit the og dma-buf spec 13 years ago, I think adding
>> pfn isn't a terrible idea. By design, dma-buf is the "everything is
>> optional" interface. And in the beginning, even consistent locking was
>> optional, but we've managed to fix that by now :-/
Well you were also the person who mangled the struct page pointers in
the scatterlist because people were abusing this and getting a bloody
nose :)
>> Where I do agree with Christian is that stuffing pfn support into the
>> dma_buf_attachment interfaces feels a bit much wrong.
> So it could a dmabuf interface like mmap/vmap()? I was also wondering
> about that. But finally I start to use dma_buf_attachment interface
> because of leveraging existing buffer pin and move_notify.
Exactly, that's the point: sharing PFNs doesn't work with the pin and
move_notify interfaces because of the MMU notifier approach Sima mentioned.
>>>> We have already gone down that road and it didn't worked at all and
>>>> was a really big pain to pull people back from it.
>>> Nobody has really seriously tried to improve the DMA API before, so I
>>> don't think this is true at all.
>> Aside, I really hope this finally happens!
Sorry, my fault. I was not talking about the DMA API, but rather about
people trying to look behind the curtain of DMA-buf backing stores.
In other words, all the fun we had with scatterlists and people trying
to modify the struct pages inside of them.
Improving the DMA API is something I really really hope for as well.
>>>>> 3) Importing devices need to know if they are working with PCI P2P
>>>>> addresses during mapping because they need to do things like turn on
>>>>> ATS on their DMA. As for multi-path we have the same hacks inside mlx5
>>>>> today that assume DMABUFs are always P2P because we cannot determine
>>>>> if things are P2P or not after being DMA mapped.
>>>> Why would you need ATS on PCI P2P and not for system memory accesses?
>>> ATS has a significant performance cost. It is mandatory for PCI P2P,
>>> but ideally should be avoided for CPU memory.
>> Huh, I didn't know that. And yeah kinda means we've butchered the pci p2p
>> stuff a bit I guess ...
Huh? Why should ATS be mandatory for PCI P2P?
We have tons of production systems using PCI P2P without ATS. And this is
the first time I've heard that.
>>>>> 5) iommufd and kvm are both using CPU addresses without DMA. No
>>>>> exporter mapping is possible
>>>> We have customers using both KVM and XEN with DMA-buf, so I can clearly
>>>> confirm that this isn't true.
>>> Today they are mmaping the dma-buf into a VMA and then using KVM's
>>> follow_pfn() flow to extract the CPU pfn from the PTE. Any mmapable
>>> dma-buf must have a CPU PFN.
>>>
>>> Here Xu implements basically the same path, except without the VMA
>>> indirection, and it suddenly not OK? Illogical.
>> So the big difference is that for follow_pfn() you need mmu_notifier since
>> the mmap might move around, whereas with pfn smashed into
>> dma_buf_attachment you need dma_resv_lock rules, and the move_notify
>> callback if you go dynamic.
>>
>> So I guess my first question is, which locking rules do you want here for
>> pfn importers?
> follow_pfn() is unwanted for private MMIO, so dma_resv_lock.
As Sima explained you either have follow_pfn() and mmu_notifier or you
have DMA addresses and dma_resv lock / dma_fence.
Just giving out PFNs without some lifetime associated with them is one
of the major problems we faced before and really not something you can do.
>> If mmu notifiers is fine, then I think the current approach of follow_pfn
>> should be ok. But if you instead dma_resv_lock rules (or the cpu mmap
>> somehow is an issue itself), then I think the clean design is to create a new
> cpu mmap() is an issue, this series is aimed to eliminate userspace
> mapping for private MMIO resources.
Why?
>> separate access mechanism just for that. It would be the 5th or so (kernel
>> vmap, userspace mmap, dma_buf_attach and driver private stuff like
>> virtio_dma_buf.c where you access your buffer with a uuid), so really not
>> a big deal.
> OK, will think more about that.
Please note that we have follow_pfn() + mmu_notifier working for KVM/XEN
with MMIO mappings and P2P. And that required exactly zero DMA-buf
changes :)
I don't fully understand your use case, but I think it's quite likely
that we already have that working.
Regards,
Christian.
>
> Thanks,
> Yilun
>
>> And for non-contrived exporters we might be able to implement the other
>> access methods in terms of the pfn method generically, so this wouldn't
>> even be a terrible maintenance burden going forward. And meanwhile all the
>> contrived exporters just keep working as-is.
>>
>> The other part is that cpu mmap is optional, and there's plenty of strange
>> exporters who don't implement. But you can dma map the attachment into
>> plenty devices. This tends to mostly be a thing on SoC devices with some
>> very funky memory. But I guess you don't care about these use-case, so
>> should be ok.
>>
>> I couldn't come up with a good name for these pfn users, maybe
>> dma_buf_pfn_attachment? This does _not_ have a struct device, but maybe
>> some of these new p2p source specifiers (or a list of those which are
>> allowed, no idea how this would need to fit into the new dma api).
>>
>> Cheers, Sima
>> --
>> Simona Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
On 16.01.25 at 02:46, Zhaoyang Huang wrote:
> On Wed, Jan 15, 2025 at 7:49 PM Christian König
> <christian.koenig(a)amd.com> wrote:
>> On 15.01.25 at 07:18, zhaoyang.huang wrote:
>>> From: Zhaoyang Huang<zhaoyang.huang(a)unisoc.com>
>>>
>>> When using dma-buf as a memory pool for a VMM, vmf_insert_pfn() will
>>> apply PTE_SPECIAL to the pte, which makes vm_normal_page() report a
>>> bad_pte and return NULL. This commit suggests replacing
>>> vmf_insert_pfn() with vmf_insert_page().
>> Setting PTE_SPECIAL is completely intentional here to prevent
>> get_user_pages() from working on DMA-buf mappings.
> ok. May I ask the reason?
Drivers using this interface own the backing store for their specific
use cases. There are a couple of things get_user_pages(),
pin_user_pages(), direct I/O etc. do which usually clash with those use
cases. So that is intentionally completely disabled.
We have the possibility to create a DMA-buf from memfd object and you
can then do direct I/O to the memfd and still use the DMA-buf with GPUs
or V4L for example.
>> So absolutely clear NAK to this patch here.
>>
>> What exactly are you trying to do?
> I would like pKVM to have the guest kernel fault in its second-stage
> page fault (ARM64's memory virtualization method) on a dma-buf which
> uses pin_user_pages().
Yeah, exactly, that's one of the use cases which we intentionally prevent
here.
The backing stores drivers use don't care about the pin count of the
memory and happily give it back to memory pools and/or swap it with
device-local memory if necessary.
When this happens the ARM VM wouldn't be informed of the change and
would potentially access the wrong address.
So sorry, but this approach won't work.
You could try with the memfd+DMA-buf approach I mentioned earlier, but
that won't give you all functionality on all DMA-buf supporting devices.
For example GPUs usually can't scan out to a monitor from such buffers
because of hardware limitations.
Regards,
Christian.
>> Regards,
>> Christian.
>>
>>> [ 103.402787] kvm [5276]: gfn(ipa)=0x80000 hva=0x7d4a400000 write_fault=0
>>> [ 103.403822] BUG: Bad page map in process crosvm_vcpu0 pte:168000140000f43 pmd:8000000c1ca0003
>>> [ 103.405144] addr:0000007d4a400000 vm_flags:040400fb anon_vma:0000000000000000 mapping:ffffff8085163df0 index:0
>>> [ 103.406536]file:dmabuf fault:cma_heap_vm_fault [cma_heap] mmap:dma_buf_mmap_internal read_folio:0x0
>>> [ 103.407877] CPU: 3 PID: 5276 Comm: crosvm_vcpu0 Tainted: G W OE 6.6.46-android15-8-g8bab72b63c20-dirty-4k #1 1e474a12dac4553a3ebba3a911f3b744176a5d2d
>>> [ 103.409818] Hardware name: Unisoc UMS9632-base Board (DT)
>>> [ 103.410613] Call trace:
>>> [ 103.411038] dump_backtrace+0xf4/0x140
>>> [ 103.411641] show_stack+0x20/0x30
>>> [ 103.412184] dump_stack_lvl+0x60/0x84
>>> [ 103.412766] dump_stack+0x18/0x24
>>> [ 103.413304] print_bad_pte+0x1b8/0x1cc
>>> [ 103.413909] vm_normal_page+0xc8/0xd0
>>> [ 103.414491] follow_page_pte+0xb0/0x304
>>> [ 103.415096] follow_page_mask+0x108/0x240
>>> [ 103.415721] __get_user_pages+0x168/0x4ac
>>> [ 103.416342] __gup_longterm_locked+0x15c/0x864
>>> [ 103.417023] pin_user_pages+0x70/0xcc
>>> [ 103.417609] pkvm_mem_abort+0xf8/0x5c0
>>> [ 103.418207] kvm_handle_guest_abort+0x3e0/0x3e4
>>> [ 103.418906] handle_exit+0xac/0x33c
>>> [ 103.419472] kvm_arch_vcpu_ioctl_run+0x48c/0x8d8
>>> [ 103.420176] kvm_vcpu_ioctl+0x504/0x5bc
>>> [ 103.420785] __arm64_sys_ioctl+0xb0/0xec
>>> [ 103.421401] invoke_syscall+0x60/0x11c
>>> [ 103.422000] el0_svc_common+0xb4/0xe8
>>> [ 103.422590] do_el0_svc+0x24/0x30
>>> [ 103.423131] el0_svc+0x3c/0x70
>>> [ 103.423640] el0t_64_sync_handler+0x68/0xbc
>>> [ 103.424288] el0t_64_sync+0x1a8/0x1ac
>>>
>>> Signed-off-by: Xiwei Wang<xiwei.wang1(a)unisoc.com>
>>> Signed-off-by: Aijun Sun<aijun.sun(a)unisoc.com>
>>> Signed-off-by: Zhaoyang Huang<zhaoyang.huang(a)unisoc.com>
>>> ---
>>> drivers/dma-buf/heaps/cma_heap.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
>>> index c384004b918e..b301fb63f16b 100644
>>> --- a/drivers/dma-buf/heaps/cma_heap.c
>>> +++ b/drivers/dma-buf/heaps/cma_heap.c
>>> @@ -168,7 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
>>> if (vmf->pgoff > buffer->pagecount)
>>> return VM_FAULT_SIGBUS;
>>>
>>> - return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
>>> + return vmf_insert_page(vma, vmf->address, buffer->pages[vmf->pgoff]);
>>> }
>>>
>>> static const struct vm_operations_struct dma_heap_vm_ops = {
On Wed, Jan 15, 2025 at 09:55:29AM +0100, Simona Vetter wrote:
> I think for 90% of exporters pfn would fit, but there's some really funny
> ones where you cannot get a cpu pfn by design. So we need to keep the
> pfn-less interfaces around. But ideally for the pfn-capable exporters we'd
> have helpers/common code that just implements all the other interfaces.
There is no way to have a DMA address without a PFN in Linux right now.
How would you generate them? That implies you have an IOMMU that can
generate IOVAs for something that doesn't have a physical address at
all.
Or do you mean some that don't have pages associated with them, and
thus have pfn_valid fail on them? They still have a PFN, just not
one that is valid to use in most of the Linux MM.
On Wed, Jan 15, 2025 at 11:57:05PM +1100, Alexey Kardashevskiy wrote:
> On 15/1/25 00:35, Jason Gunthorpe wrote:
> > On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote:
> >
> > > > is needed so the secure world can prepare anything it needs prior to
> > > > starting the VM.
> > >
> > > OK. From Dan's patchset there are some touch point for vendor tsm
> > > drivers to do secure world preparation. e.g. pci_tsm_ops::probe().
> > >
> > > Maybe we could move to Dan's thread for discussion.
> > >
> > > https://lore.kernel.org/linux-coco/173343739517.1074769.1313478654854592548…
> >
> > I think Dan's series is different, any uapi from that series should
> > not be used in the VMM case. We need proper vfio APIs for the VMM to
> > use. I would expect VFIO to be calling some of that infrastructure.
>
> Something like this experiment?
>
> https://github.com/aik/linux/commit/ce052512fb8784e19745d4cb222e23cabc57792e
Yeah, maybe, though I don't know which of vfio/iommufd/kvm should be
hosting those APIs, the above does seem to be a reasonable direction.
When the various fds are closed I would expect the kernel to unbind
and restore the device back.
Jason
On 15.01.25 at 09:55, Simona Vetter wrote:
>>> If we add something
>>> new, we need clear rules and not just "here's the kvm code that uses it".
>>> That's how we've done dma-buf at first, and it was a terrible mess of
>>> mismatched expectations.
>> Yes, that would be wrong. It should be self defined within dmabuf and
>> kvm should adopt to it, move semantics and all.
> Ack.
>
> I feel like we have a plan here.
I think I have to object a bit on that.
> Summary from my side:
>
> - Sort out pin vs revocable vs dynamic/moveable semantics, make sure
> importers have no surprises.
>
> - Adopt whatever new dma-api datastructures pops out of the dma-api
> reworks.
>
> - Add pfn based memory access as yet another optional access method, with
> helpers so that exporters who support this get all the others for free.
>
> I don't see a strict ordering between these, imo should be driven by
> actual users of the dma-buf api.
>
> Already done:
>
> - dmem cgroup so that we can resource control device pinnings just landed
> in drm-next for next merge window. So that part is imo sorted and we can
> charge ahead with pinning into device memory without all the concerns
> we've had years ago when discussing that for p2p dma-buf support.
>
> But there might be some work so that we can p2p pin without requiring
> dynamic attachments only, I haven't checked whether that needs
> adjustment in dma-buf.c code or just in exporters.
>
> Anything missing?
Well as far as I can see this use case is not a good fit for the DMA-buf
interfaces in the first place. DMA-buf deals with devices and buffer
exchange.
What's necessary here instead is to give an importing VM full access on
some memory for their specific use case.
That full access includes CPU and DMA mappings, modifying caching
attributes, potentially setting encryption keys for specific ranges
etc.... etc...
In other words we have a lot of things the importer here should be able
to do which we don't want most of the DMA-buf importers to do.
The semantics for things like pin vs revocable vs dynamic/moveable seem
similar, but that's basically it.
As far as I know the TEE subsystem also represents its allocations as
file descriptors. If I'm not completely mistaken this use case most
likely fits better there.
> I feel like this is small enough that m-l archives is good enough. For
> some of the bigger projects we do in graphics we sometimes create entries
> in our kerneldoc with wip design consensus and things like that. But
> feels like overkill here.
>
>> My general desire is to move all of RDMA's MR process away from
>> scatterlist and work using only the new DMA API. This will save *huge*
>> amounts of memory in common workloads and be the basis for non-struct
>> page DMA support, including P2P.
> Yeah a more memory efficient structure than the scatterlist would be
> really nice. That would even benefit the very special dma-buf exporters
> where you cannot get a pfn and only the dma_addr_t, although most of those
> (all maybe even?) have contig buffers, so your scatterlist has only one
> entry. But it would definitely be nice from a design pov.
Completely agree on that part.
Scatterlists have some design flaws, especially mixing the input and
output parameters of the DMA API into the same structure.
Additionally, DMA addresses are basically missing information about which
bus they belong to and details of how the access should be made (e.g. snoop
vs no-snoop etc.).
> Aside: A way to more efficiently create compressed scatterlists would be
> neat too, because a lot of drivers hand-roll that and it's a bit brittle
> and kinda silly to duplicate. With compressed I mean just a single entry
> for a contig range, in practice thanks to huge pages/folios and allocators
> trying to hand out contig ranges if there's plenty of memory that saves a
> lot of memory too. But currently it's a bit a pain to construct these
> efficiently, mostly it's just a two-pass approach and then trying to free
> surplus memory or krealloc to fit. Also I don't have good ideas here, but
> dma-api folks might have some from looking at too many things that create
> scatterlists.
I mailed with Christoph about that a while back as well and we both
agreed that it would probably be a good idea to start defining a data
structure to better encapsulate DMA addresses.
It's just that nobody had time for that yet and/or I wasn't looped in in
the final discussion about it.
Regards,
Christian.
> -Sima
On Tue, Jan 14, 2025 at 03:44:04PM +0100, Simona Vetter wrote:
> E.g. if a compositor gets a dma-buf it assumes that by just binding that
> it will not risk gpu context destruction (unless you're out of memory and
> everything is on fire anyway, and it's ok to die). But if a nasty client
> app supplies a revocable dma-buf, then it can shoot down the higher
> privileged compositor gpu workload with precision. Which is not great, so
> maybe existing dynamic gpu importers should reject revocable dma-buf.
> That's at least what I had in mind as a potential issue.
I see, so it is not that they can't handle a non-present fault, it is
just that the non-present fault effectively turns into a crash of the
context and you want to avoid the crash. It makes sense to me to
negotiate this as part of the API.
> > This is similar to the structure BIO has, and it composes nicely with
> > a future pin_user_pages() and memfd_pin_folios().
>
> Since you mention pin here, I think that's another aspect of the revocable
> vs dynamic question. Dynamic buffers are expected to sometimes just move
> around for no reason, and importers must be able to cope.
Yes, and we have importers that can tolerate dynamic and those that
can't. Though those that can't tolerate it can often implement revoke.
I view your list as a cascade:
1) Fully pinned, can never be changed so long as the attachment is present
2) Fully pinned, but can be revoked. Revoked is a fatal condition and
the importer is allowed to experience an error
3) Fully dynamic and always present. Support for move, and
restartable fault, is required
Today in RDMA we ask the exporter if it is 1 or 3 and allow different
things. I've seen the GPU side start to offer 1 more often as it has
significant performance wins.
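
Purely as an illustration, that cascade could one day be spelled out as an
explicit, negotiable attachment property - none of these names exist today:

enum dma_buf_attach_lifetime {
        DMA_BUF_LIFETIME_PINNED,        /* 1) fixed while the attachment exists */
        DMA_BUF_LIFETIME_REVOCABLE,     /* 2) pinned, but revoke is a fatal error */
        DMA_BUF_LIFETIME_DYNAMIC,       /* 3) move_notify + restartable faults */
};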
> For revocable exporters/importers I'd expect that movement is not
> happening, meaning it's pinned until the single terminal revocation. And
> maybe I read the kvm stuff wrong, but it reads more like the latter to me
> when crawling through the pfn code.
kvm should be fully faultable and it should be able to handle move. It
handles move today using the mmu notifiers after all.
kvm would need to interact with the dmabuf reservations on its page
fault path.
iommufd cannot be faultable and it would only support revoke. For VFIO
revoke would not be fully terminal as VFIO can unrevoke too
(sigh). If we make revoke special I'd like to eventually include
unrevoke for this reason.
> Once we have the lifetime rules nailed then there's the other issue of how
> to describe the memory, and my take for that is that once the dma-api has
> a clear answer we'll just blindly adopt that one and done.
This is what I hope, we are not there yet, first Leon's series needs
to get merged then we can start on making the DMA API P2P safe without
any struct page. From there it should be clear what direction things
go in.
DMABUF would return pfns annotated with whatever matches the DMA API,
and the importer would be able to inspect the PFNs to learn
information like their P2Pness, CPU mappability or whatever.
I'm pushing for the extra struct, and Christoph has been thinking
about searching a maple tree on the PFN. We need to see what works best.
> And currently with either dynamic attachments and dma_addr_t or through
> fishing the pfn from the cpu pagetables there's some very clearly defined
> lifetime and locking rules (which kvm might get wrong, I've seen some
> discussions fly by where it wasn't doing a perfect job with reflecting pte
> changes, but that was about access attributes iirc).
Wouldn't surprise me, mmu notifiers are very complex all around. We've
had bugs already where the mm doesn't signal the notifiers at the
right points.
> If we add something
> new, we need clear rules and not just "here's the kvm code that uses it".
> That's how we've done dma-buf at first, and it was a terrible mess of
> mismatched expecations.
Yes, that would be wrong. It should be self defined within dmabuf and
kvm should adopt to it, move semantics and all.
My general desire is to move all of RDMA's MR process away from
scatterlist and work using only the new DMA API. This will save *huge*
amounts of memory in common workloads and be the basis for non-struct
page DMA support, including P2P.
Jason
On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote:
> > is needed so the secure world can prepare anything it needs prior to
> > starting the VM.
>
> OK. From Dan's patchset there are some touch point for vendor tsm
> drivers to do secure world preparation. e.g. pci_tsm_ops::probe().
>
> Maybe we could move to Dan's thread for discussion.
>
> https://lore.kernel.org/linux-coco/173343739517.1074769.1313478654854592548…
I think Dan's series is different, any uapi from that series should
not be used in the VMM case. We need proper vfio APIs for the VMM to
use. I would expect VFIO to be calling some of that infrastructure.
Really, I don't have a clear sense of how this will look yet. AMD
provided some patches along these lines; I have not seen ARM and Intel
proposals yet, nor do I sense there is alignment.
> > Setting up secure vIOMMU emulation, for instance. I
>
> I think this could be done at VM late bind time.
The vIOMMU needs to be set up before the VM boots
> > secure. This should all be pre-arranged as possible before starting
>
> But our current implementation is not to prepare as much as possible,
> but only necessary, so most of the secure work for vPCI function is done
> at late bind time.
That's fine too, but both options need to be valid.
Jason
On Sat, Jan 11, 2025 at 11:48:06AM +0800, Xu Yilun wrote:
> > > > can be sure what is the correct UAPI. In other words, make the
> > > > VFIO device into a CC device should also prevent mmaping it and so on.
> > >
> > > My idea is prevent mmaping first, then allow VFIO device into CC dev (TDI).
> >
> > I think you need to start the TDI process much earlier. Some arches
> > are going to need work to prepare the TDI before the VM is started.
>
> Could you elaborate more on that? AFAICS Intel & AMD are all good on
> "late bind", but not sure for other architectures.
I'm not sure about this, the topic has been confused a bit, and people
often seem to misunderstand what the full scenario actually is. :\
What I'm talking about here is that you will tell the secure world to
create a vPCI function that has the potential to be secure "TDI run"
down the road. The VM will decide when it reaches the run state. This
is needed so the secure world can prepare anything it needs prior to
starting the VM. Setting up secure vIOMMU emulation, for instance. I
expect ARM will need this, I'd be surprised if AMD actually doesn't in
the full scenario with secure viommu.
It should not be a surprise to the secure world after the VM has
started that suddenly it learns about a vPCI function that wants to be
secure. This should all be pre-arranged as far as possible before starting
the VM, even if a lot of steps happen after the VM starts running (or
maybe don't happen at all).
Jason
On Fri, Jan 10, 2025 at 08:34:55PM +0100, Simona Vetter wrote:
> So if I'm getting this right, what you need from a functional pov is a
> dma_buf_tdx_mmap? Because due to tdx restrictions, the normal dma_buf_mmap
> is not going to work I guess?
Don't want something TDX specific!
There is a general desire, and CC is one, but there are other
motivations like performance, to stop using VMAs and mmaps as a way to
exchange memory between two entities. Instead we want to use FDs.
We now have memfd and guestmemfd that are usable with
memfd_pin_folios() - this covers pinnable CPU memory.
And for a long time we had DMABUF which is for all the other wild
stuff, and it supports movable memory too.
So, the normal DMABUF semantics with reservation locking and move
notifiers seem workable to me here. They are broadly similar enough to
the mmu notifier locking that they can serve the same job of updating
page tables.
> Also another thing that's a bit tricky is that kvm kinda has a 3rd dma-buf
> memory model:
> - permanently pinned dma-buf, they never move
> - dynamic dma-buf, they move through ->move_notify and importers can remap
> - revocable dma-buf, which thus far only exist for pci mmio resources
I would like to see the importers be able to discover which one is
going to be used, because we have RDMA cases where we can support 1
and 3 but not 2.
revocable doesn't require page faulting as it is a terminal condition.
> Since we're leaning even more on that 3rd model I'm wondering whether we
> should make it something official. Because the existing dynamic importers
> do very much assume that re-acquiring the memory after move_notify will
> work. But for the revocable use-case the entire point is that it will
> never work.
> I feel like that's a concept we need to make explicit, so that dynamic
> importers can reject such memory if necessary.
It strikes me as strange that HW can do page faulting, so it can
support #2, but it can't handle a non-present fault?
> So yeah there's a bunch of tricky lifetime questions that need to be
> sorted out with proper design I think, and the current "let's just use pfn
> directly" proposal hides them all under the rug.
I don't think these two things are connected. The lifetime model that
KVM needs to work with the EPT, and that VFIO needs for its MMIO,
definitely should be reviewed and evaluated.
But it is completely orthogonal to allowing iommufd and kvm to access
the CPU PFN to use in their mapping flows, instead of the
dma_addr_t.
What I want to get to is a replacement for scatter list in DMABUF that
is an array of arrays, roughly like:
struct memory_chunks {
        struct memory_p2p_provider *provider;
        struct bio_vec addrs[];
};

int (*dmabuf_get_memory)(struct memory_chunks **chunks, size_t *num_chunks);
This can represent all forms of memory: P2P, private, CPU, etc and
would be efficient with the new DMA API.
This is similar to the structure BIO has, and it composes nicely with
a future pin_user_pages() and memfd_pin_folios().
Jason
On Fri, Jan 10, 2025 at 08:24:22PM +0100, Simona Vetter wrote:
> On Thu, Jan 09, 2025 at 01:56:02AM +0800, Xu Yilun wrote:
> > > > > 5) iommufd and kvm are both using CPU addresses without DMA. No
> > > > > exporter mapping is possible
> > > >
> > > > We have customers using both KVM and XEN with DMA-buf, so I can clearly
> > > > confirm that this isn't true.
> > >
> > > Today they are mmaping the dma-buf into a VMA and then using KVM's
> > > follow_pfn() flow to extract the CPU pfn from the PTE. Any mmapable
> > > dma-buf must have a CPU PFN.
> >
> > Yes, the final target for KVM is still the CPU PFN, just with the help
> > of CPU mapping table.
> >
> > I also found the xen gntdev-dmabuf is calculating pfn from mapped
> > sgt.
>
> See the comment, it's ok because it's a fake device with fake iommu and
> the xen code has special knowledge to peek behind the curtain.
/*
 * Now convert sgt to array of gfns without accessing underlying pages.
 * It is not allowed to access the underlying struct page of an sg table
 * exported by DMA-buf, but since we deal with special Xen dma device here
 * (not a normal physical one) look at the dma addresses in the sg table
 * and then calculate gfns directly from them.
 */
for_each_sgtable_dma_page(sgt, &sg_iter, 0) {
        dma_addr_t addr = sg_page_iter_dma_address(&sg_iter);
        unsigned long pfn = bfn_to_pfn(XEN_PFN_DOWN(dma_to_phys(dev, addr)));
*barf*
Can we please all agree that is a horrible abuse of the DMA API and
lets not point it as some acceptable "solution"? KVM and iommufd do
not have fake struct devices with fake iommus.
Jason
On Fri, Jan 10, 2025 at 12:40:28AM +0800, Xu Yilun wrote:
> So then we face with the shared <-> private device conversion in CoCo VM,
> and in turn shared <-> private MMIO conversion. MMIO region has only one
> physical backend so it is a bit like in-place conversion which is
> complicated. I wanna simply the MMIO conversion routine based on the fact
> that VMM never needs to access assigned MMIO for feature emulation, so
> always disallow userspace MMIO mapping during the whole lifecycle. That's
> why the flag is introduced.
The VMM can simply not map it for these cases. As part of the TDI
flow the kernel can validate it is not mapped.
> > can be sure what is the correct UAPI. In other words, make the
> > VFIO device into a CC device should also prevent mmaping it and so on.
>
> My idea is to prevent mmaping first, then allow turning the VFIO device into a CC dev (TDI).
I think you need to start the TDI process much earlier. Some arches
are going to need work to prepare the TDI before the VM is started.
The other issue here is that Intel is somewhat different from others
and when we build uapi for TDI it has to accommodate everyone.
> Yes. It carries out the idea of "KVM maps MMIO resources without first
> mapping them into the host" even for a normal VM. That's why I think it could
> be an independent patchset.
Yes, just remove this patch and the other TDI-focused stuff. Keep just the
infrastructure to move to FD-based mapping instead of a VMA.
Jason
On Thu, Jan 09, 2025 at 12:57:58AM +0800, Xu Yilun wrote:
> On Wed, Jan 08, 2025 at 09:30:26AM -0400, Jason Gunthorpe wrote:
> > On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote:
> > > Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as
> > > for private assignment. For these private assigned devices, disallow
> > > host accessing their MMIO resources.
> >
> > Why? Shouldn't the VMM simply not call mmap? Why does the kernel have
> > to enforce this?
>
> Hmm... maybe I should not say 'host', but rather 'userspace'.
>
> I think the kernel part of the VMM (KVM) has the responsibility to enforce the
> correct behavior of the userspace part of the VMM (QEMU). QEMU has no way to
> touch private memory/MMIO, intentionally or accidentally. IIUC that's one
> of the motivations guest_memfd was introduced for private memory. Private
> MMIO follows.
Okay, but then why is it a flag like that? I'm expecting a much
broader system here to make the VFIO device into a confidential device
(like setting up the TDI) where we'd have to enforce the private things,
communicate with some secure world to assign it, and so on.
I want to see a fuller solution to the CC problem in VFIO before we
can be sure what is the correct UAPI. In other words, making the
VFIO device into a CC device should also prevent mmaping it and so on.
So, I would take this out and defer VFIO enforcement to a series which
does fuller CC enablement of VFIO.
The precursor work should just be avoiding requiring a VMA when
installing VFIO MMIO into the KVM and IOMMU stage 2 mappings. Ie by
using a FD to get the CPU pfns into iommufd and kvm as you are
showing.
This works just fine for non-CC devices anyhow and is the necessary
building block for making a TDI interface in VFIO.
Jason
Am 09.01.25 um 14:37 schrieb Philipp Stanner:
> From: Philipp Stanner <pstanner(a)redhat.com>
>
> drm_sched_backend_ops.run_job() returns a dma_fence for the scheduler.
> That fence is signalled by the driver once the hardware completed the
> associated job. The scheduler does not increment the reference count on
> that fence, but implicitly expects to inherit this fence from run_job().
>
> This is relatively subtle and prone to misunderstandings.
>
> This implies that, to keep a reference for itself, a driver needs to
> call dma_fence_get() in addition to dma_fence_init() in that callback.
>
> It's further complicated by the fact that the scheduler even decrements
> the refcount in drm_sched_run_job_work() since it created a new
> reference in drm_sched_fence_scheduled(). It does, however, still use
> its pointer to the fence after calling dma_fence_put() - which is safe
> because of the aforementioned new reference, but actually still violates
> the refcounting rules.
>
> Improve the explanatory comment for that decrement.
>
> Move the call to dma_fence_put() to the position behind the last usage
> of the fence.
>
> Document the necessity to increment the reference count in
> drm_sched_backend_ops.run_job().
>
> Signed-off-by: Philipp Stanner <pstanner(a)redhat.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
> include/drm/gpu_scheduler.h | 19 +++++++++++++++----
> 2 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 57da84908752..5f46c01eb01e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1218,15 +1218,19 @@ static void drm_sched_run_job_work(struct work_struct *w)
> drm_sched_fence_scheduled(s_fence, fence);
>
> if (!IS_ERR_OR_NULL(fence)) {
> - /* Drop for original kref_init of the fence */
> - dma_fence_put(fence);
> -
> r = dma_fence_add_callback(fence, &sched_job->cb,
> drm_sched_job_done_cb);
> if (r == -ENOENT)
> drm_sched_job_done(sched_job, fence->error);
> else if (r)
> DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
> +
> + /*
> + * s_fence took a new reference to fence in the call to
> + * drm_sched_fence_scheduled() above. The reference passed by
> + * run_job() above is now not needed any longer. Drop it.
> + */
> + dma_fence_put(fence);
> } else {
> drm_sched_job_done(sched_job, IS_ERR(fence) ?
> PTR_ERR(fence) : 0);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 95e17504e46a..d5cd2a78f27c 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -420,10 +420,21 @@ struct drm_sched_backend_ops {
> struct drm_sched_entity *s_entity);
>
> /**
> - * @run_job: Called to execute the job once all of the dependencies
> - * have been resolved. This may be called multiple times, if
> - * timedout_job() has happened and drm_sched_job_recovery()
> - * decides to try it again.
> + * @run_job: Called to execute the job once all of the dependencies
> + * have been resolved. This may be called multiple times, if
> + * timedout_job() has happened and drm_sched_job_recovery() decides to
> + * try it again.
I just came to realize that this hack of calling run_job multiple
times won't work any more with this patch here.
The background is that you can't allocate memory for a newly returned
fence, and as far as I know no driver pre-allocates multiple HW fences
for a job.
So at least amdgpu used to re-use the same HW fence as the return value
over and over again, just re-initializing the reference count. I removed that
hack from amdgpu, but just FYI it could be that other drivers did the same.
Apart from that concern I think that this patch is really the right
thing and that driver hacks relying on the order of dropping references
are fundamentally broken approaches.
So feel free to add Reviewed-by: Christian König <christian.koenig(a)amd.com>.
Regards,
Christian.
> + *
> + * @sched_job: the job to run
> + *
> + * Returns: dma_fence the driver must signal once the hardware has
> + * completed the job ("hardware fence").
> + *
> + * Note that the scheduler expects to 'inherit' its own reference to
> + * this fence from the callback. It does not invoke an extra
> + * dma_fence_get() on it. Consequently, this callback must take a
> + * reference for the scheduler, and additional ones for the driver's
> + * respective needs.
> */
> struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
>
Am 07.01.25 um 15:27 schrieb Xu Yilun:
> Introduce a new API for dma-buf importers, and also add a dma_buf_ops
> callback for dma-buf exporters. This API is for subsystem importers who
> map the dma-buf to some user defined address space, e.g. for IOMMUFD to
> map the dma-buf to userspace IOVA via IOMMU page table, or for KVM to
> map the dma-buf to GPA via KVM MMU (e.g. EPT).
>
> Currently dma-buf is only used to get the DMA address for a device's default
> domain by using the kernel DMA APIs. But for these new use-cases, importers
> only need the pfn of the dma-buf resource to build their own mapping
> tables.
As far as I can see I have to fundamentally reject this whole approach.
It's an intentional DMA-buf design decision that we don't expose struct pages or
PFNs to the importer. Essentially DMA-buf only transports DMA addresses.
In other words the mapping is done by the exporter and *not* the importer.
What we certainly can do is annotate those DMA addresses to better
specify in which domain they are applicable, e.g. whether they are PCIe bus
addresses or some inter-device bus addresses etc...
But moving the functionality to map the pages/PFNs to DMA addresses into
the importer is an absolutely clear NO-GO.
Regards,
Christian.
> So the map_dma_buf() callback is not mandatory for exporters
> anymore. Also the importers could choose not to provide
> struct device *dev on dma_buf_attach() if they don't call
> dma_buf_map_attachment().
>
> Like dma_buf_map_attachment(), the importer should first call
> dma_buf_attach/dynamic_attach() and then call dma_buf_get_pfn_unlocked().
> If the importer chooses to do dynamic attach, it should also handle the
> dma-buf move notification.
>
> Only the unlocked version of dma_buf_get_pfn is implemented for now,
> just because no locked version is used for now.
>
> Signed-off-by: Xu Yilun <yilun.xu(a)linux.intel.com>
>
> ---
> IIUC, only get_pfn() is needed but no put_pfn(). The whole dma-buf is
> de/referenced at dma-buf attach/detach time.
>
> Specifically, for static attachment, the exporter should always make the
> memory resource available/pinned on the first dma_buf_attach(), and
> release/unpin the memory resource on the last dma_buf_detach(). For dynamic
> attachment, the exporter could populate & invalidate the memory
> resource at any time; that's OK as long as the importers follow dma-buf
> move notification. So no pinning is needed for get_pfn() and no
> put_pfn() is needed.
> ---
> drivers/dma-buf/dma-buf.c | 90 +++++++++++++++++++++++++++++++--------
> include/linux/dma-buf.h | 13 ++++++
> 2 files changed, 86 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 7eeee3a38202..83d1448b6dcc 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -630,10 +630,10 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> size_t alloc_size = sizeof(struct dma_buf);
> int ret;
>
> - if (WARN_ON(!exp_info->priv || !exp_info->ops
> - || !exp_info->ops->map_dma_buf
> - || !exp_info->ops->unmap_dma_buf
> - || !exp_info->ops->release))
> + if (WARN_ON(!exp_info->priv || !exp_info->ops ||
> + (!!exp_info->ops->map_dma_buf != !!exp_info->ops->unmap_dma_buf) ||
> + (!exp_info->ops->map_dma_buf && !exp_info->ops->get_pfn) ||
> + !exp_info->ops->release))
> return ERR_PTR(-EINVAL);
>
> if (WARN_ON(exp_info->ops->cache_sgt_mapping &&
> @@ -909,7 +909,10 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> struct dma_buf_attachment *attach;
> int ret;
>
> - if (WARN_ON(!dmabuf || !dev))
> + if (WARN_ON(!dmabuf))
> + return ERR_PTR(-EINVAL);
> +
> + if (WARN_ON(dmabuf->ops->map_dma_buf && !dev))
> return ERR_PTR(-EINVAL);
>
> if (WARN_ON(importer_ops && !importer_ops->move_notify))
> @@ -941,7 +944,7 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> */
> if (dma_buf_attachment_is_dynamic(attach) !=
> dma_buf_is_dynamic(dmabuf)) {
> - struct sg_table *sgt;
> + struct sg_table *sgt = NULL;
>
> dma_resv_lock(attach->dmabuf->resv, NULL);
> if (dma_buf_is_dynamic(attach->dmabuf)) {
> @@ -950,13 +953,16 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> goto err_unlock;
> }
>
> - sgt = __map_dma_buf(attach, DMA_BIDIRECTIONAL);
> - if (!sgt)
> - sgt = ERR_PTR(-ENOMEM);
> - if (IS_ERR(sgt)) {
> - ret = PTR_ERR(sgt);
> - goto err_unpin;
> + if (dmabuf->ops->map_dma_buf) {
> + sgt = __map_dma_buf(attach, DMA_BIDIRECTIONAL);
> + if (!sgt)
> + sgt = ERR_PTR(-ENOMEM);
> + if (IS_ERR(sgt)) {
> + ret = PTR_ERR(sgt);
> + goto err_unpin;
> + }
> }
> +
> dma_resv_unlock(attach->dmabuf->resv);
> attach->sgt = sgt;
> attach->dir = DMA_BIDIRECTIONAL;
> @@ -1119,7 +1125,8 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
>
> might_sleep();
>
> - if (WARN_ON(!attach || !attach->dmabuf))
> + if (WARN_ON(!attach || !attach->dmabuf ||
> + !attach->dmabuf->ops->map_dma_buf))
> return ERR_PTR(-EINVAL);
>
> dma_resv_assert_held(attach->dmabuf->resv);
> @@ -1195,7 +1202,8 @@ dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
>
> might_sleep();
>
> - if (WARN_ON(!attach || !attach->dmabuf))
> + if (WARN_ON(!attach || !attach->dmabuf ||
> + !attach->dmabuf->ops->map_dma_buf))
> return ERR_PTR(-EINVAL);
>
> dma_resv_lock(attach->dmabuf->resv, NULL);
> @@ -1222,7 +1230,8 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
> {
> might_sleep();
>
> - if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
> + if (WARN_ON(!attach || !attach->dmabuf ||
> + !attach->dmabuf->ops->unmap_dma_buf || !sg_table))
> return;
>
> dma_resv_assert_held(attach->dmabuf->resv);
> @@ -1254,7 +1263,8 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
> {
> might_sleep();
>
> - if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
> + if (WARN_ON(!attach || !attach->dmabuf ||
> + !attach->dmabuf->ops->unmap_dma_buf || !sg_table))
> return;
>
> dma_resv_lock(attach->dmabuf->resv, NULL);
> @@ -1263,6 +1273,52 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
> }
> EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
>
> +/**
> + * dma_buf_get_pfn_unlocked -
> + * @attach: [in] attachment to get pfn from
> + * @pgoff: [in] page offset of the buffer against the start of dma_buf
> + * @pfn: [out] returns the pfn of the buffer
> + * @max_order [out] returns the max mapping order of the buffer
> + */
> +int dma_buf_get_pfn_unlocked(struct dma_buf_attachment *attach,
> + pgoff_t pgoff, u64 *pfn, int *max_order)
> +{
> + struct dma_buf *dmabuf = attach->dmabuf;
> + int ret;
> +
> + if (WARN_ON(!attach || !attach->dmabuf ||
> + !attach->dmabuf->ops->get_pfn))
> + return -EINVAL;
> +
> + /*
> + * Open:
> + *
> + * When dma_buf is dynamic but dma_buf move is disabled, the buffer
> + * should be pinned before use, See dma_buf_map_attachment() for
> + * reference.
> + *
> + * But for now no pin is intended inside dma_buf_get_pfn(), otherwise
> + * need another API to unpin the dma_buf. So just fail out this case.
> + */
> + if (dma_buf_is_dynamic(attach->dmabuf) &&
> + !IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY))
> + return -ENOENT;
> +
> + dma_resv_lock(attach->dmabuf->resv, NULL);
> + ret = dmabuf->ops->get_pfn(attach, pgoff, pfn, max_order);
> + /*
> + * Open:
> + *
> + * Is dma_resv_wait_timeout() needed? I assume no. The DMA buffer
> + * content synchronization could be done when the buffer is to be
> + * mapped by importer.
> + */
> + dma_resv_unlock(attach->dmabuf->resv);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_NS_GPL(dma_buf_get_pfn_unlocked, "DMA_BUF");
> +
> /**
> * dma_buf_move_notify - notify attachments that DMA-buf is moving
> *
> @@ -1662,7 +1718,7 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
> attach_count = 0;
>
> list_for_each_entry(attach_obj, &buf_obj->attachments, node) {
> - seq_printf(s, "\t%s\n", dev_name(attach_obj->dev));
> + seq_printf(s, "\t%s\n", attach_obj->dev ? dev_name(attach_obj->dev) : NULL);
> attach_count++;
> }
> dma_resv_unlock(buf_obj->resv);
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 36216d28d8bd..b16183edfb3a 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -194,6 +194,17 @@ struct dma_buf_ops {
> * if the call would block.
> */
>
> + /**
> + * @get_pfn:
> + *
> + * This is called by dma_buf_get_pfn(). It is used to get the pfn
> + * of the buffer positioned by the page offset against the start of
> + * the dma_buf. It can only be called if @attach has been called
> + * successfully.
> + */
> + int (*get_pfn)(struct dma_buf_attachment *attach, pgoff_t pgoff,
> + u64 *pfn, int *max_order);
> +
> /**
> * @release:
> *
> @@ -629,6 +640,8 @@ dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
> void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
> struct sg_table *sg_table,
> enum dma_data_direction direction);
> +int dma_buf_get_pfn_unlocked(struct dma_buf_attachment *attach,
> + pgoff_t pgoff, u64 *pfn, int *max_order);
>
> int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
> unsigned long);
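For reference, a rough sketch of the attach/get_pfn call sequence described in
the commit message above, as an importer such as iommufd or KVM might use it
(the example_* pieces are invented for illustration and error handling is
abbreviated):

/* Purely illustrative importer flow for the proposed API. */
static int example_import_pfns(struct dma_buf *dmabuf)
{
	struct dma_buf_attachment *attach;
	pgoff_t pgoff, nr_pages = dmabuf->size >> PAGE_SHIFT;
	u64 pfn;
	int max_order, ret = 0;

	/* No struct device needed when dma_buf_map_attachment() is never called. */
	attach = dma_buf_attach(dmabuf, NULL);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	for (pgoff = 0; pgoff < nr_pages; pgoff++) {
		ret = dma_buf_get_pfn_unlocked(attach, pgoff, &pfn, &max_order);
		if (ret)
			break;

		/*
		 * Install the pfn into the importer's own mapping (EPT, IOVA
		 * page table, ...); max_order could be used to install larger
		 * mappings and skip ahead.
		 */
		example_install_pfn(pgoff, pfn, max_order);
	}

	dma_buf_detach(dmabuf, attach);
	return ret;
}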
On Wed, 8 Jan 2025 at 22:27, Simona Vetter <simona.vetter(a)ffwll.ch> wrote:
>
> On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
> > Hi Simona,
> >
> > On Wed, 18 Dec 2024 at 16:36, Simona Vetter <simona.vetter(a)ffwll.ch> wrote:
> > >
> > > On Tue, Dec 17, 2024 at 11:07:36AM +0100, Jens Wiklander wrote:
> > > > Hi,
> > > >
> > > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> > > >
> > > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > > restrictions for the memory used for the DMA-bufs.
> > > >
> > > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > > how to allocate the restricted physical memory.
> > > >
> > > > TEE_IOC_RSTMEM_ALLOC takes in addition to a size and flags parameters also
> > > > a use-case parameter. This is used by the backend TEE driver to decide on
> > > > allocation policy and which devices should be able to access the memory.
> > > >
> > > > Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
> > > > Recording) have been identified so far to serve as examples of what can be
> > > > expected. More use-cases can be added in the userspace ABI, but it's up to
> > > > the backend TEE drivers to provide the implementation.
> > > >
> > > > Each use-case has its own restricted memory pool since different use-cases
> > > > require isolation from different parts of the system. A restricted memory
> > > > pool can be based on a static carveout instantiated while probing the TEE
> > > > backend driver, or dynamically allocated from CMA and made restricted as
> > > > needed by the TEE.
> > > >
> > > > This can be tested on QEMU with the following steps:
> > > > repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
> > > > -b prototype/sdp-v4
> > > > repo sync -j8
> > > > cd build
> > > > make toolchains -j$(nproc)
> > > > make SPMC_AT_EL=1 all -j$(nproc)
> > > > make SPMC_AT_EL=1 run-only
> > > > # login and at the prompt:
> > > > xtest --sdp-basic
> > > >
> > > > The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
> > > > S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
> > > > without FF-A using the original SMC ABI instead. Please remember to do
> > > > %rm -rf ../trusted-firmware-a/build/qemu
> > > > for TF-A to be rebuilt properly using the new configuration.
> > > >
> > > > https://optee.readthedocs.io/en/latest/building/prerequisites.html
> > > > list dependencies needed to build the above.
> > > >
> > > > The tests are pretty basic, mostly checking that a Trusted Application in
> > > > the secure world can access and manipulate the memory. There are also some
> > > > negative tests for out of bounds buffers etc.
> > >
> > > I think I've dropped this on earlier encrypted dma-buf discussions for
> > > TEE, but can't find one right now ...
> >
> > Thanks for raising this query.
> >
> > >
> > > Do we have some open source userspace for this? To my knowledge we have
> > > two implementations of encrypted/content protected dma-buf in upstream
> > > right now in the amd and intel gpu drivers, and unless I'm mistaken they
> > > both have some minimal userspace supporting EXT_protected_textures:
> >
> > First of all, to clarify: the support Jens is adding here for restricted
> > shared memory allocation in the TEE subsystem is meant to be
> > generic and not specific to only the secure media pipeline use-case. Here
> > we not only have open source test applications but also open
> > source firmware (OP-TEE as a Trusted OS) [1] supporting this as a
> > core feature, where we maintain a stable and extensible ABI between the
> > kernel and the OP-TEE core.
> >
> > Restricted memory is a feature enforced by hardware-specific firewalls
> > where a particular TEE implementation governs which particular block
> > of memory is accessible to a particular peripheral or a CPU running in
> > a higher privileged mode than the Linux kernel. There can be numerous
> > use-cases surrounding that, as follows:
> >
> > - Secure media pipeline where the content gets decrypted and stored
> > in a restricted buffer which is then accessible only to media display
> > pipeline peripherals.
> > - Trusted user interface where a peripheral takes input from the user
> > and stores it in a restricted buffer which then is accessible to TEE
> > implementation only.
> > - Another possible use-case can be for the TEE implementation to store
> > key material in a restricted buffer which is only accessible to the
> > hardware crypto accelerator.
> >
> > I am sure there will be more use-cases related to this feature, but
> > those will only be possible once we provide a stable and extensible
> > restricted memory interface between the Linux user-space and the secure
> > world user-space (normally referred to as Trusted Applications).
> >
> > [1] https://github.com/OP-TEE/optee_os/pull/7159
> >
> > >
> > > https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EX…
> > >
> > > It's not great, but it does just barely clear the bar in my opinion. I
> > > guess something in gstreamer or similar video pipeline framework would
> > > also do the job.
> > >
> > > Especially with the context of the uapi discussion in the v1/RFC thread I
> > > think we need more than a bare-bones testcase to make sure this works in
> > > actual use.
> >
> > Currently the TEE subsystem already supports a stable ABI for the shared
> > memory allocator between Linux user-space and secure world user-space,
> > see [2]. And the stable ABI for restricted memory is, along the
> > same lines, meant to be a vendor-neutral abstraction for user-space
> > access. The current test cases not only exercise the interface but also
> > serve as regression tests.
> >
> > I am also in favour of end-to-end open source use-cases. But I fear that
> > without progressing in a step-wise manner, as with this proposal, we
> > would rather force developers to upstream all the software pieces in
> > one go, which will be kind of a chicken-and-egg situation. I am sure
> > once this feature lands the Mediatek folks will be interested to port
> > their secure video playback patchset [3] on top of it. Similarly, other
> > silicon vendors like NXP, Qcom etc. will be motivated to do the same.
> >
> > [2] https://docs.kernel.org/userspace-api/tee.html
> > [3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@med…
>
> We get entire opengl/vulkan driver stacks ready before we merge new drm
> drivers, I really don't think this is too hard from a technical pov. And I
> think the mediatek patches had the same issue of lacking userspace for it,
> so that's not moving things forward.
> -Sima
>
Okay, fair enough, I think I get your point. Currently we are missing
support for at least one peripheral acting as the consumer of these
restricted DMA-bufs. So I discussed with Jens offline that we can try
a crypto peripheral use-case first, which can simply be
demonstrated using the current OP-TEE client user-space.
Also, within the crypto peripheral use-case we can target symmetric
crypto first, which already has a concept of a hardware-backed
symmetric key [1]. IOW, we should be able to come up with a generic
symmetric crypto algorithm which can be supported by different crypto
accelerators using a TEE-backed restricted key DMA buffer.
[1] https://www.youtube.com/watch?v=GbcpwUBFGDw
-Sumit