Linaro-mm-sig January 2026

linaro-mm-sig@lists.linaro.org

18 participants
59 discussions

Re: [PATCH v2 2/5] accel/thames: Add driver for the C7x DSPs in TI SoCs

by Andrew Davis

On 1/14/26 2:46 AM, Tomeu Vizoso wrote: > Some SoCs from Texas Instruments contain DSPs that can be used for > general compute tasks. > > This driver provides a drm/accel UABI to userspace for submitting jobs > to the DSP cores and managing the input, output and intermediate memory. > > Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > --- > Documentation/accel/thames/index.rst | 28 +++++ > MAINTAINERS | 9 ++ > drivers/accel/Kconfig | 1 + > drivers/accel/Makefile | 3 +- > drivers/accel/thames/Kconfig | 26 +++++ > drivers/accel/thames/Makefile | 9 ++ > drivers/accel/thames/thames_core.c | 155 ++++++++++++++++++++++++++ > drivers/accel/thames/thames_core.h | 53 +++++++++ > drivers/accel/thames/thames_device.c | 93 ++++++++++++++++ > drivers/accel/thames/thames_device.h | 46 ++++++++ > drivers/accel/thames/thames_drv.c | 155 ++++++++++++++++++++++++++ > drivers/accel/thames/thames_drv.h | 21 ++++ > drivers/accel/thames/thames_ipc.h | 204 +++++++++++++++++++++++++++++++++++ > drivers/accel/thames/thames_rpmsg.c | 155 ++++++++++++++++++++++++++ > drivers/accel/thames/thames_rpmsg.h | 27 +++++ > 15 files changed, 984 insertions(+), 1 deletion(-) > > diff --git a/Documentation/accel/thames/index.rst b/Documentation/accel/thames/index.rst > new file mode 100644 > index 0000000000000000000000000000000000000000..ca8391031f226f7ef1dc210a356c86acbe126c6f > --- /dev/null > +++ b/Documentation/accel/thames/index.rst > @@ -0,0 +1,28 @@ > +.. SPDX-License-Identifier: GPL-2.0-only > + > +============================================================ > + accel/thames Driver for the C7x DSPs from Texas Instruments > +============================================================ > + > +The accel/thames driver supports the C7x DSPs inside some Texas Instruments SoCs > +such as the J722S. These can be used as accelerators for various workloads, > +including machine learning inference. 
> + > +This driver controls the power state of the hardware via :doc:`remoteproc </staging/remoteproc>` > +and communicates with the firmware running on the DSP via :doc:`rpmsg_virtio </staging/rpmsg_virtio>`. > +The kernel driver itself allocates buffers, manages contexts, and submits jobs > +to the DSP firmware. Buffers are mapped by the DSP itself using its MMU, > +providing memory isolation among different clients. > + > +The source code for the firmware running on the DSP is available at: > +https://gitlab.freedesktop.org/tomeu/thames_firmware/. > + > +Everything else is done in userspace, as a Gallium driver (also called thames) > +that is part of the Mesa3D project: https://docs.mesa3d.org/teflon.html > + > +If there is more than one core that advertises the same rpmsg_virtio service > +name, the driver will load balance jobs between them with drm-gpu-scheduler. > + > +Hardware currently supported: > + > +* J722S > diff --git a/MAINTAINERS b/MAINTAINERS > index dc731d37c8feeff25613c59fe9c929927dadaa7e..a3fc809c797269d0792dfe5202cc1b49f6ff57e9 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -7731,6 +7731,15 @@ F: Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml > F: drivers/accel/rocket/ > F: include/uapi/drm/rocket_accel.h > > +DRM ACCEL DRIVER FOR TI C7x DSPS > +M: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > +L: dri-devel(a)lists.freedesktop.org > +S: Supported > +T: git https://gitlab.freedesktop.org/drm/misc/kernel.git > +F: Documentation/accel/thames/ > +F: drivers/accel/thames/ > +F: include/uapi/drm/thames_accel.h > + > DRM COMPUTE ACCELERATORS DRIVERS AND FRAMEWORK > M: Oded Gabbay <ogabbay(a)kernel.org> > L: dri-devel(a)lists.freedesktop.org > diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig > index bdf48ccafcf21b2fd685ec963e39e256196e6e17..cb49c71cd4e4a4220624f7041a75ba950a1a2ee1 100644 > --- a/drivers/accel/Kconfig > +++ b/drivers/accel/Kconfig > @@ -30,5 +30,6 @@ source "drivers/accel/habanalabs/Kconfig" > source 
"drivers/accel/ivpu/Kconfig" > source "drivers/accel/qaic/Kconfig" > source "drivers/accel/rocket/Kconfig" > +source "drivers/accel/thames/Kconfig" > > endif > diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile > index 1d3a7251b950f39e2ae600a2fc07a3ef7e41831e..8472989cbe22746f1e7292d2401fa0f7424a6c15 100644 > --- a/drivers/accel/Makefile > +++ b/drivers/accel/Makefile > @@ -5,4 +5,5 @@ obj-$(CONFIG_DRM_ACCEL_ARM_ETHOSU) += ethosu/ > obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/ > obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/ > obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/ > -obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/ > \ No newline at end of file > +obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/ > +obj-$(CONFIG_DRM_ACCEL_THAMES) += thames/ > \ No newline at end of file > diff --git a/drivers/accel/thames/Kconfig b/drivers/accel/thames/Kconfig > new file mode 100644 > index 0000000000000000000000000000000000000000..50e0b6ac2a16a942ba8463333991f5b0161b99ac > --- /dev/null > +++ b/drivers/accel/thames/Kconfig > @@ -0,0 +1,26 @@ > +# SPDX-License-Identifier: GPL-2.0-only > + > +config DRM_ACCEL_THAMES > + tristate "Thames (support for TI C7x DSP accelerators)" > + depends on DRM_ACCEL > + depends on TI_K3_R5_REMOTEPROC || COMPILE_TEST COMPILE_TEST part shouldn't be needed here, TI_K3_R5_REMOTEPROC can be built under COMPILE_TEST so TI_K3_R5_REMOTEPROC would just be enabled to test. > + depends on RPMSG > + depends on MMU > + select DRM_SCHED > + select DRM_GEM_SHMEM_HELPER > + help > + Choose this option if you have a Texas Instruments SoC that contains > + C7x DSP cores that can be used as compute accelerators. This includes > + SoCs such as the AM62A, J721E, J721S2, and J784S4. > + > + The C7x DSP cores can be used for general-purpose compute acceleration > + and are exposed through the DRM accel subsystem. > + > + The interface exposed to userspace is described in > + include/uapi/drm/thames_accel.h and is used by the Thames userspace > + driver in Mesa3D. 
> + > + If unsure, say N. > + > + To compile this driver as a module, choose M here: the > + module will be called thames. > diff --git a/drivers/accel/thames/Makefile b/drivers/accel/thames/Makefile > new file mode 100644 > index 0000000000000000000000000000000000000000..7ccd8204f0f5ea800f30e84b319f355be948109d > --- /dev/null > +++ b/drivers/accel/thames/Makefile > @@ -0,0 +1,9 @@ > +# SPDX-License-Identifier: GPL-2.0-only > + > +obj-$(CONFIG_DRM_ACCEL_THAMES) := thames.o > + > +thames-y := \ > + thames_core.o \ > + thames_device.o \ > + thames_drv.o \ > + thames_rpmsg.o > diff --git a/drivers/accel/thames/thames_core.c b/drivers/accel/thames/thames_core.c > new file mode 100644 > index 0000000000000000000000000000000000000000..92af1d68063116bcfa28a33960cbe829029fc1bf > --- /dev/null > +++ b/drivers/accel/thames/thames_core.c > @@ -0,0 +1,155 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* Copyright 2026 Texas Instruments Incorporated - https://www.ti.com/ */ > + > +#include "linux/remoteproc.h" > +#include <linux/dev_printk.h> > +#include <linux/err.h> > +#include <linux/of.h> > +#include <linux/of_address.h> > +#include <linux/platform_device.h> > +#include <linux/completion.h> > +#include <linux/jiffies.h> > +#include <linux/rpmsg.h> > + > +#include "thames_core.h" > +#include "thames_device.h" > +#include "thames_rpmsg.h" > + > +/* Shift to convert bytes to megabytes (divide by 1048576) */ > +#define THAMES_BYTES_TO_MB_SHIFT 20 Seems unused/unneeded. [...] > + > +static const struct rpmsg_device_id thames_rpmsg_id_table[] = { { .name = THAMES_SERVICE_NAME }, > + {} }; > + Some odd formatting here. > +static struct rpmsg_driver thames_rpmsg_driver = { > + .drv = { > + .name = "thames", > + .owner = THIS_MODULE, Above line shouldn't be needed. Andrew
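Andrew's formatting remark concerns the sentinel-terminated ID table. The `.owner` line is redundant because the `register_rpmsg_driver()` wrapper passes `THIS_MODULE` to the rpmsg core, which sets `drv.owner` itself. The sketch below is a userspace model, not kernel code: `struct mock_rpmsg_device_id` only mimics the `name` field of the kernel's `struct rpmsg_device_id`, the service string is a hypothetical placeholder because the value of `THAMES_SERVICE_NAME` is not shown, and `mock_rpmsg_match()` is a simplified stand-in for the rpmsg core's name matching. It shows the conventional one-entry-per-line layout and why the empty entry has to terminate the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same size as the kernel's RPMSG_NAME_SIZE. */
#define MOCK_RPMSG_NAME_SIZE 32

/* Hypothetical placeholder: the real THAMES_SERVICE_NAME value is not shown. */
#define MOCK_THAMES_SERVICE_NAME "thames-service"

/* Userspace model of the name field of struct rpmsg_device_id. */
struct mock_rpmsg_device_id {
	char name[MOCK_RPMSG_NAME_SIZE];
};

/* Conventional layout: one entry per line, then an empty sentinel entry. */
static const struct mock_rpmsg_device_id mock_thames_id_table[] = {
	{ .name = MOCK_THAMES_SERVICE_NAME },
	{ /* sentinel */ },
};

/*
 * Simplified model of service-name matching: walk the table until the
 * all-zero sentinel and return the first entry whose name matches.
 */
const struct mock_rpmsg_device_id *
mock_rpmsg_match(const struct mock_rpmsg_device_id *ids, const char *service)
{
	assert(ids != NULL && service != NULL);

	for (; ids->name[0] != '\0'; ids++) {
		if (strncmp(ids->name, service, MOCK_RPMSG_NAME_SIZE) == 0)
			return ids;
	}
	return NULL; /* reached the sentinel without a match */
}
```

Without the sentinel the loop would run past the end of the array, which is why every such table ends with an empty entry on its own line.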

Re: [PATCH v2 1/5] arm64: dts: ti: k3-j722s-ti-ipc-firmware: Add memory pool for DSP i/o buffers

by Andrew Davis

On 1/14/26 2:46 AM, Tomeu Vizoso wrote: > This memory region is used by the DRM/accel driver to allocate addresses > for buffers that are used for communication with the DSP cores and for > their intermediate results. > > Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > --- > arch/arm64/boot/dts/ti/k3-j722s-ti-ipc-firmware.dtsi | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/boot/dts/ti/k3-j722s-ti-ipc-firmware.dtsi b/arch/arm64/boot/dts/ti/k3-j722s-ti-ipc-firmware.dtsi > index 3fbff927c4c08bce741555aa2753a394b751144f..b80d2a5a157ad59eaed8e57b22f1f4bce4765a85 100644 > --- a/arch/arm64/boot/dts/ti/k3-j722s-ti-ipc-firmware.dtsi > +++ b/arch/arm64/boot/dts/ti/k3-j722s-ti-ipc-firmware.dtsi > @@ -42,6 +42,11 @@ c7x_0_memory_region: memory@a3100000 { > no-map; > }; > > + c7x_iova_pool: iommu-pool@a7000000 { > + reg = <0x00 0xa7000000 0x00 0x18200000>; > + no-map; Could you expand on why this carveout is needed? The C7 NPU has a full MMU and should be able to work with any buffer Linux allocates from any address, even non-contiguous ones. Communication should already happen over the existing RPMSG channels without needing extra buffers. And space for intermediate results should be provided dynamically by the drivers (I believe that would match how GPUs without dedicated memory get intermediate buffer space from system memory these days, but do correct me if I'm wrong about that one).
Andrew > + }; > + > c7x_1_dma_memory_region: memory@a4000000 { > compatible = "shared-dma-pool"; > reg = <0x00 0xa4000000 0x00 0x100000>; > @@ -151,13 +156,15 @@ &main_r5fss0_core0 { > &c7x_0 { > mboxes = <&mailbox0_cluster2 &mbox_c7x_0>; > memory-region = <&c7x_0_dma_memory_region>, > - <&c7x_0_memory_region>; > + <&c7x_0_memory_region>, > + <&c7x_iova_pool>; > status = "okay"; > }; > > &c7x_1 { > mboxes = <&mailbox0_cluster3 &mbox_c7x_1>; > memory-region = <&c7x_1_dma_memory_region>, > - <&c7x_1_memory_region>; > + <&c7x_1_memory_region>, > + <&c7x_iova_pool>; > status = "okay"; > }; >
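Andrew's argument is that an MMU lets the DSP use ordinary, possibly scattered, Linux allocations, so a physically contiguous carveout should not be required. The sketch below models that idea in userspace with a hypothetical single-level page table; it is not the C7x MMU format, and none of its names come from the driver. Three non-adjacent physical pages are mapped back to back, so the device sees one contiguous address range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12
#define MOCK_PAGE_SIZE  (UINT64_C(1) << MOCK_PAGE_SHIFT)
#define MOCK_MAX_PAGES  64

/* Hypothetical single-level page table: entry i maps device page i. */
struct mock_iommu {
	uint64_t iova_base;           /* first device address; page aligned */
	uint64_t pte[MOCK_MAX_PAGES]; /* physical page address per device page */
	size_t nr_mapped;             /* number of valid entries in pte[] */
};

/* Append n page-aligned physical pages, in order, after the current mappings. */
int mock_iommu_map_pages(struct mock_iommu *mmu, const uint64_t *pages, size_t n)
{
	if (n > MOCK_MAX_PAGES - mmu->nr_mapped)
		return -1; /* table full */

	for (size_t i = 0; i < n; i++) {
		assert((pages[i] & (MOCK_PAGE_SIZE - 1)) == 0);
		mmu->pte[mmu->nr_mapped++] = pages[i];
	}
	return 0;
}

/* Translate a device address to a physical address; 0 means "not mapped". */
uint64_t mock_iommu_translate(const struct mock_iommu *mmu, uint64_t iova)
{
	uint64_t idx;

	if (iova < mmu->iova_base)
		return 0;
	idx = (iova - mmu->iova_base) >> MOCK_PAGE_SHIFT;
	if (idx >= mmu->nr_mapped)
		return 0;
	return mmu->pte[idx] | (iova & (MOCK_PAGE_SIZE - 1));
}
```

For example, mapping physical pages `0x80005000`, `0x80001000` and `0x9abc0000` at device address `0x10000000` makes `0x10001010` resolve to `0x80001010`, even though the backing pages are far apart.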

types: reuse common phys_vec type instead of DMABUF open‑coded variant

by Leon Romanovsky

From: Leon Romanovsky <leonro(a)nvidia.com> After commit fcf463b92a08 ("types: move phys_vec definition to common header"), we can use the shared phys_vec type instead of the DMABUF‑specific dma_buf_phys_vec, which duplicated the same structure and semantics. Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com> --- Alex, According to diffstat, VFIO is the subsystem with the largest set of changes, so it would be great if you could take it through your tree. The series is based on the for-7.0/blk-pvec shared branch from Jens: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git/log/?h=for-… Thanks --- Cc: linux-media(a)vger.kernel.org Cc: dri-devel(a)lists.freedesktop.org Cc: linaro-mm-sig(a)lists.linaro.org Cc: linux-kernel(a)vger.kernel.org Cc: iommu(a)lists.linux.dev Cc: kvm(a)vger.kernel.org To: Sumit Semwal <sumit.semwal(a)linaro.org> To: Christian König <christian.koenig(a)amd.com> To: Jason Gunthorpe <jgg(a)ziepe.ca> To: Kevin Tian <kevin.tian(a)intel.com> To: Joerg Roedel <joro(a)8bytes.org> To: Will Deacon <will(a)kernel.org> To: Robin Murphy <robin.murphy(a)arm.com> To: Yishai Hadas <yishaih(a)nvidia.com> To: Shameer Kolothum <skolothumtho(a)nvidia.com> To: Ankit Agrawal <ankita(a)nvidia.com> To: Alex Williamson <alex(a)shazbot.org> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Jens Axboe <axboe(a)kernel.dk> --- drivers/dma-buf/dma-buf-mapping.c | 6 +++--- drivers/iommu/iommufd/io_pagetable.h | 2 +- drivers/iommu/iommufd/iommufd_private.h | 5 ++--- drivers/iommu/iommufd/pages.c | 4 ++-- drivers/iommu/iommufd/selftest.c | 2 +- drivers/vfio/pci/nvgrace-gpu/main.c | 2 +- drivers/vfio/pci/vfio_pci_dmabuf.c | 8 ++++---- include/linux/dma-buf-mapping.h | 2 +- include/linux/dma-buf.h | 10 ---------- include/linux/vfio_pci_core.h | 13 ++++++------- 10 files changed, 21 insertions(+), 33 deletions(-) diff --git a/drivers/dma-buf/dma-buf-mapping.c b/drivers/dma-buf/dma-buf-mapping.c index b7352e609fbd..174677faa577 100644 --- 
a/drivers/dma-buf/dma-buf-mapping.c +++ b/drivers/dma-buf/dma-buf-mapping.c @@ -33,8 +33,8 @@ static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length, } static unsigned int calc_sg_nents(struct dma_iova_state *state, - struct dma_buf_phys_vec *phys_vec, - size_t nr_ranges, size_t size) + struct phys_vec *phys_vec, size_t nr_ranges, + size_t size) { unsigned int nents = 0; size_t i; @@ -91,7 +91,7 @@ struct dma_buf_dma { */ struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach, struct p2pdma_provider *provider, - struct dma_buf_phys_vec *phys_vec, + struct phys_vec *phys_vec, size_t nr_ranges, size_t size, enum dma_data_direction dir) { diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h index 14cd052fd320..27e3e311d395 100644 --- a/drivers/iommu/iommufd/io_pagetable.h +++ b/drivers/iommu/iommufd/io_pagetable.h @@ -202,7 +202,7 @@ struct iopt_pages_dmabuf_track { struct iopt_pages_dmabuf { struct dma_buf_attachment *attach; - struct dma_buf_phys_vec phys; + struct phys_vec phys; /* Always PAGE_SIZE aligned */ unsigned long start; struct list_head tracker; diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index eb6d1a70f673..6ac1965199e9 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -20,7 +20,6 @@ struct iommu_group; struct iommu_option; struct iommufd_device; struct dma_buf_attachment; -struct dma_buf_phys_vec; struct iommufd_sw_msi_map { struct list_head sw_msi_item; @@ -718,7 +717,7 @@ int __init iommufd_test_init(void); void iommufd_test_exit(void); bool iommufd_selftest_is_mock_dev(struct device *dev); int iommufd_test_dma_buf_iommufd_map(struct dma_buf_attachment *attachment, - struct dma_buf_phys_vec *phys); + struct phys_vec *phys); #else static inline void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd, unsigned int ioas_id, @@ -742,7 +741,7 @@ static inline bool 
iommufd_selftest_is_mock_dev(struct device *dev) } static inline int iommufd_test_dma_buf_iommufd_map(struct dma_buf_attachment *attachment, - struct dma_buf_phys_vec *phys) + struct phys_vec *phys) { return -EOPNOTSUPP; } diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c index dbe51ecb9a20..bababd564cf9 100644 --- a/drivers/iommu/iommufd/pages.c +++ b/drivers/iommu/iommufd/pages.c @@ -1077,7 +1077,7 @@ static int pfn_reader_user_update_pinned(struct pfn_reader_user *user, } struct pfn_reader_dmabuf { - struct dma_buf_phys_vec phys; + struct phys_vec phys; unsigned long start_offset; }; @@ -1460,7 +1460,7 @@ static struct dma_buf_attach_ops iopt_dmabuf_attach_revoke_ops = { */ static int sym_vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachment *attachment, - struct dma_buf_phys_vec *phys) + struct phys_vec *phys) { typeof(&vfio_pci_dma_buf_iommufd_map) fn; int rc; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 550ff36dec3a..989d8c4c60a7 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -2002,7 +2002,7 @@ static const struct dma_buf_ops iommufd_test_dmabuf_ops = { }; int iommufd_test_dma_buf_iommufd_map(struct dma_buf_attachment *attachment, - struct dma_buf_phys_vec *phys) + struct phys_vec *phys) { struct iommufd_test_dma_buf *priv = attachment->dmabuf->priv; diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c index 84d142a47ec6..a0f4edd6a30b 100644 --- a/drivers/vfio/pci/nvgrace-gpu/main.c +++ b/drivers/vfio/pci/nvgrace-gpu/main.c @@ -784,7 +784,7 @@ nvgrace_gpu_write(struct vfio_device *core_vdev, static int nvgrace_get_dmabuf_phys(struct vfio_pci_core_device *core_vdev, struct p2pdma_provider **provider, unsigned int region_index, - struct dma_buf_phys_vec *phys_vec, + struct phys_vec *phys_vec, struct vfio_region_dma_range *dma_ranges, size_t nr_ranges) { diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c 
b/drivers/vfio/pci/vfio_pci_dmabuf.c index d4d0f7d08c53..9a84c238c013 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -14,7 +14,7 @@ struct vfio_pci_dma_buf { struct vfio_pci_core_device *vdev; struct list_head dmabufs_elm; size_t size; - struct dma_buf_phys_vec *phys_vec; + struct phys_vec *phys_vec; struct p2pdma_provider *provider; u32 nr_ranges; u8 revoked : 1; @@ -94,7 +94,7 @@ static const struct dma_buf_ops vfio_pci_dmabuf_ops = { * will fail if it is currently revoked */ int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachment *attachment, - struct dma_buf_phys_vec *phys) + struct phys_vec *phys) { struct vfio_pci_dma_buf *priv; @@ -116,7 +116,7 @@ int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachment *attachment, } EXPORT_SYMBOL_FOR_MODULES(vfio_pci_dma_buf_iommufd_map, "iommufd"); -int vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec, +int vfio_pci_core_fill_phys_vec(struct phys_vec *phys_vec, struct vfio_region_dma_range *dma_ranges, size_t nr_ranges, phys_addr_t start, phys_addr_t len) @@ -148,7 +148,7 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_fill_phys_vec); int vfio_pci_core_get_dmabuf_phys(struct vfio_pci_core_device *vdev, struct p2pdma_provider **provider, unsigned int region_index, - struct dma_buf_phys_vec *phys_vec, + struct phys_vec *phys_vec, struct vfio_region_dma_range *dma_ranges, size_t nr_ranges) { diff --git a/include/linux/dma-buf-mapping.h b/include/linux/dma-buf-mapping.h index a3c0ce2d3a42..09bde3f748e4 100644 --- a/include/linux/dma-buf-mapping.h +++ b/include/linux/dma-buf-mapping.h @@ -9,7 +9,7 @@ struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach, struct p2pdma_provider *provider, - struct dma_buf_phys_vec *phys_vec, + struct phys_vec *phys_vec, size_t nr_ranges, size_t size, enum dma_data_direction dir); void dma_buf_free_sgt(struct dma_buf_attachment *attach, struct sg_table *sgt, diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 
0bc492090237..400a5311368e 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -531,16 +531,6 @@ struct dma_buf_export_info { void *priv; }; -/** - * struct dma_buf_phys_vec - describe continuous chunk of memory - * @paddr: physical address of that chunk - * @len: Length of this chunk - */ -struct dma_buf_phys_vec { - phys_addr_t paddr; - size_t len; -}; - /** * DEFINE_DMA_BUF_EXPORT_INFO - helper macro for exporters * @name: export-info name diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 706877f998ff..2ac288bb2c60 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -28,7 +28,6 @@ struct vfio_pci_core_device; struct vfio_pci_region; struct p2pdma_provider; -struct dma_buf_phys_vec; struct dma_buf_attachment; struct vfio_pci_eventfd { @@ -62,25 +61,25 @@ struct vfio_pci_device_ops { int (*get_dmabuf_phys)(struct vfio_pci_core_device *vdev, struct p2pdma_provider **provider, unsigned int region_index, - struct dma_buf_phys_vec *phys_vec, + struct phys_vec *phys_vec, struct vfio_region_dma_range *dma_ranges, size_t nr_ranges); }; #if IS_ENABLED(CONFIG_VFIO_PCI_DMABUF) -int vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec, +int vfio_pci_core_fill_phys_vec(struct phys_vec *phys_vec, struct vfio_region_dma_range *dma_ranges, size_t nr_ranges, phys_addr_t start, phys_addr_t len); int vfio_pci_core_get_dmabuf_phys(struct vfio_pci_core_device *vdev, struct p2pdma_provider **provider, unsigned int region_index, - struct dma_buf_phys_vec *phys_vec, + struct phys_vec *phys_vec, struct vfio_region_dma_range *dma_ranges, size_t nr_ranges); #else static inline int -vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec, +vfio_pci_core_fill_phys_vec(struct phys_vec *phys_vec, struct vfio_region_dma_range *dma_ranges, size_t nr_ranges, phys_addr_t start, phys_addr_t len) @@ -89,7 +88,7 @@ vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec, } static inline int 
vfio_pci_core_get_dmabuf_phys( struct vfio_pci_core_device *vdev, struct p2pdma_provider **provider, - unsigned int region_index, struct dma_buf_phys_vec *phys_vec, + unsigned int region_index, struct phys_vec *phys_vec, struct vfio_region_dma_range *dma_ranges, size_t nr_ranges) { return -EOPNOTSUPP; @@ -228,6 +227,6 @@ static inline bool is_aligned_for_order(struct vm_area_struct *vma, } int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachment *attachment, - struct dma_buf_phys_vec *phys); + struct phys_vec *phys); #endif /* VFIO_PCI_CORE_H */ --- base-commit: fcf463b92a08686d1aeb1e66674a72eb7a8bfb9b change-id: 20260107-convert-to-pvec-bf04dfcf3d12 Best regards, -- Leon Romanovsky <leonro(a)nvidia.com>
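The patch only swaps one type name for another, so no behavior changes. To make the shared shape concrete, the sketch below defines a userspace `struct phys_vec` with the same two fields as the removed `dma_buf_phys_vec`, assuming (per the commit message) that the common type carries the same fields and semantics. The two helpers are hypothetical and not kernel functions: one sums the chunk lengths, the other counts chunks after merging physically adjacent ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's phys_addr_t. */
typedef uint64_t mock_phys_addr_t;

/* Same fields as the removed dma_buf_phys_vec: one contiguous chunk. */
struct phys_vec {
	mock_phys_addr_t paddr; /* physical start address of the chunk */
	size_t len;             /* length of the chunk in bytes */
};

/* Hypothetical helper: total number of bytes described by the array. */
size_t phys_vec_total_len(const struct phys_vec *vec, size_t nr_ranges)
{
	size_t total = 0;

	assert(vec != NULL || nr_ranges == 0);
	for (size_t i = 0; i < nr_ranges; i++)
		total += vec[i].len;
	return total;
}

/* Hypothetical helper: chunk count after merging physically adjacent ranges. */
size_t phys_vec_count_merged(const struct phys_vec *vec, size_t nr_ranges)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_ranges; i++) {
		/* A new chunk starts unless this range continues the previous one. */
		if (i == 0 || vec[i - 1].paddr + vec[i - 1].len != vec[i].paddr)
			n++;
	}
	return n;
}
```

Because every caller now passes `struct phys_vec *`, a helper of this kind can serve VFIO, iommufd and dma-buf code without a per-subsystem copy of the struct.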

Re: [PATCH 2/5] accel/thames: Add driver for the C7x DSPs in TI SoCs

by Jani Nikula

On Tue, 13 Jan 2026, Tomeu Vizoso <tomeu(a)tomeuvizoso.net> wrote: > diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile > index 1d3a7251b950f39e2ae600a2fc07a3ef7e41831e..8472989cbe22746f1e7292d2401fa0f7424a6c15 100644 > --- a/drivers/accel/Makefile > +++ b/drivers/accel/Makefile > @@ -5,4 +5,5 @@ obj-$(CONFIG_DRM_ACCEL_ARM_ETHOSU) += ethosu/ > obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/ > obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/ > obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/ > -obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/ > \ No newline at end of file > +obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/ > +obj-$(CONFIG_DRM_ACCEL_THAMES) += thames/ > \ No newline at end of file Maybe add the newline while at it. > diff --git a/drivers/accel/thames/thames_core.c b/drivers/accel/thames/thames_core.c > new file mode 100644 > index 0000000000000000000000000000000000000000..92af1d68063116bcfa28a33960cbe829029fc1bf > --- /dev/null > +++ b/drivers/accel/thames/thames_core.c > @@ -0,0 +1,155 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* Copyright 2026 Texas Instruments Incorporated - https://www.ti.com/ */ > + > +#include "linux/remoteproc.h" Ditto here about <> not "". -- Jani Nikula, Intel

Re: [PATCH 4/5] accel/thames: Add IOCTL for job submission

by Jani Nikula

On Tue, 13 Jan 2026, Tomeu Vizoso <tomeu(a)tomeuvizoso.net> wrote: > +#include "linux/dev_printk.h" Random drive-by comment, please use <> instead of "" for include/ headers. > +#include <drm/drm_file.h> > +#include <drm/drm_gem.h> > +#include <drm/drm_print.h> > +#include <drm/thames_accel.h> > +#include <linux/platform_device.h> In general, I think it will make everyone's life easier in the long run if the include directives are grouped and sorted. BR, Jani. -- Jani Nikula, Intel

Re: [PATCH 02/10] dma-buf: add dma_fence_is_initialized function

by Christian König

On 1/14/26 10:53, Tvrtko Ursulin wrote: > \ > On 13/01/2026 15:16, Christian König wrote: >> Some driver use fence->ops to test if a fence was initialized or not. >> The problem is that this utilizes internal behavior of the dma_fence >> implementation. >> >> So better abstract that into a function. >> >> Signed-off-by: Christian König <christian.koenig(a)amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 13 +++++++------ >> drivers/gpu/drm/qxl/qxl_release.c | 2 +- >> include/linux/dma-fence.h | 12 ++++++++++++ >> 3 files changed, 20 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >> index 0a0dcbf0798d..b97f90bbe8b9 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c >> @@ -278,9 +278,10 @@ void amdgpu_job_free_resources(struct amdgpu_job *job) >> unsigned i; >> /* Check if any fences were initialized */ >> - if (job->base.s_fence && job->base.s_fence->finished.ops) >> + if (job->base.s_fence && >> + dma_fence_is_initialized(&job->base.s_fence->finished)) >> f = &job->base.s_fence->finished; >> - else if (job->hw_fence && job->hw_fence->base.ops) >> + else if (dma_fence_is_initialized(&job->hw_fence->base)) >> f = &job->hw_fence->base; >> else >> f = NULL; >> @@ -297,11 +298,11 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job) >> amdgpu_sync_free(&job->explicit_sync); >> - if (job->hw_fence->base.ops) >> + if (dma_fence_is_initialized(&job->hw_fence->base)) >> dma_fence_put(&job->hw_fence->base); >> else >> kfree(job->hw_fence); >> - if (job->hw_vm_fence->base.ops) >> + if (dma_fence_is_initialized(&job->hw_vm_fence->base)) >> dma_fence_put(&job->hw_vm_fence->base); >> else >> kfree(job->hw_vm_fence); >> @@ -335,11 +336,11 @@ void amdgpu_job_free(struct amdgpu_job *job) >> if (job->gang_submit != &job->base.s_fence->scheduled) >> dma_fence_put(job->gang_submit); >> - if (job->hw_fence->base.ops) >> + if 
(dma_fence_is_initialized(&job->hw_fence->base)) >> dma_fence_put(&job->hw_fence->base); >> else >> kfree(job->hw_fence); >> - if (job->hw_vm_fence->base.ops) >> + if (dma_fence_is_initialized(&job->hw_vm_fence->base)) >> dma_fence_put(&job->hw_vm_fence->base); >> else >> kfree(job->hw_vm_fence); >> diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c >> index 7b3c9a6016db..b38ae0b25f3c 100644 >> --- a/drivers/gpu/drm/qxl/qxl_release.c >> +++ b/drivers/gpu/drm/qxl/qxl_release.c >> @@ -146,7 +146,7 @@ qxl_release_free(struct qxl_device *qdev, >> idr_remove(&qdev->release_idr, release->id); >> spin_unlock(&qdev->release_idr_lock); >> - if (release->base.ops) { >> + if (dma_fence_is_initialized(&release->base)) { >> WARN_ON(list_empty(&release->bos)); >> qxl_release_free_list(release); >> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h >> index eea674acdfa6..371aa8ecf18e 100644 >> --- a/include/linux/dma-fence.h >> +++ b/include/linux/dma-fence.h >> @@ -274,6 +274,18 @@ void dma_fence_release(struct kref *kref); >> void dma_fence_free(struct dma_fence *fence); >> void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq); >> +/** >> + * dma_fence_is_initialized - test if fence was initialized >> + * @fence: fence to test >> + * >> + * Return: True if fence was initialized, false otherwise. Works correctly only >> + * when memory backing the fence structure is zero initialized on allocation. >> + */ >> +static inline bool dma_fence_is_initialized(struct dma_fence *fence) >> +{ >> + return fence && !!fence->ops; > > This patch should precede the one adding RCU protection to fence->ops. And that one then needs to add a rcu_dereference() here. Good point. > At which point however it would start exploding? When we start setting the ops pointer to NULL in the next patch. > Which also means the new API is racy by definition and can give false positives if fence would be to be signaled as someone is checking. 
Oh, that is a really, really good point. I hadn't thought about that because all current users check the fence only after it is signaled. > Hmm.. is the new API too weak, being able to only be called under very limited circumstances? Yes, exactly that. All callers use this only to decide on the correct cleanup path, so the fence is either fully signaled or was never initialized in the first place. > Would it be better to solve it in the drivers by tracking state? The alternative I had in mind was to use another DMA_FENCE_FLAG_... for that. I will probably use that approach instead, just to make it extra defensive. Thanks, Christian. > > Regards, > > Tvrtko > >> +} >> + >> /** >> * dma_fence_put - decreases refcount of the fence >> * @fence: fence to reduce refcount of >
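The thread compares two ways to tell whether a fence was ever initialized: checking `fence->ops`, which only works on zero-initialized memory and stops working once the follow-up patch clears `ops` on signaling, and a dedicated flag bit. The single-threaded userspace model below illustrates both. `MOCK_FENCE_FLAG_INITIALIZED_BIT` is a hypothetical name (the thread elides the real one as `DMA_FENCE_FLAG_...`), and the model ignores RCU and concurrent signaling entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit: the thread only says "DMA_FENCE_FLAG_...". */
#define MOCK_FENCE_FLAG_INITIALIZED_BIT 0

struct mock_fence_ops {
	const char *name;
};

struct mock_fence {
	const struct mock_fence_ops *ops; /* NULL before init */
	unsigned long flags;
};

void mock_fence_init(struct mock_fence *f, const struct mock_fence_ops *ops)
{
	assert(f != NULL && ops != NULL);
	f->ops = ops;
	f->flags |= 1UL << MOCK_FENCE_FLAG_INITIALIZED_BIT;
}

/* Models the follow-up patch that clears ops once the fence has signaled. */
void mock_fence_signal(struct mock_fence *f)
{
	f->ops = NULL;
}

/* ops-based check: relies on zeroed memory and on ops never being cleared. */
bool mock_fence_is_initialized_ops(const struct mock_fence *f)
{
	return f != NULL && f->ops != NULL;
}

/* Flag-based check: still true after ops has been cleared. */
bool mock_fence_is_initialized_flag(const struct mock_fence *f)
{
	return f != NULL &&
	       (f->flags & (1UL << MOCK_FENCE_FLAG_INITIALIZED_BIT)) != 0;
}
```

After `mock_fence_signal()`, the ops-based check reports the fence as never initialized while the flag-based check does not, which is the ambiguity Tvrtko points out.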

Re: [PATCH] i2c: qcom-geni: make sure I2C hub controllers can't use SE DMA

by Wolfram Sang

On Wed, Oct 29, 2025 at 07:07:42PM +0100, Neil Armstrong wrote: > The I2C Hub controller is a simpler GENI I2C variant that doesn't > support DMA at all; add a no_dma flag to make sure it never selects > the SE DMA mode with mappable 32-byte-long transfers. > > Fixes: cacd9643eca7 ("i2c: qcom-geni: add support for I2C Master Hub variant") > Signed-off-by: Neil Armstrong <neil.armstrong(a)linaro.org> > Reviewed-by: Konrad Dybcio <konrad.dybcio(a)oss.qualcomm.com> > Reviewed-by: Mukesh Kumar Savaliya <mukesh.savaliya(a)oss.qualcomm.com> Applied to for-current, thanks!
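The commit message describes a per-variant `no_dma` flag that overrides the usual FIFO-versus-DMA choice. The userspace sketch below models only that decision. Apart from the `no_dma` field name, everything in it is hypothetical, and the 32-byte threshold is an assumption read from the commit message's wording; the driver's exact condition may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-variant description; only no_dma comes from the patch. */
struct mock_geni_i2c_desc {
	bool no_dma;
};

enum mock_xfer_mode {
	MOCK_XFER_FIFO,
	MOCK_XFER_SE_DMA,
};

/* Assumed threshold: mappable transfers of at least 32 bytes may use DMA. */
#define MOCK_DMA_MIN_LEN 32

enum mock_xfer_mode mock_pick_mode(const struct mock_geni_i2c_desc *desc,
				   size_t len, bool buf_mappable)
{
	assert(desc != NULL);

	if (desc->no_dma)
		return MOCK_XFER_FIFO; /* I2C Hub variant: never use SE DMA */
	if (len >= MOCK_DMA_MIN_LEN && buf_mappable)
		return MOCK_XFER_SE_DMA;
	return MOCK_XFER_FIFO;
}
```

Checking the flag first means no combination of length and buffer type can route a Hub transfer to SE DMA, which is the guarantee the fix is after.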

Re: [PATCH 10/10] drm/sched: use inline locks for the drm-sched-fence v2

by Christian König

On 1/13/26 17:12, Philipp Stanner wrote: > On Tue, 2026-01-13 at 16:16 +0100, Christian König wrote: >> Using the inline lock is now the recommended way for dma_fence implementations. >> >> For the scheduler fence use the inline lock for the scheduled fence part >> and then the lock from the scheduled fence as external lock for the finished fence. >> >> This way there is no functional difference, except for saving the space >> for the separate lock. >> >> v2: re-work the patch to avoid any functional difference > > *cough cough* > >> >> Signed-off-by: Christian König <christian.koenig(a)amd.com> >> --- >> drivers/gpu/drm/scheduler/sched_fence.c | 6 +++--- >> include/drm/gpu_scheduler.h | 4 ---- >> 2 files changed, 3 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c >> index 724d77694246..112677231f9a 100644 >> --- a/drivers/gpu/drm/scheduler/sched_fence.c >> +++ b/drivers/gpu/drm/scheduler/sched_fence.c >> @@ -217,7 +217,6 @@ struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity, >> >> fence->owner = owner; >> fence->drm_client_id = drm_client_id; >> - spin_lock_init(&fence->lock); >> >> return fence; >> } >> @@ -230,9 +229,10 @@ void drm_sched_fence_init(struct drm_sched_fence *fence, >> fence->sched = entity->rq->sched; >> seq = atomic_inc_return(&entity->fence_seq); >> dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled, >> - &fence->lock, entity->fence_context, seq); >> + NULL, entity->fence_context, seq); >> dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished, >> - &fence->lock, entity->fence_context + 1, seq); >> + dma_fence_spinlock(&fence->scheduled), > > I think while you are correct that this is no functional difference, it > is still a bad idea which violates the entire idea of your series: > > All fences are now independent from each other and the fence context – > except for those two. 
> > Some fences are more equal than others ;) Yeah, I was going back and forth once more if I should keep this patch at all or just drop it. > By implementing this, you would also show to people browsing the code > that it can be a good idea or can be done to have fences share locks. > Do you want that? Good question. For almost all cases we don't want this, but once more the scheduler is special. In the scheduler we have two fences in one, the scheduled one and the finished one. So here it technically makes sense to have this construct to be defensive. But on the other hand it has no practical value because it still doesn't allow us to unload the scheduler module. We would need a much wider rework for being able to do that. So maybe I should just really drop this patch or at least keep it back until we had time to figure out what the next steps are. > As far as I have learned from you and our discussions, that would be a > very bombastic violation of the sacred "dma-fence-rules". Well using the inline fence is "only" a strong recommendation. It's not as heavy as the signaling rules because when you mess up those you can easily kill the whole system. > I believe it's definitely worth sacrificing some bytes so that those > two fences get fully decoupled. Who will have it on their radar that > they are special? Think about future reworks. This doesn't even save any bytes, my thinking was more that this is the more defensive approach should anybody use the spinlock pointer from the scheduler fence to do some locking. > Besides that, no objections from my side. Thanks, Christian. > > > P. > >> + entity->fence_context + 1, seq); >> } >> >> module_init(drm_sched_fence_slab_init); >> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >> index 78e07c2507c7..ad3704685163 100644 >> --- a/include/drm/gpu_scheduler.h >> +++ b/include/drm/gpu_scheduler.h >> @@ -297,10 +297,6 @@ struct drm_sched_fence { >> * belongs to. 
>> */
>> struct drm_gpu_scheduler *sched;
>> - /**
>> - * @lock: the lock used by the scheduled and the finished fences.
>> - */
>> - spinlock_t lock;
>> /**
>> * @owner: job owner for debugging
>> */
>
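To make the trade-off in this thread concrete, here is a minimal userspace C model of the construct in the patch: the "scheduled" fence uses its own inline lock, and the "finished" fence borrows a pointer to that lock as its external lock. Everything here (model_fence, model_spinlock_t, model_fence_lock(), the atomic_flag spinlock) is hypothetical and only illustrates the idea; it is not the kernel's struct dma_fence, spinlock_t or dma_fence_spinlock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical spinlock built on a C11 atomic_flag (not the kernel's spinlock_t). */
typedef struct {
	atomic_flag held;
} model_spinlock_t;

static void model_spin_init(model_spinlock_t *l)
{
	atomic_flag_clear(&l->held);
}

static void model_spin_lock(model_spinlock_t *l)
{
	while (atomic_flag_test_and_set(&l->held))
		; /* busy-wait until the holder clears the flag */
}

static void model_spin_unlock(model_spinlock_t *l)
{
	atomic_flag_clear(&l->held);
}

/* Hypothetical fence: an inline lock plus an optional external one. */
struct model_fence {
	model_spinlock_t inline_lock;
	model_spinlock_t *extern_lock; /* NULL selects inline_lock */
	bool signaled;
};

/* Return whichever lock protects this fence's state. */
static model_spinlock_t *model_fence_lock(struct model_fence *f)
{
	return f->extern_lock ? f->extern_lock : &f->inline_lock;
}

static void model_fence_init(struct model_fence *f, model_spinlock_t *lock)
{
	model_spin_init(&f->inline_lock);
	f->extern_lock = lock;
	f->signaled = false;
}

static void model_fence_signal(struct model_fence *f)
{
	model_spinlock_t *lock = model_fence_lock(f);

	model_spin_lock(lock);
	assert(!f->signaled); /* each fence signals once in this model */
	f->signaled = true;
	model_spin_unlock(lock);
}

/* Mirrors the scheduled/finished pair from the patch under discussion. */
struct model_sched_fence {
	struct model_fence scheduled;
	struct model_fence finished;
};

static void model_sched_fence_init(struct model_sched_fence *sf)
{
	/* "scheduled" uses its own inline lock (NULL external lock) ... */
	model_fence_init(&sf->scheduled, NULL);
	/* ... and "finished" borrows it as its external lock. */
	model_fence_init(&sf->finished, model_fence_lock(&sf->scheduled));
}
```

Passing NULL for the finished fence as well would give each fence its own inline lock, which corresponds to the fully decoupled variant Philipp argues for; the shared-lock variant only matters if some caller takes the lock through one fence while expecting it to protect the other.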


Re: [PATCH v2 2/2] dma-buf: system_heap: account for system heap allocation in memcg

by Christian König

On 1/13/26 22:32, Eric Chanudet wrote:
> The system dma-buf heap lets userspace allocate buffers from the page
> allocator. However, these allocations are not accounted for in memcg,
> allowing processes to escape limits that may be configured.
>
> Pass __GFP_ACCOUNT for system heap allocations, based on the
> dma_heap.mem_accounting parameter, to use memcg and account for them.
>
> Signed-off-by: Eric Chanudet <echanude(a)redhat.com>
> ---
> drivers/dma-buf/heaps/system_heap.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 4c782fe33fd497a74eb5065797259576f9b651b6..139b50df64ed4c4a6fdd69f25fe48324fbe2c481 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -52,6 +52,8 @@ static gfp_t order_flags[] = {HIGH_ORDER_GFP, HIGH_ORDER_GFP, LOW_ORDER_GFP};
> static const unsigned int orders[] = {8, 4, 0};
> #define NUM_ORDERS ARRAY_SIZE(orders)
>
> +extern bool mem_accounting;

Please define that in some header.

Apart from that, it looks good technically. But after the discussion it sounds more and more like we don't want to account device-driver-allocated memory in memcg at all.

Regards,
Christian.

> +
> static int dup_sg_table(struct sg_table *from, struct sg_table *to)
> {
> struct scatterlist *sg, *new_sg;
> @@ -320,14 +322,17 @@ static struct page *alloc_largest_available(unsigned long size,
> {
> struct page *page;
> int i;
> + gfp_t flags;
>
> for (i = 0; i < NUM_ORDERS; i++) {
> if (size < (PAGE_SIZE << orders[i]))
> continue;
> if (max_order < orders[i])
> continue;
> -
> - page = alloc_pages(order_flags[i], orders[i]);
> + flags = order_flags[i];
> + if (mem_accounting)
> + flags |= __GFP_ACCOUNT;
> + page = alloc_pages(flags, orders[i]);
> if (!page)
> continue;
> return page;
>
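The order-selection loop in this patch can be modeled in plain userspace C to show which flags end up being passed to the page allocator. This is a sketch under stated assumptions: the flag values (MODEL_HIGH_ORDER_GFP, MODEL_LOW_ORDER_GFP, MODEL_GFP_ACCOUNT) and the helper model_pick_order() are hypothetical stand-ins, not the real gfp_t bits, and no memory is actually allocated, so the `if (!page) continue;` fallback to a smaller order is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfp_t and its flag bits; real values differ. */
typedef unsigned int model_gfp_t;
#define MODEL_HIGH_ORDER_GFP 0x1u
#define MODEL_LOW_ORDER_GFP  0x2u
#define MODEL_GFP_ACCOUNT    0x4u
#define MODEL_PAGE_SIZE      4096ul

static const model_gfp_t model_order_flags[] = {
	MODEL_HIGH_ORDER_GFP, MODEL_HIGH_ORDER_GFP, MODEL_LOW_ORDER_GFP
};
static const unsigned int model_orders[] = {8, 4, 0};
#define MODEL_NUM_ORDERS (sizeof(model_orders) / sizeof(model_orders[0]))

/*
 * Mirrors the loop in the patch: pick the largest order that fits both
 * size and max_order, then OR in the accounting flag when requested.
 * Returns the chosen index into model_orders, or -1 if none fits;
 * *flags_out receives the flags that would be passed to the allocator.
 */
static int model_pick_order(unsigned long size, unsigned int max_order,
			    bool mem_accounting, model_gfp_t *flags_out)
{
	assert(flags_out != NULL);

	for (size_t i = 0; i < MODEL_NUM_ORDERS; i++) {
		if (size < (MODEL_PAGE_SIZE << model_orders[i]))
			continue;
		if (max_order < model_orders[i])
			continue;

		model_gfp_t flags = model_order_flags[i];
		if (mem_accounting)
			flags |= MODEL_GFP_ACCOUNT;
		*flags_out = flags;
		return (int)i;
	}
	return -1;
}
```

For example, a 1 MiB request with max_order 8 selects order 8 with the high-order flags, plus the accounting bit when mem_accounting is set. On the review comment about the `extern`: the usual kernel convention is to declare such a variable once in a shared header included by both the file that defines it and system_heap.c, rather than repeating an `extern` declaration inside a .c file.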


Independence for dma_fences! v5

by Christian König

Hi everyone,

dma_fences have always lived under the tyranny dictated by the module lifetime of their issuer, leading to crashes should anybody still hold a reference to a dma_fence when the module of the issuer is unloaded.

The basic problem is that when buffers are shared between drivers, dma_fence objects can leak into external drivers and stay there even after they are signaled. The dma_resv object, for example, releases dma_fences only lazily.

So what happens is that when the module that originally created the dma_fence unloads, the dma_fence_ops function table becomes unavailable as well, and any attempt to release the fence crashes the system.

Previously, various approaches have been discussed, including changing the locking semantics of the dma_fence callbacks (by me) as well as using the drm scheduler as an intermediate layer (by Sima) to disconnect dma_fences from their actual users, but none of them actually solves all the problems.

Tvrtko did some really nice prerequisite work by protecting the returned strings of the dma_fence_ops by RCU. This way dma_fence creators were able to just wait for an RCU grace period after fence signaling before it was safe to free those data structures.

Now this patch set goes a step further and protects the whole dma_fence_ops structure by RCU, so that after the fence signals, the pointer to the dma_fence_ops is set to NULL when neither a wait nor a release callback is given. All functionality that uses the dma_fence_ops reference is put inside an RCU critical section, except for the deprecated issuer-specific wait and, of course, the optional release callback.

Beyond the RCU changes, there is the lock protecting the dma_fence state, which previously had to be allocated externally. This set now makes that external lock optional and allows dma_fences to use an inline lock and be self-contained.
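The core idea of the cover letter, clearing the ops pointer once a fence has signaled and nothing needs the issuer's callbacks any more, can be sketched in userspace C. This is a rough model under loud assumptions: model_fence, model_fence_ops and the helpers are hypothetical, the ops table is cut down to three members, and plain C11 atomics stand in for RCU, so the grace period the real series relies on before the issuer may free its data is not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down ops table; not the kernel's struct dma_fence_ops. */
struct model_fence_ops {
	const char *(*get_driver_name)(void);
	void (*wait)(void);    /* optional issuer-specific wait */
	void (*release)(void); /* optional release callback */
};

/* Hypothetical fence holding an atomically published ops pointer. */
struct model_fence {
	_Atomic(const struct model_fence_ops *) ops;
	atomic_bool signaled;
};

static void model_fence_init(struct model_fence *f,
			     const struct model_fence_ops *ops)
{
	assert(ops != NULL);
	atomic_init(&f->ops, ops);
	atomic_init(&f->signaled, false);
}

/*
 * Signal the fence. If the issuer supplied neither a wait nor a release
 * callback, nothing needs its ops table any more, so drop the pointer.
 */
static void model_fence_signal(struct model_fence *f)
{
	const struct model_fence_ops *ops = atomic_load(&f->ops);

	atomic_store(&f->signaled, true);
	if (ops && !ops->wait && !ops->release)
		atomic_store(&f->ops, NULL);
}

/* Readers check for NULL instead of assuming the issuer is still loaded. */
static const char *model_fence_driver_name(struct model_fence *f)
{
	/* In the kernel this load would sit inside an RCU read-side section. */
	const struct model_fence_ops *ops = atomic_load(&f->ops);

	return ops ? ops->get_driver_name() : "detached";
}

/* Example issuer: one table without callbacks, one with a release hook. */
static const char *model_demo_driver_name(void) { return "demo"; }
static void model_demo_release(void) { }

static const struct model_fence_ops model_demo_ops = {
	.get_driver_name = model_demo_driver_name,
};
static const struct model_fence_ops model_demo_ops_with_release = {
	.get_driver_name = model_demo_driver_name,
	.release = model_demo_release,
};
```

A fence initialized with model_demo_ops stops referencing the issuer's table as soon as it signals, while one initialized with model_demo_ops_with_release keeps the pointer, matching the cover letter's exception for the optional release callback.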
v4: Rebases the whole set on upstream changes, especially the cleanup from Philip in patch "drm/amdgpu: independence for the amdkfd_fence!". Adds two patches which bring the DMA-fence self tests up to date. The first selftest change removes the mock_wait and so actually starts testing the default behavior instead of some hacky implementation in the test. This one got upstreamed independently of this set. The second drops the mock_fence as well and tests the new RCU and inline spinlock functionality.

v5: Rebase on top of drm-misc-next instead of drm-tip, and leave out all driver changes for now, since those should go through the driver-specific paths anyway. Address a few more review comments, especially some rebase mess and typos. And finally, fix one more bug found by AMD's CI system.

The first patch in particular still needs a Reviewed-by; apart from that, I think I've addressed all review comments and problems.

Please review and comment,
Christian.

