Linaro-mm-sig July 2025

linaro-mm-sig@lists.linaro.org

25 participants
79 discussions

Re: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions

by Leon Romanovsky

On Fri, Jul 25, 2025 at 05:34:40AM +0000, Kasireddy, Vivek wrote: > Hi Leon, > > > Subject: Re: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO > > regions > > > > > > > > > > From: Leon Romanovsky <leonro(a)nvidia.com> > > > > > > > > Add support for exporting PCI device MMIO regions through dma-buf, > > > > enabling safe sharing of non-struct page memory with controlled > > > > lifetime management. This allows RDMA and other subsystems to > > import > > > > dma-buf FDs and build them into memory regions for PCI P2P > > operations. > > > > > > > > The implementation provides a revocable attachment mechanism using > > > > dma-buf move operations. MMIO regions are normally pinned as BARs > > > > don't change physical addresses, but access is revoked when the VFIO > > > > device is closed or a PCI reset is issued. This ensures kernel > > > > self-defense against potentially hostile userspace. > > > > > > > > Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com> > > > > Signed-off-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com> > > > > Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com> > > > > --- > > > > drivers/vfio/pci/Kconfig | 20 ++ > > > > drivers/vfio/pci/Makefile | 2 + > > > > drivers/vfio/pci/vfio_pci_config.c | 22 +- > > > > drivers/vfio/pci/vfio_pci_core.c | 25 ++- > > > > drivers/vfio/pci/vfio_pci_dmabuf.c | 321 > > +++++++++++++++++++++++++++++ > > > > drivers/vfio/pci/vfio_pci_priv.h | 23 +++ > > > > include/linux/dma-buf.h | 1 + > > > > include/linux/vfio_pci_core.h | 3 + > > > > include/uapi/linux/vfio.h | 19 ++ > > > > 9 files changed, 431 insertions(+), 5 deletions(-) > > > > create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c > > > > <...> > > > > > > +static int validate_dmabuf_input(struct vfio_pci_core_device *vdev, > > > > + struct vfio_device_feature_dma_buf > > *dma_buf) > > > > +{ > > > > + struct pci_dev *pdev = vdev->pdev; > > > > + u32 bar = dma_buf->region_index; > > > > + u64 offset = dma_buf->offset; > > > > + u64 len = dma_buf->length; > > > > + resource_size_t bar_size; > > > > + u64 sum; > > > > + > > > > + /* > > > > + * For PCI the region_index is the BAR number like everything else. > > > > + */ > > > > + if (bar >= VFIO_PCI_ROM_REGION_INDEX) > > > > + return -ENODEV; > > > > <...> > > > > > > +/** > > > > + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the > > > > + * regions selected. > > > > + * > > > > + * open_flags are the typical flags passed to open(2), eg O_RDWR, > > > > O_CLOEXEC, > > > > + * etc. offset/length specify a slice of the region to create the dmabuf > > from. > > > > + * nr_ranges is the total number of (P2P DMA) ranges that comprise the > > > > dmabuf. > > > Any particular reason why you dropped the option (nr_ranges) of creating > > a > > > single dmabuf from multiple ranges of an MMIO region? > > > > I did it for two reasons. First, I wanted to simplify the code in order > > to speed-up discussion over the patchset itself. Second, I failed to > > find justification for need of multiple ranges, as the number of BARs > > are limited by VFIO_PCI_ROM_REGION_INDEX (6) and same functionality > > can be achieved by multiple calls to DMABUF import. > I don't think the same functionality can be achieved by multiple calls to > dmabuf import. AFAIU, a dmabuf (as of today) is backed by a SGL that can > have multiple entries because it represents a scattered buffer (multiple > non-contiguous entries in System RAM or an MMIO region). I don't know all the reasons why SG was chosen, but one of the main reasons is that DMA SG API was the only one possible way to handle p2p transfers (peer2peer flag). > But in this patch you are constraining it such that only one entry associated with a > buffer would be included, which effectively means that we cannot create > a dmabuf to represent scattered buffers (located in a single MMIO region > such as VRAM or other device memory) anymore. Yes > > > > > > > > > Restricting the dmabuf to a single range (or having to create multiple > > dmabufs > > > to represent multiple regions/ranges associated with a single scattered > > buffer) > > > would be very limiting and may not work in all cases. For instance, in my > > use-case, > > > I am trying to share a large (4k mode) framebuffer (FB) located in GPU's > > VRAM > > > between two (p2p compatible) GPU devices. And, this would probably not > > work > > > given that allocating a large contiguous FB (nr_ranges = 1) in VRAM may > > not be > > > feasible when there is memory pressure. > > > > Can you please help me and point to the place in the code where this can > > fail? > > I'm probably missing something basic as there are no large allocations > > in the current patchset. > Sorry, I was not very clear. What I meant is that it is not prudent to assume that > there will only be one range associated with an MMIO region which we need to > consider while creating a dmabuf. And, I was pointing out my use-case as an > example where vfio-pci needs to create a dmabuf for a large buffer (FB) that > would likely be scattered (and not contiguous) in an MMIO region (such as VRAM). > > Let me further explain with my use-case. Here is a link to my Qemu-based test: > https://gitlab.freedesktop.org/Vivek/qemu/-/commit/b2bdb16d9cfaf55384c95b1f… Ohh, thanks. I'll add nr_ranges in next version. I see that you are using same region_index for all ranges and this is how I would like to keep it: "multiple nr_ranges, same region_index". Thanks

3 months, 4 weeks

Re: [PATCH 05/10] PCI/P2PDMA: Export pci_p2pdma_map_type() function

by Leon Romanovsky

On Fri, Jul 25, 2025 at 01:12:35PM -0600, Logan Gunthorpe wrote: > > > On 2025-07-25 12:54, Leon Romanovsky wrote: > >> The solution that would make more sense to me would be for either > >> dma_iova_try_alloc() or another helper in dma-iommu.c to handle the > >> P2PDMA case. dma-iommu.c already uses those same interfaces and thus > >> there would be no need to export the low level helpers from the p2pdma code. > > > > I had same idea in early versions of DMA phys API discussion and it was > > pointed (absolutely right) that this is layering violation. > > Respectfully, I have to disagree with this. Having the layer (ie. > dma-iommu) that normally checks how to handle a P2PDMA request now check > how to handle these DMA requests is the exact opposite of a layering > violation. I'm aware of your implementation and have feeling that it was very influenced by NVMe requirements, so the end result is very tailored for it. Other users have very different paths if p2p is taken. Just see last VFIO patch in this series, it skips all DMA logic. > Expecting every driver that wants to do P2PDMA to have to > figure out for themselves how to map the memory before calling into the > DMA API doesn't seem like a good design choice to me. We had this discussion earlier too on previous versions. The summary is that p2p capable devices are very special anyway. They need to work with p2p natively. BTW, the implementation is not supposed to be in the drivers, but in their respective subsystems. > > > So unfortunately, I think that dma*.c|h is not right place for p2p > > type check. > > dma*.c is already where those checks are done. I'm not sure patches to > remove the code from that layer and put it into the NVMe driver would > make a lot of sense (and then, of course, we'd have to put it into every > other driver that wants to participate in p2p transactions). I don't have plans to remove existing checks right now, but NVMe was already converted to new DMA phys API. Thanks > > Logan > >

3 months, 4 weeks

Re: [PATCH 05/10] PCI/P2PDMA: Export pci_p2pdma_map_type() function

by Leon Romanovsky

On Fri, Jul 25, 2025 at 10:30:46AM -0600, Logan Gunthorpe wrote: > > > On 2025-07-24 02:13, Leon Romanovsky wrote: > > On Thu, Jul 24, 2025 at 10:03:13AM +0200, Christoph Hellwig wrote: > >> On Wed, Jul 23, 2025 at 04:00:06PM +0300, Leon Romanovsky wrote: > >>> From: Leon Romanovsky <leonro(a)nvidia.com> > >>> > >>> Export the pci_p2pdma_map_type() function to allow external modules > >>> and subsystems to determine the appropriate mapping type for P2PDMA > >>> transfers between a provider and target device. > >> > >> External modules have no business doing this. > > > > VFIO PCI code is built as module. There is no way to access PCI p2p code > > without exporting functions in it. > > The solution that would make more sense to me would be for either > dma_iova_try_alloc() or another helper in dma-iommu.c to handle the > P2PDMA case. dma-iommu.c already uses those same interfaces and thus > there would be no need to export the low level helpers from the p2pdma code. I had same idea in early versions of DMA phys API discussion and it was pointed (absolutely right) that this is layering violation. At that time, that remark wasn't such clear to me because HMM code performs check for p2p on every page and has call to dma_iova_try_alloc() before that check. But this VFIO DMABUF code shows it much more clearer. The p2p check is performed before any DMA calls and in case of PCI_P2PDMA_MAP_BUS_ADDR p2p type between DMABUF exporter device and DMABUF importer device, we don't call dma_iova_try_alloc() or any DMA API at all. So unfortunately, I think that dma*.c|h is not right place for p2p type check. Thanks > > Logan >

4 months

Re: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions

by Leon Romanovsky

On Thu, Jul 24, 2025 at 05:13:49AM +0000, Kasireddy, Vivek wrote: > Hi Leon, > > > Subject: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO > > regions > > > > From: Leon Romanovsky <leonro(a)nvidia.com> > > > > Add support for exporting PCI device MMIO regions through dma-buf, > > enabling safe sharing of non-struct page memory with controlled > > lifetime management. This allows RDMA and other subsystems to import > > dma-buf FDs and build them into memory regions for PCI P2P operations. > > > > The implementation provides a revocable attachment mechanism using > > dma-buf move operations. MMIO regions are normally pinned as BARs > > don't change physical addresses, but access is revoked when the VFIO > > device is closed or a PCI reset is issued. This ensures kernel > > self-defense against potentially hostile userspace. > > > > Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com> > > Signed-off-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com> > > Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com> > > --- > > drivers/vfio/pci/Kconfig | 20 ++ > > drivers/vfio/pci/Makefile | 2 + > > drivers/vfio/pci/vfio_pci_config.c | 22 +- > > drivers/vfio/pci/vfio_pci_core.c | 25 ++- > > drivers/vfio/pci/vfio_pci_dmabuf.c | 321 +++++++++++++++++++++++++++++ > > drivers/vfio/pci/vfio_pci_priv.h | 23 +++ > > include/linux/dma-buf.h | 1 + > > include/linux/vfio_pci_core.h | 3 + > > include/uapi/linux/vfio.h | 19 ++ > > 9 files changed, 431 insertions(+), 5 deletions(-) > > create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c <...> > > +static int validate_dmabuf_input(struct vfio_pci_core_device *vdev, > > + struct vfio_device_feature_dma_buf *dma_buf) > > +{ > > + struct pci_dev *pdev = vdev->pdev; > > + u32 bar = dma_buf->region_index; > > + u64 offset = dma_buf->offset; > > + u64 len = dma_buf->length; > > + resource_size_t bar_size; > > + u64 sum; > > + > > + /* > > + * For PCI the region_index is the BAR number like everything else. > > + */ > > + if (bar >= VFIO_PCI_ROM_REGION_INDEX) > > + return -ENODEV; <...> > > +/** > > + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the > > + * regions selected. > > + * > > + * open_flags are the typical flags passed to open(2), eg O_RDWR, > > O_CLOEXEC, > > + * etc. offset/length specify a slice of the region to create the dmabuf from. > > + * nr_ranges is the total number of (P2P DMA) ranges that comprise the > > dmabuf. > Any particular reason why you dropped the option (nr_ranges) of creating a > single dmabuf from multiple ranges of an MMIO region? I did it for two reasons. First, I wanted to simplify the code in order to speed-up discussion over the patchset itself. Second, I failed to find justification for need of multiple ranges, as the number of BARs are limited by VFIO_PCI_ROM_REGION_INDEX (6) and same functionality can be achieved by multiple calls to DMABUF import. > > Restricting the dmabuf to a single range (or having to create multiple dmabufs > to represent multiple regions/ranges associated with a single scattered buffer) > would be very limiting and may not work in all cases. For instance, in my use-case, > I am trying to share a large (4k mode) framebuffer (FB) located in GPU's VRAM > between two (p2p compatible) GPU devices. And, this would probably not work > given that allocating a large contiguous FB (nr_ranges = 1) in VRAM may not be > feasible when there is memory pressure. Can you please help me and point to the place in the code where this can fail? I'm probably missing something basic as there are no large allocations in the current patchset. > > Furthermore, since you are adding a new UAPI with this patch/feature, as you know, > we cannot go back and tweak it (to add support for nr_ranges > 1) should there > be a need in the future, but you can always use nr_ranges = 1 anytime. Therefore, > I think it makes sense to be flexible in terms of the number of ranges to include > while creating a dmabuf instead of restricting ourselves to one range. I'm not a big fan of over-engineering. Let's first understand if this case is needed. Thanks > > Thanks, > Vivek > > > + * > > + * Return: The fd number on success, -1 and errno is set on failure. > > + */ > > +#define VFIO_DEVICE_FEATURE_DMA_BUF 11 > > + > > +struct vfio_device_feature_dma_buf { > > + __u32 region_index; > > + __u32 open_flags; > > + __u64 offset; > > + __u64 length; > > +}; > > + > > /* -------- API for Type1 VFIO IOMMU -------- */ > > > > /** > > -- > > 2.50.1 >

4 months

Re: [PATCH v6 2/2] i2c: i2c-qcom-geni: Add Block event interrupt support

by Vinod Koul

On 22-07-25, 15:46, Dmitry Baryshkov wrote: > On Tue, Jul 22, 2025 at 05:50:08PM +0530, Jyothi Kumar Seerapu wrote: > > On 7/19/2025 3:27 PM, Dmitry Baryshkov wrote: > > > On Mon, Jul 07, 2025 at 09:58:30PM +0530, Jyothi Kumar Seerapu wrote: > > > > On 7/4/2025 1:11 AM, Dmitry Baryshkov wrote: > > > > > On Thu, 3 Jul 2025 at 15:51, Jyothi Kumar Seerapu [Folks, would be nice to trim replies] > > > > Could you please confirm if can go with the similar approach of unmap the > > > > processed TREs based on a fixed threshold or constant value, instead of > > > > unmapping them all at once? > > > > > > I'd still say, that's a bad idea. Please stay within the boundaries of > > > the DMA API. > > > > > I agree with the approach you suggested—it's the GPI's responsibility to > > manage the available TREs. > > > > However, I'm curious whether can we set a dynamic watermark value perhaps > > half the available TREs) to trigger unmapping of processed TREs ? This would > > allow the software to prepare the next set of TREs while the hardware > > continues processing the remaining ones, enabling better parallelism and > > throughput. > > Let's land the simple implementation first, which can then be improved. > However I don't see any way to return 'above the watermark' from the DMA > controller. You might need to enhance the API. Traditionally, we set the dma transfers for watermark level and we get a interrupt. So you might want to set the callback for watermark level and then do mapping/unmapping etc in the callback. This is typical model for dmaengines, we should follow that well BR -- ~Vinod

4 months

[PATCH RFC 0/2] accel: Add Arm Ethos-U NPU

by Rob Herring (Arm)

The Arm Ethos-U65/85 NPUs are designed for edge AI inference applications[0]. The driver works with Mesa Teflon. WIP support is available here[1]. The UAPI should also be compatible with the downstream driver stack[2] and Vela compiler though that has not been implemented. Testing so far has been on i.MX93 boards with Ethos-U65. Support for U85 is still todo. Only minor changes on driver side will be needed for U85 support. A git tree is here[3]. Rob [0] https://www.arm.com/products/silicon-ip-cpu?families=ethos%20npus [1] https://gitlab.freedesktop.org/tomeu/mesa.git ethos [2] https://gitlab.arm.com/artificial-intelligence/ethos-u/ [3] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git ethos Signed-off-by: Rob Herring (Arm) <robh(a)kernel.org> --- Rob Herring (Arm) (2): dt-bindings: npu: Add Arm Ethos-U65/U85 accel: Add Arm Ethos-U NPU driver .../devicetree/bindings/npu/arm,ethos.yaml | 79 +++ MAINTAINERS | 9 + drivers/accel/Kconfig | 1 + drivers/accel/Makefile | 1 + drivers/accel/ethos/Kconfig | 10 + drivers/accel/ethos/Makefile | 4 + drivers/accel/ethos/ethos_device.h | 186 ++++++ drivers/accel/ethos/ethos_drv.c | 412 ++++++++++++ drivers/accel/ethos/ethos_drv.h | 15 + drivers/accel/ethos/ethos_gem.c | 707 +++++++++++++++++++++ drivers/accel/ethos/ethos_gem.h | 46 ++ drivers/accel/ethos/ethos_job.c | 527 +++++++++++++++ drivers/accel/ethos/ethos_job.h | 41 ++ include/uapi/drm/ethos_accel.h | 262 ++++++++ 14 files changed, 2300 insertions(+) --- base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494 change-id: 20250715-ethos-3fdd39ef6f19 Best regards, -- Rob Herring (Arm) <robh(a)kernel.org>

4 months

Re: [PATCH v9 00/10] New DRM accel driver for Rockchip's RKNN NPU

by Heiko Stübner

Hi Jeff, Am Montag, 21. Juli 2025, 16:55:01 Mitteleuropäische Sommerzeit schrieb Jeff Hugo: > On 7/21/2025 3:17 AM, Tomeu Vizoso wrote: > > This series adds a new driver for the NPU that Rockchip includes in its > > newer SoCs, developed by them on the NVDLA base. > > > > In its current form, it supports the specific NPU in the RK3588 SoC. > > > > The userspace driver is part of Mesa and an initial draft can be found at: > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698 > > > > Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > > This (and the userspace component) appear ready for merge from what I > can tell. Tomeu is still working on his drm-misc access so I've offered > to merge on his behalf. Planning on waiting until Friday for any final > feedback to come in before doing so. sounds great. Just to make sure, you're planning to merge patches 1-6 (driver + binding) into drm-misc and I'll pick up the "arm64: dts: " patches 7-10 afterwards? Heiko

4 months

Re: [PATCH v9 06/10] dt-bindings: npu: rockchip,rknn: Add bindings

by Rob Herring (Arm)

On Mon, 21 Jul 2025 11:17:33 +0200, Tomeu Vizoso wrote: > Add the bindings for the Neural Processing Unit IP from Rockchip. > > v2: > - Adapt to new node structure (one node per core, each with its own > IOMMU) > - Several misc. fixes from Sebastian Reichel > > v3: > - Split register block in its constituent subblocks, and only require > the ones that the kernel would ever use (Nicolas Frattaroli) > - Group supplies (Rob Herring) > - Explain the way in which the top core is special (Rob Herring) > > v4: > - Change required node name to npu@ (Rob Herring and Krzysztof Kozlowski) > - Remove unneeded items: (Krzysztof Kozlowski) > - Fix use of minItems/maxItems (Krzysztof Kozlowski) > - Add reg-names to list of required properties (Krzysztof Kozlowski) > - Fix example (Krzysztof Kozlowski) > > v5: > - Rename file to rockchip,rk3588-rknn-core.yaml (Krzysztof Kozlowski) > - Streamline compatible property (Krzysztof Kozlowski) > > v6: > - Remove mention to NVDLA, as the hardware is only incidentally related > (Kever Yang) > - Mark pclk and npu clocks as required by all clocks (Rob Herring) > > v7: > - Remove allOf section, not needed now that all nodes require 4 clocks > (Heiko Stübner) > > v8: > - Remove notion of top core (Robin Murphy) > > Signed-off-by: Sebastian Reichel <sebastian.reichel(a)collabora.com> > Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> > Tested-by: Heiko Stuebner <heiko(a)sntech.de> > Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > --- > .../bindings/npu/rockchip,rk3588-rknn-core.yaml | 112 +++++++++++++++++++++ > 1 file changed, 112 insertions(+) > Reviewed-by: Rob Herring (Arm) <robh(a)kernel.org>

4 months

Re: [PATCH v8 08/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base

by Heiko Stübner

Am Sonntag, 13. Juli 2025, 10:38:58 Mitteleuropäische Sommerzeit schrieb Tomeu Vizoso: > See Chapter 36 "RKNN" from the RK3588 TRM (Part 1). > > The IP is divided in three cores, programmed independently. The first > core though is special, being able to delegate work to the other cores. > > The IOMMU of the first core is also special in that it has two subunits > (read/write?) that need to be programmed in sync. > > v2: > - Have one device for each NPU core (Sebastian Reichel) > - Have one device for each IOMMU (Sebastian Reichel) > - Correctly sort nodes (Diederik de Haas) > - Add rockchip,iommu compatible to IOMMU nodes (Sebastian Reichel) > > v3: > - Adapt to a split of the register block in the DT bindings (Nicolas > Frattaroli) > > v4: > - Adapt to changes in bindings > > v6: > - pclk and npu clocks are needed by all clocks (Rob Herring) > > v8: > - Remove notion of top core (Robin Murphy) > > Tested-by: Heiko Stuebner <heiko(a)sntech.de> > Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > --- > arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 91 +++++++++++++++++++++++++++ > 1 file changed, 91 insertions(+) > > diff --git a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi > index 1eddc69fd9c9ed95cdc810ba48d9683e3f82489a..dbd472feaa8b3f8c14597a48a4f5afbe3cb45b6a 100644 > --- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi > +++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi > @@ -1140,6 +1140,97 @@ power-domain@RK3588_PD_SDMMC { > }; > }; > > + rknn_core_0: npu@fdab0000 { > + compatible = "rockchip,rk3588-rknn-core"; > + reg = <0x0 0xfdab0000 0x0 0x1000>, > + <0x0 0xfdab1000 0x0 0x1000>, > + <0x0 0xfdab3000 0x0 0x1000>; > + reg-names = "pc", "cna", "core"; > + interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>; > + clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>, > + <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>; > + clock-names = "aclk", "hclk", "npu", "pclk"; > + assigned-clocks = <&scmi_clk SCMI_CLK_NPU>; > + assigned-clock-rates = <200000000>; > + resets = <&cru SRST_A_RKNN0>, <&cru SRST_H_RKNN0>; > + reset-names = "srst_a", "srst_h"; > + power-domains = <&power RK3588_PD_NPUTOP>; > + iommus = <&rknn_mmu_top>; > + status = "disabled"; > + }; > + > + rknn_mmu_top: iommu@fdab9000 { nit: phandle for the mmu should probably also follow the naming change? I.e. become rknn_mmu_0 ? > + compatible = "rockchip,rk3588-iommu", "rockchip,rk3568-iommu"; > + reg = <0x0 0xfdab9000 0x0 0x100>, > + <0x0 0xfdaba000 0x0 0x100>; > + interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>; > + clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>; > + clock-names = "aclk", "iface"; > + #iommu-cells = <0>; > + power-domains = <&power RK3588_PD_NPUTOP>; > + status = "disabled"; > + };

4 months, 1 week

Re: [PATCH 6.15 085/192] drm/gem: Acquire references on GEM handles for framebuffers

by Christian König

On 15.07.25 15:56, Simona Vetter wrote: > On Tue, Jul 15, 2025 at 03:43:08PM +0200, Christian König wrote: >> We are about to revert this patch. Not sure if backporting it makes sense at the moment. > > I think it still makes sense, at least as an interim fix. > > What I discussed with Thomas is that first we want to revert back from > gem_bo->dma_buf to gem_bo->import_attach.dmabuf everywhere. And those > patches are sprinkled over various branches/trees/releases. So it'll take > a while to catch them all. Meanwhile these two patches at least take care > of the worst fallout, they're already tested and in the pipeline - I don't > think it makes sense to hold them up and wait. > > It might take us until 6.17-rc1 until we've caught all the trees and made > sure the backports of those reverts have happened, and I don't think we > should wait that long. Works for me. I just wanted to point out that the patch backported might be reverted again. Christian. > > Cheers, Sima > >> Regards, >> Christian. >> >> On 15.07.25 15:13, Greg Kroah-Hartman wrote: >>> 6.15-stable review patch. If anyone has any objections, please let me know. >>> >>> ------------------ >>> >>> From: Thomas Zimmermann <tzimmermann(a)suse.de> >>> >>> commit 5307dce878d4126e1b375587318955bd019c3741 upstream. >>> >>> A GEM handle can be released while the GEM buffer object is attached >>> to a DRM framebuffer. This leads to the release of the dma-buf backing >>> the buffer object, if any. [1] Trying to use the framebuffer in further >>> mode-setting operations leads to a segmentation fault. Most easily >>> happens with driver that use shadow planes for vmap-ing the dma-buf >>> during a page flip. An example is shown below. >>> >>> [ 156.791968] ------------[ cut here ]------------ >>> [ 156.796830] WARNING: CPU: 2 PID: 2255 at drivers/dma-buf/dma-buf.c:1527 dma_buf_vmap+0x224/0x430 >>> [...] >>> [ 156.942028] RIP: 0010:dma_buf_vmap+0x224/0x430 >>> [ 157.043420] Call Trace: >>> [ 157.045898] <TASK> >>> [ 157.048030] ? show_trace_log_lvl+0x1af/0x2c0 >>> [ 157.052436] ? show_trace_log_lvl+0x1af/0x2c0 >>> [ 157.056836] ? show_trace_log_lvl+0x1af/0x2c0 >>> [ 157.061253] ? drm_gem_shmem_vmap+0x74/0x710 >>> [ 157.065567] ? dma_buf_vmap+0x224/0x430 >>> [ 157.069446] ? __warn.cold+0x58/0xe4 >>> [ 157.073061] ? dma_buf_vmap+0x224/0x430 >>> [ 157.077111] ? report_bug+0x1dd/0x390 >>> [ 157.080842] ? handle_bug+0x5e/0xa0 >>> [ 157.084389] ? exc_invalid_op+0x14/0x50 >>> [ 157.088291] ? asm_exc_invalid_op+0x16/0x20 >>> [ 157.092548] ? dma_buf_vmap+0x224/0x430 >>> [ 157.096663] ? dma_resv_get_singleton+0x6d/0x230 >>> [ 157.101341] ? __pfx_dma_buf_vmap+0x10/0x10 >>> [ 157.105588] ? __pfx_dma_resv_get_singleton+0x10/0x10 >>> [ 157.110697] drm_gem_shmem_vmap+0x74/0x710 >>> [ 157.114866] drm_gem_vmap+0xa9/0x1b0 >>> [ 157.118763] drm_gem_vmap_unlocked+0x46/0xa0 >>> [ 157.123086] drm_gem_fb_vmap+0xab/0x300 >>> [ 157.126979] drm_atomic_helper_prepare_planes.part.0+0x487/0xb10 >>> [ 157.133032] ? lockdep_init_map_type+0x19d/0x880 >>> [ 157.137701] drm_atomic_helper_commit+0x13d/0x2e0 >>> [ 157.142671] ? drm_atomic_nonblocking_commit+0xa0/0x180 >>> [ 157.147988] drm_mode_atomic_ioctl+0x766/0xe40 >>> [...] >>> [ 157.346424] ---[ end trace 0000000000000000 ]--- >>> >>> Acquiring GEM handles for the framebuffer's GEM buffer objects prevents >>> this from happening. The framebuffer's cleanup later puts the handle >>> references. >>> >>> Commit 1a148af06000 ("drm/gem-shmem: Use dma_buf from GEM object >>> instance") triggers the segmentation fault easily by using the dma-buf >>> field more widely. The underlying issue with reference counting has >>> been present before. >>> >>> v2: >>> - acquire the handle instead of the BO (Christian) >>> - fix comment style (Christian) >>> - drop the Fixes tag (Christian) >>> - rename err_ gotos >>> - add missing Link tag >>> >>> Suggested-by: Christian König <christian.koenig(a)amd.com> >>> Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de> >>> Link: https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/drm_gem.c#L241 # [1] >>> Cc: Thomas Zimmermann <tzimmermann(a)suse.de> >>> Cc: Anusha Srivatsa <asrivats(a)redhat.com> >>> Cc: Christian König <christian.koenig(a)amd.com> >>> Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> >>> Cc: Maxime Ripard <mripard(a)kernel.org> >>> Cc: Sumit Semwal <sumit.semwal(a)linaro.org> >>> Cc: "Christian König" <christian.koenig(a)amd.com> >>> Cc: linux-media(a)vger.kernel.org >>> Cc: dri-devel(a)lists.freedesktop.org >>> Cc: linaro-mm-sig(a)lists.linaro.org >>> Cc: <stable(a)vger.kernel.org> >>> Reviewed-by: Christian König <christian.koenig(a)amd.com> >>> Link: https://lore.kernel.org/r/20250630084001.293053-1-tzimmermann@suse.de >>> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> >>> --- >>> drivers/gpu/drm/drm_gem.c | 44 ++++++++++++++++++++++++--- >>> drivers/gpu/drm/drm_gem_framebuffer_helper.c | 16 +++++---- >>> drivers/gpu/drm/drm_internal.h | 2 + >>> 3 files changed, 51 insertions(+), 11 deletions(-) >>> >>> --- a/drivers/gpu/drm/drm_gem.c >>> +++ b/drivers/gpu/drm/drm_gem.c >>> @@ -212,6 +212,35 @@ void drm_gem_private_object_fini(struct >>> } >>> EXPORT_SYMBOL(drm_gem_private_object_fini); >>> >>> +static void drm_gem_object_handle_get(struct drm_gem_object *obj) >>> +{ >>> + struct drm_device *dev = obj->dev; >>> + >>> + drm_WARN_ON(dev, !mutex_is_locked(&dev->object_name_lock)); >>> + >>> + if (obj->handle_count++ == 0) >>> + drm_gem_object_get(obj); >>> +} >>> + >>> +/** >>> + * drm_gem_object_handle_get_unlocked - acquire reference on user-space handles >>> + * @obj: GEM object >>> + * >>> + * Acquires a reference on the GEM buffer object's handle. Required >>> + * to keep the GEM object alive. Call drm_gem_object_handle_put_unlocked() >>> + * to release the reference. >>> + */ >>> +void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj) >>> +{ >>> + struct drm_device *dev = obj->dev; >>> + >>> + guard(mutex)(&dev->object_name_lock); >>> + >>> + drm_WARN_ON(dev, !obj->handle_count); /* first ref taken in create-tail helper */ >>> + drm_gem_object_handle_get(obj); >>> +} >>> +EXPORT_SYMBOL(drm_gem_object_handle_get_unlocked); >>> + >>> /** >>> * drm_gem_object_handle_free - release resources bound to userspace handles >>> * @obj: GEM object to clean up. >>> @@ -242,8 +271,14 @@ static void drm_gem_object_exported_dma_ >>> } >>> } >>> >>> -static void >>> -drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj) >>> +/** >>> + * drm_gem_object_handle_put_unlocked - releases reference on user-space handles >>> + * @obj: GEM object >>> + * >>> + * Releases a reference on the GEM buffer object's handle. Possibly releases >>> + * the GEM buffer object and associated dma-buf objects. >>> + */ >>> +void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj) >>> { >>> struct drm_device *dev = obj->dev; >>> bool final = false; >>> @@ -268,6 +303,7 @@ drm_gem_object_handle_put_unlocked(struc >>> if (final) >>> drm_gem_object_put(obj); >>> } >>> +EXPORT_SYMBOL(drm_gem_object_handle_put_unlocked); >>> >>> /* >>> * Called at device or object close to release the file's >>> @@ -389,8 +425,8 @@ drm_gem_handle_create_tail(struct drm_fi >>> int ret; >>> >>> WARN_ON(!mutex_is_locked(&dev->object_name_lock)); >>> - if (obj->handle_count++ == 0) >>> - drm_gem_object_get(obj); >>> + >>> + drm_gem_object_handle_get(obj); >>> >>> /* >>> * Get the user-visible handle using idr. Preload and perform >>> --- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c >>> +++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c >>> @@ -99,7 +99,7 @@ void drm_gem_fb_destroy(struct drm_frame >>> unsigned int i; >>> >>> for (i = 0; i < fb->format->num_planes; i++) >>> - drm_gem_object_put(fb->obj[i]); >>> + drm_gem_object_handle_put_unlocked(fb->obj[i]); >>> >>> drm_framebuffer_cleanup(fb); >>> kfree(fb); >>> @@ -182,8 +182,10 @@ int drm_gem_fb_init_with_funcs(struct dr >>> if (!objs[i]) { >>> drm_dbg_kms(dev, "Failed to lookup GEM object\n"); >>> ret = -ENOENT; >>> - goto err_gem_object_put; >>> + goto err_gem_object_handle_put_unlocked; >>> } >>> + drm_gem_object_handle_get_unlocked(objs[i]); >>> + drm_gem_object_put(objs[i]); >>> >>> min_size = (height - 1) * mode_cmd->pitches[i] >>> + drm_format_info_min_pitch(info, i, width) >>> @@ -193,22 +195,22 @@ int drm_gem_fb_init_with_funcs(struct dr >>> drm_dbg_kms(dev, >>> "GEM object size (%zu) smaller than minimum size (%u) for plane %d\n", >>> objs[i]->size, min_size, i); >>> - drm_gem_object_put(objs[i]); >>> + drm_gem_object_handle_put_unlocked(objs[i]); >>> ret = -EINVAL; >>> - goto err_gem_object_put; >>> + goto err_gem_object_handle_put_unlocked; >>> } >>> } >>> >>> ret = drm_gem_fb_init(dev, fb, mode_cmd, objs, i, funcs); >>> if (ret) >>> - goto err_gem_object_put; >>> + goto err_gem_object_handle_put_unlocked; >>> >>> return 0; >>> >>> -err_gem_object_put: >>> +err_gem_object_handle_put_unlocked: >>> while (i > 0) { >>> --i; >>> - drm_gem_object_put(objs[i]); >>> + drm_gem_object_handle_put_unlocked(objs[i]); >>> } >>> return ret; >>> } >>> --- a/drivers/gpu/drm/drm_internal.h >>> +++ b/drivers/gpu/drm/drm_internal.h >>> @@ -161,6 +161,8 @@ void drm_sysfs_lease_event(struct drm_de >>> >>> /* drm_gem.c */ >>> int drm_gem_init(struct drm_device *dev); >>> +void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj); >>> +void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj); >>> int drm_gem_handle_create_tail(struct drm_file *file_priv, >>> struct drm_gem_object *obj, >>> u32 *handlep); >>> >>> >> >

4 months, 1 week

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig July 2025