Linaro-mm-sig January 2025

linaro-mm-sig@lists.linaro.org

13 participants
40 discussions

Re: [RFC PATCH 08/12] vfio/pci: Create host unaccessible dma-buf for private device

by Jason Gunthorpe

On Mon, Jan 20, 2025 at 08:45:51PM +1100, Alexey Kardashevskiy wrote: > > For CC I'm expecting the KVM fd to be the handle for the cVM, so any > > RPCs that want to call into the secure world need the KVM FD to get > > the cVM's identifier. Ie a "bind to cVM" RPC will need the PCI > > information and the cVM's handle. > > And keep KVM fd open until unbind? Or just for the short time to call the > PSP? iommufd will keep the KVM fd alive so long as the vIOMMU object exists. Other uses for kvm require it to work like this. > > But it also seems to me that VFIO should be able to support putting > > the device into the RUN state without involving KVM or cVMs. > > AMD's TDI bind handler in the PSP wants a guest handle ("GCTX") and a guest > device BDFn, and VFIO has no desire to dive into this KVM business beyond > IOMMUFD. As in my other email, VFIO is not restricted to running VMs, useful things should be available to apps like DPDK. There is a use case for using TDISP and getting devices up into an ecrypted/attested state on pure bare metal without any KVM, VFIO should work in that use case too. Jason

1 year

Re: [PATCH v7 0/7] Improve gpu_scheduler trace events + uAPI

by Christian König

I can't see patch #1 in my inbox for some reason, but I already know what it does from your repository. Feel free to add Reviewed-by: Christian König <christian.koenig(a)amd.com> to the entire series. Regards, Christian. Am 31.01.25 um 12:02 schrieb Pierre-Eric Pelloux-Prayer: > Hi, > > The initial goal of this series was to improve the drm and amdgpu > trace events to be able to expose more of the inner workings of > the scheduler and drivers to developers via tools. > > Then, the series evolved to become focused only on gpu_scheduler. > The changes around vblank events will be part of a different > series, as well as the amdgpu ones. > > Moreover Sima suggested to make some trace events stable uAPI, > so tools can rely on them long term. > > The first patches extend and cleanup the gpu scheduler events. > > The last one adds a documentation entry in drm-uapi.rst. > > Changes since v6: > * Addressed comments from Philipp, Tvrtko, Steven > * The commit storing drm_client_id in sched_fence adds an extra > field which looks like a duplicate of the owner field. Currently > amdgpu uses the owner field with some magic values, so we have to > have 2 separate fields for now, but ultimately one could be removed. > Similarly storing the drm_client_id in the sched_entity is not > really possible as there's nothing preventing a driver to use a > sched_entity with multiple drm_file instances. > > > Useful links: > - userspace tool using the updated events: > https://gitlab.freedesktop.org/tomstdenis/umr/-/merge_requests/37 > - v6: > https://lists.freedesktop.org/archives/dri-devel/2024-November/477644.html > > Pierre-Eric Pelloux-Prayer (7): > drm/debugfs: output client_id in in drm_clients_info > drm/sched: store the drm client_id in drm_sched_fence > drm/sched: add device name to the drm_sched_process_job event > drm/sched: cleanup gpu_scheduler trace events > drm/sched: trace dependencies for gpu jobs > drm/sched: add the drm_client_id to the drm_sched_run/exec_job events > drm/doc: document some tracepoints as uAPI > > Documentation/gpu/drm-uapi.rst | 19 +++ > drivers/accel/amdxdna/aie2_ctx.c | 3 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 3 +- > drivers/gpu/drm/drm_debugfs.c | 10 +- > drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 2 +- > drivers/gpu/drm/imagination/pvr_job.c | 2 +- > drivers/gpu/drm/imagination/pvr_queue.c | 5 +- > drivers/gpu/drm/imagination/pvr_queue.h | 2 +- > drivers/gpu/drm/lima/lima_gem.c | 2 +- > drivers/gpu/drm/lima/lima_sched.c | 6 +- > drivers/gpu/drm/lima/lima_sched.h | 3 +- > drivers/gpu/drm/msm/msm_gem_submit.c | 8 +- > drivers/gpu/drm/nouveau/nouveau_sched.c | 3 +- > drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- > drivers/gpu/drm/panthor/panthor_drv.c | 3 +- > drivers/gpu/drm/panthor/panthor_mmu.c | 2 +- > drivers/gpu/drm/panthor/panthor_sched.c | 5 +- > drivers/gpu/drm/panthor/panthor_sched.h | 3 +- > .../gpu/drm/scheduler/gpu_scheduler_trace.h | 123 ++++++++++++++---- > drivers/gpu/drm/scheduler/sched_entity.c | 8 +- > drivers/gpu/drm/scheduler/sched_fence.c | 4 +- > drivers/gpu/drm/scheduler/sched_main.c | 8 +- > drivers/gpu/drm/v3d/v3d_submit.c | 2 +- > drivers/gpu/drm/xe/xe_sched_job.c | 3 +- > include/drm/gpu_scheduler.h | 12 +- > 28 files changed, 192 insertions(+), 64 deletions(-) >

1 year, 2 months

Re: [RFC PATCH 1/5] dt-bindings: dma: Add CMA Heap bindings

by Rob Herring

On Thu, Jan 30, 2025 at 01:08:57PM +0000, Florent Tomasin wrote: > Introduce a CMA Heap dt-binding allowing custom > CMA heap registrations. > > * Note to the reviewers: > The patch was used for the development of the protected mode > feature in Panthor CSF kernel driver and is not initially thought > to land in the Linux kernel. It is mostly relevant if someone > wants to reproduce the environment of testing. Please, raise > interest if you think the patch has value in the Linux kernel. Why would panthor need CMA, it has an MMU. In any case, I agree with Maxime that this is redundant. Rob

1 year, 2 months

Re: [RFC PATCH 0/5] drm/panthor: Protected mode support for Mali CSF GPUs

by Maxime Ripard

Hi, I started to review it, but it's probably best to discuss it here. On Thu, Jan 30, 2025 at 01:08:56PM +0000, Florent Tomasin wrote: > Hi, > > This is a patch series covering the support for protected mode execution in > Mali Panthor CSF kernel driver. > > The Mali CSF GPUs come with the support for protected mode execution at the > HW level. This feature requires two main changes in the kernel driver: > > 1) Configure the GPU with a protected buffer. The system must provide a DMA > heap from which the driver can allocate a protected buffer. > It can be a carved-out memory or dynamically allocated protected memory region. > Some system includes a trusted FW which is in charge of the protected memory. > Since this problem is integration specific, the Mali Panthor CSF kernel > driver must import the protected memory from a device specific exporter. Why do you need a heap for it in the first place? My understanding of your series is that you have a carved out memory region somewhere, and you want to allocate from that carved out memory region your buffers. How is that any different from using a reserved-memory region, adding the reserved-memory property to the GPU device and doing all your allocation through the usual dma_alloc_* API? Or do you expect to have another dma-buf heap that would call into the firmware to perform the allocations? The semantics of the CMA heap allocations is a concern too. Another question is how would you expect something like a secure video-playback pipeline to operate with that kind of interface. Assuming you have a secure codec, you would likely get that protected buffer from the codec, right? So the most likely scenario would be to then import that dma-buf into the GPU driver, but not allocate the buffer from it. Overall, I think a better abstraction would be to have a heap indeed to allocate your protected buffers from, and then import them in the devices that need them. The responsibility would be on the userspace to do so, but it already kind of does with your design anyway. Maxime

1 year, 2 months

Re: [RFC PATCH 2/5] cma-heap: Allow registration of custom cma heaps

by Maxime Ripard

Hi, On Thu, Jan 30, 2025 at 01:08:58PM +0000, Florent Tomasin wrote: > This patch introduces a cma-heap probe function, allowing > users to register custom cma heaps in the device tree. > > A "memory-region" is bound to the cma heap at probe time > allowing allocation of DMA buffers from that heap. > > Use cases: > - registration of carved out secure heaps. Some devices > are implementing secure memory by reserving a specific > memory regions for that purpose. For example, this is the > case of platforms making use of early version of > ARM TrustZone. In such a case, the CMA heap would de-facto become un-mappable for userspace, right? > - registration of multiple memory regions at different > locations for efficiency or HW integration reasons. > For example, a peripheral may expect to share data at a > specific location in RAM. This information could have been > programmed by a FW prior to the kernel boot. How would you differentiate between them? Maxime

1 year, 2 months

Re: [RFC PATCH 1/5] dt-bindings: dma: Add CMA Heap bindings

by Maxime Ripard

Hi, On Thu, Jan 30, 2025 at 01:08:57PM +0000, Florent Tomasin wrote: > Introduce a CMA Heap dt-binding allowing custom > CMA heap registrations. > > * Note to the reviewers: > The patch was used for the development of the protected mode > feature in Panthor CSF kernel driver and is not initially thought > to land in the Linux kernel. It is mostly relevant if someone > wants to reproduce the environment of testing. Please, raise > interest if you think the patch has value in the Linux kernel. > > Signed-off-by: Florent Tomasin <florent.tomasin(a)arm.com> > --- > .../devicetree/bindings/dma/linux,cma.yml | 43 +++++++++++++++++++ > 1 file changed, 43 insertions(+) > create mode 100644 Documentation/devicetree/bindings/dma/linux,cma.yml > > diff --git a/Documentation/devicetree/bindings/dma/linux,cma.yml b/Documentation/devicetree/bindings/dma/linux,cma.yml > new file mode 100644 > index 000000000000..c532e016bbe5 > --- /dev/null > +++ b/Documentation/devicetree/bindings/dma/linux,cma.yml > @@ -0,0 +1,43 @@ > +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause > +%YAML 1.2 > +--- > +$id: http://devicetree.org/schemas/dma/linux-cma.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Custom Linux CMA heap > + > +description: > + The custom Linux CMA heap device tree node allows registering > + of multiple CMA heaps. > + > + The CMA heap name will match the node name of the "memory-region". > + > +properties: > + compatible: > + enum: > + - linux,cma > + > + memory-region: > + maxItems: 1 > + description: | > + Phandle to the reserved memory node associated with the CMA Heap. > + The reserved memory node must follow this binding convention: > + - Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt > + > +examples: > + - | > + reserved-memory { > + #address-cells = <2>; > + #size-cells = <2>; > + > + custom_cma_heap: custom-cma-heap { > + compatible = "shared-dma-pool"; > + reg = <0x0 0x90600000 0x0 0x1000000>; > + reusable; > + }; > + }; > + > + device_cma_heap: device-cma-heap { > + compatible = "linux,cma"; > + memory-region = <&custom_cma_heap>; > + }; Isn't it redundant with the linux,cma-default shared-dma-pool property already? Maxime

1 year, 2 months

Re: [RFC PATCH 3/5] dt-bindings: gpu: Add protected heap name to Mali Valhall CSF binding

by Krzysztof Kozlowski

On 30/01/2025 14:08, Florent Tomasin wrote: > Allow mali-valhall-csf driver to retrieve a protected > heap at probe time by passing the name of the heap > as attribute to the device tree GPU node. Please wrap commit message according to Linux coding style / submission process (neither too early nor over the limit): https://elixir.bootlin.com/linux/v6.4-rc1/source/Documentation/process/subm… Why this cannot be passed by phandle, just like all reserved regions? From where do you take these protected heaps? Firmware? This would explain why no relation is here (no probe ordering, no device links, nothing connecting separate devices). Best regards, Krzysztof

1 year, 2 months

[PATCH] dma-buf: fix timeout handling in dma_resv_wait_timeout v2

by Christian König

Even the kerneldoc says that with a zero timeout the function should not wait for anything, but still return 1 to indicate that the fences are signaled now. Unfortunately that isn't what was implemented, instead of only returning 1 we also waited for at least one jiffies. Fix that by adjusting the handling to what the function is actually documented to do. v2: improve code readability Reported-by: Marek Olšák <marek.olsak(a)amd.com> Reported-by: Lucas Stach <l.stach(a)pengutronix.de> Signed-off-by: Christian König <christian.koenig(a)amd.com> Cc: <stable(a)vger.kernel.org> --- drivers/dma-buf/dma-resv.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 5f8d010516f0..c78cdae3deaf 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -684,11 +684,13 @@ long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage, dma_resv_iter_begin(&cursor, obj, usage); dma_resv_for_each_fence_unlocked(&cursor, fence) { - ret = dma_fence_wait_timeout(fence, intr, ret); - if (ret <= 0) { - dma_resv_iter_end(&cursor); - return ret; - } + ret = dma_fence_wait_timeout(fence, intr, timeout); + if (ret <= 0) + break; + + /* Even for zero timeout the return value is 1 */ + if (timeout) + timeout = ret; } dma_resv_iter_end(&cursor); -- 2.34.1

1 year, 2 months

[PATCH] dma-buf: fix timeout handling in dma_resv_wait_timeout

by Christian König

Even the kerneldoc says that with a zero timeout the function should not wait for anything, but still return 1 to indicate that the fences are signaled now. Unfortunately that isn't what was implemented, instead of only returning 1 we also waited for at least one jiffies. Fix that by adjusting the handling to what the function is actually documented to do. Reported-by: Marek Olšák <marek.olsak(a)amd.com> Reported-by: Lucas Stach <l.stach(a)pengutronix.de> Signed-off-by: Christian König <christian.koenig(a)amd.com> Cc: <stable(a)vger.kernel.org> --- drivers/dma-buf/dma-resv.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 5f8d010516f0..ae92d9d2394d 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -684,11 +684,12 @@ long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage, dma_resv_iter_begin(&cursor, obj, usage); dma_resv_for_each_fence_unlocked(&cursor, fence) { - ret = dma_fence_wait_timeout(fence, intr, ret); - if (ret <= 0) { - dma_resv_iter_end(&cursor); - return ret; - } + ret = dma_fence_wait_timeout(fence, intr, timeout); + if (ret <= 0) + break; + + /* Even for zero timeout the return value is 1 */ + timeout = min(timeout, ret); } dma_resv_iter_end(&cursor); -- 2.34.1

1 year, 2 months

Re: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI

by Christian König

Am 22.01.25 um 12:04 schrieb Simona Vetter: > On Tue, Jan 21, 2025 at 01:36:33PM -0400, Jason Gunthorpe wrote: >> On Tue, Jan 21, 2025 at 05:11:32PM +0100, Simona Vetter wrote: >>> On Mon, Jan 20, 2025 at 03:48:04PM -0400, Jason Gunthorpe wrote: >>>> On Mon, Jan 20, 2025 at 07:50:23PM +0100, Simona Vetter wrote: >>>>> On Mon, Jan 20, 2025 at 01:59:01PM -0400, Jason Gunthorpe wrote: >>>>>> On Mon, Jan 20, 2025 at 01:14:12PM +0100, Christian König wrote: >>>>>> What is going wrong with your email? You replied to Simona, but >>>>>> Simona Vetter <simona.vetter(a)ffwll.ch> is dropped from the To/CC >>>>>> list??? I added the address back, but seems like a weird thing to >>>>>> happen. >>>>> Might also be funny mailing list stuff, depending how you get these. I >>>>> read mails over lore and pretty much ignore cc (unless it's not also on >>>>> any list, since those tend to be security issues) because I get cc'ed on >>>>> way too much stuff for that to be a useful signal. >>>> Oh I see, you are sending a Mail-followup-to header that excludes your >>>> address, so you don't get any emails at all.. My mutt is dropping you >>>> as well. I'm having all kind of funny phenomena with AMDs mail servers since coming back from xmas vacation. From the news it looks like Outlook on Windows has a new major security issue where just viewing a mail can compromise the system and my educated guess is that our IT guys went into panic mode because of this and has changed something. >> [SNIP] >> I have been assuming that dmabuf mmap remains unchanged, that >> exporters will continue to implement that mmap() callback as today. That sounds really really good to me because that was my major concern when you noted that you want to have PFNs to build up KVM page tables. But you don't want to handle mmap() on your own, you basically don't want to have a VMA for this stuff at all, correct? >> My main interest has been what data structure is produced in the >> attach APIs. >> >> Eg today we have a struct dma_buf_attachment that returns a sg_table. >> >> I'm expecting some kind of new data structure, lets call it "physical >> list" that is some efficient coding of meta/addr/len tuples that works >> well with the new DMA API. Matthew has been calling this thing phyr.. I would not use a data structure at all. Instead we should have something like an iterator/cursor based approach similar to what the new DMA API is doing. >> So, I imagine, struct dma_buf_attachment gaining an optional >> feature negotiation and then we have in dma_buf_attachment: >> >> union { >> struct sg_table *sgt; >> struct physical_list *phyr; >> }; >> >> That's basicaly it, an alternative to scatterlist that has a clean >> architecture. I would rather suggest something like dma_buf_attachment() gets offset and size to map and returns a cursor object you can use to get your address, length and access attributes. And then you can iterate over this cursor and fill in your importer data structure with the necessary information. This way neither the exporter nor the importer need to convert their data back and forth between their specific representations of the information. >> Now, if you are asking if the current dmabuf mmap callback can be >> improved with the above? Maybe? phyr should have the neccessary >> information inside it to populate a VMA - eventually even fully >> correctly with all the right cachable/encrypted/forbidden/etc flags. That won't work like this. See the exporter needs to be informed about page faults on the VMA to eventually wait for operations to end and sync caches. Otherwise we either potentially allow access to freed up or re-used memory or run into issues with device cache coherency. >> So, you could imagine that exporters could just have one routine to >> generate the phyr list and that goes into the attachment, goes into >> some common code to fill VMA PTEs, and some other common code that >> will convert it into the DMABUF scatterlist. If performance is not a >> concern with these data structure conversions it could be an appealing >> simplification. >> >> And yes, I could imagine the meta information being descriptive enough >> to support the private interconnect cases, the common code could >> detect private meta information and just cleanly fail. > I'm kinda leaning towards entirely separate dma-buf interfaces for the new > phyr stuff, because I fear that adding that to the existing ones will only > make the chaos worse. But that aside sounds all reasonable, and even that > could just be too much worry on my side and mixing phyr into existing > attachments (with a pile of importer/exporter flags probably) is fine. I lean into the other direction. Dmitry and Thomas have done a really good job at cleaning up all the interaction between dynamic and static exporters / importers. Especially that we now have consistent locking for map_dma_buf() and unmap_dma_buf() should make that transition rather straight forward. > For the existing dma-buf importers/exporters I'm kinda hoping for a pure > dma_addr_t based list eventually. Going all the way to a phyr based > approach for everyone might be too much churn, there's some real bad cruft > there. It's not going to work for every case, but it covers a lot of them > and might be less pain for existing importers. The point is we have use cases that won't work without exchanging DMA addresses any more. For example we have cases with multiple devices are in the same IOMMU domain and re-using their DMA address mappings. > But in theory it should be possible to use phyr everywhere eventually, as > long as there's no obviously api-rules-breaking way to go from a phyr back to > a struct page even when that exists. I would rather say we should stick to DMA addresses as much as possible. What we can do is to add an address space description to the addresses, e.g. if it's a PCIe BUS addr in IOMMU domain X, or of it's a device private bus addr or in the case of sharing with iommufd and KVM PFNs. Regards, Christian. >>> At least the device mapping / dma_buf_attachment >>> side should be doable with just the pfn and the new dma-api? >> Yes, that would be my first goal post. Figure out some meta >> information and a container data structure that allows struct >> page-less P2P mapping through the new DMA API. >> >>>> I'm hoping we can get to something where we describe not just how the >>>> pfns should be DMA mapped, but also can describe how they should be >>>> CPU mapped. For instance that this PFN space is always mapped >>>> uncachable, in CPU and in IOMMU. >>> I was pondering whether dma_mmap and friends would be a good place to >>> prototype this and go for a fully generic implementation. But then even >>> those have _wc/_uncached variants. >> Given that the inability to correctly DMA map P2P MMIO without struct >> page is a current pain point and current source of hacks in dmabuf >> exporters, I wanted to make resolving that a priority. >> >> However, if you mean what I described above for "fully generic [dmabuf >> mmap] implementation", then we'd have the phyr datastructure as a >> dependency to attempt that work. >> >> phyr, and particularly the meta information, has a number of >> stakeholders. I was thinking of going first with rdma's memory >> registration flow because we are now pretty close to being able to do >> such a big change, and it can demonstrate most of the requirements. >> >> But that doesn't mean mmap couldn't go concurrently on the same agreed >> datastructure if people are interested. > Yeah cpu mmap needs a lot more, going with a very limited p2p use-case > first only makes sense. > >>>> We also have current bugs in the iommu/vfio side where we are fudging >>>> CC stuff, like assuming CPU memory is encrypted (not always true) and >>>> that MMIO is non-encrypted (not always true) >>> tbf CC pte flags I just don't grok at all. I've once tried to understand >>> what current exporters and gpu drivers do and just gave up. But that's >>> also a bit why I'm worried here because it's an enigma to me. >> For CC, inside the secure world, is some information if each PFN >> inside the VM is 'encrypted' or not. Any VM PTE (including the IOPTEs) >> pointing at the PFN must match the secure world's view of >> 'encrypted'. The VM can ask the secure world to change its view at >> runtime. >> >> The way CC has been bolted on to the kernel so far laregly hides this >> from drivers, so it is troubled to tell in driver code if the PFN you >> have is 'encrypted' or not. Right now the general rule (that is not >> always true) is that struct page CPU memory is encrypted and >> everything else is decrypted. >> >> So right now, you can mostly ignore it and the above assumption >> largely happens for you transparently. >> >> However, soon we will have encrypted P2P MMIO which will stress this >> hiding strategy. > It's already breaking with stuff like virtual gpu drivers, vmwgfx is > fiddling around with these bits (at least last I tried to understand this > all) and I think a few others do too. > >>>>> I thought iommuv2 (or whatever linux calls these) has full fault support >>>>> and could support current move semantics. But yeah for iommu without >>>>> fault support we need some kind of pin or a newly formalized revoke model. >>>> No, this is HW dependent, including PCI device, and I'm aware of no HW >>>> that fully implements this in a way that could be useful to implement >>>> arbitary move semantics for VFIO.. >>> Hm I thought we've had at least prototypes floating around of device fault >>> repair, but I guess that only works with ATS/pasid stuff and not general >>> iommu traffic from devices. Definitely needs some device cooperation since >>> the timeouts of a full fault are almost endless. >> Yes, exactly. What all real devices I'm aware have done is make a >> subset of their traffic work with ATS and PRI, but not all their >> traffic. Without *all* traffic you can't make any generic assumption >> in the iommu that a transient non-present won't be fatal to the >> device. >> >> Stuff like dmabuf move semantics rely on transient non-present being >> non-disruptive... > Ah now I get it, at the iommu level you have to pessimistically assume > whether a device can handle a fault, and none can for all traffic. I was > thinking too much about the driver level where generally the dma-buf you > importer are only used for the subset of device functions that can cope > with faults on many devices. > > Cheers, Sima

1 year, 2 months

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig January 2025