On Sat, Jan 11, 2025 at 11:48:06AM +0800, Xu Yilun wrote:
> > > > can be sure what is the correct UAPI. In other words, make the
> > > > VFIO device into a CC device should also prevent mmaping it and so on.
> > >
> > > My idea is prevent mmaping first, then allow VFIO device into CC dev (TDI).
> >
> > I think you need to start the TDI process much earlier. Some arches
> > are going to need work to prepare the TDI before the VM is started.
>
> Could you elaborate more on that? AFAICS Intel & AMD are all good on
> "late bind", but not sure for other architectures.
I'm not sure about this, the topic has been confused a bit, and people
often seem to misunderstand what the full scenario actually is. :\
What I'm talking about here is that you will tell the secure world to
create a vPCI function that has the potential to become a secure "TDI
run" down the road. The VM will decide when it reaches the run state.
This is needed so the secure world can prepare anything it needs prior
to starting the VM. Setting up secure vIOMMU emulation, for instance. I
expect ARM will need this, and I'd be surprised if AMD actually doesn't
in the full scenario with secure vIOMMU.
It should not be a surprise to the secure world, after the VM has
started, that it suddenly learns about a vPCI function that wants to be
secure. This should all be pre-arranged as far as possible before
starting the VM, even if a lot of the steps happen after the VM starts
running (or maybe don't happen at all).
Jason
On Fri, Jan 10, 2025 at 08:34:55PM +0100, Simona Vetter wrote:
> So if I'm getting this right, what you need from a functional pov is a
> dma_buf_tdx_mmap? Because due to tdx restrictions, the normal dma_buf_mmap
> is not going to work I guess?
Don't want something TDX specific!
There is a general desire, CC being one motivation and performance
another, to stop using VMAs and mmaps as a way to exchange memory
between two entities. Instead we want to use FDs.
We now have memfd and guestmemfd that are usable with
memfd_pin_folios() - this covers pinnable CPU memory.
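Roughly, pinning through that interface looks like the sketch below
(error handling trimmed; memfd_file, start and end stand in for
whatever the importer already has at hand):

  struct folio *folios[16];
  pgoff_t first_offset;
  long nr;

  /* Pin the folios backing [start, end) of the memfd. */
  nr = memfd_pin_folios(memfd_file, start, end, folios,
                        ARRAY_SIZE(folios), &first_offset);
  if (nr < 0)
          return nr;

  /* ... build the importer's own mapping from the folios ... */

  /* Drop the pins again once the mapping is torn down. */
  unpin_folios(folios, nr);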
And for a long time we had DMABUF which is for all the other wild
stuff, and it supports movable memory too.
So, the normal DMABUF semantics with reservation locking and move
notifiers seem workable to me here. They are broadly similar enough to
the mmu notifier locking that they can serve the same job of updating
page tables.
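For a dynamic importer that means implementing move_notify and tearing
down its mapping under the reservation lock, roughly like the sketch
below (my_importer and my_unmap are made-up placeholders):

  static void my_move_notify(struct dma_buf_attachment *attach)
  {
          struct my_importer *imp = attach->importer_priv;

          /* Called with attach->dmabuf->resv already held by the
           * exporter; invalidate our page tables, the memory may move
           * (or be revoked) once we return. */
          my_unmap(imp);
  }

  static const struct dma_buf_attach_ops my_attach_ops = {
          .allow_peer2peer = true,
          .move_notify = my_move_notify,
  };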
> Also another thing that's a bit tricky is that kvm kinda has a 3rd dma-buf
> memory model:
> - permanently pinned dma-buf, they never move
> - dynamic dma-buf, they move through ->move_notify and importers can remap
> - revocable dma-buf, which thus far only exist for pci mmio resources
I would like to see the importers be able to discover which one is
going to be used, because we have RDMA cases where we can support 1
and 3 but not 2.
revocable doesn't require page faulting as it is a terminal condition.
> Since we're leaning even more on that 3rd model I'm wondering whether we
> should make it something official. Because the existing dynamic importers
> do very much assume that re-acquiring the memory after move_notify will
> work. But for the revocable use-case the entire point is that it will
> never work.
> I feel like that's a concept we need to make explicit, so that dynamic
> importers can reject such memory if necessary.
It strikes me as strange that HW can do page faulting, so it can
support #2, but it can't handle a non-present fault?
> So yeah there's a bunch of tricky lifetime questions that need to be
> sorted out with proper design I think, and the current "let's just use pfn
> directly" proposal hides them all under the rug.
I don't think these two things are connected. The lifetime model that
KVM needs to work with the EPT, and that VFIO needs for its MMIO,
definitely should be reviewed and evaluated.
But it is completely orthogonal to allowing iommufd and kvm to access
the CPU PFN to use in their mapping flows, instead of the
dma_addr_t.
What I want to get to is a replacement for scatter list in DMABUF that
is an array of arrays, roughly like:
  struct memory_chunks {
          struct memory_p2p_provider *provider;
          struct bio_vec addrs[];
  };

  int (*dmabuf_get_memory)(struct memory_chunks **chunks, size_t *num_chunks);
This can represent all forms of memory: P2P, private, CPU, etc., and
would be efficient with the new DMA API.
This is similar to the structure BIO has, and it composes nicely with
a future pin_user_pages() and memfd_pin_folios().
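None of this exists yet, but purely to illustrate the shape of the
proposal, an importer might consume one such chunk roughly like this
(dmabuf_get_memory(), num_addrs and map_one_range() are all
hypothetical placeholders):

  struct memory_chunks *chunk;
  size_t num_addrs, i;
  int ret;

  ret = dmabuf_get_memory(&chunk, &num_addrs);
  if (ret)
          return ret;

  /* The provider tells the importer which mapping path applies
   * (P2P, private, plain CPU memory, ...). */
  for (i = 0; i < num_addrs; i++)
          map_one_range(chunk->provider, &chunk->addrs[i]);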
Jason
On Fri, Jan 10, 2025 at 08:24:22PM +0100, Simona Vetter wrote:
> On Thu, Jan 09, 2025 at 01:56:02AM +0800, Xu Yilun wrote:
> > > > > 5) iommufd and kvm are both using CPU addresses without DMA. No
> > > > > exporter mapping is possible
> > > >
> > > > We have customers using both KVM and XEN with DMA-buf, so I can clearly
> > > > confirm that this isn't true.
> > >
> > > Today they are mmaping the dma-buf into a VMA and then using KVM's
> > > follow_pfn() flow to extract the CPU pfn from the PTE. Any mmapable
> > > dma-buf must have a CPU PFN.
> >
> > Yes, the final target for KVM is still the CPU PFN, just with the help
> > of CPU mapping table.
> >
> > I also found the xen gntdev-dmabuf is calculating pfn from mapped
> > sgt.
>
> See the comment, it's ok because it's a fake device with fake iommu and
> the xen code has special knowledge to peek behind the curtain.
  /*
   * Now convert sgt to array of gfns without accessing underlying pages.
   * It is not allowed to access the underlying struct page of an sg table
   * exported by DMA-buf, but since we deal with special Xen dma device here
   * (not a normal physical one) look at the dma addresses in the sg table
   * and then calculate gfns directly from them.
   */
  for_each_sgtable_dma_page(sgt, &sg_iter, 0) {
          dma_addr_t addr = sg_page_iter_dma_address(&sg_iter);
          unsigned long pfn = bfn_to_pfn(XEN_PFN_DOWN(dma_to_phys(dev, addr)));
*barf*
Can we please all agree that this is a horrible abuse of the DMA API
and let's not point to it as some acceptable "solution"? KVM and
iommufd do not have fake struct devices with fake iommus.
Jason
On Fri, Jan 10, 2025 at 12:40:28AM +0800, Xu Yilun wrote:
> So then we face with the shared <-> private device conversion in CoCo VM,
> and in turn shared <-> private MMIO conversion. MMIO region has only one
> physical backend so it is a bit like in-place conversion which is
> complicated. I wanna simplify the MMIO conversion routine based on the fact
> that VMM never needs to access assigned MMIO for feature emulation, so
> always disallow userspace MMIO mapping during the whole lifecycle. That's
> why the flag is introduced.
The VMM can simply not map it for these cases. As part of the TDI
flow the kernel can validate that it is not mapped.
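A sketch of what that validation could look like, assuming the driver
keeps an address_space covering its MMIO mmaps; mmio_inode is a
hypothetical stand-in for however that is tracked:

  /* Hypothetical check in a future TDI-bind path: refuse to make the
   * device private while userspace still has its MMIO mmapped. */
  if (mapping_mapped(mmio_inode->i_mapping))
          return -EBUSY;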
> > can be sure what is the correct UAPI. In other words, make the
> > VFIO device into a CC device should also prevent mmaping it and so on.
>
> My idea is prevent mmaping first, then allow VFIO device into CC dev (TDI).
I think you need to start the TDI process much earlier. Some arches
are going to need work to prepare the TDI before the VM is started.
The other issue here is that Intel is somewhat different from others
and when we build uapi for TDI it has to accommodate everyone.
> Yes. It carries out the idea of "KVM maps MMIO resources without firstly
> mapping into the host" even for normal VM. That's why I think it could
> be an independent patchset.
Yes, just remove this patch and the other TDI-focused stuff. Keep just
the infrastructure to move to FD-based mapping instead of VMAs.
Jason
On Thu, Jan 09, 2025 at 12:57:58AM +0800, Xu Yilun wrote:
> On Wed, Jan 08, 2025 at 09:30:26AM -0400, Jason Gunthorpe wrote:
> > On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote:
> > > Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as
> > > for private assignment. For these private assigned devices, disallow
> > > host accessing their MMIO resources.
> >
> > Why? Shouldn't the VMM simply not call mmap? Why does the kernel have
> > to enforce this?
>
> MM.. maybe I should not say 'host', instead 'userspace'.
>
> I think the kernel part VMM (KVM) has the responsibility to enforce the
> correct behavior of the userspace part VMM (QEMU). QEMU has no way to
> touch private memory/MMIO intentionally or accidently. IIUC that's one
> of the initiative guest_memfd is introduced for private memory. Private
> MMIO follows.
Okay, but then why is it a flag like that? I'm expecting a much
broader system here to make the VFIO device into a confidential device
(like setting up the TDI) where we'd have to enforce the private
things, communicate with some secure world to assign it, and so on.
I want to see a fuller solution to the CC problem in VFIO before we
can be sure what is the correct UAPI. In other words, making the
VFIO device into a CC device should also prevent mmaping it and so on.
So, I would take this out and defer VFIO enforcement to a series which
does fuller CC enablement of VFIO.
The precursor work should just be avoiding the need for a VMA when
installing VFIO MMIO into the KVM and IOMMU stage 2 mappings, i.e. by
using an FD to get the CPU pfns into iommufd and kvm as you are
showing.
This works just fine for non-CC devices anyhow and is the necessary
building block for making a TDI interface in VFIO.
Jason
On 09.01.25 at 14:37, Philipp Stanner wrote:
> From: Philipp Stanner <pstanner(a)redhat.com>
>
> drm_sched_backend_ops.run_job() returns a dma_fence for the scheduler.
> That fence is signalled by the driver once the hardware completed the
> associated job. The scheduler does not increment the reference count on
> that fence, but implicitly expects to inherit this fence from run_job().
>
> This is relatively subtle and prone to misunderstandings.
>
> This implies that, to keep a reference for itself, a driver needs to
> call dma_fence_get() in addition to dma_fence_init() in that callback.
>
> It's further complicated by the fact that the scheduler even decrements
> the refcount in drm_sched_run_job_work() since it created a new
> reference in drm_sched_fence_scheduled(). It does, however, still use
> its pointer to the fence after calling dma_fence_put() - which is safe
> because of the aforementioned new reference, but actually still violates
> the refcounting rules.
>
> Improve the explanatory comment for that decrement.
>
> Move the call to dma_fence_put() to the position behind the last usage
> of the fence.
>
> Document the necessity to increment the reference count in
> drm_sched_backend_ops.run_job().
>
> Signed-off-by: Philipp Stanner <pstanner(a)redhat.com>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
> include/drm/gpu_scheduler.h | 19 +++++++++++++++----
> 2 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 57da84908752..5f46c01eb01e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1218,15 +1218,19 @@ static void drm_sched_run_job_work(struct work_struct *w)
> drm_sched_fence_scheduled(s_fence, fence);
>
> if (!IS_ERR_OR_NULL(fence)) {
> - /* Drop for original kref_init of the fence */
> - dma_fence_put(fence);
> -
> r = dma_fence_add_callback(fence, &sched_job->cb,
> drm_sched_job_done_cb);
> if (r == -ENOENT)
> drm_sched_job_done(sched_job, fence->error);
> else if (r)
> DRM_DEV_ERROR(sched->dev, "fence add callback failed (%d)\n", r);
> +
> + /*
> + * s_fence took a new reference to fence in the call to
> + * drm_sched_fence_scheduled() above. The reference passed by
> + * run_job() above is now not needed any longer. Drop it.
> + */
> + dma_fence_put(fence);
> } else {
> drm_sched_job_done(sched_job, IS_ERR(fence) ?
> PTR_ERR(fence) : 0);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 95e17504e46a..d5cd2a78f27c 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -420,10 +420,21 @@ struct drm_sched_backend_ops {
> struct drm_sched_entity *s_entity);
>
> /**
> - * @run_job: Called to execute the job once all of the dependencies
> - * have been resolved. This may be called multiple times, if
> - * timedout_job() has happened and drm_sched_job_recovery()
> - * decides to try it again.
> + * @run_job: Called to execute the job once all of the dependencies
> + * have been resolved. This may be called multiple times, if
> + * timedout_job() has happened and drm_sched_job_recovery() decides to
> + * try it again.
I just came to realize that this hack of calling run_job multiple
times won't work any more with this patch here.
The background is that you can't allocate memory for a newly returned
fence, and as far as I know no driver pre-allocates multiple HW fences
for a job.
So at least amdgpu used to re-use the same HW fence as the return value
over and over again, just re-initializing the reference count. I
removed that hack from amdgpu, but just FYI, it could be that other
drivers did the same.
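To spell out the pattern the new documentation asks for, a driver's
run_job() would roughly do the following (to_my_job() and
my_hw_submit() are placeholders):

  static struct dma_fence *my_run_job(struct drm_sched_job *sched_job)
  {
          struct my_job *job = to_my_job(sched_job);
          struct dma_fence *fence = my_hw_submit(job); /* refcount == 1 */

          if (IS_ERR_OR_NULL(fence))
                  return fence;

          /* The scheduler inherits the reference we return, so take an
           * extra one if the driver wants to keep using the fence. */
          job->hw_fence = dma_fence_get(fence);

          return fence;
  }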
Apart from that concern I think that this patch is really the right
thing and that driver hacks relying on the order of dropping references
are fundamentally broken approaches.
So feel free to add Reviewed-by: Christian König <christian.koenig(a)amd.com>.
Regards,
Christian.
> + *
> + * @sched_job: the job to run
> + *
> + * Returns: dma_fence the driver must signal once the hardware has
> + * completed the job ("hardware fence").
> + *
> + * Note that the scheduler expects to 'inherit' its own reference to
> + * this fence from the callback. It does not invoke an extra
> + * dma_fence_get() on it. Consequently, this callback must take a
> + * reference for the scheduler, and additional ones for the driver's
> + * respective needs.
> */
> struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
>
On 07.01.25 at 15:27, Xu Yilun wrote:
> Introduce a new API for dma-buf importer, also add a dma_buf_ops
> callback for dma-buf exporter. This API is for subsystem importers who
> map the dma-buf to some user defined address space, e.g. for IOMMUFD to
> map the dma-buf to userspace IOVA via IOMMU page table, or for KVM to
> map the dma-buf to GPA via KVM MMU (e.g. EPT).
>
> Currently dma-buf is only used to get DMA address for device's default
> domain by using kernel DMA APIs. But for these new use-cases, importers
> only need the pfn of the dma-buf resource to build their own mapping
> tables.
As far as I can see I have to fundamentally reject this whole approach.
It's intentional DMA-buf design that we don't expose struct pages nor
PFNs to the importer. Essentially DMA-buf only transports DMA addresses.
In other words the mapping is done by the exporter and *not* the importer.
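For contrast, the existing contract in a nutshell: the importer
attaches with its struct device and only ever consumes the DMA
addresses the exporter mapped for it (program_device() is a stand-in
for whatever the importer does with them):

  struct sg_table *sgt;
  struct scatterlist *sg;
  int i;

  sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
  if (IS_ERR(sgt))
          return PTR_ERR(sgt);

  for_each_sgtable_dma_sg(sgt, sg, i)
          program_device(sg_dma_address(sg), sg_dma_len(sg));

  dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_BIDIRECTIONAL);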
What we certainly can do is to annotate those DMA addresses to better
specify in which domain they are applicable, e.g. whether they are PCIe
bus addresses or some inter-device bus addresses etc...
But moving the functionality to map the pages/PFNs to DMA addresses into
the importer is an absolutely clear NO-GO.
Regards,
Christian.
> So the map_dma_buf() callback is not mandatory for exporters
> anymore. Also the importers could choose not to provide
> struct device *dev on dma_buf_attach() if they don't call
> dma_buf_map_attachment().
>
> Like dma_buf_map_attachment(), the importer should firstly call
> dma_buf_attach/dynamic_attach() then call dma_buf_get_pfn_unlocked().
> If the importer choose to do dynamic attach, it also should handle the
> dma-buf move notification.
>
> Only the unlocked version of dma_buf_get_pfn is implemented for now,
> just because no locked version is used for now.
>
> Signed-off-by: Xu Yilun <yilun.xu(a)linux.intel.com>
>
> ---
> IIUC, Only get_pfn() is needed but no put_pfn(). The whole dma-buf is
> de/referenced at dma-buf attach/detach time.
>
> Specifically, for static attachment, the exporter should always make
> memory resource available/pinned on first dma_buf_attach(), and
> release/unpin memory resource on last dma_buf_detach(). For dynamic
> attachment, the exporter could populate & invalidate the memory
> resource at any time, it's OK as long as the importers follow dma-buf
> move notification. So no pinning is needed for get_pfn() and no
> put_pfn() is needed.
> ---
> drivers/dma-buf/dma-buf.c | 90 +++++++++++++++++++++++++++++++--------
> include/linux/dma-buf.h | 13 ++++++
> 2 files changed, 86 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 7eeee3a38202..83d1448b6dcc 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -630,10 +630,10 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> size_t alloc_size = sizeof(struct dma_buf);
> int ret;
>
> - if (WARN_ON(!exp_info->priv || !exp_info->ops
> - || !exp_info->ops->map_dma_buf
> - || !exp_info->ops->unmap_dma_buf
> - || !exp_info->ops->release))
> + if (WARN_ON(!exp_info->priv || !exp_info->ops ||
> + (!!exp_info->ops->map_dma_buf != !!exp_info->ops->unmap_dma_buf) ||
> + (!exp_info->ops->map_dma_buf && !exp_info->ops->get_pfn) ||
> + !exp_info->ops->release))
> return ERR_PTR(-EINVAL);
>
> if (WARN_ON(exp_info->ops->cache_sgt_mapping &&
> @@ -909,7 +909,10 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> struct dma_buf_attachment *attach;
> int ret;
>
> - if (WARN_ON(!dmabuf || !dev))
> + if (WARN_ON(!dmabuf))
> + return ERR_PTR(-EINVAL);
> +
> + if (WARN_ON(dmabuf->ops->map_dma_buf && !dev))
> return ERR_PTR(-EINVAL);
>
> if (WARN_ON(importer_ops && !importer_ops->move_notify))
> @@ -941,7 +944,7 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> */
> if (dma_buf_attachment_is_dynamic(attach) !=
> dma_buf_is_dynamic(dmabuf)) {
> - struct sg_table *sgt;
> + struct sg_table *sgt = NULL;
>
> dma_resv_lock(attach->dmabuf->resv, NULL);
> if (dma_buf_is_dynamic(attach->dmabuf)) {
> @@ -950,13 +953,16 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> goto err_unlock;
> }
>
> - sgt = __map_dma_buf(attach, DMA_BIDIRECTIONAL);
> - if (!sgt)
> - sgt = ERR_PTR(-ENOMEM);
> - if (IS_ERR(sgt)) {
> - ret = PTR_ERR(sgt);
> - goto err_unpin;
> + if (dmabuf->ops->map_dma_buf) {
> + sgt = __map_dma_buf(attach, DMA_BIDIRECTIONAL);
> + if (!sgt)
> + sgt = ERR_PTR(-ENOMEM);
> + if (IS_ERR(sgt)) {
> + ret = PTR_ERR(sgt);
> + goto err_unpin;
> + }
> }
> +
> dma_resv_unlock(attach->dmabuf->resv);
> attach->sgt = sgt;
> attach->dir = DMA_BIDIRECTIONAL;
> @@ -1119,7 +1125,8 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
>
> might_sleep();
>
> - if (WARN_ON(!attach || !attach->dmabuf))
> + if (WARN_ON(!attach || !attach->dmabuf ||
> + !attach->dmabuf->ops->map_dma_buf))
> return ERR_PTR(-EINVAL);
>
> dma_resv_assert_held(attach->dmabuf->resv);
> @@ -1195,7 +1202,8 @@ dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
>
> might_sleep();
>
> - if (WARN_ON(!attach || !attach->dmabuf))
> + if (WARN_ON(!attach || !attach->dmabuf ||
> + !attach->dmabuf->ops->map_dma_buf))
> return ERR_PTR(-EINVAL);
>
> dma_resv_lock(attach->dmabuf->resv, NULL);
> @@ -1222,7 +1230,8 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
> {
> might_sleep();
>
> - if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
> + if (WARN_ON(!attach || !attach->dmabuf ||
> + !attach->dmabuf->ops->unmap_dma_buf || !sg_table))
> return;
>
> dma_resv_assert_held(attach->dmabuf->resv);
> @@ -1254,7 +1263,8 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
> {
> might_sleep();
>
> - if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
> + if (WARN_ON(!attach || !attach->dmabuf ||
> + !attach->dmabuf->ops->unmap_dma_buf || !sg_table))
> return;
>
> dma_resv_lock(attach->dmabuf->resv, NULL);
> @@ -1263,6 +1273,52 @@ void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
> }
> EXPORT_SYMBOL_NS_GPL(dma_buf_unmap_attachment_unlocked, "DMA_BUF");
>
> +/**
> + * dma_buf_get_pfn_unlocked -
> + * @attach: [in] attachment to get pfn from
> + * @pgoff: [in] page offset of the buffer against the start of dma_buf
> + * @pfn: [out] returns the pfn of the buffer
> + * @max_order [out] returns the max mapping order of the buffer
> + */
> +int dma_buf_get_pfn_unlocked(struct dma_buf_attachment *attach,
> + pgoff_t pgoff, u64 *pfn, int *max_order)
> +{
> + struct dma_buf *dmabuf = attach->dmabuf;
> + int ret;
> +
> + if (WARN_ON(!attach || !attach->dmabuf ||
> + !attach->dmabuf->ops->get_pfn))
> + return -EINVAL;
> +
> + /*
> + * Open:
> + *
> + * When dma_buf is dynamic but dma_buf move is disabled, the buffer
> + * should be pinned before use, See dma_buf_map_attachment() for
> + * reference.
> + *
> + * But for now no pin is intended inside dma_buf_get_pfn(), otherwise
> + * need another API to unpin the dma_buf. So just fail out this case.
> + */
> + if (dma_buf_is_dynamic(attach->dmabuf) &&
> + !IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY))
> + return -ENOENT;
> +
> + dma_resv_lock(attach->dmabuf->resv, NULL);
> + ret = dmabuf->ops->get_pfn(attach, pgoff, pfn, max_order);
> + /*
> + * Open:
> + *
> + * Is dma_resv_wait_timeout() needed? I assume no. The DMA buffer
> + * content synchronization could be done when the buffer is to be
> + * mapped by importer.
> + */
> + dma_resv_unlock(attach->dmabuf->resv);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_NS_GPL(dma_buf_get_pfn_unlocked, "DMA_BUF");
> +
> /**
> * dma_buf_move_notify - notify attachments that DMA-buf is moving
> *
> @@ -1662,7 +1718,7 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
> attach_count = 0;
>
> list_for_each_entry(attach_obj, &buf_obj->attachments, node) {
> - seq_printf(s, "\t%s\n", dev_name(attach_obj->dev));
> + seq_printf(s, "\t%s\n", attach_obj->dev ? dev_name(attach_obj->dev) : NULL);
> attach_count++;
> }
> dma_resv_unlock(buf_obj->resv);
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> index 36216d28d8bd..b16183edfb3a 100644
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -194,6 +194,17 @@ struct dma_buf_ops {
> * if the call would block.
> */
>
> + /**
> + * @get_pfn:
> + *
> + * This is called by dma_buf_get_pfn(). It is used to get the pfn
> + * of the buffer positioned by the page offset against the start of
> + * the dma_buf. It can only be called if @attach has been called
> + * successfully.
> + */
> + int (*get_pfn)(struct dma_buf_attachment *attach, pgoff_t pgoff,
> + u64 *pfn, int *max_order);
> +
> /**
> * @release:
> *
> @@ -629,6 +640,8 @@ dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
> void dma_buf_unmap_attachment_unlocked(struct dma_buf_attachment *attach,
> struct sg_table *sg_table,
> enum dma_data_direction direction);
> +int dma_buf_get_pfn_unlocked(struct dma_buf_attachment *attach,
> + pgoff_t pgoff, u64 *pfn, int *max_order);
>
> int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *,
> unsigned long);
On Wed, 8 Jan 2025 at 22:27, Simona Vetter <simona.vetter(a)ffwll.ch> wrote:
>
> On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
> > Hi Simona,
> >
> > On Wed, 18 Dec 2024 at 16:36, Simona Vetter <simona.vetter(a)ffwll.ch> wrote:
> > >
> > > On Tue, Dec 17, 2024 at 11:07:36AM +0100, Jens Wiklander wrote:
> > > > Hi,
> > > >
> > > > This patch set allocates the restricted DMA-bufs via the TEE subsystem.
> > > >
> > > > The TEE subsystem handles the DMA-buf allocations since it is the TEE
> > > > (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the
> > > > restrictions for the memory used for the DMA-bufs.
> > > >
> > > > I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted
> > > > DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose
> > > > how to allocate the restricted physical memory.
> > > >
> > > > TEE_IOC_RSTMEM_ALLOC takes in addition to a size and flags parameters also
> > > > a use-case parameter. This is used by the backend TEE driver to decide on
> > > > allocation policy and which devices should be able to access the memory.
> > > >
> > > > Three use-cases (Secure Video Playback, Trusted UI, and Secure Video
> > > > Recording) has been identified so far to serve as examples of what can be
> > > > expected. More use-cases can be added in userspace ABI, but it's up to the
> > > > backend TEE drivers to provide the implementation.
> > > >
> > > > Each use-case has it's own restricted memory pool since different use-cases
> > > > requires isolation from different parts of the system. A restricted memory
> > > > pool can be based on a static carveout instantiated while probing the TEE
> > > > backend driver, or dynamically allocated from CMA and made restricted as
> > > > needed by the TEE.
> > > >
> > > > This can be tested on QEMU with the following steps:
> > > > repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
> > > > -b prototype/sdp-v4
> > > > repo sync -j8
> > > > cd build
> > > > make toolchains -j$(nproc)
> > > > make SPMC_AT_EL=1 all -j$(nproc)
> > > > make SPMC_AT_EL=1 run-only
> > > > # login and at the prompt:
> > > > xtest --sdp-basic
> > > >
> > > > The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at
> > > > S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test
> > > > without FF-A using the original SMC ABI instead. Please remember to do
> > > > %rm -rf ../trusted-firmware-a/build/qemu
> > > > for TF-A to be rebuilt properly using the new configuration.
> > > >
> > > > https://optee.readthedocs.io/en/latest/building/prerequisites.html
> > > > list dependencies needed to build the above.
> > > >
> > > > The tests are pretty basic, mostly checking that a Trusted Application in
> > > > the secure world can access and manipulate the memory. There are also some
> > > > negative tests for out of bounds buffers etc.
> > >
> > > I think I've dropped this on earlier encrypted dma-buf discussions for
> > > TEE, but can't find one right now ...
> >
> > Thanks for raising this query.
> >
> > >
> > > Do we have some open source userspace for this? To my knowledge we have
> > > two implementations of encrypted/content protected dma-buf in upstream
> > > right now in the amd and intel gpu drivers, and unless I'm mistaken they
> > > both have some minimal userspace supporting EXT_protected_textures:
> >
> > First of all to clarify the support Jens is adding here for allocating
> > restricted shared memory allocation in TEE subsystem is meant to be
> > generic and not specific to only secure media pipeline use-case. Then
> > here we not only have open source test applications but rather open
> > source firmware too (OP-TEE as a Trusted OS) [1] supporting this as a
> > core feature where we maintain a stable and extensible ABI among the
> > kernel and the OP-TEE core.
> >
> > Restricted memory is a feature enforced by hardware specific firewalls
> > where a particular TEE implementation governs which particular block
> > of memory is accessible to a particular peripheral or a CPU running in
> > a higher privileged mode than the Linux kernel. There can be numeric
> > use-cases surrounding that as follows:
> >
> > - Secure media pipeline where the contents gets decrypted and stored
> > in a restricted buffer which are then accessible only to media display
> > pipeline peripherals.
> > - Trusted user interface where a peripheral takes input from the user
> > and stores it in a restricted buffer which then is accessible to TEE
> > implementation only.
> > - Another possible use-case can be for the TEE implementation to store
> > key material in a restricted buffer which is only accessible to the
> > hardware crypto accelerator.
> >
> > I am sure there will be more use-cases related to this feature but
> > those will only be possible once we provide a stable and extensible
> > restricted memory interface among the Linux user-space and the secure
> > world user-space (normally referred to as Trusted Applications).
> >
> > [1] https://github.com/OP-TEE/optee_os/pull/7159
> >
> > >
> > > https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EX…
> > >
> > > It's not great, but it does just barely clear the bar in my opinion. I
> > > guess something in gstreamer or similar video pipeline framework would
> > > also do the job.
> > >
> > > Especially with the context of the uapi discussion in the v1/RFC thread I
> > > think we need more than a bare-bones testcase to make sure this works in
> > > actual use.
> >
> > Currently the TEE subsystem already supports a stable ABI for shared
> > memory allocator among Linux user-space and secure world user-space
> > here [2]. And the stable ABI for restricted memory is also along the
> > same lines meant to be a vendor neutral abstraction for the user-space
> > access. The current test cases not only test the interface but also
> > perform regression tests too.
> >
> > I am also in favour of end to end open source use-cases. But I fear
> > without progressing in a step wise manner as with this proposal we
> > would rather force developers to upstream all the software pieces in
> > one go which will be kind of a chicken and egg situation. I am sure
> > once this feature lands Mediatek folks will be interested to port
> > their secure video playback patchset [3] on top of it. Similarly other
> > silicon vendors like NXP, Qcom etc. will be motivated to do the same.
> >
> > [2] https://docs.kernel.org/userspace-api/tee.html
> > [3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@med…
>
> We get entire opengl/vulkan driver stacks ready before we merge new drm
> drivers, I really don't think this is too hard from a technical pov. And I
> think the mediatek patches had the same issue of lacking userspace for it,
> so that's not moving things forward.
> -Sima
>
Okay, fair enough, I think I get your point. Currently we are missing
support for at least one peripheral that would be the consumer of these
restricted DMA-bufs. So I discussed with Jens offline that we can try
a crypto peripheral use-case first, which can simply be demonstrated
using the current OP-TEE client user-space.
Also, for the crypto peripheral we can target the symmetric crypto
use-case first, which already has the concept of a hardware-backed
symmetric key [1]. IOW, we should be able to come up with a generic
symmetric crypto algorithm which can be supported by different crypto
accelerators using a TEE-backed restricted key DMA buffer.
[1] https://www.youtube.com/watch?v=GbcpwUBFGDw
-Sumit
On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote:
> Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as
> for private assignment. For these private assigned devices, disallow
> host accessing their MMIO resources.
Why? Shouldn't the VMM simply not call mmap? Why does the kernel have
to enforce this?
Jason