Linaro-mm-sig

linaro-mm-sig@lists.linaro.org

20 participants
2957 discussions

Re: [PATCH bpf-next v6 4/5] selftests/bpf: Add test for dmabuf_iter

by T.J. Mercier

On Wed, May 14, 2025 at 1:53 PM Song Liu <song(a)kernel.org> wrote: > > On Tue, May 13, 2025 at 9:36 AM T.J. Mercier <tjmercier(a)google.com> wrote: > > > > This test creates a udmabuf, and a dmabuf from the system dmabuf heap, > > and uses a BPF program that prints dmabuf metadata with the new > > dmabuf_iter to verify they can be found. > > > > Signed-off-by: T.J. Mercier <tjmercier(a)google.com> > > Acked-by: Christian König <christian.koenig(a)amd.com> > > Acked-by: Song Liu <song(a)kernel.org> Thanks. > > With one more comment below. > > [...] > > > diff --git a/tools/testing/selftests/bpf/progs/dmabuf_iter.c b/tools/testing/selftests/bpf/progs/dmabuf_iter.c > > new file mode 100644 > > index 000000000000..2a1b5397196d > > --- /dev/null > > +++ b/tools/testing/selftests/bpf/progs/dmabuf_iter.c > > @@ -0,0 +1,53 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +/* Copyright (c) 2025 Google LLC */ > > +#include <vmlinux.h> > > +#include <bpf/bpf_core_read.h> > > +#include <bpf/bpf_helpers.h> > > + > > +/* From uapi/linux/dma-buf.h */ > > +#define DMA_BUF_NAME_LEN 32 > > + > > +char _license[] SEC("license") = "GPL"; > > + > > +/* > > + * Fields output by this iterator are delimited by newlines. Convert any > > + * newlines in user-provided printed strings to spaces. > > + */ > > +static void sanitize_string(char *src, size_t size) > > +{ > > + for (char *c = src; *c && (size_t)(c - src) < size; ++c) > > We should do the size check first, right? IOW: > > for (char *c = src; (size_t)(c - src) < size && *c; ++c) Yeah if you call the function with size = 0, which is kinda questionable and not possible with the non-zero array size that is tied to immutable UAPI. Let's change it like you suggest. > > > + if (*c == '\n') > > + *c = ' '; > > +} > > + > [...]

2 months

[PATCH v4 00/40] drm/msm: sparse / "VM_BIND" support

by Rob Clark

From: Rob Clark <robdclark(a)chromium.org> Conversion to DRM GPU VA Manager[1], and adding support for Vulkan Sparse Memory[2] in the form of: 1. A new VM_BIND submitqueue type for executing VM MSM_SUBMIT_BO_OP_MAP/ MAP_NULL/UNMAP commands 2. A new VM_BIND ioctl to allow submitting batches of one or more MAP/MAP_NULL/UNMAP commands to a VM_BIND submitqueue I did not implement support for synchronous VM_BIND commands. Since userspace could just immediately wait for the `SUBMIT` to complete, I don't think we need this extra complexity in the kernel. Synchronous/immediate VM_BIND operations could be implemented with a 2nd VM_BIND submitqueue. The corresponding mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533 Changes in v4: - Various locking/etc fixes - Optimize the pgtable preallocation. If userspace sorts the VM_BIND ops then the kernel detects ops that fall into the same 2MB last level PTD to avoid duplicate page preallocation. - Add way to throttle pushing jobs to the scheduler, to cap the amount of potentially temporary prealloc'd pgtable pages. - Add vm_log to devcoredump for debugging. If the vm_log_shift module param is set, keep a log of the last 1<<vm_log_shift VM updates for easier debugging of faults/crashes. - Link to v3: https://lore.kernel.org/all/20250428205619.227835-1-robdclark@gmail.com/ Changes in v3: - Switched to seperate VM_BIND ioctl. This makes the UABI a bit cleaner, but OTOH the userspace code was cleaner when the end result of either type of VkQueue lead to the same ioctl. So I'm a bit on the fence. - Switched to doing the gpuvm bookkeeping synchronously, and only deferring the pgtable updates. This avoids needing to hold any resv locks in the fence signaling path, resolving the last shrinker related lockdep complaints. OTOH it means userspace can trigger invalid pgtable updates with multiple VM_BIND queues. In this case, we ensure that unmaps happen completely (to prevent userspace from using this to access free'd pages), mark the context as unusable, and move on with life. - Link to v2: https://lore.kernel.org/all/20250319145425.51935-1-robdclark@gmail.com/ Changes in v2: - Dropped Bibek Kumar Patro's arm-smmu patches[3], which have since been merged. - Pre-allocate all the things, and drop HACK patch which disabled shrinker. This includes ensuring that vm_bo objects are allocated up front, pre- allocating VMA objects, and pre-allocating pages used for pgtable updates. The latter utilizes io_pgtable_cfg callbacks for pgtable alloc/free, that were initially added for panthor. - Add back support for BO dumping for devcoredump. - Link to v1 (RFC): https://lore.kernel.org/dri-devel/20241207161651.410556-1-robdclark@gmail.c… [1] https://www.kernel.org/doc/html/next/gpu/drm-mm.html#drm-gpuvm [2] https://docs.vulkan.org/spec/latest/chapters/sparsemem.html [3] https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=909700 Rob Clark (40): drm/gpuvm: Don't require obj lock in destructor path drm/gpuvm: Allow VAs to hold soft reference to BOs drm/gem: Add ww_acquire_ctx support to drm_gem_lru_scan() drm/sched: Add enqueue credit limit iommu/io-pgtable-arm: Add quirk to quiet WARN_ON() drm/msm: Rename msm_file_private -> msm_context drm/msm: Improve msm_context comments drm/msm: Rename msm_gem_address_space -> msm_gem_vm drm/msm: Remove vram carveout support drm/msm: Collapse vma allocation and initialization drm/msm: Collapse vma close and delete drm/msm: Don't close VMAs on purge drm/msm: drm_gpuvm conversion drm/msm: Convert vm locking drm/msm: Use drm_gpuvm types more drm/msm: Split out helper to get iommu prot flags drm/msm: Add mmu support for non-zero offset drm/msm: Add PRR support drm/msm: Rename msm_gem_vma_purge() -> _unmap() drm/msm: Drop queued submits on lastclose() drm/msm: Lazily create context VM drm/msm: Add opt-in for VM_BIND drm/msm: Mark VM as unusable on GPU hangs drm/msm: Add _NO_SHARE flag drm/msm: Crashdump prep for sparse mappings drm/msm: rd dumping prep for sparse mappings drm/msm: Crashdec support for sparse drm/msm: rd dumping support for sparse drm/msm: Extract out syncobj helpers drm/msm: Use DMA_RESV_USAGE_BOOKKEEP/KERNEL drm/msm: Add VM_BIND submitqueue drm/msm: Support IO_PGTABLE_QUIRK_NO_WARN_ON drm/msm: Support pgtable preallocation drm/msm: Split out map/unmap ops drm/msm: Add VM_BIND ioctl drm/msm: Add VM logging for VM_BIND updates drm/msm: Add VMA unmap reason drm/msm: Add mmu prealloc tracepoint drm/msm: use trylock for debugfs drm/msm: Bump UAPI version drivers/gpu/drm/drm_gem.c | 14 +- drivers/gpu/drm/drm_gpuvm.c | 15 +- drivers/gpu/drm/msm/Kconfig | 1 + drivers/gpu/drm/msm/Makefile | 1 + drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 25 +- drivers/gpu/drm/msm/adreno/a2xx_gpummu.c | 5 +- drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 17 +- drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 17 +- drivers/gpu/drm/msm/adreno/a5xx_debugfs.c | 4 +- drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 22 +- drivers/gpu/drm/msm/adreno/a5xx_power.c | 2 +- drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 10 +- drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 32 +- drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 2 +- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 49 +- drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 6 +- drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 10 +- drivers/gpu/drm/msm/adreno/adreno_device.c | 4 - drivers/gpu/drm/msm/adreno/adreno_gpu.c | 99 +- drivers/gpu/drm/msm/adreno/adreno_gpu.h | 23 +- .../drm/msm/disp/dpu1/dpu_encoder_phys_wb.c | 14 +- drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c | 18 +- drivers/gpu/drm/msm/disp/dpu1/dpu_formats.h | 2 +- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 18 +- drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 14 +- drivers/gpu/drm/msm/disp/dpu1/dpu_plane.h | 4 +- drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c | 6 +- drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 28 +- drivers/gpu/drm/msm/disp/mdp4/mdp4_plane.c | 12 +- drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c | 4 +- drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 19 +- drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c | 12 +- drivers/gpu/drm/msm/dsi/dsi_host.c | 14 +- drivers/gpu/drm/msm/msm_drv.c | 184 +-- drivers/gpu/drm/msm/msm_drv.h | 35 +- drivers/gpu/drm/msm/msm_fb.c | 18 +- drivers/gpu/drm/msm/msm_fbdev.c | 2 +- drivers/gpu/drm/msm/msm_gem.c | 494 +++--- drivers/gpu/drm/msm/msm_gem.h | 247 ++- drivers/gpu/drm/msm/msm_gem_prime.c | 15 + drivers/gpu/drm/msm/msm_gem_shrinker.c | 104 +- drivers/gpu/drm/msm/msm_gem_submit.c | 295 ++-- drivers/gpu/drm/msm/msm_gem_vma.c | 1471 ++++++++++++++++- drivers/gpu/drm/msm/msm_gpu.c | 214 ++- drivers/gpu/drm/msm/msm_gpu.h | 144 +- drivers/gpu/drm/msm/msm_gpu_trace.h | 14 + drivers/gpu/drm/msm/msm_iommu.c | 302 +++- drivers/gpu/drm/msm/msm_kms.c | 18 +- drivers/gpu/drm/msm/msm_kms.h | 2 +- drivers/gpu/drm/msm/msm_mmu.h | 38 +- drivers/gpu/drm/msm/msm_rd.c | 62 +- drivers/gpu/drm/msm/msm_ringbuffer.c | 10 +- drivers/gpu/drm/msm/msm_submitqueue.c | 96 +- drivers/gpu/drm/msm/msm_syncobj.c | 172 ++ drivers/gpu/drm/msm/msm_syncobj.h | 37 + drivers/gpu/drm/scheduler/sched_entity.c | 16 +- drivers/gpu/drm/scheduler/sched_main.c | 3 + drivers/iommu/io-pgtable-arm.c | 27 +- include/drm/drm_gem.h | 10 +- include/drm/drm_gpuvm.h | 12 +- include/drm/gpu_scheduler.h | 13 +- include/linux/io-pgtable.h | 8 + include/uapi/drm/msm_drm.h | 149 +- 63 files changed, 3484 insertions(+), 1251 deletions(-) create mode 100644 drivers/gpu/drm/msm/msm_syncobj.c create mode 100644 drivers/gpu/drm/msm/msm_syncobj.h -- 2.49.0

2 months

[PATCH v4 00/40] drm/msm: sparse / "VM_BIND" support

by Rob Clark

2 months

Re: [RFC PATCH 00/12] Private MMIO support for private assigned dev

by Jason Gunthorpe

On Wed, May 14, 2025 at 03:02:53PM +0800, Xu Yilun wrote: > > We have an awkward fit for what CCA people are doing to the various > > Linux APIs. Looking somewhat maximally across all the arches a "bind" > > for a CC vPCI device creation operation does: > > > > - Setup the CPU page tables for the VM to have access to the MMIO > > This is guest side thing, is it? Anything host need to opt-in? CPU hypervisor page tables. > > - Revoke hypervisor access to the MMIO > > VFIO could choose never to mmap MMIO, so in this case nothing to do? Yes, if you do it that way. > > - Setup the vIOMMU to understand the vPCI device > > - Take over control of some of the IOVA translation, at least for T=1, > > and route to the the vIOMMU > > - Register the vPCI with any attestation functions the VM might use > > - Do some DOE stuff to manage/validate TDSIP/etc > > Intel TDX Connect has a extra requirement for "unbind": > > - Revoke KVM page table (S-EPT) for the MMIO only after TDISP > CONFIG_UNLOCK Maybe you could express this as the S-EPT always has the MMIO mapped into it as long as the vPCI function is installed to the VM? Is KVM responsible for the S-EPT? > Another thing is, seems your term "bind" includes all steps for > shared -> private conversion. Well, I was talking about vPCI creation. I understand that during the vPCI lifecycle the VM will do "bind" "unbind" which are more or less switching the device into a T=1 mode. Though I understood on some arches this was mostly invisible to the hypervisor? > But in my mind, "bind" only includes > putting device in TDISP LOCK state & corresponding host setups required > by firmware. I.e "bind" means host lockes down the CC setup, waiting for > guest attestation. So we will need to have some other API for this that modifies the vPCI object. It might be reasonable to have VFIO reach into iommufd to do that on an already existing iommufd VDEVICE object. A little weird, but we could probably make that work. But you have some weird ordering issues here if the S-EPT has to have the VFIO MMIO then you have to have a close() destruction order that sees VFIO remove the S-EPT and release the KVM, then have iommufd destroy the VDEVICE object. > > It doesn't mean that iommufd is suddenly doing PCI stuff, no, that > > stays in VFIO. > > I'm not sure if Alexey's patch [1] illustates your idea. It calls > tsm_tdi_bind() which directly does device stuff, and impacts MMIO. > VFIO doesn't know about this. > > I have to interpret this as VFIO firstly hand over device CC features > and MMIO resources to IOMMUFD, so VFIO never cares about them. > > [1] https://lore.kernel.org/all/20250218111017.491719-15-aik@amd.com/ There is also the PCI layer involved here and maybe PCI should be participating in managing some of this. Like it makes a bit of sense that PCI would block the FLR on platforms that require this? Jason

2 months

Re: [RFC v2 10/13] dma-fence: Add safe access helpers and document the rules

by Rob Clark

On Wed, May 14, 2025 at 7:58 AM Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> wrote: > > > On 14/05/2025 14:57, Rob Clark wrote: > > On Wed, May 14, 2025 at 3:01 AM Tvrtko Ursulin > > <tvrtko.ursulin(a)igalia.com> wrote: > >> > >> > >> On 13/05/2025 15:16, Rob Clark wrote: > >>> On Fri, May 9, 2025 at 8:34 AM Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> wrote: > >>>> > >>>> Dma-fence objects currently suffer from a potential use after free problem > >>>> where fences exported to userspace and other drivers can outlive the > >>>> exporting driver, or the associated data structures. > >>>> > >>>> The discussion on how to address this concluded that adding reference > >>>> counting to all the involved objects is not desirable, since it would need > >>>> to be very wide reaching and could cause unloadable drivers if another > >>>> entity would be holding onto a signaled fence reference potentially > >>>> indefinitely. > >>>> > >>>> This patch enables the safe access by introducing and documenting a > >>>> contract between fence exporters and users. It documents a set of > >>>> contraints and adds helpers which a) drivers with potential to suffer from > >>>> the use after free must use and b) users of the dma-fence API must use as > >>>> well. > >>>> > >>>> Premise of the design has multiple sides: > >>>> > >>>> 1. Drivers (fence exporters) MUST ensure a RCU grace period between > >>>> signalling a fence and freeing the driver private data associated with it. > >>>> > >>>> The grace period does not have to follow the signalling immediately but > >>>> HAS to happen before data is freed. > >>>> > >>>> 2. Users of the dma-fence API marked with such requirement MUST contain > >>>> the complete access to the data within a single code block guarded by the > >>>> new dma_fence_access_begin() and dma_fence_access_end() helpers. > >>>> > >>>> The combination of the two ensures that whoever sees the > >>>> DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a > >>>> valid fence->lock and valid data potentially accessed by the fence->ops > >>>> virtual functions, until the call to dma_fence_access_end(). > >>>> > >>>> 3. Module unload (fence->ops) disappearing is for now explicitly not > >>>> handled. That would required a more complex protection, possibly needing > >>>> SRCU instead of RCU to handle callers such as dma_fence_wait_timeout(), > >>>> where race between dma_fence_enable_sw_signaling, signalling, and > >>>> dereference of fence->ops->wait() would need a sleeping SRCU context. > >>>> > >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> > >>>> --- > >>>> drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++++++++++++++ > >>>> include/linux/dma-fence.h | 32 ++++++++++++----- > >>>> 2 files changed, 93 insertions(+), 8 deletions(-) > >>>> > >>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > >>>> index dc2456f68685..cfe1d7b79c22 100644 > >>>> --- a/drivers/dma-buf/dma-fence.c > >>>> +++ b/drivers/dma-buf/dma-fence.c > >>>> @@ -533,6 +533,7 @@ void dma_fence_release(struct kref *kref) > >>>> struct dma_fence *fence = > >>>> container_of(kref, struct dma_fence, refcount); > >>>> > >>>> + dma_fence_access_begin(); > >>>> trace_dma_fence_destroy(fence); > >>>> > >>>> if (WARN(!list_empty(&fence->cb_list) && > >>>> @@ -560,6 +561,8 @@ void dma_fence_release(struct kref *kref) > >>>> fence->ops->release(fence); > >>>> else > >>>> dma_fence_free(fence); > >>>> + > >>>> + dma_fence_access_end(); > >>>> } > >>>> EXPORT_SYMBOL(dma_fence_release); > >>>> > >>>> @@ -982,11 +985,13 @@ EXPORT_SYMBOL(dma_fence_set_deadline); > >>>> */ > >>>> void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq) > >>>> { > >>>> + dma_fence_access_begin(); > >>>> seq_printf(seq, "%s %s seq %llu %ssignalled\n", > >>>> dma_fence_driver_name(fence), > >>>> dma_fence_timeline_name(fence), > >>>> fence->seqno, > >>>> dma_fence_is_signaled(fence) ? "" : "un"); > >>>> + dma_fence_access_end(); > >>>> } > >>>> EXPORT_SYMBOL(dma_fence_describe); > >>>> > >>>> @@ -1033,3 +1038,67 @@ dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops, > >>>> __set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags); > >>>> } > >>>> EXPORT_SYMBOL(dma_fence_init64); > >>>> + > >>>> +/** > >>>> + * dma_fence_driver_name - Access the driver name > >>>> + * @fence: the fence to query > >>>> + * > >>>> + * Returns a driver name backing the dma-fence implementation. > >>>> + * > >>>> + * IMPORTANT CONSIDERATION: > >>>> + * Dma-fence contract stipulates that access to driver provided data (data not > >>>> + * directly embedded into the object itself), such as the &dma_fence.lock and > >>>> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden > >>>> + * after the fence has been signalled. Drivers are allowed to free that data, > >>>> + * and some do. > >>>> + * > >>>> + * To allow safe access drivers are mandated to guarantee a RCU grace period > >>>> + * between signalling the fence and freeing said data. > >>>> + * > >>>> + * As such access to the driver name is only valid inside a RCU locked section. > >>>> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded > >>>> + * by the &dma_fence_access_being and &dma_fence_access_end pair. > >>>> + */ > >>>> +const char *dma_fence_driver_name(struct dma_fence *fence) > >>>> +{ > >>>> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), > >>>> + "rcu_read_lock() required for safe access to returned string"); > >>>> + > >>>> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) > >>>> + return fence->ops->get_driver_name(fence); > >>>> + else > >>>> + return "detached-driver"; > >>>> +} > >>>> +EXPORT_SYMBOL(dma_fence_driver_name); > >>>> + > >>>> +/** > >>>> + * dma_fence_timeline_name - Access the timeline name > >>>> + * @fence: the fence to query > >>>> + * > >>>> + * Returns a timeline name provided by the dma-fence implementation. > >>>> + * > >>>> + * IMPORTANT CONSIDERATION: > >>>> + * Dma-fence contract stipulates that access to driver provided data (data not > >>>> + * directly embedded into the object itself), such as the &dma_fence.lock and > >>>> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden > >>>> + * after the fence has been signalled. Drivers are allowed to free that data, > >>>> + * and some do. > >>>> + * > >>>> + * To allow safe access drivers are mandated to guarantee a RCU grace period > >>>> + * between signalling the fence and freeing said data. > >>>> + * > >>>> + * As such access to the driver name is only valid inside a RCU locked section. > >>>> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded > >>>> + * by the &dma_fence_access_being and &dma_fence_access_end pair. > >>>> + */ > >>>> +const char *dma_fence_timeline_name(struct dma_fence *fence) > >>>> +{ > >>>> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), > >>>> + "rcu_read_lock() required for safe access to returned string"); > >>>> + > >>>> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) > >>>> + return fence->ops->get_driver_name(fence); > >>>> + else > >>>> + return "signaled-timeline"; > >>> > >>> This means that trace_dma_fence_signaled() will get the wrong > >>> timeline/driver name, which probably screws up perfetto and maybe > >>> other tools. > >> > >> Do you think context and seqno are not enough for those tools and they > >> actually rely on the names? It would sound weird if they decided to > >> index anything on the names which are non-standardised between drivers, > >> but I guess anything is possible. > > > > At some point perfetto uses the timeline name to put up a named fence > > timeline, I'm not sure if it is using the name or context # for > > subsequent fence events (namely, signalled). I'd have to check the > > code and get back to you. > > If you can it would be useful. Presumably it saves the names from the > start edge of fence lifetime. But again, who knows. Ok, it looks like perfetto is ok... mostly.. DrmTracker::GetFenceTimelineByContext() will try to lookup the timeline by context #, and then if that fails, create a new timeline with the name from the trace event, and add it to the hashmap. It might be that "signaled-timeline" shows up if the first event seen is the fence-signaled event. > > There is also gpuvis, which I guess does something similar, but > > haven't looked into it. Idk if there are others. > > I know GpuVis uses DRM sched tracepoints since Pierre-Eric was > explaining me about those in the context of tracing rework he did there. > I am not sure about dma-fence tracepoints. > > +Pierre-Eric on the off chance you know from the top of your head how > much GpuVis depends on them (dma-fence tracepoints). > > >>> Maybe it would work well enough just to move the > >>> trace_dma_fence_signaled() call ahead of the test_and_set_bit()? Idk > >>> if some things will start getting confused if they see that trace > >>> multiple times. > >> > >> Another alternative is to make this tracepoint access the names > >> directly. It is under the lock so guaranteed not to get freed with > >> drivers which will be made compliant with the documented rules. > > > > I guess it would have been better if, other than dma_fence_init > > tracepoint, later tracepoints didn't include the driver/timeline > > name.. that would have forced the use of the context. But I guess too > > late for that. Perhaps the least bad thing to do is use the locking? > > You mean this last alternative I mentioned? I think that will work fine. > I'll wait a little bit longer for more potential comments before re-spi > ning with that. yes > Were you able to test the series for your use case? Assuming it is not > upstream msm since I don't immediately see a path in msm_fence which > gets freed at runtime? Not yet, but I think it should because it is the exact same problem your igt test triggers. This is with my VM_BIND series, which will dynamically create/teardown sched entities BR, -R

2 months

Re: [RFC v2 10/13] dma-fence: Add safe access helpers and document the rules

by Rob Clark

On Wed, May 14, 2025 at 3:01 AM Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> wrote: > > > On 13/05/2025 15:16, Rob Clark wrote: > > On Fri, May 9, 2025 at 8:34 AM Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> wrote: > >> > >> Dma-fence objects currently suffer from a potential use after free problem > >> where fences exported to userspace and other drivers can outlive the > >> exporting driver, or the associated data structures. > >> > >> The discussion on how to address this concluded that adding reference > >> counting to all the involved objects is not desirable, since it would need > >> to be very wide reaching and could cause unloadable drivers if another > >> entity would be holding onto a signaled fence reference potentially > >> indefinitely. > >> > >> This patch enables the safe access by introducing and documenting a > >> contract between fence exporters and users. It documents a set of > >> contraints and adds helpers which a) drivers with potential to suffer from > >> the use after free must use and b) users of the dma-fence API must use as > >> well. > >> > >> Premise of the design has multiple sides: > >> > >> 1. Drivers (fence exporters) MUST ensure a RCU grace period between > >> signalling a fence and freeing the driver private data associated with it. > >> > >> The grace period does not have to follow the signalling immediately but > >> HAS to happen before data is freed. > >> > >> 2. Users of the dma-fence API marked with such requirement MUST contain > >> the complete access to the data within a single code block guarded by the > >> new dma_fence_access_begin() and dma_fence_access_end() helpers. > >> > >> The combination of the two ensures that whoever sees the > >> DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a > >> valid fence->lock and valid data potentially accessed by the fence->ops > >> virtual functions, until the call to dma_fence_access_end(). > >> > >> 3. Module unload (fence->ops) disappearing is for now explicitly not > >> handled. That would required a more complex protection, possibly needing > >> SRCU instead of RCU to handle callers such as dma_fence_wait_timeout(), > >> where race between dma_fence_enable_sw_signaling, signalling, and > >> dereference of fence->ops->wait() would need a sleeping SRCU context. > >> > >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> > >> --- > >> drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++++++++++++++ > >> include/linux/dma-fence.h | 32 ++++++++++++----- > >> 2 files changed, 93 insertions(+), 8 deletions(-) > >> > >> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > >> index dc2456f68685..cfe1d7b79c22 100644 > >> --- a/drivers/dma-buf/dma-fence.c > >> +++ b/drivers/dma-buf/dma-fence.c > >> @@ -533,6 +533,7 @@ void dma_fence_release(struct kref *kref) > >> struct dma_fence *fence = > >> container_of(kref, struct dma_fence, refcount); > >> > >> + dma_fence_access_begin(); > >> trace_dma_fence_destroy(fence); > >> > >> if (WARN(!list_empty(&fence->cb_list) && > >> @@ -560,6 +561,8 @@ void dma_fence_release(struct kref *kref) > >> fence->ops->release(fence); > >> else > >> dma_fence_free(fence); > >> + > >> + dma_fence_access_end(); > >> } > >> EXPORT_SYMBOL(dma_fence_release); > >> > >> @@ -982,11 +985,13 @@ EXPORT_SYMBOL(dma_fence_set_deadline); > >> */ > >> void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq) > >> { > >> + dma_fence_access_begin(); > >> seq_printf(seq, "%s %s seq %llu %ssignalled\n", > >> dma_fence_driver_name(fence), > >> dma_fence_timeline_name(fence), > >> fence->seqno, > >> dma_fence_is_signaled(fence) ? "" : "un"); > >> + dma_fence_access_end(); > >> } > >> EXPORT_SYMBOL(dma_fence_describe); > >> > >> @@ -1033,3 +1038,67 @@ dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops, > >> __set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags); > >> } > >> EXPORT_SYMBOL(dma_fence_init64); > >> + > >> +/** > >> + * dma_fence_driver_name - Access the driver name > >> + * @fence: the fence to query > >> + * > >> + * Returns a driver name backing the dma-fence implementation. > >> + * > >> + * IMPORTANT CONSIDERATION: > >> + * Dma-fence contract stipulates that access to driver provided data (data not > >> + * directly embedded into the object itself), such as the &dma_fence.lock and > >> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden > >> + * after the fence has been signalled. Drivers are allowed to free that data, > >> + * and some do. > >> + * > >> + * To allow safe access drivers are mandated to guarantee a RCU grace period > >> + * between signalling the fence and freeing said data. > >> + * > >> + * As such access to the driver name is only valid inside a RCU locked section. > >> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded > >> + * by the &dma_fence_access_being and &dma_fence_access_end pair. > >> + */ > >> +const char *dma_fence_driver_name(struct dma_fence *fence) > >> +{ > >> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), > >> + "rcu_read_lock() required for safe access to returned string"); > >> + > >> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) > >> + return fence->ops->get_driver_name(fence); > >> + else > >> + return "detached-driver"; > >> +} > >> +EXPORT_SYMBOL(dma_fence_driver_name); > >> + > >> +/** > >> + * dma_fence_timeline_name - Access the timeline name > >> + * @fence: the fence to query > >> + * > >> + * Returns a timeline name provided by the dma-fence implementation. > >> + * > >> + * IMPORTANT CONSIDERATION: > >> + * Dma-fence contract stipulates that access to driver provided data (data not > >> + * directly embedded into the object itself), such as the &dma_fence.lock and > >> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden > >> + * after the fence has been signalled. Drivers are allowed to free that data, > >> + * and some do. > >> + * > >> + * To allow safe access drivers are mandated to guarantee a RCU grace period > >> + * between signalling the fence and freeing said data. > >> + * > >> + * As such access to the driver name is only valid inside a RCU locked section. > >> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded > >> + * by the &dma_fence_access_being and &dma_fence_access_end pair. > >> + */ > >> +const char *dma_fence_timeline_name(struct dma_fence *fence) > >> +{ > >> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), > >> + "rcu_read_lock() required for safe access to returned string"); > >> + > >> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) > >> + return fence->ops->get_driver_name(fence); > >> + else > >> + return "signaled-timeline"; > > > > This means that trace_dma_fence_signaled() will get the wrong > > timeline/driver name, which probably screws up perfetto and maybe > > other tools. > > Do you think context and seqno are not enough for those tools and they > actually rely on the names? It would sound weird if they decided to > index anything on the names which are non-standardised between drivers, > but I guess anything is possible. At some point perfetto uses the timeline name to put up a named fence timeline, I'm not sure if it is using the name or context # for subsequent fence events (namely, signalled). I'd have to check the code and get back to you. There is also gpuvis, which I guess does something similar, but haven't looked into it. Idk if there are others. > > Maybe it would work well enough just to move the > > trace_dma_fence_signaled() call ahead of the test_and_set_bit()? Idk > > if some things will start getting confused if they see that trace > > multiple times. > > Another alternative is to make this tracepoint access the names > directly. It is under the lock so guaranteed not to get freed with > drivers which will be made compliant with the documented rules. I guess it would have been better if, other than dma_fence_init tracepoint, later tracepoints didn't include the driver/timeline name.. that would have forced the use of the context. But I guess too late for that. Perhaps the least bad thing to do is use the locking? BR, -R

2 months

Re: [RFC v3 07/10] dma-fence: Add safe access helpers and document the rules

by Christian König

I'm going to push patches #1-#6 to drm-misc-next. They make sense as a stand alone cleanups anyway. But that here needs a bit more documentation I think. On 5/13/25 09:45, Tvrtko Ursulin wrote: > Dma-fence objects currently suffer from a potential use after free problem > where fences exported to userspace and other drivers can outlive the > exporting driver, or the associated data structures. > > The discussion on how to address this concluded that adding reference > counting to all the involved objects is not desirable, since it would need > to be very wide reaching and could cause unloadable drivers if another > entity would be holding onto a signaled fence reference potentially > indefinitely. > > This patch enables the safe access by introducing and documenting a > contract between fence exporters and users. It documents a set of > contraints and adds helpers which a) drivers with potential to suffer from > the use after free must use and b) users of the dma-fence API must use as > well. > > Premise of the design has multiple sides: > > 1. Drivers (fence exporters) MUST ensure a RCU grace period between > signalling a fence and freeing the driver private data associated with it. That's a must have anyway, otherwise functions like dma_fence_get_rcu() won't work. I hope that we have documented that somewhere, but I'm not 100% sure to be honest. > The grace period does not have to follow the signalling immediately but > HAS to happen before data is freed. That is the new requirement we have to document somehow. I'm not 100% sure but I think module unloading waits for an RCU grace period anyway. > 2. Users of the dma-fence API marked with such requirement MUST contain > the complete access to the data within a single code block guarded by the > new dma_fence_access_begin() and dma_fence_access_end() helpers. > > The combination of the two ensures that whoever sees the > DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a > valid fence->lock and valid data potentially accessed by the fence->ops > virtual functions, until the call to dma_fence_access_end(). Mhm, how about returning copies of the string? This is only for debugging anyway and kstrdup_const() isn't that costly. Regards, Christian. > > 3. Module unload (fence->ops) disappearing is for now explicitly not > handled. That would required a more complex protection, possibly needing > SRCU instead of RCU to handle callers such as dma_fence_wait_timeout(), > where race between dma_fence_enable_sw_signaling, signalling, and > dereference of fence->ops->wait() would need a sleeping SRCU context. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> > --- > drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++++++++++++++ > include/linux/dma-fence.h | 32 ++++++++++++----- > 2 files changed, 93 insertions(+), 8 deletions(-) > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > index dc2456f68685..cfe1d7b79c22 100644 > --- a/drivers/dma-buf/dma-fence.c > +++ b/drivers/dma-buf/dma-fence.c > @@ -533,6 +533,7 @@ void dma_fence_release(struct kref *kref) > struct dma_fence *fence = > container_of(kref, struct dma_fence, refcount); > > + dma_fence_access_begin(); > trace_dma_fence_destroy(fence); > > if (WARN(!list_empty(&fence->cb_list) && > @@ -560,6 +561,8 @@ void dma_fence_release(struct kref *kref) > fence->ops->release(fence); > else > dma_fence_free(fence); > + > + dma_fence_access_end(); > } > EXPORT_SYMBOL(dma_fence_release); > > @@ -982,11 +985,13 @@ EXPORT_SYMBOL(dma_fence_set_deadline); > */ > void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq) > { > + dma_fence_access_begin(); > seq_printf(seq, "%s %s seq %llu %ssignalled\n", > dma_fence_driver_name(fence), > dma_fence_timeline_name(fence), > fence->seqno, > dma_fence_is_signaled(fence) ? "" : "un"); > + dma_fence_access_end(); > } > EXPORT_SYMBOL(dma_fence_describe); > > @@ -1033,3 +1038,67 @@ dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops, > __set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags); > } > EXPORT_SYMBOL(dma_fence_init64); > + > +/** > + * dma_fence_driver_name - Access the driver name > + * @fence: the fence to query > + * > + * Returns a driver name backing the dma-fence implementation. > + * > + * IMPORTANT CONSIDERATION: > + * Dma-fence contract stipulates that access to driver provided data (data not > + * directly embedded into the object itself), such as the &dma_fence.lock and > + * memory potentially accessed by the &dma_fence.ops functions, is forbidden > + * after the fence has been signalled. Drivers are allowed to free that data, > + * and some do. > + * > + * To allow safe access drivers are mandated to guarantee a RCU grace period > + * between signalling the fence and freeing said data. > + * > + * As such access to the driver name is only valid inside a RCU locked section. > + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded > + * by the &dma_fence_access_being and &dma_fence_access_end pair. > + */ > +const char *dma_fence_driver_name(struct dma_fence *fence) > +{ > + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), > + "rcu_read_lock() required for safe access to returned string"); > + > + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) > + return fence->ops->get_driver_name(fence); > + else > + return "detached-driver"; > +} > +EXPORT_SYMBOL(dma_fence_driver_name); > + > +/** > + * dma_fence_timeline_name - Access the timeline name > + * @fence: the fence to query > + * > + * Returns a timeline name provided by the dma-fence implementation. > + * > + * IMPORTANT CONSIDERATION: > + * Dma-fence contract stipulates that access to driver provided data (data not > + * directly embedded into the object itself), such as the &dma_fence.lock and > + * memory potentially accessed by the &dma_fence.ops functions, is forbidden > + * after the fence has been signalled. Drivers are allowed to free that data, > + * and some do. > + * > + * To allow safe access drivers are mandated to guarantee a RCU grace period > + * between signalling the fence and freeing said data. > + * > + * As such access to the driver name is only valid inside a RCU locked section. > + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded > + * by the &dma_fence_access_being and &dma_fence_access_end pair. > + */ > +const char *dma_fence_timeline_name(struct dma_fence *fence) > +{ > + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), > + "rcu_read_lock() required for safe access to returned string"); > + > + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) > + return fence->ops->get_driver_name(fence); > + else > + return "signaled-timeline"; > +} > +EXPORT_SYMBOL(dma_fence_timeline_name); > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h > index c5ac37e10d85..b39e430142ea 100644 > --- a/include/linux/dma-fence.h > +++ b/include/linux/dma-fence.h > @@ -377,15 +377,31 @@ bool dma_fence_remove_callback(struct dma_fence *fence, > struct dma_fence_cb *cb); > void dma_fence_enable_sw_signaling(struct dma_fence *fence); > > -static inline const char *dma_fence_driver_name(struct dma_fence *fence) > -{ > - return fence->ops->get_driver_name(fence); > -} > +/** > + * DOC: Safe external access to driver provided object members > + * > + * All data not stored directly in the dma-fence object, such as the > + * &dma_fence.lock and memory potentially accessed by functions in the > + * &dma_fence.ops table, MUST NOT be accessed after the fence has been signalled > + * because after that point drivers are allowed to free it. > + * > + * All code accessing that data via the dma-fence API (or directly, which is > + * discouraged), MUST make sure to contain the complete access within a > + * &dma_fence_access_begin and &dma_fence_access_end pair. > + * > + * Some dma-fence API handles this automatically, while other, as for example > + * &dma_fence_driver_name and &dma_fence_timeline_name, leave that > + * responsibility to the caller. > + * > + * To enable this scheme to work drivers MUST ensure a RCU grace period elapses > + * between signalling the fence and freeing the said data. > + * > + */ > +#define dma_fence_access_begin rcu_read_lock > +#define dma_fence_access_end rcu_read_unlock > > -static inline const char *dma_fence_timeline_name(struct dma_fence *fence) > -{ > - return fence->ops->get_timeline_name(fence); > -} > +const char *dma_fence_driver_name(struct dma_fence *fence); > +const char *dma_fence_timeline_name(struct dma_fence *fence); > > /** > * dma_fence_is_signaled_locked - Return an indication if the fence

2 months

Re: [RFC v3 05/10] drm/amdgpu: Use dma-fence driver and timeline name helpers

by Christian König

On 5/13/25 09:45, Tvrtko Ursulin wrote: > Access the dma-fence internals via the previously added helpers. > > Drop the macro while at it, since the length is now more manageable. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Reviewed-by: Christian König <christian.koenig(a)amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 9 ++------- > 1 file changed, 2 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > index 11dd2e0f7979..4c61e4168f23 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h > @@ -32,9 +32,6 @@ > #define TRACE_SYSTEM amdgpu > #define TRACE_INCLUDE_FILE amdgpu_trace > > -#define AMDGPU_JOB_GET_TIMELINE_NAME(job) \ > - job->base.s_fence->finished.ops->get_timeline_name(&job->base.s_fence->finished) > - > TRACE_EVENT(amdgpu_device_rreg, > TP_PROTO(unsigned did, uint32_t reg, uint32_t value), > TP_ARGS(did, reg, value), > @@ -168,7 +165,7 @@ TRACE_EVENT(amdgpu_cs_ioctl, > TP_ARGS(job), > TP_STRUCT__entry( > __field(uint64_t, sched_job_id) > - __string(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job)) > + __string(timeline, dma_fence_timeline_name(&job->base.s_fence->finished)) > __field(unsigned int, context) > __field(unsigned int, seqno) > __field(struct dma_fence *, fence) > @@ -194,7 +191,7 @@ TRACE_EVENT(amdgpu_sched_run_job, > TP_ARGS(job), > TP_STRUCT__entry( > __field(uint64_t, sched_job_id) > - __string(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job)) > + __string(timeline, dma_fence_timeline_name(&job->base.s_fence->finished)) > __field(unsigned int, context) > __field(unsigned int, seqno) > __string(ring, to_amdgpu_ring(job->base.sched)->name) > @@ -585,8 +582,6 @@ TRACE_EVENT(amdgpu_reset_reg_dumps, > __entry->address, > __entry->value) > ); > - > -#undef AMDGPU_JOB_GET_TIMELINE_NAME > #endif > > /* This part must be outside protection */

2 months

Re: [PATCH] dma-buf: insert memory barrier before updating num_fences

by Christian König

On 5/13/25 04:06, Hyejeong Choi wrote: > smp_store_mb() inserts memory barrier after storing operation. > It is different with what the comment is originally aiming so Null > pointer dereference can be happened if memory update is reordered. > > Signed-off-by: Hyejeong Choi <hjeong.choi(a)samsung.com> I've reviewed, add CC stable and Fixes tags and pushed it to drm-misc-fixes. Thanks, Christian. > --- > drivers/dma-buf/dma-resv.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c > index 5f8d010516f0..b1ef4546346d 100644 > --- a/drivers/dma-buf/dma-resv.c > +++ b/drivers/dma-buf/dma-resv.c > @@ -320,8 +320,9 @@ void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence, > count++; > > dma_resv_list_set(fobj, i, fence, usage); > - /* pointer update must be visible before we extend the num_fences */ > - smp_store_mb(fobj->num_fences, count); > + /* fence update must be visible before we extend the num_fences */ > + smp_wmb(); > + fobj->num_fences = count; > } > EXPORT_SYMBOL(dma_resv_add_fence); > > >

2 months

Re: [PATCH 2/2] dmabuf/heaps: implement DMA_BUF_IOCTL_RW_FILE for system_heap

by Christian König

On 5/14/25 13:02, wangtao wrote: >> -----Original Message----- >> From: Christian König <christian.koenig(a)amd.com> >> Sent: Tuesday, May 13, 2025 9:18 PM >> To: wangtao <tao.wangtao(a)honor.com>; sumit.semwal(a)linaro.org; >> benjamin.gaignard(a)collabora.com; Brian.Starkey(a)arm.com; >> jstultz(a)google.com; tjmercier(a)google.com >> Cc: linux-media(a)vger.kernel.org; dri-devel(a)lists.freedesktop.org; linaro- >> mm-sig(a)lists.linaro.org; linux-kernel(a)vger.kernel.org; >> wangbintian(BintianWang) <bintian.wang(a)honor.com>; yipengxiang >> <yipengxiang(a)honor.com>; <liulu.liu(a)honor.com>; <feng.han(a)honor.com> >> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement >> DMA_BUF_IOCTL_RW_FILE for system_heap >> >> On 5/13/25 14:30, wangtao wrote: >>>> -----Original Message----- >>>> From: Christian König <christian.koenig(a)amd.com> >>>> Sent: Tuesday, May 13, 2025 7:32 PM >>>> To: wangtao <tao.wangtao(a)honor.com>; sumit.semwal(a)linaro.org; >>>> benjamin.gaignard(a)collabora.com; Brian.Starkey(a)arm.com; >>>> jstultz(a)google.com; tjmercier(a)google.com >>>> Cc: linux-media(a)vger.kernel.org; dri-devel(a)lists.freedesktop.org; >>>> linaro- mm-sig(a)lists.linaro.org; linux-kernel(a)vger.kernel.org; >>>> wangbintian(BintianWang) <bintian.wang(a)honor.com>; yipengxiang >>>> <yipengxiang(a)honor.com>; <liulu.liu(a)honor.com>; >>>> <feng.han(a)honor.com> >>>> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement >>>> DMA_BUF_IOCTL_RW_FILE for system_heap >>>> >>>> On 5/13/25 11:28, wangtao wrote: >>>>> Support direct file I/O operations for system_heap dma-buf objects. >>>>> Implementation includes: >>>>> 1. Convert sg_table to bio_vec >>>> >>>> That is usually illegal for DMA-bufs. >>> [wangtao] The term 'convert' is misleading in this context. The appropriate >> phrasing should be: Construct bio_vec from sg_table. >> >> Well it doesn't matter what you call it. Touching the page inside an sg table of >> a DMA-buf is illegal, we even have code to actively prevent that. > [wangtao] For a driver using DMA-buf: Don't touch pages in the sg_table. But the system heap exporter (sg_table owner) should be allowed to use them. Good point that might be possible. > If a driver takes ownership via dma_buf_map_attachment or similar calls, the exporter must stop using the sg_table. > User-space programs should call DMA_BUF_IOCTL_RW_FILE only when the DMA-buf is not attached. > The exporter must check ownership (e.g., ensure no map_dma_buf/vmap is active) and block new calls during operations. > I'll add these checks in patch v2. > >> >> Once more: This approach was already rejected multiple times! Please use >> udmabuf instead! >> >> The hack you came up here is simply not necessary. > [wangtao] Many people need DMA-buf direct I/O. I tried it 2 years ago. My method is simpler, uses less CPU/power, and performs better: I don't think that this is a valid argument. > - Speed: 3418 MB/s vs. 2073 MB/s (udmabuf) at 1GHz CPU. > - udmabuf wastes half its CPU time on __get_user_pages. > - Creating 32x32MB DMA-bufs + reading 1GB file takes 346 ms vs. 1145 ms for udmabuf (10x slower) vs. 1503 ms for DMA-buf normal. Why would using udmabuf be slower here? > udmabuf is slightly faster but not enough. Switching to udmabuf is easy for small apps but hard in complex systems without major benefits. Yeah, but your approach here is a rather clear hack. Using udmabuf is much more cleaner and generally accepted by everybody now. As far as I can see I have to reject your approach here. Regards, Christian. >> >> Regards, >> Christian. >> >> >>> Appreciate your feedback. >>>> >>>> Regards, >>>> Christian. >>>> >>>>> 2. Set IOCB_DIRECT when O_DIRECT is supported 3. Invoke >>>>> vfs_iocb_iter_read()/vfs_iocb_iter_write() for actual I/O >>>>> >>>>> Performance metrics (UFS 4.0 device @4GB/s, Arm64 CPU @1GHz): >>>>> >>>>> | Metric | 1MB | 8MB | 64MB | 1024MB | 3072MB | >>>>> |--------------------|-------:|-------:|--------:|---------:|------- >>>>> |--------------------|-- >>>>> |--------------------|:| >>>>> | Buffer Read (us) | 1658 | 9028 | 69295 | 1019783 | 2978179 | >>>>> | Direct Read (us) | 707 | 2647 | 18689 | 299627 | 937758 | >>>>> | Buffer Rate (MB/s) | 603 | 886 | 924 | 1004 | 1032 | >>>>> | Direct Rate (MB/s) | 1414 | 3022 | 3425 | 3418 | 3276 | >>>>> >>>>> Signed-off-by: wangtao <tao.wangtao(a)honor.com> >>>>> --- >>>>> drivers/dma-buf/heaps/system_heap.c | 118 >>>>> ++++++++++++++++++++++++++++ >>>>> 1 file changed, 118 insertions(+) >>>>> >>>>> diff --git a/drivers/dma-buf/heaps/system_heap.c >>>>> b/drivers/dma-buf/heaps/system_heap.c >>>>> index 26d5dc89ea16..f7b71b9843aa 100644 >>>>> --- a/drivers/dma-buf/heaps/system_heap.c >>>>> +++ b/drivers/dma-buf/heaps/system_heap.c >>>>> @@ -20,6 +20,8 @@ >>>>> #include <linux/scatterlist.h> >>>>> #include <linux/slab.h> >>>>> #include <linux/vmalloc.h> >>>>> +#include <linux/bvec.h> >>>>> +#include <linux/uio.h> >>>>> >>>>> static struct dma_heap *sys_heap; >>>>> >>>>> @@ -281,6 +283,121 @@ static void system_heap_vunmap(struct >> dma_buf >>>> *dmabuf, struct iosys_map *map) >>>>> iosys_map_clear(map); >>>>> } >>>>> >>>>> +static struct bio_vec *system_heap_init_bvec(struct >>>> system_heap_buffer *buffer, >>>>> + size_t offset, size_t len, int *nr_segs) { >>>>> + struct sg_table *sgt = &buffer->sg_table; >>>>> + struct scatterlist *sg; >>>>> + size_t length = 0; >>>>> + unsigned int i, k = 0; >>>>> + struct bio_vec *bvec; >>>>> + size_t sg_left; >>>>> + size_t sg_offset; >>>>> + size_t sg_len; >>>>> + >>>>> + bvec = kvcalloc(sgt->nents, sizeof(*bvec), GFP_KERNEL); >>>>> + if (!bvec) >>>>> + return NULL; >>>>> + >>>>> + for_each_sg(sgt->sgl, sg, sgt->nents, i) { >>>>> + length += sg->length; >>>>> + if (length <= offset) >>>>> + continue; >>>>> + >>>>> + sg_left = length - offset; >>>>> + sg_offset = sg->offset + sg->length - sg_left; >>>>> + sg_len = min(sg_left, len); >>>>> + >>>>> + bvec[k].bv_page = sg_page(sg); >>>>> + bvec[k].bv_len = sg_len; >>>>> + bvec[k].bv_offset = sg_offset; >>>>> + k++; >>>>> + >>>>> + offset += sg_len; >>>>> + len -= sg_len; >>>>> + if (len <= 0) >>>>> + break; >>>>> + } >>>>> + >>>>> + *nr_segs = k; >>>>> + return bvec; >>>>> +} >>>>> + >>>>> +static int system_heap_rw_file(struct system_heap_buffer *buffer, >>>>> +bool >>>> is_read, >>>>> + bool direct_io, struct file *filp, loff_t file_offset, >>>>> + size_t buf_offset, size_t len) >>>>> +{ >>>>> + struct bio_vec *bvec; >>>>> + int nr_segs = 0; >>>>> + struct iov_iter iter; >>>>> + struct kiocb kiocb; >>>>> + ssize_t ret = 0; >>>>> + >>>>> + if (direct_io) { >>>>> + if (!(filp->f_mode & FMODE_CAN_ODIRECT)) >>>>> + return -EINVAL; >>>>> + } >>>>> + >>>>> + bvec = system_heap_init_bvec(buffer, buf_offset, len, &nr_segs); >>>>> + if (!bvec) >>>>> + return -ENOMEM; >>>>> + >>>>> + iov_iter_bvec(&iter, is_read ? ITER_DEST : ITER_SOURCE, bvec, >>>> nr_segs, len); >>>>> + init_sync_kiocb(&kiocb, filp); >>>>> + kiocb.ki_pos = file_offset; >>>>> + if (direct_io) >>>>> + kiocb.ki_flags |= IOCB_DIRECT; >>>>> + >>>>> + while (kiocb.ki_pos < file_offset + len) { >>>>> + if (is_read) >>>>> + ret = vfs_iocb_iter_read(filp, &kiocb, &iter); >>>>> + else >>>>> + ret = vfs_iocb_iter_write(filp, &kiocb, &iter); >>>>> + if (ret <= 0) >>>>> + break; >>>>> + } >>>>> + >>>>> + kvfree(bvec); >>>>> + return ret < 0 ? ret : 0; >>>>> +} >>>>> + >>>>> +static int system_heap_dma_buf_rw_file(struct dma_buf *dmabuf, >>>>> + struct dma_buf_rw_file *back) >>>>> +{ >>>>> + struct system_heap_buffer *buffer = dmabuf->priv; >>>>> + int ret = 0; >>>>> + __u32 op = back->flags & DMA_BUF_RW_FLAGS_OP_MASK; >>>>> + bool direct_io = back->flags & DMA_BUF_RW_FLAGS_DIRECT; >>>>> + struct file *filp; >>>>> + >>>>> + if (op != DMA_BUF_RW_FLAGS_READ && op != >>>> DMA_BUF_RW_FLAGS_WRITE) >>>>> + return -EINVAL; >>>>> + if (direct_io) { >>>>> + if (!PAGE_ALIGNED(back->file_offset) || >>>>> + !PAGE_ALIGNED(back->buf_offset) || >>>>> + !PAGE_ALIGNED(back->buf_len)) >>>>> + return -EINVAL; >>>>> + } >>>>> + if (!back->buf_len || back->buf_len > dmabuf->size || >>>>> + back->buf_offset >= dmabuf->size || >>>>> + back->buf_offset + back->buf_len > dmabuf->size) >>>>> + return -EINVAL; >>>>> + if (back->file_offset + back->buf_len < back->file_offset) >>>>> + return -EINVAL; >>>>> + >>>>> + filp = fget(back->fd); >>>>> + if (!filp) >>>>> + return -EBADF; >>>>> + >>>>> + mutex_lock(&buffer->lock); >>>>> + ret = system_heap_rw_file(buffer, op == >>>> DMA_BUF_RW_FLAGS_READ, direct_io, >>>>> + filp, back->file_offset, back->buf_offset, back- >>>>> buf_len); >>>>> + mutex_unlock(&buffer->lock); >>>>> + >>>>> + fput(filp); >>>>> + return ret; >>>>> +} >>>>> + >>>>> static void system_heap_dma_buf_release(struct dma_buf *dmabuf) { >>>>> struct system_heap_buffer *buffer = dmabuf->priv; @@ -308,6 >>>> +425,7 >>>>> @@ static const struct dma_buf_ops system_heap_buf_ops = { >>>>> .mmap = system_heap_mmap, >>>>> .vmap = system_heap_vmap, >>>>> .vunmap = system_heap_vunmap, >>>>> + .rw_file = system_heap_dma_buf_rw_file, >>>>> .release = system_heap_dma_buf_release, }; >>>>> >>> >

2 months

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig