On Wed, May 14, 2025 at 03:02:53PM +0800, Xu Yilun wrote:
> > We have an awkward fit for what CCA people are doing to the various
> > Linux APIs. Looking somewhat maximally across all the arches a "bind"
> > for a CC vPCI device creation operation does:
> >
> > - Setup the CPU page tables for the VM to have access to the MMIO
>
> This is a guest-side thing, isn't it? Does the host need to opt in to anything?
CPU hypervisor page tables.
> > - Revoke hypervisor access to the MMIO
>
> VFIO could choose never to mmap MMIO, so in this case nothing to do?
Yes, if you do it that way.
> > - Setup the vIOMMU to understand the vPCI device
> > - Take over control of some of the IOVA translation, at least for T=1,
> > and route to the vIOMMU
> > - Register the vPCI with any attestation functions the VM might use
> > - Do some DOE stuff to manage/validate TDISP/etc
>
> Intel TDX Connect has an extra requirement for "unbind":
>
> - Revoke KVM page table (S-EPT) for the MMIO only after TDISP
> CONFIG_UNLOCK
Maybe you could express this as the S-EPT always has the MMIO mapped
into it as long as the vPCI function is installed to the VM? Is KVM
responsible for the S-EPT?
> Another thing is, seems your term "bind" includes all steps for
> shared -> private conversion.
Well, I was talking about vPCI creation. I understand that during the
vPCI lifecycle the VM will do "bind" "unbind" which are more or less
switching the device into a T=1 mode. Though I understood on some
arches this was mostly invisible to the hypervisor?
> But in my mind, "bind" only includes
> putting device in TDISP LOCK state & corresponding host setups required
> by firmware. I.e. "bind" means the host locks down the CC setup, waiting for
> guest attestation.
So we will need to have some other API for this that modifies the vPCI
object.
It might be reasonable to have VFIO reach into iommufd to do that on
an already existing iommufd VDEVICE object. A little weird, but we
could probably make that work.
But you have some weird ordering issues here: if the S-EPT has to have
the VFIO MMIO, then you need a close() destruction order in which VFIO
removes the S-EPT and releases the KVM before iommufd destroys the
VDEVICE object.
> > It doesn't mean that iommufd is suddenly doing PCI stuff, no, that
> > stays in VFIO.
>
> I'm not sure if Alexey's patch [1] illustrates your idea. It calls
> tsm_tdi_bind() which directly does device stuff, and impacts MMIO.
> VFIO doesn't know about this.
>
> I have to interpret this as VFIO first handing over device CC features
> and MMIO resources to IOMMUFD, so VFIO never cares about them.
>
> [1] https://lore.kernel.org/all/20250218111017.491719-15-aik@amd.com/
There is also the PCI layer involved here and maybe PCI should be
participating in managing some of this. Like it makes a bit of sense
that PCI would block the FLR on platforms that require this?
Jason
On Wed, May 14, 2025 at 7:58 AM Tvrtko Ursulin
<tvrtko.ursulin(a)igalia.com> wrote:
>
>
> On 14/05/2025 14:57, Rob Clark wrote:
> > On Wed, May 14, 2025 at 3:01 AM Tvrtko Ursulin
> > <tvrtko.ursulin(a)igalia.com> wrote:
> >>
> >>
> >> On 13/05/2025 15:16, Rob Clark wrote:
> >>> On Fri, May 9, 2025 at 8:34 AM Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> wrote:
> >>>>
> >>>> Dma-fence objects currently suffer from a potential use after free problem
> >>>> where fences exported to userspace and other drivers can outlive the
> >>>> exporting driver, or the associated data structures.
> >>>>
> >>>> The discussion on how to address this concluded that adding reference
> >>>> counting to all the involved objects is not desirable, since it would need
> >>>> to be very wide reaching and could cause unloadable drivers if another
> >>>> entity would be holding onto a signaled fence reference potentially
> >>>> indefinitely.
> >>>>
> >>>> This patch enables the safe access by introducing and documenting a
> >>>> contract between fence exporters and users. It documents a set of
> >>>> constraints and adds helpers which a) drivers with potential to suffer from
> >>>> the use after free must use and b) users of the dma-fence API must use as
> >>>> well.
> >>>>
> >>>> Premise of the design has multiple sides:
> >>>>
> >>>> 1. Drivers (fence exporters) MUST ensure a RCU grace period between
> >>>> signalling a fence and freeing the driver private data associated with it.
> >>>>
> >>>> The grace period does not have to follow the signalling immediately but
> >>>> HAS to happen before data is freed.
> >>>>
> >>>> 2. Users of the dma-fence API marked with such requirement MUST contain
> >>>> the complete access to the data within a single code block guarded by the
> >>>> new dma_fence_access_begin() and dma_fence_access_end() helpers.
> >>>>
> >>>> The combination of the two ensures that whoever sees the
> >>>> DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a
> >>>> valid fence->lock and valid data potentially accessed by the fence->ops
> >>>> virtual functions, until the call to dma_fence_access_end().
> >>>>
> >>>> 3. Module unload (fence->ops) disappearing is for now explicitly not
> >>>> handled. That would require a more complex protection, possibly needing
> >>>> SRCU instead of RCU to handle callers such as dma_fence_wait_timeout(),
> >>>> where race between dma_fence_enable_sw_signaling, signalling, and
> >>>> dereference of fence->ops->wait() would need a sleeping SRCU context.
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
> >>>> ---
> >>>> drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++++++++++++++
> >>>> include/linux/dma-fence.h | 32 ++++++++++++-----
> >>>> 2 files changed, 93 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> >>>> index dc2456f68685..cfe1d7b79c22 100644
> >>>> --- a/drivers/dma-buf/dma-fence.c
> >>>> +++ b/drivers/dma-buf/dma-fence.c
> >>>> @@ -533,6 +533,7 @@ void dma_fence_release(struct kref *kref)
> >>>> struct dma_fence *fence =
> >>>> container_of(kref, struct dma_fence, refcount);
> >>>>
> >>>> + dma_fence_access_begin();
> >>>> trace_dma_fence_destroy(fence);
> >>>>
> >>>> if (WARN(!list_empty(&fence->cb_list) &&
> >>>> @@ -560,6 +561,8 @@ void dma_fence_release(struct kref *kref)
> >>>> fence->ops->release(fence);
> >>>> else
> >>>> dma_fence_free(fence);
> >>>> +
> >>>> + dma_fence_access_end();
> >>>> }
> >>>> EXPORT_SYMBOL(dma_fence_release);
> >>>>
> >>>> @@ -982,11 +985,13 @@ EXPORT_SYMBOL(dma_fence_set_deadline);
> >>>> */
> >>>> void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
> >>>> {
> >>>> + dma_fence_access_begin();
> >>>> seq_printf(seq, "%s %s seq %llu %ssignalled\n",
> >>>> dma_fence_driver_name(fence),
> >>>> dma_fence_timeline_name(fence),
> >>>> fence->seqno,
> >>>> dma_fence_is_signaled(fence) ? "" : "un");
> >>>> + dma_fence_access_end();
> >>>> }
> >>>> EXPORT_SYMBOL(dma_fence_describe);
> >>>>
> >>>> @@ -1033,3 +1038,67 @@ dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
> >>>> __set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags);
> >>>> }
> >>>> EXPORT_SYMBOL(dma_fence_init64);
> >>>> +
> >>>> +/**
> >>>> + * dma_fence_driver_name - Access the driver name
> >>>> + * @fence: the fence to query
> >>>> + *
> >>>> + * Returns a driver name backing the dma-fence implementation.
> >>>> + *
> >>>> + * IMPORTANT CONSIDERATION:
> >>>> + * Dma-fence contract stipulates that access to driver provided data (data not
> >>>> + * directly embedded into the object itself), such as the &dma_fence.lock and
> >>>> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
> >>>> + * after the fence has been signalled. Drivers are allowed to free that data,
> >>>> + * and some do.
> >>>> + *
> >>>> + * To allow safe access drivers are mandated to guarantee a RCU grace period
> >>>> + * between signalling the fence and freeing said data.
> >>>> + *
> >>>> + * As such access to the driver name is only valid inside a RCU locked section.
> >>>> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
> >>>> + * by the &dma_fence_access_begin and &dma_fence_access_end pair.
> >>>> + */
> >>>> +const char *dma_fence_driver_name(struct dma_fence *fence)
> >>>> +{
> >>>> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> >>>> + "rcu_read_lock() required for safe access to returned string");
> >>>> +
> >>>> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> >>>> + return fence->ops->get_driver_name(fence);
> >>>> + else
> >>>> + return "detached-driver";
> >>>> +}
> >>>> +EXPORT_SYMBOL(dma_fence_driver_name);
> >>>> +
> >>>> +/**
> >>>> + * dma_fence_timeline_name - Access the timeline name
> >>>> + * @fence: the fence to query
> >>>> + *
> >>>> + * Returns a timeline name provided by the dma-fence implementation.
> >>>> + *
> >>>> + * IMPORTANT CONSIDERATION:
> >>>> + * Dma-fence contract stipulates that access to driver provided data (data not
> >>>> + * directly embedded into the object itself), such as the &dma_fence.lock and
> >>>> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
> >>>> + * after the fence has been signalled. Drivers are allowed to free that data,
> >>>> + * and some do.
> >>>> + *
> >>>> + * To allow safe access drivers are mandated to guarantee a RCU grace period
> >>>> + * between signalling the fence and freeing said data.
> >>>> + *
> >>>> + * As such access to the timeline name is only valid inside a RCU locked section.
> >>>> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
> >>>> + * by the &dma_fence_access_begin and &dma_fence_access_end pair.
> >>>> + */
> >>>> +const char *dma_fence_timeline_name(struct dma_fence *fence)
> >>>> +{
> >>>> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> >>>> + "rcu_read_lock() required for safe access to returned string");
> >>>> +
> >>>> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> >>>> + return fence->ops->get_timeline_name(fence);
> >>>> + else
> >>>> + return "signaled-timeline";
> >>>
> >>> This means that trace_dma_fence_signaled() will get the wrong
> >>> timeline/driver name, which probably screws up perfetto and maybe
> >>> other tools.
> >>
> >> Do you think context and seqno are not enough for those tools and they
> >> actually rely on the names? It would sound weird if they decided to
> >> index anything on the names which are non-standardised between drivers,
> >> but I guess anything is possible.
> >
> > At some point perfetto uses the timeline name to put up a named fence
> > timeline, I'm not sure if it is using the name or context # for
> > subsequent fence events (namely, signalled). I'd have to check the
> > code and get back to you.
>
> If you can it would be useful. Presumably it saves the names from the
> start edge of fence lifetime. But again, who knows.
Ok, it looks like perfetto is ok... mostly..
DrmTracker::GetFenceTimelineByContext() will try to look up the
timeline by context #, and if that fails, create a new timeline
with the name from the trace event and add it to the hashmap.
It might be that "signaled-timeline" shows up if the first event seen
is the fence-signaled event.
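To make that failure mode concrete, here is a minimal userspace sketch of
a lookup-or-create table keyed by fence context. It is not perfetto's
code: the struct, the fixed-size linear table (instead of a hashmap) and
the name timeline_by_context() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_TIMELINES 16
#define NAME_LEN 32

/* Hypothetical stand-in for a tracing tool's per-context timeline table. */
struct timeline {
	uint64_t context;
	char name[NAME_LEN];
	int used;
};

static struct timeline timelines[MAX_TIMELINES];

/*
 * Look up a timeline by fence context. On a miss, create one and label it
 * with whatever name the current trace event carries.
 */
static struct timeline *timeline_by_context(uint64_t context,
					    const char *event_name)
{
	struct timeline *free_slot = NULL;

	for (int i = 0; i < MAX_TIMELINES; i++) {
		if (timelines[i].used && timelines[i].context == context)
			return &timelines[i];
		if (!timelines[i].used && !free_slot)
			free_slot = &timelines[i];
	}
	if (!free_slot)
		return NULL; /* table full */

	free_slot->used = 1;
	free_slot->context = context;
	snprintf(free_slot->name, NAME_LEN, "%s", event_name);
	return free_slot;
}
```

With this shape, a context first seen through an event that carries the
real name keeps that name for later events, while a context first seen
through a signaled event would be labelled with the placeholder string.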
> > There is also gpuvis, which I guess does something similar, but
> > haven't looked into it. Idk if there are others.
>
> I know GpuVis uses DRM sched tracepoints since Pierre-Eric was
> explaining those to me in the context of the tracing rework he did there.
> I am not sure about dma-fence tracepoints.
>
> +Pierre-Eric on the off chance you know off the top of your head how
> much GpuVis depends on them (dma-fence tracepoints).
>
> >>> Maybe it would work well enough just to move the
> >>> trace_dma_fence_signaled() call ahead of the test_and_set_bit()? Idk
> >>> if some things will start getting confused if they see that trace
> >>> multiple times.
> >>
> >> Another alternative is to make this tracepoint access the names
> >> directly. It is under the lock so guaranteed not to get freed with
> >> drivers which will be made compliant with the documented rules.
> >
> > I guess it would have been better if, other than dma_fence_init
> > tracepoint, later tracepoints didn't include the driver/timeline
> > name.. that would have forced the use of the context. But I guess too
> > late for that. Perhaps the least bad thing to do is use the locking?
>
> You mean this last alternative I mentioned? I think that will work fine.
> I'll wait a little bit longer for more potential comments before
> re-spinning with that.
yes
> Were you able to test the series for your use case? Assuming it is not
> upstream msm since I don't immediately see a path in msm_fence which
> gets freed at runtime?
Not yet, but I think it should because it is the exact same problem
your igt test triggers.
This is with my VM_BIND series, which dynamically creates and tears down
sched entities.
BR,
-R
On Wed, May 14, 2025 at 3:01 AM Tvrtko Ursulin
<tvrtko.ursulin(a)igalia.com> wrote:
>
>
> On 13/05/2025 15:16, Rob Clark wrote:
> > On Fri, May 9, 2025 at 8:34 AM Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> wrote:
> >>
> >> Dma-fence objects currently suffer from a potential use after free problem
> >> where fences exported to userspace and other drivers can outlive the
> >> exporting driver, or the associated data structures.
> >>
> >> The discussion on how to address this concluded that adding reference
> >> counting to all the involved objects is not desirable, since it would need
> >> to be very wide reaching and could cause unloadable drivers if another
> >> entity would be holding onto a signaled fence reference potentially
> >> indefinitely.
> >>
> >> This patch enables the safe access by introducing and documenting a
> >> contract between fence exporters and users. It documents a set of
> >> constraints and adds helpers which a) drivers with potential to suffer from
> >> the use after free must use and b) users of the dma-fence API must use as
> >> well.
> >>
> >> Premise of the design has multiple sides:
> >>
> >> 1. Drivers (fence exporters) MUST ensure a RCU grace period between
> >> signalling a fence and freeing the driver private data associated with it.
> >>
> >> The grace period does not have to follow the signalling immediately but
> >> HAS to happen before data is freed.
> >>
> >> 2. Users of the dma-fence API marked with such requirement MUST contain
> >> the complete access to the data within a single code block guarded by the
> >> new dma_fence_access_begin() and dma_fence_access_end() helpers.
> >>
> >> The combination of the two ensures that whoever sees the
> >> DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a
> >> valid fence->lock and valid data potentially accessed by the fence->ops
> >> virtual functions, until the call to dma_fence_access_end().
> >>
> >> 3. Module unload (fence->ops) disappearing is for now explicitly not
> >> handled. That would require a more complex protection, possibly needing
> >> SRCU instead of RCU to handle callers such as dma_fence_wait_timeout(),
> >> where race between dma_fence_enable_sw_signaling, signalling, and
> >> dereference of fence->ops->wait() would need a sleeping SRCU context.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
> >> ---
> >> drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++++++++++++++
> >> include/linux/dma-fence.h | 32 ++++++++++++-----
> >> 2 files changed, 93 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> >> index dc2456f68685..cfe1d7b79c22 100644
> >> --- a/drivers/dma-buf/dma-fence.c
> >> +++ b/drivers/dma-buf/dma-fence.c
> >> @@ -533,6 +533,7 @@ void dma_fence_release(struct kref *kref)
> >> struct dma_fence *fence =
> >> container_of(kref, struct dma_fence, refcount);
> >>
> >> + dma_fence_access_begin();
> >> trace_dma_fence_destroy(fence);
> >>
> >> if (WARN(!list_empty(&fence->cb_list) &&
> >> @@ -560,6 +561,8 @@ void dma_fence_release(struct kref *kref)
> >> fence->ops->release(fence);
> >> else
> >> dma_fence_free(fence);
> >> +
> >> + dma_fence_access_end();
> >> }
> >> EXPORT_SYMBOL(dma_fence_release);
> >>
> >> @@ -982,11 +985,13 @@ EXPORT_SYMBOL(dma_fence_set_deadline);
> >> */
> >> void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
> >> {
> >> + dma_fence_access_begin();
> >> seq_printf(seq, "%s %s seq %llu %ssignalled\n",
> >> dma_fence_driver_name(fence),
> >> dma_fence_timeline_name(fence),
> >> fence->seqno,
> >> dma_fence_is_signaled(fence) ? "" : "un");
> >> + dma_fence_access_end();
> >> }
> >> EXPORT_SYMBOL(dma_fence_describe);
> >>
> >> @@ -1033,3 +1038,67 @@ dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
> >> __set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags);
> >> }
> >> EXPORT_SYMBOL(dma_fence_init64);
> >> +
> >> +/**
> >> + * dma_fence_driver_name - Access the driver name
> >> + * @fence: the fence to query
> >> + *
> >> + * Returns a driver name backing the dma-fence implementation.
> >> + *
> >> + * IMPORTANT CONSIDERATION:
> >> + * Dma-fence contract stipulates that access to driver provided data (data not
> >> + * directly embedded into the object itself), such as the &dma_fence.lock and
> >> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
> >> + * after the fence has been signalled. Drivers are allowed to free that data,
> >> + * and some do.
> >> + *
> >> + * To allow safe access drivers are mandated to guarantee a RCU grace period
> >> + * between signalling the fence and freeing said data.
> >> + *
> >> + * As such access to the driver name is only valid inside a RCU locked section.
> >> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
> >> + * by the &dma_fence_access_begin and &dma_fence_access_end pair.
> >> + */
> >> +const char *dma_fence_driver_name(struct dma_fence *fence)
> >> +{
> >> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> >> + "rcu_read_lock() required for safe access to returned string");
> >> +
> >> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> >> + return fence->ops->get_driver_name(fence);
> >> + else
> >> + return "detached-driver";
> >> +}
> >> +EXPORT_SYMBOL(dma_fence_driver_name);
> >> +
> >> +/**
> >> + * dma_fence_timeline_name - Access the timeline name
> >> + * @fence: the fence to query
> >> + *
> >> + * Returns a timeline name provided by the dma-fence implementation.
> >> + *
> >> + * IMPORTANT CONSIDERATION:
> >> + * Dma-fence contract stipulates that access to driver provided data (data not
> >> + * directly embedded into the object itself), such as the &dma_fence.lock and
> >> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
> >> + * after the fence has been signalled. Drivers are allowed to free that data,
> >> + * and some do.
> >> + *
> >> + * To allow safe access drivers are mandated to guarantee a RCU grace period
> >> + * between signalling the fence and freeing said data.
> >> + *
> >> + * As such access to the timeline name is only valid inside a RCU locked section.
> >> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
> >> + * by the &dma_fence_access_begin and &dma_fence_access_end pair.
> >> + */
> >> +const char *dma_fence_timeline_name(struct dma_fence *fence)
> >> +{
> >> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> >> + "rcu_read_lock() required for safe access to returned string");
> >> +
> >> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> >> + return fence->ops->get_timeline_name(fence);
> >> + else
> >> + return "signaled-timeline";
> >
> > This means that trace_dma_fence_signaled() will get the wrong
> > timeline/driver name, which probably screws up perfetto and maybe
> > other tools.
>
> Do you think context and seqno are not enough for those tools and they
> actually rely on the names? It would sound weird if they decided to
> index anything on the names which are non-standardised between drivers,
> but I guess anything is possible.
At some point perfetto uses the timeline name to put up a named fence
timeline, I'm not sure if it is using the name or context # for
subsequent fence events (namely, signalled). I'd have to check the
code and get back to you.
There is also gpuvis, which I guess does something similar, but
haven't looked into it. Idk if there are others.
> > Maybe it would work well enough just to move the
> > trace_dma_fence_signaled() call ahead of the test_and_set_bit()? Idk
> > if some things will start getting confused if they see that trace
> > multiple times.
>
> Another alternative is to make this tracepoint access the names
> directly. It is under the lock so guaranteed not to get freed with
> drivers which will be made compliant with the documented rules.
I guess it would have been better if, other than dma_fence_init
tracepoint, later tracepoints didn't include the driver/timeline
name.. that would have forced the use of the context. But I guess too
late for that. Perhaps the least bad thing to do is use the locking?
BR,
-R
I'm going to push patches #1-#6 to drm-misc-next.
They make sense as stand-alone cleanups anyway.
But this patch needs a bit more documentation, I think.
On 5/13/25 09:45, Tvrtko Ursulin wrote:
> Dma-fence objects currently suffer from a potential use after free problem
> where fences exported to userspace and other drivers can outlive the
> exporting driver, or the associated data structures.
>
> The discussion on how to address this concluded that adding reference
> counting to all the involved objects is not desirable, since it would need
> to be very wide reaching and could cause unloadable drivers if another
> entity would be holding onto a signaled fence reference potentially
> indefinitely.
>
> This patch enables the safe access by introducing and documenting a
> contract between fence exporters and users. It documents a set of
> constraints and adds helpers which a) drivers with potential to suffer from
> the use after free must use and b) users of the dma-fence API must use as
> well.
>
> Premise of the design has multiple sides:
>
> 1. Drivers (fence exporters) MUST ensure a RCU grace period between
> signalling a fence and freeing the driver private data associated with it.
That's a must have anyway, otherwise functions like dma_fence_get_rcu() won't work.
I hope that we have documented that somewhere, but I'm not 100% sure to be honest.
> The grace period does not have to follow the signalling immediately but
> HAS to happen before data is freed.
That is the new requirement we have to document somehow.
I'm not 100% sure but I think module unloading waits for an RCU grace period anyway.
> 2. Users of the dma-fence API marked with such requirement MUST contain
> the complete access to the data within a single code block guarded by the
> new dma_fence_access_begin() and dma_fence_access_end() helpers.
>
> The combination of the two ensures that whoever sees the
> DMA_FENCE_FLAG_SIGNALED_BIT not set is guaranteed to have access to a
> valid fence->lock and valid data potentially accessed by the fence->ops
> virtual functions, until the call to dma_fence_access_end().
Mhm, how about returning copies of the string?
This is only for debugging anyway and kstrdup_const() isn't that costly.
Regards,
Christian.
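As a rough illustration of the copy-based idea, here is a userspace mock
that takes the name inside the guarded section and hands back a
caller-owned copy. Nothing here is kernel API: every mock_* name is
hypothetical, malloc() stands in for a kstrdup-style helper, and a counter
stands in for rcu_read_lock()/rcu_read_unlock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Mock guard: the patch maps the real helpers to rcu_read_lock()/unlock(). */
static int access_depth;
static void mock_access_begin(void) { access_depth++; }
static void mock_access_end(void) { access_depth--; }

struct mock_fence {
	bool signaled;           /* stands in for DMA_FENCE_FLAG_SIGNALED_BIT */
	const char *driver_name; /* driver-owned, may be freed after signalling */
};

/* Mirrors the patch's helper: only touch driver data while unsignaled. */
static const char *mock_driver_name(const struct mock_fence *f)
{
	assert(access_depth > 0); /* stands in for RCU_LOCKDEP_WARN() */
	return f->signaled ? "detached-driver" : f->driver_name;
}

/* Return a caller-owned copy so the string can outlive the guarded section. */
static char *mock_driver_name_copy(const struct mock_fence *f)
{
	const char *name;
	char *copy;
	size_t len;

	mock_access_begin();
	name = mock_driver_name(f);
	len = strlen(name) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, name, len);
	mock_access_end();

	return copy; /* caller frees; NULL on allocation failure */
}
```

One thing the mock glosses over: in the kernel, an allocation made inside
an RCU read-side section must not sleep, so the allocation flags would
need care.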
>
> 3. Module unload (fence->ops) disappearing is for now explicitly not
> handled. That would require a more complex protection, possibly needing
> SRCU instead of RCU to handle callers such as dma_fence_wait_timeout(),
> where race between dma_fence_enable_sw_signaling, signalling, and
> dereference of fence->ops->wait() would need a sleeping SRCU context.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
> ---
> drivers/dma-buf/dma-fence.c | 69 +++++++++++++++++++++++++++++++++++++
> include/linux/dma-fence.h | 32 ++++++++++++-----
> 2 files changed, 93 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index dc2456f68685..cfe1d7b79c22 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -533,6 +533,7 @@ void dma_fence_release(struct kref *kref)
> struct dma_fence *fence =
> container_of(kref, struct dma_fence, refcount);
>
> + dma_fence_access_begin();
> trace_dma_fence_destroy(fence);
>
> if (WARN(!list_empty(&fence->cb_list) &&
> @@ -560,6 +561,8 @@ void dma_fence_release(struct kref *kref)
> fence->ops->release(fence);
> else
> dma_fence_free(fence);
> +
> + dma_fence_access_end();
> }
> EXPORT_SYMBOL(dma_fence_release);
>
> @@ -982,11 +985,13 @@ EXPORT_SYMBOL(dma_fence_set_deadline);
> */
> void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq)
> {
> + dma_fence_access_begin();
> seq_printf(seq, "%s %s seq %llu %ssignalled\n",
> dma_fence_driver_name(fence),
> dma_fence_timeline_name(fence),
> fence->seqno,
> dma_fence_is_signaled(fence) ? "" : "un");
> + dma_fence_access_end();
> }
> EXPORT_SYMBOL(dma_fence_describe);
>
> @@ -1033,3 +1038,67 @@ dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
> __set_bit(DMA_FENCE_FLAG_SEQNO64_BIT, &fence->flags);
> }
> EXPORT_SYMBOL(dma_fence_init64);
> +
> +/**
> + * dma_fence_driver_name - Access the driver name
> + * @fence: the fence to query
> + *
> + * Returns a driver name backing the dma-fence implementation.
> + *
> + * IMPORTANT CONSIDERATION:
> + * Dma-fence contract stipulates that access to driver provided data (data not
> + * directly embedded into the object itself), such as the &dma_fence.lock and
> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
> + * after the fence has been signalled. Drivers are allowed to free that data,
> + * and some do.
> + *
> + * To allow safe access drivers are mandated to guarantee a RCU grace period
> + * between signalling the fence and freeing said data.
> + *
> + * As such access to the driver name is only valid inside a RCU locked section.
> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
> + * by the &dma_fence_access_begin and &dma_fence_access_end pair.
> + */
> +const char *dma_fence_driver_name(struct dma_fence *fence)
> +{
> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> + "rcu_read_lock() required for safe access to returned string");
> +
> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> + return fence->ops->get_driver_name(fence);
> + else
> + return "detached-driver";
> +}
> +EXPORT_SYMBOL(dma_fence_driver_name);
> +
> +/**
> + * dma_fence_timeline_name - Access the timeline name
> + * @fence: the fence to query
> + *
> + * Returns a timeline name provided by the dma-fence implementation.
> + *
> + * IMPORTANT CONSIDERATION:
> + * Dma-fence contract stipulates that access to driver provided data (data not
> + * directly embedded into the object itself), such as the &dma_fence.lock and
> + * memory potentially accessed by the &dma_fence.ops functions, is forbidden
> + * after the fence has been signalled. Drivers are allowed to free that data,
> + * and some do.
> + *
> + * To allow safe access drivers are mandated to guarantee a RCU grace period
> + * between signalling the fence and freeing said data.
> + *
> + * As such access to the timeline name is only valid inside a RCU locked section.
> + * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block guarded
> + * by the &dma_fence_access_begin and &dma_fence_access_end pair.
> + */
> +const char *dma_fence_timeline_name(struct dma_fence *fence)
> +{
> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> + "rcu_read_lock() required for safe access to returned string");
> +
> + if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> + return fence->ops->get_timeline_name(fence);
> + else
> + return "signaled-timeline";
> +}
> +EXPORT_SYMBOL(dma_fence_timeline_name);
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index c5ac37e10d85..b39e430142ea 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -377,15 +377,31 @@ bool dma_fence_remove_callback(struct dma_fence *fence,
> struct dma_fence_cb *cb);
> void dma_fence_enable_sw_signaling(struct dma_fence *fence);
>
> -static inline const char *dma_fence_driver_name(struct dma_fence *fence)
> -{
> - return fence->ops->get_driver_name(fence);
> -}
> +/**
> + * DOC: Safe external access to driver provided object members
> + *
> + * All data not stored directly in the dma-fence object, such as the
> + * &dma_fence.lock and memory potentially accessed by functions in the
> + * &dma_fence.ops table, MUST NOT be accessed after the fence has been signalled
> + * because after that point drivers are allowed to free it.
> + *
> + * All code accessing that data via the dma-fence API (or directly, which is
> + * discouraged), MUST make sure to contain the complete access within a
> + * &dma_fence_access_begin and &dma_fence_access_end pair.
> + *
> + * Some dma-fence APIs handle this automatically, while others, for example
> + * &dma_fence_driver_name and &dma_fence_timeline_name, leave that
> + * responsibility to the caller.
> + *
> + * To enable this scheme to work drivers MUST ensure a RCU grace period elapses
> + * between signalling the fence and freeing the said data.
> + *
> + */
> +#define dma_fence_access_begin rcu_read_lock
> +#define dma_fence_access_end rcu_read_unlock
>
> -static inline const char *dma_fence_timeline_name(struct dma_fence *fence)
> -{
> - return fence->ops->get_timeline_name(fence);
> -}
> +const char *dma_fence_driver_name(struct dma_fence *fence);
> +const char *dma_fence_timeline_name(struct dma_fence *fence);
>
> /**
> * dma_fence_is_signaled_locked - Return an indication if the fence
Dear All,
This patchset fixes the incorrect use of dma_sync_sg_*() calls in
media and related drivers. They are replaced with much safer
dma_sync_sgtable_*() variants, which take care of passing the proper
number of elements for the sync operation.
Best regards
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Change log:
v3: added cc: stable to tags
v2: fixed typos and added cc: stable
Patch summary:
Marek Szyprowski (3):
media: videobuf2: use sgtable-based scatterlist wrappers
udmabuf: use sgtable-based scatterlist wrappers
media: omap3isp: use sgtable-based scatterlist wrappers
drivers/dma-buf/udmabuf.c | 5 ++---
drivers/media/common/videobuf2/videobuf2-dma-sg.c | 4 ++--
drivers/media/platform/ti/omap3isp/ispccdc.c | 8 ++++----
drivers/media/platform/ti/omap3isp/ispstat.c | 6 ++----
4 files changed, 10 insertions(+), 13 deletions(-)
--
2.34.1
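For readers unfamiliar with the problem, the following userspace mock
sketches why a table-based wrapper is harder to misuse. It is not the
kernel API: the mock_* names are hypothetical and the struct is a heavily
simplified stand-in for struct sg_table. It assumes what the DMA API
documentation describes, namely that sync calls take the entry count that
was originally passed to the mapping call (orig_nents), not the possibly
smaller count the mapping returned (nents).

```c
#include <assert.h>

/* Simplified mock of the kernel's struct sg_table. */
struct mock_sg_table {
	unsigned int nents;      /* entries after DMA mapping (may be merged) */
	unsigned int orig_nents; /* entries originally allocated and mapped */
};

/* Records the count the mock sync call was asked to process. */
static unsigned int last_synced;

/* Mock of a dma_sync_sg_*() style call: the caller must pick the count. */
static void mock_sync_sg(const struct mock_sg_table *sgt, unsigned int nelems)
{
	(void)sgt;
	last_synced = nelems;
}

/* Mock of a dma_sync_sgtable_*() style wrapper: the count is chosen for you. */
static void mock_sync_sgtable(const struct mock_sg_table *sgt)
{
	mock_sync_sg(sgt, sgt->orig_nents);
}
```

With the open-coded call it is easy to pass sgt->nents by mistake and
sync only part of the buffer; the wrapper removes that choice from the
caller.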
On 5/13/25 04:06, Hyejeong Choi wrote:
> smp_store_mb() inserts the memory barrier after the store operation.
> That differs from what the comment intends, so a NULL pointer
> dereference can happen if the memory updates are reordered.
>
> Signed-off-by: Hyejeong Choi <hjeong.choi(a)samsung.com>
I've reviewed it, added CC stable and Fixes tags, and pushed it to drm-misc-fixes.
Thanks,
Christian.
> ---
> drivers/dma-buf/dma-resv.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 5f8d010516f0..b1ef4546346d 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -320,8 +320,9 @@ void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence,
> count++;
>
> dma_resv_list_set(fobj, i, fence, usage);
> - /* pointer update must be visible before we extend the num_fences */
> - smp_store_mb(fobj->num_fences, count);
> + /* fence update must be visible before we extend the num_fences */
> + smp_wmb();
> + fobj->num_fences = count;
> }
> EXPORT_SYMBOL(dma_resv_add_fence);
>
>
>
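The ordering the fix is after can be sketched in userspace with C11
atomics. This is an analogy under stated assumptions, not the kernel
code: struct mock_list, mock_add() and mock_get() are hypothetical, a
release store stands in for smp_wmb() followed by the plain store, and a
single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_FENCES 8

struct mock_list {
	void *slots[MAX_FENCES];
	atomic_uint num_fences;
};

/* Writer: fill the slot first, then publish the larger count. */
static int mock_add(struct mock_list *l, void *fence)
{
	unsigned int count = atomic_load_explicit(&l->num_fences,
						  memory_order_relaxed);

	if (count >= MAX_FENCES)
		return -1;

	l->slots[count] = fence;
	/* Release ordering: the slot write is visible before the new count. */
	atomic_store_explicit(&l->num_fences, count + 1, memory_order_release);
	return 0;
}

/* Reader: every index below the observed count has an initialised slot. */
static void *mock_get(struct mock_list *l, unsigned int i)
{
	unsigned int count = atomic_load_explicit(&l->num_fences,
						  memory_order_acquire);

	return i < count ? l->slots[i] : NULL;
}
```

A barrier placed after the count store, which is what the commit message
says smp_store_mb() provides, does not order the earlier slot write
against it, so a reader could see the new count with a stale slot.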
Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to
perform per-buffer accounting with debugfs which is not suitable for
production environments. Eventually we discovered the overhead with
per-buffer sysfs file creation/removal was significantly impacting
allocation and free times, and exacerbated kernfs lock contention. [2]
dma_buf_stats_setup() is responsible for 39% of single-page buffer
creation duration, or 74% of single-page dma_buf_export() duration when
stressing dmabuf allocations and frees.
I prototyped a change from per-buffer to per-exporter statistics with an
RCU-protected list of exporter allocations that accommodates most (but
not all) of our use-cases and avoids almost all of the sysfs overhead.
While that adds less overhead than per-buffer sysfs, and less even than
the maintenance of the dmabuf debugfs_list, it's still *additional*
overhead on top of the debugfs_list and doesn't give us per-buffer info.
This series uses the existing dmabuf debugfs_list to implement a BPF
dmabuf iterator, which adds no overhead to buffer allocation/free and
provides per-buffer info. The list has been moved outside of
CONFIG_DEBUG_FS scope so that it is always populated. The BPF program
loaded by userspace that extracts per-buffer information gets to define
its own interface which avoids the lack of ABI stability with debugfs.
This will allow us to replace our use of CONFIG_DMABUF_SYSFS_STATS, and
the plan is to remove it from the kernel after the next longterm stable
release.
[1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.…
[2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com
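For readers unfamiliar with the begin/next style of iteration this series
relies on, here is a rough userspace model. It is a sketch only: every
name (struct model_buf, model_buf_iter_begin(), model_buf_iter_next(),
and so on) is hypothetical, a pthread mutex and a plain integer refcount
stand in for the kernel's list mutex and file references, and none of it
is the code in drivers/dma-buf/dma-buf.c. It shows the list lock being
held only while choosing the next live buffer, and buffers that are
already being torn down being skipped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a tracked buffer; not struct dma_buf. */
struct model_buf {
	struct model_buf *next;
	int refcount;		/* 0 means the buffer is being torn down */
	size_t size;
};

static pthread_mutex_t model_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_buf *model_list_head;

/* Insert at the head of the global list. */
static void model_buf_add(struct model_buf *b)
{
	pthread_mutex_lock(&model_list_lock);
	b->next = model_list_head;
	model_list_head = b;
	pthread_mutex_unlock(&model_list_lock);
}

/* Drop a reference taken by the iterator. */
static void model_buf_put(struct model_buf *b)
{
	pthread_mutex_lock(&model_list_lock);
	b->refcount--;
	pthread_mutex_unlock(&model_list_lock);
}

/* Caller holds the lock: return the first live buffer at or after b. */
static struct model_buf *model_first_live(struct model_buf *b)
{
	for (; b; b = b->next) {
		if (b->refcount > 0) {
			b->refcount++;	/* pin it for the caller */
			return b;
		}
	}
	return NULL;
}

/* Start iteration: returns a pinned buffer or NULL if none is live. */
static struct model_buf *model_buf_iter_begin(void)
{
	struct model_buf *ret;

	pthread_mutex_lock(&model_list_lock);
	ret = model_first_live(model_list_head);
	pthread_mutex_unlock(&model_list_lock);
	return ret;
}

/* Advance: pin the next live buffer, then unpin the previous one. */
static struct model_buf *model_buf_iter_next(struct model_buf *prev)
{
	struct model_buf *ret;

	pthread_mutex_lock(&model_list_lock);
	ret = model_first_live(prev->next);
	pthread_mutex_unlock(&model_list_lock);
	model_buf_put(prev);
	return ret;
}

/* Example consumer: sum the sizes of all live buffers. */
static size_t model_total_size(void)
{
	size_t total = 0;
	struct model_buf *b;

	for (b = model_buf_iter_begin(); b; b = model_buf_iter_next(b))
		total += b->size;
	return total;
}
```

The consumer never touches the list or the lock directly, so the locking
stays in one place and nothing on the allocation or free path changes.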
v1: https://lore.kernel.org/all/20250414225227.3642618-1-tjmercier@google.com
v1 -> v2:
Make the DMA buffer list independent of CONFIG_DEBUG_FS per Christian
König
Add CONFIG_DMA_SHARED_BUFFER check to kernel/bpf/Makefile per kernel
test robot
Use BTF_ID_LIST_SINGLE instead of BTF_ID_LIST_GLOBAL_SINGLE per Song Liu
Fixup comment style, mixing code/declarations, and use ASSERT_OK_FD in
selftest per Song Liu
Add BPF_ITER_RESCHED feature to bpf_dmabuf_reg_info per Alexei
Starovoitov
Add open-coded iterator and selftest per Alexei Starovoitov
Add a second test buffer from the system dmabuf heap to selftests
Use the BPF program we'll use in production for selftest per Alexei
Starovoitov
https://r.android.com/c/platform/system/bpfprogs/+/3616123/2/dmabufIter.c
https://r.android.com/c/platform/system/memory/libmeminfo/+/3614259/1/libdm…
v2: https://lore.kernel.org/all/20250504224149.1033867-1-tjmercier@google.com
v2 -> v3:
Rebase onto bpf-next/master
Move get_next_dmabuf() into drivers/dma-buf/dma-buf.c, along with the
new get_first_dmabuf(). This avoids having to expose the dmabuf list
and mutex to the rest of the kernel, and keeps the dmabuf mutex
operations near each other in the same file. (Christian König)
Add Christian's RB to dma-buf: Rename debugfs symbols
Drop RFC: dma-buf: Remove DMA-BUF statistics
v3: https://lore.kernel.org/all/20250507001036.2278781-1-tjmercier@google.com
v3 -> v4:
Fix selftest BPF program comment style (not kdoc) per Alexei Starovoitov
Fix dma-buf.c kdoc comment style per Alexei Starovoitov
Rename get_first_dmabuf / get_next_dmabuf to dma_buf_iter_begin /
dma_buf_iter_next per Christian König
Add Christian's RB to bpf: Add dmabuf iterator
v4: https://lore.kernel.org/all/20250508182025.2961555-1-tjmercier@google.com
v4 -> v5:
Add Christian's Acks to all patches
Add Song Liu's Acks
Move BTF_ID_LIST_SINGLE and DEFINE_BPF_ITER_FUNC closer to usage per
Song Liu
Fix open-coded iterator comment style per Song Liu
Move iterator termination check to its own subtest per Song Liu
Rework selftest buffer creation per Song Liu
Fix spacing in sanitize_string per BPF CI
v5: https://lore.kernel.org/all/20250512174036.266796-1-tjmercier@google.com
v5 -> v6:
Song Liu:
Init test buffer FDs to -1
Zero-init udmabuf_create for future proofing
Bail early for iterator fd/FILE creation failure
Dereference char ptr to check for NUL in sanitize_string()
Move map insertion from create_test_buffers() to test_dmabuf_iter()
Add ACK to selftests/bpf: Add test for open coded dmabuf_iter
T.J. Mercier (5):
dma-buf: Rename debugfs symbols
bpf: Add dmabuf iterator
bpf: Add open coded dmabuf iterator
selftests/bpf: Add test for dmabuf_iter
selftests/bpf: Add test for open coded dmabuf_iter
drivers/dma-buf/dma-buf.c | 98 ++++--
include/linux/dma-buf.h | 4 +-
kernel/bpf/Makefile | 3 +
kernel/bpf/dmabuf_iter.c | 150 +++++++++
kernel/bpf/helpers.c | 5 +
.../testing/selftests/bpf/bpf_experimental.h | 5 +
tools/testing/selftests/bpf/config | 3 +
.../selftests/bpf/prog_tests/dmabuf_iter.c | 285 ++++++++++++++++++
.../testing/selftests/bpf/progs/dmabuf_iter.c | 91 ++++++
9 files changed, 622 insertions(+), 22 deletions(-)
create mode 100644 kernel/bpf/dmabuf_iter.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/dmabuf_iter.c
create mode 100644 tools/testing/selftests/bpf/progs/dmabuf_iter.c
base-commit: 43745d11bfd9683abdf08ad7a5cc403d6a9ffd15
--
2.49.0.1045.g170613ef41-goog