On 10.04.25 at 16:53, Bastien Curutchet wrote:
> Hi all,
>
> Many UIO users performing DMA from their UIO device need to access the
> DMA addresses of the allocated buffers. There are out-of-tree drivers
> that allow doing this, but there is nothing in mainline.
Well that basically disqualifies this patch set in the first paragraph.
To justify a kernel change we always need an in-kernel user of the interface; since this is purely for out-of-tree drivers, it is a no-go to begin with.
> I know DMA shouldn't be handled by userspace but, IMHO, since UIO
> drivers exist, it would be better if they offered a way of doing this.
Leaking DMA addresses to userspace is usually seen as quite a security hole, but on the other hand with UIO you don't have much other choice.
> This patch series uses the dma-heap framework, which already allows
> userspace to allocate DMA buffers. I tried to avoid 'polluting' the
> existing heaps to prevent inappropriate uses of this new feature by
> introducing a new UIO heap, which is the only one implementing this
> behavior.
Yeah, that won't fly at all.
What you could potentially do is create a UIO driver which imports DMA-bufs, pins them, and then provides the DMA addresses to userspace.
But please be aware that DMA-fences are fundamentally incompatible with UIO. So you won't be able to do any form of synchronization, which probably makes the implementation pretty limited.
Regards,
Christian.
>
> PATCH 1 allows the creation of heaps that don't implement map/unmap_buf
> operations as UIO heap doesn't use them.
> PATCH 2 adds the DMA_BUF_IOCTL_GET_DMA_ADDR which transmits the DMA
> addresses to userspace.
> PATCH 3 implements the UIO heap.
>
> It has been tested with the uio_pci_generic driver on a PowerPC.
>
> Signed-off-by: Bastien Curutchet <bastien.curutchet(a)bootlin.com>
> ---
> Bastien Curutchet (3):
> dma-buf: Allow heap that doesn't provide map_buf/unmap_buf
> dma-buf: Add DMA_BUF_IOCTL_GET_DMA_ADDR
> uio: Add UIO_DMABUF_HEAP
>
> drivers/dma-buf/dma-buf.c | 29 +++++++++--
> drivers/uio/Kconfig | 9 ++++
> drivers/uio/Makefile | 1 +
> drivers/uio/uio.c | 4 ++
> drivers/uio/uio_heap.c | 120 +++++++++++++++++++++++++++++++++++++++++++
> include/linux/dma-buf.h | 1 +
> include/linux/uio_driver.h | 2 +
> include/uapi/linux/dma-buf.h | 1 +
> 8 files changed, 164 insertions(+), 3 deletions(-)
> ---
> base-commit: 5f13fa25acaa4f586aaed12efcf7436e004eeaf2
> change-id: 20250408-uio-dma-9b011e9e7f0b
>
> Best regards,
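For illustration, the import-and-pin approach suggested above could look roughly like this. The driver shape and the function name uio_import_dmabuf are hypothetical; only the dma-buf importer calls are the standard API:

static int uio_import_dmabuf(struct device *dev, int fd, dma_addr_t *addr)
{
        struct dma_buf *dmabuf;
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        dmabuf = dma_buf_get(fd);               /* fd passed in from userspace */
        if (IS_ERR(dmabuf))
                return PTR_ERR(dmabuf);

        attach = dma_buf_attach(dmabuf, dev);
        if (IS_ERR(attach)) {
                dma_buf_put(dmabuf);
                return PTR_ERR(attach);
        }

        /* Mapping the attachment pins the buffer for a static importer. */
        sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
                dma_buf_detach(dmabuf, attach);
                dma_buf_put(dmabuf);
                return PTR_ERR(sgt);
        }

        *addr = sg_dma_address(sgt->sgl);       /* first segment only */
        return 0;
}

The reverse path (dma_buf_unmap_attachment_unlocked(), dma_buf_detach(), dma_buf_put()) would have to run before the device file is released.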
On 10.04.25 at 15:09, Philipp Stanner wrote:
> On Thu, 2025-04-10 at 14:58 +0200, Christian König wrote:
>> On 10.04.25 at 11:24, Philipp Stanner wrote:
>>> Nouveau currently relies on the assumption that dma_fences will only
>>> ever get signaled through nouveau_fence_signal(), which takes care of
>>> removing a signaled fence from the list nouveau_fence_chan.pending.
>>>
>>> This self-imposed rule is violated in nouveau_fence_done(), where
>>> dma_fence_is_signaled() (somewhat surprisingly, considering its name)
>>> can signal the fence without removing it from the list. This enables
>>> accesses to already signaled fences through the list, which is a bug.
>>>
>>> In particular, it can race with nouveau_fence_context_kill(), which
>>> would then attempt to set an error code on an already signaled fence,
>>> which is illegal.
>>>
>>> In nouveau_fence_done(), the call to nouveau_fence_update() already
>>> ensures to signal all ready fences. Thus, the signaling potentially
>>> performed by dma_fence_is_signaled() is actually not necessary.
>> Ah, I now got what you are trying to do here! But that won't help.
>>
>> The problem is it is perfectly valid for somebody external (e.g.
>> other driver, TTM etc...) to call dma_fence_is_signaled() on a
>> nouveau fence.
>>
>> This will then in turn still signal the fence but leave it on the
>> pending list, creating the problem you have.
> Good to hear – precisely that then is the use case for a dma_fence
> callback! ^_^ It guarantees that, no matter who signals a fence, no
> matter at what place, a certain action will always be performed.
>
> I can't think of any other mechanism which could guarantee that a
> signaled fence immediately gets removed from nouveau's pending list,
> other than the callbacks.
>
> But seriously, I don't think that anyone does this currently, nor do I
> think that anyone could get away with doing it without the entire
> computer burning down.
Yeah, I don't think that this is possible at the moment.
When you do stuff like that from the provider side you will always run into lifetime issues, because in the signal-from-interrupt case you can end up dropping the last reference before the signaling is completed.
How about the attached (not even compile tested) patch? I think it should fix the issue.
Regards,
Christian.
>
> P.
>
>
>
>> Regards,
>> Christian.
>>
>>> Replace the call to dma_fence_is_signaled() with
>>> nouveau_fence_base_is_signaled().
>>>
>>> Cc: <stable(a)vger.kernel.org> # 4.10+, precise commit not to be
>>> determined
>>> Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
>>> ---
>>> drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> index 7cc84472cece..33535987d8ed 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> @@ -274,7 +274,7 @@ nouveau_fence_done(struct nouveau_fence *fence)
>>> nvif_event_block(&fctx->event);
>>> spin_unlock_irqrestore(&fctx->lock, flags);
>>> }
>>> - return dma_fence_is_signaled(&fence->base);
>>> + return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence-
>>>> base.flags);
>>> }
>>>
>>> static long
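For reference, the callback idea sketched in the quoted discussion could look something like this. The cb member and the helper names are hypothetical, and it assumes the fence was initialized with &fctx->lock as its fence lock, so the callback runs with that lock held:

static void
nouveau_fence_signaled_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
        struct nouveau_fence *fence =
                container_of(f, struct nouveau_fence, base);

        /* Runs whenever the fence gets signaled, no matter by whom. */
        list_del_init(&fence->head);
}

static void nouveau_fence_track(struct nouveau_fence *fence)
{
        int ret;

        /* fence->cb would be a new struct dma_fence_cb member (hypothetical) */
        ret = dma_fence_add_callback(&fence->base, &fence->cb,
                                     nouveau_fence_signaled_cb);
        if (ret == -ENOENT)     /* already signaled, callback not installed */
                nouveau_fence_signaled_cb(&fence->base, &fence->cb);
}

Whether this interacts safely with the lifetime issue Christian mentions (dropping the last reference from interrupt context before signaling completes) is exactly the open question.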
On 10.04.25 at 11:24, Philipp Stanner wrote:
> Nouveau currently relies on the assumption that dma_fences will only
> ever get signaled through nouveau_fence_signal(), which takes care of
> removing a signaled fence from the list nouveau_fence_chan.pending.
>
> This self-imposed rule is violated in nouveau_fence_done(), where
> dma_fence_is_signaled() (somewhat surprisingly, considering its name)
> can signal the fence without removing it from the list. This enables
> accesses to already signaled fences through the list, which is a bug.
>
> In particular, it can race with nouveau_fence_context_kill(), which
> would then attempt to set an error code on an already signaled fence,
> which is illegal.
>
> In nouveau_fence_done(), the call to nouveau_fence_update() already
> ensures to signal all ready fences. Thus, the signaling potentially
> performed by dma_fence_is_signaled() is actually not necessary.
Ah, I now got what you are trying to do here! But that won't help.
The problem is that it is perfectly valid for somebody external (e.g. another driver, TTM etc.) to call dma_fence_is_signaled() on a nouveau fence.
This will then in turn still signal the fence but leave it on the pending list, creating the problem you have.
Regards,
Christian.
>
> Replace the call to dma_fence_is_signaled() with
> nouveau_fence_base_is_signaled().
>
> Cc: <stable(a)vger.kernel.org> # 4.10+, precise commit not to be determined
> Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
> ---
> drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 7cc84472cece..33535987d8ed 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -274,7 +274,7 @@ nouveau_fence_done(struct nouveau_fence *fence)
> nvif_event_block(&fctx->event);
> spin_unlock_irqrestore(&fctx->lock, flags);
> }
> - return dma_fence_is_signaled(&fence->base);
> + return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
> }
>
> static long
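The helper named in the commit message is not shown in the diff; presumably it would be a trivial wrapper along these lines (a sketch, not the actual patch):

static inline bool
nouveau_fence_base_is_signaled(struct nouveau_fence *fence)
{
        /* Only report the already-recorded state, never signal. */
        return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
}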
On 10.04.25 at 14:21, Danilo Krummrich wrote:
> On Thu, Apr 10, 2025 at 02:13:34PM +0200, Christian König wrote:
>> On 10.04.25 at 11:24, Philipp Stanner wrote:
>>> Nouveau currently relies on the assumption that dma_fences will only
>>> ever get signaled through nouveau_fence_signal(), which takes care of
>>> removing a signaled fence from the list nouveau_fence_chan.pending.
>>>
>>> This self-imposed rule is violated in nouveau_fence_done(), where
>>> dma_fence_is_signaled() (somewhat surprisingly, considering its name)
>>> can signal the fence without removing it from the list. This enables
>>> accesses to already signaled fences through the list, which is a bug.
>>>
>>> In particular, it can race with nouveau_fence_context_kill(), which
>>> would then attempt to set an error code on an already signaled fence,
>>> which is illegal.
>>>
>>> In nouveau_fence_done(), the call to nouveau_fence_update() already
>>> ensures to signal all ready fences. Thus, the signaling potentially
>>> performed by dma_fence_is_signaled() is actually not necessary.
>>>
>>> Replace the call to dma_fence_is_signaled() with
>>> nouveau_fence_base_is_signaled().
>>>
>>> Cc: <stable(a)vger.kernel.org> # 4.10+, precise commit not to be determined
>>> Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
>>> ---
>>> drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> index 7cc84472cece..33535987d8ed 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
>>> @@ -274,7 +274,7 @@ nouveau_fence_done(struct nouveau_fence *fence)
>>> nvif_event_block(&fctx->event);
>>> spin_unlock_irqrestore(&fctx->lock, flags);
>>> }
>>> - return dma_fence_is_signaled(&fence->base);
>>> + return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
>> See the code above that:
>>
>> if (fence->base.ops == &nouveau_fence_ops_legacy ||
>> fence->base.ops == &nouveau_fence_ops_uevent) {
> I think this check is a bit pointless given that fence is already a struct
> nouveau_fence. :)
Oh, good point. I totally missed that.
In this case that indeed doesn't make any sense at all.
(Unless somebody just blindly upcast the structure, but I really hope that this isn't the case here.)
Regards,
Christian.
On 10.04.25 at 11:24, Philipp Stanner wrote:
> nouveau_fence_done() contains an if-branch which checks for the
> existence of either of two fence backend ops. Those two are the only
> backend ops existing in Nouveau, however; and at least one backend ops
> must be in use for the entire driver to be able to work. The if branch
> is, therefore, surplus.
>
> Remove the if-branch.
What happens here is that nouveau checks if the fence comes from itself or some external source.
So when you remove that check you potentially use nouveau_fctx() illegally on a non-nouveau fence.
Regards,
Christian.
>
> Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
> ---
> drivers/gpu/drm/nouveau/nouveau_fence.c | 24 +++++++++++-------------
> 1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 33535987d8ed..db6f4494405c 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -259,21 +259,19 @@ nouveau_fence_emit(struct nouveau_fence *fence)
> bool
> nouveau_fence_done(struct nouveau_fence *fence)
> {
> - if (fence->base.ops == &nouveau_fence_ops_legacy ||
> - fence->base.ops == &nouveau_fence_ops_uevent) {
> - struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
> - struct nouveau_channel *chan;
> - unsigned long flags;
> + struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
> + struct nouveau_channel *chan;
> + unsigned long flags;
>
> - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
> - return true;
> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
> + return true;
> +
> + spin_lock_irqsave(&fctx->lock, flags);
> + chan = rcu_dereference_protected(fence->channel, lockdep_is_held(&fctx->lock));
> + if (chan && nouveau_fence_update(chan, fctx))
> + nvif_event_block(&fctx->event);
> + spin_unlock_irqrestore(&fctx->lock, flags);
>
> - spin_lock_irqsave(&fctx->lock, flags);
> - chan = rcu_dereference_protected(fence->channel, lockdep_is_held(&fctx->lock));
> - if (chan && nouveau_fence_update(chan, fctx))
> - nvif_event_block(&fctx->event);
> - spin_unlock_irqrestore(&fctx->lock, flags);
> - }
> return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
> }
>
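The usual idiom for the check Christian describes is a guarded downcast, roughly like this sketch (reusing the ops symbols from the removed branch):

static struct nouveau_fence *to_nouveau_fence(struct dma_fence *fence)
{
        /* A foreign fence must never be passed to nouveau_fctx(). */
        if (fence->ops != &nouveau_fence_ops_legacy &&
            fence->ops != &nouveau_fence_ops_uevent)
                return NULL;

        return container_of(fence, struct nouveau_fence, base);
}

As Danilo points out above, though, nouveau_fence_done() already takes a struct nouveau_fence *, so the type is supposed to be established before the call.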
On 10.04.25 at 11:24, Philipp Stanner wrote:
> Nouveau currently relies on the assumption that dma_fences will only
> ever get signaled through nouveau_fence_signal(), which takes care of
> removing a signaled fence from the list nouveau_fence_chan.pending.
>
> This self-imposed rule is violated in nouveau_fence_done(), where
> dma_fence_is_signaled() (somewhat surprisingly, considering its name)
> can signal the fence without removing it from the list. This enables
> accesses to already signaled fences through the list, which is a bug.
>
> In particular, it can race with nouveau_fence_context_kill(), which
> would then attempt to set an error code on an already signaled fence,
> which is illegal.
>
> In nouveau_fence_done(), the call to nouveau_fence_update() already
> ensures to signal all ready fences. Thus, the signaling potentially
> performed by dma_fence_is_signaled() is actually not necessary.
>
> Replace the call to dma_fence_is_signaled() with
> nouveau_fence_base_is_signaled().
>
> Cc: <stable(a)vger.kernel.org> # 4.10+, precise commit not to be determined
> Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
> ---
> drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index 7cc84472cece..33535987d8ed 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -274,7 +274,7 @@ nouveau_fence_done(struct nouveau_fence *fence)
> nvif_event_block(&fctx->event);
> spin_unlock_irqrestore(&fctx->lock, flags);
> }
> - return dma_fence_is_signaled(&fence->base);
> + return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
See the code above that:
if (fence->base.ops == &nouveau_fence_ops_legacy ||
fence->base.ops == &nouveau_fence_ops_uevent) {
....
Nouveau first tests if it's one of its own fences, and if so does some special handling, e.g. checking the fence status bits etc.
So this dma_fence_is_signaled() is for all non-nouveau fences, and in that case not touching the internal flags is perfectly correct as far as I can see.
Regards,
Christian.
> }
>
> static long
On 09.04.25 at 17:04, Philipp Stanner wrote:
> On Wed, 2025-04-09 at 16:10 +0200, Christian König wrote:
>>> I only see improvement by making things more obvious.
>>>
>>> In any case, how would you call a wrapper that just does
>>> test_bit(IS_SIGNALED, …) ?
>> Broken, that was very intentionally removed quite shortly after we
>> created the framework.
>>
>> We have a few cases were implementations do check that for their
>> fences, but consumers should never be allowed to touch such
>> internals.
> There is theory and there is practice. In practice, those internals are
> being used by Nouveau, i915, Xe, vmwgfx and radeon.
What do you mean? I only skimmed over the use cases, but as far as I can see those are all valid.
You can test the flag if you know what the fence means to you; that is not a problem at all.
> So it seems that we failed quite a bit at communicating clearly how the
> interface should be used.
>
> And, to repeat myself, with both name and docu of that function, I
> think it is very easy to misunderstand what it's doing. You say that it
> shouldn't matter – and maybe that's true, in theory. In practice, it
> does matter. In practice, APIs get misused and have side-effects. And
> making that harder is desirable.
That sounds like I didn't use the right wording.
It *must* not matter to the consumer. See, the purpose of the DMA-fence framework is to make it irrelevant for the consumer how the provider has implemented its fences.
This means that things like whether polling or interrupt-driven signaling is used, 32-bit vs. 64-bit sequence numbers, etc. should all be hidden by the framework from the consumer of the fences.
BTW I'm actually not sure if nouveau has a bug here. As far as I can see nouveau_fence_signal() will be called later eventually and do the necessary cleanup.
But on the other hand it wouldn't surprise me if nouveau has a bug with that. The driver has basically been only barely maintained for quite a while.
> In any case, I might have to add another such call to Nouveau, because
> the solution preferred by you over the callback causes another race.
> Certainly one could solve this in a clean way, but someone has to do
> the work, and we're talking about more than a few hours here.
Well, this is not my preferred solution, it's just the technically correct solution as far as I can see.
> In any case, be so kind and look at patch 2 and tell me there if you're
> at least OK with making the documentation more detailed.
As far as I can see that is clearly the wrong place to document that stuff.
Regards,
Christian.
>
> P.
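The consumer-side view Christian describes boils down to this pattern; a sketch, with some_fence standing in for whatever fence the consumer obtained:

static void consumer_wait_example(struct dma_fence *some_fence)
{
        struct dma_fence *fence = dma_fence_get(some_fence);

        if (!dma_fence_is_signaled(fence))
                dma_fence_wait(fence, true);    /* interruptible wait */

        dma_fence_put(fence);
}

Nothing here depends on how the provider implements signaling; that is the abstraction the framework is meant to provide.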
On Wed, Apr 9, 2025 at 2:50 PM Sumit Garg <sumit.garg(a)kernel.org> wrote:
>
> On Tue, Apr 08, 2025 at 03:28:45PM +0200, Jens Wiklander wrote:
> > On Tue, Apr 8, 2025 at 11:14 AM Sumit Garg <sumit.garg(a)kernel.org> wrote:
> > >
> > > On Tue, Apr 01, 2025 at 10:33:04AM +0200, Jens Wiklander wrote:
> > > > On Tue, Apr 1, 2025 at 9:58 AM Sumit Garg <sumit.garg(a)kernel.org> wrote:
> > > > >
> > > > > On Tue, Mar 25, 2025 at 11:55:46AM +0100, Jens Wiklander wrote:
> > > > > > Hi Sumit,
> > > > > >
> > > > >
> > > > > <snip>
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > +
> > > > > > > > +#include "tee_private.h"
> > > > > > > > +
> > > > > > > > +struct tee_dma_heap {
> > > > > > > > + struct dma_heap *heap;
> > > > > > > > + enum tee_dma_heap_id id;
> > > > > > > > + struct tee_rstmem_pool *pool;
> > > > > > > > + struct tee_device *teedev;
> > > > > > > > + /* Protects pool and teedev above */
> > > > > > > > + struct mutex mu;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +struct tee_heap_buffer {
> > > > > > > > + struct tee_rstmem_pool *pool;
> > > > > > > > + struct tee_device *teedev;
> > > > > > > > + size_t size;
> > > > > > > > + size_t offs;
> > > > > > > > + struct sg_table table;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +struct tee_heap_attachment {
> > > > > > > > + struct sg_table table;
> > > > > > > > + struct device *dev;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +struct tee_rstmem_static_pool {
> > > > > > > > + struct tee_rstmem_pool pool;
> > > > > > > > + struct gen_pool *gen_pool;
> > > > > > > > + phys_addr_t pa_base;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +#if !IS_MODULE(CONFIG_TEE) && IS_ENABLED(CONFIG_DMABUF_HEAPS)
> > > > > > >
> > > > > > > Can this dependency rather be better managed via Kconfig?
> > > > > >
> > > > > > This was the easiest yet somewhat flexible solution I could find. If
> > > > > > you have something better, let's use that instead.
> > > > > >
> > > > >
> > > > > --- a/drivers/tee/optee/Kconfig
> > > > > +++ b/drivers/tee/optee/Kconfig
> > > > > @@ -5,6 +5,7 @@ config OPTEE
> > > > > depends on HAVE_ARM_SMCCC
> > > > > depends on MMU
> > > > > depends on RPMB || !RPMB
> > > > > + select DMABUF_HEAPS
> > > > > help
> > > > > This implements the OP-TEE Trusted Execution Environment (TEE)
> > > > > driver.
> > > >
> > > > I wanted to avoid that since there are plenty of use cases where
> > > > DMABUF_HEAPS aren't needed.
> > >
> > > Yeah, but how will the users figure out the dependency to enable DMA
> > > heaps with the TEE subsystem?
> >
> > I hope, without too much difficulty. They are after all looking for a
> > way to allocate memory from a DMA heap.
> >
> > > So it's better if we provide a generic kernel
> > > Kconfig which enables all the default features.
> >
> > I disagree, it should be possible to configure without DMABUF_HEAPS if desired.
>
> It's hard to see a use-case for that additional compile time option. If
> you are worried about kernel size then those can be built as modules. On
> the other hand the benefit is that we avoid ifdefery and provide sane
> TEE defaults where features can be detected and enabled at runtime
> instead.
My primary concern isn't kernel size, even if that shouldn't be
irrelevant. It doesn't seem right to casually enable features that were
not asked for. In this case, it's not unreasonable or unexpected that
DMABUF_HEAPS must be explicitly enabled in the config if a heap
interface is needed. It's the same as before this patch set.
>
> >
> > >
> > > > This seems to do the job:
> > > > +config TEE_DMABUF_HEAP
> > > > + bool
> > > > + depends on TEE = y && DMABUF_HEAPS
> > > >
> > > > We can only use DMABUF_HEAPS if the TEE subsystem is compiled into the kernel.
> > >
> > > Ah, I see. So we aren't exporting the DMA heap APIs for the TEE subsystem
> > > to use. We should do that such that there isn't a hard dependency to
> > > compile them into the kernel.
> >
> > I was saving that for a later patch set as a later problem. We may
> > save some time by not doing it now.
> >
>
> But I don't think it's correct to just reuse internal APIs from the DMA
> heaps subsystem without exporting them. It can be seen as an
> inter-subsystem API contract breach. I hope the DMA heap maintainers
> won't have an issue with exporting those APIs.
Fair enough. I'll add a patch in the next patch set for that. I guess
the same goes for CMA.
Cheers,
Jens
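With a hidden helper symbol like the TEE_DMABUF_HEAP quoted above, the heap code can be compiled out without sprinkling ifdefs over every call site; a sketch, with tee_heap_init() as a hypothetical name:

#ifdef CONFIG_TEE_DMABUF_HEAP
int tee_heap_init(struct tee_device *teedev);
#else
static inline int tee_heap_init(struct tee_device *teedev)
{
        return 0;       /* heaps not built in, nothing to do */
}
#endif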
On 09.04.25 at 16:01, Philipp Stanner wrote:
> On Wed, 2025-04-09 at 15:14 +0200, Christian König wrote:
>> On 09.04.25 at 14:56, Philipp Stanner wrote:
>>> On Wed, 2025-04-09 at 14:51 +0200, Philipp Stanner wrote:
>>>> On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote:
>>>>> Hi Philipp,
>>>>>
>>>>> On Wed, 9 Apr 2025 14:06:37 +0200
>>>>> Philipp Stanner <phasta(a)kernel.org> wrote:
>>>>>
>>>>>> dma_fence_is_signaled()'s name strongly reads as if this function
>>>>>> were intended for checking whether a fence is already signaled.
>>>>>> Also the boolean it returns hints at that.
>>>>>>
>>>>>> The function's behavior, however, is more complex: it can check
>>>>>> with a driver callback whether the hardware's sequence number
>>>>>> indicates that the fence can already be treated as signaled,
>>>>>> although the hardware's / driver's interrupt handler has not
>>>>>> signaled it yet. If that's the case, the function also signals
>>>>>> the fence.
>>>>>>
>>>>>> (Presumably) this has caused a bug in Nouveau (unknown commit),
>>>>>> where nouveau_fence_done() uses the function to check a fence,
>>>>>> which causes a race.
>>>>>>
>>>>>> Give the function a more obvious name.
>>>>> This is just my personal view on this, but I find the new name
>>>>> just as confusing as the old one. It sounds like something is
>>>>> checked, but it's not clear what, and then the fence is forcibly
>>>>> signaled like it would be if you call drm_fence_signal(). Of
>>>>> course, this is clarified by the doc, but given the goal was to
>>>>> make the function name clearly reflect what it does, I'm not
>>>>> convinced it's significantly better.
>>>>>
>>>>> Maybe dma_fence_check_hw_state_and_propagate(), though it might be
>>>>> too long of a name. Oh well, feel free to ignore these comments if
>>>>> a majority is fine with the new name.
>>>> Yoa, the name isn't perfect (the perfect name describing the whole
>>>> behavior would be
>>>> dma_fence_check_if_already_signaled_then_check_hardware_state_and_propagate() ^^'
>>>>
>>>> My intention here is to have the reader realize "watch out, the
>>>> fence might get signaled here!", which is probably the most
>>>> important event regarding fences, which can race, invoke the
>>>> callbacks and so on.
>>>>
>>>> For details readers will then check the documentation.
>>>>
>>>> But I'm of course open to see if there's a majority for this or
>>>> that name.
>>> how about:
>>>
>>> dma_fence_check_hw_and_signal() ?
>> I don't think that renaming the function is a good idea in the first
>> place.
>>
>> What the function does internally is an implementation detail of the
>> framework.
>>
>> For the code using this function it's completely irrelevant if the
>> function might also signal the fence, what matters for the caller is
>> the returned status of the fence. I think this also counts for the
>> dma_fence_is_signaled() documentation.
> It does obviously matter. As it's currently implemented, a lot of
> important things happen implicitly.
Yeah, but that's ok.
The code that calls this is the consumer of the interface and so shouldn't need to know this. That's why we created the DMA fence framework in the first place.
For the provider side, when a driver or similar implements the interface, the relevant documentation is the dma_fence_ops structure.
> I only see improvement by making things more obvious.
>
> In any case, how would you call a wrapper that just does
> test_bit(IS_SIGNALED, …) ?
Broken, that was very intentionally removed quite shortly after we created the framework.
We have a few cases where implementations do check that for their fences, but consumers should never be allowed to touch such internals.
Regards,
Christian.
>
> P.
>
>> What we should improve is the documentation of the
>> dma_fence_ops->enable_signaling and dma_fence_ops->signaled callbacks.
>> Especially see the comment about reference counts on enable_signaling
>> which is missing on the signaled callback. That is most likely the
>> root cause why nouveau implemented enable_signaling correctly but not
>> the other one.
>>
>> But putting that aside I think we should make nails with heads and
>> let the framework guarantee that the fences stay alive until they are
>> signaled (one way or another). This completely removes the burden to
>> keep a reference on unsignaled fences from the drivers /
>> implementations and makes things overall more defensive.
>>
>> Regards,
>> Christian.
>>
>>> P.
>>>
>>>> P.
>>>>
>>>>
>>>>> Regards,
>>>>>
>>>>> Boris
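For reference, dma_fence_is_signaled() in include/linux/dma-fence.h is essentially the following (comments stripped); the second branch is the signaling side effect the whole thread is about:

static inline bool
dma_fence_is_signaled(struct dma_fence *fence)
{
        if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
                return true;

        if (fence->ops->signaled && fence->ops->signaled(fence)) {
                dma_fence_signal(fence);
                return true;
        }

        return false;
}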
On 09.04.25 at 14:56, Philipp Stanner wrote:
> On Wed, 2025-04-09 at 14:51 +0200, Philipp Stanner wrote:
>> On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote:
>>> Hi Philipp,
>>>
>>> On Wed, 9 Apr 2025 14:06:37 +0200
>>> Philipp Stanner <phasta(a)kernel.org> wrote:
>>>
>>>> dma_fence_is_signaled()'s name strongly reads as if this function
>>>> were intended for checking whether a fence is already signaled.
>>>> Also the boolean it returns hints at that.
>>>>
>>>> The function's behavior, however, is more complex: it can check
>>>> with a driver callback whether the hardware's sequence number
>>>> indicates that the fence can already be treated as signaled,
>>>> although the hardware's / driver's interrupt handler has not
>>>> signaled it yet. If that's the case, the function also signals
>>>> the fence.
>>>>
>>>> (Presumably) this has caused a bug in Nouveau (unknown commit),
>>>> where nouveau_fence_done() uses the function to check a fence,
>>>> which causes a race.
>>>>
>>>> Give the function a more obvious name.
>>> This is just my personal view on this, but I find the new name just
>>> as confusing as the old one. It sounds like something is checked,
>>> but it's not clear what, and then the fence is forcibly signaled
>>> like it would be if you call drm_fence_signal(). Of course, this is
>>> clarified by the doc, but given the goal was to make the function
>>> name clearly reflect what it does, I'm not convinced it's
>>> significantly better.
>>>
>>> Maybe dma_fence_check_hw_state_and_propagate(), though it might be
>>> too long of a name. Oh well, feel free to ignore these comments if a
>>> majority is fine with the new name.
>> Yoa, the name isn't perfect (the perfect name describing the whole
>> behavior would be
>> dma_fence_check_if_already_signaled_then_check_hardware_state_and_propagate() ^^'
>>
>> My intention here is to have the reader realize "watch out, the fence
>> might get signaled here!", which is probably the most important event
>> regarding fences, which can race, invoke the callbacks and so on.
>>
>> For details readers will then check the documentation.
>>
>> But I'm of course open to see if there's a majority for this or that
>> name.
> how about:
>
> dma_fence_check_hw_and_signal() ?
I don't think that renaming the function is a good idea in the first place.
What the function does internally is an implementation detail of the framework.
For the code using this function it's completely irrelevant if the function might also signal the fence; what matters for the caller is the returned status of the fence. I think this also counts for the dma_fence_is_signaled() documentation.
What we should improve is the documentation of the dma_fence_ops->enable_signaling and dma_fence_ops->signaled callbacks.
Especially see the comment about reference counts on enable_signaling which is missing on the signaled callback. That is most likely the root cause why nouveau implemented enable_signaling correctly but not the other one.
But putting that aside I think we should make nails with heads and let the framework guarantee that the fences stay alive until they are signaled (one way or another). This completely removes the burden of keeping a reference on unsignaled fences from the drivers / implementations and makes things overall more defensive.
Regards,
Christian.
>
> P.
>
>> P.
>>
>>
>>> Regards,
>>>
>>> Boris
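The refcount rule mentioned for enable_signaling usually looks like this on the provider side; a sketch with made-up driver names (my_fence, my_hw_enable_irq, to_my_fence):

static bool my_fence_enable_signaling(struct dma_fence *f)
{
        /* Hold a reference until the interrupt handler has signaled. */
        dma_fence_get(f);
        my_hw_enable_irq(to_my_fence(f));
        return true;
}

static void my_fence_irq_handler(struct my_fence *fence)
{
        dma_fence_signal(&fence->base);
        dma_fence_put(&fence->base);    /* pairs with enable_signaling */
}

The suggestion above is to move that guarantee into the framework so providers can no longer get it wrong.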
On 01.04.25 12:13, Sumit Garg wrote:
> + MM folks to seek guidance here.
>
> On Thu, Mar 27, 2025 at 09:07:34AM +0100, Jens Wiklander wrote:
>> Hi Sumit,
>>
>> On Tue, Mar 25, 2025 at 8:42 AM Sumit Garg <sumit.garg(a)kernel.org> wrote:
>>>
>>> On Wed, Mar 05, 2025 at 02:04:15PM +0100, Jens Wiklander wrote:
>>>> Add support in the OP-TEE backend driver for dynamic restricted memory
>>>> allocation with FF-A.
>>>>
>>>> The restricted memory pools for dynamically allocated restricted memory
>>>> are instantiated when requested by user-space. This instantiation can
>>>> fail if OP-TEE doesn't support the requested use-case of restricted
>>>> memory.
>>>>
>>>> Restricted memory pools based on a static carveout or dynamic allocation
>>>> can coexist for different use-cases. We use only dynamic allocation with
>>>> FF-A.
>>>>
>>>> Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org>
>>>> ---
>>>> drivers/tee/optee/Makefile | 1 +
>>>> drivers/tee/optee/ffa_abi.c | 143 ++++++++++++-
>>>> drivers/tee/optee/optee_private.h | 13 +-
>>>> drivers/tee/optee/rstmem.c | 329 ++++++++++++++++++++++++++++++
>>>> 4 files changed, 483 insertions(+), 3 deletions(-)
>>>> create mode 100644 drivers/tee/optee/rstmem.c
>>>>
>
> <snip>
>
>>>> diff --git a/drivers/tee/optee/rstmem.c b/drivers/tee/optee/rstmem.c
>>>> new file mode 100644
>>>> index 000000000000..ea27769934d4
>>>> --- /dev/null
>>>> +++ b/drivers/tee/optee/rstmem.c
>>>> @@ -0,0 +1,329 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/*
>>>> + * Copyright (c) 2025, Linaro Limited
>>>> + */
>>>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>>> +
>>>> +#include <linux/errno.h>
>>>> +#include <linux/genalloc.h>
>>>> +#include <linux/slab.h>
>>>> +#include <linux/string.h>
>>>> +#include <linux/tee_core.h>
>>>> +#include <linux/types.h>
>>>> +#include "optee_private.h"
>>>> +
>>>> +struct optee_rstmem_cma_pool {
>>>> + struct tee_rstmem_pool pool;
>>>> + struct gen_pool *gen_pool;
>>>> + struct optee *optee;
>>>> + size_t page_count;
>>>> + u16 *end_points;
>>>> + u_int end_point_count;
>>>> + u_int align;
>>>> + refcount_t refcount;
>>>> + u32 use_case;
>>>> + struct tee_shm *rstmem;
>>>> + /* Protects when initializing and tearing down this struct */
>>>> + struct mutex mutex;
>>>> +};
>>>> +
>>>> +static struct optee_rstmem_cma_pool *
>>>> +to_rstmem_cma_pool(struct tee_rstmem_pool *pool)
>>>> +{
>>>> + return container_of(pool, struct optee_rstmem_cma_pool, pool);
>>>> +}
>>>> +
>>>> +static int init_cma_rstmem(struct optee_rstmem_cma_pool *rp)
>>>> +{
>>>> + int rc;
>>>> +
>>>> + rp->rstmem = tee_shm_alloc_cma_phys_mem(rp->optee->ctx, rp->page_count,
>>>> + rp->align);
>>>> + if (IS_ERR(rp->rstmem)) {
>>>> + rc = PTR_ERR(rp->rstmem);
>>>> + goto err_null_rstmem;
>>>> + }
>>>> +
>>>> + /*
>>>> + * TODO unmap the memory range since the physical memory will
>>>> + * become inaccesible after the lend_rstmem() call.
>>>> + */
>>>
>>> What's your plan for this TODO? I think we need a CMA allocator here
>>> which can allocate unmapped memory such that any cache speculation
>>> won't lead to CPU hangs once the memory restriction comes into the picture.
>>
>> What happens is platform-specific. For some platforms, it might be
>> enough to avoid explicit access. Yes, a CMA allocator with unmapped
>> memory or where memory can be unmapped is one option.
>
> Did you get a chance to enable real memory protection on the RockPi
> board? This will at least ensure that mapped restricted memory without
> explicit access works fine. Otherwise, once people start to enable real
> memory restriction in OP-TEE, there can be random hang-ups due to cache
> speculation.
>
> MM folks,
>
> Basically what we are trying to achieve here is a "no-map" DT behaviour
> [1] which is rather dynamic in nature. The use-case here is that a memory
> block allocated from CMA can be marked restricted at runtime where we
> would like the Linux not being able to directly or indirectly (cache
> speculation) access it. Once memory restriction use-case has been
> completed, the memory block can be marked as normal and freed for
> further CMA allocation.
>
> It will be appreciated if you can guide us regarding the appropriate APIs
> to use for unmapping/mapping CMA allocations for this use-case.
Can we get some more information why that is even required, so we can
decide if that is even the right thing to do? :)
Who would mark the memory block as restricted and for which purpose?
In arch/powerpc/platforms/powernv/memtrace.c we have some arch-specific
code to remove the directmap after alloc_contig_pages(). See
memtrace_alloc_node(). But it's very arch-specific ...
--
Cheers,
David / dhildenb
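One existing generic precedent besides memtrace is mm/secretmem.c, which drops pages from the direct map with the set_direct_map_*_noflush() helpers. A very rough sketch of that direction for restricted CMA pages (whether this is sufficient against cache speculation is exactly the open question above, and arch support varies):

static int restrict_pages(struct page *pages, unsigned long nr_pages)
{
        unsigned long i;
        int err;

        for (i = 0; i < nr_pages; i++) {
                err = set_direct_map_invalid_noflush(pages + i);
                if (err)
                        return err;     /* caller must restore mappings */
        }
        flush_tlb_kernel_range((unsigned long)page_address(pages),
                               (unsigned long)page_address(pages + nr_pages));
        return 0;
}

set_direct_map_default_noflush() would restore the mapping once the memory is unrestricted again.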
On Wed, Apr 9, 2025 at 9:20 AM Amirreza Zarrabi
<amirreza.zarrabi(a)oss.qualcomm.com> wrote:
>
>
>
> On 4/9/2025 4:41 PM, Jens Wiklander wrote:
> > Hi Amirreza,
> >
> > On Wed, Apr 9, 2025 at 2:28 AM Amirreza Zarrabi
> > <amirreza.zarrabi(a)oss.qualcomm.com> wrote:
> >>
> >> Hi Jens,
> >>
> >> On 4/8/2025 10:19 PM, Jens Wiklander wrote:
> >>
> >> Hi Amirreza,
> >>
> >> On Fri, Mar 28, 2025 at 3:48 AM Amirreza Zarrabi
> >> <amirreza.zarrabi(a)oss.qualcomm.com> wrote:
> >>
> >> For drivers that can transfer data to the TEE without using shared
> >> memory from the client, it is necessary to receive the user address
> >> directly, bypassing any processing by the TEE subsystem. Introduce
> >> TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INPUT/OUTPUT/INOUT to represent
> >> userspace buffers.
> >>
> >> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi(a)oss.qualcomm.com>
> >> ---
> >> drivers/tee/tee_core.c | 33 +++++++++++++++++++++++++++++++++
> >> include/linux/tee_drv.h | 6 ++++++
> >> include/uapi/linux/tee.h | 22 ++++++++++++++++------
> >> 3 files changed, 55 insertions(+), 6 deletions(-)
> >>
> >> Is this patch needed now that the QCOMTEE driver supports shared
> >> memory? I prefer keeping changes to the ABI to a minimum.
> >>
> >> Cheers,
> >> Jens
> >>
> >> Unfortunately, this is still required. QTEE supports two types of data transfer:
> >> (1) using UBUF and (2) memory objects. Even with memory object support, some APIs still
> >> expect to receive data using UBUF. For instance, to load a TA, QTEE offers two interfaces:
> >> one where the TA binary is in UBUF and another where the TA binary is in a memory object.
> >
> > Is this a limitation in the QTEE backend driver or on the secure side?
> > Can it be fixed? I don't ask for changes in the ABI to the secure
> > world since I assume you haven't made such changes while this patch
> > set has evolved.
> >
> > Cheers,
> > Jens
>
> The secure-side ABI supports passing data using memcpy to the same
> buffer that contains the message for QTEE, rather than using a memory
> object. Some services tend to use this approach for small data instead
> of allocating a memory object. I have no choice but to expose this support.
Got it, thanks! It's needed.
>
> Throughout the patchset, I have not made any change to the ABI but
> tried to provide support for the memory object in a separate,
> independent commit, distinct from the UBUF.
OK
Cheers,
Jens
>
> Best regards,
> Amir
>
> >
> >>
> >> Best Regards,
> >> Amir
> >>
> >> diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
> >> index 22cc7d624b0c..bc862a11d437 100644
> >> --- a/drivers/tee/tee_core.c
> >> +++ b/drivers/tee/tee_core.c
> >> @@ -404,6 +404,17 @@ static int params_from_user(struct tee_context *ctx, struct tee_param *params,
> >> params[n].u.value.b = ip.b;
> >> params[n].u.value.c = ip.c;
> >> break;
> >> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INPUT:
> >> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_OUTPUT:
> >> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INOUT:
> >> + params[n].u.ubuf.uaddr = u64_to_user_ptr(ip.a);
> >> + params[n].u.ubuf.size = ip.b;
> >> +
> >> + if (!access_ok(params[n].u.ubuf.uaddr,
> >> + params[n].u.ubuf.size))
> >> + return -EFAULT;
> >> +
> >> + break;
> >> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT:
> >> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT:
> >> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT:
> >> @@ -472,6 +483,11 @@ static int params_to_user(struct tee_ioctl_param __user *uparams,
> >> put_user(p->u.value.c, &up->c))
> >> return -EFAULT;
> >> break;
> >> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_OUTPUT:
> >> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INOUT:
> >> + if (put_user((u64)p->u.ubuf.size, &up->b))
> >> + return -EFAULT;
> >> + break;
> >> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT:
> >> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT:
> >> if (put_user((u64)p->u.memref.size, &up->b))
> >> @@ -672,6 +688,13 @@ static int params_to_supp(struct tee_context *ctx,
> >> ip.b = p->u.value.b;
> >> ip.c = p->u.value.c;
> >> break;
> >> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INPUT:
> >> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_OUTPUT:
> >> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INOUT:
> >> + ip.a = (u64)p->u.ubuf.uaddr;
> >> + ip.b = p->u.ubuf.size;
> >> + ip.c = 0;
> >> + break;
> >> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT:
> >> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT:
> >> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT:
> >> @@ -774,6 +797,16 @@ static int params_from_supp(struct tee_param *params, size_t num_params,
> >> p->u.value.b = ip.b;
> >> p->u.value.c = ip.c;
> >> break;
> >> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_OUTPUT:
> >> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INOUT:
> >> + p->u.ubuf.uaddr = u64_to_user_ptr(ip.a);
> >> + p->u.ubuf.size = ip.b;
> >> +
> >> + if (!access_ok(params[n].u.ubuf.uaddr,
> >> + params[n].u.ubuf.size))
> >> + return -EFAULT;
> >> +
> >> + break;
> >> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT:
> >> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT:
> >> /*
> >> diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
> >> index ce23fd42c5d4..d773f91c6bdd 100644
> >> --- a/include/linux/tee_drv.h
> >> +++ b/include/linux/tee_drv.h
> >> @@ -82,6 +82,11 @@ struct tee_param_memref {
> >> struct tee_shm *shm;
> >> };
> >>
> >> +struct tee_param_ubuf {
> >> + void * __user uaddr;
> >> + size_t size;
> >> +};
> >> +
> >> struct tee_param_value {
> >> u64 a;
> >> u64 b;
> >> @@ -92,6 +97,7 @@ struct tee_param {
> >> u64 attr;
> >> union {
> >> struct tee_param_memref memref;
> >> + struct tee_param_ubuf ubuf;
> >> struct tee_param_value value;
> >> } u;
> >> };
> >> diff --git a/include/uapi/linux/tee.h b/include/uapi/linux/tee.h
> >> index d0430bee8292..3e9b1ec5dfde 100644
> >> --- a/include/uapi/linux/tee.h
> >> +++ b/include/uapi/linux/tee.h
> >> @@ -151,6 +151,13 @@ struct tee_ioctl_buf_data {
> >> #define TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT 6
> >> #define TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT 7 /* input and output */
> >>
> >> +/*
> >> + * These defines userspace buffer parameters.
> >> + */
> >> +#define TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INPUT 8
> >> +#define TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_OUTPUT 9
> >> +#define TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INOUT 10 /* input and output */
> >> +
> >> /*
> >> * Mask for the type part of the attribute, leaves room for more types
> >> */
> >> @@ -186,14 +193,17 @@ struct tee_ioctl_buf_data {
> >> /**
> >> * struct tee_ioctl_param - parameter
> >> * @attr: attributes
> >> - * @a: if a memref, offset into the shared memory object, else a value parameter
> >> - * @b: if a memref, size of the buffer, else a value parameter
> >> + * @a: if a memref, offset into the shared memory object,
> >> + * else if a ubuf, address of the user buffer,
> >> + * else a value parameter
> >> + * @b: if a memref or ubuf, size of the buffer, else a value parameter
> >> * @c: if a memref, shared memory identifier, else a value parameter
> >> *
> >> - * @attr & TEE_PARAM_ATTR_TYPE_MASK indicates if memref or value is used in
> >> - * the union. TEE_PARAM_ATTR_TYPE_VALUE_* indicates value and
> >> - * TEE_PARAM_ATTR_TYPE_MEMREF_* indicates memref. TEE_PARAM_ATTR_TYPE_NONE
> >> - * indicates that none of the members are used.
> >> + * @attr & TEE_PARAM_ATTR_TYPE_MASK indicates if memref, ubuf, or value is
> >> + * used in the union. TEE_PARAM_ATTR_TYPE_VALUE_* indicates value,
> >> + * TEE_PARAM_ATTR_TYPE_MEMREF_* indicates memref, and TEE_PARAM_ATTR_TYPE_UBUF_*
> >> + * indicates ubuf. TEE_PARAM_ATTR_TYPE_NONE indicates that none of the members
> >> + * are used.
> >> *
> >> * Shared memory is allocated with TEE_IOC_SHM_ALLOC which returns an
> >> * identifier representing the shared memory object. A memref can reference
> >>
> >> --
> >> 2.34.1
> >>
>
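From userspace, the new parameter type would be used roughly like this (based on the uapi additions in the quoted patch; the surrounding invoke setup is omitted):

        struct tee_ioctl_param param = {
                .attr = TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INPUT,
                .a = (__u64)(uintptr_t)ta_binary,       /* user address */
                .b = ta_binary_size,                    /* buffer size */
                .c = 0,                                 /* unused for ubuf */
        };

i.e. the TA binary is handed over as a plain user buffer, with no TEE_IOC_SHM_ALLOC round trip first.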
Hi Amirreza,
On Wed, Apr 9, 2025 at 2:28 AM Amirreza Zarrabi
<amirreza.zarrabi(a)oss.qualcomm.com> wrote:
>
> Hi jens,
>
> On 4/8/2025 10:19 PM, Jens Wiklander wrote:
>
> Hi Amirreza,
>
> On Fri, Mar 28, 2025 at 3:48 AM Amirreza Zarrabi
> <amirreza.zarrabi(a)oss.qualcomm.com> wrote:
>
> For drivers that can transfer data to the TEE without using shared
> memory from the client, it is necessary to receive the user address
> directly, bypassing any processing by the TEE subsystem. Introduce
> TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INPUT/OUTPUT/INOUT to represent
> userspace buffers.
>
> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi(a)oss.qualcomm.com>
> ---
> drivers/tee/tee_core.c | 33 +++++++++++++++++++++++++++++++++
> include/linux/tee_drv.h | 6 ++++++
> include/uapi/linux/tee.h | 22 ++++++++++++++++------
> 3 files changed, 55 insertions(+), 6 deletions(-)
>
> Is this patch needed now that the QCOMTEE driver supports shared
> memory? I prefer keeping changes to the ABI to a minimum.
>
> Cheers,
> Jens
>
> Unfortunately, this is still required. QTEE supports two types of data transfer:
> (1) using UBUF and (2) memory objects. Even with memory object support, some APIs still
> expect to receive data using UBUF. For instance, to load a TA, QTEE offers two interfaces:
> one where the TA binary is in UBUF and another where the TA binary is in a memory object.
Is this a limitation in the QTEE backend driver or on the secure side?
Can it be fixed? I don't ask for changes in the ABI to the secure
world since I assume you haven't made such changes while this patch
set has evolved.
Cheers,
Jens
>
> Best Regards,
> Amir
>
> diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
> index 22cc7d624b0c..bc862a11d437 100644
> --- a/drivers/tee/tee_core.c
> +++ b/drivers/tee/tee_core.c
> @@ -404,6 +404,17 @@ static int params_from_user(struct tee_context *ctx, struct tee_param *params,
> params[n].u.value.b = ip.b;
> params[n].u.value.c = ip.c;
> break;
> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INPUT:
> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_OUTPUT:
> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INOUT:
> + params[n].u.ubuf.uaddr = u64_to_user_ptr(ip.a);
> + params[n].u.ubuf.size = ip.b;
> +
> + if (!access_ok(params[n].u.ubuf.uaddr,
> + params[n].u.ubuf.size))
> + return -EFAULT;
> +
> + break;
> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT:
> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT:
> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT:
> @@ -472,6 +483,11 @@ static int params_to_user(struct tee_ioctl_param __user *uparams,
> put_user(p->u.value.c, &up->c))
> return -EFAULT;
> break;
> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_OUTPUT:
> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INOUT:
> + if (put_user((u64)p->u.ubuf.size, &up->b))
> + return -EFAULT;
> + break;
> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT:
> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT:
> if (put_user((u64)p->u.memref.size, &up->b))
> @@ -672,6 +688,13 @@ static int params_to_supp(struct tee_context *ctx,
> ip.b = p->u.value.b;
> ip.c = p->u.value.c;
> break;
> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INPUT:
> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_OUTPUT:
> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INOUT:
> + ip.a = (u64)p->u.ubuf.uaddr;
> + ip.b = p->u.ubuf.size;
> + ip.c = 0;
> + break;
> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT:
> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT:
> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT:
> @@ -774,6 +797,16 @@ static int params_from_supp(struct tee_param *params, size_t num_params,
> p->u.value.b = ip.b;
> p->u.value.c = ip.c;
> break;
> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_OUTPUT:
> + case TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INOUT:
> + p->u.ubuf.uaddr = u64_to_user_ptr(ip.a);
> + p->u.ubuf.size = ip.b;
> +
> + if (!access_ok(params[n].u.ubuf.uaddr,
> + params[n].u.ubuf.size))
> + return -EFAULT;
> +
> + break;
> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT:
> case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT:
> /*
> diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
> index ce23fd42c5d4..d773f91c6bdd 100644
> --- a/include/linux/tee_drv.h
> +++ b/include/linux/tee_drv.h
> @@ -82,6 +82,11 @@ struct tee_param_memref {
> struct tee_shm *shm;
> };
>
> +struct tee_param_ubuf {
> + void * __user uaddr;
> + size_t size;
> +};
> +
> struct tee_param_value {
> u64 a;
> u64 b;
> @@ -92,6 +97,7 @@ struct tee_param {
> u64 attr;
> union {
> struct tee_param_memref memref;
> + struct tee_param_ubuf ubuf;
> struct tee_param_value value;
> } u;
> };
> diff --git a/include/uapi/linux/tee.h b/include/uapi/linux/tee.h
> index d0430bee8292..3e9b1ec5dfde 100644
> --- a/include/uapi/linux/tee.h
> +++ b/include/uapi/linux/tee.h
> @@ -151,6 +151,13 @@ struct tee_ioctl_buf_data {
> #define TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT 6
> #define TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT 7 /* input and output */
>
> +/*
> + * These defines userspace buffer parameters.
> + */
> +#define TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INPUT 8
> +#define TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_OUTPUT 9
> +#define TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INOUT 10 /* input and output */
> +
> /*
> * Mask for the type part of the attribute, leaves room for more types
> */
> @@ -186,14 +193,17 @@ struct tee_ioctl_buf_data {
> /**
> * struct tee_ioctl_param - parameter
> * @attr: attributes
> - * @a: if a memref, offset into the shared memory object, else a value parameter
> - * @b: if a memref, size of the buffer, else a value parameter
> + * @a: if a memref, offset into the shared memory object,
> + * else if a ubuf, address of the user buffer,
> + * else a value parameter
> + * @b: if a memref or ubuf, size of the buffer, else a value parameter
> * @c: if a memref, shared memory identifier, else a value parameter
> *
> - * @attr & TEE_PARAM_ATTR_TYPE_MASK indicates if memref or value is used in
> - * the union. TEE_PARAM_ATTR_TYPE_VALUE_* indicates value and
> - * TEE_PARAM_ATTR_TYPE_MEMREF_* indicates memref. TEE_PARAM_ATTR_TYPE_NONE
> - * indicates that none of the members are used.
> + * @attr & TEE_PARAM_ATTR_TYPE_MASK indicates if memref, ubuf, or value is
> + * used in the union. TEE_PARAM_ATTR_TYPE_VALUE_* indicates value,
> + * TEE_PARAM_ATTR_TYPE_MEMREF_* indicates memref, and TEE_PARAM_ATTR_TYPE_UBUF_*
> + * indicates ubuf. TEE_PARAM_ATTR_TYPE_NONE indicates that none of the members
> + * are used.
> *
> * Shared memory is allocated with TEE_IOC_SHM_ALLOC which returns an
> * identifier representing the shared memory object. A memref can reference
>
> --
> 2.34.1
>
Hi,
Here's preliminary work to enable dmem tracking for heavy users of DMA
allocations on behalf of userspace: v4l2, DRM, and dma-buf heaps.
It's not really meant for inclusion at the moment, because I really
don't like it that much, and would like to discuss solutions on how to
make it nicer.
In particular, the dma dmem region accessors don't feel that great to
me. It duplicates the logic to select the proper accessor in
dma_alloc_attrs(), and it looks fragile and potentially buggy to me.
One solution I tried is to do the accounting in dma_alloc_attrs()
directly, depending on a flag being set, similar to what __GFP_ACCOUNT
is doing.
It didn't work because dmem initialises a state pointer when charging an
allocation to a region, and expects that state pointer to be passed back
when uncharging. Since dma_alloc_attrs() returns a void pointer to the
allocated buffer, we need to put that state into a higher-level
structure, such as drm_gem_object, or dma_buf.
Since we can't share the region selection logic, we need to get the
region through some other means. Another thing I considered was to return
the region as part of the allocated buffer (through struct page or
folio), but those are lost across the calls and dma_alloc_attrs() will
only get a void pointer. So that's not doable without some heavy
rework, if it's a good idea at all.
So yeah, I went for the dumbest possible solution with the accessors,
hoping you could suggest a much smarter idea :)
Thanks,
Maxime
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Maxime Ripard (12):
cma: Register dmem region for each cma region
cma: Provide accessor to cma dmem region
dma: coherent: Register dmem region for each coherent region
dma: coherent: Provide accessor to dmem region
dma: contiguous: Provide accessor to dmem region
dma: direct: Provide accessor to dmem region
dma: Create default dmem region for DMA allocations
dma: Provide accessor to dmem region
dma-buf: Clear cgroup accounting on release
dma-buf: cma: Account for allocations in dmem cgroup
drm/gem: Add cgroup memory accounting
media: videobuf2: Track buffer allocations through the dmem cgroup
drivers/dma-buf/dma-buf.c | 7 ++++
drivers/dma-buf/heaps/cma_heap.c | 18 ++++++++--
drivers/gpu/drm/drm_gem.c | 5 +++
drivers/gpu/drm/drm_gem_dma_helper.c | 6 ++++
.../media/common/videobuf2/videobuf2-dma-contig.c | 19 +++++++++++
include/drm/drm_device.h | 1 +
include/drm/drm_gem.h | 2 ++
include/linux/cma.h | 9 +++++
include/linux/dma-buf.h | 5 +++
include/linux/dma-direct.h | 2 ++
include/linux/dma-map-ops.h | 32 ++++++++++++++++++
include/linux/dma-mapping.h | 11 ++++++
kernel/dma/coherent.c | 26 +++++++++++++++
kernel/dma/direct.c | 8 +++++
kernel/dma/mapping.c | 39 ++++++++++++++++++++++
mm/cma.c | 21 +++++++++++-
mm/cma.h | 3 ++
17 files changed, 211 insertions(+), 3 deletions(-)
---
base-commit: 55a2aa61ba59c138bd956afe0376ec412a7004cf
change-id: 20250307-dmem-cgroups-73febced0989
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
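For context, the dmem state-pointer problem described in the cover letter comes from the controller's charge/uncharge contract, which is roughly the following (signatures as merged with the initial dmem controller; treat this as a sketch that may drift):

        struct dmem_cgroup_pool_state *pool, *limit_pool;
        int ret;

        ret = dmem_cgroup_try_charge(region, size, &pool, &limit_pool);
        if (ret) {
                /* limit_pool tells the caller which cgroup hit its limit */
                dmem_cgroup_pool_state_put(limit_pool);
                return ret;
        }

        /* @pool must be stashed somewhere (gem object, dma_buf, ...) */

        dmem_cgroup_uncharge(pool, size);       /* at release time */

Since dma_alloc_attrs() only returns the buffer pointer, there is no natural place for @pool, hence the accounting being hoisted into drm_gem_object and dma_buf in this series.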
Hi Amir,
On Fri, Mar 28, 2025 at 3:48 AM Amirreza Zarrabi
<amirreza.zarrabi(a)oss.qualcomm.com> wrote:
>
> The tee_context can be used to manage TEE user resources, including
> those allocated by the driver for the TEE on behalf of the user.
> The release() callback is invoked only when all resources, such as
> tee_shm, are released and there are no references to the tee_context.
>
> When a user closes the device file, the driver should notify the
> TEE to release any resources it may hold and drop the context
> references. To achieve this, a close_context() callback is
> introduced to initiate resource release in the TEE driver when
> the device file is closed.
>
> Relocate teedev_ctx_get, teedev_ctx_put, tee_device_get, and
> tee_device_get functions to tee_drv.h to make them accessible
> outside the TEE subsystem.
>
> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi(a)oss.qualcomm.com>
> ---
> drivers/tee/tee_core.c | 39 +++++++++++++++++++++++++++++++++++++++
> drivers/tee/tee_private.h | 6 ------
> include/linux/tee_core.h | 11 +++++++++--
> include/linux/tee_drv.h | 40 ++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 88 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
> index 24edce4cdbaa..22cc7d624b0c 100644
> --- a/drivers/tee/tee_core.c
> +++ b/drivers/tee/tee_core.c
> @@ -72,6 +72,20 @@ struct tee_context *teedev_open(struct tee_device *teedev)
> }
> EXPORT_SYMBOL_GPL(teedev_open);
>
> +/**
> + * teedev_ctx_get() - Increment the reference count of a context
> + *
> + * This function increases the refcount of the context, which is tied to
> + * resources shared by the same tee_device. During the unregistration process,
> + * the context may remain valid even after tee_device_unregister() has returned.
> + *
> + * Users should ensure that the context's refcount is properly decreased before
> + * calling tee_device_put(), typically within the context's release() function.
> + * Alternatively, users can call tee_device_get() and teedev_ctx_get() together
> + * and release them simultaneously (see shm_alloc_helper()).
> + *
> + * @ctx: Pointer to the context
Please move this @ctx line to before the verbose description of the function.
Cheers,
Jens
> + */
> void teedev_ctx_get(struct tee_context *ctx)
> {
> if (ctx->releasing)
> @@ -79,6 +93,7 @@ void teedev_ctx_get(struct tee_context *ctx)
>
> kref_get(&ctx->refcount);
> }
> +EXPORT_SYMBOL_GPL(teedev_ctx_get);
>
> static void teedev_ctx_release(struct kref *ref)
> {
> @@ -89,6 +104,10 @@ static void teedev_ctx_release(struct kref *ref)
> kfree(ctx);
> }
>
> +/**
> + * teedev_ctx_put() - Decrease reference count on a context
> + * @ctx: pointer to the context
> + */
> void teedev_ctx_put(struct tee_context *ctx)
> {
> if (ctx->releasing)
> @@ -96,11 +115,15 @@ void teedev_ctx_put(struct tee_context *ctx)
>
> kref_put(&ctx->refcount, teedev_ctx_release);
> }
> +EXPORT_SYMBOL_GPL(teedev_ctx_put);
>
> void teedev_close_context(struct tee_context *ctx)
> {
> struct tee_device *teedev = ctx->teedev;
>
> + if (teedev->desc->ops->close_context)
> + teedev->desc->ops->close_context(ctx);
> +
> teedev_ctx_put(ctx);
> tee_device_put(teedev);
> }
> @@ -1024,6 +1047,10 @@ int tee_device_register(struct tee_device *teedev)
> }
> EXPORT_SYMBOL_GPL(tee_device_register);
>
> +/**
> + * tee_device_put() - Decrease the user count for a tee_device
> + * @teedev: pointer to the tee_device
> + */
> void tee_device_put(struct tee_device *teedev)
> {
> mutex_lock(&teedev->mutex);
> @@ -1037,7 +1064,18 @@ void tee_device_put(struct tee_device *teedev)
> }
> mutex_unlock(&teedev->mutex);
> }
> +EXPORT_SYMBOL_GPL(tee_device_put);
>
> +/**
> + * tee_device_get() - Increment the user count for a tee_device
> + * @teedev: Pointer to the tee_device
> + *
> + * If tee_device_unregister() has been called and the final user of @teedev
> + * has already released the device, this function will fail to prevent new users
> + * from accessing the device during the unregistration process.
> + *
> + * Returns: true if @teedev remains valid, otherwise false
> + */
> bool tee_device_get(struct tee_device *teedev)
> {
> mutex_lock(&teedev->mutex);
> @@ -1049,6 +1087,7 @@ bool tee_device_get(struct tee_device *teedev)
> mutex_unlock(&teedev->mutex);
> return true;
> }
> +EXPORT_SYMBOL_GPL(tee_device_get);
>
> /**
> * tee_device_unregister() - Removes a TEE device
> diff --git a/drivers/tee/tee_private.h b/drivers/tee/tee_private.h
> index 9bc50605227c..d3f40a03de36 100644
> --- a/drivers/tee/tee_private.h
> +++ b/drivers/tee/tee_private.h
> @@ -14,12 +14,6 @@
>
> int tee_shm_get_fd(struct tee_shm *shm);
>
> -bool tee_device_get(struct tee_device *teedev);
> -void tee_device_put(struct tee_device *teedev);
> -
> -void teedev_ctx_get(struct tee_context *ctx);
> -void teedev_ctx_put(struct tee_context *ctx);
> -
> struct tee_shm *tee_shm_alloc_user_buf(struct tee_context *ctx, size_t size);
> struct tee_shm *tee_shm_register_user_buf(struct tee_context *ctx,
> unsigned long addr, size_t length);
> diff --git a/include/linux/tee_core.h b/include/linux/tee_core.h
> index a38494d6b5f4..8a4c9e30b652 100644
> --- a/include/linux/tee_core.h
> +++ b/include/linux/tee_core.h
> @@ -65,8 +65,9 @@ struct tee_device {
> /**
> * struct tee_driver_ops - driver operations vtable
> * @get_version: returns version of driver
> - * @open: called when the device file is opened
> - * @release: release this open file
> + * @open: called for a context when the device file is opened
> + * @close_context: called when the device file is closed
> + * @release: called to release the context
> * @open_session: open a new session
> * @close_session: close a session
> * @system_session: declare session as a system session
> @@ -76,11 +77,17 @@ struct tee_device {
> * @supp_send: called for supplicant to send a response
> * @shm_register: register shared memory buffer in TEE
> * @shm_unregister: unregister shared memory buffer in TEE
> + *
> + * The context given to @open might last longer than the device file if it is
> + * tied to other resources in the TEE driver. @close_context is called when the
> + * client closes the device file, even if there are existing references to the
> + * context. The TEE driver can use @close_context to start cleaning up.
> */
> struct tee_driver_ops {
> void (*get_version)(struct tee_device *teedev,
> struct tee_ioctl_version_data *vers);
> int (*open)(struct tee_context *ctx);
> + void (*close_context)(struct tee_context *ctx);
> void (*release)(struct tee_context *ctx);
> int (*open_session)(struct tee_context *ctx,
> struct tee_ioctl_open_session_arg *arg,
> diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
> index a54c203000ed..ce23fd42c5d4 100644
> --- a/include/linux/tee_drv.h
> +++ b/include/linux/tee_drv.h
> @@ -96,6 +96,46 @@ struct tee_param {
> } u;
> };
>
> +/**
> + * tee_device_get() - Increment the user count for a tee_device
> + * @teedev: Pointer to the tee_device
> + *
> + * If tee_device_unregister() has been called and the final user of @teedev
> + * has already released the device, this function will fail to prevent new users
> + * from accessing the device during the unregistration process.
> + *
> + * Returns: true if @teedev remains valid, otherwise false
> + */
> +bool tee_device_get(struct tee_device *teedev);
> +
> +/**
> + * tee_device_put() - Decrease the user count for a tee_device
> + * @teedev: pointer to the tee_device
> + */
> +void tee_device_put(struct tee_device *teedev);
> +
> +/**
> + * teedev_ctx_get() - Increment the reference count of a context
> + *
> + * This function increases the refcount of the context, which is tied to
> + * resources shared by the same tee_device. During the unregistration process,
> + * the context may remain valid even after tee_device_unregister() has returned.
> + *
> + * Users should ensure that the context's refcount is properly decreased before
> + * calling tee_device_put(), typically within the context's release() function.
> + * Alternatively, users can call tee_device_get() and teedev_ctx_get() together
> + * and release them simultaneously (see shm_alloc_helper()).
> + *
> + * @ctx: Pointer to the context
> + */
> +void teedev_ctx_get(struct tee_context *ctx);
> +
> +/**
> + * teedev_ctx_put() - Decrease reference count on a context
> + * @ctx: pointer to the context
> + */
> +void teedev_ctx_put(struct tee_context *ctx);
> +
> /**
> * tee_shm_alloc_kernel_buf() - Allocate kernel shared memory for a
> * particular TEE client driver
>
> --
> 2.34.1
>
On Mon, Apr 07, 2025 at 02:43:20PM +0800, Muchun Song wrote:
> By the way, in case you truly struggle to comprehend the fundamental
> aspects of HVO, I would like to summarize for you the user-visible
> behaviors in comparison to the situation where HVO is disabled.
>
> HVO Status    Tail Page Structures    Head Page Structures
> Enabled       Read-Only (RO)          Read-Write (RW)
> Disabled      Read-Write (RW)         Read-Write (RW)
>
> The sole distinction between the two scenarios lies in whether the
> tail page structures are allowed to be written or not. Please refrain
> from getting bogged down in the details of the implementation of HVO.
This feels extremely fragile to me. I doubt many people know which
operations need read vs write access to tail pages. Or, for higher-level
operations, whether they need access to tail pages at all.
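To make the distinction concrete, a small illustration (not from the thread) of what the table above means for code touching struct page, assuming HVO is enabled for the folio:

        struct page *head = folio_page(folio, 0);
        struct page *tail = folio_page(folio, 1);

        head->private = 0;      /* fine: head struct page stays RW */
        tail->private = 0;      /* faults: tail struct pages are mapped RO */

which is why knowing which operations only ever read tail pages matters.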