On Mon, Dec 15, 2025 at 7:51 PM Maxime Ripard <mripard(a)redhat.com> wrote:
>
> Hi TJ,
Hi Maxime,
> On Fri, Dec 12, 2025 at 08:25:19AM +0900, T.J. Mercier wrote:
> > On Fri, Dec 12, 2025 at 4:31 AM Eric Chanudet <echanude(a)redhat.com> wrote:
> > >
> > > The system dma-buf heap lets userspace allocate buffers from the page
> > > allocator. However, these allocations are not accounted for in memcg,
> > > allowing processes to escape limits that may be configured.
> > >
> > > Pass the __GFP_ACCOUNT for our allocations to account them into memcg.
> >
> > We had a discussion just last night in the MM track at LPC about how
> > shared memory accounted in memcg is pretty broken. Without a way to
> > identify (and possibly transfer) ownership of a shared buffer, this
> > makes the accounting of shared memory, and zombie memcg problems
> > worse. :\
>
> Are there notes or a report from that discussion anywhere?
The LPC vids haven't been clipped yet, and actually I can't even find
the recorded full live stream from Hall A2 on the first day. So I
don't think there's anything to look at, but I bet there's probably
nothing there you don't already know.
> The way I see it, the dma-buf heaps *trivial* case is non-existent at
> the moment and that's definitely broken. Any application can bypass its
> cgroups limits trivially, and that's a pretty big hole in the system.
Agree, but if we only charge the first allocator then limits can still
easily be bypassed assuming an app can cause an allocation outside of
its cgroup tree.

I'm not sure using static memcg limits where a significant portion of
the memory can be shared is really feasible. Even with just pagecache
being charged to memcgs, we're having trouble defining a static memcg
limit that is really useful since it has to be high enough to
accommodate occasional spikes due to shared memory that might or might
not be charged (since it can only be charged to one memcg - it may be
spread around or it may all get charged to one memcg). So excessive
anonymous use has to get really bad before it gets punished.

What I've been hearing lately is that folks are polling memory.stat or
PSI or other metrics and using that to take actions (memory.reclaim /
killing / adjusting memory.high) at runtime rather than relying on
memory.high/max behavior with a static limit.
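
For illustration only, here is a minimal userspace sketch of the polling
approach, assuming the usual PSI line format found in a cgroup v2
memory.pressure file ("some avg10=... avg60=... avg300=... total=...").
The names psi_line, psi_parse and psi_should_reclaim, and the threshold
policy itself, are made up for this example and are not taken from any
existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of a PSI file, e.g.:
 *   some avg10=0.12 avg60=0.05 avg300=0.01 total=12345
 */
struct psi_line {
	double avg10, avg60, avg300;	/* stall percentages */
	unsigned long long total_us;	/* cumulative stall time in microseconds */
};

/* Parse a PSI line of the requested kind ("some" or "full").
 * Returns 0 on success, -1 if the line is malformed or of another kind.
 */
static int psi_parse(const char *line, const char *kind, struct psi_line *out)
{
	char name[8];

	assert(line && kind && out);
	if (sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		   name, &out->avg10, &out->avg60, &out->avg300,
		   &out->total_us) != 5)
		return -1;
	return strcmp(name, kind) == 0 ? 0 : -1;
}

/* Hypothetical policy: act once the 10 second average exceeds a threshold. */
static int psi_should_reclaim(const struct psi_line *p, double threshold_pct)
{
	return p->avg10 > threshold_pct;
}
```

A monitoring daemon would read the line periodically and, when
psi_should_reclaim() fires, write a byte count to memory.reclaim or
lower memory.high instead of waiting for a static limit to be hit.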
> The shared ownership is indeed broken, but it's not more or less broken
> than, say, memfd + udmabuf, and I'm sure plenty of others.
One thing that's worse about system heap buffers is that unlike memfd
the memory isn't reclaimable. So without killing all users there's
currently no way to deal with the zombie issue. Harry's proposing
reparenting, but I don't think our current interfaces support that
because we'd have to mess with the page structs behind system heap
dmabufs to change the memcg during reparenting.

Ah... but udmabuf pins the memfd pages, so you're right that memfd +
udmabuf isn't worse.
> So we really improve the common case, but only make the "advanced"
> slightly more broken than it already is.
>
> Would you disagree?
I think memcg limits in this case just wouldn't be usable because of
what I mentioned above. In our common case the allocator is in a
different cgroup tree than the real users of the buffer.
> Maxime
On 12/15/25 16:53, Tvrtko Ursulin wrote:
>
> On 15/12/2025 15:38, Christian König wrote:
>> On 12/15/25 10:20, Tvrtko Ursulin wrote:
>>>
>>> On 12/12/2025 15:50, Christian König wrote:
>>>> On 12/11/25 16:13, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 11/12/2025 13:16, Christian König wrote:
>>>>>> Using the inline lock is now the recommended way for dma_fence implementations.
>>>>>>
>>>>>> So use this approach for the scheduler fences as well just in case if
>>>>>> anybody uses this as blueprint for its own implementation.
>>>>>>
>>>>>> Also saves about 4 bytes for the external spinlock.
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig(a)amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/scheduler/sched_fence.c | 7 +++----
>>>>>> include/drm/gpu_scheduler.h | 4 ----
>>>>>> 2 files changed, 3 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>> index 08ccbde8b2f5..47471b9e43f9 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>>>> @@ -161,7 +161,7 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
>>>>>> /* If we already have an earlier deadline, keep it: */
>>>>>> if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
>>>>>> ktime_before(fence->deadline, deadline)) {
>>>>>> - spin_unlock_irqrestore(&fence->lock, flags);
>>>>>> + dma_fence_unlock_irqrestore(f, flags);
>>>>>
>>>>> Rebase error I guess. Pull into the locking helpers patch.
>>>>
>>>> No that is actually completely intentional here.
>>>>
>>>> Previously we had a separate lock which protected both the DMA-fences as well as the deadline state.
>>>>
>>>> Now we turn that upside down by dropping the separate lock and protecting the deadline state with the dma_fence lock instead.
>>>
>>> I don't follow. The code is currently like this:
>>>
>>> static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
>>> ktime_t deadline)
>>> {
>>> struct drm_sched_fence *fence = to_drm_sched_fence(f);
>>> struct dma_fence *parent;
>>> unsigned long flags;
>>>
>>> spin_lock_irqsave(&fence->lock, flags);
>>>
>>> /* If we already have an earlier deadline, keep it: */
>>> if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
>>> ktime_before(fence->deadline, deadline)) {
>>> spin_unlock_irqrestore(&fence->lock, flags);
>>> return;
>>> }
>>>
>>> fence->deadline = deadline;
>>> set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
>>>
>>> spin_unlock_irqrestore(&fence->lock, flags);...
>>>
>>> The diff changes one out of the three lock/unlock operations. Other two are changed in 3/19. All three should surely be changed in the same patch.
>>
>> We could change those spin_lock/unlock calls in patch #3, but I don't think that this is clean.
>>
>> See the code here currently uses fence->lock and patch #3 would change it to use fence->finished->lock instead. That might be the pointer at the moment, but that is just by coincidence and not design.
>>
>> Only this change here on top makes it intentional that we use fence->finished->lock for everything.
>
> Sorry I still don't follow. After 3/19 and before this 9/19 the function looks like this:
>
> static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
> ktime_t deadline)
> {
> struct drm_sched_fence *fence = to_drm_sched_fence(f);
> struct dma_fence *parent;
> unsigned long flags;
>
> dma_fence_lock_irqsave(f, flags);
>
> /* If we already have an earlier deadline, keep it: */
> if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
> ktime_before(fence->deadline, deadline)) {
> spin_unlock_irqrestore(&fence->lock, flags);
> return;
> }
>
> fence->deadline = deadline;
> set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
>
> dma_fence_unlock_irqrestore(f, flags);
>
> Notice the lonely spin_unlock_irqrestore on the early return path while other two use the dma_fence_(un)lock helpers. Am I blind or how is that clean?
Oh, that's what you mean. Sorry I was blind!
Yeah that is clearly unintentional.
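
To make the intended end state concrete, here is a small userspace model
of the function with all three lock operations going through the same
helpers on the fence's own lock. This is only a sketch: the model_*
names are hypothetical, the toy lock merely tracks whether it is held,
and nothing here is the actual kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a spinlock: it only tracks whether it is held. */
struct model_lock {
	bool held;
};

/* Models a fence that carries its own inline lock. */
struct model_fence {
	struct model_lock inline_lock;
	bool has_deadline;
};

/* Models the scheduler fence: no separate lock member any more. */
struct model_sched_fence {
	struct model_fence finished;
	int64_t deadline;
};

static void model_fence_lock(struct model_fence *f)
{
	assert(!f->inline_lock.held);	/* catch double locking */
	f->inline_lock.held = true;
}

static void model_fence_unlock(struct model_fence *f)
{
	assert(f->inline_lock.held);	/* catch unbalanced unlock */
	f->inline_lock.held = false;
}

/* Every lock and unlock, including the early return, uses the helpers. */
static void model_set_deadline(struct model_sched_fence *sf, int64_t deadline)
{
	struct model_fence *f = &sf->finished;

	model_fence_lock(f);

	/* If we already have an earlier deadline, keep it. */
	if (f->has_deadline && sf->deadline < deadline) {
		model_fence_unlock(f);
		return;
	}

	sf->deadline = deadline;
	f->has_deadline = true;

	model_fence_unlock(f);
}
```

In this model a stray unlock on some other lock object would leave
inline_lock held on the early return path, which is the inconsistency
pointed out above.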
Thanks,
Christian.
>
> Regards,
>
> Tvrtko
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko
>>>>>
>>>>>> return;
>>>>>> }
>>>>>> @@ -217,7 +217,6 @@ struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>>>>> fence->owner = owner;
>>>>>> fence->drm_client_id = drm_client_id;
>>>>>> - spin_lock_init(&fence->lock);
>>>>>> return fence;
>>>>>> }
>>>>>> @@ -230,9 +229,9 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>>>> fence->sched = entity->rq->sched;
>>>>>> seq = atomic_inc_return(&entity->fence_seq);
>>>>>> dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>>>> - &fence->lock, entity->fence_context, seq);
>>>>>> + NULL, entity->fence_context, seq);
>>>>>> dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>>>>> - &fence->lock, entity->fence_context + 1, seq);
>>>>>> + NULL, entity->fence_context + 1, seq);
>>>>>> }
>>>>>> module_init(drm_sched_fence_slab_init);
>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>> index fb88301b3c45..b77f24a783e3 100644
>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>> @@ -297,10 +297,6 @@ struct drm_sched_fence {
>>>>>> * belongs to.
>>>>>> */
>>>>>> struct drm_gpu_scheduler *sched;
>>>>>> - /**
>>>>>> - * @lock: the lock used by the scheduled and the finished fences.
>>>>>> - */
>>>>>> - spinlock_t lock;
>>>>>> /**
>>>>>> * @owner: job owner for debugging
>>>>>> */
>>>>>
>>>>
>>>
>>
>
On 12/15/25 10:20, Tvrtko Ursulin wrote:
>
> On 12/12/2025 15:50, Christian König wrote:
>> On 12/11/25 16:13, Tvrtko Ursulin wrote:
>>>
>>> On 11/12/2025 13:16, Christian König wrote:
>>>> Using the inline lock is now the recommended way for dma_fence implementations.
>>>>
>>>> So use this approach for the scheduler fences as well just in case if
>>>> anybody uses this as blueprint for its own implementation.
>>>>
>>>> Also saves about 4 bytes for the external spinlock.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig(a)amd.com>
>>>> ---
>>>> drivers/gpu/drm/scheduler/sched_fence.c | 7 +++----
>>>> include/drm/gpu_scheduler.h | 4 ----
>>>> 2 files changed, 3 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>>>> index 08ccbde8b2f5..47471b9e43f9 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>>> @@ -161,7 +161,7 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
>>>> /* If we already have an earlier deadline, keep it: */
>>>> if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
>>>> ktime_before(fence->deadline, deadline)) {
>>>> - spin_unlock_irqrestore(&fence->lock, flags);
>>>> + dma_fence_unlock_irqrestore(f, flags);
>>>
>>> Rebase error I guess. Pull into the locking helpers patch.
>>
>> No that is actually completely intentional here.
>>
>> Previously we had a separate lock which protected both the DMA-fences as well as the deadline state.
>>
>> Now we turn that upside down by dropping the separate lock and protecting the deadline state with the dma_fence lock instead.
>
> I don't follow. The code is currently like this:
>
> static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
> ktime_t deadline)
> {
> struct drm_sched_fence *fence = to_drm_sched_fence(f);
> struct dma_fence *parent;
> unsigned long flags;
>
> spin_lock_irqsave(&fence->lock, flags);
>
> /* If we already have an earlier deadline, keep it: */
> if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
> ktime_before(fence->deadline, deadline)) {
> spin_unlock_irqrestore(&fence->lock, flags);
> return;
> }
>
> fence->deadline = deadline;
> set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
>
> spin_unlock_irqrestore(&fence->lock, flags);...
>
> The diff changes one out of the three lock/unlock operations. Other two are changed in 3/19. All three should surely be changed in the same patch.
We could change those spin_lock/unlock calls in patch #3, but I don't think that this is clean.

See the code here currently uses fence->lock and patch #3 would change it to use fence->finished->lock instead. That might be the pointer at the moment, but that is just by coincidence and not design.

Only this change here on top makes it intentional that we use fence->finished->lock for everything.

Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> return;
>>>> }
>>>> @@ -217,7 +217,6 @@ struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>>>> fence->owner = owner;
>>>> fence->drm_client_id = drm_client_id;
>>>> - spin_lock_init(&fence->lock);
>>>> return fence;
>>>> }
>>>> @@ -230,9 +229,9 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
>>>> fence->sched = entity->rq->sched;
>>>> seq = atomic_inc_return(&entity->fence_seq);
>>>> dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>>>> - &fence->lock, entity->fence_context, seq);
>>>> + NULL, entity->fence_context, seq);
>>>> dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>>>> - &fence->lock, entity->fence_context + 1, seq);
>>>> + NULL, entity->fence_context + 1, seq);
>>>> }
>>>> module_init(drm_sched_fence_slab_init);
>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>> index fb88301b3c45..b77f24a783e3 100644
>>>> --- a/include/drm/gpu_scheduler.h
>>>> +++ b/include/drm/gpu_scheduler.h
>>>> @@ -297,10 +297,6 @@ struct drm_sched_fence {
>>>> * belongs to.
>>>> */
>>>> struct drm_gpu_scheduler *sched;
>>>> - /**
>>>> - * @lock: the lock used by the scheduled and the finished fences.
>>>> - */
>>>> - spinlock_t lock;
>>>> /**
>>>> * @owner: job owner for debugging
>>>> */
>>>
>>
>
On 12/15/25 11:51, Maxime Ripard wrote:
> Hi TJ,
>
> On Fri, Dec 12, 2025 at 08:25:19AM +0900, T.J. Mercier wrote:
>> On Fri, Dec 12, 2025 at 4:31 AM Eric Chanudet <echanude(a)redhat.com> wrote:
>>>
>>> The system dma-buf heap lets userspace allocate buffers from the page
>>> allocator. However, these allocations are not accounted for in memcg,
>>> allowing processes to escape limits that may be configured.
>>>
>>> Pass the __GFP_ACCOUNT for our allocations to account them into memcg.
>>
>> We had a discussion just last night in the MM track at LPC about how
>> shared memory accounted in memcg is pretty broken. Without a way to
>> identify (and possibly transfer) ownership of a shared buffer, this
>> makes the accounting of shared memory, and zombie memcg problems
>> worse. :\
>
> Are there notes or a report from that discussion anywhere?
>
> The way I see it, the dma-buf heaps *trivial* case is non-existent at
> the moment and that's definitely broken. Any application can bypass its
> cgroups limits trivially, and that's a pretty big hole in the system.
Well, that is just the tip of the iceberg.

Pretty much all driver interfaces don't account to memcg at the moment, all the way from ALSA, over GPUs (both TTM and SHM-GEM) to V4L2.

> The shared ownership is indeed broken, but it's not more or less broken
> than, say, memfd + udmabuf, and I'm sure plenty of others.
>
> So we really improve the common case, but only make the "advanced"
> slightly more broken than it already is.
>
> Would you disagree?
I strongly disagree. As far as I can see there is a huge chance we break existing use cases with that.

There has been some work on TTM by Dave but I still haven't found time to wrap my head around all possible side effects such a change can have.

The fundamental problem is that neither memcg nor the classic resource tracking (e.g. the OOM killer) has a good understanding of shared resources.

For example you can use memfd to basically kill any process in the system because the OOM killer can't identify the process which holds the reference to the memory in question. And that is a *MUCH* bigger problem than just inaccurate memcg accounting.

Regards,
Christian.
>
> Maxime
On 12/11/25 16:13, Tvrtko Ursulin wrote:
>
> On 11/12/2025 13:16, Christian König wrote:
>> Using the inline lock is now the recommended way for dma_fence implementations.
>>
>> So use this approach for the scheduler fences as well just in case if
>> anybody uses this as blueprint for its own implementation.
>>
>> Also saves about 4 bytes for the external spinlock.
>>
>> Signed-off-by: Christian König <christian.koenig(a)amd.com>
>> ---
>> drivers/gpu/drm/scheduler/sched_fence.c | 7 +++----
>> include/drm/gpu_scheduler.h | 4 ----
>> 2 files changed, 3 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
>> index 08ccbde8b2f5..47471b9e43f9 100644
>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>> @@ -161,7 +161,7 @@ static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
>> /* If we already have an earlier deadline, keep it: */
>> if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
>> ktime_before(fence->deadline, deadline)) {
>> - spin_unlock_irqrestore(&fence->lock, flags);
>> + dma_fence_unlock_irqrestore(f, flags);
>
> Rebase error I guess. Pull into the locking helpers patch.
No that is actually completely intentional here.
Previously we had a separate lock which protected both the DMA-fences as well as the deadline state.
Now we turn that upside down by dropping the separate lock and protecting the deadline state with the dma_fence lock instead.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>> return;
>> }
>> @@ -217,7 +217,6 @@ struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
>> fence->owner = owner;
>> fence->drm_client_id = drm_client_id;
>> - spin_lock_init(&fence->lock);
>> return fence;
>> }
>> @@ -230,9 +229,9 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
>> fence->sched = entity->rq->sched;
>> seq = atomic_inc_return(&entity->fence_seq);
>> dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>> - &fence->lock, entity->fence_context, seq);
>> + NULL, entity->fence_context, seq);
>> dma_fence_init(&fence->finished, &drm_sched_fence_ops_finished,
>> - &fence->lock, entity->fence_context + 1, seq);
>> + NULL, entity->fence_context + 1, seq);
>> }
>> module_init(drm_sched_fence_slab_init);
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index fb88301b3c45..b77f24a783e3 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -297,10 +297,6 @@ struct drm_sched_fence {
>> * belongs to.
>> */
>> struct drm_gpu_scheduler *sched;
>> - /**
>> - * @lock: the lock used by the scheduled and the finished fences.
>> - */
>> - spinlock_t lock;
>> /**
>> * @owner: job owner for debugging
>> */
>
On 12/12/25 14:20, Karol Wachowski wrote:
> Add missing drm_gem_object_put() call when drm_gem_object_lookup()
> successfully returns an object. This fixes a GEM object reference
> leak that can prevent driver modules from unloading when using
> prime buffers.
>
> Fixes: 53096728b891 ("drm: Add DRM prime interface to reassign GEM handle")
> Signed-off-by: Karol Wachowski <karol.wachowski(a)linux.intel.com>
> ---
> Changes between v1 and v2:
> - move setting ret value under if branch as suggested in review
> - add Cc: stable 6.18+
Oh don't CC the stable list on the review mail directly, just add "CC: stable(a)vger.kernel.org # 6.18+" to the tags. Greg is going to complain about that :(
With that done Reviewed-by: Christian König <christian.koenig(a)amd.com> and please push to drm-misc-fixes.
If you don't have commit rights for drm-misc-fixes please ping me and I'm going to push that.
Thanks,
Christian.
> ---
> drivers/gpu/drm/drm_gem.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index ca1956608261..bcc08a6aebf8 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1010,8 +1010,10 @@ int drm_gem_change_handle_ioctl(struct drm_device *dev, void *data,
> if (!obj)
> return -ENOENT;
>
> - if (args->handle == args->new_handle)
> - return 0;
> + if (args->handle == args->new_handle) {
> + ret = 0;
> + goto out;
> + }
>
> mutex_lock(&file_priv->prime.lock);
>
> @@ -1043,6 +1045,8 @@ int drm_gem_change_handle_ioctl(struct drm_device *dev, void *data,
>
> out_unlock:
> mutex_unlock(&file_priv->prime.lock);
> +out:
> + drm_gem_object_put(obj);
>
> return ret;
> }
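
The pattern the patch applies (every exit after a successful lookup
funnels through one label that drops the reference) can be modelled in
plain userspace C. This is a sketch under stated assumptions: the
model_* names are hypothetical, the -EINVAL path is invented purely to
show an error exit, and the real handle reassignment under prime.lock
is elided.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted GEM object. */
struct model_obj {
	int refcount;
};

/* Models a lookup that returns the object with one extra reference held. */
static struct model_obj *model_lookup(struct model_obj *table, size_t n,
				      size_t handle)
{
	if (handle >= n)
		return NULL;
	table[handle].refcount++;
	return &table[handle];
}

/* Drops the reference taken by model_lookup(). */
static void model_put(struct model_obj *obj)
{
	assert(obj->refcount > 0);
	obj->refcount--;
}

static int model_change_handle(struct model_obj *table, size_t n,
			       size_t handle, size_t new_handle)
{
	struct model_obj *obj = model_lookup(table, n, handle);
	int ret;

	if (!obj)
		return -ENOENT;	/* no reference was taken, nothing to drop */

	if (handle == new_handle) {
		ret = 0;
		goto out;	/* the early exit must drop the reference too */
	}

	/* Assumed error path for illustration; the real work is elided. */
	ret = new_handle < n ? 0 : -EINVAL;

out:
	model_put(obj);
	return ret;
}
```

Whichever path is taken after a successful lookup, the object's
reference count ends up where it started, which is what lets the owning
module unload.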