On 12/19/25 11:25, Maxime Ripard wrote:
On Mon, Dec 15, 2025 at 03:53:22PM +0100, Christian König wrote:
On 12/15/25 14:59, Maxime Ripard wrote:
...
The shared ownership is indeed broken, but it's not more or less broken than, say, memfd + udmabuf, and I'm sure plenty of others.
So we really improve the common case, but only make the "advanced" case slightly more broken than it already is.
Would you disagree?
I strongly disagree. As far as I can see there is a huge chance we break existing use cases with that.
Which ones? And what about the ones that are already broken?
Well, everybody who expects that driver resources are *not* accounted to memcg.
Which is a thing only because these buffers have never been accounted for in the first place.
Yeah, completely agree. By not accounting for it for such a long time we ended up with people depending on this behavior.
Not nice, but that's what it is.
So I guess the conclusion is that we shouldn't even try to do memory accounting, because someone somewhere might not expect that one of its applications would take too much RAM in the system?
Well, we do need some kind of solution to the problem. Either having some setting where you say "This memcg limit is inclusive/exclusive of device driver allocated memory" or having a completely separate limit for device driver allocated memory.
Key point is we have both use cases, so we need to support both.
There has been some work on TTM by Dave but I still haven't found time to wrap my head around all possible side effects such a change can have.
The fundamental problem is that neither memcg nor the classic resource tracking (e.g. the OOM killer) has a good understanding of shared resources.
And yet heap allocations don't necessarily have to be shared. But they all have to be allocated.
For example you can use memfd to basically kill any process in the system because the OOM killer can't identify the process which holds the reference to the memory in question. And that is a *MUCH* bigger problem than just inaccurate memcg accounting.
When you frame it like that, sure. Also, you can use the system heap to DoS any process in the system. I'm not saying that what you're concerned about isn't an issue, but let's not brush off other people's legitimate issues as well.
Completely agree, but we should prioritize.
That driver allocated memory is not memcg accounted is actually uAPI, i.e. that is not something which can easily change.
Fixing the OOM killer, on the other hand, looks perfectly doable and will then most likely also show a better path to fixing the memcg accounting.
I don't necessarily disagree, but we don't necessarily have the same priorities either. Your use-cases are probably quite different from mine, and that's ok. But that's precisely why all these discussions should be made on the ML when possible, or at least have some notes when a discussion has happened at a conference or something.
So far, my whole experience with this topic, despite being the only one (afaik) sending patches about this for the last 1.5y, is that every time some work on this is done, the answer is "oh but you shouldn't have worked on it because we completely changed our mind", and that's pretty frustrating.
Welcome to the club :)
I already posted patches to start addressing at least the OOM killer issue ~10 years ago.
Those patches were not well received because back then driver memory was negligible and the problem simply didn't hurt much.
But by now we have GPUs and AI accelerators which eat up 90% of your system memory, security researchers stumbling over it, and IIRC even multiple CVE numbers for some of the resulting issues...
I should probably dig it up and re-send my patch set.
Happy holidays, Christian.
Maxime
linaro-mm-sig@lists.linaro.org