On Mon, Sep 25, 2023 at 09:41:24AM +0200, Michal Hocko wrote:
On Fri 22-09-23 16:00:30, Roman Gushchin wrote:
On Wed, Sep 20, 2023 at 03:47:37PM +0200, Michal Hocko wrote:
On Wed 20-09-23 15:25:23, Jeremi Piotrowski wrote:
On 9/20/2023 1:07 PM, Michal Hocko wrote:
[...]
I mean, normally I would be just fine reverting this API change because it is disruptive, but the only way to have the file available and not break somebody is to revert 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes") as well. Or to ignore any value written there, but that sounds rather dubious. Although one could argue this would mimic the nokmem kernel option.
I just want to make sure we don't introduce yet another new behavior in this legacy system. I have not seen breakage due to 58056f77502f. Mimicking nokmem sounds good, but does this mean "don't enforce limits" (that should be fine) or "ignore writes to the limit" (i.e. don't even store the written limit)? The latter might have unintended consequences.
Yes, it would mean that the limit is never enforced. Bad as it is, the hard limit on kernel memory is broken by design and unfixable. It causes all sorts of unexpected kernel allocation failures, so it is simply unsafe to use.
All that being said, I can see the following options:
1) keep the current upstream status and not export the file
2) revert both 58056f77502f and 86327e8eb94 and make it clear that kmem.limit_in_bytes is unsupported, so failures or misbehavior as a result of the limit being hit are likely not going to be investigated or fixed.
3) revert like in 2) but never enforce the limit (so basically nokmem semantics)
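To make the difference between the three options concrete, here is a toy userspace model. It is not kernel code: the class and method names are made up for illustration, and whether the third option stores the written value is an assumption, since the discussion above leaves that open.

```python
import errno


class KmemLimitFile:
    """Hypothetical model of kmem.limit_in_bytes under each option.

    `option` is 1, 2 or 3, in the order the options are listed above.
    """

    def __init__(self, option):
        self.option = option
        self.limit = None  # last value accepted by a write, if any

    def write(self, value):
        if self.option == 1:
            # File is not exported, so there is nothing to write to.
            raise OSError(errno.ENOENT, "kmem.limit_in_bytes is not exported")
        # Second and third option: the write succeeds.
        # Storing the value in the third option is an assumption of this sketch.
        self.limit = value
        return 0

    def charge_allowed(self, usage, size):
        """Return True if a kernel allocation of `size` bytes may proceed."""
        if self.option == 2 and self.limit is not None:
            # Second option: the limit is enforced, so allocations can fail.
            return usage + size <= self.limit
        # First and third option: kernel memory is never limited by this file.
        return True
```

With this model, a tool that writes a limit keeps working under the second and third option, but only the second one can make a later allocation fail.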
Since it's a part of cgroup v1 interface, which is in a frozen state as a whole, and there is no significant (performance, code complexity) benefit of additionally deprecating kmem.limit_in_bytes, I vote for 2).
- is also an option.
We have stronger agreement on 3): http://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz. Please speak up if you disagree.
This works for me too. Thank you!
Btw, it seems like going forward we should be more resistant to any cgroup v1 changes and just leave it as it is.
Thanks.