On 9/20/2023 11:25 AM, Greg Kroah-Hartman wrote:
On Wed, Sep 20, 2023 at 10:43:56AM +0200, Michal Hocko wrote:
On Wed 20-09-23 01:11:01, Jeremi Piotrowski wrote:
On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
6.1-stable review patch. If anyone has any objections, please let me know.
Hi Greg/Michal,
This commit breaks userspace, which makes it a bad commit for mainline and an even worse commit for stable.
We ingested 6.1.54 into our nightly testing and found that runc fails to gather cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored into kubelet and kubelet fails to start if this operation fails. 6.1.53 is fine.
Could you expand some more on why the file is read? It hasn't supported writes for some time, so how does reading it help in any sense?
Anyway, I do agree that the stable backport should be reverted.
That will just postpone the breakage, we really shouldn't break userspace.
That being said, having userspace "break" because a file is no longer present is not good coding style on the userspace side at all. That's why we have sysfs and single-value files now: if the file isn't present, then userspace instantly notices and can handle it. Much easier than the old-style multi-fields-in-one-file problem.
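For illustration only, a minimal sketch of what "notice and handle it" could look like for a single-value file. This is not runc's code; the helper name read_single_value is hypothetical, and it assumes the file holds one integer, as the cgroup v1 memory limit and usage files do.

```python
from typing import Optional


def read_single_value(path: str) -> Optional[int]:
    """Return the integer stored in a single-value file.

    Returns None when the file does not exist, so the caller can treat
    "knob not provided by this kernel" as a normal case rather than an error.
    (Hypothetical helper, not taken from runc or kubelet.)
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        # The file is absent: report that explicitly instead of failing.
        return None
```

A caller would then skip the statistic when the result is None, instead of treating the whole cgroup read as failed.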
The memcg files in this case are single-value, but userspace expects to be able to read memcg limits when it can read the usage (indicating MEMCG is enabled). If it can't, then something is off, and the node is marked unhealthy.
Address this by wiping out the file completely, effectively getting back to the pre-4.5 era and the CONFIG_MEMCG_KMEM=n configuration.
The fact that this is a valid option (i.e. no file) with that config option disabled makes me want to keep this as well, as how does userspace handle this option disabled at all? Or old kernels?
Userspace has had to handle the case of MEMCG_KMEM=n, but that had only 2 cases so far:
the limits/usage/max_usage/failcnt files are either all available or none of them are.
Now it needs to handle 3 of 4 files being available, but only for kmem (and not plain memory, memsw or kmem.tcp). That's an inconsistency.
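To make the three states concrete, here is a rough sketch (not runc's actual logic) that classifies a cgroup v1 memory directory by which kmem files exist. The function name kmem_availability and the "all"/"none"/"partial" labels are made up for illustration; the file names are the standard cgroup v1 kmem knobs.

```python
import os

# The four cgroup v1 kmem files discussed above.
KMEM_FILES = (
    "memory.kmem.limit_in_bytes",
    "memory.kmem.usage_in_bytes",
    "memory.kmem.max_usage_in_bytes",
    "memory.kmem.failcnt",
)


def kmem_availability(cgroup_dir: str) -> str:
    """Classify a memory cgroup directory by which kmem files are present.

    "all"     -> every kmem file exists (the historical MEMCG_KMEM=y case)
    "none"    -> no kmem file exists (the historical MEMCG_KMEM=n case)
    "partial" -> only some exist (the new case, e.g. limit_in_bytes removed)
    """
    present = [
        name for name in KMEM_FILES
        if os.path.exists(os.path.join(cgroup_dir, name))
    ]
    if len(present) == len(KMEM_FILES):
        return "all"
    if not present:
        return "none"
    return "partial"
```

Code written against the first two states alone will treat "partial" as an error, which matches the failure we saw in testing.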
I can drop this from stable kernels, but again, this feels like the runc developers are just postponing the problem...
Since cgroups v1 is deprecated, I think the runc developers haven't touched this part of the code in years and expected it to keep working while they wait for the long tail of usage to die out.
thanks,
greg k-h