On Thu, Aug 13, 2020 at 07:30:55PM +0800, Yang Yingliang wrote:
Hi,
On 2020/7/20 23:36, Greg Kroah-Hartman wrote:
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]
When we clone a socket in sk_clone_lock(), its sk_cgrp_data is copied, so the cgroup refcnt must be taken too. And, unlike the sk_alloc() path, sock_update_netprioidx() is not called here. Therefore, it is safe and necessary to grab the cgroup refcnt even when cgroup_sk_alloc is disabled.
sk_clone_lock() is in BH context anyway, so the in_interrupt() check would terminate this function if it were called there. And for sk_alloc(), skcd->val is always zero. So it's safe to factor out the code to make it more readable.
The global variable 'cgroup_sk_alloc_disabled' is used to determine whether to take these reference counts. It is impossible to make the reference counting correct unless we save this bit of information in skcd->val. So, add a new bit there to record whether the socket has already taken the reference counts. This obviously relies on kmalloc() to align cgroup pointers to at least 4 bytes; ARCH_KMALLOC_MINALIGN is certainly larger than that.
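To illustrate the alignment argument above, here is a minimal userspace sketch of keeping a flag in the low bits of an aligned pointer. All names (demo_cgroup, demo_pack, DEMO_NO_REFCNT and so on) are hypothetical and chosen for this example only; the kernel's real struct sock_cgroup_data layout is different and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cgroup object, not the kernel's struct. */
struct demo_cgroup {
	int refcnt;
};

#define DEMO_NO_REFCNT ((uintptr_t)1) /* flag kept in bit 0 of the word */
#define DEMO_FLAG_MASK ((uintptr_t)3) /* two low bits reserved for flags */

/* Pack a pointer and a "no reference taken" flag into one word. */
static uintptr_t demo_pack(struct demo_cgroup *cgrp, bool no_refcnt)
{
	uintptr_t val = (uintptr_t)cgrp;

	/* Only valid when the allocator aligns to at least 4 bytes. */
	assert((val & DEMO_FLAG_MASK) == 0);
	return val | (no_refcnt ? DEMO_NO_REFCNT : 0);
}

/* Recover the pointer by masking off the flag bits. */
static struct demo_cgroup *demo_ptr(uintptr_t val)
{
	return (struct demo_cgroup *)(val & ~DEMO_FLAG_MASK);
}

/* Read back the flag without touching the pointer bits. */
static bool demo_no_refcnt(uintptr_t val)
{
	return (val & DEMO_NO_REFCNT) != 0;
}
```

Because a 4-byte-aligned address always has its two low bits clear, the flag and the pointer can share one word without losing information, which is the property the commit message relies on.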
This bug seems to have been present since the beginning; commit d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets") tried to fix it, but not completely. It seems it was not easy to trigger until the recent commit 090e28b229af ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.
Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas cam@neo-zeon.de
Reported-by: Peter Geis pgwipeout@gmail.com
Reported-by: Lu Fengqi lufq.fnst@cn.fujitsu.com
Reported-by: Daniël Sonck dsonck92@gmail.com
Reported-by: Zhang Qiang qiang.zhang@windriver.com
Tested-by: Cameron Berkenpas cam@neo-zeon.de
Tested-by: Peter Geis pgwipeout@gmail.com
Tested-by: Thomas Lamprecht t.lamprecht@proxmox.com
Cc: Daniel Borkmann daniel@iogearbox.net
Cc: Zefan Li lizefan@huawei.com
Cc: Tejun Heo tj@kernel.org
Cc: Roman Gushchin guro@fb.com
Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
[...]
+void cgroup_sk_clone(struct sock_cgroup_data *skcd)
+{
- /* Socket clone path */
- if (skcd->val) {
Compared to the mainline patch, the *if (skcd->no_refcnt)* check is missing here.
Is this a mistake?
Possibly, it is in the cgroup_sk_free() call. Can you send a patch to fix this up?
thanks,
greg k-h
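
For readers following the no_refcnt question, a simplified userspace model of why the check needs to be mirrored in both the clone and the free paths might look like the following. This is only a sketch under stated assumptions: demo_cgroup, demo_skcd, demo_sk_clone() and demo_sk_free() are hypothetical names, a plain int stands in for the real reference counter, and none of this is the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup with a plain integer refcount. */
struct demo_cgroup {
	int refcnt;
};

/* Hypothetical stand-in for the per-socket cgroup data. */
struct demo_skcd {
	struct demo_cgroup *cgrp;
	bool no_refcnt; /* set when the socket never took a reference */
};

/* Clone path: take a reference only if the original socket holds one. */
static void demo_sk_clone(struct demo_skcd *skcd)
{
	if (skcd->cgrp == NULL)
		return;
	if (skcd->no_refcnt)
		return;
	skcd->cgrp->refcnt++;
}

/* Free path: drop a reference only if one was taken. */
static void demo_sk_free(struct demo_skcd *skcd)
{
	if (skcd->cgrp == NULL)
		return;
	if (skcd->no_refcnt)
		return;
	assert(skcd->cgrp->refcnt > 0);
	skcd->cgrp->refcnt--;
}
```

In this model, if the no_refcnt test were present only in demo_sk_free(), a cloned socket with the flag set would take a reference on clone and never drop it, so the count would drift upward by one for every clone.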