Hi,
On 2020/7/20 23:36, Greg Kroah-Hartman wrote:
From: Cong Wang xiyou.wangcong@gmail.com
[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]
When we clone a socket in sk_clone_lock(), its sk_cgrp_data is copied, so the cgroup refcnt must be taken too. And, unlike the sk_alloc() path, sock_update_netprioidx() is not called here. Therefore, it is safe and necessary to grab the cgroup refcnt even when cgroup_sk_alloc is disabled.
sk_clone_lock() is in BH context anyway, the in_interrupt() would terminate this function if called there. And for sk_alloc() skcd->val is always zero. So it's safe to factor out the code to make it more readable.
The global variable 'cgroup_sk_alloc_disabled' is used to determine whether to take these reference counts. It is impossible to make the reference counting correct unless we save this bit of information in skcd->val. So, add a new bit there to record whether the socket has already taken the reference counts. This obviously relies on kmalloc() to align cgroup pointers to at least 4 bytes, ARCH_KMALLOC_MINALIGN is certainly larger than that.
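The alignment argument above can be illustrated in user space. This is only a minimal sketch of the general low-bit tagging technique, not the kernel's actual layout of skcd->val: the names `toy_group`, `TOY_NO_REFCNT`, `toy_pack()`, `toy_ptr()` and `toy_no_refcnt()` are hypothetical, and it assumes malloc() returns pointers aligned to at least 4 bytes (true on mainstream platforms).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag living in the low bits of an aligned pointer. */
#define TOY_NO_REFCNT ((uintptr_t)0x2)
#define TOY_TAG_MASK  ((uintptr_t)0x3)

struct toy_group {
	int refcnt;
};

/* Store the pointer and the flag in one word; needs >= 4-byte alignment. */
static uintptr_t toy_pack(struct toy_group *g, int no_refcnt)
{
	uintptr_t val = (uintptr_t)g;

	assert((val & TOY_TAG_MASK) == 0);
	return no_refcnt ? (val | TOY_NO_REFCNT) : val;
}

/* Recover the original pointer by masking off the tag bits. */
static struct toy_group *toy_ptr(uintptr_t val)
{
	return (struct toy_group *)(val & ~TOY_TAG_MASK);
}

/* Read back the flag without touching the pointer bits. */
static int toy_no_refcnt(uintptr_t val)
{
	return (val & TOY_NO_REFCNT) != 0;
}

/* Round-trip check; returns 0 if pointer and flag both survive packing. */
static int toy_demo(void)
{
	struct toy_group *g = malloc(sizeof(*g));
	uintptr_t tagged, plain;
	int ok;

	if (!g)
		return -1;
	tagged = toy_pack(g, 1);
	plain = toy_pack(g, 0);
	ok = toy_ptr(tagged) == g && toy_no_refcnt(tagged) &&
	     toy_ptr(plain) == g && !toy_no_refcnt(plain);
	free(g);
	return ok ? 0 : 1;
}
```

Because the flag travels with the pointer, a copy of the word (as happens when a socket is cloned) carries the "no reference taken" state along with it.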
This bug seems to be introduced since the beginning, commit d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets") tried to fix it but not completely. It seems not easy to trigger until the recent commit 090e28b229af ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.
Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas cam@neo-zeon.de
Reported-by: Peter Geis pgwipeout@gmail.com
Reported-by: Lu Fengqi lufq.fnst@cn.fujitsu.com
Reported-by: Daniël Sonck dsonck92@gmail.com
Reported-by: Zhang Qiang qiang.zhang@windriver.com
Tested-by: Cameron Berkenpas cam@neo-zeon.de
Tested-by: Peter Geis pgwipeout@gmail.com
Tested-by: Thomas Lamprecht t.lamprecht@proxmox.com
Cc: Daniel Borkmann daniel@iogearbox.net
Cc: Zefan Li lizefan@huawei.com
Cc: Tejun Heo tj@kernel.org
Cc: Roman Gushchin guro@fb.com
Signed-off-by: Cong Wang xiyou.wangcong@gmail.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
[...]
+void cgroup_sk_clone(struct sock_cgroup_data *skcd)
+{
+	/* Socket clone path */
+	if (skcd->val) {
Compared to the mainline patch, the *if (skcd->no_refcnt)* check is missing here.
Is that a mistake?
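To illustrate why the check matters, here is a small user-space toy model of one clone-then-free cycle. It is a sketch, not the kernel code: every `toy_*` name is hypothetical, the flag is a plain bool rather than a bit in skcd->val, and it assumes the free path skips the put when no_refcnt is set, as the cgroup_sk_free() hunk in this patch appears to do.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_cgroup {
	int refcnt;
};

struct toy_skcd {
	struct toy_cgroup *cgrp;
	bool no_refcnt;		/* true: this socket never took a reference */
};

/* Clone path with the check, as in the mainline version. */
static void toy_clone_checked(struct toy_skcd *skcd)
{
	if (skcd->cgrp == NULL)
		return;
	if (skcd->no_refcnt)
		return;
	skcd->cgrp->refcnt++;
}

/* Clone path without the check, as in the hunk quoted here. */
static void toy_clone_unchecked(struct toy_skcd *skcd)
{
	if (skcd->cgrp == NULL)
		return;
	skcd->cgrp->refcnt++;
}

/* Free path: skips the put when no reference is recorded. */
static void toy_free(struct toy_skcd *skcd)
{
	if (skcd->no_refcnt)
		return;
	skcd->cgrp->refcnt--;
}

/* Net refcount change after one clone + free; 0 means balanced. */
static int toy_cycle(bool no_refcnt, bool checked)
{
	struct toy_cgroup cg = { .refcnt = 1 };
	struct toy_skcd skcd = { .cgrp = &cg, .no_refcnt = no_refcnt };

	if (checked)
		toy_clone_checked(&skcd);
	else
		toy_clone_unchecked(&skcd);
	toy_free(&skcd);
	return cg.refcnt - 1;
}
```

In this model, `toy_cycle(true, false)` leaves one reference behind, while the other three combinations stay balanced. Whether the real backport can reach that state depends on whether no_refcnt can ever be set on a socket that gets cloned in that tree.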
Thanks,
Yang
+		/*
+		 * We might be cloning a socket which is left in an empty
+		 * cgroup and the cgroup might have already been rmdir'd.
+		 * Don't use cgroup_get_live().
+		 */
+		cgroup_get(sock_cgroup_ptr(skcd));
+	}
+}
 void cgroup_sk_free(struct sock_cgroup_data *skcd)
 {
+	if (skcd->no_refcnt)
+		return;
 	cgroup_put(sock_cgroup_ptr(skcd));
 }
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1694,7 +1694,7 @@ struct sock *sk_clone_lock(const struct
 		/* sk->sk_memcg will be populated at accept() time */
 		newsk->sk_memcg = NULL;
-		cgroup_sk_alloc(&newsk->sk_cgrp_data);
+		cgroup_sk_clone(&newsk->sk_cgrp_data);
 		rcu_read_lock();
 		filter = rcu_dereference(sk->sk_filter);