5.4 doesn't have commit backported which introduces RCU for cgroup root_list, but has a commit fixing a UAF (and thus a CVE) which depends on it.
Thus, we need to backport the original commits. See thread: https://lore.kernel.org/all/xr93ikus2nd1.fsf@gthelen-cloudtop.c.googlers.com...
This patch series backports the requisite commits to 5.4.y, which are picked up from the above mentioned thread.
Thanks, Siddh
Waiman Long (1): cgroup: Move rcu_head up near the top of cgroup_root
Yafang Shao (1): cgroup: Make operations on the cgroup root_list RCU safe
include/linux/cgroup-defs.h | 7 ++++--- kernel/cgroup/cgroup-internal.h | 3 ++- kernel/cgroup/cgroup.c | 23 ++++++++++++++++------- 3 files changed, 22 insertions(+), 11 deletions(-)
From: Yafang Shao laoar.shao@gmail.com
commit d23b5c577715892c87533b13923306acc6243f93 upstream.
At present, when we perform operations on the cgroup root_list, we must hold the cgroup_mutex, which is a relatively heavyweight lock. In reality, we can make operations on this list RCU-safe, eliminating the need to hold the cgroup_mutex during traversal. Modifications to the list only occur in the cgroup root setup and destroy paths, which should be infrequent in a production environment. In contrast, traversal may occur frequently. Therefore, making it RCU-safe would be beneficial.
Signed-off-by: Yafang Shao laoar.shao@gmail.com Signed-off-by: Tejun Heo tj@kernel.org [fp: adapt to 5.10 mainly because of changes made by e210a89f5b07 ("cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes")] Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru [Shivani: Modified to apply on v5.4.y] Signed-off-by: Shivani Agarwal shivani.agarwal@broadcom.com Reviewed-by: Siddh Raman Pant siddh.raman.pant@oracle.com Signed-off-by: Siddh Raman Pant siddh.raman.pant@oracle.com --- include/linux/cgroup-defs.h | 1 + kernel/cgroup/cgroup-internal.h | 3 ++- kernel/cgroup/cgroup.c | 23 ++++++++++++++++------- 3 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index d15884957e7f..c64f11674850 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -517,6 +517,7 @@ struct cgroup_root {
/* A list running through the active hierarchies */ struct list_head root_list; + struct rcu_head rcu;
/* Hierarchy-specific flags */ unsigned int flags; diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h index 803989eae99e..bb85acc1114e 100644 --- a/kernel/cgroup/cgroup-internal.h +++ b/kernel/cgroup/cgroup-internal.h @@ -172,7 +172,8 @@ extern struct list_head cgroup_roots;
/* iterate across the hierarchies */ #define for_each_root(root) \ - list_for_each_entry((root), &cgroup_roots, root_list) + list_for_each_entry_rcu((root), &cgroup_roots, root_list, \ + lockdep_is_held(&cgroup_mutex))
/** * for_each_subsys - iterate all enabled cgroup subsystems diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 79e57b6df731..273a8a42cb72 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1314,7 +1314,7 @@ void cgroup_free_root(struct cgroup_root *root) { if (root) { idr_destroy(&root->cgroup_idr); - kfree(root); + kfree_rcu(root, rcu); } }
@@ -1348,7 +1348,7 @@ static void cgroup_destroy_root(struct cgroup_root *root) spin_unlock_irq(&css_set_lock);
if (!list_empty(&root->root_list)) { - list_del(&root->root_list); + list_del_rcu(&root->root_list); cgroup_root_count--; }
@@ -1401,7 +1401,6 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset, { struct cgroup *res = NULL;
- lockdep_assert_held(&cgroup_mutex); lockdep_assert_held(&css_set_lock);
if (cset == &init_css_set) { @@ -1421,13 +1420,23 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset, } }
- BUG_ON(!res); + /* + * If cgroup_mutex is not held, the cgrp_cset_link will be freed + * before we remove the cgroup root from the root_list. Consequently, + * when accessing a cgroup root, the cset_link may have already been + * freed, resulting in a NULL res_cgroup. However, by holding the + * cgroup_mutex, we ensure that res_cgroup can't be NULL. + * If we don't hold cgroup_mutex in the caller, we must do the NULL + * check. + */ return res; }
/* * Return the cgroup for "task" from the given hierarchy. Must be - * called with cgroup_mutex and css_set_lock held. + * called with css_set_lock held to prevent task's groups from being modified. + * Must be called with either cgroup_mutex or rcu read lock to prevent the + * cgroup root from being destroyed. */ struct cgroup *task_cgroup_from_root(struct task_struct *task, struct cgroup_root *root) @@ -2012,7 +2021,7 @@ void init_cgroup_root(struct cgroup_fs_context *ctx) struct cgroup_root *root = ctx->root; struct cgroup *cgrp = &root->cgrp;
- INIT_LIST_HEAD(&root->root_list); + INIT_LIST_HEAD_RCU(&root->root_list); atomic_set(&root->nr_cgrps, 1); cgrp->root = root; init_cgroup_housekeeping(cgrp); @@ -2094,7 +2103,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) * care of subsystems' refcounts, which are explicitly dropped in * the failure exit path. */ - list_add(&root->root_list, &cgroup_roots); + list_add_rcu(&root->root_list, &cgroup_roots); cgroup_root_count++;
/*
[ Sasha's backport helper bot ]
Hi,
The upstream commit SHA1 provided is correct: d23b5c577715892c87533b13923306acc6243f93
WARNING: Author mismatch between patch and upstream commit: Backport author: Siddh Raman Pant siddh.raman.pant@oracle.com Commit author: Yafang Shao laoar.shao@gmail.com
Status in newer kernel trees: 6.12.y | Present (exact SHA1) 6.11.y | Present (exact SHA1) 6.6.y | Present (different SHA1: dd9542ae7c7c) 6.1.y | Present (different SHA1: f5b7a9792041) 5.15.y | Present (different SHA1: de77545c72c4) 5.10.y | Present (different SHA1: 45a81667e0e8) 5.4.y | Not found
Note: The patch differs from the upstream commit: --- 1: d23b5c5777158 ! 1: ab0b59573d9f9 cgroup: Make operations on the cgroup root_list RCU safe @@ Metadata ## Commit message ## cgroup: Make operations on the cgroup root_list RCU safe
+ commit d23b5c577715892c87533b13923306acc6243f93 upstream. + At present, when we perform operations on the cgroup root_list, we must hold the cgroup_mutex, which is a relatively heavyweight lock. In reality, we can make operations on this list RCU-safe, eliminating the need to hold @@ Commit message
Signed-off-by: Yafang Shao laoar.shao@gmail.com Signed-off-by: Tejun Heo tj@kernel.org + [fp: adapt to 5.10 mainly because of changes made by e210a89f5b07 + ("cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated + codes")] + Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru + [Shivani: Modified to apply on v5.4.y] + Signed-off-by: Shivani Agarwal shivani.agarwal@broadcom.com + Reviewed-by: Siddh Raman Pant siddh.raman.pant@oracle.com + Signed-off-by: Siddh Raman Pant siddh.raman.pant@oracle.com
## include/linux/cgroup-defs.h ## @@ include/linux/cgroup-defs.h: struct cgroup_root { @@ kernel/cgroup/cgroup-internal.h: extern struct list_head cgroup_roots; * for_each_subsys - iterate all enabled cgroup subsystems
## kernel/cgroup/cgroup.c ## -@@ kernel/cgroup/cgroup.c: static void cgroup_exit_root_id(struct cgroup_root *root) - - void cgroup_free_root(struct cgroup_root *root) +@@ kernel/cgroup/cgroup.c: void cgroup_free_root(struct cgroup_root *root) { -- kfree(root); -+ kfree_rcu(root, rcu); + if (root) { + idr_destroy(&root->cgroup_idr); +- kfree(root); ++ kfree_rcu(root, rcu); + } }
- static void cgroup_destroy_root(struct cgroup_root *root) @@ kernel/cgroup/cgroup.c: static void cgroup_destroy_root(struct cgroup_root *root) spin_unlock_irq(&css_set_lock);
- WARN_ON_ONCE(list_empty(&root->root_list)); -- list_del(&root->root_list); -+ list_del_rcu(&root->root_list); - cgroup_root_count--; + if (!list_empty(&root->root_list)) { +- list_del(&root->root_list); ++ list_del_rcu(&root->root_list); + cgroup_root_count--; + }
- if (!have_favordynmods) -@@ kernel/cgroup/cgroup.c: static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset, +@@ kernel/cgroup/cgroup.c: static struct cgroup *cset_cgroup_from_root(struct css_set *cset, + { + struct cgroup *res = NULL; + +- lockdep_assert_held(&cgroup_mutex); + lockdep_assert_held(&css_set_lock); + + if (cset == &init_css_set) { +@@ kernel/cgroup/cgroup.c: static struct cgroup *cset_cgroup_from_root(struct css_set *cset, } }
-- BUG_ON(!res_cgroup); +- BUG_ON(!res); + /* + * If cgroup_mutex is not held, the cgrp_cset_link will be freed + * before we remove the cgroup root from the root_list. Consequently, @@ kernel/cgroup/cgroup.c: static inline struct cgroup *__cset_cgroup_from_root(str + * If we don't hold cgroup_mutex in the caller, we must do the NULL + * check. + */ - return res_cgroup; + return res; }
-@@ kernel/cgroup/cgroup.c: static struct cgroup *current_cgns_cgroup_dfl(void) - static struct cgroup *cset_cgroup_from_root(struct css_set *cset, - struct cgroup_root *root) - { -- lockdep_assert_held(&cgroup_mutex); - lockdep_assert_held(&css_set_lock); - - return __cset_cgroup_from_root(cset, root); -@@ kernel/cgroup/cgroup.c: static struct cgroup *cset_cgroup_from_root(struct css_set *cset, - /* * Return the cgroup for "task" from the given hierarchy. Must be - * called with cgroup_mutex and css_set_lock held. ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-5.4.y | Success | Success |
From: Waiman Long longman@redhat.com
commit a7fb0423c201ba12815877a0b5a68a6a1710b23a upstream.
Commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") adds a new rcu_head to the cgroup_root structure and kvfree_rcu() for freeing the cgroup_root.
The current implementation of kvfree_rcu(), however, has the limitation that the offset of the rcu_head structure within the larger data structure must be less than 4096 or the compilation will fail. See the macro definition of __is_kvfree_rcu_offset() in include/linux/rcupdate.h for more information.
By putting rcu_head below the large cgroup structure, any change to the cgroup structure that makes it larger run the risk of causing build failure under certain configurations. Commit 77070eeb8821 ("cgroup: Avoid false cacheline sharing of read mostly rstat_cpu") happens to be the last straw that breaks it. Fix this problem by moving the rcu_head structure up before the cgroup structure.
Fixes: d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") Reported-by: Stephen Rothwell sfr@canb.auug.org.au Closes: https://lore.kernel.org/lkml/20231207143806.114e0a74@canb.auug.org.au/ Signed-off-by: Waiman Long longman@redhat.com Acked-by: Yafang Shao laoar.shao@gmail.com Reviewed-by: Yosry Ahmed yosryahmed@google.com Reviewed-by: Michal Koutný mkoutny@suse.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru [Shivani: Modified to apply on v5.4.y] Signed-off-by: Shivani Agarwal shivani.agarwal@broadcom.com Reviewed-by: Siddh Raman Pant siddh.raman.pant@oracle.com Signed-off-by: Siddh Raman Pant siddh.raman.pant@oracle.com --- include/linux/cgroup-defs.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index c64f11674850..f0798d98be8e 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -506,6 +506,10 @@ struct cgroup_root { /* Unique id for this hierarchy. */ int hierarchy_id;
+ /* A list running through the active hierarchies */ + struct list_head root_list; + struct rcu_head rcu; + /* The root cgroup. Root is destroyed on its release. */ struct cgroup cgrp;
@@ -515,10 +519,6 @@ struct cgroup_root { /* Number of cgroups in the hierarchy, used only for /proc/cgroups */ atomic_t nr_cgrps;
- /* A list running through the active hierarchies */ - struct list_head root_list; - struct rcu_head rcu; - /* Hierarchy-specific flags */ unsigned int flags;
[ Sasha's backport helper bot ]
Hi,
The upstream commit SHA1 provided is correct: a7fb0423c201ba12815877a0b5a68a6a1710b23a
WARNING: Author mismatch between patch and upstream commit: Backport author: Siddh Raman Pant siddh.raman.pant@oracle.com Commit author: Waiman Long longman@redhat.com
Status in newer kernel trees: 6.12.y | Present (exact SHA1) 6.11.y | Present (exact SHA1) 6.6.y | Present (different SHA1: f3c60ab676bb) 6.1.y | Present (different SHA1: 0e76e9bb1d8d) 5.15.y | Present (different SHA1: 9405d778a49a) 5.10.y | Present (different SHA1: 4abf1841680f) 5.4.y | Not found
Note: The patch differs from the upstream commit: --- Failed to apply patch cleanly, falling back to interdiff... ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-5.4.y | Success | Success |
linux-stable-mirror@lists.linaro.org