Backport summary ---------------- 1756d7994ad8 ("cgroup: Use open-time credentials for process migraton perm checks") * Cherry pick from 4.19-stable, no modifications.
0d2b5955b362 ("cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv") * Cherry-pick from 4.19-stable, minor contextual adjustement.
e57457641613 ("cgroup: Use open-time cgroup namespace for process migration perm checks") * Cherry-pick from 4.19-stable, no modifications.
Testing ------- There are no cgroup selftests in 4.14, but when running the ones from 4.19 on the 4.14 kernel, all selftests pass:
root@intel-x86-64:~# ./test_core ok 1 test_cgcore_internal_process_constraint ok 2 test_cgcore_top_down_constraint_enable ok 3 test_cgcore_top_down_constraint_disable ok 4 test_cgcore_no_internal_process_constraint_on_threads ok 5 test_cgcore_parent_becomes_threaded ok 6 test_cgcore_invalid_domain ok 7 test_cgcore_populated ok 8 test_cgcore_lesser_euid_open ok 9 test_cgcore_lesser_ns_open
Tejun Heo (3): cgroup: Use open-time credentials for process migraton perm checks cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv cgroup: Use open-time cgroup namespace for process migration perm checks
kernel/cgroup/cgroup-internal.h | 19 ++++++++ kernel/cgroup/cgroup-v1.c | 33 ++++++++------ kernel/cgroup/cgroup.c | 81 +++++++++++++++++++++++---------- 3 files changed, 95 insertions(+), 38 deletions(-)
From: Tejun Heo tj@kernel.org
commit 1756d7994ad85c2479af6ae5a9750b92324685af upstream.
cgroup process migration permission checks are performed at write time as whether a given operation is allowed or not is dependent on the content of the write - the PID. This currently uses current's credentials which is a potential security weakness as it may allow scenarios where a less privileged process tricks a more privileged one into writing into a fd that it created.
This patch makes both cgroup2 and cgroup1 process migration interfaces to use the credentials saved at the time of open (file->f_cred) instead of current's.
Reported-by: "Eric W. Biederman" ebiederm@xmission.com Suggested-by: Linus Torvalds torvalds@linuxfoundation.org Fixes: 187fe84067bd ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy") Reviewed-by: Michal Koutný mkoutny@suse.com Signed-off-by: Tejun Heo tj@kernel.org [OP: backport to v4.14: apply original __cgroup_procs_write() changes to cgroup_threads_write() and cgroup_procs_write()] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com --- kernel/cgroup/cgroup-v1.c | 7 ++++--- kernel/cgroup/cgroup.c | 17 ++++++++++++++++- 2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c index 105f5b2f5978..4a2b148b900d 100644 --- a/kernel/cgroup/cgroup-v1.c +++ b/kernel/cgroup/cgroup-v1.c @@ -535,10 +535,11 @@ static ssize_t __cgroup1_procs_write(struct kernfs_open_file *of, goto out_unlock;
/* - * Even if we're attaching all tasks in the thread group, we only - * need to check permissions on one of them. + * Even if we're attaching all tasks in the thread group, we only need + * to check permissions on one of them. Check permissions using the + * credentials from file open to protect against inherited fd attacks. */ - cred = current_cred(); + cred = of->file->f_cred; tcred = get_task_cred(task); if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) && !uid_eq(cred->euid, tcred->uid) && diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index d5044ca33bd0..380500251b96 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -4381,6 +4381,7 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of, { struct cgroup *src_cgrp, *dst_cgrp; struct task_struct *task; + const struct cred *saved_cred; ssize_t ret;
dst_cgrp = cgroup_kn_lock_live(of->kn, false); @@ -4397,8 +4398,15 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of, src_cgrp = task_cgroup_from_root(task, &cgrp_dfl_root); spin_unlock_irq(&css_set_lock);
+ /* + * Process and thread migrations follow same delegation rule. Check + * permissions using the credentials from file open to protect against + * inherited fd attacks. + */ + saved_cred = override_creds(of->file->f_cred); ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp, of->file->f_path.dentry->d_sb); + revert_creds(saved_cred); if (ret) goto out_finish;
@@ -4422,6 +4430,7 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of, { struct cgroup *src_cgrp, *dst_cgrp; struct task_struct *task; + const struct cred *saved_cred; ssize_t ret;
buf = strstrip(buf); @@ -4440,9 +4449,15 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of, src_cgrp = task_cgroup_from_root(task, &cgrp_dfl_root); spin_unlock_irq(&css_set_lock);
- /* thread migrations follow the cgroup.procs delegation rule */ + /* + * Process and thread migrations follow same delegation rule. Check + * permissions using the credentials from file open to protect against + * inherited fd attacks. + */ + saved_cred = override_creds(of->file->f_cred); ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp, of->file->f_path.dentry->d_sb); + revert_creds(saved_cred); if (ret) goto out_finish;
From: Tejun Heo tj@kernel.org
commit 0d2b5955b36250a9428c832664f2079cbf723bec upstream.
of->priv is currently used by each interface file implementation to store private information. This patch collects the current two private data usages into struct cgroup_file_ctx which is allocated and freed by the common path. This allows generic private data which applies to multiple files, which will be used to in the following patch.
Note that cgroup_procs iterator is now embedded as procs.iter in the new cgroup_file_ctx so that it doesn't need to be allocated and freed separately.
v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in cgroup_file_ctx as suggested by Linus.
v3: Michal pointed out that cgroup1's procs pidlist uses of->priv too. Converted. Didn't change to embedded allocation as cgroup1 pidlists get stored for caching.
Signed-off-by: Tejun Heo tj@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Reviewed-by: Michal Koutný mkoutny@suse.com [mkoutny: v5.10: modify cgroup.pressure handlers, adjust context] Signed-off-by: Michal Koutný mkoutny@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org [OP: backport to v4.14: drop changes to cgroup_pressure_*() functions] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com --- kernel/cgroup/cgroup-internal.h | 17 +++++++++++++ kernel/cgroup/cgroup-v1.c | 26 ++++++++++---------- kernel/cgroup/cgroup.c | 42 ++++++++++++++++++++------------- 3 files changed, 57 insertions(+), 28 deletions(-)
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h index bf54ade001be..aee1af8bf849 100644 --- a/kernel/cgroup/cgroup-internal.h +++ b/kernel/cgroup/cgroup-internal.h @@ -8,6 +8,23 @@ #include <linux/list.h> #include <linux/refcount.h>
+struct cgroup_pidlist; + +struct cgroup_file_ctx { + struct { + void *trigger; + } psi; + + struct { + bool started; + struct css_task_iter iter; + } procs; + + struct { + struct cgroup_pidlist *pidlist; + } procs1; +}; + /* * A cgroup can be associated with multiple css_sets as different tasks may * belong to different cgroups on different hierarchies. In the other diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c index 4a2b148b900d..a14182e90f57 100644 --- a/kernel/cgroup/cgroup-v1.c +++ b/kernel/cgroup/cgroup-v1.c @@ -426,6 +426,7 @@ static void *cgroup_pidlist_start(struct seq_file *s, loff_t *pos) * next pid to display, if any */ struct kernfs_open_file *of = s->private; + struct cgroup_file_ctx *ctx = of->priv; struct cgroup *cgrp = seq_css(s)->cgroup; struct cgroup_pidlist *l; enum cgroup_filetype type = seq_cft(s)->private; @@ -435,25 +436,24 @@ static void *cgroup_pidlist_start(struct seq_file *s, loff_t *pos) mutex_lock(&cgrp->pidlist_mutex);
/* - * !NULL @of->priv indicates that this isn't the first start() - * after open. If the matching pidlist is around, we can use that. - * Look for it. Note that @of->priv can't be used directly. It - * could already have been destroyed. + * !NULL @ctx->procs1.pidlist indicates that this isn't the first + * start() after open. If the matching pidlist is around, we can use + * that. Look for it. Note that @ctx->procs1.pidlist can't be used + * directly. It could already have been destroyed. */ - if (of->priv) - of->priv = cgroup_pidlist_find(cgrp, type); + if (ctx->procs1.pidlist) + ctx->procs1.pidlist = cgroup_pidlist_find(cgrp, type);
/* * Either this is the first start() after open or the matching * pidlist has been destroyed inbetween. Create a new one. */ - if (!of->priv) { - ret = pidlist_array_load(cgrp, type, - (struct cgroup_pidlist **)&of->priv); + if (!ctx->procs1.pidlist) { + ret = pidlist_array_load(cgrp, type, &ctx->procs1.pidlist); if (ret) return ERR_PTR(ret); } - l = of->priv; + l = ctx->procs1.pidlist;
if (pid) { int end = l->length; @@ -481,7 +481,8 @@ static void *cgroup_pidlist_start(struct seq_file *s, loff_t *pos) static void cgroup_pidlist_stop(struct seq_file *s, void *v) { struct kernfs_open_file *of = s->private; - struct cgroup_pidlist *l = of->priv; + struct cgroup_file_ctx *ctx = of->priv; + struct cgroup_pidlist *l = ctx->procs1.pidlist;
if (l) mod_delayed_work(cgroup_pidlist_destroy_wq, &l->destroy_dwork, @@ -492,7 +493,8 @@ static void cgroup_pidlist_stop(struct seq_file *s, void *v) static void *cgroup_pidlist_next(struct seq_file *s, void *v, loff_t *pos) { struct kernfs_open_file *of = s->private; - struct cgroup_pidlist *l = of->priv; + struct cgroup_file_ctx *ctx = of->priv; + struct cgroup_pidlist *l = ctx->procs1.pidlist; pid_t *p = v; pid_t *end = l->list + l->length; /* diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 380500251b96..9381100d6770 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -3364,18 +3364,31 @@ static int cgroup_stat_show(struct seq_file *seq, void *v) static int cgroup_file_open(struct kernfs_open_file *of) { struct cftype *cft = of->kn->priv; + struct cgroup_file_ctx *ctx; + int ret;
- if (cft->open) - return cft->open(of); - return 0; + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return -ENOMEM; + of->priv = ctx; + + if (!cft->open) + return 0; + + ret = cft->open(of); + if (ret) + kfree(ctx); + return ret; }
static void cgroup_file_release(struct kernfs_open_file *of) { struct cftype *cft = of->kn->priv; + struct cgroup_file_ctx *ctx = of->priv;
if (cft->release) cft->release(of); + kfree(ctx); }
static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, @@ -4270,21 +4283,21 @@ void css_task_iter_end(struct css_task_iter *it)
static void cgroup_procs_release(struct kernfs_open_file *of) { - if (of->priv) { - css_task_iter_end(of->priv); - kfree(of->priv); - } + struct cgroup_file_ctx *ctx = of->priv; + + if (ctx->procs.started) + css_task_iter_end(&ctx->procs.iter); }
static void *cgroup_procs_next(struct seq_file *s, void *v, loff_t *pos) { struct kernfs_open_file *of = s->private; - struct css_task_iter *it = of->priv; + struct cgroup_file_ctx *ctx = of->priv;
if (pos) (*pos)++;
- return css_task_iter_next(it); + return css_task_iter_next(&ctx->procs.iter); }
static void *__cgroup_procs_start(struct seq_file *s, loff_t *pos, @@ -4292,21 +4305,18 @@ static void *__cgroup_procs_start(struct seq_file *s, loff_t *pos, { struct kernfs_open_file *of = s->private; struct cgroup *cgrp = seq_css(s)->cgroup; - struct css_task_iter *it = of->priv; + struct cgroup_file_ctx *ctx = of->priv; + struct css_task_iter *it = &ctx->procs.iter;
/* * When a seq_file is seeked, it's always traversed sequentially * from position 0, so we can simply keep iterating on !0 *pos. */ - if (!it) { + if (!ctx->procs.started) { if (WARN_ON_ONCE((*pos))) return ERR_PTR(-EINVAL); - - it = kzalloc(sizeof(*it), GFP_KERNEL); - if (!it) - return ERR_PTR(-ENOMEM); - of->priv = it; css_task_iter_start(&cgrp->self, iter_flags, it); + ctx->procs.started = true; } else if (!(*pos)) { css_task_iter_end(it); css_task_iter_start(&cgrp->self, iter_flags, it);
From: Tejun Heo tj@kernel.org
commit e57457641613fef0d147ede8bd6a3047df588b95 upstream.
cgroup process migration permission checks are performed at write time as whether a given operation is allowed or not is dependent on the content of the write - the PID. This currently uses current's cgroup namespace which is a potential security weakness as it may allow scenarios where a less privileged process tricks a more privileged one into writing into a fd that it created.
This patch makes cgroup remember the cgroup namespace at the time of open and uses it for migration permission checks instad of current's. Note that this only applies to cgroup2 as cgroup1 doesn't have namespace support.
This also fixes a use-after-free bug on cgroupns reported in
https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Note that backporting this fix also requires the preceding patch.
Reported-by: "Eric W. Biederman" ebiederm@xmission.com Suggested-by: Linus Torvalds torvalds@linuxfoundation.org Cc: Michal Koutný mkoutny@suse.com Cc: Oleg Nesterov oleg@redhat.com Reviewed-by: Michal Koutný mkoutny@suse.com Reported-by: syzbot+50f5cf33a284ce738b62@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com Fixes: 5136f6365ce3 ("cgroup: implement "nsdelegate" mount option") Signed-off-by: Tejun Heo tj@kernel.org [mkoutny: v5.10: duplicate ns check in procs/threads write handler, adjust context] Signed-off-by: Michal Koutný mkoutny@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org [OP: backport to v4.14: drop changes to cgroup_attach_permissions() and cgroup_css_set_fork(), adjust cgroup_procs_write_permission() calls] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com --- kernel/cgroup/cgroup-internal.h | 2 ++ kernel/cgroup/cgroup.c | 24 +++++++++++++++++------- 2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h index aee1af8bf849..421c01590cb5 100644 --- a/kernel/cgroup/cgroup-internal.h +++ b/kernel/cgroup/cgroup-internal.h @@ -11,6 +11,8 @@ struct cgroup_pidlist;
struct cgroup_file_ctx { + struct cgroup_namespace *ns; + struct { void *trigger; } psi; diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 9381100d6770..63d1349a17a3 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -3370,14 +3370,19 @@ static int cgroup_file_open(struct kernfs_open_file *of) ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) return -ENOMEM; + + ctx->ns = current->nsproxy->cgroup_ns; + get_cgroup_ns(ctx->ns); of->priv = ctx;
if (!cft->open) return 0;
ret = cft->open(of); - if (ret) + if (ret) { + put_cgroup_ns(ctx->ns); kfree(ctx); + } return ret; }
@@ -3388,13 +3393,14 @@ static void cgroup_file_release(struct kernfs_open_file *of)
if (cft->release) cft->release(of); + put_cgroup_ns(ctx->ns); kfree(ctx); }
static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { - struct cgroup_namespace *ns = current->nsproxy->cgroup_ns; + struct cgroup_file_ctx *ctx = of->priv; struct cgroup *cgrp = of->kn->parent->priv; struct cftype *cft = of->kn->priv; struct cgroup_subsys_state *css; @@ -3408,7 +3414,7 @@ static ssize_t cgroup_file_write(struct kernfs_open_file *of, char *buf, */ if ((cgrp->root->flags & CGRP_ROOT_NS_DELEGATE) && !(cft->flags & CFTYPE_NS_DELEGATABLE) && - ns != &init_cgroup_ns && ns->root_cset->dfl_cgrp == cgrp) + ctx->ns != &init_cgroup_ns && ctx->ns->root_cset->dfl_cgrp == cgrp) return -EPERM;
if (cft->write) @@ -4351,9 +4357,9 @@ static int cgroup_procs_show(struct seq_file *s, void *v)
static int cgroup_procs_write_permission(struct cgroup *src_cgrp, struct cgroup *dst_cgrp, - struct super_block *sb) + struct super_block *sb, + struct cgroup_namespace *ns) { - struct cgroup_namespace *ns = current->nsproxy->cgroup_ns; struct cgroup *com_cgrp = src_cgrp; struct inode *inode; int ret; @@ -4389,6 +4395,7 @@ static int cgroup_procs_write_permission(struct cgroup *src_cgrp, static ssize_t cgroup_procs_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { + struct cgroup_file_ctx *ctx = of->priv; struct cgroup *src_cgrp, *dst_cgrp; struct task_struct *task; const struct cred *saved_cred; @@ -4415,7 +4422,8 @@ static ssize_t cgroup_procs_write(struct kernfs_open_file *of, */ saved_cred = override_creds(of->file->f_cred); ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp, - of->file->f_path.dentry->d_sb); + of->file->f_path.dentry->d_sb, + ctx->ns); revert_creds(saved_cred); if (ret) goto out_finish; @@ -4438,6 +4446,7 @@ static void *cgroup_threads_start(struct seq_file *s, loff_t *pos) static ssize_t cgroup_threads_write(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { + struct cgroup_file_ctx *ctx = of->priv; struct cgroup *src_cgrp, *dst_cgrp; struct task_struct *task; const struct cred *saved_cred; @@ -4466,7 +4475,8 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of, */ saved_cred = override_creds(of->file->f_cred); ret = cgroup_procs_write_permission(src_cgrp, dst_cgrp, - of->file->f_path.dentry->d_sb); + of->file->f_path.dentry->d_sb, + ctx->ns); revert_creds(saved_cred); if (ret) goto out_finish;
On Thu, Apr 14, 2022 at 12:24:18PM +0300, Ovidiu Panait wrote:
Backport summary
1756d7994ad8 ("cgroup: Use open-time credentials for process migraton perm checks")
- Cherry pick from 4.19-stable, no modifications.
0d2b5955b362 ("cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv")
- Cherry-pick from 4.19-stable, minor contextual adjustement.
e57457641613 ("cgroup: Use open-time cgroup namespace for process migration perm checks")
- Cherry-pick from 4.19-stable, no modifications.
Testing
There are no cgroup selftests in 4.14, but when running the ones from 4.19 on the 4.14 kernel, all selftests pass:
THanks for these, all now queued up!
greg k-h
linux-stable-mirror@lists.linaro.org