From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
[ Upstream commit 2c895133950646f45e5cf3900b168c952c8dbee8 ]
The bpf_cgroup_from_id kfunc relies on cgroup_get_from_id to obtain the cgroup corresponding to a given cgroup ID. cgroup_get_from_id only returns cgroups that are visible from the current task's cgroup namespace, yet the kfunc can be called in many contexts where the current task is arbitrary. A recent example was its use in sched_ext's ops.tick(), to obtain the root cgroup pointer. Since the current task there is whichever user space task the timer tick preempted, the behavior of the kfunc becomes unreliable.
Refactor out __cgroup_get_from_id as the non-namespace aware version of cgroup_get_from_id, and change bpf_cgroup_from_id to make use of it.
There is no compatibility breakage here, since changing the namespace against which the lookup is being done to the root cgroup namespace only permits a wider set of lookups to succeed now. The cgroup IDs across namespaces are globally unique, and thus don't need to be retranslated.
Reported-by: Dan Schatzberg <dschatzberg@meta.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250915032618.1551762-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES - This patch fixes a real functional bug in `bpf_cgroup_from_id()` that breaks sched_ext and other BPF users when the helper runs from contexts where `current` lives in an unrelated cgroup namespace.
**Bug Details**
- Today `bpf_cgroup_from_id()` simply calls `cgroup_get_from_id()` (`kernel/bpf/helpers.c:2539`), and that helper rejects IDs not visible from `current`’s namespace via the `cgroup_is_descendant()` check (`kernel/cgroup/cgroup.c:6407`). When the kfunc is invoked from timer and irq contexts (e.g. sched_ext `ops.tick()`), `current` is just whatever user task was interrupted, so the lookup spuriously returns `NULL` and the BPF scheduler treats valid cgroups as gone.
- Documentation already describes the ID lookup as global (`Documentation/bpf/kfuncs.rst:653`), so current behaviour contradicts the documented contract and leads to unpredictable failures for BPF programs that cache cgroup IDs.
**Fix Mechanics**
- The patch factors the namespace-agnostic portion of the lookup into a new `__cgroup_get_from_id()` placed directly above the existing helper in `kernel/cgroup/cgroup.c` (~6376 after applying the change). That routine mirrors the old code path but returns as soon as the refcounted `struct cgroup` is acquired, skipping the namespace filter.
- `bpf_cgroup_from_id()` is switched to call the new helper (`kernel/bpf/helpers.c:2539` post-patch), so BPF programs always see the globally unique ID mapping they rely on. The public declaration in `include/linux/cgroup.h:653` is added so other in-kernel users can opt into the unrestricted lookup if they intentionally need it.
- The original `cgroup_get_from_id()` continues to enforce namespace visibility for existing callers (block layer, memcg, BPF iterators), so their semantics are unchanged.
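The split between the two lookups can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct toy_cgroup`, the four-node hierarchy, and every `toy_*` name are hypothetical, and reference counting and `ERR_PTR` handling are left out (the model returns `NULL` on failure instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct cgroup: only an ID and a parent link. */
struct toy_cgroup {
	uint64_t id;
	const struct toy_cgroup *parent;
};

/* Hypothetical hierarchy: root(1) -> a(2) -> a_child(3), and root(1) -> b(4). */
static const struct toy_cgroup toy_root = { 1, NULL };
static const struct toy_cgroup toy_a = { 2, &toy_root };
static const struct toy_cgroup toy_a_child = { 3, &toy_a };
static const struct toy_cgroup toy_b = { 4, &toy_root };

static const struct toy_cgroup *const toy_all[] = {
	&toy_root, &toy_a, &toy_a_child, &toy_b,
};

/* True if cgrp is the ancestor itself or lies somewhere below it. */
static bool toy_is_descendant(const struct toy_cgroup *cgrp,
			      const struct toy_cgroup *ancestor)
{
	for (; cgrp; cgrp = cgrp->parent)
		if (cgrp == ancestor)
			return true;
	return false;
}

/* Models __cgroup_get_from_id(): a plain ID lookup with no namespace check. */
static const struct toy_cgroup *toy_get_from_id_global(uint64_t id)
{
	for (size_t i = 0; i < sizeof(toy_all) / sizeof(toy_all[0]); i++)
		if (toy_all[i]->id == id)
			return toy_all[i];
	return NULL;
}

/*
 * Models cgroup_get_from_id(): the same lookup, followed by a filter
 * against the root of the caller's cgroup namespace (ns_root).
 */
static const struct toy_cgroup *toy_get_from_id_ns(uint64_t id,
						   const struct toy_cgroup *ns_root)
{
	const struct toy_cgroup *cgrp = toy_get_from_id_global(id);

	if (cgrp && !toy_is_descendant(cgrp, ns_root))
		return NULL;
	return cgrp;
}
```

In this model, a caller whose namespace is rooted at `toy_a` cannot resolve ID 1 (the root) or ID 4 through `toy_get_from_id_ns()`, while `toy_get_from_id_global()` resolves both. That mirrors the `ops.tick()` case described above, where the outcome of the filtered lookup depends on which task happened to be interrupted.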
**Risk Assessment**
- Behaviour only widens the set of IDs that succeed for this BPF kfunc; no kernel data structures or locking rules change. The lookup and refcount handling remain identical, so regression risk is low.
- The broader visibility is acceptable because accessing kfuncs of this class already requires privileged BPF programs; the cgroup maintainers (Acked-by: Tejun Heo) agreed the helper should operate on the global namespace.
- No new exports or user-visible ABI are introduced; the change is confined to in-kernel helpers and a single BPF kfunc.
**Stable Backport Notes**
- The patch is self-contained and applies cleanly as long as commit 332ea1f697be (“bpf: Add bpf_cgroup_from_id() kfunc”) is present, which is true for current stable lines. No follow-up fixes are required.
- Without it, sched_ext BPF schedulers and other consumers that cache cgroup IDs will continue to misbehave whenever executed from asynchronous contexts, so backporting is warranted.
 include/linux/cgroup.h |  1 +
 kernel/bpf/helpers.c   |  2 +-
 kernel/cgroup/cgroup.c | 24 ++++++++++++++++++++----
 3 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b18fb5fcb38e2..b08c8e62881cd 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -650,6 +650,7 @@ static inline void cgroup_kthread_ready(void)
 }

 void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen);
+struct cgroup *__cgroup_get_from_id(u64 id);
 struct cgroup *cgroup_get_from_id(u64 id);
 #else /* !CONFIG_CGROUPS */

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8af62cb243d9e..0bde01edf5e6e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2540,7 +2540,7 @@ __bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
 {
 	struct cgroup *cgrp;

-	cgrp = cgroup_get_from_id(cgid);
+	cgrp = __cgroup_get_from_id(cgid);
 	if (IS_ERR(cgrp))
 		return NULL;
 	return cgrp;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 77d02f87f3f12..c62b98f027f99 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6373,15 +6373,15 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
 }

 /*
- * cgroup_get_from_id : get the cgroup associated with cgroup id
+ * __cgroup_get_from_id : get the cgroup associated with cgroup id
  * @id: cgroup id
  * On success return the cgrp or ERR_PTR on failure
- * Only cgroups within current task's cgroup NS are valid.
+ * There are no cgroup NS restrictions.
  */
-struct cgroup *cgroup_get_from_id(u64 id)
+struct cgroup *__cgroup_get_from_id(u64 id)
 {
 	struct kernfs_node *kn;
-	struct cgroup *cgrp, *root_cgrp;
+	struct cgroup *cgrp;

 	kn = kernfs_find_and_get_node_by_id(cgrp_dfl_root.kf_root, id);
 	if (!kn)
@@ -6403,6 +6403,22 @@ struct cgroup *cgroup_get_from_id(u64 id)

 	if (!cgrp)
 		return ERR_PTR(-ENOENT);
+	return cgrp;
+}
+
+/*
+ * cgroup_get_from_id : get the cgroup associated with cgroup id
+ * @id: cgroup id
+ * On success return the cgrp or ERR_PTR on failure
+ * Only cgroups within current task's cgroup NS are valid.
+ */
+struct cgroup *cgroup_get_from_id(u64 id)
+{
+	struct cgroup *cgrp, *root_cgrp;
+
+	cgrp = __cgroup_get_from_id(id);
+	if (IS_ERR(cgrp))
+		return cgrp;

 	root_cgrp = current_cgns_cgroup_dfl();
 	if (!cgroup_is_descendant(cgrp, root_cgrp)) {