From: Feng Zhou zhoufeng.zf@bytedance.com
When tracing sched-related functions such as enqueue_task_fair(), it is necessary to check whether a specified task, rather than the current task, is within a given cgroup.
Feng Zhou (2):
  bpf: Add bpf_task_under_cgroup() kfunc
  selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v5->v6: Addressed comments from Yonghong Song
  - Some code format modifications.
  - Add Acked-by.
Details in here: https://lore.kernel.org/all/20230504031513.13749-1-zhoufeng.zf@bytedance.com...

v4->v5: Addressed comments from Yonghong Song
  - Some code format modifications.
Details in here: https://lore.kernel.org/all/20230428071737.43849-1-zhoufeng.zf@bytedance.com...

v3->v4: Addressed comments from Yonghong Song
  - Modify the test cases to test other tasks, not the current task.
Details in here: https://lore.kernel.org/all/20230427023019.73576-1-zhoufeng.zf@bytedance.com...

v2->v3: Addressed comments from Alexei Starovoitov
  - Modify the comment information of the function.
  - Narrow down the testcase's hook point.
Details in here: https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.com...

v1->v2: Addressed comments from Alexei Starovoitov
  - Add a kfunc instead.
Details in here: https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.com...
 kernel/bpf/helpers.c                          | 20 +++++++
 tools/testing/selftests/bpf/DENYLIST.s390x    |  1 +
 .../bpf/prog_tests/task_under_cgroup.c        | 53 +++++++++++++++++++
 .../bpf/progs/test_task_under_cgroup.c        | 51 ++++++++++++++++++
 4 files changed, 125 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
From: Feng Zhou zhoufeng.zf@bytedance.com
Add a kfunc that is similar to bpf_current_task_under_cgroup(). The difference is that it tests a designated task rather than the current one.

When hooking sched-related functions, it is sometimes necessary to specify a task instead of the current task.
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/helpers.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index bb6b4637ebf2..453cbd312366 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2149,6 +2149,25 @@ __bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
 		return NULL;
 	return cgrp;
 }
+
+/**
+ * bpf_task_under_cgroup - wrap task_under_cgroup_hierarchy() as a kfunc, test
+ * task's membership of cgroup ancestry.
+ * @task: the task to be tested
+ * @ancestor: possible ancestor of @task's cgroup
+ *
+ * Tests whether @task's default cgroup hierarchy is a descendant of @ancestor.
+ * It follows all the same rules as cgroup_is_descendant, and only applies
+ * to the default hierarchy.
+ */
+__bpf_kfunc long bpf_task_under_cgroup(struct task_struct *task,
+				       struct cgroup *ancestor)
+{
+	if (unlikely(!ancestor || !task))
+		return -EINVAL;
+
+	return task_under_cgroup_hierarchy(task, ancestor);
+}
 #endif /* CONFIG_CGROUPS */
 /**
@@ -2400,6 +2419,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_acquire, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE)
 BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
 #endif
 BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
 BTF_SET8_END(generic_btf_ids)
On Thu, May 4, 2023 at 11:08 PM Feng zhou zhoufeng.zf@bytedance.com wrote:
<...>
kernel/bpf/helpers.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index bb6b4637ebf2..453cbd312366 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2149,6 +2149,25 @@ __bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
 		return NULL;
 	return cgrp;
 }
+/**
+ * bpf_task_under_cgroup - wrap task_under_cgroup_hierarchy() as a kfunc, test
+ * task's membership of cgroup ancestry.
+ * @task: the task to be tested
+ * @ancestor: possible ancestor of @task's cgroup
+ *
+ * Tests whether @task's default cgroup hierarchy is a descendant of @ancestor.
+ * It follows all the same rules as cgroup_is_descendant, and only applies
+ * to the default hierarchy.
+ */
+__bpf_kfunc long bpf_task_under_cgroup(struct task_struct *task,
+				       struct cgroup *ancestor)
+{
+	if (unlikely(!ancestor || !task))
+		return -EINVAL;
+
+	return task_under_cgroup_hierarchy(task, ancestor);
+}
 #endif /* CONFIG_CGROUPS */
I wonder in what situation a null 'task' or 'ancestor' can be passed. Please call out in the comment that the returned value can be a negative error, because writing if (bpf_task_under_cgroup()) may otherwise cause surprising results.
Hao
On 2023/5/5 14:58, Hao Luo wrote:
On Thu, May 4, 2023 at 11:08 PM Feng zhou zhoufeng.zf@bytedance.com wrote:
<...>
I wonder in what situation a null 'task' or 'ancestor' can be passed. Please call out in the comment that the returned value can be a negative error, because writing if (bpf_task_under_cgroup()) may otherwise cause surprising results.
Hao
Hmm, you are right. For a kfunc, the verifier already checks the parameters for NULL and will prompt the developer to add a check where one is needed, so there is really no need for this check inside the kfunc. Let's see what others think.
On 5/5/23 12:18 AM, Feng Zhou wrote:
On 2023/5/5 14:58, Hao Luo wrote:
On Thu, May 4, 2023 at 11:08 PM Feng zhou zhoufeng.zf@bytedance.com wrote:
<...>
I wonder in what situation a null 'task' or 'ancestor' can be passed. Please call out in the comment that the returned value can be a negative error, because writing if (bpf_task_under_cgroup()) may otherwise cause surprising results.
Hao
Hmm, you are right. For a kfunc, the verifier already checks the parameters for NULL and will prompt the developer to add a check where one is needed, so there is really no need for this check inside the kfunc. Let's see what others think.
Thanks for pointing this out, Hao.
Currently, bpf_task_under_cgroup() is marked as KF_RCU.
Per documentation:

2.4.7 KF_RCU flag
-----------------
The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees that the objects are valid and there is no use-after-free. The pointers are not NULL, but the object's refcount could have reached zero. The kfuncs need to consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely also be KF_RET_NULL.
The pointer cannot be NULL, so the following line of code can be removed:
if (unlikely(!ancestor || !task))
return -EINVAL;
I think we do not need to check refcnt != 0 case since ancestor and task won't go away.
In the example in the second patch, both arguments are TRUSTED arguments, which is stronger than RCU, so the test itself is okay. I am considering whether we should enforce the kfunc's arguments to be KF_TRUSTED_ARGS, but in some cases a cgroup might be RCU protected, e.g., task->cgroups->dfl_cgrp, so leaving the argument requirement as KF_RCU should be better.
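The RCU-protected case Yonghong mentions can be sketched on the BPF side as follows. This is illustrative only, not part of the patch; it assumes the bpf_rcu_read_lock()/bpf_rcu_read_unlock() kfuncs and the task->cgroups->dfl_cgrp path, and the cgroup pointer obtained this way is MEM_RCU, which a KF_TRUSTED_ARGS kfunc would reject but a KF_RCU kfunc accepts:

```c
// SPDX-License-Identifier: GPL-2.0
/* Sketch: passing an RCU-protected cgroup pointer to a KF_RCU kfunc. */
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>

long bpf_task_under_cgroup(struct task_struct *task, struct cgroup *ancestor) __ksym;
void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

SEC("tp_btf/task_newtask")
int BPF_PROG(rcu_cgrp_sketch, struct task_struct *task, u64 clone_flags)
{
	struct cgroup *dfl;

	bpf_rcu_read_lock();
	/* dfl is MEM_RCU here: valid and non-NULL-checked by the verifier,
	 * but not a refcounted (TRUSTED) pointer. */
	dfl = bpf_get_current_task_btf()->cgroups->dfl_cgrp;
	if (dfl)
		bpf_task_under_cgroup(task, dfl);
	bpf_rcu_read_unlock();

	return 0;
}

char _license[] SEC("license") = "GPL";
```

If the kfunc required KF_TRUSTED_ARGS instead, this pattern would need an acquired cgroup reference (e.g., via bpf_cgroup_from_id()) rather than the RCU-protected field walk.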
On Fri, May 5, 2023 at 11:44 AM Yonghong Song yhs@meta.com wrote:
On 5/5/23 12:18 AM, Feng Zhou wrote:
On 2023/5/5 14:58, Hao Luo wrote:
On Thu, May 4, 2023 at 11:08 PM Feng zhou zhoufeng.zf@bytedance.com wrote:
<...>
Thanks for pointing this out, Hao.
Currently, bpf_task_under_cgroup() is marked as KF_RCU.
Per documentation: 2.4.7 KF_RCU flag
The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees that the objects are valid and there is no use-after-free. The pointers are not NULL, but the object's refcount could have reached zero. The kfuncs need to consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely also be KF_RET_NULL.
The pointer cannot be NULL, so the following line of code can be removed:
if (unlikely(!ancestor || !task))
return -EINVAL;
Right. With KF_RCU the verifier guarantees != NULL. Let's get rid of this check. This will make the return value clean.
I think we do not need to check refcnt != 0 case since ancestor and task won't go away.
correct.
In the example in the second patch, both arguments are TRUSTED arguments, which is stronger than RCU, so the test itself is okay. I am considering whether we should enforce the kfunc's arguments to be KF_TRUSTED_ARGS, but in some cases a cgroup might be RCU protected, e.g., task->cgroups->dfl_cgrp, so leaving the argument requirement as KF_RCU should be better.
+1
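With the NULL check dropped as agreed above, the kfunc presumably reduces to something like the following (a sketch of the agreed outcome, not the final committed code):

```c
__bpf_kfunc long bpf_task_under_cgroup(struct task_struct *task,
				       struct cgroup *ancestor)
{
	/* KF_RCU: the verifier guarantees both pointers are valid and
	 * non-NULL, so the return value is a clean boolean (1 if @task
	 * is under @ancestor, 0 otherwise) with no error case. */
	return task_under_cgroup_hierarchy(task, ancestor);
}
```

This also resolves Hao's earlier concern: a bare if (bpf_task_under_cgroup(...)) can no longer be tripped up by a negative error return.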
From: Feng Zhou zhoufeng.zf@bytedance.com
test_progs: Tests new kfunc bpf_task_under_cgroup().
The BPF program saves the pid of a new task within the given cgroup to remote_pid, which makes it convenient for the user-mode program to verify the test's correctness.

The user-mode program creates its own mount namespace, mounts the cgroup v2 hierarchy in there, calls the fork syscall, then checks whether remote_pid and local_pid are unequal.
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/DENYLIST.s390x    |  1 +
 .../bpf/prog_tests/task_under_cgroup.c        | 53 +++++++++++++++++++
 .../bpf/progs/test_task_under_cgroup.c        | 51 ++++++++++++++++++
 3 files changed, 105 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index c7463f3ec3c0..5061d9e24c16 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -26,3 +26,4 @@ user_ringbuf            # failed to find kernel BTF type ID of
 verif_stats             # trace_vprintk__open_and_load unexpected error: -9 (?)
 xdp_bonding             # failed to auto-attach program 'trace_on_entry': -524 (trampoline)
 xdp_metadata            # JIT does not support calling kernel function (kfunc)
+test_task_under_cgroup  # JIT does not support calling kernel function (kfunc)
diff --git a/tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c b/tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
new file mode 100644
index 000000000000..4224727fb364
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Bytedance */
+
+#include <sys/syscall.h>
+#include <test_progs.h>
+#include <cgroup_helpers.h>
+#include "test_task_under_cgroup.skel.h"
+
+#define FOO	"/foo"
+
+void test_task_under_cgroup(void)
+{
+	struct test_task_under_cgroup *skel;
+	int ret, foo;
+	pid_t pid;
+
+	foo = test__join_cgroup(FOO);
+	if (!ASSERT_OK(foo < 0, "cgroup_join_foo"))
+		return;
+
+	skel = test_task_under_cgroup__open();
+	if (!ASSERT_OK_PTR(skel, "test_task_under_cgroup__open"))
+		goto cleanup;
+
+	skel->rodata->local_pid = getpid();
+	skel->bss->remote_pid = getpid();
+	skel->rodata->cgid = get_cgroup_id(FOO);
+
+	ret = test_task_under_cgroup__load(skel);
+	if (!ASSERT_OK(ret, "test_task_under_cgroup__load"))
+		goto cleanup;
+
+	ret = test_task_under_cgroup__attach(skel);
+	if (!ASSERT_OK(ret, "test_task_under_cgroup__attach"))
+		goto cleanup;
+
+	pid = fork();
+	if (pid == 0)
+		exit(0);
+
+	ret = (pid == -1);
+	if (ASSERT_OK(ret, "fork process"))
+		wait(NULL);
+
+	test_task_under_cgroup__detach(skel);
+
+	ASSERT_NEQ(skel->bss->remote_pid, skel->rodata->local_pid,
+		   "test task_under_cgroup");
+
+cleanup:
+	test_task_under_cgroup__destroy(skel);
+	close(foo);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_task_under_cgroup.c b/tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
new file mode 100644
index 000000000000..56cdc0a553f0
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Bytedance */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+#include "bpf_misc.h"
+
+struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
+long bpf_task_under_cgroup(struct task_struct *task, struct cgroup *ancestor) __ksym;
+void bpf_cgroup_release(struct cgroup *p) __ksym;
+struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+const volatile int local_pid;
+const volatile __u64 cgid;
+int remote_pid;
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(handle__task_newtask, struct task_struct *task, u64 clone_flags)
+{
+	struct cgroup *cgrp = NULL;
+	struct task_struct *acquired;
+
+	if (local_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	acquired = bpf_task_acquire(task);
+	if (!acquired)
+		return 0;
+
+	if (local_pid == acquired->tgid)
+		goto out;
+
+	cgrp = bpf_cgroup_from_id(cgid);
+	if (!cgrp)
+		goto out;
+
+	if (bpf_task_under_cgroup(acquired, cgrp))
+		remote_pid = acquired->tgid;
+
+out:
+	if (cgrp)
+		bpf_cgroup_release(cgrp);
+	bpf_task_release(acquired);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
On Thu, May 4, 2023 at 11:08 PM Feng zhou zhoufeng.zf@bytedance.com wrote:
From: Feng Zhou zhoufeng.zf@bytedance.com
test_progs: Tests new kfunc bpf_task_under_cgroup().
<...>
Hi Feng,
I have a comment about the methodology of the test, but the patch looks ok to me. Why do we have to test via a tracing program? I think what we need is just a task and a cgroup. Since we have the kfunc bpf_task_from_pid() and bpf_cgroup_from_id(), we can write a syscall program which takes a pid and a cgroup id as input and get the task and cgroup objects directly in the program.
I like testing via a syscall program because it doesn't depend on the newtask tracepoint and it should be simpler. But I'm ok with the current version of the patch, just have some thoughts.
Hao
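Hao's alternative could be sketched as a SEC("syscall") program driven by BPF_PROG_RUN, taking a pid and a cgroup id as inputs. This is a sketch of the suggested methodology, not code from the patch; the program and variable names here are hypothetical:

```c
// SPDX-License-Identifier: GPL-2.0
/* Sketch of a syscall-program test: no tracepoint dependency. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

struct task_struct *bpf_task_from_pid(s32 pid) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;
struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
void bpf_cgroup_release(struct cgroup *p) __ksym;
long bpf_task_under_cgroup(struct task_struct *task, struct cgroup *ancestor) __ksym;

int target_pid;		/* set by user space before prog_run */
u64 target_cgid;	/* set by user space before prog_run */
long result;		/* read back by user space afterwards */

SEC("syscall")
int check_task_under_cgroup(void *ctx)
{
	struct task_struct *task;
	struct cgroup *cgrp;

	task = bpf_task_from_pid(target_pid);
	if (!task)
		return 0;

	cgrp = bpf_cgroup_from_id(target_cgid);
	if (cgrp) {
		result = bpf_task_under_cgroup(task, cgrp);
		bpf_cgroup_release(cgrp);
	}
	bpf_task_release(task);

	return 0;
}

char _license[] SEC("license") = "GPL";
```

User space would then run the program once via bpf_prog_test_run_opts() and assert on result, which avoids depending on the task_newtask tracepoint entirely.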
On 2023/5/5 15:13, Hao Luo wrote:
On Thu, May 4, 2023 at 11:08 PM Feng zhou zhoufeng.zf@bytedance.com wrote:
<...>
Hi Feng,
I have a comment about the methodology of the test, but the patch looks ok to me. Why do we have to test via a tracing program? I think what we need is just a task and a cgroup. Since we have the kfunc bpf_task_from_pid() and bpf_cgroup_from_id(), we can write a syscall program which takes a pid and a cgroup id as input and get the task and cgroup objects directly in the program.
I like testing via a syscall program because it doesn't depend on the newtask tracepoint and it should be simpler. But I'm ok with the current version of the patch, just have some thoughts.
Hao
Yes, your method is also very good. The reason I did it this way is a previous suggestion from Song: the hook point's parameters should include a task, so I chose this approach for the test.
On 5/5/23 12:24 AM, Feng Zhou wrote:
On 2023/5/5 15:13, Hao Luo wrote:
On Thu, May 4, 2023 at 11:08 PM Feng zhou zhoufeng.zf@bytedance.com wrote:
<...>
Yes, your method is also very good. The reason I did it this way is a previous suggestion from Song: the hook point's parameters should include a task, so I chose this approach for the test.
The motivation of this patch is: when tracing sched-related functions such as enqueue_task_fair(), it is necessary to check whether a specified task, rather than the current task, is within a given cgroup.
So I think it is okay to have a test related to sched.