We have switched to memcg-based memory accounting, so the rlimit is no
longer needed. LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK was introduced in
libbpf for backward compatibility, so we can use it instead now.
This patchset cleans up the usage of RLIMIT_MEMLOCK in tools/bpf/,
tools/testing/selftests/bpf and samples/bpf. The file
tools/testing/selftests/bpf/bpf_rlimit.h is removed. The sys/resource.h
include is removed from many files where it is no longer needed.
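For context, the legacy pattern being removed looks roughly like the
sketch below. It is an illustration rather than a copy of bpf_rlimit.h,
and the function name bump_memlock_rlimit() is hypothetical. It uses only
libc so that it builds without libbpf; the comment names the libbpf call
that this series uses in its place.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Legacy pattern (sketch): raise RLIMIT_MEMLOCK before loading BPF
 * objects so that map and program memory is not rejected.
 *
 * With this series the tools instead call
 *     libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
 * and rely on LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK so that libbpf deals
 * with the rlimit itself on kernels without memcg-based accounting.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
int bump_memlock_rlimit(void)
{
	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };

	if (setrlimit(RLIMIT_MEMLOCK, &rinf))
		return -errno;
	return 0;
}
```

An unprivileged caller should expect -EPERM here, which is one reason to
drop the explicit call wherever the kernel no longer needs it.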
- v3: Get rid of bpf_rlimit.h and fix some typos (Andrii)
- v2: Use libbpf_set_strict_mode instead. (Andrii)
- v1: https://lore.kernel.org/bpf/20220320060815.7716-2-laoar.shao@gmail.com/
Yafang Shao (27):
bpf: selftests: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK in
xdping
bpf: selftests: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK in
xdpxceiver
bpf: selftests: No need to include bpf_rlimit.h in test_tcpnotify_user
bpf: selftests: No need to include bpf_rlimit.h in flow_dissector_load
bpf: selftests: Set libbpf 1.0 API mode explicitly in
get_cgroup_id_user
bpf: selftests: Set libbpf 1.0 API mode explicitly in
test_cgroup_storage
bpf: selftests: Set libbpf 1.0 API mode explicitly in
get_cgroup_id_user
bpf: selftests: Set libbpf 1.0 API mode explicitly in test_lpm_map
bpf: selftests: Set libbpf 1.0 API mode explicitly in test_lru_map
bpf: selftests: Set libbpf 1.0 API mode explicitly in
test_skb_cgroup_id_user
bpf: selftests: Set libbpf 1.0 API mode explicitly in test_sock_addr
bpf: selftests: Set libbpf 1.0 API mode explicitly in test_sock
bpf: selftests: Set libbpf 1.0 API mode explicitly in test_sockmap
bpf: selftests: Set libbpf 1.0 API mode explicitly in test_sysctl
bpf: selftests: Set libbpf 1.0 API mode explicitly in test_tag
bpf: selftests: Set libbpf 1.0 API mode explicitly in
test_tcp_check_syncookie_user
bpf: selftests: Set libbpf 1.0 API mode explicitly in
test_verifier_log
bpf: samples: Set libbpf 1.0 API mode explicitly in hbm
bpf: selftests: Get rid of bpf_rlimit.h
bpf: selftests: No need to include sys/resource.h in some files
bpf: samples: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK in
xdpsock_user
bpf: samples: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK in
xsk_fwd
bpf: samples: No need to include sys/resource.h in many files
bpf: bpftool: Remove useless return value of libbpf_set_strict_mode
bpf: bpftool: Set LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK for legacy libbpf
bpf: bpftool: remove RLIMIT_MEMLOCK
bpf: runqslower: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK
samples/bpf/cpustat_user.c | 1 -
samples/bpf/hbm.c | 5 ++--
samples/bpf/ibumad_user.c | 1 -
samples/bpf/map_perf_test_user.c | 1 -
samples/bpf/offwaketime_user.c | 1 -
samples/bpf/sockex2_user.c | 1 -
samples/bpf/sockex3_user.c | 1 -
samples/bpf/spintest_user.c | 1 -
samples/bpf/syscall_tp_user.c | 1 -
samples/bpf/task_fd_query_user.c | 1 -
samples/bpf/test_lru_dist.c | 1 -
samples/bpf/test_map_in_map_user.c | 1 -
samples/bpf/test_overhead_user.c | 1 -
samples/bpf/tracex2_user.c | 1 -
samples/bpf/tracex3_user.c | 1 -
samples/bpf/tracex4_user.c | 1 -
samples/bpf/tracex5_user.c | 1 -
samples/bpf/tracex6_user.c | 1 -
samples/bpf/xdp1_user.c | 1 -
samples/bpf/xdp_adjust_tail_user.c | 1 -
samples/bpf/xdp_monitor_user.c | 1 -
samples/bpf/xdp_redirect_cpu_user.c | 1 -
samples/bpf/xdp_redirect_map_multi_user.c | 1 -
samples/bpf/xdp_redirect_user.c | 1 -
samples/bpf/xdp_router_ipv4_user.c | 1 -
samples/bpf/xdp_rxq_info_user.c | 1 -
samples/bpf/xdp_sample_pkts_user.c | 1 -
samples/bpf/xdp_sample_user.c | 1 -
samples/bpf/xdp_tx_iptunnel_user.c | 1 -
samples/bpf/xdpsock_user.c | 9 ++----
samples/bpf/xsk_fwd.c | 7 ++---
tools/bpf/bpftool/common.c | 8 ------
tools/bpf/bpftool/feature.c | 2 --
tools/bpf/bpftool/main.c | 6 ++--
tools/bpf/bpftool/main.h | 2 --
tools/bpf/bpftool/map.c | 2 --
tools/bpf/bpftool/pids.c | 1 -
tools/bpf/bpftool/prog.c | 3 --
tools/bpf/bpftool/struct_ops.c | 2 --
tools/bpf/runqslower/runqslower.c | 18 ++----------
tools/testing/selftests/bpf/bench.c | 1 -
tools/testing/selftests/bpf/bpf_rlimit.h | 28 -------------------
.../selftests/bpf/flow_dissector_load.c | 6 ++--
.../selftests/bpf/get_cgroup_id_user.c | 4 ++-
tools/testing/selftests/bpf/prog_tests/btf.c | 1 -
.../selftests/bpf/test_cgroup_storage.c | 4 ++-
tools/testing/selftests/bpf/test_dev_cgroup.c | 4 ++-
tools/testing/selftests/bpf/test_lpm_map.c | 4 ++-
tools/testing/selftests/bpf/test_lru_map.c | 4 ++-
.../selftests/bpf/test_skb_cgroup_id_user.c | 4 ++-
tools/testing/selftests/bpf/test_sock.c | 4 ++-
tools/testing/selftests/bpf/test_sock_addr.c | 4 ++-
tools/testing/selftests/bpf/test_sockmap.c | 5 ++--
tools/testing/selftests/bpf/test_sysctl.c | 4 ++-
tools/testing/selftests/bpf/test_tag.c | 4 ++-
.../bpf/test_tcp_check_syncookie_user.c | 4 ++-
.../selftests/bpf/test_tcpnotify_user.c | 1 -
.../testing/selftests/bpf/test_verifier_log.c | 5 ++--
.../selftests/bpf/xdp_redirect_multi.c | 1 -
tools/testing/selftests/bpf/xdping.c | 8 ++----
tools/testing/selftests/bpf/xdpxceiver.c | 6 ++--
61 files changed, 57 insertions(+), 142 deletions(-)
delete mode 100644 tools/testing/selftests/bpf/bpf_rlimit.h
--
2.17.1
eBPF already allows programs to be preloaded and kept running without
intervention from user space. There is a dedicated kernel module called
bpf_preload, which contains the light skeleton of the iterators_bpf eBPF
program. If this module is enabled in the kernel configuration, its loading
will be triggered when the bpf filesystem is mounted (unless the module is
built-in), and the links of iterators_bpf are pinned in that filesystem
(they will appear as the progs.debug and maps.debug files).
However, the current mechanism, if used to preload an LSM, would not offer
the same security guarantees as LSMs integrated in the security subsystem.
Also, it is not generic enough to be used for preloading arbitrary eBPF
programs, unless the bpf_preload code is heavily modified.
More specifically, the security problems are:
- any program can be pinned to the bpf filesystem without limitations
(unless a MAC mechanism enforces some restrictions);
- programs being executed can be terminated at any time by deleting the
pinned objects or unmounting the bpf filesystem.
The usability problems are:
- only a fixed number of links can be pinned;
- only links can be pinned, other object types are not supported;
- code to pin objects has to be written manually;
- preloading multiple eBPF programs is not practical, since bpf_preload has
to be modified to include additional light skeletons.
Solve the security problems by mounting the bpf filesystem from the kernel,
by preloading authenticated kernel modules (e.g. with module.sig_enforce)
and by pinning objects to that filesystem. This particular filesystem
instance guarantees that desired eBPF programs run until the very end of
the kernel lifecycle, since even root cannot interfere with it.
Solve the usability problems by generalizing the pinning function, to
handle not only links but also maps and progs. Also increment the object
reference count and call the pinning function directly from the preload
method (currently in the bpf_preload kernel module) rather than from the
bpf filesystem code itself, so that a generic eBPF program can do those
operations depending on its objects (this also avoids the limitation of the
fixed-size array for storing the objects to pin).
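The generalized pinning step described above can be modeled in user space
as follows. This is only a sketch under stated assumptions: the types and
the demo_pin_object() function are hypothetical stand-ins, not the kernel
interfaces touched by the patches. It shows one function accepting links,
maps and progs, and taking a reference before recording the pin.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical object kinds; the series covers links, maps and progs. */
enum demo_obj_type { DEMO_OBJ_PROG, DEMO_OBJ_MAP, DEMO_OBJ_LINK };

/* Hypothetical refcounted object standing in for a BPF object. */
struct demo_obj {
	enum demo_obj_type type;
	int refcnt;
};

/* Hypothetical record of a pinned name in the filesystem. */
struct demo_pin {
	const char *name;
	struct demo_obj *obj;
};

/*
 * Pin any supported object type under @name. The reference is taken
 * here, by the caller-side helper, rather than by the filesystem code.
 * Returns 0 on success or -EINVAL for bad arguments or unknown types.
 */
int demo_pin_object(struct demo_obj *obj, const char *name,
		    struct demo_pin *pin)
{
	if (!obj || !name || !pin)
		return -EINVAL;

	switch (obj->type) {
	case DEMO_OBJ_PROG:
	case DEMO_OBJ_MAP:
	case DEMO_OBJ_LINK:
		break;
	default:
		return -EINVAL;
	}

	obj->refcnt++;		/* the pin holds its own reference */
	pin->name = name;
	pin->obj = obj;
	return 0;
}
```

Because each program calls the helper for its own objects, nothing in this
model depends on a fixed-size array of objects to pin.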
Then, simplify the process of pinning objects defined by a generic eBPF
program by automatically generating the required methods in the light
skeleton. Also, generate a separate kernel module for each eBPF program to
preload, so that existing ones don't have to be modified. Finally, support
preloading multiple eBPF programs by allowing users to specify a list from
the kernel configuration, at build time, or with the new kernel option
bpf_preload_list=, at run-time.
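Supporting several preloadable programs amounts to keeping a list of
registered operations and looking entries up by name. The sketch below is
a user-space model of that idea. The struct layout and both function names
are assumptions made for illustration and do not reproduce the
bpf_preload_ops definition from the series.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-program registration entry. */
struct demo_preload_ops {
	const char *name;		/* program name used for lookup */
	int (*preload)(void);		/* load and pin the program's objects */
	struct demo_preload_ops *next;	/* singly linked list of entries */
};

static struct demo_preload_ops *demo_preload_head;

/* Add an entry at the head of the list; no fixed upper bound applies. */
void demo_register_preload_ops(struct demo_preload_ops *ops)
{
	ops->next = demo_preload_head;
	demo_preload_head = ops;
}

/* Return the entry registered under @name, or NULL if there is none. */
struct demo_preload_ops *demo_find_preload_ops(const char *name)
{
	struct demo_preload_ops *ops;

	for (ops = demo_preload_head; ops; ops = ops->next)
		if (strcmp(ops->name, name) == 0)
			return ops;
	return NULL;
}
```

In this model each generated module would register one entry, and the
names taken from the configured list would be resolved with the lookup.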
To summarize, this patch set makes it possible to plug in out-of-tree LSMs
matching the security guarantees of their counterparts in the security
subsystem, without having to modify the kernel itself. The same benefits
are extended to other eBPF program types.
The one remaining problem is how to support auto-attaching eBPF programs
of the LSM type. It will be addressed in a separate patch set.
Patches 1-2 export some definitions, to build out-of-tree kernel modules
with eBPF programs to preload. Patches 3-4 allow eBPF programs to pin
objects by themselves. Patches 5-10 automatically generate the methods for
preloading in the light skeleton. Patches 11-14 make it possible to preload
multiple eBPF programs. Patch 15 automatically generates the kernel module
for preloading an eBPF program, patch 16 does a kernel mount of the bpf
filesystem, and finally patches 17-18 test the functionality introduced.
Roberto Sassu (18):
bpf: Export bpf_link_inc()
bpf-preload: Move bpf_preload.h to include/linux
bpf-preload: Generalize object pinning from the kernel
bpf-preload: Export and call bpf_obj_do_pin_kernel()
bpf-preload: Generate static variables
bpf-preload: Generate free_objs_and_skel()
bpf-preload: Generate preload()
bpf-preload: Generate load_skel()
bpf-preload: Generate code to pin non-internal maps
bpf-preload: Generate bpf_preload_ops
bpf-preload: Store multiple bpf_preload_ops structures in a linked
list
bpf-preload: Implement new registration method for preloading eBPF
programs
bpf-preload: Move pinned links and maps to a dedicated directory in
bpffs
bpf-preload: Switch to new preload registration method
bpf-preload: Generate code of kernel module to preload
bpf-preload: Do kernel mount to ensure that pinned objects don't
disappear
bpf-preload/selftests: Add test for automatic generation of preload
methods
bpf-preload/selftests: Preload a test eBPF program and check pinned
objects
.../admin-guide/kernel-parameters.txt | 8 +
fs/namespace.c | 1 +
include/linux/bpf.h | 5 +
include/linux/bpf_preload.h | 37 ++
init/main.c | 2 +
kernel/bpf/inode.c | 295 +++++++++--
kernel/bpf/preload/Kconfig | 25 +-
kernel/bpf/preload/bpf_preload.h | 16 -
kernel/bpf/preload/bpf_preload_kern.c | 85 +---
kernel/bpf/preload/iterators/Makefile | 9 +-
.../bpf/preload/iterators/iterators.lskel.h | 466 +++++++++++-------
kernel/bpf/syscall.c | 1 +
.../bpf/bpftool/Documentation/bpftool-gen.rst | 13 +
tools/bpf/bpftool/bash-completion/bpftool | 6 +-
tools/bpf/bpftool/gen.c | 331 +++++++++++++
tools/bpf/bpftool/main.c | 7 +-
tools/bpf/bpftool/main.h | 1 +
tools/testing/selftests/bpf/Makefile | 32 +-
.../bpf/bpf_testmod_preload/.gitignore | 7 +
.../bpf/bpf_testmod_preload/Makefile | 20 +
.../gen_preload_methods.expected.diff | 97 ++++
.../bpf/prog_tests/test_gen_preload_methods.c | 27 +
.../bpf/prog_tests/test_preload_methods.c | 69 +++
.../selftests/bpf/progs/gen_preload_methods.c | 23 +
24 files changed, 1246 insertions(+), 337 deletions(-)
create mode 100644 include/linux/bpf_preload.h
delete mode 100644 kernel/bpf/preload/bpf_preload.h
create mode 100644 tools/testing/selftests/bpf/bpf_testmod_preload/.gitignore
create mode 100644 tools/testing/selftests/bpf/bpf_testmod_preload/Makefile
create mode 100644 tools/testing/selftests/bpf/prog_tests/gen_preload_methods.expected.diff
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_gen_preload_methods.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_preload_methods.c
create mode 100644 tools/testing/selftests/bpf/progs/gen_preload_methods.c
--
2.32.0
There are some issues in parse_num_list():
1. The end variable is assigned twice when parsing_end is true.
2. The function does not check that parsing_end is false once parsing
finishes, so a list that ends in a dangling '-' can be accepted.
Clean up parse_num_list() and fix these issues.
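For illustration, a standalone sketch of the parsing logic after the fix
is shown below. It is not the selftest helper itself: the name
parse_num_list_sketch() is hypothetical, and the input guard and the
freeing of the set on error are additions made here for safety that are
not part of this patch.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a list such as "0,2-4" into a boolean membership array. */
int parse_num_list_sketch(const char *s, bool **num_set, int *num_set_len)
{
	int i, set_len = 0, new_len, start = 0, end = -1, err = -EINVAL;
	bool *set = NULL, *tmp, parsing_end = false;
	char *next;
	long num;

	while (s[0]) {
		errno = 0;
		num = strtol(s, &next, 10);
		if (errno) {
			err = -errno;
			goto fail;
		}
		/* Extra guard (not in the patch): need digits, no negatives. */
		if (next == s || num < 0 || num >= INT_MAX)
			goto fail;

		if (!parsing_end) {
			start = (int)num;
			if (*next == '-') {
				s = next + 1;
				parsing_end = true;
				continue;
			}
		}

		if (*next == ',')
			s = next + 1;
		else if (*next == '\0')
			s = next;
		else
			goto fail;

		parsing_end = false;
		end = (int)num;		/* end is assigned exactly once */
		if (start > end)
			goto fail;

		if (end + 1 > set_len) {
			new_len = end + 1;
			tmp = realloc(set, new_len * sizeof(*set));
			if (!tmp) {
				err = -ENOMEM;
				goto fail;
			}
			for (i = set_len; i < start; i++)
				tmp[i] = false;
			set = tmp;
			set_len = new_len;
		}
		for (i = start; i <= end; i++)
			set[i] = true;
	}

	/* A dangling '-' leaves parsing_end set: reject the input. */
	if (!set || parsing_end)
		goto fail;

	*num_set = set;
	*num_set_len = set_len;
	return 0;

fail:
	free(set);
	return err;
}
```

With this logic, "0,2-4" yields a five-element set with indices 0, 2, 3
and 4 marked, while "1,2-" is rejected with -EINVAL.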
Signed-off-by: Yuntao Wang <ytcoode(a)gmail.com>
---
tools/testing/selftests/bpf/testing_helpers.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/bpf/testing_helpers.c b/tools/testing/selftests/bpf/testing_helpers.c
index 795b6798ccee..82f0e2d99c23 100644
--- a/tools/testing/selftests/bpf/testing_helpers.c
+++ b/tools/testing/selftests/bpf/testing_helpers.c
@@ -20,16 +20,16 @@ int parse_num_list(const char *s, bool **num_set, int *num_set_len)
if (errno)
return -errno;
- if (parsing_end)
- end = num;
- else
+ if (!parsing_end) {
start = num;
+ if (*next == '-') {
+ s = next + 1;
+ parsing_end = true;
+ continue;
+ }
+ }
- if (!parsing_end && *next == '-') {
- s = next + 1;
- parsing_end = true;
- continue;
- } else if (*next == ',') {
+ if (*next == ',') {
parsing_end = false;
s = next + 1;
end = num;
@@ -60,7 +60,7 @@ int parse_num_list(const char *s, bool **num_set, int *num_set_len)
set[i] = true;
}
- if (!set)
+ if (!set || parsing_end)
return -EINVAL;
*num_set = set;
--
2.35.1
This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcement.
Changelog:
v4:
Skip test if not run as root per Shuah Khan
Add better test logging for abnormal child termination per Shuah Khan
Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný
Adjust gpucg_try_charge critical section for charge transfer functionality
Fix uninitialized return code error for dmabuf_try_charge error case
v3:
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz
Use more common dual author commit message format per John Stultz
Remove android from binder changes title per Todd Kjos
Add a kselftest for this new behavior per Greg Kroah-Hartman
Include details on behavior for all combinations of kernel/userspace
versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.
Fix pid and uid types in binder UAPI header
v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.
Fix incorrect Kconfig help section indentation per Randy Dunlap.
History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. In order to help establish a unified memory accounting model
for all GPUs and related subsystems, Daniel Vetter put forth a
suggestion to move it out of the DRM subsystem so that it can be used by
other DMA-BUF exporters as well[3]. This RFC proposes an interface that
does the same.
[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-…
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.co…
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
Hridya Valsaraju (5):
gpu: rfc: Proposal for a GPU cgroup controller
cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
memory
dmabuf: heaps: export system_heap buffers with GPU cgroup charging
dmabuf: Add gpu cgroup charge transfer function
binder: Add a buffer flag to relinquish ownership of fds
T.J. Mercier (3):
dmabuf: Use the GPU cgroup charge/uncharge APIs
binder: use __kernel_pid_t and __kernel_uid_t for userspace
selftests: Add binder cgroup gpu memory transfer test
Documentation/gpu/rfc/gpu-cgroup.rst | 183 +++++++
Documentation/gpu/rfc/index.rst | 4 +
drivers/android/binder.c | 26 +
drivers/dma-buf/dma-buf.c | 107 ++++
drivers/dma-buf/dma-heap.c | 27 +
drivers/dma-buf/heaps/system_heap.c | 3 +
include/linux/cgroup_gpu.h | 139 +++++
include/linux/cgroup_subsys.h | 4 +
include/linux/dma-buf.h | 22 +-
include/linux/dma-heap.h | 11 +
include/uapi/linux/android/binder.h | 5 +-
init/Kconfig | 7 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/gpu.c | 362 +++++++++++++
.../selftests/drivers/android/binder/Makefile | 8 +
.../drivers/android/binder/binder_util.c | 254 +++++++++
.../drivers/android/binder/binder_util.h | 32 ++
.../selftests/drivers/android/binder/config | 4 +
.../binder/test_dmabuf_cgroup_transfer.c | 484 ++++++++++++++++++
19 files changed, 1679 insertions(+), 4 deletions(-)
create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
create mode 100644 include/linux/cgroup_gpu.h
create mode 100644 kernel/cgroup/gpu.c
create mode 100644 tools/testing/selftests/drivers/android/binder/Makefile
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.c
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.h
create mode 100644 tools/testing/selftests/drivers/android/binder/config
create mode 100644 tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c
--
2.35.1.1021.g381101b075-goog