This series attempts to provide a simple way for BPF programs (and in
future other consumers) to utilize BPF Type Format (BTF) information
to display kernel data structures in-kernel. The use case this
functionality is applied to here is to support a snprintf()-like
helper to copy a BTF representation of kernel data to a string,
and a BPF seq file helper to display BTF data for an iterator.
There is already support in kernel/bpf/btf.c for "show" functionality;
the changes here generalize that support from seq-file specific
verifier display to the more generic case and add another specific
use case; rather than seq_printf()ing the show data, it is copied
to a supplied string using a snprintf()-like function. Other future
consumers of the show functionality could include a bpf_printk_btf()
function which printk()ed the data instead. Oops messaging in
particular would be an interesting application for such functionality.
The above potential use case hints at a potential reply to
a reasonable objection that such typed display should be
solved by tracing programs, where the in-kernel tracing records
data and the userspace program prints it out. While this
is certainly the recommended approach for most cases, I
believe having an in-kernel mechanism would be valuable
also. Critically in BPF programs it greatly simplifies
debugging and tracing of such data to invoking a simple
helper.
One challenge raised in an earlier iteration of this work -
where the BTF printing was implemented as a printk() format
specifier - was that the amount of data printed per
printk() was large, and other format specifiers were far
simpler. Here we sidestep that concern by printing
components of the BTF representation as we go for the
seq file case, and in the string case the snprintf()-like
operation is intended to be a basis for perf event or
ringbuf output. The reasons for avoiding bpf_trace_printk
are that
1. bpf_trace_printk() strings are restricted in size and
cannot display anything beyond trivial data structures; and
2. bpf_trace_printk() is for debugging purposes only.
As Alexei suggested, a bpf_trace_puts() helper could solve
this in the future but it still would be limited by the
1000 byte limit for traced strings.
Default output for an sk_buff looks like this (zeroed fields
are omitted):
(struct sk_buff){
.transport_header = (__u16)65535,
.mac_header = (__u16)65535,
.end = (sk_buff_data_t)192,
.head = (unsigned char *)0x000000007524fd8b,
.data = (unsigned char *)0x000000007524fd8b,
.truesize = (unsigned int)768,
.users = (refcount_t){
.refs = (atomic_t){
.counter = (int)1,
},
},
}
Flags can modify aspects of output format; see patch 3
for more details.
Changes since v4:
- Changed approach from a BPF trace event-centric design to one
utilizing a snprintf()-like helper and an iter helper (Alexei,
patches 3,5)
- Added tests to verify BTF output (patch 4)
- Added support to tests for verifying BTF type_id-based display
as well as type name via __builtin_btf_type_id (Andrii, patch 4).
- Augmented task iter tests to cover the BTF-based seq helper.
Because a task_struct's BTF-based representation would overflow
the PAGE_SIZE limit on iterator data, the "struct fs_struct"
(task->fs) is displayed for each task instead (Alexei, patch 6).
Changes since v3:
- Moved to RFC since the approach is different (and bpf-next is
closed)
- Rather than using a printk() format specifier as the means
of invoking BTF-enabled display, a dedicated BPF helper is
used. This solves the issue of printk() having to output
large amounts of data using a complex mechanism such as
BTF traversal, but still provides a way for the display of
such data to be achieved via BPF programs. Future work could
include a bpf_printk_btf() function to invoke display via
printk() where the elements of a data structure are printk()ed
one at a time. Thanks to Petr Mladek, Andy Shevchenko and
Rasmus Villemoes who took time to look at the earlier printk()
format-specifier-focused version of this and provided feedback
clarifying the problems with that approach.
- Added trace id to the bpf_trace_printk events as a means of
separating output from standard bpf_trace_printk() events,
ensuring it can be easily parsed by the reader.
- Added bpf_trace_btf() helper tests which do simple verification
of the various display options.
Changes since v2:
- Alexei and Yonghong suggested it would be good to use
probe_kernel_read() on to-be-shown data to ensure safety
during operation. Safe copy via probe_kernel_read() to a
buffer object in "struct btf_show" is used to support
this. A few different approaches were explored
including dynamic allocation and per-cpu buffers. The
downside of dynamic allocation is that it would be done
during BPF program execution for bpf_trace_printk()s using
%pT format specifiers. The problem with per-cpu buffers
is we'd have to manage preemption and since the display
of an object occurs over an extended period and in printk
context where we'd rather not change preemption status,
it seemed tricky to manage buffer safety while considering
preemption. The approach of utilizing stack buffer space
via the "struct btf_show" seemed like the simplest approach.
The stack size of the associated functions which have a
"struct btf_show" on their stack to support show operation
(btf_type_snprintf_show() and btf_type_seq_show()) stays
under 500 bytes. The compromise here is the safe buffer we
use is small - 256 bytes - and as a result multiple
probe_kernel_read()s are needed for larger objects. Most
objects of interest are smaller than this (e.g.
"struct sk_buff" is 224 bytes), and while task_struct is a
notable exception at ~8K, performance is not the priority for
BTF-based display. (Alexei and Yonghong, patch 2).
- safe buffer use is the default behaviour (and is mandatory
for BPF) but unsafe display - meaning no safe copy is done
and we operate on the object itself - is supported via a
'u' option.
- pointers are prefixed with 0x for clarity (Alexei, patch 2)
- added additional comments and explanations around BTF show
code, especially around determining whether objects such
zeroed. Also tried to comment safe object scheme used. (Yonghong,
patch 2)
- added late_initcall() to initialize vmlinux BTF so that it would
not have to be initialized during printk operation (Alexei,
patch 5)
- removed CONFIG_BTF_PRINTF config option as it is not needed;
CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and
determining behaviour of type-based printk can be done via
retrieval of BTF data; if it's not there BTF was unavailable
or broken (Alexei, patches 4,6)
- fix bpf_trace_printk test to use vmlinux.h and globals via
skeleton infrastructure, removing need for perf events
(Andrii, patch 8)
Changes since v1:
- changed format to be more drgn-like, rendering indented type info
along with type names by default (Alexei)
- zeroed values are omitted (Arnaldo) by default unless the '0'
modifier is specified (Alexei)
- added an option to print pointer values without obfuscation.
The reason to do this is the sysctls controlling pointer display
are likely to be irrelevant in many if not most tracing contexts.
Some questions on this in the outstanding questions section below...
- reworked printk format specifer so that we no longer rely on format
%pT<type> but instead use a struct * which contains type information
(Rasmus). This simplifies the printk parsing, makes use more dynamic
and also allows specification by BTF id as well as name.
- removed incorrect patch which tried to fix dereferencing of resolved
BTF info for vmlinux; instead we skip modifiers for the relevant
case (array element type determination) (Alexei).
- fixed issues with negative snprintf format length (Rasmus)
- added test cases for various data structure formats; base types,
typedefs, structs, etc.
- tests now iterate through all typedef, enum, struct and unions
defined for vmlinux BTF and render a version of the target dummy
value which is either all zeros or all 0xff values; the idea is this
exercises the "skip if zero" and "print everything" cases.
- added support in BPF for using the %pT format specifier in
bpf_trace_printk()
- added BPF tests which ensure %pT format specifier use works (Alexei).
Alan Maguire (6):
bpf: provide function to get vmlinux BTF information
bpf: move to generic BTF show support, apply it to seq files/strings
bpf: add bpf_btf_snprintf helper
selftests/bpf: add bpf_btf_snprintf helper tests
bpf: add bpf_seq_btf_write helper
selftests/bpf: add test for bpf_seq_btf_write helper
include/linux/bpf.h | 3 +
include/linux/btf.h | 40 +
include/uapi/linux/bpf.h | 78 ++
kernel/bpf/btf.c | 978 ++++++++++++++++++---
kernel/bpf/helpers.c | 4 +
kernel/bpf/verifier.c | 18 +-
kernel/trace/bpf_trace.c | 133 +++
scripts/bpf_helpers_doc.py | 2 +
tools/include/uapi/linux/bpf.h | 78 ++
tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 66 ++
.../selftests/bpf/prog_tests/btf_snprintf.c | 55 ++
.../selftests/bpf/progs/bpf_iter_task_btf.c | 49 ++
.../selftests/bpf/progs/netif_receive_skb.c | 260 ++++++
13 files changed, 1656 insertions(+), 108 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_snprintf.c
create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_task_btf.c
create mode 100644 tools/testing/selftests/bpf/progs/netif_receive_skb.c
--
1.8.3.1
v1: https://lore.kernel.org/lkml/20200912110820.597135-1-keescook@chromium.org
v2:
- Took Acked patches into -next
- refactored powerpc syscall setting implementation
- refactored clone3 args implementation
Hi,
This finishes the refactoring of the seccomp selftest logic used in
for ptrace syscall number/return handling for powerpc. Additionally
fixes clone3 (which seccomp depends on for testing) to run under MIPS
where an old struct clone_args has become visible.
(FWIW, I expect to take these via the seccomp tree.)
Thanks,
Kees Cook (4):
selftests/seccomp: Record syscall during ptrace entry
selftests/seccomp: Allow syscall nr and ret value to be set separately
selftests/seccomp: powerpc: Set syscall return during ptrace syscall
exit
selftests/clone3: Avoid OS-defined clone_args
tools/testing/selftests/clone3/clone3.c | 45 +++----
.../clone3/clone3_cap_checkpoint_restore.c | 4 +-
.../selftests/clone3/clone3_clear_sighand.c | 2 +-
.../selftests/clone3/clone3_selftests.h | 24 ++--
.../testing/selftests/clone3/clone3_set_tid.c | 4 +-
tools/testing/selftests/seccomp/seccomp_bpf.c | 120 ++++++++++++++----
6 files changed, 131 insertions(+), 68 deletions(-)
--
2.25.1
Hi,
This refactors the seccomp selftest macros used in change_syscall(),
in an effort to remove special cases for mips, arm, arm64, and xtensa,
which paves the way for powerpc fixes.
I'm not entirely done testing, but all-arch build tests and x86_64
selftests pass. I'll be doing arm, arm64, and i386 selftests shortly,
but I currently don't have an easy way to check xtensa, mips, nor
powerpc. Any help there would be appreciated!
(FWIW, I expect to take these via the seccomp tree.)
Thanks,
-Kees
Kees Cook (15):
selftests/seccomp: Refactor arch register macros to avoid xtensa
special case
selftests/seccomp: Provide generic syscall setting macro
selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
selftests/seccomp: mips: Remove O32-specific macro
selftests/seccomp: Remove syscall setting #ifdefs
selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
selftests/seccomp: Avoid redundant register flushes
selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of
SYSCALL_RET_SET
selftests/seccomp: powerpc: Fix seccomp return value testing
selftests/seccomp: powerpc: Set syscall return during ptrace syscall
exit
selftests/clone3: Avoid OS-defined clone_args
selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
.../selftests/clone3/clone3_selftests.h | 16 +-
tools/testing/selftests/seccomp/seccomp_bpf.c | 313 ++++++++++--------
2 files changed, 184 insertions(+), 145 deletions(-)
--
2.25.1
This series imports a series of tests for FPSIMD and SVE originally
written by Dave Martin to the tree. Since these extensions have some
overlap in terms of register usage and must sometimes be tested together
they're dropped into a single directory. I've adapted some of the tests
to run within the kselftest framework but there are also some stress
tests here that are intended to be run as soak tests so aren't suitable
for running by default and are mostly just integrated with the build
system. There doesn't seem to be a more suitable home for those stress
tests and they are very useful for work on these areas of the code so it
seems useful to have them somewhere in tree.
v2: Rebased onto v5.9-rc1
Mark Brown (6):
selftests: arm64: Test case for enumeration of SVE vector lengths
selftests: arm64: Add test for the SVE ptrace interface
selftests: arm64: Add stress tests for FPSMID and SVE context
switching
selftests: arm64: Add utility to set SVE vector lengths
selftests: arm64: Add wrapper scripts for stress tests
selftests: arm64: Add build and documentation for FP tests
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/fp/.gitignore | 5 +
tools/testing/selftests/arm64/fp/Makefile | 17 +
tools/testing/selftests/arm64/fp/README | 100 +++
.../testing/selftests/arm64/fp/asm-offsets.h | 11 +
tools/testing/selftests/arm64/fp/assembler.h | 57 ++
.../testing/selftests/arm64/fp/fpsimd-stress | 60 ++
.../testing/selftests/arm64/fp/fpsimd-test.S | 482 +++++++++++++
.../selftests/arm64/fp/sve-probe-vls.c | 58 ++
.../selftests/arm64/fp/sve-ptrace-asm.S | 33 +
tools/testing/selftests/arm64/fp/sve-ptrace.c | 336 +++++++++
tools/testing/selftests/arm64/fp/sve-stress | 59 ++
tools/testing/selftests/arm64/fp/sve-test.S | 672 ++++++++++++++++++
tools/testing/selftests/arm64/fp/vlset.c | 155 ++++
14 files changed, 2046 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/fp/.gitignore
create mode 100644 tools/testing/selftests/arm64/fp/Makefile
create mode 100644 tools/testing/selftests/arm64/fp/README
create mode 100644 tools/testing/selftests/arm64/fp/asm-offsets.h
create mode 100644 tools/testing/selftests/arm64/fp/assembler.h
create mode 100755 tools/testing/selftests/arm64/fp/fpsimd-stress
create mode 100644 tools/testing/selftests/arm64/fp/fpsimd-test.S
create mode 100644 tools/testing/selftests/arm64/fp/sve-probe-vls.c
create mode 100644 tools/testing/selftests/arm64/fp/sve-ptrace-asm.S
create mode 100644 tools/testing/selftests/arm64/fp/sve-ptrace.c
create mode 100755 tools/testing/selftests/arm64/fp/sve-stress
create mode 100644 tools/testing/selftests/arm64/fp/sve-test.S
create mode 100644 tools/testing/selftests/arm64/fp/vlset.c
--
2.20.1
Pointer Authentication (PAuth) is a security feature introduced in ARMv8.3.
It introduces instructions to sign addresses and later check for potential
corruption using a second modifier value and one of a set of keys. The
signature, in the form of the Pointer Authentication Code (PAC), is stored
in some of the top unused bits of the virtual address (e.g. [54: 49] if
TBID0 is enabled and TnSZ is set to use a 48 bit VA space). A set of
controls are present to enable/disable groups of instructions (which use
certain keys) for compatibility with libraries that do not utilize the
feature. PAuth is used to verify the integrity of return addresses on the
stack with less memory than the stack canary.
This patchset adds kselftests to verify the kernel's configuration of the
feature and its runtime behaviour. There are 7 tests which verify that:
* an authentication failure leads to a SIGSEGV
* the data/instruction instruction groups are enabled
* the generic instructions are enabled
* all 5 keys are different for a single thread
* exec() changes all keys to new different ones
* context switching preserves the 4 data/instruction keys
* context switching preserves the generic keys
The tests have been verified to work on qemu without a working PAUTH
Implementation and on ARM's FVP with a full or partial PAuth
implementation.
Changes in v3:
* remove double blank lines
* Patch 1: "kselftests: add a basic arm64 Pointer Authentication test"
* shorten pac_corruptor.S to cut out unnecessary code
* add second signal handler to cover ARMv8.6 compatibily
* Patch 3: "kselftests/arm64: add PAuth test for whether exec() changes keys"
* change name of "exec_unique_keys" to "exec_changed_keys"
* change reporting of error to be how many keys were left unchanged
* Path 4: "kselftests/arm64: add PAuth tests for single threaded consistency and key uniqueness"
* change unique to different
* rename "single_thread_unique_keys" to "single_thread_different_keys"
* change reporting of error to be how many keys were left unchanged
Changes in v2:
* remove extra lines at end of files
* Patch 1: "kselftests: add a basic arm64 Pointer Authentication test"
* add checks for a compatible compiler in Makefile
* Patch 4: "kselftests: add PAuth tests for single threaded consistency and
key uniqueness"
* rephrase comment for clarity in pac.c
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino(a)arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap(a)arm.com>
Signed-off-by: Boyan Karatotev <boyan.karatotev(a)arm.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
Boyan Karatotev (4):
kselftests/arm64: add a basic Pointer Authentication test
kselftests/arm64: add nop checks for PAuth tests
kselftests/arm64: add PAuth test for whether exec() changes keys
kselftests/arm64: add PAuth tests for single threaded consistency and
differently initialized keys
tools/testing/selftests/arm64/Makefile | 2 +-
.../testing/selftests/arm64/pauth/.gitignore | 2 +
tools/testing/selftests/arm64/pauth/Makefile | 39 ++
.../selftests/arm64/pauth/exec_target.c | 34 ++
tools/testing/selftests/arm64/pauth/helper.c | 39 ++
tools/testing/selftests/arm64/pauth/helper.h | 28 ++
tools/testing/selftests/arm64/pauth/pac.c | 368 ++++++++++++++++++
.../selftests/arm64/pauth/pac_corruptor.S | 19 +
8 files changed, 530 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/pauth/.gitignore
create mode 100644 tools/testing/selftests/arm64/pauth/Makefile
create mode 100644 tools/testing/selftests/arm64/pauth/exec_target.c
create mode 100644 tools/testing/selftests/arm64/pauth/helper.c
create mode 100644 tools/testing/selftests/arm64/pauth/helper.h
create mode 100644 tools/testing/selftests/arm64/pauth/pac.c
create mode 100644 tools/testing/selftests/arm64/pauth/pac_corruptor.S
--
2.28.0
The test harness forks() a child to run each test. Both the parent and
the child print to stdout using libc functions. That can lead to
duplicated (or more) output if the libc buffers are not flushed before
forking.
It's generally not seen when running programs directly, because stdout
will usually be line buffered when it's pointing to a terminal.
This was noticed when running the seccomp_bpf test, eg:
$ ./seccomp_bpf | tee test.log
$ grep -c "TAP version 13" test.log
2
But we only expect the TAP header to appear once.
It can be exacerbated using stdbuf to increase the buffer size:
$ stdbuf -o 1MB ./seccomp_bpf > test.log
$ grep -c "TAP version 13" test.log
13
The fix is simple, we just flush stdout & stderr before fork. Usually
stderr is unbuffered, but that can be changed, so flush it as well
just to be safe.
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
tools/testing/selftests/kselftest_harness.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 4f78e4805633..f19804df244c 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -971,6 +971,11 @@ void __run_test(struct __fixture_metadata *f,
ksft_print_msg(" RUN %s%s%s.%s ...\n",
f->name, variant->name[0] ? "." : "", variant->name, t->name);
+
+ /* Make sure output buffers are flushed before fork */
+ fflush(stdout);
+ fflush(stderr);
+
t->pid = fork();
if (t->pid < 0) {
ksft_print_msg("ERROR SPAWNING TEST CHILD\n");
--
2.25.1
Previously it was not possible to make a distinction between plain TCP
sockets and MPTCP subflow sockets on the BPF_PROG_TYPE_SOCK_OPS hook.
This patch series now enables a fine control of subflow sockets. In its
current state, it allows to put different sockopt on each subflow from a
same MPTCP connection (socket mark, TCP congestion algorithm, ...) using
BPF programs.
It should also be the basis of exposing MPTCP-specific fields through BPF.
v2 -> v3:
- minor modifications in new MPTCP selftests (Song). More details in patch notes.
- rebase on latest bpf-next
- the new is_mptcp field in bpf_tcp_sock is left as an __u32 to keep cohesion
with the is_fullsock field from bpf_sock_ops. Also it seems easier with a __u32
on the verifier side.
v1 -> v2:
- add basic mandatory selftests for the new helper and is_mptcp field (Alexei)
- rebase on latest bpf-next
Nicolas Rybowski (5):
bpf: expose is_mptcp flag to bpf_tcp_sock
mptcp: attach subflow socket to parent cgroup
bpf: add 'bpf_mptcp_sock' structure and helper
bpf: selftests: add MPTCP test base
bpf: selftests: add bpf_mptcp_sock() verifier tests
include/linux/bpf.h | 33 +++++
include/uapi/linux/bpf.h | 15 +++
kernel/bpf/verifier.c | 30 +++++
net/core/filter.c | 13 +-
net/mptcp/Makefile | 2 +
net/mptcp/bpf.c | 72 +++++++++++
net/mptcp/subflow.c | 27 ++++
scripts/bpf_helpers_doc.py | 2 +
tools/include/uapi/linux/bpf.h | 15 +++
tools/testing/selftests/bpf/config | 1 +
tools/testing/selftests/bpf/network_helpers.c | 37 +++++-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../testing/selftests/bpf/prog_tests/mptcp.c | 118 ++++++++++++++++++
tools/testing/selftests/bpf/progs/mptcp.c | 48 +++++++
tools/testing/selftests/bpf/verifier/sock.c | 63 ++++++++++
15 files changed, 473 insertions(+), 6 deletions(-)
create mode 100644 net/mptcp/bpf.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/mptcp.c
create mode 100644 tools/testing/selftests/bpf/progs/mptcp.c
--
2.28.0