Recently, a few issues have been discovered around the creation of
additional subflows. Without these counters, it was difficult to point
out the reason why some subflows were not created as expected.
In patch 3, all error paths from __mptcp_subflow_connect() are covered,
except the one related to the 'fully established mode', because it can
only happen with the userspace PM, which will propagate the error to the
userspace in this case (ENOTCONN).
These new counters are also verified in the MPTCP Join selftest in patch
6.
While at it, a few other patches are improving the MPTCP path-manager
code ...
- Patch 1: 'flush' related helpers are renamed to avoid confusions
- Patch 2: directly pass known ID and flags to create a new subflow,
i/o getting them later by iterating over all endpoints again
... and the MPJoin selftests:
- Patch 4: reduce the number of positional parameters
- Patch 5: only one line for the 'join' checks, instead of 3
- Patch 7: more explicit check names, instead of sometimes too cryptic
ones: rtx, ptx, ftx, ctx, fclzrx, sum
- Patch 8: specify client/server instead of 'invert' for some checks
not suggesting one specific direction
- Patch 9: mute errors of mptcp_connect when ran in the background
- Patch 10: simplify checksum_tests by using a for-loop
- Patch 11: remove 'define' re-definitions
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (1):
selftests: mptcp: join: simplify checksum_tests
Matthieu Baerts (NGI0) (10):
mptcp: pm: rename helpers linked to 'flush'
mptcp: pm: reduce entries iterations on connect
mptcp: MIB counters for sent MP_JOIN
selftests: mptcp: join: reduce join_nr params
selftests: mptcp: join: one line for join check
selftests: mptcp: join: validate MPJ SYN TX MIB counters
selftests: mptcp: join: more explicit check name
selftests: mptcp: join: specify host being checked
selftests: mptcp: join: mute errors when ran in the background
selftests: mptcp: pm_nl_ctl: remove re-definition
net/mptcp/mib.c | 4 +
net/mptcp/mib.h | 4 +
net/mptcp/pm.c | 11 -
net/mptcp/pm_netlink.c | 78 ++----
net/mptcp/pm_userspace.c | 40 +--
net/mptcp/protocol.h | 16 +-
net/mptcp/subflow.c | 50 +++-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 350 ++++++++++++++----------
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 10 +-
9 files changed, 309 insertions(+), 254 deletions(-)
---
base-commit: 221f9cce949ac8042f65b71ed1fde13b99073256
change-id: 20240902-net-next-mptcp-mib-mpjtx-misc-d80298438016
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Previous patch series[1][2] changes a mmap behavior that treats the hint
address as the upper bound of the mmap address range. The motivation of the
previous patch series is that some user space software may assume 48-bit
address space and use higher bits to encode some information, which may
collide with large virtual address space mmap may return. However, to make
sv48 by default, we don't need to change the meaning of the hint address on
mmap as the upper bound of the mmap address range. This behavior breaks
some user space software like Chromium that gets ENOMEM error when the hint
address + size is not big enough, as specified in [3].
Other ISAs with larger than 48-bit virtual address space like x86, arm64,
and powerpc do not have this special mmap behavior on hint address. They
all just make 48-bit / 47-bit virtual address space by default, and if a
user space software wants to large virtual address space, it only need to
specify a hint address larger than 48-bit / 47-bit.
Thus, this patch series change mmap to use sv48 by default but does not
treat the hint address as the upper bound of the mmap address range. After
this patch, the behavior of mmap will align with existing behavior on other
ISAs with larger than 48-bit virtual address space like x86, arm64, and
powerpc. The user space software will no longer need to rewrite their code
to fit with this special mmap behavior only on RISC-V.
Note: Charlie also created another series [4] to completely remove the
arch_get_mmap_end and arch_get_mmap_base behavior based on the hint address
and size. However, this will cause programs like Go and Java, which need to
store information in the higher bits of the pointer, to fail on Sv57
machines.
Changes in v3:
- Rebase to newest master
- Changes some information in cover letter after patchset [2]
- Use patch [5] to patch selftests
- Link to v2: https://lore.kernel.org/linux-riscv/tencent_B2D0435BC011135736262764B511994…
Changes in v2:
- correct arch_get_mmap_end and arch_get_mmap_base
- Add description in documentation about mmap behavior on kernel v6.6-6.7.
- Improve commit message and cover letter
- Rebase to newest riscv/for-next branch
- Link to v1: https://lore.kernel.org/linux-riscv/tencent_F3B3B5AB1C9D704763CA423E1A41F8B…
[1] https://lore.kernel.org/linux-riscv/20230809232218.849726-1-charlie@rivosin…
[2] https://lore.kernel.org/linux-riscv/20240130-use_mmap_hint_address-v3-0-8a6…
[3] https://lore.kernel.org/linux-riscv/MEYP282MB2312A08FF95D44014AB78411C68D2@…
[4] https://lore.kernel.org/linux-riscv/20240826-riscv_mmap-v1-0-cd8962afe47f@r…
[5] https://lore.kernel.org/linux-riscv/20240826-riscv_mmap-v1-2-cd8962afe47f@r…
Charlie Jenkins (1):
riscv: selftests: Remove mmap hint address checks
Yangyu Chen (2):
RISC-V: mm: not use hint addr as upper bound
Documentation: riscv: correct sv57 kernel behavior
Documentation/arch/riscv/vm-layout.rst | 43 ++++++++----
arch/riscv/include/asm/processor.h | 20 ++----
.../selftests/riscv/mm/mmap_bottomup.c | 2 -
.../testing/selftests/riscv/mm/mmap_default.c | 2 -
tools/testing/selftests/riscv/mm/mmap_test.h | 67 -------------------
5 files changed, 36 insertions(+), 98 deletions(-)
--
2.45.2
This series first generalizes resctrl selftest non-contiguous CAT check
to not assume non-AMD vendor implies Intel. Second, it improves
kselftest common parts and resctrl selftest such that the use of
__cpuid_count() does not lead into a build failure (happens at least on
ARM).
While ARM does not currently support resctrl features, there's an
ongoing work to enable resctrl support also for it on the kernel side.
In any case, a common header such as kselftest.h should have a proper
fallback in place for what it provides, thus it seems justified to fix
this common level problem on the common level rather than e.g.
disabling build for resctrl selftest for archs lacking resctrl support.
I've dropped reviewed and tested by tags from the last patch due to
major changes into the makefile logic. So it would be helpful if
Muhammad could retest with this version.
v3:
- Remove "empty" wording
- Also cast input parameters to void
- Initialize ARCH from uname -m if not set (this might allow cleaning
up some other makefiles but that is left as future work)
v2:
- Removed RFC from the last patch & added Fixes and tags
- Fixed the error message's line splits
- Noted down the reason for void casts in the stub
Ilpo Järvinen (3):
selftests/resctrl: Generalize non-contiguous CAT check
selftests/resctrl: Always initialize ecx to avoid build warnings
kselftest: Provide __cpuid_count() stub on non-x86 archs
tools/testing/selftests/kselftest.h | 6 +++++
tools/testing/selftests/lib.mk | 6 +++++
tools/testing/selftests/resctrl/cat_test.c | 28 +++++++++++++---------
3 files changed, 29 insertions(+), 11 deletions(-)
--
2.39.2
This is something that I've been thinking about for a while. We had a
discussion at LPC 2020 about this[1] but the proposals suggested there
never materialised.
In short, it is quite difficult for userspace to detect the feature
capability of syscalls at runtime. This is something a lot of programs
want to do, but they are forced to create elaborate scenarios to try to
figure out if a feature is supported without causing damage to the
system. For the vast majority of cases, each individual feature also
needs to be tested individually (because syscall results are
all-or-nothing), so testing even a single syscall's feature set can
easily inflate the startup time of programs.
This patchset implements the fairly minimal design I proposed in this
talk[2] and in some old LKML threads (though I can't find the exact
references ATM). The general flow looks like:
1. Userspace will indicate to the kernel that a syscall should a be
no-op by setting the top bit of the extensible struct size argument.
We will almost certainly never support exabyte sized structs, so the
top bits are free for us to use as makeshift flag bits. This is
preferable to using the per-syscall flag field inside the structure
because seccomp can easily detect the bit in the flag and allow the
probe or forcefully return -EEXTSYS_NOOP.
2. The kernel will then fill the provided structure with every valid
bit pattern that the current kernel understands.
For flags or other bitflag-like fields, this is the set of valid
flags or bits. For pointer fields or fields that take an arbitrary
value, the field has every bit set (0xFF... to fill the field) to
indicate that any value is valid in the field.
3. The syscall then returns -EEXTSYS_NOOP which is an errno that will
only ever be used for this purpose (so userspace can be sure that
the request succeeded).
On older kernels, the syscall will return a different error (usually
-E2BIG or -EFAULT) and userspace can do their old-fashioned checks.
4. Userspace can then check which flags and fields are supported by
looking at the fields in the returned structure. Flags are checked
by doing an AND with the flags field, and field support can checked
by comparing to 0. In principle you could just AND the entire
structure if you wanted to do this check generically without caring
about the structure contents (this is what libraries might consider
doing).
Userspace can even find out the internal kernel structure size by
passing a PAGE_SIZE buffer and seeing how many bytes are non-zero.
As with copy_struct_from_user(), this is designed to be forward- and
backwards- compatible.
This allows programas to get a one-shot understanding of what features a
syscall supports without having to do any elaborate setups or tricks to
detect support for destructive features. Flags can simply be ANDed to
check if they are in the supported set, and fields can just be checked
to see if they are non-zero.
This patchset is IMHO the simplest way we can add the ability to
introspect the feature set of extensible struct (copy_struct_from_user)
syscalls. It doesn't preclude the chance of a more generic mechanism
being added later.
The intended way of using this interface to get feature information
looks something like the following (imagine that openat2 has gained a
new field and a new flag in the future):
static bool openat2_no_automount_supported;
static bool openat2_cwd_fd_supported;
int check_openat2_support(void)
{
int err;
struct open_how how = {};
err = openat2(AT_FDCWD, ".", &how, CHECK_FIELDS | sizeof(how));
assert(err < 0);
switch (errno) {
case EFAULT: case E2BIG:
/* Old kernel... */
check_support_the_old_way();
break;
case EEXTSYS_NOOP:
openat2_no_automount_supported = (how.flags & RESOLVE_NO_AUTOMOUNT);
openat2_cwd_fd_supported = (how.cwd_fd != 0);
break;
}
}
[1]: https://lwn.net/Articles/830666/
[2]: https://youtu.be/ggD-eb3yPVs
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Aleksa Sarai (8):
uaccess: add copy_struct_to_user helper
sched_getattr: port to copy_struct_to_user
openat2: explicitly return -E2BIG for (usize > PAGE_SIZE)
openat2: add CHECK_FIELDS flag to usize argument
clone3: add CHECK_FIELDS flag to usize argument
selftests: openat2: add 0xFF poisoned data after misaligned struct
selftests: openat2: add CHECK_FIELDS selftests
selftests: clone3: add CHECK_FIELDS selftests
fs/open.c | 17 ++
include/linux/uaccess.h | 98 +++++++++
include/uapi/asm-generic/errno.h | 3 +
include/uapi/linux/openat2.h | 2 +
kernel/fork.c | 33 ++-
kernel/sched/syscalls.c | 42 +---
tools/testing/selftests/clone3/.gitignore | 1 +
tools/testing/selftests/clone3/Makefile | 2 +-
.../testing/selftests/clone3/clone3_check_fields.c | 229 +++++++++++++++++++++
tools/testing/selftests/openat2/openat2_test.c | 126 +++++++++++-
10 files changed, 504 insertions(+), 49 deletions(-)
---
base-commit: 431c1646e1f86b949fa3685efc50b660a364c2b6
change-id: 20240803-extensible-structs-check_fields-a47e94cef691
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
From: Yuan Chen <chenyuan(a)kylinos.cn>
When the PROCMAP_QUERY is not defined, a compilation error occurs due to the
mismatch of the procmap_query()'s params, procmap_query() only be called in
the file where the function is defined, modify the params so they can match.
We get a warning when build samples/bpf:
trace_helpers.c:252:5: warning: no previous prototype for ‘procmap_query’ [-Wmissing-prototypes]
252 | int procmap_query(int fd, const void *addr, __u32 query_flags, size_t *start, size_t *offset, int *flags)
| ^~~~~~~~~~~~~
As this function is only used in the file, mark it as 'static'.
Signed-off-by: Yuan Chen <chenyuan(a)kylinos.cn>
---
tools/testing/selftests/bpf/trace_helpers.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index 1bfd881c0e07..2d742fdac6b9 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -249,7 +249,7 @@ int kallsyms_find(const char *sym, unsigned long long *addr)
#ifdef PROCMAP_QUERY
int env_verbosity __weak = 0;
-int procmap_query(int fd, const void *addr, __u32 query_flags, size_t *start, size_t *offset, int *flags)
+static int procmap_query(int fd, const void *addr, __u32 query_flags, size_t *start, size_t *offset, int *flags)
{
char path_buf[PATH_MAX], build_id_buf[20];
struct procmap_query q;
@@ -293,7 +293,7 @@ int procmap_query(int fd, const void *addr, __u32 query_flags, size_t *start, si
return 0;
}
#else
-int procmap_query(int fd, const void *addr, size_t *start, size_t *offset, int *flags)
+static int procmap_query(int fd, const void *addr, __u32 query_flags, size_t *start, size_t *offset, int *flags)
{
return -EOPNOTSUPP;
}
--
2.46.0