From: Christoph Müllner <christoph.muellner(a)vrull.eu>
The upcoming RISC-V Ssdtso specification introduces a bit in the senvcfg
CSR to switch the memory consistency model at run-time from RVWMO to TSO
(and back). The active consistency model can therefore be switched on a
per-hart basis and managed by the kernel on a per-process/thread basis.
This patch implements basic Ssdtso support and adds a prctl API on top
so that user-space processes can switch to a stronger memory consistency
model (than the kernel was written for) at run-time.
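To illustrate why the consistency model matters to user space, here is a
minimal message-passing sketch in portable C11. It is not part of this
series, and all names in it are made up for illustration. Under TSO the
hardware keeps store-store and load-load order, so this pattern holds at
the hardware level; under RVWMO it needs explicit ordering, expressed here
with a release/acquire pair.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;       /* plain data published by the writer */
static atomic_int ready;  /* flag announcing that payload is valid */

static void *writer(void *arg)
{
	(void)arg;
	payload = 42;
	/* Release: the payload store cannot be reordered after this store. */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return NULL;
}

static void *reader(void *arg)
{
	int *out = arg;

	/* Acquire: the payload load cannot be reordered before this load. */
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;
	*out = payload;
	return NULL;
}

/* Runs one writer/reader pair and returns the value the reader observed. */
int run_message_passing(void)
{
	pthread_t w, r;
	int seen = -1;
	int rc;

	payload = 0;
	atomic_store(&ready, 0);

	rc = pthread_create(&r, NULL, reader, &seen);
	assert(rc == 0);
	rc = pthread_create(&w, NULL, writer, NULL);
	assert(rc == 0);

	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

With the release/acquire pair in place the reader observes 42 on either
model. Code that was written assuming TSO and omits that pair is the kind
of code that would want the stronger model at run-time.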
I am not sure if other architectures support switching the memory
consistency model at run-time, but designing the prctl API in an
arch-independent way allows reusing it in the future.
The patchset also comes with a short documentation of the prctl API.
This series is based on the second draft of the Ssdtso specification
which was published recently on an RVI list:
https://lists.riscv.org/g/tech-arch-review/message/183
Note that the Ssdtso specification is still in development
(i.e., not frozen or even ratified), which is also the reason
why I marked the series as RFC.
One aspect that is not covered in this patchset is virtualization.
It is planned to add virtualization support in a later version.
Hints/suggestions on how to implement this part are very much
appreciated.
Christoph Müllner (5):
RISC-V: Add basic Ssdtso support
RISC-V: Expose Ssdtso via hwprobe API
uapi: prctl: Add new prctl call to set/get the memory consistency
model
RISC-V: Implement prctl call to set/get the memory consistency model
RISC-V: selftests: Add DTSO tests
Documentation/arch/riscv/hwprobe.rst | 3 +
.../mm/dynamic-memory-consistency-model.rst | 76 ++++++++++++++++++
arch/riscv/Kconfig | 10 +++
arch/riscv/include/asm/csr.h | 1 +
arch/riscv/include/asm/dtso.h | 74 ++++++++++++++++++
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/processor.h | 8 ++
arch/riscv/include/asm/switch_to.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/dtso.c | 33 ++++++++
arch/riscv/kernel/process.c | 4 +
arch/riscv/kernel/sys_riscv.c | 1 +
include/uapi/linux/prctl.h | 5 ++
kernel/sys.c | 12 +++
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/dtso/.gitignore | 1 +
tools/testing/selftests/riscv/dtso/Makefile | 11 +++
tools/testing/selftests/riscv/dtso/dtso.c | 77 +++++++++++++++++++
20 files changed, 324 insertions(+), 1 deletion(-)
create mode 100644 Documentation/mm/dynamic-memory-consistency-model.rst
create mode 100644 arch/riscv/include/asm/dtso.h
create mode 100644 arch/riscv/kernel/dtso.c
create mode 100644 tools/testing/selftests/riscv/dtso/.gitignore
create mode 100644 tools/testing/selftests/riscv/dtso/Makefile
create mode 100644 tools/testing/selftests/riscv/dtso/dtso.c
--
2.41.0
As Guillaume pointed out, many selftests create namespaces with very common
names (like "client" or "server") or even (partially) run directly in init_net.
This makes these tests prone to failure if another namespace with the same
name already exists. It also makes it impossible to run several instances
of these tests in parallel.
This patch set converts all the net selftests to run in unique namespaces,
so we can update the selftest framework to run each test in its own namespace
in parallel. After the update, we only need to wait for the test that takes
the longest time.
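The naming idea can be sketched as follows. This is only an illustration in
C: the series itself implements its helpers in shell (lib.sh), and the
scheme and function names below are hypothetical, not taken from the
patches. A name built from a common prefix plus the process ID and a
counter cannot collide with a plain "client" or "server" namespace, nor
with a name produced by another process running at the same time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes "<prefix>-<pid>-<counter>" into buf.
 * Returns 0 on success, -1 if the name does not fit into buf.
 */
int make_unique_ns_name(const char *prefix, char *buf, size_t len)
{
	static unsigned int counter;
	int n;

	assert(prefix != NULL && buf != NULL);
	n = snprintf(buf, len, "%s-%ld-%u", prefix, (long)getpid(), counter++);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns 1 if name starts with prefix, e.g. to find a test's namespaces. */
int ns_name_has_prefix(const char *name, const char *prefix)
{
	return strncmp(name, prefix, strlen(prefix)) == 0;
}
```

Two calls with the same prefix yield two different names, so two instances
of a test no longer fight over one namespace.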
]# per_test_logging=1 time ./run_kselftest.sh -n -c net
TAP version 13
# selftests: net: reuseport_bpf_numa
not ok 3 selftests: net: reuseport_bpf_numa # exit=1
# selftests: net: reuseport_bpf_cpu
not ok 2 selftests: net: reuseport_bpf_cpu # exit=1
# selftests: net: reuseport_dualstack
not ok 4 selftests: net: reuseport_dualstack # exit=1
# selftests: net: reuseaddr_conflict
ok 5 selftests: net: reuseaddr_conflict
...
# selftests: net: test_vxlan_mdb.sh
ok 90 selftests: net: test_vxlan_mdb.sh
# selftests: net: fib_nexthops.sh
not ok 41 selftests: net: fib_nexthops.sh # exit=1
# selftests: net: fcnal-test.sh
not ok 36 selftests: net: fcnal-test.sh # exit=1
real 55m1.238s
user 12m10.350s
sys 22m17.432s
Hangbin Liu (38):
selftests/net: add lib.sh
selftests/net: arp_ndisc_evict_nocarrier.sh convert to run test in
unique namespace
selftest: arp_ndisc_untracked_subnets.sh convert to run test in unique
namespace
selftests/net: convert cmsg tests to make them run in unique namespace
selftests/net: convert drop_monitor_tests.sh to run it in unique
namespace
selftests/net: convert fcnal-test.sh to run it in unique namespace
selftests/net: convert fib_nexthop_multiprefix to run it in unique
namespace
selftests/net: convert fib_nexthop_nongw.sh to run it in unique
namespace
selftests/net: convert fib_nexthops.sh to run it in unique namespace
selftests/net: convert fib-onlink-tests.sh to run it in unique
namespace
selftests/net: convert fib_rule_tests.sh to run it in unique namespace
selftests/net: convert fib_tests.sh to run it in unique namespace
selftests/net: convert gre_gso.sh to run it in unique namespace
selftests/net: convert icmp_redirect.sh to run it in unique namespace
sleftests/net: convert icmp.sh to run it in unique namespace
selftests/net: convert ioam6.sh to run it in unique namespace
selftests/net: convert l2tp.sh to run it in unique namespace
selftests/net: convert ndisc_unsolicited_na_test.sh to run it in
unique namespace
selftests/net: convert netns-name.sh to run it in unique namespace
selftests/net: convert fdb_flush.sh to run it in unique namespace
selftests/net: convert rtnetlink.sh to run it in unique namespace
selftests/net: convert sctp_vrf.sh to run it in unique namespace
selftests/net: use unique netns name for setup_loopback.sh
setup_veth.sh
selftests/net: convert stress_reuseport_listen.sh to run it in unique
namespace
selftests/net: convert test_bridge_backup_port.sh to run it in unique
namespace
selftests/net: convert test_bridge_neigh_suppress.sh to run it in
unique namespace
selftests/net: convert test_vxlan_mdb.sh to run it in unique namespace
selftests/net: convert test_vxlan_nolocalbypass.sh to run it in unique
namespace
selftests/net: convert test_vxlan_under_vrf.sh to run it in unique
namespace
selftests/net: convert test_vxlan_vnifiltering.sh to run it in unique
namespace
selftests/net: convert toeplitz.sh to run it in unique namespace
selftests/net: convert unicast_extensions.sh to run it in unique
namespace
selftests/net: convert vrf_route_leaking.sh to run it in unique
namespace
selftests/net: convert vrf_strict_mode_test.sh to run it in unique
namespace
selftests/net: convert vrf-xfrm-tests.sh to run it in unique namespace
selftests/net: convert traceroute.sh to run it in unique namespace
selftests/net: convert xfrm_policy.sh to run it in unique namespace
kselftest/runner.sh: add netns support
tools/testing/selftests/kselftest/runner.sh | 26 +-
tools/testing/selftests/net/Makefile | 2 +-
.../net/arp_ndisc_evict_nocarrier.sh | 46 +--
.../net/arp_ndisc_untracked_subnets.sh | 18 +-
tools/testing/selftests/net/cmsg_ipv6.sh | 10 +-
tools/testing/selftests/net/cmsg_so_mark.sh | 7 +-
tools/testing/selftests/net/cmsg_time.sh | 7 +-
.../selftests/net/drop_monitor_tests.sh | 21 +-
tools/testing/selftests/net/fcnal-test.sh | 30 +-
tools/testing/selftests/net/fdb_flush.sh | 11 +-
.../testing/selftests/net/fib-onlink-tests.sh | 7 +-
.../selftests/net/fib_nexthop_multiprefix.sh | 104 +++--
.../selftests/net/fib_nexthop_nongw.sh | 34 +-
tools/testing/selftests/net/fib_nexthops.sh | 142 ++++---
tools/testing/selftests/net/fib_rule_tests.sh | 36 +-
tools/testing/selftests/net/fib_tests.sh | 184 +++++----
tools/testing/selftests/net/gre_gso.sh | 18 +-
tools/testing/selftests/net/icmp.sh | 10 +-
tools/testing/selftests/net/icmp_redirect.sh | 182 +++++----
tools/testing/selftests/net/ioam6.sh | 247 ++++++------
tools/testing/selftests/net/l2tp.sh | 130 +++----
tools/testing/selftests/net/lib.sh | 98 +++++
.../net/ndisc_unsolicited_na_test.sh | 19 +-
tools/testing/selftests/net/netns-name.sh | 44 +--
tools/testing/selftests/net/rtnetlink.sh | 21 +-
tools/testing/selftests/net/sctp_vrf.sh | 12 +-
tools/testing/selftests/net/settings | 2 +-
tools/testing/selftests/net/setup_loopback.sh | 8 +-
tools/testing/selftests/net/setup_veth.sh | 9 +-
.../selftests/net/stress_reuseport_listen.sh | 6 +-
.../selftests/net/test_bridge_backup_port.sh | 368 +++++++++---------
.../net/test_bridge_neigh_suppress.sh | 333 ++++++++--------
tools/testing/selftests/net/test_vxlan_mdb.sh | 202 +++++-----
.../selftests/net/test_vxlan_nolocalbypass.sh | 48 ++-
.../selftests/net/test_vxlan_under_vrf.sh | 70 ++--
.../selftests/net/test_vxlan_vnifiltering.sh | 154 +++++---
tools/testing/selftests/net/toeplitz.sh | 16 +-
tools/testing/selftests/net/traceroute.sh | 82 ++--
.../selftests/net/unicast_extensions.sh | 99 +++--
tools/testing/selftests/net/vrf-xfrm-tests.sh | 77 ++--
.../selftests/net/vrf_route_leaking.sh | 201 +++++-----
.../selftests/net/vrf_strict_mode_test.sh | 47 ++-
tools/testing/selftests/net/xfrm_policy.sh | 138 +++----
tools/testing/selftests/run_kselftest.sh | 4 +
44 files changed, 1676 insertions(+), 1654 deletions(-)
create mode 100644 tools/testing/selftests/net/lib.sh
--
2.41.0
Hi,
On Mon, Nov 27, 2023 at 11:49:16AM +0000, Felix Huettner wrote:
> conntrack zones are heavily used by tools like openvswitch to run
> multiple virtual "routers" on a single machine. In this context each
> conntrack zone matches to a single router, thereby preventing
> overlapping IPs from becoming issues.
> In these systems it is common to operate on all conntrack entries of a
> given zone, e.g. to delete them when a router is deleted. Previously this
> required these tools to dump the full conntrack table and filter out the
> relevant entries in userspace potentially causing performance issues.
>
> To do this we reuse the existing CTA_ZONE attribute. This was previously
> parsed but not used during dump and flush requests. Now if CTA_ZONE is
> set we filter these operations based on the provided zone.
> However this means that users that previously passed CTA_ZONE will
> experience a difference in functionality.
>
> Alternatively CTA_FILTER could have been used for the same
> functionality. However it is not yet supported during flush requests and
> is only available when using AF_INET or AF_INET6.
You mean, AF_UNSPEC cannot be specified in CTA_FILTER?
Please extend libnetfilter_conntrack to support this feature;
there is a filter API that can be used for this purpose.
Thanks.
On Fri, Nov 24, 2023 at 12:04:09PM +0100, Jonas Oberhauser wrote:
> > I think ARM64 approached this problem by adding the
> > load-acquire/store-release instructions and for TSO based code,
> > translate into those (eg. x86 -> arm64 transpilers).
>
>
> Although those instructions have a bit more ordering constraints.
>
> I have heard rumors that the apple chips also have a register that can be
> set at runtime.
Oh, I thought they made do with the load-acquire/store-release thingies.
But to be fair, I haven't been paying *that* much attention to the apple
stuff.
I did read about how they fudged some of the x86 flags thing.
> And there are some IBM machines that have a setting, but not sure how it is
> controlled.
Cute, I'm assuming this is the Power series (s390 already being TSO)? I
wasn't aware they had this.
> > IIRC Risc-V actually has such instructions as well, so *why* are you
> > doing this?!?!
>
>
> Unfortunately, at least last time I checked RISC-V still hadn't gotten such
> instructions.
> What they have is the *semantics* of the instructions, but no actual opcodes
> to encode them.
Well, that sucks..
> I argued for them in the RISC-V memory group, but it was considered to be
> outside the scope of that group.
>
> Transpiling with sufficient DMB ISH to get the desired ordering is really
> bad for performance.
Ha!, quite dreadful I would imagine.
> That is not to say that linux should support this. Perhaps linux should
> pressure RISC-V into supporting implicit barriers instead.
I'm not sure I count for much in this regard, but yeah, that sounds like
a plan :-)
The series adds support for setrlimit/getrlimit.
Mainly to avoid spurious coredumps when running the tests under
qemu-user.
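For reference, the usual way to suppress core dumps with these calls looks
roughly like the sketch below. It uses the standard libc interface rather
than nolibc, and the function names are chosen here for illustration; the
selftest change in this series may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Sets the soft and hard core-file size limits to 0. Returns 0 or -1. */
int disable_coredumps(void)
{
	struct rlimit rl = { .rlim_cur = 0, .rlim_max = 0 };

	return setrlimit(RLIMIT_CORE, &rl);
}

/* Reads back the current soft core-file size limit. Returns 0 or -1. */
int get_core_soft_limit(rlim_t *soft)
{
	struct rlimit rl;

	assert(soft != NULL);
	if (getrlimit(RLIMIT_CORE, &rl) != 0)
		return -1;
	*soft = rl.rlim_cur;
	return 0;
}
```

Lowering a limit never needs privileges, so a test can do this
unconditionally before deliberately triggering a fault.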
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
tools/nolibc: drop custom definition of struct rusage
tools/nolibc: add support for getrlimit/setrlimit
selftests/nolibc: disable coredump via setrlimit
tools/include/nolibc/sys.h | 38 ++++++++++++++++++++++++++++
tools/include/nolibc/types.h | 21 +--------------
tools/testing/selftests/nolibc/nolibc-test.c | 31 +++++++++++++++++++++++
3 files changed, 70 insertions(+), 20 deletions(-)
---
base-commit: 0dbd4651f3f80151910a36416fa0df28a10c3b0a
change-id: 20231122-nolibc-rlimit-bb5b1f264fc4
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
In a public cloud scenario, if the kdump service works abnormally,
users cannot get a vmcore. Without a vmcore, the user has no idea why the
kernel crashed. Meanwhile, there is no additional information
to find the reason why the kdump service is abnormal.
One way is to obtain console messages through VNC. The drawback
is that VNC is real-time: if the user misses the window to capture the VNC
output, the crash needs to be retriggered.
Another way is to enable the console frontend of pstore and record the
console messages to the pstore backend. However, the console
logs only contain kernel printk logs and do not cover
user-mode print logs. Although we can redirect user-mode logs to the
pmsg frontend provided by pstore, user-mode information related to
booting and the kdump service comes from varied sources (systemd, kdump.sh,
and so on), which makes redirection troublesome. So we added a tty frontend
that saves all logs of the tty driver to the pstore backend.
Another problem is that currently pstore only supports a single backend.
For debugging kdump problems, we hope to save the console logs and tty
logs to the ramoops backend of pstore, as they will not be lost after
rebooting. If the user has enabled another backend, the ramoops backend
will not be registered. To this end, we add the multi-backend function
to support simultaneous registration of multiple backends.
Based on the above changes, we can enable pstore in the crashdump kernel
and save the console logs and tty logs to the ramoops backend of pstore.
After rebooting, we can view the relevant logs by mounting the pstore
file system.
Furthermore, we also modified kexec-tools, following the way crash-utils
reads memory, so that pstore ramoops information can be read without
enabling pstore in the first kernel. As we set the address and size of ramoops,
as well as the sizes of console and tty, we can infer the physical address
of the console logs and tty logs in memory. Using the read method of
crash-utils, the console logs and tty logs are read from memory, so the
user can get pstore debug information without affecting the first kernel
at all.
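The address inference is plain arithmetic once a layout is fixed. The
sketch below ASSUMES a simplified layout in which the dump records come
first and take whatever remains, followed back to back by the console
region and then the tty region. That ordering, the struct, and the
function name are assumptions for illustration only; the real layout is
defined by fs/pstore/ram.c and by this series, and also has to account for
record sizes and any ftrace or pmsg regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ramoops_layout {
	uint64_t console_start;  /* physical address of the console region */
	uint64_t tty_start;      /* physical address of the tty region */
};

/*
 * ASSUMED layout: [dump records][console][tty], with no gaps.
 * Returns 0 on success, -1 if the regions do not fit into mem_size.
 */
int ramoops_locate(uint64_t mem_address, uint64_t mem_size,
		   uint64_t console_size, uint64_t tty_size,
		   struct ramoops_layout *out)
{
	uint64_t dump_size;

	assert(out != NULL);
	if (console_size > mem_size || tty_size > mem_size - console_size)
		return -1;

	dump_size = mem_size - console_size - tty_size;
	out->console_start = mem_address + dump_size;
	out->tty_start = out->console_start + console_size;
	return 0;
}
```

For example, with a 1 MiB region at 0x80000000 and 128 KiB each for console
and tty, this assumed layout puts the console at 0x800C0000 and the tty
region at 0x800E0000.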
kexec-tools modification can be seen at
https://github.com/shuyuanmen/kexec-tools/blob/main/Add-pstore-segment.patch
Yuanhe Shu (5):
pstore: add tty frontend
pstore: add multi-backends support
pstore: add subdirs for multi-backends
pstore: remove the module parameter "backend"
tools/pstore: update pstore selftests
drivers/tty/n_tty.c | 1 +
fs/pstore/Kconfig | 23 ++
fs/pstore/Makefile | 2 +
fs/pstore/blk.c | 10 +
fs/pstore/ftrace.c | 22 +-
fs/pstore/inode.c | 86 ++++++-
fs/pstore/internal.h | 16 +-
fs/pstore/platform.c | 238 ++++++++++++--------
fs/pstore/pmsg.c | 23 +-
fs/pstore/ram.c | 40 +++-
fs/pstore/tty.c | 56 +++++
fs/pstore/zone.c | 42 +++-
include/linux/pstore.h | 33 +++
include/linux/pstore_blk.h | 3 +
include/linux/pstore_ram.h | 1 +
include/linux/pstore_zone.h | 2 +
include/linux/tty.h | 14 ++
tools/testing/selftests/pstore/common_tests | 4 -
18 files changed, 500 insertions(+), 116 deletions(-)
create mode 100644 fs/pstore/tty.c
--
2.39.3
Regressions that prevent a driver from probing a device can significantly
affect the functionality of a platform.
A kselftest to verify if devices on a DT-based platform are probed
correctly was recently introduced [1], but no such generic test is
available for ACPI platforms yet. bootrr [2] provides device probe
testing, but relies on a pre-defined list of the peripherals present on
each DUT.
On ACPI based hardware, a complete description of the platform is
provided to the OS by the system firmware. ACPI namespace objects are
mapped by the Linux ACPI subsystem into a device tree in
/sys/devices/LNXSYSTEM:00; the information in this subtree can be parsed
to build a list of the hw peripherals present on the DUT dynamically.
This series adds a test to verify if the devices declared in the ACPI
namespace and supported by the kernel are probed correctly.
This work follows a similar approach to [1], adapted for the ACPI use
case.
The first patch introduces a script that builds a list of all ACPI device
IDs supported by the kernel, by inspecting the acpi_device_id structs in
the sources. This list can be used to avoid testing ACPI-enumerated
devices that don't have a matching driver in the kernel. This script was
heavily inspired by the dt-extract-compatibles script [3].
In the second patch, a new kselftest is added. It parses the
/sys/devices/LNXSYSTEM:00 tree to obtain a list of all platform
peripherals and verifies which of those, if supported, are correctly
bound to a driver.
Feedback is much appreciated,
Thank you,
Laura
[1] https://lore.kernel.org/all/20230828211424.2964562-1-nfraprado@collabora.co…
[2] https://github.com/kernelci/bootr
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scr…
Laura Nao (2):
acpi: Add script to extract ACPI device ids in the kernel
kselftest: Add test to detect unprobed devices on ACPI platforms
MAINTAINERS | 2 +
scripts/acpi/acpi-extract-ids | 60 +++++++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/acpi/.gitignore | 2 +
tools/testing/selftests/acpi/Makefile | 23 ++++++
.../selftests/acpi/test_unprobed_devices.sh | 75 +++++++++++++++++++
6 files changed, 163 insertions(+)
create mode 100755 scripts/acpi/acpi-extract-ids
create mode 100644 tools/testing/selftests/acpi/.gitignore
create mode 100644 tools/testing/selftests/acpi/Makefile
create mode 100755 tools/testing/selftests/acpi/test_unprobed_devices.sh
--
2.30.2
The za-fork test does not output a newline when reporting the result of
the one test it runs, causing the counts printed by kselftest to be
included in the test name. Add the newline.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/za-fork.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/za-fork.c b/tools/testing/selftests/arm64/fp/za-fork.c
index b86cb1049497..587b94648222 100644
--- a/tools/testing/selftests/arm64/fp/za-fork.c
+++ b/tools/testing/selftests/arm64/fp/za-fork.c
@@ -85,7 +85,7 @@ int main(int argc, char **argv)
*/
ret = open("/proc/sys/abi/sme_default_vector_length", O_RDONLY, 0);
if (ret >= 0) {
- ksft_test_result(fork_test(), "fork_test");
+ ksft_test_result(fork_test(), "fork_test\n");
} else {
ksft_print_msg("SME not supported\n");
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231115-arm64-fix-za-fork-output-21cdd7a7195c
Best regards,
--
Mark Brown <broonie(a)kernel.org>