Hi all:
The core frequency is subject to process variation in semiconductors.
Not all cores are able to reach the maximum frequency within the
infrastructure limits. Consequently, AMD has redefined the concept of
the maximum frequency of a part: only a fraction of the cores can reach
the maximum frequency. To find the best process scheduling policy for a
given scenario, the OS needs to know the core ordering that the platform
reports through the highest performance capability register of the CPPC
interface.
Earlier implementations of amd-pstate preferred core supported only a
static core ranking and target performance. Now the driver can
dynamically change the preferred core based on workload and platform
conditions, accounting for thermals and aging.
The amd-pstate driver utilizes the functions and data structures
provided by the ITMT architecture to let the scheduler favor cores
that can reach a higher frequency at a lower voltage. We call this
amd-pstate preferred core.
Here sched_set_itmt_core_prio() is called to set the per-core
priorities and sched_set_itmt_support() is called to enable the ITMT
feature.
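To make the ITMT wiring concrete, here is a minimal sketch (illustrative
only, not the actual driver code; the helper names are made up):

#include <asm/topology.h>	/* sched_set_itmt_core_prio(), sched_set_itmt_support() */

/* Hypothetical helper: rank one CPU by its CPPC highest-perf value. */
static void amd_pstate_set_prefcore_prio(unsigned int cpu, u32 highest_perf)
{
	/* A higher CPPC highest_perf value means a higher scheduler priority. */
	sched_set_itmt_core_prio((int)highest_perf, cpu);
}

/* Called once after all online CPUs have been ranked at boot. */
static void amd_pstate_enable_prefcore(void)
{
	/* Tell the scheduler the asymmetric priorities set above are valid. */
	sched_set_itmt_support();
}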
The amd-pstate driver uses the highest performance value to indicate
the priority of a CPU: a higher value means a higher priority.
The amd-pstate driver provides an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler, so that the OS schedules processes on
the highest-performance cores first. When the amd-pstate driver
receives a message that the highest performance has changed, it
updates the core ranking, as sketched below.
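A similarly hedged sketch of that dynamic path (cppc_get_highest_perf()
is the CPPC accessor added by patch 2 of this series; the wrapper name
below is hypothetical):

#include <linux/types.h>
#include <acpi/cppc_acpi.h>	/* cppc_get_highest_perf(), added by this series */
#include <asm/topology.h>	/* sched_set_itmt_core_prio() */

/* Hypothetical wrapper: re-rank one core after a highest-perf change. */
static void amd_pstate_update_core_rank(unsigned int cpu)
{
	u64 highest_perf;

	/* Re-read the platform's current highest-perf value for this CPU. */
	if (cppc_get_highest_perf(cpu, &highest_perf))
		return;

	/* Feed the new ranking back to the scheduler via ITMT. */
	sched_set_itmt_core_prio((int)highest_perf, cpu);
}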
Changes from V11->V12:
- all:
- - pick up Reviewed-by tag added by Perry.
- cpufreq: amd-pstate:
- - rebase on the latest linux-next and fix conflicts.
- - fix the issue of cpudata being used without init in amd_pstate_update_highest_perf().
Changes from V10->V11:
- cpufreq: amd-pstate:
- - according to Perry's comments, replace the string with str_enabled_disabled().
Changes from V9->V10:
- cpufreq: amd-pstate:
- - add a check for highest_perf: when it is less than 255, the
preferred core feature is enabled and the priority will be set.
- - delete "static u32 max_highest_perf" etc., because amd-pstate
preferred core does not require special handling for hotplug.
Changes from V8->V9:
- all:
- - pick up Tested-by tag added by Oleksandr.
- cpufreq: amd-pstate:
- - pick up Reviewed-by tag added by Wyes.
- - ignore modification of bug.
- - add an attribute of prefcore_ranking.
- - modify data type conversion from u32 to int.
- Documentation: amd-pstate:
- - pick up Reviewed-by tag added by Wyes.
Changes from V7->V8:
- all:
- - pick up Reviewed-by tags added by Mario and Ray.
- cpufreq: amd-pstate:
- - embed hw_prefcore into the cpudata structure.
- - delete preferred core init from cpu online/offline.
Changes from V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- acpi: cppc:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.
Changes from V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.
Changes from V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments.
- - rebase on linux-next.
- cpufreq:
- - modify warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstate``.
Changes from V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes from V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate:
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes from V1->V2:
- acpi: cppc:
- - Add reference link.
- cpufreq:
- - Modify link error.
- cpufreq: amd-pstate:
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.
Meng Li (7):
x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
acpi: cppc: Add get the highest performance cppc control
cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
cpufreq: Add a notification message that the highest perf has changed
cpufreq: amd-pstate: Update amd-pstate preferred core ranking
dynamically
Documentation: amd-pstate: introduce amd-pstate preferred core
Documentation: introduce amd-pstate preferred core mode kernel command
line options
.../admin-guide/kernel-parameters.txt | 5 +
Documentation/admin-guide/pm/amd-pstate.rst | 59 +++++-
arch/x86/Kconfig | 5 +-
drivers/acpi/cppc_acpi.c | 13 ++
drivers/acpi/processor_driver.c | 6 +
drivers/cpufreq/amd-pstate.c | 175 +++++++++++++++++-
drivers/cpufreq/cpufreq.c | 13 ++
include/acpi/cppc_acpi.h | 5 +
include/linux/amd-pstate.h | 10 +
include/linux/cpufreq.h | 5 +
10 files changed, 284 insertions(+), 12 deletions(-)
--
2.34.1
Nested translation is a hardware feature supported by many modern
IOMMUs. It uses two stages of address translation (stage-1 and
stage-2) to reach the physical address. The stage-1 translation table
is owned by userspace (e.g. by a guest OS), while stage-2 is owned by
the kernel. Changes to the stage-1 translation table should be
followed by an IOTLB invalidation.
Take Intel VT-d as an example: the stage-1 translation table is the
I/O page table. As the diagram below shows, the guest I/O page table
pointer in GPA (guest physical address) is passed to the host and used
to perform the stage-1 address translation. Along with it,
modifications to present mappings in the guest I/O page table should
be followed by an IOTLB invalidation.
    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |  I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |---------------------------|--------
      v        v                           v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  | FS for GIOVA->GPA      |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'

Where:
 - FS = First stage page tables
 - SS = Second stage page tables

<Intel VT-d Nested translation>
This series is based on the first part, which was merged [1]. It adds
the cache invalidation interface for userspace to invalidate the cache
after modifying the stage-1 page table. This includes both the iommufd
changes and the VT-d driver changes.
Complete code can be found in [2]; QEMU code can be found in [3].
At last, this is a team work together with Nicolin Chen and Lu Baolu.
Thanks to them for the help. ^_^ Looking forward to your feedback.
[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.c… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
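For orientation, a hypothetical userspace invocation of the new
invalidation path could look like the sketch below; field and constant
names follow this revision of the uapi (see the v8 renames underneath)
and may still change:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>	/* assumes headers with this series applied */

/* Hypothetical helper: flush one GPA range of a nested (stage-1) HWPT. */
static int flush_stage1_range(int iommufd, uint32_t hwpt_id,
			      uint64_t addr, uint64_t npages)
{
	struct iommu_hwpt_vtd_s1_invalidate entry = {
		.addr = addr,
		.npages = npages,
		.flags = 0,
	};
	struct iommu_hwpt_invalidate cmd = {
		.size = sizeof(cmd),
		.hwpt_id = hwpt_id,
		.data_uptr = (uintptr_t)&entry,
		.data_type = IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
		.entry_len = sizeof(entry),
		.entry_num = 1,
	};

	/* On return the kernel updates entry_num to the number handled. */
	return ioctl(iommufd, IOMMU_HWPT_INVALIDATE, &cmd);
}

If an entry fails, the updated entry_num tells userspace how far the
array walk got, matching the per-entry error design discussed in v7
below.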
Change log:
v8:
- Pass invalidation hint to the cache invalidation helper in the cache_invalidate_user
op path (Kevin)
- Move the devTLB invalidation out of info->iommu loop (Kevin, Weijiang)
- Clear *fault per restart in qi_submit_sync() to avoid across-submission
error accumulation. (Kevin)
- Define the vtd cache invalidation uapi structure in separate patch (Kevin)
- Rename inv_error to be hw_error (Kevin)
- Rename 'reqs_uptr', 'req_type', 'req_len' and 'req_num' to be 'data_uptr',
'data_type', 'entry_len' and 'entry_num' (Kevin)
- Allow user to set IOMMU_TEST_INVALIDATE_FLAG_ALL and IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR
at the same time (Kevin)
v7: https://lore.kernel.org/linux-iommu/20231221153948.119007-1-yi.l.liu@intel.…
- Remove domain->ops->cache_invalidate_user check in hwpt alloc path due
to failure in bisect (Baolu)
- Remove out_driver_error_code from struct iommu_hwpt_invalidate after
discussion in v6. Should expect per-entry error code.
- Rework the selftest cache invalidation part to report a per-entry error
- Allow user to pass in an empty array to have a try-and-fail mechanism for
user to check if a given req_type is supported by the kernel (Jason)
- Define a separate enum type for cache invalidation data (Jason)
- Fix the IOMMU_HWPT_INVALIDATE to always update the req_num field before
returning (Nicolin)
- Merge the VT-d nesting part 2/2
https://lore.kernel.org/linux-iommu/20231117131816.24359-1-yi.l.liu@intel.c…
into this series to avoid defining empty enum in the middle of the series.
The major difference is adding the VT-d related invalidation uapi structures
together with the generic data structures in patch 02 of this series.
- VT-d driver was refined to report ICE/ITE error from the bottom cache
invalidation submit helpers, hence the cache_invalidate_user op could
report such errors via the per-entry error field to user. VT-d driver
will not stop the invalidation array walking due to the ICE/ITE errors
as such errors are defined by the VT-d spec; userspace should be able
to handle them and let the real user (say a Virtual Machine) know
about them. But for other errors, like an invalid uapi data structure
configuration or a memory copy failure, the array walking should stop,
as going on may cause more issues.
- Minor fixes per Jason and Kevin's review comments
v6: https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.c…
- Not much change, just a rebase on top of 6.7-rc1 as part 1/2 is merged
v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.c…
- Split the iommufd nesting series into two parts of alloc_user and
invalidation (Jason)
- Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and
do the same with the structures/alloc()/abort()/destroy(). Reworked the
selftest accordingly too. (Jason)
- Move hwpt/data_type into struct iommu_user_data from standalone op
arguments. (Jason)
- Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA,
_TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin)
- Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin)
- Add macro to the iommu_copy_struct_from_user() to calculate min_size
(Jason)
- Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
- Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
and kernel-managed HWPTs
- Rework invalidate uAPI to be a multi-request array-based design
- Add a struct iommu_user_data_array and a helper for driver to sanitize
and copy the entry data from user space invalidation array
- Add a patch fixing TEST_LENGTH() in selftest program
- Drop IOMMU_RESV_IOVA_RANGES patches
- Update kdoc and inline comments
- Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
this does not change the rule that resv regions should only be added to the
kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
Thanks,
Yi Liu
Lu Baolu (4):
iommu: Add cache_invalidate_user op
iommu/vt-d: Allow qi_submit_sync() to return the QI faults
iommu/vt-d: Convert stage-1 cache invalidation to return QI fault
iommu/vt-d: Add iotlb flush for nested domain
Nicolin Chen (4):
iommu: Add iommu_copy_struct_from_user_array helper
iommufd/selftest: Add mock_domain_cache_invalidate_user support
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (2):
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
drivers/iommu/intel/dmar.c | 38 ++--
drivers/iommu/intel/iommu.c | 12 +-
drivers/iommu/intel/iommu.h | 8 +-
drivers/iommu/intel/irq_remapping.c | 2 +-
drivers/iommu/intel/nested.c | 118 ++++++++++++
drivers/iommu/intel/pasid.c | 14 +-
drivers/iommu/intel/svm.c | 14 +-
drivers/iommu/iommufd/hw_pagetable.c | 41 ++++
drivers/iommu/iommufd/iommufd_private.h | 10 +
drivers/iommu/iommufd/iommufd_test.h | 39 ++++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 86 +++++++++
include/linux/iommu.h | 100 ++++++++++
include/uapi/linux/iommufd.h | 98 ++++++++++
tools/testing/selftests/iommu/iommufd.c | 179 ++++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 57 ++++++
16 files changed, 781 insertions(+), 38 deletions(-)
--
2.34.1
From: Jeff Xu <jeffxu(a)chromium.org>
This patchset proposes a new mseal() syscall for the Linux kernel.
In a nutshell, mseal() protects the VMAs of a given virtual memory
range against modifications, such as changes to their permission bits.
Modern CPUs support memory permissions, such as the read/write (RW)
and no-execute (NX) bits. Linux has supported NX since the release of
kernel version 2.6.8 in August 2004 [1]. The memory permission feature
improves the security stance on memory corruption bugs, as an attacker
cannot simply write to arbitrary memory and point the code to it. The
memory must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects
the VMA itself against modifications of the selected seal type.
Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped. Memory sealing can
automatically be applied by the runtime loader to seal .text and
.rodata pages and applications can additionally seal security critical
data at runtime. A similar feature already exists in the XNU kernel
with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the
mimmutable syscall [4]. Also, Chrome wants to adopt this feature for
their CFI work [2] and this patchset has been designed to be
compatible with the Chrome use case.
Two system calls are involved in sealing the map: mmap() and mseal().
The new mseal() is a syscall on 64-bit CPUs, with the following
signature:
int mseal(void *addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
mseal() blocks the following operations for the given memory range.
1> Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(): these can leave an empty space, which can
then be replaced with a VMA having a new set of attributes.
2> Moving or expanding a different VMA into the current location,
via mremap().
3> Modifying a VMA via mmap(MAP_FIXED).
4> Size expansion, via mremap(), does not appear to pose any specific
risks to sealed VMAs. It is included anyway because the use case is
unclear. In any case, users can rely on merging to expand a sealed VMA.
5> mprotect() and pkey_mprotect().
6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
memory, when users don't have write permission to the memory. Those
behaviors can alter region contents by discarding pages, effectively a
memset(0) for anonymous memory.
In addition, mmap() has two related changes.
The PROT_SEAL bit in the prot field of mmap(): when present, it marks
the map as sealed since creation.
The MAP_SEALABLE bit in the flags field of mmap(): when present, it marks
the map as sealable. A map created without MAP_SEALABLE will not support
sealing; i.e., mseal() will fail.
Applications that don't care about sealing will see their behavior
unchanged. Those that need sealing support opt in by adding
MAP_SEALABLE in mmap(), as in the sketch below.
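A hedged userspace sketch of that opt-in flow (PROT_SEAL, MAP_SEALABLE
and __NR_mseal come from this series and are not in released headers,
so this assumes headers with the series applied):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumes MAP_SEALABLE, PROT_SEAL and __NR_mseal are defined by the
 * kernel headers from this series; they are placeholders here. */

int main(void)
{
	size_t len = 4096;

	/* Opt in to sealing at creation time with MAP_SEALABLE. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_SEALABLE, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	/* Seal the range; flags is reserved and must be 0.
	 * (Passing PROT_SEAL to mmap() above would seal at creation instead.) */
	if (syscall(__NR_mseal, p, len, 0))
		return 1;

	/* This mprotect() is now expected to fail on the sealed VMA. */
	if (mprotect(p, len, PROT_READ) < 0)
		printf("mprotect blocked as expected: %s\n", strerror(errno));

	return 0;
}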
The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.
Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in
the case of libc, sealing is only applied to read-only (RO) or
read-execute (RX) memory segments (such as .text and .RELRO) to
prevent them from becoming writable; the lifetime of those mappings
is tied to the lifetime of the process.
Chrome wants to seal two large address space reservations that are
managed by different allocators. The memory is mapped RW- and RWX
respectively but write access to it is restricted using pkeys (or in
the future ARM permission overlay extensions). The lifetime of those
mappings is not tied to the lifetime of the process; therefore, while
the memory is sealed, the allocators still need to free or discard the
unused memory, for example with madvise(DONTNEED).
However, always allowing madvise(DONTNEED) on this range poses a
security risk. For example if a jump instruction crosses a page
boundary and the second page gets discarded, it will overwrite the
target bytes with zeros and change the control flow. Checking
write-permission before the discard operation allows us to control
when the operation is valid. In this case, the madvise will only
succeed if the executing thread has PKEY write permissions and PKRU
changes are protected in software by control-flow integrity.
Although the initial version of this patch series is targeting the
Chrome browser as its first user, it became evident during upstream
discussions that we would also want to ensure that the patch set
eventually is a complete solution for memory sealing and compatible
with other use cases. The specific scenario currently in mind is
glibc's use case of loading and sealing ELF executables. To this end,
Stephen is working on a change to glibc to add sealing support to the
dynamic linker, which will seal all non-writable segments at startup.
Once this work is completed, all applications will be able to
automatically benefit from these new protections.
Change history:
===============
V5:
- fix build issue in mseal-Wire-up-mseal-syscall
(Suggested by Linus Torvalds, and Greg KH)
- updates on selftest.
V4:
(Suggested by Linus Torvalds)
- new signature: mseal(start,len,flags)
- 32 bit is not supported. vm_seal is removed, use vm_flags instead.
- single bit in vm_flags for sealed state.
- CONFIG_MSEAL kernel config is removed.
- single bit of PROT_SEAL in the "Prot" field of mmap().
Other changes:
- update selftest (Suggested by Muhammad Usama Anjum)
- update documentation.
https://lore.kernel.org/all/20240104185138.169307-1-jeffxu@chromium.org/
V3:
- Abandon the per-syscall approach. (Suggested by Linus Torvalds)
- Organize sealing types around their functionality, such as
MM_SEAL_BASE, MM_SEAL_PROT_PKEY.
- Extend the scope of sealing from calls originated in userspace to
both kernel and userspace. (Suggested by Linus Torvalds)
- Add seal type support in mmap(). (Suggested by Pedro Falcato)
- Add a new sealing type: MM_SEAL_DISCARD_RO_ANON to prevent
destructive operations of madvise. (Suggested by Jann Horn and
Stephen Röttger)
- Make sealed VMAs mergeable. (Suggested by Jann Horn)
- Add MAP_SEALABLE to mmap()
- Add documentation - mseal.rst
https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jeffxu@chromium.o…
v2:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add a detailed commit message.
https://lore.kernel.org/lkml/20231017090815.1067790-1-jeffxu@chromium.org/
v1:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/
----------------------------------------------------------------
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b…
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXge…
[6] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426Fkcgnf…
[7] https://lore.kernel.org/lkml/20230515130553.2311248-1-jeffxu@chromium.org/
Jeff Xu (4):
mseal: Wire up mseal syscall
mseal: add mseal syscall
selftest mm/mseal memory sealing
mseal: add documentation
Documentation/userspace-api/mseal.rst | 181 ++
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/mm.h | 60 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/mman-common.h | 7 +
include/uapi/asm-generic/unistd.h | 5 +-
kernel/sys_ni.c | 1 +
mm/Makefile | 4 +
mm/madvise.c | 12 +
mm/mmap.c | 27 +
mm/mprotect.c | 10 +
mm/mremap.c | 31 +
mm/mseal.c | 330 +++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/mseal_test.c | 1989 +++++++++++++++++++
32 files changed, 2677 insertions(+), 2 deletions(-)
create mode 100644 Documentation/userspace-api/mseal.rst
create mode 100644 mm/mseal.c
create mode 100644 tools/testing/selftests/mm/mseal_test.c
--
2.43.0.195.gebba966016-goog
The kernel selftest mm/hugepage-vmemmap fails on architectures with a
page size other than 4K. In hugepage-vmemmap the page size used is 4K,
so the pfn calculation goes wrong on systems with a different page
size. The length of MAP_HUGETLB memory must be hugepage aligned, but
in hugepage-vmemmap the map length is 2M, so it will not be aligned if
the system has a different hugepage size.
Added psize() to get the page size and default_huge_page_size() to
get the default hugepage size at run time; with this, the
hugepage-vmemmap test passes on powerpc with 64K page size and on x86
with 4K page size.
Result on powerpc without patch (page size 64K)
# ./hugepage-vmemmap
Returned address is 0x7effff000000 whose pfn is 0
Head page flags (100000000) is invalid
check_page_flags: Invalid argument
#
Result on powerpc with patch (page size 64K)
# ./hugepage-vmemmap
Returned address is 0x7effff000000 whose pfn is 600
#
Result on x86 with patch (page size 4K)
# ./hugepage-vmemmap
Returned address is 0x7fc7c2c00000 whose pfn is 1dac00
#
Signed-off-by: Donet Tom <donettom(a)linux.vnet.ibm.com>
Reported-by: Geetika Moolchandani (geetika(a)linux.ibm.com)
Tested-by: Geetika Moolchandani (geetika(a)linux.ibm.com)
---
tools/testing/selftests/mm/hugepage-vmemmap.c | 29 ++++++++++++-------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/mm/hugepage-vmemmap.c b/tools/testing/selftests/mm/hugepage-vmemmap.c
index 5b354c209e93..894d28c3dd47 100644
--- a/tools/testing/selftests/mm/hugepage-vmemmap.c
+++ b/tools/testing/selftests/mm/hugepage-vmemmap.c
@@ -10,10 +10,7 @@
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
-
-#define MAP_LENGTH (2UL * 1024 * 1024)
-
-#define PAGE_SIZE 4096
+#include "vm_util.h"
#define PAGE_COMPOUND_HEAD (1UL << 15)
#define PAGE_COMPOUND_TAIL (1UL << 16)
@@ -39,6 +36,9 @@
#define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
#endif
+static size_t pagesize;
+static size_t maplength;
+
static void write_bytes(char *addr, size_t length)
{
unsigned long i;
@@ -56,7 +56,7 @@ static unsigned long virt_to_pfn(void *addr)
if (fd < 0)
return -1UL;
- lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET);
+ lseek(fd, (unsigned long)addr / pagesize * sizeof(pagemap), SEEK_SET);
read(fd, &pagemap, sizeof(pagemap));
close(fd);
@@ -86,7 +86,7 @@ static int check_page_flags(unsigned long pfn)
* this also verifies kernel has correctly set the fake page_head to tail
* while hugetlb_free_vmemmap is enabled.
*/
- for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) {
+ for (i = 1; i < maplength / pagesize; i++) {
read(fd, &pageflags, sizeof(pageflags));
if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS ||
(pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) {
@@ -106,18 +106,25 @@ int main(int argc, char **argv)
void *addr;
unsigned long pfn;
- addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
+ pagesize = psize();
+ maplength = default_huge_page_size();
+ if (!maplength) {
+ printf("Unable to determine huge page size\n");
+ exit(1);
+ }
+
+ addr = mmap(MAP_ADDR, maplength, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap");
exit(1);
}
/* Trigger allocation of HugeTLB page. */
- write_bytes(addr, MAP_LENGTH);
+ write_bytes(addr, maplength);
pfn = virt_to_pfn(addr);
if (pfn == -1UL) {
- munmap(addr, MAP_LENGTH);
+ munmap(addr, maplength);
perror("virt_to_pfn");
exit(1);
}
@@ -125,13 +132,13 @@ int main(int argc, char **argv)
printf("Returned address is %p whose pfn is %lx\n", addr, pfn);
if (check_page_flags(pfn) < 0) {
- munmap(addr, MAP_LENGTH);
+ munmap(addr, maplength);
perror("check_page_flags");
exit(1);
}
/* munmap() length of MAP_HUGETLB memory must be hugepage aligned */
- if (munmap(addr, MAP_LENGTH)) {
+ if (munmap(addr, maplength)) {
perror("munmap");
exit(1);
}
--
2.43.0
This test case triggers a race between madvise(MADV_DONTNEED) and
mmap() on a single huge page, which gets stolen (while reserved).
Once the only page is stolen, the memory that was previously mmapped
(and madvise(MADV_DONTNEED)ed) gets a SIGBUS when accessed.
I am not adding this test to the run_vmtests.sh script, since it
currently fails upstream.
Breno Leitao (1):
selftests/mm: add a new test for madv and hugetlb mmap
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
.../selftests/mm/hugetlb_madv_vs_map.c | 124 ++++++++++++++++++
3 files changed, 126 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_madv_vs_map.c
--
2.34.1