This series adds namespace support to vhost-vsock. It does not add namespaces to any of the guest transports (virtio-vsock, hyperv, or vmci).
The current revision supports only two modes: local and global. In local mode, namespaces are completely isolated from one another; in global mode, all namespaces share the same CID space (the original behavior).
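The following is a minimal userspace model of the visibility rule behind the two modes. It assumes the rule matches the vsock_net_check_mode() helper added later in the series: two namespaces can see each other's CIDs if they are the same namespace, or if both are global. The struct model_ns type and the MODEL_NS_* values are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode bits, mirroring VSOCK_NS_MODE_GLOBAL / VSOCK_NS_MODE_LOCAL. */
enum model_ns_mode {
	MODEL_NS_GLOBAL = 1,
	MODEL_NS_LOCAL = 1 << 1,
};

/* Hypothetical stand-in for a network namespace. */
struct model_ns {
	int id;		/* identity of the namespace */
	uint8_t mode;	/* MODEL_NS_GLOBAL or MODEL_NS_LOCAL */
};

static bool model_ns_can_reach(const struct model_ns *a, const struct model_ns *b)
{
	assert(a && b);
	if (a->id == b->id)	/* same namespace: always visible */
		return true;
	/* different namespaces: only global-to-global is visible */
	return (a->mode & MODEL_NS_GLOBAL) && (b->mode & MODEL_NS_GLOBAL);
}
```

Local namespaces are only reachable from themselves, which is what "complete isolation" means here.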
If mixed mode is deemed necessary up front, it is doable, but at the cost of more complexity than the local and global modes. Mixed mode would require adding the notion of allocation to the socket lookup functions (such as vhost_vsock_get()), plus additional logic to control or perform lookups differently in the mixed-to-global and global-to-mixed cases.
The current implementation leaves room for mixed mode by making vsock_ns_mode per-namespace, since mixed mode needs at least one "global" namespace and one "mixed" namespace to work. Would it be acceptable to support only local and global modes initially?
I've demoted this series to RFC: I haven't been able to re-run the tests after rebasing onto the upstreamed vmtest.sh, some of the code is still messy, and there are remaining TODOs, stale comments, and other work to do. I'm sending the current state anyway because I'll be out of office until the second week of July, and that seemed like a long silence given the work we've already done on this together.
Thanks again for everyone's help and reviews!
Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
---
Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2: https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com

Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1: https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com

Changes in v1:
- added 'netns' module param to vsock.ko to enable the network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
---
Bobby Eshleman (11):
      selftests/vsock: add NS tests to vmtest.sh
      vsock: a per-net vsock NS mode state
      vsock: add vsock net ns helpers
      vsock: add net to vsock skb cb
      vsock: add common code for vsock NS support
      virtio-vsock: add netns to common code
      vhost/vsock: add netns support
      vsock/virtio: add netns hooks
      hv_sock: add netns hooks
      vsock/vmci: add netns hooks
      vsock/loopback: add netns support
 MAINTAINERS                             |   1 +
 drivers/vhost/vsock.c                   |  48 ++-
 include/linux/virtio_vsock.h            |  12 +
 include/net/af_vsock.h                  |  53 ++-
 include/net/net_namespace.h             |   4 +
 include/net/netns/vsock.h               |  19 ++
 net/vmw_vsock/af_vsock.c                | 203 +++++++++++-
 net/vmw_vsock/hyperv_transport.c        |   2 +-
 net/vmw_vsock/virtio_transport.c        |   5 +-
 net/vmw_vsock/virtio_transport_common.c |  14 +-
 net/vmw_vsock/vmci_transport.c          |   4 +-
 net/vmw_vsock/vsock_loopback.c          |   4 +-
 tools/testing/selftests/vsock/vmtest.sh | 555 +++++++++++++++++++++++++++++---
 13 files changed, 843 insertions(+), 81 deletions(-)
---
base-commit: 8909f5f4ecd551c2299b28e05254b77424c8c7dc
change-id: 20250325-vsock-vmtest-b3a21d2102c2
Best regards,
This patch has not been tested since rebasing onto the upstream vmtest.sh. It is probably quite broken, but it is included to show the direction the testing is taking.
vsock_test uses TCP for its control socket, and TCP is itself namespaced. To test vsock without breaking the TCP control channel, vmtest.sh sets up a bridge between the namespaces with socat (iptables might be a better choice, since it would remove a dependency on an out-of-tree tool). Another option would be to not use vsock_test for the NS tests at all, but exercising all of vsock seems more robust than testing only, for example, connectivity.
Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
---
 tools/testing/selftests/vsock/vmtest.sh | 555 +++++++++++++++++++++++++++++---
 1 file changed, 510 insertions(+), 45 deletions(-)
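The CID-allocation tests encode an expectation: global namespaces draw from one shared pool, while each local namespace has a private pool. Below is a minimal userspace model of that expectation. The pool layout, the MODEL_MAX_* limits and the model_* names are assumptions for illustration, not how vhost-vsock actually tracks CIDs. Under this model, two distinct local namespaces can reuse a CID, but two VMs in the same namespace cannot. Note that test_local_same_cid passes "local" as both namespaces, so it appears to exercise the second case rather than the one its description names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_NS   8	/* hypothetical number of namespaces */
#define MODEL_MAX_CIDS 16	/* hypothetical pool capacity */

enum model_ns_mode {
	MODEL_NS_GLOBAL = 1,
	MODEL_NS_LOCAL = 1 << 1,
};

/* A pool is a small set of CIDs already handed out to VMs. */
struct model_cid_pool {
	uint32_t cids[MODEL_MAX_CIDS];
	int n;
};

/* One pool shared by every global namespace; one private pool per local one. */
static struct model_cid_pool global_pool;
static struct model_cid_pool local_pools[MODEL_MAX_NS];

static struct model_cid_pool *pool_for(int ns_id, enum model_ns_mode mode)
{
	assert(ns_id >= 0 && ns_id < MODEL_MAX_NS);
	return mode == MODEL_NS_GLOBAL ? &global_pool : &local_pools[ns_id];
}

/* Claim @cid for a VM in namespace @ns_id; false if the pool already has it. */
static bool model_cid_alloc(int ns_id, enum model_ns_mode mode, uint32_t cid)
{
	struct model_cid_pool *pool = pool_for(ns_id, mode);

	for (int i = 0; i < pool->n; i++)
		if (pool->cids[i] == cid)
			return false;	/* e.g. a second VM with the same guest-cid */
	if (pool->n == MODEL_MAX_CIDS)
		return false;		/* pool full */
	pool->cids[pool->n++] = cid;
	return true;
}

/* Forget all allocations, as if every VM had been shut down. */
static void model_pools_reset(void)
{
	memset(&global_pool, 0, sizeof(global_pool));
	memset(local_pools, 0, sizeof(local_pools));
}
```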
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh index edacebfc1632..8f627f60cc11 100755 --- a/tools/testing/selftests/vsock/vmtest.sh +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -7,6 +7,48 @@ # * virtme-ng # * busybox-static (used by virtme-ng) # * qemu (used by virtme-ng) +# +# Namespace tests require to test the functionality of VSOCK under different +# namespace configurations. Ideally, we can use vsock_test and friends under +# the different configurations to ensure that all functionality works +# regardless of namespace setup. vsock_test also requires TCP for its control +# plane, which is also impacted by namespacing. For this reason, these tests +# build a bridge between the namespaces so that the TCP control traffic can +# flow between namespaces. The bridge setup looks as follows: +# +# +# | +# +------------------+ | +# | VM | | +# | | NS0 | NS1 +# | +------------+ | | +# | | | | --------+--------------------+ +# | | vsock_test | | | | +# | | | | <-------+-----------------+ | +# | +------------+ | | VSOCK_TEST_PORT| | +# | | | | | VSOCK +# +------------------+ | | | +# ^ | | | | +# CONTROL_PORT| | | | | +# | | | | | +# | | | | v +# | | | +------------+ +# | | TCP | | | +# | | | | vsock_test | +# | | | | | +# | | | +------------+ +# CONTROL_PORT | | | CONTROL_PORT ^ | +# | | | | | +# | v | CONTROL_PORT | v +# +-------+ | +-------+ +# | |veth0 | veth1| | +# | socat |<-------------+------------- | socat | +# | | -------------+------------> | | +# +-------+ | +-------+ +# NS_BRIDGE_PORT | NS_BRIDGE_PORT +# | + +set -u
readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)" readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../) @@ -19,11 +61,11 @@ readonly TEST_HOST_PORT=50000 readonly TEST_HOST_PORT_LISTENER=50001 readonly SSH_GUEST_PORT=22 readonly SSH_HOST_PORT=2222 -readonly VSOCK_CID=1234 +readonly BRIDGE_PORT=5678 +readonly DEFAULT_CID=1234 readonly WAIT_PERIOD=3 readonly WAIT_PERIOD_MAX=60 -readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX )) -readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) +WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX ))
# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a # control port forwarded for vsock_test. Because virtme-ng doesn't support @@ -33,23 +75,48 @@ readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) # add the kernel cmdline options that virtme-init uses to setup the interface. readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}" readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}" -readonly QEMU_OPTS="\ +readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log) +readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback) +QEMU_OPTS="\ -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \ -device virtio-net-pci,netdev=n0 \ - -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \ - --pidfile ${QEMU_PIDFILE} \ " readonly KERNEL_CMDLINE="\ virtme.dhcp net.ifnames=0 biosdevname=0 \ virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \ " readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log) -readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback) +readonly TEST_NAMES=( + vm_server_host_client + vm_client_host_server + vm_loopback + host_vsock_ns_mode + host_vsock_ns_mode_write_once + global_same_cid + local_same_cid + global_local_same_cid + local_global_same_cid + global_host_connect_global_vm + global_vm_connect_global_host + global_vm_connect_mixed_host +) + readonly TEST_DESCS=( "Run vsock_test in server mode on the VM and in client mode on the host." "Run vsock_test in client mode on the VM and in server mode on the host." "Run vsock_test using the loopback transport in the VM." + "Check /proc/net/vsock_ns_mode strings on the host." + "Check /proc/net/vsock_ns_mode is write-once on the host." + "Test that CID allocation fails with the same CID, one global NS and another global NS." + "Test that CID allocation succeeds with the same CID, one local NS and another local NS." 
+ "Test that CID allocation succeeds with the same CID, one global NS and one local NS, global allocates first." + "Test that CID allocation succeeds with the same CID, one global NS and one local NS, local allocates first." ) +readonly NEEDS_SETUP=(vm_server_host_client vm_client_host_server vm_loopback) +readonly MODES=("local" "global" "mixed") +readonly PIDFILE_TEMPLATE="/tmp/qemu_vsock_vmtest_XXXX.pid" + +declare -a PIDFILES
VERBOSE=0
@@ -84,21 +151,40 @@ die() { exit "${KSFT_FAIL}" }
+cleanup() { + terminate_pidfiles ${PIDFILES[@]} + del_namespaces +} + vm_ssh() { ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@" return $? }
-cleanup() { - if [[ -s "${QEMU_PIDFILE}" ]]; then - pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1 - fi +vm_ssh_ns() { + local ns="${1}" + local NS_EXEC="ip netns exec ${ns}" + shift
- # If failure occurred during or before qemu start up, then we need - # to clean this up ourselves. - if [[ -e "${QEMU_PIDFILE}" ]]; then - rm "${QEMU_PIDFILE}" - fi + ${NS_EXEC} ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost $* + + return $? +} + +terminate_pidfiles() { + local pidfile + + for pidfile in $@; do + if [[ -s "${pidfile}" ]]; then + pkill -SIGTERM -F ${pidfile} 2>&1 > /dev/null + fi + + # If failure occurred during or before qemu start up, then we need + # to clean this up ourselves. + if [[ -e "${pidfile}" ]]; then + rm "${pidfile}" + fi + done }
check_args() { @@ -189,7 +275,13 @@ handle_build() { }
vm_start() { + local cid=$1 + local ns=$2 + local verify_boot=${3:-1} + local pidfile=${4:-} + local logfile=/dev/null + local qemu_opts="" local verbose_opt="" local kernel_opt="" local qemu @@ -201,35 +293,53 @@ vm_start() { logfile=/dev/stdout fi
+ qemu_opts="\ + ${QEMU_OPTS} -device vhost-vsock-pci,guest-cid=${cid} \ + --pidfile ${pidfile} + " + if [[ "${BUILD}" -eq 1 ]]; then kernel_opt="${KERNEL_CHECKOUT}" fi
- vng \ + if [[ ! -z "${ns}" ]]; then + NS_EXEC="ip netns exec ${ns}" + fi + + if [[ -z "${pidfile}" ]]; then + pidfile=$(mktemp $PIDFILE_TEMPLATE) + PIDFILES+=("${pidfile}") + fi + + ${NS_EXEC} vng \ --run \ ${kernel_opt} \ ${verbose_opt} \ - --qemu-opts="${QEMU_OPTS}" \ + --qemu-opts="${qemu_opts}" \ --qemu="${qemu}" \ --user root \ --append "${KERNEL_CMDLINE}" \ --rw &> ${logfile} &
- if ! timeout ${WAIT_TOTAL} \ - bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then - die "failed to boot VM" - fi + timeout ${WAIT_TOTAL} \ + bash -c 'while [[ ! -s '"${pidfile}"' ]]; do sleep 1; done; exit 0' }
vm_wait_for_ssh() { + local ns="${1}" local i
i=0 - while true; do + while [[ true ]]; do if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then die "Timed out waiting for guest ssh" fi - if vm_ssh -- true; then + if [[ ! -z "${ns}" ]]; then + vm_ssh_ns "${ns}" -- true + else + vm_ssh -- true + fi + if [[ $? -eq 0 ]]; then break fi i=$(( i + 1 )) @@ -262,8 +372,9 @@ wait_for_listener()
vm_wait_for_listener() { local port=$1 + local host_ns=$2
- vm_ssh <<EOF + vm_ssh_ns "${host_ns}" <<EOF $(declare -f wait_for_listener) wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} EOF @@ -271,6 +382,17 @@ EOF
host_wait_for_listener() { wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" + wait_for_listener ${TEST_HOST_PORT_LISTENER} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} +} + +host_ns_wait_for_listener() { + local ns="${1}" + local port="${2}" + + ip netns exec "${ns}" bash <<-EOF + $(declare -f wait_for_listener) + wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} + EOF }
__log_stdin() { @@ -331,7 +453,7 @@ test_vm_server_host_client() { ${VSOCK_TEST} \ --mode=client \ --control-host=127.0.0.1 \ - --peer-cid="${VSOCK_CID}" \ + --peer-cid="${DEFAULT_CID}" \ --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}"
return $? @@ -343,7 +465,7 @@ test_vm_client_host_server() { ${VSOCK_TEST} \ --mode "server" \ --control-port "${TEST_HOST_PORT_LISTENER}" \ - --peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" & + --peer-cid "${DEFAULT_CID}" 2>&1 | log_host "${testname}" &
host_wait_for_listener
@@ -376,6 +498,309 @@ test_vm_loopback() { return $? }
+add_namespaces() { + local init=${1:-0} + + for mode in "${MODES[@]}"; do + if ! ip netns add "${mode}"; then + return ${KSFT_FAIL} + fi + + # e.g., global-2, local-2, mixed-2 + if ! ip netns add "${mode}-2"; then + return ${KSFT_FAIL} + fi + + if [[ ${init} -eq 1 ]]; then + ns_set_mode "${mode}" "${mode}" + ns_set_mode "${mode}-2" "${mode}" + + # we need lo for qemu port forwarding + ip netns exec "${mode}" ip link set dev lo up + ip netns exec "${mode}-2" ip link set dev lo up + fi + done + return 0 +} + +del_namespaces() { + for mode in "${MODES[@]}"; do + ip netns del "${mode}" + ip netns del "${mode}-2" + done &>/dev/null +} + +ns_set_mode() { + local ns=$1 + local mode=$2 + + echo "${mode}" \ + | ip netns exec "${ns}" \ + tee /proc/net/vsock_ns_mode &>/dev/null +} + +setup_bridge() { + local ns0 + local ns1 + local addr1 + + ns0=$1 + ns1=$2 + + ip link add veth0 type veth peer name veth1 + ip link set veth0 netns "${ns0}" + ip link set veth1 netns "${ns1}" + ip netns exec "${ns0}" ip addr add 10.0.0.1/24 dev veth0 + ip netns exec "${ns1}" ip addr add 10.0.0.2/24 dev veth1 + ip netns exec "${ns0}" ip link set veth0 up + ip netns exec "${ns1}" ip link set veth1 up +} + +teardown_bridge() { + local ns0="${1}" + + # veth1 is implicitly destroyed with veth0 + ip netns exec "${ns0}" ip link delete veth0 +} + +test_host_vsock_ns_mode() { + if ! add_namespaces; then + return ${KSFT_FAIL} + fi + + for mode in "${MODES[@]}"; do + if ! ns_set_mode "${mode}" "${mode}"; then + del_namespaces + return ${KSFT_FAIL} + fi + done + + if ! del_namespaces; then + return ${KSFT_FAIL} + fi +} + +test_host_vsock_ns_mode_write_once() { + if ! add_namespaces; then + return ${KSFT_FAIL} + fi + + for mode in "${MODES[@]}"; do + if ! ns_set_mode "${mode}" "${mode}"; then + del_namespaces + return ${KSFT_FAIL} + fi + + # try setting back to global, should fail + if ns_set_mode "${mode}" "global"; then + del_namespaces + return ${KSFT_FAIL} + fi + done + + if ! 
del_namespaces; then + return ${KSFT_FAIL} + fi +} + +namespaces_can_boot_same_cid() { + local ns1=$1 + local ns2=$2 + local cid=20 + local pidfile1 + local pidfile2 + local msg + + if ! add_namespaces 1; then + return 1 + fi + + if [[ ${VERBOSE} -gt 0 ]]; then + echo "booting vm 1" | tap_prefix + fi + + pidfile1=$(mktemp $PIDFILE_TEMPLATE) + PIDFILES+=("${pidfile1}") + vm_start ${cid} ${ns1} ${pidfile1} + + if [[ ${VERBOSE} -gt 0 ]]; then + echo "booting vm 2" | tap_prefix + fi + + pidfile2=$(mktemp $PIDFILE_TEMPLATE) + PIDFILES+=("${pidfile2}") + WAIT_TOTAL=30 vm_start ${cid} ${ns2} ${pidfile2} + + rc=$? + if [[ $rc -eq 0 ]]; then + msg="successfully booted" + rc=0 + else + msg="failed to boot" + rc=1 + fi + + if [[ ${VERBOSE} -gt 0 ]]; then + echo "vm 2 ${msg}" | tap_prefix + fi + if ! del_namespaces; then + echo "failed to delete namespaces" | tap_prefix + fi + + terminate_pidfiles ${pidfile1} ${pidfile2} + return $rc +} + +test_global_same_cid() { + if namespaces_can_boot_same_cid "global" "global-2"; then + return $KSFT_FAIL + fi + + return $KSFT_PASS +} + +test_local_global_same_cid() { + if namespaces_can_boot_same_cid "local" "global"; then + return $KSFT_PASS + fi + + return $KSFT_FAIL +} + +test_global_local_same_cid() { + if namespaces_can_boot_same_cid "global" "local"; then + return $KSFT_PASS + fi + + return $KSFT_FAIL +} + +test_local_same_cid() { + if namespaces_can_boot_same_cid "local" "local"; then + return $KSFT_FAIL + fi + + return $KSFT_PASS +} + +test_global_host_connect_global_vm() { + local testname="${FUNCNAME[0]#test_}" + local cid=${DEFAULT_CID} + local port=1234 + local host_ns="global" + local host_ns2="global-2" + + add_namespaces 1 + setup_bridge "${host_ns}" "${host_ns2}" + + # Start server in VM in namespace + if ! 
vm_start ${cid} "${host_ns}"; then + teardown_bridge "${host_ns}" + return $KSFT_FAIL + fi + + vm_ssh_ns "${host_ns}" \ + -- "${VSOCK_TEST}" \ + --mode=server \ + --control-port="${TEST_GUEST_PORT}" \ + --peer-cid=2 \ + 2>&1 | log_guest "${testname}" & + vm_wait_for_listener ${TEST_GUEST_PORT} "${host_ns}" + + # Setup NS-to-NS "bridge" + ip netns exec "${host_ns}" socat TCP-LISTEN:${BRIDGE_PORT},fork \ + TCP-CONNECT:localhost:${TEST_HOST_PORT} & + host_ns_wait_for_listener "${host_ns}" "${BRIDGE_PORT}" + + ip netns exec "${host_ns2}" \ + socat TCP:10.0.0.1:${BRIDGE_PORT} TCP-LISTEN:${TEST_HOST_PORT},fork & + host_ns_wait_for_listener "${host_ns2}" "${TEST_HOST_PORT}" + + # Start client in other namespace + ip netns exec "${host_ns2}" ${VSOCK_TEST} \ + --mode=client \ + --control-host=127.0.0.1 \ + --peer-cid="${cid}" \ + --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}" + rc=$? + + if [[ ! $rc -eq 0 ]]; then + return $KSFT_FAIL + fi + + del_namespaces + + return $KSFT_PASS +} + +do_ns_vm_client_host_server_test() { + local testname="$1" + local host_ns="$2" + local host_ns2="$3" + local cid=${DEFAULT_CID} + + # must not be same as qemu hostfwd port + local port=12345 + + add_namespaces 1 + setup_bridge "${host_ns}" "${host_ns2}" + + if ! 
vm_start ${cid} "${host_ns}"; then + teardown_bridge "${host_ns}" + return $KSFT_FAIL + fi + + ip netns exec "${host_ns2}" ${VSOCK_TEST} \ + --mode=server \ + --peer-cid="${cid}" \ + --control-port="${port}" 2>&1 | log_host "${testname}" & + + host_ns_wait_for_listener "${host_ns2}" "${port}" + + ip netns exec "${host_ns2}" \ + socat TCP-LISTEN:${BRIDGE_PORT},bind=10.0.0.2,fork \ + TCP:localhost:${port} & + + host_ns_wait_for_listener "${host_ns2}" "${BRIDGE_PORT}" + + ip netns exec "${host_ns}" socat TCP-LISTEN:${port},fork \ + TCP-CONNECT:10.0.0.2:${BRIDGE_PORT} & + + host_ns_wait_for_listener "${host_ns}" "${port}" + + vm_ssh_ns "${host_ns}" \ + -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host=10.0.2.2 \ + --control-port="${port}" \ + --peer-cid=2 \ + 2>&1 | log_guest "${testname}" + + if [[ ! $? -eq 0 ]]; then + return $KSFT_FAIL + fi + + del_namespaces + + return $KSFT_PASS +} + +test_global_vm_connect_global_host() { + local testname="${FUNCNAME[0]#test_}" + local host_ns="global" + local host_ns2="global-2" + + do_ns_vm_client_host_server_test ${testname} ${host_ns} ${host_ns2} +} + +test_global_vm_connect_mixed_host() { + local testname="${FUNCNAME[0]#test_}" + local host_ns="global" + local host_ns2="mixed" + + do_ns_vm_client_host_server_test ${testname} ${host_ns} ${host_ns2} +} + run_test() { local host_oops_cnt_before local host_warn_cnt_before @@ -421,7 +846,40 @@ run_test() { rc=$KSFT_FAIL fi
- return "${rc}" + check_result "${rc}" +} + +needs_setup() { + local tname + + tname="$1" + + for testname in ${NEEDS_SETUP[@]}; do + if [[ "${tname}" == "${testname}" ]]; then + return 1 + fi + done + + return 0 +} + +check_result() { + local rc + + rc=$1 + + if [[ ${rc} -eq $KSFT_PASS ]]; then + cnt_pass=$(( cnt_pass + 1 )) + echo "ok ${cnt_total} ${arg}" + elif [[ ${rc} -eq $KSFT_SKIP ]]; then + cnt_skip=$(( cnt_skip + 1 )) + echo "ok ${cnt_total} ${arg} # SKIP" + elif [[ ${rc} -eq $KSFT_FAIL ]]; then + cnt_fail=$(( cnt_fail + 1 )) + echo "not ok ${cnt_total} ${arg} # exit=$rc" + fi + + cnt_total=$(( cnt_total + 1 )) }
QEMU="qemu-system-$(uname -m)" @@ -452,29 +910,36 @@ handle_build
echo "1..${#ARGS[@]}"
-log_setup "Booting up VM" -vm_start -vm_wait_for_ssh -log_setup "VM booted up" - cnt_pass=0 cnt_fail=0 cnt_skip=0 cnt_total=0 +setup_done=0 + +pidfile="" +for arg in ${ARGS[@]}; do + if needs_setup "${arg}"; then + if [[ -z "${pidfile}" ]]; then + pidfile=$(mktemp $PIDFILE_TEMPLATE) + log_setup "Booting up VM" + vm_start "${DEFAULT_CID}" "" "${pidfile}" + vm_wait_for_ssh + log_setup "VM booted up" + fi + + run_test "${arg}" + fi +done + +if [[ ! -z "${pidfile}" ]]; then + log_setup "VM terminate" + terminate_pidfiles "${pidfile}" +fi + for arg in "${ARGS[@]}"; do - run_test "${arg}" - rc=$? - if [[ ${rc} -eq $KSFT_PASS ]]; then - cnt_pass=$(( cnt_pass + 1 )) - echo "ok ${cnt_total} ${arg}" - elif [[ ${rc} -eq $KSFT_SKIP ]]; then - cnt_skip=$(( cnt_skip + 1 )) - echo "ok ${cnt_total} ${arg} # SKIP" - elif [[ ${rc} -eq $KSFT_FAIL ]]; then - cnt_fail=$(( cnt_fail + 1 )) - echo "not ok ${cnt_total} ${arg} # exit=$rc" + if ! needs_setup "${arg}"; then + run_test "${arg}" fi - cnt_total=$(( cnt_total + 1 )) done
echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add the per-net vsock NS mode state. This only adds the structure that holds the mode and some related definitions; it does not integrate the functionality yet.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 MAINTAINERS                 |  1 +
 include/net/net_namespace.h |  4 ++++
 include/net/netns/vsock.h   | 19 +++++++++++++++++++
 3 files changed, 24 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 507c5ff6f620..bf9015498854 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -26149,6 +26149,7 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/vhost/vsock.c
 F:	include/linux/virtio_vsock.h
+F:	include/net/netns/vsock.h
 F:	include/uapi/linux/virtio_vsock.h
 F:	net/vmw_vsock/virtio_transport.c
 F:	net/vmw_vsock/virtio_transport_common.c
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 025a7574b275..005c0da4fb62 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -37,6 +37,7 @@
 #include <net/netns/smc.h>
 #include <net/netns/bpf.h>
 #include <net/netns/mctp.h>
+#include <net/netns/vsock.h>
 #include <net/net_trackers.h>
 #include <linux/ns_common.h>
 #include <linux/idr.h>
@@ -196,6 +197,9 @@ struct net {
 	/* Move to a better place when the config guard is removed. */
 	struct mutex		rtnl_mutex;
 #endif
+#if IS_ENABLED(CONFIG_VSOCKETS)
+	struct netns_vsock	vsock;
+#endif
 } __randomize_layout;
 
 #include <linux/seq_file_net.h>
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
new file mode 100644
index 000000000000..ea14b46ed437
--- /dev/null
+++ b/include/net/netns/vsock.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_NET_NAMESPACE_VSOCK_H
+#define __NET_NET_NAMESPACE_VSOCK_H
+
+#include <linux/types.h>
+
+// TODO: rename to VSOCK_NET_* ?
+#define VSOCK_NS_MODE_GLOBAL		1
+#define VSOCK_NS_MODE_LOCAL		(1 << 1)
+#define VSOCK_NS_MODE_INVALID		(~0)
+/* VSOCK_NS_MODE_WRITTEN_ONCE indicates "write-once" write has occurred */
+#define VSOCK_NS_MODE_WRITTEN_ONCE	(1 << 7)
+
+struct netns_vsock {
+	struct ctl_table_header *vsock_hdr;
+	spinlock_t lock;
+	u8 ns_mode;
+};
+#endif /* __NET_NET_NAMESPACE_VSOCK_H */
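A single u8 holds both the mode bit and the write-once flag. The sketch below shows how the procfs show handler added later in the series appears to turn that byte into a string, ignoring the flag. ns_mode_name() is a hypothetical helper; only the three constants are copied from the header above. One consequence worth checking: VSOCK_NS_MODE_INVALID is (~0), which becomes 0xff when stored in the u8 ns_mode. That value has the GLOBAL bit set, so it would be reported as "global" rather than "invalid".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the proposed include/net/netns/vsock.h. */
#define VSOCK_NS_MODE_GLOBAL       1
#define VSOCK_NS_MODE_LOCAL        (1 << 1)
#define VSOCK_NS_MODE_WRITTEN_ONCE (1 << 7)

/* Hypothetical helper: pick the name from the mode bits only; the
 * write-once flag shares the byte but does not change the name. */
static const char *ns_mode_name(uint8_t ns_mode)
{
	if (ns_mode & VSOCK_NS_MODE_GLOBAL)
		return "global";
	if (ns_mode & VSOCK_NS_MODE_LOCAL)
		return "local";
	return "invalid";
}
```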
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add helper functions for setting and getting vsock NS modes, in preparation for adding NS support to vsock.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 include/net/af_vsock.h | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d56e6e135158..e0b9e6732d53 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -10,6 +10,7 @@
 
 #include <linux/kernel.h>
 #include <linux/workqueue.h>
+#include <net/netns/vsock.h>
 #include <net/sock.h>
 #include <uapi/linux/vm_sockets.h>
 
@@ -256,4 +257,49 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
 {
 	return t->msgzerocopy_allow && t->msgzerocopy_allow();
 }
+
+extern struct net __vsock_global_net;
+static inline struct net *vsock_global_net(void)
+{
+	return &__vsock_global_net;
+}
+
+static inline u8 vsock_net_mode(struct net *net)
+{
+	u8 ret;
+
+	spin_lock_bh(&net->vsock.lock);
+	ret = net->vsock.ns_mode;
+	spin_unlock_bh(&net->vsock.lock);
+	return ret;
+}
+
+static inline void vsock_net_set_mode(struct net *net, u8 mode)
+{
+	spin_lock_bh(&net->vsock.lock);
+	net->vsock.ns_mode = mode | VSOCK_NS_MODE_WRITTEN_ONCE;
+	spin_unlock_bh(&net->vsock.lock);
+}
+
+/* Return true if the mode has not been written yet. Otherwise, return false. */
+static inline bool vsock_net_mode_can_set(struct net *net)
+{
+	bool ret;
+
+	spin_lock_bh(&net->vsock.lock);
+	ret = !(net->vsock.ns_mode & VSOCK_NS_MODE_WRITTEN_ONCE);
+	spin_unlock_bh(&net->vsock.lock);
+	return ret;
+}
+
+/* Return true if vsock net mode check passes. Otherwise, return false.
+ *
+ * Read more about modes in comment header of net/vmw_vsock/af_vsock.c.
+ */
+static inline bool vsock_net_check_mode(struct net *n1, struct net *n2)
+{
+	return net_eq(n1, n2) ||
+	       (vsock_net_mode(n1) & VSOCK_NS_MODE_GLOBAL &&
+		vsock_net_mode(n2) & VSOCK_NS_MODE_GLOBAL);
+}
 #endif /* __AF_VSOCK_H__ */
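The write-once behavior of these helpers can be modeled in userspace as follows. The model is single-threaded with no locking, and the model_* names are hypothetical. One design note: in this series, the procfs write handler calls vsock_net_mode_can_set() and vsock_net_set_mode() as two separate critical sections, so two concurrent writers could both pass the check. The sketch performs the check and the set in one step, which is what a single lock acquisition would provide in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define VSOCK_NS_MODE_GLOBAL       1
#define VSOCK_NS_MODE_LOCAL        (1 << 1)
#define VSOCK_NS_MODE_WRITTEN_ONCE (1 << 7)

/* Single-threaded stand-in for struct netns_vsock (no spinlock). */
struct model_netns_vsock {
	uint8_t ns_mode;
};

/* True while no mode has been written yet. */
static bool model_mode_can_set(const struct model_netns_vsock *v)
{
	assert(v);
	return !(v->ns_mode & VSOCK_NS_MODE_WRITTEN_ONCE);
}

/* Check and set in one step: refuse a second write, otherwise record
 * the requested mode together with the write-once flag. */
static int model_mode_write(struct model_netns_vsock *v, uint8_t mode)
{
	if (!model_mode_can_set(v))
		return -EPERM;
	v->ns_mode = mode | VSOCK_NS_MODE_WRITTEN_ONCE;
	return 0;
}
```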
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add a net pointer to the vsock skb and helpers for getting/setting it. This is in preparation for adding vsock NS support.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 include/linux/virtio_vsock.h | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 36fb3edfa403..93edc1e798a5 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -13,6 +13,7 @@ struct virtio_vsock_skb_cb {
 	bool reply;
 	bool tap_delivered;
 	u32 offset;
+	struct net *net;
 };
 
 #define VIRTIO_VSOCK_SKB_CB(skb) ((struct virtio_vsock_skb_cb *)((skb)->cb))
@@ -111,6 +112,16 @@ static inline size_t virtio_vsock_skb_len(struct sk_buff *skb)
 	return (size_t)(skb_end_pointer(skb) - skb->head);
 }
 
+static inline struct net *virtio_vsock_skb_net(struct sk_buff *skb)
+{
+	return VIRTIO_VSOCK_SKB_CB(skb)->net;
+}
+
+static inline void virtio_vsock_skb_set_net(struct sk_buff *skb, struct net *net)
+{
+	VIRTIO_VSOCK_SKB_CB(skb)->net = net;
+}
+
 #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
 #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
 #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
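Because struct virtio_vsock_skb_cb lives in the skb's control buffer, every new field must still fit there. The sketch below models that constraint with userspace stand-ins, assuming the 48-byte, 8-byte-aligned cb[] of struct sk_buff. struct model_skb and the model_* helpers are hypothetical. In the kernel, the same size check could be made at build time with BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque, as in the kernel: only pointers to it are stored. */
struct net;

/* Hypothetical stand-in for struct sk_buff, keeping only its control buffer. */
struct model_skb {
	_Alignas(8) char cb[48];
};

/* Same fields as the patched struct virtio_vsock_skb_cb. */
struct model_virtio_vsock_skb_cb {
	bool reply;
	bool tap_delivered;
	uint32_t offset;
	struct net *net;
};

/* Fail the build if the per-skb vsock state outgrows the control buffer. */
_Static_assert(sizeof(struct model_virtio_vsock_skb_cb) <=
	       sizeof(((struct model_skb *)0)->cb),
	       "vsock skb cb does not fit in skb->cb");

#define MODEL_SKB_CB(skb) ((struct model_virtio_vsock_skb_cb *)((skb)->cb))

static inline struct net *model_skb_net(struct model_skb *skb)
{
	return MODEL_SKB_CB(skb)->net;
}

static inline void model_skb_set_net(struct model_skb *skb, struct net *net)
{
	assert(skb);
	MODEL_SKB_CB(skb)->net = net;
}
```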
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add NS functionality (initialization, passing the namespace to transports, procfs, etc.) to the vsock socket layer. Later patches that add NS support to the transports depend on this patch.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 include/net/af_vsock.h   |   7 +-
 net/vmw_vsock/af_vsock.c | 203 +++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 193 insertions(+), 17 deletions(-)
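After this patch, a bound-socket lookup succeeds only when the address matches and the two namespaces pass the mode check. The sketch below is a userspace model of __vsock_find_bound_socket(). It uses a flat array instead of the bind hash table, and the model_* types are hypothetical. MODEL_CID_ANY mirrors VMADDR_CID_ANY (-1U). An address matches either exactly, or by port when either side uses the CID wildcard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CID_ANY  UINT32_MAX	/* mirrors VMADDR_CID_ANY (-1U) */
#define NS_MODE_GLOBAL 1
#define NS_MODE_LOCAL  (1 << 1)

struct model_net { int id; uint8_t mode; };
struct model_addr { uint32_t cid; uint32_t port; };
struct model_sock {
	struct model_addr local;
	const struct model_net *net;
};

/* Same namespace, or both namespaces global. */
static bool net_check_mode(const struct model_net *a, const struct model_net *b)
{
	assert(a && b);
	return a == b || ((a->mode & NS_MODE_GLOBAL) && (b->mode & NS_MODE_GLOBAL));
}

static struct model_sock *find_bound(struct model_sock *socks, size_t n,
				     const struct model_addr *addr,
				     const struct model_net *net)
{
	for (size_t i = 0; i < n; i++) {
		struct model_sock *s = &socks[i];
		bool exact = s->local.cid == addr->cid &&
			     s->local.port == addr->port;
		bool wildcard = s->local.port == addr->port &&
				(s->local.cid == MODEL_CID_ANY ||
				 addr->cid == MODEL_CID_ANY);

		/* an address match only counts if the namespaces can see each other */
		if ((exact || wildcard) && net_check_mode(net, s->net))
			return s;
	}
	return NULL;
}
```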
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index e0b9e6732d53..1ba1c30b625d 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -144,7 +144,7 @@ struct vsock_transport { int flags); int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg, size_t len); - bool (*seqpacket_allow)(u32 remote_cid); + bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid); u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
/* Notification. */ @@ -214,9 +214,10 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected); void vsock_insert_connected(struct vsock_sock *vsk); void vsock_remove_bound(struct vsock_sock *vsk); void vsock_remove_connected(struct vsock_sock *vsk); -struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr); +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net); struct sock *vsock_find_connected_socket(struct sockaddr_vm *src, - struct sockaddr_vm *dst); + struct sockaddr_vm *dst, + struct net *net); void vsock_remove_sock(struct vsock_sock *vsk); void vsock_for_each_connected_socket(struct vsock_transport *transport, void (*fn)(struct sock *sk)); diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 2e7a3034e965..bec7e7aae956 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -83,6 +83,24 @@ * TCP_ESTABLISHED - connected * TCP_CLOSING - disconnecting * TCP_LISTEN - listening + * + * - Namespaces in vsock support two different modes configured + * through /proc/net/vsock_ns_mode. The modes are "local" and "global". + * Each mode defines how the namespace interacts with CIDs. + * /proc/net/vsock_ns_mode is write-once, so that it may be configured + * by a namespace manager. The default is "global". The mode is set + * per-namespace. + * + * The modes affect the allocation and accessibility of CIDs as follows: + * - global - aka fully public + * - CID allocation draws from the public pool + * - AF_VSOCK sockets may reach any CID allocated from the public pool + * - AF_VSOCK sockets may not reach CIDs allocated from private pools + * + * - local - aka fully private + * - CID allocation draws only from the private pool, does not affect public pool + * - AF_VSOCK sockets may only reach CIDs from the private pool + * - AF_VSOCK sockets may not reach CIDs allocated from outside the pool */
#include <linux/compat.h> @@ -100,6 +118,7 @@ #include <linux/module.h> #include <linux/mutex.h> #include <linux/net.h> +#include <linux/proc_fs.h> #include <linux/poll.h> #include <linux/random.h> #include <linux/skbuff.h> @@ -111,6 +130,7 @@ #include <linux/workqueue.h> #include <net/sock.h> #include <net/af_vsock.h> +#include <net/netns/vsock.h> #include <uapi/linux/vm_sockets.h> #include <uapi/asm-generic/ioctls.h>
@@ -149,6 +169,9 @@ static const struct vsock_transport *transport_dgram; static const struct vsock_transport *transport_local; static DEFINE_MUTEX(vsock_register_mutex);
+struct net __vsock_global_net; +EXPORT_SYMBOL_GPL(__vsock_global_net); + /**** UTILS ****/
/* Each bound VSocket is stored in the bind hash table and each connected @@ -235,33 +258,42 @@ static void __vsock_remove_connected(struct vsock_sock *vsk) sock_put(&vsk->sk); }
-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr) +static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr, + struct net *net) { struct vsock_sock *vsk;
list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) { + struct sock *sk = sk_vsock(vsk); + if (vsock_addr_equals_addr(addr, &vsk->local_addr)) - return sk_vsock(vsk); + if (vsock_net_check_mode(net, sock_net(sk))) + return sk;
if (addr->svm_port == vsk->local_addr.svm_port && (vsk->local_addr.svm_cid == VMADDR_CID_ANY || - addr->svm_cid == VMADDR_CID_ANY)) - return sk_vsock(vsk); + addr->svm_cid == VMADDR_CID_ANY) && + vsock_net_check_mode(net, sock_net(sk))) + return sk; }
return NULL; }
static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src, - struct sockaddr_vm *dst) + struct sockaddr_vm *dst, + struct net *net) { struct vsock_sock *vsk;
list_for_each_entry(vsk, vsock_connected_sockets(src, dst), connected_table) { + struct sock *sk = sk_vsock(vsk); + if (vsock_addr_equals_addr(src, &vsk->remote_addr) && - dst->svm_port == vsk->local_addr.svm_port) { - return sk_vsock(vsk); + dst->svm_port == vsk->local_addr.svm_port && + vsock_net_check_mode(net, sock_net(sk))) { + return sk; } }
@@ -304,12 +336,12 @@ void vsock_remove_connected(struct vsock_sock *vsk) } EXPORT_SYMBOL_GPL(vsock_remove_connected);
-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr) +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net) { struct sock *sk;
spin_lock_bh(&vsock_table_lock); - sk = __vsock_find_bound_socket(addr); + sk = __vsock_find_bound_socket(addr, net); if (sk) sock_hold(sk);
@@ -320,12 +352,13 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr) EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
struct sock *vsock_find_connected_socket(struct sockaddr_vm *src, - struct sockaddr_vm *dst) + struct sockaddr_vm *dst, + struct net *net) { struct sock *sk;
spin_lock_bh(&vsock_table_lock); - sk = __vsock_find_connected_socket(src, dst); + sk = __vsock_find_connected_socket(src, dst, net); if (sk) sock_hold(sk);
@@ -513,7 +546,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 
 	if (sk->sk_type == SOCK_SEQPACKET) {
 		if (!new_transport->seqpacket_allow ||
-		    !new_transport->seqpacket_allow(remote_cid)) {
+		    !new_transport->seqpacket_allow(vsk, remote_cid)) {
 			module_put(new_transport->module);
 			return -ESOCKTNOSUPPORT;
 		}
@@ -644,6 +677,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
 {
 	static u32 port;
 	struct sockaddr_vm new_addr;
+	struct net *net = sock_net(sk_vsock(vsk));
 
 	if (!port)
 		port = get_random_u32_above(LAST_RESERVED_PORT);
@@ -660,7 +694,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
 
 			new_addr.svm_port = port++;
 
-			if (!__vsock_find_bound_socket(&new_addr)) {
+			if (!__vsock_find_bound_socket(&new_addr, net)) {
 				found = true;
 				break;
 			}
@@ -677,7 +711,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
 			return -EACCES;
 		}
 
-		if (__vsock_find_bound_socket(&new_addr))
+		if (__vsock_find_bound_socket(&new_addr, net))
 			return -EADDRINUSE;
 	}
 
@@ -2588,6 +2622,138 @@ static struct miscdevice vsock_device = {
 	.fops = &vsock_device_ops,
 };
 
+#define VSOCK_NS_MODE_NAME_MAX 8
+
+static struct ctl_table vsock_table[] = {
+	{
+		.procname = "vsock_ns_mode",
+		.data = &init_net.vsock.ns_mode,
+		.maxlen = sizeof(u8),
+		.mode = 0644,
+		.proc_handler = proc_dostring
+	},
+};
+
+static int __net_init vsock_sysctl_register(struct net *net)
+{
+	struct ctl_table *table;
+
+	if (net_eq(net, &init_net)) {
+		table = vsock_table;
+	} else {
+		table = kmemdup(vsock_table, sizeof(vsock_table), GFP_KERNEL);
+		if (!table)
+			goto err_alloc;
+
+		table[0].data = &net->vsock.ns_mode;
+	}
+
+	net->vsock.vsock_hdr = register_net_sysctl_sz(net, "net/vsock", table,
+						      ARRAY_SIZE(vsock_table));
+	if (!net->vsock.vsock_hdr)
+		goto err_reg;
+
+	return 0;
+
+err_reg:
+	if (!net_eq(net, &init_net))
+		kfree(table);
+err_alloc:
+	return -ENOMEM;
+}
+
+static void vsock_sysctl_unregister(struct net *net)
+{
+	const struct ctl_table *table;
+
+	table = net->vsock.vsock_hdr->ctl_table_arg;
+	unregister_net_sysctl_table(net->vsock.vsock_hdr);
+	if (!net_eq(net, &init_net))
+		kfree(table);
+}
+
+#ifdef CONFIG_PROC_FS
+static int vsock_proc_ns_mode_show(struct seq_file *seq, void *v)
+{
+	struct net *net = seq_file_single_net(seq);
+	const char *p = "invalid";
+
+	spin_lock_bh(&net->vsock.lock);
+	if (net->vsock.ns_mode & VSOCK_NS_MODE_GLOBAL)
+		p = "global";
+	else if (net->vsock.ns_mode & VSOCK_NS_MODE_LOCAL)
+		p = "local";
+	else
+		WARN_ONCE(1, "invalid vsock_ns_mode");
+	spin_unlock_bh(&net->vsock.lock);
+	seq_printf(seq, "%s", p);
+	return 0;
+}
+
+static int vsock_proc_ns_mode_write(struct file *file, char *buf, size_t size)
+{
+	struct seq_file *m = file->private_data;
+	struct net *net = seq_file_single_net(m);
+	size_t len = size - 1;
+	int ret = 0;
+	u8 mode;
+
+	if (!vsock_net_mode_can_set(net))
+		return -EPERM;
+
+	mode = 0;
+	if (!strncmp(buf, "global", len))
+		mode |= VSOCK_NS_MODE_GLOBAL;
+	else if (!strncmp(buf, "local", len))
+		mode |= VSOCK_NS_MODE_LOCAL;
+	else
+		return -EINVAL;
+
+	vsock_net_set_mode(net, mode);
+
+	return ret;
+}
+#endif /* CONFIG_PROC_FS */
+
+static void vsock_net_init(struct net *net)
+{
+	spin_lock_init(&net->vsock.lock);
+	net->vsock.ns_mode = VSOCK_NS_MODE_GLOBAL;
+}
+
+static __net_init int vsock_sysctl_init_net(struct net *net)
+{
+	vsock_net_init(net);
+
+	if (vsock_sysctl_register(net))
+		goto out;
+
+#ifdef CONFIG_PROC_FS
+	if (!proc_create_net_single_write("vsock_ns_mode", 0644, net->proc_net,
+					  vsock_proc_ns_mode_show,
+					  vsock_proc_ns_mode_write,
+					  NULL))
+		goto err_sysctl;
+#endif
+
+	return 0;
+
+err_sysctl:
+	vsock_sysctl_unregister(net);
+out:
+	return -ENOMEM;
+}
+
+static __net_exit void vsock_sysctl_exit_net(struct net *net)
+{
+	vsock_sysctl_unregister(net);
+}
+
+static struct pernet_operations vsock_sysctl_ops __net_initdata = {
+	.init = vsock_sysctl_init_net,
+	.exit = vsock_sysctl_exit_net,
+};
+
 static int __init vsock_init(void)
 {
 	int err = 0;
@@ -2615,10 +2781,19 @@ static int __init vsock_init(void)
 		goto err_unregister_proto;
 	}
 
+	if (register_pernet_subsys(&vsock_sysctl_ops)) {
+		err = -ENOMEM;
+		goto err_unregister_sock;
+	}
+
+	vsock_net_init(&init_net);
+	vsock_net_init(vsock_global_net());
 	vsock_bpf_build_proto();
 
 	return 0;
 
+err_unregister_sock:
+	sock_unregister(AF_VSOCK);
 err_unregister_proto:
 	proto_unregister(&vsock_proto);
 err_deregister_misc:
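vsock_proc_ns_mode_write() above bounds strncmp() by the input length (len = size - 1). As a result, a write of a bare prefix such as "g\n" would also select global mode. The following is a minimal sketch of a stricter, exact-match parse. parse_ns_mode() and the MODE_* values are hypothetical: the series' real VSOCK_NS_MODE_* values are not shown in this patch. The sketch also assumes that at most one trailing newline is present, as produced by `echo`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode values; the series' VSOCK_NS_MODE_* values are not
 * shown in this excerpt. */
enum { MODE_INVALID = -1, MODE_GLOBAL = 1, MODE_LOCAL = 2 };

/*
 * Parse 'size' bytes written to a mode file. One trailing newline
 * (as written by `echo local > ...`) is ignored. The remaining bytes
 * must equal a mode name exactly, so prefixes such as "g" are rejected.
 */
static int parse_ns_mode(const char *buf, size_t size)
{
	size_t len = size;

	if (len > 0 && buf[len - 1] == '\n')
		len--;

	if (len == strlen("global") && memcmp(buf, "global", len) == 0)
		return MODE_GLOBAL;
	if (len == strlen("local") && memcmp(buf, "local", len) == 0)
		return MODE_LOCAL;
	return MODE_INVALID;
}
```

Comparing lengths first means memcmp() never reads past the caller's bytes, and an empty write (or a lone newline) falls through to MODE_INVALID.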
From: Bobby Eshleman bobbyeshleman@meta.com
Add support to the virtio common code for passing net namespace pointers around (tx and rx). Non-common transports still need their own support, which later patches in this series add.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
 include/linux/virtio_vsock.h            |  1 +
 net/vmw_vsock/virtio_transport_common.c | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 93edc1e798a5..81355f84b76c 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -160,6 +160,7 @@ struct virtio_vsock_pkt_info {
 	u32 remote_cid, remote_port;
 	struct vsock_sock *vsk;
 	struct msghdr *msg;
+	struct net *net;
 	u32 pkt_len;
 	u16 type;
 	u16 op;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 1b5d9896edae..310f2e92c527 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -313,6 +313,8 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
 					 info->flags,
 					 zcopy);
 
+	virtio_vsock_skb_set_net(skb, info->net);
+
 	return skb;
 out:
 	kfree_skb(skb);
@@ -524,6 +526,7 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk)
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_CREDIT_UPDATE,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1064,6 +1067,7 @@ int virtio_transport_connect(struct vsock_sock *vsk)
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_REQUEST,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1079,6 +1083,7 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
 			 (mode & SEND_SHUTDOWN ?
 			  VIRTIO_VSOCK_SHUTDOWN_SEND : 0),
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1105,6 +1110,7 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,
 		.msg = msg,
 		.pkt_len = len,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1142,6 +1148,7 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
 		.op = VIRTIO_VSOCK_OP_RST,
 		.reply = !!skb,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	/* Send RST only if the original pkt is not a RST pkt */
@@ -1162,6 +1169,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 		.op = VIRTIO_VSOCK_OP_RST,
 		.type = le16_to_cpu(hdr->type),
 		.reply = true,
+		.net = virtio_vsock_skb_net(skb),
 	};
 	struct sk_buff *reply;
 
@@ -1462,6 +1470,7 @@ virtio_transport_send_response(struct vsock_sock *vsk,
 		.remote_port = le32_to_cpu(hdr->src_port),
 		.reply = true,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
 	};
 
 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1576,6 +1585,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 			       struct sk_buff *skb)
 {
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
+	struct net *net = virtio_vsock_skb_net(skb);
 	struct sockaddr_vm src, dst;
 	struct vsock_sock *vsk;
 	struct sock *sk;
@@ -1603,9 +1613,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 	/* The socket must be in connected or bound table
 	 * otherwise send reset back
 	 */
-	sk = vsock_find_connected_socket(&src, &dst);
+	sk = vsock_find_connected_socket(&src, &dst, net);
 	if (!sk) {
-		sk = vsock_find_bound_socket(&dst);
+		sk = vsock_find_bound_socket(&dst, net);
 		if (!sk) {
 			(void)virtio_transport_reset_no_sock(t, skb);
 			goto free_pkt;
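The pattern in this patch has two halves. On tx, every outgoing packet is stamped with the sender's namespace (.net = sock_net(sk_vsock(vsk))). On rx, virtio_vsock_skb_net() recovers that namespace and passes it to the socket lookups. The following is a userspace sketch of that flow. It rests on several assumptions. struct net, struct pkt and struct bound_sock are hypothetical stand-ins for the kernel's struct net, sk_buff and bound-socket table. For brevity, the lookup matches namespaces by identity only (local-mode behavior) and ignores global mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net, sk_buff and a bound vsock. */
struct net {
	int id;
};

struct pkt {
	struct net *net;	/* namespace tag, like the skb cb field */
	unsigned int dst_port;
};

struct bound_sock {
	struct net *net;	/* namespace the socket was created in */
	unsigned int port;
};

/* tx side: tag the packet with the sending socket's namespace. */
static struct pkt pkt_build(const struct bound_sock *sender,
			    unsigned int dst_port)
{
	struct pkt p = { .net = sender->net, .dst_port = dst_port };

	return p;
}

/* rx side: only sockets in the packet's namespace are candidates. */
static const struct bound_sock *find_bound(const struct bound_sock *tbl,
					   size_t n, const struct pkt *p)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].port == p->dst_port && tbl[i].net == p->net)
			return &tbl[i];
	return NULL;
}
```

Two sockets can then share a port in different namespaces, and a packet tagged with namespace b only reaches the one in b. Guest transports that stay global (virtio, hyperv, vmci) simply use vsock_global_net() as the tag, as the later patches in this series do.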
From: Bobby Eshleman bobbyeshleman@meta.com
Add the ability to isolate vsock flows using namespaces.
The namespace for a VM is inherited from the process that opened the vhost-vsock device.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
 drivers/vhost/vsock.c | 48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 802153e23073..863419533a3f 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -46,6 +46,8 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
 struct vhost_vsock {
 	struct vhost_dev dev;
 	struct vhost_virtqueue vqs[2];
+	struct net *net;
+	netns_tracker ns_tracker;
 
 	/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
 	struct hlist_node hash;
@@ -59,6 +61,22 @@ struct vhost_vsock {
 	bool seqpacket_allow;
 };
 
+static void vhost_vsock_net_set(struct vhost_vsock *vsock, struct net *net)
+{
+	if (net_eq(net, vsock_global_net()))
+		vsock->net = vsock_global_net();
+	else
+		vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL);
+}
+
+static void vhost_vsock_net_put(struct vhost_vsock *vsock)
+{
+	if (net_eq(vsock->net, vsock_global_net()))
+		return;
+
+	put_net_track(vsock->net, &vsock->ns_tracker);
+}
+
 static u32 vhost_transport_get_local_cid(void)
 {
 	return VHOST_VSOCK_DEFAULT_HOST_CID;
@@ -67,7 +85,7 @@ static u32 vhost_transport_get_local_cid(void)
 /* Callers that dereference the return value must hold vhost_vsock_mutex or the
  * RCU read lock.
  */
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net)
 {
 	struct vhost_vsock *vsock;
 
@@ -78,9 +96,8 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
 		if (other_cid == 0)
 			continue;
 
-		if (other_cid == guest_cid)
+		if (other_cid == guest_cid && vsock_net_check_mode(net, vsock->net))
 			return vsock;
-
 	}
 
 	return NULL;
@@ -272,13 +289,14 @@ static int
 vhost_transport_send_pkt(struct sk_buff *skb)
 {
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
+	struct net *net = virtio_vsock_skb_net(skb);
 	struct vhost_vsock *vsock;
 	int len = skb->len;
 
 	rcu_read_lock();
 
 	/* Find the vhost_vsock according to guest context id */
-	vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid));
+	vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid), net);
 	if (!vsock) {
 		rcu_read_unlock();
 		kfree_skb(skb);
@@ -305,7 +323,7 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
 	rcu_read_lock();
 
 	/* Find the vhost_vsock according to guest context id */
-	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk_vsock(vsk)));
 	if (!vsock)
 		goto out;
 
@@ -403,7 +421,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
 	return true;
 }
 
-static bool vhost_transport_seqpacket_allow(u32 remote_cid);
+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
 
 static struct virtio_transport vhost_transport = {
 	.transport = {
@@ -459,13 +477,14 @@ static struct virtio_transport vhost_transport = {
 	.send_pkt = vhost_transport_send_pkt,
 };
 
-static bool vhost_transport_seqpacket_allow(u32 remote_cid)
+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
+	struct net *net = sock_net(sk_vsock(vsk));
 	struct vhost_vsock *vsock;
 	bool seqpacket_allow = false;
 
 	rcu_read_lock();
-	vsock = vhost_vsock_get(remote_cid);
+	vsock = vhost_vsock_get(remote_cid, net);
 
 	if (vsock)
 		seqpacket_allow = vsock->seqpacket_allow;
@@ -525,6 +544,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			continue;
 		}
 
+		virtio_vsock_skb_set_net(skb, vsock->net);
 		total_len += sizeof(*hdr) + skb->len;
 
 		/* Deliver to monitoring devices all received packets */
@@ -651,10 +671,16 @@ static void vhost_vsock_free(struct vhost_vsock *vsock)
 
 static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 {
+	struct vhost_virtqueue **vqs;
 	struct vhost_vsock *vsock;
+	struct net *net;
 	int ret;
 
+	net = get_net_ns_by_pid(current->pid);
+	if (IS_ERR(net))
+		return PTR_ERR(net);
+
 	/* This struct is large and allocation could fail, fall back to vmalloc
 	 * if there is no other way.
 	 */
@@ -668,6 +694,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 		goto out;
 	}
 
+	vhost_vsock_net_set(vsock, net);
 	vsock->guest_cid = 0; /* no CID assigned yet */
 	vsock->seqpacket_allow = false;
 
@@ -707,7 +734,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
 	 */
 
 	/* If the peer is still valid, no need to reset connection */
-	if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+	if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk)))
 		return;
 
 	/* If the close timeout is pending, let it expire. This avoids races
@@ -752,6 +779,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
 	virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue);
 
 	vhost_dev_cleanup(&vsock->dev);
+	vhost_vsock_net_put(vsock);
 	kfree(vsock->dev.vqs);
 	vhost_vsock_free(vsock);
 	return 0;
@@ -778,7 +806,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
 
 	/* Refuse if CID is already in use */
 	mutex_lock(&vhost_vsock_mutex);
-	other = vhost_vsock_get(guest_cid);
+	other = vhost_vsock_get(guest_cid, vsock->net);
 	if (other && other != vsock) {
 		mutex_unlock(&vhost_vsock_mutex);
 		return -EADDRINUSE;
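vhost_vsock_net_set() and vhost_vsock_net_put() above hold a tracked reference on the device's namespace unless it is the global one. The sketch below models that pairing with a plain counter. struct net, global_net, struct vsock_dev and the dev_net_* helpers are hypothetical. Like the patch, the sketch assumes the global namespace outlives every device, so no reference is taken for it.

```c
#include <assert.h>

/*
 * Hypothetical model: 'refcnt' stands in for the struct net refcount that
 * get_net_track()/put_net_track() manage in the real code.
 */
struct net {
	int refcnt;
};

/* Assumed to outlive every device, so devices never pin it. */
static struct net global_net = { .refcnt = 1 };

struct vsock_dev {
	struct net *net;
};

/* Mirrors vhost_vsock_net_set(): pin any namespace except the global one. */
static void dev_net_set(struct vsock_dev *dev, struct net *net)
{
	if (net == &global_net) {
		dev->net = &global_net;
		return;
	}
	net->refcnt++;		/* like get_net_track() */
	dev->net = net;
}

/* Mirrors vhost_vsock_net_put(): drop only what dev_net_set() took. */
static void dev_net_put(struct vsock_dev *dev)
{
	if (dev->net == &global_net)
		return;
	dev->net->refcnt--;	/* like put_net_track() */
}
```

One detail is easier to see with the counts written out. get_net_ns_by_pid(), used in vhost_vsock_dev_open() above, returns a struct net with a reference already held. That reference would therefore still need its own put_net() after vhost_vsock_net_set(), and in the hunks shown it does not appear to be dropped.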
From: Bobby Eshleman bobbyeshleman@meta.com
Update virtio-vsock for the internal API changes required by host-side NS support, so that it does not break. virtio-vsock namespaces always operate in global mode, so behavior is unchanged for them.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
 net/vmw_vsock/virtio_transport.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index f0e48e6911fc..25c1bca7b136 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -536,7 +536,7 @@ static bool virtio_transport_msgzerocopy_allow(void)
 	return true;
 }
 
-static bool virtio_transport_seqpacket_allow(u32 remote_cid);
+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
 
 static struct virtio_transport virtio_transport = {
 	.transport = {
@@ -593,7 +593,7 @@ static struct virtio_transport virtio_transport = {
 	.can_msgzerocopy = virtio_transport_can_msgzerocopy,
 };
 
-static bool virtio_transport_seqpacket_allow(u32 remote_cid)
+static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
 	struct virtio_vsock *vsock;
 	bool seqpacket_allow;
@@ -649,6 +649,7 @@ static void virtio_transport_rx_work(struct work_struct *work)
 			}
 
 			virtio_vsock_skb_rx_put(skb);
+			virtio_vsock_skb_set_net(skb, vsock_global_net());
 			virtio_transport_deliver_tap_pkt(skb);
 			virtio_transport_recv_pkt(&virtio_transport, skb);
 		}
From: Bobby Eshleman bobbyeshleman@meta.com
Update hyperv for the internal NS API changes so that it does not break. Guest vsocks always remain in the global namespace, so behavior is unchanged.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
 net/vmw_vsock/hyperv_transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 31342ab502b4..85b22366ef00 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -313,7 +313,7 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 		return;
 
 	hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
-	sk = vsock_find_bound_socket(&addr);
+	sk = vsock_find_bound_socket(&addr, vsock_global_net());
 	if (!sk)
 		return;
 
From: Bobby Eshleman bobbyeshleman@meta.com
Add hooks for the new internal NS calls to avoid breaking vmci. Guest vsocks remain in global-mode namespaces, so behavior is unchanged.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
 net/vmw_vsock/vmci_transport.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index b370070194fa..8f374f84a526 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -703,9 +703,9 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg)
 	vsock_addr_init(&src, pkt->dg.src.context, pkt->src_port);
 	vsock_addr_init(&dst, pkt->dg.dst.context, pkt->dst_port);
 
-	sk = vsock_find_connected_socket(&src, &dst);
+	sk = vsock_find_connected_socket(&src, &dst, vsock_global_net());
 	if (!sk) {
-		sk = vsock_find_bound_socket(&dst);
+		sk = vsock_find_bound_socket(&dst, vsock_global_net());
 		if (!sk) {
 			/* We could not find a socket for this specified
 			 * address. If this packet is a RST, we just drop it.
From: Bobby Eshleman bobbyeshleman@meta.com
Add NS support to vsock loopback. In theory, loopback can be viewed as a given CID. It should therefore collide with other vsocks when the namespaces are in global mode, but not when the namespace is in local mode. This has not been tested yet, but will be by the next revision.
TODO: add tests for this
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
 net/vmw_vsock/vsock_loopback.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index 6e78927a598e..1b2fab73e0d0 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -46,7 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
 	return 0;
 }
 
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
 static bool vsock_loopback_msgzerocopy_allow(void)
 {
 	return true;
@@ -106,7 +106,7 @@ static struct virtio_transport loopback_transport = {
 	.send_pkt = vsock_loopback_send_pkt,
 };
 
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid)
+static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
 	return true;
 }
CCing Daniel, who commented on v2.
On Mon, Jun 16, 2025 at 09:32:49PM -0700, Bobby Eshleman wrote:
This series adds namespace support to vhost-vsock. It does not add namespaces to any of the guest transports (virtio-vsock, hyperv, or vmci).
The current revision supports only two modes: local and global. Local mode completely isolates namespaces, while global mode shares CIDs completely between namespaces (the original behavior).
If mixed mode is deemed necessary up front, it is doable, but it adds more complexity than the local and global modes do. Mixed mode would require adding the notion of allocation to the socket lookup functions (like vhost_vsock_get()). It would also need more logic to control lookups, or to perform them differently, in the mixed-to-global and global-to-mixed scenarios.
The current implementation anticipates the future need for mixed mode and keeps it possible by making vsock_ns_mode per-namespace: mixed mode needs at least one "global" namespace and one "mixed" namespace to work. Is it feasible to support only local and global modes initially?
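To make the two modes concrete, here is a minimal userspace model of the visibility rule they imply. It is a sketch under assumptions, not the series' vsock_net_check_mode(), whose body is not shown here. The names struct ns, enum ns_mode and ns_can_see() are hypothetical. The assumed rule is that a namespace always matches itself, and two different namespaces match only when both are in global mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two per-namespace modes described above. */
enum ns_mode {
	NS_MODE_GLOBAL,	/* CIDs are shared with other global namespaces */
	NS_MODE_LOCAL,	/* CIDs are visible only inside this namespace */
};

struct ns {
	enum ns_mode mode;
};

/*
 * Assumed rule: may a lookup issued from namespace 'from' match a socket
 * or vhost device that belongs to namespace 'owner'?
 */
static bool ns_can_see(const struct ns *from, const struct ns *owner)
{
	/* A namespace always sees its own objects, whatever its mode. */
	if (from == owner)
		return true;

	/* Different namespaces share CIDs only if both are global. */
	return from->mode == NS_MODE_GLOBAL && owner->mode == NS_MODE_GLOBAL;
}
```

Take two global namespaces g1 and g2 and a local namespace l1. Under this rule, ns_can_see(&g1, &g2) is true. ns_can_see(&g1, &l1) and ns_can_see(&l1, &g1) are false. l1 still sees its own sockets.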
I've demoted this series to RFC for several reasons: I haven't been able to re-run the tests after rebasing onto the upstreamed vmtest.sh, some of the code is still pretty messy, and there are still TODOs, stale comments, and other work to do. I thought reviewers might want to see the current state even though it is unfinished. I'll be OoO until the second week of July, and that feels like a long silence given that we've all already done work on this together.
Thanks again for everyone's help and reviews!
Signed-off-by: Bobby Eshleman bobbyeshleman@gmail.com
Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2:
https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com
Thanks for this! FYI, I'll be off for the next few days; I hope to comment next week.
Thanks, Stefano
Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes
impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1:
https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com
Changes in v1:
- added 'netns' module param to vsock.ko to enable the
network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket
only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
Bobby Eshleman (11):
      selftests/vsock: add NS tests to vmtest.sh
      vsock: a per-net vsock NS mode state
      vsock: add vsock net ns helpers
      vsock: add net to vsock skb cb
      vsock: add common code for vsock NS support
      virtio-vsock: add netns to common code
      vhost/vsock: add netns support
      vsock/virtio: add netns hooks
      hv_sock: add netns hooks
      vsock/vmci: add netns hooks
      vsock/loopback: add netns support
 MAINTAINERS                             |   1 +
 drivers/vhost/vsock.c                   |  48 ++-
 include/linux/virtio_vsock.h            |  12 +
 include/net/af_vsock.h                  |  53 ++-
 include/net/net_namespace.h             |   4 +
 include/net/netns/vsock.h               |  19 ++
 net/vmw_vsock/af_vsock.c                | 203 +++++++++++-
 net/vmw_vsock/hyperv_transport.c        |   2 +-
 net/vmw_vsock/virtio_transport.c        |   5 +-
 net/vmw_vsock/virtio_transport_common.c |  14 +-
 net/vmw_vsock/vmci_transport.c          |   4 +-
 net/vmw_vsock/vsock_loopback.c          |   4 +-
 tools/testing/selftests/vsock/vmtest.sh | 555 +++++++++++++++++++++++++++++---
 13 files changed, 843 insertions(+), 81 deletions(-)
base-commit: 8909f5f4ecd551c2299b28e05254b77424c8c7dc
change-id: 20250325-vsock-vmtest-b3a21d2102c2
Best regards,
Bobby Eshleman bobbyeshleman@meta.com
linux-kselftest-mirror@lists.linaro.org