This series adds namespace support to vhost-vsock. It does not add
namespaces to any of the guest transports (virtio-vsock, hyperv, or
vmci).
The current revision only supports two modes: local or global. Local
mode is complete isolation of namespaces, while global mode is complete
sharing between namespaces of CIDs (the original behavior).
Future may include supporting a mixed mode, which I expect to be more
complicated because socket lookups will have to include new logic and
API changes to behave differently based on if the lookup is part of a
mixed mode CID allocation, a global CID allocation, a mixed-to-global
connection (allowed), or a global-to-mixed connection (not allowed).
Modes are per-netns and write-once. This allows a system to configure
namespaces independently (some may share CIDs, others are completely
isolated). This also supports future mixed use cases, where there may be
namespaces in global mode spinning up VMs while there are
mixed mode namespaces that provide services to the VMs, but are not
allowed to allocate from the global CID pool.
Thanks again for everyone's help and reviews!
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
To: Stefano Garzarella <sgarzare(a)redhat.com>
To: Shuah Khan <shuah(a)kernel.org>
To: David S. Miller <davem(a)davemloft.net>
To: Eric Dumazet <edumazet(a)google.com>
To: Jakub Kicinski <kuba(a)kernel.org>
To: Paolo Abeni <pabeni(a)redhat.com>
To: Simon Horman <horms(a)kernel.org>
To: Stefan Hajnoczi <stefanha(a)redhat.com>
To: Michael S. Tsirkin <mst(a)redhat.com>
To: Jason Wang <jasowang(a)redhat.com>
To: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
To: Eugenio Pérez <eperezma(a)redhat.com>
To: K. Y. Srinivasan <kys(a)microsoft.com>
To: Haiyang Zhang <haiyangz(a)microsoft.com>
To: Wei Liu <wei.liu(a)kernel.org>
To: Dexuan Cui <decui(a)microsoft.com>
To: Bryan Tan <bryan-bt.tan(a)broadcom.com>
To: Vishnu Dasa <vishnu.dasa(a)broadcom.com>
To: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: virtualization(a)lists.linux.dev
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
Cc: linux-hyperv(a)vger.kernel.org
Cc: berrange(a)redhat.com
Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test
as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context
in commit message)
- lots of cleanup
Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2:
https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com
Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes
impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1:
https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com
Changes in v1:
- added 'netns' module param to vsock.ko to enable the
network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket
only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
---
Bobby Eshleman (12):
vsock: a per-net vsock NS mode state
vsock: add net to vsock skb cb
vsock: add netns to af_vsock core
vsock/virtio: add netns to virtio transport common
vhost/vsock: add netns support
vsock/virtio: use the global netns
hv_sock: add netns hooks
vsock/vmci: add netns hooks
vsock/loopback: add netns support
selftests/vsock: improve logging in vmtest.sh
selftests/vsock: invoke vsock_test through helpers
selftests/vsock: add namespace tests
MAINTAINERS | 1 +
drivers/vhost/vsock.c | 48 +-
include/linux/virtio_vsock.h | 12 +
include/net/af_vsock.h | 59 +-
include/net/net_namespace.h | 4 +
include/net/netns/vsock.h | 21 +
net/vmw_vsock/af_vsock.c | 204 +++++-
net/vmw_vsock/hyperv_transport.c | 2 +-
net/vmw_vsock/virtio_transport.c | 5 +-
net/vmw_vsock/virtio_transport_common.c | 14 +-
net/vmw_vsock/vmci_transport.c | 4 +-
net/vmw_vsock/vsock_loopback.c | 59 +-
tools/testing/selftests/vsock/vmtest.sh | 1088 ++++++++++++++++++++++++++-----
13 files changed, 1330 insertions(+), 191 deletions(-)
---
base-commit: dd500e4aecf25e48e874ca7628697969df679493
change-id: 20250325-vsock-vmtest-b3a21d2102c2
Best regards,
--
Bobby Eshleman <bobbyeshleman(a)meta.com>
Signed-off-by: Devaansh Kumar <devaanshk840(a)gmail.com>
---
tools/testing/selftests/futex/include/futextest.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/include/futextest.h b/tools/testing/selftests/futex/include/futextest.h
index ddbcfc9b7..c77352b97 100644
--- a/tools/testing/selftests/futex/include/futextest.h
+++ b/tools/testing/selftests/futex/include/futextest.h
@@ -134,7 +134,9 @@ futex_unlock_pi(futex_t *uaddr, int opflags)
}
/**
- * futex_wake_op() - FIXME: COME UP WITH A GOOD ONE LINE DESCRIPTION
+ * futex_wake_op() - atomically modify uaddr2
+ * @nr_wake: wake up to this many tasks on uaddr
+ * @nr_wake2: wake up to this many tasks on uaddr2
*/
static inline int
futex_wake_op(futex_t *uaddr, futex_t *uaddr2, int nr_wake, int nr_wake2,
--
2.49.0
The get_next_frame() function in psock_tpacket.c was missing a return
statement in its default switch case, leading to a compiler warning.
This was caused by a `bug_on(1)` call, which is defined as an
`assert()`, being compiled out because NDEBUG is defined during the
build.
Instead of adding a `return NULL;` which would silently hide the error
and could lead to crashes later, this change restores the original
author's intent. By adding `#undef NDEBUG` before including <assert.h>,
we ensure the assertion is active and will cause the test to abort if
this unreachable code is ever executed.
Signed-off-by: Wake Liu <wakel(a)google.com>
---
tools/testing/selftests/net/psock_tpacket.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 0dd909e325d9..a54f2eb754ce 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -38,6 +38,7 @@
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
+#undef NDEBUG
#include <assert.h>
#include <net/if.h>
#include <inttypes.h>
--
2.50.1.703.g449372360f-goog
With /proc/pid/maps now being read under per-vma lock protection we can
reuse parts of that code to execute PROCMAP_QUERY ioctl also without
taking mmap_lock. The change is designed to reduce mmap_lock contention
and prevent PROCMAP_QUERY ioctl calls from blocking address space updates.
This patchset was split out of the original patchset [1] that introduced
per-vma lock usage for /proc/pid/maps reading. It contains PROCMAP_QUERY
tests, code refactoring patch to simplify the main change and the actual
transition to per-vma lock.
Changes since v2 [2]
- Added Reviewed-by, per Vlastimil Babka
- Fixed query_vma_find_by_addr() to handle lock_ctx->mmap_locked case,
per Vlastimil Babka
[1] https://lore.kernel.org/all/20250704060727.724817-1-surenb@google.com/
[2] https://lore.kernel.org/all/20250804231552.1217132-1-surenb@google.com/
Suren Baghdasaryan (3):
selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently
modified
fs/proc/task_mmu: factor out proc_maps_private fields used by
PROCMAP_QUERY
fs/proc/task_mmu: execute PROCMAP_QUERY ioctl under per-vma locks
fs/proc/internal.h | 15 +-
fs/proc/task_mmu.c | 152 ++++++++++++------
fs/proc/task_nommu.c | 14 +-
tools/testing/selftests/proc/proc-maps-race.c | 65 ++++++++
4 files changed, 184 insertions(+), 62 deletions(-)
base-commit: 8e7e0c6d09502e44aa7a8fce0821e042a6ec03d1
--
2.50.1.565.gc32cd1483b-goog
Avoid pointer type value compared with 0 to make code clear.
./tools/testing/selftests/bpf/progs/mem_rdonly_untrusted.c:221:10-11: WARNING comparing pointer to 0.
Reported-by: Abaci Robot <abaci(a)linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=23403
Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
---
tools/testing/selftests/bpf/progs/mem_rdonly_untrusted.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/progs/mem_rdonly_untrusted.c b/tools/testing/selftests/bpf/progs/mem_rdonly_untrusted.c
index 4f94c971ae86..6b725725b2bf 100644
--- a/tools/testing/selftests/bpf/progs/mem_rdonly_untrusted.c
+++ b/tools/testing/selftests/bpf/progs/mem_rdonly_untrusted.c
@@ -218,7 +218,7 @@ int null_check(void *ctx)
int *p;
p = bpf_rdonly_cast(0, 0);
- if (p == 0)
+ if (!p)
/* make this a function call to avoid compiler
* moving r0 assignment before check.
*/
--
2.43.5