On Thu, Nov 06, 2025 at 05:18:00PM +0100, Stefano Garzarella wrote:
On Thu, Oct 23, 2025 at 11:27:43AM -0700, Bobby Eshleman wrote:
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add netns logic to vsock core. Additionally, modify transport hook prototypes to be used by later transport-specific patches (e.g., *_seqpacket_allow()).
Namespaces are supported primarily by changing socket lookup functions (e.g., vsock_find_connected_socket()) to take into account the socket namespace and the namespace mode before considering a candidate socket a "match".
Introduce a dummy namespace struct, __vsock_global_dummy_net, to be used by transports that do not support namespacing. This dummy always has mode "global" to preserve previous CID behavior.
This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that accepts the "global" or "local" mode strings.
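As a rough illustration of the two accepted strings, here is a minimal user-space sketch of parsing them. It is not the kernel code from this patch: the enum, the function name parse_ns_mode(), and the tolerance for a trailing newline (as produced by echo) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not the kernel's enum. */
enum ns_mode_sketch { NS_MODE_GLOBAL, NS_MODE_LOCAL, NS_MODE_INVALID };

/*
 * Map a mode string to a mode value. A single trailing newline is
 * ignored (assumption: the string came from something like echo).
 */
enum ns_mode_sketch parse_ns_mode(const char *s)
{
	size_t len;

	assert(s != NULL);
	len = strlen(s);
	if (len > 0 && s[len - 1] == '\n')
		len--;

	if (len == 6 && memcmp(s, "global", 6) == 0)
		return NS_MODE_GLOBAL;
	if (len == 5 && memcmp(s, "local", 5) == 0)
		return NS_MODE_LOCAL;

	/* Anything else is rejected in this sketch. */
	return NS_MODE_INVALID;
}
```

Comparing the length first keeps prefixes such as "glob" or longer strings such as "locally" from being accepted.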
The transports (besides vhost) are modified to use the global dummy, which makes them behave as if always in the global namespace. Vhost is an exception because it inherits its namespace from the process that opens the vhost device.
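The lookup behavior described above can be pictured with a small user-space sketch. The matching rule used here (same namespace always matches; different namespaces match only when both are in global mode) is an assumption for illustration and may differ from the rule the patch implements. The types, the ns_match() name, and the global_dummy object are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum ns_mode { MODE_GLOBAL, MODE_LOCAL };

struct fake_net {
	int id;            /* identifies a namespace in this sketch */
	enum ns_mode mode; /* per-namespace vsock mode */
};

/* Plays the role of the global dummy net: it is always in global mode. */
const struct fake_net global_dummy = { .id = 0, .mode = MODE_GLOBAL };

/*
 * Assumed rule: a candidate in the same namespace always matches;
 * across namespaces, both sides must be in global mode.
 */
bool ns_match(const struct fake_net *a, const struct fake_net *b)
{
	assert(a != NULL && b != NULL);

	if (a->id == b->id)
		return true;

	return a->mode == MODE_GLOBAL && b->mode == MODE_GLOBAL;
}
```

Under this assumed rule, a transport that always presents global_dummy stays visible to every global-mode namespace and is hidden from local-mode ones, which is one way to read "behave as if always in the global namespace".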
Add netns functionality (initialization, passing to transports, procfs, etc.) to the af_vsock socket layer. Later patches that add netns support to transports depend on this patch.
seqpacket_allow() callbacks are modified to take a vsk so that transport implementations can inspect sock_net(sk) and vsk->net_mode when performing lookups (e.g., vhost does this in its future netns patch). Because the API change affects all transports, it seemed more appropriate to make this internal API change in the "vsock core" patch than in the "vhost" patch.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Changes in v7:
- hv_sock: fix hyperv build error
- explain why vhost does not use the dummy
- explain usage of __vsock_global_dummy_net
- explain why VSOCK_NET_MODE_STR_MAX is 8 characters
- use switch-case in vsock_net_mode_string()
- avoid changing transports as much as possible
- add vsock_find_{bound,connected}_socket_net()
- rename `vsock_hdr` to `sysctl_hdr`
- add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
  global mode for virtio-vsock, move skb->cb zero-ing into wrapper
- explain seqpacket_allow() change
- move net setting to __vsock_create() instead of vsock_create() so
  that child sockets also have their net assigned upon accept()
Changes in v6:
- unregister sysctl ops in vsock_exit()
- af_vsock: clarify description of CID behavior
- af_vsock: fix buf vs buffer naming, and length checking
- af_vsock: fix length checking w/ correct ctl_table->maxlen
Changes in v5:
- vsock_global_net() -> vsock_global_dummy_net()
- update comments for new uAPI
- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
- add prototype changes so patch remains compilable
 drivers/vhost/vsock.c            |   4 +-
 include/linux/virtio_vsock.h     |  21 ++++
 include/net/af_vsock.h           |  14 ++-
 net/vmw_vsock/af_vsock.c         | 264 ++++++++++++++++++++++++++++++++++++---
 net/vmw_vsock/virtio_transport.c |   7 +-
 net/vmw_vsock/vsock_loopback.c   |   4 +-
 6 files changed, 288 insertions(+), 26 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index ae01457ea2cd..34adf0cf9124 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
 	return true;
 }
 
-static bool vhost_transport_seqpacket_allow(u32 remote_cid);
+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
 
 static struct virtio_transport vhost_transport = {
 	.transport = {
@@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = {
 	.send_pkt = vhost_transport_send_pkt,
 };
 
-static bool vhost_transport_seqpacket_allow(u32 remote_cid)
+static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
 	struct vhost_vsock *vsock;
 	bool seqpacket_allow = false;
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 7f334a32133c..29290395054c 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -153,6 +153,27 @@ static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
 	VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
 }
 
+static inline struct sk_buff *
+virtio_vsock_alloc_rx_skb(unsigned int size, gfp_t mask)
+{
+	struct sk_buff *skb;
+
+	skb = virtio_vsock_alloc_linear_skb(size, mask);
+	if (!skb)
+		return NULL;
+
+	memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM);
+
+	/* virtio-vsock does not yet support namespaces, so on receive
+	 * we force legacy namespace behavior using the global dummy net
+	 * and global net mode.
+	 */
+	virtio_vsock_skb_set_net(skb, vsock_global_dummy_net());
+	virtio_vsock_skb_set_net_mode(skb, VSOCK_NET_MODE_GLOBAL);
+
+	return skb;
+}
Why are we introducing this change in this patch?
Where is the net of the virtio skb read?
Oh good point, this is a weird place for this. I'll move this to where it is actually used.
[...]
+static int vsock_net_mode_string(const struct ctl_table *table, int write,
+				 void *buffer, size_t *lenp, loff_t *ppos)
+{
+	char data[VSOCK_NET_MODE_STR_MAX] = {0};
+	enum vsock_net_mode mode;
+	struct ctl_table tmp;
+	struct net *net;
+	int ret;
+
+	if (!table->data || !table->maxlen || !*lenp) {
+		*lenp = 0;
+		return 0;
+	}
+
+	net = current->nsproxy->net_ns;
+	tmp = *table;
+	tmp.data = data;
+
+	if (!write) {
+		const char *p;
+
+		mode = vsock_net_mode(net);
+
+		switch (mode) {
+		case VSOCK_NET_MODE_GLOBAL:
+			p = VSOCK_NET_MODE_STR_GLOBAL;
+			break;
+		case VSOCK_NET_MODE_LOCAL:
+			p = VSOCK_NET_MODE_STR_LOCAL;
+			break;
+		default:
+			WARN_ONCE(true, "netns has invalid vsock mode");
+			*lenp = 0;
+			return 0;
+		}
+
+		strscpy(data, p, sizeof(data));
+		tmp.maxlen = strlen(p);
+	}
+
+	ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	if (write) {
Do we need to check some capability, e.g. CAP_NET_ADMIN ?
We get that for free via the sysctl_net registration, through this path on open (CAP_NET_ADMIN is checked in net_ctl_permissions):
    net_ctl_permissions+1
    sysctl_perm+24
    proc_sys_permission+117
    inode_permission+217
    link_path_walk+162
    path_openat+152
    do_filp_open+171
    do_sys_openat2+98
    __x64_sys_openat+69
    do_syscall_64+93
Verified with:
    cp /bin/echo /tmp/echo_netadmin
    setcap cap_net_admin+ep /tmp/echo_netadmin
(non-root user fails with regular echo, succeeds with /tmp/echo_netadmin)
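To make the result above concrete, here is a user-space model of the mode adjustment that net_ctl_permissions is understood to perform: a caller holding CAP_NET_ADMIN in the namespace is given the owner's permission bits in every position, and any other caller sees the table's mode unchanged. This is a sketch, not kernel code. The function name, the boolean standing in for the capability check, and the 0644 mode in the note below are assumptions of the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the permission bits seen by the caller. has_net_admin stands
 * in for the kernel's capability check (an assumption of this sketch).
 */
unsigned int sketch_effective_mode(unsigned int table_mode, bool has_net_admin)
{
	/* Only the nine rwx permission bits are modelled here. */
	assert((table_mode & ~0777u) == 0);

	if (has_net_admin) {
		/* Copy the owner's rwx bits into owner, group and other. */
		unsigned int owner = (table_mode >> 6) & 7;

		return (owner << 6) | (owner << 3) | owner;
	}

	return table_mode;
}
```

With a hypothetical table mode of 0644, a CAP_NET_ADMIN caller is treated as if the mode were 0666 and may write, while a plain non-root caller only has the read bit from "other", which is consistent with the behavior observed above.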
Best regards,
Bobby