Thanks so much to Stefano and to Moritz Sanft: https://github.com/containerd/ttrpc-rust/pull/280
We've rebuilt and everything is again working as expected - all resolved.
Thanks again everyone -Simon
On Wed, Jan 22, 2025 at 6:12 AM Stefano Garzarella sgarzare@redhat.com wrote:
On Wed, 22 Jan 2025 at 10:23, Stefano Garzarella sgarzare@redhat.com wrote:
CCing Ruoqing He
On Wed, 22 Jan 2025 at 04:48, Simon Kaegi simon.kaegi@gmail.com wrote:
Thanks Stefano,
The feedback about vsock expectations was exactly what I was hoping you could provide.
You're welcome ;-)
In the Kata agent we're not directly setting SO_REUSEPORT as a socket option, so I think what you suggest, where SO_REUSEPORT is being set indiscriminately, is happening a layer down, perhaps in the tokio or nix crates we use. Unfortunately I do not have an easy way to reproduce the problem without setting up Kata Containers; what's more, you then need to rebuild a recent Kata-flavoured minimal kernel to see the issue.
I talked with Ruoqing He yesterday about this issue since he knows Kata better than me :-)
He pointed out that Kata is using ttrpc-rust and he shared with me this code: https://github.com/containerd/ttrpc-rust/blob/0610015a92c340c6d88f81c0d6f9f4...
The change (setting SO_REUSEPORT) was introduced more than 4 years ago, but I honestly don't think it solved the problem mentioned in the commit: https://github.com/containerd/ttrpc-rust/commit/9ac87828ee870ecf5fb5feaa45cc... Until now it didn't cause any problems because the option was accepted on every socket, but it was effectively a no-op for AF_VSOCK.
IIUC that code, it supports 2 address families: AF_VSOCK and AF_UNIX. For AF_VSOCK we've made it clear that SO_REUSEPORT is useless, but for AF_UNIX it's even more useless since there's no concept of a port, so in my opinion `setsockopt(fd, sockopt::ReusePort, &true)?;` can be removed completely. Or at least the function should not fail entirely when the option is unsupported; right now it returns an error and the subsequent bind is never done.
I don't know where this code is called, but removing that line is likely to make everything work correctly.
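For illustration, here is a minimal sketch of the second option above (keep the call, but don't let an unsupported option abort the setup). It is not the ttrpc-rust code: `set_optional_sockopt` and `prepare_listener` are hypothetical names, the closures stand in for the real `setsockopt(fd, sockopt::ReusePort, &true)` and bind calls, and the errno values are the Linux ones, assuming the kernel reports EOPNOTSUPP or ENOPROTOOPT for a rejected option.

```rust
use std::io;

// Linux errno values, hard-coded as assumptions for this sketch.
const ENOPROTOOPT: i32 = 92;
const EOPNOTSUPP: i32 = 95;

/// Runs `set_opt` (a stand-in for the real setsockopt call) and treats
/// "option not supported" as non-fatal.
/// Returns Ok(true) if the option was applied, Ok(false) if it was skipped,
/// and passes every other error through unchanged.
pub fn set_optional_sockopt<F>(set_opt: F) -> io::Result<bool>
where
    F: FnOnce() -> io::Result<()>,
{
    match set_opt() {
        Ok(()) => Ok(true),
        Err(e) if matches!(e.raw_os_error(), Some(ENOPROTOOPT) | Some(EOPNOTSUPP)) => Ok(false),
        Err(e) => Err(e),
    }
}

/// Hypothetical listener setup: the optional SO_REUSEPORT step may be
/// skipped, but the bind step still runs afterwards.
pub fn prepare_listener<F, B>(set_reuseport: F, bind: B) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
    B: FnOnce() -> io::Result<()>,
{
    // Only unexpected errors (e.g. EBADF) propagate through `?` here.
    let _applied = set_optional_sockopt(set_reuseport)?;
    bind()
}
```

With this shape, a kernel that rejects the option no longer prevents the bind, while genuine failures such as a bad file descriptor are still reported to the caller.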
It looks like they already released a new version of ttrpc to fix it: https://github.com/containerd/ttrpc-rust/pull/281
And Kata is updating its dependency: https://github.com/kata-containers/kata-containers/pull/10775
I hope it will fix your issue!
Stefano