On Tue, Jan 14, 2025 at 05:31:08PM +0100, Michal Luczaj wrote:
On 1/14/25 11:16, Stefano Garzarella wrote:
On Tue, Jan 14, 2025 at 01:09:24AM +0100, Michal Luczaj wrote:
On 1/13/25 16:01, Stefano Garzarella wrote:
On Mon, Jan 13, 2025 at 02:51:58PM +0100, Michal Luczaj wrote:
On 1/13/25 12:05, Stefano Garzarella wrote:
... An alternative approach, which would perhaps allow us to avoid all this, is to re-insert the socket in the unbound list after calling release() when we deassign the transport.
WDYT?
If we can't keep the old state (sk_state, transport, etc) on failed re-connect() then reverting back to initial state sounds, uhh, like an option :) I'm not sure how well this aligns with (user's expectations of) good ol' socket API, but maybe that train has already left.
We really want to behave as similar as possible with the other sockets, like AF_INET, so I would try to continue toward that train.
I was worried that such connect()/transport error handling may have some user visible side effects, but I guess I was wrong. I mean you can still reach a sk_state=TCP_LISTEN with a transport assigned[1], but perhaps that's a different issue.
I've tried your suggestion on top of this series. Passes the tests.
Great, thanks!
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index fa9d1b49599b..4718fe86689d 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -492,6 +492,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) vsk->transport->release(vsk); vsock_deassign_transport(vsk);
vsock_addr_unbind(&vsk->local_addr);
vsock_addr_unbind(&vsk->remote_addr);
My only doubt is that if a user did a specific bind() before the connect, this way we're resetting everything, is that right?
That is right.
But we aren't changing much. Transport release already removes vsk from vsock_bound_sockets. So even though vsk->local_addr is untouched (i.e. vsock_addr_bound() returns `true`), vsk can't be picked by vsock_find_bound_socket(). User can't bind() it again, either.
Okay, I see, so maybe for now makes sense to merge your patch, to fix the UAF fist.
And when patched as above: bind() works as "expected", but socket is pretty much useless, anyway. If I'm correct, the first failing connect() trips virtio_transport_recv_connecting(), which sets `sk->sk_err`. I don't see it being reset. Does the vsock suppose to keep sk_err state once set?
Nope, I think this is another thing to fix.
Currently only AF_VSOCK throws ConnectionResetError:
from socket import * def test(family, addr): s = socket(family, SOCK_STREAM) assert s.connect_ex(addr) != 0 lis = socket(family, SOCK_STREAM) lis.bind(addr) lis.listen() s.connect(addr) p, _ = lis.accept() p.send(b'x') assert s.recv(1) == b'x' test(AF_INET, ('127.0.0.1', 2000)) test(AF_UNIX, '\0/tmp/foo') test(AF_VSOCK, (1, 2000)) # VMADDR_CID_LOCAL
Maybe we need to look better at the release, and prevent it from removing the socket from the lists as you suggested, maybe adding a function in af_vsock.c that all transports can call.
I'd be happy to submit a proper patch, but it would be helpful to decide how close to AF_INET/AF_UNIX's behaviour is close enough. Or would you rather have that UAF plugged first?
I'd say, let's fix the UAF first, then fix the behaviour (also in a single series, but I prefer 2 separate patches if possible). About that, AF_VSOCK was started with the goal of following AF_INET as closely as possible, and the test suite should serve that as well, so if we can solve this problem and get closer to AF_INET, possibly even adding a dedicated test, that would be ideal!
Thank you very much for the help! Stefano