Re: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment

20 Mar 2025


On 3/20/25 21:54, Cong Wang wrote:
On Thu, Mar 20, 2025 at 01:05:27PM +0100, Michal Luczaj wrote:
On 3/19/25 23:18, Cong Wang wrote:
On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
Signal delivery during connect() may lead to a disconnect of an already
established socket. That involves removing socket from any sockmap and
resetting state to SS_UNCONNECTED. While it correctly restores socket's
proto, a call to vsock_bpf_recvmsg() might have been already under way in
another thread. If the connect()ing thread reassigns the vsock transport to
NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
*THREAD 1*                      *THREAD 2*
connect
  / state = SS_CONNECTED /
                                sock_map_update_elem
                                vsock_bpf_recvmsg
                                  psock = sk_psock_get()
  lock sk
  if signal_pending
    unhash
      sock_map_remove_links
So vsock's ->recvmsg() should be restored after this, right? Then how is
vsock_bpf_recvmsg() called afterward?
I'm not sure I understand the question, so I've added a header above: those
are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called
afterwards. It was called before sock_map_remove_links(). Note that at the
time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still
executing (in T2).
I thought the above vsock_bpf_recvmsg() on the right side completed
before sock_map_remove_links(), sorry for the confusion.
No problem, I see why you might have. Perhaps deeper indentation would make
things clearer.

    state = SS_UNCONNECTED
  release sk

connect
  transport = NULL
                                  lock sk
                                  WARN_ON_ONCE(!vsk->transport)
And I am wondering why we need to WARN here since we can handle this error
case correctly?
The WARN and transport check are here for defensive measures, and to state
a contract.
But I think I get your point. If we accept as a fact of life that BPF code
should be able to handle transport disappearing - then WARN can be removed
(while keeping the check) and this patch can be dropped.
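
A minimal userspace sketch of the "remove the WARN, keep the check" option, for illustration only. It is not the kernel code: struct model_vsock, struct model_transport and model_recvmsg() are hypothetical stand-ins, a pthread mutex plays the role of the sock lock, and -ENODEV is an assumed error value chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's vsock structures. */
struct model_transport {
	int id;
};

struct model_vsock {
	pthread_mutex_t lock;              /* plays the role of the sock lock */
	struct model_transport *transport; /* a racing connect() may set this to NULL */
};

/*
 * Returns the number of bytes "received" (simply len in this model),
 * or -ENODEV (assumed error code) if the transport has disappeared.
 */
static long model_recvmsg(struct model_vsock *vsk, size_t len)
{
	long copied;

	assert(vsk != NULL);

	pthread_mutex_lock(&vsk->lock);
	if (!vsk->transport) {
		/* Handled quietly: no WARN, just an error for the caller. */
		copied = -ENODEV;
		goto out;
	}
	copied = (long)len;
out:
	pthread_mutex_unlock(&vsk->lock);
	return copied;
}
```

In this model the check is made only after taking the lock, so a transport cleared by the other thread while the receiver was waiting is observed as an ordinary error rather than a warning.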
I am wondering whether we have a more elegant way to handle this case;
WARN does not look pretty.
Since the case should never happen, I like to think of WARN as a deliberate
eyesore :)
My aim, instead, was to keep things consistent. By which I mean sticking to
the conditions expressed in vsock_bpf_update_proto() as invariants; so that
vsock with a psock is guaranteed to have transport assigned.
Other than the WARN, I am also concerned about locking vsock_bpf_recvmsg()
because for example UDP is (almost) lockless, so enforcing the sock lock
for all vsock types does not look flexible and may hurt performance.
Maybe it is time to let vsock_bpf_rebuild_protos() build different hooks
for different struct proto (as we did for TCP/UDP)?
By UDP you mean vsock SOCK_DGRAM? No need to worry. VMCI is the only
transport that features VSOCK_TRANSPORT_F_DGRAM, but it does not
implement the read_skb() callback, making it unsupported by BPF/sockmap.
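
To illustrate the point, here is a small userspace sketch of how a missing read_skb() callback can keep a transport out of sockmap. All names are hypothetical (struct model_transport, model_update_proto(), MODEL_F_DGRAM and its value), and the error codes are assumptions made for the example, not a copy of vsock_bpf_update_proto().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_F_DGRAM 0x1u /* hypothetical flag value, illustration only */

/* Hypothetical stand-in for a vsock transport. */
struct model_transport {
	unsigned int features;
	int (*read_skb)(void *sk); /* NULL when the transport has no read_skb() */
};

/*
 * Models the conditions a socket must meet before it may join a sockmap:
 * a transport must be assigned, and it must provide read_skb().
 * Error codes are assumptions for this sketch.
 */
static int model_update_proto(const struct model_transport *t)
{
	if (!t)
		return -ENODEV;
	if (!t->read_skb)
		return -EOPNOTSUPP;
	return 0;
}

/* Dummy callback for a transport that does support read_skb(). */
static int model_stream_read_skb(void *sk)
{
	(void)sk;
	return 0;
}
```

With a datagram-capable transport modelled as { MODEL_F_DGRAM, NULL }, model_update_proto() refuses it, so the lockless-datagram concern never reaches the BPF recvmsg path in this model.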
