On 11/06, Willem de Bruijn wrote:
I think my other issue with MSG_SOCK_DEVMEM being on recvmsg is that it somehow implies that I have an option of passing or not passing it for an individual system call. If we know that we're going to use dmabuf with the socket, maybe we should move this flag to the socket() syscall?
fd = socket(AF_INET6, SOCK_STREAM, SOCK_DEVMEM);
?
I think it should then be a setsockopt called before any data is exchanged, with no change of modifying mode later. We generally use setsockopts for the mode of a socket. This use of the protocol field in socket() for setting a mode would be novel. Also, it might miss passively opened connections, or be overly restrictive: one approach for all accepted child sockets.
I was thinking this is similar to SOCK_CLOEXEC or SOCK_NONBLOCK? There are plenty of bits we can grab. But setsockopt works as well!
To follow up: if we have this flag on a socket, not on a per-message basis, can we also use recvmsg for the recycling part maybe?
while (true) { memset(msg, 0, ...);
/* receive the tokens */ ret = recvmsg(fd, &msg, 0); /* recycle the tokens from the above recvmsg() */ ret = recvmsg(fd, &msg, MSG_RECYCLE);
}
recvmsg + MSG_RECYCLE can parse the same format that regular recvmsg exports (SO_DEVMEM_OFFSET) and we can also add extra cmsg option to recycle a range.
Will this be more straightforward than a setsockopt(SO_DEVMEM_DONTNEED)? Or is it more confusing?
It would have to be sendmsg, as recvmsg is a copy_to_user operation.
I am not aware of any precedent in multiplexing the data stream and a control operation stream in this manner. It would also require adding a branch in the sendmsg hot path.
Is it too much plumbing to copy_from_user msg_control deep in recvmsg stack where we need it? Mixing in sendmsg is indeed ugly :-(
Regarding hot patch: aren't we already doing copy_to_user for the tokens in this hot path, so having one extra condition shouldn't hurt too much?
The memory is associated with the socket, freed when the socket is closed as well as on SO_DEVMEM_DONTNEED. Fundamentally it is a socket state operation, for which setsockopt is the socket interface.
Is your request purely a dislike, or is there some technical concern with BPF and setsockopt?
It's mostly because I've been bitten too much by custom socket options that are not really on/off/update-value operations:
29ebbba7d461 - bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen 00e74ae08638 - bpf: Don't EFAULT for getsockopt with optval=NULL 9cacf81f8161 - bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVE d8fe449a9c51 - bpf: Don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
I do agree that this particular case of SO_DEVMEM_DONTNEED seems ok, but things tend to evolve and change.