On Mon, Nov 6, 2023 at 3:37 PM David Ahern <dsahern@kernel.org> wrote:
On 11/6/23 3:18 PM, Mina Almasry wrote:
> @@ -991,7 +993,7 @@ struct sk_buff {
>  #if IS_ENABLED(CONFIG_IP_SCTP)
>  	__u8			csum_not_inet:1;
>  #endif
> -
> +	__u8			devmem:1;
>  #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
>  	__u16			tc_index;	/* traffic control index */
>  #endif
> @@ -1766,6 +1768,12 @@ static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb)
>  		__skb_zcopy_downgrade_managed(skb);
>  }
>
> +/* Return true if frags in this skb are not readable by the host. */
> +static inline bool skb_frags_not_readable(const struct sk_buff *skb)
> +{
> +	return skb->devmem;
bikeshedding: should we also rename the 'devmem' sk_buff flag to 'not_readable'? It better communicates the fact that the stack shouldn't dereference the frags (because the skb has 'devmem' fragments or for some other potential future reason).
+1.
Also, the flag on the skb is an optimization - a high-level signal that one or more frags are in unreadable memory. There is no requirement that all of the frags are in the same type of memory.
David: maybe there should be such a requirement (that they all are unreadable)? Might be easier to support initially; we can relax later on.
Currently devmem == not_readable, and the restriction is that all the frags in the same skb must be either all readable or all unreadable (all devmem or all non-devmem).
What requires that restriction? In all of the uses of skb->devmem and skb_frags_not_readable(), what matters is whether any frag is not readable; if so, the frag list walk or collapse is avoided.
Currently only tcp_recvmsg_devmem(), I think. tcp_recvmsg_locked() delegates to tcp_recvmsg_devmem() if skb->devmem, and tcp_recvmsg_devmem() errors out if it finds a non-iov frag in the skb. This is done partly for simplicity, because iovs are given to the user via cmsg, while pages are copied into the linear buffer. I think it would be confusing for the user if we simultaneously copied some data to the linear buffer and gave them devmem cmsgs in the same recvmsg() call.
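Roughly this shape, i.e. the split happens per skb before any frag is touched (sketch only; the helper and the exact tcp_recvmsg_devmem() arguments below are illustrative, not lifted from the patch):

static int example_recv_one_skb(struct sock *sk, struct sk_buff *skb,
				int offset, struct msghdr *msg, int len)
{
	if (skb->devmem)
		/* Whole skb is expected to be devmem: frag payload is handed
		 * to userspace as cmsg references, nothing is copied into
		 * the user buffer; a readable (non-iov) frag is an error.
		 * (tcp_recvmsg_devmem() signature assumed for illustration.) */
		return tcp_recvmsg_devmem(sk, skb, offset, msg, len);

	/* Host-readable skb: normal copy into the user buffer. */
	return skb_copy_datagram_msg(skb, offset, msg, len);
}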
So, the simplifications are:
1. In a single skb, all frags must be devmem or non-devmem, no mixing.
2. In a single recvmsg() call, we only process devmem or non-devmem skbs, no mixing.
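Rule 2 then reduces to a check like the following at the top of the per-skb receive loop (rough sketch; the helper is mine, not from the patch):

/* A single recvmsg() call keeps going only while each skb has the same
 * memory type as the first skb it handled; a mismatched skb is left on
 * the queue for the next call. (Illustrative helper, not in the patch.) */
static bool example_same_mem_type(const struct sk_buff *first,
				  const struct sk_buff *next)
{
	return first->devmem == next->devmem;
}

The first skb in the call decides the mode, and rule 1 is what lets tcp_recvmsg_devmem() treat any readable frag it finds as an error rather than a case it must handle.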