On Wed, Jul 26, 2023 at 09:10:20AM -0700, Yan Zhai wrote:
On Wed, Jul 26, 2023 at 06:01:00PM +0300, Dan Carpenter wrote:
On Wed, Jul 26, 2023 at 07:14:56AM -0700, Yan Zhai wrote:
On Wed, Jul 26, 2023 at 04:39:08PM +0300, Dan Carpenter wrote:
I'm not positive I understand the code in ip_finish_output2(). I think instead of looking for LWTUNNEL_XMIT_DONE it should instead look for != LWTUNNEL_XMIT_CONTINUE. It's unfortunate that NET_XMIT_DROP and LWTUNNEL_XMIT_CONTINUE are the both 0x1. Why don't we just change that instead?
I considered about changing lwt side logic. But it would bring larger impact since there are multiple types of encaps on this hook, not just bpf redirect. Changing bpf return values is a minimum change on the other hand. In addition, returning value of NET_RX_DROP and NET_XMIT_CN are the same, so if we don't do something in bpf redirect, there is no way to distinguish them later: the former is considered as an error, while "CN" is considered as non-error.
Uh, NET_RX/XMIT_DROP values are 1. NET_XMIT_CN is 2.
I'm not an expert but I think what happens is that we treat NET_XMIT_CN as success so that it takes a while for the resend to happen. Eventually the TCP layer will detect it as a dropped packet.
My eyes slipped lines. CN is 2. But the fact RX return value can be returned on a TX path still makes me feel unclean. Odds are low that we will have new statuses in future, it is a risk. I'd hope to contain these values only inside BPF redirect code as they are the reason why such rx values can show up there. Meanwhile, your argument do make good sense to me that the same problem may occur for other stuff. It is true. In fact, I just re-examined BPF-REROUTE path, it has the exact same issue by directly sending dst_output value back.
So I would propose to do two things:
- still convert BPF redirect ingress code to contain the propagation
of mixed return. Return only TX side value instead, which is also what majority of those local senders are expecting. (I was wrong about positive values returned to sendmsg below btw, they are not).
- change LWTUNNEL_XMIT_CONTINUE and check for this at xmit hook.
Sounds good!
regards, dan carpenter