On Fri, Dec 29, 2023 at 05:11:48PM +0100, Romain Gantois wrote:
Thanks for telling me about DSA_LOOP, I've tested several DSA tagging protocols with the RZN1 GMAC1 hardware using this method. Here's what I found in a nutshell:
Good job exploring the complexity of the problem in depth.
For tagging protocols that change the EtherType field in the MAC header (e.g. DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105)): On TX the tagged frames are almost always ignored by the checksum offload engine and IP header checker of the MAC device. I say "almost always" because there is an unlikely but nasty corner case where a DSA tag can be identical to an IP EtherType value. In that case, the frame will likely fail IP header checks and be dropped by the MAC.
Yes, there are a few poorly designed DSA tagging formats where arbitrary fields overlap with what the conduit interface sees as the EtherType field. We don't design the tagging formats, as they are proprietary (except for those derived from tag_8021q), we just support them. In some cases where the switch has permitted that, we have implemented dynamic changing of tagging protocols (like 'echo edsa > /sys/class/net/eth0/dsa/tagging') in order to increase the compatibility between a particular switch and its conduit interface. And where the compatibility with the default tagging protocol was beyond broken, we accepted an alternative one through the 'dsa-tag-protocol' device tree property.
Ignoring these corner cases, the DSA frames will egress with a partial checksum and be dropped by the recipient. On RX, these frames will, once again, not be detected as IP frames by the MAC. So they will be transmitted to the CPU. However, the stmmac driver will assume (wrongly in this case) that these frames' checksums have been verified by the MAC. So it will set CHECKSUM_UNNECESSARY:
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/stmicro/...
And so the IP/TCP checksums will not be checked at all, which is not ideal.
Yup, this all stems from the fact that DSA inherits the checksum offload features of the conduit (stmmac) from its vlan_features. People think that vlan_features are inherited only by VLAN upper interfaces, but that is not the case. Confusingly, in some cases, offloading NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM really does work (Broadcom conduit + Broadcom switch, Marvell conduit + Marvell switch, etc), so we can't remove this mechanism. But it uncovers lack of API compliance in drivers such as the stmmac, which is why it is a fragile mechanism.
There are other DSA tagging protocols which cause different issues. For example DSA_TAG_PROTO_BRCM_PREPEND, which seems to offset the whole MAC header, and DSA_TAG_PROTO_LAN9303 which sets ETH_P_8021Q as its EtherType. I haven't dug too deeply on these issues yet, since I'd rather deal with the checksumming issue before getting distracted by VLAN offloading and other stuff.
I agree that what brcm-prepend does - shifting the entire frame to the right by 4 octets - sounds problematic in general (making the conduit see the EtherType as octets [3:2] of the original MAC SA). But you also need to take a look at where those protocols are used, and if that is relevant in any way to the stmmac.
	/* Broadcom BCM58xx chips have a flow accelerator on Port 8
	 * which requires us to use the prepended Broadcom tag type
	 */
	if (dev->chip_id == BCM58XX_DEVICE_ID && port == B53_CPU_PORT) {
		dev->tag_protocol = DSA_TAG_PROTO_BRCM_PREPEND;
		goto out;
	}
From what I understand, DSA_TAG_PROTO_BRCM_PREPEND is only used internally within Broadcom SoCs, so it seems likely that it's not designed with generic compatibility in mind.
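To make the "EtherType as octets [3:2] of the original MAC SA" remark concrete, here is a small userspace sketch (not kernel code). The helper names and the frame contents are made up for illustration; only the 4-octet shift comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6   /* octets in a MAC address */
#define ETH_HLEN    14  /* DA + SA + EtherType */
#define PREPEND_LEN 4   /* shift described in the text above */

/* Read the 16-bit big-endian field that a tag-unaware MAC treats as
 * the EtherType: the two octets right after DA and SA.
 */
static uint16_t conduit_ethertype(const uint8_t *frame)
{
	return (uint16_t)((frame[2 * ETH_ALEN] << 8) | frame[2 * ETH_ALEN + 1]);
}

/* Hypothetical helper: build the on-wire frame by placing a 4-octet
 * tag in front of the DA. The tag contents are treated as opaque.
 * 'out' must have room for len + PREPEND_LEN octets.
 */
static size_t prepend_tag(uint8_t *out, const uint8_t tag[PREPEND_LEN],
			  const uint8_t *frame, size_t len)
{
	assert(len >= ETH_HLEN);
	memcpy(out, tag, PREPEND_LEN);
	memcpy(out + PREPEND_LEN, frame, len);
	return len + PREPEND_LEN;
}
```

With a source address of 02:11:22:33:44:55, the conduit would read 0x2233 where it expects the EtherType, regardless of what the real EtherType of the frame is.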
As for DSA_TAG_PROTO_LAN9303, let me guess what the problem was. TX was fine, but on RX, the packets got dropped in hardware before they even reached the stmmac driver, because it declares NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER as features, and the DSA tags effectively look like unregistered VLAN traffic.
That is certainly an area where the lan9303 support can be improved. Other VLAN-based taggers like tag_8021q perform vlan_vid_add() calls on the conduit interface so that it won't drop the traffic even when it uses hardware VLAN filtering.
Among the tagging protocols I tested, the only one that didn't cause any issues was DSA_TAG_PROTO_TRAILER, which only appends stuff to the frame.
It's very curious that you say this. Tail taggers are notoriously problematic, because while the conduit will perform the checksum offload function on the packets, the checksum calculation goes until the very end of the frame. Thus, that checksum will be wrong after the switch consumes the tail tag (and does not update the L4 checksum).
There is no way to overcome that except to not inherit any checksum offload features for tail taggers. But that would break some other thing, so we opted for having this line in the xmit procedure of tail taggers:
	if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
		return NULL;
But apparently we have been inconsistent in applying this to trailer_xmit() as well. So DSA_TAG_PROTO_TRAILER should actually be a case of "checksum is computed, but is incorrect after tag stripping", but you say that it was the only one that worked fine.
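A rough userspace illustration of the tail tag problem, under simplifying assumptions: the checksum is a plain RFC 1071 one's complement sum over the L4 bytes only (no pseudo-header, no checksum field), and the tag bytes are invented values, not a real trailer format. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BUF 64  /* arbitrary bound for this sketch */

/* RFC 1071 Internet checksum over 'len' octets, big-endian words. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((data[i] << 8) | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)(data[len - 1] << 8);
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Returns 1 if a checksum computed up to the very end of the frame
 * (L4 bytes plus tail tag) still matches the L4 bytes alone, i.e. if
 * it would remain valid after the switch strips the tag.
 */
static int csum_survives_tag_strip(const uint8_t *l4, size_t l4_len,
				   const uint8_t *tag, size_t tag_len)
{
	uint8_t wire[MAX_BUF];

	assert(l4_len + tag_len <= MAX_BUF);
	memcpy(wire, l4, l4_len);
	memcpy(wire + l4_len, tag, tag_len);
	return inet_csum(wire, l4_len + tag_len) == inet_csum(l4, l4_len);
}
```

In this model an all-zero tag leaves the sum untouched, but a tag carrying any port information generally changes it, which is the "checksum is computed, but is incorrect after tag stripping" case.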
TLDR: The simplest solution seems to be to modify the stmmac TX and RX paths to disable checksum offloading for frames that have a non-IP ethertype in their MAC header. This will fix the checksum situation for DSA tagging protocols that set non-IP and non-8021Q EtherTypes. Some edge cases like DSA_TAG_PROTO_BRCM_PREPEND and DSA_TAG_PROTO_LAN9303 will require a completely different solution if we want these MAC devices to handle them properly. Please share any thoughts you might have on this suggestion.
I think the overall idea is correct, with the small mentions of "let's ignore brcm-prepend" and "lan9303 should work, maybe it's just a case of disabling the VLAN filtering features through ethtool and testing again?".
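For reference, the EtherType test at the heart of that suggestion could look roughly like the userspace sketch below. This is not stmmac code: the function name is hypothetical, and leaving 802.1Q-encapsulated frames out of the "offloadable" set is a simplification of mine, since VLAN handling is treated as a separate question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN   6
#define ETH_HLEN   14
#define ETH_P_IP   0x0800
#define ETH_P_IPV6 0x86DD

/* Hypothetical predicate: may the MAC checksum engine be trusted with
 * this frame? Only frames whose outer EtherType is plain IPv4 or IPv6
 * qualify; anything else (including DSA-tagged frames whose tag sits
 * where the EtherType would be) is rejected.
 */
static bool frame_can_use_csum_offload(const uint8_t *frame, size_t len)
{
	uint16_t proto;

	if (len < ETH_HLEN)
		return false;

	proto = (uint16_t)((frame[2 * ETH_ALEN] << 8) | frame[2 * ETH_ALEN + 1]);
	return proto == ETH_P_IP || proto == ETH_P_IPV6;
}
```

In a driver, a false result would presumably translate to a software fallback on TX (along the lines of the skb_checksum_help() call quoted earlier) and to not reporting the checksum as verified on RX. Note that this check alone does not cover the corner case where a tag happens to equal an IP EtherType.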