Some stmmac cores have Checksum Offload Engines that cannot handle DSA tags properly. These cores find the IP/TCP headers on their own and end up computing an incorrect checksum when a DSA tag is inserted between the MAC header and IP header.
Add an additional check on stmmac link up so that COE is deactivated when the stmmac device is used as a DSA conduit.
Add a new dma_feature flag to allow cores to signal that their COEs can't handle DSA tags on TX.
Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan") Cc: stable@vger.kernel.org Reported-by: Richard Tresidder rtresidd@electromag.com.au Closes: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electrom... Reported-by: Romain Gantois romain.gantois@bootlin.com Closes: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.... Signed-off-by: Romain Gantois romain.gantois@bootlin.com --- drivers/net/ethernet/stmicro/stmmac/common.h | 1 + .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c | 1 + .../net/ethernet/stmicro/stmmac/stmmac_main.c | 16 +++++++++++++++- 3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h index a381dee8b52d..1469d95e77a0 100644 --- a/drivers/net/ethernet/stmicro/stmmac/common.h +++ b/drivers/net/ethernet/stmicro/stmmac/common.h @@ -406,6 +406,7 @@ struct dma_features { unsigned int rx_coe_type1; unsigned int rx_coe_type2; unsigned int rxfifo_over_2048; + unsigned int dsa_breaks_tx_coe; /* TX and RX number of channels */ unsigned int number_rx_channel; unsigned int number_tx_channel; diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c index daf79cdbd3ec..50686cdc3320 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c @@ -264,6 +264,7 @@ static int dwmac1000_get_hw_feature(void __iomem *ioaddr, dma_cap->number_tx_channel = (hw_cap & DMA_HW_FEAT_TXCHCNT) >> 22; /* Alternate (enhanced) DESC mode */ dma_cap->enh_desc = (hw_cap & DMA_HW_FEAT_ENHDESSEL) >> 24; + dma_cap->dsa_breaks_tx_coe = 1;
return 0; } diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index a9b6b383e863..733348c65e04 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -42,6 +42,7 @@ #include <net/page_pool/helpers.h> #include <net/pkt_cls.h> #include <net/xdp_sock_drv.h> +#include <net/dsa.h> #include "stmmac_ptp.h" #include "stmmac.h" #include "stmmac_xdp.h" @@ -993,8 +994,11 @@ static void stmmac_mac_link_up(struct phylink_config *config, int speed, int duplex, bool tx_pause, bool rx_pause) { - struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev)); + struct net_device *ndev = to_net_dev(config->dev); + struct stmmac_priv *priv = netdev_priv(ndev); + unsigned int tx_queue_cnt; u32 old_ctrl, ctrl; + int queue;
if ((priv->plat->flags & STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP) && priv->plat->serdes_powerup) @@ -1102,6 +1106,16 @@ static void stmmac_mac_link_up(struct phylink_config *config,
if (priv->plat->flags & STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY) stmmac_hwtstamp_correct_latency(priv, priv); + + /* DSA tags break COEs on some cores. Disable TX checksum + * offloading on those cores if the netdevice is a DSA conduit. + */ + if (priv->dma_cap.dsa_breaks_tx_coe && netdev_uses_dsa(ndev)) { + tx_queue_cnt = priv->plat->tx_queues_to_use; + for (queue = 0; queue < tx_queue_cnt; queue++) + priv->plat->tx_queues_cfg[queue].coe_unsupported = true; + } + }
static const struct phylink_mac_ops stmmac_phylink_mac_ops = {
Hi Romain,
On Mon, Dec 18, 2023 at 05:23:23PM +0100, Romain Gantois wrote:
Some stmmac cores have Checksum Offload Engines that cannot handle DSA tags properly. These cores find the IP/TCP headers on their own and end up computing an incorrect checksum when a DSA tag is inserted between the MAC header and IP header.
Add an additional check on stmmac link up so that COE is deactivated when the stmmac device is used as a DSA conduit.
Add a new dma_feature flag to allow cores to signal that their COEs can't handle DSA tags on TX.
Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan") Cc: stable@vger.kernel.org Reported-by: Richard Tresidder rtresidd@electromag.com.au Closes: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electrom... Reported-by: Romain Gantois romain.gantois@bootlin.com Closes: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.... Signed-off-by: Romain Gantois romain.gantois@bootlin.com
DSA_TAG_PROTO_LAN9303, DSA_TAG_PROTO_SJA1105 and DSA_TAG_PROTO_SJA1110 construct tags with ETH_P_8021Q as EtherType. Do you still think it would be correct to say that all DSA tags break COE on the stmmac, as this patch assumes?
The NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM convention is not about statically checking whether the interface using DSA, but about looking at each packet before deciding whether to use the offload engine or to call skb_checksum_help().
You can experiment with any tagging protocol on the stmmac driver, and thus with the controller's response to any kind of traffic, even if the port is not attached to a hardware switch. You need to enable the CONFIG_NET_DSA_LOOP option, edit the return value of dsa_loop_get_protocol() and the "netdev" field of dsa_loop_pdata. The packets need to be analyzed on the link partner with a packet analysis tool, since there is no switch to strip the DSA tag.
drivers/net/ethernet/stmicro/stmmac/common.h | 1 + .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c | 1 + .../net/ethernet/stmicro/stmmac/stmmac_main.c | 16 +++++++++++++++- 3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c index daf79cdbd3ec..50686cdc3320 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c @@ -264,6 +264,7 @@ static int dwmac1000_get_hw_feature(void __iomem *ioaddr, dma_cap->number_tx_channel = (hw_cap & DMA_HW_FEAT_TXCHCNT) >> 22; /* Alternate (enhanced) DESC mode */ dma_cap->enh_desc = (hw_cap & DMA_HW_FEAT_ENHDESSEL) >> 24;
- dma_cap->dsa_breaks_tx_coe = 1;
return 0; } diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index a9b6b383e863..733348c65e04 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -42,6 +42,7 @@ #include <net/page_pool/helpers.h> #include <net/pkt_cls.h> #include <net/xdp_sock_drv.h> +#include <net/dsa.h> #include "stmmac_ptp.h" #include "stmmac.h" #include "stmmac_xdp.h" @@ -993,8 +994,11 @@ static void stmmac_mac_link_up(struct phylink_config *config, int speed, int duplex, bool tx_pause, bool rx_pause) {
- struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev));
- struct net_device *ndev = to_net_dev(config->dev);
- struct stmmac_priv *priv = netdev_priv(ndev);
- unsigned int tx_queue_cnt; u32 old_ctrl, ctrl;
- int queue;
if ((priv->plat->flags & STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP) && priv->plat->serdes_powerup) @@ -1102,6 +1106,16 @@ static void stmmac_mac_link_up(struct phylink_config *config, if (priv->plat->flags & STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY) stmmac_hwtstamp_correct_latency(priv, priv);
- /* DSA tags break COEs on some cores. Disable TX checksum
* offloading on those cores if the netdevice is a DSA conduit.
*/
- if (priv->dma_cap.dsa_breaks_tx_coe && netdev_uses_dsa(ndev)) {
tx_queue_cnt = priv->plat->tx_queues_to_use;
for (queue = 0; queue < tx_queue_cnt; queue++)
priv->plat->tx_queues_cfg[queue].coe_unsupported = true;
- }
The DSA switch driver can load after stmmac_mac_link_up() runs. This implementation is racy anyway.
} static const struct phylink_mac_ops stmmac_phylink_mac_ops = { -- 2.43.0
pw-bot: cr
Hi Vlad,
+ Linus Walleij
On Tue, 19 Dec 2023 14:20:34 +0200 Vladimir Oltean olteanv@gmail.com wrote:
Hi Romain,
On Mon, Dec 18, 2023 at 05:23:23PM +0100, Romain Gantois wrote:
Some stmmac cores have Checksum Offload Engines that cannot handle DSA tags properly. These cores find the IP/TCP headers on their own and end up computing an incorrect checksum when a DSA tag is inserted between the MAC header and IP header.
Add an additional check on stmmac link up so that COE is deactivated when the stmmac device is used as a DSA conduit.
Add a new dma_feature flag to allow cores to signal that their COEs can't handle DSA tags on TX.
Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan") Cc: stable@vger.kernel.org Reported-by: Richard Tresidder rtresidd@electromag.com.au Closes: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electrom... Reported-by: Romain Gantois romain.gantois@bootlin.com Closes: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.... Signed-off-by: Romain Gantois romain.gantois@bootlin.com
DSA_TAG_PROTO_LAN9303, DSA_TAG_PROTO_SJA1105 and DSA_TAG_PROTO_SJA1110 construct tags with ETH_P_8021Q as EtherType. Do you still think it would be correct to say that all DSA tags break COE on the stmmac, as this patch assumes?
The NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM convention is not about statically checking whether the interface using DSA, but about looking at each packet before deciding whether to use the offload engine or to call skb_checksum_help().
So it looks like an acceptable solution would be something along the lines of what Linus is suggesting here :
https://lore.kernel.org/netdev/20231216-new-gemini-ethernet-regression-v2-2-...
If so, maybe it's worth adding a new helper for that check ?
Maxime
On Tue, Dec 19, 2023 at 2:07 PM Maxime Chevallier maxime.chevallier@bootlin.com wrote:
So it looks like an acceptable solution would be something along the lines of what Linus is suggesting here :
https://lore.kernel.org/netdev/20231216-new-gemini-ethernet-regression-v2-2-...
If so, maybe it's worth adding a new helper for that check ?
Yeah it's a bit annoying when skb->protocol is not == ethertype of buffer.
I can certainly add a helper such as skb_eth_raw_ethertype() to <linux/if_ether.h> that will inspect the actual ethertype in skb->data.
It's the most straight-forward approach.
We could also add something like bool custom_ethertype; to struct sk_buff and set that to true if the tagger adds a custom ethertype. But I don't know how the network developers feel about that.
Yours, Linus Walleij
Hi Linus,
On Tue, 19 Dec 2023 15:19:45 +0100 Linus Walleij linus.walleij@linaro.org wrote:
On Tue, Dec 19, 2023 at 2:07 PM Maxime Chevallier maxime.chevallier@bootlin.com wrote:
So it looks like an acceptable solution would be something along the lines of what Linus is suggesting here :
https://lore.kernel.org/netdev/20231216-new-gemini-ethernet-regression-v2-2-...
If so, maybe it's worth adding a new helper for that check ?
Yeah it's a bit annoying when skb->protocol is not == ethertype of buffer.
I can certainly add a helper such as skb_eth_raw_ethertype() to <linux/if_ether.h> that will inspect the actual ethertype in skb->data.
It's the most straight-forward approach.
Agreed :)
We could also add something like bool custom_ethertype; to struct sk_buff and set that to true if the tagger adds a custom ethertype. But I don't know how the network developers feel about that.
I don't think this would be OK, first because sk_buff is pretty sensitive when it comes to cache alignment, adding things for this kind of use-cases isn't necessarily a good idea. Moreover, populating this flag isn't going to be straightforward as well. I guess some ethertype would be compatible with checksum engines, while other wouldn't, so probably what 'custom_ethertype' means will depend on the MAC driver.
From my point of view the first approach would indeed be better.
Thanks,
Maxime
On Tue, Dec 19, 2023 at 05:29:32PM +0100, Maxime Chevallier wrote:
Hi Linus,
On Tue, 19 Dec 2023 15:19:45 +0100 Linus Walleij linus.walleij@linaro.org wrote:
On Tue, Dec 19, 2023 at 2:07 PM Maxime Chevallier maxime.chevallier@bootlin.com wrote:
So it looks like an acceptable solution would be something along the lines of what Linus is suggesting here :
https://lore.kernel.org/netdev/20231216-new-gemini-ethernet-regression-v2-2-...
If so, maybe it's worth adding a new helper for that check ?
Yeah it's a bit annoying when skb->protocol is not == ethertype of buffer.
I can certainly add a helper such as skb_eth_raw_ethertype() to <linux/if_ether.h> that will inspect the actual ethertype in skb->data.
It's the most straight-forward approach.
Agreed :)
If you rewrite that patch to use skb_vlan_eth_hdr() to get a struct vlan_ethhdr pointer through which h_vlan_proto and h_vlan_encapsulated_proto are accessible, I don't see much value in writing that helper. It is going to beg the question how generic should it be - should it also treat ETH_P_8021AD, should it treat nested VLANs?
At the end of the day, you are trying to cover in software the cases for which the hardware engine can perform TX checksum offloading. That is going to be hardware specific.
We could also add something like bool custom_ethertype; to struct sk_buff and set that to true if the tagger adds a custom ethertype. But I don't know how the network developers feel about that.
I don't think this would be OK, first because sk_buff is pretty sensitive when it comes to cache alignment, adding things for this kind of use-cases isn't necessarily a good idea. Moreover, populating this flag isn't going to be straightforward as well. I guess some ethertype would be compatible with checksum engines, while other wouldn't, so probably what 'custom_ethertype' means will depend on the MAC driver.
From my point of view the first approach would indeed be better.
I guess we should first try to answer the questions "what does skb->protocol represent?" and "does DSA use it correctly?" before even thinking about adding yet another fuzzy layer on top it.
On Tue, Dec 19, 2023 at 11:46 PM Vladimir Oltean olteanv@gmail.com wrote:
On Tue, Dec 19, 2023 at 05:29:32PM +0100, Maxime Chevallier wrote:
I can certainly add a helper such as skb_eth_raw_ethertype() to <linux/if_ether.h> that will inspect the actual ethertype in skb->data.
It's the most straight-forward approach.
Agreed :)
If you rewrite that patch to use skb_vlan_eth_hdr() to get a struct vlan_ethhdr pointer through which h_vlan_proto and h_vlan_encapsulated_proto are accessible, I don't see much value in writing that helper. It is going to beg the question how generic should it be - should it also treat ETH_P_8021AD, should it treat nested VLANs?
I guess I should just post the patches inline. (It came from both Erics and Maximes suggestion really.)
Actually I wrote two helpers, one to get the ethertype from the ethernet frame which is pretty straight-forward.
include/linux/if_ether.h
+/* This determines the ethertype incoded into the skb data without + * relying on skb->protocol which is not always identical. + */ +static inline u16 skb_eth_raw_ethertype(const struct sk_buff *skb) +{ + struct ethhdr *hdr; + + /* If we can't extract a header, return invalid type */ + if (!skb_pointer_if_linear(skb, 0, ETH_HLEN)) + return 0x0000U; + + hdr = skb_eth_hdr(skb); + + return ntohs(hdr->h_proto); +}
Then for *this* driver I need to check for the ethertype ETH_P_8021Q what is inside it, one level down, and that is a separate helper. And I named it skb_vlan_raw_inner_ethertype() It will retrieve the inner type no matter
include/linux/if_vlan.h
+/* This determines the inner ethertype incoded into the skb data without + * relying on skb->protocol which is not always identical. + */ +static inline u16 skb_vlan_raw_inner_ethertype(const struct sk_buff *skb) +{ + struct vlan_ethhdr *vhdr; + + if (!skb_pointer_if_linear(skb, 0, VLAN_ETH_HLEN)) + return 0x0000U; + + vhdr = vlan_eth_hdr(skb); + return ntohs(vhdr->h_vlan_encapsulated_proto); +}
(We can bikeshed the name of the function. *_inner_protocol maybe.)
It does not handle nested VLANs and I don't see why it should since the immediate siblings in if_vlan.h does not, i.e. vlan_eth_hdr(), skb_vlan_eth_hdr(). It's pretty clear these helpers all go just one level down. (We can add a *_descend_*() helper the day someone needs that.)
At the end of the day, you are trying to cover in software the cases for which the hardware engine can perform TX checksum offloading. That is going to be hardware specific.
Yeps and I am happy to fold these helpers inside of my driver if they are not helpful to anyone else, or if that is the best idea for something intended for a fix, i.e. an -rc kernel.
I guess we should first try to answer the questions "what does skb->protocol represent?" and "does DSA use it correctly?" before even thinking about adding yet another fuzzy layer on top it.
Fair point! Let's take a step back. The kerneldoc says:
* @protocol: Packet protocol from driver
That's a bit vague and it was in the first commit in git history :/
But Eric probably knows the right way to use protocol.
But we know for sure that VLAN uses this for the outermost protocol ETH_P_8021Q (etc).
I wonder how the network stack reacts if we set the skb->protocol to whatever DSA taggers put at the position of the ethertype.
For RTL taggers probably this works because they use an elaborate custom ethertype, but e.g. net/dsa/tag_mtk.c will just put in "ethertype" 0x0000, 0x0001 or 0x0002, the two latter which are formally ETH_P_802_3 and ETH_P_AX25 which I think is maybe not so good to put into skb->protocol.
Another option is to set it to the ETH_P_DSA ethertype, currently unused in the kernel.
Now this kind of thinking makes me insecure because: git grep '->protocol' net/
There is just sooooo much code inspecting ->protocol in the generic network stack that this seems like inviting disaster.
Yours, Linus Walleij
On Wed, Dec 20, 2023 at 1:43 AM Linus Walleij linus.walleij@linaro.org wrote:
Then for *this* driver I need to check for the ethertype ETH_P_8021Q what is inside it, one level down, and that is a separate helper. And I named it skb_vlan_raw_inner_ethertype() It will retrieve the inner type no matter
include/linux/if_vlan.h
+/* This determines the inner ethertype incoded into the skb data without
- relying on skb->protocol which is not always identical.
- */
+static inline u16 skb_vlan_raw_inner_ethertype(const struct sk_buff *skb) +{
struct vlan_ethhdr *vhdr;
if (!skb_pointer_if_linear(skb, 0, VLAN_ETH_HLEN))
return 0x0000U;
vhdr = vlan_eth_hdr(skb);
return ntohs(vhdr->h_vlan_encapsulated_proto);
+}
(We can bikeshed the name of the function. *_inner_protocol maybe.)
It does not handle nested VLANs and I don't see why it should since the immediate siblings in if_vlan.h does not, i.e. vlan_eth_hdr(), skb_vlan_eth_hdr(). It's pretty clear these helpers all go just one level down. (We can add a *_descend_*() helper the day someone needs that.)
Forget this whole discussion because in <linux/if_vlan.h> there is already vlan_get_protocol() and vlan_get_protocol_and_depth() so this problem is already solved, just better.
Yours, Linus Walleij
On Tue, 19 Dec 2023, Vladimir Oltean wrote:
DSA_TAG_PROTO_LAN9303, DSA_TAG_PROTO_SJA1105 and DSA_TAG_PROTO_SJA1110 construct tags with ETH_P_8021Q as EtherType. Do you still think it would be correct to say that all DSA tags break COE on the stmmac, as this patch assumes?
The NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM convention is not about statically checking whether the interface using DSA, but about looking at each packet before deciding whether to use the offload engine or to call skb_checksum_help().
You can experiment with any tagging protocol on the stmmac driver, and thus with the controller's response to any kind of traffic, even if the port is not attached to a hardware switch. You need to enable the
Thanks for telling me about DSA_LOOP, I've tested several DSA tagging protocols with the RZN1 GMAC1 hardware using this method. Here's what I found in a nutshell:
For tagging protocols that change the EtherType field in the MAC header (e.g. DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105): On TX the tagged frames are almost always ignored by the checksum offload engine and IP header checker of the MAC device. I say "almost always" because there is an unlikely but nasty corner case where a DSA tag can be identical to an IP EtherType value. In these cases, the frame will likely fail IP header checks and be dropped by the MAC.
Ignoring these corner cases, the DSA frames will egress with a partial checksum and be dropped by the recipient. On RX, these frames will, once again, not be detected as IP frames by the MAC. So they will be transmitted to the CPU. However, the stmmac driver will assume (wrongly in this case) that these frames' checksums have been verified by the MAC. So it will set CHECKSUM_UNECESSARY:
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/stmicro/...
And so the IP/TCP checksums will not be checked at all, which is not ideal.
There are other DSA tagging protocols which cause different issues. For example DSA_TAG_PROTO_BRCM_PREPEND, which seems to offset the whole MAC header, and DSA_TAG_PROTO_LAN9303 which sets ETH_P_8021Q as its EtherType. I haven't dug too deeply on these issues yet, since I'd rather deal with the checksumming issue before getting distracted by VLAN offloading and other stuff.
Among the tagging protocols I tested, the only one that didn't cause any issues was DSA_TAG_PROTO_TRAILER, which only appends stuff to the frame.
TLDR: The simplest solution seems to be to modify the stmmac TX and RX paths to disable checksum offloading for frames that have a non-IP ethertype in their MAC header. This will fix the checksum situation for DSA tagging protocols that set non-IP and non-8021Q EtherTypes. Some edge cases like DSA_TAG_PROTO_BRCM_PREPEND and DSA_TAG_PROTO_LAN9303 will require a completely different solution if we want these MAC devices to handle them properly. Please share any thoughts you might have on this suggestion.
Best Regards,
On Fri, Dec 29, 2023 at 05:11:48PM +0100, Romain Gantois wrote:
Thanks for telling me about DSA_LOOP, I've tested several DSA tagging protocols with the RZN1 GMAC1 hardware using this method. Here's what I found in a nutshell:
Good job exploring the complexity of the problem in depth.
For tagging protocols that change the EtherType field in the MAC header (e.g. DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105): On TX the tagged frames are almost always ignored by the checksum offload engine and IP header checker of the MAC device. I say "almost always" because there is an unlikely but nasty corner case where a DSA tag can be identical to an IP EtherType value. In these cases, the frame will likely fail IP header checks and be dropped by the MAC.
Yes, there are a few poorly designed DSA tagging formats where arbitrary fields overlap with what the conduit interface sees as the EtherType field. We don't design the tagging formats, as they are proprietary (except for those derived from tag_8021q), we just support them. In some cases where the switch has permitted that, we have implemented dynamic changing of tagging protocols (like 'echo edsa > /sys/class/net/eth0/dsa/tagging') in order to increase the compatibility between a particular switch and its conduit interface. And where the compatibility with the default tagging protocol was beyond broken, we accepted an alternative one through the 'dsa-tag-protocol' device tree property.
Ignoring these corner cases, the DSA frames will egress with a partial checksum and be dropped by the recipient. On RX, these frames will, once again, not be detected as IP frames by the MAC. So they will be transmitted to the CPU. However, the stmmac driver will assume (wrongly in this case) that these frames' checksums have been verified by the MAC. So it will set CHECKSUM_UNECESSARY:
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/stmicro/... And so the IP/TCP checksums will not be checked at all, which is not ideal.
Yup, this all stems from the fact that DSA inherits the checksum offload features of the conduit (stmmac) from its vlan_features. People think that vlan_features are inherited only by VLAN upper interfaces, but that is not the case. Confusingly, in some cases, offloading NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM really does work (Broadcom conduit + Broadcom switch, Marvell conduit + Marvell switch, etc), so we can't remove this mechanism. But it uncovers lack of API compliance in drivers such as the stmmac, which is why it is a fragile mechanism.
There are other DSA tagging protocols which cause different issues. For example DSA_TAG_PROTO_BRCM_PREPEND, which seems to offset the whole MAC header, and DSA_TAG_PROTO_LAN9303 which sets ETH_P_8021Q as its EtherType. I haven't dug too deeply on these issues yet, since I'd rather deal with the checksumming issue before getting distracted by VLAN offloading and other stuff.
I agree that what brcm-prepend does - shifting the entire frame to the right by 4 octets - sounds problematic in general (making the conduit see the EtherType as octets [3:2] of the original MAC SA). But you also need to take a look at where those protocols are used, and if that is relevant in any way to the stmmac.
/* Broadcom BCM58xx chips have a flow accelerator on Port 8 * which requires us to use the prepended Broadcom tag type */ if (dev->chip_id == BCM58XX_DEVICE_ID && port == B53_CPU_PORT) { dev->tag_protocol = DSA_TAG_PROTO_BRCM_PREPEND; goto out; }
From what I understand, DSA_TAG_PROTO_BRCM_PREPEND is only used internally within Broadcom SoCs, so it seems likely that it's not designed with generic compatibility in mind.
As for DSA_TAG_PROTO_LAN9303, let me guess what the problem was. TX was fine, but on RX, the packets got dropped in hardware before they even reached the stmmac driver, because it declares NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER as features, and the DSA tags effectively look like unregistered VLAN traffic.
That is certainly an area where the lan9303 support can be improved. Other VLAN-based taggers like tag_8021q perform vlan_vid_add() calls on the conduit interface so that it won't drop the traffic even when it uses hardware VLAN filtering.
Among the tagging protocols I tested, the only one that didn't cause any issues was DSA_TAG_PROTO_TRAILER, which only appends stuff to the frame.
It's very curious that you say this. Tail taggers are notoriously problematic, because while the conduit will perform the checksum offload function on the packets, the checksum calculation goes until the very end of the frame. Thus, that checksum will be wrong after the switch consumes the tail tag (and does not update the L4 checksum).
There is no way to overcome that except to not inherit any checksum offload features for tail taggers. But that would break some other thing, so we opted for having this line in the xmit procedure of tail taggers:
if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb)) return NULL;
But apparently we have been inconsistent in applying this to trailer_xmit() as well. So DSA_TAG_PROTO_TRAILER should actually be a case of "checksum is computed, but is incorrect after tag stripping", but you say that it was the only one that worked fine.
TLDR: The simplest solution seems to be to modify the stmmac TX and RX paths to disable checksum offloading for frames that have a non-IP ethertype in their MAC header. This will fix the checksum situation for DSA tagging protocols that set non-IP and non-8021Q EtherTypes. Some edge cases like DSA_TAG_PROTO_BRCM_PREPEND and DSA_TAG_PROTO_LAN9303 will require a completely different solution if we want these MAC devices to handle them properly. Please share any thoughts you might have on this suggestion.
I think the overall idea is correct, with the small mentions of "let's ignore brcm-prepend" and "lan9303 should work, maybe it's just a case of disabling the VLAN filtering features through ethtool and testing again?".
linux-stable-mirror@lists.linaro.org