The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
>From 4d51419d49930be2701c2633ae271b350397c3ca Mon Sep 17 00:00:00 2001
From: Ilya Maximets <i.maximets(a)ovn.org>
Date: Sun, 4 Apr 2021 19:50:31 +0200
Subject: [PATCH] openvswitch: fix send of uninitialized stack memory in ct
limit reply
'struct ovs_zone_limit' has more members than initialized in
ovs_ct_limit_get_default_limit(). The rest of the memory is a random
kernel stack content that ends up being sent to userspace.
Fix that by using designated initializer that will clear all
non-specified fields.
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Ilya Maximets <i.maximets(a)ovn.org>
Acked-by: Tonghao Zhang <xiangxia.m.yue(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
net/openvswitch/conntrack.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 71cec03e8612..d217bd91176b 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -2034,10 +2034,10 @@ static int ovs_ct_limit_del_zone_limit(struct nlattr *nla_zone_limit,
static int ovs_ct_limit_get_default_limit(struct ovs_ct_limit_info *info,
struct sk_buff *reply)
{
- struct ovs_zone_limit zone_limit;
-
- zone_limit.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE;
- zone_limit.limit = info->default_limit;
+ struct ovs_zone_limit zone_limit = {
+ .zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE,
+ .limit = info->default_limit,
+ };
return nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit);
}
--
2.30.2
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
>From ed7bedd2c3ca040f1e8ea02c6590a93116b1ec78 Mon Sep 17 00:00:00 2001
From: Guangbin Huang <huangguangbin2(a)huawei.com>
Date: Tue, 6 Apr 2021 21:10:43 +0800
Subject: [PATCH] net: hns3: clear VF down state bit before request link status
Currently, the VF down state bit is cleared after VF sending
link status request command. There is problem that when VF gets
link status replied from PF, the down state bit may still set
as 1. In this case, the link status replied from PF will be
ignored and always set VF link status to down.
To fix this problem, clear VF down state bit before VF requests
link status.
Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Guangbin Huang <huangguangbin2(a)huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 700e068764c8..14b83eca0a5e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -2624,14 +2624,14 @@ static int hclgevf_ae_start(struct hnae3_handle *handle)
{
struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+ clear_bit(HCLGEVF_STATE_DOWN, &hdev->state);
+
hclgevf_reset_tqp_stats(handle);
hclgevf_request_link_info(hdev);
hclgevf_update_link_mode(hdev);
- clear_bit(HCLGEVF_STATE_DOWN, &hdev->state);
-
return 0;
}
--
2.30.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
>From 9ab1265d52314fce1b51e8665ea6dbc9ac1a027c Mon Sep 17 00:00:00 2001
From: Evan Nimmo <evan.nimmo(a)alliedtelesis.co.nz>
Date: Tue, 2 Mar 2021 08:00:04 +1300
Subject: [PATCH] xfrm: Use actual socket sk instead of skb socket for
xfrm_output_resume
A situation can occur where the interface bound to the sk is different
to the interface bound to the sk attached to the skb. The interface
bound to the sk is the correct one however this information is lost inside
xfrm_output2 and instead the sk on the skb is used in xfrm_output_resume
instead. This assumes that the sk bound interface and the bound interface
attached to the sk within the skb are the same which can lead to lookup
failures inside ip_route_me_harder resulting in the packet being dropped.
We have an l2tp v3 tunnel with ipsec protection. The tunnel is in the
global VRF however we have an encapsulated dot1q tunnel interface that
is within a different VRF. We also have a mangle rule that marks the
packets causing them to be processed inside ip_route_me_harder.
Prior to commit 31c70d5956fc ("l2tp: keep original skb ownership") this
worked fine as the sk attached to the skb was changed from the dot1q
encapsulated interface to the sk for the tunnel which meant the interface
bound to the sk and the interface bound to the skb were identical.
Commit 46d6c5ae953c ("netfilter: use actual socket sk rather than skb sk
when routing harder") fixed some of these issues however a similar
problem existed in the xfrm code.
Fixes: 31c70d5956fc ("l2tp: keep original skb ownership")
Signed-off-by: Evan Nimmo <evan.nimmo(a)alliedtelesis.co.nz>
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>
---
include/net/xfrm.h | 2 +-
net/ipv4/ah4.c | 2 +-
net/ipv4/esp4.c | 2 +-
net/ipv6/ah6.c | 2 +-
net/ipv6/esp6.c | 2 +-
net/xfrm/xfrm_output.c | 10 +++++-----
6 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index b2a06f10b62c..bfbc7810df94 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1557,7 +1557,7 @@ int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb,
int xfrm_trans_queue(struct sk_buff *skb,
int (*finish)(struct net *, struct sock *,
struct sk_buff *));
-int xfrm_output_resume(struct sk_buff *skb, int err);
+int xfrm_output_resume(struct sock *sk, struct sk_buff *skb, int err);
int xfrm_output(struct sock *sk, struct sk_buff *skb);
#if IS_ENABLED(CONFIG_NET_PKTGEN)
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index d99e1be94019..36ed85bf2ad5 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -141,7 +141,7 @@ static void ah_output_done(struct crypto_async_request *base, int err)
}
kfree(AH_SKB_CB(skb)->tmp);
- xfrm_output_resume(skb, err);
+ xfrm_output_resume(skb->sk, skb, err);
}
static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index a3271ec3e162..4b834bbf95e0 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -279,7 +279,7 @@ static void esp_output_done(struct crypto_async_request *base, int err)
x->encap && x->encap->encap_type == TCP_ENCAP_ESPINTCP)
esp_output_tail_tcp(x, skb);
else
- xfrm_output_resume(skb, err);
+ xfrm_output_resume(skb->sk, skb, err);
}
}
diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index 440080da805b..080ee7f44c64 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -316,7 +316,7 @@ static void ah6_output_done(struct crypto_async_request *base, int err)
}
kfree(AH_SKB_CB(skb)->tmp);
- xfrm_output_resume(skb, err);
+ xfrm_output_resume(skb->sk, skb, err);
}
static int ah6_output(struct xfrm_state *x, struct sk_buff *skb)
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 153ad103ba74..727d791ed5e6 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -314,7 +314,7 @@ static void esp_output_done(struct crypto_async_request *base, int err)
x->encap && x->encap->encap_type == TCP_ENCAP_ESPINTCP)
esp_output_tail_tcp(x, skb);
else
- xfrm_output_resume(skb, err);
+ xfrm_output_resume(skb->sk, skb, err);
}
}
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index a7ab19353313..b81ca117dac7 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -503,22 +503,22 @@ out:
return err;
}
-int xfrm_output_resume(struct sk_buff *skb, int err)
+int xfrm_output_resume(struct sock *sk, struct sk_buff *skb, int err)
{
struct net *net = xs_net(skb_dst(skb)->xfrm);
while (likely((err = xfrm_output_one(skb, err)) == 0)) {
nf_reset_ct(skb);
- err = skb_dst(skb)->ops->local_out(net, skb->sk, skb);
+ err = skb_dst(skb)->ops->local_out(net, sk, skb);
if (unlikely(err != 1))
goto out;
if (!skb_dst(skb)->xfrm)
- return dst_output(net, skb->sk, skb);
+ return dst_output(net, sk, skb);
err = nf_hook(skb_dst(skb)->ops->family,
- NF_INET_POST_ROUTING, net, skb->sk, skb,
+ NF_INET_POST_ROUTING, net, sk, skb,
NULL, skb_dst(skb)->dev, xfrm_output2);
if (unlikely(err != 1))
goto out;
@@ -534,7 +534,7 @@ EXPORT_SYMBOL_GPL(xfrm_output_resume);
static int xfrm_output2(struct net *net, struct sock *sk, struct sk_buff *skb)
{
- return xfrm_output_resume(skb, 1);
+ return xfrm_output_resume(sk, skb, 1);
}
static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb)
--
2.30.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
>From c7dbf4c08868d9db89b8bfe8f8245ca61b01ed2f Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert(a)secunet.com>
Date: Fri, 26 Mar 2021 09:44:48 +0100
Subject: [PATCH] xfrm: Provide private skb extensions for segmented and hw
offloaded ESP packets
Commit 94579ac3f6d0 ("xfrm: Fix double ESP trailer insertion in IPsec
crypto offload.") added a XFRM_XMIT flag to avoid duplicate ESP trailer
insertion on HW offload. This flag is set on the secpath that is shared
amongst segments. This lead to a situation where some segments are
not transformed correctly when segmentation happens at layer 3.
Fix this by using private skb extensions for segmented and hw offloaded
ESP packets.
Fixes: 94579ac3f6d0 ("xfrm: Fix double ESP trailer insertion in IPsec crypto offload.")
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>
---
net/ipv4/esp4_offload.c | 11 ++++++++++-
net/ipv6/esp6_offload.c | 11 ++++++++++-
net/xfrm/xfrm_device.c | 2 --
3 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index ed3de486ea34..33687cf58286 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -314,8 +314,17 @@ static int esp_xmit(struct xfrm_state *x, struct sk_buff *skb, netdev_features_
ip_hdr(skb)->tot_len = htons(skb->len);
ip_send_check(ip_hdr(skb));
- if (hw_offload)
+ if (hw_offload) {
+ if (!skb_ext_add(skb, SKB_EXT_SEC_PATH))
+ return -ENOMEM;
+
+ xo = xfrm_offload(skb);
+ if (!xo)
+ return -EINVAL;
+
+ xo->flags |= XFRM_XMIT;
return 0;
+ }
err = esp_output_tail(x, skb, &esp);
if (err)
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
index f35203ab39f5..4af56affaafd 100644
--- a/net/ipv6/esp6_offload.c
+++ b/net/ipv6/esp6_offload.c
@@ -348,8 +348,17 @@ static int esp6_xmit(struct xfrm_state *x, struct sk_buff *skb, netdev_features
ipv6_hdr(skb)->payload_len = htons(len);
- if (hw_offload)
+ if (hw_offload) {
+ if (!skb_ext_add(skb, SKB_EXT_SEC_PATH))
+ return -ENOMEM;
+
+ xo = xfrm_offload(skb);
+ if (!xo)
+ return -EINVAL;
+
+ xo->flags |= XFRM_XMIT;
return 0;
+ }
err = esp6_output_tail(x, skb, &esp);
if (err)
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index edf11893dbe8..6d6917b68856 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -134,8 +134,6 @@ struct sk_buff *validate_xmit_xfrm(struct sk_buff *skb, netdev_features_t featur
return skb;
}
- xo->flags |= XFRM_XMIT;
-
if (skb_is_gso(skb) && unlikely(x->xso.dev != dev)) {
struct sk_buff *segs;
--
2.30.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
>From 66167c310deb4ac1725f81004fb4b504676ad0bf Mon Sep 17 00:00:00 2001
From: Ido Schimmel <idosch(a)nvidia.com>
Date: Mon, 29 Mar 2021 11:29:23 +0300
Subject: [PATCH] mlxsw: spectrum: Fix ECN marking in tunnel decapsulation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cited commit changed the behavior of the software data path with regards
to the ECN marking of decapsulated packets. However, the commit did not
change other callers of __INET_ECN_decapsulate(), namely mlxsw. The
driver is using the function in order to ensure that the hardware and
software data paths act the same with regards to the ECN marking of
decapsulated packets.
The discrepancy was uncovered by commit 5aa3c334a449 ("selftests:
forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value") that
aligned the selftest to the new behavior. Without this patch the
selftest passes when used with veth pairs, but fails when used with
mlxsw netdevs.
Fix this by instructing the device to propagate the ECT(1) mark from the
outer header to the inner header when the inner header is ECT(0), for
both NVE and IP-in-IP tunnels.
A helper is added in order not to duplicate the code between both tunnel
types.
Fixes: b723748750ec ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040")
Signed-off-by: Ido Schimmel <idosch(a)nvidia.com>
Reviewed-by: Petr Machata <petrm(a)nvidia.com>
Acked-by: Toke Høiland-Jørgensen <toke(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 15 +++++++++++++++
.../net/ethernet/mellanox/mlxsw/spectrum_ipip.c | 7 +++----
.../net/ethernet/mellanox/mlxsw/spectrum_nve.c | 7 +++----
3 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index d9d9e1f488f9..ba28ac7e79bc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -21,6 +21,7 @@
#include <net/red.h>
#include <net/vxlan.h>
#include <net/flow_offload.h>
+#include <net/inet_ecn.h>
#include "port.h"
#include "core.h"
@@ -347,6 +348,20 @@ struct mlxsw_sp_port_type_speed_ops {
u32 (*ptys_proto_cap_masked_get)(u32 eth_proto_cap);
};
+static inline u8 mlxsw_sp_tunnel_ecn_decap(u8 outer_ecn, u8 inner_ecn,
+ bool *trap_en)
+{
+ bool set_ce = false;
+
+ *trap_en = !!__INET_ECN_decapsulate(outer_ecn, inner_ecn, &set_ce);
+ if (set_ce)
+ return INET_ECN_CE;
+ else if (outer_ecn == INET_ECN_ECT_1 && inner_ecn == INET_ECN_ECT_0)
+ return INET_ECN_ECT_1;
+ else
+ return inner_ecn;
+}
+
static inline struct net_device *
mlxsw_sp_bridge_vxlan_dev_find(struct net_device *br_dev)
{
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
index 6ccca39bae84..64a8f838eb53 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
@@ -335,12 +335,11 @@ static int mlxsw_sp_ipip_ecn_decap_init_one(struct mlxsw_sp *mlxsw_sp,
u8 inner_ecn, u8 outer_ecn)
{
char tidem_pl[MLXSW_REG_TIDEM_LEN];
- bool trap_en, set_ce = false;
u8 new_inner_ecn;
+ bool trap_en;
- trap_en = __INET_ECN_decapsulate(outer_ecn, inner_ecn, &set_ce);
- new_inner_ecn = set_ce ? INET_ECN_CE : inner_ecn;
-
+ new_inner_ecn = mlxsw_sp_tunnel_ecn_decap(outer_ecn, inner_ecn,
+ &trap_en);
mlxsw_reg_tidem_pack(tidem_pl, outer_ecn, inner_ecn, new_inner_ecn,
trap_en, trap_en ? MLXSW_TRAP_ID_DECAP_ECN0 : 0);
return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tidem), tidem_pl);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
index e5ec595593f4..9eba8fa684ae 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -909,12 +909,11 @@ static int __mlxsw_sp_nve_ecn_decap_init(struct mlxsw_sp *mlxsw_sp,
u8 inner_ecn, u8 outer_ecn)
{
char tndem_pl[MLXSW_REG_TNDEM_LEN];
- bool trap_en, set_ce = false;
u8 new_inner_ecn;
+ bool trap_en;
- trap_en = !!__INET_ECN_decapsulate(outer_ecn, inner_ecn, &set_ce);
- new_inner_ecn = set_ce ? INET_ECN_CE : inner_ecn;
-
+ new_inner_ecn = mlxsw_sp_tunnel_ecn_decap(outer_ecn, inner_ecn,
+ &trap_en);
mlxsw_reg_tndem_pack(tndem_pl, outer_ecn, inner_ecn, new_inner_ecn,
trap_en, trap_en ? MLXSW_TRAP_ID_DECAP_ECN0 : 0);
return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tndem), tndem_pl);
--
2.30.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
>From fb6ec87f7229b92baa81b35cbc76f2626d5bfadb Mon Sep 17 00:00:00 2001
From: Maxim Kochetkov <fido_max(a)inbox.ru>
Date: Mon, 29 Mar 2021 18:30:16 +0300
Subject: [PATCH] net: dsa: Fix type was not set for devlink port
If PHY is not available on DSA port (described at devicetree but absent or
failed to detect) then kernel prints warning after 3700 secs:
[ 3707.948771] ------------[ cut here ]------------
[ 3707.948784] Type was not set for devlink port.
[ 3707.948894] WARNING: CPU: 1 PID: 17 at net/core/devlink.c:8097 0xc083f9d8
We should unregister the devlink port as a user port and
re-register it as an unused port before executing "continue" in case of
dsa_port_setup error.
Fixes: 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal")
Signed-off-by: Maxim Kochetkov <fido_max(a)inbox.ru>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
net/dsa/dsa2.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index d142eb2b288b..3c3e56a1f34d 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -795,8 +795,14 @@ static int dsa_tree_setup_switches(struct dsa_switch_tree *dst)
list_for_each_entry(dp, &dst->ports, list) {
err = dsa_port_setup(dp);
- if (err)
+ if (err) {
+ dsa_port_devlink_teardown(dp);
+ dp->type = DSA_PORT_TYPE_UNUSED;
+ err = dsa_port_devlink_setup(dp);
+ if (err)
+ goto teardown;
continue;
+ }
}
return 0;
--
2.30.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
>From 622d13694b5f048c01caa7ba548498d9880d4cb0 Mon Sep 17 00:00:00 2001
From: Ong Boon Leong <boon.leong.ong(a)intel.com>
Date: Wed, 31 Mar 2021 21:25:03 +0800
Subject: [PATCH] xdp: fix xdp_return_frame() kernel BUG throw for page_pool
memory model
xdp_return_frame() may be called outside of NAPI context to return
xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with
napi_direct = false. For page_pool memory model, __xdp_return() calls
xdp_return_frame_no_direct() unconditionally and below false negative
kernel BUG throw happened under preempt-rt build:
[ 430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884
[ 430.451678] caller is __xdp_return+0x1ff/0x2e0
[ 430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G U E 5.12.0-rc2+ #45
Changes in v2:
- This patch fixes the issue by making xdp_return_frame_no_direct() is
only called if napi_direct = true, as recommended for better by
Jesper Dangaard Brouer. Thanks!
Fixes: 2539650fadbf ("xdp: Helpers for disabling napi_direct of xdp_return_frame")
Signed-off-by: Ong Boon Leong <boon.leong.ong(a)intel.com>
Acked-by: Jesper Dangaard Brouer <brouer(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
net/core/xdp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 05354976c1fc..858276e72c68 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -350,7 +350,8 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
/* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
page = virt_to_head_page(data);
- napi_direct &= !xdp_return_frame_no_direct();
+ if (napi_direct && xdp_return_frame_no_direct())
+ napi_direct = false;
page_pool_put_full_page(xa->page_pool, page, napi_direct);
rcu_read_unlock();
break;
--
2.30.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
>From 1a73704c82ed4ee95532ac04645d02075bd1ce3d Mon Sep 17 00:00:00 2001
From: Eli Cohen <elic(a)nvidia.com>
Date: Wed, 24 Mar 2021 09:46:09 +0200
Subject: [PATCH] net/mlx5: Fix HW spec violation configuring uplink
Make sure to modify uplink port to follow only if the uplink_follow
capability is set as required by the HW spec. Failure to do so causes
traffic to the uplink representor net device to cease after switching to
switchdev mode.
Fixes: 7d0314b11cdd ("net/mlx5e: Modify uplink state on interface up/down")
Signed-off-by: Eli Cohen <elic(a)nvidia.com>
Reviewed-by: Roi Dayan <roid(a)nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index a132fff7a980..8d39bfee84a9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1107,8 +1107,9 @@ static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv)
mlx5e_rep_tc_enable(priv);
- mlx5_modify_vport_admin_state(mdev, MLX5_VPORT_STATE_OP_MOD_UPLINK,
- 0, 0, MLX5_VPORT_ADMIN_STATE_AUTO);
+ if (MLX5_CAP_GEN(mdev, uplink_follow))
+ mlx5_modify_vport_admin_state(mdev, MLX5_VPORT_STATE_OP_MOD_UPLINK,
+ 0, 0, MLX5_VPORT_ADMIN_STATE_AUTO);
mlx5_lag_add(mdev, netdev);
priv->events_nb.notifier_call = uplink_rep_async_event;
mlx5_notifier_register(mdev, &priv->events_nb);
--
2.30.2