This is a note to let you know that I've just added the patch titled
mac80211_hwsim: Fix a possible sleep-in-atomic bug in hwsim_get_radio_nl
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mac80211_hwsim-fix-a-possible-sleep-in-atomic-bug-in-hwsim_get_radio_nl.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 28 16:13:29 CET 2018
From: Jia-Ju Bai <baijiaju1990(a)163.com>
Date: Tue, 12 Dec 2017 17:26:36 +0800
Subject: mac80211_hwsim: Fix a possible sleep-in-atomic bug in hwsim_get_radio_nl
From: Jia-Ju Bai <baijiaju1990(a)163.com>
[ Upstream commit 162bd5e5fd921785077b5862d8f2ffabe2fe11e5 ]
The driver may sleep under a spinlock.
The function call path is:
hwsim_get_radio_nl (acquire the spinlock)
nlmsg_new(GFP_KERNEL) --> may sleep
To fix it, GFP_KERNEL is replaced with GFP_ATOMIC.
This bug is found by my static analysis tool(DSAC) and checked by my code review.
Signed-off-by: Jia-Ju Bai <baijiaju1990(a)163.com>
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/wireless/mac80211_hwsim.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -3154,7 +3154,7 @@ static int hwsim_get_radio_nl(struct sk_
if (!net_eq(wiphy_net(data->hw->wiphy), genl_info_net(info)))
continue;
- skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
if (!skb) {
res = -ENOMEM;
goto out_err;
Patches currently in stable-queue which might be from baijiaju1990(a)163.com are
queue-4.9/mac80211_hwsim-fix-a-possible-sleep-in-atomic-bug-in-hwsim_get_radio_nl.patch
This is a note to let you know that I've just added the patch titled
mac80211: mesh: drop frames appearing to be from us
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mac80211-mesh-drop-frames-appearing-to-be-from-us.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 28 16:13:29 CET 2018
From: Johannes Berg <johannes.berg(a)intel.com>
Date: Thu, 4 Jan 2018 15:51:53 +0100
Subject: mac80211: mesh: drop frames appearing to be from us
From: Johannes Berg <johannes.berg(a)intel.com>
[ Upstream commit 736a80bbfda709fb3631f5f62056f250a38e5804 ]
If there are multiple mesh stations with the same MAC address,
they will both get confused and start throwing warnings.
Obviously in this case nothing can actually work anyway, so just
drop frames that look like they're from ourselves early on.
Reported-by: Gui Iribarren <gui(a)altermundi.net>
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/mac80211/rx.c | 2 ++
1 file changed, 2 insertions(+)
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -3611,6 +3611,8 @@ static bool ieee80211_accept_frame(struc
}
return true;
case NL80211_IFTYPE_MESH_POINT:
+ if (ether_addr_equal(sdata->vif.addr, hdr->addr2))
+ return false;
if (multicast)
return true;
return ether_addr_equal(sdata->vif.addr, hdr->addr1);
Patches currently in stable-queue which might be from johannes.berg(a)intel.com are
queue-4.9/mac80211_hwsim-fix-a-possible-sleep-in-atomic-bug-in-hwsim_get_radio_nl.patch
queue-4.9/nl80211-check-for-the-required-netlink-attribute-presence.patch
queue-4.9/mac80211-mesh-drop-frames-appearing-to-be-from-us.patch
This is a note to let you know that I've just added the patch titled
lib/mpi: Fix umul_ppmm() for MIPS64r6
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
lib-mpi-fix-umul_ppmm-for-mips64r6.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 28 16:13:29 CET 2018
From: James Hogan <jhogan(a)kernel.org>
Date: Tue, 5 Dec 2017 23:31:35 +0000
Subject: lib/mpi: Fix umul_ppmm() for MIPS64r6
From: James Hogan <jhogan(a)kernel.org>
[ Upstream commit bbc25bee37d2b32cf3a1fab9195b6da3a185614a ]
Current MIPS64r6 toolchains aren't able to generate efficient
DMULU/DMUHU based code for the C implementation of umul_ppmm(), which
performs an unsigned 64 x 64 bit multiply and returns the upper and
lower 64-bit halves of the 128-bit result. Instead it widens the 64-bit
inputs to 128-bits and emits a __multi3 intrinsic call to perform a 128
x 128 multiply. This is both inefficient, and it results in a link error
since we don't include __multi3 in MIPS linux.
For example commit 90a53e4432b1 ("cfg80211: implement regdb signature
checking") merged in v4.15-rc1 recently broke the 64r6_defconfig and
64r6el_defconfig builds by indirectly selecting MPILIB. The same build
errors can be reproduced on older kernels by enabling e.g. CRYPTO_RSA:
lib/mpi/generic_mpih-mul1.o: In function `mpihelp_mul_1':
lib/mpi/generic_mpih-mul1.c:50: undefined reference to `__multi3'
lib/mpi/generic_mpih-mul2.o: In function `mpihelp_addmul_1':
lib/mpi/generic_mpih-mul2.c:49: undefined reference to `__multi3'
lib/mpi/generic_mpih-mul3.o: In function `mpihelp_submul_1':
lib/mpi/generic_mpih-mul3.c:49: undefined reference to `__multi3'
lib/mpi/mpih-div.o In function `mpihelp_divrem':
lib/mpi/mpih-div.c:205: undefined reference to `__multi3'
lib/mpi/mpih-div.c:142: undefined reference to `__multi3'
Therefore add an efficient MIPS64r6 implementation of umul_ppmm() using
inline assembly and the DMULU/DMUHU instructions, to prevent __multi3
calls being emitted.
Fixes: 7fd08ca58ae6 ("MIPS: Add build support for the MIPS R6 ISA")
Signed-off-by: James Hogan <jhogan(a)kernel.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: linux-mips(a)linux-mips.org
Cc: linux-crypto(a)vger.kernel.org
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
lib/mpi/longlong.h | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
--- a/lib/mpi/longlong.h
+++ b/lib/mpi/longlong.h
@@ -671,7 +671,23 @@ do { \
************** MIPS/64 **************
***************************************/
#if (defined(__mips) && __mips >= 3) && W_TYPE_SIZE == 64
-#if (__GNUC__ >= 5) || (__GNUC__ >= 4 && __GNUC_MINOR__ >= 4)
+#if defined(__mips_isa_rev) && __mips_isa_rev >= 6
+/*
+ * GCC ends up emitting a __multi3 intrinsic call for MIPS64r6 with the plain C
+ * code below, so we special case MIPS64r6 until the compiler can do better.
+ */
+#define umul_ppmm(w1, w0, u, v) \
+do { \
+ __asm__ ("dmulu %0,%1,%2" \
+ : "=d" ((UDItype)(w0)) \
+ : "d" ((UDItype)(u)), \
+ "d" ((UDItype)(v))); \
+ __asm__ ("dmuhu %0,%1,%2" \
+ : "=d" ((UDItype)(w1)) \
+ : "d" ((UDItype)(u)), \
+ "d" ((UDItype)(v))); \
+} while (0)
+#elif (__GNUC__ >= 5) || (__GNUC__ >= 4 && __GNUC_MINOR__ >= 4)
#define umul_ppmm(w1, w0, u, v) \
do { \
typedef unsigned int __ll_UTItype __attribute__((mode(TI))); \
Patches currently in stable-queue which might be from jhogan(a)kernel.org are
queue-4.9/lib-mpi-fix-umul_ppmm-for-mips64r6.patch
This is a note to let you know that I've just added the patch titled
ipv6: icmp6: Allow icmp messages to be looped back
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ipv6-icmp6-allow-icmp-messages-to-be-looped-back.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 28 16:13:29 CET 2018
From: Brendan McGrath <redmcg(a)redmandi.dyndns.org>
Date: Wed, 13 Dec 2017 22:14:57 +1100
Subject: ipv6: icmp6: Allow icmp messages to be looped back
From: Brendan McGrath <redmcg(a)redmandi.dyndns.org>
[ Upstream commit 588753f1eb18978512b1c9b85fddb457d46f9033 ]
One example of when an ICMPv6 packet is required to be looped back is
when a host acts as both a Multicast Listener and a Multicast Router.
A Multicast Router will listen on address ff02::16 for MLDv2 messages.
Currently, MLDv2 messages originating from a Multicast Listener running
on the same host as the Multicast Router are not being delivered to the
Multicast Router. This is due to dst.input being assigned the default
value of dst_discard.
This results in the packet being looped back but discarded before being
delivered to the Multicast Router.
This patch sets dst.input to ip6_input to ensure a looped back packet
is delivered to the Multicast Router.
Signed-off-by: Brendan McGrath <redmcg(a)redmandi.dyndns.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/route.c | 1 +
1 file changed, 1 insertion(+)
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1651,6 +1651,7 @@ struct dst_entry *icmp6_dst_alloc(struct
}
rt->dst.flags |= DST_HOST;
+ rt->dst.input = ip6_input;
rt->dst.output = ip6_output;
atomic_set(&rt->dst.__refcnt, 1);
rt->rt6i_gateway = fl6->daddr;
Patches currently in stable-queue which might be from redmcg(a)redmandi.dyndns.org are
queue-4.9/ipv6-icmp6-allow-icmp-messages-to-be-looped-back.patch
This is a note to let you know that I've just added the patch titled
led: core: Fix brightness setting when setting delay_off=0
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
led-core-fix-brightness-setting-when-setting-delay_off-0.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 28 16:13:29 CET 2018
From: Matthieu CASTET <matthieu.castet(a)parrot.com>
Date: Tue, 12 Dec 2017 11:10:44 +0100
Subject: led: core: Fix brightness setting when setting delay_off=0
From: Matthieu CASTET <matthieu.castet(a)parrot.com>
[ Upstream commit 2b83ff96f51d0b039c4561b9f95c824d7bddb85c ]
With the current code, the following sequence won't work :
echo timer > trigger
echo 0 > delay_off
* at this point we call
** led_delay_off_store
** led_blink_set
---
drivers/leds/led-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/leds/led-core.c
+++ b/drivers/leds/led-core.c
@@ -186,7 +186,7 @@ void led_blink_set(struct led_classdev *
unsigned long *delay_on,
unsigned long *delay_off)
{
- del_timer_sync(&led_cdev->blink_timer);
+ led_stop_software_blink(led_cdev);
led_cdev->flags &= ~LED_BLINK_ONESHOT;
led_cdev->flags &= ~LED_BLINK_ONESHOT_STOP;
Patches currently in stable-queue which might be from matthieu.castet(a)parrot.com are
queue-4.9/led-core-fix-brightness-setting-when-setting-delay_off-0.patch
This is a note to let you know that I've just added the patch titled
ip6_tunnel: get the min mtu properly in ip6_tnl_xmit
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ip6_tunnel-get-the-min-mtu-properly-in-ip6_tnl_xmit.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 28 16:13:29 CET 2018
From: Xin Long <lucien.xin(a)gmail.com>
Date: Mon, 18 Dec 2017 14:26:21 +0800
Subject: ip6_tunnel: get the min mtu properly in ip6_tnl_xmit
From: Xin Long <lucien.xin(a)gmail.com>
[ Upstream commit c9fefa08190fc879fb2e681035d7774e0a8c5170 ]
Now it's using IPV6_MIN_MTU as the min mtu in ip6_tnl_xmit, but
IPV6_MIN_MTU actually only works when the inner packet is ipv6.
With IPV6_MIN_MTU for ipv4 packets, the new pmtu for inner dst
couldn't be set less than 1280. It would cause tx_err and the
packet to be dropped when the outer dst pmtu is close to 1280.
Jianlin found it by running ipv4 traffic with the topo:
(client) gre6 <---> eth1 (route) eth2 <---> gre6 (server)
After changing eth2 mtu to 1300, the performance became very
low, or the connection was even broken. The issue also affects
ip4ip6 and ip6ip6 tunnels.
So if the inner packet is ipv4, 576 should be considered as the
min mtu.
Note that for ip4ip6 and ip6ip6 tunnels, the inner packet can
only be ipv4 or ipv6, but for gre6 tunnel, it may also be ARP.
This patch using 576 as the min mtu for non-ipv6 packet works
for all those cases.
Reported-by: Jianlin Shi <jishi(a)redhat.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/ip6_tunnel.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1127,8 +1127,13 @@ route_lookup:
max_headroom += 8;
mtu -= 8;
}
- if (mtu < IPV6_MIN_MTU)
- mtu = IPV6_MIN_MTU;
+ if (skb->protocol == htons(ETH_P_IPV6)) {
+ if (mtu < IPV6_MIN_MTU)
+ mtu = IPV6_MIN_MTU;
+ } else if (mtu < 576) {
+ mtu = 576;
+ }
+
if (skb_dst(skb) && !t->parms.collect_md)
skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
if (skb->len - t->tun_hlen - eth_hlen > mtu && !skb_is_gso(skb)) {
Patches currently in stable-queue which might be from lucien.xin(a)gmail.com are
queue-4.9/ip6_tunnel-get-the-min-mtu-properly-in-ip6_tnl_xmit.patch
This is a note to let you know that I've just added the patch titled
IB/mlx5: Fix mlx5_ib_alloc_mr error flow
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-mlx5-fix-mlx5_ib_alloc_mr-error-flow.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 28 16:13:29 CET 2018
From: Nitzan Carmi <nitzanc(a)mellanox.com>
Date: Tue, 26 Dec 2017 11:20:20 +0200
Subject: IB/mlx5: Fix mlx5_ib_alloc_mr error flow
From: Nitzan Carmi <nitzanc(a)mellanox.com>
[ Upstream commit 45e6ae7ef21b907dacb18da62d5787d74a31d860 ]
ibmr.device is being set only after ib_alloc_mr() is
(successfully) complete. Therefore, in case mlx5_core_create_mkey()
return with error, the error flow calls mlx5_free_priv_descs()
which uses ibmr.device (which doesn't exist yet), causing
a NULL dereference oops.
To fix this, the IB device should be set in the mr struct earlier
stage (e.g. prior to calling mlx5_core_create_mkey()).
Fixes: 8a187ee52b04 ("IB/mlx5: Support the new memory registration API")
Signed-off-by: Max Gurtovoy <maxg(a)mellanox.com>
Signed-off-by: Nitzan Carmi <nitzanc(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/mlx5/mr.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1648,6 +1648,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib
MLX5_SET(mkc, mkc, access_mode, mr->access_mode);
MLX5_SET(mkc, mkc, umr_en, 1);
+ mr->ibmr.device = pd->device;
err = mlx5_core_create_mkey(dev->mdev, &mr->mmkey, in, inlen);
if (err)
goto err_destroy_psv;
Patches currently in stable-queue which might be from nitzanc(a)mellanox.com are
queue-4.9/ib-mlx5-fix-mlx5_ib_alloc_mr-error-flow.patch
queue-4.9/ib-mlx4-fix-mlx4_ib_alloc_mr-error-flow.patch
This is a note to let you know that I've just added the patch titled
IB/mlx4: Fix mlx4_ib_alloc_mr error flow
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-mlx4-fix-mlx4_ib_alloc_mr-error-flow.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 28 16:13:29 CET 2018
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 31 Dec 2017 15:33:14 +0200
Subject: IB/mlx4: Fix mlx4_ib_alloc_mr error flow
From: Leon Romanovsky <leonro(a)mellanox.com>
[ Upstream commit 5a371cf87e145b86efd32007e46146e78c1eff6d ]
ibmr.device is being set only after ib_alloc_mr() is successfully complete.
Therefore, in case imlx4_mr_enable() returns with error, the error flow
unwinder calls to mlx4_free_priv_pages(), which uses ibmr.device.
Such usage causes to NULL dereference oops and to fix it, the IB device
should be set in the mr struct earlier stage (e.g. prior to calling
mlx4_free_priv_pages()).
Fixes: 1b2cd0fc673c ("IB/mlx4: Support the new memory registration API")
Signed-off-by: Nitzan Carmi <nitzanc(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/mlx4/mr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -406,7 +406,6 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib
goto err_free_mr;
mr->max_pages = max_num_sg;
-
err = mlx4_mr_enable(dev->dev, &mr->mmr);
if (err)
goto err_free_pl;
@@ -417,6 +416,7 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib
return &mr->ibmr;
err_free_pl:
+ mr->ibmr.device = pd->device;
mlx4_free_priv_pages(mr);
err_free_mr:
(void) mlx4_mr_free(dev->dev, &mr->mmr);
Patches currently in stable-queue which might be from leonro(a)mellanox.com are
queue-4.9/ib-mlx4-fix-mlx4_ib_alloc_mr-error-flow.patch
This is a note to let you know that I've just added the patch titled
IB/ipoib: Fix race condition in neigh creation
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-ipoib-fix-race-condition-in-neigh-creation.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 28 16:13:29 CET 2018
From: Erez Shitrit <erezsh(a)mellanox.com>
Date: Sun, 31 Dec 2017 15:33:15 +0200
Subject: IB/ipoib: Fix race condition in neigh creation
From: Erez Shitrit <erezsh(a)mellanox.com>
[ Upstream commit 16ba3defb8bd01a9464ba4820a487f5b196b455b ]
When using enhanced mode for IPoIB, two threads may execute xmit in
parallel to two different TX queues while the target is the same.
In this case, both of them will add the same neighbor to the path's
neigh link list and we might see the following message:
list_add double add: new=ffff88024767a348, prev=ffff88024767a348...
WARNING: lib/list_debug.c:31__list_add_valid+0x4e/0x70
ipoib_start_xmit+0x477/0x680 [ib_ipoib]
dev_hard_start_xmit+0xb9/0x3e0
sch_direct_xmit+0xf9/0x250
__qdisc_run+0x176/0x5d0
__dev_queue_xmit+0x1f5/0xb10
__dev_queue_xmit+0x55/0xb10
Analysis:
Two SKB are scheduled to be transmitted from two cores.
In ipoib_start_xmit, both gets NULL when calling ipoib_neigh_get.
Two calls to neigh_add_path are made. One thread takes the spin-lock
and calls ipoib_neigh_alloc which creates the neigh structure,
then (after the __path_find) the neigh is added to the path's neigh
link list. When the second thread enters the critical section it also
calls ipoib_neigh_alloc but in this case it gets the already allocated
ipoib_neigh structure, which is already linked to the path's neigh
link list and adds it again to the list. Which beside of triggering
the list, it creates a loop in the linked list. This loop leads to
endless loop inside path_rec_completion.
Solution:
Check list_empty(&neigh->list) before adding to the list.
Add a similar fix in "ipoib_multicast.c::ipoib_mcast_send"
Fixes: b63b70d87741 ('IPoIB: Use a private hash table for path lookup in xmit path')
Signed-off-by: Erez Shitrit <erezsh(a)mellanox.com>
Reviewed-by: Alex Vesker <valex(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 25 ++++++++++++++++++-------
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 5 ++++-
2 files changed, 22 insertions(+), 8 deletions(-)
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -919,8 +919,8 @@ static int path_rec_start(struct net_dev
return 0;
}
-static void neigh_add_path(struct sk_buff *skb, u8 *daddr,
- struct net_device *dev)
+static struct ipoib_neigh *neigh_add_path(struct sk_buff *skb, u8 *daddr,
+ struct net_device *dev)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ipoib_path *path;
@@ -933,7 +933,15 @@ static void neigh_add_path(struct sk_buf
spin_unlock_irqrestore(&priv->lock, flags);
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
- return;
+ return NULL;
+ }
+
+ /* To avoid race condition, make sure that the
+ * neigh will be added only once.
+ */
+ if (unlikely(!list_empty(&neigh->list))) {
+ spin_unlock_irqrestore(&priv->lock, flags);
+ return neigh;
}
path = __path_find(dev, daddr + 4);
@@ -971,7 +979,7 @@ static void neigh_add_path(struct sk_buf
spin_unlock_irqrestore(&priv->lock, flags);
ipoib_send(dev, skb, path->ah, IPOIB_QPN(daddr));
ipoib_neigh_put(neigh);
- return;
+ return NULL;
}
} else {
neigh->ah = NULL;
@@ -988,7 +996,7 @@ static void neigh_add_path(struct sk_buf
spin_unlock_irqrestore(&priv->lock, flags);
ipoib_neigh_put(neigh);
- return;
+ return NULL;
err_path:
ipoib_neigh_free(neigh);
@@ -998,6 +1006,8 @@ err_drop:
spin_unlock_irqrestore(&priv->lock, flags);
ipoib_neigh_put(neigh);
+
+ return NULL;
}
static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
@@ -1103,8 +1113,9 @@ static int ipoib_start_xmit(struct sk_bu
case htons(ETH_P_TIPC):
neigh = ipoib_neigh_get(dev, phdr->hwaddr);
if (unlikely(!neigh)) {
- neigh_add_path(skb, phdr->hwaddr, dev);
- return NETDEV_TX_OK;
+ neigh = neigh_add_path(skb, phdr->hwaddr, dev);
+ if (likely(!neigh))
+ return NETDEV_TX_OK;
}
break;
case htons(ETH_P_ARP):
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -818,7 +818,10 @@ void ipoib_mcast_send(struct net_device
spin_lock_irqsave(&priv->lock, flags);
if (!neigh) {
neigh = ipoib_neigh_alloc(daddr, dev);
- if (neigh) {
+ /* Make sure that the neigh will be added only
+ * once to mcast list.
+ */
+ if (neigh && list_empty(&neigh->list)) {
kref_get(&mcast->ah->ref);
neigh->ah = mcast->ah;
list_add_tail(&neigh->list, &mcast->neigh_list);
Patches currently in stable-queue which might be from erezsh(a)mellanox.com are
queue-4.9/ib-ipoib-fix-race-condition-in-neigh-creation.patch
This is a note to let you know that I've just added the patch titled
i40e/i40evf: Account for frags split over multiple descriptors in check linearize
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
i40e-i40evf-account-for-frags-split-over-multiple-descriptors-in-check-linearize.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 28 16:13:29 CET 2018
From: Alexander Duyck <alexander.h.duyck(a)intel.com>
Date: Fri, 8 Dec 2017 10:55:04 -0800
Subject: i40e/i40evf: Account for frags split over multiple descriptors in check linearize
From: Alexander Duyck <alexander.h.duyck(a)intel.com>
[ Upstream commit 248de22e638f10bd5bfc7624a357f940f66ba137 ]
The original code for __i40e_chk_linearize didn't take into account the
fact that if a fragment is 16K in size or larger it has to be split over 2
descriptors and the smaller of those 2 descriptors will be on the trailing
edge of the transmit. As a result we can get into situations where we didn't
catch requests that could result in a Tx hang.
This patch takes care of that by subtracting the length of all but the
trailing edge of the stale fragment before we test for sum. By doing this
we can guarantee that we have all cases covered, including the case of a
fragment that spans multiple descriptors. We don't need to worry about
checking the inner portions of this since 12K is the maximum aligned DMA
size and that is larger than any MSS will ever be since the MTU limit for
jumbos is something on the order of 9K.
Signed-off-by: Alexander Duyck <alexander.h.duyck(a)intel.com>
Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher(a)intel.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 26 +++++++++++++++++++++++---
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 26 +++++++++++++++++++++++---
2 files changed, 46 insertions(+), 6 deletions(-)
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2670,10 +2670,30 @@ bool __i40e_chk_linearize(struct sk_buff
/* Walk through fragments adding latest fragment, testing it, and
* then removing stale fragments from the sum.
*/
- stale = &skb_shinfo(skb)->frags[0];
- for (;;) {
+ for (stale = &skb_shinfo(skb)->frags[0];; stale++) {
+ int stale_size = skb_frag_size(stale);
+
sum += skb_frag_size(frag++);
+ /* The stale fragment may present us with a smaller
+ * descriptor than the actual fragment size. To account
+ * for that we need to remove all the data on the front and
+ * figure out what the remainder would be in the last
+ * descriptor associated with the fragment.
+ */
+ if (stale_size > I40E_MAX_DATA_PER_TXD) {
+ int align_pad = -(stale->page_offset) &
+ (I40E_MAX_READ_REQ_SIZE - 1);
+
+ sum -= align_pad;
+ stale_size -= align_pad;
+
+ do {
+ sum -= I40E_MAX_DATA_PER_TXD_ALIGNED;
+ stale_size -= I40E_MAX_DATA_PER_TXD_ALIGNED;
+ } while (stale_size > I40E_MAX_DATA_PER_TXD);
+ }
+
/* if sum is negative we failed to make sufficient progress */
if (sum < 0)
return true;
@@ -2681,7 +2701,7 @@ bool __i40e_chk_linearize(struct sk_buff
if (!nr_frags--)
break;
- sum -= skb_frag_size(stale++);
+ sum -= stale_size;
}
return false;
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1872,10 +1872,30 @@ bool __i40evf_chk_linearize(struct sk_bu
/* Walk through fragments adding latest fragment, testing it, and
* then removing stale fragments from the sum.
*/
- stale = &skb_shinfo(skb)->frags[0];
- for (;;) {
+ for (stale = &skb_shinfo(skb)->frags[0];; stale++) {
+ int stale_size = skb_frag_size(stale);
+
sum += skb_frag_size(frag++);
+ /* The stale fragment may present us with a smaller
+ * descriptor than the actual fragment size. To account
+ * for that we need to remove all the data on the front and
+ * figure out what the remainder would be in the last
+ * descriptor associated with the fragment.
+ */
+ if (stale_size > I40E_MAX_DATA_PER_TXD) {
+ int align_pad = -(stale->page_offset) &
+ (I40E_MAX_READ_REQ_SIZE - 1);
+
+ sum -= align_pad;
+ stale_size -= align_pad;
+
+ do {
+ sum -= I40E_MAX_DATA_PER_TXD_ALIGNED;
+ stale_size -= I40E_MAX_DATA_PER_TXD_ALIGNED;
+ } while (stale_size > I40E_MAX_DATA_PER_TXD);
+ }
+
/* if sum is negative we failed to make sufficient progress */
if (sum < 0)
return true;
@@ -1883,7 +1903,7 @@ bool __i40evf_chk_linearize(struct sk_bu
if (!nr_frags--)
break;
- sum -= skb_frag_size(stale++);
+ sum -= stale_size;
}
return false;
Patches currently in stable-queue which might be from alexander.h.duyck(a)intel.com are
queue-4.9/i40e-i40evf-account-for-frags-split-over-multiple-descriptors-in-check-linearize.patch