The following patch ensures that the bounce_buffer is not null prior to using it within skb_copy_from_linear_data.
The problem can be recreated by toggling Large Send Offload on and off.
The following script, when run along with some iperf traffic, recreates the crash within 5-10 minutes or so.
while true
do
    ethtool -k ibmveth0 | grep tcp-segmentation-offload
    ethtool -K ibmveth0 tso off
    ethtool -k ibmveth0 | grep tcp-segmentation-offload
    ethtool -K ibmveth0 tso on
done
Note: This issue also happens the very first time largesend offload is turned off (but the above script recreates the issue every time).
[76563.914173] Unable to handle kernel paging request for data at address 0x00000000
[76563.914197] Faulting instruction address: 0xc000000000063940
[76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
[76563.914210] SMP NR_CPUS=2048 NUMA pSeries
[76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
[76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
[76563.914258] task: c0000000fe9efcc0 ti: c0000000feab0000 task.ti: c0000000feab0000
[76563.914265] NIP: c000000000063940 LR: d000000000d31788 CTR: c000000000064100
[76563.914271] REGS: c0000000feab2ff0 TRAP: 0300 Not tainted (4.4.0-34-generic)
[76563.914277] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 4800284e XER: 0000001a
[76563.914294] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: 000000000000f240 c0000000feab3270 c0000000015b5d00 0000000000000000
GPR04: c00000000d9b0004 000000000000002a 0000000000000006 c0000000efc0ccac
GPR08: d000000000d3dd28 0000000000000080 0000000000000000 d000000000d34758
GPR12: c000000000064100 c00000000e7f1c80 c0000000ffdca938 0000000000000100
GPR16: c0000000ffdca738 c0000000ffdca538 c0000000feab34a0 c0000000015f4d00
GPR20: 0000000000000000 c0000000015f4cf0 c0000000f5945900 0000000000000000
GPR24: 0000000000000000 0000000080000000 c0000000feab32d0 c0000000efc0ccac
GPR28: c0000000f23ccb00 c0000000f5945000 c0000000f23ccb00 c0000000efc0cc00
[76563.914380] NIP [c000000000063940] memcpy_power7+0x40/0x800
[76563.914387] LR [d000000000d31788] ibmveth_start_xmit+0x1c8/0x8d0 [ibmveth]
[76563.914392] Call Trace:
[76563.914396] [c0000000feab3270] [c0000000feab32d0] 0xc0000000feab32d0 (unreliable)
[76563.914407] [c0000000feab3360] [c0000000009816f4] dev_hard_start_xmit+0x304/0x530
[76563.914415] [c0000000feab3440] [c0000000009b6564] sch_direct_xmit+0x124/0x330
[76563.914423] [c0000000feab34e0] [c000000000981ddc] __dev_queue_xmit+0x26c/0x770
[76563.914431] [c0000000feab3580] [c000000000a1efc0] arp_xmit+0x30/0xd0
[76563.914438] [c0000000feab35f0] [c000000000a1f0f4] arp_send_dst.part.0+0x94/0xb0
[76563.914445] [c0000000feab3660] [c000000000a1fcb4] arp_solicit+0x114/0x2b0
[76563.914452] [c0000000feab3730] [c00000000098d8f4] neigh_probe+0x84/0xd0
[76563.914460] [c0000000feab3760] [c0000000009937cc] neigh_timer_handler+0xbc/0x320
[76563.914468] [c0000000feab37a0] [c00000000014a3fc] call_timer_fn+0x5c/0x1c0
[76563.914474] [c0000000feab3830] [c00000000014a8bc] run_timer_softirq+0x31c/0x3f0
[76563.914483] [c0000000feab3900] [c0000000000bec58] __do_softirq+0x188/0x3e0
[76563.914490] [c0000000feab39f0] [c0000000000bf128] irq_exit+0xc8/0x100
[76563.914498] [c0000000feab3a10] [c00000000001f974] timer_interrupt+0xa4/0xe0
[76563.914505] [c0000000feab3a40] [c000000000002714] decrementer_common+0x114/0x180
[76563.914515] --- interrupt: 901 at plpar_hcall_norets+0x1c/0x28
[76563.914515] LR = check_and_cede_processor+0x34/0x50
[76563.914525] [c0000000feab3d30] [c000000000916bf0] check_and_cede_processor+0x20/0x50 (unreliable)
[76563.914534] [c0000000feab3d90] [c000000000916e18] shared_cede_loop+0x68/0x170
[76563.914541] [c0000000feab3dd0] [c000000000913e20] cpuidle_enter_state+0x160/0x410
[76563.914549] [c0000000feab3e30] [c000000000119d48] call_cpuidle+0x78/0xd0
[76563.914556] [c0000000feab3e70] [c00000000011a11c] cpu_startup_entry+0x37c/0x480
[76563.914564] [c0000000feab3f30] [c00000000004563c] start_secondary+0x33c/0x360
[76563.914572] [c0000000feab3f90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Sivakumar Krishnasamy <ksiva@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 4.4+
---
 drivers/net/ethernet/ibm/ibmveth.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index f210398200ece..1d29b1649118d 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	 */
 	if (force_bounce || (!skb_is_nonlinear(skb) &&
 			     (skb->len < tx_copybreak))) {
-		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
-					  skb->len);
+		if (adapter->bounce_buffer) {
+			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
+						  skb->len);
+		} else {
+			adapter->tx_send_failed++;
+			netdev->stats.tx_dropped++;
+			goto out;
+		}

 	descs[0].fields.flags_len = desc_flags | skb->len;
 	descs[0].fields.address = adapter->bounce_buffer_dma;
Hi Bryant,
This looks a bit better, but...
The following patch ensures that the bounce_buffer is not null prior to using it within skb_copy_from_linear_data.
How would this occur?
Looking at ibmveth.c, I see bounce_buffer being freed in ibmveth_close() and allocated in ibmveth_open() only. If allocation fails, the whole opening of the device fails with -ENOMEM.
It seems your test case - changing TSO - causes ibmveth_set_tso() to trigger an adaptor restart - an ibmveth_close(dev) and then an ibmveth_open(dev). Is this happening in parallel with an out-of-memory condition - is the memory allocation failing?
Alternatively, could it be the case that you're closing the device while packets are in flight, and then trying to use a bounce_buffer that's been freed elsewhere? Do you need to decouple memory freeing from ibmveth_close?
The problem can be recreated by toggling Large Send Offload on and off.
The following script, when run along with some iperf traffic, recreates the crash within 5-10 minutes or so.
while true
do
    ethtool -k ibmveth0 | grep tcp-segmentation-offload
    ethtool -K ibmveth0 tso off
    ethtool -k ibmveth0 | grep tcp-segmentation-offload
    ethtool -K ibmveth0 tso on
done
Note: This issue also happens the very first time largesend offload is turned off (but the above script recreates the issue every time).
[76563.914173] Unable to handle kernel paging request for data at address 0x00000000
[76563.914197] Faulting instruction address: 0xc000000000063940
[76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
[76563.914210] SMP NR_CPUS=2048 NUMA pSeries
[76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
[76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
^--- yikes!
There are relevant changes to this area since 4.4:

2c42bf4b4317 ("ibmveth: check return of skb_linearize in ibmveth_start_xmit")
66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")
Does this crash occur on a more recent kernel?
diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index f210398200ece..1d29b1649118d 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	 */
 	if (force_bounce || (!skb_is_nonlinear(skb) &&
 			     (skb->len < tx_copybreak))) {
-		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
-					  skb->len);
+		if (adapter->bounce_buffer) {
+			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
+						  skb->len);
+		} else {
+			adapter->tx_send_failed++;
+			netdev->stats.tx_dropped++;
+			goto out;
Finally, as I alluded to at the top of this message, isn't the disappearance of the bounce-buffer a pretty serious issue? As I understand it, it means your data structures are now inconsistent. Do you need to - at the least - be more chatty here?
Regards, Daniel
+		}
 	descs[0].fields.flags_len = desc_flags | skb->len;
 	descs[0].fields.address = adapter->bounce_buffer_dma;
--
2.13.6 (Apple Git-96)
On Wed, 2017-11-15 at 13:47 +1100, Daniel Axtens wrote:
Hi Bryant,
This looks a bit better, but...
The following patch ensures that the bounce_buffer is not null prior to using it within skb_copy_from_linear_data.
How would this occur?
Looking at ibmveth.c, I see bounce_buffer being freed in ibmveth_close() and allocated in ibmveth_open() only. If allocation fails, the whole opening of the device fails with -ENOMEM.
It seems your test case - changing TSO - causes ibmveth_set_tso() to trigger an adaptor restart - an ibmveth_close(dev) and then an ibmveth_open(dev). Is this happening in parallel with an out-of-memory condition - is the memory allocation failing?
Alternatively, could it be the case that you're closing the device while packets are in flight, and then trying to use a bounce_buffer that's been freed elsewhere? Do you need to decouple memory freeing from ibmveth_close?
Hrm, you should at least stop the tx queue and NAPI (and synchronize) while doing a reset. A lot of drivers, rather than doing close/open (which does subtly different things), tend instead to fire a work queue (often called reset_task) which does the job (and uses the same lower-level helpers as open/close to free/allocate the rings etc...).
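As a rough sketch of that reset_task pattern (hypothetical names and structure, not the actual ibmveth code - just to illustrate the technique):

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <linux/workqueue.h>

struct my_adapter {
	struct net_device *netdev;
	struct napi_struct napi;
	struct work_struct reset_task;	/* INIT_WORK(&adapter->reset_task, my_reset_task) at probe time */
};

static void my_reset_task(struct work_struct *work)
{
	struct my_adapter *adapter =
		container_of(work, struct my_adapter, reset_task);

	rtnl_lock();
	/* Quiesce transmit and NAPI before touching rings/buffers. */
	netif_tx_disable(adapter->netdev);
	napi_disable(&adapter->napi);

	/* ... free and reallocate the rings, bounce buffer, etc., using
	 * the same lower-level helpers that open/close use ... */

	napi_enable(&adapter->napi);
	netif_tx_wake_all_queues(adapter->netdev);
	rtnl_unlock();
}

/* Called from e.g. set_features/set_tso instead of close() + open(). */
static void my_request_reset(struct my_adapter *adapter)
{
	schedule_work(&adapter->reset_task);
}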
The problem can be recreated by toggling Large Send Offload on and off.
The following script, when run along with some iperf traffic, recreates the crash within 5-10 minutes or so.
while true
do
    ethtool -k ibmveth0 | grep tcp-segmentation-offload
    ethtool -K ibmveth0 tso off
    ethtool -k ibmveth0 | grep tcp-segmentation-offload
    ethtool -K ibmveth0 tso on
done
Note: This issue also happens the very first time largesend offload is turned off (but the above script recreates the issue every time).
[76563.914173] Unable to handle kernel paging request for data at address 0x00000000
[76563.914197] Faulting instruction address: 0xc000000000063940
[76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
[76563.914210] SMP NR_CPUS=2048 NUMA pSeries
[76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
[76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
^--- yikes!
There are relevant changes to this area since 4.4:

2c42bf4b4317 ("ibmveth: check return of skb_linearize in ibmveth_start_xmit")
66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")
Does this crash occur on a more recent kernel?
diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index f210398200ece..1d29b1649118d 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	 */
 	if (force_bounce || (!skb_is_nonlinear(skb) &&
 			     (skb->len < tx_copybreak))) {
-		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
-					  skb->len);
+		if (adapter->bounce_buffer) {
+			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
+						  skb->len);
+		} else {
+			adapter->tx_send_failed++;
+			netdev->stats.tx_dropped++;
+			goto out;
Finally, as I alluded to at the top of this message, isn't the disappearance of the bounce-buffer a pretty serious issue? As I understand it, it means your data structures are now inconsistent. Do you need to - at the least - be more chatty here?
Regards, Daniel
+		}
 	descs[0].fields.flags_len = desc_flags | skb->len;
 	descs[0].fields.address = adapter->bounce_buffer_dma;
--
2.13.6 (Apple Git-96)
On 11/14/17 8:47 PM, Daniel Axtens wrote:
Hi Bryant,
This looks a bit better, but...
The following patch ensures that the bounce_buffer is not null prior to using it within skb_copy_from_linear_data.
How would this occur?
Looking at ibmveth.c, I see bounce_buffer being freed in ibmveth_close() and allocated in ibmveth_open() only. If allocation fails, the whole opening of the device fails with -ENOMEM.
It seems your test case - changing TSO - causes ibmveth_set_tso() to trigger an adaptor restart - an ibmveth_close(dev) and then an ibmveth_open(dev). Is this happening in parallel with an out-of-memory condition - is the memory allocation failing?
Alternatively, could it be the case that you're closing the device while packets are in flight, and then trying to use a bounce_buffer that's been freed elsewhere? Do you need to decouple memory freeing from ibmveth_close?
When the checksum or largesend attribute for the VEA is enabled, it triggers ibmveth_set_features -> ibmveth_set_csum_offload or ibmveth_set_tso.
Those two routines will do an ibmveth_close, which, as you see, unmaps and frees the bounce buffer.
Like you said, if there is data in transmit, the system crashes because the bounce_buffer is used after it has already been freed elsewhere.
This patch just closes the window; bad things can still happen. I wanted to leave it up to the people who actively develop ibmveth to close the window properly, since introducing a lock can be expensive in the tx path.
The problem can be recreated by toggling Large Send Offload on and off.
The following script, when run along with some iperf traffic, recreates the crash within 5-10 minutes or so.
while true
do
    ethtool -k ibmveth0 | grep tcp-segmentation-offload
    ethtool -K ibmveth0 tso off
    ethtool -k ibmveth0 | grep tcp-segmentation-offload
    ethtool -K ibmveth0 tso on
done
Note: This issue also happens the very first time largesend offload is turned off (but the above script recreates the issue every time).
[76563.914173] Unable to handle kernel paging request for data at address 0x00000000
[76563.914197] Faulting instruction address: 0xc000000000063940
[76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
[76563.914210] SMP NR_CPUS=2048 NUMA pSeries
[76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
[76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
^--- yikes!
There are relevant changes to this area since 4.4:

2c42bf4b4317 ("ibmveth: check return of skb_linearize in ibmveth_start_xmit")
66aa0678efc2 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.")
Does this crash occur on a more recent kernel?
We DKMS ibmveth, so we have the most recent stuff. I also have a test 4.14 kernel that still has this problem.
diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index f210398200ece..1d29b1649118d 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1092,8 +1092,14 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	 */
 	if (force_bounce || (!skb_is_nonlinear(skb) &&
 			     (skb->len < tx_copybreak))) {
-		skb_copy_from_linear_data(skb, adapter->bounce_buffer,
-					  skb->len);
+		if (adapter->bounce_buffer) {
+			skb_copy_from_linear_data(skb, adapter->bounce_buffer,
+						  skb->len);
+		} else {
+			adapter->tx_send_failed++;
+			netdev->stats.tx_dropped++;
+			goto out;
Finally, as I alluded to at the top of this message, isn't the disappearance of the bounce-buffer a pretty serious issue? As I understand it, it means your data structures are now inconsistent. Do you need to - at the least - be more chatty here?
Regards, Daniel
Yes, we can be more chatty, but like I said previously, the real solution is to make sure any data still in transmit is cleaned up before napi_disable() in ibmveth_close.
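For concreteness, that ordering might look roughly like this - a sketch only, with a hypothetical my_close() rather than the real ibmveth_close(), and the DMA unmap and other teardown details elided:

static int my_close(struct net_device *netdev)
{
	struct my_adapter *adapter = netdev_priv(netdev);

	/* Stop new transmits first; netif_tx_disable() takes each queue's
	 * xmit_lock while stopping it, so it also waits out any
	 * ndo_start_xmit that is already running. */
	netif_tx_disable(netdev);

	napi_disable(&adapter->napi);

	/* Only now is it safe to unmap and free the bounce buffer. */
	kfree(adapter->bounce_buffer);
	adapter->bounce_buffer = NULL;

	return 0;
}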
-Bryant
On Wed, 2017-11-15 at 10:45 -0600, Bryant G. Ly wrote:
This patch just closes the window; bad things can still happen. I wanted to leave it up to the people who actively develop ibmveth to close the window properly, since introducing a lock can be expensive in the tx path.
You don't need to introduce a lock. The network stack already has a per-queue lock; you just use the existing one.
Look at what I did in sungem or ftgmac100 with the reset task; those are fairly simple drivers and should illustrate the technique.
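To make the "existing lock" point concrete: for drivers that don't set LLTX, ndo_start_xmit already runs with the queue's xmit_lock held, so the teardown path can exclude an in-flight transmit without adding anything new - roughly like this (hypothetical helper, just a sketch):

static void my_retire_bounce_buffer(struct net_device *netdev)
{
	struct my_adapter *adapter = netdev_priv(netdev);
	void *old = adapter->bounce_buffer;

	/* netif_tx_lock_bh() takes each tx queue's xmit_lock in turn and
	 * freezes the queues, so any transmit in progress has finished and
	 * no new one starts until we unlock. */
	netif_tx_lock_bh(netdev);
	adapter->bounce_buffer = NULL;
	netif_tx_unlock_bh(netdev);

	kfree(old);	/* safe: no transmit can still be using it */
}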
Cheers, Ben.