The patch below does not apply to the 5.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 441fdee1eaf050ef0040bde0d7af075c1c6a6d8b Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Wed, 29 Apr 2020 23:48:43 +0100
Subject: [PATCH] rxrpc: Fix ack discard
The Rx protocol has a "previousPacket" field in it that is not handled in
the same way by all protocol implementations. Sometimes it contains the
serial number of the last DATA packet received, sometimes the sequence
number of the last DATA packet received and sometimes the highest sequence
number so far received.
AF_RXRPC is using this to weed out ACKs that are out of date (it's possible
for ACK packets to get reordered on the wire), but this does not work with
OpenAFS which will just stick the sequence number of the last packet seen
into previousPacket.
The issue being seen is that big AFS FS.StoreData RPCs (e.g. of ~256MiB) are
timing out when partly sent. A trace was captured, with an additional
tracepoint to show ACKs being discarded in rxrpc_input_ack(). Here's an
excerpt showing the problem.
52873.203230: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 0002449c q=00024499 fl=09
A DATA packet with sequence number 00024499 has been transmitted (the "q="
field).
...
52873.243296: rxrpc_rx_ack: c=000004ae 00012a2b DLY r=00024499 f=00024497 p=00024496 n=0
52873.243376: rxrpc_rx_ack: c=000004ae 00012a2c IDL r=0002449b f=00024499 p=00024498 n=0
52873.243383: rxrpc_rx_ack: c=000004ae 00012a2d OOS r=0002449d f=00024499 p=0002449a n=2
The Out-Of-Sequence ACK indicates that the server didn't see DATA sequence
number 00024499, but did see seq 0002449a (previousPacket, shown as "p=",
skipped the number, but firstPacket, "f=", which shows the bottom of the
window, is set at that point).
52873.252663: rxrpc_retransmit: c=000004ae q=24499 a=02 xp=14581537
52873.252664: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244bc q=00024499 fl=0b *RETRANS*
The packet has been retransmitted. Retransmission recurs until the peer
says it got the packet.
52873.271013: rxrpc_rx_ack: c=000004ae 00012a31 OOS r=000244a1 f=00024499 p=0002449e n=6
More OOS ACKs indicate that the other packets that are already in the
transmission pipeline are being received. The specific-ACK list is up to 6
ACKs and NAKs.
...
52873.284792: rxrpc_rx_ack: c=000004ae 00012a49 OOS r=000244b9 f=00024499 p=000244b6 n=30
52873.284802: rxrpc_retransmit: c=000004ae q=24499 a=0a xp=63505500
52873.284804: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244c2 q=00024499 fl=0b *RETRANS*
52873.287468: rxrpc_rx_ack: c=000004ae 00012a4a OOS r=000244ba f=00024499 p=000244b7 n=31
52873.287478: rxrpc_rx_ack: c=000004ae 00012a4b OOS r=000244bb f=00024499 p=000244b8 n=32
At this point, the server's receive window is full (n=32) with presumably 1
NAK'd packet and 31 ACK'd packets. We can't transmit any more packets.
52873.287488: rxrpc_retransmit: c=000004ae q=24499 a=0a xp=61327980
52873.287489: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244c3 q=00024499 fl=0b *RETRANS*
52873.293850: rxrpc_rx_ack: c=000004ae 00012a4c DLY r=000244bc f=000244a0 p=00024499 n=25
And now we've received an ACK indicating that a DATA retransmission was
received. 7 packets have been processed (the occupied part of the window
moved, as indicated by f= and n=).
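The "7 packets" figure can be checked with a small user-space sketch.
The names below are hypothetical, not kernel code; the values are taken from
the ACKs at 52873.287478 and 52873.293850.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t seq_t;	/* stand-in for the kernel's rxrpc_seq_t */

/*
 * How far firstPacket ("f=") moved between two ACKs, i.e. how many packets
 * the peer has consumed.  Unsigned subtraction stays correct across 32-bit
 * wraparound.
 */
static inline uint32_t window_advance(seq_t old_first, seq_t new_first)
{
	return new_first - old_first;
}

/* Replay the values from the trace excerpt. */
void check_window_advance(void)
{
	/* f=00024499 n=32  ->  f=000244a0 n=25 */
	assert(window_advance(0x00024499, 0x000244a0) == 7);
	assert(32 - 25 == 7);	/* the occupied part shrank by the same amount */
}
```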
52873.293853: rxrpc_rx_discard_ack: c=000004ae r=00012a4c 000244a0<00024499 00024499<000244b8
However, the DLY ACK gets discarded because its previousPacket has gone
backwards (from p=000244b8, in the ACK at 52873.287478 to p=00024499 in the
ACK at 52873.293850).
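As a rough user-space sketch of why the pre-fix test drops this ACK: the
comparison helper below follows the usual wrap-safe signed-difference idiom,
and old_should_discard() mirrors the condition removed by this patch. The
function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t seq_t;	/* stand-in for the kernel's rxrpc_seq_t */

/* Wrap-safe "a comes before b" for 32-bit sequence numbers. */
static inline bool seq_before(seq_t a, seq_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Pre-fix test: discard if either firstPacket or previousPacket regressed. */
static inline bool old_should_discard(seq_t first_pkt, seq_t prev_pkt,
				      seq_t ackr_first_seq, seq_t ackr_prev_seq)
{
	return seq_before(first_pkt, ackr_first_seq) ||
	       seq_before(prev_pkt, ackr_prev_seq);
}

/* Replay "000244a0<00024499 00024499<000244b8" from rxrpc_rx_discard_ack. */
void check_old_discard(void)
{
	assert(!seq_before(0x000244a0, 0x00024499));	/* firstPacket advanced */
	assert(seq_before(0x00024499, 0x000244b8));	/* previousPacket went back */
	assert(old_should_discard(0x000244a0, 0x00024499,
				  0x00024499, 0x000244b8));
}
```

The ACK is dropped even though firstPacket moved forward, because the second
half of the condition alone is enough to trigger the discard.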
We then end up in a continuous cycle of retransmit/discard. kafs fails to
update its window because it's discarding the ACKs and can't transmit an
extra packet that would clear the issue because the window is full.
OpenAFS doesn't change the previousPacket value in the ACKs because no new
DATA packets are received with a different previousPacket number.
Fix this by altering the discard check to only discard an ACK based on
previousPacket if there was no advance in the firstPacket. This allows us
to transmit a new packet which will cause previousPacket to advance in the
next ACK.
The check, however, needs to allow for the possibility that previousPacket
may actually have had the serial number placed in it instead - in which
case it will go outside the window and we should ignore it.
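The revised check can be tried outside the kernel with a sketch along these
lines. It mirrors the logic of rxrpc_is_ack_valid() in the diff, but struct
call_state and the seq_*() helpers are hypothetical stand-ins, and the window
size of 32 is an assumption made only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t seq_t;	/* stand-in for the kernel's rxrpc_seq_t */

/* Hypothetical cut-down stand-in for the struct rxrpc_call fields used. */
struct call_state {
	seq_t ackr_first_seq;	/* firstPacket of the last accepted ACK */
	seq_t ackr_prev_seq;	/* previousPacket of the last accepted ACK */
	uint32_t tx_winsize;	/* transmit window size */
};

/* Wrap-safe comparisons on 32-bit sequence numbers. */
static inline bool seq_before(seq_t a, seq_t b)   { return (int32_t)(a - b) < 0; }
static inline bool seq_after(seq_t a, seq_t b)    { return (int32_t)(b - a) < 0; }
static inline bool seq_after_eq(seq_t a, seq_t b) { return (int32_t)(a - b) >= 0; }

static inline bool ack_is_valid(const struct call_state *call,
				seq_t first_pkt, seq_t prev_pkt)
{
	seq_t base = call->ackr_first_seq;

	if (seq_after(first_pkt, base))
		return true;	/* the window advanced */
	if (seq_before(first_pkt, base))
		return false;	/* firstPacket regressed */
	if (seq_after_eq(prev_pkt, call->ackr_prev_seq))
		return true;	/* previousPacket has not regressed */
	/* previousPacket may hold a serial number rather than a sequence number */
	if (seq_after_eq(prev_pkt, base + call->tx_winsize))
		return false;
	return true;
}

/* Replay the state at the time the DLY ACK (f=000244a0 p=00024499) arrived. */
void check_new_ack_test(void)
{
	struct call_state call = {
		.ackr_first_seq = 0x00024499,
		.ackr_prev_seq  = 0x000244b8,
		.tx_winsize     = 32,	/* assumption for this example */
	};

	/* firstPacket advanced, so the ACK is accepted despite p= going back. */
	assert(ack_is_valid(&call, 0x000244a0, 0x00024499));
	/* An ACK whose firstPacket went backwards is still rejected. */
	assert(!ack_is_valid(&call, 0x00024498, 0x000244b8));
}
```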
Fixes: 1a2391c30c0b ("rxrpc: Fix detection of out of order acks")
Reported-by: Dave Botsch <botsch(a)cnf.cornell.edu>
Signed-off-by: David Howells <dhowells(a)redhat.com>
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 2f22f082a66c..3be4177baf70 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -802,6 +802,30 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks,
 	}
 }
 
+/*
+ * Return true if the ACK is valid - ie. it doesn't appear to have regressed
+ * with respect to the ack state conveyed by preceding ACKs.
+ */
+static bool rxrpc_is_ack_valid(struct rxrpc_call *call,
+			       rxrpc_seq_t first_pkt, rxrpc_seq_t prev_pkt)
+{
+	rxrpc_seq_t base = READ_ONCE(call->ackr_first_seq);
+
+	if (after(first_pkt, base))
+		return true; /* The window advanced */
+
+	if (before(first_pkt, base))
+		return false; /* firstPacket regressed */
+
+	if (after_eq(prev_pkt, call->ackr_prev_seq))
+		return true; /* previousPacket hasn't regressed. */
+
+	/* Some rx implementations put a serial number in previousPacket. */
+	if (after_eq(prev_pkt, base + call->tx_winsize))
+		return false;
+	return true;
+}
+
 /*
  * Process an ACK packet.
  *
@@ -865,8 +889,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	}
 
 	/* Discard any out-of-order or duplicate ACKs (outside lock). */
-	if (before(first_soft_ack, call->ackr_first_seq) ||
-	    before(prev_pkt, call->ackr_prev_seq)) {
+	if (!rxrpc_is_ack_valid(call, first_soft_ack, prev_pkt)) {
 		trace_rxrpc_rx_discard_ack(call->debug_id, sp->hdr.serial,
 					   first_soft_ack, call->ackr_first_seq,
 					   prev_pkt, call->ackr_prev_seq);
@@ -882,8 +905,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	spin_lock(&call->input_lock);
 
 	/* Discard any out-of-order or duplicate ACKs (inside lock). */
-	if (before(first_soft_ack, call->ackr_first_seq) ||
-	    before(prev_pkt, call->ackr_prev_seq)) {
+	if (!rxrpc_is_ack_valid(call, first_soft_ack, prev_pkt)) {
 		trace_rxrpc_rx_discard_ack(call->debug_id, sp->hdr.serial,
 					   first_soft_ack, call->ackr_first_seq,
 					   prev_pkt, call->ackr_prev_seq);
I'm announcing the release of the 4.9.225 kernel.
All users of the 4.9 kernel series must upgrade.
The updated 4.9.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.9.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/networking/l2tp.txt | 8
Makefile | 2
arch/arm/include/asm/futex.h | 9
arch/arm64/kernel/machine_kexec.c | 3
drivers/base/component.c | 8
drivers/dma/tegra210-adma.c | 2
drivers/hid/hid-ids.h | 1
drivers/hid/hid-multitouch.c | 3
drivers/i2c/i2c-dev.c | 48 +--
drivers/i2c/muxes/i2c-demux-pinctrl.c | 1
drivers/iio/dac/vf610_dac.c | 1
drivers/iommu/amd_iommu_init.c | 9
drivers/misc/mei/client.c | 2
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 13
drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 6
drivers/net/ethernet/intel/igb/igb_main.c | 4
drivers/net/gtp.c | 9
drivers/nvdimm/btt.c | 8
drivers/platform/x86/alienware-wmi.c | 17 -
drivers/platform/x86/asus-nb-wmi.c | 24 +
drivers/rapidio/devices/rio_mport_cdev.c | 5
drivers/staging/greybus/uart.c | 4
drivers/staging/iio/accel/sca3000_ring.c | 2
drivers/staging/iio/resolver/ad2s1210.c | 17 -
drivers/usb/core/message.c | 4
drivers/watchdog/watchdog_dev.c | 67 +---
fs/ceph/caps.c | 1
fs/configfs/dir.c | 1
fs/file.c | 2
fs/gfs2/glock.c | 3
include/linux/net.h | 3
include/linux/padata.h | 13
include/uapi/linux/if_pppol2tp.h | 13
include/uapi/linux/l2tp.h | 17 +
kernel/padata.c | 88 ++---
lib/Makefile | 2
net/l2tp/l2tp_core.c | 174 +++--------
net/l2tp/l2tp_core.h | 46 +-
net/l2tp/l2tp_eth.c | 216 ++++++++-----
net/l2tp/l2tp_netlink.c | 79 ++---
net/l2tp/l2tp_ppp.c | 309 +++++++++++---------
net/socket.c | 46 ++
scripts/gcc-plugins/Makefile | 1
scripts/gcc-plugins/gcc-common.h | 4
security/integrity/evm/evm_crypto.c | 2
security/integrity/ima/ima_fs.c | 3
sound/core/pcm_lib.c | 1
47 files changed, 733 insertions(+), 568 deletions(-)
Al Viro (1):
fix multiplication overflow in copy_fdtable()
Alan Stern (1):
USB: core: Fix misleading driver bug report
Alexander Monakov (1):
iommu/amd: Fix over-read of ACPI UID from IVRS table
Alexander Usyskin (1):
mei: release me_cl object reference
Arjun Vynipadath (2):
cxgb4: free mac_hlist properly
cxgb4/cxgb4vf: Fix mac_hlist initialization and free
Arnd Bergmann (1):
ubsan: build ubsan.c more conservatively
Asbjørn Sloth Tønnesen (3):
net: l2tp: export debug flags to UAPI
net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_*
net: l2tp: ppp: change PPPOL2TP_MSG_* => L2TP_MSG_*
Bob Peterson (1):
Revert "gfs2: Don't demote a glock until its revokes are written"
Brent Lu (1):
ALSA: pcm: fix incorrect hw_base increase
Cao jin (1):
igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
Christoph Hellwig (1):
arm64: fix the flush_icache_range arguments in machine_kexec
Christophe JAILLET (4):
i2c: mux: demux-pinctrl: Fix an error handling path in 'i2c_demux_pinctrl_probe()'
dmaengine: tegra210-adma: Fix an error handling path in 'tegra_adma_probe()'
iio: dac: vf610: Fix an error handling path in 'vf610_dac_probe()'
iio: sca3000: Remove an erroneous 'get_device()'
Colin Ian King (1):
platform/x86: alienware-wmi: fix kfree on potentially uninitialized pointer
Daniel Jordan (2):
padata: initialize pd->cpu with effective cpumask
padata: purge get_cpu and reorder_via_wq from padata_do_serial
Dragos Bogdan (1):
staging: iio: ad2s1210: Fix SPI reading
Frédéric Pierret (fepitre) (1):
gcc-common.h: Update for GCC 10
Greg Kroah-Hartman (1):
Linux 4.9.225
Guillaume Nault (17):
l2tp: remove useless duplicate session detection in l2tp_netlink
l2tp: remove l2tp_session_find()
l2tp: define parameters of l2tp_session_get*() as "const"
l2tp: define parameters of l2tp_tunnel_find*() as "const"
l2tp: initialise session's refcount before making it reachable
l2tp: hold tunnel while looking up sessions in l2tp_netlink
l2tp: hold tunnel while processing genl delete command
l2tp: hold tunnel while handling genl tunnel updates
l2tp: hold tunnel while handling genl TUNNEL_GET commands
l2tp: hold tunnel used while creating sessions with netlink
l2tp: prevent creation of sessions on terminated tunnels
l2tp: pass tunnel pointer to ->session_create()
l2tp: fix l2tp_eth module loading
l2tp: don't register sessions in l2tp_session_create()
l2tp: initialise l2tp_eth sessions before registering them
l2tp: protect sock pointer of struct pppol2tp_session with RCU
l2tp: initialise PPP sessions before registering them
Hans de Goede (1):
platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA
Herbert Xu (1):
padata: Replace delayed timer with immediate workqueue in padata_reorder
James Hilliard (1):
component: Silence bind error on -EPROBE_DEFER
Jason A. Donenfeld (1):
padata: get_next is never NULL
John Hubbard (1):
rapidio: fix an error in get_user_pages_fast() error handling
Kevin Hao (2):
i2c: dev: Fix the race between the release of i2c_dev and cdev
watchdog: Fix the race between the release of watchdog_core_data and cdev
Mathias Krause (3):
padata: ensure the reorder timer callback runs on the correct CPU
padata: ensure padata_do_serial() runs on the correct CPU
padata: set cpu_index of unused CPUs to -1
Oscar Carter (1):
staging: greybus: Fix uninitialized scalar variable
Peter Zijlstra (1):
x86/uaccess, ubsan: Fix UBSAN vs. SMAP
R. Parameswaran (3):
New kernel function to get IP overhead on a socket.
L2TP:Adjust intf MTU, add underlay L3, L2 hdrs.
l2tp: device MTU setup, tunnel socket needs a lock
Roberto Sassu (2):
evm: Check also if *tfm is an error pointer in init_desc()
ima: Fix return value of ima_write_policy()
Sebastian Reichel (1):
HID: multitouch: add eGalaxTouch P80H84 support
Thomas Gleixner (1):
ARM: futex: Address build warning
Tobias Klauser (1):
padata: Remove unused but set variables
Vishal Verma (1):
libnvdimm/btt: Remove unnecessary code in btt_freelist_init
Wu Bo (1):
ceph: fix double unlock in handle_cap_export()
Xiyu Yang (1):
configfs: fix config_item refcnt leak in configfs_rmdir()
Yoshiyuki Kurauchi (1):
gtp: set NLM_F_MULTI flag in gtp_genl_dump_pdp()
I'm announcing the release of the 4.4.225 kernel.
All users of the 4.4 kernel series must upgrade.
The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/networking/l2tp.txt | 8
Makefile | 2
arch/arm/include/asm/futex.h | 9
drivers/hid/hid-ids.h | 1
drivers/hid/hid-multitouch.c | 3
drivers/i2c/i2c-dev.c | 60 +++--
drivers/media/media-device.c | 43 +++-
drivers/media/media-devnode.c | 168 +++++++++-------
drivers/media/usb/uvc/uvc_driver.c | 2
drivers/misc/mei/client.c | 2
drivers/net/ethernet/intel/igb/igb_main.c | 4
drivers/nvdimm/btt.c | 8
drivers/platform/x86/alienware-wmi.c | 17 -
drivers/platform/x86/asus-nb-wmi.c | 24 ++
drivers/staging/iio/accel/sca3000_ring.c | 2
drivers/staging/iio/resolver/ad2s1210.c | 17 +
drivers/usb/core/message.c | 4
fs/ceph/caps.c | 1
fs/ext4/xattr.c | 66 +++---
fs/file.c | 2
fs/gfs2/glock.c | 3
include/linux/cpumask.h | 19 +
include/linux/net.h | 3
include/linux/padata.h | 13 -
include/media/media-device.h | 5
include/media/media-devnode.h | 32 ++-
include/net/ipv6.h | 2
include/uapi/linux/if_pppol2tp.h | 13 -
include/uapi/linux/l2tp.h | 17 +
kernel/padata.c | 88 +++-----
lib/cpumask.c | 32 +++
net/ipv6/datagram.c | 4
net/l2tp/l2tp_core.c | 181 +++++------------
net/l2tp/l2tp_core.h | 47 ++--
net/l2tp/l2tp_eth.c | 216 ++++++++++++--------
net/l2tp/l2tp_ip.c | 68 +++---
net/l2tp/l2tp_ip6.c | 82 +++----
net/l2tp/l2tp_netlink.c | 124 +++++++-----
net/l2tp/l2tp_ppp.c | 309 +++++++++++++++++-------------
net/socket.c | 46 ++++
security/integrity/evm/evm_crypto.c | 2
sound/core/pcm_lib.c | 1
42 files changed, 1015 insertions(+), 735 deletions(-)
Al Viro (1):
fix multiplication overflow in copy_fdtable()
Alan Stern (1):
USB: core: Fix misleading driver bug report
Alexander Usyskin (1):
mei: release me_cl object reference
Asbjørn Sloth Tønnesen (3):
net: l2tp: export debug flags to UAPI
net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_*
net: l2tp: ppp: change PPPOL2TP_MSG_* => L2TP_MSG_*
Bob Peterson (1):
Revert "gfs2: Don't demote a glock until its revokes are written"
Brent Lu (1):
ALSA: pcm: fix incorrect hw_base increase
Cao jin (1):
igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
Christophe JAILLET (1):
iio: sca3000: Remove an erroneous 'get_device()'
Colin Ian King (1):
platform/x86: alienware-wmi: fix kfree on potentially uninitialized pointer
Dan Carpenter (1):
i2c: dev: use after free in detach
Daniel Jordan (2):
padata: initialize pd->cpu with effective cpumask
padata: purge get_cpu and reorder_via_wq from padata_do_serial
Dragos Bogdan (1):
staging: iio: ad2s1210: Fix SPI reading
Erico Nunes (1):
i2c: dev: switch from register_chrdev to cdev API
Greg Kroah-Hartman (1):
Linux 4.4.225
Guillaume Nault (22):
l2tp: lock socket before checking flags in connect()
l2tp: fix racy socket lookup in l2tp_ip and l2tp_ip6 bind()
l2tp: hold session while sending creation notifications
l2tp: take a reference on sessions used in genetlink handlers
l2tp: don't use l2tp_tunnel_find() in l2tp_ip and l2tp_ip6
l2tp: remove useless duplicate session detection in l2tp_netlink
l2tp: remove l2tp_session_find()
l2tp: define parameters of l2tp_session_get*() as "const"
l2tp: define parameters of l2tp_tunnel_find*() as "const"
l2tp: initialise session's refcount before making it reachable
l2tp: hold tunnel while looking up sessions in l2tp_netlink
l2tp: hold tunnel while processing genl delete command
l2tp: hold tunnel while handling genl tunnel updates
l2tp: hold tunnel while handling genl TUNNEL_GET commands
l2tp: hold tunnel used while creating sessions with netlink
l2tp: prevent creation of sessions on terminated tunnels
l2tp: pass tunnel pointer to ->session_create()
l2tp: fix l2tp_eth module loading
l2tp: don't register sessions in l2tp_session_create()
l2tp: initialise l2tp_eth sessions before registering them
l2tp: protect sock pointer of struct pppol2tp_session with RCU
l2tp: initialise PPP sessions before registering them
Hans de Goede (1):
platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA
Herbert Xu (1):
padata: Replace delayed timer with immediate workqueue in padata_reorder
Jason A. Donenfeld (1):
padata: get_next is never NULL
Kevin Hao (1):
i2c: dev: Fix the race between the release of i2c_dev and cdev
Mathias Krause (3):
padata: ensure the reorder timer callback runs on the correct CPU
padata: ensure padata_do_serial() runs on the correct CPU
padata: set cpu_index of unused CPUs to -1
Mauro Carvalho Chehab (2):
media-devnode: fix namespace mess
media-device: dynamically allocate struct media_devnode
Max Kellermann (2):
drivers/media/media-devnode: clear private_data before put_device()
media-devnode: add missing mutex lock in error handler
Michael Kelley (1):
cpumask: Make for_each_cpu_wrap() available on UP as well
Peter Zijlstra (1):
sched/fair, cpumask: Export for_each_cpu_wrap()
R. Parameswaran (3):
New kernel function to get IP overhead on a socket.
L2TP:Adjust intf MTU, add underlay L3, L2 hdrs.
l2tp: device MTU setup, tunnel socket needs a lock
Roberto Sassu (1):
evm: Check also if *tfm is an error pointer in init_desc()
Sebastian Reichel (1):
HID: multitouch: add eGalaxTouch P80H84 support
Shuah Khan (3):
media: Fix media_open() to clear filp->private_data in error leg
media: fix use-after-free in cdev_put() when app exits after driver unbind
media: fix media devnode ioctl/syscall and unregister race
Theodore Ts'o (1):
ext4: lock the xattr block before checksuming it
Thomas Gleixner (1):
ARM: futex: Address build warning
Tobias Klauser (1):
padata: Remove unused but set variables
Vishal Verma (1):
libnvdimm/btt: Remove unnecessary code in btt_freelist_init
Wolfram Sang (1):
i2c: dev: don't start function name with 'return'
Wu Bo (1):
ceph: fix double unlock in handle_cap_export()
viresh kumar (1):
i2c-dev: don't get i2c adapter via i2c_dev