On Fri, Apr 18, 2025 at 06:34:07PM +0100, Simon Horman wrote:
On Fri, Apr 18, 2025 at 01:00:24AM +0200, chia-yu.chang@nokia-bell-labs.com wrote:
From: Ilpo Järvinen ij@kernel.org
Instead of sending the option in every ACK, limit sending to those ACKs where the option is necessary:
- Handshake
- "Change-triggered ACK" + the ACK following it. The 2nd ACK is necessary to unambiguously indicate which of the ECN byte counters in increasing. The first ACK has two counters increasing due to the ecnfield edge.
- ACKs with CE to allow CEP delta validations to take advantage of the option.
- Force option to be sent every at least once per 2^22 bytes. The check is done using the bit edges of the byte counters (avoids need for extra variables).
- AccECN option beacon to send a few times per RTT even if nothing in the ECN state requires that. The default is 3 times per RTT, and its period can be set via sysctl_tcp_ecn_option_beacon.
Signed-off-by: Ilpo Järvinen ij@kernel.org Co-developed-by: Chia-Yu Chang chia-yu.chang@nokia-bell-labs.com Signed-off-by: Chia-Yu Chang chia-yu.chang@nokia-bell-labs.com
include/linux/tcp.h | 3 +++ include/net/netns/ipv4.h | 1 + include/net/tcp.h | 1 + net/ipv4/sysctl_net_ipv4.c | 9 ++++++++ net/ipv4/tcp.c | 5 ++++- net/ipv4/tcp_input.c | 36 +++++++++++++++++++++++++++++++- net/ipv4/tcp_ipv4.c | 1 + net/ipv4/tcp_minisocks.c | 2 ++ net/ipv4/tcp_output.c | 42 ++++++++++++++++++++++++++++++-------- 9 files changed, 90 insertions(+), 10 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 0e032d9631ac..9619524d8901 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -309,7 +309,10 @@ struct tcp_sock { u8 received_ce_pending:4, /* Not yet transmit cnt of received_ce */ unused2:4; u8 accecn_minlen:2,/* Minimum length of AccECN option sent */
prev_ecnfield:2,/* ECN bits from the previous segment */
est_ecnfield:2;/* ECN field for AccECN delivered estimates */accecn_opt_demand:2,/* Demand AccECN option for n next ACKs */
- u64 accecn_opt_tstamp; /* Last AccECN option sent timestamp */ u32 app_limited; /* limited until "delivered" reaches this val */ u32 rcv_wnd; /* Current receiver window */
/*
...
@@ -5113,7 +5116,7 @@ static void __init tcp_struct_check(void) /* 32bit arches with 8byte alignment on u64 fields might need padding * before tcp_clock_cache. */
- CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 122 + 6);
- CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 130 + 6);
Hi,
While this seems find on x86_64, x86_32 and arm64, it does not seem correct on ARM (32-bit).
This is because the existing two byte hole after est_ecnfield grows to 6 bytes. I assume this is because of alignment requirements. But in any case, the result is that the overall group size increases by 12 bytes rather than 8.
I believe that you can avoid the hole growing, and thus limit the overall increase in size of the group to 12 bytes, by placing accecn_opt_tstamp elsewhere, e.g. after app_limited rather than before it.
You can exercise this by cross compiling for ARM and examining the structure layout using pahole.
Cross compilers available from [1] should be able to do that something like this:
PATH=".../gcc-14.2.0-nolibc/arm-linux-gnueabi/bin:$PATH" export ARCH=arm export CROSS_COMPILE=arm-linux-gnueabi-
make allmodconfig echo "CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y" >> .config yes "" | make oldconfig
make net/ipv4/tcp.o
Sorry, I omitted the invocation of pahole.
pahole net/ipv4/tcp.o
[1] https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/14.2.0/