-----Original Message----- From: Paolo Abeni pabeni@redhat.com Sent: Tuesday, April 29, 2025 12:14 PM To: Chia-Yu Chang (Nokia) chia-yu.chang@nokia-bell-labs.com; horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; edumazet@google.com; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) koen.de_schepper@nokia-bell-labs.com; g.white g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel vidhi_goel@apple.com Cc: Olivier Tilmans (Nokia) olivier.tilmans@nokia.com Subject: Re: [PATCH v5 net-next 03/15] tcp: AccECN core
CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
On 4/22/25 5:35 PM, chia-yu.chang@nokia-bell-labs.com wrote:
@@ -298,6 +298,9 @@ struct tcp_sock { u32 snd_up; /* Urgent pointer */ u32 delivered; /* Total data packets delivered incl. rexmits */ u32 delivered_ce; /* Like the above but only ECE marked packets */
u32 received_ce; /* Like the above but for rcvd CE marked pkts */
u8 received_ce_pending:4, /* Not yet transmit cnt of received_ce */
unused2:4;
AFAICS this uses a 4 bytes hole present prior to this patch after "rcv_wnd", leaving a 3 bytes hole after 'unused2'. Possibly should be worth mentioning the hole presence.
@Eric: would it make sense use this hole for 'noneagle'/'rate_app_limited' and shrink the 'tcp_sock_write_txrx' group a bit?
Hi Paolo,
Thanks for the feedback and sorry for my late response. I can either mention it here or move the places. However, as the following patches will continue change holes, so maybe I mention the hole change per patch make it more understandable. If this is find for you, then I will make revision in the next version.
[...]
@@ -5095,7 +5097,7 @@ static void __init tcp_struct_check(void) /* 32bit arches with 8byte alignment on u64 fields might need padding * before tcp_clock_cache. */
CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 92 + 4);
CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock,
- tcp_sock_write_txrx, 97 + 7);
Really? I *think* the change here should not move the cacheline end around, due to holes. Could you please include the relevant pahole (trimmed) output prior to this patch and after in the commit message?
Here is pahole output before and after this patch. Indeed, it creates 3 bytes hole after 'unused2' so it shall add (5+3)=8 to the original 92 + 4. Finally, it will be 92 + 4 + (5 + 3) = 97 + 7. *BEFORE this patch* __u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 2585 0 */
/* XXX 3 bytes hole, try to pack */
__be32 pred_flags; /* 2588 4 */ u64 tcp_clock_cache; /* 2592 8 */ u64 tcp_mstamp; /* 2600 8 */ u32 rcv_nxt; /* 2608 4 */ u32 snd_nxt; /* 2612 4 */ u32 snd_una; /* 2616 4 */ u32 window_clamp; /* 2620 4 */ /* --- cacheline 41 boundary (2624 bytes) --- */ u32 srtt_us; /* 2624 4 */ u32 packets_out; /* 2628 4 */ u32 snd_up; /* 2632 4 */ u32 delivered; /* 2636 4 */ u32 delivered_ce; /* 2640 4 */ u32 app_limited; /* 2644 4 */ u32 rcv_wnd; /* 2648 4 */ struct tcp_options_received rx_opt; /* 2652 24 */ u8 nonagle:4; /* 2676: 0 1 */ u8 rate_app_limited:1; /* 2676: 4 1 */
/* XXX 3 bits hole, try to pack */
__u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /* 2677 0 */
/* XXX 3 bytes hole, try to pack */
*AFTER this patch* __u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 2585 0 */
/* XXX 3 bytes hole, try to pack */
__be32 pred_flags; /* 2588 4 */ u64 tcp_clock_cache; /* 2592 8 */ u64 tcp_mstamp; /* 2600 8 */ u32 rcv_nxt; /* 2608 4 */ u32 snd_nxt; /* 2612 4 */ u32 snd_una; /* 2616 4 */ u32 window_clamp; /* 2620 4 */ /* --- cacheline 41 boundary (2624 bytes) --- */ u32 srtt_us; /* 2624 4 */ u32 packets_out; /* 2628 4 */ u32 snd_up; /* 2632 4 */ u32 delivered; /* 2636 4 */ u32 delivered_ce; /* 2640 4 */ u32 received_ce; /* 2644 4 */ u8 received_ce_pending:4; /* 2648: 0 1 */ u8 unused2:4; /* 2648: 4 1 */
/* XXX 3 bytes hole, try to pack */
u32 app_limited; /* 2652 4 */ u32 rcv_wnd; /* 2656 4 */ struct tcp_options_received rx_opt; /* 2660 24 */ u8 nonagle:4; /* 2684: 0 1 */ u8 rate_app_limited:1; /* 2684: 4 1 */
/* XXX 3 bits hole, try to pack */
__u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /* 2685 0 */
[...]
@@ -384,17 +387,16 @@ static void tcp_data_ecn_check(struct sock *sk, const struct sk_buff *skb) if (tcp_ca_needs_ecn(sk)) tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);
if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR) &&
tcp_ecn_mode_rfc3168(tp)) { /* Better not delay acks, sender can have a very low cwnd */ tcp_enter_quickack_mode(sk, 2); tp->ecn_flags |= TCP_ECN_DEMAND_CWR; }
tp->ecn_flags |= TCP_ECN_SEEN;
At this point is not entirely clear to me why the removal of the above line is needed/correct.
What I see is to move the place to set this flag from here to tcp_ecn_received_counters().
Also, this function will work when receiving data by calling tcp_event_data_recv().
While tcp_ecn_received_counters() takes effect in more places (e.g., either len <= tcp_header_len or NOT) to ensure ACE counter tracks all segments including pure ACKs.
[...]
@@ -4056,6 +4118,11 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
tcp_rack_update_reo_wnd(sk, &rs);
if (tcp_ecn_mode_accecn(tp))
ecn_count = tcp_accecn_process(sk, skb,
tp->delivered - delivered,
&flag);
AFAICS the above could set FLAG_ECE in flags, menaning the previous tcp_clean_rtx_queue() will run with such flag cleared and the later function checking such flag will not. I wondering if this inconsistency could cause problems?
This flag set by tcp_accecn_process() will be used by following functions: tcp_in_ack_event(), tcp_fastretrans_alert().
And this shall only impact the AccECN mode.
Best regards, Chia-Yu
/P