RE: [PATCH v8 net-next 04/15] tcp: AccECN core

20 Jun 2025


      ...
-----Original Message-----
From: Paolo Abeni pabeni@redhat.com 
Sent: Tuesday, June 17, 2025 10:03 AM
To: Chia-Yu Chang (Nokia) chia-yu.chang@nokia-bell-labs.com; edumazet@google.com; linux-doc@vger.kernel.org; corbet@lwn.net; horms@kernel.org; dsahern@kernel.org; kuniyu@amazon.com; bpf@vger.kernel.org; netdev@vger.kernel.org; dave.taht@gmail.com; jhs@mojatatu.com; kuba@kernel.org; stephen@networkplumber.org; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; andrew+netdev@lunn.ch; donald.hunter@gmail.com; ast@fiberby.net; liuhangbin@gmail.com; shuah@kernel.org; linux-kselftest@vger.kernel.org; ij@kernel.org; ncardwell@google.com; Koen De Schepper (Nokia) koen.de_schepper@nokia-bell-labs.com; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
Cc: Olivier Tilmans (Nokia) olivier.tilmans@nokia.com
Subject: Re: [PATCH v8 net-next 04/15] tcp: AccECN core
CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
On 6/10/25 2:53 PM, chia-yu.chang@nokia-bell-labs.com wrote:
...
From: Ilpo Järvinen ij@kernel.org
This change implements Accurate ECN without negotiation and AccECN 
Option (that will be added by later changes). Based on AccECN 
specifications:
  https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
Accurate ECN allows feeding back the number of CE (congestion
experienced) marks accurately to the sender in contrast to
RFC3168 ECN that can only signal one marks-seen-yes/no per RTT.
Congestion control algorithms can take advantage of the accurate ECN 
information to fine-tune their congestion response to avoid drastic 
rate reduction when only mild congestion is encountered.
With Accurate ECN, tp->received_ce (r.cep in AccECN spec) keeps track 
of how many segments have arrived with a CE mark. Accurate ECN uses 
ACE field (ECE, CWR, AE) to communicate the value back to the sender 
which updates tp->delivered_ce (s.cep) based on the feedback. This 
signalling channel is lossy when ACE field overflow occurs.
Conservative strategy is selected here to deal with the ACE overflow, 
however, some strategies using the AccECN option later in the overall 
patchset mitigate against false overflows detected.
The ACE field values on the wire are offset by 
TCP_ACCECN_CEP_INIT_OFFSET. Delivered_ce/received_ce count the real CE 
marks rather than forcing all downstream users to adapt to the wire 
offset.
This patch uses the first 1-byte hole and the last 4-byte hole of the 
tcp_sock_write_txrx for 'received_ce_pending' and 'received_ce'.
Also, the group size of tcp_sock_write_txrx is increased from
91 + 4 to 95 + 4 due to the new u32 received_ce member. Below are the 
trimmed pahole outcomes before and after this patch.
AFAICS 'received_ce' fills the existing 4 bytes hole, so tcp_sock_write_txrx size should be now (95 + 0), am I missreading something?
Hi Paolo,
First, thanks for the feedback.
Regarding such "+ 4" is due to the byte alignemnt needed for ARM arch.
You can see below tha pahole outputs BEFORE and AFTER this patch, in which a 4-byte hole is added before "tcp_clock_cache" to make it aligns from the multiple of 8.
And the comment in net/ipv4/tcp.c also explain it: "32bit arches with 8byte alignment on u64 fields might need padding before tcp_clock_cache."
Does it make sense to you?
[BEFORE PATCH]
__u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  1869     0 */
u8                         nonagle:4;            /*  1869: 0  1 */
u8                         rate_app_limited:1;   /*  1869: 4  1 */
/* XXX 3 bits hole, try to pack */
/* XXX 2 bytes hole, try to pack */
__be32                     pred_flags;           /*  1872     4 */
/* XXX 4 bytes hole, try to pack */
u64                        tcp_clock_cache;      /*  1880     8 */
u64                        tcp_mstamp;           /*  1888     8 */
u32                        rcv_nxt;              /*  1896     4 */
u32                        snd_nxt;              /*  1900     4 */
u32                        snd_una;              /*  1904     4 */
u32                        window_clamp;         /*  1908     4 */
u32                        srtt_us;              /*  1912     4 */
u32                        packets_out;          /*  1916     4 */
/* --- cacheline 30 boundary (1920 bytes) --- */
u32                        snd_up;               /*  1920     4 */
u32                        delivered;            /*  1924     4 */
u32                        delivered_ce;         /*  1928     4 */
u32                        app_limited;          /*  1932     4 */
u32                        rcv_wnd;              /*  1936     4 */
struct tcp_options_received rx_opt;              /*  1940    24 */
__u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; /*  1964     0 */
[AFTER PATCH]
__u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  1869     0 */
u8                         nonagle:4;            /*  1869: 0  1 */
u8                         rate_app_limited:1;   /*  1869: 4  1 */
/* XXX 3 bits hole, try to pack */
/* Force alignment to the next boundary: */
u8                         :0;
u8                         received_ce_pending:4; /*  1870: 0  1 */
u8                         unused2:4;            /*  1870: 4  1 */
/* XXX 1 byte hole, try to pack */
__be32                     pred_flags;           /*  1872     4 */
/* XXX 4 bytes hole, try to pack */
u64                        tcp_clock_cache;      /*  1880     8 */
u64                        tcp_mstamp;           /*  1888     8 */
u32                        rcv_nxt;              /*  1896     4 */
u32                        snd_nxt;              /*  1900     4 */
u32                        snd_una;              /*  1904     4 */
u32                        window_clamp;         /*  1908     4 */
u32                        srtt_us;              /*  1912     4 */
u32                        packets_out;          /*  1916     4 */
/* --- cacheline 30 boundary (1920 bytes) --- */
u32                        snd_up;               /*  1920     4 */
u32                        delivered;            /*  1924     4 */
u32                        delivered_ce;         /*  1928     4 */
u32                        received_ce;          /*  1932     4 */
u32                        app_limited;          /*  1936     4 */
u32                        rcv_wnd;              /*  1940     4 */
struct tcp_options_received rx_opt;              /*  1944    24 */
__u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; /*  1968     0 */
...
...
@@ -384,17 +387,16 @@ static void tcp_data_ecn_check(struct sock *sk, const struct sk_buff *skb)
              if (tcp_ca_needs_ecn(sk))
                      tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);

        if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {




        if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR) &&


            tcp_ecn_mode_rfc3168(tp)) {
                /* Better not delay acks, sender can have a very low cwnd */
                tcp_enter_quickack_mode(sk, 2);
                tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
        }




        tp->ecn_flags |= TCP_ECN_SEEN;



It's not clear why you need to move this statement earlier in the code path even for ecn_mode_rfc3168(). Either a comment or
            if (!tcp_ecn_mode_rfc3168(tp))
                    break;


a few lines aboved could help.
...
          break;
  default:
          if (tcp_ca_needs_ecn(sk))
                  tcp_ca_event(sk, CA_EVENT_ECN_NO_CE);


        tp->ecn_flags |= TCP_ECN_SEEN;



Same here.
Thanks,
Paolo
OK, will apply the above feedback in the next version and thanks.
BRs,
Chia-Yu

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

RE: [PATCH v8 net-next 04/15] tcp: AccECN core