Hangbin Liu liuhangbin@gmail.com wrote:
On Tue, Jul 15, 2025 at 09:19:49PM -0700, Jay Vosburgh wrote:
Presuming that I'm correct that we're not implementing 6.4.1 d), above, correctly, then I don't think this is a proper fix, as it kind of band-aids over the problem a bit.
Looking at the code, I suspect the problem revolves around the "lacp_active" check in ad_periodic_machine():
static void ad_periodic_machine(struct port *port, struct bond_params *bond_params) { periodic_states_t last_state;
/* keep current state machine state to compare later if it was changed */ last_state = port->sm_periodic_state;
/* check if port was reinitialized */ if (((port->sm_vars & AD_PORT_BEGIN) || !(port->sm_vars & AD_PORT_LACP_ENABLED) || !port->is_enabled) || (!(port->actor_oper_port_state & LACP_STATE_LACP_ACTIVITY) && !(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)) || !bond_params->lacp_active) { port->sm_periodic_state = AD_NO_PERIODIC; }
In the above, because all the tests are chained with ||, the lacp_active test overrides the two correct-looking LACP_STATE_LACP_ACTIVITY tests.
It looks like ad_initialize_port() always sets LACP_STATE_LACP_ACTIVITY in the port->actor_oper_port_state, and nothing ever clears it.
Thinking out loud, perhaps this could be fixed by
a) remove the test of bond_params->lacp_active here, and,
b) The lacp_active option setting controls whether LACP_ACTIVITY is set in port->actor_oper_port_state.
Thoughts?
Hi Jay,
I did some investigation and testing. In addition to your previous change, we also need to initialize the partner's state to 0 in ad_initialize_port_tmpl(). Otherwise, the check:
!(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)
in ad_periodic_machine() will fail even when the actor is in passive mode.
Agreed; the .port_state in the port_params tmpl should just be zero; the magic number 1 there now, which is LACP_STATE_LACP_ACTIVITY, is just wrong. For the actor side, the lacp_active option will set it appropriately, and the partner's will be updated by any LACPDUs that arrive.
Also, the line:
port->partner_oper.port_state |= LACP_STATE_LACP_ACTIVITY;
in ad_rx_machine() should be removed, since we can't assume the partner is in active mode. [1]
Also agreed.
With these two changes, we can ensure:
- In passive mode, the actor will not send LACPDU before receiving any LACPDU from the partner.
- Once it receives the partner’s LACPDU, it will start sending periodic LACPDUs as expected.
Do you agree with making these changes? If so, I can post a patch for your review.
Yes, please post a patch.
[1] IEEE 8021AX-2020, Figure 6-14—LACP Receive state diagram, the AD_RX_EXPIRED statue should be
Partner_Oper_Port_State.Synchronization = FALSE; Partner_Oper_Port_State.Short_Timeout = TRUE; Actor_Oper_Port_State.Expired = TRUE; LACP_currentWhile = Short_Timeout_Time;
FWIW, I usually reference the older standards 2008 or 2014, as the 2020 edition changes a lot of things and bonding isn't necessarily conformant to those changes (e.g., many of the state machines are different in large or small ways). Technically, the bonding implementation was written to the pre-802.1AX standard when it was still part of 802.3 (hence the name 802.3ad), clause 43.
This particular bit (the EXPIRED state actions) is the same, but, for example, the transition test from EXPIRED to DEFAULTED is different in the 2014 vs 2020 editions, and we need to be careful not to implement the state machines piecemeal from different editions of the standard.
-J
--- -Jay Vosburgh, jv@jvosburgh.net