This is a note to let you know that I've just added the patch titled
bonding: Don't update slave->link until ready to commit
to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: bonding-don-t-update-slave-link-until-ready-to-commit.patch and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Tue Apr 10 10:31:53 CEST 2018
From: Nithin Sujir nsujir@tintri.com Date: Wed, 24 May 2017 19:45:17 -0700 Subject: bonding: Don't update slave->link until ready to commit
From: Nithin Sujir nsujir@tintri.com
[ Upstream commit 797a93647a48d6cb8a20641a86a71713a947f786 ]
In the loadbalance arp monitoring scheme, when a slave link change is detected, the slave->link is immediately updated and slave_state_changed is set. Later down the function, the rtnl_lock is acquired and the changes are committed, updating the bond link state.
However, the acquisition of the rtnl_lock can fail. The next time the monitor runs, since slave->link is already updated, it determines that link is unchanged. This results in the bond link state permanently out of sync with the slave link.
This patch modifies bond_loadbalance_arp_mon() to handle link changes identical to bond_ab_arp_{inspect/commit}(). The new link state is maintained in slave->new_link until we're ready to commit at which point it's copied into slave->link.
NOTE: miimon_{inspect/commit}() has a more complex state machine requiring the use of the bond_{propose,commit}_link_state() functions which maintains the intermediate state in slave->link_new_state. The arp monitors don't require that.
Testing: This bug is very easy to reproduce with the following steps. 1. In a loop, toggle a slave link of a bond slave interface. 2. In a separate loop, do ifconfig up/down of an unrelated interface to create contention for rtnl_lock. Within a few iterations, the bond link goes out of sync with the slave link.
Signed-off-by: Nithin Nayak Sujir nsujir@tintri.com Cc: Mahesh Bandewar maheshb@google.com Cc: Jay Vosburgh jay.vosburgh@canonical.com Acked-by: Mahesh Bandewar maheshb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin alexander.levin@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/bonding/bond_main.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -2555,11 +2555,13 @@ static void bond_loadbalance_arp_mon(str bond_for_each_slave_rcu(bond, slave, iter) { unsigned long trans_start = dev_trans_start(slave->dev);
+ slave->new_link = BOND_LINK_NOCHANGE; + if (slave->link != BOND_LINK_UP) { if (bond_time_in_interval(bond, trans_start, 1) && bond_time_in_interval(bond, slave->last_rx, 1)) {
- slave->link = BOND_LINK_UP; + slave->new_link = BOND_LINK_UP; slave_state_changed = 1;
/* primary_slave has no meaning in round-robin @@ -2586,7 +2588,7 @@ static void bond_loadbalance_arp_mon(str if (!bond_time_in_interval(bond, trans_start, 2) || !bond_time_in_interval(bond, slave->last_rx, 2)) {
- slave->link = BOND_LINK_DOWN; + slave->new_link = BOND_LINK_DOWN; slave_state_changed = 1;
if (slave->link_failure_count < UINT_MAX) @@ -2617,6 +2619,11 @@ static void bond_loadbalance_arp_mon(str if (!rtnl_trylock()) goto re_arm;
+ bond_for_each_slave(bond, slave, iter) { + if (slave->new_link != BOND_LINK_NOCHANGE) + slave->link = slave->new_link; + } + if (slave_state_changed) { bond_slave_state_change(bond); if (BOND_MODE(bond) == BOND_MODE_XOR)
Patches currently in stable-queue which might be from nsujir@tintri.com are
queue-4.4/bonding-don-t-update-slave-link-until-ready-to-commit.patch