This is a note to let you know that I've just added the patch titled
net/mlx4_core: Avoid delays during VF driver device shutdown
to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: net-mlx4_core-avoid-delays-during-vf-driver-device-shutdown.patch and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Mon Dec 18 14:47:43 CET 2017
From: Jack Morgenstein jackm@dev.mellanox.co.il Date: Mon, 13 Mar 2017 19:29:08 +0200 Subject: net/mlx4_core: Avoid delays during VF driver device shutdown
From: Jack Morgenstein jackm@dev.mellanox.co.il
[ Upstream commit 4cbe4dac82e423ecc9a0ba46af24a860853259f4 ]
Some Hypervisors detach VFs from VMs by instantly causing an FLR event to be generated for a VF.
In the mlx4 case, this will cause that VF's comm channel to be disabled before the VM has an opportunity to invoke the VF device's "shutdown" method.
For such Hypervisors, there is a race condition between the VF's shutdown method and its internal-error detection/reset thread.
The internal-error detection/reset thread (which runs every 5 seconds) also detects a disabled comm channel. If the internal-error detection/reset flow wins the race, we still get delays (while that flow tries repeatedly to detect comm-channel recovery).
The cited commit fixed the command timeout problem when the internal-error detection/reset flow loses the race.
This commit avoids the unneeded delays when the internal-error detection/reset flow wins.
Fixes: d585df1c5ccf ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown") Signed-off-by: Jack Morgenstein jackm@dev.mellanox.co.il Reported-by: Simon Xiao sixiao@microsoft.com Signed-off-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin alexander.levin@verizon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx4/cmd.c | 11 +++++++++++ drivers/net/ethernet/mellanox/mlx4/main.c | 11 +++++++++++ include/linux/mlx4/device.h | 1 + 3 files changed, 23 insertions(+)
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c @@ -2278,6 +2278,17 @@ static int sync_toggles(struct mlx4_dev rd_toggle = swab32(readl(&priv->mfunc.comm->slave_read)); if (wr_toggle == 0xffffffff || rd_toggle == 0xffffffff) { /* PCI might be offline */ + + /* If device removal has been requested, + * do not continue retrying. + */ + if (dev->persist->interface_state & + MLX4_INTERFACE_STATE_NOWAIT) { + mlx4_warn(dev, + "communication channel is offline\n"); + return -EIO; + } + msleep(100); wr_toggle = swab32(readl(&priv->mfunc.comm-> slave_write)); --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -1763,6 +1763,14 @@ static int mlx4_comm_check_offline(struc (u32)(1 << COMM_CHAN_OFFLINE_OFFSET)); if (!offline_bit) return 0; + + /* If device removal has been requested, + * do not continue retrying. + */ + if (dev->persist->interface_state & + MLX4_INTERFACE_STATE_NOWAIT) + break; + /* There are cases as part of AER/Reset flow that PF needs * around 100 msec to load. We therefore sleep for 100 msec * to allow other tasks to make use of that CPU during this @@ -3690,6 +3698,9 @@ static void mlx4_remove_one(struct pci_d struct mlx4_priv *priv = mlx4_priv(dev); int active_vfs = 0;
+ if (mlx4_is_slave(dev)) + persist->interface_state |= MLX4_INTERFACE_STATE_NOWAIT; + mutex_lock(&persist->interface_state_mutex); persist->interface_state |= MLX4_INTERFACE_STATE_DELETION; mutex_unlock(&persist->interface_state_mutex); --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -460,6 +460,7 @@ enum { enum { MLX4_INTERFACE_STATE_UP = 1 << 0, MLX4_INTERFACE_STATE_DELETION = 1 << 1, + MLX4_INTERFACE_STATE_NOWAIT = 1 << 2, };
#define MSTR_SM_CHANGE_MASK (MLX4_EQ_PORT_INFO_MSTR_SM_SL_CHANGE_MASK | \
Patches currently in stable-queue which might be from jackm@dev.mellanox.co.il are
queue-4.4/net-mlx4_core-avoid-delays-during-vf-driver-device-shutdown.patch
linux-stable-mirror@lists.linaro.org