6.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Moshe Shemesh moshe@nvidia.com
[ Upstream commit 89a898d63f6f588acf5c104c65c94a38b68c69a6 ]
drain_fw_reset() waits for ongoing firmware reset events and blocks new event handling, but does not clear the reset requested flag, and may keep sync reset polling.
To fix it, call mlx5_sync_reset_clear_reset_requested() to clear the flag, stop sync reset polling, and resume health polling, ensuring health issues are still detected after the firmware reset drain.
Fixes: 16d42d313350 ("net/mlx5: Drain fw_reset when removing device") Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Shay Drori shayd@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/1765284977-1363052-2-git-send-email-tariqt@nvidia.c... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c index 89e399606877b..33df0418e5754 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c @@ -843,7 +843,8 @@ void mlx5_drain_fw_reset(struct mlx5_core_dev *dev) cancel_work_sync(&fw_reset->reset_reload_work); cancel_work_sync(&fw_reset->reset_now_work); cancel_work_sync(&fw_reset->reset_abort_work); - cancel_delayed_work(&fw_reset->reset_timeout_work); + if (test_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags)) + mlx5_sync_reset_clear_reset_requested(dev, true); }
static const struct devlink_param mlx5_fw_reset_devlink_params[] = {