The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x d006207625657322ba8251b6e7e829f9659755dc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to 'stable@vger.kernel.org' --in-reply-to '2023081247-enrich-coliseum-0e9d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
d00620762565 ("net/mlx5: Skip clock update work when device is in error state")
d6f3dc8f509c ("net/mlx5: Move all internal timer metadata into a dedicated struct")
1436de0b9915 ("net/mlx5: Refactor init clock function")
fb609b5112bd ("net/mlx5: Always use container_of to find mdev pointer from clock struct")
ed56d749c366 ("net/mlx5: Query PPS pin operational status before registering it")
88c8cf92db48 ("net/mlx5: Fix a bug of using ptp channel index as pin index")
ddcdc368b103 ("RDMA/mlx5: Use get_zeroed_page() for clock_info")
4a0475d57ad1 ("mlx5: extend PTP gettime function to read system clock")
5d8678365c90 ("mlx5: update timecounter at least twice per counter overflow")
41069256e930 ("net/mlx5: Clock, Use async events chain")
a52a7d01fde1 ("net/mlx5: FPGA, Use async events chain")
da19a102ce87 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d006207625657322ba8251b6e7e829f9659755dc Mon Sep 17 00:00:00 2001
From: Moshe Shemesh <moshe@nvidia.com>
Date: Wed, 19 Jul 2023 11:33:44 +0300
Subject: [PATCH] net/mlx5: Skip clock update work when device is in error state
When device is in error state, marked by the flag MLX5_DEVICE_STATE_INTERNAL_ERROR, the HW and PCI may not be accessible and so clock update work should be skipped. Furthermore, such access through PCI in error state, after calling mlx5_pci_disable_device() can result in failing to recover from pci errors.
Fixes: ef9814deafd0 ("net/mlx5e: Add HW timestamping (TS) support")
Reported-and-tested-by: Ganesh G R <ganeshgr@linux.ibm.com>
Closes: https://lore.kernel.org/netdev/9bdb9b9d-140a-7a28-f0de-2e64e873c068@nvidia.c...
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
index 973babfaff25..377372f0578a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
@@ -227,10 +227,15 @@ static void mlx5_timestamp_overflow(struct work_struct *work)
 	clock = container_of(timer, struct mlx5_clock, timer);
 	mdev = container_of(clock, struct mlx5_core_dev, clock);
 
+	if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
+		goto out;
+
 	write_seqlock_irqsave(&clock->lock, flags);
 	timecounter_read(&timer->tc);
 	mlx5_update_clock_info_page(mdev);
 	write_sequnlock_irqrestore(&clock->lock, flags);
+
+out:
 	schedule_delayed_work(&timer->overflow_work, timer->overflow_period);
 }
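
For anyone preparing the backport, the control flow the patch introduces can be tried outside the kernel. The following is only a minimal userspace sketch, not driver code: struct fake_dev, the DEV_STATE_* values and the fake_* helpers are hypothetical stand-ins for struct mlx5_core_dev, the device state flag, the timecounter/clock-info update and schedule_delayed_work(). It shows the one property that matters here: in the error state the hardware access is skipped, but the work still re-arms itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real mlx5 types or values. */
enum fake_dev_state { DEV_STATE_UP, DEV_STATE_INTERNAL_ERROR };

struct fake_dev {
	enum fake_dev_state state;
	uint64_t hw_reads;    /* number of simulated hardware clock accesses */
	uint64_t reschedules; /* number of times the work re-armed itself */
};

/* Stands in for the timecounter read and clock info page update. */
static void fake_hw_clock_update(struct fake_dev *dev)
{
	dev->hw_reads++;
}

/* Stands in for schedule_delayed_work(). */
static void fake_reschedule(struct fake_dev *dev)
{
	dev->reschedules++;
}

/* Same shape as the patched work handler: skip the update, keep the re-arm. */
static void fake_overflow_work(struct fake_dev *dev)
{
	assert(dev != NULL);

	if (dev->state == DEV_STATE_INTERNAL_ERROR)
		goto out;

	fake_hw_clock_update(dev);
out:
	fake_reschedule(dev);
}
```

Calling fake_overflow_work() once with the state set to DEV_STATE_UP and once with DEV_STATE_INTERNAL_ERROR should leave hw_reads at 1 and reschedules at 2. Rescheduling unconditionally means the periodic update resumes on its own once the device leaves the error state, with no separate restart path.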