In an event where the platform running the device control plane is rebooted, reset is detected on the driver. It releases all the resources and waits for the reset to complete. Once the reset is done, it tries to build the resources back. At this time if the device control plane is not yet started, then the driver timeouts on the virtchnl message and retries to establish the mailbox again.
In the retry flow, mailbox is deinitialized but the mailbox workqueue is still alive and polling for the mailbox message. This results in accessing the released control queue leading to null-ptr-deref. Fix it by unrolling the work queue cancellation and mailbox deinitialization in the order which they got initialized.
Also remove the redundant scheduling of the mailbox task in idpf_vc_core_init.
Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") Cc: stable@vger.kernel.org # 6.9+ Reviewed-by: Tarun K Singh tarun.k.singh@intel.com Signed-off-by: Pavan Kumar Linga pavan.kumar.linga@intel.com --- drivers/net/ethernet/intel/idpf/idpf_lib.c | 1 + drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 7 ------- 2 files changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c index c3848e10e7db..b4fbb99bfad2 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c @@ -1786,6 +1786,7 @@ static int idpf_init_hard_reset(struct idpf_adapter *adapter) */ err = idpf_vc_core_init(adapter); if (err) { + cancel_delayed_work_sync(&adapter->mbx_task); idpf_deinit_dflt_mbx(adapter); goto unlock_mutex; } diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c index 3be883726b87..d77d6c3805e2 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c @@ -3017,11 +3017,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter) goto err_netdev_alloc; }
- /* Start the mailbox task before requesting vectors. This will ensure - * vector information response from mailbox is handled - */ - queue_delayed_work(adapter->mbx_wq, &adapter->mbx_task, 0); - queue_delayed_work(adapter->serv_wq, &adapter->serv_task, msecs_to_jiffies(5 * (adapter->pdev->devfn & 0x07)));
@@ -3046,7 +3041,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
err_intr_req: cancel_delayed_work_sync(&adapter->serv_task); - cancel_delayed_work_sync(&adapter->mbx_task); idpf_vport_params_buf_rel(adapter); err_netdev_alloc: kfree(adapter->vports); @@ -3070,7 +3064,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter) adapter->state = __IDPF_VER_CHECK; if (adapter->vcxn_mngr) idpf_vc_xn_shutdown(adapter->vcxn_mngr); - idpf_deinit_dflt_mbx(adapter); set_bit(IDPF_HR_DRV_LOAD, adapter->flags); queue_delayed_work(adapter->vc_event_wq, &adapter->vc_event_task, msecs_to_jiffies(task_delay));
On Tue, Oct 22, 2024 at 10:35:27AM -0700, Pavan Kumar Linga wrote:
In an event where the platform running the device control plane is rebooted, reset is detected on the driver. It releases all the resources and waits for the reset to complete. Once the reset is done, it tries to build the resources back. At this time if the device control plane is not yet started, then the driver timeouts on the virtchnl message and retries to establish the mailbox again.
In the retry flow, mailbox is deinitialized but the mailbox workqueue is still alive and polling for the mailbox message. This results in accessing the released control queue leading to null-ptr-deref. Fix it by unrolling the work queue cancellation and mailbox deinitialization in the order which they got initialized.
Also remove the redundant scheduling of the mailbox task in idpf_vc_core_init.
I think it might be better to move this last change into a separate patch targeted at iwl rather than iwl-net. It isn't a fix, right?
Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") Cc: stable@vger.kernel.org # 6.9+ Reviewed-by: Tarun K Singh tarun.k.singh@intel.com Signed-off-by: Pavan Kumar Linga pavan.kumar.linga@intel.com
...
On 10/24/2024 7:39 AM, Simon Horman wrote:
On Tue, Oct 22, 2024 at 10:35:27AM -0700, Pavan Kumar Linga wrote:
In an event where the platform running the device control plane is rebooted, reset is detected on the driver. It releases all the resources and waits for the reset to complete. Once the reset is done, it tries to build the resources back. At this time if the device control plane is not yet started, then the driver timeouts on the virtchnl message and retries to establish the mailbox again.
In the retry flow, mailbox is deinitialized but the mailbox workqueue is still alive and polling for the mailbox message. This results in accessing the released control queue leading to null-ptr-deref. Fix it by unrolling the work queue cancellation and mailbox deinitialization in the order which they got initialized.
Also remove the redundant scheduling of the mailbox task in idpf_vc_core_init.
I think it might be better to move this last change into a separate patch targeted at iwl rather than iwl-net. It isn't a fix, right?
If we do not make that change in this patch, it results in calling cancel_delayed_work_sync twice in case of an error in idpf_vc_core_init (err_intr_req). Looks like there aren't any side effects of calling cancel_delayed_work_sync twice. I will remove the said changes from this patch.
Thanks, Pavan
Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") Cc: stable@vger.kernel.org # 6.9+ Reviewed-by: Tarun K Singh tarun.k.singh@intel.com Signed-off-by: Pavan Kumar Linga pavan.kumar.linga@intel.com
...
linux-stable-mirror@lists.linaro.org