From: Xiongfeng Wang wangxiongfeng2@huawei.com
[ Upstream commit d7061627d701c90e1cac1e1e60c45292f64f3470 ]
It turns out to be possible for hotplugging out a device to reach the stage of tearing down the device's group and default domain before the domain's flush queue has drained naturally. At this point, it is then possible for the timeout to expire just before the del_timer() call in free_iova_flush_queue(), such that we then proceed to free the FQ resources while fq_flush_timeout() is still accessing them on another CPU. Crashes due to this have been observed in the wild while removing NVMe devices.
Close the race window by using del_timer_sync() to safely wait for any active timeout handler to finish before we start to free things. We already avoid any locking in free_iova_flush_queue() since the FQ is supposed to be inactive anyway, so the potential deadlock scenario does not apply.
Fixes: 9a005a800ae8 ("iommu/iova: Add flush timer") Reviewed-by: John Garry john.garry@huawei.com Signed-off-by: Xiongfeng Wang wangxiongfeng2@huawei.com [ rm: rewrite commit message ] Signed-off-by: Robin Murphy robin.murphy@arm.com Link: https://lore.kernel.org/r/0a365e5b07f14b7344677ad6a9a734966a8422ce.163975363... Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/iova.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 612cbf668adf8..906582a21124d 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -64,8 +64,7 @@ static void free_iova_flush_queue(struct iova_domain *iovad) if (!has_iova_flush_queue(iovad)) return;
- if (timer_pending(&iovad->fq_timer)) - del_timer(&iovad->fq_timer); + del_timer_sync(&iovad->fq_timer);
fq_destroy_all_entries(iovad);