On 11/7/2024 8:31 PM, Robin Murphy wrote:
On 29/10/2024 4:15 pm, Will Deacon wrote:
On Fri, 04 Oct 2024 14:34:28 +0530, Pratyush Brahma wrote:
A NULL pointer dereference occurs due to a race between the SMMU driver probe and a client driver probe: of_dma_configure() for the client is called after iommu_device_register() has run in the SMMU driver probe, but before driver_bound() has been called for the SMMU driver.
The race occurs as follows:
[...]
Applied to will (for-joerg/arm-smmu/updates), thanks!
[1/1] iommu/arm-smmu: Defer probe of clients after smmu device bound
      https://git.kernel.org/will/c/229e6ee43d2a
I've finally got to the point of proving to myself that this isn't the right fix, since once we do get __iommu_probe_device() working properly in the correct order, iommu_device_register() then runs into the same condition itself. Diff below should make this issue go away - I'll write up proper patches once I've tested it a little more.
Thanks, Robin.
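For readers following along, here is a rough userspace sketch of why the choice of lookup matters during probe. It is a toy model, not kernel code: every toy_* name is hypothetical, the fwnode is reduced to an int, and the only behaviour assumed from the driver core is that a driver-list lookup sees a device only once it has been bound, whereas a bus-list lookup sees every device added to the bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct device: only the fields this model needs. */
struct toy_device {
	int fwnode;		/* stand-in for a struct fwnode_handle pointer */
	void *drvdata;		/* what dev_get_drvdata() would return */
	bool driver_bound;	/* set once the toy "driver_bound()" step has run */
};

#define TOY_BUS_MAX 8

struct toy_bus {
	struct toy_device *devs[TOY_BUS_MAX];
	size_t count;
};

static void toy_bus_add(struct toy_bus *bus, struct toy_device *dev)
{
	assert(bus->count < TOY_BUS_MAX);
	bus->devs[bus->count++] = dev;
}

/* Models a driver-list lookup: only devices already bound are visible. */
static struct toy_device *toy_find_via_driver(struct toy_bus *bus, int fwnode)
{
	for (size_t i = 0; i < bus->count; i++)
		if (bus->devs[i]->driver_bound && bus->devs[i]->fwnode == fwnode)
			return bus->devs[i];
	return NULL;
}

/* Models a bus-list lookup: every device on the bus is visible. */
static struct toy_device *toy_find_via_bus(struct toy_bus *bus, int fwnode)
{
	for (size_t i = 0; i < bus->count; i++)
		if (bus->devs[i]->fwnode == fwnode)
			return bus->devs[i];
	return NULL;
}

/* Same shape as arm_smmu_get_by_fwnode(): find the device, return its drvdata. */
static void *toy_get_by_fwnode(struct toy_bus *bus, int fwnode, bool use_bus)
{
	struct toy_device *dev = use_bus ? toy_find_via_bus(bus, fwnode)
					 : toy_find_via_driver(bus, fwnode);

	return dev ? dev->drvdata : NULL;
}

/*
 * Toy probe. Returns what a lookup made at the point where the real driver
 * would call iommu_device_register() gets back. set_drvdata_first selects
 * whether drvdata is stored before or after that point.
 */
static void *toy_probe(struct toy_bus *bus, struct toy_device *dev, void *priv,
		       bool use_bus, bool set_drvdata_first)
{
	void *seen;

	if (set_drvdata_first)
		dev->drvdata = priv;

	/* Registration point: a lookup for this device may happen here. */
	seen = toy_get_by_fwnode(bus, dev->fwnode, use_bus);

	if (!set_drvdata_first)
		dev->drvdata = priv;

	/* Binding is recorded only after probe has finished. */
	dev->driver_bound = true;
	return seen;
}
```

In this model, toy_probe() returns NULL with the driver-list lookup no matter when drvdata is set, because the device is not yet bound at the registration point. With the bus-list lookup it returns the private data only if drvdata was stored first, which is the reason for also moving the registration calls after platform_set_drvdata() in arm-smmu.c.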
----->8-----
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 737c5b882355..b7dcb1494aa4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3171,8 +3171,8 @@ static struct platform_driver arm_smmu_driver;
 static
 struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
 {
-	struct device *dev = driver_find_device_by_fwnode(&arm_smmu_driver.driver,
-							  fwnode);
+	struct device *dev = bus_find_device_by_fwnode(&platform_bus_type, fwnode);
+
 	put_device(dev);
 	return dev ? dev_get_drvdata(dev) : NULL;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 8321962b3714..aba315aa6848 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1411,8 +1411,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
 static
 struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
 {
-	struct device *dev = driver_find_device_by_fwnode(&arm_smmu_driver.driver,
-							  fwnode);
+	struct device *dev = bus_find_device_by_fwnode(&platform_bus_type, fwnode);
I think it would still follow this path:
bus_find_device_by_fwnode() -> bus_find_device() -> next_device()
next_device() would always return NULL until the driver is bound to the device, which happens much later in really_probe(), after iommu_device_register() has been called, even with this patch. So the race would still occur, wouldn't it? Can you please help me understand what I may be missing here? Are you saying that these additional changes are required along with the fix I've posted?
 	put_device(dev);
 	return dev ? dev_get_drvdata(dev) : NULL;
 }
@@ -2232,21 +2232,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 				i, irq);
 	}

-	err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL,
-				     "smmu.%pa", &smmu->ioaddr);
-	if (err) {
-		dev_err(dev, "Failed to register iommu in sysfs\n");
-		return err;
-	}
-
-	err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
-				    using_legacy_binding ? NULL : dev);
-	if (err) {
-		dev_err(dev, "Failed to register iommu\n");
-		iommu_device_sysfs_remove(&smmu->iommu);
-		return err;
-	}
-
 	platform_set_drvdata(pdev, smmu);

 	/* Check for RMRs and install bypass SMRs if any */
@@ -2255,6 +2240,18 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	arm_smmu_device_reset(smmu);
 	arm_smmu_test_smr_masks(smmu);

+	err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL,
+				     "smmu.%pa", &smmu->ioaddr);
+	if (err)
+		return dev_err_probe(dev, err, "Failed to register iommu in sysfs\n");
+
+	err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
+				    using_legacy_binding ? NULL : dev);
+	if (err) {
+		iommu_device_sysfs_remove(&smmu->iommu);
+		return dev_err_probe(dev, err, "Failed to register iommu\n");
+	}
+
 	/*
 	 * We want to avoid touching dev->power.lock in fastpaths unless
 	 * it's really going to do something useful - pm_runtime_enabled()