Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs") moved the secondary TLB invalidations into the TLB invalidation functions to ensure that all secondary TLB invalidations happen at the same time as the CPU invalidation. It also added a flush-all type of secondary TLB invalidation for the batched mode, where a range of [0, -1UL) is used to indicate that the range extends to the end of the address space.
However, using an end address of -1UL caused an overflow in the Intel IOMMU driver, where the end address was rounded up to the next page. As a result, both the IOTLB and device ATC were not invalidated correctly.
Add a flush all helper function and call it when the invalidation range is from 0 to -1UL, ensuring that the entire caches are invalidated correctly.
Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
Cc: stable@vger.kernel.org
Cc: Huang Ying <ying.huang@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Tested-by: Luo Yuzhang <yuzhang.luo@intel.com> # QAT
Tested-by: Tony Zhu <tony.zhu@intel.com> # DSA
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/svm.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 50a481c895b8..588385050a07 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -216,6 +216,27 @@ static void intel_flush_svm_range(struct intel_svm *svm, unsigned long address,
 	rcu_read_unlock();
 }
 
+static void intel_flush_svm_all(struct intel_svm *svm)
+{
+	struct device_domain_info *info;
+	struct intel_svm_dev *sdev;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdev, &svm->devs, list) {
+		info = dev_iommu_priv_get(sdev->dev);
+
+		qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 1);
+		if (info->ats_enabled) {
+			qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
+						 svm->pasid, sdev->qdep,
+						 0, 64 - VTD_PAGE_SHIFT);
+			quirk_extra_dev_tlb_flush(info, 0, 64 - VTD_PAGE_SHIFT,
+						  svm->pasid, sdev->qdep);
+		}
+	}
+	rcu_read_unlock();
+}
+
 /* Pages have been freed at this point */
 static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
 					struct mm_struct *mm,
@@ -223,6 +244,11 @@ static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn,
 {
 	struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
 
+	if (start == 0 && end == -1UL) {
+		intel_flush_svm_all(svm);
+		return;
+	}
+
 	intel_flush_svm_range(svm, start,
 			      (end - start + PAGE_SIZE - 1) >> VTD_PAGE_SHIFT, 0);
 }
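The overflow described in the commit message can be reproduced with plain unsigned arithmetic. The following is a minimal user-space sketch, not driver code: the `model_*` names are hypothetical, and it assumes 4 KiB pages for both PAGE_SIZE and VTD_PAGE_SHIFT (as on x86). It evaluates the same page-count expression that the notifier passes to intel_flush_svm_range() and mirrors the flush-all check the patch adds in front of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed model values: 4 KiB pages for both PAGE_SIZE and VTD_PAGE_SHIFT. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Same expression the notifier uses to compute the number of pages. */
unsigned long model_range_to_pages(unsigned long start, unsigned long end)
{
	return (end - start + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
}

/* Mirrors the check the patch adds before the range flush. */
bool model_is_flush_all(unsigned long start, unsigned long end)
{
	return start == 0 && end == -1UL;
}

/* Sanity checks for the model; call this from a small main() to run them. */
void model_self_check(void)
{
	/* An ordinary two-page range rounds up as intended. */
	assert(model_range_to_pages(0x1000, 0x3000) == 2);

	/* [0, -1UL): -1UL + 4095 wraps around to 4094, so the count is 0. */
	assert(model_range_to_pages(0, -1UL) == 0);

	/* The new check catches the flush-all range before the arithmetic runs. */
	assert(model_is_flush_all(0, -1UL));
	assert(!model_is_flush_all(0x1000, 0x3000));
}
```

With a page count of 0 the range path has nothing meaningful to invalidate, which is consistent with the report that neither the IOTLB nor the device ATC was flushed correctly.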
On Fri, Nov 17, 2023 at 05:09:33PM +0800, Lu Baolu wrote:
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
This should go to -rc
Jason
Lu Baolu <baolu.lu@linux.intel.com> writes:
Commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs") moved the secondary TLB invalidations into the TLB invalidation functions to ensure that all secondary TLB invalidations happen at the same time as the CPU invalidation and added a flush-all type of secondary TLB invalidation for the batched mode, where a range of [0, -1UL) is used to indicate that the range extends to the end of the address space.
However, using an end address of -1UL caused an overflow in the Intel IOMMU driver, where the end address was rounded up to the next page. As a result, both the IOTLB and device ATC were not invalidated correctly.
Thanks for catching. This fix looks good so:
Reviewed-by: Alistair Popple <apopple@nvidia.com>
However, examining the fixes patch again, I note that we are calling mmu_notifier_invalidate_range(mm, 0, -1UL) from arch_tlbbatch_add_pending() in arch/x86/include/asm/tlbflush.h.
That seems suboptimal because we would be doing an invalidate all for every page unmap, and as of db6c1f6f236d ("mm/tlbbatch: introduce arch_flush_tlb_batched_pending()") arch_flush_tlb_batched_pending() calls flush_tlb_mm() anyway. So I think we can probably drop the explicit notifier call from arch_flush_tlb_batched_pending().
Will put together a patch for that.
- Alistair
Alistair Popple apopple@nvidia.com writes:
However examining the fixes patch again I note that we are calling mmu_notifier_invalidate_range(mm, 0, -1UL) from arch_tlbbatch_add_pending() in arch/x86/include/asm/tlbflush.h.
That seems suboptimal because we would be doing an invalidate all for every page unmap,
Yes. This can be a performance regression for IOMMU TLB flushing. For the CPU, it's "flush smaller ranges with more IPIs" vs. "flush whole range with fewer IPIs", and in general the latter wins because of the high overhead of IPIs. But, IIUC, for the IOMMU TLB, it becomes "flush smaller ranges" vs. "flush whole range". That is generally bad. It may be better to restore the original behavior. Can we just pass the size of TLB flushing in set_tlb_ubc_flush_pending()->arch_tlbbatch_add_pending(), and flush the IOMMU TLB for the range?
and as of db6c1f6f236d ("mm/tlbbatch: introduce arch_flush_tlb_batched_pending()") arch_flush_tlb_batched_pending() calls flush_tlb_mm() anyway. So I think we can probably drop the explicit notifier call from arch_flush_tlb_batched_pending().
arch_flush_tlb_batched_pending() is used when we need to change the page table (e.g., munmap()) in parallel with TLB flush batching (e.g., try_to_unmap()). The actual TLB flushing part for set_tlb_ubc_flush_pending()->arch_tlbbatch_add_pending() is try_to_unmap_flush()->arch_tlbbatch_flush().
-- Best Regards, Huang, Ying
"Huang, Ying" ying.huang@intel.com writes:
Alistair Popple apopple@nvidia.com writes:
However examining the fixes patch again I note that we are calling mmu_notifier_invalidate_range(mm, 0, -1UL) from arch_tlbbatch_add_pending() in arch/x86/include/asm/tlbflush.h.
That seems suboptimal because we would be doing an invalidate all for every page unmap,
Yes. This can be a performance regression for IOMMU TLB flushing. For the CPU, it's "flush smaller ranges with more IPIs" vs. "flush whole range with fewer IPIs", and in general the latter wins because of the high overhead of IPIs. But, IIUC, for the IOMMU TLB, it becomes "flush smaller ranges" vs. "flush whole range". That is generally bad.
The "flush smaller ranges" vs. "flush whole range" is equally valid for some architectures, or at least some implementations of SMMU on ARM because flushing the whole range is a single IOMMU command vs. multiple for flushing a range. See for example https://lore.kernel.org/linux-arm-kernel/20230920052257.8615-1-nicolinc@nvid... which switches to a full invalidate depending on the range. I've no idea if that's true more generally though, although a similar situation existed on POWER9.
It may be better to restore the original behavior. Can we just pass the size of TLB flushing in set_tlb_ubc_flush_pending()->arch_tlbbatch_add_pending(), and flush the IOMMU TLB for the range?
Ideally we'd push the notifier call down the stack, closer to where the actual HW TLB invalidate gets called. I think I was just getting lost in all the indirection in the lower-level x86_64 TLB flushing and batching code, though. Will take another look.
and as of db6c1f6f236d ("mm/tlbbatch: introduce arch_flush_tlb_batched_pending()") arch_flush_tlb_batched_pending() calls flush_tlb_mm() anyway. So I think we can probably drop the explicit notifier call from arch_flush_tlb_batched_pending().
arch_flush_tlb_batched_pending() is used when we need to change the page table (e.g., munmap()) in parallel with TLB flush batching (e.g., try_to_unmap()). The actual TLB flushing part for set_tlb_ubc_flush_pending()->arch_tlbbatch_add_pending() is try_to_unmap_flush()->arch_tlbbatch_flush().
Thanks for the pointer. I must have got arch_tlbbatch_flush() and arch_flush_tlb_batched_pending() crossed at some point.
- Alistair
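The trade-off discussed above, where one full invalidation can be cheaper than many ranged commands on some IOMMUs, can be illustrated with a small heuristic. This is only a sketch under stated assumptions: the `sketch_*` names and the 512-page cut-over are hypothetical and not taken from any driver, and a real implementation would derive the threshold from the hardware. The page count is computed without adding to the byte length, so a [0, -1UL) range cannot wrap.

```c
#include <assert.h>

/* Assumed 4 KiB pages; the threshold below is a made-up example value. */
#define SKETCH_PAGE_SHIFT       12
#define SKETCH_PAGE_MASK        ((1UL << SKETCH_PAGE_SHIFT) - 1)
#define SKETCH_FULL_FLUSH_PAGES 512UL

enum sketch_flush_kind { SKETCH_FLUSH_RANGE, SKETCH_FLUSH_ALL };

/* Round the byte length up to whole pages without risking wrap-around. */
unsigned long sketch_pages(unsigned long start, unsigned long end)
{
	unsigned long bytes = end - start;

	return (bytes >> SKETCH_PAGE_SHIFT) + ((bytes & SKETCH_PAGE_MASK) != 0);
}

/* Pick a full flush once the range is larger than the cut-over point. */
enum sketch_flush_kind sketch_pick_flush(unsigned long start, unsigned long end)
{
	return sketch_pages(start, end) > SKETCH_FULL_FLUSH_PAGES ?
		SKETCH_FLUSH_ALL : SKETCH_FLUSH_RANGE;
}

/* Sanity checks for the sketch; call this from a small main() to run them. */
void sketch_self_check(void)
{
	/* A small range stays a ranged invalidation. */
	assert(sketch_pick_flush(0x1000, 0x3000) == SKETCH_FLUSH_RANGE);
	/* Exactly at the threshold is still ranged; one page more is not. */
	assert(sketch_pick_flush(0, 512UL << SKETCH_PAGE_SHIFT) == SKETCH_FLUSH_RANGE);
	assert(sketch_pick_flush(0, 513UL << SKETCH_PAGE_SHIFT) == SKETCH_FLUSH_ALL);
	/* The batched-mode range [0, -1UL) always becomes a full flush. */
	assert(sketch_pick_flush(0, -1UL) == SKETCH_FLUSH_ALL);
}
```

Whether such a cut-over helps depends on the relative cost of ranged and full invalidation commands on the IOMMU in question, which is exactly the open point in this thread.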
From: Lu Baolu baolu.lu@linux.intel.com Sent: Friday, November 17, 2023 5:10 PM
+static void intel_flush_svm_all(struct intel_svm *svm)
+{
+	struct device_domain_info *info;
+	struct intel_svm_dev *sdev;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdev, &svm->devs, list) {
+		info = dev_iommu_priv_get(sdev->dev);
+
+		qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 1);
Why set 'ih' to 1, which skips invalidating the page structure caches?
On 11/20/23 11:45 AM, Tian, Kevin wrote:
Why set 'ih' to 1, which skips invalidating the page structure caches?
It should be set to '0'. Good catch! Thank you!
Best regards, baolu
linux-stable-mirror@lists.linaro.org