The translation caches may preserve obsolete data when the mapping size is changed. Consider the following sequence, which can reveal the problem with high probability:

1. mmap(4GB, MAP_HUGETLB)
2. while (1) {
       (a) DMA MAP   0, 0xa0000
       (b) DMA UNMAP 0, 0xa0000
       (c) DMA MAP   0, 0xc0000000
           * DMA read of IOVA 0 may fail here (Not present)
           * if the problem occurs.
       (d) DMA UNMAP 0, 0xc0000000
   }
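For concreteness, the sequence above can be driven from userspace roughly as sketched below. The report does not say which interface issues the DMA MAP/UNMAP calls, so this is only a hedged illustration assuming a VFIO type1 container; the container/group/device setup is omitted, and the helper names (dma_map, dma_unmap, stress) are made up for the sketch.

/*
 * Reproducer sketch for the MAP/UNMAP sequence in the commit message.
 * Assumes "container" is an already-configured VFIO type1 container fd
 * (opening /dev/vfio/vfio, attaching the group, VFIO_SET_IOMMU are omitted).
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

static int dma_map(int container, void *vaddr, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uintptr_t)vaddr,
        .iova  = iova,
        .size  = size,
    };
    return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}

static int dma_unmap(int container, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_unmap unmap = {
        .argsz = sizeof(unmap),
        .iova  = iova,
        .size  = size,
    };
    return ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
}

static void stress(int container)
{
    /* 1. mmap(4GB, MAP_HUGETLB): hugepage backing lets the driver use superpages. */
    void *buf = mmap(NULL, 4ULL << 30, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED)
        return;

    for (;;) {
        dma_map(container, buf, 0, 0xa0000);      /* (a) small (4K) mappings only      */
        dma_unmap(container, 0, 0xa0000);         /* (b) PTEs cleared, PDE left present */
        dma_map(container, buf, 0, 0xc0000000);   /* (c) 2M superpage; a DMA read of
                                                         IOVA 0 may fail here if the
                                                         stale PDE is still cached     */
        dma_unmap(container, 0, 0xc0000000);      /* (d) */
    }
}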
The page table (only focusing on IOVA 0) after (a) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x1a30a72003   entry: 0xffff89b39cacb000
  PTE:  0x21d200803    entry: 0xffff89b3b0a72000

The page table after (b) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x1a30a72003   entry: 0xffff89b39cacb000
  PTE:  0x0            entry: 0xffff89b3b0a72000

The page table after (c) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x21d200883    entry: 0xffff89b39cacb000 (*)
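For reference, the raw entry values above can be decoded with the bit layout the intel-iommu driver uses for second-level entries (read = bit 0, write = bit 1, large page = bit 7, snoop = bit 11, address in the upper bits). The small standalone program below is only an illustration of that decoding, not driver code:

/* Standalone decoder for the raw entry values in the dumps above.
 * Bit positions mirror the intel-iommu driver's second-level PTE layout
 * (DMA_PTE_READ = bit 0, DMA_PTE_WRITE = bit 1, DMA_PTE_LARGE_PAGE = bit 7,
 * DMA_PTE_SNP = bit 11); illustration only.
 */
#include <stdio.h>
#include <stdint.h>

static void decode(const char *name, uint64_t val)
{
    printf("%s = %#llx: %s%s%s%s addr=%#llx\n", name,
           (unsigned long long)val,
           (val & 0x1) ? "R" : "-",
           (val & 0x2) ? "W" : "-",
           (val & (1ULL << 7)) ? " PS(superpage)" : "",
           (val & (1ULL << 11)) ? " SNP" : "",
           (unsigned long long)(val & ~0xfffULL));
}

int main(void)
{
    decode("PDE after (a)", 0x1a30a72003ULL);  /* RW, no PS: points to a PTE page  */
    decode("PTE after (a)", 0x21d200803ULL);   /* RW+SNP: 4K mapping of IOVA 0     */
    decode("PDE after (c)", 0x21d200883ULL);   /* RW+PS+SNP: 2M superpage mapping  */
    return 0;
}

Note that 0x1a30a72003 has no PS bit, so it points to a next-level PTE page, while 0x21d200883 has PS set and is a direct 2M translation to 0x21d200000, the same frame the old 4K PTE 0x21d200803 pointed at.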
Because the PDE entry after (b) is still present, it is not flushed even though the iommu driver flushes the cache on unmap, so obsolete data may be preserved in the cache, which eventually causes a wrong translation.

However, the PDE entry finally switches to a 2M-superpage mapping, and it does not transform to 0x21d200883 directly:

1. PDE: 0x1a30a72003
2. __domain_mapping:
     dma_pte_free_pagetable sets the PDE entry to ZERO,
     then the PDE entry is set to 0x21d200883

So we must flush the cache after the entry switches to ZERO, to prevent the obsolete info from being preserved.
Cc: David Woodhouse dwmw2@infradead.org
Cc: Lu Baolu baolu.lu@linux.intel.com
Cc: Nadav Amit nadav.amit@gmail.com
Cc: Alex Williamson alex.williamson@redhat.com
Cc: Kevin Tian kevin.tian@intel.com
Cc: Gonglei (Arei) arei.gonglei@huawei.com

Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: stable@vger.kernel.org # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@hua...
Suggested-by: Lu Baolu baolu.lu@linux.intel.com
Signed-off-by: Longpeng(Mike) longpeng2@huawei.com
---
 drivers/iommu/intel/iommu.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
              * removed to make room for superpage(s).
              * We're adding new large pages, so make sure
              * we don't remove their parent tables.
+             *
+             * We also need to flush the iotlb before creating
+             * superpage to ensure it does not preserve any
+             * obsolete info.
              */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            if (dma_pte_present(pte)) {
+                int i;
+
+                dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                                       largepage_lvl + 1);
+                for_each_domain_iommu(i, domain)
+                    iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                          iov_pfn, nr_pages, 0, 0);
+            }
         } else {
             pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
         }
Hi Longpeng,
On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
The translation caches may preserve obsolete data when the mapping size is changed. Consider the following sequence, which can reveal the problem with high probability:

1. mmap(4GB, MAP_HUGETLB)
2. while (1) {
       (a) DMA MAP   0, 0xa0000
       (b) DMA UNMAP 0, 0xa0000
       (c) DMA MAP   0, 0xc0000000
           * DMA read of IOVA 0 may fail here (Not present)
           * if the problem occurs.
       (d) DMA UNMAP 0, 0xc0000000
   }

The page table (only focusing on IOVA 0) after (a) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x1a30a72003   entry: 0xffff89b39cacb000
  PTE:  0x21d200803    entry: 0xffff89b3b0a72000

The page table after (b) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x1a30a72003   entry: 0xffff89b39cacb000
  PTE:  0x0            entry: 0xffff89b3b0a72000

The page table after (c) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x21d200883    entry: 0xffff89b39cacb000 (*)

Because the PDE entry after (b) is still present, it is not flushed even though the iommu driver flushes the cache on unmap, so obsolete data may be preserved in the cache, which eventually causes a wrong translation.

However, the PDE entry finally switches to a 2M-superpage mapping, and it does not transform to 0x21d200883 directly:

1. PDE: 0x1a30a72003
2. __domain_mapping:
     dma_pte_free_pagetable sets the PDE entry to ZERO,
     then the PDE entry is set to 0x21d200883

So we must flush the cache after the entry switches to ZERO, to prevent the obsolete info from being preserved.
Cc: David Woodhouse dwmw2@infradead.org
Cc: Lu Baolu baolu.lu@linux.intel.com
Cc: Nadav Amit nadav.amit@gmail.com
Cc: Alex Williamson alex.williamson@redhat.com
Cc: Kevin Tian kevin.tian@intel.com
Cc: Gonglei (Arei) arei.gonglei@huawei.com

Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: stable@vger.kernel.org # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@hua...
Suggested-by: Lu Baolu baolu.lu@linux.intel.com
Signed-off-by: Longpeng(Mike) longpeng2@huawei.com

 drivers/iommu/intel/iommu.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
              * removed to make room for superpage(s).
              * We're adding new large pages, so make sure
              * we don't remove their parent tables.
+             *
+             * We also need to flush the iotlb before creating
+             * superpage to ensure it does not preserve any
+             * obsolete info.
              */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            if (dma_pte_present(pte)) {
+                int i;
+
+                dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                                       largepage_lvl + 1);
+                for_each_domain_iommu(i, domain)
+                    iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                          iov_pfn, nr_pages, 0, 0);
Thanks for the patch!
How about making the flushed page size accurate? For example,
@@ -2365,8 +2365,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
                 dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
                                        largepage_lvl + 1);
                 for_each_domain_iommu(i, domain)
-                    iommu_flush_iotlb_psi(g_iommus[i], domain,
-                                          iov_pfn, nr_pages, 0, 0);
+                    iommu_flush_iotlb_psi(g_iommus[i], domain, iov_pfn,
+                                          ALIGN_DOWN(nr_pages, lvl_pages), 0, 0);

+            }
         } else {
             pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
         }
Best regards, baolu
Hi Baolu,
On 2021/4/2 11:06, Lu Baolu wrote:
Hi Longpeng,
On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
The translation caches may preserve obsolete data when the mapping size is changed. Consider the following sequence, which can reveal the problem with high probability:

1. mmap(4GB, MAP_HUGETLB)
2. while (1) {
       (a) DMA MAP   0, 0xa0000
       (b) DMA UNMAP 0, 0xa0000
       (c) DMA MAP   0, 0xc0000000
           * DMA read of IOVA 0 may fail here (Not present)
           * if the problem occurs.
       (d) DMA UNMAP 0, 0xc0000000
   }

The page table (only focusing on IOVA 0) after (a) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x1a30a72003   entry: 0xffff89b39cacb000
  PTE:  0x21d200803    entry: 0xffff89b3b0a72000

The page table after (b) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x1a30a72003   entry: 0xffff89b39cacb000
  PTE:  0x0            entry: 0xffff89b3b0a72000

The page table after (c) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x21d200883    entry: 0xffff89b39cacb000 (*)

Because the PDE entry after (b) is still present, it is not flushed even though the iommu driver flushes the cache on unmap, so obsolete data may be preserved in the cache, which eventually causes a wrong translation.

However, the PDE entry finally switches to a 2M-superpage mapping, and it does not transform to 0x21d200883 directly:

1. PDE: 0x1a30a72003
2. __domain_mapping:
     dma_pte_free_pagetable sets the PDE entry to ZERO,
     then the PDE entry is set to 0x21d200883

So we must flush the cache after the entry switches to ZERO, to prevent the obsolete info from being preserved.
Cc: David Woodhouse dwmw2@infradead.org
Cc: Lu Baolu baolu.lu@linux.intel.com
Cc: Nadav Amit nadav.amit@gmail.com
Cc: Alex Williamson alex.williamson@redhat.com
Cc: Kevin Tian kevin.tian@intel.com
Cc: Gonglei (Arei) arei.gonglei@huawei.com

Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: stable@vger.kernel.org # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@hua...
Suggested-by: Lu Baolu baolu.lu@linux.intel.com
Signed-off-by: Longpeng(Mike) longpeng2@huawei.com

 drivers/iommu/intel/iommu.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
              * removed to make room for superpage(s).
              * We're adding new large pages, so make sure
              * we don't remove their parent tables.
+             *
+             * We also need to flush the iotlb before creating
+             * superpage to ensure it does not preserve any
+             * obsolete info.
              */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            if (dma_pte_present(pte)) {
+                int i;
+
+                dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                                       largepage_lvl + 1);
+                for_each_domain_iommu(i, domain)
+                    iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                          iov_pfn, nr_pages, 0, 0);
Thanks for the patch!
How about making the flushed page size accurate? For example,
@@ -2365,8 +2365,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
                 dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
                                        largepage_lvl + 1);
                 for_each_domain_iommu(i, domain)
-                    iommu_flush_iotlb_psi(g_iommus[i], domain,
-                                          iov_pfn, nr_pages, 0, 0);
+                    iommu_flush_iotlb_psi(g_iommus[i], domain, iov_pfn,
+                                          ALIGN_DOWN(nr_pages, lvl_pages), 0, 0);
Yes, that makes sense.
Maybe another alternative is 'end_pfn - iov_pfn + 1'; it's readable because we free the page table over (iov_pfn, end_pfn) just above. Which one do you prefer?
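For what it's worth, the two expressions give the same number of pages, since end_pfn is derived from nr_superpages = nr_pages / lvl_pages in the current code. A tiny standalone check, not kernel code, with ALIGN_DOWN re-expressed locally and the values taken from the 3GB example in the commit message:

/* Quick check that ALIGN_DOWN(nr_pages, lvl_pages) equals end_pfn - iov_pfn + 1,
 * given end_pfn as computed in the original __domain_mapping(); illustration only.
 */
#include <assert.h>
#include <stdio.h>

#define ALIGN_DOWN(x, a)  ((x) / (a) * (a))  /* same rounding as the kernel macro for power-of-two 'a' */

int main(void)
{
    unsigned long iov_pfn = 0;
    unsigned long nr_pages = 0xc0000000UL >> 12;  /* 0xc0000 4K pages               */
    unsigned long lvl_pages = 1UL << 9;           /* 512 pages per 2M superpage      */

    unsigned long nr_superpages = nr_pages / lvl_pages;
    unsigned long end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;

    assert(ALIGN_DOWN(nr_pages, lvl_pages) == end_pfn - iov_pfn + 1);
    printf("flush size = %#lx pfns either way\n", end_pfn - iov_pfn + 1);
    return 0;
}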
+            }
         } else {
             pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
         }
Best regards, baolu
On 4/2/21 11:41 AM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
Hi Baolu,
On 2021/4/2 11:06, Lu Baolu wrote:
Hi Longpeng,
On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
The translation caches may preserve obsolete data when the mapping size is changed. Consider the following sequence, which can reveal the problem with high probability:

1. mmap(4GB, MAP_HUGETLB)
2. while (1) {
       (a) DMA MAP   0, 0xa0000
       (b) DMA UNMAP 0, 0xa0000
       (c) DMA MAP   0, 0xc0000000
           * DMA read of IOVA 0 may fail here (Not present)
           * if the problem occurs.
       (d) DMA UNMAP 0, 0xc0000000
   }

The page table (only focusing on IOVA 0) after (a) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x1a30a72003   entry: 0xffff89b39cacb000
  PTE:  0x21d200803    entry: 0xffff89b3b0a72000

The page table after (b) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x1a30a72003   entry: 0xffff89b39cacb000
  PTE:  0x0            entry: 0xffff89b3b0a72000

The page table after (c) is:
  PML4: 0x19db5c1003   entry: 0xffff899bdcd2f000
  PDPE: 0x1a1cacb003   entry: 0xffff89b35b5c1000
  PDE:  0x21d200883    entry: 0xffff89b39cacb000 (*)

Because the PDE entry after (b) is still present, it is not flushed even though the iommu driver flushes the cache on unmap, so obsolete data may be preserved in the cache, which eventually causes a wrong translation.

However, the PDE entry finally switches to a 2M-superpage mapping, and it does not transform to 0x21d200883 directly:

1. PDE: 0x1a30a72003
2. __domain_mapping:
     dma_pte_free_pagetable sets the PDE entry to ZERO,
     then the PDE entry is set to 0x21d200883

So we must flush the cache after the entry switches to ZERO, to prevent the obsolete info from being preserved.
Cc: David Woodhouse dwmw2@infradead.org
Cc: Lu Baolu baolu.lu@linux.intel.com
Cc: Nadav Amit nadav.amit@gmail.com
Cc: Alex Williamson alex.williamson@redhat.com
Cc: Kevin Tian kevin.tian@intel.com
Cc: Gonglei (Arei) arei.gonglei@huawei.com

Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: stable@vger.kernel.org # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@hua...
Suggested-by: Lu Baolu baolu.lu@linux.intel.com
Signed-off-by: Longpeng(Mike) longpeng2@huawei.com

 drivers/iommu/intel/iommu.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
              * removed to make room for superpage(s).
              * We're adding new large pages, so make sure
              * we don't remove their parent tables.
+             *
+             * We also need to flush the iotlb before creating
+             * superpage to ensure it does not preserve any
+             * obsolete info.
              */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            if (dma_pte_present(pte)) {
+                int i;
+
+                dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                                       largepage_lvl + 1);
+                for_each_domain_iommu(i, domain)
+                    iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                          iov_pfn, nr_pages, 0, 0);
Thanks for the patch!
How about making the flushed page size accurate? For example,
@@ -2365,8 +2365,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
                 dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
                                        largepage_lvl + 1);
                 for_each_domain_iommu(i, domain)
-                    iommu_flush_iotlb_psi(g_iommus[i], domain,
-                                          iov_pfn, nr_pages, 0, 0);
+                    iommu_flush_iotlb_psi(g_iommus[i], domain, iov_pfn,
+                                          ALIGN_DOWN(nr_pages, lvl_pages), 0, 0);
Yes, that makes sense.
Maybe another alternative is 'end_pfn - iov_pfn + 1'; it's readable because we free the page table over (iov_pfn, end_pfn) just above. Which one do you prefer?
Yours looks better.
By the way, if you are willing to prepare a v2, please make sure to add Joerg (IOMMU subsystem maintainer) to the list.
Best regards, baolu
Hi Longpeng,
On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
              * removed to make room for superpage(s).
              * We're adding new large pages, so make sure
              * we don't remove their parent tables.
+             *
+             * We also need to flush the iotlb before creating
+             * superpage to ensure it does not preserve any
+             * obsolete info.
              */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            if (dma_pte_present(pte)) {
The dma_pte_free_pagetable() clears a batch of PTEs. So checking current PTE is insufficient. How about removing this check and always performing cache invalidation?
+                int i;
+
+                dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                                       largepage_lvl + 1);
+                for_each_domain_iommu(i, domain)
+                    iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                          iov_pfn, nr_pages, 0, 0);
Best regards, baolu
Hi Baolu,
-----Original Message-----
From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
Sent: Friday, April 2, 2021 12:44 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) longpeng2@huawei.com; iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
Cc: baolu.lu@linux.intel.com; David Woodhouse dwmw2@infradead.org; Nadav Amit nadav.amit@gmail.com; Alex Williamson alex.williamson@redhat.com; Kevin Tian kevin.tian@intel.com; Gonglei (Arei) arei.gonglei@huawei.com; stable@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
Hi Longpeng,
On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
              * removed to make room for superpage(s).
              * We're adding new large pages, so make sure
              * we don't remove their parent tables.
+             *
+             * We also need to flush the iotlb before creating
+             * superpage to ensure it does not preserve any
+             * obsolete info.
              */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            if (dma_pte_present(pte)) {
The dma_pte_free_pagetable() clears a batch of PTEs. So checking current PTE is insufficient. How about removing this check and always performing cache invalidation?
Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or NOT present (e.g. creating a totally new superpage mapping), but we only need to call free_pagetable and flush_iotlb in the former case, right?
+                int i;
+
+                dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                                       largepage_lvl + 1);
+                for_each_domain_iommu(i, domain)
+                    iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                          iov_pfn, nr_pages, 0, 0);
Best regards, baolu
Hi Longpeng,
On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
Hi Baolu,
-----Original Message-----
From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
Sent: Friday, April 2, 2021 12:44 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) longpeng2@huawei.com; iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
Cc: baolu.lu@linux.intel.com; David Woodhouse dwmw2@infradead.org; Nadav Amit nadav.amit@gmail.com; Alex Williamson alex.williamson@redhat.com; Kevin Tian kevin.tian@intel.com; Gonglei (Arei) arei.gonglei@huawei.com; stable@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
Hi Longpeng,
On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
              * removed to make room for superpage(s).
              * We're adding new large pages, so make sure
              * we don't remove their parent tables.
+             *
+             * We also need to flush the iotlb before creating
+             * superpage to ensure it does not preserve any
+             * obsolete info.
              */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            if (dma_pte_present(pte)) {
The dma_pte_free_pagetable() clears a batch of PTEs. So checking current PTE is insufficient. How about removing this check and always performing cache invalidation?
Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or NOT present (e.g. creating a totally new superpage mapping), but we only need to call free_pagetable and flush_iotlb in the former case, right?
But this code covers multiple PTEs and perhaps crosses the page boundary.
How about moving this code into a separate function and checking PTE presence there? Sample code could look like below: [compiled but not tested!]
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d334f5b4e382..0e04d450c38a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
     return level;
 }

+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+                                 unsigned long start_pfn,
+                                 unsigned long end_pfn, int level)
+{
+    unsigned long lvl_pages = lvl_to_nr_pages(level);
+    struct dma_pte *pte = NULL;
+    int i;
+
+    while (start_pfn <= end_pfn) {
+        if (!pte)
+            pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+        if (dma_pte_present(pte)) {
+            dma_pte_free_pagetable(domain, start_pfn,
+                                   start_pfn + lvl_pages - 1,
+                                   level + 1);
+
+            for_each_domain_iommu(i, domain)
+                iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                      start_pfn, lvl_pages,
+                                      0, 0);
+        }
+
+        pte++;
+        start_pfn += lvl_pages;
+        if (first_pte_in_page(pte))
+            pte = NULL;
+    }
+}
+
 static int
 __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
                  unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2341,22 +2376,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
             return -ENOMEM;
         /* It is large page*/
         if (largepage_lvl > 1) {
-            unsigned long nr_superpages, end_pfn;
+            unsigned long end_pfn;

             pteval |= DMA_PTE_LARGE_PAGE;
-            lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
-            nr_superpages = nr_pages / lvl_pages;
-            end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
-            /*
-             * Ensure that old small page tables are
-             * removed to make room for superpage(s).
-             * We're adding new large pages, so make sure
-             * we don't remove their parent tables.
-             */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+            switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
         } else {
             pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
         }
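As a sanity check on the new end_pfn computation, here is a tiny standalone program (not driver code) that mirrors level_mask()/lvl_to_nr_pages() under the driver's 9-bits-per-level page-table layout and plugs in the 3GB mapping from the report; the local macro definitions are assumptions made only for the illustration:

/* Standalone arithmetic check of the proposed end_pfn computation.
 * level_mask()/lvl_to_nr_pages() are re-expressed locally assuming a
 * 9-bits-per-level layout; values mirror the report (IOVA 0, 3GB mapping,
 * 2M superpages, i.e. largepage_lvl == 2).
 */
#include <stdio.h>

#define LEVEL_STRIDE            9
#define level_to_offset_bits(l) (((l) - 1) * LEVEL_STRIDE)
#define level_mask(l)           (~0UL << level_to_offset_bits(l))
#define lvl_to_nr_pages(l)      (1UL << level_to_offset_bits(l))

int main(void)
{
    unsigned long iov_pfn = 0;
    unsigned long nr_pages = 0xc0000000UL >> 12;  /* 0xc0000 4K pages   */
    int largepage_lvl = 2;                        /* 2M superpage level */

    /* Last pfn covered by whole superpages, as in the proposed hunk. */
    unsigned long end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
    unsigned long lvl_pages = lvl_to_nr_pages(largepage_lvl);

    printf("end_pfn = %#lx, superpage steps = %lu\n",
           end_pfn, (end_pfn - iov_pfn + 1) / lvl_pages);  /* 0xbffff, 1536 */
    return 0;
}

With iov_pfn = 0 and nr_pages = 0xc0000 this yields end_pfn = 0xbffff, i.e. switch_to_super_page() takes 1536 2M-sized steps and only frees/flushes the ranges whose PDE was actually present.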
I will send you the diff patch off list. Any thoughts?
Best regards, baolu
+                int i;
+
+                dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                                       largepage_lvl + 1);
+                for_each_domain_iommu(i, domain)
+                    iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                          iov_pfn, nr_pages, 0, 0);
Best regards, baolu
Hi Baolu,
-----Original Message-----
From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
Sent: Thursday, April 8, 2021 12:32 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) longpeng2@huawei.com; iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
Cc: baolu.lu@linux.intel.com; David Woodhouse dwmw2@infradead.org; Nadav Amit nadav.amit@gmail.com; Alex Williamson alex.williamson@redhat.com; Kevin Tian kevin.tian@intel.com; Gonglei (Arei) arei.gonglei@huawei.com; stable@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
Hi Longpeng,
On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
Hi Baolu,
-----Original Message-----
From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
Sent: Friday, April 2, 2021 12:44 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) longpeng2@huawei.com; iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
Cc: baolu.lu@linux.intel.com; David Woodhouse dwmw2@infradead.org; Nadav Amit nadav.amit@gmail.com; Alex Williamson alex.williamson@redhat.com; Kevin Tian kevin.tian@intel.com; Gonglei (Arei) arei.gonglei@huawei.com; stable@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
Hi Longpeng,
On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
              * removed to make room for superpage(s).
              * We're adding new large pages, so make sure
              * we don't remove their parent tables.
+             *
+             * We also need to flush the iotlb before creating
+             * superpage to ensure it does not preserve any
+             * obsolete info.
              */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            if (dma_pte_present(pte)) {
The dma_pte_free_pagetable() clears a batch of PTEs. So checking current PTE is insufficient. How about removing this check and always performing cache invalidation?
Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or NOT present (e.g. creating a totally new superpage mapping), but we only need to call free_pagetable and flush_iotlb in the former case, right?
But this code covers multiple PTEs and perhaps crosses the page boundary.
How about moving this code into a separate function and checking PTE presence there? Sample code could look like below: [compiled but not tested!]
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d334f5b4e382..0e04d450c38a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
     return level;
 }
+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+                                 unsigned long start_pfn,
+                                 unsigned long end_pfn, int level)
+{
Maybe "switch_to" will lead people to think "remove old and then set up new", so how about something like "remove_room_for_super_page" or "prepare_for_super_page"?
+    unsigned long lvl_pages = lvl_to_nr_pages(level);
+    struct dma_pte *pte = NULL;
+    int i;
+
+    while (start_pfn <= end_pfn) {
start_pfn < end_pfn ?
+        if (!pte)
+            pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+        if (dma_pte_present(pte)) {
+            dma_pte_free_pagetable(domain, start_pfn,
+                                   start_pfn + lvl_pages - 1,
+                                   level + 1);
+
+            for_each_domain_iommu(i, domain)
+                iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                      start_pfn, lvl_pages,
+                                      0, 0);
+        }
+
+        pte++;
+        start_pfn += lvl_pages;
+        if (first_pte_in_page(pte))
+            pte = NULL;
+    }
+}
 static int
 __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
                  unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2341,22 +2376,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
             return -ENOMEM;
         /* It is large page*/
         if (largepage_lvl > 1) {
-            unsigned long nr_superpages, end_pfn;
+            unsigned long end_pfn;

             pteval |= DMA_PTE_LARGE_PAGE;
-            lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
-            nr_superpages = nr_pages / lvl_pages;
-            end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
-            /*
-             * Ensure that old small page tables are
-             * removed to make room for superpage(s).
-             * We're adding new large pages, so make sure
-             * we don't remove their parent tables.
-             */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+            switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
         } else {
             pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
         }
I will send you the diff patch off list. Any thoughts?
The solution looks good to me.
Feel free to send this patch yourself if you want; I'll send a v2 if you have no plan to send it :)
Best regards, baolu
+                int i;
+
+                dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                                       largepage_lvl + 1);
+                for_each_domain_iommu(i, domain)
+                    iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                          iov_pfn, nr_pages, 0, 0);
Best regards, baolu
Hi Longpeng,
On 4/8/21 3:37 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
Hi Baolu,
-----Original Message-----
From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
Sent: Thursday, April 8, 2021 12:32 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) longpeng2@huawei.com; iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
Cc: baolu.lu@linux.intel.com; David Woodhouse dwmw2@infradead.org; Nadav Amit nadav.amit@gmail.com; Alex Williamson alex.williamson@redhat.com; Kevin Tian kevin.tian@intel.com; Gonglei (Arei) arei.gonglei@huawei.com; stable@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
Hi Longpeng,
On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
Hi Baolu,
-----Original Message-----
From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
Sent: Friday, April 2, 2021 12:44 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) longpeng2@huawei.com; iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
Cc: baolu.lu@linux.intel.com; David Woodhouse dwmw2@infradead.org; Nadav Amit nadav.amit@gmail.com; Alex Williamson alex.williamson@redhat.com; Kevin Tian kevin.tian@intel.com; Gonglei (Arei) arei.gonglei@huawei.com; stable@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
Hi Longpeng,
On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
              * removed to make room for superpage(s).
              * We're adding new large pages, so make sure
              * we don't remove their parent tables.
+             *
+             * We also need to flush the iotlb before creating
+             * superpage to ensure it does not preserve any
+             * obsolete info.
              */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            if (dma_pte_present(pte)) {
The dma_pte_free_pagetable() clears a batch of PTEs. So checking current PTE is insufficient. How about removing this check and always performing cache invalidation?
Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or NOT present (e.g. creating a totally new superpage mapping), but we only need to call free_pagetable and flush_iotlb in the former case, right?
But this code covers multiple PTEs and perhaps crosses the page boundary.
How about moving this code into a separate function and checking PTE presence there? Sample code could look like below: [compiled but not tested!]
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d334f5b4e382..0e04d450c38a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
     return level;
 }
+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+                                 unsigned long start_pfn,
+                                 unsigned long end_pfn, int level)
+{
Maybe "switch_to" will lead people to think "remove old and then set up new", so how about something like "remove_room_for_super_page" or "prepare_for_super_page"?
I named it like this because we also want to have an opposite operation, split_from_super_page(), which switches a PDE or PDPE from a superpage setup back to small pages; that is needed to optimize dirty bit tracking during VM live migration.
+    unsigned long lvl_pages = lvl_to_nr_pages(level);
+    struct dma_pte *pte = NULL;
+    int i;
+
+    while (start_pfn <= end_pfn) {
start_pfn < end_pfn ?
end_pfn is inclusive.
+        if (!pte)
+            pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+        if (dma_pte_present(pte)) {
+            dma_pte_free_pagetable(domain, start_pfn,
+                                   start_pfn + lvl_pages - 1,
+                                   level + 1);
+
+            for_each_domain_iommu(i, domain)
+                iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                      start_pfn, lvl_pages,
+                                      0, 0);
+        }
+
+        pte++;
+        start_pfn += lvl_pages;
+        if (first_pte_in_page(pte))
+            pte = NULL;
+    }
+}
 static int
 __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
                  unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2341,22 +2376,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
             return -ENOMEM;
         /* It is large page*/
         if (largepage_lvl > 1) {
-            unsigned long nr_superpages, end_pfn;
+            unsigned long end_pfn;

             pteval |= DMA_PTE_LARGE_PAGE;
-            lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
-            nr_superpages = nr_pages / lvl_pages;
-            end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
-            /*
-             * Ensure that old small page tables are
-             * removed to make room for superpage(s).
-             * We're adding new large pages, so make sure
-             * we don't remove their parent tables.
-             */
-            dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-                                   largepage_lvl + 1);
+            end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+            switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
         } else {
             pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
         }
I will send you the diff patch off list. Any thoughts?
The solution looks good to me.
Feel free to send this patch yourself if you want; I'll send a v2 if you have no plan to send it :)
Please go ahead with a new version. Thank you for catching this and managing to fix it.
Best regards, baolu
Best regards, baolu
+                int i;
+
+                dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+                                       largepage_lvl + 1);
+                for_each_domain_iommu(i, domain)
+                    iommu_flush_iotlb_psi(g_iommus[i], domain,
+                                          iov_pfn, nr_pages, 0, 0);
Best regards, baolu