This reverts commit 24cd0b9bfdff126c066032b0d40ab0962d35e777.
1) commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search fails") tries to fix IOVA allocation failing while there is still free space available. This is not backported to 5.10 stable.
2) commit fce54ed02757 ("scsi: hisi_sas: Limit max hw sectors for v3 HW") fixes the performance regression introduced by 1); however, this is just a temporary solution and will cause an I/O performance regression because it limits the max I/O size to PAGE_SIZE * 32 (128k for a 4k page size).
3) John Garry posted a patchset to fix the problem.
4) The temporary solution is reverted.
It's weird that the patch in 2) was backported to 5.10 stable alone; the right thing to do is to backport them all together.
Signed-off-by: Yu Kuai yukuai3@huawei.com
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index dfe7e6370d84..cd41dc061d87 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2738,7 +2738,6 @@ static int slave_configure_v3_hw(struct scsi_device *sdev)
 	struct hisi_hba *hisi_hba = shost_priv(shost);
 	struct device *dev = hisi_hba->dev;
 	int ret = sas_slave_configure(sdev);
-	unsigned int max_sectors;
 
 	if (ret)
 		return ret;
@@ -2756,12 +2755,6 @@ static int slave_configure_v3_hw(struct scsi_device *sdev)
 		}
 	}
 
-	/* Set according to IOMMU IOVA caching limit */
-	max_sectors = min_t(size_t, queue_max_hw_sectors(sdev->request_queue),
-			    (PAGE_SIZE * 32) >> SECTOR_SHIFT);
-
-	blk_queue_max_hw_sectors(sdev->request_queue, max_sectors);
-
 	return 0;
 }
 
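For reference, the cap removed by this diff works out as follows. This is a user-space sketch, not kernel code: it assumes 4 KiB pages and 512-byte sectors, and the EX_ macros and ex_ helpers are hypothetical stand-ins for PAGE_SIZE, SECTOR_SHIFT and queue_max_hw_sectors().

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages, 512-byte sectors. */
#define EX_PAGE_SIZE    4096UL
#define EX_SECTOR_SHIFT 9

/*
 * User-space mirror of the removed expression
 *   min_t(size_t, queue_max_hw_sectors(q), (PAGE_SIZE * 32) >> SECTOR_SHIFT)
 * where hw_max_sectors stands in for queue_max_hw_sectors(q).
 */
static size_t ex_capped_max_sectors(size_t hw_max_sectors)
{
	size_t iova_cap = (EX_PAGE_SIZE * 32) >> EX_SECTOR_SHIFT; /* 256 sectors */

	assert(hw_max_sectors > 0);
	return hw_max_sectors < iova_cap ? hw_max_sectors : iova_cap;
}

/* Sectors to KiB, the unit used by the max_sectors_kb sysfs attribute. */
static size_t ex_sectors_to_kib(size_t sectors)
{
	return (sectors << EX_SECTOR_SHIFT) / 1024;
}
```

With any controller limit above 256 sectors the result is 256 sectors, i.e. 128 KiB, which is the 128k figure in the description.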
On 27/09/2022 14:01, Yu Kuai wrote:
This reverts commit 24cd0b9bfdff126c066032b0d40ab0962d35e777.
1) commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search fails") tries to fix IOVA allocation failing while there is still free space available. This is not backported to 5.10 stable.
This arrived in 5.11, I think
2) commit fce54ed02757 ("scsi: hisi_sas: Limit max hw sectors for v3 HW") fixes the performance regression introduced by 1); however, this is just a temporary solution and will cause an I/O performance regression because it limits the max I/O size to PAGE_SIZE * 32 (128k for a 4k page size).
Did you really notice a performance regression? In what scenario? Which kernel versions?
3) John Garry posted a patchset to fix the problem.
4) The temporary solution is reverted.
It's weird that the patch in 2) was backported to 5.10 stable alone; the right thing to do is to backport them all together.
I would tend to agree. I did not notice that fce54ed02757 had been backported at all, but I did consider backporting it to address 4e89dce72521. Anyway, the proper solution is merged for 6.0 in 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit"), and I have a revert of "scsi: hisi_sas: Limit max hw sectors for v3 HW" queued for 6.1, but I would not plan on reverting it for stable.
Please let me know if there is any issue here.
Thanks, John
Hi, John
On 2022/09/27 21:06, John Garry wrote:
Did you really notice a performance regression? In what scenario? Which kernel versions?
We are using 5.10. The test tool is fs_mark, which does writeback and benefits from I/O merging: before this patch avgqusz is 300+, and with this patch avgqusz is limited to 128.
I think that in any other case where the I/O size is greater than 128k, this patch will probably cause problems as well.
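The merging point can be illustrated with a deliberately simplified model (not the real blk-mq merge code): contiguous writes are back-merged only while the result stays within the per-request limit. struct ex_req and ex_merge() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified write request: start sector and length. */
struct ex_req {
	unsigned long long sector;
	unsigned int nr_sectors;
};

/*
 * Back-merge a stream of writes (in submission order) into requests of at
 * most max_sectors each. A write is merged only if it starts exactly where
 * the previous request ends and the merged size still fits the limit.
 * out[] must have room for n entries; returns the number of requests.
 */
static size_t ex_merge(const struct ex_req *in, size_t n,
		       unsigned int max_sectors, struct ex_req *out)
{
	size_t nr_out = 0;

	assert(max_sectors > 0);
	for (size_t i = 0; i < n; i++) {
		struct ex_req *last = nr_out ? &out[nr_out - 1] : NULL;

		if (last && last->sector + last->nr_sectors == in[i].sector &&
		    last->nr_sectors + in[i].nr_sectors <= max_sectors)
			last->nr_sectors += in[i].nr_sectors; /* back merge */
		else
			out[nr_out++] = in[i]; /* start a new request */
	}
	return nr_out;
}
```

In this model, with max_sectors = 256 (the 128k cap), 64 contiguous 4 KiB writes (8 sectors each) end up as two 128 KiB requests instead of one 256 KiB request.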
Thanks, Kuai
On 27/09/2022 14:14, Yu Kuai wrote:
We are using 5.10. The test tool is fs_mark, which does writeback and benefits from I/O merging: before this patch avgqusz is 300+, and with this patch avgqusz is limited to 128.
OK, so I think it's OK to revert it for 5.10.
I think that in any other case where the I/O size is greater than 128k, this patch will probably cause problems as well.
However, both 5.15 stable and 5.19 mainline include fce54ed02757; it was automatically backported to 5.15 stable. Please double-check that.
Can you also check performance on those kernels?
The reason we had fce54ed02757 was that 4e89dce72521 hammered performance with the IOMMU enabled, and at least I saw no performance regression from fce54ed02757 in other scenarios.
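The "IOMMU IOVA caching limit" named in the removed comment can be sketched as follows. This assumes, as in drivers/iommu/iova.c of that era, that the IOVA range cache only serves allocations whose page count, rounded up to a power of two, is below 2^IOVA_RANGE_CACHE_MAX_SIZE with IOVA_RANGE_CACHE_MAX_SIZE = 6; treat both values as assumptions to check against your tree. The EX_ and ex_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration; verify against your kernel's iova.c. */
#define EX_PAGE_SIZE           4096UL
#define EX_RCACHE_MAX_LOG_SIZE 6 /* assumed log2 of max cached range, in pages */

/* Smallest order with (1 << order) >= n, like order_base_2() for n >= 1. */
static unsigned int ex_order_base_2(size_t n)
{
	unsigned int order = 0;

	assert(n > 0);
	while (((size_t)1 << order) < n)
		order++;
	return order;
}

/* Would a DMA mapping of `bytes` fit the (assumed) IOVA range cache? */
static bool ex_iova_cacheable(size_t bytes)
{
	size_t pages = (bytes + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE;

	return ex_order_base_2(pages) < EX_RCACHE_MAX_LOG_SIZE;
}
```

Under these assumptions a 128 KiB mapping (32 pages) is still cacheable while anything larger is not, which is consistent with capping I/O at PAGE_SIZE * 32.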
Thanks, John
Hi, John
On 2022/09/27 21:45, John Garry wrote:
However, both 5.15 stable and 5.19 mainline include fce54ed02757; it was automatically backported to 5.15 stable. Please double-check that.
Can you also check performance on those kernels?
I'm pretty sure I/O splitting can degrade performance, especially for HDDs, because blk-mq can't guarantee that split I/Os are dispatched to the disk sequentially. However, this is usually not an issue with a proper max_sectors_kb.
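A toy model of the splitting argument above: a hypothetical ex_split() cuts one large I/O into pieces of at most max_sectors, and ex_count_discontiguities() counts how many transitions in dispatch order are non-sequential. It does not model blk-mq itself; it only illustrates why split pieces dispatched out of order stop looking sequential to the disk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical piece produced by splitting one large I/O. */
struct ex_chunk {
	unsigned long long sector;
	unsigned int nr_sectors;
};

/*
 * Split [sector, sector + nr_sectors) into chunks of at most max_sectors.
 * Returns the number of chunks written to out[] (never more than max_out).
 */
static size_t ex_split(unsigned long long sector, unsigned int nr_sectors,
		       unsigned int max_sectors, struct ex_chunk *out,
		       size_t max_out)
{
	size_t n = 0;

	assert(max_sectors > 0);
	while (nr_sectors > 0 && n < max_out) {
		unsigned int len = nr_sectors < max_sectors ? nr_sectors : max_sectors;

		out[n].sector = sector;
		out[n].nr_sectors = len;
		n++;
		sector += len;
		nr_sectors -= len;
	}
	return n;
}

/*
 * Count transitions where a dispatched chunk does not start where the
 * previously dispatched chunk ended (a non-sequential access; a seek on HDD).
 */
static size_t ex_count_discontiguities(const struct ex_chunk *dispatched, size_t n)
{
	size_t breaks = 0;

	for (size_t i = 1; i < n; i++)
		if (dispatched[i - 1].sector + dispatched[i - 1].nr_sectors !=
		    dispatched[i].sector)
			breaks++;
	return breaks;
}
```

With a 256-sector (128k) limit a 1 MiB write becomes eight chunks. Dispatched in order they are fully sequential, but dispatching the second and third of three 128k pieces in reverse order already yields two non-sequential transitions.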
Here is an example showing that if max_sectors_kb is 128k, performance drops a lot under high concurrency:
https://lore.kernel.org/all/20220408073916.1428590-1-yukuai3@huawei.com/
Here I set max_sectors_kb to 128k manually, and 1m random I/O performance drops as I/O concurrency increases:
| numjobs | v5.18-rc1 |
| ------- | --------- |
| 1       | 67.7      |
| 2       | 67.7      |
| 4       | 67.7      |
| 8       | 67.7      |
| 16      | 64.8      |
| 32      | 59.8      |
| 64      | 54.9      |
| 128     | 49        |
| 256     | 37.7      |
| 512     | 31.8      |
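To put the table in relative terms, here is a small sketch that computes each row's drop against the numjobs = 1 baseline. The values are copied from the table; the mail does not state the throughput unit, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Rows of the table above; the throughput unit is not stated in the mail. */
static const unsigned int ex_numjobs[] = { 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 };
static const double ex_tput[] = { 67.7, 67.7, 67.7, 67.7, 64.8,
				  59.8, 54.9, 49.0, 37.7, 31.8 };

/*
 * Percentage drop versus the numjobs = 1 row, or -1.0 if numjobs is not
 * one of the measured values.
 */
static double ex_drop_pct(unsigned int numjobs)
{
	size_t rows = sizeof(ex_numjobs) / sizeof(ex_numjobs[0]);

	assert(rows == sizeof(ex_tput) / sizeof(ex_tput[0]));
	for (size_t i = 0; i < rows; i++)
		if (ex_numjobs[i] == numjobs)
			return (ex_tput[0] - ex_tput[i]) / ex_tput[0] * 100.0;
	return -1.0;
}
```

At numjobs = 512 this gives a drop of about 53% relative to numjobs = 1 (31.8 vs 67.7).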
Thanks, Kuai
On 27/09/2022 15:05, Yu Kuai wrote:
I'm pretty sure I/O splitting can degrade performance, especially for HDDs, because blk-mq can't guarantee that split I/Os are dispatched to the disk sequentially. However, this is usually not an issue with a proper max_sectors_kb.
Commit fce54ed02757 was to circumvent a terrible performance hit from 4e89dce72521 with the IOMMU enabled. Have you ever tested with the IOMMU enabled?
If fce54ed02757 really does cause a performance regression in some scenarios, then we can consider reverting it from any stable kernel and also backporting [0] once it is included in Linus' kernel.
[0] https://lore.kernel.org/linux-iommu/495de02c-59ce-917f-1cb4-5425a37063ed@hua...
thanks, John
Hi, John
On 2022/09/27 23:54, John Garry wrote:
Commit fce54ed02757 was to circumvent a terrible performance hit from 4e89dce72521 with the IOMMU enabled. Have you ever tested with the IOMMU enabled?
I understand that fce54ed02757 fixes a terrible performance regression, but I'm not familiar with the IOMMU and have never tested with it enabled.
If fce54ed02757 really does cause a performance regression in some scenarios, then we can consider reverting it from any stable kernel and also backporting [0] once it is included in Linus' kernel.
That sounds good.
For 5.10 stable, I think it's OK to revert it for now; if someone cares about the problem that 4e89dce72521 fixed, they can try to backport it together with the follow-up patches.
Thanks, Kuai
On 28/09/2022 02:35, Yu Kuai wrote:
Here is an example showing that if max_sectors_kb is 128k, performance drops a lot under high concurrency:
https://lore.kernel.org/all/20220408073916.1428590-1-yukuai3@huawei.com/
This never got merged in any form, right?
For 5.10 stable, I think it's OK to revert it for now; if someone cares about the problem that 4e89dce72521 fixed, they can try to backport it together with the follow-up patches.
For 5.10 stable revert only,
Reviewed-by: John Garry john.garry@huawei.com
Thanks, John
Hi,
On 2022/09/28 15:36, John Garry wrote:
https://lore.kernel.org/all/20220408073916.1428590-1-yukuai3@huawei.com/
This never got merged in any form, right?
Yes.
For 5.10 stable revert only,
Reviewed-by: John Garry john.garry@huawei.com
Thanks for the review!
Kuai
linux-stable-mirror@lists.linaro.org