After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int" value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative "short" value.
Use min_t(int, ...) to fix the issue.
Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue") Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui decui@microsoft.com Reviewed-by: Haiyang Zhang haiyangz@microsoft.com Reviewed-by: Ming Lei ming.lei@redhat.com ---
v1 tried to fix the issue by changing the storvsc driver: https://lwn.net/ml/linux-kernel/BYAPR21MB1270BBC14D5F1AE69FC31A16BFB09@BYAPR...
v2 directly fixed the scsi core change instead as Michael Kelley suggested (refer to the above link).
v3 simplified the commit log, as John Garry suggested. Added Haiyang's and Ming's Reviewed-by.
drivers/scsi/hosts.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index 3f6f14f0cafb..24b72ee4246f 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -220,7 +220,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, goto fail; }
- shost->cmd_per_lun = min_t(short, shost->cmd_per_lun, + /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */ + shost->cmd_per_lun = min_t(int, shost->cmd_per_lun, shost->can_queue);
error = scsi_init_sense_cache(shost);
On 08/10/2021 05:35, Dexuan Cui wrote:
After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int" value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative "short" value.
Use min_t(int, ...) to fix the issue.
Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue") Cc:stable@vger.kernel.org Signed-off-by: Dexuan Cuidecui@microsoft.com Reviewed-by: Haiyang Zhanghaiyangz@microsoft.com Reviewed-by: Ming Leiming.lei@redhat.com
Reviewed-by: John Garry john.garry@huawei.com
thanks
Dexuan,
After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int" value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative "short" value.
Use min_t(int, ...) to fix the issue.
I queued this up as a short term workaround. However, I am hoping that the rework of the scaling code in storvsc lands soon.
From: Martin K. Petersen martin.petersen@oracle.com Sent: Tuesday, October 12, 2021 9:42 AM
Dexuan,
After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int" value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative "short" value.
Use min_t(int, ...) to fix the issue.
I queued this up as a short term workaround. However, I am hoping that the rework of the scaling code in storvsc lands soon.
Thanks, Martin! I know Michael Kelley will improve the netvsc.
Regarding this patch, I'm not sure if it's a "workaround": if it's incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value, probably we should change scsi_driver.can_queue from "int" to "u16"? BTW, I guess the "cmd_per_lun" should also be "u16" rather than "short"?
This was discussed in May, and it looks like the conclusion was not clear to me: https://lwn.net/ml/linux-kernel/457d23a9-deb0-4ee1-fe7f-5a63605d9686@huawei....
Thanks, -- Dexuan
Dexuan,
Regarding this patch, I'm not sure if it's a "workaround": if it's incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value, probably we should change scsi_driver.can_queue from "int" to "u16"?
BTW, I guess the "cmd_per_lun" should also be "u16" rather than "short"?
I agree that it would be nice to get all this cleaned up. Several, somewhat peculiar, 25-year old design choices.
cmd_per_lun has traditionally been in the ballpark of low hundreds, can_queue typically in the low thousands. And the block layer currently caps at ~10K. Happy to take patches fixing this up, although I am a bit worried about how much churn it will generate.
That said, I do think that cleaning this up is somewhat orthogonal to the issue with storvsc. I suspect that allowing a huge amount of concurrent outstanding commands is going to be detrimental to performance for most workloads. And from that perspective I think that the short->int fix, while valid given the type discrepancy, is just treating the symptom.
Therefore I consider the short->int fix a workaround. And the proper fix involves looking closely at things are scaled in the storvsc case. Which I have noted that Michael is working on.
From: Martin K. Petersen martin.petersen@oracle.com Sent: Tuesday, October 12, 2021 2:28 PM To: Dexuan Cui decui@microsoft.com
Regarding this patch, I'm not sure if it's a "workaround": if it's incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value, probably we should change scsi_driver.can_queue from "int" to "u16"?
BTW, I guess the "cmd_per_lun" should also be "u16" rather than "short"?
I agree that it would be nice to get all this cleaned up. Several, somewhat peculiar, 25-year old design choices.
cmd_per_lun has traditionally been in the ballpark of low hundreds, can_queue typically in the low thousands. And the block layer currently caps at ~10K. Happy to take patches fixing this up, although I am a bit worried about how much churn it will generate.
Thanks for the explanation!
That said, I do think that cleaning this up is somewhat orthogonal to the issue with storvsc. I suspect that allowing a huge amount of concurrent outstanding commands is going to be detrimental to performance for most workloads. And from that perspective I think that the short->int fix, while valid given the type discrepancy, is just treating the symptom.
Agreed.
Therefore I consider the short->int fix a workaround. And the proper fix involves looking closely at things are scaled in the storvsc case. Which I have noted that Michael is working on.
Agreed. My v1 actually tried to work around the storvsc driver instread. :-)
On Thu, 7 Oct 2021 21:35:46 -0700, Dexuan Cui wrote:
After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int" value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative "short" value.
Use min_t(int, ...) to fix the issue.
[...]
Applied to 5.15/scsi-fixes, thanks!
[1/1] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() https://git.kernel.org/mkp/scsi/c/50b6cb351636
linux-stable-mirror@lists.linaro.org