On 2024/6/26 11:56, Peter Wang (王信友) wrote:
On Tue, 2024-06-25 at 09:42 -0700, Bart Van Assche wrote:
Please include a full root cause analysis when reposting fixes for the reported crashes. It is not clear to me how it is possible that an invalid pointer is passed to blk_mq_unique_tag() (0x194). As I mentioned in my previous email, freeing a request does not modify the request pointer and does not modify the SCSI command pointer either. As one can derive from the blk_mq_alloc_rqs() call stack, memory for struct request and struct scsi_cmnd is allocated at request queue allocation time and is not freed until the request queue is freed. Hence, for a given tag, neither the request pointer nor the SCSI command pointer changes as long as a request queue exists. Hence my request for an explanation how it is possible that an invalid pointer was passed to blk_mq_unique_tag().
Thanks,
Bart.
Hi Bart,
Sorry I have not explain root-cause clearly. I will add more clear root-cause analyze next version.
And it is not an invalid pointer is passed to blk_mq_unique_tag(), I means blk_mq_unique_tag function try access null pointer. It is differnt and cause misunderstanding.
The null pinter blk_mq_unique_tag try access is: rq->mq_hctx(NULL)->queue_num.
Hi Peter,
What is queue_num's offset of blk_mq_hw_ctx in your machine?
gdb vmlinux
(gdb) print /x (int)&((struct blk_mq_hw_ctx *)0)->queue_num $5 = 0x164
I read your descriptions and wondered a same race flow as you described following. But I found the offset mismatch, if the racing flow is correct, then the address accessed in blk_mq_unique_tag() should be 0x164, not 0x194. Maybe the offset is different between our machine?
What's more, if the racing flow is correct, I did not get how your changes can address this racing flow.
The racing flow is:
Thread A ufshcd_err_handler step 1 ufshcd_cmd_inflight(true) step 3 ufshcd_mcq_req_to_hwq blk_mq_unique_tag rq->mq_hctx->queue_num step 5
Thread B ufs_mtk_mcq_intr(cq complete ISR) step 2 scsi_done ... __blk_mq_free_request rq->mq_hctx = NULL; step 4
Thanks. Peter