On Tue, 2018-04-10 at 07:20 -0700, Tejun Heo wrote:
> On Mon, Apr 09, 2018 at 06:34:55PM -0700, Bart Van Assche wrote:
> > Since the request state can be updated from two different contexts, namely regular completion and request timeout, this race cannot be fixed with RCU synchronization only. Fix this race as follows:
> Well, it can be and the patches have been posted months ago.
That's not correct. I have explained to you in detail that the two patches you posted do not fix all the races that are fixed by the patch at the start of this e-mail thread.
> Switching to another model might be better but let's please do that with the right rationales. A good portion of this seems to be built on misunderstandings.
Which misunderstandings? I'm not aware of any misunderstandings on my side. Additionally, tests with two different block drivers (the NVMeOF initiator and the SRP initiator driver) have shown that the current blk-mq timeout implementation, with or without your two patches applied, results in subtle and hard-to-debug crashes and/or memory corruption. That is not the case for the patch at the start of this thread. The latest report of a crash that I ran into myself and that is fixed by the patch at the start of this thread is available here: https://www.spinics.net/lists/linux-rdma/msg63240.html.
Please also keep in mind that if this patch were accepted, that would not prevent it from being replaced with an RCU-based solution later on. If anyone ever comes up with a reliably working RCU-based solution, I will be happy to accept a revert of this patch and I will help review that RCU-based solution.
Bart.