Hello, Bart.
On Mon, Apr 09, 2018 at 09:30:27PM +0000, Bart Van Assche wrote:
> On Mon, 2018-04-09 at 11:56 -0700, tj@kernel.org wrote:
> > On Mon, Apr 09, 2018 at 05:03:05PM +0000, Bart Van Assche wrote:
> > > exist today in the blk-mq timeout handling code cannot be fixed completely using RCU only.
> >
> > I really don't think that is that complicated. Let's first confirm the race fix and get to narrowing / closing that window.
>
> Two months ago it was reported for the first time that commit 1d9bd5161ba3 ("blk-mq: replace timeout synchronization with a RCU and generation based scheme") introduces a regression. Since that report nobody has posted a patch that fixes all races related to blk-mq timeout handling and that only

The two patches using RCU were posted a long time ago. It was just that the repro that only you had at the time didn't work anymore so we couldn't confirm the fix. If we now have a different repro, awesome. Let's see whether the fix works.

> uses RCU. If you want to continue working on this that's fine with me. But since my opinion is that it is impossible to fix these races using RCU only I will continue working on an alternative approach. See also "[PATCH] blk-mq: Fix a race between resetting the timer and completion handling" (https://www.mail-archive.com/linux-block@vger.kernel.org/msg18089.html).
ISTR discussing that patch earlier. Didn't the RCU-based fix get posted after that discussion?
Thanks.