Hi Tejun,

On Tue, Apr 10, 2018 at 08:30:31AM -0700, tj@kernel.org wrote:
> Hello, Ming.
>
> On Tue, Apr 10, 2018 at 11:25:54PM +0800, Ming Lei wrote:
> > > - if (time_after_eq(jiffies, deadline) &&
> > >       blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE)) {
> > >           blk_mq_rq_timed_out(rq, reserved);
> >
> > A normal completion can still happen between blk_mq_change_rq_state()
> > and blk_mq_rq_timed_out().
> >
> > In tj's approach, there is a synchronize_rcu() between writing
> > aborted_gstate and calling blk_mq_rq_timed_out(), so it is even easier
> > for a normal completion to happen during that big window.
>
> I don't think plugging this hole is all that difficult, but this
> shouldn't lead to any critical failures. If so, that'd be a driver bug.

I agree. In theory, the issue should be in the driver's IRQ handler and
its .timeout callback.

For example, even after a request has been completed by the IRQ handler,
.timeout may still return BLK_EH_RESET_TIMER.
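
To make that scenario concrete, here is a toy userspace model of the window. It is not kernel code, just a sketch assuming C11 atomics. Every name in it (the struct request fields, change_rq_state(), irq_complete(), timeout_path(), buggy_timeout(), careful_timeout(), run_scenario(), the RQ_* and EH_* values) is made up for illustration. Only the idea of an atomic IN_FLIGHT -> COMPLETE transition is taken from the quoted patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request states, loosely modelled on the MQ_RQ_* names. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

/* Hypothetical verdicts a driver's timeout handler can return. */
enum eh_ret { EH_DONE, EH_RESET_TIMER };

struct request {
	_Atomic int state;
	bool hw_done;     /* the simulated hardware has finished the command */
	int completions;  /* how many times the request was actually ended */
};

/* Atomically move the request from old_state to new_state.
 * Returns true only for the caller that wins the transition. */
static bool change_rq_state(struct request *rq, int old_state, int new_state)
{
	return atomic_compare_exchange_strong(&rq->state, &old_state, new_state);
}

/* Simulated IRQ path: the hardware is done, try to complete the request. */
static void irq_complete(struct request *rq)
{
	rq->hw_done = true;
	if (change_rq_state(rq, RQ_IN_FLIGHT, RQ_COMPLETE))
		rq->completions++;
	/* else: the timeout path already owns the request, completion is dropped */
}

/* Simulated timeout path. irq_in_window makes the IRQ fire between the
 * state change and the call into the driver's timeout handler. */
static void timeout_path(struct request *rq,
			 enum eh_ret (*handler)(const struct request *),
			 bool irq_in_window)
{
	if (!change_rq_state(rq, RQ_IN_FLIGHT, RQ_COMPLETE))
		return; /* normal completion won the race */

	if (irq_in_window)
		irq_complete(rq);

	if (handler(rq) == EH_RESET_TIMER)
		atomic_store(&rq->state, RQ_IN_FLIGHT); /* re-arm and wait again */
	else
		rq->completions++;
}

/* Buggy driver: always asks for more time, even if the hardware is done. */
enum eh_ret buggy_timeout(const struct request *rq)
{
	(void)rq;
	return EH_RESET_TIMER;
}

/* Careful driver: only resets the timer if the hardware is still busy. */
enum eh_ret careful_timeout(const struct request *rq)
{
	return rq->hw_done ? EH_DONE : EH_RESET_TIMER;
}

/* One timeout event with the IRQ landing inside the window.
 * Returns how many times the request ended up being completed. */
int run_scenario(enum eh_ret (*handler)(const struct request *))
{
	struct request rq = { .state = RQ_IN_FLIGHT, .hw_done = false,
			      .completions = 0 };

	timeout_path(&rq, handler, true);
	return rq.completions;
}
```

In this model, run_scenario(buggy_timeout) leaves the request re-armed with zero completions even though the hardware has finished, so nothing will ever complete it. run_scenario(careful_timeout) completes it exactly once.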

Thanks,
Ming