Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce() must use synchronize_sched().
Reported-by: Tejun Heo tj@kernel.org Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably") Signed-off-by: Bart Van Assche bart.vanassche@wdc.com Cc: Hannes Reinecke hare@suse.com Cc: Ming Lei ming.lei@redhat.com Cc: Christoph Hellwig hch@lst.de Cc: Johannes Thumshirn jthumshirn@suse.de Cc: Tejun Heo tj@kernel.org Cc: Oleksandr Natalenko oleksandr@natalenko.name Cc: Martin Steigerwald martin@lichtvoll.de Cc: stable@vger.kernel.org # v4.15 --- drivers/scsi/scsi_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 1d83f29aee74..0b99ee2fbbb5 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -3014,7 +3014,7 @@ scsi_device_quiesce(struct scsi_device *sdev) * unfreeze even if the queue was already frozen before this function * was called. See also https://lwn.net/Articles/573497/. */ - synchronize_rcu(); + synchronize_sched(); blk_mq_unfreeze_queue(q);
mutex_lock(&sdev->state_mutex);
Hello Bart.
What is this one about?
Fix for the regression I (and others?) reported?¹
[1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and boot failures with blk_mq_terminate_expired in backtrace
https://bugzilla.kernel.org/show_bug.cgi?id=199077
Thanks, Martin
Bart Van Assche - 16.03.18, 18:35:
Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce() must use synchronize_sched().
Reported-by: Tejun Heo tj@kernel.org Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably") Signed-off-by: Bart Van Assche bart.vanassche@wdc.com Cc: Hannes Reinecke hare@suse.com Cc: Ming Lei ming.lei@redhat.com Cc: Christoph Hellwig hch@lst.de Cc: Johannes Thumshirn jthumshirn@suse.de Cc: Tejun Heo tj@kernel.org Cc: Oleksandr Natalenko oleksandr@natalenko.name Cc: Martin Steigerwald martin@lichtvoll.de Cc: stable@vger.kernel.org # v4.15
drivers/scsi/scsi_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 1d83f29aee74..0b99ee2fbbb5 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -3014,7 +3014,7 @@ scsi_device_quiesce(struct scsi_device *sdev) * unfreeze even if the queue was already frozen before this function * was called. See also https://lwn.net/Articles/573497/. */
- synchronize_rcu();
synchronize_sched(); blk_mq_unfreeze_queue(q);
mutex_lock(&sdev->state_mutex);
On 03/16/18 14:42, Martin Steigerwald wrote:
What is this one about?
Fix for the regression I (and others?) reported?¹
[1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and boot failures with blk_mq_terminate_expired in backtrace
Hello Martin,
This patch is a fix for the fix for the bug that you and others had reported. See also "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
This patch fixes an API violation. For those users for which suspend and resume works fine with commit 3a0a529971ec applied it should still work fine with this patch applied since this patch may cause scsi_device_quiesce() to wait longer.
Bart.
Hi Bart.
Bart Van Assche - 16.03.18, 22:51:
On 03/16/18 14:42, Martin Steigerwald wrote:
What is this one about?
Fix for the regression I (and others?) reported?¹
[1] [Bug 199077] [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and boot failures with blk_mq_terminate_expired in backtrace
[…]
This patch is a fix for the fix for the bug that you and others had reported. See also "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
This patch fixes an API violation. For those users for which suspend and resume works fine with commit 3a0a529971ec applied it should still work fine with this patch applied since this patch may cause scsi_device_quiesce() to wait longer.
Okay, so if I understand you correctly, this is not related to the regression I mentioned above.
Testing anyway.
Thanks,
On Fri, Mar 16, 2018 at 10:35:16AM -0700, Bart Van Assche wrote:
Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce() must use synchronize_sched().
Is there a reason to use sched-RCU here instead of the regular one? If not, it'd be better to switch to regular RCU than the other way around.
Thanks.
On Mon, 2018-03-19 at 07:31 -0700, Tejun Heo wrote:
On Fri, Mar 16, 2018 at 10:35:16AM -0700, Bart Van Assche wrote:
Since blk_queue_enter() uses rcu_read_lock_sched() scsi_device_quiesce() must use synchronize_sched().
Is there a reason to use sched-RCU here instead of the regular one? If not, it'd be better to switch to regular RCU than the other way around.
Hello Tejun,
As explained in the comment in scsi_device_quiesce(), the effect of blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls surround a function call that uses rcu_read_lock_sched(), namely percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter() and scsi_device_quiesce().
Bart.
Hello,
On Mon, Mar 19, 2018 at 03:16:07PM +0000, Bart Van Assche wrote:
As explained in the comment in scsi_device_quiesce(), the effect of blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls surround a function call that uses rcu_read_lock_sched(), namely percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter() and scsi_device_quiesce().
Let's please not depend on rcu_read_lock_sched() in percpu_ref_tryget_live(). That's an implementation detail. If the code needs RCU based synchronization, it should perform them explicitly.
Thanks.
On Mon, 2018-03-19 at 08:21 -0700, tj@kernel.org wrote:
On Mon, Mar 19, 2018 at 03:16:07PM +0000, Bart Van Assche wrote:
As explained in the comment in scsi_device_quiesce(), the effect of blk_set_preempt_only() must be visible for percpu_ref_tryget() calls that occur after the queue unfreeze triggered by scsi_device_quiesce(). Hence the RCU read lock calls in blk_queue_enter(). Since these RCU read lock calls surround a function call that uses rcu_read_lock_sched(), namely percpu_ref_tryget_live(), we have to use sched-RCU in both blk_queue_enter() and scsi_device_quiesce().
Let's please not depend on rcu_read_lock_sched() in percpu_ref_tryget_live(). That's an implementation detail. If the code needs RCU based synchronization, it should perform them explicitly.
Hello Tejun,
The algorithm explained above does not depend on sched-rcu. But because percpu_ref_tryget_live() uses sched-rcu and because we need to add an RCU lock around that call we are forced to use sched-rcu. I hope this makes it clear.
Bart.
Hello,
On Mon, Mar 19, 2018 at 04:18:54PM +0000, Bart Van Assche wrote:
The algorithm explained above does not depend on sched-rcu. But because percpu_ref_tryget_live() uses sched-rcu and because we need to add an RCU lock around that call we are forced to use sched-rcu. I hope this makes it clear.
This could be me being slow but can you explain how what percpu_ref_tryget_live() uses internally affects whether we can use regular RCU around it?
Thanks.
On Mon, 2018-03-19 at 09:29 -0700, tj@kernel.org wrote:
This could be me being slow but can you explain how what percpu_ref_tryget_live() uses internally affects whether we can use regular RCU around it?
Hello Tejun,
For synchronization primitives that wait having a stronger synchronization primitive nested inside a more relaxed one can lead to a deadlock. But since the rcu read lock primitives do not wait it could be safe to use that kind of nesting with RCU. Do you perhaps know whether any documentation is available about that kind of nesting or whether it is already used elsewhere in the kernel?
Thanks,
Bart.
Hey,
On Mon, Mar 19, 2018 at 04:57:45PM +0000, Bart Van Assche wrote:
For synchronization primitives that wait having a stronger synchronization primitive nested inside a more relaxed one can lead to a deadlock. But since the rcu read lock primitives do not wait it could be safe to use that kind of nesting with RCU. Do you perhaps know whether any documentation is available about that kind of nesting or whether it is already used elsewhere in the kernel?
Oh, we nest them all the time. They're like (and sometimes literally are) preempt_disable() and don't care about nest ordering.
Thanks.
On Mon, 2018-03-19 at 10:02 -0700, tj@kernel.org wrote:
On Mon, Mar 19, 2018 at 04:57:45PM +0000, Bart Van Assche wrote:
For synchronization primitives that wait having a stronger synchronization primitive nested inside a more relaxed one can lead to a deadlock. But since the rcu read lock primitives do not wait it could be safe to use that kind of nesting with RCU. Do you perhaps know whether any documentation is available about that kind of nesting or whether it is already used elsewhere in the kernel?
Oh, we nest them all the time. They're like (and sometimes literally are) preempt_disable() and don't care about nest ordering.
Hello Martin,
This was probably already clear to you, but anyway: please drop the patch at the start of this thread.
Thanks,
Bart.
linux-stable-mirror@lists.linaro.org