There is a short window where percpu_refs are already turned zero, but we try to do resurrect(). Play nicer and wait for all users to leave RCU section.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f3af499b12a9..ce5fccf00367 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7351,6 +7351,7 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 			break;
 
 		percpu_ref_resurrect(&data->refs);
+		synchronize_rcu();
 		io_sqe_rsrc_set_node(ctx, data, backup_node);
 		reinit_completion(&data->done);
 		mutex_unlock(&ctx->uring_lock);
@@ -10089,6 +10090,7 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 
 			if (ret) {
 				percpu_ref_resurrect(&ctx->refs);
+				synchronize_rcu();
 				goto out_quiesce;
 			}
 		}
On 2/19/21 6:39 PM, Pavel Begunkov wrote:
> There is a short window where percpu_refs are already turned zero, but we try to do resurrect(). Play nicer and wait for all users to leave RCU section.
We need to do something better than synchronize_rcu() here, that can take a long time on a loaded box. I'll try and think about this one.
On 20/02/2021 03:40, Jens Axboe wrote:
> On 2/19/21 6:39 PM, Pavel Begunkov wrote:
>> There is a short window where percpu_refs are already turned zero, but we try to do resurrect(). Play nicer and wait for all users to leave RCU section.
> We need to do something better than synchronize_rcu() here, that can take a long time on a loaded box. I'll try and think about this one.
It only happens when it can't be drained and there are task_works or signals. I have another patch, doing it via tryget, but it's uglier and I'd rather prefer synchronize_rcu for stable.
Want me to send it tomorrow (on top or not)?
On 2/19/21 8:47 PM, Pavel Begunkov wrote:
> On 20/02/2021 03:40, Jens Axboe wrote:
>> On 2/19/21 6:39 PM, Pavel Begunkov wrote:
>>> There is a short window where percpu_refs are already turned zero, but we try to do resurrect(). Play nicer and wait for all users to leave RCU section.
>> We need to do something better than synchronize_rcu() here, that can take a long time on a loaded box. I'll try and think about this one.
> It only happens when it can't be drained and there are task_works or signals. I have another patch, doing it via tryget, but it's uglier and I'd rather prefer synchronize_rcu for stable.
Right, but the task_work coming in may not be unlikely. So it's not strictly an error path.
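The thread mentions a "tryget" alternative without showing it. As a rough illustration of the general idea only, and not the patch Pavel refers to, the userspace sketch below models a reference count whose tryget refuses to hand out a reference once the count has dropped to zero. The names toy_ref, toy_ref_tryget and toy_ref_put are hypothetical and are not kernel APIs; the real percpu_ref behaves differently in per-CPU mode and relies on RCU.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a refcount; NOT the kernel's percpu_ref. */
struct toy_ref {
	atomic_long count;
};

/* Take a reference only if the count is still non-zero. */
bool toy_ref_tryget(struct toy_ref *ref)
{
	long old = atomic_load(&ref->count);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value and we retry. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;	/* count already hit zero: caller must not use the object */
}

/* Drop a reference; returns true if this call released the last one. */
bool toy_ref_put(struct toy_ref *ref)
{
	return atomic_fetch_sub(&ref->count, 1) == 1;
}
```

In this model, a caller that fails toy_ref_tryget() backs off instead of touching the object, so nothing has to wait for a grace period before the count is brought back up.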
linux-stable-mirror@lists.linaro.org