Re: [PATCH v3] nvme: rdma/tcp: fix list corruption with anatt timer

28 Apr 2021


      On 4/27/21 9:54 PM, Martin Wilck wrote:
...
On Tue, 2021-04-27 at 20:05 +0200, Hannes Reinecke wrote:
...
On 4/27/21 6:25 PM, Christoph Hellwig wrote:
...
On Tue, Apr 27, 2021 at 11:33:04AM +0200, Hannes Reinecke wrote:
...
As indicated in my previous mail, please change the description.
We have
since established a actual reason (duplicate calls to
add_timer()), so
please list it here.
So what happens if the offending add_timer is changed to mod_timer?
I guess that should be fine, as the boilerplate said it can act
as a safe version of add_timer.
But that would just solve the crash upon add_timer().
The code doesn't use add_timer(), only mod_timer() and
del_timer_sync(). And we didn't observe a crash upon add_timer(). What
we observed was that a timer had been enqueued multiple times, and the
kernel crashes in expire_timers()->detach_timer(), when it encounters
an already detached entry in the timer list.
nvme_mpath_init() doesn't use add_timer, but it uses timer_setup(). And
calling that on an already pending timer is even worse :-)
And my point is that the anatt timer is not stopped at the end of
nvme_init_identify() if any of the calls to
nvme_configure_apst()
nvme_configure_timestamp()
nvme_configure_directives()
nvme_configure_acre()
returns with an error. If they do the controller is reset, causing
eg nvme_tcp_configure_admin_queue() to be called, which will be
calling timer_setup() with the original timer still running.
If the (original) timer triggers _after_ that time we have the crash.
Cheers,
Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v3] nvme: rdma/tcp: fix list corruption with anatt timer