On Sun, Jul 09, 2023 at 10:38:29AM +0300, Sagi Grimberg wrote:
The namespace's request queue is frozen and quiesced during error recovery, so writeback IO is blocked in bio_queue_enter(), fsync_bdev() <- del_gendisk() can't move on, and IO hangs. Removal could come from sysfs, hard unplug or error handling.
Fix this kind of issue by marking the controller as DEAD if removal interrupts error recovery.
This way is reasonable too, because the controller can't be recovered any more after being removed.
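
For readers following along, the hang and the fix described above can be sketched as a toy user-space model. This is not kernel code: every name below (toy_ctrl, toy_queue_enter(), toy_remove() and so on) is hypothetical and only mirrors the description, assuming that a submitter waits on a frozen queue unless the disk has been declared dead.

```c
#include <stdbool.h>

/* Toy user-space model; none of these names are kernel APIs. */
enum toy_ctrl_state { TOY_LIVE, TOY_RESETTING, TOY_DEAD };
enum toy_enter_result { TOY_ENTER_OK, TOY_ENTER_BLOCKED, TOY_ENTER_FAILED };

struct toy_ctrl {
    enum toy_ctrl_state state;
    bool queue_frozen;  /* set while error recovery is in progress */
    bool disk_dead;     /* set once the controller is declared DEAD */
};

/* Models what a submitter sees when it tries to enter the queue. */
static enum toy_enter_result toy_queue_enter(const struct toy_ctrl *c)
{
    if (c->disk_dead)
        return TOY_ENTER_FAILED;   /* fail fast instead of waiting */
    if (c->queue_frozen)
        return TOY_ENTER_BLOCKED;  /* would sleep until the queue is unfrozen */
    return TOY_ENTER_OK;
}

/* Error recovery freezes the queue (assumption taken from the description). */
static void toy_start_error_recovery(struct toy_ctrl *c)
{
    c->state = TOY_RESETTING;
    c->queue_frozen = true;
}

/*
 * Removal that arrives while recovery is still in flight.
 * With mark_dead == false nothing changes, so waiters stay blocked (the hang).
 * With mark_dead == true the controller goes DEAD and waiters are failed.
 */
static void toy_remove(struct toy_ctrl *c, bool mark_dead)
{
    if (c->state == TOY_RESETTING && mark_dead) {
        c->state = TOY_DEAD;
        c->disk_dead = true;
    }
}
```

In this model, calling toy_remove() with mark_dead set to false leaves toy_queue_enter() returning TOY_ENTER_BLOCKED forever, which corresponds to fsync_bdev() never making progress; with mark_dead set to true the same call returns TOY_ENTER_FAILED and removal can continue.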
This looks fine to me Ming,
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
I still want your patches for tcp/rdma that move the freeze. If you are not planning to send them, I swear I will :)
Ming, can you please send the tcp/rdma patches that move the freeze? As I said before, it addresses an existing issue with requests unnecessarily blocked on a frozen queue instead of failing over.
Any chance to fix the current issue in one easy (backportable) way [1] first?
All previous discussions on delaying the freeze [2] are generic and apply to all nvme drivers, not to mention that this error handling difference causes extra maintenance burden. I still suggest converting all drivers in the same way, and will work along the approach [1] aiming for v6.6.
[1] https://lore.kernel.org/linux-nvme/20230629064818.2070586-1-ming.lei@redhat....
[2] https://lore.kernel.org/linux-block/5bddeeb5-39d2-7cec-70ac-e3c623a8fca6@gri...

Thanks,
Ming