On 4/22/24 13:15, Sean Anderson wrote:
On 4/22/24 12:49, Keith Busch wrote:
On Mon, Apr 22, 2024 at 12:28:23PM -0400, Sean Anderson wrote:
Sandisk SN530 NVMe drives have broken MSIs. On systems without MSI-X support, all commands time out, resulting in the following message:
nvme nvme0: I/O tag 12 (100c) QID 0 timeout, completion polled
These timeouts cause the boot to take an excessively long time (over 20 minutes) while the initial command queue is flushed.
Address this by adding a quirk for drives with buggy MSIs. The lspci output for this device (recorded on a system with MSI-X support) is:
Based on your description, the patch looks good. This will fall back to legacy emulated pin interrupts, and that's better than timeout polling, but will still appear sluggish compared to MSIs. Is there an erratum from the vendor on this? I'm just curious if the bug is at the Device ID level, and not something we could constrain to a particular model or firmware revision.
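For readers following along, the fallback described above can be sketched roughly as below. This is a standalone illustration, not the patch itself: every SKETCH_* name and both helper functions are hypothetical, and the bit values are chosen only for the example. The idea is that the quirk removes MSI from the set of interrupt types the driver will request, so a host without MSI-X ends up on legacy pin interrupts instead of a broken MSI.

```c
/* Illustrative interrupt-type bits; names and values are assumptions
 * made for this sketch, not taken from the patch under discussion. */
#define SKETCH_IRQ_INTX  (1u << 0)   /* legacy pin interrupt */
#define SKETCH_IRQ_MSI   (1u << 1)
#define SKETCH_IRQ_MSIX  (1u << 2)
#define SKETCH_IRQ_ALL   (SKETCH_IRQ_INTX | SKETCH_IRQ_MSI | SKETCH_IRQ_MSIX)

/* Hypothetical quirk bit marking a drive whose MSIs do not work. */
#define SKETCH_QUIRK_BROKEN_MSI (1ul << 0)

/* Return the interrupt types the driver is willing to request.
 * With the quirk set, MSI is masked out; MSI-X and INTx remain. */
static unsigned int sketch_allowed_irq_types(unsigned long quirks)
{
	unsigned int flags = SKETCH_IRQ_ALL;

	if (quirks & SKETCH_QUIRK_BROKEN_MSI)
		flags &= ~SKETCH_IRQ_MSI;
	return flags;
}

/* Pick the preferred type among those both allowed by the driver and
 * supported by the host: MSI-X first, then MSI, then INTx.
 * Returns 0 if nothing is usable. */
static unsigned int sketch_pick_irq_type(unsigned int allowed,
					 unsigned int host_supported)
{
	unsigned int usable = allowed & host_supported;

	if (usable & SKETCH_IRQ_MSIX)
		return SKETCH_IRQ_MSIX;
	if (usable & SKETCH_IRQ_MSI)
		return SKETCH_IRQ_MSI;
	if (usable & SKETCH_IRQ_INTX)
		return SKETCH_IRQ_INTX;
	return 0;
}
```

On a host that offers only INTx and MSI, sketch_pick_irq_type(sketch_allowed_irq_types(SKETCH_QUIRK_BROKEN_MSI), SKETCH_IRQ_INTX | SKETCH_IRQ_MSI) selects SKETCH_IRQ_INTX, while a host with MSI-X is unaffected by the quirk.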
I wasn't able to find any errata for this drive. I wasn't able to determine if there are any firmware updates for this drive (FWIW I have version "21160001"). I'll contact WD and see if they know about this issue.
Well, the response from WD support was "we don't support Linux, and if we did there aren't any bugs in the drive anyway".
--Sean