On Thu, Nov 02, 2023 at 06:06:33PM +0100, Mikulas Patocka wrote:
On Thu, 2 Nov 2023, Marek Marczykowski-Górecki wrote:
On Thu, Nov 02, 2023 at 10:28:57AM +0100, Mikulas Patocka wrote:
Try lowering /sys/block/nvme0n1/queue/max_sectors_kb to some small value (for example 64) and test if it helps.
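Something like this should do it (a sketch; it assumes you run it as root and that the queue directory exists for your device):

    # cap the maximum request size at 64 KiB
    echo 64 > /sys/block/nvme0n1/queue/max_sectors_kb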
Yes, this helps too.
On a plain upstream kernel with no other modifications (and with the default max_sectors_kb), set /sys/module/nvme/parameters/sgl_threshold to "0" and test whether it deadlocks. Then set the value to "1" and test again.
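That is (assuming the parameter is writable at runtime on your build; 0 should disable SGLs entirely, and 1 should make the driver use SGLs for essentially every data transfer):

    # never use SGLs (always PRPs)
    echo 0 > /sys/module/nvme/parameters/sgl_threshold
    # reproduce, then switch to SGLs for every transfer
    echo 1 > /sys/module/nvme/parameters/sgl_threshold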
Got deadlock with both values.
Revert sgl_threshold to the default (32768). Boot the kernel with the option "iommu=panic". Reproduce the deadlock, and if you get a kernel panic, send us the panic log.
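For example (assuming a GRUB-based setup; the exact regeneration command differs between distributions):

    # restore the default SGL threshold
    echo 32768 > /sys/module/nvme/parameters/sgl_threshold
    # then add "iommu=panic" to the kernel command line, e.g. in
    # GRUB_CMDLINE_LINUX in /etc/default/grub, and regenerate the
    # config with update-grub or grub2-mkconfig before rebooting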
This is Xen PV, so Linux is not in charge of the IOMMU here. And there is SWIOTLB involved (64MB of it); I'm not sure if it's used for every DMA, but definitely for some.
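(You can confirm the SWIOTLB size from the boot log; a quick check, assuming the usual "software IO TLB" message is present:

    dmesg | grep -i 'software io tlb')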
Then, try this patch (without "iommu=panic"), reproduce the deadlock, and tell us which of the printk statements is triggered during the deadlock.
I'll try this next.