On Thu, 2 Nov 2023, Marek Marczykowski-Górecki wrote:
> On Tue, Oct 31, 2023 at 06:24:19PM +0100, Mikulas Patocka wrote:
> > Hi
> > I would like to ask you to try this patch. Revert the changes to "order" and "PAGE_ALLOC_COSTLY_ORDER" back to normal and apply this patch on a clean upstream kernel.
> > Does it deadlock?
> > There is a bug in dm-crypt: it doesn't account for large pages in cc->n_allocated_pages. This patch fixes the bug.
> This patch did not help.
> > If the previous patch didn't fix it, try this patch (on a clean upstream kernel).
> > This patch allocates large pages, but it breaks them up into single-page entries when adding them to the bio.
> But this does help.
Thanks. So we can stop blaming the memory allocator and start blaming the NVMe subsystem.
I added the NVMe maintainers to this thread. The summary of the problem is: in dm-crypt, we allocate a large compound page and add it to the bio as a single big vector entry. Marek reports that on his system this causes deadlocks; they look like a lost bio that was never completed.
When I chop the large compound page into individual pages in dm-crypt and add a bio vector for each of them, Marek reports that there are no longer any deadlocks.
So we have a problem (either hardware or software): the NVMe subsystem doesn't like bio vectors with a large bv_len.
This is the original bug report: https://lore.kernel.org/stable/ZTNH0qtmint%2FzLJZ@mail-itl/
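To make the difference between the two layouts concrete, here is a small user-space model in Python. It is only an illustration, not the dm-crypt code: the function names are made up, PAGE_SIZE is assumed to be 4096, and each vector entry is modelled as an (offset, length) pair relative to the start of the compound page.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on typical x86-64 systems


def single_vector(order):
    """Model of the failing case: one entry spanning the whole compound page.

    Returns a list of (offset, length) pairs; here a single pair whose
    length plays the role of a large bv_len.
    """
    return [(0, PAGE_SIZE << order)]


def per_page_vectors(order):
    """Model of the working case: one PAGE_SIZE entry per constituent page."""
    return [(i * PAGE_SIZE, PAGE_SIZE) for i in range(1 << order)]


if __name__ == "__main__":
    order = 3  # hypothetical order, i.e. 8 contiguous pages
    print("single entry:", single_vector(order))
    print("per-page entries:", per_page_vectors(order))
```

Both layouts describe exactly the same bytes; only the number of entries and the length of each entry differ.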
Marek, what NVMe devices do you use? Do you use the same device on all 3 machines where you hit this bug?
In the directory /sys/block/nvme0n1/queue, what are the values of dma_alignment, max_hw_sectors_kb, max_sectors_kb, max_segment_size, max_segments, virt_boundary_mask?
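If it is easier, the values can be collected in one go with a short script. This is a minimal sketch in Python; the helper name read_queue_limits is made up for illustration, the device name nvme0n1 is taken from the path above and may differ on your machine, and attributes that a given kernel does not expose are reported as None.

```python
from pathlib import Path

# The queue attributes asked about above.
ATTRS = [
    "dma_alignment",
    "max_hw_sectors_kb",
    "max_sectors_kb",
    "max_segment_size",
    "max_segments",
    "virt_boundary_mask",
]


def read_queue_limits(queue_dir="/sys/block/nvme0n1/queue"):
    """Return a dict mapping each attribute name to its value as a string.

    An attribute that cannot be read (for example because the running
    kernel does not provide it) maps to None.
    """
    limits = {}
    for name in ATTRS:
        try:
            limits[name] = (Path(queue_dir) / name).read_text().strip()
        except OSError:
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in read_queue_limits().items():
        print(f"{name}: {value}")
```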
Try lowering /sys/block/nvme0n1/queue/max_sectors_kb to some small value (for example 64) and test if it helps.
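For reference, the same change can be scripted. A minimal sketch in Python, assuming the device is nvme0n1 and the script runs as root; the helper name set_max_sectors_kb is made up for illustration. The setting is not persistent and has to be reapplied after a reboot.

```python
from pathlib import Path


def set_max_sectors_kb(value, queue_dir="/sys/block/nvme0n1/queue"):
    """Write a new max_sectors_kb limit and return the value read back."""
    attr = Path(queue_dir) / "max_sectors_kb"
    attr.write_text(f"{value}\n")
    # Read the attribute back to confirm what is now in effect.
    return int(attr.read_text().strip())


if __name__ == "__main__":
    # 64 is the example value suggested above.
    print("max_sectors_kb is now", set_max_sectors_kb(64))
```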
Mikulas