On Mon, 30 Oct 2023, Marek Marczykowski-Górecki wrote:
Well, it would be possible that larger pages in a bio would trip e.g. bio splitting due to maximum segment size the disk supports (which can be e.g. 0xffff) and that upsets something somewhere. But this is pure speculation. We definitely need more debug data to be able to tell more.
I can collect more info, but I need some guidance on how :) Some patch adding extra debug messages? Note I collect those via serial console (writing to disk doesn't work when it freezes), and that has some limits on the amount of data I can extract, especially when it is printed quickly. For example, sysrq-t is too much. Or maybe there is some trick to it, like increasing log_buf_len?
If you can do more tests, I would suggest this:
We already know that it works with order 3 and doesn't work with order 4.
So, in the file include/linux/mmzone.h, change PAGE_ALLOC_COSTLY_ORDER from 3 to 4 and in the file drivers/md/dm-crypt.c leave "unsigned int order = PAGE_ALLOC_COSTLY_ORDER" there.
Does it deadlock or not?
That way we can see whether the deadlock depends on PAGE_ALLOC_COSTLY_ORDER or whether it is just a coincidence.
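
For reference, here is a rough sketch of the arithmetic behind the order 3 vs. order 4 difference and the segment size speculation quoted above. It assumes 4 KiB pages and takes 0xffff as the maximum segment size only because that was the example value given; the helper names order_to_bytes() and exceeds_max_segment() are made up for illustration and are not kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Bytes covered by one allocation of the given order.
 * An order-N allocation is 2^N contiguous pages.
 * page_size is a parameter; 4096 is an assumption, not a given.
 */
static size_t order_to_bytes(unsigned int order, size_t page_size)
{
	return page_size << order;
}

/*
 * True if a single contiguous buffer of that order is larger than
 * max_segment_size and so could not go out as one segment.
 * Hypothetical helper, for illustration only.
 */
static bool exceeds_max_segment(unsigned int order, size_t page_size,
				size_t max_segment_size)
{
	return order_to_bytes(order, page_size) > max_segment_size;
}
```

With 4096-byte pages, order_to_bytes(3, 4096) is 32768 and order_to_bytes(4, 4096) is 65536, so exceeds_max_segment() is false for order 3 and true for order 4 against a 0xffff limit. That would be consistent with the splitting theory, but it does not prove it; the test above is still needed.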
Mikulas