On 10/30/23 08:37, Mikulas Patocka wrote:
On Sun, 29 Oct 2023, Vlastimil Babka wrote:
Haven't found any. However I'd like to point out some things I noticed in crypt_alloc_buffer(), although they are probably not related.
static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned int size)
{
	struct crypt_config *cc = io->cc;
	struct bio *clone;
	unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
	gfp_t gfp_mask = GFP_NOWAIT | __GFP_HIGHMEM;
	unsigned int remaining_size;
	unsigned int order = MAX_ORDER - 1;

retry:
	if (unlikely(gfp_mask & __GFP_DIRECT_RECLAIM))
		mutex_lock(&cc->bio_alloc_lock);
What if we end up in "goto retry" more than once? I don't see a matching unlock.
It is impossible. Before we jump to the retry label, we set __GFP_DIRECT_RECLAIM. mempool_alloc can't ever fail if __GFP_DIRECT_RECLAIM is present (it will just wait until some other task frees some objects into the mempool).
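
To make the argument above concrete, here is a minimal userspace sketch of the retry flow as described in this thread. It is not the dm-crypt code: the MODEL_* flags, struct model_stats, model_mempool_alloc() and model_crypt_alloc_buffer() are hypothetical names made up for illustration, and the only property modelled is the one stated above, namely that an allocation with direct reclaim allowed waits instead of failing. Under that assumption the retry label is passed at most twice and the lock is taken at most once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel GFP flags; the values are arbitrary. */
#define MODEL_GFP_NOWAIT         0x1u
#define MODEL_GFP_DIRECT_RECLAIM 0x2u

/* Bookkeeping for the model; none of this exists in dm-crypt. */
struct model_stats {
	int attempts;       /* number of passes through the retry label */
	int lock_depth;     /* models bio_alloc_lock: 0 = free, 1 = held */
	int max_lock_depth; /* highest lock_depth ever observed */
};

/*
 * Models the property stated in the thread: with direct reclaim allowed the
 * allocation waits until an object is freed, so it never reports failure.
 * Without it, the allocation fails when the pool is empty.
 */
bool model_mempool_alloc(unsigned int gfp_mask, bool pool_empty)
{
	if (gfp_mask & MODEL_GFP_DIRECT_RECLAIM)
		return true;
	return !pool_empty;
}

/* Simplified model of the retry flow; returns true once allocation succeeds. */
bool model_crypt_alloc_buffer(bool pool_empty, struct model_stats *st)
{
	unsigned int gfp_mask = MODEL_GFP_NOWAIT;

retry:
	st->attempts++;
	if (gfp_mask & MODEL_GFP_DIRECT_RECLAIM) {
		st->lock_depth++; /* stands in for mutex_lock() */
		if (st->lock_depth > st->max_lock_depth)
			st->max_lock_depth = st->lock_depth;
	}

	if (!model_mempool_alloc(gfp_mask, pool_empty)) {
		/* Failure is only possible on the non-blocking pass, lock not held. */
		assert(!(gfp_mask & MODEL_GFP_DIRECT_RECLAIM));
		gfp_mask |= MODEL_GFP_DIRECT_RECLAIM;
		goto retry;
	}

	if (gfp_mask & MODEL_GFP_DIRECT_RECLAIM)
		st->lock_depth--; /* stands in for mutex_unlock() */
	return true;
}
```

Calling model_crypt_alloc_buffer(true, &st) with a zeroed st leaves st.attempts == 2, st.max_lock_depth == 1 and st.lock_depth == 0: the second pass is the blocking one, so there is no third pass that could take the lock again. If the blocking allocation could fail, the assert in the failure branch would fire, which is the double-lock case the question was about.
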
Ah, missed that. And the traces don't show that we would be waiting for that. I'm starting to think the allocation itself is really not the issue here. Also I don't think it deprives something else of large order pages, as per the sysrq listing they still existed.
What I rather suspect is what happens next to the allocated bio such that it works well with order-0 or up to costly_order pages, but there's some problem causing a deadlock if the bio contains larger pages than that?
Cc Honza. The thread starts here: https://lore.kernel.org/all/ZTNH0qtmint%2FzLJZ@mail-itl/
The linked Qubes report has a number of blocked task listings that can be expanded: https://github.com/QubesOS/qubes-issues/issues/8575