On Tue, 31 Oct 2023, Marek Marczykowski-Górecki wrote:
> On Tue, Oct 31, 2023 at 03:01:36PM +0100, Jan Kara wrote:
> > On Tue 31-10-23 04:48:44, Marek Marczykowski-Górecki wrote:
> > > Then tried:
> > > - PAGE_ALLOC_COSTLY_ORDER=4, order=4 - cannot reproduce,
> > > - PAGE_ALLOC_COSTLY_ORDER=4, order=5 - cannot reproduce,
> > > - PAGE_ALLOC_COSTLY_ORDER=4, order=6 - freeze rather quickly
> > > I've retried the PAGE_ALLOC_COSTLY_ORDER=4, order=5 case several times and I can't reproduce the issue there. I'm confused...
> > And this kind of confirms that allocations > PAGE_ALLOC_COSTLY_ORDER causing hangs is most likely just a coincidence. Rather, something either in the block layer or in the storage driver has problems with handling bios with sufficiently high order pages attached. This is going to be a bit painful to debug, I'm afraid. How long does it take for you to trigger the hang? I'm asking to get a rough estimate of how heavy tracing we can afford so that we don't overwhelm the system...
> Sometimes it freezes just after logging in, but in the worst case it takes me about 10 min of more or less `tar xz` + `dd`.
Hi
I would like to ask you to try this patch. Revert the changes to "order" and "PAGE_ALLOC_COSTLY_ORDER" back to normal and apply this patch on a clean upstream kernel.
Does it deadlock?
There is a bug in dm-crypt: it doesn't account for large pages in cc->n_allocated_pages. This patch fixes the bug.
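
The accounting rule can be modelled in userspace as a minimal sketch. It is not the kernel code: struct client, charge_large() and uncharge_large() are hypothetical names, a plain long stands in for the percpu counter cc->n_allocated_pages, pages_per_client stands in for dm_crypt_pages_per_client, and the allocation itself is assumed to succeed. It only shows that a compound allocation of a given order is charged as 1 << order pages against the limit and uncharged by the same amount on free.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-client state in dm-crypt. */
struct client {
	long n_allocated_pages;	/* base pages currently charged */
	long pages_per_client;	/* limit, in base pages */
};

/*
 * Try to charge one compound allocation, starting at max_order and
 * falling back to smaller orders when the limit would be exceeded.
 * Returns the order that was charged, or 0 if no order > 0 fits.
 * The sketch assumes the allocation itself never fails.
 */
static unsigned int charge_large(struct client *c, unsigned int max_order)
{
	unsigned int order;

	for (order = max_order; order > 0; order--) {
		long nr = 1L << order;

		if (c->n_allocated_pages + nr > c->pages_per_client)
			continue;	/* over the limit: try a smaller order */
		c->n_allocated_pages += nr;
		return order;
	}
	return 0;
}

/* Undo the charge made for a compound allocation of the given order. */
static void uncharge_large(struct client *c, unsigned int order)
{
	long nr = 1L << order;

	assert(c->n_allocated_pages >= nr);
	c->n_allocated_pages -= nr;
}
```

With a limit of 40 pages, a first order-5 request is charged in full (32 pages), a second one is reduced to order 3 (8 pages), and a third finds no order that fits. Uncharging both allocations brings the counter back to zero.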
Mikulas
---
 drivers/md/dm-crypt.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-stable/drivers/md/dm-crypt.c
===================================================================
--- linux-stable.orig/drivers/md/dm-crypt.c	2023-10-31 16:25:09.000000000 +0100
+++ linux-stable/drivers/md/dm-crypt.c	2023-10-31 16:53:14.000000000 +0100
@@ -1700,11 +1700,16 @@ retry:
 		order = min(order, remaining_order);
 
 		while (order > 0) {
+			if (unlikely(percpu_counter_read_positive(&cc->n_allocated_pages) + (1 << order) > dm_crypt_pages_per_client))
+				goto decrease_order;
 			pages = alloc_pages(gfp_mask
 				| __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | __GFP_COMP,
 				order);
-			if (likely(pages != NULL))
+			if (likely(pages != NULL)) {
+				percpu_counter_add(&cc->n_allocated_pages, 1 << order);
 				goto have_pages;
+			}
+decrease_order:
 			order--;
 		}
 
@@ -1742,10 +1747,12 @@ static void crypt_free_buffer_pages(stru
 
 	if (clone->bi_vcnt > 0) { /* bio_for_each_folio_all crashes with an empty bio */
 		bio_for_each_folio_all(fi, clone) {
-			if (folio_test_large(fi.folio))
+			if (folio_test_large(fi.folio)) {
+				percpu_counter_sub(&cc->n_allocated_pages, 1 << folio_order(fi.folio));
 				folio_put(fi.folio);
-			else
+			} else {
 				mempool_free(&fi.folio->page, &cc->page_pool);
+			}
 		}
 	}
 }