On Wed, May 14, 2025 at 01:29:36PM -0700, Bart Van Assche wrote:
>  		/*
>  		 * Now assemble so we handle the lowest level first.
>  		 */
> +		bio_list_on_stack[0] = bio_list_on_stack[1];
>  		bio_list_merge(&bio_list_on_stack[0], &lower);
>  		bio_list_merge(&bio_list_on_stack[0], &same);
> -		bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);

If I read this code correctly, this means that we now keep processing bios that already were on bio_list_on_stack[0] at the beginning of the loop in the next iteration(s) instead of finishing off the ones created by this iteration, which could lead to exhaustion of resources like mempools.
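To make the ordering concern concrete, here is a toy user-space model, not kernel code. Bios are plain integer ids, and toy_list, toy_merge and the two assemble_* helpers are made-up names for illustration only. The one thing taken from the kernel is that bio_list_merge() appends the second list to the tail of the first.

```c
#include <stddef.h>

#define TOY_MAX_BIOS 16

/* Hypothetical stand-in for struct bio_list: bios are just integer ids. */
struct toy_list {
	int ids[TOY_MAX_BIOS];
	size_t n;
};

/* Append all entries of src to the tail of dst, like bio_list_merge(). */
static void toy_merge(struct toy_list *dst, const struct toy_list *src)
{
	for (size_t i = 0; i < src->n && dst->n < TOY_MAX_BIOS; i++)
		dst->ids[dst->n++] = src->ids[i];
}

/*
 * Existing assembly order: bios created by this iteration (lower level
 * first, then same level) go ahead of the bios that were already queued.
 */
static struct toy_list assemble_new_first(const struct toy_list *lower,
					  const struct toy_list *same,
					  const struct toy_list *older)
{
	struct toy_list out = { .n = 0 };

	toy_merge(&out, lower);
	toy_merge(&out, same);
	toy_merge(&out, older);
	return out;
}

/*
 * Assembly order as I read the quoted change (an assumption): start from
 * the already queued bios, then append what this iteration created.
 */
static struct toy_list assemble_old_first(const struct toy_list *lower,
					  const struct toy_list *same,
					  const struct toy_list *older)
{
	struct toy_list out = *older;

	toy_merge(&out, lower);
	toy_merge(&out, same);
	return out;
}
```

With older = {1, 2}, lower = {10, 11} and same = {20}, the first helper yields 10 11 20 1 2 and the second yields 1 2 10 11 20. In the second case the bios created by this iteration only get their turn after everything that was queued before them, which is the situation I'm worried about above.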
Note that this is a big if - the code is really hard to read, it should really grow a data structure for the on-stack list that has named members for both lists instead of the array magic.. :(
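Something along these lines is what I have in mind, as a rough user-space sketch only. All names here (toy_bio, toy_bio_list, submit_bio_lists, new_bios, pending, submit_bio_lists_begin) are hypothetical, and the member roles follow my reading of what bio_list_on_stack[0] and bio_list_on_stack[1] hold.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct bio and struct bio_list. */
struct toy_bio {
	struct toy_bio *bi_next;
};

struct toy_bio_list {
	struct toy_bio *head;
	struct toy_bio *tail;
};

/* Named members instead of a two-element array indexed by 0 and 1. */
struct submit_bio_lists {
	/* bios submitted by the current ->submit_bio call (array slot 0) */
	struct toy_bio_list new_bios;
	/* bios submitted earlier but not processed yet (array slot 1) */
	struct toy_bio_list pending;
};

/*
 * Start of a loop iteration: everything queued so far becomes "pending"
 * and the list for newly submitted bios starts out empty.
 */
static void submit_bio_lists_begin(struct submit_bio_lists *l)
{
	l->pending = l->new_bios;
	l->new_bios.head = NULL;
	l->new_bios.tail = NULL;
}
```

With named members the assembly step would read as merging lower and same into new_bios and then appending pending, which makes it much easier to see which bios end up in front.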
I'm still trying to understand your problem given that it wasn't described much. What I could think of is that bio_split_to_limits, through bio_submit_split, first re-submits the remainder bio using submit_bio_noacct, which the above should place on the same list, and then later the stacking block drivers also submit the bio split off at the beginning, unlike blk-mq drivers that process it directly. But given that this resubmission should be on the lower list above, I don't see how it causes problems.