On 3/10/23 1:16 PM, Eric Biggers wrote:
On Fri, Mar 10, 2023 at 12:14:10PM -0800, Eric Biggers wrote:
On Fri, Mar 10, 2023 at 07:33:37PM +0000, Mike Cloaked wrote:
With kernel 6.2.3, if I simply plug in a USB external drive, mount it and umount it, the journal shows a kernel Oops. I have submitted a bug report that includes the journal output at https://bugzilla.kernel.org/show_bug.cgi?id=217174
As soon as the usb drive is unmounted, the kernel Oops occurs, and the machine hangs on shutdown and needs a hard reboot.
I have reproduced the same issue on three different machines, and in each case downgrading back to kernel 6.2.2 resolves the issue and it no longer occurs.
This would seem to be a regression in kernel 6.2.3.
Mike C
Thanks for reporting this! If this is reliably reproducible and is known to be a regression between v6.2.2 and v6.2.3, any chance you could bisect it to find out the exact commit that introduced the bug?
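In case it helps, a minimal sketch of how such a bisect could be started. It assumes the current directory is a clone of the linux-stable tree that contains both tags; the helper name start_bisect is made up for illustration and is not an existing tool.

```shell
# Hypothetical helper (not part of git or the kernel tree): start a bisect
# between a known-good and a known-bad tag. The defaults are the two
# versions named in the report.
start_bisect() {
    good="${1:-v6.2.2}"
    bad="${2:-v6.2.3}"
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}
```

After that, git checks out a commit to test: build and boot it, repeat the mount/umount test, and mark the result with `git bisect good` or `git bisect bad` until git names the first bad commit. `git bisect reset` returns to the original checkout.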
For reference I'm also copying the stack trace from bugzilla below:
BUG: kernel NULL pointer dereference, address: 0000000000000028
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 9 PID: 1118 Comm: lvcreate Tainted: G T 6.2.3-1>
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Ex>
RIP: 0010:blk_throtl_update_limit_valid+0x1f/0x110
BTW, the block/ commits between v6.2.2 and v6.2.3 were:
blk-mq: avoid sleep in blk_mq_alloc_request_hctx
blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx
blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait
blk-mq: Fix potential io hung for shared sbitmap per tagset
blk-mq: correct stale comment of .get_budget
block: sync mixed merged request's failfast with 1st bio's
block: Fix io statistics for cgroup in throttle path
block: bio-integrity: Copy flags when bio_integrity_payload is cloned
block: use proper return value from bio_failfast()
blk-iocost: fix divide by 0 error in calc_lcoefs()
blk-cgroup: dropping parent refcount after pd_free_fn() is done
blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()
block: don't allow multiple bios for IOCB_NOWAIT issue
block: clear bio->bi_bdev when putting a bio back in the cache
block: be a bit more careful in checking for NULL bdev while polling
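A list like the one above can be produced roughly as follows. This is a sketch that assumes a linux-stable clone with both tags present; list_block_commits is a hypothetical helper name, and the output order and abbreviated hashes depend on the local git setup.

```shell
# Hypothetical helper: print one line per commit that touched block/
# between two tags. Defaults to the two versions discussed in this thread.
list_block_commits() {
    git log --oneline "${1:-v6.2.2}..${2:-v6.2.3}" -- block/
}
```

Passing a different pair of tags as arguments narrows or widens the range, which can help when deciding where to start a bisect.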
Without having any in-depth knowledge here, I think "blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()" looks the most suspicious here... I see that AUTOSEL selected it from a 3-patch series without backporting patch 2, maybe that could be it? Anyway, just a hunch.
Was just looking at this too, the primary suspects would indeed be those two blk-cgroup changes. And yes, they ended up in stable due to auto selection, and very odd how it picked 2 and not the 3rd?!
But I would revert:
bfe46d2efe46c5c952f982e2ca94fe2ec5e58e2a
57a425badc05c2e87e9f25713e5c3c0298e4202c
in that order from 6.2.3 and see if that helps. Adding Yu.
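For anyone trying this, a minimal sketch of the revert step. It assumes a linux-stable checkout at v6.2.3 in which the two hashes above resolve; revert_in_order is a hypothetical helper, not an existing command.

```shell
# Hypothetical helper: revert the given commits one at a time, in the
# order they are passed, stopping at the first revert that fails
# (for example on a conflict).
revert_in_order() {
    for commit in "$@"; do
        git revert --no-edit "$commit" || return 1
    done
}

# Intended use on a v6.2.3 checkout, in the order given above:
# revert_in_order bfe46d2efe46c5c952f982e2ca94fe2ec5e58e2a \
#                 57a425badc05c2e87e9f25713e5c3c0298e4202c
```

Each revert becomes its own commit, so the resulting tree can be rebuilt and the mount/umount test repeated to see whether the Oops goes away.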