Hi Greg and Stable folks.
We've noticed a regression in raid1 due to the following commits:
79dabfd00a2b ("md/raid1: hold the barrier until handle_read_error() finishes")
caeed0b9f1ce ("md/raid1: free the r1bio before waiting for blocked rdev")
Kernel crash during io tests like below:
Sep 11 23:03:15 ps401a-901 kernel: [ 449.007040] RIP: 0010:call_bio_endio+0x1a/0x60 [raid1]
Sep 11 23:03:15 ps401a-901 kernel: [ 449.007147] Code: 00 5b e9 d9 79 b3 f0 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 8b 47 18 48 8b 5f 30 a8 01 75 04 c6 43 1a 0a 48 8b 53 08 <48> 8b 82 40 02 00 00 48 8b 40 50 48 8b 40 60 a8 80 74 12 48 8b 47
Sep 11 23:03:15 ps401a-901 kernel: [ 449.007347] RSP: 0018:ffffb3300f627b90 EFLAGS: 00010202
Sep 11 23:03:15 ps401a-901 kernel: [ 449.007448] RAX: 0000000000000025 RBX: ffff8d2cab013210 RCX: 0000000000000000
Sep 11 23:03:15 ps401a-901 kernel: [ 449.007582] RDX: 00000001ab000000 RSI: ffff8d2cab013210 RDI: ffff8cfdb1e1d100
Sep 11 23:03:15 ps401a-901 kernel: [ 449.007688] RBP: ffff8d34c2ee2800 R08: 000000000000c0d4 R09: 0000000000073f2c
Sep 11 23:03:15 ps401a-901 kernel: [ 449.007795] R10: 0000000000073f34 R11: ffff8cfdb1e1d100 R12: ffff8d2cab013200
Sep 11 23:03:15 ps401a-901 kernel: [ 449.007901] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8d34c2ee2800
Sep 11 23:03:15 ps401a-901 kernel: [ 449.008011] FS: 0000000000000000(0000) GS:ffff8d3487c40000(0000) knlGS:0000000000000000
Sep 11 23:03:15 ps401a-901 kernel: [ 449.008146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 11 23:03:15 ps401a-901 kernel: [ 449.008248] CR2: 00000001ab000240 CR3: 000000038360a000 CR4: 00000000000406e0
Sep 11 23:03:15 ps401a-901 kernel: [ 449.008355] Call Trace:
Sep 11 23:03:15 ps401a-901 kernel: [ 449.008448] <TASK>
Sep 11 23:03:15 ps401a-901 kernel: [ 449.008539] ? __die_body+0x1a/0x60
Sep 11 23:03:15 ps401a-901 kernel: [ 449.008638] ? page_fault_oops+0x136/0x2a0
Sep 11 23:03:15 ps401a-901 kernel: [ 449.008754] ? exc_page_fault+0x5f/0x110
Sep 11 23:03:15 ps401a-901 kernel: [ 449.008853] ? asm_exc_page_fault+0x22/0x30
Sep 11 23:03:15 ps401a-901 kernel: [ 449.008955] ? call_bio_endio+0x1a/0x60 [raid1]
Sep 11 23:03:15 ps401a-901 kernel: [ 449.009055] raid_end_bio_io+0x28/0x90 [raid1]
Sep 11 23:03:15 ps401a-901 kernel: [ 449.009158] raid1_end_write_request+0x10b/0x340 [raid1]
Sep 11 23:03:15 ps401a-901 kernel: [ 449.009263] submit_bio_checks+0x84/0x450
Sep 11 23:03:15 ps401a-901 kernel: [ 449.009364] ? __wake_up_common+0x77/0x140
Sep 11 23:03:15 ps401a-901 kernel: [ 449.009463] __submit_bio+0x106/0x190
Sep 11 23:03:15 ps401a-901 kernel: [ 449.009560] ? __queue_work+0x136/0x3b0
Sep 11 23:03:15 ps401a-901 kernel: [ 449.009659] submit_bio_noacct+0x268/0x2c0
Sep 11 23:03:15 ps401a-901 kernel: [ 449.009758] flush_bio_list+0x60/0x100 [raid1]
Sep 11 23:03:15 ps401a-901 kernel: [ 449.009859] flush_pending_writes+0x71/0xb0 [raid1]
Sep 11 23:03:15 ps401a-901 kernel: [ 449.009976] raid1d+0xa6/0x1280 [raid1]
Sep 11 23:03:15 ps401a-901 kernel: [ 449.010076] ? psi_task_switch+0xde/0x200
Sep 11 23:03:15 ps401a-901 kernel: [ 449.010175] ? __switch_to_asm+0x3a/0x60
Sep 11 23:03:15 ps401a-901 kernel: [ 449.010274] ? finish_task_switch+0x7d/0x280
Sep 11 23:03:15 ps401a-901 kernel: [ 449.010373] ? try_to_del_timer_sync+0x4d/0x80
Sep 11 23:03:15 ps401a-901 kernel: [ 449.010475] ? md_thread+0x137/0x170 [md_mod]
Sep 11 23:03:15 ps401a-901 kernel: [ 449.010586] ? process_checks+0x4c0/0x4c0 [raid1]
Sep 11 23:03:15 ps401a-901 kernel: [ 449.010688] md_thread+0x137/0x170 [md_mod]
After reverting both patches locally, I can no longer reproduce the crash.
Please drop both patches from all the stable queues.
Thx!
Jinpu Wang @ IONOS cloud
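Editorial note on the oops in the report above: the faulting address can be cross-checked from the register dump alone. The sketch below is only an illustration. It decodes the instruction marked `<48>` in the Code: line and checks that the base register plus the displacement equals CR2. The helper name `decode_mov_disp32` is hypothetical, and it handles only this one instruction encoding. The register values and opcode bytes are copied from the log. Which kernel structure the pointer belongs to is not derived here.

```python
# Register values and opcode bytes copied from the oops in the report.
RDX = 0x00000001AB000000
CR2 = 0x00000001AB000240
# Bytes starting at the faulting instruction (marked <48> in the Code: line).
FAULT_INSN = bytes.fromhex("488b8240020000")

# x86-64 register names indexed by their 3-bit encoding (no REX extension bits).
REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp32(insn: bytes):
    """Decode the single form needed here: REX.W MOV r64, [r64 + disp32].

    Hypothetical helper for illustration only, not a general disassembler.
    """
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W MOV r64, r/m64 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the disp32 form without a SIB byte is handled")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return REG_NAMES[reg], REG_NAMES[rm], disp


dest, base, disp = decode_mov_disp32(FAULT_INSN)
fault_addr = RDX + disp
print(f"mov {disp:#x}(%{base}),%{dest} -> fault at {fault_addr:#x}")
print("matches CR2:", fault_addr == CR2)
```

The load goes through RDX, and 0x00000001ab000000 is not a canonical kernel address (compare the ffff8d... pointers in RBX, RSI and RDI). That would be consistent with the code following a stale or overwritten pointer, but the arithmetic alone does not establish the cause.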
On Tue, Sep 12, 2023 at 01:46:29PM +0200, Jinpu Wang wrote:
> Hi Greg and Stable folks.
> We've noticed a regression in raid1 due to the following commits:
> 79dabfd00a2b ("md/raid1: hold the barrier until handle_read_error() finishes")
> caeed0b9f1ce ("md/raid1: free the r1bio before waiting for blocked rdev")
I'll drop them from all queues, but can you test 6.6-rc1 to be sure that all is ok there?
thanks,
greg k-h
On Tue, Sep 12, 2023 at 2:08 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> On Tue, Sep 12, 2023 at 01:46:29PM +0200, Jinpu Wang wrote:
> > Hi Greg and Stable folks.
> > We've noticed a regression in raid1 due to the following commits:
> > 79dabfd00a2b ("md/raid1: hold the barrier until handle_read_error() finishes")
> > caeed0b9f1ce ("md/raid1: free the r1bio before waiting for blocked rdev")
> I'll drop them from all queues, but can you test 6.6-rc1 to be sure that all is ok there?
Sure, I will test 6.6-rc1.
> thanks,
> greg k-h
Thx
On Tue, Sep 12, 2023 at 3:53 PM Jinpu Wang <jinpu.wang@ionos.com> wrote:
> On Tue, Sep 12, 2023 at 2:08 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > On Tue, Sep 12, 2023 at 01:46:29PM +0200, Jinpu Wang wrote:
> > > Hi Greg and Stable folks.
> > > We've noticed a regression in raid1 due to the following commits:
> > > 79dabfd00a2b ("md/raid1: hold the barrier until handle_read_error() finishes")
> > > caeed0b9f1ce ("md/raid1: free the r1bio before waiting for blocked rdev")
> > I'll drop them from all queues, but can you test 6.6-rc1 to be sure that all is ok there?
> Sure, I will test 6.6-rc1.
I ran the same tests on 6.6-rc1 and can't reproduce the problem.
> > thanks,
> > greg k-h
> Thx
On Wed, Sep 13, 2023 at 03:40:12PM +0200, Jinpu Wang wrote:
> On Tue, Sep 12, 2023 at 3:53 PM Jinpu Wang <jinpu.wang@ionos.com> wrote:
> > On Tue, Sep 12, 2023 at 2:08 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> > > On Tue, Sep 12, 2023 at 01:46:29PM +0200, Jinpu Wang wrote:
> > > > Hi Greg and Stable folks.
> > > > We've noticed a regression in raid1 due to the following commits:
> > > > 79dabfd00a2b ("md/raid1: hold the barrier until handle_read_error() finishes")
> > > > caeed0b9f1ce ("md/raid1: free the r1bio before waiting for blocked rdev")
> > > I'll drop them from all queues, but can you test 6.6-rc1 to be sure that all is ok there?
> > Sure, I will test 6.6-rc1.
> I ran the same tests on 6.6-rc1 and can't reproduce the problem.
So is 6.1-rc just missing something else? Or are these commits not needed at all for older kernels (and hence the Fixes: tag lies?)
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org