On 30.07.2024 at 17:49, Greg Kroah-Hartman wrote:
6.10-stable review patch. If anyone has any objections, please let me know.
From: Mateusz Jończyk mat.jonczyk@o2.pl
commit 36a5c03f232719eb4e2d925f4d584e09cfaf372c upstream.
Linux 6.9+ is unable to start a degraded RAID1 array with one drive, when that drive has a write-mostly flag set. During such an attempt, the following assertion in bio_split() is hit:
BUG_ON(sectors <= 0);
Call Trace:
 ? bio_split+0x96/0xb0
 ? exc_invalid_op+0x53/0x70
 ? bio_split+0x96/0xb0
 ? asm_exc_invalid_op+0x1b/0x20
 ? bio_split+0x96/0xb0
 ? raid1_read_request+0x890/0xd20
 ? __call_rcu_common.constprop.0+0x97/0x260
 raid1_make_request+0x81/0xce0
 ? __get_random_u32_below+0x17/0x70
 ? new_slab+0x2b3/0x580
 md_handle_request+0x77/0x210
 md_submit_bio+0x62/0xa0
 __submit_bio+0x17b/0x230
 submit_bio_noacct_nocheck+0x18e/0x3c0
 submit_bio_noacct+0x244/0x670
After investigation, it turned out that choose_slow_rdev() does not set the value of max_sectors in some cases and because of it, raid1_read_request calls bio_split with sectors == 0.
Fix it by filling in this variable.
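The failure mode described above can be sketched in a small, self-contained C model. This is only an illustration under stated assumptions: `toy_rdev`, `toy_choose_slow()` and `toy_read_request()` are hypothetical stand-ins invented for this sketch, not the kernel's types or functions, and the selection logic is deliberately simplified. It shows only the pattern: a helper that returns a device but leaves its output parameter untouched, so the caller keeps working with a length of 0.

```c
/* Hypothetical, simplified stand-in for a RAID1 member device. */
struct toy_rdev {
	int usable;        /* non-zero if the device can serve reads */
	int good_sectors;  /* readable sectors from the start of the request */
};

/*
 * Toy model of a "choose the slow device" helper (assumption: the real
 * helper is more involved). Returns the index of the chosen device,
 * or -1 if none is usable. When fill_max_sectors is 0 the function
 * mimics the bug: *max_sectors is never written.
 */
static int toy_choose_slow(const struct toy_rdev *devs, int n,
			   int *max_sectors, int fill_max_sectors)
{
	int best = -1;
	int best_len = 0;

	for (int i = 0; i < n; i++) {
		if (!devs[i].usable)
			continue;
		if (devs[i].good_sectors > best_len) {
			best = i;
			best_len = devs[i].good_sectors;
		}
	}

	/* The fix in this model: report the readable length to the caller. */
	if (best >= 0 && fill_max_sectors)
		*max_sectors = best_len;

	return best;
}

/*
 * Toy caller. Returns the sector count it would hand to a split
 * routine, or -1 if no device was found. A return value of 0
 * corresponds to the "sectors <= 0" condition from the report.
 */
static int toy_read_request(const struct toy_rdev *devs, int n, int fixed)
{
	int max_sectors = 0;
	int disk = toy_choose_slow(devs, n, &max_sectors, fixed);

	if (disk < 0)
		return -1;
	return max_sectors;
}
```

In this model, a single usable device with 8 readable sectors yields 0 from `toy_read_request()` when the helper skips the assignment and 8 once the helper fills it in.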
This bug was introduced in commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()") but apparently hidden until commit 0091c5a269ec ("md/raid1: factor out helpers to choose the best rdev from read_balance()") shortly thereafter.
Cc: stable@vger.kernel.org # 6.9.x+
Signed-off-by: Mateusz Jończyk mat.jonczyk@o2.pl
Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
Cc: Song Liu song@kernel.org
Cc: Yu Kuai yukuai3@huawei.com
Cc: Paul Luse paul.e.luse@linux.intel.com
Cc: Xiao Ni xni@redhat.com
Cc: Mariusz Tkaczyk mariusz.tkaczyk@linux.intel.com
Link: https://lore.kernel.org/linux-raid/20240706143038.7253-1-mat.jonczyk@o2.pl/
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Hello,
FYI, there is a second regression in Linux 6.9 - 6.11, which occurs with RAID component devices that have the write-mostly flag set when a new device is added to the array. (A write-mostly flag on a device tells the kernel to avoid reading from that device if possible. It is enabled only manually with an mdadm command line switch and can be beneficial when the devices differ in speed.) The kernel then reads from the wrong component device before it is synced, which may result in data corruption.
Link: https://lore.kernel.org/lkml/9952f532-2554-44bf-b906-4880b2e88e3a@o2.pl/T/
That regression is not caused by this patch; the two are linked only by similar functions and by the write-mostly flag being involved in both cases. However, without this patch the kernel fails to start or keep running a RAID array with a single write-mostly device, so the user cannot add another device to it, which is the operation that triggers the second regression.
Paul was of the opinion that this first patch should land nonetheless. I would like you to decide whether to ship it now or defer it.
Greetings,
Mateusz