From: Yu Kuai yukuai3@huawei.com
Changes in v2: - Add descriptions about chagnes from the original commits, in order to fix conflicts.
This set is actually a refactor, we're bacporting this set because following problems can be fixed:
https://lore.kernel.org/all/CAJpMwyjmHQLvm6zg1cmQErttNNQPDAAXPKM3xgTjMhbfts9... https://lore.kernel.org/all/ADF7D720-5764-4AF3-B68E-1845988737AA@flyingcircu...
See details in patch 6.
I'll suggest people to upgrade their kernel to v6.6+, please let me know if anyone really want this set for lower version.
Benjamin Marzinski (1): md/raid5: recheck if reshape has finished with device_lock held
Yu Kuai (5): md/md-bitmap: factor behind write counters out from bitmap_{start/end}write() md/md-bitmap: remove the last parameter for bimtap_ops->endwrite() md: add a new callback pers->bitmap_sector() md/raid5: implement pers->bitmap_sector() md/md-bitmap: move bitmap_{start, end}write to md upper layer
drivers/md/md-bitmap.c | 75 ++++++++++------- drivers/md/md-bitmap.h | 6 +- drivers/md/md.c | 26 ++++++ drivers/md/md.h | 5 ++ drivers/md/raid1.c | 35 ++------ drivers/md/raid1.h | 1 - drivers/md/raid10.c | 26 +----- drivers/md/raid10.h | 1 - drivers/md/raid5-cache.c | 4 - drivers/md/raid5.c | 174 ++++++++++++++++++++++----------------- drivers/md/raid5.h | 4 - 11 files changed, 185 insertions(+), 172 deletions(-)
From: Benjamin Marzinski bmarzins@redhat.com
commit 25b3a8237a03ec0b67b965b52d74862e77ef7115 upstream.
When handling an IO request, MD checks if a reshape is currently happening, and if so, where the IO sector is in relation to the reshape progress. MD uses conf->reshape_progress for both of these tasks. When the reshape finishes, conf->reshape_progress is set to MaxSector. If this occurs after MD checks if the reshape is currently happening but before it calls ahead_of_reshape(), then ahead_of_reshape() will end up comparing the IO sector against MaxSector. During a backwards reshape, this will make MD think the IO sector is in the area not yet reshaped, causing it to use the previous configuration, and map the IO to the sector where that data was before the reshape.
This bug can be triggered by running the lvm2 lvconvert-raid-reshape-linear_to_raid6-single-type.sh test in a loop, although it's very hard to reproduce.
Fix this by factoring the code that checks where the IO sector is in relation to the reshape out to a helper called get_reshape_loc(), which reads reshape_progress and reshape_safe while holding the device_lock, and then rechecks if the reshape has finished before calling ahead_of_reshape with the saved values.
Also use the helper during the REQ_NOWAIT check to see if the location is inside of the reshape region.
Fixes: fef9c61fdfabf ("md/raid5: change reshape-progress measurement to cope with reshaping backwards.") Signed-off-by: Benjamin Marzinski bmarzins@redhat.com Signed-off-by: Song Liu song@kernel.org Link: https://lore.kernel.org/r/20240702151802.1632010-1-bmarzins@redhat.com Signed-off-by: Yu Kuai yukuai3@huawei.com --- drivers/md/raid5.c | 64 +++++++++++++++++++++++++++++----------------- 1 file changed, 41 insertions(+), 23 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 2c7f11e57667..3923063eada9 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5972,6 +5972,39 @@ static bool reshape_disabled(struct mddev *mddev) return is_md_suspended(mddev) || !md_is_rdwr(mddev); }
+enum reshape_loc { + LOC_NO_RESHAPE, + LOC_AHEAD_OF_RESHAPE, + LOC_INSIDE_RESHAPE, + LOC_BEHIND_RESHAPE, +}; + +static enum reshape_loc get_reshape_loc(struct mddev *mddev, + struct r5conf *conf, sector_t logical_sector) +{ + sector_t reshape_progress, reshape_safe; + /* + * Spinlock is needed as reshape_progress may be + * 64bit on a 32bit platform, and so it might be + * possible to see a half-updated value + * Of course reshape_progress could change after + * the lock is dropped, so once we get a reference + * to the stripe that we think it is, we will have + * to check again. + */ + spin_lock_irq(&conf->device_lock); + reshape_progress = conf->reshape_progress; + reshape_safe = conf->reshape_safe; + spin_unlock_irq(&conf->device_lock); + if (reshape_progress == MaxSector) + return LOC_NO_RESHAPE; + if (ahead_of_reshape(mddev, logical_sector, reshape_progress)) + return LOC_AHEAD_OF_RESHAPE; + if (ahead_of_reshape(mddev, logical_sector, reshape_safe)) + return LOC_INSIDE_RESHAPE; + return LOC_BEHIND_RESHAPE; +} + static enum stripe_result make_stripe_request(struct mddev *mddev, struct r5conf *conf, struct stripe_request_ctx *ctx, sector_t logical_sector, struct bio *bi) @@ -5986,28 +6019,14 @@ static enum stripe_result make_stripe_request(struct mddev *mddev, seq = read_seqcount_begin(&conf->gen_lock);
if (unlikely(conf->reshape_progress != MaxSector)) { - /* - * Spinlock is needed as reshape_progress may be - * 64bit on a 32bit platform, and so it might be - * possible to see a half-updated value - * Of course reshape_progress could change after - * the lock is dropped, so once we get a reference - * to the stripe that we think it is, we will have - * to check again. - */ - spin_lock_irq(&conf->device_lock); - if (ahead_of_reshape(mddev, logical_sector, - conf->reshape_progress)) { - previous = 1; - } else { - if (ahead_of_reshape(mddev, logical_sector, - conf->reshape_safe)) { - spin_unlock_irq(&conf->device_lock); - ret = STRIPE_SCHEDULE_AND_RETRY; - goto out; - } + enum reshape_loc loc = get_reshape_loc(mddev, conf, + logical_sector); + if (loc == LOC_INSIDE_RESHAPE) { + ret = STRIPE_SCHEDULE_AND_RETRY; + goto out; } - spin_unlock_irq(&conf->device_lock); + if (loc == LOC_AHEAD_OF_RESHAPE) + previous = 1; }
new_sector = raid5_compute_sector(conf, logical_sector, previous, @@ -6189,8 +6208,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi) /* Bail out if conflicts with reshape and REQ_NOWAIT is set */ if ((bi->bi_opf & REQ_NOWAIT) && (conf->reshape_progress != MaxSector) && - !ahead_of_reshape(mddev, logical_sector, conf->reshape_progress) && - ahead_of_reshape(mddev, logical_sector, conf->reshape_safe)) { + get_reshape_loc(mddev, conf, logical_sector) == LOC_INSIDE_RESHAPE) { bio_wouldblock_error(bi); if (rw == WRITE) md_write_end(mddev);
From: Yu Kuai yukuai3@huawei.com
commit 08c50142a128dcb2d7060aa3b4c5db8837f7a46a upstream.
behind_write is only used in raid1, prepare to refactor bitmap_{start/end}write(), there are no functional changes.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Xiao Ni xni@redhat.com Link: https://lore.kernel.org/r/20250109015145.158868-2-yukuai1@huaweicloud.com Signed-off-by: Song Liu song@kernel.org [There is no bitmap_operations, resolve conflicts by exporting new api md_bitmap_{start,end}_behind_write] Signed-off-by: Yu Kuai yukuai3@huawei.com --- drivers/md/md-bitmap.c | 60 +++++++++++++++++++++++++--------------- drivers/md/md-bitmap.h | 6 ++-- drivers/md/raid1.c | 11 +++++--- drivers/md/raid10.c | 5 ++-- drivers/md/raid5-cache.c | 4 +-- drivers/md/raid5.c | 13 ++++----- 6 files changed, 59 insertions(+), 40 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index ba63076cd8f2..6cd50ab69c2a 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -1465,22 +1465,12 @@ __acquires(bitmap->lock) &(bitmap->bp[page].map[pageoff]); }
-int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, unsigned long sectors, int behind) +int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, + unsigned long sectors) { if (!bitmap) return 0;
- if (behind) { - int bw; - atomic_inc(&bitmap->behind_writes); - bw = atomic_read(&bitmap->behind_writes); - if (bw > bitmap->behind_writes_used) - bitmap->behind_writes_used = bw; - - pr_debug("inc write-behind count %d/%lu\n", - bw, bitmap->mddev->bitmap_info.max_write_behind); - } - while (sectors) { sector_t blocks; bitmap_counter_t *bmc; @@ -1527,20 +1517,13 @@ int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, unsigned long s } return 0; } -EXPORT_SYMBOL(md_bitmap_startwrite); +EXPORT_SYMBOL_GPL(md_bitmap_startwrite);
void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, - unsigned long sectors, int success, int behind) + unsigned long sectors, int success) { if (!bitmap) return; - if (behind) { - if (atomic_dec_and_test(&bitmap->behind_writes)) - wake_up(&bitmap->behind_wait); - pr_debug("dec write-behind count %d/%lu\n", - atomic_read(&bitmap->behind_writes), - bitmap->mddev->bitmap_info.max_write_behind); - }
while (sectors) { sector_t blocks; @@ -1580,7 +1563,7 @@ void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, sectors = 0; } } -EXPORT_SYMBOL(md_bitmap_endwrite); +EXPORT_SYMBOL_GPL(md_bitmap_endwrite);
static int __bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int degraded) @@ -1842,6 +1825,39 @@ void md_bitmap_free(struct bitmap *bitmap) } EXPORT_SYMBOL(md_bitmap_free);
+void md_bitmap_start_behind_write(struct mddev *mddev) +{ + struct bitmap *bitmap = mddev->bitmap; + int bw; + + if (!bitmap) + return; + + atomic_inc(&bitmap->behind_writes); + bw = atomic_read(&bitmap->behind_writes); + if (bw > bitmap->behind_writes_used) + bitmap->behind_writes_used = bw; + + pr_debug("inc write-behind count %d/%lu\n", + bw, bitmap->mddev->bitmap_info.max_write_behind); +} +EXPORT_SYMBOL_GPL(md_bitmap_start_behind_write); + +void md_bitmap_end_behind_write(struct mddev *mddev) +{ + struct bitmap *bitmap = mddev->bitmap; + + if (!bitmap) + return; + + if (atomic_dec_and_test(&bitmap->behind_writes)) + wake_up(&bitmap->behind_wait); + pr_debug("dec write-behind count %d/%lu\n", + atomic_read(&bitmap->behind_writes), + bitmap->mddev->bitmap_info.max_write_behind); +} +EXPORT_SYMBOL_GPL(md_bitmap_end_behind_write); + void md_bitmap_wait_behind_writes(struct mddev *mddev) { struct bitmap *bitmap = mddev->bitmap; diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h index bb9eb418780a..cc5e0b49b0b5 100644 --- a/drivers/md/md-bitmap.h +++ b/drivers/md/md-bitmap.h @@ -253,9 +253,11 @@ void md_bitmap_dirty_bits(struct bitmap *bitmap, unsigned long s, unsigned long
/* these are exported */ int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, - unsigned long sectors, int behind); + unsigned long sectors); void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, - unsigned long sectors, int success, int behind); + unsigned long sectors, int success); +void md_bitmap_start_behind_write(struct mddev *mddev); +void md_bitmap_end_behind_write(struct mddev *mddev); int md_bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int degraded); void md_bitmap_end_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int aborted); void md_bitmap_close_sync(struct bitmap *bitmap); diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index cc02e7ec72c0..ae3cafa415f2 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -419,11 +419,12 @@ static void close_write(struct r1bio *r1_bio) bio_put(r1_bio->behind_master_bio); r1_bio->behind_master_bio = NULL; } + if (test_bit(R1BIO_BehindIO, &r1_bio->state)) + md_bitmap_end_behind_write(r1_bio->mddev); /* clear the bitmap if all writes complete successfully */ md_bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector, r1_bio->sectors, - !test_bit(R1BIO_Degraded, &r1_bio->state), - test_bit(R1BIO_BehindIO, &r1_bio->state)); + !test_bit(R1BIO_Degraded, &r1_bio->state)); md_write_end(r1_bio->mddev); }
@@ -1530,8 +1531,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, alloc_behind_master_bio(r1_bio, bio); }
- md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors, - test_bit(R1BIO_BehindIO, &r1_bio->state)); + if (test_bit(R1BIO_BehindIO, &r1_bio->state)) + md_bitmap_start_behind_write(mddev); + md_bitmap_startwrite(bitmap, r1_bio->sector, + r1_bio->sectors); first_clone = 0; }
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 023413120851..7033cbff61cf 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -430,8 +430,7 @@ static void close_write(struct r10bio *r10_bio) /* clear the bitmap if all writes complete successfully */ md_bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector, r10_bio->sectors, - !test_bit(R10BIO_Degraded, &r10_bio->state), - 0); + !test_bit(R10BIO_Degraded, &r10_bio->state)); md_write_end(r10_bio->mddev); }
@@ -1554,7 +1553,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio, md_account_bio(mddev, &bio); r10_bio->master_bio = bio; atomic_set(&r10_bio->remaining, 1); - md_bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors, 0); + md_bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors);
for (i = 0; i < conf->copies; i++) { if (r10_bio->devs[i].bio) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 889bba60d6ff..763bf0dcead8 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -315,8 +315,8 @@ void r5c_handle_cached_data_endio(struct r5conf *conf, r5c_return_dev_pending_writes(conf, &sh->dev[i]); md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, RAID5_STRIPE_SECTORS(conf), - !test_bit(STRIPE_DEGRADED, &sh->state), - 0); + !test_bit(STRIPE_DEGRADED, + &sh->state)); } } } diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 3923063eada9..3484d649610d 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -3606,7 +3606,7 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi, set_bit(STRIPE_BITMAP_PENDING, &sh->state); spin_unlock_irq(&sh->stripe_lock); md_bitmap_startwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf), 0); + RAID5_STRIPE_SECTORS(conf)); spin_lock_irq(&sh->stripe_lock); clear_bit(STRIPE_BITMAP_PENDING, &sh->state); if (!sh->batch_head) { @@ -3708,7 +3708,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh, } if (bitmap_end) md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf), 0, 0); + RAID5_STRIPE_SECTORS(conf), 0); bitmap_end = 0; /* and fail all 'written' */ bi = sh->dev[i].written; @@ -3754,7 +3754,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh, } if (bitmap_end) md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf), 0, 0); + RAID5_STRIPE_SECTORS(conf), 0); /* If we were in the middle of a write the parity block might * still be locked - so just clear all R5_LOCKED flags */ @@ -4107,8 +4107,8 @@ static void handle_stripe_clean_event(struct r5conf *conf, } md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, RAID5_STRIPE_SECTORS(conf), - !test_bit(STRIPE_DEGRADED, &sh->state), - 0); + !test_bit(STRIPE_DEGRADED, + &sh->state)); if (head_sh->batch_head) { sh = list_first_entry(&sh->batch_list, struct stripe_head, @@ -5853,8 +5853,7 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi) d++) md_bitmap_startwrite(mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf), - 0); + RAID5_STRIPE_SECTORS(conf)); sh->bm_seq = conf->seq_flush + 1; set_bit(STRIPE_BIT_DELAY, &sh->state); }
From: Yu Kuai yukuai3@huawei.com
commit 4f0e7d0e03b7b80af84759a9e7cfb0f81ac4adae upstream.
For the case that IO failed for one rdev, the bit will be mark as NEEDED in following cases:
1) If badblocks is set and rdev is not faulty; 2) If rdev is faulty;
Case 1) is useless because synchronize data to badblocks make no sense. Case 2) can be replaced with mddev->degraded.
Also remove R1BIO_Degraded, R10BIO_Degraded and STRIPE_DEGRADED since case 2) no longer use them.
Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20250109015145.158868-3-yukuai1@huaweicloud.com Signed-off-by: Song Liu song@kernel.org [ Resolve minor conflicts ] Signed-off-by: Yu Kuai yukuai3@huawei.com --- drivers/md/md-bitmap.c | 19 ++++++++++--------- drivers/md/md-bitmap.h | 2 +- drivers/md/raid1.c | 27 +++------------------------ drivers/md/raid1.h | 1 - drivers/md/raid10.c | 23 +++-------------------- drivers/md/raid10.h | 1 - drivers/md/raid5-cache.c | 4 +--- drivers/md/raid5.c | 14 +++----------- drivers/md/raid5.h | 1 - 9 files changed, 21 insertions(+), 71 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index 6cd50ab69c2a..1bb99102f7cc 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -1520,7 +1520,7 @@ int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, EXPORT_SYMBOL_GPL(md_bitmap_startwrite);
void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, - unsigned long sectors, int success) + unsigned long sectors) { if (!bitmap) return; @@ -1537,15 +1537,16 @@ void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, return; }
- if (success && !bitmap->mddev->degraded && - bitmap->events_cleared < bitmap->mddev->events) { - bitmap->events_cleared = bitmap->mddev->events; - bitmap->need_sync = 1; - sysfs_notify_dirent_safe(bitmap->sysfs_can_clear); - } - - if (!success && !NEEDED(*bmc)) + if (!bitmap->mddev->degraded) { + if (bitmap->events_cleared < bitmap->mddev->events) { + bitmap->events_cleared = bitmap->mddev->events; + bitmap->need_sync = 1; + sysfs_notify_dirent_safe( + bitmap->sysfs_can_clear); + } + } else if (!NEEDED(*bmc)) { *bmc |= NEEDED_MASK; + }
if (COUNTER(*bmc) == COUNTER_MAX) wake_up(&bitmap->overflow_wait); diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h index cc5e0b49b0b5..8b89e260a93b 100644 --- a/drivers/md/md-bitmap.h +++ b/drivers/md/md-bitmap.h @@ -255,7 +255,7 @@ void md_bitmap_dirty_bits(struct bitmap *bitmap, unsigned long s, unsigned long int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, unsigned long sectors); void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, - unsigned long sectors, int success); + unsigned long sectors); void md_bitmap_start_behind_write(struct mddev *mddev); void md_bitmap_end_behind_write(struct mddev *mddev); int md_bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int degraded); diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index ae3cafa415f2..b5601acc810f 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -423,8 +423,7 @@ static void close_write(struct r1bio *r1_bio) md_bitmap_end_behind_write(r1_bio->mddev); /* clear the bitmap if all writes complete successfully */ md_bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector, - r1_bio->sectors, - !test_bit(R1BIO_Degraded, &r1_bio->state)); + r1_bio->sectors); md_write_end(r1_bio->mddev); }
@@ -481,8 +480,6 @@ static void raid1_end_write_request(struct bio *bio) if (!test_bit(Faulty, &rdev->flags)) set_bit(R1BIO_WriteError, &r1_bio->state); else { - /* Fail the request */ - set_bit(R1BIO_Degraded, &r1_bio->state); /* Finished with this branch */ r1_bio->bios[mirror] = NULL; to_put = bio; @@ -1415,11 +1412,8 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, break; } r1_bio->bios[i] = NULL; - if (!rdev || test_bit(Faulty, &rdev->flags)) { - if (i < conf->raid_disks) - set_bit(R1BIO_Degraded, &r1_bio->state); + if (!rdev || test_bit(Faulty, &rdev->flags)) continue; - }
atomic_inc(&rdev->nr_pending); if (test_bit(WriteErrorSeen, &rdev->flags)) { @@ -1445,16 +1439,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, */ max_sectors = bad_sectors; rdev_dec_pending(rdev, mddev); - /* We don't set R1BIO_Degraded as that - * only applies if the disk is - * missing, so it might be re-added, - * and we want to know to recover this - * chunk. - * In this case the device is here, - * and the fact that this chunk is not - * in-sync is recorded in the bad - * block log - */ continue; } if (is_bad) { @@ -2479,12 +2463,9 @@ static void handle_write_finished(struct r1conf *conf, struct r1bio *r1_bio) * errors. */ fail = true; - if (!narrow_write_error(r1_bio, m)) { + if (!narrow_write_error(r1_bio, m)) md_error(conf->mddev, conf->mirrors[m].rdev); - /* an I/O failed, we can't clear the bitmap */ - set_bit(R1BIO_Degraded, &r1_bio->state); - } rdev_dec_pending(conf->mirrors[m].rdev, conf->mddev); } @@ -2576,8 +2557,6 @@ static void raid1d(struct md_thread *thread) list_del(&r1_bio->retry_list); idx = sector_to_idx(r1_bio->sector); atomic_dec(&conf->nr_queued[idx]); - if (mddev->degraded) - set_bit(R1BIO_Degraded, &r1_bio->state); if (test_bit(R1BIO_WriteError, &r1_bio->state)) close_write(r1_bio); raid_end_bio_io(r1_bio); diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h index 14d4211a123a..44f2390a8866 100644 --- a/drivers/md/raid1.h +++ b/drivers/md/raid1.h @@ -187,7 +187,6 @@ struct r1bio { enum r1bio_state { R1BIO_Uptodate, R1BIO_IsSync, - R1BIO_Degraded, R1BIO_BehindIO, /* Set ReadError on bios that experience a readerror so that * raid1d knows what to do with them. diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 7033cbff61cf..0b04ae46b52e 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -429,8 +429,7 @@ static void close_write(struct r10bio *r10_bio) { /* clear the bitmap if all writes complete successfully */ md_bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector, - r10_bio->sectors, - !test_bit(R10BIO_Degraded, &r10_bio->state)); + r10_bio->sectors); md_write_end(r10_bio->mddev); }
@@ -500,7 +499,6 @@ static void raid10_end_write_request(struct bio *bio) set_bit(R10BIO_WriteError, &r10_bio->state); else { /* Fail the request */ - set_bit(R10BIO_Degraded, &r10_bio->state); r10_bio->devs[slot].bio = NULL; to_put = bio; dec_rdev = 1; @@ -1489,10 +1487,8 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio, r10_bio->devs[i].bio = NULL; r10_bio->devs[i].repl_bio = NULL;
- if (!rdev && !rrdev) { - set_bit(R10BIO_Degraded, &r10_bio->state); + if (!rdev && !rrdev) continue; - } if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) { sector_t first_bad; sector_t dev_sector = r10_bio->devs[i].addr; @@ -1509,14 +1505,6 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio, * to other devices yet */ max_sectors = bad_sectors; - /* We don't set R10BIO_Degraded as that - * only applies if the disk is missing, - * so it might be re-added, and we want to - * know to recover this chunk. - * In this case the device is here, and the - * fact that this chunk is not in-sync is - * recorded in the bad block log. - */ continue; } if (is_bad) { @@ -3062,11 +3050,8 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio) rdev_dec_pending(rdev, conf->mddev); } else if (bio != NULL && bio->bi_status) { fail = true; - if (!narrow_write_error(r10_bio, m)) { + if (!narrow_write_error(r10_bio, m)) md_error(conf->mddev, rdev); - set_bit(R10BIO_Degraded, - &r10_bio->state); - } rdev_dec_pending(rdev, conf->mddev); } bio = r10_bio->devs[m].repl_bio; @@ -3125,8 +3110,6 @@ static void raid10d(struct md_thread *thread) r10_bio = list_first_entry(&tmp, struct r10bio, retry_list); list_del(&r10_bio->retry_list); - if (mddev->degraded) - set_bit(R10BIO_Degraded, &r10_bio->state);
if (test_bit(R10BIO_WriteError, &r10_bio->state)) diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h index 2e75e88d0802..3f16ad6904a9 100644 --- a/drivers/md/raid10.h +++ b/drivers/md/raid10.h @@ -161,7 +161,6 @@ enum r10bio_state { R10BIO_IsSync, R10BIO_IsRecover, R10BIO_IsReshape, - R10BIO_Degraded, /* Set ReadError on bios that experience a read error * so that raid10d knows what to do with them. */ diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 763bf0dcead8..8a0c8e78891f 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -314,9 +314,7 @@ void r5c_handle_cached_data_endio(struct r5conf *conf, set_bit(R5_UPTODATE, &sh->dev[i].flags); r5c_return_dev_pending_writes(conf, &sh->dev[i]); md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf), - !test_bit(STRIPE_DEGRADED, - &sh->state)); + RAID5_STRIPE_SECTORS(conf)); } } } diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 3484d649610d..b2d0f35eec63 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1359,8 +1359,6 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s) submit_bio_noacct(rbi); } if (!rdev && !rrdev) { - if (op_is_write(op)) - set_bit(STRIPE_DEGRADED, &sh->state); pr_debug("skip op %d on disc %d for sector %llu\n", bi->bi_opf, i, (unsigned long long)sh->sector); clear_bit(R5_LOCKED, &sh->dev[i].flags); @@ -2925,7 +2923,6 @@ static void raid5_end_write_request(struct bio *bi) set_bit(R5_MadeGoodRepl, &sh->dev[i].flags); } else { if (bi->bi_status) { - set_bit(STRIPE_DEGRADED, &sh->state); set_bit(WriteErrorSeen, &rdev->flags); set_bit(R5_WriteError, &sh->dev[i].flags); if (!test_and_set_bit(WantReplacement, &rdev->flags)) @@ -3708,7 +3705,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh, } if (bitmap_end) md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf), 0); + RAID5_STRIPE_SECTORS(conf)); bitmap_end = 0; /* and fail all 'written' */ bi = sh->dev[i].written; @@ -3754,7 +3751,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh, } if (bitmap_end) md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf), 0); + RAID5_STRIPE_SECTORS(conf)); /* If we were in the middle of a write the parity block might * still be locked - so just clear all R5_LOCKED flags */ @@ -4106,9 +4103,7 @@ static void handle_stripe_clean_event(struct r5conf *conf, wbi = wbi2; } md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf), - !test_bit(STRIPE_DEGRADED, - &sh->state)); + RAID5_STRIPE_SECTORS(conf)); if (head_sh->batch_head) { sh = list_first_entry(&sh->batch_list, struct stripe_head, @@ -4385,7 +4380,6 @@ static void handle_parity_checks5(struct r5conf *conf, struct stripe_head *sh, s->locked++; set_bit(R5_Wantwrite, &dev->flags);
- clear_bit(STRIPE_DEGRADED, &sh->state); set_bit(STRIPE_INSYNC, &sh->state); break; case check_state_run: @@ -4542,7 +4536,6 @@ static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh, clear_bit(R5_Wantwrite, &dev->flags); s->locked--; } - clear_bit(STRIPE_DEGRADED, &sh->state);
set_bit(STRIPE_INSYNC, &sh->state); break; @@ -4951,7 +4944,6 @@ static void break_stripe_batch_list(struct stripe_head *head_sh,
set_mask_bits(&sh->state, ~(STRIPE_EXPAND_SYNC_FLAGS | (1 << STRIPE_PREREAD_ACTIVE) | - (1 << STRIPE_DEGRADED) | (1 << STRIPE_ON_UNPLUG_LIST)), head_sh->state & (1 << STRIPE_INSYNC));
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h index 97a795979a35..80948057b877 100644 --- a/drivers/md/raid5.h +++ b/drivers/md/raid5.h @@ -358,7 +358,6 @@ enum { STRIPE_REPLACED, STRIPE_PREREAD_ACTIVE, STRIPE_DELAYED, - STRIPE_DEGRADED, STRIPE_BIT_DELAY, STRIPE_EXPANDING, STRIPE_EXPAND_SOURCE,
From: Yu Kuai yukuai3@huawei.com
commit 0c984a283a3ea3f10bebecd6c57c1d41b2e4f518 upstream.
This callback will be used in raid5 to convert io ranges from array to bitmap.
Signed-off-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Xiao Ni xni@redhat.com Link: https://lore.kernel.org/r/20250109015145.158868-4-yukuai1@huaweicloud.com Signed-off-by: Song Liu song@kernel.org --- drivers/md/md.h | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/md/md.h b/drivers/md/md.h index 7c9c13abd7ca..f395f4562bb9 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -661,6 +661,9 @@ struct md_personality void *(*takeover) (struct mddev *mddev); /* Changes the consistency policy of an active array. */ int (*change_consistency_policy)(struct mddev *mddev, const char *buf); + /* convert io ranges from array to bitmap */ + void (*bitmap_sector)(struct mddev *mddev, sector_t *offset, + unsigned long *sectors); };
struct md_sysfs_entry {
From: Yu Kuai yukuai3@huawei.com
commit 9c89f604476cf15c31fbbdb043cff7fbf1dbe0cb upstream.
Bitmap is used for the whole array for raid1/raid10, hence IO for the array can be used directly for bitmap. However, bitmap is used for underlying disks for raid5, hence IO for the array can't be used directly for bitmap.
Implement pers->bitmap_sector() for raid5 to convert IO ranges from the array to the underlying disks.
Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20250109015145.158868-5-yukuai1@huaweicloud.com Signed-off-by: Song Liu song@kernel.org [ Resolve minor conflicts ] Signed-off-by: Yu Kuai yukuai3@huawei.com --- drivers/md/raid5.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index b2d0f35eec63..0918386bb8ea 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5996,6 +5996,54 @@ static enum reshape_loc get_reshape_loc(struct mddev *mddev, return LOC_BEHIND_RESHAPE; }
+static void raid5_bitmap_sector(struct mddev *mddev, sector_t *offset, + unsigned long *sectors) +{ + struct r5conf *conf = mddev->private; + sector_t start = *offset; + sector_t end = start + *sectors; + sector_t prev_start = start; + sector_t prev_end = end; + int sectors_per_chunk; + enum reshape_loc loc; + int dd_idx; + + sectors_per_chunk = conf->chunk_sectors * + (conf->raid_disks - conf->max_degraded); + start = round_down(start, sectors_per_chunk); + end = round_up(end, sectors_per_chunk); + + start = raid5_compute_sector(conf, start, 0, &dd_idx, NULL); + end = raid5_compute_sector(conf, end, 0, &dd_idx, NULL); + + /* + * For LOC_INSIDE_RESHAPE, this IO will wait for reshape to make + * progress, hence it's the same as LOC_BEHIND_RESHAPE. + */ + loc = get_reshape_loc(mddev, conf, prev_start); + if (likely(loc != LOC_AHEAD_OF_RESHAPE)) { + *offset = start; + *sectors = end - start; + return; + } + + sectors_per_chunk = conf->prev_chunk_sectors * + (conf->previous_raid_disks - conf->max_degraded); + prev_start = round_down(prev_start, sectors_per_chunk); + prev_end = round_down(prev_end, sectors_per_chunk); + + prev_start = raid5_compute_sector(conf, prev_start, 1, &dd_idx, NULL); + prev_end = raid5_compute_sector(conf, prev_end, 1, &dd_idx, NULL); + + /* + * for LOC_AHEAD_OF_RESHAPE, reshape can make progress before this IO + * is handled in make_stripe_request(), we can't know this here hence + * we set bits for both. + */ + *offset = min(start, prev_start); + *sectors = max(end, prev_end) - *offset; +} + static enum stripe_result make_stripe_request(struct mddev *mddev, struct r5conf *conf, struct stripe_request_ctx *ctx, sector_t logical_sector, struct bio *bi) @@ -9099,6 +9147,7 @@ static struct md_personality raid6_personality = .quiesce = raid5_quiesce, .takeover = raid6_takeover, .change_consistency_policy = raid5_change_consistency_policy, + .bitmap_sector = raid5_bitmap_sector, }; static struct md_personality raid5_personality = { @@ -9124,6 +9173,7 @@ static struct md_personality raid5_personality = .quiesce = raid5_quiesce, .takeover = raid5_takeover, .change_consistency_policy = raid5_change_consistency_policy, + .bitmap_sector = raid5_bitmap_sector, };
static struct md_personality raid4_personality = @@ -9150,6 +9200,7 @@ static struct md_personality raid4_personality = .quiesce = raid5_quiesce, .takeover = raid4_takeover, .change_consistency_policy = raid5_change_consistency_policy, + .bitmap_sector = raid5_bitmap_sector, };
static int __init raid5_init(void)
From: Yu Kuai yukuai3@huawei.com
commit cd5fc653381811f1e0ba65f5d169918cab61476f upstream.
There are two BUG reports that raid5 will hang at bitmap_startwrite([1],[2]), root cause is that bitmap start write and end write is unbalanced, it's not quite clear where, and while reviewing raid5 code, it's found that bitmap operations can be optimized. For example, for a 4 disks raid5, with chunksize=8k, if user issue a IO (0 + 48k) to the array:
┌────────────────────────────────────────────────────────────┐ │chunk 0 │ │ ┌────────────┬─────────────┬─────────────┬────────────┼ │ sh0 │A0: 0 + 4k │A1: 8k + 4k │A2: 16k + 4k │A3: P │ │ ┼────────────┼─────────────┼─────────────┼────────────┼ │ sh1 │B0: 4k + 4k │B1: 12k + 4k │B2: 20k + 4k │B3: P │ ┼──────┴────────────┴─────────────┴─────────────┴────────────┼ │chunk 1 │ │ ┌────────────┬─────────────┬─────────────┬────────────┤ │ sh2 │C0: 24k + 4k│C1: 32k + 4k │C2: P │C3: 40k + 4k│ │ ┼────────────┼─────────────┼─────────────┼────────────┼ │ sh3 │D0: 28k + 4k│D1: 36k + 4k │D2: P │D3: 44k + 4k│ └──────┴────────────┴─────────────┴─────────────┴────────────┘
Before this patch, 4 stripe head will be used, and each sh will attach bio for 3 disks, and each attached bio will trigger bitmap_startwrite() once, which means total 12 times. - 3 times (0 + 4k), for (A0, A1 and A2) - 3 times (4 + 4k), for (B0, B1 and B2) - 3 times (8 + 4k), for (C0, C1 and C3) - 3 times (12 + 4k), for (D0, D1 and D3)
After this patch, md upper layer will calculate that IO range (0 + 48k) is corresponding to the bitmap (0 + 16k), and call bitmap_startwrite() just once.
Noted that this patch will align bitmap ranges to the chunks, for example, if user issue a IO (0 + 4k) to array:
- Before this patch, 1 time (0 + 4k), for A0; - After this patch, 1 time (0 + 8k) for chunk 0;
Usually, one bitmap bit will represent more than one disk chunk, and this doesn't have any difference. And even if user really created a array that one chunk contain multiple bits, the overhead is that more data will be recovered after power failure.
Also remove STRIPE_BITMAP_PENDING since it's not used anymore.
[1] https://lore.kernel.org/all/CAJpMwyjmHQLvm6zg1cmQErttNNQPDAAXPKM3xgTjMhbfts9... [2] https://lore.kernel.org/all/ADF7D720-5764-4AF3-B68E-1845988737AA@flyingcircu...
Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20250109015145.158868-6-yukuai1@huaweicloud.com Signed-off-by: Song Liu song@kernel.org [There is no bitmap_operations, resolve conflicts by replacing bitmap_ops->{startwrite, endwrite} with md_bitmap_{startwrite, endwrite}] Signed-off-by: Yu Kuai yukuai3@huawei.com --- drivers/md/md-bitmap.c | 2 -- drivers/md/md.c | 26 +++++++++++++++++++++ drivers/md/md.h | 2 ++ drivers/md/raid1.c | 5 ---- drivers/md/raid10.c | 4 ---- drivers/md/raid5-cache.c | 2 -- drivers/md/raid5.c | 50 ++++------------------------------------ drivers/md/raid5.h | 3 --- 8 files changed, 33 insertions(+), 61 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index 1bb99102f7cc..2085b1705f14 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -1517,7 +1517,6 @@ int md_bitmap_startwrite(struct bitmap *bitmap, sector_t offset, } return 0; } -EXPORT_SYMBOL_GPL(md_bitmap_startwrite);
void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, unsigned long sectors) @@ -1564,7 +1563,6 @@ void md_bitmap_endwrite(struct bitmap *bitmap, sector_t offset, sectors = 0; } } -EXPORT_SYMBOL_GPL(md_bitmap_endwrite);
static int __bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int degraded) diff --git a/drivers/md/md.c b/drivers/md/md.c index d1f6770c5cc0..9bc19a5a4119 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8713,12 +8713,32 @@ void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev, } EXPORT_SYMBOL_GPL(md_submit_discard_bio);
+static void md_bitmap_start(struct mddev *mddev, + struct md_io_clone *md_io_clone) +{ + if (mddev->pers->bitmap_sector) + mddev->pers->bitmap_sector(mddev, &md_io_clone->offset, + &md_io_clone->sectors); + + md_bitmap_startwrite(mddev->bitmap, md_io_clone->offset, + md_io_clone->sectors); +} + +static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone) +{ + md_bitmap_endwrite(mddev->bitmap, md_io_clone->offset, + md_io_clone->sectors); +} + static void md_end_clone_io(struct bio *bio) { struct md_io_clone *md_io_clone = bio->bi_private; struct bio *orig_bio = md_io_clone->orig_bio; struct mddev *mddev = md_io_clone->mddev;
+ if (bio_data_dir(orig_bio) == WRITE && mddev->bitmap) + md_bitmap_end(mddev, md_io_clone); + if (bio->bi_status && !orig_bio->bi_status) orig_bio->bi_status = bio->bi_status;
@@ -8743,6 +8763,12 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio) if (blk_queue_io_stat(bdev->bd_disk->queue)) md_io_clone->start_time = bio_start_io_acct(*bio);
+ if (bio_data_dir(*bio) == WRITE && mddev->bitmap) { + md_io_clone->offset = (*bio)->bi_iter.bi_sector; + md_io_clone->sectors = bio_sectors(*bio); + md_bitmap_start(mddev, md_io_clone); + } + clone->bi_end_io = md_end_clone_io; clone->bi_private = md_io_clone; *bio = clone; diff --git a/drivers/md/md.h b/drivers/md/md.h index f395f4562bb9..f29fa8650cd0 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -746,6 +746,8 @@ struct md_io_clone { struct mddev *mddev; struct bio *orig_bio; unsigned long start_time; + sector_t offset; + unsigned long sectors; struct bio bio_clone; };
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index b5601acc810f..65309da1dca3 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -421,9 +421,6 @@ static void close_write(struct r1bio *r1_bio) } if (test_bit(R1BIO_BehindIO, &r1_bio->state)) md_bitmap_end_behind_write(r1_bio->mddev); - /* clear the bitmap if all writes complete successfully */ - md_bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector, - r1_bio->sectors); md_write_end(r1_bio->mddev); }
@@ -1517,8 +1514,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
if (test_bit(R1BIO_BehindIO, &r1_bio->state)) md_bitmap_start_behind_write(mddev); - md_bitmap_startwrite(bitmap, r1_bio->sector, - r1_bio->sectors); first_clone = 0; }
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 0b04ae46b52e..c300fd609ef0 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -427,9 +427,6 @@ static void raid10_end_read_request(struct bio *bio)
static void close_write(struct r10bio *r10_bio) { - /* clear the bitmap if all writes complete successfully */ - md_bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector, - r10_bio->sectors); md_write_end(r10_bio->mddev); }
@@ -1541,7 +1538,6 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio, md_account_bio(mddev, &bio); r10_bio->master_bio = bio; atomic_set(&r10_bio->remaining, 1); - md_bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors);
for (i = 0; i < conf->copies; i++) { if (r10_bio->devs[i].bio) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 8a0c8e78891f..53f3718c01eb 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -313,8 +313,6 @@ void r5c_handle_cached_data_endio(struct r5conf *conf, if (sh->dev[i].written) { set_bit(R5_UPTODATE, &sh->dev[i].flags); r5c_return_dev_pending_writes(conf, &sh->dev[i]); - md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf)); } } } diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 0918386bb8ea..f69e4a6a8a59 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -905,7 +905,6 @@ static bool stripe_can_batch(struct stripe_head *sh) if (raid5_has_log(conf) || raid5_has_ppl(conf)) return false; return test_bit(STRIPE_BATCH_READY, &sh->state) && - !test_bit(STRIPE_BITMAP_PENDING, &sh->state) && is_full_stripe_write(sh); }
@@ -3587,29 +3586,9 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi, (*bip)->bi_iter.bi_sector, sh->sector, dd_idx, sh->dev[dd_idx].sector);
- if (conf->mddev->bitmap && firstwrite) { - /* Cannot hold spinlock over bitmap_startwrite, - * but must ensure this isn't added to a batch until - * we have added to the bitmap and set bm_seq. - * So set STRIPE_BITMAP_PENDING to prevent - * batching. - * If multiple __add_stripe_bio() calls race here they - * much all set STRIPE_BITMAP_PENDING. So only the first one - * to complete "bitmap_startwrite" gets to set - * STRIPE_BIT_DELAY. This is important as once a stripe - * is added to a batch, STRIPE_BIT_DELAY cannot be changed - * any more. - */ - set_bit(STRIPE_BITMAP_PENDING, &sh->state); - spin_unlock_irq(&sh->stripe_lock); - md_bitmap_startwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf)); - spin_lock_irq(&sh->stripe_lock); - clear_bit(STRIPE_BITMAP_PENDING, &sh->state); - if (!sh->batch_head) { - sh->bm_seq = conf->seq_flush+1; - set_bit(STRIPE_BIT_DELAY, &sh->state); - } + if (conf->mddev->bitmap && firstwrite && !sh->batch_head) { + sh->bm_seq = conf->seq_flush+1; + set_bit(STRIPE_BIT_DELAY, &sh->state); } }
@@ -3660,7 +3639,6 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh, BUG_ON(sh->batch_head); for (i = disks; i--; ) { struct bio *bi; - int bitmap_end = 0;
if (test_bit(R5_ReadError, &sh->dev[i].flags)) { struct md_rdev *rdev; @@ -3687,8 +3665,6 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh, sh->dev[i].towrite = NULL; sh->overwrite_disks = 0; spin_unlock_irq(&sh->stripe_lock); - if (bi) - bitmap_end = 1;
log_stripe_write_finished(sh);
@@ -3703,10 +3679,6 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh, bio_io_error(bi); bi = nextbi; } - if (bitmap_end) - md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf)); - bitmap_end = 0; /* and fail all 'written' */ bi = sh->dev[i].written; sh->dev[i].written = NULL; @@ -3715,7 +3687,6 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh, sh->dev[i].page = sh->dev[i].orig_page; }
- if (bi) bitmap_end = 1; while (bi && bi->bi_iter.bi_sector < sh->dev[i].sector + RAID5_STRIPE_SECTORS(conf)) { struct bio *bi2 = r5_next_bio(conf, bi, sh->dev[i].sector); @@ -3749,9 +3720,6 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh, bi = nextbi; } } - if (bitmap_end) - md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf)); /* If we were in the middle of a write the parity block might * still be locked - so just clear all R5_LOCKED flags */ @@ -4102,8 +4070,7 @@ static void handle_stripe_clean_event(struct r5conf *conf, bio_endio(wbi); wbi = wbi2; } - md_bitmap_endwrite(conf->mddev->bitmap, sh->sector, - RAID5_STRIPE_SECTORS(conf)); + if (head_sh->batch_head) { sh = list_first_entry(&sh->batch_list, struct stripe_head, @@ -4935,8 +4902,7 @@ static void break_stripe_batch_list(struct stripe_head *head_sh, (1 << STRIPE_COMPUTE_RUN) | (1 << STRIPE_DISCARD) | (1 << STRIPE_BATCH_READY) | - (1 << STRIPE_BATCH_ERR) | - (1 << STRIPE_BITMAP_PENDING)), + (1 << STRIPE_BATCH_ERR)), "stripe state: %lx\n", sh->state); WARN_ONCE(head_sh->state & ((1 << STRIPE_DISCARD) | (1 << STRIPE_REPLACED)), @@ -5840,12 +5806,6 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi) } spin_unlock_irq(&sh->stripe_lock); if (conf->mddev->bitmap) { - for (d = 0; - d < conf->raid_disks - conf->max_degraded; - d++) - md_bitmap_startwrite(mddev->bitmap, - sh->sector, - RAID5_STRIPE_SECTORS(conf)); sh->bm_seq = conf->seq_flush + 1; set_bit(STRIPE_BIT_DELAY, &sh->state); } diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h index 80948057b877..fd6171553880 100644 --- a/drivers/md/raid5.h +++ b/drivers/md/raid5.h @@ -371,9 +371,6 @@ enum { STRIPE_ON_RELEASE_LIST, STRIPE_BATCH_READY, STRIPE_BATCH_ERR, - STRIPE_BITMAP_PENDING, /* Being added to bitmap, don't add - * to batch yet. - */ STRIPE_LOG_TRAPPED, /* trapped into log (see raid5-cache.c) * this bit is used in two scenarios: *
linux-stable-mirror@lists.linaro.org