On Thu, Mar 10, 2022 at 3:02 PM Jens Axboe <axboe@kernel.dk> wrote:
On 3/10/22 3:37 PM, Song Liu wrote:
On Thu, Mar 10, 2022 at 2:15 PM Jens Axboe <axboe@kernel.dk> wrote:
On 3/8/22 11:42 PM, Song Liu wrote:
RAID array check/repair operations benefit a lot from merging requests. If we only check the previous entry for a merge attempt, many merges will be missed. As a result, a significant regression is observed for RAID check and repair.
Fix this by checking more than just the previous entry when plug->multiple_queues == true.
This improves the check/repair speed of a 20-HDD raid6 from 19 MB/s to 103 MB/s.
Do the underlying disks not have an IO scheduler attached? Curious why the merges aren't being done there; that would be trivial when the list is flushed out. Because if the perf difference is that big, then other workloads would be suffering too if they are that sensitive to being within a plug's worth of IO.
The disks have mq-deadline by default. I also tried kyber, and the result is the same. Raid repair work sends IOs to all the HDDs in a round-robin manner. If we only check the previous request, there isn't much opportunity to merge. I guess other workloads may have different behavior?
Round robin one at a time? I feel like there's something odd or suboptimal with the raid rebuild, if it's that sensitive to plug merging.
It is not one request at a time, but more like (for raid456): read 4kB from HDD1, HDD2, HDD3..., then read another 4kB from HDD1, HDD2, HDD3, ...
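To make the effect of that ordering concrete, here is a rough toy model in Python (not kernel code). It replays the interleaved 4kB reads against a simplified plug list and compares checking only the previous entry with walking the whole list. The names Request, raid_read_pattern and plug_requests are made up for this illustration, and the model ignores the plug's request-count limit and the IO scheduler entirely.

```python
from dataclasses import dataclass

SECTORS_PER_4K = 8  # 4 KiB expressed in 512-byte sectors


@dataclass
class Request:
    disk: int    # index of the member disk (stands in for its request queue)
    start: int   # first sector
    length: int  # length in sectors

    @property
    def end(self) -> int:
        return self.start + self.length


def raid_read_pattern(n_disks, n_rounds):
    """Yield (disk, start_sector) in the order described above:
    one 4 KiB read per disk, then the next 4 KiB from each disk, and so on."""
    for rnd in range(n_rounds):
        for disk in range(n_disks):
            yield disk, rnd * SECTORS_PER_4K


def plug_requests(n_disks, n_rounds, scan_whole_plug):
    """Return the requests left in a toy plug list after attempting back merges."""
    plug = []
    for disk, start in raid_read_pattern(n_disks, n_rounds):
        # Candidates are examined newest first; without scanning, only the
        # most recently added request is considered.
        candidates = reversed(plug) if scan_whole_plug else plug[-1:]
        for rq in candidates:
            if rq.disk == disk and rq.end == start:
                rq.length += SECTORS_PER_4K  # back merge into the existing request
                break
        else:
            # No candidate was contiguous on the same disk: queue a new request.
            plug.append(Request(disk, start, SECTORS_PER_4K))
    return plug


if __name__ == "__main__":
    last_only = plug_requests(20, 8, scan_whole_plug=False)
    scanned = plug_requests(20, 8, scan_whole_plug=True)
    print(len(last_only), len(scanned))  # 160 20
```

With 20 disks and 8 rounds, the previous entry always belongs to a different disk, so nothing merges and 160 separate 4kB requests remain; walking the list leaves 20 requests of 32kB each. This only illustrates the merge opportunity, it says nothing about where the merging should happen.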
Plug merging is mainly meant to reduce the overhead of merging, and to complement what the scheduler would do. If there's a big drop in performance just by not getting as efficient merging on the plug side, that points to an issue with something else.
We introduced blk_plug_max_rq_count() to give md more opportunities to merge on the plug side, so I guess the behavior has been like this for a long time. I will take a look at the scheduler side and see whether we can just merge later, but I am not very optimistic about it.
Thanks, Song