We found an abnormally high latency when executing modprobe raid6_pq, the latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
How to reproduce: - Install cyclictest sudo apt install rt-tests - Run cyclictest example in one terminal sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m - Modprobe raid6_pq in another terminal sudo modprobe raid6_pq
This is caused by ksoftirqd fail to scheduled due to disable preemption, this time is too long and unreasonable.
Reduce high latency by using migrate_disabl()/emigrate_enable() instead of preempt_disable()/preempt_enable(), the latency won't greater than 100us.
This patch beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for CONFIG_PREEMPT_VOLUNTARY=y.
Cc: stable@vger.kernel.org Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions") Fixes: cc4589ebfae6 ("Rename raid6 files now they're in a 'raid6' directory.") Link: https://lore.kernel.org/linux-raid/b06c5e3ef3413f12a2c2b2a241005af9@linux.de... # v1 Signed-off-by: Yajun Deng yajun.deng@linux.dev --- lib/raid6/algos.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c index 6d5e5000fdd7..21611d05c34c 100644 --- a/lib/raid6/algos.c +++ b/lib/raid6/algos.c @@ -162,7 +162,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
perf = 0;
- preempt_disable(); + migrate_disable(); j0 = jiffies; while ((j1 = jiffies) == j0) cpu_relax(); @@ -171,7 +171,7 @@ static inline const struct raid6_calls *raid6_choose_gen( (*algo)->gen_syndrome(disks, PAGE_SIZE, *dptrs); perf++; } - preempt_enable(); + migrate_enable();
if (perf > bestgenperf) { bestgenperf = perf; @@ -186,7 +186,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
perf = 0;
- preempt_disable(); + migrate_disable(); j0 = jiffies; while ((j1 = jiffies) == j0) cpu_relax(); @@ -196,7 +196,7 @@ static inline const struct raid6_calls *raid6_choose_gen( PAGE_SIZE, *dptrs); perf++; } - preempt_enable(); + migrate_enable();
if (best == *algo) bestxorperf = perf;
On 2021-12-17 10:16:10 [+0800], Yajun Deng wrote:
We found an abnormally high latency when executing modprobe raid6_pq, the latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
How to reproduce:
- Install cyclictest sudo apt install rt-tests
- Run cyclictest example in one terminal sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m
- Modprobe raid6_pq in another terminal sudo modprobe raid6_pq
This is caused by ksoftirqd fail to scheduled due to disable preemption, this time is too long and unreasonable.
Reduce high latency by using migrate_disabl()/emigrate_enable() instead of preempt_disable()/preempt_enable(), the latency won't greater than 100us.
This patch beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for CONFIG_PREEMPT_VOLUNTARY=y.
Why does it matter? This is only during boot-up/ module loading or do I miss something? The delay is a jiffy so it depends on CONFIG_HZ. You do benchmark for the best algorithm and if you get preempted during that period then your results may be wrong and you make a bad selection.
You can either enable one algorithm and or disable CONFIG_RAID6_PQ_BENCHMARK. I don't see the need for this patch not to mention the stable tree.
Sebastian
On Fri, Dec 17, 2021 at 5:42 AM Sebastian Andrzej Siewior bigeasy@linutronix.de wrote:
On 2021-12-17 10:16:10 [+0800], Yajun Deng wrote:
We found an abnormally high latency when executing modprobe raid6_pq, the latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
How to reproduce:
- Install cyclictest sudo apt install rt-tests
- Run cyclictest example in one terminal sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m
- Modprobe raid6_pq in another terminal sudo modprobe raid6_pq
This is caused by ksoftirqd fail to scheduled due to disable preemption, this time is too long and unreasonable.
Reduce high latency by using migrate_disabl()/emigrate_enable() instead of preempt_disable()/preempt_enable(), the latency won't greater than 100us.
This patch beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for CONFIG_PREEMPT_VOLUNTARY=y.
Why does it matter? This is only during boot-up/ module loading or do I miss something?
Yes this only happens on boot-up and module loading.I don't know RT well enough to tell whether latency during module loading is an issue.
The delay is a jiffy so it depends on CONFIG_HZ. You do benchmark for the best algorithm and if you get preempted during that period then your results may be wrong and you make a bad selection.
With current code, the delay _should be_ 16 jiffies. However, the experiment hits way longer latencies. I agree this may cause inaccurate benchmark results and thus suboptimal RAID algorithm.
I guess the key question is whether long latency at module loading time matters. If that doesn't matter, we should just drop this.
Thanks, Song
On 2021-12-17 09:25:25 [-0800], Song Liu wrote:
The delay is a jiffy so it depends on CONFIG_HZ. You do benchmark for the best algorithm and if you get preempted during that period then your results may be wrong and you make a bad selection.
With current code, the delay _should be_ 16 jiffies. However, the experiment hits way longer latencies. I agree this may cause inaccurate benchmark results and thus suboptimal RAID algorithm.
Everything less than CONFIG_PREEMPT does not have an explicit requirement for preemption so higher latencies are not unusual. *If* this is a problem on <= PREEMPT_VOLUNTARY then a cond_resched() between loops would be the usual thing to do. But only *if* it is a real problem which I doubt. It is not a preemtible kernel after all…
I guess the key question is whether long latency at module loading time matters. If that doesn't matter, we should just drop this.
Correct. And should this be problematic on PREEMPT_RT then I would restrict CONFIG_RAID6_PQ_BENCHMARK to !PREEMPT_RT.
Thanks, Song
Sebastian
On Fri, Dec 17, 2021 at 10:57 PM Song Liu song@kernel.org wrote:
On Fri, Dec 17, 2021 at 5:42 AM Sebastian Andrzej Siewior bigeasy@linutronix.de wrote:
On 2021-12-17 10:16:10 [+0800], Yajun Deng wrote:
We found an abnormally high latency when executing modprobe raid6_pq, the latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
How to reproduce:
- Install cyclictest sudo apt install rt-tests
- Run cyclictest example in one terminal sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m
- Modprobe raid6_pq in another terminal sudo modprobe raid6_pq
This is caused by ksoftirqd fail to scheduled due to disable preemption, this time is too long and unreasonable.
Reduce high latency by using migrate_disabl()/emigrate_enable() instead of preempt_disable()/preempt_enable(), the latency won't greater than 100us.
This patch beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for CONFIG_PREEMPT_VOLUNTARY=y.
Why does it matter? This is only during boot-up/ module loading or do I miss something?
Yes this only happens on boot-up and module loading.I don't know RT well enough to tell whether latency during module loading is an issue.
Nope. It is not.
The delay is a jiffy so it depends on CONFIG_HZ. You do benchmark for the best algorithm and if you get preempted during that period then your results may be wrong and you make a bad selection.
With current code, the delay _should be_ 16 jiffies. However, the experiment hits way longer latencies. I agree this may cause inaccurate benchmark results and thus suboptimal RAID algorithm.
I explained this in the original thread. All the observed latencies are really expected.
I guess the key question is whether long latency at module loading time matters. If that doesn't matter, we should just drop this.
Again, it does not matter at all and here it is rather desired by design.
Drop this, please.
--nX
Thanks, Song
On Fri, Dec 17, 2021 at 8:09 PM Sebastian Andrzej Siewior bigeasy@linutronix.de wrote:
On 2021-12-17 10:16:10 [+0800], Yajun Deng wrote:
We found an abnormally high latency when executing modprobe raid6_pq, the latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
How to reproduce:
- Install cyclictest sudo apt install rt-tests
- Run cyclictest example in one terminal sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m
- Modprobe raid6_pq in another terminal sudo modprobe raid6_pq
This is caused by ksoftirqd fail to scheduled due to disable preemption, this time is too long and unreasonable.
Reduce high latency by using migrate_disabl()/emigrate_enable() instead of preempt_disable()/preempt_enable(), the latency won't greater than 100us.
This patch beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for CONFIG_PREEMPT_VOLUNTARY=y.
Why does it matter? This is only during boot-up/ module loading or do I miss something? The delay is a jiffy so it depends on CONFIG_HZ. You do benchmark for the best algorithm and if you get preempted during that period then your results may be wrong and you make a bad selection.
You can either enable one algorithm and or disable CONFIG_RAID6_PQ_BENCHMARK. I don't see the need for this patch not to mention the stable tree.
Exactly. We should not touch this. I've just sent a verbose explanation in the original report thread.
--nX
Sebastian
linux-stable-mirror@lists.linaro.org