On Mon, May 02, 2022 at 09:08:31PM +0200, Uladzislau Rezki (Sony) wrote:
Motivation of backport:
1. The cfcdef5e30469 ("rcu: Allow rcu_do_batch() to dynamically adjust batch sizes")
broke the default behaviour of the "offloading rcu callbacks" setup. In that scenario, after each callback the caller context was used to check whether it had to be rescheduled, giving CPU time to others. After that change an "offloaded" setup can switch to time-based RCU callback processing, which can take long for latency-sensitive workloads and SCHED_FIFO processes, i.e. callbacks are invoked for a long time with preemption kept off and without checking cond_resched().

2. Our devices, which run Android and a 5.10 kernel, have some critical areas which are sensitive to latency: low-latency audio, 8k video, the UI stack and so on. For example, below is a trace that illustrates a delay of the "irq/396-5-0072" RT task to complete IRQ processing:
<snip>
rcuop/6-54 [000] d.h2 183.752989: irq_handler_entry: irq=85 name=i2c_geni
rcuop/6-54 [000] d.h5 183.753007: sched_waking: comm=irq/396-5-0072 pid=12675 prio=49 target_cpu=000
rcuop/6-54 [000] dNh6 183.753014: sched_wakeup: irq/396-5-0072:12675 [49] success=1 CPU:000
rcuop/6-54 [000] dNh2 183.753015: irq_handler_exit: irq=85 ret=handled
rcuop/6-54 [000] .N.. 183.753018: rcu_invoke_callback: rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
rcuop/6-54 [000] .N.. 183.753020: rcu_invoke_callback: rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
rcuop/6-54 [000] .N.. 183.753021: rcu_invoke_callback: rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
...
rcuop/6-54 [000] .N.. 183.755941: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
rcuop/6-54 [000] .N.. 183.755942: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
rcuop/6-54 [000] dN.. 183.755944: rcu_batch_end: rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: Start context switch
rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: End context switch
rcuop/6-54 [000] d..2 183.755959: sched_switch: rcuop/6:54 [120] R ==> migration/0:16 [0]
...
migratio-16 [000] d..2 183.756021: sched_switch: migration/0:16 [0] S ==> irq/396-5-0072:12675 [49]
<snip>

The "irq/396-5-0072:12675" task was delayed for ~3 milliseconds due to the introduced side effect. Please note, on our Android devices we get ~70 000 callbacks registered to be invoked by the "rcuop/x" workers. This is during a 1 second time interval and regular handset usage. Latencies bigger than 3 milliseconds affect our high-resolution audio streaming over the LDAC/Bluetooth stack.
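As a rough cross-check of the numbers quoted above, the delay can be recomputed from the trace timestamps. This is only a sketch: the helper names are made up for illustration, and the per-callback average assumes that the 2112 callbacks reported by rcu_batch_end fill the whole interval between the first rcu_invoke_callback shown and rcu_batch_end.

```c
#include <assert.h>
#include <stdint.h>

/* Timestamps copied from the trace above, converted to microseconds. */
#define TS_WAKEUP_US     183753014ULL /* sched_wakeup of irq/396-5-0072 */
#define TS_FIRST_CB_US   183753018ULL /* first rcu_invoke_callback shown */
#define TS_BATCH_END_US  183755944ULL /* rcu_batch_end */
#define TS_SWITCH_IN_US  183756021ULL /* sched_switch to irq/396-5-0072 */
#define CBS_INVOKED      2112ULL      /* CBs-invoked from rcu_batch_end */

/* Hypothetical helper: elapsed time between two trace timestamps. */
uint64_t elapsed_us(uint64_t start_us, uint64_t end_us)
{
	assert(end_us >= start_us);
	return end_us - start_us;
}

/* Time from the RT task being woken until it actually gets the CPU. */
uint64_t wakeup_latency_us(void)
{
	return elapsed_us(TS_WAKEUP_US, TS_SWITCH_IN_US);
}

/* Time the rcuop/6 kthread spent invoking callbacks in this batch. */
uint64_t batch_duration_us(void)
{
	return elapsed_us(TS_FIRST_CB_US, TS_BATCH_END_US);
}

/* Average cost of one callback in nanoseconds (integer division). */
uint64_t avg_callback_cost_ns(void)
{
	return batch_duration_us() * 1000ULL / CBS_INVOKED;
}
```

With these timestamps the wakeup-to-run delay comes out at 3007 us (the ~3 milliseconds above), the batch itself spans 2926 us, and the average works out to roughly 1.4 us per callback.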
Two patches depend on each other.
One meta-comment. We can't apply changes to older kernels and not newer ones, as you do not want to upgrade your kernel and suffer a regression. This patch series comes from 5.17, but you are backporting to only 5.10. What about 5.15? For that reason I can't consider this series unless we also have a series for 5.15; we have to keep in sync, otherwise things get unmaintainable.
So, have a 5.15 backport as well?
Yep, I can prepare a backport for 5.15 as well. So I can resend the patches for the 5.10 and 5.15 stable kernels; in total there will be 4 patches with two separate cover letters.
Does it work for you?
Also, you forgot to cc: the developers of the patches in this 0/X email, that just causes confusion for those that do not receive this message.
Sorry, I missed that point in the cover letter. I will update it with the appropriate people on my next resend.
Thanks!
-- Uladzislau Rezki