From: Masami Hiramatsu <mhiramat(a)kernel.org>
commit 004e8dce9c5595697951f7cd0e9f66b35c92265e upstream
Prohibit probing on instruction which has XEN_EMULATE_PREFIX
or KVM's emulate prefix. Since that prefix is a marker for Xen
and KVM, if we modify the marker by kprobe's int3, that doesn't
work as expected.
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: x86(a)kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Stefano Stabellini <sstabellini(a)kernel.org>
Cc: Andrew Cooper <andrew.cooper3(a)citrix.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: xen-devel(a)lists.xenproject.org
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Link: https://lkml.kernel.org/r/156777566048.25081.6296162369492175325.stgit@devn…
Signed-off-by: Maximilian Heyne <mheyne(a)amazon.de>
Cc: stable(a)vger.kernel.org # 5.4.x
---
arch/x86/kernel/kprobes/core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index c205d77d57da..3700dc94847c 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -358,6 +358,10 @@ int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct insn *insn)
kernel_insn_init(insn, dest, MAX_INSN_SIZE);
insn_get_length(insn);
+ /* We can not probe force emulate prefixed instruction */
+ if (insn_has_emulate_prefix(insn))
+ return 0;
+
/* Another subsystem puts a breakpoint, failed to recover */
if (insn->opcode.bytes[0] == BREAKPOINT_INSTRUCTION)
return 0;
--
2.32.0
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
Motivation of backport:
-----------------------
1. The cfcdef5e30469 ("rcu: Allow rcu_do_batch() to dynamically adjust batch sizes")
broke the default behaviour of "offloading rcu callbacks" setup. In that scenario
after each callback the caller context was used to check if it has to be rescheduled
giving a CPU time for others. After that change an "offloaded" setup can switch to
time-based RCU callbacks processing, what can be long for latency sensitive workloads
and SCHED_FIFO processes, i.e. callbacks are invoked for a long time with keeping
preemption off and without checking cond_resched().
2. Our devices which run Android and 5.10 kernel have some critical areas which
are sensitive to latency. It is a low latency audio, 8k video, UI stack and so on.
For example below is a trace that illustrates a delay of "irq/396-5-0072" RT task
to complete IRQ processing:
<snip>
rcuop/6-54 [000] d.h2 183.752989: irq_handler_entry: irq=85 name=i2c_geni
rcuop/6-54 [000] d.h5 183.753007: sched_waking: comm=irq/396-5-0072 pid=12675 prio=49 target_cpu=000
rcuop/6-54 [000] dNh6 183.753014: sched_wakeup: irq/396-5-0072:12675 [49] success=1 CPU:000
rcuop/6-54 [000] dNh2 183.753015: irq_handler_exit: irq=85 ret=handled
rcuop/6-54 [000] .N.. 183.753018: rcu_invoke_callback: rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
rcuop/6-54 [000] .N.. 183.753020: rcu_invoke_callback: rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
rcuop/6-54 [000] .N.. 183.753021: rcu_invoke_callback: rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
...
rcuop/6-54 [000] .N.. 183.755941: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
rcuop/6-54 [000] .N.. 183.755942: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
rcuop/6-54 [000] dN.. 183.755944: rcu_batch_end: rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: Start context switch
rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: End context switch
rcuop/6-54 [000] d..2 183.755959: sched_switch: rcuop/6:54 [120] R ==> migration/0:16 [0]
...
migratio-16 [000] d..2 183.756021: sched_switch: migration/0:16 [0] S ==> irq/396-5-0072:12675 [49]
<snip>
The "irq/396-5-0072:12675" was delayed for ~3 milliseconds due to introduced side effect.
Please note, on our Android devices we get ~70 000 callbacks registered to be invoked by
the "rcuop/x" workers. This is during 1 seconds time interval and regular handset usage.
Latencies bigger that 3 milliseconds affect our high-resolution audio streaming over the
LDAC/Bluetooth stack.
Two patches depend on each other.
Frederic Weisbecker (2):
rcu: [for 5.15 stable] Fix callbacks processing time limit retaining cond_resched()
rcu: [for 5.15 stable] Apply callbacks processing time limit only on softirq
kernel/rcu/tree.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
--
2.30.2
Commit 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
moved the switch_mmu_context() to C. While in principle a good idea, it
meant that the function now uses the stack. The stack is not accessible
from real mode though.
So to keep calling the function, let's turn on MSR_DR while we call it.
That way, all pointer references to the stack are handled virtually.
In addition, make sure to save/restore r12 in an SPRG, as it may get
clobbered by the C function.
Reported-by: Matt Evans <matt(a)ozlabs.org>
Fixes: 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
Signed-off-by: Alexander Graf <graf(a)amazon.com>
Cc: stable(a)vger.kernel.org # v5.14+
---
v1 -> v2:
- Save and restore R12, so that we don't touch volatile registers
while calling into C.
---
arch/powerpc/kvm/book3s_32_sr.S | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_32_sr.S b/arch/powerpc/kvm/book3s_32_sr.S
index e3ab9df6cf19..1ce13e3ab072 100644
--- a/arch/powerpc/kvm/book3s_32_sr.S
+++ b/arch/powerpc/kvm/book3s_32_sr.S
@@ -122,11 +122,27 @@
/* 0x0 - 0xb */
- /* 'current->mm' needs to be in r4 */
- tophys(r4, r2)
- lwz r4, MM(r4)
- tophys(r4, r4)
- /* This only clobbers r0, r3, r4 and r5 */
+ /* switch_mmu_context() clobbers r12, rescue it */
+ SET_SCRATCH0(r12)
+
+ /* switch_mmu_context() needs paging, let's enable it */
+ mfmsr r9
+ ori r11, r9, MSR_DR
+ mtmsr r11
+ sync
+
+ /* Calling switch_mmu_context(<inv>, current->mm, <inv>); */
+ lwz r4, MM(r2)
bl switch_mmu_context
+ /* Disable paging again */
+ mfmsr r9
+ li r6, MSR_DR
+ andc r9, r9, r6
+ mtmsr r9
+ sync
+
+ /* restore r12 */
+ GET_SCRATCH0(r12)
+
.endm
--
2.28.0.394.ge197136389
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879