The conversion of x86 VDSO to the generic clock mode storage broke the
paravirt and hyperv clocksource logic. These clock sources have their own
internal sequence counter to validate the clocksource at the point of
reading it. This is necessary because the hypervisor can invalidate the
clocksource asynchronously so a check during the VDSO data update is not
sufficient. If the internal check during read invalidates the clocksource,
the read function returns U64_MAX. The original code checked for this
efficiently by testing whether the result (cast to signed) is negative,
i.e. bit 63 is set. It was done that way because an extra validity
indicator would have added more overhead.
The conversion broke this because the U64_MAX test was replaced by a
check for a valid VDSO clock mode.
The wreckage manifests itself when the paravirt clock is installed as a
valid VDSO clock and then invalidated at runtime by the hypervisor,
e.g. after a host suspend/resume cycle. After the invalidation the read
function returns U64_MAX, which is used as a cycle value and makes the
clock jump by ~2200 seconds, then go stale until those ~2200 seconds
have elapsed, at which point it jumps again. The period of this effect
depends on the shift/mult pair of the clocksource, and the jumps and
staleness are an artifact of the reproducible wrap-around of the 64-bit
math overflow.
Implement an x86 version of the new vdso_cycles_ok() inline which adds
this check back, and a variant of vdso_clocksource_ok() which always
returns true so the compiler can optimize the now redundant clock mode
conditional out. That's suboptimal when the system does not have a VDSO
capable clocksource, but that's not the case to optimize for.
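For illustration, a minimal userspace sketch of the restored sign-bit
check and of the wrap-around artifact it guards against. The mult/shift
pair and the last cycle value below are made up, not the real
clocksource's:
```
#include <stdio.h>
#include <stdint.h>

/* The check this patch restores: an invalidated read returns U64_MAX,
 * which has bit 63 set, so a signed comparison suffices. */
static int cycles_ok(uint64_t cycles)
{
        return (int64_t)cycles >= 0;
}

int main(void)
{
        uint64_t cycles = UINT64_MAX;           /* invalidated read */
        uint64_t last = 123456789;              /* arbitrary last cycles */
        uint32_t mult = 4194304, shift = 22;    /* hypothetical pair */

        if (!cycles_ok(cycles)) {
                printf("invalid read, fall back to the syscall\n");
                return 0;
        }
        /* Without the check, (cycles - last) * mult wraps modulo 2^64
         * and yields a huge bogus nanosecond delta -> the clock jumps. */
        printf("delta: %llu ns\n",
               (unsigned long long)(((cycles - last) * mult) >> shift));
        return 0;
}
```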
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: 5d51bee725cc ("clocksource: Add common vdso clock mode storage")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
arch/x86/include/asm/vdso/gettimeofday.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -271,6 +271,24 @@ static __always_inline const struct vdso
return __vdso_data;
}
+static inline bool arch_vdso_clocksource_ok(const struct vdso_data *vd)
+{
+ return true;
+}
+#define vdso_clocksource_ok arch_vdso_clocksource_ok
+
+/*
+ * Clocksource read value validation to handle PV and HyperV clocksources
+ * which can be invalidated asynchronously and indicate invalidation by
+ * returning U64_MAX, which can be effectively tested by checking for a
+ * negative value after casting it to s64.
+ */
+static inline bool arch_vdso_cycles_ok(u64 cycles)
+{
+ return (s64)cycles >= 0;
+}
+#define vdso_cycles_ok arch_vdso_cycles_ok
+
/*
* x86 specific delta calculation.
*
From: SeongJae Park <sjpark@amazon.de>
This commit recommends replacing 'blacklist' and 'whitelist' with
'blocklist' and 'allowlist', because the new suggestions are
incontrovertible, don't hurt people, and are more self-explanatory.
Signed-off-by: SeongJae Park <sjpark@amazon.de>
cr https://code.amazon.com/reviews/CR-27247203
---
scripts/spelling.txt | 2 ++
1 file changed, 2 insertions(+)
diff --git a/scripts/spelling.txt b/scripts/spelling.txt
index d9cd24cf0d40..ea785568d8b8 100644
--- a/scripts/spelling.txt
+++ b/scripts/spelling.txt
@@ -230,6 +230,7 @@ beter||better
betweeen||between
bianries||binaries
bitmast||bitmask
+blacklist||blocklist
boardcast||broadcast
borad||board
boundry||boundary
@@ -1495,6 +1496,7 @@ whcih||which
whenver||whenever
wheter||whether
whe||when
+whitelist||allowlist
wierd||weird
wiil||will
wirte||write
--
2.17.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 18dfb5326370991c81a6d1ed6d1aeee055cb8c05 Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe <m.othacehe@gmail.com>
Date: Sun, 3 May 2020 11:29:55 +0200
Subject: [PATCH] iio: vcnl4000: Fix i2c swapped word reading.
The bytes returned by the i2c reading need to be swapped
unconditionally. Otherwise, on be16 platforms, an incorrect value will be
returned.
Taking the slow path via the next merge window as it's been around a
while and we have a patch set dependent on this which would be held up.
Fixes: 62a1efb9f868 ("iio: add vcnl4000 combined ALS and proximity sensor")
Signed-off-by: Mathieu Othacehe <m.othacehe@gmail.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
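For context, i2c_smbus_read_word_swapped() is a small helper from
include/linux/i2c.h which reads an SMBus word (little-endian per the
SMBus spec) and byte-swaps it unconditionally, roughly:
```
/* Rough shape of the helper (see include/linux/i2c.h for the original). */
static inline s32 i2c_smbus_read_word_swapped(const struct i2c_client *client,
                                              u8 command)
{
        s32 value = i2c_smbus_read_word_data(client, command);

        /* The sensor sends its MSB first; the swap is unconditional and
         * independent of host endianness. */
        return (value < 0) ? value : swab16(value);
}
```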
diff --git a/drivers/iio/light/vcnl4000.c b/drivers/iio/light/vcnl4000.c
index 985cc39ede8e..979746a7d411 100644
--- a/drivers/iio/light/vcnl4000.c
+++ b/drivers/iio/light/vcnl4000.c
@@ -220,7 +220,6 @@ static int vcnl4000_measure(struct vcnl4000_data *data, u8 req_mask,
u8 rdy_mask, u8 data_reg, int *val)
{
int tries = 20;
- __be16 buf;
int ret;
mutex_lock(&data->vcnl4000_lock);
@@ -247,13 +246,12 @@ static int vcnl4000_measure(struct vcnl4000_data *data, u8 req_mask,
goto fail;
}
- ret = i2c_smbus_read_i2c_block_data(data->client,
- data_reg, sizeof(buf), (u8 *) &buf);
+ ret = i2c_smbus_read_word_swapped(data->client, data_reg);
if (ret < 0)
goto fail;
mutex_unlock(&data->vcnl4000_lock);
- *val = be16_to_cpu(buf);
+ *val = ret;
return 0;
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: dbbe2ad02e9df26e372f38cc3e70dab9222c832e
Gitweb: https://git.kernel.org/tip/dbbe2ad02e9df26e372f38cc3e70dab9222c832e
Author: Anthony Steinhauser <asteinhauser@google.com>
AuthorDate: Sun, 05 Jan 2020 12:19:43 -08:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 09 Jun 2020 10:50:55 +02:00
x86/speculation: Prevent rogue cross-process SSBD shutdown
On context switch, the changes of TIF_SSBD and TIF_SPEC_IB are evaluated
to adjust the mitigations accordingly. This is optimized to avoid the
expensive MSR write if not needed.
This optimization is buggy and allows an attacker to shut down the SSBD
protection of a victim process.
The update logic reads the cached base value for the speculation control
MSR which has neither the SSBD nor the STIBP bit set. It then OR's the
SSBD bit only when TIF_SSBD is different and requests the MSR update.
That means that if TIF_SSBD of the previous and the next task are the
same, the base value is not updated and no MSR write is requested, even
if TIF_SSBD is set.
Subsequently, if the TIF_SPEC_IB bit differs, the STIBP bit is updated
in the base value and the MSR is written with a wrong SSBD value.
This was introduced when the per task/process conditional STIBP
switching was added on top of the existing SSBD switching.
It is exploitable if the attacker creates a process which enforces SSBD
and has the opposite STIBP value to the victim process (i.e. if the
victim process enforces STIBP, the attacker process must not enforce it;
if the victim process does not enforce STIBP, the attacker process must
enforce it) and schedules it on the same core as the victim process. If
the victim runs after the attacker the victim becomes vulnerable to
Spectre V4.
To fix this, update the MSR value independently of the TIF_SSBD
difference and dependent on the available SSBD mitigation method. This
ensures that a subsequent STIBP-initiated MSR write has the correct
state of SSBD.
[ tglx: Handle X86_FEATURE_VIRT_SSBD & X86_FEATURE_LS_CFG_SSBD correctly
  and massaged changelog ]
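To see the logic error in isolation, a toy userspace simulation of the
buggy flow; the constants and helper below are stand-ins, not the
kernel's code:
```
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Stand-ins for the SPEC_CTRL MSR bits and the TIF flags. */
#define SPEC_CTRL_SSBD  (1u << 2)
#define SPEC_CTRL_STIBP (1u << 1)
#define TIF_SSBD        (1u << 0)
#define TIF_SPEC_IB     (1u << 1)

static uint64_t last_wrmsr;

/* Simplified model of the buggy __speculation_ctrl_update() flow. */
static void buggy_update(unsigned long tifp, unsigned long tifn)
{
        unsigned long tif_diff = tifp ^ tifn;
        uint64_t msr = 0;               /* cached base: SSBD/STIBP clear */
        bool updmsr = false;

        if (tif_diff & TIF_SSBD) {      /* skipped when both tasks agree, */
                if (tifn & TIF_SSBD)    /* even if TIF_SSBD is set */
                        msr |= SPEC_CTRL_SSBD;
                updmsr = true;
        }
        if (tif_diff & TIF_SPEC_IB) {   /* STIBP differs -> MSR write ... */
                if (tifn & TIF_SPEC_IB)
                        msr |= SPEC_CTRL_STIBP;
                updmsr = true;          /* ... with the SSBD bit missing */
        }
        if (updmsr)
                last_wrmsr = msr;       /* models wrmsrl(SPEC_CTRL, msr) */
}

int main(void)
{
        /* Attacker ran with SSBD+STIBP; victim enforces SSBD only.
         * TIF_SSBD is equal, TIF_SPEC_IB differs -> SSBD silently off. */
        buggy_update(TIF_SSBD | TIF_SPEC_IB, TIF_SSBD);
        printf("SSBD after switch: %s\n",
               (last_wrmsr & SPEC_CTRL_SSBD) ? "on" : "OFF (bug)");
        return 0;
}
```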
Fixes: 5bfbe3ad5840 ("x86/speculation: Prepare for per task indirect branch speculation control")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
arch/x86/kernel/process.c | 28 ++++++++++------------------
1 file changed, 10 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 35638f1..8f4533c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -545,28 +545,20 @@ static __always_inline void __speculation_ctrl_update(unsigned long tifp,
lockdep_assert_irqs_disabled();
- /*
- * If TIF_SSBD is different, select the proper mitigation
- * method. Note that if SSBD mitigation is disabled or permanentely
- * enabled this branch can't be taken because nothing can set
- * TIF_SSBD.
- */
- if (tif_diff & _TIF_SSBD) {
- if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) {
+ /* Handle change of TIF_SSBD depending on the mitigation method. */
+ if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) {
+ if (tif_diff & _TIF_SSBD)
amd_set_ssb_virt_state(tifn);
- } else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) {
+ } else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) {
+ if (tif_diff & _TIF_SSBD)
amd_set_core_ssb_state(tifn);
- } else if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) ||
- static_cpu_has(X86_FEATURE_AMD_SSBD)) {
- msr |= ssbd_tif_to_spec_ctrl(tifn);
- updmsr = true;
- }
+ } else if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) ||
+ static_cpu_has(X86_FEATURE_AMD_SSBD)) {
+ updmsr |= !!(tif_diff & _TIF_SSBD);
+ msr |= ssbd_tif_to_spec_ctrl(tifn);
}
- /*
- * Only evaluate TIF_SPEC_IB if conditional STIBP is enabled,
- * otherwise avoid the MSR write.
- */
+ /* Only evaluate TIF_SPEC_IB if conditional STIBP is enabled. */
if (IS_ENABLED(CONFIG_SMP) &&
static_branch_unlikely(&switch_to_cond_stibp)) {
updmsr |= !!(tif_diff & _TIF_SPEC_IB);
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 4d8df8cbb9156b0a0ab3f802b80cb5db57acc0bf
Gitweb: https://git.kernel.org/tip/4d8df8cbb9156b0a0ab3f802b80cb5db57acc0bf
Author: Anthony Steinhauser <asteinhauser@google.com>
AuthorDate: Sun, 07 Jun 2020 05:44:19 -07:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 09 Jun 2020 10:50:55 +02:00
x86/speculation: PR_SPEC_FORCE_DISABLE enforcement for indirect branches.
Currently, it is possible to enable indirect branch speculation even after
it was force-disabled using the PR_SPEC_FORCE_DISABLE option. Moreover, the
PR_GET_SPECULATION_CTRL command afterwards gives an incorrect result
(force-disabled when it is in fact enabled). This is also inconsistent
vs. STIBP and the documentation, which clearly states that
PR_SPEC_FORCE_DISABLE cannot be undone.
Fix this by actually enforcing force-disabled indirect branch
speculation. PR_SPEC_ENABLE called after PR_SPEC_FORCE_DISABLE now fails
with -EPERM as described in the documentation.
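A userspace sketch of the now enforced semantics (error handling
trimmed; assumes kernel headers which define the PR_SPEC_* constants):
```
#include <stdio.h>
#include <errno.h>
#include <sys/prctl.h>

int main(void)
{
        /* Force-disable indirect branch speculation for this task. */
        prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
              PR_SPEC_FORCE_DISABLE, 0, 0);

        /* With the fix, trying to re-enable it fails with EPERM ... */
        if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                  PR_SPEC_ENABLE, 0, 0) == -1 && errno == EPERM)
                printf("re-enable rejected, as documented\n");

        /* ... and PR_GET reports the force-disabled state. */
        printf("state: 0x%x\n",
               prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                     0, 0, 0));
        return 0;
}
```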
Fixes: 9137bb27e60e ("x86/speculation: Add prctl() control for indirect branch speculation")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
arch/x86/kernel/cpu/bugs.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 8d57562..56f573a 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1175,11 +1175,14 @@ static int ib_prctl_set(struct task_struct *task, unsigned long ctrl)
return 0;
/*
* Indirect branch speculation is always disabled in strict
- * mode.
+ * mode. It can also not be enabled if it was force-disabled
+ * by a previous prctl call.
+
*/
if (spectre_v2_user_ibpb == SPECTRE_V2_USER_STRICT ||
spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
- spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED)
+ spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED ||
+ task_spec_ib_force_disable(task))
return -EPERM;
task_clear_spec_ib_disable(task);
task_update_spec_tif(task);
`jiffies` and `jiffies_64` are meant to alias (two different symbols
that share the same address). Most architectures make the symbols alias
to the same address via linker script assignment in their
arch/<arch>/kernel/vmlinux.lds.S:
jiffies = jiffies_64;
which is effectively a definition of jiffies.
jiffies and jiffies_64 are both forward declared for all arch's in:
include/linux/jiffies.h.
jiffies_64 is defined in kernel/time/timer.c for all arch's.
x86_64 was peculiar in that it wasn't doing the above linker script
assignment, but rather was:
1. defining jiffies in arch/x86/kernel/time.c instead of via linker script.
2. overriding the symbol jiffies_64 from kernel/time/timer.c in
arch/x86/kernel/vmlinux.lds.S via `jiffies_64 = jiffies;`.
As Fangrui notes:
```
In LLD, symbol assignments in linker scripts override definitions in
object files. GNU ld appears to have the same behavior. It would
probably make sense for LLD to error "duplicate symbol" but GNU ld is
unlikely to adopt for compatibility reasons.
```
So we have an ODR violation (UB), which we seem to have gotten away
with thus far. Where it becomes harmful is when we:
1. Use -fno-semantic-interposition.
As Fangrui notes:
```
Clang after LLVM
commit 5b22bcc2b70d ("[X86][ELF] Prefer to lower MC_GlobalAddress
operands to .Lfoo$local")
defaults to -fno-semantic-interposition-similar semantics, which help
-fpic/-fPIC code avoid GOT/PLT when the referenced symbol is defined
within the same translation unit. Unlike GCC
-fno-semantic-interposition, Clang emits such relocations referencing
local symbols for non-pic code as well.
```
This causes references to jiffies to refer to `.Ljiffies$local` when
jiffies is defined in the same translation unit. Likewise, references
to jiffies_64 become references to `.Ljiffies_64$local` in translation
units that define jiffies_64. Because these differ from the names
used in the linker script, they will not be rewritten to alias one
another.
Combined with ...
2. Full LTO effectively treats all source files as one translation
unit, causing these local references to be produced everywhere. When
the linker processes the linker script, there are no longer any
references to `jiffies_64` anywhere to replace with `jiffies`. And
thus `.Ljiffies$local` and `.Ljiffies_64$local` no longer alias
at all.
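To make (1) concrete, a tiny example of the local-symbol lowering; the
file and symbol names are hypothetical and the exact assembly varies by
clang version:
```
/* counter.c — compile with: clang -O2 -fPIC -S counter.c
 * After LLVM commit 5b22bcc2b70d, clang emits a `.Lcounter$local`
 * alias for the definition and lowers same-TU references to it,
 * bypassing the GOT. A linker script assignment to `counter` then
 * no longer rewrites those references. */
long counter;

long read_counter(void)
{
        return counter;  /* lowered to a load of .Lcounter$local(%rip) */
}
```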
In the process of porting patches enabling Full LTO from arm64 to
x86_64, we observed spooky bugs where the kernel appeared to boot, but
init never got scheduled.
Instead, we can avoid the ODR violation by matching other arch's and
defining jiffies only via linker script. With -fno-semantic-interposition
+ Full LTO, there is then no global C definition of jiffies from which
the compiler could produce a local symbol that the linker script would
fail to alias to jiffies_64.
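And a toy illustration of aliasing a symbol purely via the linker,
again with hypothetical names:
```
/* alias.c — build with: cc alias.c -Wl,--defsym=ticks=ticks_64
 * (--defsym plays the role of the `jiffies = jiffies_64;` script line) */
#include <stdio.h>

unsigned long long ticks_64 = 42; /* only C definition, like jiffies_64 */
extern unsigned long ticks;       /* declared only, like jiffies here */

int main(void)
{
        /* Both names resolve to the same address; on LP64 both are
         * 8 bytes wide, so ticks reads back 42. */
        printf("&ticks=%p &ticks_64=%p ticks=%lu\n",
               (void *)&ticks, (void *)&ticks_64, ticks);
        return 0;
}
```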
Link: https://github.com/ClangBuiltLinux/linux/issues/852
Fixes: 40747ffa5aa8 ("asmlinkage: Make jiffies visible")
Cc: stable@vger.kernel.org
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Alistair Delva <adelva@google.com>
Suggested-by: Fangrui Song <maskray@google.com>
Debugged-by: Nick Desaulniers <ndesaulniers@google.com>
Debugged-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Bob Haarman <inglorion@google.com>
---
arch/x86/kernel/time.c | 4 ----
arch/x86/kernel/vmlinux.lds.S | 4 ++--
2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index 371a6b348e44..e42faa792c07 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -25,10 +25,6 @@
#include <asm/hpet.h>
#include <asm/time.h>
-#ifdef CONFIG_X86_64
-__visible volatile unsigned long jiffies __cacheline_aligned_in_smp = INITIAL_JIFFIES;
-#endif
-
unsigned long profile_pc(struct pt_regs *regs)
{
unsigned long pc = instruction_pointer(regs);
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1bf7e312361f..7c35556c7827 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -40,13 +40,13 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT)
#ifdef CONFIG_X86_32
OUTPUT_ARCH(i386)
ENTRY(phys_startup_32)
-jiffies = jiffies_64;
#else
OUTPUT_ARCH(i386:x86-64)
ENTRY(phys_startup_64)
-jiffies_64 = jiffies;
#endif
+jiffies = jiffies_64;
+
#if defined(CONFIG_X86_64)
/*
* On 64-bit, align RODATA to 2MB so we retain large page mappings for
--
2.26.2.761.g0e0b3e54be-goog
On a VHE system, the EL1 state is left in the CPU most of the time,
and only synchronized back to memory when vcpu_put() is called (most
of the time on preemption).
This means that when injecting an exception, we'd better have a way
to either:
(1) write directly to the EL1 sysregs
(2) synchronize the state back to memory, and do the changes there
For an AArch64 guest, we already do (1), so we are safe. Unfortunately,
doing the same thing for AArch32 would be pretty invasive. Instead, we
can easily implement (2) by calling the put/load architectural backends
while keeping preemption disabled, and then reload the state back into
EL1.
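The helpers added below bracket the in-memory fixup roughly like this
(a usage sketch, not extra code to apply):
```
bool loaded = pre_fault_synchronize(vcpu);      /* disable preemption; if
                                                   the sysregs live in the
                                                   CPU, vcpu_put() them */
prepare_fault32(vcpu, PSR_AA32_MODE_UND, 4);    /* patch in-memory state */
post_fault_synchronize(vcpu, loaded);           /* vcpu_load() and enable
                                                   preemption if needed */
```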
Cc: stable@vger.kernel.org
Reported-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/aarch32.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/arch/arm64/kvm/aarch32.c b/arch/arm64/kvm/aarch32.c
index 0a356aa91aa1..40a62a99fbf8 100644
--- a/arch/arm64/kvm/aarch32.c
+++ b/arch/arm64/kvm/aarch32.c
@@ -33,6 +33,26 @@ static const u8 return_offsets[8][2] = {
[7] = { 4, 4 }, /* FIQ, unused */
};
+static bool pre_fault_synchronize(struct kvm_vcpu *vcpu)
+{
+ preempt_disable();
+ if (vcpu->arch.sysregs_loaded_on_cpu) {
+ kvm_arch_vcpu_put(vcpu);
+ return true;
+ }
+
+ preempt_enable();
+ return false;
+}
+
+static void post_fault_synchronize(struct kvm_vcpu *vcpu, bool loaded)
+{
+ if (loaded) {
+ kvm_arch_vcpu_load(vcpu, smp_processor_id());
+ preempt_enable();
+ }
+}
+
/*
* When an exception is taken, most CPSR fields are left unchanged in the
* handler. However, some are explicitly overridden (e.g. M[4:0]).
@@ -155,7 +175,10 @@ static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
void kvm_inject_undef32(struct kvm_vcpu *vcpu)
{
+ bool loaded = pre_fault_synchronize(vcpu);
+
prepare_fault32(vcpu, PSR_AA32_MODE_UND, 4);
+ post_fault_synchronize(vcpu, loaded);
}
/*
@@ -168,6 +191,9 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
u32 vect_offset;
u32 *far, *fsr;
bool is_lpae;
+ bool loaded;
+
+ loaded = pre_fault_synchronize(vcpu);
if (is_pabt) {
vect_offset = 12;
@@ -191,6 +217,8 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
/* no need to shuffle FS[4] into DFSR[10] as its 0 */
*fsr = DFSR_FSC_EXTABT_nLPAE;
}
+
+ post_fault_synchronize(vcpu, loaded);
}
void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr)
--
2.26.2