This is a note to let you know that I've just added the patch titled
objtool: Detect jumps to retpoline thunks
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
objtool-detect-jumps-to-retpoline-thunks.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 39b735332cb8b33a27c28592d969e4016c86c3ea Mon Sep 17 00:00:00 2001
From: Josh Poimboeuf <jpoimboe(a)redhat.com>
Date: Thu, 11 Jan 2018 21:46:23 +0000
Subject: objtool: Detect jumps to retpoline thunks
From: Josh Poimboeuf <jpoimboe(a)redhat.com>
commit 39b735332cb8b33a27c28592d969e4016c86c3ea upstream.
A direct jump to a retpoline thunk is really an indirect jump in
disguise. Change the objtool instruction type accordingly.
Objtool needs to know where indirect branches are so it can detect
switch statement jump tables.
This fixes a bunch of warnings with CONFIG_RETPOLINE like:
arch/x86/events/intel/uncore_nhmex.o: warning: objtool: nhmex_rbox_msr_enable_event()+0x44: sibling call from callable instruction with modified stack frame
kernel/signal.o: warning: objtool: copy_siginfo_to_user()+0x91: sibling call from callable instruction with modified stack frame
...
Signed-off-by: Josh Poimboeuf <jpoimboe(a)redhat.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: gnomes(a)lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: thomas.lendacky(a)amd.com
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Kees Cook <keescook(a)google.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Cc: Paul Turner <pjt(a)google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-2-git-send-email-dwmw@amazon.co.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/objtool/check.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -456,6 +456,13 @@ static int add_jump_destinations(struct
} else if (rela->sym->sec->idx) {
dest_sec = rela->sym->sec;
dest_off = rela->sym->sym.st_value + rela->addend + 4;
+ } else if (strstr(rela->sym->name, "_indirect_thunk_")) {
+ /*
+ * Retpoline jumps are really dynamic jumps in
+ * disguise, so convert them accordingly.
+ */
+ insn->type = INSN_JUMP_DYNAMIC;
+ continue;
} else {
/* sibling call */
insn->jump_dest = 0;
Patches currently in stable-queue which might be from jpoimboe(a)redhat.com are
queue-4.14/x86-spectre-add-boot-time-option-to-select-spectre-v2-mitigation.patch
queue-4.14/objtool-detect-jumps-to-retpoline-thunks.patch
queue-4.14/objtool-allow-alternatives-to-be-ignored.patch
queue-4.14/x86-retpoline-add-initial-retpoline-support.patch
This is a note to let you know that I've just added the patch titled
objtool: Allow alternatives to be ignored
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
objtool-allow-alternatives-to-be-ignored.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 258c76059cece01bebae098e81bacb1af2edad17 Mon Sep 17 00:00:00 2001
From: Josh Poimboeuf <jpoimboe(a)redhat.com>
Date: Thu, 11 Jan 2018 21:46:24 +0000
Subject: objtool: Allow alternatives to be ignored
From: Josh Poimboeuf <jpoimboe(a)redhat.com>
commit 258c76059cece01bebae098e81bacb1af2edad17 upstream.
Getting objtool to understand retpolines is going to be a bit of a
challenge. For now, take advantage of the fact that retpolines are
patched in with alternatives. Just read the original (sane)
non-alternative instruction, and ignore the patched-in retpoline.
This allows objtool to understand the control flow *around* the
retpoline, even if it can't yet follow what's inside. This means the
ORC unwinder will fail to unwind from inside a retpoline, but will work
fine otherwise.
Signed-off-by: Josh Poimboeuf <jpoimboe(a)redhat.com>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: gnomes(a)lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: thomas.lendacky(a)amd.com
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Kees Cook <keescook(a)google.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Cc: Paul Turner <pjt(a)google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-3-git-send-email-dwmw@amazon.co.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/objtool/check.c | 62 +++++++++++++++++++++++++++++++++++++++++++++-----
tools/objtool/check.h | 2 -
2 files changed, 57 insertions(+), 7 deletions(-)
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -428,6 +428,40 @@ static void add_ignores(struct objtool_f
}
/*
+ * FIXME: For now, just ignore any alternatives which add retpolines. This is
+ * a temporary hack, as it doesn't allow ORC to unwind from inside a retpoline.
+ * But it at least allows objtool to understand the control flow *around* the
+ * retpoline.
+ */
+static int add_nospec_ignores(struct objtool_file *file)
+{
+ struct section *sec;
+ struct rela *rela;
+ struct instruction *insn;
+
+ sec = find_section_by_name(file->elf, ".rela.discard.nospec");
+ if (!sec)
+ return 0;
+
+ list_for_each_entry(rela, &sec->rela_list, list) {
+ if (rela->sym->type != STT_SECTION) {
+ WARN("unexpected relocation symbol type in %s", sec->name);
+ return -1;
+ }
+
+ insn = find_insn(file, rela->sym->sec, rela->addend);
+ if (!insn) {
+ WARN("bad .discard.nospec entry");
+ return -1;
+ }
+
+ insn->ignore_alts = true;
+ }
+
+ return 0;
+}
+
+/*
* Find the destination instructions for all jumps.
*/
static int add_jump_destinations(struct objtool_file *file)
@@ -509,11 +543,18 @@ static int add_call_destinations(struct
dest_off = insn->offset + insn->len + insn->immediate;
insn->call_dest = find_symbol_by_offset(insn->sec,
dest_off);
+ /*
+ * FIXME: Thanks to retpolines, it's now considered
+ * normal for a function to call within itself. So
+ * disable this warning for now.
+ */
+#if 0
if (!insn->call_dest) {
WARN_FUNC("can't find call dest symbol at offset 0x%lx",
insn->sec, insn->offset, dest_off);
return -1;
}
+#endif
} else if (rela->sym->type == STT_SECTION) {
insn->call_dest = find_symbol_by_offset(rela->sym->sec,
rela->addend+4);
@@ -678,12 +719,6 @@ static int add_special_section_alts(stru
return ret;
list_for_each_entry_safe(special_alt, tmp, &special_alts, list) {
- alt = malloc(sizeof(*alt));
- if (!alt) {
- WARN("malloc failed");
- ret = -1;
- goto out;
- }
orig_insn = find_insn(file, special_alt->orig_sec,
special_alt->orig_off);
@@ -694,6 +729,10 @@ static int add_special_section_alts(stru
goto out;
}
+ /* Ignore retpoline alternatives. */
+ if (orig_insn->ignore_alts)
+ continue;
+
new_insn = NULL;
if (!special_alt->group || special_alt->new_len) {
new_insn = find_insn(file, special_alt->new_sec,
@@ -719,6 +758,13 @@ static int add_special_section_alts(stru
goto out;
}
+ alt = malloc(sizeof(*alt));
+ if (!alt) {
+ WARN("malloc failed");
+ ret = -1;
+ goto out;
+ }
+
alt->insn = new_insn;
list_add_tail(&alt->list, &orig_insn->alts);
@@ -1035,6 +1081,10 @@ static int decode_sections(struct objtoo
add_ignores(file);
+ ret = add_nospec_ignores(file);
+ if (ret)
+ return ret;
+
ret = add_jump_destinations(file);
if (ret)
return ret;
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -44,7 +44,7 @@ struct instruction {
unsigned int len;
unsigned char type;
unsigned long immediate;
- bool alt_group, visited, dead_end, ignore, hint, save, restore;
+ bool alt_group, visited, dead_end, ignore, hint, save, restore, ignore_alts;
struct symbol *call_dest;
struct instruction *jump_dest;
struct list_head alts;
Patches currently in stable-queue which might be from jpoimboe(a)redhat.com are
queue-4.14/x86-spectre-add-boot-time-option-to-select-spectre-v2-mitigation.patch
queue-4.14/objtool-detect-jumps-to-retpoline-thunks.patch
queue-4.14/objtool-allow-alternatives-to-be-ignored.patch
queue-4.14/x86-retpoline-add-initial-retpoline-support.patch
This is a note to let you know that I've just added the patch titled
x86/tboot: Unbreak tboot with PTI enabled
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-tboot-unbreak-tboot-with-pti-enabled.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 262b6b30087246abf09d6275eb0c0dc421bcbe38 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Sat, 6 Jan 2018 18:41:14 +0100
Subject: x86/tboot: Unbreak tboot with PTI enabled
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 262b6b30087246abf09d6275eb0c0dc421bcbe38 upstream.
This is another case similar to what EFI does: create a new set of
page tables, map some code at a low address, and jump to it. PTI
mistakes this low address for userspace and mistakenly marks it
non-executable in an effort to make it unusable for userspace.
Undo the poison to allow execution.
Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig")
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Alan Cox <gnomes(a)lxorguk.ukuu.org.uk>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Jon Masters <jcm(a)redhat.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Andi Kleen <andi(a)firstfloor.org>
Cc: Jeff Law <law(a)redhat.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Cc: David" <dwmw(a)amazon.co.uk>
Cc: Nick Clifton <nickc(a)redhat.com>
Link: https://lkml.kernel.org/r/20180108102805.GK25546@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/tboot.c | 1 +
1 file changed, 1 insertion(+)
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -127,6 +127,7 @@ static int map_tboot_page(unsigned long
p4d = p4d_alloc(&tboot_mm, pgd, vaddr);
if (!p4d)
return -1;
+ pgd->pgd &= ~_PAGE_NX;
pud = pud_alloc(&tboot_mm, p4d, vaddr);
if (!pud)
return -1;
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.14/x86-pti-unbreak-efi-old_memmap.patch
queue-4.14/x86-documentation-add-pti-description.patch
queue-4.14/x86-tboot-unbreak-tboot-with-pti-enabled.patch
This is a note to let you know that I've just added the patch titled
x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pti-remove-dead-logic-in-pti_user_pagetable_walk.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 8d56eff266f3e41a6c39926269c4c3f58f881a8e Mon Sep 17 00:00:00 2001
From: Jike Song <albcamus(a)gmail.com>
Date: Tue, 9 Jan 2018 00:03:41 +0800
Subject: x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*()
From: Jike Song <albcamus(a)gmail.com>
commit 8d56eff266f3e41a6c39926269c4c3f58f881a8e upstream.
The following code contains dead logic:
162 if (pgd_none(*pgd)) {
163 unsigned long new_p4d_page = __get_free_page(gfp);
164 if (!new_p4d_page)
165 return NULL;
166
167 if (pgd_none(*pgd)) {
168 set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
169 new_p4d_page = 0;
170 }
171 if (new_p4d_page)
172 free_page(new_p4d_page);
173 }
There can't be any difference between two pgd_none(*pgd) at L162 and L167,
so it's always false at L171.
Dave Hansen explained:
Yes, the double-test was part of an optimization where we attempted to
avoid using a global spinlock in the fork() path. We would check for
unallocated mid-level page tables without the lock. The lock was only
taken when we needed to *make* an entry to avoid collisions.
Now that it is all single-threaded, there is no chance of a collision,
no need for a lock, and no need for the re-check.
As all these functions are only called during init, mark them __init as
well.
Fixes: 03f4424f348e ("x86/mm/pti: Add functions to clone kernel PMDs")
Signed-off-by: Jike Song <albcamus(a)gmail.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Alan Cox <gnomes(a)lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Jiri Koshina <jikos(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Kees Cook <keescook(a)google.com>
Cc: Andi Lutomirski <luto(a)amacapital.net>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Greg KH <gregkh(a)linux-foundation.org>
Cc: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Paul Turner <pjt(a)google.com>
Link: https://lkml.kernel.org/r/20180108160341.3461-1-albcamus@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/pti.c | 32 ++++++--------------------------
1 file changed, 6 insertions(+), 26 deletions(-)
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -149,7 +149,7 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pg
*
* Returns a pointer to a P4D on success, or NULL on failure.
*/
-static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
+static __init p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
{
pgd_t *pgd = kernel_to_user_pgdp(pgd_offset_k(address));
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
@@ -164,12 +164,7 @@ static p4d_t *pti_user_pagetable_walk_p4
if (!new_p4d_page)
return NULL;
- if (pgd_none(*pgd)) {
- set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
- new_p4d_page = 0;
- }
- if (new_p4d_page)
- free_page(new_p4d_page);
+ set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
}
BUILD_BUG_ON(pgd_large(*pgd) != 0);
@@ -182,7 +177,7 @@ static p4d_t *pti_user_pagetable_walk_p4
*
* Returns a pointer to a PMD on success, or NULL on failure.
*/
-static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
+static __init pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
{
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
p4d_t *p4d = pti_user_pagetable_walk_p4d(address);
@@ -194,12 +189,7 @@ static pmd_t *pti_user_pagetable_walk_pm
if (!new_pud_page)
return NULL;
- if (p4d_none(*p4d)) {
- set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
- new_pud_page = 0;
- }
- if (new_pud_page)
- free_page(new_pud_page);
+ set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
}
pud = pud_offset(p4d, address);
@@ -213,12 +203,7 @@ static pmd_t *pti_user_pagetable_walk_pm
if (!new_pmd_page)
return NULL;
- if (pud_none(*pud)) {
- set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
- new_pmd_page = 0;
- }
- if (new_pmd_page)
- free_page(new_pmd_page);
+ set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
}
return pmd_offset(pud, address);
@@ -251,12 +236,7 @@ static __init pte_t *pti_user_pagetable_
if (!new_pte_page)
return NULL;
- if (pmd_none(*pmd)) {
- set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
- new_pte_page = 0;
- }
- if (new_pte_page)
- free_page(new_pte_page);
+ set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
}
pte = pte_offset_kernel(pmd, address);
Patches currently in stable-queue which might be from albcamus(a)gmail.com are
queue-4.14/x86-mm-pti-remove-dead-logic-in-pti_user_pagetable_walk.patch
This is a note to let you know that I've just added the patch titled
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-cpu-amd-use-lfence_rdtsc-in-preference-to-mfence_rdtsc.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9c6a73c75864ad9fa49e5fa6513e4c4071c0e29f Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Mon, 8 Jan 2018 16:09:32 -0600
Subject: x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
From: Tom Lendacky <thomas.lendacky(a)amd.com>
commit 9c6a73c75864ad9fa49e5fa6513e4c4071c0e29f upstream.
With LFENCE now a serializing instruction, use LFENCE_RDTSC in preference
to MFENCE_RDTSC. However, since the kernel could be running under a
hypervisor that does not support writing that MSR, read the MSR back and
verify that the bit has been set successfully. If the MSR can be read
and the bit is set, then set the LFENCE_RDTSC feature, otherwise set the
MFENCE_RDTSC feature.
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Cc: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Paul Turner <pjt(a)google.com>
Link: https://lkml.kernel.org/r/20180108220932.12580.52458.stgit@tlendack-t1.amdo…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/amd.c | 18 ++++++++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -354,6 +354,7 @@
#define MSR_FAM10H_NODE_ID 0xc001100c
#define MSR_F10H_DECFG 0xc0011029
#define MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT 1
+#define MSR_F10H_DECFG_LFENCE_SERIALIZE BIT_ULL(MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT)
/* K8 MSRs */
#define MSR_K8_TOP_MEM1 0xc001001a
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -829,6 +829,9 @@ static void init_amd(struct cpuinfo_x86
set_cpu_cap(c, X86_FEATURE_K8);
if (cpu_has(c, X86_FEATURE_XMM2)) {
+ unsigned long long val;
+ int ret;
+
/*
* A serializing LFENCE has less overhead than MFENCE, so
* use it for execution serialization. On families which
@@ -839,8 +842,19 @@ static void init_amd(struct cpuinfo_x86
msr_set_bit(MSR_F10H_DECFG,
MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT);
- /* MFENCE stops RDTSC speculation */
- set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
+ /*
+ * Verify that the MSR write was successful (could be running
+ * under a hypervisor) and only then assume that LFENCE is
+ * serializing.
+ */
+ ret = rdmsrl_safe(MSR_F10H_DECFG, &val);
+ if (!ret && (val & MSR_F10H_DECFG_LFENCE_SERIALIZE)) {
+ /* A serializing LFENCE stops RDTSC speculation */
+ set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
+ } else {
+ /* MFENCE stops RDTSC speculation */
+ set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
+ }
}
/*
Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are
queue-4.14/kvm-vmx-scrub-hardware-gprs-at-vm-exit.patch
queue-4.14/x86-mm-pti-remove-dead-logic-in-pti_user_pagetable_walk.patch
queue-4.14/x86-cpu-amd-make-lfence-a-serializing-instruction.patch
queue-4.14/x86-alternatives-fix-optimize_nops-checking.patch
queue-4.14/x86-cpu-amd-use-lfence_rdtsc-in-preference-to-mfence_rdtsc.patch
This is a note to let you know that I've just added the patch titled
x86/cpu/AMD: Make LFENCE a serializing instruction
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-cpu-amd-make-lfence-a-serializing-instruction.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e4d0e84e490790798691aaa0f2e598637f1867ec Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Mon, 8 Jan 2018 16:09:21 -0600
Subject: x86/cpu/AMD: Make LFENCE a serializing instruction
From: Tom Lendacky <thomas.lendacky(a)amd.com>
commit e4d0e84e490790798691aaa0f2e598637f1867ec upstream.
To aid in speculation control, make LFENCE a serializing instruction
since it has less overhead than MFENCE. This is done by setting bit 1
of MSR 0xc0011029 (DE_CFG). Some families that support LFENCE do not
have this MSR. For these families, the LFENCE instruction is already
serializing.
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Cc: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Paul Turner <pjt(a)google.com>
Link: https://lkml.kernel.org/r/20180108220921.12580.71694.stgit@tlendack-t1.amdo…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/amd.c | 10 ++++++++++
2 files changed, 12 insertions(+)
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -352,6 +352,8 @@
#define FAM10H_MMIO_CONF_BASE_MASK 0xfffffffULL
#define FAM10H_MMIO_CONF_BASE_SHIFT 20
#define MSR_FAM10H_NODE_ID 0xc001100c
+#define MSR_F10H_DECFG 0xc0011029
+#define MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT 1
/* K8 MSRs */
#define MSR_K8_TOP_MEM1 0xc001001a
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -829,6 +829,16 @@ static void init_amd(struct cpuinfo_x86
set_cpu_cap(c, X86_FEATURE_K8);
if (cpu_has(c, X86_FEATURE_XMM2)) {
+ /*
+ * A serializing LFENCE has less overhead than MFENCE, so
+ * use it for execution serialization. On families which
+ * don't have that MSR, LFENCE is already serializing.
+ * msr_set_bit() uses the safe accessors, too, even if the MSR
+ * is not present.
+ */
+ msr_set_bit(MSR_F10H_DECFG,
+ MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT);
+
/* MFENCE stops RDTSC speculation */
set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
}
Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are
queue-4.14/kvm-vmx-scrub-hardware-gprs-at-vm-exit.patch
queue-4.14/x86-mm-pti-remove-dead-logic-in-pti_user_pagetable_walk.patch
queue-4.14/x86-cpu-amd-make-lfence-a-serializing-instruction.patch
queue-4.14/x86-alternatives-fix-optimize_nops-checking.patch
queue-4.14/x86-cpu-amd-use-lfence_rdtsc-in-preference-to-mfence_rdtsc.patch
This is a note to let you know that I've just added the patch titled
x86/alternatives: Fix optimize_nops() checking
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-alternatives-fix-optimize_nops-checking.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 612e8e9350fd19cae6900cf36ea0c6892d1a0dca Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Wed, 10 Jan 2018 12:28:16 +0100
Subject: x86/alternatives: Fix optimize_nops() checking
From: Borislav Petkov <bp(a)suse.de>
commit 612e8e9350fd19cae6900cf36ea0c6892d1a0dca upstream.
The alternatives code checks only the first byte whether it is a NOP, but
with NOPs in front of the payload and having actual instructions after it
breaks the "optimized' test.
Make sure to scan all bytes before deciding to optimize the NOPs in there.
Reported-by: David Woodhouse <dwmw2(a)infradead.org>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Andi Kleen <andi(a)firstfloor.org>
Cc: Andrew Lutomirski <luto(a)kernel.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Cc: Paul Turner <pjt(a)google.com>
Link: https://lkml.kernel.org/r/20180110112815.mgciyf5acwacphkq@pd.tnic
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/alternative.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -344,9 +344,12 @@ done:
static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
{
unsigned long flags;
+ int i;
- if (instr[0] != 0x90)
- return;
+ for (i = 0; i < a->padlen; i++) {
+ if (instr[i] != 0x90)
+ return;
+ }
local_irq_save(flags);
add_nops(instr + (a->instrlen - a->padlen), a->padlen);
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.14/x86-microcode-intel-extend-bdw-late-loading-with-a-revision-check.patch
queue-4.14/x86-cpu-amd-make-lfence-a-serializing-instruction.patch
queue-4.14/x86-alternatives-fix-optimize_nops-checking.patch
queue-4.14/x86-cpu-amd-use-lfence_rdtsc-in-preference-to-mfence_rdtsc.patch
This is a note to let you know that I've just added the patch titled
sysfs/cpu: Fix typos in vulnerability documentation
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sysfs-cpu-fix-typos-in-vulnerability-documentation.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9ecccfaa7cb5249bd31bdceb93fcf5bedb8a24d8 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw(a)amazon.co.uk>
Date: Tue, 9 Jan 2018 15:02:51 +0000
Subject: sysfs/cpu: Fix typos in vulnerability documentation
From: David Woodhouse <dwmw(a)amazon.co.uk>
commit 9ecccfaa7cb5249bd31bdceb93fcf5bedb8a24d8 upstream.
Fixes: 87590ce6e ("sysfs/cpu: Add vulnerability folder")
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/ABI/testing/sysfs-devices-system-cpu | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -378,7 +378,7 @@ What: /sys/devices/system/cpu/vulnerabi
/sys/devices/system/cpu/vulnerabilities/meltdown
/sys/devices/system/cpu/vulnerabilities/spectre_v1
/sys/devices/system/cpu/vulnerabilities/spectre_v2
-Date: Januar 2018
+Date: January 2018
Contact: Linux kernel mailing list <linux-kernel(a)vger.kernel.org>
Description: Information about CPU vulnerabilities
@@ -388,4 +388,4 @@ Description: Information about CPU vulne
"Not affected" CPU is not affected by the vulnerability
"Vulnerable" CPU is affected and no mitigation in effect
- "Mitigation: $M" CPU is affetcted and mitigation $M is in effect
+ "Mitigation: $M" CPU is affected and mitigation $M is in effect
Patches currently in stable-queue which might be from dwmw(a)amazon.co.uk are
queue-4.14/x86-cpufeatures-add-x86_bug_spectre_v.patch
queue-4.14/x86-mm-pti-remove-dead-logic-in-pti_user_pagetable_walk.patch
queue-4.14/sysfs-cpu-fix-typos-in-vulnerability-documentation.patch
queue-4.14/x86-cpu-amd-make-lfence-a-serializing-instruction.patch
queue-4.14/x86-cpu-implement-cpu-vulnerabilites-sysfs-functions.patch
queue-4.14/x86-tboot-unbreak-tboot-with-pti-enabled.patch
queue-4.14/sysfs-cpu-add-vulnerability-folder.patch
queue-4.14/x86-cpu-amd-use-lfence_rdtsc-in-preference-to-mfence_rdtsc.patch
This is a note to let you know that I've just added the patch titled
x86/Documentation: Add PTI description
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-documentation-add-pti-description.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 01c9b17bf673b05bb401b76ec763e9730ccf1376 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Fri, 5 Jan 2018 09:44:36 -0800
Subject: x86/Documentation: Add PTI description
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 01c9b17bf673b05bb401b76ec763e9730ccf1376 upstream.
Add some details about how PTI works, what some of the downsides
are, and how to debug it when things go wrong.
Also document the kernel parameter: 'pti/nopti'.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Randy Dunlap <rdunlap(a)infradead.org>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Cc: Moritz Lipp <moritz.lipp(a)iaik.tugraz.at>
Cc: Daniel Gruss <daniel.gruss(a)iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz(a)iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner(a)student.tugraz.at>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Andi Lutomirsky <luto(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20180105174436.1BC6FA2B@viggo.jf.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/kernel-parameters.txt | 21 ++--
Documentation/x86/pti.txt | 186 ++++++++++++++++++++++++++++++++++++
2 files changed, 200 insertions(+), 7 deletions(-)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2763,8 +2763,6 @@ bytes respectively. Such letter suffixes
nojitter [IA-64] Disables jitter checking for ITC timers.
- nopti [X86-64] Disable KAISER isolation of kernel from user.
-
no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
@@ -3327,11 +3325,20 @@ bytes respectively. Such letter suffixes
pt. [PARIDE]
See Documentation/blockdev/paride.txt.
- pti= [X86_64]
- Control KAISER user/kernel address space isolation:
- on - enable
- off - disable
- auto - default setting
+ pti= [X86_64] Control Page Table Isolation of user and
+ kernel address spaces. Disabling this feature
+ removes hardening, but improves performance of
+ system calls and interrupts.
+
+ on - unconditionally enable
+ off - unconditionally disable
+ auto - kernel detects whether your CPU model is
+ vulnerable to issues that PTI mitigates
+
+ Not specifying this option is equivalent to pti=auto.
+
+ nopti [X86_64]
+ Equivalent to pti=off
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
--- /dev/null
+++ b/Documentation/x86/pti.txt
@@ -0,0 +1,186 @@
+Overview
+========
+
+Page Table Isolation (pti, previously known as KAISER[1]) is a
+countermeasure against attacks on the shared user/kernel address
+space such as the "Meltdown" approach[2].
+
+To mitigate this class of attacks, we create an independent set of
+page tables for use only when running userspace applications. When
+the kernel is entered via syscalls, interrupts or exceptions, the
+page tables are switched to the full "kernel" copy. When the system
+switches back to user mode, the user copy is used again.
+
+The userspace page tables contain only a minimal amount of kernel
+data: only what is needed to enter/exit the kernel such as the
+entry/exit functions themselves and the interrupt descriptor table
+(IDT). There are a few strictly unnecessary things that get mapped
+such as the first C function when entering an interrupt (see
+comments in pti.c).
+
+This approach helps to ensure that side-channel attacks leveraging
+the paging structures do not function when PTI is enabled. It can be
+enabled by setting CONFIG_PAGE_TABLE_ISOLATION=y at compile time.
+Once enabled at compile-time, it can be disabled at boot with the
+'nopti' or 'pti=' kernel parameters (see kernel-parameters.txt).
+
+Page Table Management
+=====================
+
+When PTI is enabled, the kernel manages two sets of page tables.
+The first set is very similar to the single set which is present in
+kernels without PTI. This includes a complete mapping of userspace
+that the kernel can use for things like copy_to_user().
+
+Although _complete_, the user portion of the kernel page tables is
+crippled by setting the NX bit in the top level. This ensures
+that any missed kernel->user CR3 switch will immediately crash
+userspace upon executing its first instruction.
+
+The userspace page tables map only the kernel data needed to enter
+and exit the kernel. This data is entirely contained in the 'struct
+cpu_entry_area' structure which is placed in the fixmap which gives
+each CPU's copy of the area a compile-time-fixed virtual address.
+
+For new userspace mappings, the kernel makes the entries in its
+page tables like normal. The only difference is when the kernel
+makes entries in the top (PGD) level. In addition to setting the
+entry in the main kernel PGD, a copy of the entry is made in the
+userspace page tables' PGD.
+
+This sharing at the PGD level also inherently shares all the lower
+layers of the page tables. This leaves a single, shared set of
+userspace page tables to manage. One PTE to lock, one set of
+accessed bits, dirty bits, etc...
+
+Overhead
+========
+
+Protection against side-channel attacks is important. But,
+this protection comes at a cost:
+
+1. Increased Memory Use
+ a. Each process now needs an order-1 PGD instead of order-0.
+ (Consumes an additional 4k per process).
+ b. The 'cpu_entry_area' structure must be 2MB in size and 2MB
+ aligned so that it can be mapped by setting a single PMD
+ entry. This consumes nearly 2MB of RAM once the kernel
+ is decompressed, but no space in the kernel image itself.
+
+2. Runtime Cost
+ a. CR3 manipulation to switch between the page table copies
+ must be done at interrupt, syscall, and exception entry
+ and exit (it can be skipped when the kernel is interrupted,
+ though.) Moves to CR3 are on the order of a hundred
+ cycles, and are required at every entry and exit.
+ b. A "trampoline" must be used for SYSCALL entry. This
+ trampoline depends on a smaller set of resources than the
+ non-PTI SYSCALL entry code, so requires mapping fewer
+ things into the userspace page tables. The downside is
+ that stacks must be switched at entry time.
+ d. Global pages are disabled for all kernel structures not
+ mapped into both kernel and userspace page tables. This
+ feature of the MMU allows different processes to share TLB
+ entries mapping the kernel. Losing the feature means more
+ TLB misses after a context switch. The actual loss of
+ performance is very small, however, never exceeding 1%.
+ d. Process Context IDentifiers (PCID) is a CPU feature that
+ allows us to skip flushing the entire TLB when switching page
+ tables by setting a special bit in CR3 when the page tables
+ are changed. This makes switching the page tables (at context
+ switch, or kernel entry/exit) cheaper. But, on systems with
+ PCID support, the context switch code must flush both the user
+ and kernel entries out of the TLB. The user PCID TLB flush is
+ deferred until the exit to userspace, minimizing the cost.
+ See intel.com/sdm for the gory PCID/INVPCID details.
+ e. The userspace page tables must be populated for each new
+ process. Even without PTI, the shared kernel mappings
+ are created by copying top-level (PGD) entries into each
+ new process. But, with PTI, there are now *two* kernel
+ mappings: one in the kernel page tables that maps everything
+ and one for the entry/exit structures. At fork(), we need to
+ copy both.
+ f. In addition to the fork()-time copying, there must also
+ be an update to the userspace PGD any time a set_pgd() is done
+ on a PGD used to map userspace. This ensures that the kernel
+ and userspace copies always map the same userspace
+ memory.
+ g. On systems without PCID support, each CR3 write flushes
+ the entire TLB. That means that each syscall, interrupt
+ or exception flushes the TLB.
+ h. INVPCID is a TLB-flushing instruction which allows flushing
+ of TLB entries for non-current PCIDs. Some systems support
+ PCIDs, but do not support INVPCID. On these systems, addresses
+ can only be flushed from the TLB for the current PCID. When
+ flushing a kernel address, we need to flush all PCIDs, so a
+ single kernel address flush will require a TLB-flushing CR3
+ write upon the next use of every PCID.
+
+Possible Future Work
+====================
+1. We can be more careful about not actually writing to CR3
+ unless its value is actually changed.
+2. Allow PTI to be enabled/disabled at runtime in addition to the
+ boot-time switching.
+
+Testing
+========
+
+To test stability of PTI, the following test procedure is recommended,
+ideally doing all of these in parallel:
+
+1. Set CONFIG_DEBUG_ENTRY=y
+2. Run several copies of all of the tools/testing/selftests/x86/ tests
+ (excluding MPX and protection_keys) in a loop on multiple CPUs for
+ several minutes. These tests frequently uncover corner cases in the
+ kernel entry code. In general, old kernels might cause these tests
+ themselves to crash, but they should never crash the kernel.
+3. Run the 'perf' tool in a mode (top or record) that generates many
+ frequent performance monitoring non-maskable interrupts (see "NMI"
+ in /proc/interrupts). This exercises the NMI entry/exit code which
+ is known to trigger bugs in code paths that did not expect to be
+ interrupted, including nested NMIs. Using "-c" boosts the rate of
+ NMIs, and using two -c with separate counters encourages nested NMIs
+ and less deterministic behavior.
+
+ while true; do perf record -c 10000 -e instructions,cycles -a sleep 10; done
+
+4. Launch a KVM virtual machine.
+5. Run 32-bit binaries on systems supporting the SYSCALL instruction.
+ This has been a lightly-tested code path and needs extra scrutiny.
+
+Debugging
+=========
+
+Bugs in PTI cause a few different signatures of crashes
+that are worth noting here.
+
+ * Failures of the selftests/x86 code. Usually a bug in one of the
+ more obscure corners of entry_64.S
+ * Crashes in early boot, especially around CPU bringup. Bugs
+ in the trampoline code or mappings cause these.
+ * Crashes at the first interrupt. Caused by bugs in entry_64.S,
+ like screwing up a page table switch. Also caused by
+ incorrectly mapping the IRQ handler entry code.
+ * Crashes at the first NMI. The NMI code is separate from main
+ interrupt handlers and can have bugs that do not affect
+ normal interrupts. Also caused by incorrectly mapping NMI
+ code. NMIs that interrupt the entry code must be very
+ careful and can be the cause of crashes that show up when
+ running perf.
+ * Kernel crashes at the first exit to userspace. entry_64.S
+ bugs, or failing to map some of the exit code.
+ * Crashes at first interrupt that interrupts userspace. The paths
+ in entry_64.S that return to userspace are sometimes separate
+ from the ones that return to the kernel.
+ * Double faults: overflowing the kernel stack because of page
+ faults upon page faults. Caused by touching non-pti-mapped
+ data in the entry code, or forgetting to switch to kernel
+ CR3 before calling into C functions which are not pti-mapped.
+ * Userspace segfaults early in boot, sometimes manifesting
+ as mount(8) failing to mount the rootfs. These have
+ tended to be TLB invalidation issues. Usually invalidating
+ the wrong PCID, or otherwise missing an invalidation.
+
+1. https://gruss.cc/files/kaiser.pdf
+2. https://meltdownattack.com/meltdown.pdf
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.9/x86-documentation-add-pti-description.patch