This is a note to let you know that I've just added the patch titled
x86/kaiser: Reenable PARAVIRT
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-kaiser-reenable-paravirt.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Borislav Petkov <bp(a)suse.de>
Date: Tue, 2 Jan 2018 14:19:49 +0100
Subject: x86/kaiser: Reenable PARAVIRT
From: Borislav Petkov <bp(a)suse.de>
Now that the required bits have been addressed, reenable
PARAVIRT.
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
security/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -34,7 +34,7 @@ config SECURITY
config KAISER
bool "Remove the kernel mapping in user mode"
default y
- depends on X86_64 && SMP && !PARAVIRT
+ depends on X86_64 && SMP
help
This enforces a strict kernel and user space isolation, in order
to close hardware side channels on kernel address information.
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/x86-kaiser-reenable-paravirt.patch
queue-4.9/x86-kaiser-rename-and-simplify-x86_feature_kaiser-handling.patch
queue-4.9/x86-kaiser-check-boottime-cmdline-params.patch
queue-4.9/x86-kaiser-move-feature-detection-up.patch
This is a note to let you know that I've just added the patch titled
x86/kaiser: Move feature detection up
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-kaiser-move-feature-detection-up.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Borislav Petkov <bp(a)suse.de>
Date: Mon, 25 Dec 2017 13:57:16 +0100
Subject: x86/kaiser: Move feature detection up
From: Borislav Petkov <bp(a)suse.de>
... before the first use of kaiser_enabled as otherwise funky
things happen:
about to get started...
(XEN) d0v0 Unhandled page fault fault/trap [#14, ec=0000]
(XEN) Pagetable walk from ffff88022a449090:
(XEN) L4[0x110] = 0000000229e0e067 0000000000001e0e
(XEN) L3[0x008] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S: fault at ffff82d08033fd08
entry.o#create_bounce_frame+0x135/0x14d
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.9.1_02-3.21 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff81007460>]
(XEN) RFLAGS: 0000000000000286 EM: 1 CONTEXT: pv guest (d0v0)
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/kaiser.h | 2 ++
arch/x86/kernel/setup.c | 7 +++++++
arch/x86/mm/kaiser.c | 2 --
3 files changed, 9 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -96,8 +96,10 @@ DECLARE_PER_CPU(unsigned long, x86_cr3_p
extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
extern int kaiser_enabled;
+extern void __init kaiser_check_boottime_disable(void);
#else
#define kaiser_enabled 0
+static inline void __init kaiser_check_boottime_disable(void) {}
#endif /* CONFIG_KAISER */
/*
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -114,6 +114,7 @@
#include <asm/microcode.h>
#include <asm/mmu_context.h>
#include <asm/kaslr.h>
+#include <asm/kaiser.h>
/*
* max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -1019,6 +1020,12 @@ void __init setup_arch(char **cmdline_p)
*/
init_hypervisor_platform();
+ /*
+ * This needs to happen right after XENPV is set on xen and
+ * kaiser_enabled is checked below in cleanup_highmap().
+ */
+ kaiser_check_boottime_disable();
+
x86_init.resources.probe_roms();
/* after parse_early_param, so could debug it */
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -310,8 +310,6 @@ void __init kaiser_init(void)
{
int cpu;
- kaiser_check_boottime_disable();
-
if (!kaiser_enabled)
return;
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/x86-kaiser-reenable-paravirt.patch
queue-4.9/x86-kaiser-rename-and-simplify-x86_feature_kaiser-handling.patch
queue-4.9/x86-kaiser-check-boottime-cmdline-params.patch
queue-4.9/x86-kaiser-move-feature-detection-up.patch
This is a note to let you know that I've just added the patch titled
kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 27 Aug 2017 16:24:27 -0700
Subject: kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
From: Hugh Dickins <hughd(a)google.com>
Mostly this commit is just unshouting X86_CR3_PCID_KERN_VAR and
X86_CR3_PCID_USER_VAR: we usually name variables in lower-case.
But why does x86_cr3_pcid_noflush need to be __aligned(PAGE_SIZE)?
Ah, it's a leftover from when kaiser_add_user_map() once complained
about mapping the same page twice. Make it __read_mostly instead.
(I'm a little uneasy about all the unrelated data which shares its
page getting user-mapped too, but that was so before, and not a big
deal: though we call it user-mapped, it's not mapped with _PAGE_USER.)
And there is a little change around the two calls to do_nmi().
Previously they set the NOFLUSH bit (if PCID supported) when
forcing to kernel context before do_nmi(); now they also have the
NOFLUSH bit set (if PCID supported) when restoring context after:
nothing done in do_nmi() should require a TLB to be flushed here.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 8 ++++----
arch/x86/include/asm/kaiser.h | 11 +++++------
arch/x86/mm/kaiser.c | 13 +++++++------
3 files changed, 16 insertions(+), 16 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1316,11 +1316,11 @@ ENTRY(nmi)
/* Unconditionally use kernel CR3 for do_nmi() */
/* %rax is saved above, so OK to clobber here */
movq %cr3, %rax
+ /* If PCID enabled, NOFLUSH now and NOFLUSH on return */
+ orq x86_cr3_pcid_noflush, %rax
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
- /* Add back kernel PCID and "no flush" bit */
- orq X86_CR3_PCID_KERN_VAR, %rax
movq %rax, %cr3
#endif
call do_nmi
@@ -1560,11 +1560,11 @@ end_repeat_nmi:
/* Unconditionally use kernel CR3 for do_nmi() */
/* %rax is saved above, so OK to clobber here */
movq %cr3, %rax
+ /* If PCID enabled, NOFLUSH now and NOFLUSH on return */
+ orq x86_cr3_pcid_noflush, %rax
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
- /* Add back kernel PCID and "no flush" bit */
- orq X86_CR3_PCID_KERN_VAR, %rax
movq %rax, %cr3
#endif
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -25,7 +25,7 @@
.macro _SWITCH_TO_KERNEL_CR3 reg
movq %cr3, \reg
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-orq X86_CR3_PCID_KERN_VAR, \reg
+orq x86_cr3_pcid_noflush, \reg
movq \reg, %cr3
.endm
@@ -37,11 +37,10 @@ movq \reg, %cr3
* not enabled): so that the one register can update both memory and cr3.
*/
movq %cr3, \reg
-andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-orq PER_CPU_VAR(X86_CR3_PCID_USER_VAR), \reg
+orq PER_CPU_VAR(x86_cr3_pcid_user), \reg
js 9f
/* FLUSH this time, reset to NOFLUSH for next time (if PCID enabled) */
-movb \regb, PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
+movb \regb, PER_CPU_VAR(x86_cr3_pcid_user+7)
9:
movq \reg, %cr3
.endm
@@ -94,8 +93,8 @@ movq PER_CPU_VAR(unsafe_stack_register_b
*/
DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
-extern unsigned long X86_CR3_PCID_KERN_VAR;
-DECLARE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
+extern unsigned long x86_cr3_pcid_noflush;
+DECLARE_PER_CPU(unsigned long, x86_cr3_pcid_user);
extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -28,8 +28,8 @@ DEFINE_PER_CPU_USER_MAPPED(unsigned long
* This is also handy because systems that do not support PCIDs
* just end up or'ing a 0 into their CR3, which does no harm.
*/
-__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR;
-DEFINE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
+unsigned long x86_cr3_pcid_noflush __read_mostly;
+DEFINE_PER_CPU(unsigned long, x86_cr3_pcid_user);
/*
* At runtime, the only things we map are some things for CPU
@@ -303,7 +303,8 @@ void __init kaiser_init(void)
sizeof(gate_desc) * NR_VECTORS,
__PAGE_KERNEL);
- kaiser_add_user_map_early(&X86_CR3_PCID_KERN_VAR, PAGE_SIZE,
+ kaiser_add_user_map_early(&x86_cr3_pcid_noflush,
+ sizeof(x86_cr3_pcid_noflush),
__PAGE_KERNEL);
}
@@ -381,8 +382,8 @@ void kaiser_setup_pcid(void)
* These variables are used by the entry/exit
* code to change PCID and pgd and TLB flushing.
*/
- X86_CR3_PCID_KERN_VAR = kern_cr3;
- this_cpu_write(X86_CR3_PCID_USER_VAR, user_cr3);
+ x86_cr3_pcid_noflush = kern_cr3;
+ this_cpu_write(x86_cr3_pcid_user, user_cr3);
}
/*
@@ -392,7 +393,7 @@ void kaiser_setup_pcid(void)
*/
void kaiser_flush_tlb_on_return_to_user(void)
{
- this_cpu_write(X86_CR3_PCID_USER_VAR,
+ this_cpu_write(x86_cr3_pcid_user,
X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
}
EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
x86/kaiser: Check boottime cmdline params
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-kaiser-check-boottime-cmdline-params.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Borislav Petkov <bp(a)suse.de>
Date: Tue, 2 Jan 2018 14:19:48 +0100
Subject: x86/kaiser: Check boottime cmdline params
From: Borislav Petkov <bp(a)suse.de>
AMD (and possibly other vendors) are not affected by the leak
KAISER is protecting against.
Keep the "nopti" for traditional reasons and add pti=<on|off|auto>
like upstream.
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/kernel-parameters.txt | 6 +++
arch/x86/mm/kaiser.c | 59 +++++++++++++++++++++++++-----------
2 files changed, 47 insertions(+), 18 deletions(-)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3327,6 +3327,12 @@ bytes respectively. Such letter suffixes
pt. [PARIDE]
See Documentation/blockdev/paride.txt.
+ pti= [X86_64]
+ Control KAISER user/kernel address space isolation:
+ on - enable
+ off - disable
+ auto - default setting
+
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -15,6 +15,7 @@
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/desc.h>
+#include <asm/cmdline.h>
int kaiser_enabled __read_mostly = 1;
EXPORT_SYMBOL(kaiser_enabled); /* for inlined TLB flush functions */
@@ -263,6 +264,43 @@ static void __init kaiser_init_all_pgds(
WARN_ON(__ret); \
} while (0)
+void __init kaiser_check_boottime_disable(void)
+{
+ bool enable = true;
+ char arg[5];
+ int ret;
+
+ ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
+ if (ret > 0) {
+ if (!strncmp(arg, "on", 2))
+ goto enable;
+
+ if (!strncmp(arg, "off", 3))
+ goto disable;
+
+ if (!strncmp(arg, "auto", 4))
+ goto skip;
+ }
+
+ if (cmdline_find_option_bool(boot_command_line, "nopti"))
+ goto disable;
+
+skip:
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+ goto disable;
+
+enable:
+ if (enable)
+ setup_force_cpu_cap(X86_FEATURE_KAISER);
+
+ return;
+
+disable:
+ pr_info("Kernel/User page tables isolation: disabled\n");
+ kaiser_enabled = 0;
+ setup_clear_cpu_cap(X86_FEATURE_KAISER);
+}
+
/*
* If anything in here fails, we will likely die on one of the
* first kernel->user transitions and init will die. But, we
@@ -274,12 +312,10 @@ void __init kaiser_init(void)
{
int cpu;
- if (!kaiser_enabled) {
- setup_clear_cpu_cap(X86_FEATURE_KAISER);
- return;
- }
+ kaiser_check_boottime_disable();
- setup_force_cpu_cap(X86_FEATURE_KAISER);
+ if (!kaiser_enabled)
+ return;
kaiser_init_all_pgds();
@@ -423,16 +459,3 @@ void kaiser_flush_tlb_on_return_to_user(
X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
}
EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
-
-static int __init x86_nokaiser_setup(char *s)
-{
- /* nopti doesn't accept parameters */
- if (s)
- return -EINVAL;
-
- kaiser_enabled = 0;
- pr_info("Kernel/User page tables isolation: disabled\n");
-
- return 0;
-}
-early_param("nopti", x86_nokaiser_setup);
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/x86-kaiser-reenable-paravirt.patch
queue-4.9/x86-kaiser-rename-and-simplify-x86_feature_kaiser-handling.patch
queue-4.9/x86-kaiser-check-boottime-cmdline-params.patch
queue-4.9/x86-kaiser-move-feature-detection-up.patch
This is a note to let you know that I've just added the patch titled
kaiser: vmstat show NR_KAISERTABLE as nr_overhead
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sat, 9 Sep 2017 21:27:32 -0700
Subject: kaiser: vmstat show NR_KAISERTABLE as nr_overhead
From: Hugh Dickins <hughd(a)google.com>
The kaiser update made an interesting choice, never to free any shadow
page tables. Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing. Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).
But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion
(but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).
Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).
In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too). But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).
["nr_overhead" should of course say "nr_kaisertable", if it needs
to stay; but for the moment we are being coy, preferring that when
Joe Blow notices a new line in his /proc/vmstat, he does not get
too curious about what this "kaiser" stuff might be.]
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 16 +++++++++++-----
include/linux/mmzone.h | 3 ++-
mm/vmstat.c | 1 +
3 files changed, 14 insertions(+), 6 deletions(-)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -121,9 +121,11 @@ static pte_t *kaiser_pagetable_walk(unsi
if (!new_pmd_page)
return NULL;
spin_lock(&shadow_table_allocation_lock);
- if (pud_none(*pud))
+ if (pud_none(*pud)) {
set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
- else
+ __inc_zone_page_state(virt_to_page((void *)
+ new_pmd_page), NR_KAISERTABLE);
+ } else
free_page(new_pmd_page);
spin_unlock(&shadow_table_allocation_lock);
}
@@ -139,9 +141,11 @@ static pte_t *kaiser_pagetable_walk(unsi
if (!new_pte_page)
return NULL;
spin_lock(&shadow_table_allocation_lock);
- if (pmd_none(*pmd))
+ if (pmd_none(*pmd)) {
set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
- else
+ __inc_zone_page_state(virt_to_page((void *)
+ new_pte_page), NR_KAISERTABLE);
+ } else
free_page(new_pte_page);
spin_unlock(&shadow_table_allocation_lock);
}
@@ -205,11 +209,13 @@ static void __init kaiser_init_all_pgds(
pgd = native_get_shadow_pgd(pgd_offset_k((unsigned long )0));
for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
pgd_t new_pgd;
- pud_t *pud = pud_alloc_one(&init_mm, PAGE_OFFSET + i * PGDIR_SIZE);
+ pud_t *pud = pud_alloc_one(&init_mm,
+ PAGE_OFFSET + i * PGDIR_SIZE);
if (!pud) {
WARN_ON(1);
break;
}
+ inc_zone_page_state(virt_to_page(pud), NR_KAISERTABLE);
new_pgd = __pgd(_KERNPG_TABLE |__pa(pud));
/*
* Make sure not to stomp on some other pgd entry.
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -124,8 +124,9 @@ enum zone_stat_item {
NR_SLAB_UNRECLAIMABLE,
NR_PAGETABLE, /* used for pagetables */
NR_KERNEL_STACK_KB, /* measured in KiB */
- /* Second 128 byte cacheline */
+ NR_KAISERTABLE,
NR_BOUNCE,
+ /* Second 128 byte cacheline */
#if IS_ENABLED(CONFIG_ZSMALLOC)
NR_ZSPAGES, /* allocated in zsmalloc */
#endif
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -932,6 +932,7 @@ const char * const vmstat_text[] = {
"nr_slab_unreclaimable",
"nr_page_table_pages",
"nr_kernel_stack",
+ "nr_overhead",
"nr_bounce",
#if IS_ENABLED(CONFIG_ZSMALLOC)
"nr_zspages",
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Tue, 3 Oct 2017 20:49:04 -0700
Subject: kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
From: Hugh Dickins <hughd(a)google.com>
Now that we're playing the ALTERNATIVE game, use that more efficient
method: instead of user-mapping an extra page, and reading an extra
cacheline each time for x86_cr3_pcid_noflush.
Neel has found that __stringify(bts $X86_CR3_PCID_NOFLUSH_BIT, %rax)
is a working substitute for the "bts $63, %rax" in these ALTERNATIVEs;
but the one line with $63 in looks clearer, so let's stick with that.
Worried about what happens with an ALTERNATIVE between the jump and
jump label in another ALTERNATIVE? I was, but have checked the
combinations in SWITCH_KERNEL_CR3_NO_STACK at entry_SYSCALL_64,
and it does a good job.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 7 ++++---
arch/x86/include/asm/kaiser.h | 6 +++---
arch/x86/mm/kaiser.c | 11 +----------
3 files changed, 8 insertions(+), 16 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1084,7 +1084,8 @@ ENTRY(paranoid_entry)
jz 2f
orl $2, %ebx
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
- orq x86_cr3_pcid_noflush, %rax
+ /* If PCID enabled, set X86_CR3_PCID_NOFLUSH_BIT */
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
movq %rax, %cr3
2:
#endif
@@ -1344,7 +1345,7 @@ ENTRY(nmi)
/* %rax is saved above, so OK to clobber here */
ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
- orq x86_cr3_pcid_noflush, %rax
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
@@ -1588,7 +1589,7 @@ end_repeat_nmi:
/* %rax is saved above, so OK to clobber here */
ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
- orq x86_cr3_pcid_noflush, %rax
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -25,7 +25,8 @@
.macro _SWITCH_TO_KERNEL_CR3 reg
movq %cr3, \reg
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-orq x86_cr3_pcid_noflush, \reg
+/* If PCID enabled, set X86_CR3_PCID_NOFLUSH_BIT */
+ALTERNATIVE "", "bts $63, \reg", X86_FEATURE_PCID
movq \reg, %cr3
.endm
@@ -39,7 +40,7 @@ movq \reg, %cr3
movq %cr3, \reg
orq PER_CPU_VAR(x86_cr3_pcid_user), \reg
js 9f
-/* FLUSH this time, reset to NOFLUSH for next time (if PCID enabled) */
+/* If PCID enabled, FLUSH this time, reset to NOFLUSH for next time */
movb \regb, PER_CPU_VAR(x86_cr3_pcid_user+7)
9:
movq \reg, %cr3
@@ -90,7 +91,6 @@ movq PER_CPU_VAR(unsafe_stack_register_b
*/
DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
-extern unsigned long x86_cr3_pcid_noflush;
DECLARE_PER_CPU(unsigned long, x86_cr3_pcid_user);
extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -31,7 +31,6 @@ DEFINE_PER_CPU_USER_MAPPED(unsigned long
* This is also handy because systems that do not support PCIDs
* just end up or'ing a 0 into their CR3, which does no harm.
*/
-unsigned long x86_cr3_pcid_noflush __read_mostly;
DEFINE_PER_CPU(unsigned long, x86_cr3_pcid_user);
/*
@@ -356,10 +355,6 @@ void __init kaiser_init(void)
kaiser_add_user_map_early(&debug_idt_table,
sizeof(gate_desc) * NR_VECTORS,
__PAGE_KERNEL);
-
- kaiser_add_user_map_early(&x86_cr3_pcid_noflush,
- sizeof(x86_cr3_pcid_noflush),
- __PAGE_KERNEL);
}
/* Add a mapping to the shadow mapping, and synchronize the mappings */
@@ -433,18 +428,14 @@ pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp,
void kaiser_setup_pcid(void)
{
- unsigned long kern_cr3 = 0;
unsigned long user_cr3 = KAISER_SHADOW_PGD_OFFSET;
- if (this_cpu_has(X86_FEATURE_PCID)) {
- kern_cr3 |= X86_CR3_PCID_KERN_NOFLUSH;
+ if (this_cpu_has(X86_FEATURE_PCID))
user_cr3 |= X86_CR3_PCID_USER_NOFLUSH;
- }
/*
* These variables are used by the entry/exit
* code to change PCID and pgd and TLB flushing.
*/
- x86_cr3_pcid_noflush = kern_cr3;
this_cpu_write(x86_cr3_pcid_user, user_cr3);
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: tidied up kaiser_add/remove_mapping slightly
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 19:23:08 -0700
Subject: kaiser: tidied up kaiser_add/remove_mapping slightly
From: Hugh Dickins <hughd(a)google.com>
Yes, unmap_pud_range_nofree()'s declaration ought to be in a
header file really, but I'm not sure we want to use it anyway:
so for now just declare it inside kaiser_remove_mapping().
And there doesn't seem to be such a thing as unmap_p4d_range(),
even in a 5-level paging tree.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -285,8 +285,7 @@ void __init kaiser_init(void)
__PAGE_KERNEL);
}
-extern void unmap_pud_range_nofree(pgd_t *pgd, unsigned long start, unsigned long end);
-// add a mapping to the shadow-mapping, and synchronize the mappings
+/* Add a mapping to the shadow mapping, and synchronize the mappings */
int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
{
return kaiser_add_user_map((const void *)addr, size, flags);
@@ -294,15 +293,13 @@ int kaiser_add_mapping(unsigned long add
void kaiser_remove_mapping(unsigned long start, unsigned long size)
{
+ extern void unmap_pud_range_nofree(pgd_t *pgd,
+ unsigned long start, unsigned long end);
unsigned long end = start + size;
unsigned long addr;
for (addr = start; addr < end; addr += PGDIR_SIZE) {
pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(addr));
- /*
- * unmap_p4d_range() handles > P4D_SIZE unmaps,
- * so no need to trim 'end'.
- */
unmap_pud_range_nofree(pgd, addr, end);
}
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: tidied up asm/kaiser.h somewhat
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-tidied-up-asm-kaiser.h-somewhat.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 19:18:07 -0700
Subject: kaiser: tidied up asm/kaiser.h somewhat
From: Hugh Dickins <hughd(a)google.com>
Mainly deleting a surfeit of blank lines, and reflowing header comment.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/kaiser.h | 32 +++++++++++++-------------------
1 file changed, 13 insertions(+), 19 deletions(-)
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -1,15 +1,17 @@
#ifndef _ASM_X86_KAISER_H
#define _ASM_X86_KAISER_H
-
-/* This file includes the definitions for the KAISER feature.
- * KAISER is a counter measure against x86_64 side channel attacks on the kernel virtual memory.
- * It has a shodow-pgd for every process. the shadow-pgd has a minimalistic kernel-set mapped,
- * but includes the whole user memory. Within a kernel context switch, or when an interrupt is handled,
- * the pgd is switched to the normal one. When the system switches to user mode, the shadow pgd is enabled.
- * By this, the virtual memory chaches are freed, and the user may not attack the whole kernel memory.
+/*
+ * This file includes the definitions for the KAISER feature.
+ * KAISER is a counter measure against x86_64 side channel attacks on
+ * the kernel virtual memory. It has a shadow pgd for every process: the
+ * shadow pgd has a minimalistic kernel-set mapped, but includes the whole
+ * user memory. Within a kernel context switch, or when an interrupt is handled,
+ * the pgd is switched to the normal one. When the system switches to user mode,
+ * the shadow pgd is enabled. By this, the virtual memory caches are freed,
+ * and the user may not attack the whole kernel memory.
*
- * A minimalistic kernel mapping holds the parts needed to be mapped in user mode, as the entry/exit functions
- * of the user space, or the stacks.
+ * A minimalistic kernel mapping holds the parts needed to be mapped in user
+ * mode, such as the entry/exit functions of the user space, or the stacks.
*/
#ifdef __ASSEMBLY__
#ifdef CONFIG_KAISER
@@ -48,13 +50,10 @@ _SWITCH_TO_KERNEL_CR3 %rax
movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
.endm
-
.macro SWITCH_USER_CR3_NO_STACK
-
movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
_SWITCH_TO_USER_CR3 %rax
movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
-
.endm
#else /* CONFIG_KAISER */
@@ -72,7 +71,6 @@ movq PER_CPU_VAR(unsafe_stack_register_b
#else /* __ASSEMBLY__ */
-
#ifdef CONFIG_KAISER
/*
* Upon kernel/user mode switch, it may happen that the address
@@ -80,7 +78,6 @@ movq PER_CPU_VAR(unsafe_stack_register_b
* stored. To change the address space, another register is
* needed. A register therefore has to be stored/restored.
*/
-
DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
/**
@@ -95,7 +92,6 @@ DECLARE_PER_CPU_USER_MAPPED(unsigned lon
*/
extern int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
-
/**
* kaiser_remove_mapping - unmap a virtual memory part of the shadow mapping
* @addr: the start address of the range
@@ -104,12 +100,12 @@ extern int kaiser_add_mapping(unsigned l
extern void kaiser_remove_mapping(unsigned long start, unsigned long size);
/**
- * kaiser_initialize_mapping - Initalize the shadow mapping
+ * kaiser_init - Initialize the shadow mapping
*
* Most parts of the shadow mapping can be mapped upon boot
* time. Only per-process things like the thread stacks
* or a new LDT have to be mapped at runtime. These boot-
- * time mappings are permanent and nevertunmapped.
+ * time mappings are permanent and never unmapped.
*/
extern void kaiser_init(void);
@@ -117,6 +113,4 @@ extern void kaiser_init(void);
#endif /* __ASSEMBLY */
-
-
#endif /* _ASM_X86_KAISER_H */
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-stack-map-page_size-at-thread_size-page_size.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 18:57:03 -0700
Subject: kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
From: Hugh Dickins <hughd(a)google.com>
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping(). And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/kaiser.h | 40 +++++++++++++++++++++++++++++++++-------
init/main.c | 6 +-----
kernel/fork.c | 7 ++-----
3 files changed, 36 insertions(+), 17 deletions(-)
--- a/include/linux/kaiser.h
+++ b/include/linux/kaiser.h
@@ -1,26 +1,52 @@
-#ifndef _INCLUDE_KAISER_H
-#define _INCLUDE_KAISER_H
+#ifndef _LINUX_KAISER_H
+#define _LINUX_KAISER_H
#ifdef CONFIG_KAISER
#include <asm/kaiser.h>
+
+static inline int kaiser_map_thread_stack(void *stack)
+{
+ /*
+ * Map that page of kernel stack on which we enter from user context.
+ */
+ return kaiser_add_mapping((unsigned long)stack +
+ THREAD_SIZE - PAGE_SIZE, PAGE_SIZE, __PAGE_KERNEL);
+}
+
+static inline void kaiser_unmap_thread_stack(void *stack)
+{
+ /*
+ * Note: may be called even when kaiser_map_thread_stack() failed.
+ */
+ kaiser_remove_mapping((unsigned long)stack +
+ THREAD_SIZE - PAGE_SIZE, PAGE_SIZE);
+}
#else
/*
* These stubs are used whenever CONFIG_KAISER is off, which
- * includes architectures that support KAISER, but have it
- * disabled.
+ * includes architectures that support KAISER, but have it disabled.
*/
static inline void kaiser_init(void)
{
}
-static inline void kaiser_remove_mapping(unsigned long start, unsigned long size)
+static inline int kaiser_add_mapping(unsigned long addr,
+ unsigned long size, unsigned long flags)
+{
+ return 0;
+}
+static inline void kaiser_remove_mapping(unsigned long start,
+ unsigned long size)
{
}
-static inline int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+static inline int kaiser_map_thread_stack(void *stack)
{
return 0;
}
+static inline void kaiser_unmap_thread_stack(void *stack)
+{
+}
#endif /* !CONFIG_KAISER */
-#endif /* _INCLUDE_KAISER_H */
+#endif /* _LINUX_KAISER_H */
--- a/init/main.c
+++ b/init/main.c
@@ -80,15 +80,13 @@
#include <linux/integrity.h>
#include <linux/proc_ns.h>
#include <linux/io.h>
+#include <linux/kaiser.h>
#include <asm/io.h>
#include <asm/bugs.h>
#include <asm/setup.h>
#include <asm/sections.h>
#include <asm/cacheflush.h>
-#ifdef CONFIG_KAISER
-#include <asm/kaiser.h>
-#endif
static int kernel_init(void *);
@@ -476,9 +474,7 @@ static void __init mm_init(void)
pgtable_init();
vmalloc_init();
ioremap_huge_init();
-#ifdef CONFIG_KAISER
kaiser_init();
-#endif
}
asmlinkage __visible void __init start_kernel(void)
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -212,12 +212,9 @@ static unsigned long *alloc_thread_stack
#endif
}
-extern void kaiser_remove_mapping(unsigned long start_addr, unsigned long size);
static inline void free_thread_stack(struct task_struct *tsk)
{
-#ifdef CONFIG_KAISER
- kaiser_remove_mapping((unsigned long)tsk->stack, THREAD_SIZE);
-#endif
+ kaiser_unmap_thread_stack(tsk->stack);
#ifdef CONFIG_VMAP_STACK
if (task_stack_vm_area(tsk)) {
unsigned long flags;
@@ -501,7 +498,7 @@ static struct task_struct *dup_task_stru
*/
tsk->stack = stack;
- err= kaiser_add_mapping((unsigned long)tsk->stack, THREAD_SIZE, __PAGE_KERNEL);
+ err= kaiser_map_thread_stack(tsk->stack);
if (err)
goto free_stack;
#ifdef CONFIG_VMAP_STACK
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: PCID 0 for kernel and 128 for user
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-pcid-0-for-kernel-and-128-for-user.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Fri, 8 Sep 2017 19:26:30 -0700
Subject: kaiser: PCID 0 for kernel and 128 for user
From: Hugh Dickins <hughd(a)google.com>
Why was 4 chosen for kernel PCID and 6 for user PCID?
No good reason in a backport where PCIDs are only used for Kaiser.
If we continue with those, then we shall need to add Andy Lutomirski's
4.13 commit 6c690ee1039b ("x86/mm: Split read_cr3() into read_cr3_pa()
and __read_cr3()"), which deals with the problem of read_cr3() callers
finding stray bits in the cr3 that they expected to be page-aligned;
and for hibernation, his 4.14 commit f34902c5c6c0 ("x86/hibernate/64:
Mask off CR3's PCID bits in the saved CR3").
But if 0 is used for kernel PCID, then there's no need to add in those
commits - whenever the kernel looks, it sees 0 in the lower bits; and
0 for kernel seems an obvious choice.
And I naughtily propose 128 for user PCID. Because there's a place
in _SWITCH_TO_USER_CR3 where it takes note of the need for TLB FLUSH,
but needs to reset that to NOFLUSH for the next occasion. Currently
it does so with a "movb $(0x80)" into the high byte of the per-cpu
quadword, but that will cause a machine without PCID support to crash.
Now, if %al just happened to have 0x80 in it at that point, on a
machine with PCID support, but 0 on a machine without PCID support...
(That will go badly wrong once the pgd can be at a physical address
above 2^56, but even with 5-level paging, physical goes up to 2^52.)
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/kaiser.h | 19 ++++++++++++-------
arch/x86/include/asm/pgtable_types.h | 7 ++++---
arch/x86/mm/tlb.c | 3 +++
3 files changed, 19 insertions(+), 10 deletions(-)
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -29,14 +29,19 @@ orq X86_CR3_PCID_KERN_VAR, \reg
movq \reg, %cr3
.endm
-.macro _SWITCH_TO_USER_CR3 reg
+.macro _SWITCH_TO_USER_CR3 reg regb
+/*
+ * regb must be the low byte portion of reg: because we have arranged
+ * for the low byte of the user PCID to serve as the high byte of NOFLUSH
+ * (0x80 for each when PCID is enabled, or 0x00 when PCID and NOFLUSH are
+ * not enabled): so that the one register can update both memory and cr3.
+ */
movq %cr3, \reg
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
orq PER_CPU_VAR(X86_CR3_PCID_USER_VAR), \reg
js 9f
-// FLUSH this time, reset to NOFLUSH for next time
-// But if nopcid? Consider using 0x80 for user pcid?
-movb $(0x80), PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
+/* FLUSH this time, reset to NOFLUSH for next time (if PCID enabled) */
+movb \regb, PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
9:
movq \reg, %cr3
.endm
@@ -49,7 +54,7 @@ popq %rax
.macro SWITCH_USER_CR3
pushq %rax
-_SWITCH_TO_USER_CR3 %rax
+_SWITCH_TO_USER_CR3 %rax %al
popq %rax
.endm
@@ -61,7 +66,7 @@ movq PER_CPU_VAR(unsafe_stack_register_b
.macro SWITCH_USER_CR3_NO_STACK
movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
-_SWITCH_TO_USER_CR3 %rax
+_SWITCH_TO_USER_CR3 %rax %al
movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
.endm
@@ -69,7 +74,7 @@ movq PER_CPU_VAR(unsafe_stack_register_b
.macro SWITCH_KERNEL_CR3 reg
.endm
-.macro SWITCH_USER_CR3 reg
+.macro SWITCH_USER_CR3 reg regb
.endm
.macro SWITCH_USER_CR3_NO_STACK
.endm
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -146,16 +146,17 @@
/* Mask for all the PCID-related bits in CR3: */
#define X86_CR3_PCID_MASK (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_MASK)
+#define X86_CR3_PCID_ASID_KERN (_AC(0x0,UL))
+
#if defined(CONFIG_KAISER) && defined(CONFIG_X86_64)
-#define X86_CR3_PCID_ASID_KERN (_AC(0x4,UL))
-#define X86_CR3_PCID_ASID_USER (_AC(0x6,UL))
+/* Let X86_CR3_PCID_ASID_USER be usable for the X86_CR3_PCID_NOFLUSH bit */
+#define X86_CR3_PCID_ASID_USER (_AC(0x80,UL))
#define X86_CR3_PCID_KERN_FLUSH (X86_CR3_PCID_ASID_KERN)
#define X86_CR3_PCID_USER_FLUSH (X86_CR3_PCID_ASID_USER)
#define X86_CR3_PCID_KERN_NOFLUSH (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_KERN)
#define X86_CR3_PCID_USER_NOFLUSH (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_USER)
#else
-#define X86_CR3_PCID_ASID_KERN (_AC(0x0,UL))
#define X86_CR3_PCID_ASID_USER (_AC(0x0,UL))
/*
* PCIDs are unsupported on 32-bit and none of these bits can be
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -50,6 +50,9 @@ static void load_new_mm_cr3(pgd_t *pgdir
* invpcid_flush_single_context(X86_CR3_PCID_ASID_USER) could
* do it here, but can only be used if X86_FEATURE_INVPCID is
* available - and many machines support pcid without invpcid.
+ *
+ * The line below is a no-op: X86_CR3_PCID_KERN_FLUSH is now 0;
+ * but keep that line in there in case something changes.
*/
new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
kaiser_flush_tlb_on_return_to_user();
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: paranoid_entry pass cr3 need to paranoid_exit
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Tue, 26 Sep 2017 18:43:07 -0700
Subject: kaiser: paranoid_entry pass cr3 need to paranoid_exit
From: Hugh Dickins <hughd(a)google.com>
Neel Natu points out that paranoid_entry() was wrong to assume that
an entry that did not need swapgs would not need SWITCH_KERNEL_CR3:
paranoid_entry (used for debug breakpoint, int3, double fault or MCE;
though I think it's only the MCE case that is cause for concern here)
can break in at an awkward time, between cr3 switch and swapgs, but
its handling always needs kernel gs and kernel cr3.
Easy to fix in itself, but paranoid_entry() also needs to convey to
paranoid_exit() (and my reading of macro idtentry says paranoid_entry
and paranoid_exit are always paired) how to restore the prior state.
The swapgs state is already conveyed by %ebx (0 or 1), so extend that
also to convey when SWITCH_USER_CR3 will be needed (2 or 3).
(Yes, I'd much prefer that 0 meant no swapgs, whereas it's the other
way round: and a convention shared with error_entry() and error_exit(),
which I don't want to touch. Perhaps I should have inverted the bit
for switch cr3 too, but did not.)
paranoid_exit() would be straightforward, except for TRACE_IRQS: it
did TRACE_IRQS_IRETQ when doing swapgs, but TRACE_IRQS_IRETQ_DEBUG
when not: which is it supposed to use when SWITCH_USER_CR3 is split
apart from that? As best as I can determine, commit 5963e317b1e9
("ftrace/x86: Do not change stacks in DEBUG when calling lockdep")
missed the swapgs case, and should have used TRACE_IRQS_IRETQ_DEBUG
there too (the discrepancy has nothing to do with the liberal use
of _NO_STACK and _UNSAFE_STACK hereabouts: TRACE_IRQS_OFF_DEBUG has
just been used in all cases); discrepancy lovingly preserved across
several paranoid_exit() cleanups, but I'm now removing it.
Neel further indicates that to use SWITCH_USER_CR3_NO_STACK there in
paranoid_exit() is now not only unnecessary but unsafe: might corrupt
syscall entry's unsafe_stack_register_backup of %rax. Just use
SWITCH_USER_CR3: and delete SWITCH_USER_CR3_NO_STACK altogether,
before we make the mistake of using it again.
hughd adds: this commit fixes an issue in the Kaiser-without-PCIDs
part of the series, and ought to be moved earlier, if you decided
to make a release of Kaiser-without-PCIDs.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 46 ++++++++++++++++++++++++++++++---------
arch/x86/entry/entry_64_compat.S | 2 -
arch/x86/include/asm/kaiser.h | 8 ------
3 files changed, 37 insertions(+), 19 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1053,7 +1053,11 @@ idtentry machine_check has_error_cod
/*
* Save all registers in pt_regs, and switch gs if needed.
* Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ *
+ * Return: ebx=0: needs swapgs but not SWITCH_USER_CR3 in paranoid_exit
+ * ebx=1: needs neither swapgs nor SWITCH_USER_CR3 in paranoid_exit
+ * ebx=2: needs both swapgs and SWITCH_USER_CR3 in paranoid_exit
+ * ebx=3: needs SWITCH_USER_CR3 but not swapgs in paranoid_exit
*/
ENTRY(paranoid_entry)
cld
@@ -1065,9 +1069,26 @@ ENTRY(paranoid_entry)
testl %edx, %edx
js 1f /* negative -> in kernel */
SWAPGS
- SWITCH_KERNEL_CR3
xorl %ebx, %ebx
-1: ret
+1:
+#ifdef CONFIG_KAISER
+ /*
+ * We might have come in between a swapgs and a SWITCH_KERNEL_CR3
+ * on entry, or between a SWITCH_USER_CR3 and a swapgs on exit.
+ * Do a conditional SWITCH_KERNEL_CR3: this could safely be done
+ * unconditionally, but we need to find out whether the reverse
+ * should be done on return (conveyed to paranoid_exit in %ebx).
+ */
+ movq %cr3, %rax
+ testl $KAISER_SHADOW_PGD_OFFSET, %eax
+ jz 2f
+ orl $2, %ebx
+ andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+ orq x86_cr3_pcid_noflush, %rax
+ movq %rax, %cr3
+2:
+#endif
+ ret
END(paranoid_entry)
/*
@@ -1080,20 +1101,25 @@ END(paranoid_entry)
* be complicated. Fortunately, we there's no good reason
* to try to handle preemption here.
*
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * On entry: ebx=0: needs swapgs but not SWITCH_USER_CR3
+ * ebx=1: needs neither swapgs nor SWITCH_USER_CR3
+ * ebx=2: needs both swapgs and SWITCH_USER_CR3
+ * ebx=3: needs SWITCH_USER_CR3 but not swapgs
*/
ENTRY(paranoid_exit)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF_DEBUG
- testl %ebx, %ebx /* swapgs needed? */
+ TRACE_IRQS_IRETQ_DEBUG
+#ifdef CONFIG_KAISER
+ testl $2, %ebx /* SWITCH_USER_CR3 needed? */
+ jz paranoid_exit_no_switch
+ SWITCH_USER_CR3
+paranoid_exit_no_switch:
+#endif
+ testl $1, %ebx /* swapgs needed? */
jnz paranoid_exit_no_swapgs
- TRACE_IRQS_IRETQ
- SWITCH_USER_CR3_NO_STACK
SWAPGS_UNSAFE_STACK
- jmp paranoid_exit_restore
paranoid_exit_no_swapgs:
- TRACE_IRQS_IRETQ_DEBUG
-paranoid_exit_restore:
RESTORE_EXTRA_REGS
RESTORE_C_REGS
REMOVE_PT_GPREGS_FROM_STACK 8
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -343,7 +343,7 @@ ENTRY(entry_INT80_compat)
/* Go back to user mode. */
TRACE_IRQS_ON
- SWITCH_USER_CR3_NO_STACK
+ SWITCH_USER_CR3
SWAPGS
jmp restore_regs_and_iret
END(entry_INT80_compat)
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -63,20 +63,12 @@ _SWITCH_TO_KERNEL_CR3 %rax
movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
.endm
-.macro SWITCH_USER_CR3_NO_STACK
-movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
-_SWITCH_TO_USER_CR3 %rax %al
-movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
-.endm
-
#else /* CONFIG_KAISER */
.macro SWITCH_KERNEL_CR3 reg
.endm
.macro SWITCH_USER_CR3 reg regb
.endm
-.macro SWITCH_USER_CR3_NO_STACK
-.endm
.macro SWITCH_KERNEL_CR3_NO_STACK
.endm
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sat, 9 Sep 2017 17:31:18 -0700
Subject: kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
From: Hugh Dickins <hughd(a)google.com>
There's a 0x1000 in various places, which looks better with a name.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 4 ++--
arch/x86/include/asm/kaiser.h | 7 +++++--
2 files changed, 7 insertions(+), 4 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1318,7 +1318,7 @@ ENTRY(nmi)
movq %cr3, %rax
pushq %rax
#ifdef CONFIG_KAISER_REAL_SWITCH
- andq $(~0x1000), %rax
+ andq $(~KAISER_SHADOW_PGD_OFFSET), %rax
#endif
movq %rax, %cr3
#endif
@@ -1561,7 +1561,7 @@ end_repeat_nmi:
movq %cr3, %rax
pushq %rax
#ifdef CONFIG_KAISER_REAL_SWITCH
- andq $(~0x1000), %rax
+ andq $(~KAISER_SHADOW_PGD_OFFSET), %rax
#endif
movq %rax, %cr3
#endif
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -13,13 +13,16 @@
* A minimalistic kernel mapping holds the parts needed to be mapped in user
* mode, such as the entry/exit functions of the user space, or the stacks.
*/
+
+#define KAISER_SHADOW_PGD_OFFSET 0x1000
+
#ifdef __ASSEMBLY__
#ifdef CONFIG_KAISER
.macro _SWITCH_TO_KERNEL_CR3 reg
movq %cr3, \reg
#ifdef CONFIG_KAISER_REAL_SWITCH
-andq $(~0x1000), \reg
+andq $(~KAISER_SHADOW_PGD_OFFSET), \reg
#endif
movq \reg, %cr3
.endm
@@ -27,7 +30,7 @@ movq \reg, %cr3
.macro _SWITCH_TO_USER_CR3 reg
movq %cr3, \reg
#ifdef CONFIG_KAISER_REAL_SWITCH
-orq $(0x1000), \reg
+orq $(KAISER_SHADOW_PGD_OFFSET), \reg
#endif
movq \reg, %cr3
.endm
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Thu, 17 Aug 2017 15:00:37 -0700
Subject: kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
From: Hugh Dickins <hughd(a)google.com>
We have many machines (Westmere, Sandybridge, Ivybridge) supporting
PCID but not INVPCID: on these load_new_mm_cr3() simply crashed.
Flushing user context inside load_new_mm_cr3() without the use of
invpcid is difficult: momentarily switch from kernel to user context
and back to do so? I'm not sure whether that can be safely done at
all, and would risk polluting user context with kernel internals,
and kernel context with stale user externals.
Instead, follow the hint in the comment that was there: change
X86_CR3_PCID_USER_VAR to be a per-cpu variable, then load_new_mm_cr3()
can leave a note in it, for SWITCH_USER_CR3 on return to userspace to
flush user context TLB, instead of default X86_CR3_PCID_USER_NOFLUSH.
Which works well enough that there's no need to do it this way only
when invpcid is unsupported: it's a good alternative to invpcid here.
But there's a couple of inlines in asm/tlbflush.h that need to do the
same trick, so it's best to localize all this per-cpu business in
mm/kaiser.c: moving that part of the initialization from setup_pcid()
to kaiser_setup_pcid(); with kaiser_flush_tlb_on_return_to_user() the
function for noting an X86_CR3_PCID_USER_FLUSH. And let's keep a
KAISER_SHADOW_PGD_OFFSET in there, to avoid the extra OR on exit.
I did try to make the feature tests in asm/tlbflush.h more consistent
with each other: there seem to be far too many ways of performing such
tests, and I don't have a good grasp of their differences. At first
I converted them all to be static_cpu_has(): but that proved to be a
mistake, as the comment in __native_flush_tlb_single() hints; so then
I reversed and made them all this_cpu_has(). Probably all gratuitous
change, but that's the way it's working at present.
I am slightly bothered by the way non-per-cpu X86_CR3_PCID_KERN_VAR
gets re-initialized by each cpu (before and after these changes):
no problem when (as usual) all cpus on a machine have the same
features, but in principle incorrect. However, my experiment
to per-cpu-ify that one did not end well...
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/kaiser.h | 18 +++++++-----
arch/x86/include/asm/tlbflush.h | 56 +++++++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 22 ---------------
arch/x86/mm/kaiser.c | 50 +++++++++++++++++++++++++++++++----
arch/x86/mm/tlb.c | 46 ++++++++++++--------------------
5 files changed, 113 insertions(+), 79 deletions(-)
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -32,13 +32,12 @@ movq \reg, %cr3
.macro _SWITCH_TO_USER_CR3 reg
movq %cr3, \reg
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-/*
- * This can obviously be one instruction by putting the
- * KAISER_SHADOW_PGD_OFFSET bit in the X86_CR3_PCID_USER_VAR.
- * But, just leave it now for simplicity.
- */
-orq X86_CR3_PCID_USER_VAR, \reg
-orq $(KAISER_SHADOW_PGD_OFFSET), \reg
+orq PER_CPU_VAR(X86_CR3_PCID_USER_VAR), \reg
+js 9f
+// FLUSH this time, reset to NOFLUSH for next time
+// But if nopcid? Consider using 0x80 for user pcid?
+movb $(0x80), PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
+9:
movq \reg, %cr3
.endm
@@ -90,6 +89,11 @@ movq PER_CPU_VAR(unsafe_stack_register_b
*/
DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+extern unsigned long X86_CR3_PCID_KERN_VAR;
+DECLARE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
+
+extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+
/**
* kaiser_add_mapping - map a virtual memory part to the shadow (user) mapping
* @addr: the start address of the range
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -13,6 +13,7 @@ static inline void __invpcid(unsigned lo
unsigned long type)
{
struct { u64 d[2]; } desc = { { pcid, addr } };
+
/*
* The memory clobber is because the whole point is to invalidate
* stale TLB entries and, especially if we're flushing global
@@ -131,27 +132,42 @@ static inline void cr4_set_bits_and_upda
cr4_set_bits(mask);
}
+/*
+ * Declare a couple of kaiser interfaces here for convenience,
+ * to avoid the need for asm/kaiser.h in unexpected places.
+ */
+#ifdef CONFIG_KAISER
+extern void kaiser_setup_pcid(void);
+extern void kaiser_flush_tlb_on_return_to_user(void);
+#else
+static inline void kaiser_setup_pcid(void)
+{
+}
+static inline void kaiser_flush_tlb_on_return_to_user(void)
+{
+}
+#endif
+
static inline void __native_flush_tlb(void)
{
- if (!cpu_feature_enabled(X86_FEATURE_INVPCID)) {
+ if (this_cpu_has(X86_FEATURE_INVPCID)) {
/*
- * If current->mm == NULL then we borrow a mm which may change during a
- * task switch and therefore we must not be preempted while we write CR3
- * back:
+ * Note, this works with CR4.PCIDE=0 or 1.
*/
- preempt_disable();
- native_write_cr3(native_read_cr3());
- preempt_enable();
+ invpcid_flush_all_nonglobals();
return;
}
+
/*
- * We are no longer using globals with KAISER, so a
- * "nonglobals" flush would work too. But, this is more
- * conservative.
- *
- * Note, this works with CR4.PCIDE=0 or 1.
+ * If current->mm == NULL then we borrow a mm which may change during a
+ * task switch and therefore we must not be preempted while we write CR3
+ * back:
*/
- invpcid_flush_all();
+ preempt_disable();
+ if (this_cpu_has(X86_FEATURE_PCID))
+ kaiser_flush_tlb_on_return_to_user();
+ native_write_cr3(native_read_cr3());
+ preempt_enable();
}
static inline void __native_flush_tlb_global_irq_disabled(void)
@@ -167,9 +183,13 @@ static inline void __native_flush_tlb_gl
static inline void __native_flush_tlb_global(void)
{
+#ifdef CONFIG_KAISER
+ /* Globals are not used at all */
+ __native_flush_tlb();
+#else
unsigned long flags;
- if (static_cpu_has(X86_FEATURE_INVPCID)) {
+ if (this_cpu_has(X86_FEATURE_INVPCID)) {
/*
* Using INVPCID is considerably faster than a pair of writes
* to CR4 sandwiched inside an IRQ flag save/restore.
@@ -186,10 +206,9 @@ static inline void __native_flush_tlb_gl
* be called from deep inside debugging code.)
*/
raw_local_irq_save(flags);
-
__native_flush_tlb_global_irq_disabled();
-
raw_local_irq_restore(flags);
+#endif
}
static inline void __native_flush_tlb_single(unsigned long addr)
@@ -200,9 +219,12 @@ static inline void __native_flush_tlb_si
*
* The ASIDs used below are hard-coded. But, we must not
* call invpcid(type=1/2) before CR4.PCIDE=1. Just call
- * invpcid in the case we are called early.
+ * invlpg in the case we are called early.
*/
+
if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
+ if (this_cpu_has(X86_FEATURE_PCID))
+ kaiser_flush_tlb_on_return_to_user();
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
return;
}
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -324,33 +324,12 @@ static __always_inline void setup_smap(s
}
}
-/*
- * These can have bit 63 set, so we can not just use a plain "or"
- * instruction to get their value or'd into CR3. It would take
- * another register. So, we use a memory reference to these
- * instead.
- *
- * This is also handy because systems that do not support
- * PCIDs just end up or'ing a 0 into their CR3, which does
- * no harm.
- */
-__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR = 0;
-__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_USER_VAR = 0;
-
static void setup_pcid(struct cpuinfo_x86 *c)
{
if (cpu_has(c, X86_FEATURE_PCID)) {
if (cpu_has(c, X86_FEATURE_PGE)) {
cr4_set_bits(X86_CR4_PCIDE);
/*
- * These variables are used by the entry/exit
- * code to change PCIDs.
- */
-#ifdef CONFIG_KAISER
- X86_CR3_PCID_KERN_VAR = X86_CR3_PCID_KERN_NOFLUSH;
- X86_CR3_PCID_USER_VAR = X86_CR3_PCID_USER_NOFLUSH;
-#endif
- /*
* INVPCID has two "groups" of types:
* 1/2: Invalidate an individual address
* 3/4: Invalidate all contexts
@@ -375,6 +354,7 @@ static void setup_pcid(struct cpuinfo_x8
clear_cpu_cap(c, X86_FEATURE_PCID);
}
}
+ kaiser_setup_pcid();
}
/*
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -11,12 +11,26 @@
#include <linux/uaccess.h>
#include <asm/kaiser.h>
+#include <asm/tlbflush.h> /* to verify its kaiser declarations */
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/desc.h>
+
#ifdef CONFIG_KAISER
+__visible
+DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+/*
+ * These can have bit 63 set, so we can not just use a plain "or"
+ * instruction to get their value or'd into CR3. It would take
+ * another register. So, we use a memory reference to these instead.
+ *
+ * This is also handy because systems that do not support PCIDs
+ * just end up or'ing a 0 into their CR3, which does no harm.
+ */
+__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR;
+DEFINE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
-__visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
/*
* At runtime, the only things we map are some things for CPU
* hotplug, and stacks for new processes. No two CPUs will ever
@@ -238,9 +252,6 @@ static void __init kaiser_init_all_pgds(
WARN_ON(__ret); \
} while (0)
-extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
-extern unsigned long X86_CR3_PCID_KERN_VAR;
-extern unsigned long X86_CR3_PCID_USER_VAR;
/*
* If anything in here fails, we will likely die on one of the
* first kernel->user transitions and init will die. But, we
@@ -294,8 +305,6 @@ void __init kaiser_init(void)
kaiser_add_user_map_early(&X86_CR3_PCID_KERN_VAR, PAGE_SIZE,
__PAGE_KERNEL);
- kaiser_add_user_map_early(&X86_CR3_PCID_USER_VAR, PAGE_SIZE,
- __PAGE_KERNEL);
}
/* Add a mapping to the shadow mapping, and synchronize the mappings */
@@ -358,4 +367,33 @@ pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp,
}
return pgd;
}
+
+void kaiser_setup_pcid(void)
+{
+ unsigned long kern_cr3 = 0;
+ unsigned long user_cr3 = KAISER_SHADOW_PGD_OFFSET;
+
+ if (this_cpu_has(X86_FEATURE_PCID)) {
+ kern_cr3 |= X86_CR3_PCID_KERN_NOFLUSH;
+ user_cr3 |= X86_CR3_PCID_USER_NOFLUSH;
+ }
+ /*
+ * These variables are used by the entry/exit
+ * code to change PCID and pgd and TLB flushing.
+ */
+ X86_CR3_PCID_KERN_VAR = kern_cr3;
+ this_cpu_write(X86_CR3_PCID_USER_VAR, user_cr3);
+}
+
+/*
+ * Make a note that this cpu will need to flush USER tlb on return to user.
+ * Caller checks whether this_cpu_has(X86_FEATURE_PCID) before calling:
+ * if cpu does not, then the NOFLUSH bit will never have been set.
+ */
+void kaiser_flush_tlb_on_return_to_user(void)
+{
+ this_cpu_write(X86_CR3_PCID_USER_VAR,
+ X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
+}
+EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
#endif /* CONFIG_KAISER */
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
#include <linux/interrupt.h>
#include <linux/export.h>
#include <linux/cpu.h>
+#include <linux/debugfs.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include <asm/cache.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>
-#include <linux/debugfs.h>
+#include <asm/kaiser.h>
/*
* TLB flushing, formerly SMP-only
@@ -38,34 +39,23 @@ static void load_new_mm_cr3(pgd_t *pgdir
{
unsigned long new_mm_cr3 = __pa(pgdir);
- /*
- * KAISER, plus PCIDs needs some extra work here. But,
- * if either of features is not present, we need no
- * PCIDs here and just do a normal, full TLB flush with
- * the write_cr3()
- */
- if (!IS_ENABLED(CONFIG_KAISER) ||
- !cpu_feature_enabled(X86_FEATURE_PCID))
- goto out_set_cr3;
- /*
- * We reuse the same PCID for different tasks, so we must
- * flush all the entires for the PCID out when we change
- * tasks.
- */
- new_mm_cr3 = X86_CR3_PCID_KERN_FLUSH | __pa(pgdir);
-
- /*
- * The flush from load_cr3() may leave old TLB entries
- * for userspace in place. We must flush that context
- * separately. We can theoretically delay doing this
- * until we actually load up the userspace CR3, but
- * that's a bit tricky. We have to have the "need to
- * flush userspace PCID" bit per-cpu and check it in the
- * exit-to-userspace paths.
- */
- invpcid_flush_single_context(X86_CR3_PCID_ASID_USER);
+#ifdef CONFIG_KAISER
+ if (this_cpu_has(X86_FEATURE_PCID)) {
+ /*
+ * We reuse the same PCID for different tasks, so we must
+ * flush all the entries for the PCID out when we change tasks.
+ * Flush KERN below, flush USER when returning to userspace in
+ * kaiser's SWITCH_USER_CR3 (_SWITCH_TO_USER_CR3) macro.
+ *
+ * invpcid_flush_single_context(X86_CR3_PCID_ASID_USER) could
+ * do it here, but can only be used if X86_FEATURE_INVPCID is
+ * available - and many machines support pcid without invpcid.
+ */
+ new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
+ kaiser_flush_tlb_on_return_to_user();
+ }
+#endif /* CONFIG_KAISER */
-out_set_cr3:
/*
* Caution: many callers of this function expect
* that load_cr3() is serializing and orders TLB
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sat, 4 Nov 2017 18:43:06 -0700
Subject: kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
From: Hugh Dickins <hughd(a)google.com>
Let kaiser_flush_tlb_on_return_to_user() do the X86_FEATURE_PCID
check, instead of each caller doing it inline first: nobody needs
to optimize for the noPCID case, it's clearer this way, and better
suits later changes. Replace those no-op X86_CR3_PCID_KERN_FLUSH lines
by a BUILD_BUG_ON() in load_new_mm_cr3(), in case something changes.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 4 ++--
arch/x86/mm/kaiser.c | 6 +++---
arch/x86/mm/tlb.c | 8 ++++----
3 files changed, 9 insertions(+), 9 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -158,7 +158,7 @@ static inline void __native_flush_tlb(vo
* back:
*/
preempt_disable();
- if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
+ if (kaiser_enabled)
kaiser_flush_tlb_on_return_to_user();
native_write_cr3(native_read_cr3());
preempt_enable();
@@ -217,7 +217,7 @@ static inline void __native_flush_tlb_si
*/
if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
- if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
+ if (kaiser_enabled)
kaiser_flush_tlb_on_return_to_user();
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
return;
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -435,12 +435,12 @@ void kaiser_setup_pcid(void)
/*
* Make a note that this cpu will need to flush USER tlb on return to user.
- * Caller checks whether this_cpu_has(X86_FEATURE_PCID) before calling:
- * if cpu does not, then the NOFLUSH bit will never have been set.
+ * If cpu does not have PCID, then the NOFLUSH bit will never have been set.
*/
void kaiser_flush_tlb_on_return_to_user(void)
{
- this_cpu_write(x86_cr3_pcid_user,
+ if (this_cpu_has(X86_FEATURE_PCID))
+ this_cpu_write(x86_cr3_pcid_user,
X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
}
EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -39,7 +39,7 @@ static void load_new_mm_cr3(pgd_t *pgdir
{
unsigned long new_mm_cr3 = __pa(pgdir);
- if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID)) {
+ if (kaiser_enabled) {
/*
* We reuse the same PCID for different tasks, so we must
* flush all the entries for the PCID out when we change tasks.
@@ -50,10 +50,10 @@ static void load_new_mm_cr3(pgd_t *pgdir
* do it here, but can only be used if X86_FEATURE_INVPCID is
* available - and many machines support pcid without invpcid.
*
- * The line below is a no-op: X86_CR3_PCID_KERN_FLUSH is now 0;
- * but keep that line in there in case something changes.
+ * If X86_CR3_PCID_KERN_FLUSH actually added something, then it
+ * would be needed in the write_cr3() below - if PCIDs enabled.
*/
- new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
+ BUILD_BUG_ON(X86_CR3_PCID_KERN_FLUSH);
kaiser_flush_tlb_on_return_to_user();
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: kaiser_remove_mapping() move along the pgd
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Mon, 2 Oct 2017 10:57:24 -0700
Subject: kaiser: kaiser_remove_mapping() move along the pgd
From: Hugh Dickins <hughd(a)google.com>
When removing the bogus comment from kaiser_remove_mapping(),
I really ought to have checked the extent of its bogosity: as
Neel points out, there is nothing to stop unmap_pud_range_nofree()
from continuing beyond the end of a pud (and starting in the wrong
position on the next).
Fix kaiser_remove_mapping() to constrain the extent and advance pgd
pointer correctly: use pgd_addr_end() macro as used throughout base
mm (but don't assume page-rounded start and size in this case).
But this bug was very unlikely to trigger in this backport: since
any buddy allocation is contained within a single pud extent, and
we are not using vmapped stacks (and are only mapping one page of
stack anyway): the only way to hit this bug here would be when
freeing a large modified ldt.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -319,11 +319,13 @@ void kaiser_remove_mapping(unsigned long
extern void unmap_pud_range_nofree(pgd_t *pgd,
unsigned long start, unsigned long end);
unsigned long end = start + size;
- unsigned long addr;
+ unsigned long addr, next;
+ pgd_t *pgd;
- for (addr = start; addr < end; addr += PGDIR_SIZE) {
- pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(addr));
- unmap_pud_range_nofree(pgd, addr, end);
+ pgd = native_get_shadow_pgd(pgd_offset_k(start));
+ for (addr = start; addr < end; pgd++, addr = next) {
+ next = pgd_addr_end(addr, end);
+ unmap_pud_range_nofree(pgd, addr, next);
}
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: fix unlikely error in alloc_ldt_struct()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Mon, 4 Dec 2017 20:13:35 -0800
Subject: kaiser: fix unlikely error in alloc_ldt_struct()
From: Hugh Dickins <hughd(a)google.com>
An error from kaiser_add_mapping() here is not at all likely, but
Eric Biggers rightly points out that __free_ldt_struct() relies on
new_ldt->size being initialized: move that up.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/ldt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -79,11 +79,11 @@ static struct ldt_struct *alloc_ldt_stru
ret = kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
__PAGE_KERNEL);
+ new_ldt->size = size;
if (ret) {
__free_ldt_struct(new_ldt);
return NULL;
}
- new_ldt->size = size;
return new_ldt;
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Thu, 21 Sep 2017 20:39:56 -0700
Subject: kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
From: Hugh Dickins <hughd(a)google.com>
pjt has observed that nmi's second (nmi_from_kernel) call to do_nmi()
adjusted the %rdi regs arg, rightly when CONFIG_KAISER, but wrongly
when not CONFIG_KAISER.
Although the minimal change is to add an #ifdef CONFIG_KAISER around
the addq line, that looks cluttered, and I prefer how the first call
to do_nmi() handled it: prepare args in %rdi and %rsi before getting
into the CONFIG_KAISER block, since it does not touch them at all.
And while we're here, place the "#ifdef CONFIG_KAISER" that follows
each, to enclose the "Unconditionally restore CR3" comment: matching
how the "Unconditionally use kernel CR3" comment above is enclosed.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1323,12 +1323,13 @@ ENTRY(nmi)
movq %rax, %cr3
#endif
call do_nmi
+
+#ifdef CONFIG_KAISER
/*
* Unconditionally restore CR3. I know we return to
* kernel code that needs user CR3, but do we ever return
* to "user mode" where we need the kernel CR3?
*/
-#ifdef CONFIG_KAISER
popq %rax
mov %rax, %cr3
#endif
@@ -1552,6 +1553,8 @@ end_repeat_nmi:
SWAPGS
xorl %ebx, %ebx
1:
+ movq %rsp, %rdi
+ movq $-1, %rsi
#ifdef CONFIG_KAISER
/* Unconditionally use kernel CR3 for do_nmi() */
/* %rax is saved above, so OK to clobber here */
@@ -1564,16 +1567,14 @@ end_repeat_nmi:
#endif
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
- movq %rsp, %rdi
- addq $8, %rdi /* point %rdi at ptregs, fixed up for CR3 */
- movq $-1, %rsi
call do_nmi
+
+#ifdef CONFIG_KAISER
/*
* Unconditionally restore CR3. We might be returning to
* kernel code that needs user CR3, like just just before
* a sysret.
*/
-#ifdef CONFIG_KAISER
popq %rax
mov %rax, %cr3
#endif
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: KAISER depends on SMP
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-kaiser-depends-on-smp.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Wed, 13 Sep 2017 14:03:10 -0700
Subject: kaiser: KAISER depends on SMP
From: Hugh Dickins <hughd(a)google.com>
It is absurd that KAISER should depend on SMP, but apparently nobody
has tried a UP build before: which breaks on implicit declaration of
function 'per_cpu_offset' in arch/x86/mm/kaiser.c.
Now, you would expect that to be trivially fixed up; but looking at
the System.map when that block is #ifdef'ed out of kaiser_init(),
I see that in a UP build __per_cpu_user_mapped_end is precisely at
__per_cpu_user_mapped_start, and the items carefully gathered into
that section for user-mapping on SMP, dispersed elsewhere on UP.
So, some other kind of section assignment will be needed on UP,
but implementing that is not a priority: just make KAISER depend
on SMP for now.
Also inserted a blank line before the option, tidied up the
brief Kconfig help message, and added an "If unsure, Y".
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
security/Kconfig | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -30,14 +30,16 @@ config SECURITY
model will be used.
If you are unsure how to answer this question, answer N.
+
config KAISER
bool "Remove the kernel mapping in user mode"
default y
- depends on X86_64
- depends on !PARAVIRT
+ depends on X86_64 && SMP && !PARAVIRT
help
- This enforces a strict kernel and user space isolation in order to close
- hardware side channels on kernel address information.
+ This enforces a strict kernel and user space isolation, in order
+ to close hardware side channels on kernel address information.
+
+ If you are unsure how to answer this question, answer Y.
config KAISER_REAL_SWITCH
bool "KAISER: actually switch page tables"
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: ENOMEM if kaiser_pagetable_walk() NULL
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-enomem-if-kaiser_pagetable_walk-null.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 18:48:02 -0700
Subject: kaiser: ENOMEM if kaiser_pagetable_walk() NULL
From: Hugh Dickins <hughd(a)google.com>
kaiser_add_user_map() took no notice when kaiser_pagetable_walk() failed.
And avoid its might_sleep() when atomic (though atomic at present unused).
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -98,11 +98,11 @@ static pte_t *kaiser_pagetable_walk(unsi
pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
- might_sleep();
if (is_atomic) {
gfp &= ~GFP_KERNEL;
gfp |= __GFP_HIGH | __GFP_ATOMIC;
- }
+ } else
+ might_sleep();
if (pgd_none(*pgd)) {
WARN_ONCE(1, "All shadow pgds should have been populated");
@@ -159,13 +159,17 @@ int kaiser_add_user_map(const void *__st
unsigned long end_addr = PAGE_ALIGN(start_addr + size);
unsigned long target_address;
- for (;address < end_addr; address += PAGE_SIZE) {
+ for (; address < end_addr; address += PAGE_SIZE) {
target_address = get_pa_from_mapping(address);
if (target_address == -1) {
ret = -EIO;
break;
}
pte = kaiser_pagetable_walk(address, false);
+ if (!pte) {
+ ret = -ENOMEM;
+ break;
+ }
if (pte_none(*pte)) {
set_pte(pte, __pte(flags | target_address));
} else {
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: enhanced by kernel and user PCIDs
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-enhanced-by-kernel-and-user-pcids.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Wed, 30 Aug 2017 16:23:00 -0700
Subject: kaiser: enhanced by kernel and user PCIDs
From: Hugh Dickins <hughd(a)google.com>
Merged performance improvements to Kaiser, using distinct kernel
and user Process Context Identifiers to minimize the TLB flushing.
[This work actually all from Dave Hansen 2017-08-30:
still omitting trackswitch mods, and KAISER_REAL_SWITCH deleted.]
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 10 ++++-
arch/x86/entry/entry_64_compat.S | 1
arch/x86/include/asm/cpufeatures.h | 1
arch/x86/include/asm/kaiser.h | 15 ++++++-
arch/x86/include/asm/pgtable_types.h | 26 +++++++++++++
arch/x86/include/asm/tlbflush.h | 54 +++++++++++++++++++++++-----
arch/x86/include/uapi/asm/processor-flags.h | 3 +
arch/x86/kernel/cpu/common.c | 34 +++++++++++++++++
arch/x86/kvm/x86.c | 3 +
arch/x86/mm/kaiser.c | 7 +++
arch/x86/mm/tlb.c | 46 ++++++++++++++++++++++-
11 files changed, 182 insertions(+), 18 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1317,7 +1317,10 @@ ENTRY(nmi)
/* %rax is saved above, so OK to clobber here */
movq %cr3, %rax
pushq %rax
- andq $(~KAISER_SHADOW_PGD_OFFSET), %rax
+ /* mask off "user" bit of pgd address and 12 PCID bits: */
+ andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+ /* Add back kernel PCID and "no flush" bit */
+ orq X86_CR3_PCID_KERN_VAR, %rax
movq %rax, %cr3
#endif
call do_nmi
@@ -1558,7 +1561,10 @@ end_repeat_nmi:
/* %rax is saved above, so OK to clobber here */
movq %cr3, %rax
pushq %rax
- andq $(~KAISER_SHADOW_PGD_OFFSET), %rax
+ /* mask off "user" bit of pgd address and 12 PCID bits: */
+ andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+ /* Add back kernel PCID and "no flush" bit */
+ orq X86_CR3_PCID_KERN_VAR, %rax
movq %rax, %cr3
#endif
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -13,6 +13,7 @@
#include <asm/irqflags.h>
#include <asm/asm.h>
#include <asm/smap.h>
+#include <asm/pgtable_types.h>
#include <asm/kaiser.h>
#include <linux/linkage.h>
#include <linux/err.h>
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -189,6 +189,7 @@
#define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */
#define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
+#define X86_FEATURE_INVPCID_SINGLE ( 7*32+ 4) /* Effectively INVPCID && CR4.PCIDE=1 */
#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -1,5 +1,8 @@
#ifndef _ASM_X86_KAISER_H
#define _ASM_X86_KAISER_H
+
+#include <uapi/asm/processor-flags.h> /* For PCID constants */
+
/*
* This file includes the definitions for the KAISER feature.
* KAISER is a counter measure against x86_64 side channel attacks on
@@ -21,13 +24,21 @@
.macro _SWITCH_TO_KERNEL_CR3 reg
movq %cr3, \reg
-andq $(~KAISER_SHADOW_PGD_OFFSET), \reg
+andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
+orq X86_CR3_PCID_KERN_VAR, \reg
movq \reg, %cr3
.endm
.macro _SWITCH_TO_USER_CR3 reg
movq %cr3, \reg
-orq $(KAISER_SHADOW_PGD_OFFSET), \reg
+andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
+/*
+ * This can obviously be one instruction by putting the
+ * KAISER_SHADOW_PGD_OFFSET bit in the X86_CR3_PCID_USER_VAR.
+ * But, just leave it now for simplicity.
+ */
+orq X86_CR3_PCID_USER_VAR, \reg
+orq $(KAISER_SHADOW_PGD_OFFSET), \reg
movq \reg, %cr3
.endm
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -141,6 +141,32 @@
_PAGE_SOFT_DIRTY)
#define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
+/* The ASID is the lower 12 bits of CR3 */
+#define X86_CR3_PCID_ASID_MASK (_AC((1<<12)-1,UL))
+
+/* Mask for all the PCID-related bits in CR3: */
+#define X86_CR3_PCID_MASK (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_MASK)
+#if defined(CONFIG_KAISER) && defined(CONFIG_X86_64)
+#define X86_CR3_PCID_ASID_KERN (_AC(0x4,UL))
+#define X86_CR3_PCID_ASID_USER (_AC(0x6,UL))
+
+#define X86_CR3_PCID_KERN_FLUSH (X86_CR3_PCID_ASID_KERN)
+#define X86_CR3_PCID_USER_FLUSH (X86_CR3_PCID_ASID_USER)
+#define X86_CR3_PCID_KERN_NOFLUSH (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_KERN)
+#define X86_CR3_PCID_USER_NOFLUSH (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_USER)
+#else
+#define X86_CR3_PCID_ASID_KERN (_AC(0x0,UL))
+#define X86_CR3_PCID_ASID_USER (_AC(0x0,UL))
+/*
+ * PCIDs are unsupported on 32-bit and none of these bits can be
+ * set in CR3:
+ */
+#define X86_CR3_PCID_KERN_FLUSH (0)
+#define X86_CR3_PCID_USER_FLUSH (0)
+#define X86_CR3_PCID_KERN_NOFLUSH (0)
+#define X86_CR3_PCID_USER_NOFLUSH (0)
+#endif
+
/*
* The cache modes defined here are used to translate between pure SW usage
* and the HW defined cache mode bits and/or PAT entries.
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -13,7 +13,6 @@ static inline void __invpcid(unsigned lo
unsigned long type)
{
struct { u64 d[2]; } desc = { { pcid, addr } };
-
/*
* The memory clobber is because the whole point is to invalidate
* stale TLB entries and, especially if we're flushing global
@@ -134,14 +133,25 @@ static inline void cr4_set_bits_and_upda
static inline void __native_flush_tlb(void)
{
+ if (!cpu_feature_enabled(X86_FEATURE_INVPCID)) {
+ /*
+ * If current->mm == NULL then we borrow a mm which may change during a
+ * task switch and therefore we must not be preempted while we write CR3
+ * back:
+ */
+ preempt_disable();
+ native_write_cr3(native_read_cr3());
+ preempt_enable();
+ return;
+ }
/*
- * If current->mm == NULL then we borrow a mm which may change during a
- * task switch and therefore we must not be preempted while we write CR3
- * back:
- */
- preempt_disable();
- native_write_cr3(native_read_cr3());
- preempt_enable();
+ * We are no longer using globals with KAISER, so a
+ * "nonglobals" flush would work too. But, this is more
+ * conservative.
+ *
+ * Note, this works with CR4.PCIDE=0 or 1.
+ */
+ invpcid_flush_all();
}
static inline void __native_flush_tlb_global_irq_disabled(void)
@@ -163,6 +173,8 @@ static inline void __native_flush_tlb_gl
/*
* Using INVPCID is considerably faster than a pair of writes
* to CR4 sandwiched inside an IRQ flag save/restore.
+ *
+ * Note, this works with CR4.PCIDE=0 or 1.
*/
invpcid_flush_all();
return;
@@ -182,7 +194,31 @@ static inline void __native_flush_tlb_gl
static inline void __native_flush_tlb_single(unsigned long addr)
{
- asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+ /*
+ * SIMICS #GP's if you run INVPCID with type 2/3
+ * and X86_CR4_PCIDE clear. Shame!
+ *
+ * The ASIDs used below are hard-coded. But, we must not
+ * call invpcid(type=1/2) before CR4.PCIDE=1. Just call
+ * invpcid in the case we are called early.
+ */
+ if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
+ asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
+ return;
+ }
+ /* Flush the address out of both PCIDs. */
+ /*
+ * An optimization here might be to determine addresses
+ * that are only kernel-mapped and only flush the kernel
+ * ASID. But, userspace flushes are probably much more
+ * important performance-wise.
+ *
+ * Make sure to do only a single invpcid when KAISER is
+ * disabled and we have only a single ASID.
+ */
+ if (X86_CR3_PCID_ASID_KERN != X86_CR3_PCID_ASID_USER)
+ invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
+ invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
}
static inline void __flush_tlb_all(void)
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -77,7 +77,8 @@
#define X86_CR3_PWT _BITUL(X86_CR3_PWT_BIT)
#define X86_CR3_PCD_BIT 4 /* Page Cache Disable */
#define X86_CR3_PCD _BITUL(X86_CR3_PCD_BIT)
-#define X86_CR3_PCID_MASK _AC(0x00000fff,UL) /* PCID Mask */
+#define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
+#define X86_CR3_PCID_NOFLUSH _BITULL(X86_CR3_PCID_NOFLUSH_BIT)
/*
* Intel CPU features in CR4
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -324,11 +324,45 @@ static __always_inline void setup_smap(s
}
}
+/*
+ * These can have bit 63 set, so we can not just use a plain "or"
+ * instruction to get their value or'd into CR3. It would take
+ * another register. So, we use a memory reference to these
+ * instead.
+ *
+ * This is also handy because systems that do not support
+ * PCIDs just end up or'ing a 0 into their CR3, which does
+ * no harm.
+ */
+__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR = 0;
+__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_USER_VAR = 0;
+
static void setup_pcid(struct cpuinfo_x86 *c)
{
if (cpu_has(c, X86_FEATURE_PCID)) {
if (cpu_has(c, X86_FEATURE_PGE)) {
cr4_set_bits(X86_CR4_PCIDE);
+ /*
+ * These variables are used by the entry/exit
+ * code to change PCIDs.
+ */
+#ifdef CONFIG_KAISER
+ X86_CR3_PCID_KERN_VAR = X86_CR3_PCID_KERN_NOFLUSH;
+ X86_CR3_PCID_USER_VAR = X86_CR3_PCID_USER_NOFLUSH;
+#endif
+ /*
+ * INVPCID has two "groups" of types:
+ * 1/2: Invalidate an individual address
+ * 3/4: Invalidate all contexts
+ *
+ * 1/2 take a PCID, but 3/4 do not. So, 3/4
+ * ignore the PCID argument in the descriptor.
+ * But, we have to be careful not to call 1/2
+ * with an actual non-zero PCID in them before
+ * we do the above cr4_set_bits().
+ */
+ if (cpu_has(c, X86_FEATURE_INVPCID))
+ set_cpu_cap(c, X86_FEATURE_INVPCID_SINGLE);
} else {
/*
* flush_tlb_all(), as currently implemented, won't
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -773,7 +773,8 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, u
return 1;
/* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
- if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK) || !is_long_mode(vcpu))
+ if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_ASID_MASK) ||
+ !is_long_mode(vcpu))
return 1;
}
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -239,6 +239,8 @@ static void __init kaiser_init_all_pgds(
} while (0)
extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+extern unsigned long X86_CR3_PCID_KERN_VAR;
+extern unsigned long X86_CR3_PCID_USER_VAR;
/*
* If anything in here fails, we will likely die on one of the
* first kernel->user transitions and init will die. But, we
@@ -289,6 +291,11 @@ void __init kaiser_init(void)
kaiser_add_user_map_early(&debug_idt_table,
sizeof(gate_desc) * NR_VECTORS,
__PAGE_KERNEL);
+
+ kaiser_add_user_map_early(&X86_CR3_PCID_KERN_VAR, PAGE_SIZE,
+ __PAGE_KERNEL);
+ kaiser_add_user_map_early(&X86_CR3_PCID_USER_VAR, PAGE_SIZE,
+ __PAGE_KERNEL);
}
/* Add a mapping to the shadow mapping, and synchronize the mappings */
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -34,6 +34,46 @@ struct flush_tlb_info {
unsigned long flush_end;
};
+static void load_new_mm_cr3(pgd_t *pgdir)
+{
+ unsigned long new_mm_cr3 = __pa(pgdir);
+
+ /*
+ * KAISER, plus PCIDs needs some extra work here. But,
+ * if either of features is not present, we need no
+ * PCIDs here and just do a normal, full TLB flush with
+ * the write_cr3()
+ */
+ if (!IS_ENABLED(CONFIG_KAISER) ||
+ !cpu_feature_enabled(X86_FEATURE_PCID))
+ goto out_set_cr3;
+ /*
+ * We reuse the same PCID for different tasks, so we must
+ * flush all the entires for the PCID out when we change
+ * tasks.
+ */
+ new_mm_cr3 = X86_CR3_PCID_KERN_FLUSH | __pa(pgdir);
+
+ /*
+ * The flush from load_cr3() may leave old TLB entries
+ * for userspace in place. We must flush that context
+ * separately. We can theoretically delay doing this
+ * until we actually load up the userspace CR3, but
+ * that's a bit tricky. We have to have the "need to
+ * flush userspace PCID" bit per-cpu and check it in the
+ * exit-to-userspace paths.
+ */
+ invpcid_flush_single_context(X86_CR3_PCID_ASID_USER);
+
+out_set_cr3:
+ /*
+ * Caution: many callers of this function expect
+ * that load_cr3() is serializing and orders TLB
+ * fills with respect to the mm_cpumask writes.
+ */
+ write_cr3(new_mm_cr3);
+}
+
/*
* We cannot call mmdrop() because we are in interrupt context,
* instead update mm->cpu_vm_mask.
@@ -45,7 +85,7 @@ void leave_mm(int cpu)
BUG();
if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
- load_cr3(swapper_pg_dir);
+ load_new_mm_cr3(swapper_pg_dir);
/*
* This gets called in the idle path where RCU
* functions differently. Tracing normally
@@ -120,7 +160,7 @@ void switch_mm_irqs_off(struct mm_struct
* ordering guarantee we need.
*
*/
- load_cr3(next->pgd);
+ load_new_mm_cr3(next->pgd);
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
@@ -167,7 +207,7 @@ void switch_mm_irqs_off(struct mm_struct
* As above, load_cr3() is serializing and orders TLB
* fills with respect to the mm_cpumask write.
*/
- load_cr3(next->pgd);
+ load_new_mm_cr3(next->pgd);
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
load_mm_cr4(next);
load_mm_ldt(next);
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: drop is_atomic arg to kaiser_pagetable_walk()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 29 Oct 2017 11:36:19 -0700
Subject: kaiser: drop is_atomic arg to kaiser_pagetable_walk()
From: Hugh Dickins <hughd(a)google.com>
I have not observed a might_sleep() warning from setup_fixmap_gdt()'s
use of kaiser_add_mapping() in our tree (why not?), but like upstream
we have not provided a way for that to pass is_atomic true down to
kaiser_pagetable_walk(), and at startup it's far from a likely source
of trouble: so just delete the walk's is_atomic arg and might_sleep().
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -107,19 +107,13 @@ static inline unsigned long get_pa_from_
*
* Returns a pointer to a PTE on success, or NULL on failure.
*/
-static pte_t *kaiser_pagetable_walk(unsigned long address, bool is_atomic)
+static pte_t *kaiser_pagetable_walk(unsigned long address)
{
pmd_t *pmd;
pud_t *pud;
pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
- if (is_atomic) {
- gfp &= ~GFP_KERNEL;
- gfp |= __GFP_HIGH | __GFP_ATOMIC;
- } else
- might_sleep();
-
if (pgd_none(*pgd)) {
WARN_ONCE(1, "All shadow pgds should have been populated");
return NULL;
@@ -194,7 +188,7 @@ static int kaiser_add_user_map(const voi
ret = -EIO;
break;
}
- pte = kaiser_pagetable_walk(address, false);
+ pte = kaiser_pagetable_walk(address);
if (!pte) {
ret = -ENOMEM;
break;
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: fix build and FIXME in alloc_ldt_struct()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 17:09:44 -0700
Subject: kaiser: fix build and FIXME in alloc_ldt_struct()
From: Hugh Dickins <hughd(a)google.com>
Include linux/kaiser.h instead of asm/kaiser.h to build ldt.c without
CONFIG_KAISER. kaiser_add_mapping() does already return an error code,
so fix the FIXME.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/ldt.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -16,9 +16,9 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/uaccess.h>
+#include <linux/kaiser.h>
#include <asm/ldt.h>
-#include <asm/kaiser.h>
#include <asm/desc.h>
#include <asm/mmu_context.h>
#include <asm/syscalls.h>
@@ -49,7 +49,7 @@ static struct ldt_struct *alloc_ldt_stru
{
struct ldt_struct *new_ldt;
int alloc_size;
- int ret = 0;
+ int ret;
if (size > LDT_ENTRIES)
return NULL;
@@ -77,10 +77,8 @@ static struct ldt_struct *alloc_ldt_stru
return NULL;
}
- // FIXME: make kaiser_add_mapping() return an error code
- // when it fails
- kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
- __PAGE_KERNEL);
+ ret = kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
+ __PAGE_KERNEL);
if (ret) {
__free_ldt_struct(new_ldt);
return NULL;
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: do not set _PAGE_NX on pgd_none
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-do-not-set-_page_nx-on-pgd_none.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Tue, 5 Sep 2017 12:05:01 -0700
Subject: kaiser: do not set _PAGE_NX on pgd_none
From: Hugh Dickins <hughd(a)google.com>
native_pgd_clear() uses native_set_pgd(), so native_set_pgd() must
avoid setting the _PAGE_NX bit on an otherwise pgd_none() entry:
usually that just generated a warning on exit, but sometimes
more mysterious and damaging failures (our production machines
could not complete booting).
The original fix to this just avoided adding _PAGE_NX to
an empty entry; but eventually more problems surfaced with kexec,
and EFI mapping expected to be a problem too. So now instead
change native_set_pgd() to update shadow only if _PAGE_USER:
A few places (kernel/machine_kexec_64.c, platform/efi/efi_64.c for sure)
use set_pgd() to set up a temporary internal virtual address space, with
physical pages remapped at what Kaiser regards as userspace addresses:
Kaiser then assumes a shadow pgd follows, which it will try to corrupt.
This appears to be responsible for the recent kexec and kdump failures;
though it's unclear how those did not manifest as a problem before.
Ah, the shadow pgd will only be assumed to "follow" if the requested
pgd is on an even-numbered page: so I suppose it was going wrong 50%
of the time all along.
What we need is a flag to set_pgd(), to tell it we're dealing with
userspace. Er, isn't that what the pgd's _PAGE_USER bit is saying?
Add a test for that. But we cannot do the same for pgd_clear()
(which may be called to clear corrupted entries - set aside the
question of "corrupt in which pgd?" until later), so there just
rely on pgd_clear() not being called in the problematic cases -
with a WARN_ON_ONCE() which should fire half the time if it is.
But this is getting too big for an inline function: move it into
arch/x86/mm/kaiser.c (which then demands a boot/compressed mod);
and de-void and de-space native_get_shadow/normal_pgd() while here.
Also make an unnecessary change to KASLR's init_trampoline(): it was
using set_pgd() to assign a pgd-value to a global variable (not in a
pg directory page), which was rather scary given Kaiser's previous
set_pgd() implementation: not a problem now, but too scary to leave
as was, it could easily blow up if we have to change set_pgd() again.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/boot/compressed/misc.h | 1
arch/x86/include/asm/pgtable_64.h | 51 +++++++++-----------------------------
arch/x86/mm/kaiser.c | 42 +++++++++++++++++++++++++++++++
arch/x86/mm/kaslr.c | 4 +-
4 files changed, 58 insertions(+), 40 deletions(-)
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -9,6 +9,7 @@
*/
#undef CONFIG_PARAVIRT
#undef CONFIG_PARAVIRT_SPINLOCKS
+#undef CONFIG_KAISER
#undef CONFIG_KASAN
#include <linux/linkage.h>
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -107,61 +107,36 @@ static inline void native_pud_clear(pud_
}
#ifdef CONFIG_KAISER
-static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp)
+extern pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd);
+
+static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
{
- return (pgd_t *)(void*)((unsigned long)(void*)pgdp | (unsigned long)PAGE_SIZE);
+ return (pgd_t *)((unsigned long)pgdp | (unsigned long)PAGE_SIZE);
}
-static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp)
+static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
{
- return (pgd_t *)(void*)((unsigned long)(void*)pgdp & ~(unsigned long)PAGE_SIZE);
+ return (pgd_t *)((unsigned long)pgdp & ~(unsigned long)PAGE_SIZE);
}
#else
-static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp)
+static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ return pgd;
+}
+static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
{
BUILD_BUG_ON(1);
return NULL;
}
-static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp)
+static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
{
return pgdp;
}
#endif /* CONFIG_KAISER */
-/*
- * Page table pages are page-aligned. The lower half of the top
- * level is used for userspace and the top half for the kernel.
- * This returns true for user pages that need to get copied into
- * both the user and kernel copies of the page tables, and false
- * for kernel pages that should only be in the kernel copy.
- */
-static inline bool is_userspace_pgd(void *__ptr)
-{
- unsigned long ptr = (unsigned long)__ptr;
-
- return ((ptr % PAGE_SIZE) < (PAGE_SIZE / 2));
-}
-
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
-#ifdef CONFIG_KAISER
- pteval_t extra_kern_pgd_flags = 0;
- /* Do we need to also populate the shadow pgd? */
- if (is_userspace_pgd(pgdp)) {
- native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
- /*
- * Even if the entry is *mapping* userspace, ensure
- * that userspace can not use it. This way, if we
- * get out to userspace running on the kernel CR3,
- * userspace will crash instead of running.
- */
- extra_kern_pgd_flags = _PAGE_NX;
- }
- pgdp->pgd = pgd.pgd;
- pgdp->pgd |= extra_kern_pgd_flags;
-#else /* CONFIG_KAISER */
- *pgdp = pgd;
-#endif
+ *pgdp = kaiser_set_shadow_pgd(pgdp, pgd);
}
static inline void native_pgd_clear(pgd_t *pgd)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -302,4 +302,46 @@ void kaiser_remove_mapping(unsigned long
unmap_pud_range_nofree(pgd, addr, end);
}
}
+
+/*
+ * Page table pages are page-aligned. The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ * This returns true for user pages that need to get copied into
+ * both the user and kernel copies of the page tables, and false
+ * for kernel pages that should only be in the kernel copy.
+ */
+static inline bool is_userspace_pgd(pgd_t *pgdp)
+{
+ return ((unsigned long)pgdp % PAGE_SIZE) < (PAGE_SIZE / 2);
+}
+
+pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ /*
+ * Do we need to also populate the shadow pgd? Check _PAGE_USER to
+ * skip cases like kexec and EFI which make temporary low mappings.
+ */
+ if (pgd.pgd & _PAGE_USER) {
+ if (is_userspace_pgd(pgdp)) {
+ native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+ /*
+ * Even if the entry is *mapping* userspace, ensure
+ * that userspace can not use it. This way, if we
+ * get out to userspace running on the kernel CR3,
+ * userspace will crash instead of running.
+ */
+ pgd.pgd |= _PAGE_NX;
+ }
+ } else if (!pgd.pgd) {
+ /*
+ * pgd_clear() cannot check _PAGE_USER, and is even used to
+ * clear corrupted pgd entries: so just rely on cases like
+ * kexec and EFI never to be using pgd_clear().
+ */
+ if (!WARN_ON_ONCE((unsigned long)pgdp & PAGE_SIZE) &&
+ is_userspace_pgd(pgdp))
+ native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+ }
+ return pgd;
+}
#endif /* CONFIG_KAISER */
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -189,6 +189,6 @@ void __meminit init_trampoline(void)
*pud_tramp = *pud;
}
- set_pgd(&trampoline_pgd_entry,
- __pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
+ /* Avoid set_pgd(), in case it's complicated by CONFIG_KAISER */
+ trampoline_pgd_entry = __pgd(_KERNPG_TABLE | __pa(pud_page_tramp));
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: disabled on Xen PV
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-disabled-on-xen-pv.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Jiri Kosina <jkosina(a)suse.cz>
Date: Tue, 2 Jan 2018 14:19:49 +0100
Subject: kaiser: disabled on Xen PV
From: Jiri Kosina <jkosina(a)suse.cz>
Kaiser cannot be used on paravirtualized MMUs (namely reading and writing CR3).
This does not work with KAISER as the CR3 switch from and to user space PGD
would require to map the whole XEN_PV machinery into both.
More importantly, enabling KAISER on Xen PV doesn't make too much sense, as PV
guests use distinct %cr3 values for kernel and user already.
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -263,6 +263,9 @@ void __init kaiser_check_boottime_disabl
char arg[5];
int ret;
+ if (boot_cpu_has(X86_FEATURE_XENPV))
+ goto silent_disable;
+
ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
if (ret > 0) {
if (!strncmp(arg, "on", 2))
@@ -290,6 +293,8 @@ enable:
disable:
pr_info("Kernel/User page tables isolation: disabled\n");
+
+silent_disable:
kaiser_enabled = 0;
setup_clear_cpu_cap(X86_FEATURE_KAISER);
}
Patches currently in stable-queue which might be from jkosina(a)suse.cz are
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-disabled-on-xen-pv.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
This is a note to let you know that I've just added the patch titled
kaiser: asm/tlbflush.h handle noPGE at lower level
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sat, 4 Nov 2017 18:23:24 -0700
Subject: kaiser: asm/tlbflush.h handle noPGE at lower level
From: Hugh Dickins <hughd(a)google.com>
I found asm/tlbflush.h too twisty, and think it safer not to avoid
__native_flush_tlb_global_irq_disabled() in the kaiser_enabled case,
but instead let it handle kaiser_enabled along with cr3: it can just
use __native_flush_tlb() for that, no harm in re-disabling preemption.
(This is not the same change as Kirill and Dave have suggested for
upstream, flipping PGE in cr4: that's neat, but needs a cpu_has_pge
check; cr3 is enough for kaiser, and thought to be cheaper than cr4.)
Also delete the X86_FEATURE_INVPCID invpcid_flush_all_nonglobals()
preference from __native_flush_tlb(): unlike the invpcid_flush_all()
preference in __native_flush_tlb_global(), it's not seen in upstream
4.14, and was recently reported to be surprisingly slow.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 27 +++------------------------
1 file changed, 3 insertions(+), 24 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -152,14 +152,6 @@ static inline void kaiser_flush_tlb_on_r
static inline void __native_flush_tlb(void)
{
- if (this_cpu_has(X86_FEATURE_INVPCID)) {
- /*
- * Note, this works with CR4.PCIDE=0 or 1.
- */
- invpcid_flush_all_nonglobals();
- return;
- }
-
/*
* If current->mm == NULL then we borrow a mm which may change during a
* task switch and therefore we must not be preempted while we write CR3
@@ -183,11 +175,8 @@ static inline void __native_flush_tlb_gl
/* restore PGE as it was before */
native_write_cr4(cr4);
} else {
- /*
- * x86_64 microcode update comes this way when CR4.PGE is not
- * enabled, and it's safer for all callers to allow this case.
- */
- native_write_cr3(native_read_cr3());
+ /* do it with cr3, letting kaiser flush user PCID */
+ __native_flush_tlb();
}
}
@@ -195,12 +184,6 @@ static inline void __native_flush_tlb_gl
{
unsigned long flags;
- if (kaiser_enabled) {
- /* Globals are not used at all */
- __native_flush_tlb();
- return;
- }
-
if (this_cpu_has(X86_FEATURE_INVPCID)) {
/*
* Using INVPCID is considerably faster than a pair of writes
@@ -256,11 +239,7 @@ static inline void __native_flush_tlb_si
static inline void __flush_tlb_all(void)
{
- if (boot_cpu_has(X86_FEATURE_PGE))
- __flush_tlb_global();
- else
- __flush_tlb();
-
+ __flush_tlb_global();
/*
* Note: if we somehow had PCID but not PGE, then this wouldn't work --
* we'd end up flushing kernel translations for the current ASID but
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: delete KAISER_REAL_SWITCH option
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-delete-kaiser_real_switch-option.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 18:30:43 -0700
Subject: kaiser: delete KAISER_REAL_SWITCH option
From: Hugh Dickins <hughd(a)google.com>
We fail to see what CONFIG_KAISER_REAL_SWITCH is for: it seems to be
left over from early development, and now just obscures tricky parts
of the code. Delete it before adding PCIDs, or nokaiser boot option.
(Or if there is some good reason to keep the option, then it needs
a help text - and a "depends on KAISER", so that all those without
KAISER are not asked the question. But we'd much rather delete it.)
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 4 ----
arch/x86/include/asm/kaiser.h | 4 ----
security/Kconfig | 4 ----
3 files changed, 12 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1317,9 +1317,7 @@ ENTRY(nmi)
/* %rax is saved above, so OK to clobber here */
movq %cr3, %rax
pushq %rax
-#ifdef CONFIG_KAISER_REAL_SWITCH
andq $(~KAISER_SHADOW_PGD_OFFSET), %rax
-#endif
movq %rax, %cr3
#endif
call do_nmi
@@ -1560,9 +1558,7 @@ end_repeat_nmi:
/* %rax is saved above, so OK to clobber here */
movq %cr3, %rax
pushq %rax
-#ifdef CONFIG_KAISER_REAL_SWITCH
andq $(~KAISER_SHADOW_PGD_OFFSET), %rax
-#endif
movq %rax, %cr3
#endif
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -21,17 +21,13 @@
.macro _SWITCH_TO_KERNEL_CR3 reg
movq %cr3, \reg
-#ifdef CONFIG_KAISER_REAL_SWITCH
andq $(~KAISER_SHADOW_PGD_OFFSET), \reg
-#endif
movq \reg, %cr3
.endm
.macro _SWITCH_TO_USER_CR3 reg
movq %cr3, \reg
-#ifdef CONFIG_KAISER_REAL_SWITCH
orq $(KAISER_SHADOW_PGD_OFFSET), \reg
-#endif
movq \reg, %cr3
.endm
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -41,10 +41,6 @@ config KAISER
If you are unsure how to answer this question, answer Y.
-config KAISER_REAL_SWITCH
- bool "KAISER: actually switch page tables"
- default y
-
config SECURITYFS
bool "Enable the securityfs filesystem"
help
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: align addition to x86/mm/Makefile
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-align-addition-to-x86-mm-makefile.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 19:51:10 -0700
Subject: kaiser: align addition to x86/mm/Makefile
From: Hugh Dickins <hughd(a)google.com>
Use tab not space so they line up properly, kaslr.o also.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -37,5 +37,5 @@ obj-$(CONFIG_NUMA_EMU) += numa_emulatio
obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
-obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
-obj-$(CONFIG_KAISER) += kaiser.o
+obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+obj-$(CONFIG_KAISER) += kaiser.o
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: add "nokaiser" boot option, using ALTERNATIVE
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-add-nokaiser-boot-option-using-alternative.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 20:37:21 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 24 Sep 2017 16:59:49 -0700
Subject: kaiser: add "nokaiser" boot option, using ALTERNATIVE
From: Hugh Dickins <hughd(a)google.com>
Added "nokaiser" boot option: an early param like "noinvpcid".
Most places now check int kaiser_enabled (#defined 0 when not
CONFIG_KAISER) instead of #ifdef CONFIG_KAISER; but entry_64.S
and entry_64_compat.S are using the ALTERNATIVE technique, which
patches in the preferred instructions at runtime. That technique
is tied to x86 cpu features, so X86_FEATURE_KAISER is fabricated.
Prior to "nokaiser", Kaiser #defined _PAGE_GLOBAL 0: revert that,
but be careful with both _PAGE_GLOBAL and CR4.PGE: setting them when
nokaiser like when !CONFIG_KAISER, but not setting either when kaiser -
neither matters on its own, but it's hard to be sure that _PAGE_GLOBAL
won't get set in some obscure corner, or something add PGE into CR4.
By omitting _PAGE_GLOBAL from __supported_pte_mask when kaiser_enabled,
all page table setup which uses pte_pfn() masks it out of the ptes.
It's slightly shameful that the same declaration versus definition of
kaiser_enabled appears in not one, not two, but in three header files
(asm/kaiser.h, asm/pgtable.h, asm/tlbflush.h). I felt safer that way,
than with #including any of those in any of the others; and did not
feel it worth an asm/kaiser_enabled.h - kernel/cpu/common.c includes
them all, so we shall hear about it if they get out of synch.
Cleanups while in the area: removed the silly #ifdef CONFIG_KAISER
from kaiser.c; removed the unused native_get_normal_pgd(); removed
the spurious reg clutter from SWITCH_*_CR3 macro stubs; corrected some
comments. But more interestingly, set CR4.PSE in secondary_startup_64:
the manual is clear that it does not matter whether it's 0 or 1 when
4-level-pts are enabled, but I was distracted to find cr4 different on
BSP and auxiliaries - BSP alone was adding PSE, in probe_page_size_mask().
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/kernel-parameters.txt | 2 +
arch/x86/entry/entry_64.S | 15 ++++++-----
arch/x86/include/asm/cpufeatures.h | 3 ++
arch/x86/include/asm/kaiser.h | 27 +++++++++++++++------
arch/x86/include/asm/pgtable.h | 20 +++++++++++----
arch/x86/include/asm/pgtable_64.h | 13 +++-------
arch/x86/include/asm/pgtable_types.h | 4 ---
arch/x86/include/asm/tlbflush.h | 39 +++++++++++++++++++------------
arch/x86/kernel/cpu/common.c | 28 +++++++++++++++++++++-
arch/x86/kernel/espfix_64.c | 3 +-
arch/x86/kernel/head_64.S | 4 +--
arch/x86/mm/init.c | 2 -
arch/x86/mm/init_64.c | 10 +++++++
arch/x86/mm/kaiser.c | 26 +++++++++++++++++---
arch/x86/mm/pgtable.c | 8 +-----
arch/x86/mm/tlb.c | 4 ---
tools/arch/x86/include/asm/cpufeatures.h | 3 ++
17 files changed, 146 insertions(+), 65 deletions(-)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2763,6 +2763,8 @@ bytes respectively. Such letter suffixes
nojitter [IA-64] Disables jitter checking for ITC timers.
+ nokaiser [X86-64] Disable KAISER isolation of kernel from user.
+
no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1079,7 +1079,7 @@ ENTRY(paranoid_entry)
* unconditionally, but we need to find out whether the reverse
* should be done on return (conveyed to paranoid_exit in %ebx).
*/
- movq %cr3, %rax
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
testl $KAISER_SHADOW_PGD_OFFSET, %eax
jz 2f
orl $2, %ebx
@@ -1111,6 +1111,7 @@ ENTRY(paranoid_exit)
TRACE_IRQS_OFF_DEBUG
TRACE_IRQS_IRETQ_DEBUG
#ifdef CONFIG_KAISER
+ /* No ALTERNATIVE for X86_FEATURE_KAISER: paranoid_entry sets %ebx */
testl $2, %ebx /* SWITCH_USER_CR3 needed? */
jz paranoid_exit_no_switch
SWITCH_USER_CR3
@@ -1341,13 +1342,14 @@ ENTRY(nmi)
#ifdef CONFIG_KAISER
/* Unconditionally use kernel CR3 for do_nmi() */
/* %rax is saved above, so OK to clobber here */
- movq %cr3, %rax
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
orq x86_cr3_pcid_noflush, %rax
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
movq %rax, %cr3
+2:
#endif
call do_nmi
@@ -1357,8 +1359,7 @@ ENTRY(nmi)
* kernel code that needs user CR3, but do we ever return
* to "user mode" where we need the kernel CR3?
*/
- popq %rax
- mov %rax, %cr3
+ ALTERNATIVE "", "popq %rax; movq %rax, %cr3", X86_FEATURE_KAISER
#endif
/*
@@ -1585,13 +1586,14 @@ end_repeat_nmi:
#ifdef CONFIG_KAISER
/* Unconditionally use kernel CR3 for do_nmi() */
/* %rax is saved above, so OK to clobber here */
- movq %cr3, %rax
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
orq x86_cr3_pcid_noflush, %rax
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
movq %rax, %cr3
+2:
#endif
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
@@ -1603,8 +1605,7 @@ end_repeat_nmi:
* kernel code that needs user CR3, like just just before
* a sysret.
*/
- popq %rax
- mov %rax, %cr3
+ ALTERNATIVE "", "popq %rax; movq %rax, %cr3", X86_FEATURE_KAISER
#endif
testl %ebx, %ebx /* swapgs needed? */
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -198,6 +198,9 @@
#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
+/* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
+#define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_KAISER w/o nokaiser */
+
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -46,28 +46,33 @@ movq \reg, %cr3
.endm
.macro SWITCH_KERNEL_CR3
-pushq %rax
+ALTERNATIVE "jmp 8f", "pushq %rax", X86_FEATURE_KAISER
_SWITCH_TO_KERNEL_CR3 %rax
popq %rax
+8:
.endm
.macro SWITCH_USER_CR3
-pushq %rax
+ALTERNATIVE "jmp 8f", "pushq %rax", X86_FEATURE_KAISER
_SWITCH_TO_USER_CR3 %rax %al
popq %rax
+8:
.endm
.macro SWITCH_KERNEL_CR3_NO_STACK
-movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
+ALTERNATIVE "jmp 8f", \
+ __stringify(movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)), \
+ X86_FEATURE_KAISER
_SWITCH_TO_KERNEL_CR3 %rax
movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
+8:
.endm
#else /* CONFIG_KAISER */
-.macro SWITCH_KERNEL_CR3 reg
+.macro SWITCH_KERNEL_CR3
.endm
-.macro SWITCH_USER_CR3 reg regb
+.macro SWITCH_USER_CR3
.endm
.macro SWITCH_KERNEL_CR3_NO_STACK
.endm
@@ -90,6 +95,16 @@ DECLARE_PER_CPU(unsigned long, x86_cr3_p
extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+extern int kaiser_enabled;
+#else
+#define kaiser_enabled 0
+#endif /* CONFIG_KAISER */
+
+/*
+ * Kaiser function prototypes are needed even when CONFIG_KAISER is not set,
+ * so as to build with tests on kaiser_enabled instead of #ifdefs.
+ */
+
/**
* kaiser_add_mapping - map a virtual memory part to the shadow (user) mapping
* @addr: the start address of the range
@@ -119,8 +134,6 @@ extern void kaiser_remove_mapping(unsign
*/
extern void kaiser_init(void);
-#endif /* CONFIG_KAISER */
-
#endif /* __ASSEMBLY */
#endif /* _ASM_X86_KAISER_H */
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -18,6 +18,12 @@
#ifndef __ASSEMBLY__
#include <asm/x86_init.h>
+#ifdef CONFIG_KAISER
+extern int kaiser_enabled;
+#else
+#define kaiser_enabled 0
+#endif
+
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
void ptdump_walk_pgd_level_checkwx(void);
@@ -697,7 +703,7 @@ static inline int pgd_bad(pgd_t pgd)
* page table by accident; it will fault on the first
* instruction it tries to run. See native_set_pgd().
*/
- if (IS_ENABLED(CONFIG_KAISER))
+ if (kaiser_enabled)
ignore_flags |= _PAGE_NX;
return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE;
@@ -913,12 +919,14 @@ static inline void pmdp_set_wrprotect(st
*/
static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
- memcpy(dst, src, count * sizeof(pgd_t));
+ memcpy(dst, src, count * sizeof(pgd_t));
#ifdef CONFIG_KAISER
- /* Clone the shadow pgd part as well */
- memcpy(native_get_shadow_pgd(dst),
- native_get_shadow_pgd(src),
- count * sizeof(pgd_t));
+ if (kaiser_enabled) {
+ /* Clone the shadow pgd part as well */
+ memcpy(native_get_shadow_pgd(dst),
+ native_get_shadow_pgd(src),
+ count * sizeof(pgd_t));
+ }
#endif
}
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -111,13 +111,12 @@ extern pgd_t kaiser_set_shadow_pgd(pgd_t
static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
{
+#ifdef CONFIG_DEBUG_VM
+ /* linux/mmdebug.h may not have been included at this point */
+ BUG_ON(!kaiser_enabled);
+#endif
return (pgd_t *)((unsigned long)pgdp | (unsigned long)PAGE_SIZE);
}
-
-static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
-{
- return (pgd_t *)((unsigned long)pgdp & ~(unsigned long)PAGE_SIZE);
-}
#else
static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
{
@@ -128,10 +127,6 @@ static inline pgd_t *native_get_shadow_p
BUILD_BUG_ON(1);
return NULL;
}
-static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
-{
- return pgdp;
-}
#endif /* CONFIG_KAISER */
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -45,11 +45,7 @@
#define _PAGE_ACCESSED (_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED)
#define _PAGE_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY)
#define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE)
-#ifdef CONFIG_KAISER
-#define _PAGE_GLOBAL (_AT(pteval_t, 0))
-#else
#define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
-#endif
#define _PAGE_SOFTW1 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
#define _PAGE_SOFTW2 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
#define _PAGE_PAT (_AT(pteval_t, 1) << _PAGE_BIT_PAT)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -137,9 +137,11 @@ static inline void cr4_set_bits_and_upda
* to avoid the need for asm/kaiser.h in unexpected places.
*/
#ifdef CONFIG_KAISER
+extern int kaiser_enabled;
extern void kaiser_setup_pcid(void);
extern void kaiser_flush_tlb_on_return_to_user(void);
#else
+#define kaiser_enabled 0
static inline void kaiser_setup_pcid(void)
{
}
@@ -164,7 +166,7 @@ static inline void __native_flush_tlb(vo
* back:
*/
preempt_disable();
- if (this_cpu_has(X86_FEATURE_PCID))
+ if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
kaiser_flush_tlb_on_return_to_user();
native_write_cr3(native_read_cr3());
preempt_enable();
@@ -175,20 +177,30 @@ static inline void __native_flush_tlb_gl
unsigned long cr4;
cr4 = this_cpu_read(cpu_tlbstate.cr4);
- /* clear PGE */
- native_write_cr4(cr4 & ~X86_CR4_PGE);
- /* write old PGE again and flush TLBs */
- native_write_cr4(cr4);
+ if (cr4 & X86_CR4_PGE) {
+ /* clear PGE and flush TLB of all entries */
+ native_write_cr4(cr4 & ~X86_CR4_PGE);
+ /* restore PGE as it was before */
+ native_write_cr4(cr4);
+ } else {
+ /*
+ * x86_64 microcode update comes this way when CR4.PGE is not
+ * enabled, and it's safer for all callers to allow this case.
+ */
+ native_write_cr3(native_read_cr3());
+ }
}
static inline void __native_flush_tlb_global(void)
{
-#ifdef CONFIG_KAISER
- /* Globals are not used at all */
- __native_flush_tlb();
-#else
unsigned long flags;
+ if (kaiser_enabled) {
+ /* Globals are not used at all */
+ __native_flush_tlb();
+ return;
+ }
+
if (this_cpu_has(X86_FEATURE_INVPCID)) {
/*
* Using INVPCID is considerably faster than a pair of writes
@@ -208,7 +220,6 @@ static inline void __native_flush_tlb_gl
raw_local_irq_save(flags);
__native_flush_tlb_global_irq_disabled();
raw_local_irq_restore(flags);
-#endif
}
static inline void __native_flush_tlb_single(unsigned long addr)
@@ -223,7 +234,7 @@ static inline void __native_flush_tlb_si
*/
if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
- if (this_cpu_has(X86_FEATURE_PCID))
+ if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
kaiser_flush_tlb_on_return_to_user();
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
return;
@@ -238,9 +249,9 @@ static inline void __native_flush_tlb_si
* Make sure to do only a single invpcid when KAISER is
* disabled and we have only a single ASID.
*/
- if (X86_CR3_PCID_ASID_KERN != X86_CR3_PCID_ASID_USER)
- invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
- invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+ if (kaiser_enabled)
+ invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+ invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
}
static inline void __flush_tlb_all(void)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -179,6 +179,20 @@ static int __init x86_pcid_setup(char *s
return 1;
}
__setup("nopcid", x86_pcid_setup);
+
+static int __init x86_nokaiser_setup(char *s)
+{
+ /* nokaiser doesn't accept parameters */
+ if (s)
+ return -EINVAL;
+#ifdef CONFIG_KAISER
+ kaiser_enabled = 0;
+ setup_clear_cpu_cap(X86_FEATURE_KAISER);
+ pr_info("nokaiser: KAISER feature disabled\n");
+#endif
+ return 0;
+}
+early_param("nokaiser", x86_nokaiser_setup);
#endif
static int __init x86_noinvpcid_setup(char *s)
@@ -327,7 +341,7 @@ static __always_inline void setup_smap(s
static void setup_pcid(struct cpuinfo_x86 *c)
{
if (cpu_has(c, X86_FEATURE_PCID)) {
- if (cpu_has(c, X86_FEATURE_PGE)) {
+ if (cpu_has(c, X86_FEATURE_PGE) || kaiser_enabled) {
cr4_set_bits(X86_CR4_PCIDE);
/*
* INVPCID has two "groups" of types:
@@ -799,6 +813,10 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a);
init_scattered_cpuid_features(c);
+#ifdef CONFIG_KAISER
+ if (kaiser_enabled)
+ set_cpu_cap(c, X86_FEATURE_KAISER);
+#endif
}
static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
@@ -1537,6 +1555,14 @@ void cpu_init(void)
* try to read it.
*/
cr4_init_shadow();
+ if (!kaiser_enabled) {
+ /*
+ * secondary_startup_64() deferred setting PGE in cr4:
+ * probe_page_size_mask() sets it on the boot cpu,
+ * but it needs to be set on each secondary cpu.
+ */
+ cr4_set_bits(X86_CR4_PGE);
+ }
/*
* Load microcode on this cpu if a valid microcode is available.
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -132,9 +132,10 @@ void __init init_espfix_bsp(void)
* area to ensure it is mapped into the shadow user page
* tables.
*/
- if (IS_ENABLED(CONFIG_KAISER))
+ if (kaiser_enabled) {
set_pgd(native_get_shadow_pgd(pgd_p),
__pgd(_KERNPG_TABLE | __pa((pud_t *)espfix_pud_page)));
+ }
/* Randomize the locations */
init_espfix_random();
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -190,8 +190,8 @@ ENTRY(secondary_startup_64)
movq $(init_level4_pgt - __START_KERNEL_map), %rax
1:
- /* Enable PAE mode and PGE */
- movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
+ /* Enable PAE and PSE, but defer PGE until kaiser_enabled is decided */
+ movl $(X86_CR4_PAE | X86_CR4_PSE), %ecx
movq %rcx, %cr4
/* Setup early boot stage 4 level pagetables. */
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -177,7 +177,7 @@ static void __init probe_page_size_mask(
cr4_set_bits_and_update_boot(X86_CR4_PSE);
/* Enable PGE if available */
- if (boot_cpu_has(X86_FEATURE_PGE)) {
+ if (boot_cpu_has(X86_FEATURE_PGE) && !kaiser_enabled) {
cr4_set_bits_and_update_boot(X86_CR4_PGE);
__supported_pte_mask |= _PAGE_GLOBAL;
} else
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -324,6 +324,16 @@ void __init cleanup_highmap(void)
continue;
if (vaddr < (unsigned long) _text || vaddr > end)
set_pmd(pmd, __pmd(0));
+ else if (kaiser_enabled) {
+ /*
+ * level2_kernel_pgt is initialized with _PAGE_GLOBAL:
+ * clear that now. This is not important, so long as
+ * CR4.PGE remains clear, but it removes an anomaly.
+ * Physical mapping setup below avoids _PAGE_GLOBAL
+ * by use of massage_pgprot() inside pfn_pte() etc.
+ */
+ set_pmd(pmd, pmd_clear_flags(*pmd, _PAGE_GLOBAL));
+ }
}
}
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -16,7 +16,9 @@
#include <asm/pgalloc.h>
#include <asm/desc.h>
-#ifdef CONFIG_KAISER
+int kaiser_enabled __read_mostly = 1;
+EXPORT_SYMBOL(kaiser_enabled); /* for inlined TLB flush functions */
+
__visible
DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
@@ -167,8 +169,8 @@ static pte_t *kaiser_pagetable_walk(unsi
return pte_offset_kernel(pmd, address);
}
-int kaiser_add_user_map(const void *__start_addr, unsigned long size,
- unsigned long flags)
+static int kaiser_add_user_map(const void *__start_addr, unsigned long size,
+ unsigned long flags)
{
int ret = 0;
pte_t *pte;
@@ -177,6 +179,15 @@ int kaiser_add_user_map(const void *__st
unsigned long end_addr = PAGE_ALIGN(start_addr + size);
unsigned long target_address;
+ /*
+ * It is convenient for callers to pass in __PAGE_KERNEL etc,
+ * and there is no actual harm from setting _PAGE_GLOBAL, so
+ * long as CR4.PGE is not set. But it is nonetheless troubling
+ * to see Kaiser itself setting _PAGE_GLOBAL (now that "nokaiser"
+ * requires that not to be #defined to 0): so mask it off here.
+ */
+ flags &= ~_PAGE_GLOBAL;
+
for (; address < end_addr; address += PAGE_SIZE) {
target_address = get_pa_from_mapping(address);
if (target_address == -1) {
@@ -263,6 +274,8 @@ void __init kaiser_init(void)
{
int cpu;
+ if (!kaiser_enabled)
+ return;
kaiser_init_all_pgds();
for_each_possible_cpu(cpu) {
@@ -311,6 +324,8 @@ void __init kaiser_init(void)
/* Add a mapping to the shadow mapping, and synchronize the mappings */
int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
{
+ if (!kaiser_enabled)
+ return 0;
return kaiser_add_user_map((const void *)addr, size, flags);
}
@@ -322,6 +337,8 @@ void kaiser_remove_mapping(unsigned long
unsigned long addr, next;
pgd_t *pgd;
+ if (!kaiser_enabled)
+ return;
pgd = native_get_shadow_pgd(pgd_offset_k(start));
for (addr = start; addr < end; pgd++, addr = next) {
next = pgd_addr_end(addr, end);
@@ -343,6 +360,8 @@ static inline bool is_userspace_pgd(pgd_
pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
{
+ if (!kaiser_enabled)
+ return pgd;
/*
* Do we need to also populate the shadow pgd? Check _PAGE_USER to
* skip cases like kexec and EFI which make temporary low mappings.
@@ -399,4 +418,3 @@ void kaiser_flush_tlb_on_return_to_user(
X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
}
EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
-#endif /* CONFIG_KAISER */
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -345,16 +345,12 @@ static inline void _pgd_free(pgd_t *pgd)
}
#else
-#ifdef CONFIG_KAISER
/*
- * Instead of one pmd, we aquire two pmds. Being order-1, it is
+ * Instead of one pgd, Kaiser acquires two pgds. Being order-1, it is
* both 8k in size and 8k-aligned. That lets us just flip bit 12
* in a pointer to swap between the two 4k halves.
*/
-#define PGD_ALLOCATION_ORDER 1
-#else
-#define PGD_ALLOCATION_ORDER 0
-#endif
+#define PGD_ALLOCATION_ORDER kaiser_enabled
static inline pgd_t *_pgd_alloc(void)
{
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -39,8 +39,7 @@ static void load_new_mm_cr3(pgd_t *pgdir
{
unsigned long new_mm_cr3 = __pa(pgdir);
-#ifdef CONFIG_KAISER
- if (this_cpu_has(X86_FEATURE_PCID)) {
+ if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID)) {
/*
* We reuse the same PCID for different tasks, so we must
* flush all the entries for the PCID out when we change tasks.
@@ -57,7 +56,6 @@ static void load_new_mm_cr3(pgd_t *pgdir
new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
kaiser_flush_tlb_on_return_to_user();
}
-#endif /* CONFIG_KAISER */
/*
* Caution: many callers of this function expect
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -197,6 +197,9 @@
#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
+/* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
+#define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_KAISER w/o nokaiser */
+
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.9/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.9/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.9/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.9/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.9/kaiser-merged-update.patch
queue-4.9/kaiser-delete-kaiser_real_switch-option.patch
queue-4.9/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.9/kaiser-fix-perf-crashes.patch
queue-4.9/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.9/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.9/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.9/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.9/kaiser-align-addition-to-x86-mm-makefile.patch
queue-4.9/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.9/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.9/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.9/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.9/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.9/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.9/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.9/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.9/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.9/kaiser-kernel-address-isolation.patch
queue-4.9/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.9/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.9/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.9/kaiser-kaiser-depends-on-smp.patch
queue-4.9/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
x86/paravirt: Dont patch flush_tlb_single
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-paravirt-dont-patch-flush_tlb_single.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Mon, 4 Dec 2017 15:07:30 +0100
Subject: x86/paravirt: Dont patch flush_tlb_single
From: Thomas Gleixner <tglx(a)linutronix.de>
commit a035795499ca1c2bd1928808d1a156eda1420383 upstream
native_flush_tlb_single() will be changed with the upcoming
PAGE_TABLE_ISOLATION feature. This requires to have more code in
there than INVLPG.
Remove the paravirt patching for it.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe(a)redhat.com>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Acked-by: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Borislav Petkov <bpetkov(a)suse.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Eduardo Valentin <eduval(a)amazon.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: aliguori(a)amazon.com
Cc: daniel.gruss(a)iaik.tugraz.at
Cc: hughd(a)google.com
Cc: keescook(a)google.com
Cc: linux-mm(a)kvack.org
Cc: michael.schwarz(a)iaik.tugraz.at
Cc: moritz.lipp(a)iaik.tugraz.at
Cc: richard.fellner(a)student.tugraz.at
Link: https://lkml.kernel.org/r/20171204150606.828111617@linutronix.de
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Acked-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/paravirt_patch_64.c | 2 --
1 file changed, 2 deletions(-)
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -9,7 +9,6 @@ DEF_NATIVE(pv_irq_ops, save_fl, "pushfq;
DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax");
DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax");
DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3");
-DEF_NATIVE(pv_mmu_ops, flush_tlb_single, "invlpg (%rdi)");
DEF_NATIVE(pv_cpu_ops, clts, "clts");
DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd");
@@ -62,7 +61,6 @@ unsigned native_patch(u8 type, u16 clobb
PATCH_SITE(pv_mmu_ops, read_cr3);
PATCH_SITE(pv_mmu_ops, write_cr3);
PATCH_SITE(pv_cpu_ops, clts);
- PATCH_SITE(pv_mmu_ops, flush_tlb_single);
PATCH_SITE(pv_cpu_ops, wbinvd);
#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/x86-boot-add-early-cmdline-parsing-for-options-with-arguments.patch
This is a note to let you know that I've just added the patch titled
x86/kaiser: Move feature detection up
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-kaiser-move-feature-detection-up.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Borislav Petkov <bp(a)suse.de>
Date: Mon, 25 Dec 2017 13:57:16 +0100
Subject: x86/kaiser: Move feature detection up
From: Borislav Petkov <bp(a)suse.de>
... before the first use of kaiser_enabled as otherwise funky
things happen:
about to get started...
(XEN) d0v0 Unhandled page fault fault/trap [#14, ec=0000]
(XEN) Pagetable walk from ffff88022a449090:
(XEN) L4[0x110] = 0000000229e0e067 0000000000001e0e
(XEN) L3[0x008] = 0000000000000000 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S: fault at ffff82d08033fd08
entry.o#create_bounce_frame+0x135/0x14d
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.9.1_02-3.21 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff81007460>]
(XEN) RFLAGS: 0000000000000286 EM: 1 CONTEXT: pv guest (d0v0)
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/kaiser.h | 2 ++
arch/x86/kernel/setup.c | 7 +++++++
arch/x86/mm/kaiser.c | 2 --
3 files changed, 9 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -96,8 +96,10 @@ DECLARE_PER_CPU(unsigned long, x86_cr3_p
extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
extern int kaiser_enabled;
+extern void __init kaiser_check_boottime_disable(void);
#else
#define kaiser_enabled 0
+static inline void __init kaiser_check_boottime_disable(void) {}
#endif /* CONFIG_KAISER */
/*
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -112,6 +112,7 @@
#include <asm/alternative.h>
#include <asm/prom.h>
#include <asm/microcode.h>
+#include <asm/kaiser.h>
/*
* max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -1016,6 +1017,12 @@ void __init setup_arch(char **cmdline_p)
*/
init_hypervisor_platform();
+ /*
+ * This needs to happen right after XENPV is set on xen and
+ * kaiser_enabled is checked below in cleanup_highmap().
+ */
+ kaiser_check_boottime_disable();
+
x86_init.resources.probe_roms();
/* after parse_early_param, so could debug it */
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -311,8 +311,6 @@ void __init kaiser_init(void)
{
int cpu;
- kaiser_check_boottime_disable();
-
if (!kaiser_enabled)
return;
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/x86-kaiser-reenable-paravirt.patch
queue-4.4/x86-kaiser-rename-and-simplify-x86_feature_kaiser-handling.patch
queue-4.4/x86-kaiser-check-boottime-cmdline-params.patch
queue-4.4/x86-kaiser-move-feature-detection-up.patch
This is a note to let you know that I've just added the patch titled
x86/kaiser: Reenable PARAVIRT
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-kaiser-reenable-paravirt.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Borislav Petkov <bp(a)suse.de>
Date: Tue, 2 Jan 2018 14:19:49 +0100
Subject: x86/kaiser: Reenable PARAVIRT
From: Borislav Petkov <bp(a)suse.de>
Now that the required bits have been addressed, reenable
PARAVIRT.
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
security/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -34,7 +34,7 @@ config SECURITY
config KAISER
bool "Remove the kernel mapping in user mode"
default y
- depends on X86_64 && SMP && !PARAVIRT
+ depends on X86_64 && SMP
help
This enforces a strict kernel and user space isolation, in order
to close hardware side channels on kernel address information.
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/x86-kaiser-reenable-paravirt.patch
queue-4.4/x86-kaiser-rename-and-simplify-x86_feature_kaiser-handling.patch
queue-4.4/x86-kaiser-check-boottime-cmdline-params.patch
queue-4.4/x86-kaiser-move-feature-detection-up.patch
This is a note to let you know that I've just added the patch titled
x86/kaiser: Check boottime cmdline params
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-kaiser-check-boottime-cmdline-params.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Borislav Petkov <bp(a)suse.de>
Date: Tue, 2 Jan 2018 14:19:48 +0100
Subject: x86/kaiser: Check boottime cmdline params
From: Borislav Petkov <bp(a)suse.de>
AMD (and possibly other vendors) are not affected by the leak
KAISER is protecting against.
Keep the "nopti" for traditional reasons and add pti=<on|off|auto>
like upstream.
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/kernel-parameters.txt | 6 +++
arch/x86/mm/kaiser.c | 59 +++++++++++++++++++++++++-----------
2 files changed, 47 insertions(+), 18 deletions(-)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3056,6 +3056,12 @@ bytes respectively. Such letter suffixes
pt. [PARIDE]
See Documentation/blockdev/paride.txt.
+ pti= [X86_64]
+ Control KAISER user/kernel address space isolation:
+ on - enable
+ off - disable
+ auto - default setting
+
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -16,6 +16,7 @@
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/desc.h>
+#include <asm/cmdline.h>
int kaiser_enabled __read_mostly = 1;
EXPORT_SYMBOL(kaiser_enabled); /* for inlined TLB flush functions */
@@ -264,6 +265,43 @@ static void __init kaiser_init_all_pgds(
WARN_ON(__ret); \
} while (0)
+void __init kaiser_check_boottime_disable(void)
+{
+ bool enable = true;
+ char arg[5];
+ int ret;
+
+ ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
+ if (ret > 0) {
+ if (!strncmp(arg, "on", 2))
+ goto enable;
+
+ if (!strncmp(arg, "off", 3))
+ goto disable;
+
+ if (!strncmp(arg, "auto", 4))
+ goto skip;
+ }
+
+ if (cmdline_find_option_bool(boot_command_line, "nopti"))
+ goto disable;
+
+skip:
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+ goto disable;
+
+enable:
+ if (enable)
+ setup_force_cpu_cap(X86_FEATURE_KAISER);
+
+ return;
+
+disable:
+ pr_info("Kernel/User page tables isolation: disabled\n");
+ kaiser_enabled = 0;
+ setup_clear_cpu_cap(X86_FEATURE_KAISER);
+}
+
/*
* If anything in here fails, we will likely die on one of the
* first kernel->user transitions and init will die. But, we
@@ -275,12 +313,10 @@ void __init kaiser_init(void)
{
int cpu;
- if (!kaiser_enabled) {
- setup_clear_cpu_cap(X86_FEATURE_KAISER);
- return;
- }
+ kaiser_check_boottime_disable();
- setup_force_cpu_cap(X86_FEATURE_KAISER);
+ if (!kaiser_enabled)
+ return;
kaiser_init_all_pgds();
@@ -424,16 +460,3 @@ void kaiser_flush_tlb_on_return_to_user(
X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
}
EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
-
-static int __init x86_nokaiser_setup(char *s)
-{
- /* nopti doesn't accept parameters */
- if (s)
- return -EINVAL;
-
- kaiser_enabled = 0;
- pr_info("Kernel/User page tables isolation: disabled\n");
-
- return 0;
-}
-early_param("nopti", x86_nokaiser_setup);
Patches currently in stable-queue which might be from bp(a)suse.de are
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/x86-kaiser-reenable-paravirt.patch
queue-4.4/x86-kaiser-rename-and-simplify-x86_feature_kaiser-handling.patch
queue-4.4/x86-kaiser-check-boottime-cmdline-params.patch
queue-4.4/x86-kaiser-move-feature-detection-up.patch
This is a note to let you know that I've just added the patch titled
kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 27 Aug 2017 16:24:27 -0700
Subject: kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
From: Hugh Dickins <hughd(a)google.com>
Mostly this commit is just unshouting X86_CR3_PCID_KERN_VAR and
X86_CR3_PCID_USER_VAR: we usually name variables in lower-case.
But why does x86_cr3_pcid_noflush need to be __aligned(PAGE_SIZE)?
Ah, it's a leftover from when kaiser_add_user_map() once complained
about mapping the same page twice. Make it __read_mostly instead.
(I'm a little uneasy about all the unrelated data which shares its
page getting user-mapped too, but that was so before, and not a big
deal: though we call it user-mapped, it's not mapped with _PAGE_USER.)
And there is a little change around the two calls to do_nmi().
Previously they set the NOFLUSH bit (if PCID supported) when
forcing to kernel context before do_nmi(); now they also have the
NOFLUSH bit set (if PCID supported) when restoring context after:
nothing done in do_nmi() should require a TLB to be flushed here.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 8 ++++----
arch/x86/include/asm/kaiser.h | 11 +++++------
arch/x86/mm/kaiser.c | 13 +++++++------
3 files changed, 16 insertions(+), 16 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1290,11 +1290,11 @@ ENTRY(nmi)
/* Unconditionally use kernel CR3 for do_nmi() */
/* %rax is saved above, so OK to clobber here */
movq %cr3, %rax
+ /* If PCID enabled, NOFLUSH now and NOFLUSH on return */
+ orq x86_cr3_pcid_noflush, %rax
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
- /* Add back kernel PCID and "no flush" bit */
- orq X86_CR3_PCID_KERN_VAR, %rax
movq %rax, %cr3
#endif
call do_nmi
@@ -1534,11 +1534,11 @@ end_repeat_nmi:
/* Unconditionally use kernel CR3 for do_nmi() */
/* %rax is saved above, so OK to clobber here */
movq %cr3, %rax
+ /* If PCID enabled, NOFLUSH now and NOFLUSH on return */
+ orq x86_cr3_pcid_noflush, %rax
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
- /* Add back kernel PCID and "no flush" bit */
- orq X86_CR3_PCID_KERN_VAR, %rax
movq %rax, %cr3
#endif
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -25,7 +25,7 @@
.macro _SWITCH_TO_KERNEL_CR3 reg
movq %cr3, \reg
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-orq X86_CR3_PCID_KERN_VAR, \reg
+orq x86_cr3_pcid_noflush, \reg
movq \reg, %cr3
.endm
@@ -37,11 +37,10 @@ movq \reg, %cr3
* not enabled): so that the one register can update both memory and cr3.
*/
movq %cr3, \reg
-andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-orq PER_CPU_VAR(X86_CR3_PCID_USER_VAR), \reg
+orq PER_CPU_VAR(x86_cr3_pcid_user), \reg
js 9f
/* FLUSH this time, reset to NOFLUSH for next time (if PCID enabled) */
-movb \regb, PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
+movb \regb, PER_CPU_VAR(x86_cr3_pcid_user+7)
9:
movq \reg, %cr3
.endm
@@ -94,8 +93,8 @@ movq PER_CPU_VAR(unsafe_stack_register_b
*/
DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
-extern unsigned long X86_CR3_PCID_KERN_VAR;
-DECLARE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
+extern unsigned long x86_cr3_pcid_noflush;
+DECLARE_PER_CPU(unsigned long, x86_cr3_pcid_user);
extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -29,8 +29,8 @@ DEFINE_PER_CPU_USER_MAPPED(unsigned long
* This is also handy because systems that do not support PCIDs
* just end up or'ing a 0 into their CR3, which does no harm.
*/
-__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR;
-DEFINE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
+unsigned long x86_cr3_pcid_noflush __read_mostly;
+DEFINE_PER_CPU(unsigned long, x86_cr3_pcid_user);
/*
* At runtime, the only things we map are some things for CPU
@@ -304,7 +304,8 @@ void __init kaiser_init(void)
sizeof(gate_desc) * NR_VECTORS,
__PAGE_KERNEL);
- kaiser_add_user_map_early(&X86_CR3_PCID_KERN_VAR, PAGE_SIZE,
+ kaiser_add_user_map_early(&x86_cr3_pcid_noflush,
+ sizeof(x86_cr3_pcid_noflush),
__PAGE_KERNEL);
}
@@ -384,8 +385,8 @@ void kaiser_setup_pcid(void)
* These variables are used by the entry/exit
* code to change PCID and pgd and TLB flushing.
*/
- X86_CR3_PCID_KERN_VAR = kern_cr3;
- this_cpu_write(X86_CR3_PCID_USER_VAR, user_cr3);
+ x86_cr3_pcid_noflush = kern_cr3;
+ this_cpu_write(x86_cr3_pcid_user, user_cr3);
}
/*
@@ -395,7 +396,7 @@ void kaiser_setup_pcid(void)
*/
void kaiser_flush_tlb_on_return_to_user(void)
{
- this_cpu_write(X86_CR3_PCID_USER_VAR,
+ this_cpu_write(x86_cr3_pcid_user,
X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
}
EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: vmstat show NR_KAISERTABLE as nr_overhead
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sat, 9 Sep 2017 21:27:32 -0700
Subject: kaiser: vmstat show NR_KAISERTABLE as nr_overhead
From: Hugh Dickins <hughd(a)google.com>
The kaiser update made an interesting choice, never to free any shadow
page tables. Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing. Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).
But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion
(but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).
Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).
In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too). But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 16 +++++++++++-----
include/linux/mmzone.h | 3 ++-
mm/vmstat.c | 1 +
3 files changed, 14 insertions(+), 6 deletions(-)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -122,9 +122,11 @@ static pte_t *kaiser_pagetable_walk(unsi
if (!new_pmd_page)
return NULL;
spin_lock(&shadow_table_allocation_lock);
- if (pud_none(*pud))
+ if (pud_none(*pud)) {
set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
- else
+ __inc_zone_page_state(virt_to_page((void *)
+ new_pmd_page), NR_KAISERTABLE);
+ } else
free_page(new_pmd_page);
spin_unlock(&shadow_table_allocation_lock);
}
@@ -140,9 +142,11 @@ static pte_t *kaiser_pagetable_walk(unsi
if (!new_pte_page)
return NULL;
spin_lock(&shadow_table_allocation_lock);
- if (pmd_none(*pmd))
+ if (pmd_none(*pmd)) {
set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
- else
+ __inc_zone_page_state(virt_to_page((void *)
+ new_pte_page), NR_KAISERTABLE);
+ } else
free_page(new_pte_page);
spin_unlock(&shadow_table_allocation_lock);
}
@@ -206,11 +210,13 @@ static void __init kaiser_init_all_pgds(
pgd = native_get_shadow_pgd(pgd_offset_k((unsigned long )0));
for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
pgd_t new_pgd;
- pud_t *pud = pud_alloc_one(&init_mm, PAGE_OFFSET + i * PGDIR_SIZE);
+ pud_t *pud = pud_alloc_one(&init_mm,
+ PAGE_OFFSET + i * PGDIR_SIZE);
if (!pud) {
WARN_ON(1);
break;
}
+ inc_zone_page_state(virt_to_page(pud), NR_KAISERTABLE);
new_pgd = __pgd(_KERNPG_TABLE |__pa(pud));
/*
* Make sure not to stomp on some other pgd entry.
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -131,8 +131,9 @@ enum zone_stat_item {
NR_SLAB_RECLAIMABLE,
NR_SLAB_UNRECLAIMABLE,
NR_PAGETABLE, /* used for pagetables */
- NR_KERNEL_STACK,
/* Second 128 byte cacheline */
+ NR_KERNEL_STACK,
+ NR_KAISERTABLE,
NR_UNSTABLE_NFS, /* NFS unstable pages */
NR_BOUNCE,
NR_VMSCAN_WRITE,
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -736,6 +736,7 @@ const char * const vmstat_text[] = {
"nr_slab_unreclaimable",
"nr_page_table_pages",
"nr_kernel_stack",
+ "nr_overhead",
"nr_unstable",
"nr_bounce",
"nr_vmscan_write",
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Tue, 3 Oct 2017 20:49:04 -0700
Subject: kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
From: Hugh Dickins <hughd(a)google.com>
Now that we're playing the ALTERNATIVE game, use that more efficient
method: instead of user-mapping an extra page, and reading an extra
cacheline each time for x86_cr3_pcid_noflush.
Neel has found that __stringify(bts $X86_CR3_PCID_NOFLUSH_BIT, %rax)
is a working substitute for the "bts $63, %rax" in these ALTERNATIVEs;
but the one line with $63 in looks clearer, so let's stick with that.
Worried about what happens with an ALTERNATIVE between the jump and
jump label in another ALTERNATIVE? I was, but have checked the
combinations in SWITCH_KERNEL_CR3_NO_STACK at entry_SYSCALL_64,
and it does a good job.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 7 ++++---
arch/x86/include/asm/kaiser.h | 6 +++---
arch/x86/mm/kaiser.c | 11 +----------
3 files changed, 8 insertions(+), 16 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1056,7 +1056,8 @@ ENTRY(paranoid_entry)
jz 2f
orl $2, %ebx
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
- orq x86_cr3_pcid_noflush, %rax
+ /* If PCID enabled, set X86_CR3_PCID_NOFLUSH_BIT */
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
movq %rax, %cr3
2:
#endif
@@ -1318,7 +1319,7 @@ ENTRY(nmi)
/* %rax is saved above, so OK to clobber here */
ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
- orq x86_cr3_pcid_noflush, %rax
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
@@ -1562,7 +1563,7 @@ end_repeat_nmi:
/* %rax is saved above, so OK to clobber here */
ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
- orq x86_cr3_pcid_noflush, %rax
+ ALTERNATIVE "", "bts $63, %rax", X86_FEATURE_PCID
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -25,7 +25,8 @@
.macro _SWITCH_TO_KERNEL_CR3 reg
movq %cr3, \reg
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-orq x86_cr3_pcid_noflush, \reg
+/* If PCID enabled, set X86_CR3_PCID_NOFLUSH_BIT */
+ALTERNATIVE "", "bts $63, \reg", X86_FEATURE_PCID
movq \reg, %cr3
.endm
@@ -39,7 +40,7 @@ movq \reg, %cr3
movq %cr3, \reg
orq PER_CPU_VAR(x86_cr3_pcid_user), \reg
js 9f
-/* FLUSH this time, reset to NOFLUSH for next time (if PCID enabled) */
+/* If PCID enabled, FLUSH this time, reset to NOFLUSH for next time */
movb \regb, PER_CPU_VAR(x86_cr3_pcid_user+7)
9:
movq \reg, %cr3
@@ -90,7 +91,6 @@ movq PER_CPU_VAR(unsafe_stack_register_b
*/
DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
-extern unsigned long x86_cr3_pcid_noflush;
DECLARE_PER_CPU(unsigned long, x86_cr3_pcid_user);
extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -32,7 +32,6 @@ DEFINE_PER_CPU_USER_MAPPED(unsigned long
* This is also handy because systems that do not support PCIDs
* just end up or'ing a 0 into their CR3, which does no harm.
*/
-unsigned long x86_cr3_pcid_noflush __read_mostly;
DEFINE_PER_CPU(unsigned long, x86_cr3_pcid_user);
/*
@@ -357,10 +356,6 @@ void __init kaiser_init(void)
kaiser_add_user_map_early(&debug_idt_table,
sizeof(gate_desc) * NR_VECTORS,
__PAGE_KERNEL);
-
- kaiser_add_user_map_early(&x86_cr3_pcid_noflush,
- sizeof(x86_cr3_pcid_noflush),
- __PAGE_KERNEL);
}
/* Add a mapping to the shadow mapping, and synchronize the mappings */
@@ -434,18 +429,14 @@ pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp,
void kaiser_setup_pcid(void)
{
- unsigned long kern_cr3 = 0;
unsigned long user_cr3 = KAISER_SHADOW_PGD_OFFSET;
- if (this_cpu_has(X86_FEATURE_PCID)) {
- kern_cr3 |= X86_CR3_PCID_KERN_NOFLUSH;
+ if (this_cpu_has(X86_FEATURE_PCID))
user_cr3 |= X86_CR3_PCID_USER_NOFLUSH;
- }
/*
* These variables are used by the entry/exit
* code to change PCID and pgd and TLB flushing.
*/
- x86_cr3_pcid_noflush = kern_cr3;
this_cpu_write(x86_cr3_pcid_user, user_cr3);
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: tidied up kaiser_add/remove_mapping slightly
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 19:23:08 -0700
Subject: kaiser: tidied up kaiser_add/remove_mapping slightly
From: Hugh Dickins <hughd(a)google.com>
Yes, unmap_pud_range_nofree()'s declaration ought to be in a
header file really, but I'm not sure we want to use it anyway:
so for now just declare it inside kaiser_remove_mapping().
And there doesn't seem to be such a thing as unmap_p4d_range(),
even in a 5-level paging tree.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -286,8 +286,7 @@ void __init kaiser_init(void)
__PAGE_KERNEL);
}
-extern void unmap_pud_range_nofree(pgd_t *pgd, unsigned long start, unsigned long end);
-// add a mapping to the shadow-mapping, and synchronize the mappings
+/* Add a mapping to the shadow mapping, and synchronize the mappings */
int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
{
return kaiser_add_user_map((const void *)addr, size, flags);
@@ -295,15 +294,13 @@ int kaiser_add_mapping(unsigned long add
void kaiser_remove_mapping(unsigned long start, unsigned long size)
{
+ extern void unmap_pud_range_nofree(pgd_t *pgd,
+ unsigned long start, unsigned long end);
unsigned long end = start + size;
unsigned long addr;
for (addr = start; addr < end; addr += PGDIR_SIZE) {
pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(addr));
- /*
- * unmap_p4d_range() handles > P4D_SIZE unmaps,
- * so no need to trim 'end'.
- */
unmap_pud_range_nofree(pgd, addr, end);
}
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: tidied up asm/kaiser.h somewhat
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-tidied-up-asm-kaiser.h-somewhat.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 19:18:07 -0700
Subject: kaiser: tidied up asm/kaiser.h somewhat
From: Hugh Dickins <hughd(a)google.com>
Mainly deleting a surfeit of blank lines, and reflowing header comment.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/kaiser.h | 32 +++++++++++++-------------------
1 file changed, 13 insertions(+), 19 deletions(-)
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -1,15 +1,17 @@
#ifndef _ASM_X86_KAISER_H
#define _ASM_X86_KAISER_H
-
-/* This file includes the definitions for the KAISER feature.
- * KAISER is a counter measure against x86_64 side channel attacks on the kernel virtual memory.
- * It has a shodow-pgd for every process. the shadow-pgd has a minimalistic kernel-set mapped,
- * but includes the whole user memory. Within a kernel context switch, or when an interrupt is handled,
- * the pgd is switched to the normal one. When the system switches to user mode, the shadow pgd is enabled.
- * By this, the virtual memory chaches are freed, and the user may not attack the whole kernel memory.
+/*
+ * This file includes the definitions for the KAISER feature.
+ * KAISER is a counter measure against x86_64 side channel attacks on
+ * the kernel virtual memory. It has a shadow pgd for every process: the
+ * shadow pgd has a minimalistic kernel-set mapped, but includes the whole
+ * user memory. Within a kernel context switch, or when an interrupt is handled,
+ * the pgd is switched to the normal one. When the system switches to user mode,
+ * the shadow pgd is enabled. By this, the virtual memory caches are freed,
+ * and the user may not attack the whole kernel memory.
*
- * A minimalistic kernel mapping holds the parts needed to be mapped in user mode, as the entry/exit functions
- * of the user space, or the stacks.
+ * A minimalistic kernel mapping holds the parts needed to be mapped in user
+ * mode, such as the entry/exit functions of the user space, or the stacks.
*/
#ifdef __ASSEMBLY__
#ifdef CONFIG_KAISER
@@ -48,13 +50,10 @@ _SWITCH_TO_KERNEL_CR3 %rax
movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
.endm
-
.macro SWITCH_USER_CR3_NO_STACK
-
movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
_SWITCH_TO_USER_CR3 %rax
movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
-
.endm
#else /* CONFIG_KAISER */
@@ -72,7 +71,6 @@ movq PER_CPU_VAR(unsafe_stack_register_b
#else /* __ASSEMBLY__ */
-
#ifdef CONFIG_KAISER
/*
* Upon kernel/user mode switch, it may happen that the address
@@ -80,7 +78,6 @@ movq PER_CPU_VAR(unsafe_stack_register_b
* stored. To change the address space, another register is
* needed. A register therefore has to be stored/restored.
*/
-
DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
/**
@@ -95,7 +92,6 @@ DECLARE_PER_CPU_USER_MAPPED(unsigned lon
*/
extern int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
-
/**
* kaiser_remove_mapping - unmap a virtual memory part of the shadow mapping
* @addr: the start address of the range
@@ -104,12 +100,12 @@ extern int kaiser_add_mapping(unsigned l
extern void kaiser_remove_mapping(unsigned long start, unsigned long size);
/**
- * kaiser_initialize_mapping - Initalize the shadow mapping
+ * kaiser_init - Initialize the shadow mapping
*
* Most parts of the shadow mapping can be mapped upon boot
* time. Only per-process things like the thread stacks
* or a new LDT have to be mapped at runtime. These boot-
- * time mappings are permanent and nevertunmapped.
+ * time mappings are permanent and never unmapped.
*/
extern void kaiser_init(void);
@@ -117,6 +113,4 @@ extern void kaiser_init(void);
#endif /* __ASSEMBLY */
-
-
#endif /* _ASM_X86_KAISER_H */
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-stack-map-page_size-at-thread_size-page_size.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 18:57:03 -0700
Subject: kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
From: Hugh Dickins <hughd(a)google.com>
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping(). And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/kaiser.h | 40 +++++++++++++++++++++++++++++++++-------
init/main.c | 6 +-----
kernel/fork.c | 7 ++-----
3 files changed, 36 insertions(+), 17 deletions(-)
--- a/include/linux/kaiser.h
+++ b/include/linux/kaiser.h
@@ -1,26 +1,52 @@
-#ifndef _INCLUDE_KAISER_H
-#define _INCLUDE_KAISER_H
+#ifndef _LINUX_KAISER_H
+#define _LINUX_KAISER_H
#ifdef CONFIG_KAISER
#include <asm/kaiser.h>
+
+static inline int kaiser_map_thread_stack(void *stack)
+{
+ /*
+ * Map that page of kernel stack on which we enter from user context.
+ */
+ return kaiser_add_mapping((unsigned long)stack +
+ THREAD_SIZE - PAGE_SIZE, PAGE_SIZE, __PAGE_KERNEL);
+}
+
+static inline void kaiser_unmap_thread_stack(void *stack)
+{
+ /*
+ * Note: may be called even when kaiser_map_thread_stack() failed.
+ */
+ kaiser_remove_mapping((unsigned long)stack +
+ THREAD_SIZE - PAGE_SIZE, PAGE_SIZE);
+}
#else
/*
* These stubs are used whenever CONFIG_KAISER is off, which
- * includes architectures that support KAISER, but have it
- * disabled.
+ * includes architectures that support KAISER, but have it disabled.
*/
static inline void kaiser_init(void)
{
}
-static inline void kaiser_remove_mapping(unsigned long start, unsigned long size)
+static inline int kaiser_add_mapping(unsigned long addr,
+ unsigned long size, unsigned long flags)
+{
+ return 0;
+}
+static inline void kaiser_remove_mapping(unsigned long start,
+ unsigned long size)
{
}
-static inline int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+static inline int kaiser_map_thread_stack(void *stack)
{
return 0;
}
+static inline void kaiser_unmap_thread_stack(void *stack)
+{
+}
#endif /* !CONFIG_KAISER */
-#endif /* _INCLUDE_KAISER_H */
+#endif /* _LINUX_KAISER_H */
--- a/init/main.c
+++ b/init/main.c
@@ -81,15 +81,13 @@
#include <linux/integrity.h>
#include <linux/proc_ns.h>
#include <linux/io.h>
+#include <linux/kaiser.h>
#include <asm/io.h>
#include <asm/bugs.h>
#include <asm/setup.h>
#include <asm/sections.h>
#include <asm/cacheflush.h>
-#ifdef CONFIG_KAISER
-#include <asm/kaiser.h>
-#endif
static int kernel_init(void *);
@@ -495,9 +493,7 @@ static void __init mm_init(void)
pgtable_init();
vmalloc_init();
ioremap_huge_init();
-#ifdef CONFIG_KAISER
kaiser_init();
-#endif
}
asmlinkage __visible void __init start_kernel(void)
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -168,12 +168,9 @@ static struct thread_info *alloc_thread_
return page ? page_address(page) : NULL;
}
-extern void kaiser_remove_mapping(unsigned long start_addr, unsigned long size);
static inline void free_thread_info(struct thread_info *ti)
{
-#ifdef CONFIG_KAISER
- kaiser_remove_mapping((unsigned long)ti, THREAD_SIZE);
-#endif
+ kaiser_unmap_thread_stack(ti);
free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER);
}
# else
@@ -358,7 +355,7 @@ static struct task_struct *dup_task_stru
tsk->stack = ti;
- err= kaiser_add_mapping((unsigned long)tsk->stack, THREAD_SIZE, __PAGE_KERNEL);
+ err = kaiser_map_thread_stack(tsk->stack);
if (err)
goto free_ti;
#ifdef CONFIG_SECCOMP
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: PCID 0 for kernel and 128 for user
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-pcid-0-for-kernel-and-128-for-user.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Fri, 8 Sep 2017 19:26:30 -0700
Subject: kaiser: PCID 0 for kernel and 128 for user
From: Hugh Dickins <hughd(a)google.com>
Why was 4 chosen for kernel PCID and 6 for user PCID?
No good reason in a backport where PCIDs are only used for Kaiser.
If we continue with those, then we shall need to add Andy Lutomirski's
4.13 commit 6c690ee1039b ("x86/mm: Split read_cr3() into read_cr3_pa()
and __read_cr3()"), which deals with the problem of read_cr3() callers
finding stray bits in the cr3 that they expected to be page-aligned;
and for hibernation, his 4.14 commit f34902c5c6c0 ("x86/hibernate/64:
Mask off CR3's PCID bits in the saved CR3").
But if 0 is used for kernel PCID, then there's no need to add in those
commits - whenever the kernel looks, it sees 0 in the lower bits; and
0 for kernel seems an obvious choice.
And I naughtily propose 128 for user PCID. Because there's a place
in _SWITCH_TO_USER_CR3 where it takes note of the need for TLB FLUSH,
but needs to reset that to NOFLUSH for the next occasion. Currently
it does so with a "movb $(0x80)" into the high byte of the per-cpu
quadword, but that will cause a machine without PCID support to crash.
Now, if %al just happened to have 0x80 in it at that point, on a
machine with PCID support, but 0 on a machine without PCID support...
(That will go badly wrong once the pgd can be at a physical address
above 2^56, but even with 5-level paging, physical goes up to 2^52.)
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/kaiser.h | 19 ++++++++++++-------
arch/x86/include/asm/pgtable_types.h | 7 ++++---
arch/x86/mm/tlb.c | 3 +++
3 files changed, 19 insertions(+), 10 deletions(-)
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -29,14 +29,19 @@ orq X86_CR3_PCID_KERN_VAR, \reg
movq \reg, %cr3
.endm
-.macro _SWITCH_TO_USER_CR3 reg
+.macro _SWITCH_TO_USER_CR3 reg regb
+/*
+ * regb must be the low byte portion of reg: because we have arranged
+ * for the low byte of the user PCID to serve as the high byte of NOFLUSH
+ * (0x80 for each when PCID is enabled, or 0x00 when PCID and NOFLUSH are
+ * not enabled): so that the one register can update both memory and cr3.
+ */
movq %cr3, \reg
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
orq PER_CPU_VAR(X86_CR3_PCID_USER_VAR), \reg
js 9f
-// FLUSH this time, reset to NOFLUSH for next time
-// But if nopcid? Consider using 0x80 for user pcid?
-movb $(0x80), PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
+/* FLUSH this time, reset to NOFLUSH for next time (if PCID enabled) */
+movb \regb, PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
9:
movq \reg, %cr3
.endm
@@ -49,7 +54,7 @@ popq %rax
.macro SWITCH_USER_CR3
pushq %rax
-_SWITCH_TO_USER_CR3 %rax
+_SWITCH_TO_USER_CR3 %rax %al
popq %rax
.endm
@@ -61,7 +66,7 @@ movq PER_CPU_VAR(unsafe_stack_register_b
.macro SWITCH_USER_CR3_NO_STACK
movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
-_SWITCH_TO_USER_CR3 %rax
+_SWITCH_TO_USER_CR3 %rax %al
movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
.endm
@@ -69,7 +74,7 @@ movq PER_CPU_VAR(unsafe_stack_register_b
.macro SWITCH_KERNEL_CR3 reg
.endm
-.macro SWITCH_USER_CR3 reg
+.macro SWITCH_USER_CR3 reg regb
.endm
.macro SWITCH_USER_CR3_NO_STACK
.endm
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -111,16 +111,17 @@
/* Mask for all the PCID-related bits in CR3: */
#define X86_CR3_PCID_MASK (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_MASK)
+#define X86_CR3_PCID_ASID_KERN (_AC(0x0,UL))
+
#if defined(CONFIG_KAISER) && defined(CONFIG_X86_64)
-#define X86_CR3_PCID_ASID_KERN (_AC(0x4,UL))
-#define X86_CR3_PCID_ASID_USER (_AC(0x6,UL))
+/* Let X86_CR3_PCID_ASID_USER be usable for the X86_CR3_PCID_NOFLUSH bit */
+#define X86_CR3_PCID_ASID_USER (_AC(0x80,UL))
#define X86_CR3_PCID_KERN_FLUSH (X86_CR3_PCID_ASID_KERN)
#define X86_CR3_PCID_USER_FLUSH (X86_CR3_PCID_ASID_USER)
#define X86_CR3_PCID_KERN_NOFLUSH (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_KERN)
#define X86_CR3_PCID_USER_NOFLUSH (X86_CR3_PCID_NOFLUSH | X86_CR3_PCID_ASID_USER)
#else
-#define X86_CR3_PCID_ASID_KERN (_AC(0x0,UL))
#define X86_CR3_PCID_ASID_USER (_AC(0x0,UL))
/*
* PCIDs are unsupported on 32-bit and none of these bits can be
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -50,6 +50,9 @@ static void load_new_mm_cr3(pgd_t *pgdir
* invpcid_flush_single_context(X86_CR3_PCID_ASID_USER) could
* do it here, but can only be used if X86_FEATURE_INVPCID is
* available - and many machines support pcid without invpcid.
+ *
+ * The line below is a no-op: X86_CR3_PCID_KERN_FLUSH is now 0;
+ * but keep that line in there in case something changes.
*/
new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
kaiser_flush_tlb_on_return_to_user();
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: paranoid_entry pass cr3 need to paranoid_exit
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Tue, 26 Sep 2017 18:43:07 -0700
Subject: kaiser: paranoid_entry pass cr3 need to paranoid_exit
From: Hugh Dickins <hughd(a)google.com>
Neel Natu points out that paranoid_entry() was wrong to assume that
an entry that did not need swapgs would not need SWITCH_KERNEL_CR3:
paranoid_entry (used for debug breakpoint, int3, double fault or MCE;
though I think it's only the MCE case that is cause for concern here)
can break in at an awkward time, between cr3 switch and swapgs, but
its handling always needs kernel gs and kernel cr3.
Easy to fix in itself, but paranoid_entry() also needs to convey to
paranoid_exit() (and my reading of macro idtentry says paranoid_entry
and paranoid_exit are always paired) how to restore the prior state.
The swapgs state is already conveyed by %ebx (0 or 1), so extend that
also to convey when SWITCH_USER_CR3 will be needed (2 or 3).
(Yes, I'd much prefer that 0 meant no swapgs, whereas it's the other
way round: and a convention shared with error_entry() and error_exit(),
which I don't want to touch. Perhaps I should have inverted the bit
for switch cr3 too, but did not.)
paranoid_exit() would be straightforward, except for TRACE_IRQS: it
did TRACE_IRQS_IRETQ when doing swapgs, but TRACE_IRQS_IRETQ_DEBUG
when not: which is it supposed to use when SWITCH_USER_CR3 is split
apart from that? As best as I can determine, commit 5963e317b1e9
("ftrace/x86: Do not change stacks in DEBUG when calling lockdep")
missed the swapgs case, and should have used TRACE_IRQS_IRETQ_DEBUG
there too (the discrepancy has nothing to do with the liberal use
of _NO_STACK and _UNSAFE_STACK hereabouts: TRACE_IRQS_OFF_DEBUG has
just been used in all cases); discrepancy lovingly preserved across
several paranoid_exit() cleanups, but I'm now removing it.
Neel further indicates that to use SWITCH_USER_CR3_NO_STACK there in
paranoid_exit() is now not only unnecessary but unsafe: might corrupt
syscall entry's unsafe_stack_register_backup of %rax. Just use
SWITCH_USER_CR3: and delete SWITCH_USER_CR3_NO_STACK altogether,
before we make the mistake of using it again.
hughd adds: this commit fixes an issue in the Kaiser-without-PCIDs
part of the series, and ought to be moved earlier, if you decided
to make a release of Kaiser-without-PCIDs.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 46 ++++++++++++++++++++++++++++++++----------
arch/x86/include/asm/kaiser.h | 8 -------
2 files changed, 36 insertions(+), 18 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1025,7 +1025,11 @@ idtentry machine_check has_error_cod
/*
* Save all registers in pt_regs, and switch gs if needed.
* Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ *
+ * Return: ebx=0: needs swapgs but not SWITCH_USER_CR3 in paranoid_exit
+ * ebx=1: needs neither swapgs nor SWITCH_USER_CR3 in paranoid_exit
+ * ebx=2: needs both swapgs and SWITCH_USER_CR3 in paranoid_exit
+ * ebx=3: needs SWITCH_USER_CR3 but not swapgs in paranoid_exit
*/
ENTRY(paranoid_entry)
cld
@@ -1037,9 +1041,26 @@ ENTRY(paranoid_entry)
testl %edx, %edx
js 1f /* negative -> in kernel */
SWAPGS
- SWITCH_KERNEL_CR3
xorl %ebx, %ebx
-1: ret
+1:
+#ifdef CONFIG_KAISER
+ /*
+ * We might have come in between a swapgs and a SWITCH_KERNEL_CR3
+ * on entry, or between a SWITCH_USER_CR3 and a swapgs on exit.
+ * Do a conditional SWITCH_KERNEL_CR3: this could safely be done
+ * unconditionally, but we need to find out whether the reverse
+ * should be done on return (conveyed to paranoid_exit in %ebx).
+ */
+ movq %cr3, %rax
+ testl $KAISER_SHADOW_PGD_OFFSET, %eax
+ jz 2f
+ orl $2, %ebx
+ andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
+ orq x86_cr3_pcid_noflush, %rax
+ movq %rax, %cr3
+2:
+#endif
+ ret
END(paranoid_entry)
/*
@@ -1052,20 +1073,25 @@ END(paranoid_entry)
* be complicated. Fortunately, we there's no good reason
* to try to handle preemption here.
*
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * On entry: ebx=0: needs swapgs but not SWITCH_USER_CR3
+ * ebx=1: needs neither swapgs nor SWITCH_USER_CR3
+ * ebx=2: needs both swapgs and SWITCH_USER_CR3
+ * ebx=3: needs SWITCH_USER_CR3 but not swapgs
*/
ENTRY(paranoid_exit)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF_DEBUG
- testl %ebx, %ebx /* swapgs needed? */
+ TRACE_IRQS_IRETQ_DEBUG
+#ifdef CONFIG_KAISER
+ testl $2, %ebx /* SWITCH_USER_CR3 needed? */
+ jz paranoid_exit_no_switch
+ SWITCH_USER_CR3
+paranoid_exit_no_switch:
+#endif
+ testl $1, %ebx /* swapgs needed? */
jnz paranoid_exit_no_swapgs
- TRACE_IRQS_IRETQ
- SWITCH_USER_CR3_NO_STACK
SWAPGS_UNSAFE_STACK
- jmp paranoid_exit_restore
paranoid_exit_no_swapgs:
- TRACE_IRQS_IRETQ_DEBUG
-paranoid_exit_restore:
RESTORE_EXTRA_REGS
RESTORE_C_REGS
REMOVE_PT_GPREGS_FROM_STACK 8
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -63,20 +63,12 @@ _SWITCH_TO_KERNEL_CR3 %rax
movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
.endm
-.macro SWITCH_USER_CR3_NO_STACK
-movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
-_SWITCH_TO_USER_CR3 %rax %al
-movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
-.endm
-
#else /* CONFIG_KAISER */
.macro SWITCH_KERNEL_CR3 reg
.endm
.macro SWITCH_USER_CR3 reg regb
.endm
-.macro SWITCH_USER_CR3_NO_STACK
-.endm
.macro SWITCH_KERNEL_CR3_NO_STACK
.endm
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sat, 9 Sep 2017 17:31:18 -0700
Subject: kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
From: Hugh Dickins <hughd(a)google.com>
There's a 0x1000 in various places, which looks better with a name.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 4 ++--
arch/x86/include/asm/kaiser.h | 7 +++++--
2 files changed, 7 insertions(+), 4 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1292,7 +1292,7 @@ ENTRY(nmi)
movq %cr3, %rax
pushq %rax
#ifdef CONFIG_KAISER_REAL_SWITCH
- andq $(~0x1000), %rax
+ andq $(~KAISER_SHADOW_PGD_OFFSET), %rax
#endif
movq %rax, %cr3
#endif
@@ -1535,7 +1535,7 @@ end_repeat_nmi:
movq %cr3, %rax
pushq %rax
#ifdef CONFIG_KAISER_REAL_SWITCH
- andq $(~0x1000), %rax
+ andq $(~KAISER_SHADOW_PGD_OFFSET), %rax
#endif
movq %rax, %cr3
#endif
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -13,13 +13,16 @@
* A minimalistic kernel mapping holds the parts needed to be mapped in user
* mode, such as the entry/exit functions of the user space, or the stacks.
*/
+
+#define KAISER_SHADOW_PGD_OFFSET 0x1000
+
#ifdef __ASSEMBLY__
#ifdef CONFIG_KAISER
.macro _SWITCH_TO_KERNEL_CR3 reg
movq %cr3, \reg
#ifdef CONFIG_KAISER_REAL_SWITCH
-andq $(~0x1000), \reg
+andq $(~KAISER_SHADOW_PGD_OFFSET), \reg
#endif
movq \reg, %cr3
.endm
@@ -27,7 +30,7 @@ movq \reg, %cr3
.macro _SWITCH_TO_USER_CR3 reg
movq %cr3, \reg
#ifdef CONFIG_KAISER_REAL_SWITCH
-orq $(0x1000), \reg
+orq $(KAISER_SHADOW_PGD_OFFSET), \reg
#endif
movq \reg, %cr3
.endm
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: merged update
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-merged-update.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Wed, 30 Aug 2017 16:23:00 -0700
Subject: kaiser: merged update
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 106 ++++++++++-
arch/x86/include/asm/kaiser.h | 43 ++--
arch/x86/include/asm/pgtable.h | 18 +
arch/x86/include/asm/pgtable_64.h | 48 ++++-
arch/x86/include/asm/pgtable_types.h | 6
arch/x86/kernel/espfix_64.c | 13 -
arch/x86/kernel/head_64.S | 19 +-
arch/x86/kernel/ldt.c | 27 ++
arch/x86/kernel/tracepoint.c | 2
arch/x86/mm/kaiser.c | 318 +++++++++++++++++++++++++----------
arch/x86/mm/pageattr.c | 63 +++++-
arch/x86/mm/pgtable.c | 40 +---
include/linux/kaiser.h | 26 ++
kernel/fork.c | 9
security/Kconfig | 5
15 files changed, 553 insertions(+), 190 deletions(-)
create mode 100644 include/linux/kaiser.h
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -212,6 +212,13 @@ entry_SYSCALL_64_fastpath:
movq RIP(%rsp), %rcx
movq EFLAGS(%rsp), %r11
RESTORE_C_REGS_EXCEPT_RCX_R11
+ /*
+ * This opens a window where we have a user CR3, but are
+ * running in the kernel. This makes using the CS
+ * register useless for telling whether or not we need to
+ * switch CR3 in NMIs. Normal interrupts are OK because
+ * they are off here.
+ */
SWITCH_USER_CR3
movq RSP(%rsp), %rsp
/*
@@ -350,11 +357,25 @@ GLOBAL(int_ret_from_sys_call)
syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
RESTORE_C_REGS_EXCEPT_RCX_R11
+ /*
+ * This opens a window where we have a user CR3, but are
+ * running in the kernel. This makes using the CS
+ * register useless for telling whether or not we need to
+ * switch CR3 in NMIs. Normal interrupts are OK because
+ * they are off here.
+ */
SWITCH_USER_CR3
movq RSP(%rsp), %rsp
USERGS_SYSRET64
opportunistic_sysret_failed:
+ /*
+ * This opens a window where we have a user CR3, but are
+ * running in the kernel. This makes using the CS
+ * register useless for telling whether or not we need to
+ * switch CR3 in NMIs. Normal interrupts are OK because
+ * they are off here.
+ */
SWITCH_USER_CR3
SWAPGS
jmp restore_c_regs_and_iret
@@ -1059,6 +1080,13 @@ ENTRY(error_entry)
cld
SAVE_C_REGS 8
SAVE_EXTRA_REGS 8
+ /*
+ * error_entry() always returns with a kernel gsbase and
+ * CR3. We must also have a kernel CR3/gsbase before
+ * calling TRACE_IRQS_*. Just unconditionally switch to
+ * the kernel CR3 here.
+ */
+ SWITCH_KERNEL_CR3
xorl %ebx, %ebx
testb $3, CS+8(%rsp)
jz .Lerror_kernelspace
@@ -1069,7 +1097,6 @@ ENTRY(error_entry)
* from user mode due to an IRET fault.
*/
SWAPGS
- SWITCH_KERNEL_CR3
.Lerror_entry_from_usermode_after_swapgs:
/*
@@ -1122,7 +1149,7 @@ ENTRY(error_entry)
* Switch to kernel gsbase:
*/
SWAPGS
- SWITCH_KERNEL_CR3
+
/*
* Pretend that the exception came from user mode: set up pt_regs
* as if we faulted immediately after IRET and clear EBX so that
@@ -1222,7 +1249,10 @@ ENTRY(nmi)
*/
SWAPGS_UNSAFE_STACK
- SWITCH_KERNEL_CR3_NO_STACK
+ /*
+ * percpu variables are mapped with user CR3, so no need
+ * to switch CR3 here.
+ */
cld
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@@ -1256,14 +1286,33 @@ ENTRY(nmi)
movq %rsp, %rdi
movq $-1, %rsi
+#ifdef CONFIG_KAISER
+ /* Unconditionally use kernel CR3 for do_nmi() */
+ /* %rax is saved above, so OK to clobber here */
+ movq %cr3, %rax
+ pushq %rax
+#ifdef CONFIG_KAISER_REAL_SWITCH
+ andq $(~0x1000), %rax
+#endif
+ movq %rax, %cr3
+#endif
call do_nmi
+ /*
+ * Unconditionally restore CR3. I know we return to
+ * kernel code that needs user CR3, but do we ever return
+ * to "user mode" where we need the kernel CR3?
+ */
+#ifdef CONFIG_KAISER
+ popq %rax
+ mov %rax, %cr3
+#endif
/*
* Return back to user mode. We must *not* do the normal exit
- * work, because we don't want to enable interrupts. Fortunately,
- * do_nmi doesn't modify pt_regs.
+ * work, because we don't want to enable interrupts. Do not
+ * switch to user CR3: we might be going back to kernel code
+ * that had a user CR3 set.
*/
- SWITCH_USER_CR3
SWAPGS
jmp restore_c_regs_and_iret
@@ -1459,23 +1508,54 @@ end_repeat_nmi:
ALLOC_PT_GPREGS_ON_STACK
/*
- * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
- * as we should not be calling schedule in NMI context.
- * Even with normal interrupts enabled. An NMI should not be
- * setting NEED_RESCHED or anything that normal interrupts and
- * exceptions might do.
+ * Use the same approach as paranoid_entry to handle SWAPGS, but
+ * without CR3 handling since we do that differently in NMIs. No
+ * need to use paranoid_exit as we should not be calling schedule
+ * in NMI context. Even with normal interrupts enabled. An NMI
+ * should not be setting NEED_RESCHED or anything that normal
+ * interrupts and exceptions might do.
*/
- call paranoid_entry
+ cld
+ SAVE_C_REGS
+ SAVE_EXTRA_REGS
+ movl $1, %ebx
+ movl $MSR_GS_BASE, %ecx
+ rdmsr
+ testl %edx, %edx
+ js 1f /* negative -> in kernel */
+ SWAPGS
+ xorl %ebx, %ebx
+1:
+#ifdef CONFIG_KAISER
+ /* Unconditionally use kernel CR3 for do_nmi() */
+ /* %rax is saved above, so OK to clobber here */
+ movq %cr3, %rax
+ pushq %rax
+#ifdef CONFIG_KAISER_REAL_SWITCH
+ andq $(~0x1000), %rax
+#endif
+ movq %rax, %cr3
+#endif
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
movq %rsp, %rdi
+ addq $8, %rdi /* point %rdi at ptregs, fixed up for CR3 */
movq $-1, %rsi
call do_nmi
+ /*
+ * Unconditionally restore CR3. We might be returning to
+ * kernel code that needs user CR3, like just just before
+ * a sysret.
+ */
+#ifdef CONFIG_KAISER
+ popq %rax
+ mov %rax, %cr3
+#endif
testl %ebx, %ebx /* swapgs needed? */
jnz nmi_restore
nmi_swapgs:
- SWITCH_USER_CR3_NO_STACK
+ /* We fixed up CR3 above, so no need to switch it here */
SWAPGS_UNSAFE_STACK
nmi_restore:
RESTORE_EXTRA_REGS
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -16,13 +16,17 @@
.macro _SWITCH_TO_KERNEL_CR3 reg
movq %cr3, \reg
+#ifdef CONFIG_KAISER_REAL_SWITCH
andq $(~0x1000), \reg
+#endif
movq \reg, %cr3
.endm
.macro _SWITCH_TO_USER_CR3 reg
movq %cr3, \reg
+#ifdef CONFIG_KAISER_REAL_SWITCH
orq $(0x1000), \reg
+#endif
movq \reg, %cr3
.endm
@@ -65,48 +69,53 @@ movq PER_CPU_VAR(unsafe_stack_register_b
.endm
#endif /* CONFIG_KAISER */
+
#else /* __ASSEMBLY__ */
#ifdef CONFIG_KAISER
-// Upon kernel/user mode switch, it may happen that
-// the address space has to be switched before the registers have been stored.
-// To change the address space, another register is needed.
-// A register therefore has to be stored/restored.
-//
-DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+/*
+ * Upon kernel/user mode switch, it may happen that the address
+ * space has to be switched before the registers have been
+ * stored. To change the address space, another register is
+ * needed. A register therefore has to be stored/restored.
+*/
-#endif /* CONFIG_KAISER */
+DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
/**
- * shadowmem_add_mapping - map a virtual memory part to the shadow mapping
+ * kaiser_add_mapping - map a virtual memory part to the shadow (user) mapping
* @addr: the start address of the range
* @size: the size of the range
* @flags: The mapping flags of the pages
*
- * the mapping is done on a global scope, so no bigger synchronization has to be done.
- * the pages have to be manually unmapped again when they are not needed any longer.
+ * The mapping is done on a global scope, so no bigger
+ * synchronization has to be done. the pages have to be
+ * manually unmapped again when they are not needed any longer.
*/
-extern void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
+extern int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
/**
- * shadowmem_remove_mapping - unmap a virtual memory part of the shadow mapping
+ * kaiser_remove_mapping - unmap a virtual memory part of the shadow mapping
* @addr: the start address of the range
* @size: the size of the range
*/
extern void kaiser_remove_mapping(unsigned long start, unsigned long size);
/**
- * shadowmem_initialize_mapping - Initalize the shadow mapping
+ * kaiser_initialize_mapping - Initalize the shadow mapping
*
- * most parts of the shadow mapping can be mapped upon boot time.
- * only the thread stacks have to be mapped on runtime.
- * the mapped regions are not unmapped at all.
+ * Most parts of the shadow mapping can be mapped upon boot
+ * time. Only per-process things like the thread stacks
+ * or a new LDT have to be mapped at runtime. These boot-
+ * time mappings are permanent and nevertunmapped.
*/
extern void kaiser_init(void);
-#endif
+#endif /* CONFIG_KAISER */
+
+#endif /* __ASSEMBLY */
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -653,7 +653,17 @@ static inline pud_t *pud_offset(pgd_t *p
static inline int pgd_bad(pgd_t pgd)
{
- return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE;
+ pgdval_t ignore_flags = _PAGE_USER;
+ /*
+ * We set NX on KAISER pgds that map userspace memory so
+ * that userspace can not meaningfully use the kernel
+ * page table by accident; it will fault on the first
+ * instruction it tries to run. See native_set_pgd().
+ */
+ if (IS_ENABLED(CONFIG_KAISER))
+ ignore_flags |= _PAGE_NX;
+
+ return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE;
}
static inline int pgd_none(pgd_t pgd)
@@ -857,8 +867,10 @@ static inline void clone_pgd_range(pgd_t
{
memcpy(dst, src, count * sizeof(pgd_t));
#ifdef CONFIG_KAISER
- // clone the shadow pgd part as well
- memcpy(native_get_shadow_pgd(dst), native_get_shadow_pgd(src), count * sizeof(pgd_t));
+ /* Clone the shadow pgd part as well */
+ memcpy(native_get_shadow_pgd(dst),
+ native_get_shadow_pgd(src),
+ count * sizeof(pgd_t));
#endif
}
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -107,26 +107,58 @@ static inline void native_pud_clear(pud_
}
#ifdef CONFIG_KAISER
-static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp) {
+static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp)
+{
return (pgd_t *)(void*)((unsigned long)(void*)pgdp | (unsigned long)PAGE_SIZE);
}
-static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp) {
+static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp)
+{
return (pgd_t *)(void*)((unsigned long)(void*)pgdp & ~(unsigned long)PAGE_SIZE);
}
+#else
+static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp)
+{
+ BUILD_BUG_ON(1);
+ return NULL;
+}
+static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp)
+{
+ return pgdp;
+}
#endif /* CONFIG_KAISER */
+/*
+ * Page table pages are page-aligned. The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ * This returns true for user pages that need to get copied into
+ * both the user and kernel copies of the page tables, and false
+ * for kernel pages that should only be in the kernel copy.
+ */
+static inline bool is_userspace_pgd(void *__ptr)
+{
+ unsigned long ptr = (unsigned long)__ptr;
+
+ return ((ptr % PAGE_SIZE) < (PAGE_SIZE / 2));
+}
+
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
#ifdef CONFIG_KAISER
- // We know that a pgd is page aligned.
- // Therefore the lower indices have to be mapped to user space.
- // These pages are mapped to the shadow mapping.
- if ((((unsigned long)pgdp) % PAGE_SIZE) < (PAGE_SIZE / 2)) {
+ pteval_t extra_kern_pgd_flags = 0;
+ /* Do we need to also populate the shadow pgd? */
+ if (is_userspace_pgd(pgdp)) {
native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+ /*
+ * Even if the entry is *mapping* userspace, ensure
+ * that userspace can not use it. This way, if we
+ * get out to userspace running on the kernel CR3,
+ * userspace will crash instead of running.
+ */
+ extra_kern_pgd_flags = _PAGE_NX;
}
-
- pgdp->pgd = pgd.pgd & ~_PAGE_USER;
+ pgdp->pgd = pgd.pgd;
+ pgdp->pgd |= extra_kern_pgd_flags;
#else /* CONFIG_KAISER */
*pgdp = pgd;
#endif
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -42,7 +42,7 @@
#ifdef CONFIG_KAISER
#define _PAGE_GLOBAL (_AT(pteval_t, 0))
#else
-#define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
+#define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
#endif
#define _PAGE_SOFTW1 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
#define _PAGE_SOFTW2 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
@@ -93,11 +93,7 @@
#define _PAGE_NX (_AT(pteval_t, 0))
#endif
-#ifdef CONFIG_KAISER
-#define _PAGE_PROTNONE (_AT(pteval_t, 0))
-#else
#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
-#endif
#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
_PAGE_ACCESSED | _PAGE_DIRTY)
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -127,11 +127,14 @@ void __init init_espfix_bsp(void)
/* Install the espfix pud into the kernel page directory */
pgd_p = &init_level4_pgt[pgd_index(ESPFIX_BASE_ADDR)];
pgd_populate(&init_mm, pgd_p, (pud_t *)espfix_pud_page);
-#ifdef CONFIG_KAISER
- // add the esp stack pud to the shadow mapping here.
- // This can be done directly, because the fixup stack has its own pud
- set_pgd(native_get_shadow_pgd(pgd_p), __pgd(_PAGE_TABLE | __pa((pud_t *)espfix_pud_page)));
-#endif
+ /*
+ * Just copy the top-level PGD that is mapping the espfix
+ * area to ensure it is mapped into the shadow user page
+ * tables.
+ */
+ if (IS_ENABLED(CONFIG_KAISER))
+ set_pgd(native_get_shadow_pgd(pgd_p),
+ __pgd(_KERNPG_TABLE | __pa((pud_t *)espfix_pud_page)));
/* Randomize the locations */
init_espfix_random();
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -442,11 +442,24 @@ early_idt_ripmsg:
GLOBAL(name)
#ifdef CONFIG_KAISER
+/*
+ * Each PGD needs to be 8k long and 8k aligned. We do not
+ * ever go out to userspace with these, so we do not
+ * strictly *need* the second page, but this allows us to
+ * have a single set_pgd() implementation that does not
+ * need to worry about whether it has 4k or 8k to work
+ * with.
+ *
+ * This ensures PGDs are 8k long:
+ */
+#define KAISER_USER_PGD_FILL 512
+/* This ensures they are 8k-aligned: */
#define NEXT_PGD_PAGE(name) \
.balign 2 * PAGE_SIZE; \
GLOBAL(name)
#else
#define NEXT_PGD_PAGE(name) NEXT_PAGE(name)
+#define KAISER_USER_PGD_FILL 0
#endif
/* Automate the creation of 1 to 1 mapping pmd entries */
@@ -461,6 +474,7 @@ GLOBAL(name)
NEXT_PGD_PAGE(early_level4_pgt)
.fill 511,8,0
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .fill KAISER_USER_PGD_FILL,8,0
NEXT_PAGE(early_dynamic_pgts)
.fill 512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -469,7 +483,8 @@ NEXT_PAGE(early_dynamic_pgts)
#ifndef CONFIG_XEN
NEXT_PGD_PAGE(init_level4_pgt)
- .fill 2*512,8,0
+ .fill 512,8,0
+ .fill KAISER_USER_PGD_FILL,8,0
#else
NEXT_PGD_PAGE(init_level4_pgt)
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
@@ -478,6 +493,7 @@ NEXT_PGD_PAGE(init_level4_pgt)
.org init_level4_pgt + L4_START_KERNEL*8, 0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
.quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .fill KAISER_USER_PGD_FILL,8,0
NEXT_PAGE(level3_ident_pgt)
.quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
@@ -488,6 +504,7 @@ NEXT_PAGE(level2_ident_pgt)
*/
PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
#endif
+ .fill KAISER_USER_PGD_FILL,8,0
NEXT_PAGE(level3_kernel_pgt)
.fill L3_START_KERNEL,8,0
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -18,6 +18,7 @@
#include <linux/uaccess.h>
#include <asm/ldt.h>
+#include <asm/kaiser.h>
#include <asm/desc.h>
#include <asm/mmu_context.h>
#include <asm/syscalls.h>
@@ -34,11 +35,21 @@ static void flush_ldt(void *current_mm)
set_ldt(pc->ldt->entries, pc->ldt->size);
}
+static void __free_ldt_struct(struct ldt_struct *ldt)
+{
+ if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
+ vfree(ldt->entries);
+ else
+ free_page((unsigned long)ldt->entries);
+ kfree(ldt);
+}
+
/* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
static struct ldt_struct *alloc_ldt_struct(int size)
{
struct ldt_struct *new_ldt;
int alloc_size;
+ int ret = 0;
if (size > LDT_ENTRIES)
return NULL;
@@ -66,6 +77,14 @@ static struct ldt_struct *alloc_ldt_stru
return NULL;
}
+ // FIXME: make kaiser_add_mapping() return an error code
+ // when it fails
+ kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
+ __PAGE_KERNEL);
+ if (ret) {
+ __free_ldt_struct(new_ldt);
+ return NULL;
+ }
new_ldt->size = size;
return new_ldt;
}
@@ -92,12 +111,10 @@ static void free_ldt_struct(struct ldt_s
if (likely(!ldt))
return;
+ kaiser_remove_mapping((unsigned long)ldt->entries,
+ ldt->size * LDT_ENTRY_SIZE);
paravirt_free_ldt(ldt->entries, ldt->size);
- if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
- vfree(ldt->entries);
- else
- free_page((unsigned long)ldt->entries);
- kfree(ldt);
+ __free_ldt_struct(ldt);
}
/*
--- a/arch/x86/kernel/tracepoint.c
+++ b/arch/x86/kernel/tracepoint.c
@@ -9,10 +9,12 @@
#include <linux/atomic.h>
atomic_t trace_idt_ctr = ATOMIC_INIT(0);
+__aligned(PAGE_SIZE)
struct desc_ptr trace_idt_descr = { NR_VECTORS * 16 - 1,
(unsigned long) trace_idt_table };
/* No need to be aligned, but done to keep all IDTs defined the same way. */
+__aligned(PAGE_SIZE)
gate_desc trace_idt_table[NR_VECTORS] __page_aligned_bss;
static int trace_irq_vector_refcount;
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -1,160 +1,306 @@
-
-
+#include <linux/bug.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/bug.h>
#include <linux/init.h>
+#include <linux/interrupt.h>
#include <linux/spinlock.h>
#include <linux/mm.h>
-
#include <linux/uaccess.h>
+#include <linux/ftrace.h>
+
+#include <asm/kaiser.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/desc.h>
#ifdef CONFIG_KAISER
__visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+/*
+ * At runtime, the only things we map are some things for CPU
+ * hotplug, and stacks for new processes. No two CPUs will ever
+ * be populating the same addresses, so we only need to ensure
+ * that we protect between two CPUs trying to allocate and
+ * populate the same page table page.
+ *
+ * Only take this lock when doing a set_p[4um]d(), but it is not
+ * needed for doing a set_pte(). We assume that only the *owner*
+ * of a given allocation will be doing this for _their_
+ * allocation.
+ *
+ * This ensures that once a system has been running for a while
+ * and there have been stacks all over and these page tables
+ * are fully populated, there will be no further acquisitions of
+ * this lock.
+ */
+static DEFINE_SPINLOCK(shadow_table_allocation_lock);
-/**
- * Get the real ppn from a address in kernel mapping.
- * @param address The virtual adrress
- * @return the physical address
+/*
+ * Returns -1 on error.
*/
-static inline unsigned long get_pa_from_mapping (unsigned long address)
+static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
{
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
- pgd = pgd_offset_k(address);
- BUG_ON(pgd_none(*pgd) || pgd_large(*pgd));
+ pgd = pgd_offset_k(vaddr);
+ /*
+ * We made all the kernel PGDs present in kaiser_init().
+ * We expect them to stay that way.
+ */
+ BUG_ON(pgd_none(*pgd));
+ /*
+ * PGDs are either 512GB or 128TB on all x86_64
+ * configurations. We don't handle these.
+ */
+ BUG_ON(pgd_large(*pgd));
+
+ pud = pud_offset(pgd, vaddr);
+ if (pud_none(*pud)) {
+ WARN_ON_ONCE(1);
+ return -1;
+ }
- pud = pud_offset(pgd, address);
- BUG_ON(pud_none(*pud));
+ if (pud_large(*pud))
+ return (pud_pfn(*pud) << PAGE_SHIFT) | (vaddr & ~PUD_PAGE_MASK);
- if (pud_large(*pud)) {
- return (pud_pfn(*pud) << PAGE_SHIFT) | (address & ~PUD_PAGE_MASK);
+ pmd = pmd_offset(pud, vaddr);
+ if (pmd_none(*pmd)) {
+ WARN_ON_ONCE(1);
+ return -1;
}
- pmd = pmd_offset(pud, address);
- BUG_ON(pmd_none(*pmd));
+ if (pmd_large(*pmd))
+ return (pmd_pfn(*pmd) << PAGE_SHIFT) | (vaddr & ~PMD_PAGE_MASK);
- if (pmd_large(*pmd)) {
- return (pmd_pfn(*pmd) << PAGE_SHIFT) | (address & ~PMD_PAGE_MASK);
+ pte = pte_offset_kernel(pmd, vaddr);
+ if (pte_none(*pte)) {
+ WARN_ON_ONCE(1);
+ return -1;
}
- pte = pte_offset_kernel(pmd, address);
- BUG_ON(pte_none(*pte));
-
- return (pte_pfn(*pte) << PAGE_SHIFT) | (address & ~PAGE_MASK);
+ return (pte_pfn(*pte) << PAGE_SHIFT) | (vaddr & ~PAGE_MASK);
}
-void _kaiser_copy (unsigned long start_addr, unsigned long size,
- unsigned long flags)
+/*
+ * This is a relatively normal page table walk, except that it
+ * also tries to allocate page tables pages along the way.
+ *
+ * Returns a pointer to a PTE on success, or NULL on failure.
+ */
+static pte_t *kaiser_pagetable_walk(unsigned long address, bool is_atomic)
{
- pgd_t *pgd;
- pud_t *pud;
pmd_t *pmd;
- pte_t *pte;
- unsigned long address;
- unsigned long end_addr = start_addr + size;
- unsigned long target_address;
+ pud_t *pud;
+ pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
+ gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
- for (address = PAGE_ALIGN(start_addr - (PAGE_SIZE - 1));
- address < PAGE_ALIGN(end_addr); address += PAGE_SIZE) {
- target_address = get_pa_from_mapping(address);
+ might_sleep();
+ if (is_atomic) {
+ gfp &= ~GFP_KERNEL;
+ gfp |= __GFP_HIGH | __GFP_ATOMIC;
+ }
- pgd = native_get_shadow_pgd(pgd_offset_k(address));
+ if (pgd_none(*pgd)) {
+ WARN_ONCE(1, "All shadow pgds should have been populated");
+ return NULL;
+ }
+ BUILD_BUG_ON(pgd_large(*pgd) != 0);
- BUG_ON(pgd_none(*pgd) && "All shadow pgds should be mapped at this time\n");
- BUG_ON(pgd_large(*pgd));
+ pud = pud_offset(pgd, address);
+ /* The shadow page tables do not use large mappings: */
+ if (pud_large(*pud)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ if (pud_none(*pud)) {
+ unsigned long new_pmd_page = __get_free_page(gfp);
+ if (!new_pmd_page)
+ return NULL;
+ spin_lock(&shadow_table_allocation_lock);
+ if (pud_none(*pud))
+ set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
+ else
+ free_page(new_pmd_page);
+ spin_unlock(&shadow_table_allocation_lock);
+ }
- pud = pud_offset(pgd, address);
- if (pud_none(*pud)) {
- set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd_alloc_one(0, address))));
- }
- BUG_ON(pud_large(*pud));
+ pmd = pmd_offset(pud, address);
+ /* The shadow page tables do not use large mappings: */
+ if (pmd_large(*pmd)) {
+ WARN_ON(1);
+ return NULL;
+ }
+ if (pmd_none(*pmd)) {
+ unsigned long new_pte_page = __get_free_page(gfp);
+ if (!new_pte_page)
+ return NULL;
+ spin_lock(&shadow_table_allocation_lock);
+ if (pmd_none(*pmd))
+ set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
+ else
+ free_page(new_pte_page);
+ spin_unlock(&shadow_table_allocation_lock);
+ }
- pmd = pmd_offset(pud, address);
- if (pmd_none(*pmd)) {
- set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte_alloc_one_kernel(0, address))));
- }
- BUG_ON(pmd_large(*pmd));
+ return pte_offset_kernel(pmd, address);
+}
- pte = pte_offset_kernel(pmd, address);
+int kaiser_add_user_map(const void *__start_addr, unsigned long size,
+ unsigned long flags)
+{
+ int ret = 0;
+ pte_t *pte;
+ unsigned long start_addr = (unsigned long )__start_addr;
+ unsigned long address = start_addr & PAGE_MASK;
+ unsigned long end_addr = PAGE_ALIGN(start_addr + size);
+ unsigned long target_address;
+
+ for (;address < end_addr; address += PAGE_SIZE) {
+ target_address = get_pa_from_mapping(address);
+ if (target_address == -1) {
+ ret = -EIO;
+ break;
+ }
+ pte = kaiser_pagetable_walk(address, false);
if (pte_none(*pte)) {
set_pte(pte, __pte(flags | target_address));
} else {
- BUG_ON(__pa(pte_page(*pte)) != target_address);
+ pte_t tmp;
+ set_pte(&tmp, __pte(flags | target_address));
+ WARN_ON_ONCE(!pte_same(*pte, tmp));
}
}
+ return ret;
}
-// at first, add a pmd for every pgd entry in the shadowmem-kernel-part of the kernel mapping
-static inline void __init _kaiser_init(void)
+static int kaiser_add_user_map_ptrs(const void *start, const void *end, unsigned long flags)
+{
+ unsigned long size = end - start;
+
+ return kaiser_add_user_map(start, size, flags);
+}
+
+/*
+ * Ensure that the top level of the (shadow) page tables are
+ * entirely populated. This ensures that all processes that get
+ * forked have the same entries. This way, we do not have to
+ * ever go set up new entries in older processes.
+ *
+ * Note: we never free these, so there are no updates to them
+ * after this.
+ */
+static void __init kaiser_init_all_pgds(void)
{
pgd_t *pgd;
int i = 0;
pgd = native_get_shadow_pgd(pgd_offset_k((unsigned long )0));
for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
- set_pgd(pgd + i, __pgd(_PAGE_TABLE |__pa(pud_alloc_one(0, 0))));
+ pgd_t new_pgd;
+ pud_t *pud = pud_alloc_one(&init_mm, PAGE_OFFSET + i * PGDIR_SIZE);
+ if (!pud) {
+ WARN_ON(1);
+ break;
+ }
+ new_pgd = __pgd(_KERNPG_TABLE |__pa(pud));
+ /*
+ * Make sure not to stomp on some other pgd entry.
+ */
+ if (!pgd_none(pgd[i])) {
+ WARN_ON(1);
+ continue;
+ }
+ set_pgd(pgd + i, new_pgd);
}
}
+#define kaiser_add_user_map_early(start, size, flags) do { \
+ int __ret = kaiser_add_user_map(start, size, flags); \
+ WARN_ON(__ret); \
+} while (0)
+
+#define kaiser_add_user_map_ptrs_early(start, end, flags) do { \
+ int __ret = kaiser_add_user_map_ptrs(start, end, flags); \
+ WARN_ON(__ret); \
+} while (0)
+
extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
-spinlock_t shadow_table_lock;
+/*
+ * If anything in here fails, we will likely die on one of the
+ * first kernel->user transitions and init will die. But, we
+ * will have most of the kernel up by then and should be able to
+ * get a clean warning out of it. If we BUG_ON() here, we run
+ * the risk of being before we have good console output.
+ */
void __init kaiser_init(void)
{
int cpu;
- spin_lock_init(&shadow_table_lock);
-
- spin_lock(&shadow_table_lock);
- _kaiser_init();
+ kaiser_init_all_pgds();
for_each_possible_cpu(cpu) {
- // map the per cpu user variables
- _kaiser_copy(
- (unsigned long) (__per_cpu_user_mapped_start + per_cpu_offset(cpu)),
- (unsigned long) __per_cpu_user_mapped_end - (unsigned long) __per_cpu_user_mapped_start,
- __PAGE_KERNEL);
- }
-
- // map the entry/exit text section, which is responsible to switch between user- and kernel mode
- _kaiser_copy(
- (unsigned long) __entry_text_start,
- (unsigned long) __entry_text_end - (unsigned long) __entry_text_start,
- __PAGE_KERNEL_RX);
-
- // the fixed map address of the idt_table
- _kaiser_copy(
- (unsigned long) idt_descr.address,
- sizeof(gate_desc) * NR_VECTORS,
- __PAGE_KERNEL_RO);
+ void *percpu_vaddr = __per_cpu_user_mapped_start +
+ per_cpu_offset(cpu);
+ unsigned long percpu_sz = __per_cpu_user_mapped_end -
+ __per_cpu_user_mapped_start;
+ kaiser_add_user_map_early(percpu_vaddr, percpu_sz,
+ __PAGE_KERNEL);
+ }
- spin_unlock(&shadow_table_lock);
+ /*
+ * Map the entry/exit text section, which is needed at
+ * switches from user to and from kernel.
+ */
+ kaiser_add_user_map_ptrs_early(__entry_text_start, __entry_text_end,
+ __PAGE_KERNEL_RX);
+
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
+ kaiser_add_user_map_ptrs_early(__irqentry_text_start,
+ __irqentry_text_end,
+ __PAGE_KERNEL_RX);
+#endif
+ kaiser_add_user_map_early((void *)idt_descr.address,
+ sizeof(gate_desc) * NR_VECTORS,
+ __PAGE_KERNEL_RO);
+#ifdef CONFIG_TRACING
+ kaiser_add_user_map_early(&trace_idt_descr,
+ sizeof(trace_idt_descr),
+ __PAGE_KERNEL);
+ kaiser_add_user_map_early(&trace_idt_table,
+ sizeof(gate_desc) * NR_VECTORS,
+ __PAGE_KERNEL);
+#endif
+ kaiser_add_user_map_early(&debug_idt_descr, sizeof(debug_idt_descr),
+ __PAGE_KERNEL);
+ kaiser_add_user_map_early(&debug_idt_table,
+ sizeof(gate_desc) * NR_VECTORS,
+ __PAGE_KERNEL);
}
+extern void unmap_pud_range_nofree(pgd_t *pgd, unsigned long start, unsigned long end);
// add a mapping to the shadow-mapping, and synchronize the mappings
-void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
{
- spin_lock(&shadow_table_lock);
- _kaiser_copy(addr, size, flags);
- spin_unlock(&shadow_table_lock);
+ return kaiser_add_user_map((const void *)addr, size, flags);
}
-extern void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end);
void kaiser_remove_mapping(unsigned long start, unsigned long size)
{
- pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(start));
- spin_lock(&shadow_table_lock);
- do {
- unmap_pud_range(pgd, start, start + size);
- } while (pgd++ != native_get_shadow_pgd(pgd_offset_k(start + size)));
- spin_unlock(&shadow_table_lock);
+ unsigned long end = start + size;
+ unsigned long addr;
+
+ for (addr = start; addr < end; addr += PGDIR_SIZE) {
+ pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(addr));
+ /*
+ * unmap_p4d_range() handles > P4D_SIZE unmaps,
+ * so no need to trim 'end'.
+ */
+ unmap_pud_range_nofree(pgd, addr, end);
+ }
}
#endif /* CONFIG_KAISER */
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -52,6 +52,7 @@ static DEFINE_SPINLOCK(cpa_lock);
#define CPA_FLUSHTLB 1
#define CPA_ARRAY 2
#define CPA_PAGES_ARRAY 4
+#define CPA_FREE_PAGETABLES 8
#ifdef CONFIG_PROC_FS
static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -723,10 +724,13 @@ static int split_large_page(struct cpa_d
return 0;
}
-static bool try_to_free_pte_page(pte_t *pte)
+static bool try_to_free_pte_page(struct cpa_data *cpa, pte_t *pte)
{
int i;
+ if (!(cpa->flags & CPA_FREE_PAGETABLES))
+ return false;
+
for (i = 0; i < PTRS_PER_PTE; i++)
if (!pte_none(pte[i]))
return false;
@@ -735,10 +739,13 @@ static bool try_to_free_pte_page(pte_t *
return true;
}
-static bool try_to_free_pmd_page(pmd_t *pmd)
+static bool try_to_free_pmd_page(struct cpa_data *cpa, pmd_t *pmd)
{
int i;
+ if (!(cpa->flags & CPA_FREE_PAGETABLES))
+ return false;
+
for (i = 0; i < PTRS_PER_PMD; i++)
if (!pmd_none(pmd[i]))
return false;
@@ -759,7 +766,9 @@ static bool try_to_free_pud_page(pud_t *
return true;
}
-static bool unmap_pte_range(pmd_t *pmd, unsigned long start, unsigned long end)
+static bool unmap_pte_range(struct cpa_data *cpa, pmd_t *pmd,
+ unsigned long start,
+ unsigned long end)
{
pte_t *pte = pte_offset_kernel(pmd, start);
@@ -770,22 +779,23 @@ static bool unmap_pte_range(pmd_t *pmd,
pte++;
}
- if (try_to_free_pte_page((pte_t *)pmd_page_vaddr(*pmd))) {
+ if (try_to_free_pte_page(cpa, (pte_t *)pmd_page_vaddr(*pmd))) {
pmd_clear(pmd);
return true;
}
return false;
}
-static void __unmap_pmd_range(pud_t *pud, pmd_t *pmd,
+static void __unmap_pmd_range(struct cpa_data *cpa, pud_t *pud, pmd_t *pmd,
unsigned long start, unsigned long end)
{
- if (unmap_pte_range(pmd, start, end))
- if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+ if (unmap_pte_range(cpa, pmd, start, end))
+ if (try_to_free_pmd_page(cpa, (pmd_t *)pud_page_vaddr(*pud)))
pud_clear(pud);
}
-static void unmap_pmd_range(pud_t *pud, unsigned long start, unsigned long end)
+static void unmap_pmd_range(struct cpa_data *cpa, pud_t *pud,
+ unsigned long start, unsigned long end)
{
pmd_t *pmd = pmd_offset(pud, start);
@@ -796,7 +806,7 @@ static void unmap_pmd_range(pud_t *pud,
unsigned long next_page = (start + PMD_SIZE) & PMD_MASK;
unsigned long pre_end = min_t(unsigned long, end, next_page);
- __unmap_pmd_range(pud, pmd, start, pre_end);
+ __unmap_pmd_range(cpa, pud, pmd, start, pre_end);
start = pre_end;
pmd++;
@@ -809,7 +819,8 @@ static void unmap_pmd_range(pud_t *pud,
if (pmd_large(*pmd))
pmd_clear(pmd);
else
- __unmap_pmd_range(pud, pmd, start, start + PMD_SIZE);
+ __unmap_pmd_range(cpa, pud, pmd,
+ start, start + PMD_SIZE);
start += PMD_SIZE;
pmd++;
@@ -819,17 +830,19 @@ static void unmap_pmd_range(pud_t *pud,
* 4K leftovers?
*/
if (start < end)
- return __unmap_pmd_range(pud, pmd, start, end);
+ return __unmap_pmd_range(cpa, pud, pmd, start, end);
/*
* Try again to free the PMD page if haven't succeeded above.
*/
if (!pud_none(*pud))
- if (try_to_free_pmd_page((pmd_t *)pud_page_vaddr(*pud)))
+ if (try_to_free_pmd_page(cpa, (pmd_t *)pud_page_vaddr(*pud)))
pud_clear(pud);
}
-void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+static void __unmap_pud_range(struct cpa_data *cpa, pgd_t *pgd,
+ unsigned long start,
+ unsigned long end)
{
pud_t *pud = pud_offset(pgd, start);
@@ -840,7 +853,7 @@ void unmap_pud_range(pgd_t *pgd, unsigne
unsigned long next_page = (start + PUD_SIZE) & PUD_MASK;
unsigned long pre_end = min_t(unsigned long, end, next_page);
- unmap_pmd_range(pud, start, pre_end);
+ unmap_pmd_range(cpa, pud, start, pre_end);
start = pre_end;
pud++;
@@ -854,7 +867,7 @@ void unmap_pud_range(pgd_t *pgd, unsigne
if (pud_large(*pud))
pud_clear(pud);
else
- unmap_pmd_range(pud, start, start + PUD_SIZE);
+ unmap_pmd_range(cpa, pud, start, start + PUD_SIZE);
start += PUD_SIZE;
pud++;
@@ -864,7 +877,7 @@ void unmap_pud_range(pgd_t *pgd, unsigne
* 2M leftovers?
*/
if (start < end)
- unmap_pmd_range(pud, start, end);
+ unmap_pmd_range(cpa, pud, start, end);
/*
* No need to try to free the PUD page because we'll free it in
@@ -872,6 +885,24 @@ void unmap_pud_range(pgd_t *pgd, unsigne
*/
}
+static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
+{
+ struct cpa_data cpa = {
+ .flags = CPA_FREE_PAGETABLES,
+ };
+
+ __unmap_pud_range(&cpa, pgd, start, end);
+}
+
+void unmap_pud_range_nofree(pgd_t *pgd, unsigned long start, unsigned long end)
+{
+ struct cpa_data cpa = {
+ .flags = 0,
+ };
+
+ __unmap_pud_range(&cpa, pgd, start, end);
+}
+
static void unmap_pgd_range(pgd_t *root, unsigned long addr, unsigned long end)
{
pgd_t *pgd_entry = root + pgd_index(addr);
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -340,40 +340,26 @@ static inline void _pgd_free(pgd_t *pgd)
kmem_cache_free(pgd_cache, pgd);
}
#else
-static inline pgd_t *_pgd_alloc(void)
-{
-#ifdef CONFIG_KAISER
- // Instead of one PML4, we aquire two PML4s and, thus, an 8kb-aligned memory
- // block. Therefore, we have to allocate at least 3 pages. However, the
- // __get_free_pages returns us 4 pages. Hence, we store the base pointer at
- // the beginning of the page of our 8kb-aligned memory block in order to
- // correctly free it afterwars.
- unsigned long pages = __get_free_pages(PGALLOC_GFP, get_order(4*PAGE_SIZE));
-
- if(native_get_normal_pgd((pgd_t*) pages) == (pgd_t*) pages)
- {
- *((unsigned long*)(pages + 2 * PAGE_SIZE)) = pages;
- return (pgd_t *) pages;
- }
- else
- {
- *((unsigned long*)(pages + 3 * PAGE_SIZE)) = pages;
- return (pgd_t *) (pages + PAGE_SIZE);
- }
+#ifdef CONFIG_KAISER
+/*
+ * Instead of one pmd, we aquire two pmds. Being order-1, it is
+ * both 8k in size and 8k-aligned. That lets us just flip bit 12
+ * in a pointer to swap between the two 4k halves.
+ */
+#define PGD_ALLOCATION_ORDER 1
#else
- return (pgd_t *)__get_free_page(PGALLOC_GFP);
+#define PGD_ALLOCATION_ORDER 0
#endif
+
+static inline pgd_t *_pgd_alloc(void)
+{
+ return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER);
}
static inline void _pgd_free(pgd_t *pgd)
{
-#ifdef CONFIG_KAISER
- unsigned long pages = *((unsigned long*) ((char*) pgd + 2 * PAGE_SIZE));
- free_pages(pages, get_order(4*PAGE_SIZE));
-#else
- free_page((unsigned long)pgd);
-#endif
+ free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
}
#endif /* CONFIG_X86_PAE */
--- /dev/null
+++ b/include/linux/kaiser.h
@@ -0,0 +1,26 @@
+#ifndef _INCLUDE_KAISER_H
+#define _INCLUDE_KAISER_H
+
+#ifdef CONFIG_KAISER
+#include <asm/kaiser.h>
+#else
+
+/*
+ * These stubs are used whenever CONFIG_KAISER is off, which
+ * includes architectures that support KAISER, but have it
+ * disabled.
+ */
+
+static inline void kaiser_init(void)
+{
+}
+static inline void kaiser_remove_mapping(unsigned long start, unsigned long size)
+{
+}
+static inline int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
+{
+ return 0;
+}
+
+#endif /* !CONFIG_KAISER */
+#endif /* _INCLUDE_KAISER_H */
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -58,6 +58,7 @@
#include <linux/tsacct_kern.h>
#include <linux/cn_proc.h>
#include <linux/freezer.h>
+#include <linux/kaiser.h>
#include <linux/delayacct.h>
#include <linux/taskstats_kern.h>
#include <linux/random.h>
@@ -335,7 +336,6 @@ void set_task_stack_end_magic(struct tas
*stackend = STACK_END_MAGIC; /* for overflow detection */
}
-extern void kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags);
static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
{
struct task_struct *tsk;
@@ -357,9 +357,10 @@ static struct task_struct *dup_task_stru
goto free_ti;
tsk->stack = ti;
-#ifdef CONFIG_KAISER
- kaiser_add_mapping((unsigned long)tsk->stack, THREAD_SIZE, __PAGE_KERNEL);
-#endif
+
+ err= kaiser_add_mapping((unsigned long)tsk->stack, THREAD_SIZE, __PAGE_KERNEL);
+ if (err)
+ goto free_ti;
#ifdef CONFIG_SECCOMP
/*
* We must handle setting up seccomp filters once we're under
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -32,12 +32,17 @@ config SECURITY
If you are unsure how to answer this question, answer N.
config KAISER
bool "Remove the kernel mapping in user mode"
+ default y
depends on X86_64
depends on !PARAVIRT
help
This enforces a strict kernel and user space isolation in order to close
hardware side channels on kernel address information.
+config KAISER_REAL_SWITCH
+ bool "KAISER: actually switch page tables"
+ default y
+
config SECURITYFS
bool "Enable the securityfs filesystem"
help
Patches currently in stable-queue which might be from dave.hansen(a)linux.intel.com are
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
This is a note to let you know that I've just added the patch titled
kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Thu, 17 Aug 2017 15:00:37 -0700
Subject: kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
From: Hugh Dickins <hughd(a)google.com>
We have many machines (Westmere, Sandybridge, Ivybridge) supporting
PCID but not INVPCID: on these load_new_mm_cr3() simply crashed.
Flushing user context inside load_new_mm_cr3() without the use of
invpcid is difficult: momentarily switch from kernel to user context
and back to do so? I'm not sure whether that can be safely done at
all, and would risk polluting user context with kernel internals,
and kernel context with stale user externals.
Instead, follow the hint in the comment that was there: change
X86_CR3_PCID_USER_VAR to be a per-cpu variable, then load_new_mm_cr3()
can leave a note in it, for SWITCH_USER_CR3 on return to userspace to
flush user context TLB, instead of default X86_CR3_PCID_USER_NOFLUSH.
Which works well enough that there's no need to do it this way only
when invpcid is unsupported: it's a good alternative to invpcid here.
But there's a couple of inlines in asm/tlbflush.h that need to do the
same trick, so it's best to localize all this per-cpu business in
mm/kaiser.c: moving that part of the initialization from setup_pcid()
to kaiser_setup_pcid(); with kaiser_flush_tlb_on_return_to_user() the
function for noting an X86_CR3_PCID_USER_FLUSH. And let's keep a
KAISER_SHADOW_PGD_OFFSET in there, to avoid the extra OR on exit.
I did try to make the feature tests in asm/tlbflush.h more consistent
with each other: there seem to be far too many ways of performing such
tests, and I don't have a good grasp of their differences. At first
I converted them all to be static_cpu_has(): but that proved to be a
mistake, as the comment in __native_flush_tlb_single() hints; so then
I reversed and made them all this_cpu_has(). Probably all gratuitous
change, but that's the way it's working at present.
I am slightly bothered by the way non-per-cpu X86_CR3_PCID_KERN_VAR
gets re-initialized by each cpu (before and after these changes):
no problem when (as usual) all cpus on a machine have the same
features, but in principle incorrect. However, my experiment
to per-cpu-ify that one did not end well...
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/kaiser.h | 18 +++++++-----
arch/x86/include/asm/tlbflush.h | 58 +++++++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 22 ---------------
arch/x86/mm/kaiser.c | 50 ++++++++++++++++++++++++++++++----
arch/x86/mm/tlb.c | 46 ++++++++++++-------------------
5 files changed, 114 insertions(+), 80 deletions(-)
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -32,13 +32,12 @@ movq \reg, %cr3
.macro _SWITCH_TO_USER_CR3 reg
movq %cr3, \reg
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), \reg
-/*
- * This can obviously be one instruction by putting the
- * KAISER_SHADOW_PGD_OFFSET bit in the X86_CR3_PCID_USER_VAR.
- * But, just leave it now for simplicity.
- */
-orq X86_CR3_PCID_USER_VAR, \reg
-orq $(KAISER_SHADOW_PGD_OFFSET), \reg
+orq PER_CPU_VAR(X86_CR3_PCID_USER_VAR), \reg
+js 9f
+// FLUSH this time, reset to NOFLUSH for next time
+// But if nopcid? Consider using 0x80 for user pcid?
+movb $(0x80), PER_CPU_VAR(X86_CR3_PCID_USER_VAR+7)
+9:
movq \reg, %cr3
.endm
@@ -90,6 +89,11 @@ movq PER_CPU_VAR(unsafe_stack_register_b
*/
DECLARE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+extern unsigned long X86_CR3_PCID_KERN_VAR;
+DECLARE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
+
+extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+
/**
* kaiser_add_mapping - map a virtual memory part to the shadow (user) mapping
* @addr: the start address of the range
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -12,6 +12,7 @@ static inline void __invpcid(unsigned lo
unsigned long type)
{
struct { u64 d[2]; } desc = { { pcid, addr } };
+
/*
* The memory clobber is because the whole point is to invalidate
* stale TLB entries and, especially if we're flushing global
@@ -130,27 +131,42 @@ static inline void cr4_set_bits_and_upda
cr4_set_bits(mask);
}
+/*
+ * Declare a couple of kaiser interfaces here for convenience,
+ * to avoid the need for asm/kaiser.h in unexpected places.
+ */
+#ifdef CONFIG_KAISER
+extern void kaiser_setup_pcid(void);
+extern void kaiser_flush_tlb_on_return_to_user(void);
+#else
+static inline void kaiser_setup_pcid(void)
+{
+}
+static inline void kaiser_flush_tlb_on_return_to_user(void)
+{
+}
+#endif
+
static inline void __native_flush_tlb(void)
{
- if (!cpu_feature_enabled(X86_FEATURE_INVPCID)) {
- /*
- * If current->mm == NULL then we borrow a mm which may change during a
- * task switch and therefore we must not be preempted while we write CR3
- * back:
+ if (this_cpu_has(X86_FEATURE_INVPCID)) {
+ /*
+ * Note, this works with CR4.PCIDE=0 or 1.
*/
- preempt_disable();
- native_write_cr3(native_read_cr3());
- preempt_enable();
+ invpcid_flush_all_nonglobals();
return;
}
+
/*
- * We are no longer using globals with KAISER, so a
- * "nonglobals" flush would work too. But, this is more
- * conservative.
- *
- * Note, this works with CR4.PCIDE=0 or 1.
+ * If current->mm == NULL then we borrow a mm which may change during a
+ * task switch and therefore we must not be preempted while we write CR3
+ * back:
*/
- invpcid_flush_all();
+ preempt_disable();
+ if (this_cpu_has(X86_FEATURE_PCID))
+ kaiser_flush_tlb_on_return_to_user();
+ native_write_cr3(native_read_cr3());
+ preempt_enable();
}
static inline void __native_flush_tlb_global_irq_disabled(void)
@@ -166,9 +182,13 @@ static inline void __native_flush_tlb_gl
static inline void __native_flush_tlb_global(void)
{
+#ifdef CONFIG_KAISER
+ /* Globals are not used at all */
+ __native_flush_tlb();
+#else
unsigned long flags;
- if (static_cpu_has(X86_FEATURE_INVPCID)) {
+ if (this_cpu_has(X86_FEATURE_INVPCID)) {
/*
* Using INVPCID is considerably faster than a pair of writes
* to CR4 sandwiched inside an IRQ flag save/restore.
@@ -185,10 +205,9 @@ static inline void __native_flush_tlb_gl
* be called from deep inside debugging code.)
*/
raw_local_irq_save(flags);
-
__native_flush_tlb_global_irq_disabled();
-
raw_local_irq_restore(flags);
+#endif
}
static inline void __native_flush_tlb_single(unsigned long addr)
@@ -199,9 +218,12 @@ static inline void __native_flush_tlb_si
*
* The ASIDs used below are hard-coded. But, we must not
* call invpcid(type=1/2) before CR4.PCIDE=1. Just call
- * invpcid in the case we are called early.
+ * invlpg in the case we are called early.
*/
+
if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
+ if (this_cpu_has(X86_FEATURE_PCID))
+ kaiser_flush_tlb_on_return_to_user();
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
return;
}
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -321,33 +321,12 @@ static __always_inline void setup_smap(s
}
}
-/*
- * These can have bit 63 set, so we can not just use a plain "or"
- * instruction to get their value or'd into CR3. It would take
- * another register. So, we use a memory reference to these
- * instead.
- *
- * This is also handy because systems that do not support
- * PCIDs just end up or'ing a 0 into their CR3, which does
- * no harm.
- */
-__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR = 0;
-__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_USER_VAR = 0;
-
static void setup_pcid(struct cpuinfo_x86 *c)
{
if (cpu_has(c, X86_FEATURE_PCID)) {
if (cpu_has(c, X86_FEATURE_PGE)) {
cr4_set_bits(X86_CR4_PCIDE);
/*
- * These variables are used by the entry/exit
- * code to change PCIDs.
- */
-#ifdef CONFIG_KAISER
- X86_CR3_PCID_KERN_VAR = X86_CR3_PCID_KERN_NOFLUSH;
- X86_CR3_PCID_USER_VAR = X86_CR3_PCID_USER_NOFLUSH;
-#endif
- /*
* INVPCID has two "groups" of types:
* 1/2: Invalidate an individual address
* 3/4: Invalidate all contexts
@@ -372,6 +351,7 @@ static void setup_pcid(struct cpuinfo_x8
clear_cpu_cap(c, X86_FEATURE_PCID);
}
}
+ kaiser_setup_pcid();
}
/*
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -12,12 +12,26 @@
#include <linux/ftrace.h>
#include <asm/kaiser.h>
+#include <asm/tlbflush.h> /* to verify its kaiser declarations */
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/desc.h>
+
#ifdef CONFIG_KAISER
+__visible
+DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
+
+/*
+ * These can have bit 63 set, so we can not just use a plain "or"
+ * instruction to get their value or'd into CR3. It would take
+ * another register. So, we use a memory reference to these instead.
+ *
+ * This is also handy because systems that do not support PCIDs
+ * just end up or'ing a 0 into their CR3, which does no harm.
+ */
+__aligned(PAGE_SIZE) unsigned long X86_CR3_PCID_KERN_VAR;
+DEFINE_PER_CPU(unsigned long, X86_CR3_PCID_USER_VAR);
-__visible DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
/*
* At runtime, the only things we map are some things for CPU
* hotplug, and stacks for new processes. No two CPUs will ever
@@ -239,9 +253,6 @@ static void __init kaiser_init_all_pgds(
WARN_ON(__ret); \
} while (0)
-extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
-extern unsigned long X86_CR3_PCID_KERN_VAR;
-extern unsigned long X86_CR3_PCID_USER_VAR;
/*
* If anything in here fails, we will likely die on one of the
* first kernel->user transitions and init will die. But, we
@@ -295,8 +306,6 @@ void __init kaiser_init(void)
kaiser_add_user_map_early(&X86_CR3_PCID_KERN_VAR, PAGE_SIZE,
__PAGE_KERNEL);
- kaiser_add_user_map_early(&X86_CR3_PCID_USER_VAR, PAGE_SIZE,
- __PAGE_KERNEL);
}
/* Add a mapping to the shadow mapping, and synchronize the mappings */
@@ -361,4 +370,33 @@ pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp,
}
return pgd;
}
+
+void kaiser_setup_pcid(void)
+{
+ unsigned long kern_cr3 = 0;
+ unsigned long user_cr3 = KAISER_SHADOW_PGD_OFFSET;
+
+ if (this_cpu_has(X86_FEATURE_PCID)) {
+ kern_cr3 |= X86_CR3_PCID_KERN_NOFLUSH;
+ user_cr3 |= X86_CR3_PCID_USER_NOFLUSH;
+ }
+ /*
+ * These variables are used by the entry/exit
+ * code to change PCID and pgd and TLB flushing.
+ */
+ X86_CR3_PCID_KERN_VAR = kern_cr3;
+ this_cpu_write(X86_CR3_PCID_USER_VAR, user_cr3);
+}
+
+/*
+ * Make a note that this cpu will need to flush USER tlb on return to user.
+ * Caller checks whether this_cpu_has(X86_FEATURE_PCID) before calling:
+ * if cpu does not, then the NOFLUSH bit will never have been set.
+ */
+void kaiser_flush_tlb_on_return_to_user(void)
+{
+ this_cpu_write(X86_CR3_PCID_USER_VAR,
+ X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
+}
+EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
#endif /* CONFIG_KAISER */
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/cpu.h>
+#include <linux/debugfs.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include <asm/cache.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>
-#include <linux/debugfs.h>
+#include <asm/kaiser.h>
/*
* TLB flushing, formerly SMP-only
@@ -38,34 +39,23 @@ static void load_new_mm_cr3(pgd_t *pgdir
{
unsigned long new_mm_cr3 = __pa(pgdir);
- /*
- * KAISER, plus PCIDs needs some extra work here. But,
- * if either of features is not present, we need no
- * PCIDs here and just do a normal, full TLB flush with
- * the write_cr3()
- */
- if (!IS_ENABLED(CONFIG_KAISER) ||
- !cpu_feature_enabled(X86_FEATURE_PCID))
- goto out_set_cr3;
- /*
- * We reuse the same PCID for different tasks, so we must
- * flush all the entires for the PCID out when we change
- * tasks.
- */
- new_mm_cr3 = X86_CR3_PCID_KERN_FLUSH | __pa(pgdir);
-
- /*
- * The flush from load_cr3() may leave old TLB entries
- * for userspace in place. We must flush that context
- * separately. We can theoretically delay doing this
- * until we actually load up the userspace CR3, but
- * that's a bit tricky. We have to have the "need to
- * flush userspace PCID" bit per-cpu and check it in the
- * exit-to-userspace paths.
- */
- invpcid_flush_single_context(X86_CR3_PCID_ASID_USER);
+#ifdef CONFIG_KAISER
+ if (this_cpu_has(X86_FEATURE_PCID)) {
+ /*
+ * We reuse the same PCID for different tasks, so we must
+ * flush all the entries for the PCID out when we change tasks.
+ * Flush KERN below, flush USER when returning to userspace in
+ * kaiser's SWITCH_USER_CR3 (_SWITCH_TO_USER_CR3) macro.
+ *
+ * invpcid_flush_single_context(X86_CR3_PCID_ASID_USER) could
+ * do it here, but can only be used if X86_FEATURE_INVPCID is
+ * available - and many machines support pcid without invpcid.
+ */
+ new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
+ kaiser_flush_tlb_on_return_to_user();
+ }
+#endif /* CONFIG_KAISER */
-out_set_cr3:
/*
* Caution: many callers of this function expect
* that load_cr3() is serializing and orders TLB
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: kaiser_remove_mapping() move along the pgd
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Mon, 2 Oct 2017 10:57:24 -0700
Subject: kaiser: kaiser_remove_mapping() move along the pgd
From: Hugh Dickins <hughd(a)google.com>
When removing the bogus comment from kaiser_remove_mapping(),
I really ought to have checked the extent of its bogosity: as
Neel points out, there is nothing to stop unmap_pud_range_nofree()
from continuing beyond the end of a pud (and starting in the wrong
position on the next).
Fix kaiser_remove_mapping() to constrain the extent and advance pgd
pointer correctly: use pgd_addr_end() macro as used throughout base
mm (but don't assume page-rounded start and size in this case).
But this bug was very unlikely to trigger in this backport: since
any buddy allocation is contained within a single pud extent, and
we are not using vmapped stacks (and are only mapping one page of
stack anyway): the only way to hit this bug here would be when
freeing a large modified ldt.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -297,11 +297,13 @@ void kaiser_remove_mapping(unsigned long
extern void unmap_pud_range_nofree(pgd_t *pgd,
unsigned long start, unsigned long end);
unsigned long end = start + size;
- unsigned long addr;
+ unsigned long addr, next;
+ pgd_t *pgd;
- for (addr = start; addr < end; addr += PGDIR_SIZE) {
- pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(addr));
- unmap_pud_range_nofree(pgd, addr, end);
+ pgd = native_get_shadow_pgd(pgd_offset_k(start));
+ for (addr = start; addr < end; pgd++, addr = next) {
+ next = pgd_addr_end(addr, end);
+ unmap_pud_range_nofree(pgd, addr, next);
}
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sat, 4 Nov 2017 18:43:06 -0700
Subject: kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
From: Hugh Dickins <hughd(a)google.com>
Let kaiser_flush_tlb_on_return_to_user() do the X86_FEATURE_PCID
check, instead of each caller doing it inline first: nobody needs
to optimize for the noPCID case, it's clearer this way, and better
suits later changes. Replace those no-op X86_CR3_PCID_KERN_FLUSH lines
by a BUILD_BUG_ON() in load_new_mm_cr3(), in case something changes.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 4 ++--
arch/x86/mm/kaiser.c | 6 +++---
arch/x86/mm/tlb.c | 8 ++++----
3 files changed, 9 insertions(+), 9 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -157,7 +157,7 @@ static inline void __native_flush_tlb(vo
* back:
*/
preempt_disable();
- if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
+ if (kaiser_enabled)
kaiser_flush_tlb_on_return_to_user();
native_write_cr3(native_read_cr3());
preempt_enable();
@@ -216,7 +216,7 @@ static inline void __native_flush_tlb_si
*/
if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
- if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
+ if (kaiser_enabled)
kaiser_flush_tlb_on_return_to_user();
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
return;
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -436,12 +436,12 @@ void kaiser_setup_pcid(void)
/*
* Make a note that this cpu will need to flush USER tlb on return to user.
- * Caller checks whether this_cpu_has(X86_FEATURE_PCID) before calling:
- * if cpu does not, then the NOFLUSH bit will never have been set.
+ * If cpu does not have PCID, then the NOFLUSH bit will never have been set.
*/
void kaiser_flush_tlb_on_return_to_user(void)
{
- this_cpu_write(x86_cr3_pcid_user,
+ if (this_cpu_has(X86_FEATURE_PCID))
+ this_cpu_write(x86_cr3_pcid_user,
X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
}
EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -39,7 +39,7 @@ static void load_new_mm_cr3(pgd_t *pgdir
{
unsigned long new_mm_cr3 = __pa(pgdir);
- if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID)) {
+ if (kaiser_enabled) {
/*
* We reuse the same PCID for different tasks, so we must
* flush all the entries for the PCID out when we change tasks.
@@ -50,10 +50,10 @@ static void load_new_mm_cr3(pgd_t *pgdir
* do it here, but can only be used if X86_FEATURE_INVPCID is
* available - and many machines support pcid without invpcid.
*
- * The line below is a no-op: X86_CR3_PCID_KERN_FLUSH is now 0;
- * but keep that line in there in case something changes.
+ * If X86_CR3_PCID_KERN_FLUSH actually added something, then it
+ * would be needed in the write_cr3() below - if PCIDs enabled.
*/
- new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
+ BUILD_BUG_ON(X86_CR3_PCID_KERN_FLUSH);
kaiser_flush_tlb_on_return_to_user();
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: KAISER depends on SMP
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-kaiser-depends-on-smp.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Wed, 13 Sep 2017 14:03:10 -0700
Subject: kaiser: KAISER depends on SMP
From: Hugh Dickins <hughd(a)google.com>
It is absurd that KAISER should depend on SMP, but apparently nobody
has tried a UP build before: which breaks on implicit declaration of
function 'per_cpu_offset' in arch/x86/mm/kaiser.c.
Now, you would expect that to be trivially fixed up; but looking at
the System.map when that block is #ifdef'ed out of kaiser_init(),
I see that in a UP build __per_cpu_user_mapped_end is precisely at
__per_cpu_user_mapped_start, and the items carefully gathered into
that section for user-mapping on SMP, dispersed elsewhere on UP.
So, some other kind of section assignment will be needed on UP,
but implementing that is not a priority: just make KAISER depend
on SMP for now.
Also inserted a blank line before the option, tidied up the
brief Kconfig help message, and added an "If unsure, Y".
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
security/Kconfig | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -30,14 +30,16 @@ config SECURITY
model will be used.
If you are unsure how to answer this question, answer N.
+
config KAISER
bool "Remove the kernel mapping in user mode"
default y
- depends on X86_64
- depends on !PARAVIRT
+ depends on X86_64 && SMP && !PARAVIRT
help
- This enforces a strict kernel and user space isolation in order to close
- hardware side channels on kernel address information.
+ This enforces a strict kernel and user space isolation, in order
+ to close hardware side channels on kernel address information.
+
+ If you are unsure how to answer this question, answer Y.
config KAISER_REAL_SWITCH
bool "KAISER: actually switch page tables"
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: fix unlikely error in alloc_ldt_struct()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Mon, 4 Dec 2017 20:13:35 -0800
Subject: kaiser: fix unlikely error in alloc_ldt_struct()
From: Hugh Dickins <hughd(a)google.com>
An error from kaiser_add_mapping() here is not at all likely, but
Eric Biggers rightly points out that __free_ldt_struct() relies on
new_ldt->size being initialized: move that up.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/ldt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -79,11 +79,11 @@ static struct ldt_struct *alloc_ldt_stru
ret = kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
__PAGE_KERNEL);
+ new_ldt->size = size;
if (ret) {
__free_ldt_struct(new_ldt);
return NULL;
}
- new_ldt->size = size;
return new_ldt;
}
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Thu, 21 Sep 2017 20:39:56 -0700
Subject: kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
From: Hugh Dickins <hughd(a)google.com>
pjt has observed that nmi's second (nmi_from_kernel) call to do_nmi()
adjusted the %rdi regs arg, rightly when CONFIG_KAISER, but wrongly
when not CONFIG_KAISER.
Although the minimal change is to add an #ifdef CONFIG_KAISER around
the addq line, that looks cluttered, and I prefer how the first call
to do_nmi() handled it: prepare args in %rdi and %rsi before getting
into the CONFIG_KAISER block, since it does not touch them at all.
And while we're here, place the "#ifdef CONFIG_KAISER" that follows
each, to enclose the "Unconditionally restore CR3" comment: matching
how the "Unconditionally use kernel CR3" comment above is enclosed.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1297,12 +1297,13 @@ ENTRY(nmi)
movq %rax, %cr3
#endif
call do_nmi
+
+#ifdef CONFIG_KAISER
/*
* Unconditionally restore CR3. I know we return to
* kernel code that needs user CR3, but do we ever return
* to "user mode" where we need the kernel CR3?
*/
-#ifdef CONFIG_KAISER
popq %rax
mov %rax, %cr3
#endif
@@ -1526,6 +1527,8 @@ end_repeat_nmi:
SWAPGS
xorl %ebx, %ebx
1:
+ movq %rsp, %rdi
+ movq $-1, %rsi
#ifdef CONFIG_KAISER
/* Unconditionally use kernel CR3 for do_nmi() */
/* %rax is saved above, so OK to clobber here */
@@ -1538,16 +1541,14 @@ end_repeat_nmi:
#endif
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
- movq %rsp, %rdi
- addq $8, %rdi /* point %rdi at ptregs, fixed up for CR3 */
- movq $-1, %rsi
call do_nmi
+
+#ifdef CONFIG_KAISER
/*
* Unconditionally restore CR3. We might be returning to
* kernel code that needs user CR3, like just just before
* a sysret.
*/
-#ifdef CONFIG_KAISER
popq %rax
mov %rax, %cr3
#endif
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: fix build and FIXME in alloc_ldt_struct()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 17:09:44 -0700
Subject: kaiser: fix build and FIXME in alloc_ldt_struct()
From: Hugh Dickins <hughd(a)google.com>
Include linux/kaiser.h instead of asm/kaiser.h to build ldt.c without
CONFIG_KAISER. kaiser_add_mapping() does already return an error code,
so fix the FIXME.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/ldt.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -16,9 +16,9 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/uaccess.h>
+#include <linux/kaiser.h>
#include <asm/ldt.h>
-#include <asm/kaiser.h>
#include <asm/desc.h>
#include <asm/mmu_context.h>
#include <asm/syscalls.h>
@@ -49,7 +49,7 @@ static struct ldt_struct *alloc_ldt_stru
{
struct ldt_struct *new_ldt;
int alloc_size;
- int ret = 0;
+ int ret;
if (size > LDT_ENTRIES)
return NULL;
@@ -77,10 +77,8 @@ static struct ldt_struct *alloc_ldt_stru
return NULL;
}
- // FIXME: make kaiser_add_mapping() return an error code
- // when it fails
- kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
- __PAGE_KERNEL);
+ ret = kaiser_add_mapping((unsigned long)new_ldt->entries, alloc_size,
+ __PAGE_KERNEL);
if (ret) {
__free_ldt_struct(new_ldt);
return NULL;
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: ENOMEM if kaiser_pagetable_walk() NULL
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-enomem-if-kaiser_pagetable_walk-null.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 18:48:02 -0700
Subject: kaiser: ENOMEM if kaiser_pagetable_walk() NULL
From: Hugh Dickins <hughd(a)google.com>
kaiser_add_user_map() took no notice when kaiser_pagetable_walk() failed.
And avoid its might_sleep() when atomic (though atomic at present unused).
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -99,11 +99,11 @@ static pte_t *kaiser_pagetable_walk(unsi
pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
- might_sleep();
if (is_atomic) {
gfp &= ~GFP_KERNEL;
gfp |= __GFP_HIGH | __GFP_ATOMIC;
- }
+ } else
+ might_sleep();
if (pgd_none(*pgd)) {
WARN_ONCE(1, "All shadow pgds should have been populated");
@@ -160,13 +160,17 @@ int kaiser_add_user_map(const void *__st
unsigned long end_addr = PAGE_ALIGN(start_addr + size);
unsigned long target_address;
- for (;address < end_addr; address += PAGE_SIZE) {
+ for (; address < end_addr; address += PAGE_SIZE) {
target_address = get_pa_from_mapping(address);
if (target_address == -1) {
ret = -EIO;
break;
}
pte = kaiser_pagetable_walk(address, false);
+ if (!pte) {
+ ret = -ENOMEM;
+ break;
+ }
if (pte_none(*pte)) {
set_pte(pte, __pte(flags | target_address));
} else {
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: drop is_atomic arg to kaiser_pagetable_walk()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 29 Oct 2017 11:36:19 -0700
Subject: kaiser: drop is_atomic arg to kaiser_pagetable_walk()
From: Hugh Dickins <hughd(a)google.com>
I have not observed a might_sleep() warning from setup_fixmap_gdt()'s
use of kaiser_add_mapping() in our tree (why not?), but like upstream
we have not provided a way for that to pass is_atomic true down to
kaiser_pagetable_walk(), and at startup it's far from a likely source
of trouble: so just delete the walk's is_atomic arg and might_sleep().
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -108,19 +108,13 @@ static inline unsigned long get_pa_from_
*
* Returns a pointer to a PTE on success, or NULL on failure.
*/
-static pte_t *kaiser_pagetable_walk(unsigned long address, bool is_atomic)
+static pte_t *kaiser_pagetable_walk(unsigned long address)
{
pmd_t *pmd;
pud_t *pud;
pgd_t *pgd = native_get_shadow_pgd(pgd_offset_k(address));
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
- if (is_atomic) {
- gfp &= ~GFP_KERNEL;
- gfp |= __GFP_HIGH | __GFP_ATOMIC;
- } else
- might_sleep();
-
if (pgd_none(*pgd)) {
WARN_ONCE(1, "All shadow pgds should have been populated");
return NULL;
@@ -195,7 +189,7 @@ static int kaiser_add_user_map(const voi
ret = -EIO;
break;
}
- pte = kaiser_pagetable_walk(address, false);
+ pte = kaiser_pagetable_walk(address);
if (!pte) {
ret = -ENOMEM;
break;
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: do not set _PAGE_NX on pgd_none
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-do-not-set-_page_nx-on-pgd_none.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Tue, 5 Sep 2017 12:05:01 -0700
Subject: kaiser: do not set _PAGE_NX on pgd_none
From: Hugh Dickins <hughd(a)google.com>
native_pgd_clear() uses native_set_pgd(), so native_set_pgd() must
avoid setting the _PAGE_NX bit on an otherwise pgd_none() entry:
usually that just generated a warning on exit, but sometimes
more mysterious and damaging failures (our production machines
could not complete booting).
The original fix to this just avoided adding _PAGE_NX to
an empty entry; but eventually more problems surfaced with kexec,
and EFI mapping expected to be a problem too. So now instead
change native_set_pgd() to update shadow only if _PAGE_USER:
A few places (kernel/machine_kexec_64.c, platform/efi/efi_64.c for sure)
use set_pgd() to set up a temporary internal virtual address space, with
physical pages remapped at what Kaiser regards as userspace addresses:
Kaiser then assumes a shadow pgd follows, which it will try to corrupt.
This appears to be responsible for the recent kexec and kdump failures;
though it's unclear how those did not manifest as a problem before.
Ah, the shadow pgd will only be assumed to "follow" if the requested
pgd is on an even-numbered page: so I suppose it was going wrong 50%
of the time all along.
What we need is a flag to set_pgd(), to tell it we're dealing with
userspace. Er, isn't that what the pgd's _PAGE_USER bit is saying?
Add a test for that. But we cannot do the same for pgd_clear()
(which may be called to clear corrupted entries - set aside the
question of "corrupt in which pgd?" until later), so there just
rely on pgd_clear() not being called in the problematic cases -
with a WARN_ON_ONCE() which should fire half the time if it is.
But this is getting too big for an inline function: move it into
arch/x86/mm/kaiser.c (which then demands a boot/compressed mod);
and de-void and de-space native_get_shadow/normal_pgd() while here.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/boot/compressed/misc.h | 1
arch/x86/include/asm/pgtable_64.h | 51 +++++++++-----------------------------
arch/x86/mm/kaiser.c | 42 +++++++++++++++++++++++++++++++
3 files changed, 56 insertions(+), 38 deletions(-)
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -9,6 +9,7 @@
*/
#undef CONFIG_PARAVIRT
#undef CONFIG_PARAVIRT_SPINLOCKS
+#undef CONFIG_KAISER
#undef CONFIG_KASAN
#include <linux/linkage.h>
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -107,61 +107,36 @@ static inline void native_pud_clear(pud_
}
#ifdef CONFIG_KAISER
-static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp)
+extern pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd);
+
+static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
{
- return (pgd_t *)(void*)((unsigned long)(void*)pgdp | (unsigned long)PAGE_SIZE);
+ return (pgd_t *)((unsigned long)pgdp | (unsigned long)PAGE_SIZE);
}
-static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp)
+static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
{
- return (pgd_t *)(void*)((unsigned long)(void*)pgdp & ~(unsigned long)PAGE_SIZE);
+ return (pgd_t *)((unsigned long)pgdp & ~(unsigned long)PAGE_SIZE);
}
#else
-static inline pgd_t * native_get_shadow_pgd(pgd_t *pgdp)
+static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ return pgd;
+}
+static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
{
BUILD_BUG_ON(1);
return NULL;
}
-static inline pgd_t * native_get_normal_pgd(pgd_t *pgdp)
+static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
{
return pgdp;
}
#endif /* CONFIG_KAISER */
-/*
- * Page table pages are page-aligned. The lower half of the top
- * level is used for userspace and the top half for the kernel.
- * This returns true for user pages that need to get copied into
- * both the user and kernel copies of the page tables, and false
- * for kernel pages that should only be in the kernel copy.
- */
-static inline bool is_userspace_pgd(void *__ptr)
-{
- unsigned long ptr = (unsigned long)__ptr;
-
- return ((ptr % PAGE_SIZE) < (PAGE_SIZE / 2));
-}
-
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
-#ifdef CONFIG_KAISER
- pteval_t extra_kern_pgd_flags = 0;
- /* Do we need to also populate the shadow pgd? */
- if (is_userspace_pgd(pgdp)) {
- native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
- /*
- * Even if the entry is *mapping* userspace, ensure
- * that userspace can not use it. This way, if we
- * get out to userspace running on the kernel CR3,
- * userspace will crash instead of running.
- */
- extra_kern_pgd_flags = _PAGE_NX;
- }
- pgdp->pgd = pgd.pgd;
- pgdp->pgd |= extra_kern_pgd_flags;
-#else /* CONFIG_KAISER */
- *pgdp = pgd;
-#endif
+ *pgdp = kaiser_set_shadow_pgd(pgdp, pgd);
}
static inline void native_pgd_clear(pgd_t *pgd)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -303,4 +303,46 @@ void kaiser_remove_mapping(unsigned long
unmap_pud_range_nofree(pgd, addr, end);
}
}
+
+/*
+ * Page table pages are page-aligned. The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ * This returns true for user pages that need to get copied into
+ * both the user and kernel copies of the page tables, and false
+ * for kernel pages that should only be in the kernel copy.
+ */
+static inline bool is_userspace_pgd(pgd_t *pgdp)
+{
+ return ((unsigned long)pgdp % PAGE_SIZE) < (PAGE_SIZE / 2);
+}
+
+pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+ /*
+ * Do we need to also populate the shadow pgd? Check _PAGE_USER to
+ * skip cases like kexec and EFI which make temporary low mappings.
+ */
+ if (pgd.pgd & _PAGE_USER) {
+ if (is_userspace_pgd(pgdp)) {
+ native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+ /*
+ * Even if the entry is *mapping* userspace, ensure
+ * that userspace can not use it. This way, if we
+ * get out to userspace running on the kernel CR3,
+ * userspace will crash instead of running.
+ */
+ pgd.pgd |= _PAGE_NX;
+ }
+ } else if (!pgd.pgd) {
+ /*
+ * pgd_clear() cannot check _PAGE_USER, and is even used to
+ * clear corrupted pgd entries: so just rely on cases like
+ * kexec and EFI never to be using pgd_clear().
+ */
+ if (!WARN_ON_ONCE((unsigned long)pgdp & PAGE_SIZE) &&
+ is_userspace_pgd(pgdp))
+ native_get_shadow_pgd(pgdp)->pgd = pgd.pgd;
+ }
+ return pgd;
+}
#endif /* CONFIG_KAISER */
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: delete KAISER_REAL_SWITCH option
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-delete-kaiser_real_switch-option.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 3 Sep 2017 18:30:43 -0700
Subject: kaiser: delete KAISER_REAL_SWITCH option
From: Hugh Dickins <hughd(a)google.com>
We fail to see what CONFIG_KAISER_REAL_SWITCH is for: it seems to be
left over from early development, and now just obscures tricky parts
of the code. Delete it before adding PCIDs, or nokaiser boot option.
(Or if there is some good reason to keep the option, then it needs
a help text - and a "depends on KAISER", so that all those without
KAISER are not asked the question.)
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_64.S | 4 ----
arch/x86/include/asm/kaiser.h | 4 ----
security/Kconfig | 4 ----
3 files changed, 12 deletions(-)
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1291,9 +1291,7 @@ ENTRY(nmi)
/* %rax is saved above, so OK to clobber here */
movq %cr3, %rax
pushq %rax
-#ifdef CONFIG_KAISER_REAL_SWITCH
andq $(~KAISER_SHADOW_PGD_OFFSET), %rax
-#endif
movq %rax, %cr3
#endif
call do_nmi
@@ -1534,9 +1532,7 @@ end_repeat_nmi:
/* %rax is saved above, so OK to clobber here */
movq %cr3, %rax
pushq %rax
-#ifdef CONFIG_KAISER_REAL_SWITCH
andq $(~KAISER_SHADOW_PGD_OFFSET), %rax
-#endif
movq %rax, %cr3
#endif
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -21,17 +21,13 @@
.macro _SWITCH_TO_KERNEL_CR3 reg
movq %cr3, \reg
-#ifdef CONFIG_KAISER_REAL_SWITCH
andq $(~KAISER_SHADOW_PGD_OFFSET), \reg
-#endif
movq \reg, %cr3
.endm
.macro _SWITCH_TO_USER_CR3 reg
movq %cr3, \reg
-#ifdef CONFIG_KAISER_REAL_SWITCH
orq $(KAISER_SHADOW_PGD_OFFSET), \reg
-#endif
movq \reg, %cr3
.endm
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -41,10 +41,6 @@ config KAISER
If you are unsure how to answer this question, answer Y.
-config KAISER_REAL_SWITCH
- bool "KAISER: actually switch page tables"
- default y
-
config SECURITYFS
bool "Enable the securityfs filesystem"
help
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: disabled on Xen PV
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-disabled-on-xen-pv.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Jiri Kosina <jkosina(a)suse.cz>
Date: Tue, 2 Jan 2018 14:19:49 +0100
Subject: kaiser: disabled on Xen PV
From: Jiri Kosina <jkosina(a)suse.cz>
Kaiser cannot be used on paravirtualized MMUs (namely reading and writing CR3).
This does not work with KAISER as the CR3 switch from and to user space PGD
would require to map the whole XEN_PV machinery into both.
More importantly, enabling KAISER on Xen PV doesn't make too much sense, as PV
guests use distinct %cr3 values for kernel and user already.
Signed-off-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/kaiser.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -264,6 +264,9 @@ void __init kaiser_check_boottime_disabl
char arg[5];
int ret;
+ if (boot_cpu_has(X86_FEATURE_XENPV))
+ goto silent_disable;
+
ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
if (ret > 0) {
if (!strncmp(arg, "on", 2))
@@ -291,6 +294,8 @@ enable:
disable:
pr_info("Kernel/User page tables isolation: disabled\n");
+
+silent_disable:
kaiser_enabled = 0;
setup_clear_cpu_cap(X86_FEATURE_KAISER);
}
Patches currently in stable-queue which might be from jkosina(a)suse.cz are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-disabled-on-xen-pv.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: asm/tlbflush.h handle noPGE at lower level
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sat, 4 Nov 2017 18:23:24 -0700
Subject: kaiser: asm/tlbflush.h handle noPGE at lower level
From: Hugh Dickins <hughd(a)google.com>
I found asm/tlbflush.h too twisty, and think it safer not to avoid
__native_flush_tlb_global_irq_disabled() in the kaiser_enabled case,
but instead let it handle kaiser_enabled along with cr3: it can just
use __native_flush_tlb() for that, no harm in re-disabling preemption.
(This is not the same change as Kirill and Dave have suggested for
upstream, flipping PGE in cr4: that's neat, but needs a cpu_has_pge
check; cr3 is enough for kaiser, and thought to be cheaper than cr4.)
Also delete the X86_FEATURE_INVPCID invpcid_flush_all_nonglobals()
preference from __native_flush_tlb(): unlike the invpcid_flush_all()
preference in __native_flush_tlb_global(), it's not seen in upstream
4.14, and was recently reported to be surprisingly slow.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/tlbflush.h | 27 +++------------------------
1 file changed, 3 insertions(+), 24 deletions(-)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -151,14 +151,6 @@ static inline void kaiser_flush_tlb_on_r
static inline void __native_flush_tlb(void)
{
- if (this_cpu_has(X86_FEATURE_INVPCID)) {
- /*
- * Note, this works with CR4.PCIDE=0 or 1.
- */
- invpcid_flush_all_nonglobals();
- return;
- }
-
/*
* If current->mm == NULL then we borrow a mm which may change during a
* task switch and therefore we must not be preempted while we write CR3
@@ -182,11 +174,8 @@ static inline void __native_flush_tlb_gl
/* restore PGE as it was before */
native_write_cr4(cr4);
} else {
- /*
- * x86_64 microcode update comes this way when CR4.PGE is not
- * enabled, and it's safer for all callers to allow this case.
- */
- native_write_cr3(native_read_cr3());
+ /* do it with cr3, letting kaiser flush user PCID */
+ __native_flush_tlb();
}
}
@@ -194,12 +183,6 @@ static inline void __native_flush_tlb_gl
{
unsigned long flags;
- if (kaiser_enabled) {
- /* Globals are not used at all */
- __native_flush_tlb();
- return;
- }
-
if (this_cpu_has(X86_FEATURE_INVPCID)) {
/*
* Using INVPCID is considerably faster than a pair of writes
@@ -255,11 +238,7 @@ static inline void __native_flush_tlb_si
static inline void __flush_tlb_all(void)
{
- if (cpu_has_pge)
- __flush_tlb_global();
- else
- __flush_tlb();
-
+ __flush_tlb_global();
/*
* Note: if we somehow had PCID but not PGE, then this wouldn't work --
* we'd end up flushing kernel translations for the current ASID but
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: add "nokaiser" boot option, using ALTERNATIVE
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-add-nokaiser-boot-option-using-alternative.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Sun, 24 Sep 2017 16:59:49 -0700
Subject: kaiser: add "nokaiser" boot option, using ALTERNATIVE
From: Hugh Dickins <hughd(a)google.com>
Added "nokaiser" boot option: an early param like "noinvpcid".
Most places now check int kaiser_enabled (#defined 0 when not
CONFIG_KAISER) instead of #ifdef CONFIG_KAISER; but entry_64.S
and entry_64_compat.S are using the ALTERNATIVE technique, which
patches in the preferred instructions at runtime. That technique
is tied to x86 cpu features, so X86_FEATURE_KAISER is fabricated.
Prior to "nokaiser", Kaiser #defined _PAGE_GLOBAL 0: revert that,
but be careful with both _PAGE_GLOBAL and CR4.PGE: setting them when
nokaiser like when !CONFIG_KAISER, but not setting either when kaiser -
neither matters on its own, but it's hard to be sure that _PAGE_GLOBAL
won't get set in some obscure corner, or something add PGE into CR4.
By omitting _PAGE_GLOBAL from __supported_pte_mask when kaiser_enabled,
all page table setup which uses pte_pfn() masks it out of the ptes.
It's slightly shameful that the same declaration versus definition of
kaiser_enabled appears in not one, not two, but in three header files
(asm/kaiser.h, asm/pgtable.h, asm/tlbflush.h). I felt safer that way,
than with #including any of those in any of the others; and did not
feel it worth an asm/kaiser_enabled.h - kernel/cpu/common.c includes
them all, so we shall hear about it if they get out of synch.
Cleanups while in the area: removed the silly #ifdef CONFIG_KAISER
from kaiser.c; removed the unused native_get_normal_pgd(); removed
the spurious reg clutter from SWITCH_*_CR3 macro stubs; corrected some
comments. But more interestingly, set CR4.PSE in secondary_startup_64:
the manual is clear that it does not matter whether it's 0 or 1 when
4-level-pts are enabled, but I was distracted to find cr4 different on
BSP and auxiliaries - BSP alone was adding PSE, in probe_page_size_mask().
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Documentation/kernel-parameters.txt | 2 +
arch/x86/entry/entry_64.S | 15 +++++++------
arch/x86/include/asm/cpufeature.h | 3 ++
arch/x86/include/asm/kaiser.h | 27 +++++++++++++++++-------
arch/x86/include/asm/pgtable.h | 20 ++++++++++++-----
arch/x86/include/asm/pgtable_64.h | 13 +++--------
arch/x86/include/asm/pgtable_types.h | 4 ---
arch/x86/include/asm/tlbflush.h | 39 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 28 ++++++++++++++++++++++++-
arch/x86/kernel/espfix_64.c | 3 +-
arch/x86/kernel/head_64.S | 4 +--
arch/x86/mm/init.c | 2 -
arch/x86/mm/init_64.c | 10 ++++++++
arch/x86/mm/kaiser.c | 26 +++++++++++++++++++----
arch/x86/mm/pgtable.c | 8 +------
arch/x86/mm/tlb.c | 4 ---
16 files changed, 143 insertions(+), 65 deletions(-)
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2523,6 +2523,8 @@ bytes respectively. Such letter suffixes
nojitter [IA-64] Disables jitter checking for ITC timers.
+ nokaiser [X86-64] Disable KAISER isolation of kernel from user.
+
no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1051,7 +1051,7 @@ ENTRY(paranoid_entry)
* unconditionally, but we need to find out whether the reverse
* should be done on return (conveyed to paranoid_exit in %ebx).
*/
- movq %cr3, %rax
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
testl $KAISER_SHADOW_PGD_OFFSET, %eax
jz 2f
orl $2, %ebx
@@ -1083,6 +1083,7 @@ ENTRY(paranoid_exit)
TRACE_IRQS_OFF_DEBUG
TRACE_IRQS_IRETQ_DEBUG
#ifdef CONFIG_KAISER
+ /* No ALTERNATIVE for X86_FEATURE_KAISER: paranoid_entry sets %ebx */
testl $2, %ebx /* SWITCH_USER_CR3 needed? */
jz paranoid_exit_no_switch
SWITCH_USER_CR3
@@ -1315,13 +1316,14 @@ ENTRY(nmi)
#ifdef CONFIG_KAISER
/* Unconditionally use kernel CR3 for do_nmi() */
/* %rax is saved above, so OK to clobber here */
- movq %cr3, %rax
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
orq x86_cr3_pcid_noflush, %rax
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
movq %rax, %cr3
+2:
#endif
call do_nmi
@@ -1331,8 +1333,7 @@ ENTRY(nmi)
* kernel code that needs user CR3, but do we ever return
* to "user mode" where we need the kernel CR3?
*/
- popq %rax
- mov %rax, %cr3
+ ALTERNATIVE "", "popq %rax; movq %rax, %cr3", X86_FEATURE_KAISER
#endif
/*
@@ -1559,13 +1560,14 @@ end_repeat_nmi:
#ifdef CONFIG_KAISER
/* Unconditionally use kernel CR3 for do_nmi() */
/* %rax is saved above, so OK to clobber here */
- movq %cr3, %rax
+ ALTERNATIVE "jmp 2f", "movq %cr3, %rax", X86_FEATURE_KAISER
/* If PCID enabled, NOFLUSH now and NOFLUSH on return */
orq x86_cr3_pcid_noflush, %rax
pushq %rax
/* mask off "user" bit of pgd address and 12 PCID bits: */
andq $(~(X86_CR3_PCID_ASID_MASK | KAISER_SHADOW_PGD_OFFSET)), %rax
movq %rax, %cr3
+2:
#endif
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
@@ -1577,8 +1579,7 @@ end_repeat_nmi:
* kernel code that needs user CR3, like just just before
* a sysret.
*/
- popq %rax
- mov %rax, %cr3
+ ALTERNATIVE "", "popq %rax; movq %rax, %cr3", X86_FEATURE_KAISER
#endif
testl %ebx, %ebx /* swapgs needed? */
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -200,6 +200,9 @@
#define X86_FEATURE_HWP_PKG_REQ ( 7*32+14) /* Intel HWP_PKG_REQ */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
+/* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */
+#define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_KAISER w/o nokaiser */
+
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */
--- a/arch/x86/include/asm/kaiser.h
+++ b/arch/x86/include/asm/kaiser.h
@@ -46,28 +46,33 @@ movq \reg, %cr3
.endm
.macro SWITCH_KERNEL_CR3
-pushq %rax
+ALTERNATIVE "jmp 8f", "pushq %rax", X86_FEATURE_KAISER
_SWITCH_TO_KERNEL_CR3 %rax
popq %rax
+8:
.endm
.macro SWITCH_USER_CR3
-pushq %rax
+ALTERNATIVE "jmp 8f", "pushq %rax", X86_FEATURE_KAISER
_SWITCH_TO_USER_CR3 %rax %al
popq %rax
+8:
.endm
.macro SWITCH_KERNEL_CR3_NO_STACK
-movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)
+ALTERNATIVE "jmp 8f", \
+ __stringify(movq %rax, PER_CPU_VAR(unsafe_stack_register_backup)), \
+ X86_FEATURE_KAISER
_SWITCH_TO_KERNEL_CR3 %rax
movq PER_CPU_VAR(unsafe_stack_register_backup), %rax
+8:
.endm
#else /* CONFIG_KAISER */
-.macro SWITCH_KERNEL_CR3 reg
+.macro SWITCH_KERNEL_CR3
.endm
-.macro SWITCH_USER_CR3 reg regb
+.macro SWITCH_USER_CR3
.endm
.macro SWITCH_KERNEL_CR3_NO_STACK
.endm
@@ -90,6 +95,16 @@ DECLARE_PER_CPU(unsigned long, x86_cr3_p
extern char __per_cpu_user_mapped_start[], __per_cpu_user_mapped_end[];
+extern int kaiser_enabled;
+#else
+#define kaiser_enabled 0
+#endif /* CONFIG_KAISER */
+
+/*
+ * Kaiser function prototypes are needed even when CONFIG_KAISER is not set,
+ * so as to build with tests on kaiser_enabled instead of #ifdefs.
+ */
+
/**
* kaiser_add_mapping - map a virtual memory part to the shadow (user) mapping
* @addr: the start address of the range
@@ -119,8 +134,6 @@ extern void kaiser_remove_mapping(unsign
*/
extern void kaiser_init(void);
-#endif /* CONFIG_KAISER */
-
#endif /* __ASSEMBLY */
#endif /* _ASM_X86_KAISER_H */
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -18,6 +18,12 @@
#ifndef __ASSEMBLY__
#include <asm/x86_init.h>
+#ifdef CONFIG_KAISER
+extern int kaiser_enabled;
+#else
+#define kaiser_enabled 0
+#endif
+
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
void ptdump_walk_pgd_level_checkwx(void);
@@ -660,7 +666,7 @@ static inline int pgd_bad(pgd_t pgd)
* page table by accident; it will fault on the first
* instruction it tries to run. See native_set_pgd().
*/
- if (IS_ENABLED(CONFIG_KAISER))
+ if (kaiser_enabled)
ignore_flags |= _PAGE_NX;
return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE;
@@ -865,12 +871,14 @@ static inline void pmdp_set_wrprotect(st
*/
static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
- memcpy(dst, src, count * sizeof(pgd_t));
+ memcpy(dst, src, count * sizeof(pgd_t));
#ifdef CONFIG_KAISER
- /* Clone the shadow pgd part as well */
- memcpy(native_get_shadow_pgd(dst),
- native_get_shadow_pgd(src),
- count * sizeof(pgd_t));
+ if (kaiser_enabled) {
+ /* Clone the shadow pgd part as well */
+ memcpy(native_get_shadow_pgd(dst),
+ native_get_shadow_pgd(src),
+ count * sizeof(pgd_t));
+ }
#endif
}
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -111,13 +111,12 @@ extern pgd_t kaiser_set_shadow_pgd(pgd_t
static inline pgd_t *native_get_shadow_pgd(pgd_t *pgdp)
{
+#ifdef CONFIG_DEBUG_VM
+ /* linux/mmdebug.h may not have been included at this point */
+ BUG_ON(!kaiser_enabled);
+#endif
return (pgd_t *)((unsigned long)pgdp | (unsigned long)PAGE_SIZE);
}
-
-static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
-{
- return (pgd_t *)((unsigned long)pgdp & ~(unsigned long)PAGE_SIZE);
-}
#else
static inline pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
{
@@ -128,10 +127,6 @@ static inline pgd_t *native_get_shadow_p
BUILD_BUG_ON(1);
return NULL;
}
-static inline pgd_t *native_get_normal_pgd(pgd_t *pgdp)
-{
- return pgdp;
-}
#endif /* CONFIG_KAISER */
static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -39,11 +39,7 @@
#define _PAGE_ACCESSED (_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED)
#define _PAGE_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY)
#define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE)
-#ifdef CONFIG_KAISER
-#define _PAGE_GLOBAL (_AT(pteval_t, 0))
-#else
#define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
-#endif
#define _PAGE_SOFTW1 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
#define _PAGE_SOFTW2 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
#define _PAGE_PAT (_AT(pteval_t, 1) << _PAGE_BIT_PAT)
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -136,9 +136,11 @@ static inline void cr4_set_bits_and_upda
* to avoid the need for asm/kaiser.h in unexpected places.
*/
#ifdef CONFIG_KAISER
+extern int kaiser_enabled;
extern void kaiser_setup_pcid(void);
extern void kaiser_flush_tlb_on_return_to_user(void);
#else
+#define kaiser_enabled 0
static inline void kaiser_setup_pcid(void)
{
}
@@ -163,7 +165,7 @@ static inline void __native_flush_tlb(vo
* back:
*/
preempt_disable();
- if (this_cpu_has(X86_FEATURE_PCID))
+ if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
kaiser_flush_tlb_on_return_to_user();
native_write_cr3(native_read_cr3());
preempt_enable();
@@ -174,20 +176,30 @@ static inline void __native_flush_tlb_gl
unsigned long cr4;
cr4 = this_cpu_read(cpu_tlbstate.cr4);
- /* clear PGE */
- native_write_cr4(cr4 & ~X86_CR4_PGE);
- /* write old PGE again and flush TLBs */
- native_write_cr4(cr4);
+ if (cr4 & X86_CR4_PGE) {
+ /* clear PGE and flush TLB of all entries */
+ native_write_cr4(cr4 & ~X86_CR4_PGE);
+ /* restore PGE as it was before */
+ native_write_cr4(cr4);
+ } else {
+ /*
+ * x86_64 microcode update comes this way when CR4.PGE is not
+ * enabled, and it's safer for all callers to allow this case.
+ */
+ native_write_cr3(native_read_cr3());
+ }
}
static inline void __native_flush_tlb_global(void)
{
-#ifdef CONFIG_KAISER
- /* Globals are not used at all */
- __native_flush_tlb();
-#else
unsigned long flags;
+ if (kaiser_enabled) {
+ /* Globals are not used at all */
+ __native_flush_tlb();
+ return;
+ }
+
if (this_cpu_has(X86_FEATURE_INVPCID)) {
/*
* Using INVPCID is considerably faster than a pair of writes
@@ -207,7 +219,6 @@ static inline void __native_flush_tlb_gl
raw_local_irq_save(flags);
__native_flush_tlb_global_irq_disabled();
raw_local_irq_restore(flags);
-#endif
}
static inline void __native_flush_tlb_single(unsigned long addr)
@@ -222,7 +233,7 @@ static inline void __native_flush_tlb_si
*/
if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) {
- if (this_cpu_has(X86_FEATURE_PCID))
+ if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID))
kaiser_flush_tlb_on_return_to_user();
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
return;
@@ -237,9 +248,9 @@ static inline void __native_flush_tlb_si
* Make sure to do only a single invpcid when KAISER is
* disabled and we have only a single ASID.
*/
- if (X86_CR3_PCID_ASID_KERN != X86_CR3_PCID_ASID_USER)
- invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
- invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+ if (kaiser_enabled)
+ invpcid_flush_one(X86_CR3_PCID_ASID_USER, addr);
+ invpcid_flush_one(X86_CR3_PCID_ASID_KERN, addr);
}
static inline void __flush_tlb_all(void)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -178,6 +178,20 @@ static int __init x86_pcid_setup(char *s
return 1;
}
__setup("nopcid", x86_pcid_setup);
+
+static int __init x86_nokaiser_setup(char *s)
+{
+ /* nokaiser doesn't accept parameters */
+ if (s)
+ return -EINVAL;
+#ifdef CONFIG_KAISER
+ kaiser_enabled = 0;
+ setup_clear_cpu_cap(X86_FEATURE_KAISER);
+ pr_info("nokaiser: KAISER feature disabled\n");
+#endif
+ return 0;
+}
+early_param("nokaiser", x86_nokaiser_setup);
#endif
static int __init x86_noinvpcid_setup(char *s)
@@ -324,7 +338,7 @@ static __always_inline void setup_smap(s
static void setup_pcid(struct cpuinfo_x86 *c)
{
if (cpu_has(c, X86_FEATURE_PCID)) {
- if (cpu_has(c, X86_FEATURE_PGE)) {
+ if (cpu_has(c, X86_FEATURE_PGE) || kaiser_enabled) {
cr4_set_bits(X86_CR4_PCIDE);
/*
* INVPCID has two "groups" of types:
@@ -747,6 +761,10 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
c->x86_power = cpuid_edx(0x80000007);
init_scattered_cpuid_features(c);
+#ifdef CONFIG_KAISER
+ if (kaiser_enabled)
+ set_cpu_cap(c, X86_FEATURE_KAISER);
+#endif
}
static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
@@ -1406,6 +1424,14 @@ void cpu_init(void)
* try to read it.
*/
cr4_init_shadow();
+ if (!kaiser_enabled) {
+ /*
+ * secondary_startup_64() deferred setting PGE in cr4:
+ * probe_page_size_mask() sets it on the boot cpu,
+ * but it needs to be set on each secondary cpu.
+ */
+ cr4_set_bits(X86_CR4_PGE);
+ }
/*
* Load microcode on this cpu if a valid microcode is available.
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -132,9 +132,10 @@ void __init init_espfix_bsp(void)
* area to ensure it is mapped into the shadow user page
* tables.
*/
- if (IS_ENABLED(CONFIG_KAISER))
+ if (kaiser_enabled) {
set_pgd(native_get_shadow_pgd(pgd_p),
__pgd(_KERNPG_TABLE | __pa((pud_t *)espfix_pud_page)));
+ }
/* Randomize the locations */
init_espfix_random();
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -183,8 +183,8 @@ ENTRY(secondary_startup_64)
movq $(init_level4_pgt - __START_KERNEL_map), %rax
1:
- /* Enable PAE mode and PGE */
- movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
+ /* Enable PAE and PSE, but defer PGE until kaiser_enabled is decided */
+ movl $(X86_CR4_PAE | X86_CR4_PSE), %ecx
movq %rcx, %cr4
/* Setup early boot stage 4 level pagetables. */
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -165,7 +165,7 @@ static void __init probe_page_size_mask(
cr4_set_bits_and_update_boot(X86_CR4_PSE);
/* Enable PGE if available */
- if (cpu_has_pge) {
+ if (cpu_has_pge && !kaiser_enabled) {
cr4_set_bits_and_update_boot(X86_CR4_PGE);
__supported_pte_mask |= _PAGE_GLOBAL;
} else
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -395,6 +395,16 @@ void __init cleanup_highmap(void)
continue;
if (vaddr < (unsigned long) _text || vaddr > end)
set_pmd(pmd, __pmd(0));
+ else if (kaiser_enabled) {
+ /*
+ * level2_kernel_pgt is initialized with _PAGE_GLOBAL:
+ * clear that now. This is not important, so long as
+ * CR4.PGE remains clear, but it removes an anomaly.
+ * Physical mapping setup below avoids _PAGE_GLOBAL
+ * by use of massage_pgprot() inside pfn_pte() etc.
+ */
+ set_pmd(pmd, pmd_clear_flags(*pmd, _PAGE_GLOBAL));
+ }
}
}
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -17,7 +17,9 @@
#include <asm/pgalloc.h>
#include <asm/desc.h>
-#ifdef CONFIG_KAISER
+int kaiser_enabled __read_mostly = 1;
+EXPORT_SYMBOL(kaiser_enabled); /* for inlined TLB flush functions */
+
__visible
DEFINE_PER_CPU_USER_MAPPED(unsigned long, unsafe_stack_register_backup);
@@ -168,8 +170,8 @@ static pte_t *kaiser_pagetable_walk(unsi
return pte_offset_kernel(pmd, address);
}
-int kaiser_add_user_map(const void *__start_addr, unsigned long size,
- unsigned long flags)
+static int kaiser_add_user_map(const void *__start_addr, unsigned long size,
+ unsigned long flags)
{
int ret = 0;
pte_t *pte;
@@ -178,6 +180,15 @@ int kaiser_add_user_map(const void *__st
unsigned long end_addr = PAGE_ALIGN(start_addr + size);
unsigned long target_address;
+ /*
+ * It is convenient for callers to pass in __PAGE_KERNEL etc,
+ * and there is no actual harm from setting _PAGE_GLOBAL, so
+ * long as CR4.PGE is not set. But it is nonetheless troubling
+ * to see Kaiser itself setting _PAGE_GLOBAL (now that "nokaiser"
+ * requires that not to be #defined to 0): so mask it off here.
+ */
+ flags &= ~_PAGE_GLOBAL;
+
for (; address < end_addr; address += PAGE_SIZE) {
target_address = get_pa_from_mapping(address);
if (target_address == -1) {
@@ -264,6 +275,8 @@ void __init kaiser_init(void)
{
int cpu;
+ if (!kaiser_enabled)
+ return;
kaiser_init_all_pgds();
for_each_possible_cpu(cpu) {
@@ -312,6 +325,8 @@ void __init kaiser_init(void)
/* Add a mapping to the shadow mapping, and synchronize the mappings */
int kaiser_add_mapping(unsigned long addr, unsigned long size, unsigned long flags)
{
+ if (!kaiser_enabled)
+ return 0;
return kaiser_add_user_map((const void *)addr, size, flags);
}
@@ -323,6 +338,8 @@ void kaiser_remove_mapping(unsigned long
unsigned long addr, next;
pgd_t *pgd;
+ if (!kaiser_enabled)
+ return;
pgd = native_get_shadow_pgd(pgd_offset_k(start));
for (addr = start; addr < end; pgd++, addr = next) {
next = pgd_addr_end(addr, end);
@@ -344,6 +361,8 @@ static inline bool is_userspace_pgd(pgd_
pgd_t kaiser_set_shadow_pgd(pgd_t *pgdp, pgd_t pgd)
{
+ if (!kaiser_enabled)
+ return pgd;
/*
* Do we need to also populate the shadow pgd? Check _PAGE_USER to
* skip cases like kexec and EFI which make temporary low mappings.
@@ -400,4 +419,3 @@ void kaiser_flush_tlb_on_return_to_user(
X86_CR3_PCID_USER_FLUSH | KAISER_SHADOW_PGD_OFFSET);
}
EXPORT_SYMBOL(kaiser_flush_tlb_on_return_to_user);
-#endif /* CONFIG_KAISER */
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -341,16 +341,12 @@ static inline void _pgd_free(pgd_t *pgd)
}
#else
-#ifdef CONFIG_KAISER
/*
- * Instead of one pmd, we aquire two pmds. Being order-1, it is
+ * Instead of one pgd, Kaiser acquires two pgds. Being order-1, it is
* both 8k in size and 8k-aligned. That lets us just flip bit 12
* in a pointer to swap between the two 4k halves.
*/
-#define PGD_ALLOCATION_ORDER 1
-#else
-#define PGD_ALLOCATION_ORDER 0
-#endif
+#define PGD_ALLOCATION_ORDER kaiser_enabled
static inline pgd_t *_pgd_alloc(void)
{
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -39,8 +39,7 @@ static void load_new_mm_cr3(pgd_t *pgdir
{
unsigned long new_mm_cr3 = __pa(pgdir);
-#ifdef CONFIG_KAISER
- if (this_cpu_has(X86_FEATURE_PCID)) {
+ if (kaiser_enabled && this_cpu_has(X86_FEATURE_PCID)) {
/*
* We reuse the same PCID for different tasks, so we must
* flush all the entries for the PCID out when we change tasks.
@@ -57,7 +56,6 @@ static void load_new_mm_cr3(pgd_t *pgdir
new_mm_cr3 |= X86_CR3_PCID_KERN_FLUSH;
kaiser_flush_tlb_on_return_to_user();
}
-#endif /* CONFIG_KAISER */
/*
* Caution: many callers of this function expect
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
This is a note to let you know that I've just added the patch titled
kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Jan 3 18:58:12 CET 2018
From: Hugh Dickins <hughd(a)google.com>
Date: Fri, 13 Oct 2017 12:10:00 -0700
Subject: kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
From: Hugh Dickins <hughd(a)google.com>
Synthetic filesystem mempressure testing has shown softlockups, with
hour-long page allocation stalls, and pgd_alloc() trying for order:1
with __GFP_REPEAT in one of the backtraces each time.
That's _pgd_alloc() going for a Kaiser double-pgd, using the __GFP_REPEAT
common to all page table allocations, but actually having no effect on
order:0 (see should_alloc_oom() and should_continue_reclaim() in this
tree, but beware that ports to another tree might behave differently).
Order:1 stack allocation has been working satisfactorily without
__GFP_REPEAT forever, and page table allocation only asks __GFP_REPEAT
for awkward occasions in a long-running process: it's not appropriate
at fork or exec time, and seems to be doing much more harm than good:
getting those contiguous pages under very heavy mempressure can be
hard (though even without it, Kaiser does generate more mempressure).
Mask out that __GFP_REPEAT inside _pgd_alloc(). Why not take it out
of the PGALLOG_GFP altogether, as v4.7 commit a3a9a59d2067 ("x86: get
rid of superfluous __GFP_REPEAT") did? Because I think that might
make a difference to our page table memcg charging, which I'd prefer
not to interfere with at this time.
hughd adds: __alloc_pages_slowpath() in the 4.4.89-stable tree handles
__GFP_REPEAT a little differently than in prod kernel or 3.18.72-stable,
so it may not always be exactly a no-op on order:0 pages, as said above;
but I think still appropriate to omit it from Kaiser or non-Kaiser pgd.
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/pgtable.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -6,7 +6,7 @@
#include <asm/fixmap.h>
#include <asm/mtrr.h>
-#define PGALLOC_GFP GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO
+#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
#ifdef CONFIG_HIGHPTE
#define PGALLOC_USER_GFP __GFP_HIGHMEM
@@ -354,7 +354,9 @@ static inline void _pgd_free(pgd_t *pgd)
static inline pgd_t *_pgd_alloc(void)
{
- return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER);
+ /* No __GFP_REPEAT: to avoid page allocation stalls in order-1 case */
+ return (pgd_t *)__get_free_pages(PGALLOC_GFP & ~__GFP_REPEAT,
+ PGD_ALLOCATION_ORDER);
}
static inline void _pgd_free(pgd_t *pgd)
Patches currently in stable-queue which might be from hughd(a)google.com are
queue-4.4/kaiser-vmstat-show-nr_kaisertable-as-nr_overhead.patch
queue-4.4/kaiser-add-nokaiser-boot-option-using-alternative.patch
queue-4.4/kaiser-fix-unlikely-error-in-alloc_ldt_struct.patch
queue-4.4/kaiser-_pgd_alloc-without-__gfp_repeat-to-avoid-stalls.patch
queue-4.4/kaiser-kaiser_flush_tlb_on_return_to_user-check-pcid.patch
queue-4.4/x86-paravirt-dont-patch-flush_tlb_single.patch
queue-4.4/kaiser-merged-update.patch
queue-4.4/kaiser-delete-kaiser_real_switch-option.patch
queue-4.4/kaiser-kaiser_remove_mapping-move-along-the-pgd.patch
queue-4.4/kaiser-fix-perf-crashes.patch
queue-4.4/kaiser-drop-is_atomic-arg-to-kaiser_pagetable_walk.patch
queue-4.4/kaiser-load_new_mm_cr3-let-switch_user_cr3-flush-user.patch
queue-4.4/kaiser-enhanced-by-kernel-and-user-pcids.patch
queue-4.4/kaiser-x86_cr3_pcid_noflush-and-x86_cr3_pcid_user.patch
queue-4.4/kaiser-use-alternative-instead-of-x86_cr3_pcid_noflush.patch
queue-4.4/kaiser-stack-map-page_size-at-thread_size-page_size.patch
queue-4.4/kaiser-name-that-0x1000-kaiser_shadow_pgd_offset.patch
queue-4.4/kaiser-fix-regs-to-do_nmi-ifndef-config_kaiser.patch
queue-4.4/kaiser-do-not-set-_page_nx-on-pgd_none.patch
queue-4.4/kaiser-tidied-up-asm-kaiser.h-somewhat.patch
queue-4.4/kaiser-cleanups-while-trying-for-gold-link.patch
queue-4.4/kaiser-tidied-up-kaiser_add-remove_mapping-slightly.patch
queue-4.4/kaiser-fix-build-and-fixme-in-alloc_ldt_struct.patch
queue-4.4/kaiser-kernel-address-isolation.patch
queue-4.4/kaiser-enomem-if-kaiser_pagetable_walk-null.patch
queue-4.4/kaiser-asm-tlbflush.h-handle-nopge-at-lower-level.patch
queue-4.4/kaiser-paranoid_entry-pass-cr3-need-to-paranoid_exit.patch
queue-4.4/kaiser-kaiser-depends-on-smp.patch
queue-4.4/kaiser-pcid-0-for-kernel-and-128-for-user.patch
On Wed, Jan 03, 2018 at 03:41:21PM +0000, Mark Brown wrote:
> On Wed, Jan 03, 2018 at 04:32:35PM +0100, Arnd Bergmann wrote:
>
> > It might be a problem with different 'make' versions, or something else
> > in the environment that leads to "-Wno-frame-address" not being set
> > on the command line instead.
>
> I'm using "GNU Make 4.1" - this is all just the default Debian stuff,
> I've not customized my x86 host environment.
Make 4.2.1 here with gcc 7.2.1 and I get tons of build warnings on my
3.18.y tree on my laptop, but my build machine running Fedora (make
4.2.1 and gcc 7.2.1) I get no build warnings at all.
I've given up trying to track it down for my laptop, something is
preventing the compiler flag from being added to the gcc command line,
and I don't care anymore...
thanks,
greg k-h
Commit-ID: f24c4d478013d82bd1b943df566fff3561d52864
Gitweb: https://git.kernel.org/tip/f24c4d478013d82bd1b943df566fff3561d52864
Author: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
AuthorDate: Tue, 2 Jan 2018 17:21:10 +0000
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitDate: Wed, 3 Jan 2018 13:54:31 +0100
efi/capsule-loader: Reinstate virtual capsule mapping
Commit:
82c3768b8d68 ("efi/capsule-loader: Use a cached copy of the capsule header")
... refactored the capsule loading code that maps the capsule header,
to avoid having to map it several times.
However, as it turns out, the vmap() call we ended up removing did not
just map the header, but the entire capsule image, and dropping this
virtual mapping breaks capsules that are processed by the firmware
immediately (i.e., without a reboot).
Unfortunately, that change was part of a larger refactor that allowed
a quirk to be implemented for Quark, which has a non-standard memory
layout for capsules, and we have slightly painted ourselves into a
corner by allowing quirk code to mangle the capsule header and memory
layout.
So we need to fix this without breaking Quark. Fortunately, Quark does
not appear to care about the virtual mapping, and so we can simply
do a partial revert of commit:
2a457fb31df6 ("efi/capsule-loader: Use page addresses rather than struct page pointers")
... and create a vmap() mapping of the entire capsule (including header)
based on the reinstated struct page array, unless running on Quark, in
which case we pass the capsule header copy as before.
Reported-by: Ge Song <ge.song(a)hxt-semitech.com>
Tested-by: Bryan O'Donoghue <pure.logic(a)nexus-software.ie>
Tested-by: Ge Song <ge.song(a)hxt-semitech.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: <stable(a)vger.kernel.org>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-efi(a)vger.kernel.org
Fixes: 82c3768b8d68 ("efi/capsule-loader: Use a cached copy of the capsule header")
Link: http://lkml.kernel.org/r/20180102172110.17018-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
---
arch/x86/platform/efi/quirks.c | 13 +++++++++-
drivers/firmware/efi/capsule-loader.c | 45 ++++++++++++++++++++++++++++-------
include/linux/efi.h | 4 +++-
3 files changed, 52 insertions(+), 10 deletions(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 8a99a2e..5b513cc 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -592,7 +592,18 @@ static int qrk_capsule_setup_info(struct capsule_info *cap_info, void **pkbuff,
/*
* Update the first page pointer to skip over the CSH header.
*/
- cap_info->pages[0] += csh->headersize;
+ cap_info->phys[0] += csh->headersize;
+
+ /*
+ * cap_info->capsule should point at a virtual mapping of the entire
+ * capsule, starting at the capsule header. Our image has the Quark
+ * security header prepended, so we cannot rely on the default vmap()
+ * mapping created by the generic capsule code.
+ * Given that the Quark firmware does not appear to care about the
+ * virtual mapping, let's just point cap_info->capsule at our copy
+ * of the capsule header.
+ */
+ cap_info->capsule = &cap_info->header;
return 1;
}
diff --git a/drivers/firmware/efi/capsule-loader.c b/drivers/firmware/efi/capsule-loader.c
index ec8ac5c..055e2e8 100644
--- a/drivers/firmware/efi/capsule-loader.c
+++ b/drivers/firmware/efi/capsule-loader.c
@@ -20,10 +20,6 @@
#define NO_FURTHER_WRITE_ACTION -1
-#ifndef phys_to_page
-#define phys_to_page(x) pfn_to_page((x) >> PAGE_SHIFT)
-#endif
-
/**
* efi_free_all_buff_pages - free all previous allocated buffer pages
* @cap_info: pointer to current instance of capsule_info structure
@@ -35,7 +31,7 @@
static void efi_free_all_buff_pages(struct capsule_info *cap_info)
{
while (cap_info->index > 0)
- __free_page(phys_to_page(cap_info->pages[--cap_info->index]));
+ __free_page(cap_info->pages[--cap_info->index]);
cap_info->index = NO_FURTHER_WRITE_ACTION;
}
@@ -71,6 +67,14 @@ int __efi_capsule_setup_info(struct capsule_info *cap_info)
cap_info->pages = temp_page;
+ temp_page = krealloc(cap_info->phys,
+ pages_needed * sizeof(phys_addr_t *),
+ GFP_KERNEL | __GFP_ZERO);
+ if (!temp_page)
+ return -ENOMEM;
+
+ cap_info->phys = temp_page;
+
return 0;
}
@@ -105,9 +109,24 @@ int __weak efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff,
**/
static ssize_t efi_capsule_submit_update(struct capsule_info *cap_info)
{
+ bool do_vunmap = false;
int ret;
- ret = efi_capsule_update(&cap_info->header, cap_info->pages);
+ /*
+ * cap_info->capsule may have been assigned already by a quirk
+ * handler, so only overwrite it if it is NULL
+ */
+ if (!cap_info->capsule) {
+ cap_info->capsule = vmap(cap_info->pages, cap_info->index,
+ VM_MAP, PAGE_KERNEL);
+ if (!cap_info->capsule)
+ return -ENOMEM;
+ do_vunmap = true;
+ }
+
+ ret = efi_capsule_update(cap_info->capsule, cap_info->phys);
+ if (do_vunmap)
+ vunmap(cap_info->capsule);
if (ret) {
pr_err("capsule update failed\n");
return ret;
@@ -165,10 +184,12 @@ static ssize_t efi_capsule_write(struct file *file, const char __user *buff,
goto failed;
}
- cap_info->pages[cap_info->index++] = page_to_phys(page);
+ cap_info->pages[cap_info->index] = page;
+ cap_info->phys[cap_info->index] = page_to_phys(page);
cap_info->page_bytes_remain = PAGE_SIZE;
+ cap_info->index++;
} else {
- page = phys_to_page(cap_info->pages[cap_info->index - 1]);
+ page = cap_info->pages[cap_info->index - 1];
}
kbuff = kmap(page);
@@ -252,6 +273,7 @@ static int efi_capsule_release(struct inode *inode, struct file *file)
struct capsule_info *cap_info = file->private_data;
kfree(cap_info->pages);
+ kfree(cap_info->phys);
kfree(file->private_data);
file->private_data = NULL;
return 0;
@@ -281,6 +303,13 @@ static int efi_capsule_open(struct inode *inode, struct file *file)
return -ENOMEM;
}
+ cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL);
+ if (!cap_info->phys) {
+ kfree(cap_info->pages);
+ kfree(cap_info);
+ return -ENOMEM;
+ }
+
file->private_data = cap_info;
return 0;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index d813f7b..29fdf80 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -140,11 +140,13 @@ struct efi_boot_memmap {
struct capsule_info {
efi_capsule_header_t header;
+ efi_capsule_header_t *capsule;
int reset_type;
long index;
size_t count;
size_t total_size;
- phys_addr_t *pages;
+ struct page **pages;
+ phys_addr_t *phys;
size_t page_bytes_remain;
};
The patch
ASoC: skl: Fix kernel warning due to zero NHTL entry
has been applied to the asoc tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
>From 20a1ea2222e7cbf96e9bf8579362e971491e6aea Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Wed, 3 Jan 2018 16:38:46 +0100
Subject: [PATCH] ASoC: skl: Fix kernel warning due to zero NHTL entry
I got the following kernel warning when loading snd-soc-skl module on
Dell Latitude 7270 laptop:
memremap attempted on mixed range 0x0000000000000000 size: 0x0
WARNING: CPU: 0 PID: 484 at kernel/memremap.c:98 memremap+0x8a/0x180
Call Trace:
skl_nhlt_init+0x82/0xf0 [snd_soc_skl]
skl_probe+0x2ee/0x7c0 [snd_soc_skl]
....
It seems that the machine doesn't support the SKL DSP gives the empty
NHLT entry, and it triggers the warning. For avoiding it, let do the
zero check before calling memremap().
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
sound/soc/intel/skylake/skl-nhlt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/soc/intel/skylake/skl-nhlt.c b/sound/soc/intel/skylake/skl-nhlt.c
index d14c50a60289..1ce414d86d8a 100644
--- a/sound/soc/intel/skylake/skl-nhlt.c
+++ b/sound/soc/intel/skylake/skl-nhlt.c
@@ -43,7 +43,8 @@ struct nhlt_acpi_table *skl_nhlt_init(struct device *dev)
obj = acpi_evaluate_dsm(handle, &osc_guid, 1, 1, NULL);
if (obj && obj->type == ACPI_TYPE_BUFFER) {
nhlt_ptr = (struct nhlt_resource_desc *)obj->buffer.pointer;
- nhlt_table = (struct nhlt_acpi_table *)
+ if (nhlt_ptr->length)
+ nhlt_table = (struct nhlt_acpi_table *)
memremap(nhlt_ptr->min_addr, nhlt_ptr->length,
MEMREMAP_WB);
ACPI_FREE(obj);
--
2.15.1
On Wed, Jan 3, 2018 at 3:59 PM, Mark Brown <broonie(a)kernel.org> wrote:
> On Wed, Jan 03, 2018 at 03:52:12PM +0100, Arnd Bergmann wrote:
>
>> Greg and Mark, which x86-64 compiler versions do you use?
>
> gcc (Debian 6.3.0-18) 6.3.0 20170516
I installed that version here, same as my other gcc-6 versions, the
"-Wno-frame-address" option gets set and interpreted correctly:
$ git checkout v3.18.91 ; touch kernel/sched/core.c ; make V=1
defconfig kernel/sched/core.o CC=gcc-6 2>&1 | tail
...
gcc-6 -Wp,-MD,kernel/sched/.core.o.d -nostdinc -isystem
/usr/lib/gcc/x86_64-linux-gnu/6/include -I./arch/x86/include
-Iarch/x86/include/generated -Iinclude -I./arch/x86/include/uapi
-Iarch/x86/include/generated/uapi -I./include/uapi
-Iinclude/generated/uapi -include ./include/linux/kconfig.h
-D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
-fno-strict-aliasing -fno-common -Werror-implicit-function-declaration
-Wno-format-security -std=gnu89 -m64 -mno-80387 -mno-fp-ret-in-387
-mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time
-maccumulate-outgoing-args -DCONFIG_AS_CFI=1
-DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1
-DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1
-DCONFIG_AS_AVX2=1 -pipe -Wno-sign-compare
-fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
-mno-avx -fno-delete-null-pointer-checks -Wno-frame-address -fno-PIE
-O2 --param=allow-store-data-races=0 -Wframe-larger-than=2048
-fno-stack-protector -Wno-unused-but-set-variable
-Wno-unused-const-variable -fno-omit-frame-pointer
-fno-optimize-sibling-calls -fno-var-tracking-assignments
-Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow
-fno-stack-check -fconserve-stack -Werror=implicit-int
-Werror=strict-prototypes -Werror=date-time -DCC_HAVE_ASM_GOTO
-D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(core)"
-D"KBUILD_MODNAME=KBUILD_STR(core)" -c -o kernel/sched/core.o
kernel/sched/core.c
It might be a problem with different 'make' versions, or something else
in the environment that leads to "-Wno-frame-address" not being set
on the command line instead.
Arnd
This is a note to let you know that I've just added the patch titled
x86/boot: Add early cmdline parsing for options with arguments
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-boot-add-early-cmdline-parsing-for-options-with-arguments.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e505371dd83963caae1a37ead9524e8d997341be Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Mon, 17 Jul 2017 16:10:33 -0500
Subject: x86/boot: Add early cmdline parsing for options with arguments
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Tom Lendacky <thomas.lendacky(a)amd.com>
commit e505371dd83963caae1a37ead9524e8d997341be upstream.
Add a cmdline_find_option() function to look for cmdline options that
take arguments. The argument is returned in a supplied buffer and the
argument length (regardless of whether it fits in the supplied buffer)
is returned, with -1 indicating not found.
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: Larry Woodman <lwoodman(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst(a)redhat.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Toshimitsu Kani <toshi.kani(a)hpe.com>
Cc: kasan-dev(a)googlegroups.com
Cc: kvm(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-efi(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/36b5f97492a9745dce27682305f990fc20e5cf8a.150031921…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/cmdline.h | 2
arch/x86/lib/cmdline.c | 105 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 107 insertions(+)
--- a/arch/x86/include/asm/cmdline.h
+++ b/arch/x86/include/asm/cmdline.h
@@ -2,5 +2,7 @@
#define _ASM_X86_CMDLINE_H
int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
+int cmdline_find_option(const char *cmdline_ptr, const char *option,
+ char *buffer, int bufsize);
#endif /* _ASM_X86_CMDLINE_H */
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -82,3 +82,108 @@ int cmdline_find_option_bool(const char
return 0; /* Buffer overrun */
}
+
+/*
+ * Find a non-boolean option (i.e. option=argument). In accordance with
+ * standard Linux practice, if this option is repeated, this returns the
+ * last instance on the command line.
+ *
+ * @cmdline: the cmdline string
+ * @max_cmdline_size: the maximum size of cmdline
+ * @option: option string to look for
+ * @buffer: memory buffer to return the option argument
+ * @bufsize: size of the supplied memory buffer
+ *
+ * Returns the length of the argument (regardless of if it was
+ * truncated to fit in the buffer), or -1 on not found.
+ */
+static int
+__cmdline_find_option(const char *cmdline, int max_cmdline_size,
+ const char *option, char *buffer, int bufsize)
+{
+ char c;
+ int pos = 0, len = -1;
+ const char *opptr = NULL;
+ char *bufptr = buffer;
+ enum {
+ st_wordstart = 0, /* Start of word/after whitespace */
+ st_wordcmp, /* Comparing this word */
+ st_wordskip, /* Miscompare, skip */
+ st_bufcpy, /* Copying this to buffer */
+ } state = st_wordstart;
+
+ if (!cmdline)
+ return -1; /* No command line */
+
+ /*
+ * This 'pos' check ensures we do not overrun
+ * a non-NULL-terminated 'cmdline'
+ */
+ while (pos++ < max_cmdline_size) {
+ c = *(char *)cmdline++;
+ if (!c)
+ break;
+
+ switch (state) {
+ case st_wordstart:
+ if (myisspace(c))
+ break;
+
+ state = st_wordcmp;
+ opptr = option;
+ /* fall through */
+
+ case st_wordcmp:
+ if ((c == '=') && !*opptr) {
+ /*
+ * We matched all the way to the end of the
+ * option we were looking for, prepare to
+ * copy the argument.
+ */
+ len = 0;
+ bufptr = buffer;
+ state = st_bufcpy;
+ break;
+ } else if (c == *opptr++) {
+ /*
+ * We are currently matching, so continue
+ * to the next character on the cmdline.
+ */
+ break;
+ }
+ state = st_wordskip;
+ /* fall through */
+
+ case st_wordskip:
+ if (myisspace(c))
+ state = st_wordstart;
+ break;
+
+ case st_bufcpy:
+ if (myisspace(c)) {
+ state = st_wordstart;
+ } else {
+ /*
+ * Increment len, but don't overrun the
+ * supplied buffer and leave room for the
+ * NULL terminator.
+ */
+ if (++len < bufsize)
+ *bufptr++ = c;
+ }
+ break;
+ }
+ }
+
+ if (bufsize)
+ *bufptr = '\0';
+
+ return len;
+}
+
+int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
+ int bufsize)
+{
+ return __cmdline_find_option(cmdline, COMMAND_LINE_SIZE, option,
+ buffer, bufsize);
+}
Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are
queue-4.4/x86-boot-add-early-cmdline-parsing-for-options-with-arguments.patch
This is a note to let you know that I've just added the patch titled
x86/boot: Add early cmdline parsing for options with arguments
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-boot-add-early-cmdline-parsing-for-options-with-arguments.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e505371dd83963caae1a37ead9524e8d997341be Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Mon, 17 Jul 2017 16:10:33 -0500
Subject: x86/boot: Add early cmdline parsing for options with arguments
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Tom Lendacky <thomas.lendacky(a)amd.com>
commit e505371dd83963caae1a37ead9524e8d997341be upstream.
Add a cmdline_find_option() function to look for cmdline options that
take arguments. The argument is returned in a supplied buffer and the
argument length (regardless of whether it fits in the supplied buffer)
is returned, with -1 indicating not found.
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: Larry Woodman <lwoodman(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst(a)redhat.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Toshimitsu Kani <toshi.kani(a)hpe.com>
Cc: kasan-dev(a)googlegroups.com
Cc: kvm(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-efi(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Link: http://lkml.kernel.org/r/36b5f97492a9745dce27682305f990fc20e5cf8a.150031921…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/cmdline.h | 2
arch/x86/lib/cmdline.c | 105 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 107 insertions(+)
--- a/arch/x86/include/asm/cmdline.h
+++ b/arch/x86/include/asm/cmdline.h
@@ -2,5 +2,7 @@
#define _ASM_X86_CMDLINE_H
int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
+int cmdline_find_option(const char *cmdline_ptr, const char *option,
+ char *buffer, int bufsize);
#endif /* _ASM_X86_CMDLINE_H */
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -104,7 +104,112 @@ __cmdline_find_option_bool(const char *c
return 0; /* Buffer overrun */
}
+/*
+ * Find a non-boolean option (i.e. option=argument). In accordance with
+ * standard Linux practice, if this option is repeated, this returns the
+ * last instance on the command line.
+ *
+ * @cmdline: the cmdline string
+ * @max_cmdline_size: the maximum size of cmdline
+ * @option: option string to look for
+ * @buffer: memory buffer to return the option argument
+ * @bufsize: size of the supplied memory buffer
+ *
+ * Returns the length of the argument (regardless of if it was
+ * truncated to fit in the buffer), or -1 on not found.
+ */
+static int
+__cmdline_find_option(const char *cmdline, int max_cmdline_size,
+ const char *option, char *buffer, int bufsize)
+{
+ char c;
+ int pos = 0, len = -1;
+ const char *opptr = NULL;
+ char *bufptr = buffer;
+ enum {
+ st_wordstart = 0, /* Start of word/after whitespace */
+ st_wordcmp, /* Comparing this word */
+ st_wordskip, /* Miscompare, skip */
+ st_bufcpy, /* Copying this to buffer */
+ } state = st_wordstart;
+
+ if (!cmdline)
+ return -1; /* No command line */
+
+ /*
+ * This 'pos' check ensures we do not overrun
+ * a non-NULL-terminated 'cmdline'
+ */
+ while (pos++ < max_cmdline_size) {
+ c = *(char *)cmdline++;
+ if (!c)
+ break;
+
+ switch (state) {
+ case st_wordstart:
+ if (myisspace(c))
+ break;
+
+ state = st_wordcmp;
+ opptr = option;
+ /* fall through */
+
+ case st_wordcmp:
+ if ((c == '=') && !*opptr) {
+ /*
+ * We matched all the way to the end of the
+ * option we were looking for, prepare to
+ * copy the argument.
+ */
+ len = 0;
+ bufptr = buffer;
+ state = st_bufcpy;
+ break;
+ } else if (c == *opptr++) {
+ /*
+ * We are currently matching, so continue
+ * to the next character on the cmdline.
+ */
+ break;
+ }
+ state = st_wordskip;
+ /* fall through */
+
+ case st_wordskip:
+ if (myisspace(c))
+ state = st_wordstart;
+ break;
+
+ case st_bufcpy:
+ if (myisspace(c)) {
+ state = st_wordstart;
+ } else {
+ /*
+ * Increment len, but don't overrun the
+ * supplied buffer and leave room for the
+ * NULL terminator.
+ */
+ if (++len < bufsize)
+ *bufptr++ = c;
+ }
+ break;
+ }
+ }
+
+ if (bufsize)
+ *bufptr = '\0';
+
+ return len;
+}
+
int cmdline_find_option_bool(const char *cmdline, const char *option)
{
return __cmdline_find_option_bool(cmdline, COMMAND_LINE_SIZE, option);
}
+
+int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
+ int bufsize)
+{
+ return __cmdline_find_option(cmdline, COMMAND_LINE_SIZE, option,
+ buffer, bufsize);
+}
Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are
queue-4.9/x86-boot-add-early-cmdline-parsing-for-options-with-arguments.patch
If we are asked for number of entries of an offset bigger than the
sg list we should not crash.
Cc: stable(a)vger.kernel.org
Signed-off-by: Gilad Ben-Yossef <gilad(a)benyossef.com>
---
drivers/staging/ccree/ssi_buffer_mgr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/ccree/ssi_buffer_mgr.c b/drivers/staging/ccree/ssi_buffer_mgr.c
index 13cb062..9e3f557 100644
--- a/drivers/staging/ccree/ssi_buffer_mgr.c
+++ b/drivers/staging/ccree/ssi_buffer_mgr.c
@@ -94,7 +94,7 @@ static unsigned int cc_get_sgl_nents(struct device *dev,
{
unsigned int nents = 0;
- while (nbytes) {
+ while (nbytes && sg_list) {
if (sg_list->length) {
nents++;
/* get the number of bytes in the last entry */
--
2.7.4
If we ran out of DMA pool buffers, we get into the unmap
code path with a NULL before. Deal with this by checking
the virtual mapping is not NULL.
Cc: stable(a)vger.kernel.org
Signed-off-by: Gilad Ben-Yossef <gilad(a)benyossef.com>
---
drivers/staging/ccree/ssi_buffer_mgr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/staging/ccree/ssi_buffer_mgr.c b/drivers/staging/ccree/ssi_buffer_mgr.c
index 5e3cff3..13cb062 100644
--- a/drivers/staging/ccree/ssi_buffer_mgr.c
+++ b/drivers/staging/ccree/ssi_buffer_mgr.c
@@ -465,7 +465,8 @@ void cc_unmap_blkcipher_request(struct device *dev, void *ctx,
DMA_TO_DEVICE);
}
/* Release pool */
- if (req_ctx->dma_buf_type == CC_DMA_BUF_MLLI) {
+ if (req_ctx->dma_buf_type == CC_DMA_BUF_MLLI &&
+ req_ctx->mlli_params.mlli_virt_addr) {
dma_pool_free(req_ctx->mlli_params.curr_pool,
req_ctx->mlli_params.mlli_virt_addr,
req_ctx->mlli_params.mlli_dma_addr);
--
2.7.4
On Wed, Dec 20, 2017 at 11:49 PM, kernelci.org bot <bot(a)kernelci.org> wrote:
> Errors summary:
> 1 arch/mips/sgi-ip22/Platform:29: *** gcc doesn't support needed option -mr10k-cache-barrier=store. Stop.
This needs a backport of
c018595d83a3 ("MIPS: ip22: Fix ip28 build for modern gcc")
> 6 include/linux/irqdesc.h:94:33: error: 'NR_IRQS' undeclared here (not in a function)
> 1 arch/mips/jz4740/gpio.c:46:32: error: initializer element is not constant
> 1 arch/mips/jz4740/gpio.c:45:32: error: initializer element is not constant
> 1 arch/mips/jz4740/gpio.c:44:32: error: initializer element is not constant
> 1 arch/mips/jz4740/gpio.c:43:32: error: initializer element is not constant
> 1 arch/mips/jz4740/gpio.c:426:14: error: implicit declaration of function 'JZ4740_IRQ_INTC_GPIO' [-Werror=implicit-function-declaration]
> 1 arch/mips/jz4740/gpio.c:268:9: error: implicit declaration of function 'JZ4740_IRQ_GPIO' [-Werror=implicit-function-declaration]
Not sure, may have been fixed by
832f5dacfa0b ("MIPS: Remove all the uses of custom gpio.h")
which doesn't look appropriate for stable backports
> Warnings summary:
> 81 include/linux/blkdev.h:624:26: warning: switch condition has boolean value [-Wswitch-bool]
> 23 drivers/mtd/mtd_blkdevs.c:100:2: warning: switch condition has boolean value [-Wswitch-bool]
10fbd36e362a ("blk: rq_data_dir() should not return a boolean")
> 35 mm/page_alloc.c:5435:34: warning: array subscript is below array bounds [-Warray-bounds]
I think this was fixed by
90cae1fe1c35 ("mm/init: fix zone boundary creation")
which was intended to address another problem but got backported to 4.4
> 20 arch/arm/include/asm/cmpxchg.h:122:3: warning: value computed is not used [-Wunused-value]
> 5 arch/arm/include/asm/cmpxchg.h:205:3: warning: value computed is not used [-Wunused-value]
d975440bf80c ("asm-generic: cmpxchg: avoid warnings from macro-ized
cmpxchg() implementations")
> 10 drivers/rtc/rtc-pcf8563.c:444:5: warning: 'alm_pending' may be used uninitialized in this function [-Wmaybe-uninitialized]
cd1420d3a909 ("rtc: pfc8563: fix uninitialized variable warning")
> 6 crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> 1 crypto/wp512.c:987:1: warning: the frame size of 1344 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> 1 crypto/wp512.c:987:1: warning: the frame size of 1280 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> 1 crypto/wp512.c:987:1: warning: the frame size of 1152 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> 1 crypto/wp512.c:987:1: warning: the frame size of 1144 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> 1 crypto/wp512.c:987:1: warning: the frame size of 1120 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> 1 crypto/serpent_generic.c:436:1: warning: the frame size of 1472 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> 1 crypto/serpent_generic.c:436:1: warning: the frame size of 1456 bytes is larger than 1024 bytes [-Wframe-larger-than=]
7d6e9105026 ("crypto: improve gcc optimization flags for serpent and wp512")
> 4 fs/nfsd/nfs4state.c:3962:3: warning: 'old_deny_bmap' may be used uninitialized in this function [-Wmaybe-uninitialized]
6ac75368e1a6 ("nfsd: work around a gcc-5.1 warning")
> 3 drivers/net/wireless/brcm80211/brcmfmac/fwsignal.c:1478:8: warning: 'skb' may be used uninitialized in this function [-Wmaybe-uninitialized]
22f44150aad7 ("brcmfmac: avoid gcc-5.1 warning")
> 2 lib/iommu-common.c:15:24: warning: large integer implicitly truncated to unsigned type [-Woverflow]
447f6a95a9c8 ("lib/iommu-common.c: do not use 0xffffffffffffffffl for
computing align_mask")
> 2 drivers/hid/hid-input.c:1163:67: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]
09a5c34e8d6b ("HID: hid-input: Add parentheses to quell gcc warning")
> 2 drivers/atm/iphase.c:1176:12: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
cbb41b91e68a ("atm: iphase: fix misleading indention")
> 2 arch/x86/include/asm/msr.h:209:23: warning: right shift count >= width of type [-Wshift-count-overflow]
cf991de2f614 ("x86/asm/msr: Make wrmsrl_safe() a function")
47edb65178cb ("x86/asm/msr: Make wrmsrl() a function")
> 1 fs/gfs2/dir.c:978:8: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 fs/gfs2/dir.c:759:9: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
d39cb4a59729 ("gfs2: avoid uninitialized variable warning")
> 1 drivers/net/ethernet/ti/cpmac.c:1238:2: warning: #warning FIXME: unhardcode gpio&reset bits [-Wcpp]
d43e6fb4ac4a ("cpmac: remove hopeless #warning")
> 1 drivers/scsi/mvsas/mv_sas.c:736:3: warning: this 'else' clause does not guard... [-Wmisleading-indentation]
7789cd39274c ("mvsas: fix misleading indentation")
> 1 drivers/scsi/bfa/bfa_ioc.c:3673:4: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
> 1 drivers/scsi/bfa/bfa_ioc.c:3665:3: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
b7f4d6343820 ("bfa: Fix indentation")
> 1 arch/mips/ralink/rt305x.c:94:13: warning: 'rt305x_wdt_reset' defined but not used [-Wunused-function]
886f9c69fc68 ("MIPS: ralink: Remove unused rt*_wdt_reset functions")
> 1 arch/mips/ralink/prom.c:65:2: warning: 'argv' is used uninitialized in this function [-Wuninitialized]
> 1 arch/mips/ralink/prom.c:65:2: warning: 'argc' is used uninitialized in this function [-Wuninitialized]
9c48568b3692 ("MIPS: ralink: Cosmetic change to prom_init().")
> 1 arch/mips/dec/int-handler.S:198: Warning: macro instruction expanded into multiple instructions in a branch delay slot
> 1 arch/mips/dec/int-handler.S:149: Warning: macro instruction expanded into multiple instructions in a branch delay slot
3021773c7c3e ("MIPS: DEC: Avoid la pseudo-instruction in delay slots")
> 1 arch/arm/kernel/head-nommu.S:167: Warning: Use of r13 as a source register is deprecated when r15 is the destination register.
970d96f9a81b ("ARM: 8383/1: nommu: avoid deprecated source register on mov")
> 1 .config:871:warning: override: NOHIGHMEM changes choice state
> 1 .config:870:warning: override: SLOB changes choice state
> 1 .config:868:warning: override: KERNEL_XZ changes choice state
> 1 .config:829:warning: override: SLOB changes choice state
> 1 .config:827:warning: override: KERNEL_XZ changes choice state
> 1 .config:794:warning: override: SLOB changes choice state
> 1 .config:771:warning: override: SLOB changes choice state
236dec051078 ("kconfig: tinyconfig: provide whole choice blocks to
avoid warnings")
> 1 drivers/scsi/be2iscsi/be_main.c:3168:18: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]
dd29dae00d39 ("be2iscsi: Fix bogus WARN_ON length check")
> 1 drivers/scsi/aic94xx/aic94xx_sds.c:597:2: warning: 'offs' may be used uninitialized in this function [-Wmaybe-uninitialized]
36dd5acd1965 ("aic94xx: Skip reading user settings if flash is not found")
> 1 drivers/rtc/rtc-armada38x.c:91:22: warning: unused variable 'flags' [-Wunused-variable]
d80238bbcad3 ("rtc: armada38x: Remove unused variable from
armada38x_rtc_set_time()")
> 1 drivers/net/wireless/iwlegacy/3945.c:1022:5: warning: suggest explicit braces to avoid ambiguous 'else' [-Wparentheses]
2cce76c3fab4 ("iwlegacy: avoid warning about missing braces")
> 1 drivers/net/ethernet/neterion/vxge/vxge-main.c:2149:13: warning: 'adaptive_coalesce_rx_interrupts' defined but not used [-Wunused-function]
> 1 drivers/net/ethernet/neterion/vxge/vxge-main.c:2121:13: warning: 'adaptive_coalesce_tx_interrupts' defined but not used [-Wunused-function]
57e7c8cef224 ("net: vxge: avoid unused function warnings")
> 1 drivers/net/ethernet/dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined [-Wcpp]
de92718883dd ("net: tulip: turn compile-time warning into dev_warn()")
> 1 drivers/net/ethernet/dec/tulip/uli526x.c:1086:4: warning: this 'else' clause does not guard... [-Wmisleading-indentation]
e1395a321eab ("drivers/net/ethernet/dec/tulip/uli526x.c: fix
misleading indentation in uli526x_timer")
> 1 drivers/mtd/maps/pmcmsp-flash.c:149:30: warning: passing argument 1 of 'strncpy' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
79ad07d45743 ("mtd: pmcmsp-flash: Allocating too much in init_msp_flash()")
906b268477bc ("mtd: pmcmsp: use kstrndup instead of kmalloc+strncpy")
> 1 drivers/media/platform/coda/./trace.h:12:0: warning: "TRACE_SYSTEM_STRING" redefined
> 1 include/trace/ftrace.h:28:0: warning: "TRACE_SYSTEM_STRING" redefined
ae1c75d6ce3a ("[media] coda: remove extraneous TRACE_SYSTEM_STRING")
> 1 drivers/net/fddi/defxx.c:595:10: warning: '*((void *)&bar_len+8)' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 drivers/net/fddi/defxx.c:595:10: warning: '*((void *)&bar_len+4)' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 drivers/net/fddi/defxx.c:594:5: warning: '*((void *)&bar_start+8)' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 drivers/net/fddi/defxx.c:594:5: warning: '*((void *)&bar_start+4)' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 drivers/net/fddi/defxx.c:586:10: warning: 'bar_start' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 drivers/net/fddi/defxx.c:586:10: warning: 'bar_len' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 drivers/net/fddi/defxx.c:583:10: warning: 'bar_start' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 drivers/net/fddi/defxx.c:583:10: warning: 'bar_len' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 drivers/net/fddi/defxx.c:3720:5: warning: '*((void *)&bar_start+8)' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 drivers/net/fddi/defxx.c:3720:5: warning: '*((void *)&bar_start+16)' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 include/linux/ioport.h:199:37: warning: 'bar_start' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 include/linux/ioport.h:198:33: warning: 'bar_start' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 include/linux/ioport.h:198:33: warning: 'bar_len' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 include/linux/ioport.h:198:33: warning: '*((void *)&bar_len+8)' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 1 include/linux/ioport.h:198:33: warning: '*((void *)&bar_len+16)' may be used uninitialized in this function [-Wmaybe-uninitialized]
62f2aaabcf41 ("defxx: fix build warning")
> 1 include/linux/kernel.h:729:17: warning: comparison of distinct pointer types lacks a cast
> 1 include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast
30544af54837 ("e1000e: fix call to do_div() to use u64 arg")
> 1 drivers/scsi/megaraid/megaraid_sas_fusion.c:1723:3: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
4a5c814d9339 ("megaraid_sas : Add separate functions for building
sysPD IOs and non RW LDIOs")
not sure if that is appropriate for stable or even applies.
Arnd
This is the start of the stable review cycle for the 3.18.91 release.
There are 32 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Jan 3 13:59:44 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.91-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.91-rc1
Linus Torvalds <torvalds(a)linux-foundation.org>
n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
Daniel Thompson <daniel.thompson(a)linaro.org>
usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
Oliver Neukum <oneukum(a)suse.com>
usb: add RESET_RESUME for ELSA MicroLink 56K
Dmitry Fleytman Dmitry Fleytman <dmitry.fleytman(a)gmail.com>
usb: Add device quirk for Logitech HD Pro Webcam C925e
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add support for Telit ME910 PID 0x1101
Mohamed Ghannam <simo.ghannam(a)gmail.com>
net: ipv4: fix for a race condition in raw_sendmsg
Tonghao Zhang <xiangxia.m.yue(a)gmail.com>
sctp: Replace use of sockets_allocated with specified macro.
Tobias Jordan <Tobias.Jordan(a)elektrobit.com>
net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case
Brian King <brking(a)linux.vnet.ibm.com>
tg3: Fix rx hang on MTU change with 5717/5719
Christoph Paasch <cpaasch(a)apple.com>
tcp md5sig: Use skb's saddr when replying to an incoming segment
Sebastian Sjoholm <ssjoholm(a)mac.com>
net: qmi_wwan: add Sierra EM7565 1199:9091
Kevin Cernekee <cernekee(a)chromium.org>
netlink: Add netns check on taps
Kevin Cernekee <cernekee(a)chromium.org>
net: igmp: Use correct source address on IGMPv3 reports
Eric Dumazet <edumazet(a)google.com>
ipv6: mcast: better catch silly mtu values
Eric Dumazet <edumazet(a)google.com>
ipv4: igmp: guard against silly MTU values
Linus Torvalds <torvalds(a)linux-foundation.org>
kbuild: add '-fno-stack-check' to kernel build options
Johan Hovold <johan(a)kernel.org>
ASoC: twl4030: fix child-node lookup
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
ring-buffer: Mask out the info bits when returning buffer page length
Jing Xia <jing.xia(a)spreadtrum.com>
tracing: Fix crash when it fails to alloc ring buffer
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Fix possible double free on failure of allocating trace buffer
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Remove extra zeroing out of the ring buffer page
Yelena Krivosheev <yelena(a)marvell.com>
net: mvneta: clear interface link status on port disable
Ravi Bangoria <ravi.bangoria(a)linux.vnet.ibm.com>
powerpc/perf: Dereference BHRB entries safely
Wanpeng Li <wanpeng.li(a)hotmail.com>
KVM: X86: Fix load RFLAGS w/o the fixed bit
Helge Deller <deller(a)gmx.de>
parisc: Hide Diva-built-in serial aux and graphics card
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
Takashi Iwai <tiwai(a)suse.de>
ALSA: rawmidi: Avoid racy info ioctl via ctl device
Johan Hovold <johan(a)kernel.org>
mfd: twl6040: Fix child-node lookup
Johan Hovold <johan(a)kernel.org>
mfd: twl4030-audio: Fix sibling-node lookup
Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
crypto: mcryptd - protect the per-CPU queue with a lock
Takashi Iwai <tiwai(a)suse.de>
ACPI: APEI / ERST: Fix missing error handling in erst_reader()
-------------
Diffstat:
Makefile | 7 ++++--
arch/powerpc/perf/core-book3s.c | 8 +++++--
arch/x86/kvm/x86.c | 2 +-
crypto/mcryptd.c | 23 ++++++++----------
drivers/acpi/apei/erst.c | 2 +-
drivers/mfd/twl4030-audio.c | 9 +++++--
drivers/mfd/twl6040.c | 12 ++++++----
drivers/net/ethernet/broadcom/tg3.c | 4 +++-
drivers/net/ethernet/marvell/mvmdio.c | 3 ++-
drivers/net/ethernet/marvell/mvneta.c | 4 ++++
drivers/net/usb/qmi_wwan.c | 1 +
drivers/parisc/lba_pci.c | 33 ++++++++++++++++++++++++++
drivers/pci/pci-driver.c | 7 +++++-
drivers/tty/n_tty.c | 4 ++--
drivers/usb/core/quirks.c | 6 ++++-
drivers/usb/host/xhci-pci.c | 3 +++
drivers/usb/serial/option.c | 8 +++++++
include/crypto/mcryptd.h | 1 +
include/net/ip.h | 2 ++
kernel/trace/ring_buffer.c | 6 ++++-
kernel/trace/trace.c | 13 ++++-------
net/ipv4/devinet.c | 2 +-
net/ipv4/igmp.c | 44 +++++++++++++++++++++++++++--------
net/ipv4/ip_tunnel.c | 4 ++--
net/ipv4/raw.c | 15 ++++++++----
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv6/mcast.c | 25 ++++++++++++--------
net/ipv6/tcp_ipv6.c | 2 +-
net/netlink/af_netlink.c | 3 +++
net/sctp/socket.c | 4 ++--
sound/core/rawmidi.c | 15 +++++++++---
sound/soc/codecs/twl4030.c | 4 +++-
sound/usb/mixer.c | 27 ++++++++++++---------
33 files changed, 217 insertions(+), 88 deletions(-)
From: Len Brown <len.brown(a)intel.com>
If native_calibrate_tsc() can not discover the TSC frequency,
via CPUID or via built-in table, it must return without
setting X86_FEATURE_TSC_KNOWN_FREQ. Otherwise, X86_FEATURE_TSC_KNOWN_FREQ
will prevent TSC refined calibration.
This patch allows Linux to correctly support future Intel hardware,
that has (cpu_khz != tsc_khz), and support for CPUID.15
without support for CPUID.15.crystal_khz.
This patch is needed since X86_FEATURE_TSC_KNOWN_FREQ was added in Linux-4.10:
commit 4ca4df0b7eb0
("x86/tsc: Mark TSC frequency determined by CPUID as known")
If not applied, such systems will run with tsc_khz = cpu_khz,
which may result in under-stated TSC rate, and time-of-day drift.
Signed-off-by: Len Brown <len.brown(a)intel.com>
Cc: Bin Gao <bin.gao(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v4.10+
---
arch/x86/kernel/tsc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 8ea117f8142e..ce4b71119c36 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -612,6 +612,8 @@ unsigned long native_calibrate_tsc(void)
}
}
+ if (crystal_khz == 0)
+ return 0;
/*
* TSC frequency determined by CPUID is a "hardware reported"
* frequency and is the most accurate one so far we have. This
--
2.14.0-rc0
Display WA #1183 was recently added to workaround
"Failures when enabling DPLL0 with eDP link rate 2.16
or 4.32 GHz and CD clock frequency 308.57 or 617.14 MHz
(CDCLK_CTL CD Frequency Select 10b or 11b) used in this
enabling or in previous enabling."
This workaround was designed to minimize the impact only
to save the bad case with that link rates. But HW engineers
indicated that it should be safe to apply broadly, although
they were expecting the DPLL0 link rate to be unchanged on
runtime.
We need to cover 2 cases: when we are in fact enabling DPLL0
and when we are just changing the frequency with small
differences.
This is based on previous patch by Rodrigo Vivi with suggestions
from Ville Syrjälä.
Cc: Arthur J Runyan <arthur.j.runyan(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171204232210.4958-1-lucas.d…
[ Lucas: Backport to 4.15 adding back variable that has been removed on
commits not meant to be backported ]
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
---
drivers/gpu/drm/i915/i915_reg.h | 2 ++
drivers/gpu/drm/i915/intel_cdclk.c | 35 ++++++++++++++++++++++++---------
drivers/gpu/drm/i915/intel_runtime_pm.c | 10 ++++++++++
3 files changed, 38 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 3866c49bc390..333f40bc03bb 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6977,6 +6977,7 @@ enum {
#define RESET_PCH_HANDSHAKE_ENABLE (1<<4)
#define GEN8_CHICKEN_DCPR_1 _MMIO(0x46430)
+#define SKL_SELECT_ALTERNATE_DC_EXIT (1<<30)
#define MASK_WAKEMEM (1<<13)
#define SKL_DFSM _MMIO(0x51000)
@@ -8522,6 +8523,7 @@ enum skl_power_gate {
#define BXT_CDCLK_CD2X_DIV_SEL_2 (2<<22)
#define BXT_CDCLK_CD2X_DIV_SEL_4 (3<<22)
#define BXT_CDCLK_CD2X_PIPE(pipe) ((pipe)<<20)
+#define CDCLK_DIVMUX_CD_OVERRIDE (1<<19)
#define BXT_CDCLK_CD2X_PIPE_NONE BXT_CDCLK_CD2X_PIPE(3)
#define BXT_CDCLK_SSA_PRECHARGE_ENABLE (1<<16)
#define CDCLK_FREQ_DECIMAL_MASK (0x7ff)
diff --git a/drivers/gpu/drm/i915/intel_cdclk.c b/drivers/gpu/drm/i915/intel_cdclk.c
index b2a6d62b71c0..60cf4e58389a 100644
--- a/drivers/gpu/drm/i915/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -860,16 +860,10 @@ static void skl_set_preferred_cdclk_vco(struct drm_i915_private *dev_priv,
static void skl_dpll0_enable(struct drm_i915_private *dev_priv, int vco)
{
- int min_cdclk = skl_calc_cdclk(0, vco);
u32 val;
WARN_ON(vco != 8100000 && vco != 8640000);
- /* select the minimum CDCLK before enabling DPLL 0 */
- val = CDCLK_FREQ_337_308 | skl_cdclk_decimal(min_cdclk);
- I915_WRITE(CDCLK_CTL, val);
- POSTING_READ(CDCLK_CTL);
-
/*
* We always enable DPLL0 with the lowest link rate possible, but still
* taking into account the VCO required to operate the eDP panel at the
@@ -923,7 +917,7 @@ static void skl_set_cdclk(struct drm_i915_private *dev_priv,
{
int cdclk = cdclk_state->cdclk;
int vco = cdclk_state->vco;
- u32 freq_select, pcu_ack;
+ u32 freq_select, pcu_ack, cdclk_ctl;
int ret;
WARN_ON((cdclk == 24000) != (vco == 0));
@@ -940,7 +934,7 @@ static void skl_set_cdclk(struct drm_i915_private *dev_priv,
return;
}
- /* set CDCLK_CTL */
+ /* Choose frequency for this cdclk */
switch (cdclk) {
case 450000:
case 432000:
@@ -968,10 +962,33 @@ static void skl_set_cdclk(struct drm_i915_private *dev_priv,
dev_priv->cdclk.hw.vco != vco)
skl_dpll0_disable(dev_priv);
+ cdclk_ctl = I915_READ(CDCLK_CTL);
+
+ if (dev_priv->cdclk.hw.vco != vco) {
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl &= ~(CDCLK_FREQ_SEL_MASK | CDCLK_FREQ_DECIMAL_MASK);
+ cdclk_ctl |= freq_select | skl_cdclk_decimal(cdclk);
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+ }
+
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl |= CDCLK_DIVMUX_CD_OVERRIDE;
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+ POSTING_READ(CDCLK_CTL);
+
if (dev_priv->cdclk.hw.vco != vco)
skl_dpll0_enable(dev_priv, vco);
- I915_WRITE(CDCLK_CTL, freq_select | skl_cdclk_decimal(cdclk));
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl &= ~(CDCLK_FREQ_SEL_MASK | CDCLK_FREQ_DECIMAL_MASK);
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+
+ cdclk_ctl |= freq_select | skl_cdclk_decimal(cdclk);
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
+
+ /* Wa Display #1183: skl,kbl,cfl */
+ cdclk_ctl &= ~CDCLK_DIVMUX_CD_OVERRIDE;
+ I915_WRITE(CDCLK_CTL, cdclk_ctl);
POSTING_READ(CDCLK_CTL);
/* inform PCU of the change */
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 8af286c63d3b..e0bc2debdad0 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -598,6 +598,11 @@ void gen9_enable_dc5(struct drm_i915_private *dev_priv)
DRM_DEBUG_KMS("Enabling DC5\n");
+ /* Wa Display #1183: skl,kbl,cfl */
+ if (IS_GEN9_BC(dev_priv))
+ I915_WRITE(GEN8_CHICKEN_DCPR_1, I915_READ(GEN8_CHICKEN_DCPR_1) |
+ SKL_SELECT_ALTERNATE_DC_EXIT);
+
gen9_set_dc_state(dev_priv, DC_STATE_EN_UPTO_DC5);
}
@@ -625,6 +630,11 @@ void skl_disable_dc6(struct drm_i915_private *dev_priv)
{
DRM_DEBUG_KMS("Disabling DC6\n");
+ /* Wa Display #1183: skl,kbl,cfl */
+ if (IS_GEN9_BC(dev_priv))
+ I915_WRITE(GEN8_CHICKEN_DCPR_1, I915_READ(GEN8_CHICKEN_DCPR_1) |
+ SKL_SELECT_ALTERNATE_DC_EXIT);
+
gen9_set_dc_state(dev_priv, DC_STATE_DISABLE);
}
--
2.14.3
This is a note to let you know that I've just added the patch titled
tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp_bbr-reset-long-term-bandwidth-sampling-on-loss-recovery-undo.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 600647d467c6d04b3954b41a6ee1795b5ae00550 Mon Sep 17 00:00:00 2001
From: Neal Cardwell <ncardwell(a)google.com>
Date: Thu, 7 Dec 2017 12:43:32 -0500
Subject: tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
From: Neal Cardwell <ncardwell(a)google.com>
commit 600647d467c6d04b3954b41a6ee1795b5ae00550 upstream.
Fix BBR so that upon notification of a loss recovery undo BBR resets
long-term bandwidth sampling.
Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
can cause BBR to spuriously estimate that we are seeing loss rates
high enough to trigger long-term bandwidth estimation. To avoid that
problem, this commit resets long-term bandwidth sampling on loss
recovery undo events.
Signed-off-by: Neal Cardwell <ncardwell(a)google.com>
Reviewed-by: Yuchung Cheng <ycheng(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_bbr.c | 1 +
1 file changed, 1 insertion(+)
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -847,6 +847,7 @@ static u32 bbr_undo_cwnd(struct sock *sk
bbr->full_bw = 0; /* spurious slow-down; reset full pipe detection */
bbr->full_bw_cnt = 0;
+ bbr_reset_lt_bw_sampling(sk);
return tcp_sk(sk)->snd_cwnd;
}
Patches currently in stable-queue which might be from ncardwell(a)google.com are
queue-4.9/tcp_bbr-reset-full-pipe-detection-on-loss-recovery-undo.patch
queue-4.9/tcp_bbr-reset-long-term-bandwidth-sampling-on-loss-recovery-undo.patch
This is a note to let you know that I've just added the patch titled
tcp_bbr: reset full pipe detection on loss recovery undo
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp_bbr-reset-full-pipe-detection-on-loss-recovery-undo.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 2f6c498e4f15d27852c04ed46d804a39137ba364 Mon Sep 17 00:00:00 2001
From: Neal Cardwell <ncardwell(a)google.com>
Date: Thu, 7 Dec 2017 12:43:31 -0500
Subject: tcp_bbr: reset full pipe detection on loss recovery undo
From: Neal Cardwell <ncardwell(a)google.com>
commit 2f6c498e4f15d27852c04ed46d804a39137ba364 upstream.
Fix BBR so that upon notification of a loss recovery undo BBR resets
the full pipe detection (STARTUP exit) state machine.
Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
could previously cause BBR to spuriously estimate that the pipe is
full.
Since spurious loss recovery means that our overall sending will have
slowed down spuriously, this commit gives a flow more time to probe
robustly for bandwidth and decide the pipe is really full.
Signed-off-by: Neal Cardwell <ncardwell(a)google.com>
Reviewed-by: Yuchung Cheng <ycheng(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_bbr.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -843,6 +843,10 @@ static u32 bbr_sndbuf_expand(struct sock
*/
static u32 bbr_undo_cwnd(struct sock *sk)
{
+ struct bbr *bbr = inet_csk_ca(sk);
+
+ bbr->full_bw = 0; /* spurious slow-down; reset full pipe detection */
+ bbr->full_bw_cnt = 0;
return tcp_sk(sk)->snd_cwnd;
}
Patches currently in stable-queue which might be from ncardwell(a)google.com are
queue-4.9/tcp_bbr-reset-full-pipe-detection-on-loss-recovery-undo.patch
queue-4.9/tcp_bbr-reset-long-term-bandwidth-sampling-on-loss-recovery-undo.patch
Call bdev_get_queue(bdev) after bdev->bd_disk has been initialized
instead of just before that pointer has been initialized. This patch
avoids that the following command
pktsetup 1 /dev/sr0
triggers the following kernel crash:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000548
IP: pkt_setup_dev+0x2db/0x670 [pktcdvd]
CPU: 2 PID: 724 Comm: pktsetup Not tainted 4.15.0-rc4-dbg+ #1
Call Trace:
pkt_ctl_ioctl+0xce/0x1c0 [pktcdvd]
do_vfs_ioctl+0x8e/0x670
SyS_ioctl+0x3c/0x70
entry_SYSCALL_64_fastpath+0x23/0x9a
Reported-by: Maciej S. Szmigiero <mail(a)maciej.szmigiero.name>
Fixes: commit ca18d6f769d2 ("block: Make most scsi_req_init() calls implicit")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Tested-by: Maciej S. Szmigiero <mail(a)maciej.szmigiero.name>
Cc: Maciej S. Szmigiero <mail(a)maciej.szmigiero.name>
Cc: <stable(a)vger.kernel.org> # v4.13
---
drivers/block/pktcdvd.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 2659b2534073..531a0915066b 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2579,14 +2579,14 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
bdev = bdget(dev);
if (!bdev)
return -ENOMEM;
+ ret = blkdev_get(bdev, FMODE_READ | FMODE_NDELAY, NULL);
+ if (ret)
+ return ret;
if (!blk_queue_scsi_passthrough(bdev_get_queue(bdev))) {
WARN_ONCE(true, "Attempt to register a non-SCSI queue\n");
- bdput(bdev);
+ blkdev_put(bdev, FMODE_READ | FMODE_NDELAY);
return -EINVAL;
}
- ret = blkdev_get(bdev, FMODE_READ | FMODE_NDELAY, NULL);
- if (ret)
- return ret;
/* This is safe, since we have a reference from open(). */
__module_get(THIS_MODULE);
--
2.15.1
Commit 523e1d399ce0 ("block: make gendisk hold a reference to its queue")
modified add_disk() and disk_release() but did not update any of the
error paths that trigger a put_disk() call after disk->queue has been
assigned. That introduced the following behavior in the pktcdvd driver
if pkt_new_dev() fails:
Kernel BUG at 00000000e98fd882 [verbose debug info unavailable]
Since disk_release() calls blk_put_queue() anyway if disk->queue != NULL,
fix this by removing the blk_cleanup_queue() call from the pkt_setup_dev()
error path.
Fixes: commit 523e1d399ce0 ("block: make gendisk hold a reference to its queue")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Maciej S. Szmigiero <mail(a)maciej.szmigiero.name>
Cc: <stable(a)vger.kernel.org> # v3.2
---
drivers/block/pktcdvd.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 67974796c350..2659b2534073 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2745,7 +2745,7 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
pd->pkt_dev = MKDEV(pktdev_major, idx);
ret = pkt_new_dev(pd, dev);
if (ret)
- goto out_new_dev;
+ goto out_mem2;
/* inherit events of the host device */
disk->events = pd->bdev->bd_disk->events;
@@ -2763,8 +2763,6 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
mutex_unlock(&ctl_mutex);
return 0;
-out_new_dev:
- blk_cleanup_queue(disk->queue);
out_mem2:
put_disk(disk);
out_mem:
--
2.15.1
This is a note to let you know that I've just added the patch titled
mei: me: allow runtime pm for platform with D0i3
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From cc365dcf0e56271bedf3de95f88922abe248e951 Mon Sep 17 00:00:00 2001
From: Tomas Winkler <tomas.winkler(a)intel.com>
Date: Tue, 2 Jan 2018 12:01:41 +0200
Subject: mei: me: allow runtime pm for platform with D0i3
>From the pci power documentation:
"The driver itself should not call pm_runtime_allow(), though. Instead,
it should let user space or some platform-specific code do that (user space
can do it via sysfs as stated above)..."
However, the S0ix residency cannot be reached without MEI device getting
into low power state. Hence, for mei devices that support D0i3, it's better
to make runtime power management mandatory and not rely on the system
integration such as udev rules.
This policy cannot be applied globally as some older platforms
were found to have broken power management.
Cc: <stable(a)vger.kernel.org> v4.13+
Cc: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Reviewed-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/pci-me.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c
index f4f17552c9b8..4a0ccda4d04b 100644
--- a/drivers/misc/mei/pci-me.c
+++ b/drivers/misc/mei/pci-me.c
@@ -238,8 +238,11 @@ static int mei_me_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
*/
mei_me_set_pm_domain(dev);
- if (mei_pg_is_enabled(dev))
+ if (mei_pg_is_enabled(dev)) {
pm_runtime_put_noidle(&pdev->dev);
+ if (hw->d0i3_supported)
+ pm_runtime_allow(&pdev->dev);
+ }
dev_dbg(&pdev->dev, "initialization successful.\n");
--
2.15.1
From: Len Brown <len.brown(a)intel.com>
Linux-4.9 added INTEL_FAM6_SKYLAKE_X to native_calibrate_tsc(),
allowing the kernel to calculate the TSC frequency,
instead of calibrating:
commit 6baf3d61821f
("x86/tsc: Add additional Intel CPU models to the crystal quirk list")
There are several problems with doing this.
The first is that while SKX servers use a 25 MHz crystal,
SKX workstations (with same model #) use a 24 MHz crystal.
This results in a -4.0% time drift rate on SKX workstations.
But even SKX servers have a problem.
SKX subjects the crystal to an EMI reduction circuit that
reduces its actual value by (approximately) -0.25%.
This results in -1 second per 10 minute time drift
as compared to network time on SKX servers.
This also results in an even more subtle symptom on systems
that use the LAPIC timer (versus the TSC deadline timer).
Clock ticks scheduled with the LAPIC timer arrive a few usec
before the time they are expected (according to the slow TSC).
This causes Linux to poll-idle, when it should be in an idle
power saving state. The idle and clock code do not graciously
recover from this error, sometimes resulting in significant polling
and significant power impact.
So stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X.
native_calibrate_tsc() will return 0, boot will run with
tsc_khz = cpu_khz, and the TSC refined calibration will
update tsc_khz to correct for the difference.
Yes, a future patch can detect 24 vs 25 MHz crystals,
and it can estimate a correction for the EMI circuit impact.
However, that would effect boot-time only, as we'd probably
choose to keep the refined TSC calibration, because the EMI
correction estimate may not be precise.
Signed-off-by: Len Brown <len.brown(a)intel.com>
Cc: Prarit Bhargava <prarit(a)redhat.com>
Cc: <stable(a)vger.kernel.org> # v4.9+
---
arch/x86/kernel/tsc.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 49d772672367..19ca0004caf6 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -602,7 +602,6 @@ unsigned long native_calibrate_tsc(void)
case INTEL_FAM6_KABYLAKE_DESKTOP:
crystal_khz = 24000; /* 24.0 MHz */
break;
- case INTEL_FAM6_SKYLAKE_X:
case INTEL_FAM6_ATOM_DENVERTON:
crystal_khz = 25000; /* 25.0 MHz */
break;
--
2.14.0-rc0
Tree/Branch: v3.2.97
Git describe: v3.2.97
Commit: 3993a5061a Linux 3.2.97
Build Time: 0 min 4 sec
Passed: 0 / 4 ( 0.00 %)
Failed: 4 / 4 (100.00 %)
Errors: 6
Warnings: 20
Section Mismatches: 0
Failed defconfigs:
x86_64-allnoconfig
arm-allmodconfig
arm-allnoconfig
x86_64-defconfig
Errors:
x86_64-allnoconfig
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
arm-allnoconfig
/home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function)
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type
x86_64-defconfig
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
2 warnings 0 mismatches : arm-allmodconfig
19 warnings 0 mismatches : arm-allnoconfig
-------------------------------------------------------------------------------
Errors summary: 6
2 /home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
2 /home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type
1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type
1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model
1 /home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function)
Warnings Summary: 20
2 .config:27:warning: symbol value '' invalid for PHYS_OFFSET
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:342:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:272:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:265:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:131:35: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:127:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:121:3: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:120:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:114:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/swab.h:25:28: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:82:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:102:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/irqflags.h:11:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/fpstate.h:32:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:33:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:28:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:196:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:194:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/bitops.h:217:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
1 /home/broonie/build/linux-stable/arch/arm/include/asm/atomic.h:30:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
x86_64-allnoconfig : FAIL, 2 errors, 0 warnings, 0 section mismatches
Errors:
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
-------------------------------------------------------------------------------
arm-allmodconfig : FAIL, 0 errors, 2 warnings, 0 section mismatches
Warnings:
.config:27:warning: symbol value '' invalid for PHYS_OFFSET
.config:27:warning: symbol value '' invalid for PHYS_OFFSET
-------------------------------------------------------------------------------
arm-allnoconfig : FAIL, 4 errors, 19 warnings, 0 section mismatches
Errors:
/home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function)
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type
/home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type
Warnings:
/home/broonie/build/linux-stable/arch/arm/include/asm/irqflags.h:11:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:114:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:120:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:121:3: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:127:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:131:35: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:265:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:272:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/system.h:342:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/bitops.h:217:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/swab.h:25:28: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/fpstate.h:32:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:82:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:102:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/atomic.h:30:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:28:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:33:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:194:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
/home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:196:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef]
-------------------------------------------------------------------------------
x86_64-defconfig : FAIL, 2 errors, 0 warnings, 0 section mismatches
Errors:
/home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
Tree/Branch: v3.16.52
Git describe: v3.16.52
Commit: bde16624c6 Linux 3.16.52
Build Time: 37 min 38 sec
Passed: 9 / 9 (100.00 %)
Failed: 0 / 9 ( 0.00 %)
Errors: 0
Warnings: 7
Section Mismatches: 0
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
1 warnings 0 mismatches : arm64-allnoconfig
2 warnings 0 mismatches : arm64-allmodconfig
1 warnings 0 mismatches : arm-multi_v5_defconfig
1 warnings 0 mismatches : arm-multi_v7_defconfig
1 warnings 0 mismatches : x86_64-defconfig
4 warnings 0 mismatches : arm-allmodconfig
1 warnings 0 mismatches : arm-allnoconfig
1 warnings 0 mismatches : x86_64-allnoconfig
3 warnings 0 mismatches : arm64-defconfig
-------------------------------------------------------------------------------
Warnings Summary: 7
7 ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
2 ../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized]
2 ../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable]
1 ../drivers/platform/x86/eeepc-laptop.c:279:10: warning: 'value' may be used uninitialized in this function [-Wmaybe-uninitialized]
1 ../drivers/media/platform/davinci/vpfe_capture.c:291:12: warning: 'vpfe_get_ccdc_image_format' defined but not used [-Wunused-function]
1 ../drivers/media/platform/davinci/vpfe_capture.c:1718:1: warning: label 'unlock_out' defined but not used [-Wunused-label]
1 ../arch/x86/kernel/cpu/common.c:961:13: warning: 'syscall32_cpu_init' defined but not used [-Wunused-function]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
arm64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
arm64-allmodconfig : PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable]
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
arm-multi_v5_defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
arm-multi_v7_defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
x86_64-defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../drivers/platform/x86/eeepc-laptop.c:279:10: warning: 'value' may be used uninitialized in this function [-Wmaybe-uninitialized]
-------------------------------------------------------------------------------
arm-allmodconfig : PASS, 0 errors, 4 warnings, 0 section mismatches
Warnings:
../drivers/media/platform/davinci/vpfe_capture.c:1718:1: warning: label 'unlock_out' defined but not used [-Wunused-label]
../drivers/media/platform/davinci/vpfe_capture.c:291:12: warning: 'vpfe_get_ccdc_image_format' defined but not used [-Wunused-function]
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable]
-------------------------------------------------------------------------------
arm-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
x86_64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches
Warnings:
../arch/x86/kernel/cpu/common.c:961:13: warning: 'syscall32_cpu_init' defined but not used [-Wunused-function]
-------------------------------------------------------------------------------
arm64-defconfig : PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized]
../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized]
../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached. It mustn't return false as that indicates
the page cannot yet be freed.
The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.
This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress. This might be an issue for
9P, however.
Note that the pages aren't entirely stuck. Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.
Fixes: 201a15428bd5 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Reviewed-by: Jeff Layton <jlayton(a)redhat.com>
Acked-by: Al Viro <viro(a)zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
cc: stable(a)vger.kernel.org # 2.6.32+
---
include/linux/fscache.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index f4ff47d4a893..fe0c349684fa 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -755,7 +755,7 @@ bool fscache_maybe_release_page(struct fscache_cookie *cookie,
{
if (fscache_cookie_valid(cookie) && PageFsCache(page))
return __fscache_maybe_release_page(cookie, page, gfp);
- return false;
+ return true;
}
/**
This reverts commit fd865802c66bc451dc515ed89360f84376ce1a56.
This commit causes a regression on some QCA ROME chips. The USB device
reset happens in btusb_open(), hence firmware loading gets interrupted.
Furthermore, this commit stops working after commit
("a0085f2510e8976614ad8f766b209448b385492f Bluetooth: btusb: driver to
enable the usb-wakeup feature"). Reset-resume quirk only gets enabled in
btusb_suspend() when it's not a wakeup source.
If we really want to reset the USB device, we need to do it before
btusb_open(). Let's handle it in drivers/usb/core/quirks.c.
Cc: stable(a)vger.kernel.org
Cc: Leif Liddy <leif.linux(a)gmail.com>
Cc: Matthias Kaehlcke <mka(a)chromium.org>
Cc: Brian Norris <briannorris(a)chromium.org>
Cc: Daniel Drake <drake(a)endlessm.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
---
Daniel, Cc you because this also affects your original quirk patch for
Realtek btusb.
drivers/bluetooth/btusb.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index f7120c9eb9bd..da353c4acdc7 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -3117,12 +3117,6 @@ static int btusb_probe(struct usb_interface *intf,
if (id->driver_info & BTUSB_QCA_ROME) {
data->setup_on_usb = btusb_setup_qca;
hdev->set_bdaddr = btusb_set_bdaddr_ath3012;
-
- /* QCA Rome devices lose their updated firmware over suspend,
- * but the USB hub doesn't notice any status change.
- * Explicitly request a device reset on resume.
- */
- set_bit(BTUSB_RESET_RESUME, &data->flags);
}
#ifdef CONFIG_BT_HCIBTUSB_RTL
--
2.14.1
This is a note to let you know that I've just added the patch titled
mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-vmstat-make-nr_tlb_remote_flush_received-available-even-on-up.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5dd0b16cdaff9b94da06074d5888b03235c0bf17 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Mon, 5 Jun 2017 07:40:25 -0700
Subject: mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
From: Andy Lutomirski <luto(a)kernel.org>
commit 5dd0b16cdaff9b94da06074d5888b03235c0bf17 upstream.
This fixes CONFIG_SMP=n, CONFIG_DEBUG_TLBFLUSH=y without introducing
further #ifdef soup. Caught by a Kbuild bot randconfig build.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Fixes: ce4a4e565f52 ("x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code")
Link: http://lkml.kernel.org/r/76da9a3cc4415996f2ad2c905b93414add322021.149667361…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/vm_event_item.h | 2 --
1 file changed, 2 deletions(-)
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -89,10 +89,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
#endif
#endif
#ifdef CONFIG_DEBUG_TLBFLUSH
-#ifdef CONFIG_SMP
NR_TLB_REMOTE_FLUSH, /* cpu tried to flush others' tlbs */
NR_TLB_REMOTE_FLUSH_RECEIVED,/* cpu received ipi for flush */
-#endif /* CONFIG_SMP */
NR_TLB_LOCAL_FLUSH_ALL,
NR_TLB_LOCAL_FLUSH_ONE,
#endif /* CONFIG_DEBUG_TLBFLUSH */
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.9/x86-vm86-32-switch-to-flush_tlb_mm_range-in-mark_screen_rdonly.patch
queue-4.9/x86-mm-add-the-nopcid-boot-option-to-turn-off-pcid.patch
queue-4.9/x86-mm-enable-cr4.pcide-on-supported-systems.patch
queue-4.9/mm-vmstat-make-nr_tlb_remote_flush_received-available-even-on-up.patch
queue-4.9/x86-smpboot-remove-stale-tlb-flush-invocations.patch
queue-4.9/x86-mm-remove-the-up-asm-tlbflush.h-code-always-use-the-formerly-smp-code.patch
queue-4.9/x86-mm-reimplement-flush_tlb_page-using-flush_tlb_mm_range.patch
queue-4.9/x86-mm-make-flush_tlb_mm_range-more-predictable.patch
queue-4.9/x86-mm-remove-flush_tlb-and-flush_tlb_current_task.patch
queue-4.9/x86-mm-disable-pcid-on-32-bit-kernels.patch
queue-4.9/x86-mm-64-fix-reboot-interaction-with-cr4.pcide.patch