This is a note to let you know that I've just added the patch titled
x86/cpu: Factor out application of forced CPU caps
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-cpu-factor-out-application-of-forced-cpu-caps.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 8bf1ebca215c262e48c15a4a15f175991776f57f Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto(a)kernel.org>
Date: Wed, 18 Jan 2017 11:15:38 -0800
Subject: x86/cpu: Factor out application of forced CPU caps
From: Andy Lutomirski <luto(a)kernel.org>
commit 8bf1ebca215c262e48c15a4a15f175991776f57f upstream.
There are multiple call sites that apply forced CPU caps. Factor
them into a helper.
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Fenghua Yu <fenghua.yu(a)intel.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matthew Whitehead <tedheadster(a)gmail.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: One Thousand Gnomes <gnomes(a)lxorguk.ukuu.org.uk>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu(a)intel.com>
Link: http://lkml.kernel.org/r/623ff7555488122143e4417de09b18be2085ad06.148470501…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/kernel/cpu/common.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -664,6 +664,16 @@ void cpu_detect(struct cpuinfo_x86 *c)
}
}
+static void apply_forced_caps(struct cpuinfo_x86 *c)
+{
+ int i;
+
+ for (i = 0; i < NCAPINTS; i++) {
+ c->x86_capability[i] &= ~cpu_caps_cleared[i];
+ c->x86_capability[i] |= cpu_caps_set[i];
+ }
+}
+
void get_cpu_cap(struct cpuinfo_x86 *c)
{
u32 tfms, xlvl;
@@ -955,11 +965,8 @@ static void identify_cpu(struct cpuinfo_
if (this_cpu->c_identify)
this_cpu->c_identify(c);
- /* Clear/Set all flags overriden by options, after probe */
- for (i = 0; i < NCAPINTS; i++) {
- c->x86_capability[i] &= ~cpu_caps_cleared[i];
- c->x86_capability[i] |= cpu_caps_set[i];
- }
+ /* Clear/Set all flags overridden by options, after probe */
+ apply_forced_caps(c);
#ifdef CONFIG_X86_64
c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
@@ -1020,10 +1027,7 @@ static void identify_cpu(struct cpuinfo_
* Clear/Set all flags overriden by options, need do it
* before following smp all cpus cap AND.
*/
- for (i = 0; i < NCAPINTS; i++) {
- c->x86_capability[i] &= ~cpu_caps_cleared[i];
- c->x86_capability[i] |= cpu_caps_set[i];
- }
+ apply_forced_caps(c);
/*
* On SMP, boot_cpu_data holds the common feature set between
Patches currently in stable-queue which might be from luto(a)kernel.org are
queue-4.4/x86-mm-pat-dev-mem-remove-superfluous-error-message.patch
queue-4.4/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.4/x86-cpufeatures-make-cpu-bugs-sticky.patch
queue-4.4/x86-vsdo-fix-build-on-paravirt_clock-y-kvm_guest-n.patch
queue-4.4/x86-pti-efi-broken-conversion-from-efi-to-kernel-page-table.patch
queue-4.4/x86-documentation-add-pti-description.patch
queue-4.4/x86-cpu-factor-out-application-of-forced-cpu-caps.patch
queue-4.4/selftests-x86-add-test_vsyscall.patch
queue-4.4/x86-cpu-merge-bugs.c-and-bugs_64.c.patch
queue-4.4/x86-alternatives-fix-optimize_nops-checking.patch
The bounce buffer is gone from the MMC core, and we have now found out
that there are some (crippled) i.MX boards out there that have broken
ADMA (cannot do scatter-gather) and broken PIO, so they must use
SDMA. Closer examination shows a less significant slowdown also on
SDMA-only capable laptop hosts.
SDMA restricts the number of segments to one, so each segment
gets turned into a single request that ping-pongs to the block
layer before the next request/segment is issued.
Apparently it happens a lot that the block layer sends requests
that include many physically discontiguous segments. My guess
is that this phenomenon comes from the file system.
These devices, which cannot handle scatterlists in hardware, can see
major benefits from a DMA-contiguous bounce buffer.
This patch accumulates those fragmented scatterlists in a physically
contiguous bounce buffer so that we can issue bigger DMA data chunks
to/from the card.
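
For readers skimming the diff, the core of the scheme is roughly the
following (a condensed sketch using the same kernel APIs and sdhci_host
fields as the patch below; "bounce_size" stands in for the capped segment
size, and register programming and error handling are omitted):

	/* Probe time: allocate one DMA-coherent buffer, reused for every request */
	host->bounce_buffer = dma_alloc_coherent(mmc->parent, bounce_size,
						 &host->bounce_addr, GFP_KERNEL);

	/* Before a write: gather the scattered pages into the bounce buffer */
	if (mmc_get_dma_dir(data) == DMA_TO_DEVICE)
		sg_copy_to_buffer(data->sg, data->sg_len,
				  host->bounce_buffer, host->bounce_buffer_size);

	/* Point the SDMA engine at the single contiguous buffer */
	sdhci_writel(host, host->bounce_addr, SDHCI_DMA_ADDRESS);

	/* After a read: scatter the bounced data back into the sglist */
	if (mmc_get_dma_dir(data) == DMA_FROM_DEVICE)
		sg_copy_from_buffer(data->sg, data->sg_len,
				    host->bounce_buffer, host->bounce_buffer_size);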
When tested with this PCI-integrated host (1217:8221) that
only supports SDMA:
0b:00.0 SD Host controller: O2 Micro, Inc. OZ600FJ0/OZ900FJ0/OZ600FJS
SD/MMC Card Reader Controller (rev 05)
This patch gave ~1 MByte/s improved throughput on large reads and
writes when testing with iozone, compared to without the patch.
On the i.MX SDHCI controllers on the crippled i.MX 25 and i.MX 35
the patch restores the performance to what it was before we removed
the bounce buffers, and then some: performance is better than ever
because we now allocate a bounce buffer the size of the maximum
single request the SDMA engine can handle. On the PCI laptop this
is 256K, whereas with the old bounce buffer code it was 64K max.
Cc: Benjamin Beckmeyer <beckmeyer.b(a)rittal.de>
Cc: Pierre Ossman <pierre(a)ossman.eu>
Cc: Benoît Thébaudeau <benoit(a)wsystem.com>
Cc: Fabio Estevam <fabio.estevam(a)nxp.com>
Cc: stable(a)vger.kernel.org
Fixes: de3ee99b097d ("mmc: Delete bounce buffer handling")
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
ChangeLog v4->v5:
- Go back to dma_alloc_coherent() as this apparently works better.
- Keep the other changes: cap at 64KB, fall back to single segments.
- Requesting a test of this on i.MX. (Sorry Benjamin.)
ChangeLog v3->v4:
- Cap the bounce buffer to 64KB instead of the biggest segment
as we experience diminishing returns with buffers > 64KB.
- Instead of using dma_alloc_coherent(), use good old devm_kmalloc()
and issue dma_sync_single_for*() to explicitly switch
ownership between CPU and the device. This way we exercise the
cache better and may consume less CPU (see the sketch after this changelog).
- Bail out with single segments if we cannot allocate a bounce
buffer.
- Tested on the PCI SDHCI on my laptop: requesting a new test
on i.MX from Benjamin. (Please!)
ChangeLog v2->v3:
- Rewrite the commit message a bit
- Add Benjamin's Tested-by
- Add Fixes and stable tags
ChangeLog v1->v2:
- Skip the remapping and fiddling with the buffer, instead use
dma_alloc_coherent() and use a simple, coherent bounce buffer.
- Couple kernel messages to ->parent of the mmc_host as it relates
to the hardware characteristics.
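
For context, the v3->v4 approach mentioned above (dropped again in v5 in
favour of dma_alloc_coherent()) would hand buffer ownership back and forth
explicitly. An illustrative sketch only, not code from any posted version:

	/* Allocation: ordinary kernel memory mapped for streaming DMA */
	host->bounce_buffer = devm_kmalloc(mmc->parent, bounce_size, GFP_KERNEL);
	host->bounce_addr = dma_map_single(mmc->parent, host->bounce_buffer,
					   bounce_size, DMA_BIDIRECTIONAL);

	/* Before the controller touches the buffer: device takes ownership */
	dma_sync_single_for_device(mmc->parent, host->bounce_addr,
				   bounce_size, DMA_BIDIRECTIONAL);

	/* After the transfer completes: CPU takes ownership back */
	dma_sync_single_for_cpu(mmc->parent, host->bounce_addr,
				bounce_size, DMA_BIDIRECTIONAL);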
---
drivers/mmc/host/sdhci.c | 105 +++++++++++++++++++++++++++++++++++++++++++----
drivers/mmc/host/sdhci.h | 3 ++
2 files changed, 100 insertions(+), 8 deletions(-)
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index e9290a3439d5..4e594d5e3185 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -19,6 +19,7 @@
#include <linux/io.h>
#include <linux/module.h>
#include <linux/dma-mapping.h>
+#include <linux/sizes.h>
#include <linux/slab.h>
#include <linux/scatterlist.h>
#include <linux/swiotlb.h>
@@ -502,8 +503,22 @@ static int sdhci_pre_dma_transfer(struct sdhci_host *host,
if (data->host_cookie == COOKIE_PRE_MAPPED)
return data->sg_count;
- sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
- mmc_get_dma_dir(data));
+ /* Bounce write requests to the bounce buffer */
+ if (host->bounce_buffer) {
+ if (mmc_get_dma_dir(data) == DMA_TO_DEVICE) {
+ /* Copy the data to the bounce buffer */
+ sg_copy_to_buffer(data->sg, data->sg_len,
+ host->bounce_buffer,
+ host->bounce_buffer_size);
+ }
+ /* Just a dummy value */
+ sg_count = 1;
+ } else {
+ /* Just access the data directly from memory */
+ sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg,
+ data->sg_len,
+ mmc_get_dma_dir(data));
+ }
if (sg_count == 0)
return -ENOSPC;
@@ -858,8 +873,13 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
SDHCI_ADMA_ADDRESS_HI);
} else {
WARN_ON(sg_cnt != 1);
- sdhci_writel(host, sg_dma_address(data->sg),
- SDHCI_DMA_ADDRESS);
+ /* Bounce buffer goes to work */
+ if (host->bounce_buffer)
+ sdhci_writel(host, host->bounce_addr,
+ SDHCI_DMA_ADDRESS);
+ else
+ sdhci_writel(host, sg_dma_address(data->sg),
+ SDHCI_DMA_ADDRESS);
}
}
@@ -2248,7 +2268,12 @@ static void sdhci_pre_req(struct mmc_host *mmc, struct mmc_request *mrq)
mrq->data->host_cookie = COOKIE_UNMAPPED;
- if (host->flags & SDHCI_REQ_USE_DMA)
+ /*
+ * No pre-mapping in the pre hook if we're using the bounce buffer,
+ * for that we would need two bounce buffers since one buffer is
+ * in flight when this is getting called.
+ */
+ if (host->flags & SDHCI_REQ_USE_DMA && !host->bounce_buffer)
sdhci_pre_dma_transfer(host, mrq->data, COOKIE_PRE_MAPPED);
}
@@ -2352,8 +2377,23 @@ static bool sdhci_request_done(struct sdhci_host *host)
struct mmc_data *data = mrq->data;
if (data && data->host_cookie == COOKIE_MAPPED) {
- dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
- mmc_get_dma_dir(data));
+ if (host->bounce_buffer) {
+ /*
+ * On reads, copy the bounced data into the
+ * sglist
+ */
+ if (mmc_get_dma_dir(data) == DMA_FROM_DEVICE) {
+ sg_copy_from_buffer(data->sg,
+ data->sg_len,
+ host->bounce_buffer,
+ host->bounce_buffer_size);
+ }
+ } else {
+ /* Unmap the raw data */
+ dma_unmap_sg(mmc_dev(host->mmc), data->sg,
+ data->sg_len,
+ mmc_get_dma_dir(data));
+ }
data->host_cookie = COOKIE_UNMAPPED;
}
}
@@ -2636,7 +2676,12 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
*/
if (intmask & SDHCI_INT_DMA_END) {
u32 dmastart, dmanow;
- dmastart = sg_dma_address(host->data->sg);
+
+ if (host->bounce_buffer)
+ dmastart = host->bounce_addr;
+ else
+ dmastart = sg_dma_address(host->data->sg);
+
dmanow = dmastart + host->data->bytes_xfered;
/*
* Force update to the next DMA block boundary.
@@ -3713,6 +3758,47 @@ int sdhci_setup_host(struct sdhci_host *host)
*/
mmc->max_blk_count = (host->quirks & SDHCI_QUIRK_NO_MULTIBLOCK) ? 1 : 65535;
+ if (mmc->max_segs == 1) {
+ unsigned int max_blocks;
+ unsigned int max_seg_size;
+
+ max_seg_size = SZ_64K;
+ if (mmc->max_req_size < max_seg_size)
+ max_seg_size = mmc->max_req_size;
+ max_blocks = max_seg_size / 512;
+ dev_info(mmc->parent,
+ "host only supports SDMA, activate bounce buffer\n");
+
+ /*
+ * When we just support one segment, we can get significant
+ * speedup by the help of a bounce buffer to group scattered
+ * reads/writes together.
+ */
+ host->bounce_buffer = dma_alloc_coherent(mmc->parent,
+ max_seg_size,
+ &host->bounce_addr,
+ GFP_KERNEL);
+ if (!host->bounce_buffer) {
+ dev_err(mmc->parent,
+ "failed to allocate %u bytes for bounce buffer, falling back to single segments\n",
+ max_seg_size);
+ /*
+ * Exiting with zero here makes sure we proceed with
+ * mmc->max_segs == 1.
+ */
+ return 0;
+ }
+ host->bounce_buffer_size = max_seg_size;
+
+ /* Lie about this since we're bouncing */
+ mmc->max_segs = max_blocks;
+ mmc->max_seg_size = max_seg_size;
+
+ dev_info(mmc->parent,
+ "bounce buffer: bounce up to %u segments into one, max segment size %u bytes\n",
+ max_blocks, max_seg_size);
+ }
+
return 0;
unreg:
@@ -3743,6 +3829,9 @@ void sdhci_cleanup_host(struct sdhci_host *host)
host->align_addr);
host->adma_table = NULL;
host->align_buffer = NULL;
+ if (host->bounce_buffer)
+ dma_free_coherent(mmc->parent, host->bounce_buffer_size,
+ host->bounce_buffer, host->bounce_addr);
}
EXPORT_SYMBOL_GPL(sdhci_cleanup_host);
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 54bc444c317f..865e09618d22 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -440,6 +440,9 @@ struct sdhci_host {
int irq; /* Device IRQ */
void __iomem *ioaddr; /* Mapped address */
+ char *bounce_buffer; /* For packing SDMA reads/writes */
+ dma_addr_t bounce_addr;
+ size_t bounce_buffer_size;
const struct sdhci_ops *ops; /* Low level hw interface */
--
2.14.3
This is a note to let you know that I've just added the patch titled
x86/retpoline/xen: Convert Xen hypercall indirect jumps
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-retpoline-xen-convert-xen-hypercall-indirect-jumps.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From ea08816d5b185ab3d09e95e393f265af54560350 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw(a)amazon.co.uk>
Date: Thu, 11 Jan 2018 21:46:31 +0000
Subject: x86/retpoline/xen: Convert Xen hypercall indirect jumps
From: David Woodhouse <dwmw(a)amazon.co.uk>
commit ea08816d5b185ab3d09e95e393f265af54560350 upstream.
Convert indirect call in Xen hypercall to use non-speculative sequence,
when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Arjan van de Ven <arjan(a)linux.intel.com>
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Cc: gnomes(a)lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: thomas.lendacky(a)amd.com
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Kees Cook <keescook(a)google.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Cc: Paul Turner <pjt(a)google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-10-git-send-email-dwmw@amazon.co…
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/xen/hypercall.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -44,6 +44,7 @@
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/smap.h>
+#include <asm/nospec-branch.h>
#include <xen/interface/xen.h>
#include <xen/interface/sched.h>
@@ -216,9 +217,9 @@ privcmd_call(unsigned call,
__HYPERCALL_5ARG(a1, a2, a3, a4, a5);
stac();
- asm volatile("call *%[call]"
+ asm volatile(CALL_NOSPEC
: __HYPERCALL_5PARAM
- : [call] "a" (&hypercall_page[call])
+ : [thunk_target] "a" (&hypercall_page[call])
: __HYPERCALL_CLOBBER5);
clac();
Patches currently in stable-queue which might be from dwmw(a)amazon.co.uk are
queue-4.9/x86-spectre-add-boot-time-option-to-select-spectre-v2-mitigation.patch
queue-4.9/x86-retpoline-irq32-convert-assembler-indirect-jumps.patch
queue-4.9/objtool-detect-jumps-to-retpoline-thunks.patch
queue-4.9/x86-cpufeatures-add-x86_bug_spectre_v.patch
queue-4.9/x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
queue-4.9/x86-retpoline-hyperv-convert-assembler-indirect-jumps.patch
queue-4.9/x86-retpoline-entry-convert-entry-assembler-indirect-jumps.patch
queue-4.9/sysfs-cpu-fix-typos-in-vulnerability-documentation.patch
queue-4.9/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.9/x86-cpufeatures-make-cpu-bugs-sticky.patch
queue-4.9/x86-cpu-amd-make-lfence-a-serializing-instruction.patch
queue-4.9/x86-retpoline-ftrace-convert-ftrace-assembler-indirect-jumps.patch
queue-4.9/objtool-allow-alternatives-to-be-ignored.patch
queue-4.9/x86-cpu-implement-cpu-vulnerabilites-sysfs-functions.patch
queue-4.9/x86-retpoline-crypto-convert-crypto-assembler-indirect-jumps.patch
queue-4.9/x86-cpu-factor-out-application-of-forced-cpu-caps.patch
queue-4.9/x86-retpoline-xen-convert-xen-hypercall-indirect-jumps.patch
queue-4.9/x86-retpoline-checksum32-convert-assembler-indirect-jumps.patch
queue-4.9/x86-mm-32-move-setup_clear_cpu_cap-x86_feature_pcid-earlier.patch
queue-4.9/sysfs-cpu-add-vulnerability-folder.patch
queue-4.9/x86-retpoline-fill-return-stack-buffer-on-vmexit.patch
queue-4.9/x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
queue-4.9/x86-retpoline-remove-compile-time-warning.patch
queue-4.9/x86-alternatives-fix-optimize_nops-checking.patch
queue-4.9/x86-cpu-amd-use-lfence_rdtsc-in-preference-to-mfence_rdtsc.patch
queue-4.9/x86-retpoline-add-initial-retpoline-support.patch
This is a note to let you know that I've just added the patch titled
x86/retpoline: Remove compile time warning
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-retpoline-remove-compile-time-warning.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From b8b9ce4b5aec8de9e23cabb0a26b78641f9ab1d6 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sun, 14 Jan 2018 22:13:29 +0100
Subject: x86/retpoline: Remove compile time warning
From: Thomas Gleixner <tglx(a)linutronix.de>
commit b8b9ce4b5aec8de9e23cabb0a26b78641f9ab1d6 upstream.
Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
does not have retpoline support. Linus' rationale for this is:
It's wrong because it will just make people turn off RETPOLINE, and the
asm updates - and return stack clearing - that are independent of the
compiler are likely the most important parts because they are likely the
ones easiest to target.
And it's annoying because most people won't be able to do anything about
it. The number of people building their own compiler? Very small. So if
their distro hasn't got a compiler yet (and pretty much nobody does), the
warning is just annoying crap.
It is already properly reported as part of the sysfs interface. The
compile-time warning only encourages bad things.
Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support")
Requested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: gnomes(a)lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: thomas.lendacky(a)amd.com
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Kees Cook <keescook(a)google.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uA…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/Makefile | 2 --
1 file changed, 2 deletions(-)
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -187,8 +187,6 @@ ifdef CONFIG_RETPOLINE
RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
ifneq ($(RETPOLINE_CFLAGS),)
KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
- else
- $(warning CONFIG_RETPOLINE=y, but not supported by the compiler. Toolchain update recommended.)
endif
endif
Patches currently in stable-queue which might be from tglx(a)linutronix.de are
queue-4.9/x86-spectre-add-boot-time-option-to-select-spectre-v2-mitigation.patch
queue-4.9/x86-retpoline-irq32-convert-assembler-indirect-jumps.patch
queue-4.9/objtool-detect-jumps-to-retpoline-thunks.patch
queue-4.9/x86-cpufeatures-add-x86_bug_spectre_v.patch
queue-4.9/x86-microcode-intel-extend-bdw-late-loading-with-a-revision-check.patch
queue-4.9/x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
queue-4.9/x86-retpoline-hyperv-convert-assembler-indirect-jumps.patch
queue-4.9/x86-retpoline-entry-convert-entry-assembler-indirect-jumps.patch
queue-4.9/x86-asm-use-register-variable-to-get-stack-pointer-value.patch
queue-4.9/sysfs-cpu-fix-typos-in-vulnerability-documentation.patch
queue-4.9/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.9/objtool-modules-discard-objtool-annotation-sections-for-modules.patch
queue-4.9/x86-cpufeatures-make-cpu-bugs-sticky.patch
queue-4.9/x86-cpu-amd-make-lfence-a-serializing-instruction.patch
queue-4.9/x86-retpoline-ftrace-convert-ftrace-assembler-indirect-jumps.patch
queue-4.9/x86-documentation-add-pti-description.patch
queue-4.9/x86-acpi-handle-sci-interrupts-above-legacy-space-gracefully.patch
queue-4.9/objtool-allow-alternatives-to-be-ignored.patch
queue-4.9/x86-cpu-implement-cpu-vulnerabilites-sysfs-functions.patch
queue-4.9/x86-retpoline-crypto-convert-crypto-assembler-indirect-jumps.patch
queue-4.9/x86-cpu-factor-out-application-of-forced-cpu-caps.patch
queue-4.9/selftests-x86-add-test_vsyscall.patch
queue-4.9/x86-retpoline-xen-convert-xen-hypercall-indirect-jumps.patch
queue-4.9/x86-cpu-merge-bugs.c-and-bugs_64.c.patch
queue-4.9/x86-retpoline-checksum32-convert-assembler-indirect-jumps.patch
queue-4.9/x86-mm-32-move-setup_clear_cpu_cap-x86_feature_pcid-earlier.patch
queue-4.9/sysfs-cpu-add-vulnerability-folder.patch
queue-4.9/x86-retpoline-fill-return-stack-buffer-on-vmexit.patch
queue-4.9/x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
queue-4.9/x86-acpi-reduce-code-duplication-in-mp_override_legacy_irq.patch
queue-4.9/x86-retpoline-remove-compile-time-warning.patch
queue-4.9/x86-alternatives-fix-optimize_nops-checking.patch
queue-4.9/x86-cpu-amd-use-lfence_rdtsc-in-preference-to-mfence_rdtsc.patch
queue-4.9/x86-retpoline-add-initial-retpoline-support.patch
This is a note to let you know that I've just added the patch titled
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-retpoline-ftrace-convert-ftrace-assembler-indirect-jumps.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 9351803bd803cdbeb9b5a7850b7b6f464806e3db Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw(a)amazon.co.uk>
Date: Thu, 11 Jan 2018 21:46:29 +0000
Subject: x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
From: David Woodhouse <dwmw(a)amazon.co.uk>
commit 9351803bd803cdbeb9b5a7850b7b6f464806e3db upstream.
Convert all indirect jumps in ftrace assembler code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Arjan van de Ven <arjan(a)linux.intel.com>
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Cc: gnomes(a)lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: thomas.lendacky(a)amd.com
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Kees Cook <keescook(a)google.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Cc: Paul Turner <pjt(a)google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-8-git-send-email-dwmw@amazon.co.…
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/entry/entry_32.S | 5 +++--
arch/x86/kernel/mcount_64.S | 7 ++++---
2 files changed, 7 insertions(+), 5 deletions(-)
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -985,7 +985,8 @@ trace:
movl 0x4(%ebp), %edx
subl $MCOUNT_INSN_SIZE, %eax
- call *ftrace_trace_function
+ movl ftrace_trace_function, %ecx
+ CALL_NOSPEC %ecx
popl %edx
popl %ecx
@@ -1021,7 +1022,7 @@ return_to_handler:
movl %eax, %ecx
popl %edx
popl %eax
- jmp *%ecx
+ JMP_NOSPEC %ecx
#endif
#ifdef CONFIG_TRACING
--- a/arch/x86/kernel/mcount_64.S
+++ b/arch/x86/kernel/mcount_64.S
@@ -8,7 +8,7 @@
#include <asm/ptrace.h>
#include <asm/ftrace.h>
#include <asm/export.h>
-
+#include <asm/nospec-branch.h>
.code64
.section .entry.text, "ax"
@@ -290,8 +290,9 @@ trace:
* ip and parent ip are used and the list function is called when
* function tracing is enabled.
*/
- call *ftrace_trace_function
+ movq ftrace_trace_function, %r8
+ CALL_NOSPEC %r8
restore_mcount_regs
jmp fgraph_trace
@@ -334,5 +335,5 @@ GLOBAL(return_to_handler)
movq 8(%rsp), %rdx
movq (%rsp), %rax
addq $24, %rsp
- jmp *%rdi
+ JMP_NOSPEC %rdi
#endif
Patches currently in stable-queue which might be from dwmw(a)amazon.co.uk are
queue-4.9/x86-spectre-add-boot-time-option-to-select-spectre-v2-mitigation.patch
queue-4.9/x86-retpoline-irq32-convert-assembler-indirect-jumps.patch
queue-4.9/objtool-detect-jumps-to-retpoline-thunks.patch
queue-4.9/x86-cpufeatures-add-x86_bug_spectre_v.patch
queue-4.9/x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
queue-4.9/x86-retpoline-hyperv-convert-assembler-indirect-jumps.patch
queue-4.9/x86-retpoline-entry-convert-entry-assembler-indirect-jumps.patch
queue-4.9/sysfs-cpu-fix-typos-in-vulnerability-documentation.patch
queue-4.9/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.9/x86-cpufeatures-make-cpu-bugs-sticky.patch
queue-4.9/x86-cpu-amd-make-lfence-a-serializing-instruction.patch
queue-4.9/x86-retpoline-ftrace-convert-ftrace-assembler-indirect-jumps.patch
queue-4.9/objtool-allow-alternatives-to-be-ignored.patch
queue-4.9/x86-cpu-implement-cpu-vulnerabilites-sysfs-functions.patch
queue-4.9/x86-retpoline-crypto-convert-crypto-assembler-indirect-jumps.patch
queue-4.9/x86-cpu-factor-out-application-of-forced-cpu-caps.patch
queue-4.9/x86-retpoline-xen-convert-xen-hypercall-indirect-jumps.patch
queue-4.9/x86-retpoline-checksum32-convert-assembler-indirect-jumps.patch
queue-4.9/x86-mm-32-move-setup_clear_cpu_cap-x86_feature_pcid-earlier.patch
queue-4.9/sysfs-cpu-add-vulnerability-folder.patch
queue-4.9/x86-retpoline-fill-return-stack-buffer-on-vmexit.patch
queue-4.9/x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
queue-4.9/x86-retpoline-remove-compile-time-warning.patch
queue-4.9/x86-alternatives-fix-optimize_nops-checking.patch
queue-4.9/x86-cpu-amd-use-lfence_rdtsc-in-preference-to-mfence_rdtsc.patch
queue-4.9/x86-retpoline-add-initial-retpoline-support.patch
This is a note to let you know that I've just added the patch titled
x86/retpoline: Fill return stack buffer on vmexit
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-retpoline-fill-return-stack-buffer-on-vmexit.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 117cc7a908c83697b0b737d15ae1eb5943afe35b Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw(a)amazon.co.uk>
Date: Fri, 12 Jan 2018 11:11:27 +0000
Subject: x86/retpoline: Fill return stack buffer on vmexit
From: David Woodhouse <dwmw(a)amazon.co.uk>
commit 117cc7a908c83697b0b737d15ae1eb5943afe35b upstream.
In accordance with the Intel and AMD documentation, we need to overwrite
all entries in the RSB on exiting a guest, to prevent malicious branch
target predictions from affecting the host kernel. This is needed both
for retpoline and for IBRS.
[ak: numbers again for the RSB stuffing labels]
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: gnomes(a)lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: thomas.lendacky(a)amd.com
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Kees Cook <keescook(a)google.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linux-foundation.org>
Cc: Paul Turner <pjt(a)google.com>
Link: https://lkml.kernel.org/r/1515755487-8524-1-git-send-email-dwmw@amazon.co.uk
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/nospec-branch.h | 78 ++++++++++++++++++++++++++++++++++-
arch/x86/kvm/svm.c | 4 +
arch/x86/kvm/vmx.c | 4 +
3 files changed, 85 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -7,6 +7,48 @@
#include <asm/alternative-asm.h>
#include <asm/cpufeatures.h>
+/*
+ * Fill the CPU return stack buffer.
+ *
+ * Each entry in the RSB, if used for a speculative 'ret', contains an
+ * infinite 'pause; jmp' loop to capture speculative execution.
+ *
+ * This is required in various cases for retpoline and IBRS-based
+ * mitigations for the Spectre variant 2 vulnerability. Sometimes to
+ * eliminate potentially bogus entries from the RSB, and sometimes
+ * purely to ensure that it doesn't get empty, which on some CPUs would
+ * allow predictions from other (unwanted!) sources to be used.
+ *
+ * We define a CPP macro such that it can be used from both .S files and
+ * inline assembly. It's possible to do a .macro and then include that
+ * from C via asm(".include <asm/nospec-branch.h>") but let's not go there.
+ */
+
+#define RSB_CLEAR_LOOPS 32 /* To forcibly overwrite all entries */
+#define RSB_FILL_LOOPS 16 /* To avoid underflow */
+
+/*
+ * Google experimented with loop-unrolling and this turned out to be
+ * the optimal version — two calls, each with their own speculation
+ * trap should their return address end up getting used, in a loop.
+ */
+#define __FILL_RETURN_BUFFER(reg, nr, sp) \
+ mov $(nr/2), reg; \
+771: \
+ call 772f; \
+773: /* speculation trap */ \
+ pause; \
+ jmp 773b; \
+772: \
+ call 774f; \
+775: /* speculation trap */ \
+ pause; \
+ jmp 775b; \
+774: \
+ dec reg; \
+ jnz 771b; \
+ add $(BITS_PER_LONG/8) * nr, sp;
+
#ifdef __ASSEMBLY__
/*
@@ -76,6 +118,20 @@
#endif
.endm
+ /*
+ * A simpler FILL_RETURN_BUFFER macro. Don't make people use the CPP
+ * monstrosity above, manually.
+ */
+.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
+#ifdef CONFIG_RETPOLINE
+ ANNOTATE_NOSPEC_ALTERNATIVE
+ ALTERNATIVE "jmp .Lskip_rsb_\@", \
+ __stringify(__FILL_RETURN_BUFFER(\reg,\nr,%_ASM_SP)) \
+ \ftr
+.Lskip_rsb_\@:
+#endif
+.endm
+
#else /* __ASSEMBLY__ */
#define ANNOTATE_NOSPEC_ALTERNATIVE \
@@ -119,7 +175,7 @@
X86_FEATURE_RETPOLINE)
# define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
-#else /* No retpoline */
+#else /* No retpoline for C / inline asm */
# define CALL_NOSPEC "call *%[thunk_target]\n"
# define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
#endif
@@ -134,5 +190,25 @@ enum spectre_v2_mitigation {
SPECTRE_V2_IBRS,
};
+/*
+ * On VMEXIT we must ensure that no RSB predictions learned in the guest
+ * can be followed in the host, by overwriting the RSB completely. Both
+ * retpoline and IBRS mitigations for Spectre v2 need this; only on future
+ * CPUs with IBRS_ATT *might* it be avoided.
+ */
+static inline void vmexit_fill_RSB(void)
+{
+#ifdef CONFIG_RETPOLINE
+ unsigned long loops = RSB_CLEAR_LOOPS / 2;
+
+ asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
+ ALTERNATIVE("jmp 910f",
+ __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
+ X86_FEATURE_RETPOLINE)
+ "910:"
+ : "=&r" (loops), ASM_CALL_CONSTRAINT
+ : "r" (loops) : "memory" );
+#endif
+}
#endif /* __ASSEMBLY__ */
#endif /* __NOSPEC_BRANCH_H__ */
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -44,6 +44,7 @@
#include <asm/debugreg.h>
#include <asm/kvm_para.h>
#include <asm/irq_remapping.h>
+#include <asm/nospec-branch.h>
#include <asm/virtext.h>
#include "trace.h"
@@ -4917,6 +4918,9 @@ static void svm_vcpu_run(struct kvm_vcpu
#endif
);
+ /* Eliminate branch target predictions from guest mode */
+ vmexit_fill_RSB();
+
#ifdef CONFIG_X86_64
wrmsrl(MSR_GS_BASE, svm->host.gs_base);
#else
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -48,6 +48,7 @@
#include <asm/kexec.h>
#include <asm/apic.h>
#include <asm/irq_remapping.h>
+#include <asm/nospec-branch.h>
#include "trace.h"
#include "pmu.h"
@@ -9026,6 +9027,9 @@ static void __noclone vmx_vcpu_run(struc
#endif
);
+ /* Eliminate branch target predictions from guest mode */
+ vmexit_fill_RSB();
+
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
if (debugctlmsr)
update_debugctlmsr(debugctlmsr);
Patches currently in stable-queue which might be from dwmw(a)amazon.co.uk are
queue-4.9/x86-spectre-add-boot-time-option-to-select-spectre-v2-mitigation.patch
queue-4.9/x86-retpoline-irq32-convert-assembler-indirect-jumps.patch
queue-4.9/objtool-detect-jumps-to-retpoline-thunks.patch
queue-4.9/x86-cpufeatures-add-x86_bug_spectre_v.patch
queue-4.9/x86-alternatives-add-missing-n-at-end-of-alternative-inline-asm.patch
queue-4.9/x86-retpoline-hyperv-convert-assembler-indirect-jumps.patch
queue-4.9/x86-retpoline-entry-convert-entry-assembler-indirect-jumps.patch
queue-4.9/sysfs-cpu-fix-typos-in-vulnerability-documentation.patch
queue-4.9/x86-cpufeatures-add-x86_bug_cpu_insecure.patch
queue-4.9/x86-cpufeatures-make-cpu-bugs-sticky.patch
queue-4.9/x86-cpu-amd-make-lfence-a-serializing-instruction.patch
queue-4.9/x86-retpoline-ftrace-convert-ftrace-assembler-indirect-jumps.patch
queue-4.9/objtool-allow-alternatives-to-be-ignored.patch
queue-4.9/x86-cpu-implement-cpu-vulnerabilites-sysfs-functions.patch
queue-4.9/x86-retpoline-crypto-convert-crypto-assembler-indirect-jumps.patch
queue-4.9/x86-cpu-factor-out-application-of-forced-cpu-caps.patch
queue-4.9/x86-retpoline-xen-convert-xen-hypercall-indirect-jumps.patch
queue-4.9/x86-retpoline-checksum32-convert-assembler-indirect-jumps.patch
queue-4.9/x86-mm-32-move-setup_clear_cpu_cap-x86_feature_pcid-earlier.patch
queue-4.9/sysfs-cpu-add-vulnerability-folder.patch
queue-4.9/x86-retpoline-fill-return-stack-buffer-on-vmexit.patch
queue-4.9/x86-pti-rename-bug_cpu_insecure-to-bug_cpu_meltdown.patch
queue-4.9/x86-retpoline-remove-compile-time-warning.patch
queue-4.9/x86-alternatives-fix-optimize_nops-checking.patch
queue-4.9/x86-cpu-amd-use-lfence_rdtsc-in-preference-to-mfence_rdtsc.patch
queue-4.9/x86-retpoline-add-initial-retpoline-support.patch