The recent rework of probe_kernel_read() and its conversion to
get_kernel_nofault() inadvertently broke is_prefetch(). We were using
probe_kernel_read() as a sloppy "read user or kernel memory" helper, but it
doens't do that any more. The new get_kernel_nofault() reads *kernel*
memory only, which completely broke is_prefetch() for user access.
Adjust the code to the the correct accessor based on access mode. The
manual address bounds check is no longer necessary, since the accessor
helpers (get_user() / get_kernel_nofault()) do the right thing all by
themselves. As a bonus, by using the correct accessor, we don't need the
open-coded address bounds check.
While we're at it, disable the workaround on all CPUs except AMD Family
0xF. By my reading of the Revision Guide for AMD Athlon™ 64 and AMD
Opteron™ Processors, only family 0xF is affected.
Fixes: eab0c6089b68 ("maccess: unify the probe kernel arch hooks")
Cc: stable(a)vger.kernel.org
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
---
arch/x86/mm/fault.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 106b22d1d189..50dfdc71761e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -54,7 +54,7 @@ kmmio_fault(struct pt_regs *regs, unsigned long addr)
* 32-bit mode:
*
* Sometimes AMD Athlon/Opteron CPUs report invalid exceptions on prefetch.
- * Check that here and ignore it.
+ * Check that here and ignore it. This is AMD erratum #91.
*
* 64-bit mode:
*
@@ -83,11 +83,7 @@ check_prefetch_opcode(struct pt_regs *regs, unsigned char *instr,
#ifdef CONFIG_X86_64
case 0x40:
/*
- * In AMD64 long mode 0x40..0x4F are valid REX prefixes
- * Need to figure out under what instruction mode the
- * instruction was issued. Could check the LDT for lm,
- * but for now it's good enough to assume that long
- * mode only uses well known segments or kernel.
+ * In 64-bit mode 0x40..0x4F are valid REX prefixes
*/
return (!user_mode(regs) || user_64bit_mode(regs));
#endif
@@ -124,23 +120,38 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
if (error_code & X86_PF_INSTR)
return 0;
+ if (likely(boot_cpu_data.x86_vendor != X86_VENDOR_AMD
+ || boot_cpu_data.x86 != 0xf))
+ return 0;
+
instr = (void *)convert_ip_to_linear(current, regs);
max_instr = instr + 15;
- if (user_mode(regs) && instr >= (unsigned char *)TASK_SIZE_MAX)
- return 0;
+ /*
+ * This code has historically always bailed out if IP points to a
+ * not-present page (e.g. due to a race). No one has ever
+ * complained about this.
+ */
+ pagefault_disable();
while (instr < max_instr) {
unsigned char opcode;
- if (get_kernel_nofault(opcode, instr))
- break;
+ if (user_mode(regs)) {
+ if (get_user(opcode, instr))
+ break;
+ } else {
+ if (get_kernel_nofault(opcode, instr))
+ break;
+ }
instr++;
if (!check_prefetch_opcode(regs, instr, opcode, &prefetch))
break;
}
+
+ pagefault_enable();
return prefetch;
}
--
2.29.2
On Fri, Jan 29, 2021, Paolo Bonzini wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 76bce832cade..15733013b266 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1401,7 +1401,7 @@ static u64 kvm_get_arch_capabilities(void)
> * This lets the guest use VERW to clear CPU buffers.
This comment be updated to call out the new TSX_CTRL behavior.
/*
* On TAA affected systems:
* - nothing to do if TSX is disabled on the host.
* - we emulate TSX_CTRL if present on the host.
* This lets the guest use VERW to clear CPU buffers.
*/
> */
> if (!boot_cpu_has(X86_FEATURE_RTM))
> - data &= ~(ARCH_CAP_TAA_NO | ARCH_CAP_TSX_CTRL_MSR);
> + data &= ~ARCH_CAP_TAA_NO;
Hmm, simply clearing TSX_CTRL will only preserve the host value. Since
ARCH_CAPABILITIES is unconditionally emulated by KVM, wouldn't it make sense to
unconditionally expose TSX_CTRL as well, as opposed to exposing it only if it's
supported in the host? I.e. allow migrating a TSX-disabled guest to a host
without TSX. Or am I misunderstanding how TSX_CTRL is checked/used?
> else if (!boot_cpu_has_bug(X86_BUG_TAA))
> data |= ARCH_CAP_TAA_NO;
>
> --
> 2.26.2
>
Commit 7a2da5d7960a ("spi: fsl: Fix driver breakage when SPI_CS_HIGH
is not set in spi->mode") broke our MPC8309 board by effectively
inverting the boolean value passed to fsl_spi_cs_control. The
SPISEL_BOOT signal is used as chipselect, but it's not a gpio, so
we cannot rely on gpiolib handling the polarity.
Adapt to the new world order by inverting the logic here. This does
assume that the slave sitting at the SPISEL_BOOT is active low, but
should that ever turn out not to be the case, one can create a stub
gpiochip driver controlling a single gpio (or rather, a single "spo",
special-purpose output).
Fixes: 7a2da5d7960a ("spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
---
drivers/spi/spi-fsl-spi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index 6d8e0a05a535..e4a8d203f940 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -695,7 +695,7 @@ static void fsl_spi_cs_control(struct spi_device *spi, bool on)
if (WARN_ON_ONCE(!pinfo->immr_spi_cs))
return;
- iowrite32be(on ? SPI_BOOT_SEL_BIT : 0, pinfo->immr_spi_cs);
+ iowrite32be(on ? 0 : SPI_BOOT_SEL_BIT, pinfo->immr_spi_cs);
}
}
--
2.23.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0941e3b0653fef1ea68287f6a948c6c68a45c9ba Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Mon, 14 Dec 2020 12:12:08 -0500
Subject: [PATCH] Revert "dm raid: fix discard limits for raid1 and raid10"
This reverts commit e0910c8e4f87bb9f767e61a778b0d9271c4dc512.
Reverting 6ffeb1c3f822 ("md: change mddev 'chunk_sectors' from int to
unsigned") exposes dm-raid.c compiler warnings detailed that commit's
header. Clearly this more conservative fix, of simply reverting
e0910c8e4f8, would've been more prudent given how late we were in the
v5.10 release. Lessons have been learned.
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index dc8568ab96f2..56b723d012ac 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3730,14 +3730,12 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
blk_limits_io_opt(limits, chunk_size_bytes * mddev_data_stripes(rs));
/*
- * RAID10 personality requires bio splitting,
- * RAID0/1/4/5/6 don't and process large discard bios properly.
+ * RAID1 and RAID10 personalities require bio splitting,
+ * RAID0/4/5/6 don't and process large discard bios properly.
*/
- if (rs_is_raid10(rs)) {
- limits->discard_granularity = max(chunk_size_bytes,
- limits->discard_granularity);
- limits->max_discard_sectors = min_not_zero(rs->md.chunk_sectors,
- limits->max_discard_sectors);
+ if (rs_is_raid1(rs) || rs_is_raid10(rs)) {
+ limits->discard_granularity = chunk_size_bytes;
+ limits->max_discard_sectors = rs->md.chunk_sectors;
}
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e0910c8e4f87bb9f767e61a778b0d9271c4dc512 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)redhat.com>
Date: Thu, 24 Sep 2020 13:14:52 -0400
Subject: [PATCH] dm raid: fix discard limits for raid1 and raid10
Block core warned that discard_granularity was 0 for dm-raid with
personality of raid1. Reason is that raid_io_hints() was incorrectly
special-casing raid1 rather than raid0.
But since commit 29efc390b9462 ("md/md0: optimize raid0 discard
handling") even raid0 properly handles large discards.
Fix raid_io_hints() by removing discard limits settings for raid1.
Also, fix limits for raid10 by properly stacking underlying limits as
done in blk_stack_limits().
Depends-on: 29efc390b9462 ("md/md0: optimize raid0 discard handling")
Fixes: 61697a6abd24a ("dm: eliminate 'split_discard_bios' flag from DM target interface")
Cc: stable(a)vger.kernel.org
Reported-by: Zdenek Kabelac <zkabelac(a)redhat.com>
Reported-by: Mikulas Patocka <mpatocka(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 56b723d012ac..dc8568ab96f2 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3730,12 +3730,14 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
blk_limits_io_opt(limits, chunk_size_bytes * mddev_data_stripes(rs));
/*
- * RAID1 and RAID10 personalities require bio splitting,
- * RAID0/4/5/6 don't and process large discard bios properly.
+ * RAID10 personality requires bio splitting,
+ * RAID0/1/4/5/6 don't and process large discard bios properly.
*/
- if (rs_is_raid1(rs) || rs_is_raid10(rs)) {
- limits->discard_granularity = chunk_size_bytes;
- limits->max_discard_sectors = rs->md.chunk_sectors;
+ if (rs_is_raid10(rs)) {
+ limits->discard_granularity = max(chunk_size_bytes,
+ limits->discard_granularity);
+ limits->max_discard_sectors = min_not_zero(rs->md.chunk_sectors,
+ limits->max_discard_sectors);
}
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ce5379963b2884e9d23bea0c5674a7251414c84b Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo(a)netfilter.org>
Date: Sat, 16 Jan 2021 19:50:34 +0100
Subject: [PATCH] netfilter: nft_dynset: dump expressions when set definition
contains no expressions
If the set definition provides no stateful expressions, then include the
stateful expression in the ruleset listing. Without this fix, the dynset
rule listing shows the stateful expressions provided by the set
definition.
Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 218c09e4fddd..d164ef9e6843 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -384,22 +384,25 @@ static int nft_dynset_dump(struct sk_buff *skb, const struct nft_expr *expr)
nf_jiffies64_to_msecs(priv->timeout),
NFTA_DYNSET_PAD))
goto nla_put_failure;
- if (priv->num_exprs == 1) {
- if (nft_expr_dump(skb, NFTA_DYNSET_EXPR, priv->expr_array[0]))
- goto nla_put_failure;
- } else if (priv->num_exprs > 1) {
- struct nlattr *nest;
-
- nest = nla_nest_start_noflag(skb, NFTA_DYNSET_EXPRESSIONS);
- if (!nest)
- goto nla_put_failure;
-
- for (i = 0; i < priv->num_exprs; i++) {
- if (nft_expr_dump(skb, NFTA_LIST_ELEM,
- priv->expr_array[i]))
+ if (priv->set->num_exprs == 0) {
+ if (priv->num_exprs == 1) {
+ if (nft_expr_dump(skb, NFTA_DYNSET_EXPR,
+ priv->expr_array[0]))
goto nla_put_failure;
+ } else if (priv->num_exprs > 1) {
+ struct nlattr *nest;
+
+ nest = nla_nest_start_noflag(skb, NFTA_DYNSET_EXPRESSIONS);
+ if (!nest)
+ goto nla_put_failure;
+
+ for (i = 0; i < priv->num_exprs; i++) {
+ if (nft_expr_dump(skb, NFTA_LIST_ELEM,
+ priv->expr_array[i]))
+ goto nla_put_failure;
+ }
+ nla_nest_end(skb, nest);
}
- nla_nest_end(skb, nest);
}
if (nla_put_be32(skb, NFTA_DYNSET_FLAGS, htonl(flags)))
goto nla_put_failure;