Currently ext4_punch_hole() skips the mtime update if there are no
actual blocks to release. However we have actually modified the file by
zeroing the partial block, so the mtime should be updated.
Moreover the sync and datasync handling is skipped as well, which is
also wrong. Fix it.
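For illustration, a minimal user-space sketch of the user-visible effect
(file name, offsets and sizes are arbitrary assumptions, not part of this
patch): punch a hole smaller than one block, so only partial-block zeroing
happens and no whole blocks are released, then check whether st_mtime changed:

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <linux/falloc.h>

	int main(void)
	{
		struct stat before, after;
		char buf[8192];
		int fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);

		if (fd < 0)
			return 1;
		memset(buf, 'x', sizeof(buf));
		write(fd, buf, sizeof(buf));
		fstat(fd, &before);
		sleep(1);	/* st_mtime has one-second granularity */
		/* 512-byte hole inside a 4k block: no whole blocks are released */
		fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 1024, 512);
		fstat(fd, &after);
		printf("mtime updated: %s\n",
		       before.st_mtime != after.st_mtime ? "yes" : "no");
		close(fd);
		return 0;
	}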
Signed-off-by: Lukas Czerner <lczerner(a)redhat.com>
Reported-by: Joe Habermann <joe.habermann(a)quantum.com>
Cc: <stable(a)vger.kernel.org>
---
fs/ext4/inode.c | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1e50c5e..6b4c5c0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4298,28 +4298,28 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
EXT4_BLOCK_SIZE_BITS(sb);
stop_block = (offset + length) >> EXT4_BLOCK_SIZE_BITS(sb);
- /* If there are no blocks to remove, return now */
- if (first_block >= stop_block)
- goto out_stop;
+ /* If there are blocks to remove, do it */
+ if (stop_block > first_block) {
- down_write(&EXT4_I(inode)->i_data_sem);
- ext4_discard_preallocations(inode);
+ down_write(&EXT4_I(inode)->i_data_sem);
+ ext4_discard_preallocations(inode);
- ret = ext4_es_remove_extent(inode, first_block,
- stop_block - first_block);
- if (ret) {
- up_write(&EXT4_I(inode)->i_data_sem);
- goto out_stop;
- }
+ ret = ext4_es_remove_extent(inode, first_block,
+ stop_block - first_block);
+ if (ret) {
+ up_write(&EXT4_I(inode)->i_data_sem);
+ goto out_stop;
+ }
- if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
- ret = ext4_ext_remove_space(inode, first_block,
- stop_block - 1);
- else
- ret = ext4_ind_remove_space(handle, inode, first_block,
- stop_block);
+ if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
+ ret = ext4_ext_remove_space(inode, first_block,
+ stop_block - 1);
+ else
+ ret = ext4_ind_remove_space(handle, inode, first_block,
+ stop_block);
- up_write(&EXT4_I(inode)->i_data_sem);
+ up_write(&EXT4_I(inode)->i_data_sem);
+ }
if (IS_SYNC(inode))
ext4_handle_sync(handle);
--
2.7.5
When ext4_ind_map_blocks() computes the length of a hole, it doesn't
account for the fact that the mapped offset may be somewhere in the middle
of a completely empty subtree. In such a case it returns too large a hole
length, which then results in lseek(SEEK_DATA) returning an incorrect
offset beyond the end of the hole.
Fix the problem by correctly taking the offset within a subtree into
account when computing the length of a hole.
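A worked illustration with made-up numbers (not taken from the report):
assume epb = 1024 pointers per block, depth = 3, the lookup stopped at the
top-level entry (partial == chain), and the offsets below that level are 10
and 100. The requested block then sits 10*1024 + 100 = 10340 blocks into an
empty subtree of 1024*1024 = 1048576 blocks. The old loop returned the full
1048576 as the hole length; the fixed loop computes
0*1024 + (1024 - 10 - 1) = 1013, then 1013*1024 + (1024 - 100 - 1) = 1038235,
and the final increment gives 1038236 = 1048576 - 10340, i.e. only the blocks
from the requested offset to the end of the empty subtree.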
Fixes: facab4d9711e7aa3532cb82643803e8f1b9518e8
CC: stable(a)vger.kernel.org
Reported-by: Jeff Mahoney <jeffm(a)suse.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/ext4/indirect.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
I'll submit a corresponding fstest shortly.
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index c32802c956d5..bf7fa1507e81 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -561,10 +561,16 @@ int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
unsigned epb = inode->i_sb->s_blocksize / sizeof(u32);
int i;
- /* Count number blocks in a subtree under 'partial' */
- count = 1;
- for (i = 0; partial + i != chain + depth - 1; i++)
- count *= epb;
+ /*
+ * Count number blocks in a subtree under 'partial'. At each
+ * level we count number of complete empty subtrees beyond
+ * current offset and then descend into the subtree only
+ * partially beyond current offset.
+ */
+ count = 0;
+ for (i = partial - chain + 1; i < depth; i++)
+ count = count * epb + (epb - offsets[i] - 1);
+ count++;
/* Fill in size of a hole we found */
map->m_pblk = 0;
map->m_len = min_t(unsigned int, map->m_len, count);
--
2.13.6
Hi,
This is the 3rd version of the bugfix series for kprobes on arm.
This series fixes 4 different issues that I found:
- Fix to use smp_processor_id() after disabling preemption.
- Prohibit probing on optimized_callback() to avoid recursive probes.
- Prohibit kprobes on do_undefinstr() for the same reason.
- Prohibit kprobes on get_user() for the same reason.
From v2, I included another 2 bugfixes (1/4 and 2/4)
which are not merged yet, and added "Cc: stable(a)vger.kernel.org",
since these are obvious bugs.
Thanks,
---
Masami Hiramatsu (4):
arm: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed
arm: kprobes: Prohibit probing on optimized_callback
arm: kprobes: Prohibit kprobes on do_undefinstr
arm: kprobes: Prohibit kprobes on get_user functions
arch/arm/include/asm/assembler.h | 10 ++++++++++
arch/arm/kernel/traps.c | 5 ++++-
arch/arm/lib/getuser.S | 10 ++++++++++
arch/arm/probes/kprobes/opt-arm.c | 4 +++-
4 files changed, 27 insertions(+), 2 deletions(-)
--
Masami Hiramatsu (Linaro) <mhiramat(a)kernel.org>
This is the start of the stable review cycle for the 4.4.130 release.
There are 50 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Apr 29 13:56:42 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.130-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.130-rc1
Heiko Carstens <heiko.carstens(a)de.ibm.com>
s390/uprobes: implement arch_uretprobe_is_alive()
Sebastian Ott <sebott(a)linux.ibm.com>
s390/cio: update chpid descriptor after resource accessibility event
Dan Carpenter <dan.carpenter(a)oracle.com>
cdrom: information leak in cdrom_ioctl_media_changed()
Martin K. Petersen <martin.petersen(a)oracle.com>
scsi: mptsas: Disable WRITE SAME
Eric Dumazet <edumazet(a)google.com>
ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
Eric Dumazet <edumazet(a)google.com>
net: af_packet: fix race in PACKET_{R|T}X_RING
Eric Dumazet <edumazet(a)google.com>
tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets
Wolfgang Bumiller <w.bumiller(a)proxmox.com>
net: fix deadlock while clearing neighbor proxy table
Eric Dumazet <edumazet(a)google.com>
tipc: add policy for TIPC_NLA_NET_ADDR
Cong Wang <xiyou.wangcong(a)gmail.com>
llc: fix NULL pointer deref for SOCK_ZAPPED
Cong Wang <xiyou.wangcong(a)gmail.com>
llc: hold llc_sap before release_sock()
Xin Long <lucien.xin(a)gmail.com>
sctp: do not check port in sctp_inet6_cmp_addr
Toshiaki Makita <makita.toshiaki(a)lab.ntt.co.jp>
vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi
Guillaume Nault <g.nault(a)alphalink.fr>
pppoe: check sockaddr length in pppoe_connect()
Willem de Bruijn <willemb(a)google.com>
packet: fix bitfield update race
Xin Long <lucien.xin(a)gmail.com>
team: fix netconsole setup over team
Paolo Abeni <pabeni(a)redhat.com>
team: avoid adding twice the same option to the event list
Jann Horn <jannh(a)google.com>
tcp: don't read out-of-bounds opsize
Cong Wang <xiyou.wangcong(a)gmail.com>
llc: delete timers synchronously in llc_sk_free()
Eric Dumazet <edumazet(a)google.com>
net: validate attribute sizes in neigh_dump_table()
Guillaume Nault <g.nault(a)alphalink.fr>
l2tp: check sockaddr length in pppol2tp_connect()
Eric Biggers <ebiggers(a)google.com>
KEYS: DNS: limit the length of option strings
Xin Long <lucien.xin(a)gmail.com>
bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: correct module section names for expoline code revert
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: correct nospec auto detection init order
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: add sysfs attributes for spectre
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: report spectre mitigation via syslog
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: add automatic detection of the spectre defense
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: move nobp parameter functions to nospec-branch.c
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390/entry.S: fix spurious zeroing of r0
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: do not bypass BPENTER for interrupt system calls
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*)
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: introduce execute-trampolines for branches
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: run user space and KVM guests with modified branch prediction
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: add options to change branch prediction behaviour for the kernel
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390/alternative: use a copy of the facility bit mask
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: add optimized array_index_mask_nospec
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: scrub registers on kernel entry and KVM exit
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
KVM: s390: wire up bpb feature
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: enable CPU alternatives unconditionally
Martin Schwidefsky <schwidefsky(a)de.ibm.com>
s390: introduce CPU alternatives
Karthikeyan Periyasamy <periyasa(a)codeaurora.org>
Revert "ath10k: send (re)assoc peer command when NSS changed"
Sahitya Tummala <stummala(a)codeaurora.org>
jbd2: fix use after free in kjournald2()
Felix Fietkau <nbd(a)nbd.name>
ath9k_hw: check if the chip failed to wake up
Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Input: drv260x - fix initializing overdrive voltage
Grant Grundler <grundler(a)chromium.org>
r8152: add Linksys USB3GIGV1 id
Chen Feng <puck.chen(a)hisilicon.com>
staging: ion : Donnot wakeup kswapd in ion system alloc
Jiri Olsa <jolsa(a)kernel.org>
perf: Return proper values for user stack errors
Xiaoming Gao <gxm.linux.kernel(a)gmail.com>
x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
Steve French <smfrench(a)gmail.com>
cifs: do not allow creating sockets except with SMB1 posix exensions
-------------
Diffstat:
Documentation/kernel-parameters.txt | 3 +
Makefile | 4 +-
arch/s390/Kconfig | 47 +++++
arch/s390/Makefile | 10 ++
arch/s390/include/asm/alternative.h | 149 +++++++++++++++
arch/s390/include/asm/barrier.h | 24 +++
arch/s390/include/asm/facility.h | 18 ++
arch/s390/include/asm/kvm_host.h | 3 +-
arch/s390/include/asm/lowcore.h | 7 +-
arch/s390/include/asm/nospec-branch.h | 17 ++
arch/s390/include/asm/processor.h | 4 +
arch/s390/include/asm/thread_info.h | 4 +
arch/s390/include/uapi/asm/kvm.h | 3 +
arch/s390/kernel/Makefile | 5 +-
arch/s390/kernel/alternative.c | 112 ++++++++++++
arch/s390/kernel/early.c | 5 +
arch/s390/kernel/entry.S | 250 +++++++++++++++++++++++---
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/module.c | 65 ++++++-
arch/s390/kernel/nospec-branch.c | 169 +++++++++++++++++
arch/s390/kernel/processor.c | 18 ++
arch/s390/kernel/setup.c | 14 +-
arch/s390/kernel/smp.c | 7 +-
arch/s390/kernel/uprobes.c | 9 +
arch/s390/kernel/vmlinux.lds.S | 37 ++++
arch/s390/kvm/kvm-s390.c | 13 +-
arch/x86/kernel/tsc.c | 2 +-
drivers/cdrom/cdrom.c | 2 +-
drivers/input/misc/drv260x.c | 2 +-
drivers/message/fusion/mptsas.c | 1 +
drivers/net/bonding/bond_main.c | 3 +-
drivers/net/ppp/pppoe.c | 4 +
drivers/net/team/team.c | 38 +++-
drivers/net/usb/cdc_ether.c | 10 ++
drivers/net/usb/r8152.c | 2 +
drivers/net/wireless/ath/ath10k/mac.c | 5 +-
drivers/net/wireless/ath/ath9k/hw.c | 4 +
drivers/s390/char/Makefile | 2 +
drivers/s390/cio/chsc.c | 14 +-
drivers/staging/android/ion/ion_system_heap.c | 2 +-
fs/cifs/dir.c | 9 +-
fs/jbd2/journal.c | 2 +-
include/linux/if_vlan.h | 7 +-
include/net/llc_conn.h | 1 +
include/uapi/linux/kvm.h | 1 +
kernel/events/core.c | 4 +-
net/core/dev.c | 2 +-
net/core/neighbour.c | 40 +++--
net/dns_resolver/dns_key.c | 13 +-
net/ipv4/tcp.c | 6 +-
net/ipv4/tcp_input.c | 7 +-
net/ipv6/route.c | 2 +
net/l2tp/l2tp_ppp.c | 7 +
net/llc/af_llc.c | 14 +-
net/llc/llc_c_ac.c | 9 +-
net/llc/llc_conn.c | 22 ++-
net/packet/af_packet.c | 88 ++++++---
net/packet/internal.h | 10 +-
net/sctp/ipv6.c | 60 +++----
net/tipc/net.c | 3 +-
60 files changed, 1228 insertions(+), 168 deletions(-)
commit ece1397cbc89c51914fae1aec729539cfd8bd62b upstream
Some variants of the Arm Cortex-A55 core (r0p0, r0p1, r1p0) suffer
from erratum 1024718, which causes incorrect updates when the DBM/AP
bits in a page table entry are modified without a break-before-make
sequence. The workaround is to disable the hardware DBM feature
on the affected cores. The hardware Access Flag management feature
is not affected.
The hardware DBM feature is a non-conflicting capability, i.e., the
kernel can handle cores using the feature and cores without it
running at the same time. So this workaround is applied at early boot
time, rather than delayed until the CPUs are brought up into the
kernel with the MMU turned on. This also avoids other complexities
with late CPUs coming online, with or without the hardware DBM feature.
Cc: stable(a)vger.kernel.org # v4.9
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
---
Note: The upstream commit is on top of a reworked capability
infrastructure for arm64 heterogeneous systems, which allows
delaying the CPU model checks. This backport is based on the
original version of the patch [0], which checks the affected
CPU models during early boot.
[0] https://lkml.kernel.org/r/20180116102323.3470-1-suzuki.poulose@arm.com
---
Documentation/arm64/silicon-errata.txt | 1 +
arch/arm64/Kconfig | 14 ++++++++++++
arch/arm64/include/asm/assembler.h | 40 ++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/cputype.h | 5 +++++
arch/arm64/mm/proc.S | 5 +++++
5 files changed, 65 insertions(+)
diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index d11af52..ac9489f 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -54,6 +54,7 @@ stable kernels.
| ARM | Cortex-A57 | #852523 | N/A |
| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 |
| ARM | Cortex-A72 | #853709 | N/A |
+| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 |
| ARM | MMU-500 | #841119,#826419 | N/A |
| | | | |
| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 90e58bb..d0df361 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -427,6 +427,20 @@ config ARM64_ERRATUM_843419
If unsure, say Y.
+config ARM64_ERRATUM_1024718
+ bool "Cortex-A55: 1024718: Update of DBM/AP bits without break before make might result in incorrect update"
+ default y
+ help
+ This option adds work around for Arm Cortex-A55 Erratum 1024718.
+
+ Affected Cortex-A55 cores (r0p0, r0p1, r1p0) could cause incorrect
+ update of the hardware dirty bit when the DBM/AP bits are updated
+ without a break-before-make. The work around is to disable the usage
+ of hardware DBM locally on the affected cores. CPUs not affected by
+ erratum will continue to use the feature.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index e60375c..bfcfec3 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -25,6 +25,7 @@
#include <asm/asm-offsets.h>
#include <asm/cpufeature.h>
+#include <asm/cputype.h>
#include <asm/page.h>
#include <asm/pgtable-hwdef.h>
#include <asm/ptrace.h>
@@ -435,4 +436,43 @@ alternative_endif
and \phys, \pte, #(((1 << (48 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
.endm
+/*
+ * Check the MIDR_EL1 of the current CPU for a given model and a range of
+ * variant/revision. See asm/cputype.h for the macros used below.
+ *
+ * model: MIDR_CPU_MODEL of CPU
+ * rv_min: Minimum of MIDR_CPU_VAR_REV()
+ * rv_max: Maximum of MIDR_CPU_VAR_REV()
+ * res: Result register.
+ * tmp1, tmp2, tmp3: Temporary registers
+ *
+ * Corrupts: res, tmp1, tmp2, tmp3
+ * Returns: 0, if the CPU id doesn't match. Non-zero otherwise
+ */
+ .macro cpu_midr_match model, rv_min, rv_max, res, tmp1, tmp2, tmp3
+ mrs \res, midr_el1
+ mov_q \tmp1, (MIDR_REVISION_MASK | MIDR_VARIANT_MASK)
+ mov_q \tmp2, MIDR_CPU_MODEL_MASK
+ and \tmp3, \res, \tmp2 // Extract model
+ and \tmp1, \res, \tmp1 // rev & variant
+ mov_q \tmp2, \model
+ cmp \tmp3, \tmp2
+ cset \res, eq
+ cbz \res, .Ldone\@ // Model matches ?
+
+ .if (\rv_min != 0) // Skip min check if rv_min == 0
+ mov_q \tmp3, \rv_min
+ cmp \tmp1, \tmp3
+ cset \res, ge
+ .endif // \rv_min != 0
+ /* Skip rv_max check if rv_min == rv_max && rv_min != 0 */
+ .if ((\rv_min != \rv_max) || \rv_min == 0)
+ mov_q \tmp2, \rv_max
+ cmp \tmp1, \tmp2
+ cset \tmp2, le
+ and \res, \res, \tmp2
+ .endif
+.Ldone\@:
+ .endm
+
#endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 9ee3038..39d1db6 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -56,6 +56,9 @@
(0xf << MIDR_ARCHITECTURE_SHIFT) | \
((partnum) << MIDR_PARTNUM_SHIFT))
+#define MIDR_CPU_VAR_REV(var, rev) \
+ (((var) << MIDR_VARIANT_SHIFT) | (rev))
+
#define MIDR_CPU_MODEL_MASK (MIDR_IMPLEMENTOR_MASK | MIDR_PARTNUM_MASK | \
MIDR_ARCHITECTURE_MASK)
@@ -74,6 +77,7 @@
#define ARM_CPU_PART_AEM_V8 0xD0F
#define ARM_CPU_PART_FOUNDATION 0xD00
+#define ARM_CPU_PART_CORTEX_A55 0xD05
#define ARM_CPU_PART_CORTEX_A57 0xD07
#define ARM_CPU_PART_CORTEX_A72 0xD08
#define ARM_CPU_PART_CORTEX_A53 0xD03
@@ -89,6 +93,7 @@
#define BRCM_CPU_PART_VULCAN 0x516
#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
+#define MIDR_CORTEX_A55 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A55)
#define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
#define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72)
#define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73)
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 619da1c..66cce21 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -425,6 +425,11 @@ ENTRY(__cpu_setup)
cbz x9, 2f
cmp x9, #2
b.lt 1f
+#ifdef CONFIG_ARM64_ERRATUM_1024718
+ /* Disable hardware DBM on Cortex-A55 r0p0, r0p1 & r1p0 */
+ cpu_midr_match MIDR_CORTEX_A55, MIDR_CPU_VAR_REV(0, 0), MIDR_CPU_VAR_REV(1, 0), x1, x2, x3, x4
+ cbnz x1, 1f
+#endif
orr x10, x10, #TCR_HD // hardware Dirty flag update
1: orr x10, x10, #TCR_HA // hardware Access flag update
2:
--
2.7.4
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d219554d9bf59875b4e571a0392d620e8954879 Mon Sep 17 00:00:00 2001
From: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Date: Wed, 2 May 2018 10:52:55 -0700
Subject: [PATCH] drm/i915: Adjust eDP's logical vco in a reliable place.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In intel_dp_compute_config() we were calculating the vco needed
for eDP on gen9 and stashing it in
intel_atomic_state.cdclk.logical.vco.
However, a few moments later, in intel_modeset_checks(), we fully
replace the entire intel_atomic_state.cdclk.logical with
dev_priv->cdclk.logical, overwriting the desired logical vco
for eDP on gen9.
So, with the wrong VCO value we end up with the wrong desired cdclk,
and it also raises a lot of WARNs: on gen9, when we read
CDCLK_CTL to verify that we configured the desired frequency
properly, the CD Frequency Select bits [27:26] == 10b can mean
337.5 or 308.57 MHz depending on the VCO. So if we have the wrong
VCO value stashed, we will believe the frequency selection didn't
stick and start raising WARNs about a cdclk mismatch.
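For reference (derived from the numbers above, not from this patch):
337500 kHz is 8100000 kHz / 24 while 308571 kHz is roughly
8640000 kHz / 28, so the same CDCLK_CTL frequency-select encoding
decodes to different cdclk values depending on which VCO DPLL0 runs at.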
[ 42.857519] [drm:intel_dump_cdclk_state [i915]] Changing CDCLK to 308571 kHz, VCO 8640000 kHz, ref 24000 kHz, bypass 24000 kHz, voltage level 0
[ 42.897269] cdclk state doesn't match!
[ 42.901052] WARNING: CPU: 5 PID: 1116 at drivers/gpu/drm/i915/intel_cdclk.c:2084 intel_set_cdclk+0x5d/0x110 [i915]
[ 42.938004] RIP: 0010:intel_set_cdclk+0x5d/0x110 [i915]
[ 43.155253] WARNING: CPU: 5 PID: 1116 at drivers/gpu/drm/i915/intel_cdclk.c:2084 intel_set_cdclk+0x5d/0x110 [i915]
[ 43.170277] [drm:intel_dump_cdclk_state [i915]] [hw state] 337500 kHz, VCO 8100000 kHz, ref 24000 kHz, bypass 24000 kHz, voltage level 0
[ 43.182566] [drm:intel_dump_cdclk_state [i915]] [sw state] 308571 kHz, VCO 8640000 kHz, ref 24000 kHz, bypass 24000 kHz, voltage level 0
v2: Move the entire eDP's vco logical adjustment to inside
the skl_modeset_calc_cdclk as suggested by Ville.
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Fixes: bb0f4aab0e76 ("drm/i915: Track full cdclk state for the logical and actual cdclk frequencies")
Cc: <stable(a)vger.kernel.org> # v4.12+
Link: https://patchwork.freedesktop.org/patch/msgid/20180502175255.5344-1-rodrigo…
(cherry picked from commit 3297234a05ab1e90091b0574db4c397ef0e90d5f)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/intel_cdclk.c b/drivers/gpu/drm/i915/intel_cdclk.c
index 32d24c69da3c..704ddb4d3ca7 100644
--- a/drivers/gpu/drm/i915/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -2302,9 +2302,44 @@ static int bdw_modeset_calc_cdclk(struct drm_atomic_state *state)
return 0;
}
+static int skl_dpll0_vco(struct intel_atomic_state *intel_state)
+{
+ struct drm_i915_private *dev_priv = to_i915(intel_state->base.dev);
+ struct intel_crtc *crtc;
+ struct intel_crtc_state *crtc_state;
+ int vco, i;
+
+ vco = intel_state->cdclk.logical.vco;
+ if (!vco)
+ vco = dev_priv->skl_preferred_vco_freq;
+
+ for_each_new_intel_crtc_in_state(intel_state, crtc, crtc_state, i) {
+ if (!crtc_state->base.enable)
+ continue;
+
+ if (!intel_crtc_has_type(crtc_state, INTEL_OUTPUT_EDP))
+ continue;
+
+ /*
+ * DPLL0 VCO may need to be adjusted to get the correct
+ * clock for eDP. This will affect cdclk as well.
+ */
+ switch (crtc_state->port_clock / 2) {
+ case 108000:
+ case 216000:
+ vco = 8640000;
+ break;
+ default:
+ vco = 8100000;
+ break;
+ }
+ }
+
+ return vco;
+}
+
static int skl_modeset_calc_cdclk(struct drm_atomic_state *state)
{
- struct drm_i915_private *dev_priv = to_i915(state->dev);
struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
int min_cdclk, cdclk, vco;
@@ -2312,9 +2347,7 @@ static int skl_modeset_calc_cdclk(struct drm_atomic_state *state)
if (min_cdclk < 0)
return min_cdclk;
- vco = intel_state->cdclk.logical.vco;
- if (!vco)
- vco = dev_priv->skl_preferred_vco_freq;
+ vco = skl_dpll0_vco(intel_state);
/*
* FIXME should also account for plane ratio
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 9a4a51e79fa1..b7b4cfdeb974 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -1881,26 +1881,6 @@ intel_dp_compute_config(struct intel_encoder *encoder,
reduce_m_n);
}
- /*
- * DPLL0 VCO may need to be adjusted to get the correct
- * clock for eDP. This will affect cdclk as well.
- */
- if (intel_dp_is_edp(intel_dp) && IS_GEN9_BC(dev_priv)) {
- int vco;
-
- switch (pipe_config->port_clock / 2) {
- case 108000:
- case 216000:
- vco = 8640000;
- break;
- default:
- vco = 8100000;
- break;
- }
-
- to_intel_atomic_state(pipe_config->base.state)->cdclk.logical.vco = vco;
- }
-
if (!HAS_DDI(dev_priv))
intel_dp_set_clock(encoder, pipe_config);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d79f7aa496fc94d763f67b833a1f36f4c171176f Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro(a)fb.com>
Date: Tue, 10 Apr 2018 16:27:47 -0700
Subject: [PATCH] mm: treat indirectly reclaimable memory as free in overcommit
logic
Indirectly reclaimable memory can consume a significant part of total
memory and it's actually reclaimable (it will be released under actual
memory pressure).
So the overcommit logic should treat it as free.
Otherwise, it's possible to cause random system-wide memory allocation
failures by tying up a significant amount of memory in indirectly
reclaimable memory, e.g. dentry external names.
If the GUESS overcommit policy is used, this might be exploited for a
denial-of-service attack under some conditions.
The following program illustrates the approach. It causes the kernel to
allocate an unreclaimable kmalloc-256 chunk for each stat() call, so
that at some point the overcommit logic may start blocking large
allocations system-wide.
	int main()
	{
		char buf[256];
		unsigned long i;
		struct stat statbuf;

		buf[0] = '/';
		for (i = 1; i < sizeof(buf); i++)
			buf[i] = '_';
		for (i = 0; 1; i++) {
			sprintf(&buf[248], "%8lu", i);
			stat(buf, &statbuf);
		}

		return 0;
	}
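For a rough sense of scale (back-of-the-envelope, not from the patch):
each failed stat() on such a long name pins one kmalloc-256 object, so on
the order of four million calls tie up about 1 GiB of indirectly
reclaimable memory from the external names alone.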
This patch in combination with related indirectly reclaimable memory
patches closes this issue.
Link: http://lkml.kernel.org/r/20180313130041.8078-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/util.c b/mm/util.c
index 029fc2f3b395..73676f0f1b43 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -667,6 +667,13 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
*/
free += global_node_page_state(NR_SLAB_RECLAIMABLE);
+ /*
+ * Part of the kernel memory, which can be released
+ * under memory pressure.
+ */
+ free += global_node_page_state(
+ NR_INDIRECTLY_RECLAIMABLE_BYTES) >> PAGE_SHIFT;
+
/*
* Leave reserved pages. The pages are not for anonymous pages.
*/
AER handling expects that a successful return from slot_reset means the
driver made the device functional again. The nvme driver had been using
an asynchronous reset to recover the device, so the device
may still be initializing after control is returned to the
AER handler. This creates problems for subsequent event handling,
causing the initialization to fail.
This patch fixes that by syncing the controller reset before returning
to the AER driver, and by reporting the true state of the reset.
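For context, a minimal conceptual sketch of the difference (the helper
name below is made up for illustration; the driver actually calls
nvme_reset_ctrl_sync()): the synchronous variant waits for the queued
reset work to finish, so the state check added to nvme_slot_reset()
observes the final controller state.

	/* Conceptual sketch only, not the exact nvme core implementation. */
	static int nvme_reset_ctrl_sync_sketch(struct nvme_ctrl *ctrl)
	{
		int ret = nvme_reset_ctrl(ctrl);	/* queue ctrl->reset_work */

		if (!ret)
			flush_work(&ctrl->reset_work);	/* wait for reset to finish */
		return ret;
	}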
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199657
Reported-by: Alex Gagniuc <mr.nuke.me(a)gmail.com>
Cc: Sinan Kaya <okaya(a)codeaurora.org>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Keith Busch <keith.busch(a)intel.com>
---
drivers/nvme/host/pci.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b542dce45927..2e221796257a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2681,8 +2681,15 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
dev_info(dev->ctrl.device, "restart after slot reset\n");
pci_restore_state(pdev);
- nvme_reset_ctrl(&dev->ctrl);
- return PCI_ERS_RESULT_RECOVERED;
+ nvme_reset_ctrl_sync(&dev->ctrl);
+
+ switch (dev->ctrl.state) {
+ case NVME_CTRL_LIVE:
+ case NVME_CTRL_ADMIN_ONLY:
+ return PCI_ERS_RESULT_RECOVERED;
+ default:
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
}
static void nvme_error_resume(struct pci_dev *pdev)
--
2.14.3
From: Stephen Hemminger <stephen(a)networkplumber.org>
In the 4.9 kernel, the sysfs files for Hyper-V VMBus changed names but
the documentation files were not updated. The current sysfs file
names are /sys/bus/vmbus/devices/<UUID>/...
See commit 9a56e5d6a0ba ("Drivers: hv: make VMBus bus ids persistent")
and commit f6b2db084b65 ("vmbus: make sysfs names consistent with PCI")
Reported-by: Michael Kelley <mikelley(a)microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin(a)microsoft.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: K. Y. Srinivasan <kys(a)microsoft.com>
---
Documentation/ABI/stable/sysfs-bus-vmbus | 40 ++++++++++++++++----------------
1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus
index 0c9d9dcd2151..3eaffbb2d468 100644
--- a/Documentation/ABI/stable/sysfs-bus-vmbus
+++ b/Documentation/ABI/stable/sysfs-bus-vmbus
@@ -1,25 +1,25 @@
-What: /sys/bus/vmbus/devices/vmbus_*/id
+What: /sys/bus/vmbus/devices/<UUID>/id
Date: Jul 2009
KernelVersion: 2.6.31
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
Description: The VMBus child_relid of the device's primary channel
Users: tools/hv/lsvmbus
-What: /sys/bus/vmbus/devices/vmbus_*/class_id
+What: /sys/bus/vmbus/devices/<UUID>/class_id
Date: Jul 2009
KernelVersion: 2.6.31
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
Description: The VMBus interface type GUID of the device
Users: tools/hv/lsvmbus
-What: /sys/bus/vmbus/devices/vmbus_*/device_id
+What: /sys/bus/vmbus/devices/<UUID>/device_id
Date: Jul 2009
KernelVersion: 2.6.31
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
Description: The VMBus interface instance GUID of the device
Users: tools/hv/lsvmbus
-What: /sys/bus/vmbus/devices/vmbus_*/channel_vp_mapping
+What: /sys/bus/vmbus/devices/<UUID>/channel_vp_mapping
Date: Jul 2015
KernelVersion: 4.2.0
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
@@ -28,112 +28,112 @@ Description: The mapping of which primary/sub channels are bound to which
Format: <channel's child_relid:the bound cpu's number>
Users: tools/hv/lsvmbus
-What: /sys/bus/vmbus/devices/vmbus_*/device
+What: /sys/bus/vmbus/devices/<UUID>/device
Date: Dec. 2015
KernelVersion: 4.5
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
Description: The 16 bit device ID of the device
Users: tools/hv/lsvmbus and user level RDMA libraries
-What: /sys/bus/vmbus/devices/vmbus_*/vendor
+What: /sys/bus/vmbus/devices/<UUID>/vendor
Date: Dec. 2015
KernelVersion: 4.5
Contact: K. Y. Srinivasan <kys(a)microsoft.com>
Description: The 16 bit vendor ID of the device
Users: tools/hv/lsvmbus and user level RDMA libraries
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Directory for per-channel information
NN is the VMBUS relid associtated with the channel.
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/cpu
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/cpu
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: VCPU (sub)channel is affinitized to
Users: tools/hv/lsvmbus and other debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/cpu
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/cpu
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: VCPU (sub)channel is affinitized to
Users: tools/hv/lsvmbus and other debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/in_mask
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/in_mask
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Host to guest channel interrupt mask
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/latency
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/latency
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Channel signaling latency
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/out_mask
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/out_mask
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Guest to host channel interrupt mask
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/pending
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/pending
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Channel interrupt pending state
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/read_avail
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/read_avail
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Bytes available to read
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/write_avail
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/write_avail
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Bytes available to write
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/events
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/events
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Number of times we have signaled the host
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/interrupts
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/interrupts
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Number of times we have taken an interrupt (incoming)
Users: Debugging tools
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/subchannel_id
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/subchannel_id
Date: January. 2018
KernelVersion: 4.16
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Subchannel ID associated with VMBUS channel
Users: Debugging tools and userspace drivers
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/monitor_id
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/monitor_id
Date: January. 2018
KernelVersion: 4.16
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
Description: Monitor bit associated with channel
Users: Debugging tools and userspace drivers
-What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/ring
+What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/ring
Date: January. 2018
KernelVersion: 4.16
Contact: Stephen Hemminger <sthemmin(a)microsoft.com>
--
2.15.1
On Linux kernel 4.16.8 (using Arch Linux), if I eject a USB 3.0 device
from my system (unplug it or run "udisksctl power-off --block-device /dev/sde"),
it freezes instantly with a null pointer error in XHCI. I've been unable
to successfully capture a kernel log of the error (and my cameras did
not return usable results), and the screen begins scrolling
near-immediately with hung processors in a VT. I have done a git
bisection (between 4.16.7 and 4.16.8) narrowing it down to the following
commit:
commit f5331826b0b7a5f2db56a9020ddbb8ce16acdfc0
Author: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Date: Thu May 3 17:30:07 2018 +0300
xhci: Fix use-after-free in xhci_free_virt_device
commit 44a182b9d17765514fa2b1cc911e4e65134eef93 upstream.
KASAN found a use-after-free in xhci_free_virt_device+0x33b/0x38e
where xhci_free_virt_device() sets slot id to 0 if udev exists:
if (dev->udev && dev->udev->slot_id)
dev->udev->slot_id = 0;
dev->udev will be true even if udev is freed because dev->udev is
not set to NULL.
set dev->udev pointer to NULL in xhci_free_dev()
The original patch went to stable so this fix needs to be applied
there as well.
Fixes: a400efe455f7 ("xhci: zero usb device slot_id member when
disabling and freeing a xhci slot")
Cc: <stable(a)vger.kernel.org>
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
:040000 040000 356ecdc13bf252535bd51b8558a8173dae546f19 d28f201228652e87e51ba48f5a0ad4539cca5c29 M	drivers
Can I be CC'd on future responses to this? I am not subscribed to the list.
Jacob Saunders
From: David Rientjes <rientjes(a)google.com>
Subject: mm, oom: fix concurrent munlock and oom reaper unmap, v3
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.
This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma. Since munlock_vma_pages_range() depends on
clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.
This is especially noticeable on architectures such as powerpc, where
clearing a huge pmd requires serialize_against_pte_lookup(). If the pmd
is zapped by the oom reaper during follow_page_mask() after the check for
pmd_none() is bypassed, this ends up dereferencing a NULL ptl or causing
a kernel oops.
Fix this by manually freeing all possible memory from the mm before doing
the munlock and then setting MMF_OOM_SKIP. The oom reaper can not run on
the mm anymore so the munlock is safe to do in exit_mmap(). It also
matches the logic that the oom reaper currently uses for determining when
to set MMF_OOM_SKIP itself, so there's no new risk of excessive oom
killing.
This fixes CVE-2018-1000200.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.cor…
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes(a)google.com>
Suggested-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/oom.h | 2 +
mm/mmap.c | 44 +++++++++++++---------
mm/oom_kill.c | 81 ++++++++++++++++++++++--------------------
3 files changed, 71 insertions(+), 56 deletions(-)
diff -puN include/linux/oom.h~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap include/linux/oom.h
--- a/include/linux/oom.h~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/include/linux/oom.h
@@ -95,6 +95,8 @@ static inline int check_stable_address_s
return 0;
}
+void __oom_reap_task_mm(struct mm_struct *mm);
+
extern unsigned long oom_badness(struct task_struct *p,
struct mem_cgroup *memcg, const nodemask_t *nodemask,
unsigned long totalpages);
diff -puN mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/mmap.c
--- a/mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/mmap.c
@@ -3024,6 +3024,32 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
+ if (unlikely(mm_is_oom_victim(mm))) {
+ /*
+ * Manually reap the mm to free as much memory as possible.
+ * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
+ * this mm from further consideration. Taking mm->mmap_sem for
+ * write after setting MMF_OOM_SKIP will guarantee that the oom
+ * reaper will not run on this mm again after mmap_sem is
+ * dropped.
+ *
+ * Nothing can be holding mm->mmap_sem here and the above call
+ * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
+ * __oom_reap_task_mm() will not block.
+ *
+ * This needs to be done before calling munlock_vma_pages_all(),
+ * which clears VM_LOCKED, otherwise the oom reaper cannot
+ * reliably test it.
+ */
+ mutex_lock(&oom_lock);
+ __oom_reap_task_mm(mm);
+ mutex_unlock(&oom_lock);
+
+ set_bit(MMF_OOM_SKIP, &mm->flags);
+ down_write(&mm->mmap_sem);
+ up_write(&mm->mmap_sem);
+ }
+
if (mm->locked_vm) {
vma = mm->mmap;
while (vma) {
@@ -3045,24 +3071,6 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
-
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Wait for oom_reap_task() to stop working on this
- * mm. Because MMF_OOM_SKIP is already set before
- * calling down_read(), oom_reap_task() will not run
- * on this "mm" post up_write().
- *
- * mm_is_oom_victim() cannot be set from under us
- * either because victim->mm is already set to NULL
- * under task_lock before calling mmput and oom_mm is
- * set not NULL by the OOM killer only if victim->mm
- * is found not NULL while holding the task_lock.
- */
- set_bit(MMF_OOM_SKIP, &mm->flags);
- down_write(&mm->mmap_sem);
- up_write(&mm->mmap_sem);
- }
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
diff -puN mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/oom_kill.c
@@ -469,7 +469,6 @@ bool process_shares_mm(struct task_struc
return false;
}
-
#ifdef CONFIG_MMU
/*
* OOM Reaper kernel thread which tries to reap the memory used by the OOM
@@ -480,16 +479,54 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reape
static struct task_struct *oom_reaper_list;
static DEFINE_SPINLOCK(oom_reaper_lock);
-static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+void __oom_reap_task_mm(struct mm_struct *mm)
{
- struct mmu_gather tlb;
struct vm_area_struct *vma;
+
+ /*
+ * Tell all users of get_user/copy_from_user etc... that the content
+ * is no longer stable. No barriers really needed because unmapping
+ * should imply barriers already and the reader would hit a page fault
+ * if it stumbled over a reaped memory.
+ */
+ set_bit(MMF_UNSTABLE, &mm->flags);
+
+ for (vma = mm->mmap ; vma; vma = vma->vm_next) {
+ if (!can_madv_dontneed_vma(vma))
+ continue;
+
+ /*
+ * Only anonymous pages have a good chance to be dropped
+ * without additional steps which we cannot afford as we
+ * are OOM already.
+ *
+ * We do not even care about fs backed pages because all
+ * which are reclaimable have already been reclaimed and
+ * we do not want to block exit_mmap by keeping mm ref
+ * count elevated without a good reason.
+ */
+ if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+ const unsigned long start = vma->vm_start;
+ const unsigned long end = vma->vm_end;
+ struct mmu_gather tlb;
+
+ tlb_gather_mmu(&tlb, mm, start, end);
+ mmu_notifier_invalidate_range_start(mm, start, end);
+ unmap_page_range(&tlb, vma, start, end, NULL);
+ mmu_notifier_invalidate_range_end(mm, start, end);
+ tlb_finish_mmu(&tlb, start, end);
+ }
+ }
+}
+
+static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+{
bool ret = true;
/*
* We have to make sure to not race with the victim exit path
* and cause premature new oom victim selection:
- * __oom_reap_task_mm exit_mm
+ * oom_reap_task_mm exit_mm
* mmget_not_zero
* mmput
* atomic_dec_and_test
@@ -534,39 +571,8 @@ static bool __oom_reap_task_mm(struct ta
trace_start_task_reaping(tsk->pid);
- /*
- * Tell all users of get_user/copy_from_user etc... that the content
- * is no longer stable. No barriers really needed because unmapping
- * should imply barriers already and the reader would hit a page fault
- * if it stumbled over a reaped memory.
- */
- set_bit(MMF_UNSTABLE, &mm->flags);
-
- for (vma = mm->mmap ; vma; vma = vma->vm_next) {
- if (!can_madv_dontneed_vma(vma))
- continue;
+ __oom_reap_task_mm(mm);
- /*
- * Only anonymous pages have a good chance to be dropped
- * without additional steps which we cannot afford as we
- * are OOM already.
- *
- * We do not even care about fs backed pages because all
- * which are reclaimable have already been reclaimed and
- * we do not want to block exit_mmap by keeping mm ref
- * count elevated without a good reason.
- */
- if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
- const unsigned long start = vma->vm_start;
- const unsigned long end = vma->vm_end;
-
- tlb_gather_mmu(&tlb, mm, start, end);
- mmu_notifier_invalidate_range_start(mm, start, end);
- unmap_page_range(&tlb, vma, start, end, NULL);
- mmu_notifier_invalidate_range_end(mm, start, end);
- tlb_finish_mmu(&tlb, start, end);
- }
- }
pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
task_pid_nr(tsk), tsk->comm,
K(get_mm_counter(mm, MM_ANONPAGES)),
@@ -587,14 +593,13 @@ static void oom_reap_task(struct task_st
struct mm_struct *mm = tsk->signal->oom_mm;
/* Retry the down_read_trylock(mmap_sem) a few times */
- while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+ while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm))
schedule_timeout_idle(HZ/10);
if (attempts <= MAX_OOM_REAP_RETRIES ||
test_bit(MMF_OOM_SKIP, &mm->flags))
goto done;
-
pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
task_pid_nr(tsk), tsk->comm);
debug_show_all_locks();
_
From: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Subject: mm: sections are not offlined during memory hotremove
Memory hotplug and hotremove operate with per-block granularity. If the
machine has a large amount of memory (more than 64G), the size of a memory
block can span multiple sections. By mistake, during hotremove we set
only the first section to the offline state.
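For illustration (typical x86_64 numbers; both values are
configuration-dependent): with 128MB sections and a 2GB memory block size
on machines with more than 64GB of RAM, a single block spans 16 sections.
The loop below recomputed the section number from start_pfn on every
iteration, so only the first of those 16 sections was marked offline.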
The bug was discovered because a kernel selftest started to fail:
https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop
It started failing after commit "mm/memory_hotplug: optimize probe routine",
but the bug is older than that commit. In that optimization we also added
a check for sections to be in a proper state during hotplug operations.
Link: http://lkml.kernel.org/r/20180427145257.15222-1-pasha.tatashin@oracle.com
Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Steven Sistare <steven.sistare(a)oracle.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/sparse.c~mm-sections-are-not-offlined-during-memory-hotremove mm/sparse.c
--- a/mm/sparse.c~mm-sections-are-not-offlined-during-memory-hotremove
+++ a/mm/sparse.c
@@ -629,7 +629,7 @@ void offline_mem_sections(unsigned long
unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
- unsigned long section_nr = pfn_to_section_nr(start_pfn);
+ unsigned long section_nr = pfn_to_section_nr(pfn);
struct mem_section *ms;
/*
_
Quoting Lionel Landwerlin (2018-05-11 18:41:28)
> On 11/05/18 16:51, Chris Wilson wrote:
>
> But I can't even start up gdm on that machine with drm-tip. So maybe
> there is something much more broken...
>
> Don't leave us in suspense...
>
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890614
>
>
> Not our bug :)
You would not believe how relieved I am that someone else has bugs in
their code.
-Chris
Linus,
While working on some new updates to trace filtering, I noticed that the
regex_match_front() test was updated to be limited to the size
of the pattern instead of the full test string. But as the test string
is not guaranteed to be nul-terminated, it still needs to consider
the size of the test string.
Please pull the latest trace-v4.17-rc4 tree, which can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
trace-v4.17-rc4
Tag SHA1: cb82e07247fa5f1a91940a2928fb260256727a14
Head SHA1: dc432c3d7f9bceb3de6f5b44fb9c657c9810ed6d
Steven Rostedt (VMware) (1):
tracing: Fix regex_match_front() to not over compare the test string
----
kernel/trace/trace_events_filter.c | 3 +++
1 file changed, 3 insertions(+)
---------------------------
commit dc432c3d7f9bceb3de6f5b44fb9c657c9810ed6d
Author: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Date: Wed May 9 11:59:32 2018 -0400
tracing: Fix regex_match_front() to not over compare the test string
The regex match function regex_match_front() in the tracing filter logic
was changed to test against just the pattern length instead of the entire
test string. That is, it went from strncmp(str, r->pattern, len) to
strncmp(str, r->pattern, r->len).
The issue is that str is not guaranteed to be nul terminated, and if r->len
is greater than the length of str, it can access more memory than is
allocated.
The solution is to add a simple test: if (len < r->len) return 0.
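For example (hypothetical sizes, not from the original report): if the
event field behind str holds the four bytes "abcd" with no terminating nul
and the filter pattern is "abcdef" (r->len == 6), the old code's strncmp()
would match the first four bytes and then read past the end of the field
looking for the fifth.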
Cc: stable(a)vger.kernel.org
Fixes: 285caad415f45 ("tracing/filters: Fix MATCH_FRONT_ONLY filter matching")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 1f951b3df60c..7d306b74230f 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -762,6 +762,9 @@ static int regex_match_full(char *str, struct regex *r, int len)
static int regex_match_front(char *str, struct regex *r, int len)
{
+ if (len < r->len)
+ return 0;
+
if (strncmp(str, r->pattern, r->len) == 0)
return 1;
return 0;
The patch below does not apply to the 4.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e538409257d0217a9bc715686100a5328db75a15 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Date: Wed, 4 Apr 2018 22:38:49 +0200
Subject: [PATCH] test_firmware: fix setting old custom fw path back on exit,
second try
Commit 65c79230576 tried to clear the custom firmware path on exit by
writing a single space to the firmware_class.path parameter. This
doesn't work because nothing strips this space from the value stored
and fw_get_filesystem_firmware() only ignores zero-length paths.
Instead, write a null byte.
Fixes: 0a8adf58475 ("test: add firmware_class loader test")
Fixes: 65c79230576 ("test_firmware: fix setting old custom fw path back on exit")
Signed-off-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Acked-by: Luis R. Rodriguez <mcgrof(a)kernel.org>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/tools/testing/selftests/firmware/fw_lib.sh b/tools/testing/selftests/firmware/fw_lib.sh
index 9ea31b57d71a..962d7f4ac627 100755
--- a/tools/testing/selftests/firmware/fw_lib.sh
+++ b/tools/testing/selftests/firmware/fw_lib.sh
@@ -154,11 +154,13 @@ test_finish()
if [ "$HAS_FW_LOADER_USER_HELPER" = "yes" ]; then
echo "$OLD_TIMEOUT" >/sys/class/firmware/timeout
fi
- if [ "$OLD_FWPATH" = "" ]; then
- OLD_FWPATH=" "
- fi
if [ "$TEST_REQS_FW_SET_CUSTOM_PATH" = "yes" ]; then
- echo -n "$OLD_FWPATH" >/sys/module/firmware_class/parameters/path
+ if [ "$OLD_FWPATH" = "" ]; then
+ # A zero-length write won't work; write a null byte
+ printf '\000' >/sys/module/firmware_class/parameters/path
+ else
+ echo -n "$OLD_FWPATH" >/sys/module/firmware_class/parameters/path
+ fi
fi
if [ -f $FW ]; then
rm -f "$FW"
kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either holding the kvm->slots_lock or being inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.
Use the newly introduced wrapper for that.
Cc: Stable <stable(a)vger.kernel.org> # 4.12+
Reported-by: Jan Glauber <jan.glauber(a)caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara(a)arm.com>
---
virt/kvm/arm/vgic/vgic-its.c | 4 ++--
virt/kvm/arm/vgic/vgic-v3.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 7cb060e01a76..4ed79c939fb4 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -1897,7 +1897,7 @@ static int scan_its_table(struct vgic_its *its, gpa_t base, int size, int esz,
int next_offset;
size_t byte_offset;
- ret = kvm_read_guest(kvm, gpa, entry, esz);
+ ret = kvm_read_guest_lock(kvm, gpa, entry, esz);
if (ret)
return ret;
@@ -2267,7 +2267,7 @@ static int vgic_its_restore_cte(struct vgic_its *its, gpa_t gpa, int esz)
int ret;
BUG_ON(esz > sizeof(val));
- ret = kvm_read_guest(kvm, gpa, &val, esz);
+ ret = kvm_read_guest_lock(kvm, gpa, &val, esz);
if (ret)
return ret;
val = le64_to_cpu(val);
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index c7423f3768e5..bdcf8e7a6161 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -344,7 +344,7 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
bit_nr = irq->intid % BITS_PER_BYTE;
ptr = pendbase + byte_offset;
- ret = kvm_read_guest(kvm, ptr, &val, 1);
+ ret = kvm_read_guest_lock(kvm, ptr, &val, 1);
if (ret)
return ret;
@@ -397,7 +397,7 @@ int vgic_v3_save_pending_tables(struct kvm *kvm)
ptr = pendbase + byte_offset;
if (byte_offset != last_byte_offset) {
- ret = kvm_read_guest(kvm, ptr, &val, 1);
+ ret = kvm_read_guest_lock(kvm, ptr, &val, 1);
if (ret)
return ret;
last_byte_offset = byte_offset;
--
2.14.1
kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either holding the kvm->slots_lock or being inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.
Provide a wrapper which does that and use it everywhere.
Note that ending the SRCU critical section before returning from the
kvm_read_guest() wrapper is safe, because the data has been *copied*, so
we don't need to rely on valid references to the memslot anymore.
Cc: Stable <stable(a)vger.kernel.org> # 4.8+
Reported-by: Jan Glauber <jan.glauber(a)caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara(a)arm.com>
---
arch/arm/include/asm/kvm_mmu.h | 16 ++++++++++++++++
arch/arm64/include/asm/kvm_mmu.h | 16 ++++++++++++++++
virt/kvm/arm/vgic/vgic-its.c | 15 ++++++++-------
3 files changed, 40 insertions(+), 7 deletions(-)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 707a1f06dc5d..f675162663f0 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -309,6 +309,22 @@ static inline unsigned int kvm_get_vmid_bits(void)
return 8;
}
+/*
+ * We are not in the kvm->srcu critical section most of the time, so we take
+ * the SRCU read lock here. Since we copy the data from the user page, we
+ * can immediately drop the lock again.
+ */
+static inline int kvm_read_guest_lock(struct kvm *kvm,
+ gpa_t gpa, void *data, unsigned long len)
+{
+ int srcu_idx = srcu_read_lock(&kvm->srcu);
+ int ret = kvm_read_guest(kvm, gpa, data, len);
+
+ srcu_read_unlock(&kvm->srcu, srcu_idx);
+
+ return ret;
+}
+
static inline void *kvm_get_hyp_vector(void)
{
return kvm_ksym_ref(__kvm_hyp_vector);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 082110993647..6128992c2ded 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -360,6 +360,22 @@ static inline unsigned int kvm_get_vmid_bits(void)
return (cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR1_VMIDBITS_SHIFT) == 2) ? 16 : 8;
}
+/*
+ * We are not in the kvm->srcu critical section most of the time, so we take
+ * the SRCU read lock here. Since we copy the data from the user page, we
+ * can immediately drop the lock again.
+ */
+static inline int kvm_read_guest_lock(struct kvm *kvm,
+ gpa_t gpa, void *data, unsigned long len)
+{
+ int srcu_idx = srcu_read_lock(&kvm->srcu);
+ int ret = kvm_read_guest(kvm, gpa, data, len);
+
+ srcu_read_unlock(&kvm->srcu, srcu_idx);
+
+ return ret;
+}
+
#ifdef CONFIG_KVM_INDIRECT_VECTORS
/*
* EL2 vectors can be mapped and rerouted in a number of ways,
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 51a80b600632..7cb060e01a76 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -281,8 +281,8 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
int ret;
unsigned long flags;
- ret = kvm_read_guest(kvm, propbase + irq->intid - GIC_LPI_OFFSET,
- &prop, 1);
+ ret = kvm_read_guest_lock(kvm, propbase + irq->intid - GIC_LPI_OFFSET,
+ &prop, 1);
if (ret)
return ret;
@@ -444,8 +444,9 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
* this very same byte in the last iteration. Reuse that.
*/
if (byte_offset != last_byte_offset) {
- ret = kvm_read_guest(vcpu->kvm, pendbase + byte_offset,
- &pendmask, 1);
+ ret = kvm_read_guest_lock(vcpu->kvm,
+ pendbase + byte_offset,
+ &pendmask, 1);
if (ret) {
kfree(intids);
return ret;
@@ -789,7 +790,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
return false;
/* Each 1st level entry is represented by a 64-bit value. */
- if (kvm_read_guest(its->dev->kvm,
+ if (kvm_read_guest_lock(its->dev->kvm,
BASER_ADDRESS(baser) + index * sizeof(indirect_ptr),
&indirect_ptr, sizeof(indirect_ptr)))
return false;
@@ -1370,8 +1371,8 @@ static void vgic_its_process_commands(struct kvm *kvm, struct vgic_its *its)
cbaser = CBASER_ADDRESS(its->cbaser);
while (its->cwriter != its->creadr) {
- int ret = kvm_read_guest(kvm, cbaser + its->creadr,
- cmd_buf, ITS_CMD_SIZE);
+ int ret = kvm_read_guest_lock(kvm, cbaser + its->creadr,
+ cmd_buf, ITS_CMD_SIZE);
/*
* If kvm_read_guest() fails, this could be due to the guest
* programming a bogus value in CBASER or something else going
--
2.14.1
Apparently the development of update_affinity() overlapped with the
promotion of irq_lock to be _irqsave, so that patch missed converting
this lock. This makes lockdep complain.
Fix this by disabling IRQs around the lock.
Cc: stable(a)vger.kernel.org
Fixes: 08c9fd042117 ("KVM: arm/arm64: vITS: Add a helper to update the affinity of an LPI")
Reported-by: Jan Glauber <jan.glauber(a)caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara(a)arm.com>
---
virt/kvm/arm/vgic/vgic-its.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 41abf92f2699..51a80b600632 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -350,10 +350,11 @@ static int vgic_copy_lpi_list(struct kvm_vcpu *vcpu, u32 **intid_ptr)
static int update_affinity(struct vgic_irq *irq, struct kvm_vcpu *vcpu)
{
int ret = 0;
+ unsigned long flags;
- spin_lock(&irq->irq_lock);
+ spin_lock_irqsave(&irq->irq_lock, flags);
irq->target_vcpu = vcpu;
- spin_unlock(&irq->irq_lock);
+ spin_unlock_irqrestore(&irq->irq_lock, flags);
if (irq->hw) {
struct its_vlpi_map map;
--
2.14.1
As Jan reported [1], lockdep complains about the VGIC not being bulletproof.
This seems to be due to two issues:
- When commit 006df0f34930 ("KVM: arm/arm64: Support calling
  vgic_update_irq_pending from irq context") promoted irq_lock and
  ap_list_lock to _irqsave, we forgot two instances of irq_lock.
  lockdep seems to pick those up.
- If a lock is taken with _irqsave, any other lock we take inside it must
  be IRQ-safe as well, so the lpi_list_lock needs to be promoted too
  (see the sketch below).
This fixes both issues by simply making the remaining instances of those
locks _irqsave.
One irq_lock is addressed in a separate patch, to simplify backporting.
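For illustration only (this sketch is not from the patch; the function
name is made up), the rule in the second point looks like this in practice:

#include <linux/spinlock.h>

/* Illustrative sketch: a lock that is nested inside an _irqsave region in
 * one code path must be taken with IRQs disabled in every path, otherwise
 * lockdep reports a possible interrupt-driven deadlock. */
static void walk_lpi_list_example(spinlock_t *lpi_list_lock)
{
	unsigned long flags;

	spin_lock_irqsave(lpi_list_lock, flags);	/* was: spin_lock() */
	/* ... walk the LPI list ... */
	spin_unlock_irqrestore(lpi_list_lock, flags);
}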
Cc: stable(a)vger.kernel.org
Fixes: 006df0f34930 ("KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context")
Reported-by: Jan Glauber <jan.glauber(a)caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara(a)arm.com>
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/575718.html
---
virt/kvm/arm/vgic/vgic-debug.c | 5 +++--
virt/kvm/arm/vgic/vgic-its.c | 10 ++++++----
virt/kvm/arm/vgic/vgic.c | 22 ++++++++++++++--------
3 files changed, 23 insertions(+), 14 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic-debug.c b/virt/kvm/arm/vgic/vgic-debug.c
index 10b38178cff2..4ffc0b5e6105 100644
--- a/virt/kvm/arm/vgic/vgic-debug.c
+++ b/virt/kvm/arm/vgic/vgic-debug.c
@@ -211,6 +211,7 @@ static int vgic_debug_show(struct seq_file *s, void *v)
struct vgic_state_iter *iter = (struct vgic_state_iter *)v;
struct vgic_irq *irq;
struct kvm_vcpu *vcpu = NULL;
+ unsigned long flags;
if (iter->dist_id == 0) {
print_dist_state(s, &kvm->arch.vgic);
@@ -227,9 +228,9 @@ static int vgic_debug_show(struct seq_file *s, void *v)
irq = &kvm->arch.vgic.spis[iter->intid - VGIC_NR_PRIVATE_IRQS];
}
- spin_lock(&irq->irq_lock);
+ spin_lock_irqsave(&irq->irq_lock, flags);
print_irq_state(s, irq, vcpu);
- spin_unlock(&irq->irq_lock);
+ spin_unlock_irqrestore(&irq->irq_lock, flags);
return 0;
}
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index a8f07243aa9f..41abf92f2699 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -52,6 +52,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq = vgic_get_irq(kvm, NULL, intid), *oldirq;
+ unsigned long flags;
int ret;
/* In this case there is no put, since we keep the reference. */
@@ -71,7 +72,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
irq->intid = intid;
irq->target_vcpu = vcpu;
- spin_lock(&dist->lpi_list_lock);
+ spin_lock_irqsave(&dist->lpi_list_lock, flags);
/*
* There could be a race with another vgic_add_lpi(), so we need to
@@ -99,7 +100,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
dist->lpi_list_count++;
out_unlock:
- spin_unlock(&dist->lpi_list_lock);
+ spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
/*
* We "cache" the configuration table entries in our struct vgic_irq's.
@@ -315,6 +316,7 @@ static int vgic_copy_lpi_list(struct kvm_vcpu *vcpu, u32 **intid_ptr)
{
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
struct vgic_irq *irq;
+ unsigned long flags;
u32 *intids;
int irq_count, i = 0;
@@ -330,7 +332,7 @@ static int vgic_copy_lpi_list(struct kvm_vcpu *vcpu, u32 **intid_ptr)
if (!intids)
return -ENOMEM;
- spin_lock(&dist->lpi_list_lock);
+ spin_lock_irqsave(&dist->lpi_list_lock, flags);
list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
if (i == irq_count)
break;
@@ -339,7 +341,7 @@ static int vgic_copy_lpi_list(struct kvm_vcpu *vcpu, u32 **intid_ptr)
continue;
intids[i++] = irq->intid;
}
- spin_unlock(&dist->lpi_list_lock);
+ spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
*intid_ptr = intids;
return i;
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 27219313a406..1dfb5b2f1b12 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -43,9 +43,13 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
* kvm->lock (mutex)
* its->cmd_lock (mutex)
* its->its_lock (mutex)
- * vgic_cpu->ap_list_lock
- * kvm->lpi_list_lock
- * vgic_irq->irq_lock
+ * vgic_cpu->ap_list_lock must be taken with IRQs disabled
+ * kvm->lpi_list_lock must be taken with IRQs disabled
+ * vgic_irq->irq_lock must be taken with IRQs disabled
+ *
+ * As the ap_list_lock might be taken from the timer interrupt handler,
+ * we have to disable IRQs before taking this lock and everything lower
+ * than it.
*
* If you need to take multiple locks, always take the upper lock first,
* then the lower ones, e.g. first take the its_lock, then the irq_lock.
@@ -72,8 +76,9 @@ static struct vgic_irq *vgic_get_lpi(struct kvm *kvm, u32 intid)
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq = NULL;
+ unsigned long flags;
- spin_lock(&dist->lpi_list_lock);
+ spin_lock_irqsave(&dist->lpi_list_lock, flags);
list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) {
if (irq->intid != intid)
@@ -89,7 +94,7 @@ static struct vgic_irq *vgic_get_lpi(struct kvm *kvm, u32 intid)
irq = NULL;
out_unlock:
- spin_unlock(&dist->lpi_list_lock);
+ spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
return irq;
}
@@ -134,19 +139,20 @@ static void vgic_irq_release(struct kref *ref)
void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
{
struct vgic_dist *dist = &kvm->arch.vgic;
+ unsigned long flags;
if (irq->intid < VGIC_MIN_LPI)
return;
- spin_lock(&dist->lpi_list_lock);
+ spin_lock_irqsave(&dist->lpi_list_lock, flags);
if (!kref_put(&irq->refcount, vgic_irq_release)) {
- spin_unlock(&dist->lpi_list_lock);
+ spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
return;
};
list_del(&irq->lpi_list);
dist->lpi_list_count--;
- spin_unlock(&dist->lpi_list_lock);
+ spin_unlock_irqrestore(&dist->lpi_list_lock, flags);
kfree(irq);
}
--
2.14.1
commit ece1397cbc89c51914fae1aec729539cfd8bd62b upstream
Some variants of the Arm Cortex-A55 cores (r0p0, r0p1, r1p0) suffer
from erratum 1024718, which causes incorrect updates when the DBM/AP
bits in a page table entry are modified without a break-before-make
sequence. The workaround is to disable the hardware DBM feature on the
affected cores. The hardware Access Flag management feature is not
affected.
The hardware DBM feature is a non-conflicting capability, i.e. the
kernel can handle cores using the feature and cores without it running
at the same time. So this workaround is applied at early boot time,
rather than delayed until the CPUs are brought up into the kernel with
the MMU turned on. This also avoids further complexities with late CPUs
coming online, with or without the hardware DBM feature.
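For reference only (this is not part of the patch, which performs the
check in assembly before the MMU is enabled), the test that the new
cpu_midr_match macro implements is roughly equivalent to the following C,
using the existing MIDR mask macros from <asm/cputype.h>:

#include <linux/types.h>
#include <asm/cputype.h>

/* Rough C equivalent of the assembler check: is the current CPU a
 * Cortex-A55 between r0p0 and r1p0 (inclusive)? */
static bool midr_matches_a55_r0p0_to_r1p0(void)
{
	u32 midr = read_cpuid_id();
	u32 model = midr & MIDR_CPU_MODEL_MASK;
	u32 var_rev = midr & (MIDR_VARIANT_MASK | MIDR_REVISION_MASK);

	return model == MIDR_CORTEX_A55 &&
	       var_rev >= MIDR_CPU_VAR_REV(0, 0) &&
	       var_rev <= MIDR_CPU_VAR_REV(1, 0);
}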
Cc: stable(a)vger.kernel.org # v4.14
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
---
Note: The upstream commit is on top of a reworked capability
infrastructure for arm64 heterogeneous systems, which allows
delaying the CPU model checks. This backport is based on the
original version of the patch [0], which checks the affected
CPU models during early boot.
[0] https://lkml.kernel.org/r/20180116102323.3470-1-suzuki.poulose@arm.com
---
Documentation/arm64/silicon-errata.txt | 1 +
arch/arm64/Kconfig | 14 ++++++++++++
arch/arm64/include/asm/assembler.h | 40 ++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/cputype.h | 2 ++
arch/arm64/mm/proc.S | 5 +++++
5 files changed, 62 insertions(+)
diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index f3d0d31..e4fe6ad 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -55,6 +55,7 @@ stable kernels.
| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 |
| ARM | Cortex-A72 | #853709 | N/A |
| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 |
+| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 |
| ARM | MMU-500 | #841119,#826419 | N/A |
| | | | |
| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c2abb4e..2d5f7ac 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -443,6 +443,20 @@ config ARM64_ERRATUM_843419
If unsure, say Y.
+config ARM64_ERRATUM_1024718
+ bool "Cortex-A55: 1024718: Update of DBM/AP bits without break before make might result in incorrect update"
+ default y
+ help
+ This option adds work around for Arm Cortex-A55 Erratum 1024718.
+
+ Affected Cortex-A55 cores (r0p0, r0p1, r1p0) could cause incorrect
+ update of the hardware dirty bit when the DBM/AP bits are updated
+ without a break-before-make. The work around is to disable the usage
+ of hardware DBM locally on the affected cores. CPUs not affected by
+ erratum will continue to use the feature.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 463619d..25b2a41 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -25,6 +25,7 @@
#include <asm/asm-offsets.h>
#include <asm/cpufeature.h>
+#include <asm/cputype.h>
#include <asm/page.h>
#include <asm/pgtable-hwdef.h>
#include <asm/ptrace.h>
@@ -495,4 +496,43 @@ alternative_endif
and \phys, \pte, #(((1 << (48 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
.endm
+/*
+ * Check the MIDR_EL1 of the current CPU for a given model and a range of
+ * variant/revision. See asm/cputype.h for the macros used below.
+ *
+ * model: MIDR_CPU_MODEL of CPU
+ * rv_min: Minimum of MIDR_CPU_VAR_REV()
+ * rv_max: Maximum of MIDR_CPU_VAR_REV()
+ * res: Result register.
+ * tmp1, tmp2, tmp3: Temporary registers
+ *
+ * Corrupts: res, tmp1, tmp2, tmp3
+ * Returns: 0, if the CPU id doesn't match. Non-zero otherwise
+ */
+ .macro cpu_midr_match model, rv_min, rv_max, res, tmp1, tmp2, tmp3
+ mrs \res, midr_el1
+ mov_q \tmp1, (MIDR_REVISION_MASK | MIDR_VARIANT_MASK)
+ mov_q \tmp2, MIDR_CPU_MODEL_MASK
+ and \tmp3, \res, \tmp2 // Extract model
+ and \tmp1, \res, \tmp1 // rev & variant
+ mov_q \tmp2, \model
+ cmp \tmp3, \tmp2
+ cset \res, eq
+ cbz \res, .Ldone\@ // Model matches ?
+
+ .if (\rv_min != 0) // Skip min check if rv_min == 0
+ mov_q \tmp3, \rv_min
+ cmp \tmp1, \tmp3
+ cset \res, ge
+ .endif // \rv_min != 0
+ /* Skip rv_max check if rv_min == rv_max && rv_min != 0 */
+ .if ((\rv_min != \rv_max) || \rv_min == 0)
+ mov_q \tmp2, \rv_max
+ cmp \tmp1, \tmp2
+ cset \tmp2, le
+ and \res, \res, \tmp2
+ .endif
+.Ldone\@:
+ .endm
+
#endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index be7bd19..30da091 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -78,6 +78,7 @@
#define ARM_CPU_PART_AEM_V8 0xD0F
#define ARM_CPU_PART_FOUNDATION 0xD00
+#define ARM_CPU_PART_CORTEX_A55 0xD05
#define ARM_CPU_PART_CORTEX_A57 0xD07
#define ARM_CPU_PART_CORTEX_A72 0xD08
#define ARM_CPU_PART_CORTEX_A53 0xD03
@@ -98,6 +99,7 @@
#define QCOM_CPU_PART_KRYO 0x200
#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
+#define MIDR_CORTEX_A55 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A55)
#define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
#define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72)
#define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73)
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 139320a..e338165 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -438,6 +438,11 @@ ENTRY(__cpu_setup)
cbz x9, 2f
cmp x9, #2
b.lt 1f
+#ifdef CONFIG_ARM64_ERRATUM_1024718
+ /* Disable hardware DBM on Cortex-A55 r0p0, r0p1 & r1p0 */
+ cpu_midr_match MIDR_CORTEX_A55, MIDR_CPU_VAR_REV(0, 0), MIDR_CPU_VAR_REV(1, 0), x1, x2, x3, x4
+ cbnz x1, 1f
+#endif
orr x10, x10, #TCR_HD // hardware Dirty flag update
1: orr x10, x10, #TCR_HA // hardware Access flag update
2:
--
2.7.4
This is a note to let you know that I've just added the patch titled
iio: sca3000: Fix an error handling path in 'sca3000_probe()'
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 4a5b45383ca371e123ba103d34d4b3b87616245c Mon Sep 17 00:00:00 2001
From: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Date: Sun, 8 Apr 2018 21:44:01 +0200
Subject: iio: sca3000: Fix an error handling path in 'sca3000_probe()'
Use 'devm_iio_kfifo_allocate()' instead of 'iio_kfifo_allocate()' in order
to simplify the code and avoid a memory leak in an error path of
'sca3000_probe()', where a call to 'sca3000_unconfigure_ring()' was missing.
Sent via the next merge window as this is a minor bug and there are
other patches that depend on it.
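As a simplified sketch (not the driver's actual code; the function name is
made up), device-managed allocation means an early error return no longer
leaks the kfifo:

#include <linux/iio/iio.h>
#include <linux/iio/buffer.h>
#include <linux/iio/kfifo_buf.h>

/* Simplified sketch: with devm_iio_kfifo_allocate() the buffer's lifetime
 * is tied to the device, so error paths need no explicit iio_kfifo_free(). */
static int example_setup_buffer(struct iio_dev *indio_dev)
{
	struct iio_buffer *buffer;

	buffer = devm_iio_kfifo_allocate(&indio_dev->dev);
	if (!buffer)
		return -ENOMEM;
	iio_device_attach_buffer(indio_dev, buffer);

	/* Any later failure can simply return; the kfifo is freed together
	 * with the device, which is what makes an explicit unconfigure
	 * helper unnecessary. */
	return iio_device_register(indio_dev);
}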
Signed-off-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/accel/sca3000.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/iio/accel/sca3000.c b/drivers/iio/accel/sca3000.c
index f33dadf7b262..562f125235db 100644
--- a/drivers/iio/accel/sca3000.c
+++ b/drivers/iio/accel/sca3000.c
@@ -1277,7 +1277,7 @@ static int sca3000_configure_ring(struct iio_dev *indio_dev)
{
struct iio_buffer *buffer;
- buffer = iio_kfifo_allocate();
+ buffer = devm_iio_kfifo_allocate(&indio_dev->dev);
if (!buffer)
return -ENOMEM;
@@ -1287,11 +1287,6 @@ static int sca3000_configure_ring(struct iio_dev *indio_dev)
return 0;
}
-static void sca3000_unconfigure_ring(struct iio_dev *indio_dev)
-{
- iio_kfifo_free(indio_dev->buffer);
-}
-
static inline
int __sca3000_hw_ring_state_set(struct iio_dev *indio_dev, bool state)
{
@@ -1546,8 +1541,6 @@ static int sca3000_remove(struct spi_device *spi)
if (spi->irq)
free_irq(spi->irq, indio_dev);
- sca3000_unconfigure_ring(indio_dev);
-
return 0;
}
--
2.17.0
This is a note to let you know that I've just added the patch titled
iio: adc: ad7791: remove sample freq sysfs attributes
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 7eb6b35d93c356f1afebbfb808bc296d6351e708 Mon Sep 17 00:00:00 2001
From: Alexandru Ardelean <alexandru.ardelean(a)analog.com>
Date: Mon, 12 Mar 2018 14:06:53 +0200
Subject: iio: adc: ad7791: remove sample freq sysfs attributes
In their current state, these attributes are broken, because they are
already registered, and the kernel throws a duplicate-registration warning.
The first registration happens via the `IIO_CHAN_INFO_SAMP_FREQ` flag from
the `ad_sigma_delta` driver.
This commit removes these attributes; a follow-up will implement the
IIO_CHAN_INFO_SAMP_FREQ behavior, which replaces these hooks.
This is split out to make review easier, as there would be a fair bit of
overlap in the patch if it were all done at once.
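As a hedged illustration (the channel definition below is made up, not the
driver's actual table), exposing the rate through the core means putting
IIO_CHAN_INFO_SAMP_FREQ in the channel's shared info mask and letting the
IIO core create the sampling_frequency attribute, instead of registering
one by hand:

#include <linux/bitops.h>
#include <linux/iio/iio.h>

/* Illustrative channel definition: with IIO_CHAN_INFO_SAMP_FREQ in
 * info_mask_shared_by_all, the IIO core creates the sampling_frequency
 * attribute itself, so a hand-rolled IIO_DEV_ATTR_SAMP_FREQ attribute
 * would end up being registered twice. */
static const struct iio_chan_spec example_channel = {
	.type = IIO_VOLTAGE,
	.indexed = 1,
	.channel = 0,
	.info_mask_separate = BIT(IIO_CHAN_INFO_RAW),
	.info_mask_shared_by_all = BIT(IIO_CHAN_INFO_SAMP_FREQ),
};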
Fixes: a13e831fcaa7 ("staging: iio: ad7192: implement IIO_CHAN_INFO_SAMP_FREQ")
Signed-off-by: Alexandru Ardelean <alexandru.ardelean(a)analog.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad7791.c | 49 ----------------------------------------
1 file changed, 49 deletions(-)
diff --git a/drivers/iio/adc/ad7791.c b/drivers/iio/adc/ad7791.c
index 70fbf92f9827..03a5f7d6cb0c 100644
--- a/drivers/iio/adc/ad7791.c
+++ b/drivers/iio/adc/ad7791.c
@@ -244,58 +244,9 @@ static int ad7791_read_raw(struct iio_dev *indio_dev,
return -EINVAL;
}
-static const char * const ad7791_sample_freq_avail[] = {
- [AD7791_FILTER_RATE_120] = "120",
- [AD7791_FILTER_RATE_100] = "100",
- [AD7791_FILTER_RATE_33_3] = "33.3",
- [AD7791_FILTER_RATE_20] = "20",
- [AD7791_FILTER_RATE_16_6] = "16.6",
- [AD7791_FILTER_RATE_16_7] = "16.7",
- [AD7791_FILTER_RATE_13_3] = "13.3",
- [AD7791_FILTER_RATE_9_5] = "9.5",
-};
-
-static ssize_t ad7791_read_frequency(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct iio_dev *indio_dev = dev_to_iio_dev(dev);
- struct ad7791_state *st = iio_priv(indio_dev);
- unsigned int rate = st->filter & AD7791_FILTER_RATE_MASK;
-
- return sprintf(buf, "%s\n", ad7791_sample_freq_avail[rate]);
-}
-
-static ssize_t ad7791_write_frequency(struct device *dev,
- struct device_attribute *attr, const char *buf, size_t len)
-{
- struct iio_dev *indio_dev = dev_to_iio_dev(dev);
- struct ad7791_state *st = iio_priv(indio_dev);
- int i, ret;
-
- i = sysfs_match_string(ad7791_sample_freq_avail, buf);
- if (i < 0)
- return i;
-
- ret = iio_device_claim_direct_mode(indio_dev);
- if (ret)
- return ret;
- st->filter &= ~AD7791_FILTER_RATE_MASK;
- st->filter |= i;
- ad_sd_write_reg(&st->sd, AD7791_REG_FILTER, sizeof(st->filter),
- st->filter);
- iio_device_release_direct_mode(indio_dev);
-
- return len;
-}
-
-static IIO_DEV_ATTR_SAMP_FREQ(S_IWUSR | S_IRUGO,
- ad7791_read_frequency,
- ad7791_write_frequency);
-
static IIO_CONST_ATTR_SAMP_FREQ_AVAIL("120 100 33.3 20 16.7 16.6 13.3 9.5");
static struct attribute *ad7791_attributes[] = {
- &iio_dev_attr_sampling_frequency.dev_attr.attr,
&iio_const_attr_sampling_frequency_available.dev_attr.attr,
NULL
};
--
2.17.0