- Linux-stable-mirror - lists.linaro.org

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 4.4.130 release. There are 50 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Sun Apr 29 13:56:42 UTC 2018. Anything received after that time might be too late. The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.130-rc… or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y and the diffstat can be found below. thanks, greg k-h ------------- Pseudo-Shortlog of commits: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Linux 4.4.130-rc1 Heiko Carstens <heiko.carstens(a)de.ibm.com> s390/uprobes: implement arch_uretprobe_is_alive() Sebastian Ott <sebott(a)linux.ibm.com> s390/cio: update chpid descriptor after resource accessibility event Dan Carpenter <dan.carpenter(a)oracle.com> cdrom: information leak in cdrom_ioctl_media_changed() Martin K. Petersen <martin.petersen(a)oracle.com> scsi: mptsas: Disable WRITE SAME Eric Dumazet <edumazet(a)google.com> ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy Eric Dumazet <edumazet(a)google.com> net: af_packet: fix race in PACKET_{R|T}X_RING Eric Dumazet <edumazet(a)google.com> tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets Wolfgang Bumiller <w.bumiller(a)proxmox.com> net: fix deadlock while clearing neighbor proxy table Eric Dumazet <edumazet(a)google.com> tipc: add policy for TIPC_NLA_NET_ADDR Cong Wang <xiyou.wangcong(a)gmail.com> llc: fix NULL pointer deref for SOCK_ZAPPED Cong Wang <xiyou.wangcong(a)gmail.com> llc: hold llc_sap before release_sock() Xin Long <lucien.xin(a)gmail.com> sctp: do not check port in sctp_inet6_cmp_addr Toshiaki Makita <makita.toshiaki(a)lab.ntt.co.jp> vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi Guillaume Nault <g.nault(a)alphalink.fr> pppoe: check sockaddr length in pppoe_connect() Willem de Bruijn <willemb(a)google.com> packet: fix bitfield update race Xin Long <lucien.xin(a)gmail.com> team: fix netconsole setup over team Paolo Abeni <pabeni(a)redhat.com> team: avoid adding twice the same option to the event list Jann Horn <jannh(a)google.com> tcp: don't read out-of-bounds opsize Cong Wang <xiyou.wangcong(a)gmail.com> llc: delete timers synchronously in llc_sk_free() Eric Dumazet <edumazet(a)google.com> net: validate attribute sizes in neigh_dump_table() Guillaume Nault <g.nault(a)alphalink.fr> l2tp: check sockaddr length in pppol2tp_connect() Eric Biggers <ebiggers(a)google.com> KEYS: DNS: limit the length of option strings Xin Long <lucien.xin(a)gmail.com> bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: correct module section names for expoline code revert Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: correct nospec auto detection init order Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: add sysfs attributes for spectre Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: report spectre mitigation via syslog Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: add automatic detection of the spectre defense Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: move nobp parameter functions to nospec-branch.c Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390/entry.S: fix spurious zeroing of r0 Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: do not bypass BPENTER for interrupt system calls Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*) Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: introduce execute-trampolines for branches Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: run user space and KVM guests with modified branch prediction Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: add options to change branch prediction behaviour for the kernel Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390/alternative: use a copy of the facility bit mask Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: add optimized array_index_mask_nospec Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: scrub registers on kernel entry and KVM exit Martin Schwidefsky <schwidefsky(a)de.ibm.com> KVM: s390: wire up bpb feature Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: enable CPU alternatives unconditionally Martin Schwidefsky <schwidefsky(a)de.ibm.com> s390: introduce CPU alternatives Karthikeyan Periyasamy <periyasa(a)codeaurora.org> Revert "ath10k: send (re)assoc peer command when NSS changed" Sahitya Tummala <stummala(a)codeaurora.org> jbd2: fix use after free in kjournald2() Felix Fietkau <nbd(a)nbd.name> ath9k_hw: check if the chip failed to wake up Dmitry Torokhov <dmitry.torokhov(a)gmail.com> Input: drv260x - fix initializing overdrive voltage Grant Grundler <grundler(a)chromium.org> r8152: add Linksys USB3GIGV1 id Chen Feng <puck.chen(a)hisilicon.com> staging: ion : Donnot wakeup kswapd in ion system alloc Jiri Olsa <jolsa(a)kernel.org> perf: Return proper values for user stack errors Xiaoming Gao <gxm.linux.kernel(a)gmail.com> x86/tsc: Prevent 32bit truncation in calc_hpet_ref() Steve French <smfrench(a)gmail.com> cifs: do not allow creating sockets except with SMB1 posix exensions ------------- Diffstat: Documentation/kernel-parameters.txt | 3 + Makefile | 4 +- arch/s390/Kconfig | 47 +++++ arch/s390/Makefile | 10 ++ arch/s390/include/asm/alternative.h | 149 +++++++++++++++ arch/s390/include/asm/barrier.h | 24 +++ arch/s390/include/asm/facility.h | 18 ++ arch/s390/include/asm/kvm_host.h | 3 +- arch/s390/include/asm/lowcore.h | 7 +- arch/s390/include/asm/nospec-branch.h | 17 ++ arch/s390/include/asm/processor.h | 4 + arch/s390/include/asm/thread_info.h | 4 + arch/s390/include/uapi/asm/kvm.h | 3 + arch/s390/kernel/Makefile | 5 +- arch/s390/kernel/alternative.c | 112 ++++++++++++ arch/s390/kernel/early.c | 5 + arch/s390/kernel/entry.S | 250 +++++++++++++++++++++++--- arch/s390/kernel/ipl.c | 1 + arch/s390/kernel/module.c | 65 ++++++- arch/s390/kernel/nospec-branch.c | 169 +++++++++++++++++ arch/s390/kernel/processor.c | 18 ++ arch/s390/kernel/setup.c | 14 +- arch/s390/kernel/smp.c | 7 +- arch/s390/kernel/uprobes.c | 9 + arch/s390/kernel/vmlinux.lds.S | 37 ++++ arch/s390/kvm/kvm-s390.c | 13 +- arch/x86/kernel/tsc.c | 2 +- drivers/cdrom/cdrom.c | 2 +- drivers/input/misc/drv260x.c | 2 +- drivers/message/fusion/mptsas.c | 1 + drivers/net/bonding/bond_main.c | 3 +- drivers/net/ppp/pppoe.c | 4 + drivers/net/team/team.c | 38 +++- drivers/net/usb/cdc_ether.c | 10 ++ drivers/net/usb/r8152.c | 2 + drivers/net/wireless/ath/ath10k/mac.c | 5 +- drivers/net/wireless/ath/ath9k/hw.c | 4 + drivers/s390/char/Makefile | 2 + drivers/s390/cio/chsc.c | 14 +- drivers/staging/android/ion/ion_system_heap.c | 2 +- fs/cifs/dir.c | 9 +- fs/jbd2/journal.c | 2 +- include/linux/if_vlan.h | 7 +- include/net/llc_conn.h | 1 + include/uapi/linux/kvm.h | 1 + kernel/events/core.c | 4 +- net/core/dev.c | 2 +- net/core/neighbour.c | 40 +++-- net/dns_resolver/dns_key.c | 13 +- net/ipv4/tcp.c | 6 +- net/ipv4/tcp_input.c | 7 +- net/ipv6/route.c | 2 + net/l2tp/l2tp_ppp.c | 7 + net/llc/af_llc.c | 14 +- net/llc/llc_c_ac.c | 9 +- net/llc/llc_conn.c | 22 ++- net/packet/af_packet.c | 88 ++++++--- net/packet/internal.h | 10 +- net/sctp/ipv6.c | 60 +++---- net/tipc/net.c | 3 +- 60 files changed, 1228 insertions(+), 168 deletions(-)

7 years, 7 months

7
58
0 0

[PATCH] [stable 4.9] arm64: Add work around for Arm Cortex-A55 Erratum 1024718

by Suzuki K Poulose

commit ece1397cbc89c51914fae1aec729539cfd8bd62b upstream Some variants of the Arm Cortex-55 cores (r0p0, r0p1, r1p0) suffer from an erratum 1024718, which causes incorrect updates when DBM/AP bits in a page table entry is modified without a break-before-make sequence. The work around is to disable the hardware DBM feature on the affected cores. The hardware Access Flag management features is not affected. The hardware DBM feature is a non-conflicting capability, i.e, the kernel could handle cores using the feature and those without having the features running at the same time. So this work around is detected at early boot time, rather than delaying it until the CPUs are brought up into the kernel with MMU turned on. This also avoids other complexities with late CPUs turning online, with or without the hardware DBM features. Cc: stable(a)vger.kernel.org # v4.9 Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: Mark Rutland <mark.rutland(a)arm.com> Cc: Will Deacon <will.deacon(a)arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com> --- Note: The upstream commit is on top of a reworked capability infrastructure for arm64 heterogeneous systems, which allows delaying the CPU model checks. This backport is based on the original version of the patch [0], which checks the affected CPU models during the early boot. [0] https://lkml.kernel.org/r/20180116102323.3470-1-suzuki.poulose@arm.com --- Documentation/arm64/silicon-errata.txt | 1 + arch/arm64/Kconfig | 14 ++++++++++++ arch/arm64/include/asm/assembler.h | 40 ++++++++++++++++++++++++++++++++++ arch/arm64/include/asm/cputype.h | 5 +++++ arch/arm64/mm/proc.S | 5 +++++ 5 files changed, 65 insertions(+) diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt index d11af52..ac9489f 100644 --- a/Documentation/arm64/silicon-errata.txt +++ b/Documentation/arm64/silicon-errata.txt @@ -54,6 +54,7 @@ stable kernels. | ARM | Cortex-A57 | #852523 | N/A | | ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | | ARM | Cortex-A72 | #853709 | N/A | +| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | | ARM | MMU-500 | #841119,#826419 | N/A | | | | | | | Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 | diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 90e58bb..d0df361 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -427,6 +427,20 @@ config ARM64_ERRATUM_843419 If unsure, say Y. +config ARM64_ERRATUM_1024718 + bool "Cortex-A55: 1024718: Update of DBM/AP bits without break before make might result in incorrect update" + default y + help + This option adds work around for Arm Cortex-A55 Erratum 1024718. + + Affected Cortex-A55 cores (r0p0, r0p1, r1p0) could cause incorrect + update of the hardware dirty bit when the DBM/AP bits are updated + without a break-before-make. The work around is to disable the usage + of hardware DBM locally on the affected cores. CPUs not affected by + erratum will continue to use the feature. + + If unsure, say Y. + config CAVIUM_ERRATUM_22375 bool "Cavium erratum 22375, 24313" default y diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index e60375c..bfcfec3 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -25,6 +25,7 @@ #include <asm/asm-offsets.h> #include <asm/cpufeature.h> +#include <asm/cputype.h> #include <asm/page.h> #include <asm/pgtable-hwdef.h> #include <asm/ptrace.h> @@ -435,4 +436,43 @@ alternative_endif and \phys, \pte, #(((1 << (48 - PAGE_SHIFT)) - 1) << PAGE_SHIFT) .endm +/* + * Check the MIDR_EL1 of the current CPU for a given model and a range of + * variant/revision. See asm/cputype.h for the macros used below. + * + * model: MIDR_CPU_MODEL of CPU + * rv_min: Minimum of MIDR_CPU_VAR_REV() + * rv_max: Maximum of MIDR_CPU_VAR_REV() + * res: Result register. + * tmp1, tmp2, tmp3: Temporary registers + * + * Corrupts: res, tmp1, tmp2, tmp3 + * Returns: 0, if the CPU id doesn't match. Non-zero otherwise + */ + .macro cpu_midr_match model, rv_min, rv_max, res, tmp1, tmp2, tmp3 + mrs \res, midr_el1 + mov_q \tmp1, (MIDR_REVISION_MASK | MIDR_VARIANT_MASK) + mov_q \tmp2, MIDR_CPU_MODEL_MASK + and \tmp3, \res, \tmp2 // Extract model + and \tmp1, \res, \tmp1 // rev & variant + mov_q \tmp2, \model + cmp \tmp3, \tmp2 + cset \res, eq + cbz \res, .Ldone\@ // Model matches ? + + .if (\rv_min != 0) // Skip min check if rv_min == 0 + mov_q \tmp3, \rv_min + cmp \tmp1, \tmp3 + cset \res, ge + .endif // \rv_min != 0 + /* Skip rv_max check if rv_min == rv_max && rv_min != 0 */ + .if ((\rv_min != \rv_max) || \rv_min == 0) + mov_q \tmp2, \rv_max + cmp \tmp1, \tmp2 + cset \tmp2, le + and \res, \res, \tmp2 + .endif +.Ldone\@: + .endm + #endif /* __ASM_ASSEMBLER_H */ diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index 9ee3038..39d1db6 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -56,6 +56,9 @@ (0xf << MIDR_ARCHITECTURE_SHIFT) | \ ((partnum) << MIDR_PARTNUM_SHIFT)) +#define MIDR_CPU_VAR_REV(var, rev) \ + (((var) << MIDR_VARIANT_SHIFT) | (rev)) + #define MIDR_CPU_MODEL_MASK (MIDR_IMPLEMENTOR_MASK | MIDR_PARTNUM_MASK | \ MIDR_ARCHITECTURE_MASK) @@ -74,6 +77,7 @@ #define ARM_CPU_PART_AEM_V8 0xD0F #define ARM_CPU_PART_FOUNDATION 0xD00 +#define ARM_CPU_PART_CORTEX_A55 0xD05 #define ARM_CPU_PART_CORTEX_A57 0xD07 #define ARM_CPU_PART_CORTEX_A72 0xD08 #define ARM_CPU_PART_CORTEX_A53 0xD03 @@ -89,6 +93,7 @@ #define BRCM_CPU_PART_VULCAN 0x516 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53) +#define MIDR_CORTEX_A55 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A55) #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57) #define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72) #define MIDR_CORTEX_A73 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A73) diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index 619da1c..66cce21 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -425,6 +425,11 @@ ENTRY(__cpu_setup) cbz x9, 2f cmp x9, #2 b.lt 1f +#ifdef CONFIG_ARM64_ERRATUM_1024718 + /* Disable hardware DBM on Cortex-A55 r0p0, r0p1 & r1p0 */ + cpu_midr_match MIDR_CORTEX_A55, MIDR_CPU_VAR_REV(0, 0), MIDR_CPU_VAR_REV(1, 0), x1, x2, x3, x4 + cbnz x1, 1f +#endif orr x10, x10, #TCR_HD // hardware Dirty flag update 1: orr x10, x10, #TCR_HA // hardware Access flag update 2: -- 2.7.4

7 years, 7 months

3
3
0 0

FAILED: patch "[PATCH] drm/i915: Adjust eDP's logical vco in a reliable place." failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 9d219554d9bf59875b4e571a0392d620e8954879 Mon Sep 17 00:00:00 2001 From: Rodrigo Vivi <rodrigo.vivi(a)intel.com> Date: Wed, 2 May 2018 10:52:55 -0700 Subject: [PATCH] drm/i915: Adjust eDP's logical vco in a reliable place. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On intel_dp_compute_config() we were calculating the needed vco for eDP on gen9 and we stashing it in intel_atomic_state.cdclk.logical.vco However few moments later on intel_modeset_checks() we fully replace entire intel_atomic_state.cdclk.logical with dev_priv->cdclk.logical fully overwriting the logical desired vco for eDP on gen9. So, with wrong VCO value we end up with wrong desired cdclk, but also it will raise a lot of WARNs: On gen9, when we read CDCLK_CTL to verify if we configured properly the desired frequency the CD Frequency Select bits [27:26] == 10b can mean 337.5 or 308.57 MHz depending on the VCO. So if we have wrong VCO value stashed we will believe the frequency selection didn't stick and start to raise WARNs of cdclk mismatch. [ 42.857519] [drm:intel_dump_cdclk_state [i915]] Changing CDCLK to 308571 kHz, VCO 8640000 kHz, ref 24000 kHz, bypass 24000 kHz, voltage level 0 [ 42.897269] cdclk state doesn't match! [ 42.901052] WARNING: CPU: 5 PID: 1116 at drivers/gpu/drm/i915/intel_cdclk.c:2084 intel_set_cdclk+0x5d/0x110 [i915] [ 42.938004] RIP: 0010:intel_set_cdclk+0x5d/0x110 [i915] [ 43.155253] WARNING: CPU: 5 PID: 1116 at drivers/gpu/drm/i915/intel_cdclk.c:2084 intel_set_cdclk+0x5d/0x110 [i915] [ 43.170277] [drm:intel_dump_cdclk_state [i915]] [hw state] 337500 kHz, VCO 8100000 kHz, ref 24000 kHz, bypass 24000 kHz, voltage level 0 [ 43.182566] [drm:intel_dump_cdclk_state [i915]] [sw state] 308571 kHz, VCO 8640000 kHz, ref 24000 kHz, bypass 24000 kHz, voltage level 0 v2: Move the entire eDP's vco logical adjustment to inside the skl_modeset_calc_cdclk as suggested by Ville. Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com> Fixes: bb0f4aab0e76 ("drm/i915: Track full cdclk state for the logical and actual cdclk frequencies") Cc: <stable(a)vger.kernel.org> # v4.12+ Link: https://patchwork.freedesktop.org/patch/msgid/20180502175255.5344-1-rodrigo… (cherry picked from commit 3297234a05ab1e90091b0574db4c397ef0e90d5f) Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com> diff --git a/drivers/gpu/drm/i915/intel_cdclk.c b/drivers/gpu/drm/i915/intel_cdclk.c index 32d24c69da3c..704ddb4d3ca7 100644 --- a/drivers/gpu/drm/i915/intel_cdclk.c +++ b/drivers/gpu/drm/i915/intel_cdclk.c @@ -2302,9 +2302,44 @@ static int bdw_modeset_calc_cdclk(struct drm_atomic_state *state) return 0; } +static int skl_dpll0_vco(struct intel_atomic_state *intel_state) +{ + struct drm_i915_private *dev_priv = to_i915(intel_state->base.dev); + struct intel_crtc *crtc; + struct intel_crtc_state *crtc_state; + int vco, i; + + vco = intel_state->cdclk.logical.vco; + if (!vco) + vco = dev_priv->skl_preferred_vco_freq; + + for_each_new_intel_crtc_in_state(intel_state, crtc, crtc_state, i) { + if (!crtc_state->base.enable) + continue; + + if (!intel_crtc_has_type(crtc_state, INTEL_OUTPUT_EDP)) + continue; + + /* + * DPLL0 VCO may need to be adjusted to get the correct + * clock for eDP. This will affect cdclk as well. + */ + switch (crtc_state->port_clock / 2) { + case 108000: + case 216000: + vco = 8640000; + break; + default: + vco = 8100000; + break; + } + } + + return vco; +} + static int skl_modeset_calc_cdclk(struct drm_atomic_state *state) { - struct drm_i915_private *dev_priv = to_i915(state->dev); struct intel_atomic_state *intel_state = to_intel_atomic_state(state); int min_cdclk, cdclk, vco; @@ -2312,9 +2347,7 @@ static int skl_modeset_calc_cdclk(struct drm_atomic_state *state) if (min_cdclk < 0) return min_cdclk; - vco = intel_state->cdclk.logical.vco; - if (!vco) - vco = dev_priv->skl_preferred_vco_freq; + vco = skl_dpll0_vco(intel_state); /* * FIXME should also account for plane ratio diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index 9a4a51e79fa1..b7b4cfdeb974 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -1881,26 +1881,6 @@ intel_dp_compute_config(struct intel_encoder *encoder, reduce_m_n); } - /* - * DPLL0 VCO may need to be adjusted to get the correct - * clock for eDP. This will affect cdclk as well. - */ - if (intel_dp_is_edp(intel_dp) && IS_GEN9_BC(dev_priv)) { - int vco; - - switch (pipe_config->port_clock / 2) { - case 108000: - case 216000: - vco = 8640000; - break; - default: - vco = 8100000; - break; - } - - to_intel_atomic_state(pipe_config->base.state)->cdclk.logical.vco = vco; - } - if (!HAS_DDI(dev_priv)) intel_dp_set_clock(encoder, pipe_config);

7 years, 7 months

1
0
0 0

FAILED: patch "[PATCH] mm: treat indirectly reclaimable memory as free in overcommit" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From d79f7aa496fc94d763f67b833a1f36f4c171176f Mon Sep 17 00:00:00 2001 From: Roman Gushchin <guro(a)fb.com> Date: Tue, 10 Apr 2018 16:27:47 -0700 Subject: [PATCH] mm: treat indirectly reclaimable memory as free in overcommit logic Indirectly reclaimable memory can consume a significant part of total memory and it's actually reclaimable (it will be released under actual memory pressure). So, the overcommit logic should treat it as free. Otherwise, it's possible to cause random system-wide memory allocation failures by consuming a significant amount of memory by indirectly reclaimable memory, e.g. dentry external names. If overcommit policy GUESS is used, it might be used for denial of service attack under some conditions. The following program illustrates the approach. It causes the kernel to allocate an unreclaimable kmalloc-256 chunk for each stat() call, so that at some point the overcommit logic may start blocking large allocation system-wide. int main() { char buf[256]; unsigned long i; struct stat statbuf; buf[0] = '/'; for (i = 1; i < sizeof(buf); i++) buf[i] = '_'; for (i = 0; 1; i++) { sprintf(&buf[248], "%8lu", i); stat(buf, &statbuf); } return 0; } This patch in combination with related indirectly reclaimable memory patches closes this issue. Link: http://lkml.kernel.org/r/20180313130041.8078-1-guro@fb.com Signed-off-by: Roman Gushchin <guro(a)fb.com> Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org> Cc: Alexander Viro <viro(a)zeniv.linux.org.uk> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/mm/util.c b/mm/util.c index 029fc2f3b395..73676f0f1b43 100644 --- a/mm/util.c +++ b/mm/util.c @@ -667,6 +667,13 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin) */ free += global_node_page_state(NR_SLAB_RECLAIMABLE); + /* + * Part of the kernel memory, which can be released + * under memory pressure. + */ + free += global_node_page_state( + NR_INDIRECTLY_RECLAIMABLE_BYTES) >> PAGE_SHIFT; + /* * Leave reserved pages. The pages are not for anonymous pages. */

7 years, 7 months

1
0
0 0

[PATCH] nvme/pci: Sync controller reset for AER slot_reset

by Keith Busch

AER handling expects a successful return from slot_reset means the driver made the device functional again. The nvme driver had been using an asynchronous reset to recover the device, so the device may still be initializing after control is returned to the AER handler. This creates problems for subsequent event handling, causing the initializion to fail. This patch fixes that by syncing the controller reset before returning to the AER driver, and reporting the true state of the reset. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199657 Reported-by: Alex Gagniuc <mr.nuke.me(a)gmail.com> Cc: Sinan Kaya <okaya(a)codeaurora.org> Cc: Bjorn Helgaas <bhelgaas(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Keith Busch <keith.busch(a)intel.com> --- drivers/nvme/host/pci.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index b542dce45927..2e221796257a 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -2681,8 +2681,15 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev) dev_info(dev->ctrl.device, "restart after slot reset\n"); pci_restore_state(pdev); - nvme_reset_ctrl(&dev->ctrl); - return PCI_ERS_RESULT_RECOVERED; + nvme_reset_ctrl_sync(&dev->ctrl); + + switch (dev->ctrl.state) { + case NVME_CTRL_LIVE: + case NVME_CTRL_ADMIN_ONLY: + return PCI_ERS_RESULT_RECOVERED; + default: + return PCI_ERS_RESULT_DISCONNECT; + } } static void nvme_error_resume(struct pci_dev *pdev) -- 2.14.3

7 years, 7 months

6
8
0 0

[PATCH 1/1] doc: fix sysfs ABI documentation

by kys＠linuxonhyperv.com

From: Stephen Hemminger <stephen(a)networkplumber.org> In 4.9 kernel, the sysfs files for Hyper-V VMBus changed name but the documentation files were not updated. The current sysfs file names are /sys/bus/vmbus/devices/<UUID>/... See commit 9a56e5d6a0ba ("Drivers: hv: make VMBus bus ids persistent") and commit f6b2db084b65 ("vmbus: make sysfs names consistent with PCI") Reported-by: Michael Kelley <mikelley(a)microsoft.com> Signed-off-by: Stephen Hemminger <sthemmin(a)microsoft.com> Cc: stable(a)vger.kernel.org Signed-off-by: K. Y. Srinivasan <kys(a)microsoft.com> --- Documentation/ABI/stable/sysfs-bus-vmbus | 40 ++++++++++++++++---------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus index 0c9d9dcd2151..3eaffbb2d468 100644 --- a/Documentation/ABI/stable/sysfs-bus-vmbus +++ b/Documentation/ABI/stable/sysfs-bus-vmbus @@ -1,25 +1,25 @@ -What: /sys/bus/vmbus/devices/vmbus_*/id +What: /sys/bus/vmbus/devices/<UUID>/id Date: Jul 2009 KernelVersion: 2.6.31 Contact: K. Y. Srinivasan <kys(a)microsoft.com> Description: The VMBus child_relid of the device's primary channel Users: tools/hv/lsvmbus -What: /sys/bus/vmbus/devices/vmbus_*/class_id +What: /sys/bus/vmbus/devices/<UUID>/class_id Date: Jul 2009 KernelVersion: 2.6.31 Contact: K. Y. Srinivasan <kys(a)microsoft.com> Description: The VMBus interface type GUID of the device Users: tools/hv/lsvmbus -What: /sys/bus/vmbus/devices/vmbus_*/device_id +What: /sys/bus/vmbus/devices/<UUID>/device_id Date: Jul 2009 KernelVersion: 2.6.31 Contact: K. Y. Srinivasan <kys(a)microsoft.com> Description: The VMBus interface instance GUID of the device Users: tools/hv/lsvmbus -What: /sys/bus/vmbus/devices/vmbus_*/channel_vp_mapping +What: /sys/bus/vmbus/devices/<UUID>/channel_vp_mapping Date: Jul 2015 KernelVersion: 4.2.0 Contact: K. Y. Srinivasan <kys(a)microsoft.com> @@ -28,112 +28,112 @@ Description: The mapping of which primary/sub channels are bound to which Format: <channel's child_relid:the bound cpu's number> Users: tools/hv/lsvmbus -What: /sys/bus/vmbus/devices/vmbus_*/device +What: /sys/bus/vmbus/devices/<UUID>/device Date: Dec. 2015 KernelVersion: 4.5 Contact: K. Y. Srinivasan <kys(a)microsoft.com> Description: The 16 bit device ID of the device Users: tools/hv/lsvmbus and user level RDMA libraries -What: /sys/bus/vmbus/devices/vmbus_*/vendor +What: /sys/bus/vmbus/devices/<UUID>/vendor Date: Dec. 2015 KernelVersion: 4.5 Contact: K. Y. Srinivasan <kys(a)microsoft.com> Description: The 16 bit vendor ID of the device Users: tools/hv/lsvmbus and user level RDMA libraries -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN +What: /sys/bus/vmbus/devices/<UUID>/channels/<N> Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Directory for per-channel information NN is the VMBUS relid associtated with the channel. -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/cpu +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/cpu Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: VCPU (sub)channel is affinitized to Users: tools/hv/lsvmbus and other debugging tools -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/cpu +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/cpu Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: VCPU (sub)channel is affinitized to Users: tools/hv/lsvmbus and other debugging tools -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/in_mask +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/in_mask Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Host to guest channel interrupt mask Users: Debugging tools -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/latency +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/latency Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Channel signaling latency Users: Debugging tools -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/out_mask +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/out_mask Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Guest to host channel interrupt mask Users: Debugging tools -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/pending +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/pending Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Channel interrupt pending state Users: Debugging tools -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/read_avail +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/read_avail Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Bytes available to read Users: Debugging tools -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/write_avail +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/write_avail Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Bytes available to write Users: Debugging tools -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/events +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/events Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Number of times we have signaled the host Users: Debugging tools -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/interrupts +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/interrupts Date: September. 2017 KernelVersion: 4.14 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Number of times we have taken an interrupt (incoming) Users: Debugging tools -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/subchannel_id +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/subchannel_id Date: January. 2018 KernelVersion: 4.16 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Subchannel ID associated with VMBUS channel Users: Debugging tools and userspace drivers -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/monitor_id +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/monitor_id Date: January. 2018 KernelVersion: 4.16 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> Description: Monitor bit associated with channel Users: Debugging tools and userspace drivers -What: /sys/bus/vmbus/devices/vmbus_*/channels/NN/ring +What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/ring Date: January. 2018 KernelVersion: 4.16 Contact: Stephen Hemminger <sthemmin(a)microsoft.com> -- 2.15.1

7 years, 7 months

1
0
0 0

[PATCH v2 03/12] arm: dts: mt7623: fix invalid memory node being generated

by sean.wang＠mediatek.com

From: Sean Wang <sean.wang(a)mediatek.com> Below two wrong nodes in existing DTS files would cause a fail boot since in fact the address 0 is not the correct place the memory device locates at. memory { device_type = "memory"; reg = <0x0 0x0 0x0 0x0>; }; memory@80000000 { reg = <0x0 0x80000000 0x0 0x40000000>; }; In order to avoid having a memory node starting at address 0, we can't include file skeleton64.dtsi and instead need to explicitly manually define a few of properties the DTS relies on such as #address-cells and #size-cells in root node and device_type in the node memory@80000000. Cc: stable(a)vger.kernel.org Fixes: 31ac0d69a1d4 ("ARM: dts: mediatek: add MT7623 basic support") Signed-off-by: Sean Wang <sean.wang(a)mediatek.com> Cc: Rob Herring <robh+dt(a)kernel.org> --- arch/arm/boot/dts/mt7623.dtsi | 3 ++- arch/arm/boot/dts/mt7623n-bananapi-bpi-r2.dts | 1 + arch/arm/boot/dts/mt7623n-rfb.dtsi | 1 + 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/mt7623.dtsi b/arch/arm/boot/dts/mt7623.dtsi index fec4715..406a9f3 100644 --- a/arch/arm/boot/dts/mt7623.dtsi +++ b/arch/arm/boot/dts/mt7623.dtsi @@ -15,11 +15,12 @@ #include <dt-bindings/phy/phy.h> #include <dt-bindings/reset/mt2701-resets.h> #include <dt-bindings/thermal/thermal.h> -#include "skeleton64.dtsi" / { compatible = "mediatek,mt7623"; interrupt-parent = <&sysirq>; + #address-cells = <2>; + #size-cells = <2>; cpu_opp_table: opp-table { compatible = "operating-points-v2"; diff --git a/arch/arm/boot/dts/mt7623n-bananapi-bpi-r2.dts b/arch/arm/boot/dts/mt7623n-bananapi-bpi-r2.dts index bbf56f8..5938e4c 100644 --- a/arch/arm/boot/dts/mt7623n-bananapi-bpi-r2.dts +++ b/arch/arm/boot/dts/mt7623n-bananapi-bpi-r2.dts @@ -109,6 +109,7 @@ }; memory@80000000 { + device_type = "memory"; reg = <0 0x80000000 0 0x40000000>; }; }; diff --git a/arch/arm/boot/dts/mt7623n-rfb.dtsi b/arch/arm/boot/dts/mt7623n-rfb.dtsi index a199ae7..343e8ef 100644 --- a/arch/arm/boot/dts/mt7623n-rfb.dtsi +++ b/arch/arm/boot/dts/mt7623n-rfb.dtsi @@ -40,6 +40,7 @@ }; memory@80000000 { + device_type = "memory"; reg = <0 0x80000000 0 0x40000000>; }; -- 2.7.4

7 years, 7 months

3
2
0 0

patch "xhci: Fix use-after-free in xhci_free_virt_device" causes kernel crashes on 4.16.8 on unplug

by Jacob Saunders

On Linux kernel 4.16.8 (using Arch Linux) if I eject a USB 3.0 device from my system (unplug or "udisksctl power-off --block-device /dev/sde), it freezes instantly with a null pointer error in XHCI. I've been unable to successfully capture a kernel log of the error (and my cameras did not return usable results) and the screen begins scrolling near-immediately with hung processors in a VT. I have done a git bisection (between 4.16.7 and 4.16.7) narrowing it down to the following commit: commit f5331826b0b7a5f2db56a9020ddbb8ce16acdfc0 Author: Mathias Nyman <mathias.nyman(a)linux.intel.com> Date: Thu May 3 17:30:07 2018 +0300 xhci: Fix use-after-free in xhci_free_virt_device commit 44a182b9d17765514fa2b1cc911e4e65134eef93 upstream. KASAN found a use-after-free in xhci_free_virt_device+0x33b/0x38e where xhci_free_virt_device() sets slot id to 0 if udev exists: if (dev->udev && dev->udev->slot_id) dev->udev->slot_id = 0; dev->udev will be true even if udev is freed because dev->udev is not set to NULL. set dev->udev pointer to NULL in xhci_free_dev() The original patch went to stable so this fix needs to be applied there as well. Fixes: a400efe455f7 ("xhci: zero usb device slot_id member when disabling and freeing a xhci slot") Cc: <stable(a)vger.kernel.org> Reported-by: Guenter Roeck <linux(a)roeck-us.net> Reviewed-by: Guenter Roeck <linux(a)roeck-us.net> Tested-by: Guenter Roeck <linux(a)roeck-us.net> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> :040000 040000 356ecdc13bf252535bd51b8558a8173dae546f19 d28f201228652e87e51ba48f5a0ad4539cca5c29 M drivers Can I be CC'd on future responses to this? I am not subscribed to the list. Jacob Saunders

7 years, 7 months

2
1
0 0

[patch 10/13] mm, oom: fix concurrent munlock and oom reaper unmap, v3

by akpm＠linux-foundation.org

From: David Rientjes <rientjes(a)google.com> Subject: mm, oom: fix concurrent munlock and oom reaper unmap, v3 Since exit_mmap() is done without the protection of mm->mmap_sem, it is possible for the oom reaper to concurrently operate on an mm until MMF_OOM_SKIP is set. This allows munlock_vma_pages_all() to concurrently run while the oom reaper is operating on a vma. Since munlock_vma_pages_range() depends on clearing VM_LOCKED from vm_flags before actually doing the munlock to determine if any other vmas are locking the same memory, the check for VM_LOCKED in the oom reaper is racy. This is especially noticeable on architectures such as powerpc where clearing a huge pmd requires serialize_against_pte_lookup(). If the pmd is zapped by the oom reaper during follow_page_mask() after the check for pmd_none() is bypassed, this ends up deferencing a NULL ptl or a kernel oops. Fix this by manually freeing all possible memory from the mm before doing the munlock and then setting MMF_OOM_SKIP. The oom reaper can not run on the mm anymore so the munlock is safe to do in exit_mmap(). It also matches the logic that the oom reaper currently uses for determining when to set MMF_OOM_SKIP itself, so there's no new risk of excessive oom killing. This issue fixes CVE-2018-1000200. Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.cor… Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: David Rientjes <rientjes(a)google.com> Suggested-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko(a)suse.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: <stable(a)vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/oom.h | 2 + mm/mmap.c | 44 +++++++++++++--------- mm/oom_kill.c | 81 ++++++++++++++++++++++-------------------- 3 files changed, 71 insertions(+), 56 deletions(-) diff -puN include/linux/oom.h~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap include/linux/oom.h --- a/include/linux/oom.h~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap +++ a/include/linux/oom.h @@ -95,6 +95,8 @@ static inline int check_stable_address_s return 0; } +void __oom_reap_task_mm(struct mm_struct *mm); + extern unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg, const nodemask_t *nodemask, unsigned long totalpages); diff -puN mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/mmap.c --- a/mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap +++ a/mm/mmap.c @@ -3024,6 +3024,32 @@ void exit_mmap(struct mm_struct *mm) /* mm's last user has gone, and its about to be pulled down */ mmu_notifier_release(mm); + if (unlikely(mm_is_oom_victim(mm))) { + /* + * Manually reap the mm to free as much memory as possible. + * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard + * this mm from further consideration. Taking mm->mmap_sem for + * write after setting MMF_OOM_SKIP will guarantee that the oom + * reaper will not run on this mm again after mmap_sem is + * dropped. + * + * Nothing can be holding mm->mmap_sem here and the above call + * to mmu_notifier_release(mm) ensures mmu notifier callbacks in + * __oom_reap_task_mm() will not block. + * + * This needs to be done before calling munlock_vma_pages_all(), + * which clears VM_LOCKED, otherwise the oom reaper cannot + * reliably test it. + */ + mutex_lock(&oom_lock); + __oom_reap_task_mm(mm); + mutex_unlock(&oom_lock); + + set_bit(MMF_OOM_SKIP, &mm->flags); + down_write(&mm->mmap_sem); + up_write(&mm->mmap_sem); + } + if (mm->locked_vm) { vma = mm->mmap; while (vma) { @@ -3045,24 +3071,6 @@ void exit_mmap(struct mm_struct *mm) /* update_hiwater_rss(mm) here? but nobody should be looking */ /* Use -1 here to ensure all VMAs in the mm are unmapped */ unmap_vmas(&tlb, vma, 0, -1); - - if (unlikely(mm_is_oom_victim(mm))) { - /* - * Wait for oom_reap_task() to stop working on this - * mm. Because MMF_OOM_SKIP is already set before - * calling down_read(), oom_reap_task() will not run - * on this "mm" post up_write(). - * - * mm_is_oom_victim() cannot be set from under us - * either because victim->mm is already set to NULL - * under task_lock before calling mmput and oom_mm is - * set not NULL by the OOM killer only if victim->mm - * is found not NULL while holding the task_lock. - */ - set_bit(MMF_OOM_SKIP, &mm->flags); - down_write(&mm->mmap_sem); - up_write(&mm->mmap_sem); - } free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING); tlb_finish_mmu(&tlb, 0, -1); diff -puN mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/oom_kill.c --- a/mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap +++ a/mm/oom_kill.c @@ -469,7 +469,6 @@ bool process_shares_mm(struct task_struc return false; } - #ifdef CONFIG_MMU /* * OOM Reaper kernel thread which tries to reap the memory used by the OOM @@ -480,16 +479,54 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reape static struct task_struct *oom_reaper_list; static DEFINE_SPINLOCK(oom_reaper_lock); -static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) +void __oom_reap_task_mm(struct mm_struct *mm) { - struct mmu_gather tlb; struct vm_area_struct *vma; + + /* + * Tell all users of get_user/copy_from_user etc... that the content + * is no longer stable. No barriers really needed because unmapping + * should imply barriers already and the reader would hit a page fault + * if it stumbled over a reaped memory. + */ + set_bit(MMF_UNSTABLE, &mm->flags); + + for (vma = mm->mmap ; vma; vma = vma->vm_next) { + if (!can_madv_dontneed_vma(vma)) + continue; + + /* + * Only anonymous pages have a good chance to be dropped + * without additional steps which we cannot afford as we + * are OOM already. + * + * We do not even care about fs backed pages because all + * which are reclaimable have already been reclaimed and + * we do not want to block exit_mmap by keeping mm ref + * count elevated without a good reason. + */ + if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) { + const unsigned long start = vma->vm_start; + const unsigned long end = vma->vm_end; + struct mmu_gather tlb; + + tlb_gather_mmu(&tlb, mm, start, end); + mmu_notifier_invalidate_range_start(mm, start, end); + unmap_page_range(&tlb, vma, start, end, NULL); + mmu_notifier_invalidate_range_end(mm, start, end); + tlb_finish_mmu(&tlb, start, end); + } + } +} + +static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) +{ bool ret = true; /* * We have to make sure to not race with the victim exit path * and cause premature new oom victim selection: - * __oom_reap_task_mm exit_mm + * oom_reap_task_mm exit_mm * mmget_not_zero * mmput * atomic_dec_and_test @@ -534,39 +571,8 @@ static bool __oom_reap_task_mm(struct ta trace_start_task_reaping(tsk->pid); - /* - * Tell all users of get_user/copy_from_user etc... that the content - * is no longer stable. No barriers really needed because unmapping - * should imply barriers already and the reader would hit a page fault - * if it stumbled over a reaped memory. - */ - set_bit(MMF_UNSTABLE, &mm->flags); - - for (vma = mm->mmap ; vma; vma = vma->vm_next) { - if (!can_madv_dontneed_vma(vma)) - continue; + __oom_reap_task_mm(mm); - /* - * Only anonymous pages have a good chance to be dropped - * without additional steps which we cannot afford as we - * are OOM already. - * - * We do not even care about fs backed pages because all - * which are reclaimable have already been reclaimed and - * we do not want to block exit_mmap by keeping mm ref - * count elevated without a good reason. - */ - if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) { - const unsigned long start = vma->vm_start; - const unsigned long end = vma->vm_end; - - tlb_gather_mmu(&tlb, mm, start, end); - mmu_notifier_invalidate_range_start(mm, start, end); - unmap_page_range(&tlb, vma, start, end, NULL); - mmu_notifier_invalidate_range_end(mm, start, end); - tlb_finish_mmu(&tlb, start, end); - } - } pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n", task_pid_nr(tsk), tsk->comm, K(get_mm_counter(mm, MM_ANONPAGES)), @@ -587,14 +593,13 @@ static void oom_reap_task(struct task_st struct mm_struct *mm = tsk->signal->oom_mm; /* Retry the down_read_trylock(mmap_sem) a few times */ - while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm)) + while (attempts++ < MAX_OOM_REAP_RETRIES && !oom_reap_task_mm(tsk, mm)) schedule_timeout_idle(HZ/10); if (attempts <= MAX_OOM_REAP_RETRIES || test_bit(MMF_OOM_SKIP, &mm->flags)) goto done; - pr_info("oom_reaper: unable to reap pid:%d (%s)\n", task_pid_nr(tsk), tsk->comm); debug_show_all_locks(); _

7 years, 7 months

1
0
0 0

[patch 06/13] mm: sections are not offlined during memory hotremove

by akpm＠linux-foundation.org

From: Pavel Tatashin <pasha.tatashin(a)oracle.com> Subject: mm: sections are not offlined during memory hotremove Memory hotplug and hotremove operate with per-block granularity. If the machine has a large amount of memory (more than 64G), the size of a memory block can span multiple sections. By mistake, during hotremove we set only the first section to offline state. The bug was discovered because kernel selftest started to fail: https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop After commit, "mm/memory_hotplug: optimize probe routine". But, the bug is older than this commit. In this optimization we also added a check for sections to be in a proper state during hotplug operation. Link: http://lkml.kernel.org/r/20180427145257.15222-1-pasha.tatashin@oracle.com Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes") Signed-off-by: Pavel Tatashin <pasha.tatashin(a)oracle.com> Acked-by: Michal Hocko <mhocko(a)suse.com> Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: Steven Sistare <steven.sistare(a)oracle.com> Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/sparse.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff -puN mm/sparse.c~mm-sections-are-not-offlined-during-memory-hotremove mm/sparse.c --- a/mm/sparse.c~mm-sections-are-not-offlined-during-memory-hotremove +++ a/mm/sparse.c @@ -629,7 +629,7 @@ void offline_mem_sections(unsigned long unsigned long pfn; for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { - unsigned long section_nr = pfn_to_section_nr(start_pfn); + unsigned long section_nr = pfn_to_section_nr(pfn); struct mem_section *ms; /* _

7 years, 7 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror