- Linux-stable-mirror - lists.linaro.org

FAILED: patch "[PATCH] iwlwifi: fix wrong WGDS_WIFI_DATA_SIZE" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 66e839030fd698586734e017fd55c4f2a89dba0b Mon Sep 17 00:00:00 2001
From: Matt Chen <matt.chen(a)intel.com>
Date: Fri, 3 Aug 2018 14:29:20 +0800
Subject: [PATCH] iwlwifi: fix wrong WGDS_WIFI_DATA_SIZE

From coreboot/BIOS:
Name ("WGDS", Package() {
 Revision,
 Package() {
     DomainType,                         // 0x7:WiFi ==> We miss this one.
     WgdsWiFiSarDeltaGroup1PowerMax1,    // Group 1 FCC 2400 Max
     WgdsWiFiSarDeltaGroup1PowerChainA1, // Group 1 FCC 2400 A Offset
     WgdsWiFiSarDeltaGroup1PowerChainB1, // Group 1 FCC 2400 B Offset
     WgdsWiFiSarDeltaGroup1PowerMax2,    // Group 1 FCC 5200 Max
     WgdsWiFiSarDeltaGroup1PowerChainA2, // Group 1 FCC 5200 A Offset
     WgdsWiFiSarDeltaGroup1PowerChainB2, // Group 1 FCC 5200 B Offset
     WgdsWiFiSarDeltaGroup2PowerMax1,    // Group 2 EC Jap 2400 Max
     WgdsWiFiSarDeltaGroup2PowerChainA1, // Group 2 EC Jap 2400 A Offset
     WgdsWiFiSarDeltaGroup2PowerChainB1, // Group 2 EC Jap 2400 B Offset
     WgdsWiFiSarDeltaGroup2PowerMax2,    // Group 2 EC Jap 5200 Max
     WgdsWiFiSarDeltaGroup2PowerChainA2, // Group 2 EC Jap 5200 A Offset
     WgdsWiFiSarDeltaGroup2PowerChainB2, // Group 2 EC Jap 5200 B Offset
     WgdsWiFiSarDeltaGroup3PowerMax1,    // Group 3 ROW 2400 Max
     WgdsWiFiSarDeltaGroup3PowerChainA1, // Group 3 ROW 2400 A Offset
     WgdsWiFiSarDeltaGroup3PowerChainB1, // Group 3 ROW 2400 B Offset
     WgdsWiFiSarDeltaGroup3PowerMax2,    // Group 3 ROW 5200 Max
     WgdsWiFiSarDeltaGroup3PowerChainA2, // Group 3 ROW 5200 A Offset
     WgdsWiFiSarDeltaGroup3PowerChainB2, // Group 3 ROW 5200 B Offset
 }
})

When read the ACPI data to find out the WGDS, the DATA_SIZE is never
matched.
From the above format, it gives 19 numbers, but our driver is hardcode
as 18.
Fix it to pass then can parse the data into our wgds table.
Then we will see:
iwlwifi 0000:01:00.0: U iwl_mvm_sar_geo_init Sending GEO_TX_POWER_LIMIT
iwlwifi 0000:01:00.0: U iwl_mvm_sar_geo_init SAR geographic profile[0] Band[0]: chain A = 68 chain B = 69 max_tx_power = 54
iwlwifi 0000:01:00.0: U iwl_mvm_sar_geo_init SAR geographic profile[0] Band[1]: chain A = 48 chain B = 49 max_tx_power = 70
iwlwifi 0000:01:00.0: U iwl_mvm_sar_geo_init SAR geographic profile[1] Band[0]: chain A = 51 chain B = 67 max_tx_power = 50
iwlwifi 0000:01:00.0: U iwl_mvm_sar_geo_init SAR geographic profile[1] Band[1]: chain A = 69 chain B = 70 max_tx_power = 68
iwlwifi 0000:01:00.0: U iwl_mvm_sar_geo_init SAR geographic profile[2] Band[0]: chain A = 49 chain B = 50 max_tx_power = 48
iwlwifi 0000:01:00.0: U iwl_mvm_sar_geo_init SAR geographic profile[2] Band[1]: chain A = 52 chain B = 53 max_tx_power = 51

Cc: stable(a)vger.kernel.org # 4.12+
Fixes: a6bff3cb19b7 ("iwlwifi: mvm: add GEO_TX_POWER_LIMIT cmd for geographic tx power table")
Signed-off-by: Matt Chen <matt.chen(a)intel.com>
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>

diff --git a/drivers/net/wireless/intel/iwlwifi/fw/acpi.h b/drivers/net/wireless/intel/iwlwifi/fw/acpi.h
index 2439e98431ee..7492dfb6729b 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/acpi.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/acpi.h
@@ -6,6 +6,7 @@
  * GPL LICENSE SUMMARY
  *
  * Copyright(c) 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of version 2 of the GNU General Public License as
@@ -26,6 +27,7 @@
  * BSD LICENSE
  *
  * Copyright(c) 2017 Intel Deutschland GmbH
+ * Copyright(c) 2018 Intel Corporation
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -81,7 +83,7 @@
 #define ACPI_WRDS_WIFI_DATA_SIZE	(ACPI_SAR_TABLE_SIZE + 2)
 #define ACPI_EWRD_WIFI_DATA_SIZE	((ACPI_SAR_PROFILE_NUM - 1) * \
 					 ACPI_SAR_TABLE_SIZE + 3)
-#define ACPI_WGDS_WIFI_DATA_SIZE	18
+#define ACPI_WGDS_WIFI_DATA_SIZE	19
 #define ACPI_WRDD_WIFI_DATA_SIZE	2
 #define ACPI_SPLC_WIFI_DATA_SIZE	2

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index dade206d5511..899f4a6432fb 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -893,7 +893,7 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
 	IWL_DEBUG_RADIO(mvm, "Sending GEO_TX_POWER_LIMIT\n");

 	BUILD_BUG_ON(ACPI_NUM_GEO_PROFILES * ACPI_WGDS_NUM_BANDS *
-		     ACPI_WGDS_TABLE_SIZE != ACPI_WGDS_WIFI_DATA_SIZE);
+		     ACPI_WGDS_TABLE_SIZE + 1 != ACPI_WGDS_WIFI_DATA_SIZE);

 	BUILD_BUG_ON(ACPI_NUM_GEO_PROFILES > IWL_NUM_GEO_PROFILES);
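
For readers checking the arithmetic behind the 18 to 19 change, here is a minimal standalone sketch. The enum names and the helper `wgds_index()` are illustrative assumptions, not the driver's identifiers; the layout (one DomainType element followed by max, chain A, chain B for each group and band) is taken from the package listing in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants; they mirror the package layout quoted in the
 * commit message but are NOT the driver's own macro names. */
enum {
	SKETCH_GEO_PROFILES   = 3,  /* FCC, EC/Japan, rest of world */
	SKETCH_GEO_BANDS      = 2,  /* 2400 and 5200 */
	SKETCH_GEO_ENTRIES    = 3,  /* max power, chain A offset, chain B offset */
	SKETCH_WGDS_DATA_SIZE = 19  /* value after the fix */
};

/* Same shape as the corrected BUILD_BUG_ON: the leading DomainType
 * element adds one to the 3 * 2 * 3 = 18 table entries. */
_Static_assert(SKETCH_GEO_PROFILES * SKETCH_GEO_BANDS * SKETCH_GEO_ENTRIES + 1
	       == SKETCH_WGDS_DATA_SIZE,
	       "WGDS package holds DomainType plus 18 table entries");

/* Hypothetical helper: position of one value inside the inner package,
 * counting DomainType as element 0. */
static size_t wgds_index(size_t profile, size_t band, size_t entry)
{
	return 1 + (profile * SKETCH_GEO_BANDS + band) * SKETCH_GEO_ENTRIES + entry;
}
```

With this layout the last value (group 3, 5200, chain B) sits at index 18, so a parser expecting only 18 elements would reject the package, which is consistent with the size mismatch the commit describes.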

6 years, 9 months


[PATCH for-4.9.y 00/10] Stable candidates for linux-4.9.y

by Amit Pundir

Hi Greg,

Few stable candidates for 4.9.y for your consideration. Cherry picked
and build tested on linux-4.9.141 for ARCH=arm/arm64 + allmodconfig.

Few fixes are applicable for 4.4.y and 3.18.y as well, but they needed
minor rebasing, so I'll submit them along with other fixes shortly in
separate threads.

Regards,
Amit Pundir

Amitkumar Karwar (3):
  mwifiex: prevent register accesses after host is sleeping
  mwifiex: report error to PCIe for suspend failure
  mwifiex: Fix NULL pointer dereference in skb_dequeue()

Johannes Thumshirn (1):
  cw1200: Don't leak memory if krealloc failes

Karthik D A (1):
  mwifiex: fix p2p device doesn't find in scan problem

Subhash Jadavani (2):
  scsi: ufs: fix race between clock gating and devfreq scaling work
  scsi: ufshcd: release resources if probe fails

Vasanthakumar Thiagarajan (1):
  ath10k: fix kernel panic due to race in accessing arvif list

Venkat Gopalakrishnan (1):
  scsi: ufshcd: Fix race between clk scaling and ungate work

Yaniv Gardi (1):
  scsi: ufs: fix bugs related to null pointer access and array size

 drivers/net/wireless/ath/ath10k/mac.c           |  6 ++
 drivers/net/wireless/marvell/mwifiex/cfg80211.c | 10 +++-
 drivers/net/wireless/marvell/mwifiex/pcie.c     | 19 +++++--
 drivers/net/wireless/marvell/mwifiex/wmm.c      | 12 +++-
 drivers/net/wireless/st/cw1200/wsm.c            | 16 +++---
 drivers/scsi/ufs/ufs.h                          |  3 +-
 drivers/scsi/ufs/ufshcd-pci.c                   |  2 +
 drivers/scsi/ufs/ufshcd-pltfrm.c                |  5 +-
 drivers/scsi/ufs/ufshcd.c                       | 75 ++++++++++++++++++++++---
 9 files changed, 118 insertions(+), 30 deletions(-)

--
2.7.4

6 years, 9 months


Re: [tip:x86/pti] x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support

by David Woodhouse

On Thu, 2018-11-29 at 09:12 +0000, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v4.19.5, v4.14.84, v4.9.141, v4.4.165, v3.18.127,
>
> v4.19.5: Build OK!
> v4.14.84: Build OK!
> v4.9.141: Failed to apply! Possible dependencies:
>
> v4.4.165: Failed to apply! Possible dependencies:
>
> v3.18.127: Failed to apply! Possible dependencies:
>
> How should we proceed with this patch?

I think it's fine to apply it only to 4.19 and 4.14.

It's not imperative that the older kernels get it. People building
those kernels should already have their tools in place; it's not like
we expect *new* users of ancient kernels, who will be tripped up by
this.

6 years, 9 months


[PATCH v2] Return only active connectors for get_resources ioctl

by Stanislav Lisovskiy

Currently the kernel might allocate different connector ids for the
same outputs in case of DP MST, which seems to confuse userspace.
There can be different connector ids in the list, which could be
assigned to the same output, while being in different states.
This results in issues, like external displays staying blank after
quick unplugging and plugging back (bug #106250).
Returning only active DP connectors fixes the issue.

v2: Removed caps from the title

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106250
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy(a)intel.com>
---
 drivers/gpu/drm/drm_mode_config.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_mode_config.c b/drivers/gpu/drm/drm_mode_config.c
index ee80788f2c40..ec5b2b08a45e 100644
--- a/drivers/gpu/drm/drm_mode_config.c
+++ b/drivers/gpu/drm/drm_mode_config.c
@@ -143,6 +143,7 @@ int drm_mode_getresources(struct drm_device *dev, void *data,
 	drm_connector_list_iter_begin(dev, &conn_iter);
 	count = 0;
 	connector_id = u64_to_user_ptr(card_res->connector_id_ptr);
+	DRM_DEBUG_KMS("GetResources: writing connectors start");
 	drm_for_each_connector_iter(connector, &conn_iter) {
 		/* only expose writeback connectors if userspace understands them */
 		if (!file_priv->writeback_connectors &&
@@ -150,15 +151,20 @@ int drm_mode_getresources(struct drm_device *dev, void *data,
 			continue;

 		if (drm_lease_held(file_priv, connector->base.id)) {
-			if (count < card_res->count_connectors &&
-			    put_user(connector->base.id, connector_id + count)) {
-				drm_connector_list_iter_end(&conn_iter);
-				return -EFAULT;
+			if (connector->connector_type != DRM_MODE_CONNECTOR_DisplayPort ||
+			    connector->status != connector_status_disconnected) {
+				if (count < card_res->count_connectors &&
+				    put_user(connector->base.id, connector_id + count)) {
+					drm_connector_list_iter_end(&conn_iter);
+					return -EFAULT;
+				}
+				DRM_DEBUG_KMS("GetResources: connector %s", connector->name);
+				count++;
 			}
-			count++;
 		}
 	}
 	card_res->count_connectors = count;
+	DRM_DEBUG_KMS("GetResources: writing connectors end - count %d", count);
 	drm_connector_list_iter_end(&conn_iter);

 	return ret;
--
2.17.1
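
The filter this patch adds can be read in isolation as the sketch below. All type and function names here (`struct conn_sketch`, `conn_is_reported()`, `collect_ids()`) are hypothetical stand-ins for illustration, not DRM API. Note that, as written in the diff, only DisplayPort connectors whose status is "disconnected" are skipped; every other connector is still reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DRM connector type and status values. */
enum conn_type_sketch   { SK_HDMI, SK_DISPLAYPORT };
enum conn_status_sketch { SK_CONNECTED, SK_DISCONNECTED, SK_UNKNOWN };

struct conn_sketch {
	unsigned int id;
	enum conn_type_sketch type;
	enum conn_status_sketch status;
};

/* Same predicate shape as the patch: hide only disconnected DP connectors. */
static bool conn_is_reported(const struct conn_sketch *c)
{
	return c->type != SK_DISPLAYPORT || c->status != SK_DISCONNECTED;
}

/* Copy at most 'cap' ids into 'out' but keep counting past the cap, so the
 * caller can learn how large a buffer it needs (as the ioctl does). */
static size_t collect_ids(const struct conn_sketch *conns, size_t n,
			  unsigned int *out, size_t cap)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!conn_is_reported(&conns[i]))
			continue;
		if (count < cap)
			out[count] = conns[i].id;
		count++;
	}
	return count;
}
```

In this model a disconnected HDMI connector is still listed, while a stale disconnected DP MST connector is dropped from both the id list and the returned count.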

6 years, 9 months


[PATCH 4.14 00/62] 4.14.84-stable review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 4.14.84 release. There are 62 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Wed Nov 28 10:50:20 UTC 2018. Anything received after that time might be too late. The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.84-rc… or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below. thanks, greg k-h ------------- Pseudo-Shortlog of commits: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Linux 4.14.84-rc1 Ilya Dryomov <idryomov(a)gmail.com> libceph: fall back to sendmsg for slab pages Eric Biggers <ebiggers(a)google.com> HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges Hans de Goede <hdegoede(a)redhat.com> ACPI / platform: Add SMB0001 HID to forbidden_id_list Gustavo A. R. 
Silva <gustavo(a)embeddedor.com> drivers/misc/sgi-gru: fix Spectre v1 vulnerability Johan Hovold <johan(a)kernel.org> mtd: rawnand: atmel: fix OF child-node lookup Mattias Jacobsson <2pi(a)mok.nu> USB: misc: appledisplay: add 20" Apple Cinema Display Nathan Chancellor <natechancellor(a)gmail.com> misc: atmel-ssc: Fix section annotation on atmel_ssc_get_driver_data Emmanuel Pescosta <emmanuelpescosta099(a)gmail.com> usb: quirks: Add delay-init quirk for Corsair K70 LUX RGB Kai-Heng Feng <kai.heng.feng(a)canonical.com> USB: quirks: Add no-lpm quirk for Raydium touchscreens Maarten Jacobs <maarten256(a)outlook.com> usb: cdc-acm: add entry for Hiro (Conexant) modem Dan Carpenter <dan.carpenter(a)oracle.com> uio: Fix an Oops on load Aaro Koskinen <aaro.koskinen(a)iki.fi> MIPS: OCTEON: cavium_octeon_defconfig: re-enable OCTEON USB driver Sakari Ailus <sakari.ailus(a)linux.intel.com> media: v4l: event: Add subscription to list before calling "add" operation Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> x86/ldt: Unmap PTEs for the slot before freeing LDT pages Kirill A. 
Shutemov <kirill.shutemov(a)linux.intel.com> x86/mm: Move LDT remap out of KASLR region on 5-level paging Adrian Hunter <adrian.hunter(a)intel.com> perf test code-reading: Fix perf_env setup for PTI entry trampolines Adrian Hunter <adrian.hunter(a)intel.com> perf machine: Workaround missing maps for x86 PTI entry trampolines Adrian Hunter <adrian.hunter(a)intel.com> perf machine: Add nr_cpus_avail() Adrian Hunter <adrian.hunter(a)intel.com> perf tools: Fix kernel_start for PTI on x86 Adrian Hunter <adrian.hunter(a)intel.com> perf machine: Add machine__is() to identify machine arch Mika Westerberg <mika.westerberg(a)linux.intel.com> ACPI / watchdog: Prefer iTCO_wdt always when WDAT table uses RTC SRAM YueHaibing <yuehaibing(a)huawei.com> SUNRPC: drop pointless static qualifier in xdr_get_next_encode_buffer() Minchan Kim <minchan(a)kernel.org> zram: close udev startup race condition as default groups Thor Thayer <thor.thayer(a)linux.intel.com> net: stmmac: Fix RX packet size > 8191 Sagiv Ozeri <sagiv.ozeri(a)cavium.com> qed: Fix potential memory corruption Denis Bolotin <denis.bolotin(a)cavium.com> qed: Fix blocking/unlimited SPQ entries leak Denis Bolotin <denis.bolotin(a)cavium.com> qed: Fix memory/entry leak in qed_init_sp_request() Jacob Keller <jacob.e.keller(a)intel.com> i40e: restore NETIF_F_GSO_IPXIP[46] to netdev features Gustavo Romero <gromero(a)linux.vnet.ibm.com> perf tools: Fix undefined symbol scnprintf in libperf-jvmti.so Valentin Schneider <valentin.schneider(a)arm.com> sched/core: Take the hotplug lock in sched_init_smp() Vignesh R <vigneshr(a)ti.com> i2c: omap: Enable for ARCH_K3 Thomas Richter <tmricht(a)linux.ibm.com> s390/perf: Change CPUM_CF return code in event init function Jeremy Linton <jeremy.linton(a)arm.com> lib/raid6: Fix arm64 test build Ricardo Ribalda Delgado <ricardo.ribalda(a)gmail.com> clk: fixed-factor: fix of_node_get-put imbalance Inki Dae <inki.dae(a)samsung.com> Revert "drm/exynos/decon5433: implement frame counter" Geert 
Uytterhoeven <geert(a)linux-m68k.org> hwmon: (ibmpowernv) Remove bogus __init annotations Julian Wiedmann <jwi(a)linux.ibm.com> s390/qeth: fix HiperSockets sniffer Taehee Yoo <ap420073(a)gmail.com> netfilter: xt_IDLETIMER: add sysfs filename checking routine Jozsef Kadlecsik <kadlec(a)blackhole.kfki.hu> netfilter: ipset: Correct rcu_dereference() call in ip_set_put_comment() Justin M. Forbes <jforbes(a)fedoraproject.org> s390/mm: Fix ERROR: "__node_distance" undefined! Eric Westbrook <eric(a)westbrook.io> netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net Stefano Brivio <sbrivio(a)redhat.com> netfilter: ipset: list:set: Decrease refcount synchronously on deletion and replace Vasily Gorbik <gor(a)linux.ibm.com> s390/vdso: add missing FORCE to build targets Nathan Chancellor <natechancellor(a)gmail.com> arm64: percpu: Initialize ret in the default case Paul Gortmaker <paul.gortmaker(a)windriver.com> platform/x86: acerhdf: Add BIOS entry for Gateway LT31 v1.3307 Feng Tang <feng.tang(a)intel.com> x86/earlyprintk: Add a force option for pciserial device Zubin Mithra <zsm(a)chromium.org> apparmor: Fix uninitialized value in aa_split_fqname Marek Szyprowski <m.szyprowski(a)samsung.com> clk: samsung: exynos5420: Enable PERIS clocks for suspend Chengguang Xu <cgxu519(a)gmx.com> fs/exofs: fix potential memory leak in mount option parsing David Miller <davem(a)davemloft.net> perf symbols: Set PLT entry/header sizes properly on Sparc Alan Tull <atull(a)kernel.org> clk: fixed-rate: fix of_node_get-put imbalance Rajneesh Bhardwaj <rajneesh.bhardwaj(a)linux.intel.com> platform/x86: intel_telemetry: report debugfs failure Lee, Shawn C <shawn.c.lee(a)intel.com> drm/edid: Add 6 bpc quirk for BOE panel. Richard Weinberger <richard(a)nod.at> um: Give start_idle_thread() a return code Ernesto A. Fernández <ernesto.mnd.fernandez(a)gmail.com> hfsplus: prevent btree data loss on root split Ernesto A. 
Fernández <ernesto.mnd.fernandez(a)gmail.com> hfs: prevent btree data loss on root split Jann Horn <jannh(a)google.com> reiserfs: propagate errors from fill_with_dentries() properly Radoslaw Tyl <radoslawx.tyl(a)intel.com> ixgbe: fix MAC anti-spoofing filter after VFLR Keith Busch <keith.busch(a)intel.com> nvme-pci: fix conflicting p2p resource adds Anders Roxell <anders.roxell(a)linaro.org> arm64: kprobe: make page to RO mode when allocate it Ronnie Sahlberg <lsahlber(a)redhat.com> cifs: fix return value for cifs_listxattr Colin Ian King <colin.king(a)canonical.com> cifs: don't dereference smb_file_target before null check ------------- Diffstat: Documentation/admin-guide/kernel-parameters.txt | 6 +- Documentation/x86/x86_64/mm.txt | 10 +- Makefile | 4 +- arch/arm64/include/asm/percpu.h | 3 + arch/arm64/kernel/probes/kprobes.c | 27 +++-- arch/mips/configs/cavium_octeon_defconfig | 1 + arch/s390/kernel/perf_cpum_cf.c | 2 +- arch/s390/kernel/vdso32/Makefile | 6 +- arch/s390/kernel/vdso64/Makefile | 6 +- arch/s390/numa/numa.c | 1 + arch/um/os-Linux/skas/process.c | 5 + arch/x86/include/asm/page_64_types.h | 12 ++- arch/x86/include/asm/pgtable_64_types.h | 7 +- arch/x86/kernel/early_printk.c | 29 ++++-- arch/x86/kernel/ldt.c | 49 ++++++--- arch/x86/xen/mmu_pv.c | 6 +- drivers/acpi/acpi_platform.c | 1 + drivers/acpi/acpi_watchdog.c | 72 ++++++++----- drivers/block/zram/zram_drv.c | 26 ++--- drivers/clk/clk-fixed-factor.c | 1 + drivers/clk/clk-fixed-rate.c | 1 + drivers/clk/samsung/clk-exynos5420.c | 1 + drivers/gpu/drm/drm_edid.c | 3 + drivers/gpu/drm/exynos/exynos5433_drm_decon.c | 9 -- drivers/gpu/drm/exynos/exynos_drm_crtc.c | 11 -- drivers/gpu/drm/exynos/exynos_drm_drv.h | 1 - drivers/hid/uhid.c | 12 +++ drivers/hwmon/ibmpowernv.c | 7 +- drivers/i2c/busses/Kconfig | 2 +- drivers/media/v4l2-core/v4l2-event.c | 43 ++++---- drivers/misc/atmel-ssc.c | 2 +- drivers/misc/sgi-gru/grukdump.c | 4 + drivers/mtd/nand/atmel/nand-controller.c | 11 +- 
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 + drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 4 +- drivers/net/ethernet/qlogic/qed/qed_sp.h | 3 + drivers/net/ethernet/qlogic/qed/qed_sp_commands.c | 16 ++- drivers/net/ethernet/qlogic/qed/qed_spq.c | 69 ++++++------- drivers/net/ethernet/stmicro/stmmac/common.h | 3 +- drivers/net/ethernet/stmicro/stmmac/descs_com.h | 2 +- drivers/net/ethernet/stmicro/stmmac/enh_desc.c | 2 +- drivers/net/ethernet/stmicro/stmmac/ring_mode.c | 2 +- drivers/nvme/host/pci.c | 5 +- drivers/platform/x86/acerhdf.c | 1 + drivers/platform/x86/intel_telemetry_debugfs.c | 8 +- drivers/s390/net/qeth_l3_main.c | 8 +- drivers/uio/uio.c | 7 +- drivers/usb/class/cdc-acm.c | 3 + drivers/usb/core/quirks.c | 8 ++ drivers/usb/misc/appledisplay.c | 1 + fs/cifs/cifsfs.c | 7 +- fs/cifs/smb2ops.c | 11 +- fs/exofs/super.c | 5 +- fs/hfs/brec.c | 4 + fs/hfsplus/brec.c | 4 + fs/reiserfs/xattr.c | 7 ++ include/linux/netfilter/ipset/ip_set.h | 2 +- include/linux/netfilter/ipset/ip_set_comment.h | 4 +- kernel/sched/core.c | 5 +- lib/raid6/test/Makefile | 4 +- net/ceph/messenger.c | 12 ++- net/netfilter/ipset/ip_set_core.c | 23 ++--- net/netfilter/ipset/ip_set_hash_netportnet.c | 8 +- net/netfilter/ipset/ip_set_list_set.c | 17 ++-- net/netfilter/xt_IDLETIMER.c | 20 ++++ net/sunrpc/xdr.c | 2 +- security/apparmor/lib.c | 6 +- tools/perf/jvmti/jvmti_agent.c | 49 +++++++-- tools/perf/tests/code-reading.c | 1 + tools/perf/util/env.c | 32 ++++++ tools/perf/util/env.h | 4 + tools/perf/util/machine.c | 117 +++++++++++++++++++++- tools/perf/util/machine.h | 6 ++ tools/perf/util/symbol-elf.c | 12 ++- tools/perf/util/symbol.c | 12 ++- 75 files changed, 637 insertions(+), 262 deletions(-)

6 years, 9 months


FAILED: patch "[PATCH] dax: Avoid losing wakeup in dax_lock_mapping_entry" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From 25bbe21bf427a81b8e3ccd480ea0e1d940256156 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <willy(a)infradead.org>
Date: Fri, 16 Nov 2018 15:50:02 -0500
Subject: [PATCH] dax: Avoid losing wakeup in dax_lock_mapping_entry

After calling get_unlocked_entry(), you have to call
put_unlocked_entry() to avoid subsequent waiters losing wakeups.

Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Wilcox <willy(a)infradead.org>

diff --git a/fs/dax.c b/fs/dax.c
index cf2394e2bf4b..9bcce89ea18e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -391,6 +391,7 @@ bool dax_lock_mapping_entry(struct page *page)
 			rcu_read_unlock();
 			entry = get_unlocked_entry(&xas);
 			xas_unlock_irq(&xas);
+			put_unlocked_entry(&xas, entry);
 			rcu_read_lock();
 			continue;
 		}

6 years, 9 months


[PATCH] xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc

by Cherian, George

From: "Cherian, George" <George.Cherian(a)cavium.com>

commit 11644a7659529730eaf2f166efaabe7c3dc7af8c upstream

Implement workaround for ThunderX2 Errata-129 (documented in
CN99XX Known Issues" available at Cavium support site).
As per ThunderX2errata-129, USB 2 device may come up as USB 1
if a connection to a USB 1 device is followed by another connection to
a USB 2 device, the link will come up as USB 1 for the USB 2 device.

Resolution: Reset the PHY after the USB 1 device is disconnected.
The PHY reset sequence is done using private registers in XHCI register
space. After the PHY is reset we check for the PLL lock status and retry
the operation if it fails. From our tests, retrying 4 times is sufficient.

Add a new quirk flag XHCI_RESET_PLL_ON_DISCONNECT to invoke the workaround
in handle_xhci_port_status().

Cc: stable(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 4.14.x: 36b6857: xhci: Allow more than 32 quirks
Signed-off-by: George Cherian <george.cherian(a)cavium.com>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
There is a conflict when cherry-picking 36b6857 ("xhci: Allow more than
32 quirks"). It is trivial to resolve. Let me know if it is an issue.

 drivers/usb/host/xhci-pci.c  |  5 +++++
 drivers/usb/host/xhci-ring.c | 35 ++++++++++++++++++++++++++++++++++-
 drivers/usb/host/xhci.h      |  1 +
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 9218f506f8e3..4b07b6859b4c 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -236,6 +236,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
 	if (pdev->vendor == PCI_VENDOR_ID_TI && pdev->device == 0x8241)
 		xhci->quirks |= XHCI_LIMIT_ENDPOINT_INTERVAL_7;

+	if ((pdev->vendor == PCI_VENDOR_ID_BROADCOM ||
+	     pdev->vendor == PCI_VENDOR_ID_CAVIUM) &&
+	     pdev->device == 0x9026)
+		xhci->quirks |= XHCI_RESET_PLL_ON_DISCONNECT;
+
 	if (xhci->quirks & XHCI_RESET_ON_RESUME)
 		xhci_dbg_trace(xhci, trace_xhci_dbg_quirks,
 				"QUIRK: Resetting on resume");
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 6996235e34a9..ea35f346d26b 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1568,6 +1568,35 @@ static void handle_device_notification(struct xhci_hcd *xhci,
 	usb_wakeup_notification(udev->parent, udev->portnum);
 }

+/*
+ * Quirk hanlder for errata seen on Cavium ThunderX2 processor XHCI
+ * Controller.
+ * As per ThunderX2errata-129 USB 2 device may come up as USB 1
+ * If a connection to a USB 1 device is followed by another connection
+ * to a USB 2 device.
+ *
+ * Reset the PHY after the USB device is disconnected if device speed
+ * is less than HCD_USB3.
+ * Retry the reset sequence max of 4 times checking the PLL lock status.
+ *
+ */
+static void xhci_cavium_reset_phy_quirk(struct xhci_hcd *xhci)
+{
+	struct usb_hcd *hcd = xhci_to_hcd(xhci);
+	u32 pll_lock_check;
+	u32 retry_count = 4;
+
+	do {
+		/* Assert PHY reset */
+		writel(0x6F, hcd->regs + 0x1048);
+		udelay(10);
+		/* De-assert the PHY reset */
+		writel(0x7F, hcd->regs + 0x1048);
+		udelay(200);
+		pll_lock_check = readl(hcd->regs + 0x1070);
+	} while (!(pll_lock_check & 0x1) && --retry_count);
+}
+
 static void handle_port_status(struct xhci_hcd *xhci,
 		union xhci_trb *event)
 {
@@ -1725,9 +1754,13 @@ static void handle_port_status(struct xhci_hcd *xhci,
 		goto cleanup;
 	}

-	if (hcd->speed < HCD_USB3)
+	if (hcd->speed < HCD_USB3) {
 		xhci_test_and_clear_bit(xhci, port_array, faked_port_index,
 					PORT_PLC);
+		if ((xhci->quirks & XHCI_RESET_PLL_ON_DISCONNECT) &&
+		    (portsc & PORT_CSC) && !(portsc & PORT_CONNECT))
+			xhci_cavium_reset_phy_quirk(xhci);
+	}

 cleanup:
 	/* Update event ring dequeue pointer before dropping the lock */
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index d7d2a3dfafb8..84457fc192fc 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1836,6 +1836,7 @@ struct xhci_hcd {
 #define XHCI_U2_DISABLE_WAKE	BIT_ULL(27)
 #define XHCI_ASMEDIA_MODIFY_FLOWCONTROL	BIT_ULL(28)
 #define XHCI_SUSPEND_DELAY	BIT_ULL(30)
+#define XHCI_RESET_PLL_ON_DISCONNECT	BIT_ULL(34)

 	unsigned int		num_active_eps;
 	unsigned int		limit_active_eps;
--
2.19.2
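
The bounded retry in the quirk handler (reset, check the PLL lock bit, give up after four attempts) can be exercised outside the kernel with a small model. This is only a sketch under stated assumptions: `struct phy_ops_sketch`, `reset_phy_until_locked()` and the fake PHY are hypothetical, and the register writes and delays of the real handler are replaced by callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PLL_LOCK_BIT   0x1u
#define SKETCH_MAX_PHY_RESETS 4u   /* same retry budget as the patch */

/* Hypothetical hardware hooks standing in for the writel()/readl() pairs. */
struct phy_ops_sketch {
	void (*reset)(void *ctx);
	uint32_t (*read_pll_status)(void *ctx);
};

/* Reset the PHY until the PLL reports lock or the budget is exhausted.
 * Returns true on lock; *attempts receives the number of resets issued. */
static bool reset_phy_until_locked(const struct phy_ops_sketch *ops, void *ctx,
				   unsigned int *attempts)
{
	unsigned int tries_left = SKETCH_MAX_PHY_RESETS;
	uint32_t status;

	*attempts = 0;
	do {
		ops->reset(ctx);
		(*attempts)++;
		status = ops->read_pll_status(ctx);
	} while (!(status & SKETCH_PLL_LOCK_BIT) && --tries_left);

	return (status & SKETCH_PLL_LOCK_BIT) != 0;
}

/* Fake PHY that locks only after a configurable number of resets. */
struct fake_phy {
	unsigned int resets;
	unsigned int locks_after;
};

static void fake_reset(void *ctx)
{
	((struct fake_phy *)ctx)->resets++;
}

static uint32_t fake_status(void *ctx)
{
	struct fake_phy *phy = ctx;

	return phy->resets >= phy->locks_after ? SKETCH_PLL_LOCK_BIT : 0;
}
```

Unlike the kernel handler, which returns void, this model reports whether lock was achieved. That is a design choice made for testability here, not something the patch does.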

6 years, 9 months


[PATCH AUTOSEL 3.18 1/6] iommu/ipmmu-vmsa: Fix crash on early domain free

by Sasha Levin

From: Geert Uytterhoeven <geert+renesas(a)glider.be>

[ Upstream commit e5b78f2e349eef5d4fca5dc1cf5a3b4b2cc27abd ]

If iommu_ops.add_device() fails, iommu_ops.domain_free() is still
called, leading to a crash, as the domain was only partially
initialized:

    ipmmu-vmsa e67b0000.mmu: Cannot accommodate DMA translation for IOMMU page tables
    sata_rcar ee300000.sata: Unable to initialize IPMMU context
    iommu: Failed to add device ee300000.sata to group 0: -22
    Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
    ...
    Call trace:
     ipmmu_domain_free+0x1c/0xa0
     iommu_group_release+0x48/0x68
     kobject_put+0x74/0xe8
     kobject_del.part.0+0x3c/0x50
     kobject_put+0x60/0xe8
     iommu_group_get_for_dev+0xa8/0x1f0
     ipmmu_add_device+0x1c/0x40
     of_iommu_configure+0x118/0x190

Fix this by checking if the domain's context already exists, before
trying to destroy it.

Signed-off-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
Reviewed-by: Robin Murphy <robin.murphy(a)arm.com>
Fixes: d25a2a16f0889 ('iommu: Add driver for Renesas VMSA-compatible IPMMU')
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 drivers/iommu/ipmmu-vmsa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 7dab5cbcc775..47e8db51288b 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -383,6 +383,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)

 static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
 {
+	if (!domain->mmu)
+		return;
+
 	/*
 	 * Disable the context. Flush the TLB as required when modifying the
 	 * context registers.
--
2.17.1
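
The fix follows a common teardown pattern: a destroy routine that may run on a partially initialized object must tolerate members that were never set. A minimal sketch of that pattern, with hypothetical types (`struct mmu_sketch`, `struct domain_sketch`) that are not the driver's structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the ipmmu-vmsa driver's types. */
struct mmu_sketch {
	unsigned int contexts_disabled;
};

struct domain_sketch {
	struct mmu_sketch *mmu;  /* stays NULL until context init succeeds */
};

/* Returns 1 if a context was torn down, 0 if there was nothing to do. */
static int domain_destroy_context(struct domain_sketch *domain)
{
	if (!domain->mmu)
		return 0;  /* init never completed: skip instead of dereferencing NULL */

	domain->mmu->contexts_disabled++;
	return 1;
}
```

Without the early return, the increment would dereference a NULL `mmu`, which is the same class of failure as the crash in the trace above.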

6 years, 9 months


[PATCH AUTOSEL 4.4 01/13] iommu/vt-d: Fix NULL pointer dereference in prq_event_thread()

by Sasha Levin

From: Lu Baolu <baolu.lu(a)linux.intel.com>

[ Upstream commit 19ed3e2dd8549c1a34914e8dad01b64e7837645a ]

When handling page request without pasid event, go to "no_pasid"
branch instead of "bad_req". Otherwise, a NULL pointer deference
will happen there.

Cc: Ashok Raj <ashok.raj(a)intel.com>
Cc: Jacob Pan <jacob.jun.pan(a)linux.intel.com>
Cc: Sohil Mehta <sohil.mehta(a)intel.com>
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Fixes: a222a7f0bb6c9 'iommu/vt-d: Implement page request handling'
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 drivers/iommu/intel-svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 10068a481e22..cbde03e509c1 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -558,7 +558,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 			pr_err("%s: Page request without PASID: %08llx %08llx\n",
 			       iommu->name, ((unsigned long long *)req)[0],
 			       ((unsigned long long *)req)[1]);
-			goto bad_req;
+			goto no_pasid;
 		}

 		if (!svm || svm->pasid != req->pasid) {
--
2.17.1

6 years, 9 months


[PATCH AUTOSEL 4.9 01/18] media: omap3isp: Unregister media device as first

by Sasha Levin

From: Sakari Ailus <sakari.ailus(a)linux.intel.com>

[ Upstream commit 30efae3d789cd0714ef795545a46749236e29558 ]

While there are issues related to object lifetime management, unregister
the media device first when the driver is being unbound. This is slightly
safer.

Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 drivers/media/platform/omap3isp/isp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
index 1e98b4845ea1..a21b12c5c085 100644
--- a/drivers/media/platform/omap3isp/isp.c
+++ b/drivers/media/platform/omap3isp/isp.c
@@ -1591,6 +1591,8 @@ static void isp_pm_complete(struct device *dev)

 static void isp_unregister_entities(struct isp_device *isp)
 {
+	media_device_unregister(&isp->media_dev);
+
 	omap3isp_csi2_unregister_entities(&isp->isp_csi2a);
 	omap3isp_ccp2_unregister_entities(&isp->isp_ccp2);
 	omap3isp_ccdc_unregister_entities(&isp->isp_ccdc);
@@ -1601,7 +1603,6 @@ static void isp_unregister_entities(struct isp_device *isp)
 	omap3isp_stat_unregister_entities(&isp->isp_hist);

 	v4l2_device_unregister(&isp->v4l2_dev);
-	media_device_unregister(&isp->media_dev);
 	media_device_cleanup(&isp->media_dev);
 }

--
2.17.1

6 years, 9 months


+ mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm/khugepaged: collapse_shmem() do not crash on Compound
has been added to the -mm tree.  Its filename is
     mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-khugepaged-collapse_shmem-do-no…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-khugepaged-collapse_shmem-do-no…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/khugepaged: collapse_shmem() do not crash on Compound

collapse_shmem()'s VM_BUG_ON_PAGE(PageTransCompound) was unsafe: before it
holds page lock of the first page, racing truncation then extension might
conceivably have inserted a hugepage there already.  Fail with the
SCAN_PAGE_COMPOUND result, instead of crashing (CONFIG_DEBUG_VM=y) or
otherwise mishandling the unexpected hugepage - though later we might code
up a more constructive way of handling it, with SCAN_SUCCESS.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261529310.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

--- a/mm/khugepaged.c~mm-khugepaged-collapse_shmem-do-not-crash-on-compound
+++ a/mm/khugepaged.c
@@ -1399,7 +1399,15 @@ static void collapse_shmem(struct mm_str
 			 */
 			VM_BUG_ON_PAGE(!PageLocked(page), page);
 			VM_BUG_ON_PAGE(!PageUptodate(page), page);
-			VM_BUG_ON_PAGE(PageTransCompound(page), page);
+
+			/*
+			 * If file was truncated then extended, or hole-punched, before
+			 * we locked the first page, then a THP might be there already.
+			 */
+			if (PageTransCompound(page)) {
+				result = SCAN_PAGE_COMPOUND;
+				goto out_unlock;
+			}

 			if (page_mapping(page) != mapping) {
 				result = SCAN_TRUNCATED;
_

Patches currently in -mm which might be from hughd(a)google.com are

mm-huge_memory-rename-freeze_page-to-unmap_page.patch
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
mm-khugepaged-minor-reorderings-in-collapse_shmem.patch
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch
mm-khugepaged-fix-the-xas_create_range-error-path.patch
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
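
The change above swaps a debug assertion for an ordinary failure path: a state that a race can legitimately produce is reported through a result code rather than treated as a bug. A minimal sketch of that idea follows, assuming hypothetical names (`struct page_sketch`, `check_page()`, and the `SK_SCAN_*` values are illustrative, not the kernel's definitions):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative result codes; the kernel's enum has many more values. */
enum scan_result_sketch {
	SK_SCAN_OK,
	SK_SCAN_PAGE_COMPOUND,
	SK_SCAN_TRUNCATED
};

/* Hypothetical page state, reduced to the two properties checked here. */
struct page_sketch {
	bool compound;         /* a huge page is already present */
	bool mapping_matches;  /* page still belongs to the expected mapping */
};

/* Report racy-but-possible states as results instead of asserting on them. */
static enum scan_result_sketch check_page(const struct page_sketch *page)
{
	if (page->compound)
		return SK_SCAN_PAGE_COMPOUND;
	if (!page->mapping_matches)
		return SK_SCAN_TRUNCATED;
	return SK_SCAN_OK;
}
```

The caller can then unlock and bail out on any non-OK result, which corresponds to the `goto out_unlock` path in the diff.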

6 years, 9 months


+ mm-khugepaged-collapse_shmem-without-freezing-new_page.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm/khugepaged: collapse_shmem() without freezing new_page
has been added to the -mm tree. Its filename is
     mm-khugepaged-collapse_shmem-without-freezing-new_page.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-khugepaged-collapse_shmem-witho…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-khugepaged-collapse_shmem-witho…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/khugepaged: collapse_shmem() without freezing new_page

khugepaged's collapse_shmem() does almost all of its work, to assemble the
huge new_page from 512 scattered old pages, with the new_page's refcount
frozen to 0 (and refcounts of all old pages so far also frozen to 0).
Including shmem_getpage() to read in any which were out on swap, memory
reclaim if necessary to allocate their intermediate pages, and copying
over all the data from old to new.

Imagine the frozen refcount as a spinlock held, but without any lock
debugging to highlight the abuse: it's not good, and under serious load
heads into lockups - speculative getters of the page are not expecting to
spin while khugepaged is rescheduled.

One can get a little further under load by hacking around elsewhere; but
fortunately, freezing the new_page turns out to have been entirely
unnecessary, with no hacks needed elsewhere.

The huge new_page lock is already held throughout, and guards all its
subpages as they are brought one by one into the page cache tree; and
anything reading the data in that page, without the lock, before it has
been marked PageUptodate, would already be in the wrong.

So simply eliminate the freezing of the new_page.

Each of the old pages remains frozen with refcount 0 after it has been
replaced by a new_page subpage in the page cache tree, until they are all
unfrozen on success or failure: just as before. They could be unfrozen
sooner, but cause no problem once no longer visible to find_get_entry(),
filemap_map_pages() and other speculative lookups.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

--- a/mm/khugepaged.c~mm-khugepaged-collapse_shmem-without-freezing-new_page
+++ a/mm/khugepaged.c
@@ -1287,7 +1287,7 @@ static void retract_page_tables(struct a
  * collapse_shmem - collapse small tmpfs/shmem pages into huge one.
  *
  * Basic scheme is simple, details are more complex:
- *  - allocate and freeze a new huge page;
+ *  - allocate and lock a new huge page;
  *  - scan page cache replacing old pages with the new one
  *    + swap in pages if necessary;
  *    + fill in gaps;
@@ -1295,11 +1295,11 @@ static void retract_page_tables(struct a
  *  - if replacing succeeds:
  *    + copy data over;
  *    + free old pages;
- *    + unfreeze huge page;
+ *    + unlock huge page;
  *  - if replacing failed;
  *    + put all pages back and unfreeze them;
  *    + restore gaps in the page cache;
- *    + free huge page;
+ *    + unlock and free huge page;
  */
 static void collapse_shmem(struct mm_struct *mm,
 		struct address_space *mapping, pgoff_t start,
@@ -1333,13 +1333,11 @@ static void collapse_shmem(struct mm_str
 	__SetPageSwapBacked(new_page);
 	new_page->index = start;
 	new_page->mapping = mapping;
-	BUG_ON(!page_ref_freeze(new_page, 1));
 
 	/*
-	 * At this point the new_page is 'frozen' (page_count() is zero),
-	 * locked and not up-to-date. It's safe to insert it into the page
-	 * cache, because nobody would be able to map it or use it in other
-	 * way until we unfreeze it.
+	 * At this point the new_page is locked and not up-to-date.
+	 * It's safe to insert it into the page cache, because nobody would
+	 * be able to map it or use it in another way until we unlock it.
 	 */
 
 	/* This will be less messy when we use multi-index entries */
@@ -1491,9 +1489,8 @@ xa_unlocked:
 			index++;
 		}
 
-		/* Everything is ready, let's unfreeze the new_page */
 		SetPageUptodate(new_page);
-		page_ref_unfreeze(new_page, HPAGE_PMD_NR);
+		page_ref_add(new_page, HPAGE_PMD_NR - 1);
 		set_page_dirty(new_page);
 		mem_cgroup_commit_charge(new_page, memcg, false, true);
 		lru_cache_add_anon(new_page);
@@ -1541,8 +1538,6 @@ xa_unlocked:
 		VM_BUG_ON(nr_none);
 		xas_unlock_irq(&xas);
 
-		/* Unfreeze new_page, caller would take care about freeing it */
-		page_ref_unfreeze(new_page, 1);
 		mem_cgroup_cancel_charge(new_page, memcg, true);
 		new_page->mapping = NULL;
 	}
_

Patches currently in -mm which might be from hughd(a)google.com are

mm-huge_memory-rename-freeze_page-to-unmap_page.patch
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
mm-khugepaged-minor-reorderings-in-collapse_shmem.patch
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch
mm-khugepaged-fix-the-xas_create_range-error-path.patch
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
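
[Editorial sketch, not part of the patch] The "frozen refcount as a spinlock" remark in the message above can be illustrated with a small userspace model. Everything named toy_* below is hypothetical and only loosely mirrors page_ref_freeze(), page_ref_unfreeze() and a get-unless-zero style speculative lookup; it uses C11 atomics rather than kernel primitives and assumes nothing about the real struct page layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the reference count. */
struct toy_page {
	atomic_int refcount;
};

/* Succeeds only if the count is exactly `expected`, then forces it to 0. */
static bool toy_ref_freeze(struct toy_page *p, int expected)
{
	return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

/* Restore a frozen count; only the thread that froze it may call this. */
static void toy_ref_unfreeze(struct toy_page *p, int count)
{
	assert(atomic_load(&p->refcount) == 0);
	atomic_store(&p->refcount, count);
}

/* Speculative lookup: take a reference unless the count is 0 (frozen). */
static bool toy_get_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure `old` is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

In this model a lookup that finds the count at 0 fails immediately; a caller that retries until it succeeds is effectively spinning on the freezer, which is the behaviour the message warns about when the freezer may sleep or be rescheduled.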

6 years, 9 months


+ mm-khugepaged-minor-reorderings-in-collapse_shmem.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm/khugepaged: minor reorderings in collapse_shmem()
has been added to the -mm tree. Its filename is
     mm-khugepaged-minor-reorderings-in-collapse_shmem.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-khugepaged-minor-reorderings-in…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-khugepaged-minor-reorderings-in…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/khugepaged: minor reorderings in collapse_shmem()

Several cleanups in collapse_shmem(): most of which probably do not really
matter, beyond doing things in a more familiar and reassuring order.
Simplify the failure gotos in the main loop, and on success update stats
while interrupts still disabled from the last iteration.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261526400.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

--- a/mm/khugepaged.c~mm-khugepaged-minor-reorderings-in-collapse_shmem
+++ a/mm/khugepaged.c
@@ -1329,10 +1329,10 @@ static void collapse_shmem(struct mm_str
 		goto out;
 	}
 
+	__SetPageLocked(new_page);
+	__SetPageSwapBacked(new_page);
 	new_page->index = start;
 	new_page->mapping = mapping;
-	__SetPageSwapBacked(new_page);
-	__SetPageLocked(new_page);
 	BUG_ON(!page_ref_freeze(new_page, 1));
 
 	/*
@@ -1366,13 +1366,13 @@ static void collapse_shmem(struct mm_str
 			if (index == start) {
 				if (!xas_next_entry(&xas, end - 1)) {
 					result = SCAN_TRUNCATED;
-					break;
+					goto xa_locked;
 				}
 				xas_set(&xas, index);
 			}
 			if (!shmem_charge(mapping->host, 1)) {
 				result = SCAN_FAIL;
-				break;
+				goto xa_locked;
 			}
 			xas_store(&xas, new_page + (index % HPAGE_PMD_NR));
 			nr_none++;
@@ -1387,13 +1387,12 @@ static void collapse_shmem(struct mm_str
 				result = SCAN_FAIL;
 				goto xa_unlocked;
 			}
-			xas_lock_irq(&xas);
-			xas_set(&xas, index);
 		} else if (trylock_page(page)) {
 			get_page(page);
+			xas_unlock_irq(&xas);
 		} else {
 			result = SCAN_PAGE_LOCK;
-			break;
+			goto xa_locked;
 		}
 
 		/*
@@ -1408,11 +1407,10 @@ static void collapse_shmem(struct mm_str
 			result = SCAN_TRUNCATED;
 			goto out_unlock;
 		}
-		xas_unlock_irq(&xas);
 
 		if (isolate_lru_page(page)) {
 			result = SCAN_DEL_PAGE_LRU;
-			goto out_isolate_failed;
+			goto out_unlock;
 		}
 
 		if (page_mapped(page))
@@ -1432,7 +1430,9 @@ static void collapse_shmem(struct mm_str
 		 */
 		if (!page_ref_freeze(page, 3)) {
 			result = SCAN_PAGE_COUNT;
-			goto out_lru;
+			xas_unlock_irq(&xas);
+			putback_lru_page(page);
+			goto out_unlock;
 		}
 
 		/*
@@ -1444,24 +1444,26 @@ static void collapse_shmem(struct mm_str
 		/* Finally, replace with the new page. */
 		xas_store(&xas, new_page + (index % HPAGE_PMD_NR));
 		continue;
-out_lru:
-		xas_unlock_irq(&xas);
-		putback_lru_page(page);
-out_isolate_failed:
-		unlock_page(page);
-		put_page(page);
-		goto xa_unlocked;
 out_unlock:
 		unlock_page(page);
 		put_page(page);
-		break;
+		goto xa_unlocked;
+	}
+
+	__inc_node_page_state(new_page, NR_SHMEM_THPS);
+	if (nr_none) {
+		struct zone *zone = page_zone(new_page);
+
+		__mod_node_page_state(zone->zone_pgdat, NR_FILE_PAGES, nr_none);
+		__mod_node_page_state(zone->zone_pgdat, NR_SHMEM, nr_none);
 	}
-	xas_unlock_irq(&xas);
 
+xa_locked:
+	xas_unlock_irq(&xas);
 xa_unlocked:
+
 	if (result == SCAN_SUCCEED) {
 		struct page *page, *tmp;
-		struct zone *zone = page_zone(new_page);
 
 		/*
 		 * Replacing old pages with new one has succeeded, now we
@@ -1476,11 +1478,11 @@ xa_unlocked:
 			copy_highpage(new_page + (page->index % HPAGE_PMD_NR),
 					page);
 			list_del(&page->lru);
-			unlock_page(page);
-			page_ref_unfreeze(page, 1);
 			page->mapping = NULL;
+			page_ref_unfreeze(page, 1);
 			ClearPageActive(page);
 			ClearPageUnevictable(page);
+			unlock_page(page);
 			put_page(page);
 			index++;
 		}
@@ -1489,28 +1491,17 @@ xa_unlocked:
 			index++;
 		}
 
-		local_irq_disable();
-		__inc_node_page_state(new_page, NR_SHMEM_THPS);
-		if (nr_none) {
-			__mod_node_page_state(zone->zone_pgdat, NR_FILE_PAGES, nr_none);
-			__mod_node_page_state(zone->zone_pgdat, NR_SHMEM, nr_none);
-		}
-		local_irq_enable();
-
-		/*
-		 * Remove pte page tables, so we can re-fault
-		 * the page as huge.
-		 */
-		retract_page_tables(mapping, start);
-
 		/* Everything is ready, let's unfreeze the new_page */
-		set_page_dirty(new_page);
 		SetPageUptodate(new_page);
 		page_ref_unfreeze(new_page, HPAGE_PMD_NR);
+		set_page_dirty(new_page);
 		mem_cgroup_commit_charge(new_page, memcg, false, true);
 		lru_cache_add_anon(new_page);
-		unlock_page(new_page);
 
+		/*
+		 * Remove pte page tables, so we can re-fault the page as huge.
+		 */
+		retract_page_tables(mapping, start);
 		*hpage = NULL;
 
 		khugepaged_pages_collapsed++;
@@ -1543,8 +1534,8 @@ xa_unlocked:
 			xas_store(&xas, page);
 			xas_pause(&xas);
 			xas_unlock_irq(&xas);
-			putback_lru_page(page);
 			unlock_page(page);
+			putback_lru_page(page);
 			xas_lock_irq(&xas);
 		}
 		VM_BUG_ON(nr_none);
@@ -1553,9 +1544,10 @@ xa_unlocked:
 		/* Unfreeze new_page, caller would take care about freeing it */
 		page_ref_unfreeze(new_page, 1);
 		mem_cgroup_cancel_charge(new_page, memcg, true);
-		unlock_page(new_page);
 		new_page->mapping = NULL;
 	}
+
+	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
 	/* TODO: tracepoints */
_

Patches currently in -mm which might be from hughd(a)google.com are

mm-huge_memory-rename-freeze_page-to-unmap_page.patch
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
mm-khugepaged-minor-reorderings-in-collapse_shmem.patch
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch
mm-khugepaged-fix-the-xas_create_range-error-path.patch
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch

6 years, 9 months


+ mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm/khugepaged: collapse_shmem() remember to clear holes
has been added to the -mm tree. Its filename is
     mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-khugepaged-collapse_shmem-remem…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-khugepaged-collapse_shmem-remem…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/khugepaged: collapse_shmem() remember to clear holes

Huge tmpfs testing reminds us that there is no __GFP_ZERO in the gfp flags
khugepaged uses to allocate a huge page - in all common cases it would
just be a waste of effort - so collapse_shmem() must remember to clear out
any holes that it instantiates.

The obvious place to do so, where they are put into the page cache tree,
is not a good choice: because interrupts are disabled there. Leave it
until further down, once success is assured, where the other pages are
copied (before setting PageUptodate).

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261525080.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

--- a/mm/khugepaged.c~mm-khugepaged-collapse_shmem-remember-to-clear-holes
+++ a/mm/khugepaged.c
@@ -1467,7 +1467,12 @@ xa_unlocked:
 		 * Replacing old pages with new one has succeeded, now we
 		 * need to copy the content and free the old pages.
 		 */
+		index = start;
 		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+			while (index < page->index) {
+				clear_highpage(new_page + (index % HPAGE_PMD_NR));
+				index++;
+			}
 			copy_highpage(new_page + (page->index % HPAGE_PMD_NR),
 					page);
 			list_del(&page->lru);
@@ -1477,6 +1482,11 @@ xa_unlocked:
 			ClearPageActive(page);
 			ClearPageUnevictable(page);
 			put_page(page);
+			index++;
+		}
+		while (index < end) {
+			clear_highpage(new_page + (index % HPAGE_PMD_NR));
+			index++;
 		}
 
 		local_irq_disable();
_

Patches currently in -mm which might be from hughd(a)google.com are

mm-huge_memory-rename-freeze_page-to-unmap_page.patch
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
mm-khugepaged-minor-reorderings-in-collapse_shmem.patch
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch
mm-khugepaged-fix-the-xas_create_range-error-path.patch
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
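
[Editorial sketch, not part of the patch] The copy-and-clear walk added by the diff above can be modelled in userspace. The toy_* names, the tiny page size and the array-based "huge page" are all assumptions made for illustration; the kernel code walks a list of struct page and uses clear_highpage()/copy_highpage() instead of memset()/memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_SUBPAGES 8	/* stand-in for HPAGE_PMD_NR (512 in the text) */
#define TOY_PAGE_SIZE 16	/* stand-in for PAGE_SIZE */

struct toy_old_page {
	size_t index;			/* offset of the page in the file */
	unsigned char data[TOY_PAGE_SIZE];
};

/*
 * Fill `huge` for the extent [start, end): copy each old page to its slot
 * and zero every slot that has no old page (a hole). `old` must be sorted
 * by ascending index, with all indices inside [start, end).
 */
static void toy_copy_and_clear(unsigned char huge[][TOY_PAGE_SIZE],
			       const struct toy_old_page *old, size_t nr_old,
			       size_t start, size_t end)
{
	size_t index = start;

	for (size_t i = 0; i < nr_old; i++) {
		assert(old[i].index >= index && old[i].index < end);
		/* Holes before this old page: the allocation was not zeroed. */
		while (index < old[i].index) {
			memset(huge[index % TOY_NR_SUBPAGES], 0, TOY_PAGE_SIZE);
			index++;
		}
		memcpy(huge[index % TOY_NR_SUBPAGES], old[i].data, TOY_PAGE_SIZE);
		index++;
	}
	/* Trailing holes after the last old page. */
	while (index < end) {
		memset(huge[index % TOY_NR_SUBPAGES], 0, TOY_PAGE_SIZE);
		index++;
	}
}
```

The single running `index` is what lets one pass handle leading, interior and trailing holes without a second scan of the extent.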

6 years, 9 months


+ mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm/khugepaged: fix crashes due to misaccounted holes
has been added to the -mm tree. Its filename is
     mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-khugepaged-fix-crashes-due-to-m…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-khugepaged-fix-crashes-due-to-m…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/khugepaged: fix crashes due to misaccounted holes

Huge tmpfs testing on a shortish file mapped into a pmd-rounded extent hit
shmem_evict_inode()'s WARN_ON(inode->i_blocks) followed by clear_inode()'s
BUG_ON(inode->i_data.nrpages) when the file was later closed and unlinked.

khugepaged's collapse_shmem() was forgetting to update mapping->nrpages on
the rollback path, after it had added but then needs to undo some holes.

There is indeed an irritating asymmetry between shmem_charge(), whose
callers want it to increment nrpages after successfully accounting blocks,
and shmem_uncharge(), when __delete_from_page_cache() already decremented
nrpages itself: oh well, just add a comment on that to them both.

And shmem_recalc_inode() is supposed to be called when the accounting is
expected to be in balance (so it can deduce from imbalance that reclaim
discarded some pages): so change shmem_charge() to update nrpages earlier
(though it's rare for the difference to matter at all).

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261523450.2275@eggly.anvils
Fixes: 800d8c63b2e98 ("shmem: add huge pages support")
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

--- a/mm/khugepaged.c~mm-khugepaged-fix-crashes-due-to-misaccounted-holes
+++ a/mm/khugepaged.c
@@ -1506,9 +1506,12 @@ xa_unlocked:
 		khugepaged_pages_collapsed++;
 	} else {
 		struct page *page;
+
 		/* Something went wrong: roll back page cache changes */
-		shmem_uncharge(mapping->host, nr_none);
 		xas_lock_irq(&xas);
+		mapping->nrpages -= nr_none;
+		shmem_uncharge(mapping->host, nr_none);
+
 		xas_set(&xas, start);
 		xas_for_each(&xas, page, end - 1) {
 			page = list_first_entry_or_null(&pagelist,
--- a/mm/shmem.c~mm-khugepaged-fix-crashes-due-to-misaccounted-holes
+++ a/mm/shmem.c
@@ -297,12 +297,14 @@ bool shmem_charge(struct inode *inode, l
 	if (!shmem_inode_acct_block(inode, pages))
 		return false;
 
+	/* nrpages adjustment first, then shmem_recalc_inode() when balanced */
+	inode->i_mapping->nrpages += pages;
+
 	spin_lock_irqsave(&info->lock, flags);
 	info->alloced += pages;
 	inode->i_blocks += pages * BLOCKS_PER_PAGE;
 	shmem_recalc_inode(inode);
 	spin_unlock_irqrestore(&info->lock, flags);
-	inode->i_mapping->nrpages += pages;
 
 	return true;
 }
@@ -312,6 +314,8 @@ void shmem_uncharge(struct inode *inode,
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	unsigned long flags;
 
+	/* nrpages adjustment done by __delete_from_page_cache() or caller */
+
 	spin_lock_irqsave(&info->lock, flags);
 	info->alloced -= pages;
 	inode->i_blocks -= pages * BLOCKS_PER_PAGE;
_

Patches currently in -mm which might be from hughd(a)google.com are

mm-huge_memory-rename-freeze_page-to-unmap_page.patch
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
mm-khugepaged-minor-reorderings-in-collapse_shmem.patch
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch
mm-khugepaged-fix-the-xas_create_range-error-path.patch
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
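
[Editorial sketch, not part of the patch] The charge/uncharge asymmetry described in the message above can be shown with three plain counters. The toy_* names are hypothetical, there is no locking, and TOY_BLOCKS_PER_PAGE assumes 4096-byte pages with 512-byte blocks; it only models who is responsible for nrpages.

```c
#include <assert.h>

/* Hypothetical counters mirroring the three values named in the text. */
struct toy_inode {
	long nrpages;	/* models mapping->nrpages */
	long alloced;	/* models info->alloced */
	long blocks;	/* models inode->i_blocks */
};

#define TOY_BLOCKS_PER_PAGE 8	/* assumed: 4096-byte pages, 512-byte blocks */

/* Like shmem_charge(): the callee bumps nrpages on behalf of its caller. */
static void toy_charge(struct toy_inode *inode, long pages)
{
	inode->nrpages += pages;
	inode->alloced += pages;
	inode->blocks += pages * TOY_BLOCKS_PER_PAGE;
}

/* Like shmem_uncharge(): nrpages is deliberately left to the caller. */
static void toy_uncharge(struct toy_inode *inode, long pages)
{
	inode->alloced -= pages;
	inode->blocks -= pages * TOY_BLOCKS_PER_PAGE;
}

/* Rollback of nr_none instantiated holes, as in the fixed error path. */
static void toy_rollback_holes(struct toy_inode *inode, long nr_none)
{
	inode->nrpages -= nr_none;
	toy_uncharge(inode, nr_none);
}
```

Calling toy_uncharge() alone after three toy_charge() calls leaves nrpages at 3, which corresponds to the leftover count that later tripped BUG_ON(inode->i_data.nrpages) in the report; toy_rollback_holes() returns all three counters to zero.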

6 years, 9 months


+ mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm/khugepaged: collapse_shmem() stop if punched or truncated
has been added to the -mm tree. Its filename is
     mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-khugepaged-collapse_shmem-stop-…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-khugepaged-collapse_shmem-stop-…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/khugepaged: collapse_shmem() stop if punched or truncated

Huge tmpfs testing showed that although collapse_shmem() recognizes a
concurrently truncated or hole-punched page correctly, its handling of
holes was liable to refill an emptied extent. Add check to stop that.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261522040.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

--- a/mm/khugepaged.c~mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated
+++ a/mm/khugepaged.c
@@ -1359,6 +1359,17 @@ static void collapse_shmem(struct mm_str
 
 		VM_BUG_ON(index != xas.xa_index);
 		if (!page) {
+			/*
+			 * Stop if extent has been truncated or hole-punched,
+			 * and is now completely empty.
+			 */
+			if (index == start) {
+				if (!xas_next_entry(&xas, end - 1)) {
+					result = SCAN_TRUNCATED;
+					break;
+				}
+				xas_set(&xas, index);
+			}
 			if (!shmem_charge(mapping->host, 1)) {
 				result = SCAN_FAIL;
 				break;
_

Patches currently in -mm which might be from hughd(a)google.com are

mm-huge_memory-rename-freeze_page-to-unmap_page.patch
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
mm-khugepaged-minor-reorderings-in-collapse_shmem.patch
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch
mm-khugepaged-fix-the-xas_create_range-error-path.patch
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch

6 years, 9 months


+ mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
has been added to the -mm tree. Its filename is
     mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-huge_memory-fix-lockdep-complai…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-huge_memory-fix-lockdep-complai…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()

Huge tmpfs testing, on 32-bit kernel with lockdep enabled, showed that
__split_huge_page() was using i_size_read() while holding the irq-safe
lru_lock and page tree lock, but the 32-bit i_size_read() uses an
irq-unsafe seqlock which should not be nested inside them.

Instead, read the i_size earlier in split_huge_page_to_list(), and pass
the end offset down to __split_huge_page(): all while holding head page
lock, which is enough to prevent truncation of that extent before the page
tree lock has been taken.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261520070.2275@eggly.anvils
Fixes: baa355fd33142 ("thp: file pages support for split_huge_page()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

--- a/mm/huge_memory.c~mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read
+++ a/mm/huge_memory.c
@@ -2439,12 +2439,11 @@ static void __split_huge_page_tail(struc
 }
 
 static void __split_huge_page(struct page *page, struct list_head *list,
-		unsigned long flags)
+		pgoff_t end, unsigned long flags)
 {
 	struct page *head = compound_head(page);
 	struct zone *zone = page_zone(head);
 	struct lruvec *lruvec;
-	pgoff_t end = -1;
 	int i;
 
 	lruvec = mem_cgroup_page_lruvec(head, zone->zone_pgdat);
@@ -2452,9 +2451,6 @@ static void __split_huge_page(struct pag
 	/* complete memcg works before add pages to LRU */
 	mem_cgroup_split_huge_fixup(head);
 
-	if (!PageAnon(page))
-		end = DIV_ROUND_UP(i_size_read(head->mapping->host), PAGE_SIZE);
-
 	for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond i_size: drop them from page cache */
@@ -2626,6 +2622,7 @@ int split_huge_page_to_list(struct page
 	int count, mapcount, extra_pins, ret;
 	bool mlocked;
 	unsigned long flags;
+	pgoff_t end;
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -2648,6 +2645,7 @@ int split_huge_page_to_list(struct page
 			ret = -EBUSY;
 			goto out;
 		}
+		end = -1;
 		mapping = NULL;
 		anon_vma_lock_write(anon_vma);
 	} else {
@@ -2661,6 +2659,15 @@ int split_huge_page_to_list(struct page
 
 		anon_vma = NULL;
 		i_mmap_lock_read(mapping);
+
+		/*
+		 *__split_huge_page() may need to trim off pages beyond EOF:
+		 * but on 32-bit, i_size_read() takes an irq-unsafe seqlock,
+		 * which cannot be nested inside the page tree lock. So note
+		 * end now: i_size itself may be changed at any moment, but
+		 * head page lock is good enough to serialize the trimming.
+		 */
+		end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE);
 	}
 
 	/*
@@ -2707,7 +2714,7 @@ int split_huge_page_to_list(struct page
 		if (mapping)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
 		spin_unlock(&pgdata->split_queue_lock);
-		__split_huge_page(page, list, flags);
+		__split_huge_page(page, list, end, flags);
 		if (PageSwapCache(head)) {
 			swp_entry_t entry = { .val = page_private(head) };
 
_

Patches currently in -mm which might be from hughd(a)google.com are

mm-huge_memory-rename-freeze_page-to-unmap_page.patch
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
mm-khugepaged-minor-reorderings-in-collapse_shmem.patch
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch
mm-khugepaged-fix-the-xas_create_range-error-path.patch
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
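
[Editorial sketch, not part of the patch] The `end` value that the patch above computes before taking the page tree lock is the index of the first page lying wholly beyond EOF. A worked userspace version follows; the toy_* names and the 4096-byte page size are assumptions, and the `index >= end` test for trimming is inferred from the "beyond i_size" comment in the diff context rather than quoted from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumed page size */
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Index of the first page wholly beyond EOF for a file of i_size bytes. */
static uint64_t toy_end_index(uint64_t i_size)
{
	return TOY_DIV_ROUND_UP(i_size, TOY_PAGE_SIZE);
}

/* A tail page at file offset `index` is trimmed when it lies beyond EOF. */
static int toy_beyond_eof(uint64_t index, uint64_t end)
{
	return index >= end;
}
```

For a 4097-byte file the result is 2, so pages 0 and 1 are kept and page 2 onwards would be trimmed. The anonymous-page case in the patch sets `end = -1`, the largest pgoff_t value, which in this model (UINT64_MAX) means no realistic index is ever treated as beyond EOF.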

6 years, 9 months


+ mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm/huge_memory: splitting set mapping+index before unfreeze
has been added to the -mm tree.  Its filename is
     mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-huge_memory-splitting-set-mappi…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-huge_memory-splitting-set-mappi…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/huge_memory: splitting set mapping+index before unfreeze

Huge tmpfs stress testing has occasionally hit shmem_undo_range()'s
VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page).

Move the setting of mapping and index up before the page_ref_unfreeze()
in __split_huge_page_tail() to fix this: so that a page cache lookup
cannot get a reference while the tail's mapping and index are unstable.

In fact, might as well move them up before the smp_wmb(): I don't see an
actual need for that, but if I'm missing something, this way round is
safer than the other, and no less efficient.

You might argue that VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page) is
misplaced, and should be left until after the trylock_page(); but left as
is has not crashed since, and gives more stringent assurance.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261516380.2275@eggly.anvils
Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Requires: 605ca5ede764 ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

--- a/mm/huge_memory.c~mm-huge_memory-splitting-set-mappingindex-before-unfreeze
+++ a/mm/huge_memory.c
@@ -2402,6 +2402,12 @@ static void __split_huge_page_tail(struc
 			 (1L << PG_unevictable) |
 			 (1L << PG_dirty)));

+	/* ->mapping in first tail page is compound_mapcount */
+	VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING,
+			page_tail);
+	page_tail->mapping = head->mapping;
+	page_tail->index = head->index + tail;
+
 	/* Page flags must be visible before we make the page non-compound. */
 	smp_wmb();

@@ -2422,12 +2428,6 @@ static void __split_huge_page_tail(struc
 	if (page_is_idle(head))
 		set_page_idle(page_tail);

-	/* ->mapping in first tail page is compound_mapcount */
-	VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING,
-			page_tail);
-	page_tail->mapping = head->mapping;
-
-	page_tail->index = head->index + tail;
 	page_cpupid_xchg_last(page_tail, page_cpupid_last(head));

 	/*
_

Patches currently in -mm which might be from hughd(a)google.com are

mm-huge_memory-rename-freeze_page-to-unmap_page.patch
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
mm-khugepaged-minor-reorderings-in-collapse_shmem.patch
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch
mm-khugepaged-fix-the-xas_create_range-error-path.patch
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
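
A rough userspace illustration of the ordering argument in the changelog above, not kernel code: the sketch models a frozen tail page with a C11 atomic refcount. The names toy_page, toy_get_unless_zero() and toy_split_tail() are invented for this example, and the kernel's page_ref_unfreeze() and page cache lookup are more involved. The only point is that mapping and index are written before the release store that makes the refcount non-zero, so a lookup that manages to take a reference cannot see unstable fields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tail page; not the kernel's struct page. */
struct toy_page {
    atomic_int refcount;    /* 0 means "frozen": lookups must not take a ref */
    void *mapping;
    unsigned long index;
};

/* Lookup side: take a reference only if the page is not frozen. */
static bool toy_get_unless_zero(struct toy_page *p)
{
    int old = atomic_load_explicit(&p->refcount, memory_order_relaxed);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak_explicit(&p->refcount, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}

/* Split side: fill in mapping and index first, then unfreeze with release. */
static void toy_split_tail(struct toy_page *tail, void *head_mapping,
                           unsigned long head_index, unsigned long nr)
{
    tail->mapping = head_mapping;
    tail->index = head_index + nr;
    atomic_store_explicit(&tail->refcount, 1, memory_order_release);
}

/* Single-threaded walk through the two states. */
void toy_split_selfcheck(void)
{
    struct toy_page tail;
    int mapping_token;      /* any address works as a fake mapping */

    atomic_init(&tail.refcount, 0);
    tail.mapping = NULL;
    tail.index = 0;

    assert(!toy_get_unless_zero(&tail));    /* frozen: lookup fails */
    toy_split_tail(&tail, &mapping_token, 512, 3);
    assert(toy_get_unless_zero(&tail));     /* unfrozen: lookup succeeds */
    assert(tail.mapping == &mapping_token && tail.index == 515);
}
```

With the pre-patch ordering (unfreeze first, fields afterwards) a concurrent toy_get_unless_zero() could succeed while index still held a stale value, which is the window the VM_BUG_ON_PAGE() was catching.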

6 years, 9 months

+ mm-huge_memory-rename-freeze_page-to-unmap_page.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm/huge_memory: rename freeze_page() to unmap_page()
has been added to the -mm tree.  Its filename is
     mm-huge_memory-rename-freeze_page-to-unmap_page.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-huge_memory-rename-freeze_page-…
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-huge_memory-rename-freeze_page-…

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/huge_memory: rename freeze_page() to unmap_page()

The term "freeze" is used in several ways in the kernel, and in mm it has
the particular meaning of forcing page refcount temporarily to 0.
freeze_page() is just too confusing a name for a function that unmaps a
page: rename it unmap_page(), and rename unfreeze_page() remap_page().

Went to change the mention of freeze_page() added later in mm/rmap.c, but
found it to be incorrect: ordinary page reclaim reaches there too; but the
substance of the comment still seems correct, so edit it down.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261514080.2275@eggly.anvils
Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

--- a/mm/huge_memory.c~mm-huge_memory-rename-freeze_page-to-unmap_page
+++ a/mm/huge_memory.c
@@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_are
 	}
 }

-static void freeze_page(struct page *page)
+static void unmap_page(struct page *page)
 {
 	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
 		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
@@ -2365,7 +2365,7 @@ static void freeze_page(struct page *pag
 	VM_BUG_ON_PAGE(!unmap_success, page);
 }

-static void unfreeze_page(struct page *page)
+static void remap_page(struct page *page)
 {
 	int i;
 	if (PageTransHuge(page)) {
@@ -2483,7 +2483,7 @@ static void __split_huge_page(struct pag

 	spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);

-	unfreeze_page(head);
+	remap_page(head);

 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		struct page *subpage = head + i;
@@ -2664,7 +2664,7 @@ int split_huge_page_to_list(struct page
 	}

 	/*
-	 * Racy check if we can split the page, before freeze_page() will
+	 * Racy check if we can split the page, before unmap_page() will
 	 * split PMDs
 	 */
 	if (!can_split_huge_page(head, &extra_pins)) {
@@ -2673,7 +2673,7 @@ int split_huge_page_to_list(struct page
 	}

 	mlocked = PageMlocked(page);
-	freeze_page(head);
+	unmap_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);

 	/* Make sure the page is not on per-CPU pagevec as it takes pin */
@@ -2727,7 +2727,7 @@ int split_huge_page_to_list(struct page
 fail:		if (mapping)
 			xa_unlock(&mapping->i_pages);
 		spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
-		unfreeze_page(head);
+		remap_page(head);
 		ret = -EBUSY;
 	}

--- a/mm/rmap.c~mm-huge_memory-rename-freeze_page-to-unmap_page
+++ a/mm/rmap.c
@@ -1627,16 +1627,9 @@ static bool try_to_unmap_one(struct page
 						      address + PAGE_SIZE);
 		} else {
 			/*
-			 * We should not need to notify here as we reach this
-			 * case only from freeze_page() itself only call from
-			 * split_huge_page_to_list() so everything below must
-			 * be true:
-			 *   - page is not anonymous
-			 *   - page is locked
-			 *
-			 * So as it is a locked file back page thus it can not
-			 * be remove from the page cache and replace by a new
-			 * page before mmu_notifier_invalidate_range_end so no
+			 * This is a locked file-backed page, thus it cannot
+			 * be removed from the page cache and replaced by a new
+			 * page before mmu_notifier_invalidate_range_end, so no
 			 * concurrent thread might update its page table to
 			 * point at new page while a device still is using this
 			 * page.
_

Patches currently in -mm which might be from hughd(a)google.com are

mm-huge_memory-rename-freeze_page-to-unmap_page.patch
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
mm-khugepaged-minor-reorderings-in-collapse_shmem.patch
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch
mm-khugepaged-fix-the-xas_create_range-error-path.patch
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch

6 years, 9 months

[PATCH] scsi: storvsc: Fix a race in sub-channel creation that can cause panic

by kys＠linuxonhyperv.com

From: Dexuan Cui <decui(a)microsoft.com>

We can concurrently try to open the same sub-channel from 2 paths:

path #1: vmbus_onoffer() -> vmbus_process_offer() -> handle_sc_creation().
path #2: storvsc_probe() -> storvsc_connect_to_vsp() ->
	-> storvsc_channel_init() -> handle_multichannel_storage() ->
	-> vmbus_are_subchannels_present() -> handle_sc_creation().

They conflict with each other, but it was not an issue before the recent
commit ae6935ed7d42 ("vmbus: split ring buffer allocation from open"),
because at the beginning of vmbus_open() we checked newchannel->state so
only one path could succeed, and the other would return with -EINVAL.

After ae6935ed7d42, the failing path frees the channel's ringbuffer by
vmbus_free_ring(), and this causes a panic later.

Commit ae6935ed7d42 itself is good, and it just reveals the longstanding
race. We can resolve the issue by removing path #2, i.e. removing the
second vmbus_are_subchannels_present() in handle_multichannel_storage().

BTW, the comment "Check to see if sub-channels have already been created"
in handle_multichannel_storage() is incorrect: when we unload the driver,
we first close the sub-channel(s) and then close the primary channel, next
the host sends rescind-offer message(s) so primary->sc_list will become
empty. This means the first vmbus_are_subchannels_present() in
handle_multichannel_storage() is never useful.

Fixes: ae6935ed7d42 ("vmbus: split ring buffer allocation from open")
Cc: stable(a)vger.kernel.org
Cc: Long Li <longli(a)microsoft.com>
Cc: Stephen Hemminger <sthemmin(a)microsoft.com>
Cc: K. Y. Srinivasan <kys(a)microsoft.com>
Cc: Haiyang Zhang <haiyangz(a)microsoft.com>
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys(a)microsoft.com>
---
 drivers/scsi/storvsc_drv.c | 61 +++++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index f03dc03a42c3..8f88348ebe42 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -446,7 +446,6 @@ struct storvsc_device {

 	bool	 destroy;
 	bool	 drain_notify;
-	bool	 open_sub_channel;
 	atomic_t num_outstanding_req;
 	struct Scsi_Host *host;

@@ -636,33 +635,38 @@ static inline struct storvsc_device *get_in_stor_device(
 static void handle_sc_creation(struct vmbus_channel *new_sc)
 {
 	struct hv_device *device = new_sc->primary_channel->device_obj;
+	struct device *dev = &device->device;
 	struct storvsc_device *stor_device;
 	struct vmstorage_channel_properties props;
+	int ret;

 	stor_device = get_out_stor_device(device);
 	if (!stor_device)
 		return;

-	if (stor_device->open_sub_channel == false)
-		return;
-
 	memset(&props, 0, sizeof(struct vmstorage_channel_properties));

-	vmbus_open(new_sc,
-		   storvsc_ringbuffer_size,
-		   storvsc_ringbuffer_size,
-		   (void *)&props,
-		   sizeof(struct vmstorage_channel_properties),
-		   storvsc_on_channel_callback, new_sc);
+	ret = vmbus_open(new_sc,
+			 storvsc_ringbuffer_size,
+			 storvsc_ringbuffer_size,
+			 (void *)&props,
+			 sizeof(struct vmstorage_channel_properties),
+			 storvsc_on_channel_callback, new_sc);

-	if (new_sc->state == CHANNEL_OPENED_STATE) {
-		stor_device->stor_chns[new_sc->target_cpu] = new_sc;
-		cpumask_set_cpu(new_sc->target_cpu, &stor_device->alloced_cpus);
+	/* In case vmbus_open() fails, we don't use the sub-channel. */
+	if (ret != 0) {
+		dev_err(dev, "Failed to open sub-channel: err=%d\n", ret);
+		return;
 	}
+
+	/* Add the sub-channel to the array of available channels. */
+	stor_device->stor_chns[new_sc->target_cpu] = new_sc;
+	cpumask_set_cpu(new_sc->target_cpu, &stor_device->alloced_cpus);
 }

 static void handle_multichannel_storage(struct hv_device *device, int max_chns)
 {
+	struct device *dev = &device->device;
 	struct storvsc_device *stor_device;
 	int num_cpus = num_online_cpus();
 	int num_sc;
@@ -679,21 +683,11 @@ static void handle_multichannel_storage(struct hv_device *device, int max_chns)
 	request = &stor_device->init_request;
 	vstor_packet = &request->vstor_packet;

-	stor_device->open_sub_channel = true;
 	/*
 	 * Establish a handler for dealing with subchannels.
 	 */
 	vmbus_set_sc_create_callback(device->channel, handle_sc_creation);

-	/*
-	 * Check to see if sub-channels have already been created. This
-	 * can happen when this driver is re-loaded after unloading.
-	 */
-
-	if (vmbus_are_subchannels_present(device->channel))
-		return;
-
-	stor_device->open_sub_channel = false;
 	/*
 	 * Request the host to create sub-channels.
 	 */
@@ -710,23 +704,29 @@ static void handle_multichannel_storage(struct hv_device *device, int max_chns)
 			       VM_PKT_DATA_INBAND,
 			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);

-	if (ret != 0)
+	if (ret != 0) {
+		dev_err(dev, "Failed to create sub-channel: err=%d\n", ret);
 		return;
+	}

 	t = wait_for_completion_timeout(&request->wait_event, 10*HZ);
-	if (t == 0)
+	if (t == 0) {
+		dev_err(dev, "Failed to create sub-channel: timed out\n");
 		return;
+	}

 	if (vstor_packet->operation != VSTOR_OPERATION_COMPLETE_IO ||
-	    vstor_packet->status != 0)
+	    vstor_packet->status != 0) {
+		dev_err(dev, "Failed to create sub-channel: op=%d, sts=%d\n",
+			vstor_packet->operation, vstor_packet->status);
 		return;
+	}

 	/*
-	 * Now that we created the sub-channels, invoke the check; this
-	 * may trigger the callback.
+	 * We need to do nothing here, because vmbus_process_offer()
+	 * invokes channel->sc_creation_callback, which will open and use
+	 * the sub-channel(s).
 	 */
-	stor_device->open_sub_channel = true;
-	vmbus_are_subchannels_present(device->channel);
 }

 static void cache_wwn(struct storvsc_device *stor_device,
@@ -1794,7 +1794,6 @@ static int storvsc_probe(struct hv_device *device,
 	}

 	stor_device->destroy = false;
-	stor_device->open_sub_channel = false;
 	init_waitqueue_head(&stor_device->waiting_to_drain);
 	stor_device->device = device;
 	stor_device->host = host;
--
2.19.1
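
A small standalone sketch of the pattern the patch above moves to in handle_sc_creation(): act on the return value of the open call instead of inspecting the channel state afterwards. Everything here is hypothetical (toy_channel, toy_open(), toy_handle_sc_creation()); it is not the vmbus API and says nothing about ring buffer handling. It only shows why a state check cannot tell "I opened it" apart from "someone else already opened it".

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4
#define TOY_EINVAL  22

enum toy_state { TOY_CLOSED, TOY_OPENED };

/* Hypothetical sub-channel and device; names are for illustration only. */
struct toy_channel {
    enum toy_state state;
    int target_cpu;
};

struct toy_device {
    struct toy_channel *chns[TOY_NR_CPUS];
    int registrations;      /* how many times a channel was registered */
};

/* Fails if the channel is already open, as a second caller would see. */
static int toy_open(struct toy_channel *c)
{
    if (c->state != TOY_CLOSED)
        return -TOY_EINVAL;
    c->state = TOY_OPENED;
    return 0;
}

/* Register the channel only when this call is the one that opened it. */
static int toy_handle_sc_creation(struct toy_device *d, struct toy_channel *c)
{
    int ret = toy_open(c);

    if (ret != 0)
        return ret;     /* do not touch a channel we failed to open */

    d->chns[c->target_cpu] = c;
    d->registrations++;
    return 0;
}

void toy_sc_selfcheck(void)
{
    struct toy_device dev = { { NULL }, 0 };
    struct toy_channel sc = { TOY_CLOSED, 2 };

    assert(toy_handle_sc_creation(&dev, &sc) == 0);
    /* Second attempt: state is OPENED, yet this caller did not open it. */
    assert(toy_handle_sc_creation(&dev, &sc) == -TOY_EINVAL);
    assert(dev.chns[2] == &sc && dev.registrations == 1);
}
```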

6 years, 9 months

[PATCH v3 2/7] zram: fix double free backing device

by Minchan Kim

If blkdev_get() fails, we shouldn't call blkdev_put(). Otherwise, the kernel
emits the log below. This patch fixes it.

[ 31.073006] WARNING: CPU: 0 PID: 1893 at fs/block_dev.c:1828 blkdev_put+0x105/0x120
[ 31.075104] Modules linked in:
[ 31.075898] CPU: 0 PID: 1893 Comm: swapoff Not tainted 4.19.0+ #453
[ 31.077484] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 31.079589] RIP: 0010:blkdev_put+0x105/0x120
[ 31.080606] Code: 48 c7 80 a0 00 00 00 00 00 00 00 48 c7 c7 40 e7 40 96 e8 6e 47 73 00 48 8b bb e0 00 00 00 e9 2c ff ff ff 0f 0b e9 75 ff ff ff <0f> 0b e9 5a ff ff ff 48 c7 80 a0 00 00 00 00 00 00 00 eb 87 0f 1f
[ 31.085080] RSP: 0018:ffffb409005c7ed0 EFLAGS: 00010297
[ 31.086383] RAX: ffff9779fe5a8040 RBX: ffff9779fbc17300 RCX: 00000000b9fc37a4
[ 31.088105] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff9640e740
[ 31.089850] RBP: ffff9779fbc17318 R08: ffffffff95499a89 R09: 0000000000000004
[ 31.091201] R10: ffffb409005c7e50 R11: 7a9ef6088ff4d4a1 R12: 0000000000000083
[ 31.092276] R13: ffff9779fe607b98 R14: 0000000000000000 R15: ffff9779fe607a38
[ 31.093355] FS: 00007fc118d9b840(0000) GS:ffff9779fc600000(0000) knlGS:0000000000000000
[ 31.094582] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.095541] CR2: 00007fc11894b8dc CR3: 00000000339f6001 CR4: 0000000000160ef0
[ 31.096781] Call Trace:
[ 31.097212] __x64_sys_swapoff+0x46d/0x490
[ 31.097914] do_syscall_64+0x5a/0x190
[ 31.098550] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 31.099402] RIP: 0033:0x7fc11843ec27
[ 31.100013] Code: 73 01 c3 48 8b 0d 71 62 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 a8 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 62 2c 00 f7 d8 64 89 01 48
[ 31.103149] RSP: 002b:00007ffdf69be648 EFLAGS: 00000206 ORIG_RAX: 00000000000000a8
[ 31.104425] RAX: ffffffffffffffda RBX: 00000000011d98c0 RCX: 00007fc11843ec27
[ 31.105627] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000011d98c0
[ 31.106847] RBP: 0000000000000001 R08: 00007ffdf69be690 R09: 0000000000000001
[ 31.108038] R10: 00000000000002b1 R11: 0000000000000206 R12: 0000000000000001
[ 31.109231] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 31.110433] irq event stamp: 4466
[ 31.111001] hardirqs last enabled at (4465): [<ffffffff953ebd43>] __free_pages_ok+0x1e3/0x490
[ 31.112437] hardirqs last disabled at (4466): [<ffffffff95201b7a>] trace_hardirqs_off_thunk+0x1a/0x1c
[ 31.113973] softirqs last enabled at (3420): [<ffffffff95e00333>] __do_softirq+0x333/0x446
[ 31.115364] softirqs last disabled at (3407): [<ffffffff9527aee1>] irq_exit+0xd1/0xe0

Cc: stable(a)vger.kernel.org # 4.14+
Signed-off-by: Minchan Kim <minchan(a)kernel.org>
---
 drivers/block/zram/zram_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 21a7046958a3..d1459cc1159f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -387,8 +387,10 @@ static ssize_t backing_dev_store(struct device *dev,

 	bdev = bdgrab(I_BDEV(inode));
 	err = blkdev_get(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram);
-	if (err < 0)
+	if (err < 0) {
+		bdev = NULL;
 		goto out;
+	}

 	nr_pages = i_size_read(inode) >> PAGE_SHIFT;
 	bitmap_sz = BITS_TO_LONGS(nr_pages) * sizeof(long);
--
2.20.0.rc0.387.gc7a69e6b6c-goog
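
The fix above relies on a common error-path idiom: clear the pointer when the acquire step fails, so a shared cleanup label does not release something that was never acquired. Below is a minimal userspace sketch of that idiom. It assumes the cleanup label puts the device whenever the pointer is non-NULL, which the patch implies but this excerpt does not show; toy_bdev, toy_get(), toy_put() and toy_attach() are invented names, not block layer functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device with a holder count standing in for the real refcount. */
struct toy_bdev {
    int holders;
};

static int toy_get(struct toy_bdev *b, int fail)
{
    if (fail)
        return -1;      /* acquisition failed, nothing to release later */
    b->holders++;
    return 0;
}

static void toy_put(struct toy_bdev *b)
{
    b->holders--;
}

/* Mirrors the shape of the patched error path: one label, one guarded put. */
static int toy_attach(struct toy_bdev *dev, int fail_get, int fail_later)
{
    struct toy_bdev *bdev = dev;
    int err;

    err = toy_get(bdev, fail_get);
    if (err < 0) {
        bdev = NULL;    /* the step the patch adds */
        goto out;
    }

    if (fail_later) {
        err = -2;       /* a later step failed: the put below is correct */
        goto out;
    }
    return 0;           /* success: the caller keeps the reference */

out:
    if (bdev)
        toy_put(bdev);
    return err;
}

void toy_attach_selfcheck(void)
{
    struct toy_bdev d = { 0 };

    assert(toy_attach(&d, 1, 0) == -1 && d.holders == 0);   /* no stray put */
    assert(toy_attach(&d, 0, 1) == -2 && d.holders == 0);   /* balanced */
    assert(toy_attach(&d, 0, 0) == 0 && d.holders == 1);    /* held */
}
```

Without the bdev = NULL line, the first case would drive holders to -1, which is the userspace analogue of the blkdev_put() warning in the log.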

6 years, 9 months

[PATCH] drm/amdgpu: don't expose fan attributes on APUs

by Alex Deucher

They don't have a fan controller.

Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index 0de8650c5d6e..1f61ed95727c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -1644,6 +1644,19 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj,
 	     attr == &sensor_dev_attr_fan1_enable.dev_attr.attr))
 		return 0;

+	/* Skip fan attributes on APU */
+	if ((adev->flags & AMD_IS_APU) &&
+	    (attr == &sensor_dev_attr_pwm1.dev_attr.attr ||
+	     attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr ||
+	     attr == &sensor_dev_attr_pwm1_max.dev_attr.attr ||
+	     attr == &sensor_dev_attr_pwm1_min.dev_attr.attr ||
+	     attr == &sensor_dev_attr_fan1_input.dev_attr.attr ||
+	     attr == &sensor_dev_attr_fan1_min.dev_attr.attr ||
+	     attr == &sensor_dev_attr_fan1_max.dev_attr.attr ||
+	     attr == &sensor_dev_attr_fan1_target.dev_attr.attr ||
+	     attr == &sensor_dev_attr_fan1_enable.dev_attr.attr))
+		return 0;
+
 	/* Skip limit attributes if DPM is not enabled */
 	if (!adev->pm.dpm_enabled &&
 	    (attr == &sensor_dev_attr_temp1_crit.dev_attr.attr ||
--
2.13.6
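
For readers unfamiliar with sysfs visibility callbacks, here is a simplified sketch of the idea: a callback returns mode 0 to hide an attribute and a non-zero mode to expose it. This is only an approximation. The real hwmon_attributes_visible() compares attribute pointers, as the diff shows, whereas this sketch matches on a name prefix, and TOY_IS_APU and toy_attr_mode() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_IS_APU 0x1u     /* hypothetical flag, stands in for AMD_IS_APU */

/*
 * Return 0 to hide an attribute, or a read-only mode to show it.
 * Fan-related attributes (pwm1*, fan1*) are hidden on APUs.
 */
static unsigned int toy_attr_mode(unsigned int flags, const char *name)
{
    bool fan_attr = strncmp(name, "pwm1", 4) == 0 ||
                    strncmp(name, "fan1", 4) == 0;

    if ((flags & TOY_IS_APU) && fan_attr)
        return 0;
    return 0444;
}

void toy_attr_selfcheck(void)
{
    assert(toy_attr_mode(TOY_IS_APU, "pwm1_enable") == 0);
    assert(toy_attr_mode(TOY_IS_APU, "fan1_input") == 0);
    assert(toy_attr_mode(TOY_IS_APU, "temp1_input") == 0444);
    assert(toy_attr_mode(0, "fan1_input") == 0444);
}
```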

6 years, 9 months

+ zram-writeback-throttle.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: zram: writeback throttle has been added to the -mm tree. Its filename is zram-writeback-throttle.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/zram-writeback-throttle.patch and later at http://ozlabs.org/~akpm/mmotm/broken-out/zram-writeback-throttle.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Minchan Kim <minchan(a)kernel.org> Subject: zram: writeback throttle On small memory systems there are lots of write IOs so if we use a flash device as swap there would be serious flash wearout. To overcome this problem, system developers need to design a write limitation strategy to guarantee flash health for the entire product life. This patch creates a new knob "writeback_limit" on zram. With that, if the current writeback IO count (/sys/block/zramX/io_stat) exceeds the limitation, zram stops further writeback until the admin can reset the limit. 
Link: http://lkml.kernel.org/r/20181127055429.251614-8-minchan@kernel.org Signed-off-by: Minchan Kim <minchan(a)kernel.org> Cc: Joey Pabalinas <joeypabalinas(a)gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com> Cc: <stable(a)vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- Documentation/ABI/testing/sysfs-block-zram | 9 +++ Documentation/blockdev/zram.txt | 2 drivers/block/zram/zram_drv.c | 47 ++++++++++++++++++- drivers/block/zram/zram_drv.h | 2 4 files changed, 59 insertions(+), 1 deletion(-) --- a/Documentation/ABI/testing/sysfs-block-zram~zram-writeback-throttle +++ a/Documentation/ABI/testing/sysfs-block-zram @@ -121,3 +121,12 @@ Description: The bd_stat file is read-only and represents backing device's statistics (bd_count, bd_reads, bd_writes) in a format similar to block layer statistics file format. + +What: /sys/block/zram<id>/writeback_limit +Date: November 2018 +Contact: Minchan Kim <minchan(a)kernel.org> +Description: + The writeback_limit file is read-write and specifies the maximum + amount of writeback ZRAM can do. The limit could be changed + in run time and "0" means disable the limit. + No limit is the initial state. 
--- a/Documentation/blockdev/zram.txt~zram-writeback-throttle +++ a/Documentation/blockdev/zram.txt @@ -164,6 +164,8 @@ reset WO trigger device r mem_used_max WO reset the `mem_used_max' counter (see later) mem_limit WO specifies the maximum amount of memory ZRAM can use to store the compressed data +writeback_limit WO specifies the maximum amount of write IO zram can + write out to backing device as 4KB unit max_comp_streams RW the number of possible concurrent compress operations comp_algorithm RW show and change the compression algorithm compact WO trigger memory compaction --- a/drivers/block/zram/zram_drv.c~zram-writeback-throttle +++ a/drivers/block/zram/zram_drv.c @@ -330,6 +330,40 @@ next: } #ifdef CONFIG_ZRAM_WRITEBACK + +static ssize_t writeback_limit_store(struct device *dev, + struct device_attribute *attr, const char *buf, size_t len) +{ + struct zram *zram = dev_to_zram(dev); + u64 val; + ssize_t ret = -EINVAL; + + if (kstrtoull(buf, 10, &val)) + return ret; + + down_read(&zram->init_lock); + atomic64_set(&zram->stats.bd_wb_limit, val); + if (val == 0 || val > atomic64_read(&zram->stats.bd_writes)) + zram->stop_writeback = false; + up_read(&zram->init_lock); + ret = len; + + return ret; +} + +static ssize_t writeback_limit_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + u64 val; + struct zram *zram = dev_to_zram(dev); + + down_read(&zram->init_lock); + val = atomic64_read(&zram->stats.bd_wb_limit); + up_read(&zram->init_lock); + + return scnprintf(buf, PAGE_SIZE, "%llu\n", val); +} + static void reset_bdev(struct zram *zram) { struct block_device *bdev; @@ -571,6 +605,7 @@ static ssize_t writeback_store(struct de char mode_buf[8]; unsigned long mode = -1UL; unsigned long blk_idx = 0; + u64 wb_count, wb_limit; sz = strscpy(mode_buf, buf, sizeof(mode_buf)); if (sz <= 0) @@ -612,6 +647,11 @@ static ssize_t writeback_store(struct de bvec.bv_len = PAGE_SIZE; bvec.bv_offset = 0; + if (zram->stop_writeback) { + ret = -EIO; + break; 
+ } + if (!blk_idx) { blk_idx = alloc_block_bdev(zram); if (!blk_idx) { @@ -670,7 +710,7 @@ static ssize_t writeback_store(struct de continue; } - atomic64_inc(&zram->stats.bd_writes); + wb_count = atomic64_inc_return(&zram->stats.bd_writes); /* * We released zram_slot_lock so need to check if the slot was * changed. If there is freeing for the slot, we can catch it @@ -694,6 +734,9 @@ static ssize_t writeback_store(struct de zram_set_element(zram, index, blk_idx); blk_idx = 0; atomic64_inc(&zram->stats.pages_stored); + wb_limit = atomic64_read(&zram->stats.bd_wb_limit); + if (wb_limit != 0 && wb_count >= wb_limit) + zram->stop_writeback = true; next: zram_slot_unlock(zram, index); } @@ -1767,6 +1810,7 @@ static DEVICE_ATTR_RW(comp_algorithm); #ifdef CONFIG_ZRAM_WRITEBACK static DEVICE_ATTR_RW(backing_dev); static DEVICE_ATTR_WO(writeback); +static DEVICE_ATTR_RW(writeback_limit); #endif static struct attribute *zram_disk_attrs[] = { @@ -1782,6 +1826,7 @@ static struct attribute *zram_disk_attrs #ifdef CONFIG_ZRAM_WRITEBACK &dev_attr_backing_dev.attr, &dev_attr_writeback.attr, + &dev_attr_writeback_limit.attr, #endif &dev_attr_io_stat.attr, &dev_attr_mm_stat.attr, --- a/drivers/block/zram/zram_drv.h~zram-writeback-throttle +++ a/drivers/block/zram/zram_drv.h @@ -86,6 +86,7 @@ struct zram_stats { atomic64_t bd_count; /* no. of pages in backing device */ atomic64_t bd_reads; /* no. of reads from backing device */ atomic64_t bd_writes; /* no. 
of writes from backing device */ + atomic64_t bd_wb_limit; /* writeback limit of backing device */ #endif }; @@ -113,6 +114,7 @@ struct zram { */ bool claim; /* Protected by bdev->bd_mutex */ struct file *backing_dev; + bool stop_writeback; #ifdef CONFIG_ZRAM_WRITEBACK struct block_device *bdev; unsigned int old_block_size; _ Patches currently in -mm which might be from minchan(a)kernel.org are zram-fix-lockdep-warning-of-free-block-handling.patch zram-fix-double-free-backing-device.patch zram-refactoring-flags-and-writeback-stuff.patch zram-introduce-zram_idle-flag.patch zram-support-idle-huge-page-writeback.patch zram-add-bd_stat-statistics.patch zram-writeback-throttle.patch
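
The throttle logic in the patch above can be followed more easily outside the diff. The sketch below restates it in plain userspace C, without locking or atomics: toy_set_limit() mirrors what writeback_limit_store() does with the new value, and toy_writeback_one() mirrors the per-page check in writeback_store(). The struct and function names are invented for illustration; only the field names bd_writes, bd_wb_limit and stop_writeback come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, lock-free model of the counters the patch touches. */
struct toy_zram {
    uint64_t bd_writes;         /* pages written to the backing device */
    uint64_t bd_wb_limit;       /* 0 means "no limit" */
    bool stop_writeback;
};

/* Model of writeback_limit_store(): raising the limit re-enables writeback. */
static void toy_set_limit(struct toy_zram *z, uint64_t val)
{
    z->bd_wb_limit = val;
    if (val == 0 || val > z->bd_writes)
        z->stop_writeback = false;
}

/* Model of one loop iteration in writeback_store(); false means throttled. */
static bool toy_writeback_one(struct toy_zram *z)
{
    uint64_t wb_count;

    if (z->stop_writeback)
        return false;           /* the patch returns -EIO here */

    wb_count = ++z->bd_writes;
    if (z->bd_wb_limit != 0 && wb_count >= z->bd_wb_limit)
        z->stop_writeback = true;
    return true;
}

void toy_throttle_selfcheck(void)
{
    struct toy_zram z = { 0, 0, false };

    toy_set_limit(&z, 2);
    assert(toy_writeback_one(&z));      /* 1st page */
    assert(toy_writeback_one(&z));      /* 2nd page reaches the limit */
    assert(!toy_writeback_one(&z));     /* throttled */
    assert(z.bd_writes == 2);

    toy_set_limit(&z, 5);               /* admin raises the limit */
    assert(toy_writeback_one(&z));
    assert(z.bd_writes == 3);
}
```

Note that in this model, as in the patch, the limit is compared against the cumulative bd_writes counter, so "resetting" the limit means writing a value larger than the current count (or 0 to disable it).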

6 years, 9 months

+ zram-add-bd_stat-statistics.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: zram: add bd_stat statistics has been added to the -mm tree. Its filename is zram-add-bd_stat-statistics.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/zram-add-bd_stat-statistics.patch and later at http://ozlabs.org/~akpm/mmotm/broken-out/zram-add-bd_stat-statistics.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Minchan Kim <minchan(a)kernel.org> Subject: zram: add bd_stat statistics bd_stat represents things that happened in the backing device. Currently it supports bd_counts, bd_reads and bd_writes which are helpful to understand wearout of flash and memory saving. Link: http://lkml.kernel.org/r/20181127055429.251614-7-minchan@kernel.org Signed-off-by: Minchan Kim <minchan(a)kernel.org> Cc: Joey Pabalinas <joeypabalinas(a)gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com> Cc: <stable(a)vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- Documentation/ABI/testing/sysfs-block-zram | 8 +++++ Documentation/blockdev/zram.txt | 11 +++++++ drivers/block/zram/zram_drv.c | 29 +++++++++++++++++++ drivers/block/zram/zram_drv.h | 5 +++ 4 files changed, 53 insertions(+) --- a/Documentation/ABI/testing/sysfs-block-zram~zram-add-bd_stat-statistics +++ a/Documentation/ABI/testing/sysfs-block-zram @@ -113,3 +113,11 @@ Contact: Minchan Kim <minchan(a)kernel.org Description: The writeback file is write-only and trigger idle and/or huge page writeback to backing device. 
+ +What: /sys/block/zram<id>/bd_stat +Date: November 2018 +Contact: Minchan Kim <minchan(a)kernel.org> +Description: + The bd_stat file is read-only and represents backing device's + statistics (bd_count, bd_reads, bd_writes) in a format + similar to block layer statistics file format. --- a/Documentation/blockdev/zram.txt~zram-add-bd_stat-statistics +++ a/Documentation/blockdev/zram.txt @@ -221,6 +221,17 @@ line of text and contains the following pages_compacted the number of pages freed during compaction huge_pages the number of incompressible pages +File /sys/block/zram<id>/bd_stat + +The stat file represents device's backing device statistics. It consists of +a single line of text and contains the following stats separated by whitespace: + bd_count size of data written in backing device. + Unit: 4K bytes + bd_reads the number of reads from backing device + Unit: 4K bytes + bd_writes the number of writes to backing device + Unit: 4K bytes + 9) Deactivate: swapoff /dev/zram0 umount /dev/zram1 --- a/drivers/block/zram/zram_drv.c~zram-add-bd_stat-statistics +++ a/drivers/block/zram/zram_drv.c @@ -502,6 +502,7 @@ retry: if (test_and_set_bit(blk_idx, zram->bitmap)) goto retry; + atomic64_inc(&zram->stats.bd_count); return blk_idx; } @@ -511,6 +512,7 @@ static void free_block_bdev(struct zram was_set = test_and_clear_bit(blk_idx, zram->bitmap); WARN_ON_ONCE(!was_set); + atomic64_dec(&zram->stats.bd_count); } static void zram_page_end_io(struct bio *bio) @@ -668,6 +670,7 @@ static ssize_t writeback_store(struct de continue; } + atomic64_inc(&zram->stats.bd_writes); /* * We released zram_slot_lock so need to check if the slot was * changed. 
If there is freeing for the slot, we can catch it @@ -757,6 +760,7 @@ static int read_from_bdev_sync(struct zr static int read_from_bdev(struct zram *zram, struct bio_vec *bvec, unsigned long entry, struct bio *parent, bool sync) { + atomic64_inc(&zram->stats.bd_reads); if (sync) return read_from_bdev_sync(zram, bvec, entry, parent); else @@ -1013,6 +1017,25 @@ static ssize_t mm_stat_show(struct devic return ret; } +#ifdef CONFIG_ZRAM_WRITEBACK +static ssize_t bd_stat_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct zram *zram = dev_to_zram(dev); + ssize_t ret; + + down_read(&zram->init_lock); + ret = scnprintf(buf, PAGE_SIZE, + "%8llu %8llu %8llu\n", + (u64)atomic64_read(&zram->stats.bd_count), + (u64)atomic64_read(&zram->stats.bd_reads), + (u64)atomic64_read(&zram->stats.bd_writes)); + up_read(&zram->init_lock); + + return ret; +} +#endif + static ssize_t debug_stat_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -1033,6 +1056,9 @@ static ssize_t debug_stat_show(struct de static DEVICE_ATTR_RO(io_stat); static DEVICE_ATTR_RO(mm_stat); +#ifdef CONFIG_ZRAM_WRITEBACK +static DEVICE_ATTR_RO(bd_stat); +#endif static DEVICE_ATTR_RO(debug_stat); static void zram_meta_free(struct zram *zram, u64 disksize) @@ -1759,6 +1785,9 @@ static struct attribute *zram_disk_attrs #endif &dev_attr_io_stat.attr, &dev_attr_mm_stat.attr, +#ifdef CONFIG_ZRAM_WRITEBACK + &dev_attr_bd_stat.attr, +#endif &dev_attr_debug_stat.attr, NULL, }; --- a/drivers/block/zram/zram_drv.h~zram-add-bd_stat-statistics +++ a/drivers/block/zram/zram_drv.h @@ -82,6 +82,11 @@ struct zram_stats { atomic_long_t max_used_pages; /* no. of maximum pages stored */ atomic64_t writestall; /* no. of write slow paths */ atomic64_t miss_free; /* no. of missed free */ +#ifdef CONFIG_ZRAM_WRITEBACK + atomic64_t bd_count; /* no. of pages in backing device */ + atomic64_t bd_reads; /* no. of reads from backing device */ + atomic64_t bd_writes; /* no. 
of writes from backing device */ +#endif }; struct zram { _ Patches currently in -mm which might be from minchan(a)kernel.org are zram-fix-lockdep-warning-of-free-block-handling.patch zram-fix-double-free-backing-device.patch zram-refactoring-flags-and-writeback-stuff.patch zram-introduce-zram_idle-flag.patch zram-support-idle-huge-page-writeback.patch zram-add-bd_stat-statistics.patch zram-writeback-throttle.patch
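
As a possible way to consume the new file from userspace, the sketch below parses one bd_stat line in the three-column layout that bd_stat_show() prints ("%8llu %8llu %8llu") and converts a value to bytes using the 4K unit stated in the zram.txt hunk. The helper names toy_parse_bd_stat() and toy_units_to_bytes() are made up for this example, and the sample line is synthetic rather than captured from a real device.

```c
#include <assert.h>
#include <stdio.h>

/* The three fields documented for /sys/block/zram<id>/bd_stat. */
struct toy_bd_stat {
    unsigned long long count;   /* bd_count */
    unsigned long long reads;   /* bd_reads */
    unsigned long long writes;  /* bd_writes */
};

/* Parse one whitespace-separated line; returns 0 on success, -1 otherwise. */
static int toy_parse_bd_stat(const char *line, struct toy_bd_stat *st)
{
    int n = sscanf(line, "%llu %llu %llu",
                   &st->count, &st->reads, &st->writes);

    return n == 3 ? 0 : -1;
}

/* The documentation gives each field in units of 4K bytes. */
static unsigned long long toy_units_to_bytes(unsigned long long units)
{
    return units * 4096ULL;
}

void toy_bd_stat_selfcheck(void)
{
    struct toy_bd_stat st;

    /* Synthetic line in the "%8llu %8llu %8llu\n" layout. */
    assert(toy_parse_bd_stat("       3        1        7\n", &st) == 0);
    assert(st.count == 3 && st.reads == 1 && st.writes == 7);
    assert(toy_units_to_bytes(st.count) == 12288ULL);
    assert(toy_parse_bd_stat("bogus", &st) == -1);
}
```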

6 years, 9 months

+ zram-support-idle-huge-page-writeback.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: zram: support idle/huge page writeback has been added to the -mm tree. Its filename is zram-support-idle-huge-page-writeback.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/zram-support-idle-huge-page-writeb… and later at http://ozlabs.org/~akpm/mmotm/broken-out/zram-support-idle-huge-page-writeb… Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Minchan Kim <minchan(a)kernel.org> Subject: zram: support idle/huge page writeback Add a new feature "zram idle/huge page writeback". In the zram-swap use case, zram usually has many idle/huge swap pages. It's pointless to keep them in memory (ie, zram). To solve this problem, this feature introduces idle/huge page writeback to the backing device so the goal is to save more memory space on embedded systems. Normal sequence to use idle/huge page writeback feature is as follows, while (1) { # mark allocated zram slot to idle echo all > /sys/block/zram0/idle # leave system working for several hours # Unless there is no access for some blocks on zram, # they are still IDLE marked pages. echo "idle" > /sys/block/zram0/writeback or/and echo "huge" > /sys/block/zram0/writeback # write the IDLE or/and huge marked slot into backing device # and free the memory. 
} By per discussion: https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u, This patch removes direct incommpressibe page writeback feature (d2afd25114f4 ("zram: write incompressible pages to backing device")) so we could regard it as regression because incompressible pages don't go to backing storage automatically. Instead, users should do this via "echo huge" > /sys/block/zram/writeback" manually. If we hear some regression, we could restore the function. Link: http://lkml.kernel.org/r/20181127055429.251614-6-minchan@kernel.org Signed-off-by: Minchan Kim <minchan(a)kernel.org> Reviewed-by: Joey Pabalinas <joeypabalinas(a)gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com> Cc: <stable(a)vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- Documentation/ABI/testing/sysfs-block-zram | 7 Documentation/blockdev/zram.txt | 28 +- drivers/block/zram/Kconfig | 5 drivers/block/zram/zram_drv.c | 247 +++++++++++++------ drivers/block/zram/zram_drv.h | 1 5 files changed, 209 insertions(+), 79 deletions(-) --- a/Documentation/ABI/testing/sysfs-block-zram~zram-support-idle-huge-page-writeback +++ a/Documentation/ABI/testing/sysfs-block-zram @@ -106,3 +106,10 @@ Description: idle file is write-only and mark zram slot as idle. If system has mounted debugfs, user can see which slots are idle via /sys/kernel/debug/zram/zram<id>/block_state + +What: /sys/block/zram<id>/writeback +Date: November 2018 +Contact: Minchan Kim <minchan(a)kernel.org> +Description: + The writeback file is write-only and trigger idle and/or + huge page writeback to backing device. --- a/Documentation/blockdev/zram.txt~zram-support-idle-huge-page-writeback +++ a/Documentation/blockdev/zram.txt @@ -238,11 +238,31 @@ line of text and contains the following = writeback -With incompressible pages, there is no memory saving with zram. 
-Instead, with CONFIG_ZRAM_WRITEBACK, zram can write incompressible page +With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page to backing storage rather than keeping it in memory. -User should set up backing device via /sys/block/zramX/backing_dev -before disksize setting. +To use the feature, admin should set up backing device via + + "echo /dev/sda5 > /sys/block/zramX/backing_dev" + +before disksize setting. It supports only partition at this moment. +If admin want to use incompressible page writeback, they could do via + + "echo huge > /sys/block/zramX/write" + +To use idle page writeback, first, user need to declare zram pages +as idle. + + "echo all > /sys/block/zramX/idle" + +From now on, any pages on zram are idle pages. The idle mark +will be removed until someone request access of the block. +IOW, unless there is access request, those pages are still idle pages. + +Admin can request writeback of those idle pages at right timing via + + "echo idle > /sys/block/zramX/writeback" + +With the command, zram writeback idle pages from memory to the storage. = memory tracking --- a/drivers/block/zram/Kconfig~zram-support-idle-huge-page-writeback +++ a/drivers/block/zram/Kconfig @@ -15,7 +15,7 @@ config ZRAM See Documentation/blockdev/zram.txt for more information. config ZRAM_WRITEBACK - bool "Write back incompressible page to backing device" + bool "Write back incompressible or idle page to backing device" depends on ZRAM help With incompressible page, there is no memory saving to keep it @@ -23,6 +23,9 @@ config ZRAM_WRITEBACK For this feature, admin should set up backing device via /sys/block/zramX/backing_dev. + With /sys/block/zramX/{idle,writeback}, application could ask + idle page's writeback to the backing device to save in memory. + See Documentation/blockdev/zram.txt for more information. 
config ZRAM_MEMORY_TRACKING --- a/drivers/block/zram/zram_drv.c~zram-support-idle-huge-page-writeback +++ a/drivers/block/zram/zram_drv.c @@ -52,6 +52,9 @@ static unsigned int num_devices = 1; static size_t huge_class_size; static void zram_free_page(struct zram *zram, size_t index); +static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec, + u32 index, int offset, struct bio *bio); + static int zram_slot_trylock(struct zram *zram, u32 index) { @@ -73,13 +76,6 @@ static inline bool init_done(struct zram return zram->disksize; } -static inline bool zram_allocated(struct zram *zram, u32 index) -{ - - return (zram->table[index].flags >> (ZRAM_FLAG_SHIFT + 1)) || - zram->table[index].handle; -} - static inline struct zram *dev_to_zram(struct device *dev) { return (struct zram *)dev_to_disk(dev)->private_data; @@ -138,6 +134,13 @@ static void zram_set_obj_size(struct zra zram->table[index].flags = (flags << ZRAM_FLAG_SHIFT) | size; } +static inline bool zram_allocated(struct zram *zram, u32 index) +{ + return zram_get_obj_size(zram, index) || + zram_test_flag(zram, index, ZRAM_SAME) || + zram_test_flag(zram, index, ZRAM_WB); +} + #if PAGE_SIZE != 4096 static inline bool is_partial_io(struct bio_vec *bvec) { @@ -308,10 +311,14 @@ static ssize_t idle_store(struct device } for (index = 0; index < nr_pages; index++) { + /* + * Do not mark ZRAM_UNDER_WB slot as ZRAM_IDLE to close race. + * See the comment in writeback_store. 
+ */ zram_slot_lock(zram, index); - if (!zram_allocated(zram, index)) + if (!zram_allocated(zram, index) || + zram_test_flag(zram, index, ZRAM_UNDER_WB)) goto next; - zram_set_flag(zram, index, ZRAM_IDLE); next: zram_slot_unlock(zram, index); @@ -546,6 +553,158 @@ static int read_from_bdev_async(struct z return 1; } +#define HUGE_WRITEBACK 0x1 +#define IDLE_WRITEBACK 0x2 + +static ssize_t writeback_store(struct device *dev, + struct device_attribute *attr, const char *buf, size_t len) +{ + struct zram *zram = dev_to_zram(dev); + unsigned long nr_pages = zram->disksize >> PAGE_SHIFT; + unsigned long index; + struct bio bio; + struct bio_vec bio_vec; + struct page *page; + ssize_t ret, sz; + char mode_buf[8]; + unsigned long mode = -1UL; + unsigned long blk_idx = 0; + + sz = strscpy(mode_buf, buf, sizeof(mode_buf)); + if (sz <= 0) + return -EINVAL; + + /* ignore trailing newline */ + if (mode_buf[sz - 1] == '\n') + mode_buf[sz - 1] = 0x00; + + if (!strcmp(mode_buf, "idle")) + mode = IDLE_WRITEBACK; + else if (!strcmp(mode_buf, "huge")) + mode = HUGE_WRITEBACK; + + if (mode == -1UL) + return -EINVAL; + + down_read(&zram->init_lock); + if (!init_done(zram)) { + ret = -EINVAL; + goto release_init_lock; + } + + if (!zram->backing_dev) { + ret = -ENODEV; + goto release_init_lock; + } + + page = alloc_page(GFP_KERNEL); + if (!page) { + ret = -ENOMEM; + goto release_init_lock; + } + + for (index = 0; index < nr_pages; index++) { + struct bio_vec bvec; + + bvec.bv_page = page; + bvec.bv_len = PAGE_SIZE; + bvec.bv_offset = 0; + + if (!blk_idx) { + blk_idx = alloc_block_bdev(zram); + if (!blk_idx) { + ret = -ENOSPC; + break; + } + } + + zram_slot_lock(zram, index); + if (!zram_allocated(zram, index)) + goto next; + + if (zram_test_flag(zram, index, ZRAM_WB) || + zram_test_flag(zram, index, ZRAM_SAME) || + zram_test_flag(zram, index, ZRAM_UNDER_WB)) + goto next; + + if ((mode & IDLE_WRITEBACK && + !zram_test_flag(zram, index, ZRAM_IDLE)) && + (mode & HUGE_WRITEBACK && + 
!zram_test_flag(zram, index, ZRAM_HUGE))) + goto next; + /* + * Clearing ZRAM_UNDER_WB is duty of caller. + * IOW, zram_free_page never clear it. + */ + zram_set_flag(zram, index, ZRAM_UNDER_WB); + /* Need for hugepage writeback racing */ + zram_set_flag(zram, index, ZRAM_IDLE); + zram_slot_unlock(zram, index); + if (zram_bvec_read(zram, &bvec, index, 0, NULL)) { + zram_slot_lock(zram, index); + zram_clear_flag(zram, index, ZRAM_UNDER_WB); + zram_clear_flag(zram, index, ZRAM_IDLE); + zram_slot_unlock(zram, index); + continue; + } + + bio_init(&bio, &bio_vec, 1); + bio_set_dev(&bio, zram->bdev); + bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9); + bio.bi_opf = REQ_OP_WRITE | REQ_SYNC; + + bio_add_page(&bio, bvec.bv_page, bvec.bv_len, + bvec.bv_offset); + /* + * XXX: A single page IO would be inefficient for write + * but it would be not bad as starter. + */ + ret = submit_bio_wait(&bio); + if (ret) { + zram_slot_lock(zram, index); + zram_clear_flag(zram, index, ZRAM_UNDER_WB); + zram_clear_flag(zram, index, ZRAM_IDLE); + zram_slot_unlock(zram, index); + continue; + } + + /* + * We released zram_slot_lock so need to check if the slot was + * changed. If there is freeing for the slot, we can catch it + * easily by zram_allocated. + * A subtle case is the slot is freed/reallocated/marked as + * ZRAM_IDLE again. To close the race, idle_store doesn't + * mark ZRAM_IDLE once it found the slot was ZRAM_UNDER_WB. + * Thus, we could close the race by checking ZRAM_IDLE bit. 
+ */ + zram_slot_lock(zram, index); + if (!zram_allocated(zram, index) || + !zram_test_flag(zram, index, ZRAM_IDLE)) { + zram_clear_flag(zram, index, ZRAM_UNDER_WB); + zram_clear_flag(zram, index, ZRAM_IDLE); + goto next; + } + + zram_free_page(zram, index); + zram_clear_flag(zram, index, ZRAM_UNDER_WB); + zram_set_flag(zram, index, ZRAM_WB); + zram_set_element(zram, index, blk_idx); + blk_idx = 0; + atomic64_inc(&zram->stats.pages_stored); +next: + zram_slot_unlock(zram, index); + } + + if (blk_idx) + free_block_bdev(zram, blk_idx); + ret = len; + __free_page(page); +release_init_lock: + up_read(&zram->init_lock); + + return ret; +} + struct zram_work { struct work_struct work; struct zram *zram; @@ -603,57 +762,8 @@ static int read_from_bdev(struct zram *z else return read_from_bdev_async(zram, bvec, entry, parent); } - -static int write_to_bdev(struct zram *zram, struct bio_vec *bvec, - u32 index, struct bio *parent, - unsigned long *pentry) -{ - struct bio *bio; - unsigned long entry; - - bio = bio_alloc(GFP_ATOMIC, 1); - if (!bio) - return -ENOMEM; - - entry = alloc_block_bdev(zram); - if (!entry) { - bio_put(bio); - return -ENOSPC; - } - - bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9); - bio_set_dev(bio, zram->bdev); - if (!bio_add_page(bio, bvec->bv_page, bvec->bv_len, - bvec->bv_offset)) { - bio_put(bio); - free_block_bdev(zram, entry); - return -EIO; - } - - if (!parent) { - bio->bi_opf = REQ_OP_WRITE | REQ_SYNC; - bio->bi_end_io = zram_page_end_io; - } else { - bio->bi_opf = parent->bi_opf; - bio_chain(bio, parent); - } - - submit_bio(bio); - *pentry = entry; - - return 0; -} - #else static inline void reset_bdev(struct zram *zram) {}; -static int write_to_bdev(struct zram *zram, struct bio_vec *bvec, - u32 index, struct bio *parent, - unsigned long *pentry) - -{ - return -EIO; -} - static int read_from_bdev(struct zram *zram, struct bio_vec *bvec, unsigned long entry, struct bio *parent, bool sync) { @@ -1006,7 +1116,8 @@ out: 
atomic64_dec(&zram->stats.pages_stored); zram_set_handle(zram, index, 0); zram_set_obj_size(zram, index, 0); - WARN_ON_ONCE(zram->table[index].flags & ~(1UL << ZRAM_LOCK)); + WARN_ON_ONCE(zram->table[index].flags & + ~(1UL << ZRAM_LOCK | 1UL << ZRAM_UNDER_WB)); } static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index, @@ -1115,7 +1226,6 @@ static int __zram_bvec_write(struct zram struct page *page = bvec->bv_page; unsigned long element = 0; enum zram_pageflags flags = 0; - bool allow_wb = true; mem = kmap_atomic(page); if (page_same_filled(mem, &element)) { @@ -1140,21 +1250,8 @@ compress_again: return ret; } - if (unlikely(comp_len >= huge_class_size)) { + if (comp_len >= huge_class_size) comp_len = PAGE_SIZE; - if (zram->backing_dev && allow_wb) { - zcomp_stream_put(zram->comp); - ret = write_to_bdev(zram, bvec, index, bio, &element); - if (!ret) { - flags = ZRAM_WB; - ret = 1; - goto out; - } - allow_wb = false; - goto compress_again; - } - } - /* * handle allocation has 2 paths: * a) fast path is executed with preemption disabled (for @@ -1643,6 +1740,7 @@ static DEVICE_ATTR_RW(max_comp_streams); static DEVICE_ATTR_RW(comp_algorithm); #ifdef CONFIG_ZRAM_WRITEBACK static DEVICE_ATTR_RW(backing_dev); +static DEVICE_ATTR_WO(writeback); #endif static struct attribute *zram_disk_attrs[] = { @@ -1657,6 +1755,7 @@ static struct attribute *zram_disk_attrs &dev_attr_comp_algorithm.attr, #ifdef CONFIG_ZRAM_WRITEBACK &dev_attr_backing_dev.attr, + &dev_attr_writeback.attr, #endif &dev_attr_io_stat.attr, &dev_attr_mm_stat.attr, --- a/drivers/block/zram/zram_drv.h~zram-support-idle-huge-page-writeback +++ a/drivers/block/zram/zram_drv.h @@ -47,6 +47,7 @@ enum zram_pageflags { ZRAM_LOCK = ZRAM_FLAG_SHIFT, ZRAM_SAME, /* Page consists the same element */ ZRAM_WB, /* page is stored on backing_device */ + ZRAM_UNDER_WB, /* page is under writeback */ ZRAM_HUGE, /* Incompressible page */ ZRAM_IDLE, /* not accessed page since last idle marking */ _ Patches 
currently in -mm which might be from minchan(a)kernel.org are zram-fix-lockdep-warning-of-free-block-handling.patch zram-fix-double-free-backing-device.patch zram-refactoring-flags-and-writeback-stuff.patch zram-introduce-zram_idle-flag.patch zram-support-idle-huge-page-writeback.patch zram-add-bd_stat-statistics.patch zram-writeback-throttle.patch
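
The idle/huge writeback sequence from the commit message is written there as a pseudo-loop. A minimal runnable sketch of one pass might look like the following. The function names and the SYSFS_ROOT variable are assumptions made for illustration and are not part of the patch; SYSFS_ROOT exists only so the script can be tried against a scratch directory instead of the real /sys. The attribute names (idle, writeback) and the accepted values (all, idle, huge) are taken from the patch itself.

```shell
#!/bin/sh
# Sketch of the idle/huge writeback cycle described in the commit message.
# SYSFS_ROOT and all function names below are illustrative assumptions.

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Mark every allocated slot of the given zram device as idle.
zram_mark_idle() {
    dev="$1"
    echo all > "$SYSFS_ROOT/block/$dev/idle"
}

# Request writeback of "idle" or "huge" slots to the backing device.
# Any other mode is rejected here before touching sysfs.
zram_writeback() {
    dev="$1"
    mode="$2"
    case "$mode" in
        idle|huge) ;;
        *) echo "unsupported writeback mode: $mode" >&2; return 1 ;;
    esac
    echo "$mode" > "$SYSFS_ROOT/block/$dev/writeback"
}

# One pass: mark slots idle, let the system run for a while so that
# accessed slots lose the idle mark, then write back idle and huge slots.
zram_writeback_cycle() {
    dev="$1"
    wait_secs="$2"
    zram_mark_idle "$dev" || return 1
    sleep "$wait_secs"
    zram_writeback "$dev" idle || return 1
    zram_writeback "$dev" huge
}
```

On a real system this would be run as root, for example "zram_writeback_cycle zram0 3600", after a backing device has been configured via /sys/block/zramX/backing_dev before the disksize is set. The sketch writes to the writeback attribute, which is the name registered by DEVICE_ATTR_WO(writeback) and listed in the ABI entry; the zram.txt hunk shows one command ending in /write, which does not match that attribute name.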
