Upon a communication error, the interrupt handler can call
tegra_i2c_disable_packet_mode(). Unless the current transaction was marked
atomic, this results in a sleeping poll, which is not allowed in interrupt
context. Fix this by making the poll happen atomically when we are in an
IRQ.
This matches the behavior prior to the commit mentioned in the Fixes tag.
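For reference, a minimal sketch of how tegra_i2c_poll_register() picks
between the sleeping and the busy-waiting poll after this change. It assumes
the atomic fallback is readl_relaxed_poll_timeout_atomic() from
<linux/iopoll.h>; details of the real driver may differ slightly:

static int tegra_i2c_poll_register(struct tegra_i2c_dev *i2c_dev,
                                   u32 reg, u32 mask, u32 delay_us,
                                   u32 timeout_us)
{
        void __iomem *addr = i2c_dev->base + tegra_i2c_reg_addr(i2c_dev, reg);
        u32 val;

        /* A sleeping poll is only valid in process context. */
        if (!i2c_dev->atomic_mode && !in_irq())
                return readl_relaxed_poll_timeout(addr, val, !(val & mask),
                                                  delay_us, timeout_us);

        /* Busy-wait for atomic transfers and for IRQ context. */
        return readl_relaxed_poll_timeout_atomic(addr, val, !(val & mask),
                                                 delay_us, timeout_us);
}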
Fixes: ede2299f7101 ("i2c: tegra: Support atomic transfers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikko Perttunen <mperttunen(a)nvidia.com>
---
v2:
* Use in_irq() instead of passing a flag from the ISR.
Thanks to Dmitry for the suggestion.
* Update commit message.
---
drivers/i2c/busses/i2c-tegra.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index 6f08c0c3238d..0727383f4940 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -533,7 +533,7 @@ static int tegra_i2c_poll_register(struct tegra_i2c_dev *i2c_dev,
void __iomem *addr = i2c_dev->base + tegra_i2c_reg_addr(i2c_dev, reg);
u32 val;
- if (!i2c_dev->atomic_mode)
+ if (!i2c_dev->atomic_mode && !in_irq())
return readl_relaxed_poll_timeout(addr, val, !(val & mask),
delay_us, timeout_us);
--
2.30.0
The patch titled
Subject: mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
has been added to the -mm tree. Its filename is
mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-remove-vm_bug_on_page-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-remove-vm_bug_on_page-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
page_huge_active() can be called from scan_movable_pages(), which does not
hold a reference count on the HugeTLB page. So when we call
page_huge_active() from scan_movable_pages(), the HugeTLB page can be freed
in parallel. Then we will trigger the VM_BUG_ON_PAGE inside
page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the
VM_BUG_ON_PAGE().
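To illustrate the calling context, here is a simplified, hypothetical
walker in the spirit of scan_movable_pages() (the real code in
mm/memory_hotplug.c differs in detail); the point is that no reference is
held while page_huge_active() runs:

static bool range_has_movable_hugetlb_page(unsigned long start_pfn,
                                           unsigned long end_pfn)
{
        unsigned long pfn;

        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
                struct page *head = compound_head(pfn_to_page(pfn));

                /*
                 * No get_page() here, so the hugetlb page can be freed (and
                 * even returned to the buddy allocator) concurrently.  The
                 * check must therefore not BUG when the page stops being a
                 * hugetlb head page.
                 */
                if (page_huge_active(head))
                        return true;
        }

        return false;
}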
Link: https://lkml.kernel.org/r/20210110124017.86750-7-songmuchun@bytedance.com
Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active
+++ a/mm/hugetlb.c
@@ -1361,8 +1361,7 @@ struct hstate *size_to_hstate(unsigned l
*/
bool page_huge_active(struct page *page)
{
- VM_BUG_ON_PAGE(!PageHuge(page), page);
- return PageHead(page) && PagePrivate(&page[1]);
+ return PageHeadHuge(page) && PagePrivate(&page[1]);
}
/* never called for tail page */
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page.patch
mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page.patch
mm-hugetlb-fix-a-race-between-isolating-and-freeing-page.patch
mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active.patch
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-migrate-do-not-migrate-hugetlb-page-whose-refcount-is-one.patch
mm-hugetlb-add-return-eagain-for-dissolve_free_huge_page.patch
The patch titled
Subject: mm: hugetlb: fix a race between isolating and freeing page
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-a-race-between-isolating-and-freeing-page.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-a-race-between-iso…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-a-race-between-iso…
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb: fix a race between isolating and freeing page
There is a race between isolate_huge_page() and __free_huge_page().
CPU0:                                   CPU1:

if (PageHuge(page))
                                        put_page(page)
                                          __free_huge_page(page)
                                            spin_lock(&hugetlb_lock)
                                            update_and_free_page(page)
                                              set_compound_page_dtor(page,
                                                NULL_COMPOUND_DTOR)
                                            spin_unlock(&hugetlb_lock)
  isolate_huge_page(page)
    // trigger BUG_ON
    VM_BUG_ON_PAGE(!PageHead(page), page)
    spin_lock(&hugetlb_lock)
    page_huge_active(page)
      // trigger BUG_ON
      VM_BUG_ON_PAGE(!PageHuge(page), page)
    spin_unlock(&hugetlb_lock)
When we isolate a HugeTLB page on CPU0 while it is concurrently freed to
the buddy allocator on CPU1, we can trigger a BUG_ON on CPU0 because the
page has already been freed to the buddy allocator.
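With this fix, all of the checks in isolate_huge_page() run under
hugetlb_lock, so the page cannot be released to the buddy allocator in the
middle of the sequence. A condensed sketch of the resulting logic (the tail
of the function beyond the diff context is reconstructed from the
surrounding code and may differ slightly):

bool isolate_huge_page(struct page *page, struct list_head *list)
{
        bool ret = true;

        spin_lock(&hugetlb_lock);
        /*
         * Under hugetlb_lock the page must still be a hugetlb head page,
         * be marked active, and have a refcount we can take.
         */
        if (!PageHeadHuge(page) || !page_huge_active(page) ||
            !get_page_unless_zero(page)) {
                ret = false;
                goto unlock;
        }
        clear_page_huge_active(page);
        list_move_tail(&page->lru, list);
unlock:
        spin_unlock(&hugetlb_lock);
        return ret;
}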
Link: https://lkml.kernel.org/r/20210110124017.86750-6-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-race-between-isolating-and-freeing-page
+++ a/mm/hugetlb.c
@@ -5581,9 +5581,9 @@ bool isolate_huge_page(struct page *page
{
bool ret = true;
- VM_BUG_ON_PAGE(!PageHead(page), page);
spin_lock(&hugetlb_lock);
- if (!page_huge_active(page) || !get_page_unless_zero(page)) {
+ if (!PageHeadHuge(page) || !page_huge_active(page) ||
+ !get_page_unless_zero(page)) {
ret = false;
goto unlock;
}
_
The patch titled
Subject: mm: hugetlb: fix a race between freeing and dissolving the page
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-a-race-between-fre…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-a-race-between-fre…
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb: fix a race between freeing and dissolving the page
There is a race condition between __free_huge_page()
and dissolve_free_huge_page().
CPU0:                                   CPU1:

// page_count(page) == 1
put_page(page)
  __free_huge_page(page)
                                        dissolve_free_huge_page(page)
                                          spin_lock(&hugetlb_lock)
                                          // PageHuge(page) && !page_count(page)
                                          update_and_free_page(page)
                                          // page is freed to the buddy
                                          spin_unlock(&hugetlb_lock)
    spin_lock(&hugetlb_lock)
    clear_page_huge_active(page)
    enqueue_huge_page(page)
    // It is wrong, the page is already freed
    spin_unlock(&hugetlb_lock)
The race window is between put_page() and dissolve_free_huge_page().
We should make sure that the page is already on the free list
when it is dissolved.
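The fix records whether a hugetlb page is actually sitting on a free list
by using the page_private field of a tail page as a flag (page[1].private
is already taken by the active flag used elsewhere in this series). The new
helpers, restated from the diff below for readability:

static inline bool PageHugeFreed(struct page *head)
{
        /* Flag stored in the private field of a tail page (head + 4). */
        return page_private(head + 4) == -1UL;
}

static inline void SetPageHugeFreed(struct page *head)
{
        set_page_private(head + 4, -1UL);
}

static inline void ClearPageHugeFreed(struct page *head)
{
        set_page_private(head + 4, 0);
}

The flag is set in enqueue_huge_page(), cleared in
dequeue_huge_page_node_exact() and prep_new_huge_page(), and checked in
dissolve_free_huge_page(), which now bails out early for a page that has
not reached the free list yet.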
Link: https://lkml.kernel.org/r/20210110124017.86750-4-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page
+++ a/mm/hugetlb.c
@@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
static int num_fault_mutexes;
struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
+static inline bool PageHugeFreed(struct page *head)
+{
+ return page_private(head + 4) == -1UL;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, -1UL);
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, 0);
+}
+
/* Forward declaration */
static int hugetlb_acct_memory(struct hstate *h, long delta);
@@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hst
list_move(&page->lru, &h->hugepage_freelists[nid]);
h->free_huge_pages++;
h->free_huge_pages_node[nid]++;
+ SetPageHugeFreed(page);
}
static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_no
list_move(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
+ ClearPageHugeFreed(page);
h->free_huge_pages--;
h->free_huge_pages_node[nid]--;
return page;
@@ -1505,6 +1522,7 @@ static void prep_new_huge_page(struct hs
spin_lock(&hugetlb_lock);
h->nr_huge_pages++;
h->nr_huge_pages_node[nid]++;
+ ClearPageHugeFreed(page);
spin_unlock(&hugetlb_lock);
}
@@ -1771,6 +1789,14 @@ int dissolve_free_huge_page(struct page
int nid = page_to_nid(head);
if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
+
+ /*
+ * We should make sure that the page is already on the free list
+ * when it is dissolved.
+ */
+ if (unlikely(!PageHugeFreed(head)))
+ goto out;
+
/*
* Move PageHWPoison flag from head page to the raw error page,
* which makes any subpages rather than the error page reusable.
_
The patch titled
Subject: mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
has been added to the -mm tree. Its filename is
mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlbfs-fix-cannot-migrate-t…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlbfs-fix-cannot-migrate-t…
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
If a new hugetlb page is allocated during fallocate(), it will not be marked
as active (set_page_huge_active), which will result in a later
isolate_huge_page() failure when the page migration code would like to move
that page. Such a failure would be unexpected and wrong.
Only export set_page_huge_active(); leave clear_page_huge_active() static,
because it has no external users.
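For context, the call site in hugetlbfs_fallocate() after this change
(condensed from the diff below): the page is marked active right after the
fault mutex is dropped and before the page lock and the extra reference
from alloc_huge_page() are released, so a later migration attempt can
isolate it:

        mutex_unlock(&hugetlb_fault_mutex_table[hash]);

        set_page_huge_active(page);
        /*
         * unlock_page() because the page was locked by add_to_page_cache();
         * put_page() due to the reference from alloc_huge_page().
         */
        unlock_page(page);
        put_page(page);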
Link: https://lkml.kernel.org/r/20210110124017.86750-3-songmuchun@bytedance.com
Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 3 ++-
include/linux/hugetlb.h | 2 ++
mm/hugetlb.c | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
--- a/fs/hugetlbfs/inode.c~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/fs/hugetlbfs/inode.c
@@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct f
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ set_page_huge_active(page);
/*
* unlock_page because locked by add_to_page_cache()
- * page_put due to reference from alloc_huge_page()
+ * put_page() due to reference from alloc_huge_page()
*/
unlock_page(page);
put_page(page);
--- a/include/linux/hugetlb.h~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/include/linux/hugetlb.h
@@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot
}
#endif
+void set_page_huge_active(struct page *page);
+
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
--- a/mm/hugetlb.c~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/mm/hugetlb.c
@@ -1349,7 +1349,7 @@ bool page_huge_active(struct page *page)
}
/* never called for tail page */
-static void set_page_huge_active(struct page *page)
+void set_page_huge_active(struct page *page)
{
VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
SetPagePrivate(&page[1]);
_
The patch titled
Subject: mm: fix initialization of struct page for holes in memory layout
has been removed from the -mm tree. Its filename was
mm-fix-initialization-of-struct-page-for-holes-in-memory-layout.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: mm: fix initialization of struct page for holes in memory layout
There could be struct pages that are not backed by actual physical memory.
This can happen when the actual memory bank is not a multiple of
SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.
Such pages are currently initialized using the init_unavailable_mem()
function, which iterates through PFNs in holes in memblock.memory and, if
there is a struct page corresponding to a PFN, sets the fields of this page
to default values and marks the page as Reserved.
init_unavailable_mem() does not take into account the zone and node the
page belongs to and sets both the zone and node links in struct page to zero.
On a system that has firmware-reserved holes in a zone above ZONE_DMA, for
instance in the configuration below:
# grep -A1 E820 /proc/iomem
7a17b000-7a216fff : Unknown E820 type
7a217000-7bffffff : System RAM
the unset zone link in struct page will trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
because the same pageblock then contains pages from both ZONE_DMA32 and
ZONE_DMA (the latter only because of the unset zone link in struct page).
Interleave the initialization of pages that correspond to holes with the
initialization of the memory map, so that zone and node information is
properly set on such pages.
[akpm(a)linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20201209214304.6812-3-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 152 +++++++++++++++++++---------------------------
1 file changed, 65 insertions(+), 87 deletions(-)
--- a/mm/page_alloc.c~mm-fix-initialization-of-struct-page-for-holes-in-memory-layout
+++ a/mm/page_alloc.c
@@ -6254,24 +6254,85 @@ static void __meminit zone_init_free_lis
}
}
-void __meminit __weak memmap_init(unsigned long size, int nid,
- unsigned long zone,
- unsigned long range_start_pfn)
+#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
+/*
+ * Only struct pages that are backed by physical memory available to the
+ * kernel are zeroed and initialized by memmap_init_zone().
+ * But, there are some struct pages that are either reserved by firmware or
+ * do not correspond to physical page frames because the actual memory bank
+ * is not a multiple of SECTION_SIZE.
+ * Fields of those struct pages may be accessed (for example page_to_pfn()
+ * on some configuration accesses page flags) so we must explicitly
+ * initialize those struct pages.
+ */
+static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
+ int zone, int node)
{
- unsigned long start_pfn, end_pfn;
+ unsigned long pfn;
+ u64 pgcnt = 0;
+
+ for (pfn = spfn; pfn < epfn; pfn++) {
+ if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
+ pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
+ + pageblock_nr_pages - 1;
+ continue;
+ }
+ __init_single_page(pfn_to_page(pfn), pfn, zone, node);
+ __SetPageReserved(pfn_to_page(pfn));
+ pgcnt++;
+ }
+
+ return pgcnt;
+}
+#else
+static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
+ int zone, int node)
+{
+ return 0;
+}
+#endif
+
+void __init __weak memmap_init(unsigned long size, int nid,
+ unsigned long zone,
+ unsigned long range_start_pfn)
+{
+ unsigned long start_pfn, end_pfn, hole_start_pfn = 0;
unsigned long range_end_pfn = range_start_pfn + size;
+ u64 pgcnt = 0;
int i;
for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
+ hole_start_pfn = clamp(hole_start_pfn, range_start_pfn,
+ range_end_pfn);
if (end_pfn > start_pfn) {
size = end_pfn - start_pfn;
memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
}
+
+ if (hole_start_pfn < start_pfn)
+ pgcnt += init_unavailable_range(hole_start_pfn,
+ start_pfn, zone, nid);
+ hole_start_pfn = end_pfn;
}
+
+ /*
+ * Early sections always have a fully populated memmap for the whole
+ * section - see pfn_valid(). If the last section has holes at the
+ * end and that section is marked "online", the memmap will be
+ * considered initialized. Make sure that memmap has a well defined
+ * state.
+ */
+ if (hole_start_pfn < range_end_pfn)
+ pgcnt += init_unavailable_range(hole_start_pfn, range_end_pfn,
+ zone, nid);
+
+ if (pgcnt)
+ pr_info("%s: Zeroed struct page in unavailable ranges: %lld\n",
+ zone_names[zone], pgcnt);
}
static int zone_batchsize(struct zone *zone)
@@ -7072,88 +7133,6 @@ void __init free_area_init_memoryless_no
free_area_init_node(nid);
}
-#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
-/*
- * Initialize all valid struct pages in the range [spfn, epfn) and mark them
- * PageReserved(). Return the number of struct pages that were initialized.
- */
-static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
-{
- unsigned long pfn;
- u64 pgcnt = 0;
-
- for (pfn = spfn; pfn < epfn; pfn++) {
- if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
- pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
- + pageblock_nr_pages - 1;
- continue;
- }
- /*
- * Use a fake node/zone (0) for now. Some of these pages
- * (in memblock.reserved but not in memblock.memory) will
- * get re-initialized via reserve_bootmem_region() later.
- */
- __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
- __SetPageReserved(pfn_to_page(pfn));
- pgcnt++;
- }
-
- return pgcnt;
-}
-
-/*
- * Only struct pages that are backed by physical memory are zeroed and
- * initialized by going through __init_single_page(). But, there are some
- * struct pages which are reserved in memblock allocator and their fields
- * may be accessed (for example page_to_pfn() on some configuration accesses
- * flags). We must explicitly initialize those struct pages.
- *
- * This function also addresses a similar issue where struct pages are left
- * uninitialized because the physical address range is not covered by
- * memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=, or when the highest physical
- * address (max_pfn) does not end on a section boundary.
- */
-static void __init init_unavailable_mem(void)
-{
- phys_addr_t start, end;
- u64 i, pgcnt;
- phys_addr_t next = 0;
-
- /*
- * Loop through unavailable ranges not covered by memblock.memory.
- */
- pgcnt = 0;
- for_each_mem_range(i, &start, &end) {
- if (next < start)
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- PFN_UP(start));
- next = end;
- }
-
- /*
- * Early sections always have a fully populated memmap for the whole
- * section - see pfn_valid(). If the last section has holes at the
- * end and that section is marked "online", the memmap will be
- * considered initialized. Make sure that memmap has a well defined
- * state.
- */
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- round_up(max_pfn, PAGES_PER_SECTION));
-
- /*
- * Struct pages that do not have backing memory. This could be because
- * firmware is using some of this memory, or for some other reasons.
- */
- if (pgcnt)
- pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
-}
-#else
-static inline void __init init_unavailable_mem(void)
-{
-}
-#endif /* !CONFIG_FLAT_NODE_MEM_MAP */
-
#if MAX_NUMNODES > 1
/*
* Figure out the number of possible node ids.
@@ -7584,7 +7563,6 @@ void __init free_area_init(unsigned long
/* Initialise every node */
mminit_verify_pageflags_layout();
setup_nr_node_ids();
- init_unavailable_mem();
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
free_area_init_node(nid);
_
Patches currently in -mm which might be from rppt(a)linux.ibm.com are
mm-add-definition-of-pmd_page_order.patch
mmap-make-mlock_future_check-global.patch
set_memory-allow-set_direct_map__noflush-for-multiple-pages.patch
set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled.patch
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas.patch
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas-fix.patch
secretmem-use-pmd-size-pages-to-amortize-direct-map-fragmentation.patch
secretmem-add-memcg-accounting.patch
pm-hibernate-disable-when-there-are-active-secretmem-users.patch
arch-mm-wire-up-memfd_secret-system-call-were-relevant.patch
secretmem-test-add-basic-selftest-for-memfd_secret2.patch
The patch titled
Subject: mm: memblock: enforce overlap of memory.memblock and memory.reserved
has been removed from the -mm tree. Its filename was
mm-memblock-enforce-overlap-of-memorymemblock-and-memoryreserved.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: mm: memblock: enforce overlap of memory.memblock and memory.reserved
Patch series "mm: fix initialization of struct page for holes in memory layout", v2.
Commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
rather that check each PFN") exposed several issues with the memory map
initialization and these patches fix those issues.
Initially there were crashes during compaction that Qian Cai reported back
in April [1]. It seemed back then that the problem was fixed, but a few
weeks ago Andrea Arcangeli hit the same bug [2], and after a long
discussion between us [3] I think these patches are the proper fix.
[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
[2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
[3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foun…
This patch (of 2):
memblock does not require that the reserved memory ranges be a subset of
memblock.memory.
As a result there may be reserved pages that are not in the range of any
zone or node, because zone and node boundaries are detected based on
memblock.memory, and pages that are only present in memblock.reserved are
not taken into account during zone/node size detection.
Make sure that all ranges in memblock.reserved are added to
memblock.memory before calculating node and zone boundaries.
Link: https://lkml.kernel.org/r/20201209214304.6812-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20201209214304.6812-2-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 24 ++++++++++++++++++++++++
mm/page_alloc.c | 7 +++++++
3 files changed, 32 insertions(+)
--- a/include/linux/memblock.h~mm-memblock-enforce-overlap-of-memorymemblock-and-memoryreserved
+++ a/include/linux/memblock.h
@@ -120,6 +120,7 @@ int memblock_clear_nomap(phys_addr_t bas
unsigned long memblock_free_all(void);
void reset_node_managed_pages(pg_data_t *pgdat);
void reset_all_zones_managed_pages(void);
+void memblock_enforce_memory_reserved_overlap(void);
/* Low level functions */
void __next_mem_range(u64 *idx, int nid, enum memblock_flags flags,
--- a/mm/memblock.c~mm-memblock-enforce-overlap-of-memorymemblock-and-memoryreserved
+++ a/mm/memblock.c
@@ -1860,6 +1860,30 @@ void __init_memblock memblock_trim_memor
}
}
+/**
+ * memblock_enforce_memory_reserved_overlap - make sure every range in
+ * @memblock.reserved is covered by @memblock.memory
+ *
+ * The data in @memblock.memory is used to detect zone and node boundaries
+ * during initialization of the memory map and the page allocator. Make
+ * sure that every memory range present in @memblock.reserved is also added
+ * to @memblock.memory even if the architecture specific memory
+ * initialization failed to do so
+ */
+void __init memblock_enforce_memory_reserved_overlap(void)
+{
+ phys_addr_t start, end;
+ int nid;
+ u64 i;
+
+ __for_each_mem_range(i, &memblock.reserved, &memblock.memory,
+ NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, &nid) {
+ pr_warn("memblock: reserved range [%pa-%pa] is not in memory\n",
+ &start, &end);
+ memblock_add_node(start, (end - start), nid);
+ }
+}
+
void __init_memblock memblock_set_current_limit(phys_addr_t limit)
{
memblock.current_limit = limit;
--- a/mm/page_alloc.c~mm-memblock-enforce-overlap-of-memorymemblock-and-memoryreserved
+++ a/mm/page_alloc.c
@@ -7513,6 +7513,13 @@ void __init free_area_init(unsigned long
memset(arch_zone_highest_possible_pfn, 0,
sizeof(arch_zone_highest_possible_pfn));
+ /*
+ * Some architectures (e.g. x86) have reserved pages outside of
+ * memblock.memory. Make sure these pages are taken into account
+ * when detecting zone and node boundaries
+ */
+ memblock_enforce_memory_reserved_overlap();
+
start_pfn = find_min_pfn_with_active_regions();
descending = arch_has_descending_max_zone_pfns();
_
Patches currently in -mm which might be from rppt(a)linux.ibm.com are
mm-fix-initialization-of-struct-page-for-holes-in-memory-layout.patch
mm-add-definition-of-pmd_page_order.patch
mmap-make-mlock_future_check-global.patch
set_memory-allow-set_direct_map__noflush-for-multiple-pages.patch
set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled.patch
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas.patch
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas-fix.patch
secretmem-use-pmd-size-pages-to-amortize-direct-map-fragmentation.patch
secretmem-add-memcg-accounting.patch
pm-hibernate-disable-when-there-are-active-secretmem-users.patch
arch-mm-wire-up-memfd_secret-system-call-were-relevant.patch
secretmem-test-add-basic-selftest-for-memfd_secret2.patch
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ae28d1aae48a1258bd09a6f707ebb4231d79a761 Mon Sep 17 00:00:00 2001
From: Fenghua Yu <fenghua.yu(a)intel.com>
Date: Thu, 17 Dec 2020 14:31:18 -0800
Subject: [PATCH] x86/resctrl: Use an IPI instead of task_work_add() to update
PQR_ASSOC MSR
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causes system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.
This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with the most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping, only the
last move is relevant, yet a work item is queued during each move.
This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3…
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.…
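The synchronous update boils down to the following pattern, condensed from
the diff below: write the new closid/rmid first and then, only if the task
is currently running, fire an IPI at its CPU so PQR_ASSOC is reloaded
there:

static void _update_task_closid_rmid(void *task)
{
        /*
         * If the task is still current on this CPU, update PQR_ASSOC;
         * otherwise the MSR is updated when the task is scheduled in.
         */
        if (task == current)
                resctrl_sched_in();
}

static void update_task_closid_rmid(struct task_struct *t)
{
        if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
                smp_call_function_single(task_cpu(t),
                                         _update_task_closid_rmid, t, 1);
        else
                _update_task_closid_rmid(t);
}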
[ bp: Massage commit message and drop the two update_task_closid_rmid()
variants. ]
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Valentin Schneider <valentin.schneider(a)arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu(a)intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Reviewed-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider(a)arm.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.16082431…
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 29ffb95b25ff..1c6f8a60ac52 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,89 +525,63 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
kfree(rdtgrp);
}
-struct task_move_callback {
- struct callback_head work;
- struct rdtgroup *rdtgrp;
-};
-
-static void move_myself(struct callback_head *head)
+static void _update_task_closid_rmid(void *task)
{
- struct task_move_callback *callback;
- struct rdtgroup *rdtgrp;
-
- callback = container_of(head, struct task_move_callback, work);
- rdtgrp = callback->rdtgrp;
-
/*
- * If resource group was deleted before this task work callback
- * was invoked, then assign the task to root group and free the
- * resource group.
+ * If the task is still current on this CPU, update PQR_ASSOC MSR.
+ * Otherwise, the MSR is updated when the task is scheduled in.
*/
- if (atomic_dec_and_test(&rdtgrp->waitcount) &&
- (rdtgrp->flags & RDT_DELETED)) {
- current->closid = 0;
- current->rmid = 0;
- rdtgroup_remove(rdtgrp);
- }
-
- if (unlikely(current->flags & PF_EXITING))
- goto out;
-
- preempt_disable();
- /* update PQR_ASSOC MSR to make resource group go into effect */
- resctrl_sched_in();
- preempt_enable();
+ if (task == current)
+ resctrl_sched_in();
+}
-out:
- kfree(callback);
+static void update_task_closid_rmid(struct task_struct *t)
+{
+ if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
+ smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
+ else
+ _update_task_closid_rmid(t);
}
static int __rdtgroup_move_task(struct task_struct *tsk,
struct rdtgroup *rdtgrp)
{
- struct task_move_callback *callback;
- int ret;
-
- callback = kzalloc(sizeof(*callback), GFP_KERNEL);
- if (!callback)
- return -ENOMEM;
- callback->work.func = move_myself;
- callback->rdtgrp = rdtgrp;
-
/*
- * Take a refcount, so rdtgrp cannot be freed before the
- * callback has been invoked.
+ * Set the task's closid/rmid before the PQR_ASSOC MSR can be
+ * updated by them.
+ *
+ * For ctrl_mon groups, move both closid and rmid.
+ * For monitor groups, can move the tasks only from
+ * their parent CTRL group.
*/
- atomic_inc(&rdtgrp->waitcount);
- ret = task_work_add(tsk, &callback->work, TWA_RESUME);
- if (ret) {
- /*
- * Task is exiting. Drop the refcount and free the callback.
- * No need to check the refcount as the group cannot be
- * deleted before the write function unlocks rdtgroup_mutex.
- */
- atomic_dec(&rdtgrp->waitcount);
- kfree(callback);
- rdt_last_cmd_puts("Task exited\n");
- } else {
- /*
- * For ctrl_mon groups move both closid and rmid.
- * For monitor groups, can move the tasks only from
- * their parent CTRL group.
- */
- if (rdtgrp->type == RDTCTRL_GROUP) {
- tsk->closid = rdtgrp->closid;
+
+ if (rdtgrp->type == RDTCTRL_GROUP) {
+ tsk->closid = rdtgrp->closid;
+ tsk->rmid = rdtgrp->mon.rmid;
+ } else if (rdtgrp->type == RDTMON_GROUP) {
+ if (rdtgrp->mon.parent->closid == tsk->closid) {
tsk->rmid = rdtgrp->mon.rmid;
- } else if (rdtgrp->type == RDTMON_GROUP) {
- if (rdtgrp->mon.parent->closid == tsk->closid) {
- tsk->rmid = rdtgrp->mon.rmid;
- } else {
- rdt_last_cmd_puts("Can't move task to different control group\n");
- ret = -EINVAL;
- }
+ } else {
+ rdt_last_cmd_puts("Can't move task to different control group\n");
+ return -EINVAL;
}
}
- return ret;
+
+ /*
+ * Ensure the task's closid and rmid are written before determining if
+ * the task is current that will decide if it will be interrupted.
+ */
+ barrier();
+
+ /*
+ * By now, the task's closid and rmid are set. If the task is current
+ * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+ * group go into effect. If the task is not current, the MSR will be
+ * updated when the task is scheduled in.
+ */
+ update_task_closid_rmid(tsk);
+
+ return 0;
}
static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ae28d1aae48a1258bd09a6f707ebb4231d79a761 Mon Sep 17 00:00:00 2001
From: Fenghua Yu <fenghua.yu(a)intel.com>
Date: Thu, 17 Dec 2020 14:31:18 -0800
Subject: [PATCH] x86/resctrl: Use an IPI instead of task_work_add() to update
PQR_ASSOC MSR
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causes system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.
This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with the most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping, only the
last move is relevant, yet a work item is queued during each move.
This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3…
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.…
[ bp: Massage commit message and drop the two update_task_closid_rmid()
variants. ]
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Valentin Schneider <valentin.schneider(a)arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu(a)intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Reviewed-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider(a)arm.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.16082431…
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 29ffb95b25ff..1c6f8a60ac52 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,89 +525,63 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
kfree(rdtgrp);
}
-struct task_move_callback {
- struct callback_head work;
- struct rdtgroup *rdtgrp;
-};
-
-static void move_myself(struct callback_head *head)
+static void _update_task_closid_rmid(void *task)
{
- struct task_move_callback *callback;
- struct rdtgroup *rdtgrp;
-
- callback = container_of(head, struct task_move_callback, work);
- rdtgrp = callback->rdtgrp;
-
/*
- * If resource group was deleted before this task work callback
- * was invoked, then assign the task to root group and free the
- * resource group.
+ * If the task is still current on this CPU, update PQR_ASSOC MSR.
+ * Otherwise, the MSR is updated when the task is scheduled in.
*/
- if (atomic_dec_and_test(&rdtgrp->waitcount) &&
- (rdtgrp->flags & RDT_DELETED)) {
- current->closid = 0;
- current->rmid = 0;
- rdtgroup_remove(rdtgrp);
- }
-
- if (unlikely(current->flags & PF_EXITING))
- goto out;
-
- preempt_disable();
- /* update PQR_ASSOC MSR to make resource group go into effect */
- resctrl_sched_in();
- preempt_enable();
+ if (task == current)
+ resctrl_sched_in();
+}
-out:
- kfree(callback);
+static void update_task_closid_rmid(struct task_struct *t)
+{
+ if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
+ smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
+ else
+ _update_task_closid_rmid(t);
}
static int __rdtgroup_move_task(struct task_struct *tsk,
struct rdtgroup *rdtgrp)
{
- struct task_move_callback *callback;
- int ret;
-
- callback = kzalloc(sizeof(*callback), GFP_KERNEL);
- if (!callback)
- return -ENOMEM;
- callback->work.func = move_myself;
- callback->rdtgrp = rdtgrp;
-
/*
- * Take a refcount, so rdtgrp cannot be freed before the
- * callback has been invoked.
+ * Set the task's closid/rmid before the PQR_ASSOC MSR can be
+ * updated by them.
+ *
+ * For ctrl_mon groups, move both closid and rmid.
+ * For monitor groups, can move the tasks only from
+ * their parent CTRL group.
*/
- atomic_inc(&rdtgrp->waitcount);
- ret = task_work_add(tsk, &callback->work, TWA_RESUME);
- if (ret) {
- /*
- * Task is exiting. Drop the refcount and free the callback.
- * No need to check the refcount as the group cannot be
- * deleted before the write function unlocks rdtgroup_mutex.
- */
- atomic_dec(&rdtgrp->waitcount);
- kfree(callback);
- rdt_last_cmd_puts("Task exited\n");
- } else {
- /*
- * For ctrl_mon groups move both closid and rmid.
- * For monitor groups, can move the tasks only from
- * their parent CTRL group.
- */
- if (rdtgrp->type == RDTCTRL_GROUP) {
- tsk->closid = rdtgrp->closid;
+
+ if (rdtgrp->type == RDTCTRL_GROUP) {
+ tsk->closid = rdtgrp->closid;
+ tsk->rmid = rdtgrp->mon.rmid;
+ } else if (rdtgrp->type == RDTMON_GROUP) {
+ if (rdtgrp->mon.parent->closid == tsk->closid) {
tsk->rmid = rdtgrp->mon.rmid;
- } else if (rdtgrp->type == RDTMON_GROUP) {
- if (rdtgrp->mon.parent->closid == tsk->closid) {
- tsk->rmid = rdtgrp->mon.rmid;
- } else {
- rdt_last_cmd_puts("Can't move task to different control group\n");
- ret = -EINVAL;
- }
+ } else {
+ rdt_last_cmd_puts("Can't move task to different control group\n");
+ return -EINVAL;
}
}
- return ret;
+
+ /*
+ * Ensure the task's closid and rmid are written before determining if
+ * the task is current that will decide if it will be interrupted.
+ */
+ barrier();
+
+ /*
+ * By now, the task's closid and rmid are set. If the task is current
+ * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+ * group go into effect. If the task is not current, the MSR will be
+ * updated when the task is scheduled in.
+ */
+ update_task_closid_rmid(tsk);
+
+ return 0;
}
static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)