This is the start of the stable review cycle for the 5.4.100 release.
There are 13 patches in this series, all of which will be posted as a
response to this one. If anyone has any issues with these being applied,
please
let me know.
Responses should be made by Wed, 24 Feb 2021 12:07:46 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.100-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.100-rc1
Matwey V. Kornilov <matwey(a)sai.msu.ru>
media: pwc: Use correct device for DMA
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: fix error handling in xen_blkbk_map()
Jan Beulich <jbeulich(a)suse.com>
xen-scsiback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-netback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: don't "handle" error by BUG()
Stefano Stabellini <stefano.stabellini(a)xilinx.com>
xen/arm: don't ignore return errors from set_phys_to_machine
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
Wang Hai <wanghai38(a)huawei.com>
net: bridge: Fix a warning when del bridge sysfs
Loic Poulain <loic.poulain(a)linaro.org>
net: qrtr: Fix port ID for control messages
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: SEV: fix double locking due to incorrect backport
-------------
Diffstat:
Makefile | 4 ++--
arch/arm/xen/p2m.c | 6 ++++--
arch/x86/kvm/svm.c | 1 -
arch/x86/xen/p2m.c | 15 +++++++--------
drivers/block/xen-blkback/blkback.c | 30 ++++++++++++++++--------------
drivers/media/usb/pwc/pwc-if.c | 22 +++++++++++++---------
drivers/net/xen-netback/netback.c | 4 +---
drivers/xen/gntdev.c | 37 ++++++++++++++++++++-----------------
drivers/xen/xen-scsiback.c | 4 ++--
include/xen/grant_table.h | 1 +
net/bridge/br.c | 5 ++++-
net/qrtr/qrtr.c | 2 +-
12 files changed, 71 insertions(+), 60 deletions(-)
Hi
Please do consider applying 234f414efd11 ("Bluetooth: btusb: Some
Qualcomm Bluetooth adapters stop working") back to the 5.10.y versions.
The issue was introduced in 5.10-rc1.
Greg, I realize this is for now only in mainline and not yet in a
released tag, so following your earlier suggestion this might as well
be delayed a bit.
In https://bugzilla.kernel.org/show_bug.cgi?id=211571,
https://bugzilla.kernel.org/show_bug.cgi?id=210681 and
https://bugs.debian.org/981005 several users indicated that they are
affected and would appreciate the fix being backported to the stable
series.
Regards,
Salvatore
While testing MST hotplug events on daisy-chained monitors, we found
that CLEAR_PAYLOAD_ID_TABLE is not broadcast and the payload ID table
is not reset. Digging in deeper, we found two parts that need to be
fixed:
1. Link_Count_Total & Link_Count_Remaining of a broadcast message are
incorrect. We should set lct=1 & lcr=6 (see the sketch after this
list).
2. The CLEAR_PAYLOAD_ID_TABLE request message is not set as a path
broadcast request message. Fix this as well.
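A minimal sketch of the intended header setup, assuming the
drm_dp_sideband_msg_hdr layout from drm_dp_mst_helper.h (illustrative
only, not the literal patch body):

#include <drm/drm_dp_mst_helper.h>

static void sketch_broadcast_hdr(struct drm_dp_sideband_msg_hdr *hdr)
{
	hdr->lct = 1;		/* Link_Count_Total of a broadcast msg */
	hdr->lcr = 6;		/* Link_Count_Remaining of a broadcast msg */
	hdr->rad[0] = 0;	/* RAD is unused when broadcasting */
	hdr->broadcast = true;
	hdr->path_msg = true;	/* CLEAR_PAYLOAD_ID_TABLE is a path msg */
}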
Changes since v1:
* Following the suggestion from Ville Syrjala: take the broadcast case
into consideration while preparing the header RAD.
Wayne Lin (2):
drm/dp_mst: Revise broadcast msg lct & lcr
drm/dp_mst: Set CLEAR_PAYLOAD_ID_TABLE as broadcast
drivers/gpu/drm/drm_dp_mst_topology.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
--
2.17.1
Commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated") introduced a change that results in a circular
lockdep when a Secure Execution guest that is configured with
crypto devices is started. The problem is that the patch moved the
setting of the guest's AP masks under the protection of the
matrix_dev->lock when the vfio_ap driver is notified that the KVM
pointer has been set. Since it is not critical that the guest's AP
masks be set or cleared under the matrix_dev->lock at notification
time, the masks will no longer be updated under that lock. The lock is
still necessary for setting/unsetting the KVM pointer, however, so
that part will remain in place.
The dependency chain for the circular lockdep resolved by this patch
is (in reverse order):
2: vfio_ap_mdev_group_notifier:  kvm->lock
                                 matrix_dev->lock

1: handle_pqap:                  matrix_dev->lock
   kvm_vcpu_ioctl:               vcpu->mutex

0: kvm_s390_cpus_to_pv:          vcpu->mutex
   kvm_vm_ioctl:                 kvm->lock
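A minimal sketch of the reordering described above, assuming the names
in drivers/s390/crypto/vfio_ap_ops.c (kvm_arch_crypto_set_masks() takes
kvm->lock internally); this is illustrative, not the literal patch body:

static void sketch_set_kvm(struct ap_matrix_mdev *matrix_mdev,
			   struct kvm *kvm)
{
	/* Setting the KVM pointer stays under matrix_dev->lock. */
	mutex_lock(&matrix_dev->lock);
	matrix_mdev->kvm = kvm;
	mutex_unlock(&matrix_dev->lock);

	/*
	 * The mask update acquires kvm->lock internally, so it must run
	 * after matrix_dev->lock is dropped; doing it under the lock
	 * recreates the matrix_dev->lock -> kvm->lock edge of chain 2.
	 */
	kvm_arch_crypto_set_masks(kvm, matrix_mdev->matrix.apm,
				  matrix_mdev->matrix.aqm,
				  matrix_mdev->matrix.adm);
}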
Please note that if checkpatch is run against this patch series, you may
get a "WARNING: Unknown commit id 'f21916ec4826', maybe rebased or not
pulled?" message. The commit 'f21916ec4826', however, is definitely
in the master branch on top of which this patch series was built, so I'm
not sure why this message is being output by checkpatch.
Change log v1 => v2:
------------------
* No longer holding the matrix_dev->lock prior to setting/clearing the
masks supplying the AP configuration to a KVM guest.
* Make all updates to the matrix mdev data used to manage AP resources
for the KVM guest in the vfio_ap_mdev_set_kvm() function instead of in
the group notifier callback.
* Check for the matrix mdev's KVM pointer in the vfio_ap_mdev_unset_kvm()
function instead of the vfio_ap_mdev_release() function.
Tony Krowiak (1):
s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
drivers/s390/crypto/vfio_ap_ops.c | 119 +++++++++++++++++++++---------
1 file changed, 84 insertions(+), 35 deletions(-)
--
2.21.1
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, compaction: make fast_isolate_freepages() stay within zone
Compaction always operates on pages from a single given zone when
isolating both pages to migrate and freepages. Pageblock boundaries are
intersected with zone boundaries to be safe in case a zone starts or ends
in the middle of a pageblock. The use of pageblock_pfn_to_page() protects
against non-contiguous pageblocks.
The functions fast_isolate_freepages() and fast_isolate_around() don't
currently protect the fast freepage isolation thoroughly enough against
these corner cases, and can result in freepage isolation operating
outside of zone boundaries:
- in fast_isolate_freepages() if we get a pfn from the first pageblock
of a zone that starts in the middle of that pageblock, 'highest' can be
a pfn outside of the zone. If we fail to isolate anything in this
function, we may then call fast_isolate_around() on a pfn outside of the
zone and there effectively do a set_pageblock_skip(page_to_pfn(highest))
which may currently hit a VM_BUG_ON() in some configurations
- fast_isolate_around() checks only the zone's end boundary, not its
beginning, nor that the pageblock is contiguous (with
pageblock_pfn_to_page()), so it's possible that we end up calling
isolate_freepages_block() on a range of pfns from two different zones
and e.g. isolating freepages under the wrong zone's lock.
This patch should fix the above issues.
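A worked example of the first case, with hypothetical pfns and
assuming pageblock_nr_pages == 512 (0x200):

	/*
	 * cc->zone->zone_start_pfn == 0x1100  (zone begins mid-pageblock)
	 * pfn == 0x1180                       (freepage found in that block)
	 *
	 * pageblock_start_pfn(pfn)           == 0x1000  -> outside the zone
	 * max(pageblock_start_pfn(pfn),
	 *     cc->zone->zone_start_pfn)      == 0x1100  -> clamped, in-zone
	 *
	 * Without the clamp, 'highest' (and later fast_isolate_around())
	 * can operate on pfns the zone does not own.
	 */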
Link: https://lkml.kernel.org/r/20210217173300.6394-1-vbabka@suse.cz
Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
--- a/mm/compaction.c~mm-compaction-make-fast_isolate_freepages-stay-within-zone
+++ a/mm/compaction.c
@@ -1284,7 +1284,7 @@ static void
fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
{
unsigned long start_pfn, end_pfn;
- struct page *page = pfn_to_page(pfn);
+ struct page *page;
/* Do not search around if there are enough pages already */
if (cc->nr_freepages >= cc->nr_migratepages)
@@ -1295,8 +1295,12 @@ fast_isolate_around(struct compact_contr
return;
/* Pageblock boundaries */
- start_pfn = pageblock_start_pfn(pfn);
- end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone)) - 1;
+ start_pfn = max(pageblock_start_pfn(pfn), cc->zone->zone_start_pfn);
+ end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone));
+
+ page = pageblock_pfn_to_page(start_pfn, end_pfn, cc->zone);
+ if (!page)
+ return;
/* Scan before */
if (start_pfn != pfn) {
@@ -1398,7 +1402,8 @@ fast_isolate_freepages(struct compact_co
pfn = page_to_pfn(freepage);
if (pfn >= highest)
- highest = pageblock_start_pfn(pfn);
+ highest = max(pageblock_start_pfn(pfn),
+ cc->zone->zone_start_pfn);
if (pfn >= low_pfn) {
cc->fast_search_fail = 0;
@@ -1468,7 +1473,8 @@ fast_isolate_freepages(struct compact_co
} else {
if (cc->direct_compaction && pfn_valid(min_pfn)) {
page = pageblock_pfn_to_page(min_pfn,
- pageblock_end_pfn(min_pfn),
+ min(pageblock_end_pfn(min_pfn),
+ zone_end_pfn(cc->zone)),
cc->zone);
cc->free_pfn = min_pfn;
}
_
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Subject: mm/vmscan: restore zone_reclaim_mode ABI
I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
sysctl. Like a good kernel developer, I also went to go update the
documentation. I noticed that the bits in the documentation didn't
match the bits in the #defines.
The VM never explicitly checks the RECLAIM_ZONE bit. The bit is,
however, implicitly checked when testing 'node_reclaim_mode==0'. The
RECLAIM_ZONE #define was removed in a cleanup. That, by itself, is
fine.
But, when the bit was removed (bit 0) the _other_ bit locations also got
changed. That's not OK because the bit values are documented to mean one
specific thing. Users surely do not expect the meaning to change from
kernel to kernel.
The end result is that if someone had a script that did:
sysctl vm.zone_reclaim_mode=1
it would have gone from enabling node reclaim for clean unmapped pages to
writing out pages during node reclaim after the commit in question.
That's not great.
Put the bits back the way they were and add a comment so something like
this is a bit harder to do again. Update the documentation to make it
clear that the first bit is ignored.
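To summarize the layout this patch restores (taken from the diff
below; bit 0 is only ever tested implicitly via node_reclaim_mode != 0):

	#define RECLAIM_ZONE	(1<<0)	/* reclaim enabled (implicit check) */
	#define RECLAIM_WRITE	(1<<1)	/* writeout pages during reclaim */
	#define RECLAIM_UNMAP	(1<<2)	/* unmap pages during reclaim */

With this, 'sysctl vm.zone_reclaim_mode=1' once again means reclaiming
clean unmapped pages only, with no writeback.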
Link: https://lkml.kernel.org/r/20210219172555.FF0CDF23@viggo.jf.intel.com
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
Reviewed-by: Ben Widawsky <ben.widawsky(a)intel.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Christoph Lameter <cl(a)linux.com>
Cc: Alex Shi <alex.shi(a)linux.alibaba.com>
Cc: Daniel Wagner <dwagner(a)suse.de>
Cc: "Tobin C. Harding" <tobin(a)kernel.org>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/admin-guide/sysctl/vm.rst | 10 +++++-----
mm/vmscan.c | 9 +++++++--
2 files changed, 12 insertions(+), 7 deletions(-)
--- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-zone_reclaim_mode-abi
+++ a/Documentation/admin-guide/sysctl/vm.rst
@@ -983,11 +983,11 @@ that benefit from having their data cach
left disabled as the caching effect is likely to be more important than
data locality.
-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction. The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
+Consider enabling one or more zone_reclaim mode bits if it's known that the
+workload is partitioned such that each partition fits within a NUMA node
+and that accessing remote memory would cause a measurable performance
+reduction. The page allocator will take additional actions before
+allocating off node pages.
Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
--- a/mm/vmscan.c~mm-vmscan-restore-zone_reclaim_mode-abi
+++ a/mm/vmscan.c
@@ -4085,8 +4085,13 @@ module_init(kswapd_init)
*/
int node_reclaim_mode __read_mostly;
-#define RECLAIM_WRITE (1<<0) /* Writeout pages during reclaim */
-#define RECLAIM_UNMAP (1<<1) /* Unmap pages during reclaim */
+/*
+ * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
+ * ABI. New bits are OK, but existing bits can never change.
+ */
+#define RECLAIM_ZONE (1<<0) /* Run shrink_inactive_list on the zone */
+#define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
+#define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */
/*
* Priority for NODE_RECLAIM. This determines the fraction of pages
_
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: fix copy_huge_page_from_user contig page struct assumption
page structs are not guaranteed to be contiguous for gigantic pages. The
routine copy_huge_page_from_user can encounter gigantic pages, yet it
assumes page structs are contiguous when copying pages from user space.
Since page structs for the target gigantic page are not contiguous, the
data copied from user space could overwrite other pages not associated
with the gigantic page and cause data corruption.
Non-contiguous page structs are generally not an issue. However, they can
exist with a specific kernel configuration and hotplug operations. For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where
the gigantic page will be allocated.
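For reference, the fix below iterates with mem_map_next(); the helper,
approximately as defined in mm/internal.h at the time, re-derives the
page struct from the pfn at section boundaries instead of assuming
contiguity:

static inline struct page *mem_map_next(struct page *iter,
					struct page *base, int offset)
{
	/* Crossing into a new max-order block: the next page struct
	 * may not be adjacent in memory, so go through the pfn. */
	if (unlikely((offset & (MAX_ORDER_NR_PAGES - 1)) == 0)) {
		unsigned long pfn = page_to_pfn(base) + offset;

		if (!pfn_valid(pfn))
			return NULL;
		return pfn_to_page(pfn);
	}
	return iter + 1;
}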
Link: https://lkml.kernel.org/r/20210217184926.33567-2-mike.kravetz@oracle.com
Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Davidlohr Bueso <dbueso(a)suse.de>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/mm/memory.c~hugetlb-fix-copy_huge_page_from_user-contig-page-struct-assumption
+++ a/mm/memory.c
@@ -5177,17 +5177,19 @@ long copy_huge_page_from_user(struct pag
void *page_kaddr;
unsigned long i, rc = 0;
unsigned long ret_val = pages_per_huge_page * PAGE_SIZE;
+ struct page *subpage = dst_page;
- for (i = 0; i < pages_per_huge_page; i++) {
+ for (i = 0; i < pages_per_huge_page;
+ i++, subpage = mem_map_next(subpage, dst_page, i)) {
if (allow_pagefault)
- page_kaddr = kmap(dst_page + i);
+ page_kaddr = kmap(subpage);
else
- page_kaddr = kmap_atomic(dst_page + i);
+ page_kaddr = kmap_atomic(subpage);
rc = copy_from_user(page_kaddr,
(const void __user *)(src + i * PAGE_SIZE),
PAGE_SIZE);
if (allow_pagefault)
- kunmap(dst_page + i);
+ kunmap(subpage);
else
kunmap_atomic(page_kaddr);
_
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: fix update_and_free_page contig page struct assumption
page structs are not guaranteed to be contiguous for gigantic pages. The
routine update_and_free_page can encounter a gigantic page, yet it assumes
page structs are contiguous when setting page flags in subpages.
If update_and_free_page encounters non-contiguous page structs, we can see
“BUG: Bad page state in process …” errors.
Non-contiguous page structs are generally not an issue. However, they can
exist with a specific kernel configuration and hotplug operations. For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where
the gigantic page will be allocated. Zi Yan outlined steps to reproduce
here [1].
[1] https://lore.kernel.org/linux-mm/16F7C58B-4D79-41C5-9B64-A1A1628F4AF2@nvidi…
Link: https://lkml.kernel.org/r/20210217184926.33567-1-mike.kravetz@oracle.com
Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Davidlohr Bueso <dbueso(a)suse.de>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~hugetlb-fix-update_and_free_page-contig-page-struct-assumption
+++ a/mm/hugetlb.c
@@ -1321,14 +1321,16 @@ static inline void destroy_compound_giga
static void update_and_free_page(struct hstate *h, struct page *page)
{
int i;
+ struct page *subpage = page;
if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
return;
h->nr_huge_pages--;
h->nr_huge_pages_node[page_to_nid(page)]--;
- for (i = 0; i < pages_per_huge_page(h); i++) {
- page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
+ for (i = 0; i < pages_per_huge_page(h);
+ i++, subpage = mem_map_next(subpage, page, i)) {
+ subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
1 << PG_referenced | 1 << PG_dirty |
1 << PG_active | 1 << PG_private |
1 << PG_writeback);
_
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: memcontrol: fix get_active_memcg return value
We use a global percpu int_active_memcg variable to store the remote memcg
when we are in the interrupt context. But get_active_memcg() always
returns current->active_memcg or root_mem_cgroup. The remote memcg (set
in the interrupt context) is ignored. This is not what we want, so fix it.
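For context, active_memcg() prefers the per-CPU remote memcg in
interrupt context (approximately as defined in mm/memcontrol.c at this
commit):

static inline struct mem_cgroup *active_memcg(void)
{
	if (in_interrupt())
		return this_cpu_read(int_active_memcg);
	else
		return current->active_memcg;
}

The buggy branch in the diff below then discarded that value by
re-reading current->active_memcg after a successful css_tryget().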
Link: https://lkml.kernel.org/r/20210223091101.42150-1-songmuchun@bytedance.com
Fixes: 37d5985c003d ("mm: kmem: prepare remote memcg charging infra for interrupt contexts")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-get_active_memcg-return-value
+++ a/mm/memcontrol.c
@@ -1061,13 +1061,9 @@ static __always_inline struct mem_cgroup
rcu_read_lock();
memcg = active_memcg();
- if (memcg) {
- /* current->active_memcg must hold a ref. */
- if (WARN_ON_ONCE(!css_tryget(&memcg->css)))
- memcg = root_mem_cgroup;
- else
- memcg = current->active_memcg;
- }
+ /* remote memcg must hold a ref. */
+ if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
+ memcg = root_mem_cgroup;
rcu_read_unlock();
return memcg;
_
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: memcontrol: fix swap undercounting in cgroup2
When pages are swapped in, the VM may retain the swap copy to avoid
repeated writes in the future. It's also retained if shared pages are
faulted back in some processes, but not in others. During that time we
have an in-memory copy of the page, as well as an on-swap copy. Cgroup1
and cgroup2 handle these overlapping lifetimes slightly differently due to
the nature of how they account memory and swap:
Cgroup1 has a unified memory+swap counter that tracks a data page
regardless of whether it's in-core or swapped out. On swapin, we transfer
the charge from the swap entry to the newly allocated swapcache page, even
though the swap entry might stick around for a while. That's why we have
a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge().
Cgroup2 tracks memory and swap as separate, independent resources and thus
has split memory and swap counters. On swapin, we charge the newly
allocated swapcache page as memory, while the swap slot in turn must
remain charged to the swap counter as long as it's allocated, too.
The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: make
swap tracking an integral part of memory control"), because it
accidentally removed the do_memsw_account() check in the branch inside
mem_cgroup_uncharge() that was supposed to tell the difference between the
charge transfer in cgroup1 and the separate counters in cgroup2.
As a result, cgroup2 currently undercounts retained swap to varying
degrees: swap slots are cached up to 50% of the configured limit or total
available swap space; partially faulted back shared pages are only limited
by physical capacity. This in turn allows cgroups to significantly
overconsume their allotted swap space.
Add the do_memsw_account() check back to fix this problem.
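For reference, do_memsw_account() is what tells the two hierarchies
apart; roughly, per mm/memcontrol.c of this era:

/* True on cgroup1 (unified memory+swap counter), false on cgroup2. */
static bool do_memsw_account(void)
{
	return !cgroup_subsys_on_dfl(memory_cgrp_subsys);
}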
Link: https://lkml.kernel.org/r/20210217153237.92484-1-songmuchun@bytedance.com
Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-swap-undercounting-in-cgroup2
+++ a/mm/memcontrol.c
@@ -6748,7 +6748,19 @@ int mem_cgroup_charge(struct page *page,
memcg_check_events(memcg, page);
local_irq_enable();
- if (PageSwapCache(page)) {
+ /*
+ * Cgroup1's unified memory+swap counter has been charged with the
+ * new swapcache page, finish the transfer by uncharging the swap
+ * slot. The swap slot would also get uncharged when it dies, but
+ * it can stick around indefinitely and we'd count the page twice
+ * the entire time.
+ *
+ * Cgroup2 has separate resource counters for memory and swap,
+ * so this is a non-issue here. Memory and swap charge lifetimes
+ * correspond 1:1 to page and swap slot lifetimes: we charge the
+ * page to memory here, and uncharge swap when the slot is freed.
+ */
+ if (do_memsw_account() && PageSwapCache(page)) {
swp_entry_t entry = { .val = page_private(page) };
/*
* The swap entry might not get freed for a long time,
_