January 2021 - Linux-stable-mirror

[PATCH] KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX

by Paolo Bonzini

VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS, which may need to be loaded outside guest mode. Therefore we cannot WARN in that case. However, that part of nested_get_vmcs12_pages is _not_ needed at vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling, so that both vmentry and migration (and in the latter case, independent of is_guest_mode) do the parts that are needed. Cc: <stable(a)vger.kernel.org> # 5.10.x: f2c7ef3ba: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES Cc: <stable(a)vger.kernel.org> # 5.10.x Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> --- arch/x86/kvm/svm/nested.c | 3 +++ arch/x86/kvm/vmx/nested.c | 32 ++++++++++++++++++++++++++------ arch/x86/kvm/x86.c | 4 +--- 3 files changed, 30 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index cb4c6ee10029..7a605ad8254d 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -200,6 +200,9 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); + if (WARN_ON(!is_guest_mode(vcpu))) + return true; + if (!nested_svm_vmrun_msrpm(svm)) { vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; vcpu->run->internal.suberror = diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 0fbb46990dfc..20ab40a2ac34 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -3124,13 +3124,9 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu) return 0; } -static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) +static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu) { - struct vmcs12 *vmcs12 = get_vmcs12(vcpu); struct vcpu_vmx *vmx = to_vmx(vcpu); - struct kvm_host_map *map; - struct page *page; - u64 hpa; /* * hv_evmcs may end up being not mapped after migration (when @@ -3152,6 +3148,19 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) return false; } } + return true; +} + +static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) +{ + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); + struct vcpu_vmx *vmx = to_vmx(vcpu); + struct kvm_host_map *map; + struct page *page; + u64 hpa; + + if (!nested_get_evmcs_page(vcpu)) + return false; if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) { /* @@ -3224,6 +3233,17 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) return true; } +static bool vmx_get_nested_state_pages(struct kvm_vcpu *vcpu) +{ + if (!nested_get_evmcs_page(vcpu)) + return false; + + if (is_guest_mode(vcpu) && !nested_get_vmcs12_pages(vcpu)) + return false; + + return true; +} + static int nested_vmx_write_pml_buffer(struct kvm_vcpu *vcpu, gpa_t gpa) { struct vmcs12 *vmcs12; @@ -6602,7 +6622,7 @@ struct kvm_x86_nested_ops vmx_nested_ops = { .hv_timer_pending = nested_vmx_preemption_timer_pending, .get_state = vmx_get_nested_state, .set_state = vmx_set_nested_state, - .get_nested_state_pages = nested_get_vmcs12_pages, + .get_nested_state_pages = vmx_get_nested_state_pages, .write_log_dirty = nested_vmx_write_pml_buffer, .enable_evmcs = nested_enable_evmcs, .get_evmcs_version = nested_get_evmcs_version, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9a8969a6dd06..b910aa74ee05 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8802,9 +8802,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (kvm_request_pending(vcpu)) { if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) { - if (WARN_ON_ONCE(!is_guest_mode(vcpu))) - ; - else if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) { + if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) { r = 0; goto out; } -- 2.26.2

4 years, 10 months

2
4
0 0

[PATCH] drm/i915/gt: Program mocs:63 for cache eviction on gen9

by Chris Wilson

Ville noticed that the last mocs entry is used unconditionally by the HW when it performs cache evictions, and noted that while the value is not meant to be writable by the driver, we should program it to a reasonable value nevertheless. As it turns out, we can change the value of mocs:63 and the value we were programming into it would cause hard hangs in conjunction with atomic operations. Suggested-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2707 Fixes: 3bbaba0ceaa2 ("drm/i915: Added Programming of the MOCS") Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com> Cc: Jason Ekstrand <jason(a)jlekstrand.net> Cc: <stable(a)vger.kernel.org> # v4.3+ --- drivers/gpu/drm/i915/gt/intel_mocs.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c index 254873e1646e..6ae512847f64 100644 --- a/drivers/gpu/drm/i915/gt/intel_mocs.c +++ b/drivers/gpu/drm/i915/gt/intel_mocs.c @@ -131,7 +131,10 @@ static const struct drm_i915_mocs_entry skl_mocs_table[] = { GEN9_MOCS_ENTRIES, MOCS_ENTRY(I915_MOCS_CACHED, LE_3_WB | LE_TC_2_LLC_ELLC | LE_LRUM(3), - L3_3_WB) + L3_3_WB), + MOCS_ENTRY(63, + LE_3_WB | LE_TC_1_LLC | LE_LRUM(3), + L3_1_UC) }; /* NOTE: the LE_TGT_CACHE is not used on Broxton */ -- 2.20.1

4 years, 10 months

3
4
0 0

[merged] proc_sysctl-fix-oops-caused-by-incorrect-command-parameters.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: proc_sysctl: fix oops caused by incorrect command parameters has been removed from the -mm tree. Its filename was proc_sysctl-fix-oops-caused-by-incorrect-command-parameters.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Xiaoming Ni <nixiaoming(a)huawei.com> Subject: proc_sysctl: fix oops caused by incorrect command parameters The process_sysctl_arg() does not check whether val is empty before invoking strlen(val). If the command line parameter () is incorrectly configured and val is empty, oops is triggered. For example: "hung_task_panic=1" is incorrectly written as "hung_task_panic", oops is triggered. The call stack is as follows: Kernel command line: .... hung_task_panic ...... Call trace: __pi_strlen+0x10/0x98 parse_args+0x278/0x344 do_sysctl_args+0x8c/0xfc kernel_init+0x5c/0xf4 ret_from_fork+0x10/0x30 To fix it, check whether "val" is empty when "phram" is a sysctl field. Error codes are returned in the failure branch, and error logs are generated by parse_args(). Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com Fixes: 3db978d480e2843 ("kernel/sysctl: support setting sysctl parameters from kernel command line") Signed-off-by: Xiaoming Ni <nixiaoming(a)huawei.com> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Luis Chamberlain <mcgrof(a)kernel.org> Cc: Kees Cook <keescook(a)chromium.org> Cc: Iurii Zaikin <yzaikin(a)google.com> Cc: Alexey Dobriyan <adobriyan(a)gmail.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: Heiner Kallweit <hkallweit1(a)gmail.com> Cc: Randy Dunlap <rdunlap(a)infradead.org> Cc: <stable(a)vger.kernel.org> [5.8+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/proc/proc_sysctl.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) --- a/fs/proc/proc_sysctl.c~proc_sysctl-fix-oops-caused-by-incorrect-command-parameters +++ a/fs/proc/proc_sysctl.c @@ -1770,6 +1770,12 @@ static int process_sysctl_arg(char *para return 0; } + if (!val) + return -EINVAL; + len = strlen(val); + if (len == 0) + return -EINVAL; + /* * To set sysctl options, we use a temporary mount of proc, look up the * respective sys/ file and write to it. To avoid mounting it when no @@ -1811,7 +1817,6 @@ static int process_sysctl_arg(char *para file, param, val); goto out; } - len = strlen(val); wret = kernel_write(file, val, len, &pos); if (wret < 0) { err = wret; _ Patches currently in -mm which might be from nixiaoming(a)huawei.com are

4 years, 10 months

1
0
0 0

[merged] mm-fix-page-reference-leak-in-soft_offline_page.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: mm: fix page reference leak in soft_offline_page() has been removed from the -mm tree. Its filename was mm-fix-page-reference-leak-in-soft_offline_page.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Dan Williams <dan.j.williams(a)intel.com> Subject: mm: fix page reference leak in soft_offline_page() The conversion to move pfn_to_online_page() internal to soft_offline_page() missed that the get_user_pages() reference taken by the madvise() path needs to be dropped when pfn_to_online_page() fails. Note the direct sysfs-path to soft_offline_page() does not perform a get_user_pages() lookup. When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page() pfn the kernel hangs at dax-device shutdown due to a leaked reference. Link: https://lkml.kernel.org/r/161058501210.1840162.8108917599181157327.stgit@dw… Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn") Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Reviewed-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Oscar Salvador <osalvador(a)suse.de> Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Qian Cai <cai(a)lca.pw> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory-failure.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) --- a/mm/memory-failure.c~mm-fix-page-reference-leak-in-soft_offline_page +++ a/mm/memory-failure.c @@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct return rc; } +static void put_ref_page(struct page *page) +{ + if (page) + put_page(page); +} + /** * soft_offline_page - Soft offline a page. * @pfn: pfn to soft-offline @@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct int soft_offline_page(unsigned long pfn, int flags) { int ret; - struct page *page; bool try_again = true; + struct page *page, *ref_page = NULL; + + WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED)); if (!pfn_valid(pfn)) return -ENXIO; + if (flags & MF_COUNT_INCREASED) + ref_page = pfn_to_page(pfn); + /* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */ page = pfn_to_online_page(pfn); - if (!page) + if (!page) { + put_ref_page(ref_page); return -EIO; + } if (PageHWPoison(page)) { pr_info("%s: %#lx page already poisoned\n", __func__, pfn); - if (flags & MF_COUNT_INCREASED) - put_page(page); + put_ref_page(ref_page); return 0; } _ Patches currently in -mm which might be from dan.j.williams(a)intel.com are mm-move-pfn_to_online_page-out-of-line.patch mm-teach-pfn_to_online_page-to-consider-subsection-validity.patch mm-teach-pfn_to_online_page-about-zone_device-section-collisions.patch mm-teach-pfn_to_online_page-about-zone_device-section-collisions-fix.patch mm-fix-memory_failure-handling-of-dax-namespace-metadata.patch

4 years, 10 months

1
0
0 0

[merged] mm-fix-numa-stats-for-thp-migration.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: mm: fix numa stats for thp migration has been removed from the -mm tree. Its filename was mm-fix-numa-stats-for-thp-migration.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Shakeel Butt <shakeelb(a)google.com> Subject: mm: fix numa stats for thp migration Currently the kernel is not correctly updating the numa stats for NR_FILE_PAGES and NR_SHMEM on THP migration. Fix that. For NR_FILE_DIRTY and NR_ZONE_WRITE_PENDING, although at the moment there is no need to handle THP migration as kernel still does not have write support for file THP but to be more future proof, this patch adds the THP support for those stats as well. Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com Fixes: e71769ae52609 ("mm: enable thp migration for shmem thp") Signed-off-by: Shakeel Butt <shakeelb(a)google.com> Acked-by: Yang Shi <shy828301(a)gmail.com> Reviewed-by: Roman Gushchin <guro(a)fb.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Michal Hocko <mhocko(a)kernel.org> Cc: Muchun Song <songmuchun(a)bytedance.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/migrate.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) --- a/mm/migrate.c~mm-fix-numa-stats-for-thp-migration +++ a/mm/migrate.c @@ -402,6 +402,7 @@ int migrate_page_move_mapping(struct add struct zone *oldzone, *newzone; int dirty; int expected_count = expected_page_refs(mapping, page) + extra_count; + int nr = thp_nr_pages(page); if (!mapping) { /* Anonymous page without mapping */ @@ -437,7 +438,7 @@ int migrate_page_move_mapping(struct add */ newpage->index = page->index; newpage->mapping = page->mapping; - page_ref_add(newpage, thp_nr_pages(page)); /* add cache reference */ + page_ref_add(newpage, nr); /* add cache reference */ if (PageSwapBacked(page)) { __SetPageSwapBacked(newpage); if (PageSwapCache(page)) { @@ -459,7 +460,7 @@ int migrate_page_move_mapping(struct add if (PageTransHuge(page)) { int i; - for (i = 1; i < HPAGE_PMD_NR; i++) { + for (i = 1; i < nr; i++) { xas_next(&xas); xas_store(&xas, newpage); } @@ -470,7 +471,7 @@ int migrate_page_move_mapping(struct add * to one less reference. * We know this isn't the last reference. */ - page_ref_unfreeze(page, expected_count - thp_nr_pages(page)); + page_ref_unfreeze(page, expected_count - nr); xas_unlock(&xas); /* Leave irq disabled to prevent preemption while updating stats */ @@ -493,17 +494,17 @@ int migrate_page_move_mapping(struct add old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat); new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat); - __dec_lruvec_state(old_lruvec, NR_FILE_PAGES); - __inc_lruvec_state(new_lruvec, NR_FILE_PAGES); + __mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr); + __mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr); if (PageSwapBacked(page) && !PageSwapCache(page)) { - __dec_lruvec_state(old_lruvec, NR_SHMEM); - __inc_lruvec_state(new_lruvec, NR_SHMEM); + __mod_lruvec_state(old_lruvec, NR_SHMEM, -nr); + __mod_lruvec_state(new_lruvec, NR_SHMEM, nr); } if (dirty && mapping_can_writeback(mapping)) { - __dec_lruvec_state(old_lruvec, NR_FILE_DIRTY); - __dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING); - __inc_lruvec_state(new_lruvec, NR_FILE_DIRTY); - __inc_zone_state(newzone, NR_ZONE_WRITE_PENDING); + __mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr); + __mod_zone_page_state(oldzone, NR_ZONE_WRITE_PENDING, -nr); + __mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr); + __mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr); } } local_irq_enable(); _ Patches currently in -mm which might be from shakeelb(a)google.com are mm-memcg-add-swapcache-stat-for-memcg-v2.patch

4 years, 10 months

1
0
0 0

[merged] mm-memcg-fix-memcg-file_dirty-numa-stat.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: mm: memcg: fix memcg file_dirty numa stat has been removed from the -mm tree. Its filename was mm-memcg-fix-memcg-file_dirty-numa-stat.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Shakeel Butt <shakeelb(a)google.com> Subject: mm: memcg: fix memcg file_dirty numa stat The kernel updates the per-node NR_FILE_DIRTY stats on page migration but not the memcg numa stats. That was not an issue until recently the commit 5f9a4f4a7096 ("mm: memcontrol: add the missing numa_stat interface for cgroup v2") exposed numa stats for the memcg. So fix the file_dirty per-memcg numa stat. Link: https://lkml.kernel.org/r/20210108155813.2914586-1-shakeelb@google.com Fixes: 5f9a4f4a7096 ("mm: memcontrol: add the missing numa_stat interface for cgroup v2") Signed-off-by: Shakeel Butt <shakeelb(a)google.com> Reviewed-by: Muchun Song <songmuchun(a)bytedance.com> Acked-by: Yang Shi <shy828301(a)gmail.com> Reviewed-by: Roman Gushchin <guro(a)fb.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Michal Hocko <mhocko(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/migrate.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/mm/migrate.c~mm-memcg-fix-memcg-file_dirty-numa-stat +++ a/mm/migrate.c @@ -500,9 +500,9 @@ int migrate_page_move_mapping(struct add __inc_lruvec_state(new_lruvec, NR_SHMEM); } if (dirty && mapping_can_writeback(mapping)) { - __dec_node_state(oldzone->zone_pgdat, NR_FILE_DIRTY); + __dec_lruvec_state(old_lruvec, NR_FILE_DIRTY); __dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING); - __inc_node_state(newzone->zone_pgdat, NR_FILE_DIRTY); + __inc_lruvec_state(new_lruvec, NR_FILE_DIRTY); __inc_zone_state(newzone, NR_ZONE_WRITE_PENDING); } } _ Patches currently in -mm which might be from shakeelb(a)google.com are mm-memcg-add-swapcache-stat-for-memcg-v2.patch

4 years, 10 months

1
0
0 0

[merged] mm-memcg-slab-optimize-objcg-stock-draining.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: mm: memcg/slab: optimize objcg stock draining has been removed from the -mm tree. Its filename was mm-memcg-slab-optimize-objcg-stock-draining.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Roman Gushchin <guro(a)fb.com> Subject: mm: memcg/slab: optimize objcg stock draining Imran Khan reported a 16% regression in hackbench results caused by the commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages"). The regression is noticeable in the case of a consequent allocation of several relatively large slab objects, e.g. skb's. As soon as the amount of stocked bytes exceeds PAGE_SIZE, drain_obj_stock() and __memcg_kmem_uncharge() are called, and it leads to a number of atomic operations in page_counter_uncharge(). The corresponding call graph is below (provided by Imran Khan): |__alloc_skb | | | |__kmalloc_reserve.isra.61 | | | | | |__kmalloc_node_track_caller | | | | | | | |slab_pre_alloc_hook.constprop.88 | | | obj_cgroup_charge | | | | | | | | | |__memcg_kmem_charge | | | | | | | | | | | |page_counter_try_charge | | | | | | | | | |refill_obj_stock | | | | | | | | | | | |drain_obj_stock.isra.68 | | | | | | | | | | | | | |__memcg_kmem_uncharge | | | | | | | | | | | | | | | |page_counter_uncharge | | | | | | | | | | | | | | | | | |page_counter_cancel | | | | | | | | | | | |__slab_alloc | | | | | | | | | |___slab_alloc | | | | | | | | |slab_post_alloc_hook Instead of directly uncharging the accounted kernel memory, it's possible to refill the generic page-sized per-cpu stock instead. It's a much faster operation, especially on a default hierarchy. As a bonus, __memcg_kmem_uncharge_page() will also get faster, so the freeing of page-sized kernel allocations (e.g. large kmallocs) will become faster. A similar change has been done earlier for the socket memory by the commit 475d0487a2ad ("mm: memcontrol: use per-cpu stocks for socket memory uncharging"). Link: https://lkml.kernel.org/r/20210106042239.2860107-1-guro@fb.com Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages") Signed-off-by: Roman Gushchin <guro(a)fb.com> Reported-by: Imran Khan <imran.f.khan(a)oracle.com> Tested-by: Imran Khan <imran.f.khan(a)oracle.com> Reviewed-by: Shakeel Butt <shakeelb(a)google.com> Reviewed-by: Michal Koutn <mkoutny(a)suse.com> Cc: Michal Koutný <mkoutny(a)suse.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memcontrol.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) --- a/mm/memcontrol.c~mm-memcg-slab-optimize-objcg-stock-draining +++ a/mm/memcontrol.c @@ -3115,9 +3115,7 @@ void __memcg_kmem_uncharge(struct mem_cg if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) page_counter_uncharge(&memcg->kmem, nr_pages); - page_counter_uncharge(&memcg->memory, nr_pages); - if (do_memsw_account()) - page_counter_uncharge(&memcg->memsw, nr_pages); + refill_stock(memcg, nr_pages); } /** _ Patches currently in -mm which might be from guro(a)fb.com are mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch mm-kmem-make-__memcg_kmem_uncharge-static.patch mm-cma-allocate-cma-areas-bottom-up.patch mm-cma-allocate-cma-areas-bottom-up-fix.patch mm-cma-allocate-cma-areas-bottom-up-fix-2.patch mm-cma-allocate-cma-areas-bottom-up-fix-3.patch memblock-do-not-start-bottom-up-allocations-with-kernel_end.patch mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings.patch mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix.patch

4 years, 10 months

1
0
0 0

[merged] mm-fix-initialization-of-struct-page-for-holes-in-memory-layout.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: mm: fix initialization of struct page for holes in memory layout has been removed from the -mm tree. Its filename was mm-fix-initialization-of-struct-page-for-holes-in-memory-layout.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Mike Rapoport <rppt(a)linux.ibm.com> Subject: mm: fix initialization of struct page for holes in memory layout There could be struct pages that are not backed by actual physical memory. This can happen when the actual memory bank is not a multiple of SECTION_SIZE or when an architecture does not register memory holes reserved by the firmware as memblock.memory. Such pages are currently initialized using init_unavailable_mem() function that iterates through PFNs in holes in memblock.memory and if there is a struct page corresponding to a PFN, the fields if this page are set to default values and the page is marked as Reserved. init_unavailable_mem() does not take into account zone and node the page belongs to and sets both zone and node links in struct page to zero. On a system that has firmware reserved holes in a zone above ZONE_DMA, for instance in a configuration below: # grep -A1 E820 /proc/iomem 7a17b000-7a216fff : Unknown E820 type 7a217000-7bffffff : System RAM unset zone link in struct page will trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link in struct page) in the same pageblock. Update init_unavailable_mem() to use zone constraints defined by an architecture to properly setup the zone link and use node ID of the adjacent range in memblock.memory to set the node link. Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com> Reported-by: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Baoquan He <bhe(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: David Hildenbrand <david(a)redhat.com> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Mel Gorman <mgorman(a)suse.de> Cc: Michal Hocko <mhocko(a)kernel.org> Cc: Qian Cai <cai(a)lca.pw> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/page_alloc.c | 84 +++++++++++++++++++++++++++------------------- 1 file changed, 50 insertions(+), 34 deletions(-) --- a/mm/page_alloc.c~mm-fix-initialization-of-struct-page-for-holes-in-memory-layout +++ a/mm/page_alloc.c @@ -7078,23 +7078,26 @@ void __init free_area_init_memoryless_no * Initialize all valid struct pages in the range [spfn, epfn) and mark them * PageReserved(). Return the number of struct pages that were initialized. */ -static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn) +static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn, + int zone, int nid) { - unsigned long pfn; + unsigned long pfn, zone_spfn, zone_epfn; u64 pgcnt = 0; + zone_spfn = arch_zone_lowest_possible_pfn[zone]; + zone_epfn = arch_zone_highest_possible_pfn[zone]; + + spfn = clamp(spfn, zone_spfn, zone_epfn); + epfn = clamp(epfn, zone_spfn, zone_epfn); + for (pfn = spfn; pfn < epfn; pfn++) { if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) { pfn = ALIGN_DOWN(pfn, pageblock_nr_pages) + pageblock_nr_pages - 1; continue; } - /* - * Use a fake node/zone (0) for now. Some of these pages - * (in memblock.reserved but not in memblock.memory) will - * get re-initialized via reserve_bootmem_region() later. - */ - __init_single_page(pfn_to_page(pfn), pfn, 0, 0); + + __init_single_page(pfn_to_page(pfn), pfn, zone, nid); __SetPageReserved(pfn_to_page(pfn)); pgcnt++; } @@ -7103,51 +7106,64 @@ static u64 __init init_unavailable_range } /* - * Only struct pages that are backed by physical memory are zeroed and - * initialized by going through __init_single_page(). But, there are some - * struct pages which are reserved in memblock allocator and their fields - * may be accessed (for example page_to_pfn() on some configuration accesses - * flags). We must explicitly initialize those struct pages. + * Only struct pages that correspond to ranges defined by memblock.memory + * are zeroed and initialized by going through __init_single_page() during + * memmap_init(). * - * This function also addresses a similar issue where struct pages are left - * uninitialized because the physical address range is not covered by - * memblock.memory or memblock.reserved. That could happen when memblock - * layout is manually configured via memmap=, or when the highest physical - * address (max_pfn) does not end on a section boundary. + * But, there could be struct pages that correspond to holes in + * memblock.memory. This can happen because of the following reasons: + * - phyiscal memory bank size is not necessarily the exact multiple of the + * arbitrary section size + * - early reserved memory may not be listed in memblock.memory + * - memory layouts defined with memmap= kernel parameter may not align + * nicely with memmap sections + * + * Explicitly initialize those struct pages so that: + * - PG_Reserved is set + * - zone link is set accorging to the architecture constrains + * - node is set to node id of the next populated region except for the + * trailing hole where last node id is used */ -static void __init init_unavailable_mem(void) +static void __init init_zone_unavailable_mem(int zone) { - phys_addr_t start, end; - u64 i, pgcnt; - phys_addr_t next = 0; + unsigned long start, end; + int i, nid; + u64 pgcnt; + unsigned long next = 0; /* - * Loop through unavailable ranges not covered by memblock.memory. + * Loop through holes in memblock.memory and initialize struct + * pages corresponding to these holes */ pgcnt = 0; - for_each_mem_range(i, &start, &end) { + for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) { if (next < start) - pgcnt += init_unavailable_range(PFN_DOWN(next), - PFN_UP(start)); + pgcnt += init_unavailable_range(next, start, zone, nid); next = end; } /* - * Early sections always have a fully populated memmap for the whole - * section - see pfn_valid(). If the last section has holes at the - * end and that section is marked "online", the memmap will be - * considered initialized. Make sure that memmap has a well defined - * state. + * Last section may surpass the actual end of memory (e.g. we can + * have 1Gb section and 512Mb of RAM pouplated). + * Make sure that memmap has a well defined state in this case. */ - pgcnt += init_unavailable_range(PFN_DOWN(next), - round_up(max_pfn, PAGES_PER_SECTION)); + end = round_up(max_pfn, PAGES_PER_SECTION); + pgcnt += init_unavailable_range(next, end, zone, nid); /* * Struct pages that do not have backing memory. This could be because * firmware is using some of this memory, or for some other reasons. */ if (pgcnt) - pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt); + pr_info("Zone %s: zeroed struct page in unavailable ranges: %lld pages", zone_names[zone], pgcnt); +} + +static void __init init_unavailable_mem(void) +{ + int zone; + + for (zone = 0; zone < ZONE_MOVABLE; zone++) + init_zone_unavailable_mem(zone); } #else static inline void __init init_unavailable_mem(void) _ Patches currently in -mm which might be from rppt(a)linux.ibm.com are mm-add-definition-of-pmd_page_order.patch mmap-make-mlock_future_check-global.patch riscv-kconfig-make-direct-map-manipulation-options-depend-on-mmu.patch set_memory-allow-set_direct_map__noflush-for-multiple-pages.patch set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled.patch mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas.patch secretmem-use-pmd-size-pages-to-amortize-direct-map-fragmentation.patch secretmem-add-memcg-accounting.patch pm-hibernate-disable-when-there-are-active-secretmem-users.patch arch-mm-wire-up-memfd_secret-system-call-where-relevant.patch secretmem-test-add-basic-selftest-for-memfd_secret2.patch

4 years, 10 months

1
0
0 0

[merged] x86-setup-dont-remove-e820_type_ram-for-pfn-0.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: x86/setup: don't remove E820_TYPE_RAM for pfn 0 has been removed from the -mm tree. Its filename was x86-setup-dont-remove-e820_type_ram-for-pfn-0.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Mike Rapoport <rppt(a)linux.ibm.com> Subject: x86/setup: don't remove E820_TYPE_RAM for pfn 0 Patch series "mm: fix initialization of struct page for holes in memory layout", v3. Commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN") exposed several issues with the memory map initialization and these patches fix those issues. Initially there were crashes during compaction that Qian Cai reported back in April [1]. It seemed back then that the problem was fixed, but a few weeks ago Andrea Arcangeli hit the same bug [2] and there was an additional discussion at [3]. [1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw [2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com [3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foun… This patch (of 2): The first 4Kb of memory is a BIOS owned area and to avoid its allocation for the kernel it was not listed in e820 tables as memory. As the result, pfn 0 was never recognised by the generic memory management and it is not a part of neither node 0 nor ZONE_DMA. If set_pfnblock_flags_mask() would be ever called for the pageblock corresponding to the first 2Mbytes of memory, having pfn 0 outside of ZONE_DMA would trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page); Along with reserving the first 4Kb in e820 tables, several first pages are reserved with memblock in several places during setup_arch(). These reservations are enough to ensure the kernel does not touch the BIOS area and it is not necessary to remove E820_TYPE_RAM for pfn 0. Remove the update of e820 table that changes the type of pfn 0 and move the comment describing why it was done to trim_low_memory_range() that reserves the beginning of the memory. Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com> Cc: Baoquan He <bhe(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: David Hildenbrand <david(a)redhat.com> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Mel Gorman <mgorman(a)suse.de> Cc: Michal Hocko <mhocko(a)kernel.org> Cc: Qian Cai <cai(a)lca.pw> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- arch/x86/kernel/setup.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) --- a/arch/x86/kernel/setup.c~x86-setup-dont-remove-e820_type_ram-for-pfn-0 +++ a/arch/x86/kernel/setup.c @@ -661,17 +661,6 @@ static void __init trim_platform_memory_ static void __init trim_bios_range(void) { /* - * A special case is the first 4Kb of memory; - * This is a BIOS owned area, not kernel ram, but generally - * not listed as such in the E820 table. - * - * This typically reserves additional memory (64KiB by default) - * since some BIOSes are known to corrupt low memory. See the - * Kconfig help text for X86_RESERVE_LOW. - */ - e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED); - - /* * special case: Some BIOSes report the PC BIOS * area (640Kb -> 1Mb) as RAM even though it is not. * take them out. @@ -728,6 +717,15 @@ early_param("reservelow", parse_reservel static void __init trim_low_memory_range(void) { + /* + * A special case is the first 4Kb of memory; + * This is a BIOS owned area, not kernel ram, but generally + * not listed as such in the E820 table. + * + * This typically reserves additional memory (64KiB by default) + * since some BIOSes are known to corrupt low memory. See the + * Kconfig help text for X86_RESERVE_LOW. + */ memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE)); } _ Patches currently in -mm which might be from rppt(a)linux.ibm.com are mm-add-definition-of-pmd_page_order.patch mmap-make-mlock_future_check-global.patch riscv-kconfig-make-direct-map-manipulation-options-depend-on-mmu.patch set_memory-allow-set_direct_map__noflush-for-multiple-pages.patch set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled.patch mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas.patch secretmem-use-pmd-size-pages-to-amortize-direct-map-fragmentation.patch secretmem-add-memcg-accounting.patch pm-hibernate-disable-when-there-are-active-secretmem-users.patch arch-mm-wire-up-memfd_secret-system-call-where-relevant.patch secretmem-test-add-basic-selftest-for-memfd_secret2.patch

4 years, 10 months

1
0
0 0

[PATCH 4.14 0/3] backport lazytime fix to 4.14

by Eric Biggers

Backport a lazytime fix from upstream to 4.14-stable. To make it cherry-pick cleanly, I first cherry-picked two other commits. Commit #2 had a conflict due to it making a change to fs/xfs/ which isn't applicable to 4.14. I dropped that part of the change. Christoph Hellwig (1): fs: move I_DIRTY_INODE to fs.h Eric Biggers (1): fs: fix lazytime expiration handling in __writeback_single_inode() Jan Kara (1): writeback: Drop I_DIRTY_TIME_EXPIRE fs/ext4/inode.c | 6 ++--- fs/fs-writeback.c | 43 +++++++++++++------------------- fs/gfs2/super.c | 2 +- include/linux/fs.h | 4 +-- include/trace/events/writeback.h | 1 - 5 files changed, 24 insertions(+), 32 deletions(-) -- 2.30.0

4 years, 10 months

1
3
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror January 2021