The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5abfd71d936a8aefd9f9ccd299dea7a164a5d455 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Tue, 22 Mar 2022 14:42:15 -0700
Subject: [PATCH] mm: don't skip swap entry even if zap_details specified
Patch series "mm: Rework zap ptes on swap entries", v5.
Patch 1 should fix a long standing bug for zap_pte_range() on
zap_details usage. The risk is we could have some swap entries skipped
while we should have zapped them.
Migration entries are not the major concern because file backed memory
always zap in the pattern that "first time without page lock, then
re-zap with page lock" hence the 2nd zap will always make sure all
migration entries are already recovered.
However there can be issues with real swap entries got skipped
errornoously. There's a reproducer provided in commit message of patch
1 for that.
Patch 2-4 are cleanups that are based on patch 1. After the whole
patchset applied, we should have a very clean view of zap_pte_range().
Only patch 1 needs to be backported to stable if necessary.
This patch (of 4):
The "details" pointer shouldn't be the token to decide whether we should
skip swap entries.
For example, when the callers specified details->zap_mapping==NULL, it
means the user wants to zap all the pages (including COWed pages), then
we need to look into swap entries because there can be private COWed
pages that was swapped out.
Skipping some swap entries when details is non-NULL may lead to wrongly
leaving some of the swap entries while we should have zapped them.
A reproducer of the problem:
===8<===
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
int page_size;
int shmem_fd;
char *buffer;
void main(void)
{
int ret;
char val;
page_size = getpagesize();
shmem_fd = memfd_create("test", 0);
assert(shmem_fd >= 0);
ret = ftruncate(shmem_fd, page_size * 2);
assert(ret == 0);
buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
MAP_PRIVATE, shmem_fd, 0);
assert(buffer != MAP_FAILED);
/* Write private page, swap it out */
buffer[page_size] = 1;
madvise(buffer, page_size * 2, MADV_PAGEOUT);
/* This should drop private buffer[page_size] already */
ret = ftruncate(shmem_fd, page_size);
assert(ret == 0);
/* Recover the size */
ret = ftruncate(shmem_fd, page_size * 2);
assert(ret == 0);
/* Re-read the data, it should be all zero */
val = buffer[page_size];
if (val == 0)
printf("Good\n");
else
printf("BUG\n");
}
===8<===
We don't need to touch up the pmd path, because pmd never had a issue with
swap entries. For example, shmem pmd migration will always be split into
pte level, and same to swapping on anonymous.
Add another helper should_zap_cows() so that we can also check whether we
should zap private mappings when there's no page pointer specified.
This patch drops that trick, so we handle swap ptes coherently. Meanwhile
we should do the same check upon migration entry, hwpoison entry and
genuine swap entries too.
To be explicit, we should still remember to keep the private entries if
even_cows==false, and always zap them when even_cows==true.
The issue seems to exist starting from the initial commit of git.
[peterx(a)redhat.com: comment tweaks]
Link: https://lkml.kernel.org/r/20220217060746.71256-2-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220217060746.71256-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220216094810.60572-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220216094810.60572-2-peterx@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: "Kirill A . Shutemov" <kirill(a)shutemov.name>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index 30e6d0248a3d..a7bf87cf2ba0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1313,6 +1313,17 @@ struct zap_details {
struct folio *single_folio; /* Locked folio to be unmapped */
};
+/* Whether we should zap all COWed (private) pages too */
+static inline bool should_zap_cows(struct zap_details *details)
+{
+ /* By default, zap all pages */
+ if (!details)
+ return true;
+
+ /* Or, we zap COWed pages only if the caller wants to */
+ return !details->zap_mapping;
+}
+
/*
* We set details->zap_mapping when we want to unmap shared but keep private
* pages. Return true if skip zapping this page, false otherwise.
@@ -1320,11 +1331,15 @@ struct zap_details {
static inline bool
zap_skip_check_mapping(struct zap_details *details, struct page *page)
{
- if (!details || !page)
+ /* If we can make a decision without *page.. */
+ if (should_zap_cows(details))
+ return false;
+
+ /* E.g. the caller passes NULL for the case of a zero page */
+ if (!page)
return false;
- return details->zap_mapping &&
- (details->zap_mapping != page_rmapping(page));
+ return details->zap_mapping != page_rmapping(page);
}
static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1405,17 +1420,24 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
continue;
}
- /* If details->check_mapping, we leave swap entries. */
- if (unlikely(details))
- continue;
-
- if (!non_swap_entry(entry))
+ if (!non_swap_entry(entry)) {
+ /* Genuine swap entry, hence a private anon page */
+ if (!should_zap_cows(details))
+ continue;
rss[MM_SWAPENTS]--;
- else if (is_migration_entry(entry)) {
+ } else if (is_migration_entry(entry)) {
struct page *page;
page = pfn_swap_entry_to_page(entry);
+ if (zap_skip_check_mapping(details, page))
+ continue;
rss[mm_counter(page)]--;
+ } else if (is_hwpoison_entry(entry)) {
+ if (!should_zap_cows(details))
+ continue;
+ } else {
+ /* We should have covered all the swap entry types */
+ WARN_ON_ONCE(1);
}
if (unlikely(!free_swap_and_cache(entry)))
print_bad_pte(vma, addr, ptent, NULL);
From: Xiaomeng Tong <xiam0nd.tong(a)gmail.com>
The bug is here:
idr_remove(&connection->peer_devices, vnr);
If the previous for_each_connection() don't exit early (no goto hit
inside the loop), the iterator 'connection' after the loop will be a
bogus pointer to an invalid structure object containing the HEAD
(&resource->connections). As a result, the use of 'connection' above
will lead to a invalid memory access (including a possible invalid free
as idr_remove could call free_layer).
The original intention should have been to remove all peer_devices,
but the following lines have already done the work. So just remove
this line and the unneeded label, to fix this bug.
Cc: stable(a)vger.kernel.org
Fixes: c06ece6ba6f1b ("drbd: Turn connection->volumes into connection->peer_devices")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong(a)gmail.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder(a)linbit.com>
Reviewed-by: Lars Ellenberg <lars.ellenberg(a)linbit.com>
---
drivers/block/drbd/drbd_main.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 9676a1d214bc..d6dfa286ddb3 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2773,12 +2773,12 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
if (init_submitter(device)) {
err = ERR_NOMEM;
- goto out_idr_remove_vol;
+ goto out_idr_remove_from_resource;
}
err = add_disk(disk);
if (err)
- goto out_idr_remove_vol;
+ goto out_idr_remove_from_resource;
/* inherit the connection state */
device->state.conn = first_connection(resource)->cstate;
@@ -2792,8 +2792,6 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
drbd_debugfs_device_add(device);
return NO_ERROR;
-out_idr_remove_vol:
- idr_remove(&connection->peer_devices, vnr);
out_idr_remove_from_resource:
for_each_connection(connection, resource) {
peer_device = idr_remove(&connection->peer_devices, vnr);
--
2.35.1
Hello!
Compared to the v4.14 backport, this backport for v4.9 includes extra
infrastructure for the arch timer driver, which is needed for the A76 timer
workaround, which is needed to cleanly pick up the A76 MIDR definition.
I didn't backport the Hisilicon 161010101 workaround as I'm unable to test it.
I also backported the capabilities midr-range support, which the bhb
mitigation makes use of. Ard had already backported this to v4.14.
I ended up backporting Ard's version from v4.14 as it conflicted less,
and he had added A35 to the spectre-safe list.
.. and then the Spectre-BHB bits, where as before the KVM stuff is
different to mainline as the infrastructure for doing this stuff is very
different.
Thanks,
James
Anshuman Khandual (1):
arm64: Add Cortex-X2 CPU part definition
Arnd Bergmann (1):
arm64: arch_timer: avoid unused function warning
Dave Martin (1):
arm64: capabilities: Update prototype for enable call back
Ding Tianhong (2):
clocksource/drivers/arm_arch_timer: Remove fsl-a008585 parameter
clocksource/drivers/arm_arch_timer: Introduce generic errata handling
infrastructure
James Morse (20):
arm64: Remove useless UAO IPI and describe how this gets enabled
arm64: entry.S: Add ventry overflow sanity checks
arm64: entry: Make the trampoline cleanup optional
arm64: entry: Free up another register on kpti's tramp_exit path
arm64: entry: Move the trampoline data page before the text page
arm64: entry: Allow tramp_alias to access symbols after the 4K
boundary
arm64: entry: Don't assume tramp_vectors is the start of the vectors
arm64: entry: Move trampoline macros out of ifdef'd section
arm64: entry: Make the kpti trampoline's kpti sequence optional
arm64: entry: Allow the trampoline text to occupy multiple pages
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
arm64: Move arm64_update_smccc_conduit() out of SSBD ifdef
arm64: entry: Add vectors that have the bhb mitigation sequences
arm64: entry: Add macro for reading symbol addresses from the
trampoline
arm64: Add percpu vectors for EL1
KVM: arm64: Add templates for BHB mitigation sequences
arm64: Mitigate spectre style branch history side channels
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and
migrated
arm64: add ID_AA64ISAR2_EL1 sys register
arm64: Use the clearbhb instruction in mitigations
Marc Zyngier (6):
arm64: arch_timer: Add infrastructure for multiple erratum detection
methods
arm64: arch_timer: Add erratum handler for CPU-specific capability
arm64: arch_timer: Add workaround for ARM erratum 1188873
arm64: Add silicon-errata.txt entry for ARM erratum 1188873
arm64: Make ARM64_ERRATUM_1188873 depend on COMPAT
arm64: Add part number for Neoverse N1
Rob Herring (1):
arm64: Add part number for Arm Cortex-A77
Robert Richter (1):
arm64: errata: Provide macro for major and minor cpu revisions
Suzuki K Poulose (10):
arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
arm64: capabilities: Move errata work around check on boot CPU
arm64: capabilities: Move errata processing code
arm64: capabilities: Prepare for fine grained capabilities
arm64: capabilities: Add flags to handle the conflicts on late CPU
arm64: capabilities: Clean up midr range helpers
arm64: Add helpers for checking CPU MIDR against a range
arm64: capabilities: Add support for checks based on a list of MIDRs
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
arm64: Add helper to decode register from instruction
Documentation/arm64/silicon-errata.txt | 1 +
Documentation/kernel-parameters.txt | 9 -
arch/arm/include/asm/kvm_host.h | 5 +
arch/arm/kvm/psci.c | 4 +
arch/arm64/Kconfig | 24 +
arch/arm64/include/asm/arch_timer.h | 44 +-
arch/arm64/include/asm/assembler.h | 34 ++
arch/arm64/include/asm/cpu.h | 1 +
arch/arm64/include/asm/cpucaps.h | 4 +-
arch/arm64/include/asm/cpufeature.h | 232 +++++++++-
arch/arm64/include/asm/cputype.h | 63 +++
arch/arm64/include/asm/fixmap.h | 6 +-
arch/arm64/include/asm/insn.h | 2 +
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/include/asm/kvm_mmu.h | 2 +-
arch/arm64/include/asm/mmu.h | 8 +-
arch/arm64/include/asm/processor.h | 6 +-
arch/arm64/include/asm/sections.h | 6 +
arch/arm64/include/asm/sysreg.h | 5 +
arch/arm64/include/asm/vectors.h | 74 +++
arch/arm64/kernel/bpi.S | 55 +++
arch/arm64/kernel/cpu_errata.c | 595 ++++++++++++++++++++-----
arch/arm64/kernel/cpufeature.c | 167 +++++--
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry.S | 197 ++++++--
arch/arm64/kernel/fpsimd.c | 1 +
arch/arm64/kernel/insn.c | 29 ++
arch/arm64/kernel/smp.c | 6 -
arch/arm64/kernel/traps.c | 4 +-
arch/arm64/kernel/vmlinux.lds.S | 2 +-
arch/arm64/kvm/hyp/hyp-entry.S | 4 +
arch/arm64/kvm/hyp/switch.c | 9 +-
arch/arm64/mm/fault.c | 17 +-
arch/arm64/mm/mmu.c | 11 +-
drivers/clocksource/Kconfig | 4 +
drivers/clocksource/arm_arch_timer.c | 202 +++++++--
include/linux/arm-smccc.h | 7 +
37 files changed, 1509 insertions(+), 336 deletions(-)
create mode 100644 arch/arm64/include/asm/vectors.h
--
2.30.2