The patch titled
Subject: mm/hmm: fix race between hmm_mirror_unregister() and mmu_notifier callback
has been removed from the -mm tree. Its filename was
mm-hmm-fix-race-between-hmm_mirror_unregister-and-mmu_notifier-callback.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/hmm: fix race between hmm_mirror_unregister() and mmu_notifier callback
In hmm_mirror_unregister(), mm->hmm is set to NULL and then
mmu_notifier_unregister_no_release() is called. That creates a small
window during which the mmu_notifier core can still invoke the
mmu_notifier_ops callbacks while mm->hmm is NULL. Fix this by first
unregistering the mmu notifier callbacks and only then setting mm->hmm to NULL.
Similarly in hmm_register(), set mm->hmm before registering mmu_notifier
callbacks so callback functions always see mm->hmm set.
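To make the window concrete, here is a rough sketch of the problematic
interleaving with the old ordering (illustration only, not part of the patch;
it assumes the callbacks dereference mm->hmm, as described above):

    CPU0: hmm_mirror_unregister()            CPU1: mmu notifier invalidation
    mm->hmm = NULL;
                                             hmm_invalidate_range_start()
                                                 sees mm->hmm == NULL
    mmu_notifier_unregister_no_release(...);

Unregistering first closes the window: by the time mm->hmm is cleared, the
callbacks can no longer be invoked.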
Link: http://lkml.kernel.org/r/20181019160442.18723-4-jglisse@redhat.com
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse(a)redhat.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Jérôme Glisse <jglisse(a)redhat.com>
Reviewed-by: Balbir Singh <bsingharora(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hmm.c | 36 +++++++++++++++++++++---------------
1 file changed, 21 insertions(+), 15 deletions(-)
--- a/mm/hmm.c~mm-hmm-fix-race-between-hmm_mirror_unregister-and-mmu_notifier-callback
+++ a/mm/hmm.c
@@ -91,16 +91,6 @@ static struct hmm *hmm_register(struct m
spin_lock_init(&hmm->lock);
hmm->mm = mm;
- /*
- * We should only get here if hold the mmap_sem in write mode ie on
- * registration of first mirror through hmm_mirror_register()
- */
- hmm->mmu_notifier.ops = &hmm_mmu_notifier_ops;
- if (__mmu_notifier_register(&hmm->mmu_notifier, mm)) {
- kfree(hmm);
- return NULL;
- }
-
spin_lock(&mm->page_table_lock);
if (!mm->hmm)
mm->hmm = hmm;
@@ -108,12 +98,27 @@ static struct hmm *hmm_register(struct m
cleanup = true;
spin_unlock(&mm->page_table_lock);
- if (cleanup) {
- mmu_notifier_unregister(&hmm->mmu_notifier, mm);
- kfree(hmm);
- }
+ if (cleanup)
+ goto error;
+
+ /*
+ * We should only get here if hold the mmap_sem in write mode ie on
+ * registration of first mirror through hmm_mirror_register()
+ */
+ hmm->mmu_notifier.ops = &hmm_mmu_notifier_ops;
+ if (__mmu_notifier_register(&hmm->mmu_notifier, mm))
+ goto error_mm;
return mm->hmm;
+
+error_mm:
+ spin_lock(&mm->page_table_lock);
+ if (mm->hmm == hmm)
+ mm->hmm = NULL;
+ spin_unlock(&mm->page_table_lock);
+error:
+ kfree(hmm);
+ return NULL;
}
void hmm_mm_destroy(struct mm_struct *mm)
@@ -278,12 +283,13 @@ void hmm_mirror_unregister(struct hmm_mi
if (!should_unregister || mm == NULL)
return;
+ mmu_notifier_unregister_no_release(&hmm->mmu_notifier, mm);
+
spin_lock(&mm->page_table_lock);
if (mm->hmm == hmm)
mm->hmm = NULL;
spin_unlock(&mm->page_table_lock);
- mmu_notifier_unregister_no_release(&hmm->mmu_notifier, mm);
kfree(hmm);
}
EXPORT_SYMBOL(hmm_mirror_unregister);
_
The patch titled
Subject: mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly
has been removed from the -mm tree. Its filename was
mm-rmap-map_pte-was-not-handling-private-zone_device-page-properly-v3.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly
Private ZONE_DEVICE pages use a special pte entry and thus are not
present. Properly handle this case in map_pte(); it is already handled in
check_pte(), and the map_pte() part was most probably lost in some rebase.
Without this patch the slow migration path cannot migrate private
ZONE_DEVICE memory back to regular memory. This was found after stress
testing migration back to system memory. It can ultimately lead to the
CPU constantly page faulting, looping on the special swap entry.
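For reference, the pattern that recognizes such a mapping looks roughly like
this (an illustrative helper with a made-up name, relying on the standard
helpers from <linux/swapops.h>, not code taken from the patch):

	static bool pte_is_device_private(pte_t pte)
	{
		/* device-private pages are mapped as non-present swap entries */
		return is_swap_pte(pte) &&
		       is_device_private_entry(pte_to_swp_entry(pte));
	}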
Link: http://lkml.kernel.org/r/20181019160442.18723-3-jglisse@redhat.com
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse(a)redhat.com>
Reviewed-by: Balbir Singh <bsingharora(a)gmail.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_vma_mapped.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
--- a/mm/page_vma_mapped.c~mm-rmap-map_pte-was-not-handling-private-zone_device-page-properly-v3
+++ a/mm/page_vma_mapped.c
@@ -21,7 +21,29 @@ static bool map_pte(struct page_vma_mapp
if (!is_swap_pte(*pvmw->pte))
return false;
} else {
- if (!pte_present(*pvmw->pte))
+ /*
+ * We get here when we are trying to unmap a private
+ * device page from the process address space. Such
+ * page is not CPU accessible and thus is mapped as
+ * a special swap entry, nonetheless it still does
+ * count as a valid regular mapping for the page (and
+ * is accounted as such in page maps count).
+ *
+ * So handle this special case as if it was a normal
+ * page mapping ie lock CPU page table and returns
+ * true.
+ *
+ * For more details on device private memory see HMM
+ * (include/linux/hmm.h or mm/hmm.c).
+ */
+ if (is_swap_pte(*pvmw->pte)) {
+ swp_entry_t entry;
+
+ /* Handle un-addressable ZONE_DEVICE memory */
+ entry = pte_to_swp_entry(*pvmw->pte);
+ if (!is_device_private_entry(entry))
+ return false;
+ } else if (!pte_present(*pvmw->pte))
return false;
}
}
_
From: "David S. Miller" <davem(a)davemloft.net>
When processing using 'perf report -g caller', which is the default, we
ended up reversing the callchain entries received from the kernel, but
simply reversing them throws away the information that tells us that, from
a given point onwards, the addresses are for userspace, kernel, guest
kernel, guest user, or hypervisor.
The idea is that, if we are walking backwards, for each cluster of
non-cpumode entries we have to first scan backwards for the next cpumode
marker and use that for the whole cluster.
This seems silly and more expensive than it needs to be, but it is enough
for an initial fix.
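To make the clustering above concrete, consider a made-up chain
(illustration only, not from the patch):

	chain->ips[] = { PERF_CONTEXT_KERNEL, k1, k2, PERF_CONTEXT_USER, u1, u2 }

Walking in callee order (index 0 upwards), each context marker is seen
before the addresses it covers. Walking in caller order starts at u2,
before its PERF_CONTEXT_USER marker has been reached, so the walker must
first scan back down the array for the nearest marker to learn the cpumode
of u2 and u1.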
The code here is really complicated because it is intimately intertwined
with the LBR and branch handling, as well as with this callchain order;
further fixes will be needed to properly take the cpumode into account in
those cases.
Another problem with ORDER_CALLER is that the NULL "0" IP that is at the
end of most callchains shows up at the top of the histogram because
every callchain contains it and with ORDER_CALLER it is the first entry.
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Tested-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: David Ahern <dsahern(a)gmail.com>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Souvik Banerjee <souvik1997(a)gmail.com>
Cc: Wang Nan <wangnan0(a)huawei.com>
Cc: stable(a)vger.kernel.org # 4.19
Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
---
tools/perf/util/machine.c | 35 ++++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 111ae858cbcb..8ee8ab39d8ac 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2140,6 +2140,27 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
return 0;
}
+static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
+ struct callchain_cursor *cursor,
+ struct symbol **parent,
+ struct addr_location *root_al,
+ u8 *cpumode, int ent)
+{
+ int err = 0;
+
+ while (--ent >= 0) {
+ u64 ip = chain->ips[ent];
+
+ if (ip >= PERF_CONTEXT_MAX) {
+ err = add_callchain_ip(thread, cursor, parent,
+ root_al, cpumode, ip,
+ false, NULL, NULL, 0);
+ break;
+ }
+ }
+ return err;
+}
+
static int thread__resolve_callchain_sample(struct thread *thread,
struct callchain_cursor *cursor,
struct perf_evsel *evsel,
@@ -2246,6 +2267,12 @@ static int thread__resolve_callchain_sample(struct thread *thread,
}
check_calls:
+ if (callchain_param.order != ORDER_CALLEE) {
+ err = find_prev_cpumode(chain, thread, cursor, parent, root_al,
+ &cpumode, chain->nr - first_call);
+ if (err)
+ return (err < 0) ? err : 0;
+ }
for (i = first_call, nr_entries = 0;
i < chain_nr && nr_entries < max_stack; i++) {
u64 ip;
@@ -2260,9 +2287,15 @@ static int thread__resolve_callchain_sample(struct thread *thread,
continue;
#endif
ip = chain->ips[j];
-
if (ip < PERF_CONTEXT_MAX)
++nr_entries;
+ else if (callchain_param.order != ORDER_CALLEE) {
+ err = find_prev_cpumode(chain, thread, cursor, parent,
+ root_al, &cpumode, j);
+ if (err)
+ return (err < 0) ? err : 0;
+ continue;
+ }
err = add_callchain_ip(thread, cursor, parent,
root_al, &cpumode, ip,
--
2.14.4
Initially we bumped into a problem with 32-bit aligned atomic64_t
on ARC, see [1]. Then, during a quite lengthy discussion, Peter Z.
mentioned ARCH_KMALLOC_MINALIGN, which IMHO makes perfect sense.
If an allocation is done by plain kmalloc(), the obtained buffer will be
ARCH_KMALLOC_MINALIGN aligned, so why should a buffer obtained via
devm_kmalloc() have any other alignment?
This way we at least get the same behavior for both types of
allocation.
[1] http://lists.infradead.org/pipermail/linux-snps-arc/2018-July/004009.html
[2] http://lists.infradead.org/pipermail/linux-snps-arc/2018-July/004036.html
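The member alignment matters because devm_kmalloc() hands the caller a
pointer into the middle of struct devres, so the buffer is only as aligned
as the data[] member. A simplified sketch of the idea (not the actual
drivers/base/devres.c code, bookkeeping elided):

	static void *devm_kmalloc_sketch(struct device *dev, size_t size, gfp_t gfp)
	{
		struct devres *dr;

		/* dr itself gets ARCH_KMALLOC_MINALIGN alignment from kmalloc() */
		dr = kmalloc(sizeof(*dr) + size, gfp);
		if (!dr)
			return NULL;
		return dr->data;	/* only as aligned as data[] is within the struct */
	}

With 'unsigned long long data[]' that guarantee is just 8 bytes;
__aligned(ARCH_KMALLOC_MINALIGN) restores the plain-kmalloc() guarantee.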
Signed-off-by: Alexey Brodkin <abrodkin(a)synopsys.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: David Laight <David.Laight(a)ACULAB.COM>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vineet Gupta <vgupta(a)synopsys.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Greg KH <greg(a)kroah.com>
Cc: <stable(a)vger.kernel.org> # 4.8+
---
Changes v3 -> v4:
* Use ARCH_KMALLOC_MINALIGN for alignment instead of "8" [Peter]
Changes v2 -> v3:
* Align explicitly to 8 bytes [David]
* Rephrased in-line comment [David]
* Added more technical details to commit message [Greg]
* Mention more alignment options in commit message [Geert]
Changes v1 -> v2:
* Reworded commit message
* Inserted comment right in source [Thomas]
drivers/base/devres.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 4aaf00d2098b..e038e2b3b7ea 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -26,8 +26,14 @@ struct devres_node {
struct devres {
struct devres_node node;
- /* -- 3 pointers */
- unsigned long long data[]; /* guarantee ull alignment */
+ /*
+ * Some archs want to perform DMA into kmalloc caches
+ * and need a guaranteed alignment larger than
+ * the alignment of a 64-bit integer.
+ * Thus we use ARCH_KMALLOC_MINALIGN here and get exactly the same
+ * buffer alignment as if it was allocated by plain kmalloc().
+ */
+ u8 __aligned(ARCH_KMALLOC_MINALIGN) data[];
};
struct devres_group {
--
2.17.2
Hi Greg and other -stable maintainers
Please consider adding da15fc2fa9c (perf tools: Disable parallelism for
'make clean') to your -stable trees. Spurious build failures like the ones
it fixes make it harder to do automatic testing of new -stable (and
-stable-rc) versions.
Thanks,
Rasmus
Hi,
please consider reverting
commit 84379c9afe011020e797e3f50a662b08a6355dcf
netfilter: ipv6: nf_defrag: drop skb dst before queueing
It causes a kernel crash for locally generated ipv6 fragments
when netfilter ipv6 defragmentation is used.
The faulty commit is not essential for -stable; it only
delays netns teardown for longer than needed when that netns
still has ipv6 frags queued. Much better than a crash :-/
commit ids are:
4.4.y: not affected (not backported)
4.9.y: backported as ad8b1ffc3efae2f65080bdb11145c87d299b8f9a
4.14.y: backported as 28c74ff85efd192aeca9005499ca50c24d795f61
4.18.y: (first affected kernel): 84379c9afe011020e797e3f50a662b08a6355dcf
For 4.19.y, you could also wait for a bug fix to hit Linus' tree;
I can ping you again once it's in:
https://patchwork.ozlabs.org/patch/988233/
Thanks,
Florian
Typing 'btc' in kdb was failing in all sorts of ways: sometimes it would
crash, sometimes display nothing, and sometimes hang.
Bisect tracked this down to commit ad67b74d2469 ("printk: hash
addresses printed with %p"), suggesting an obvious fix. The pointer
used internally in kdb shouldn't be hashed, so switch it to %px.
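A minimal illustration of the difference (not from the patch): since
ad67b74d2469, %p prints a hashed value, while %px prints the raw address
that the generated 'btt' command needs in order to find the task again:

	pr_info("task at %p\n", tsk);	/* hashed, useless to 'btt' */
	pr_info("task at %px\n", tsk);	/* real address */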
Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
Cc: stable(a)vger.kernel.org
Signed-off-by: Douglas Anderson <dianders(a)chromium.org>
---
kernel/debug/kdb/kdb_bt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
index 6ad4a9fcbd6f..7921ae4fca8d 100644
--- a/kernel/debug/kdb/kdb_bt.c
+++ b/kernel/debug/kdb/kdb_bt.c
@@ -179,14 +179,14 @@ kdb_bt(int argc, const char **argv)
kdb_printf("no process for cpu %ld\n", cpu);
return 0;
}
- sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
+ sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
kdb_parse(buf);
return 0;
}
kdb_printf("btc: cpu status: ");
kdb_parse("cpu\n");
for_each_online_cpu(cpu) {
- sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
+ sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
kdb_parse(buf);
touch_nmi_watchdog();
}
--
2.19.1.568.g152ad8e336-goog