This is the start of the stable review cycle for the 4.19.60 release.
There are 47 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat 20 Jul 2019 02:59:27 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.60-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.60-rc1
Jiri Slaby <jslaby(a)suse.cz>
x86/entry/32: Fix ENDPROC of common_spurious
Dave Airlie <airlied(a)redhat.com>
drm/udl: move to embedding drm device inside udl device.
Thomas Zimmermann <tzimmermann(a)suse.de>
drm/udl: Replace drm_dev_unref with drm_dev_put
Dave Airlie <airlied(a)redhat.com>
drm/udl: introduce a macro to convert dev to udl.
Mark Zhang <markz(a)nvidia.com>
regmap-irq: do not write mask register if mask_base is zero
Haren Myneni <haren(a)linux.vnet.ibm.com>
crypto/NX: Set receive window credits to max number of CRBs in RxFIFO
Christophe Leroy <christophe.leroy(a)c-s.fr>
crypto: talitos - fix hash on SEC1.
Christophe Leroy <christophe.leroy(a)c-s.fr>
crypto: talitos - move struct talitos_edesc into talitos.h
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: (re-)initialize tiqdio list entries
Heiko Carstens <heiko.carstens(a)de.ibm.com>
s390: fix stfle zero padding
Arnd Bergmann <arnd(a)arndb.de>
ARC: hide unused function unw_hdr_alloc
Thomas Gleixner <tglx(a)linutronix.de>
x86/irq: Seperate unused system vectors from spurious entry again
Thomas Gleixner <tglx(a)linutronix.de>
x86/irq: Handle spurious interrupt after shutdown gracefully
Thomas Gleixner <tglx(a)linutronix.de>
x86/ioapic: Implement irq_get_irqchip_state() callback
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Add optional hardware synchronization for shutdown
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Fix misleading synchronize_irq() documentation
Thomas Gleixner <tglx(a)linutronix.de>
genirq: Delay deactivation in free_irq()
Vinod Koul <vkoul(a)kernel.org>
linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL
Nicolas Boichat <drinkcat(a)chromium.org>
pinctrl: mediatek: Update cur_mask in mask/mask ops
Eiichi Tsukata <devel(a)etsukata.com>
cpu/hotplug: Fix out-of-bounds read when setting fail state
Nicolas Boichat <drinkcat(a)chromium.org>
pinctrl: mediatek: Ignore interrupts that are wake only during resume
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
HID: multitouch: Add pointstick support for ALPS Touchpad
Oleksandr Natalenko <oleksandr(a)redhat.com>
HID: chicony: add another quirk for PixArt mouse
Kirill A. Shutemov <kirill(a)shutemov.name>
x86/boot/64: Add missing fixup_pointer() for next_early_pgt access
Kirill A. Shutemov <kirill(a)shutemov.name>
x86/boot/64: Fix crash if kernel image crosses page table boundary
Milan Broz <gmazyland(a)gmail.com>
dm verity: use message limit for data block corruption message
Jerome Marchand <jmarchan(a)redhat.com>
dm table: don't copy from a NULL pointer in realloc_argv()
Phil Reid <preid(a)electromag.com.au>
pinctrl: mcp23s08: Fix add_data and irqchip_add_nested call order
Sébastien Szymanski <sebastien.szymanski(a)armadeus.com>
ARM: dts: imx6ul: fix PWM[1-4] interrupts
Sergej Benilov <sergej.benilov(a)googlemail.com>
sis900: fix TX completion
Takashi Iwai <tiwai(a)suse.de>
ppp: mppe: Add softdep to arc4
Petr Oros <poros(a)redhat.com>
be2net: fix link failure after ethtool offline test
Colin Ian King <colin.king(a)canonical.com>
x86/apic: Fix integer overflow on 10 bit left shift of cpu_khz
David Howells <dhowells(a)redhat.com>
afs: Fix uninitialised spinlock afs_volume::cb_break_lock
Arnd Bergmann <arnd(a)arndb.de>
ARM: omap2: remove incorrect __init annotation
Linus Walleij <linus.walleij(a)linaro.org>
ARM: dts: gemini Fix up DNS-313 compatible string
Peter Zijlstra <peterz(a)infradead.org>
perf/core: Fix perf_sample_regs_user() mm check
Hans de Goede <hdegoede(a)redhat.com>
efi/bgrt: Drop BGRT status field reserved bits check
Tony Lindgren <tony(a)atomide.com>
clk: ti: clkctrl: Fix returning uninitialized data
Heyi Guo <guoheyi(a)huawei.com>
irqchip/gic-v3-its: Fix command queue pointer comparison bug
Sven Van Asbroeck <thesven73(a)gmail.com>
firmware: improve LSM/IMA security behaviour
James Morse <james.morse(a)arm.com>
drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
Masahiro Yamada <yamada.masahiro(a)socionext.com>
nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header
Cole Rogers <colerogers(a)disroot.org>
Input: synaptics - enable SMBUS on T480 thinkpad trackpad
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
e1000e: start network tx queue only when link is up
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Revert "e1000e: fix cyclic resets at link up with active tx"
-------------
Diffstat:
Makefile | 4 +-
arch/arc/kernel/unwind.c | 9 ++-
arch/arm/boot/dts/gemini-dlink-dns-313.dts | 2 +-
arch/arm/boot/dts/imx6ul.dtsi | 8 +--
arch/arm/mach-omap2/prm3xxx.c | 2 +-
arch/s390/include/asm/facility.h | 21 ++++--
arch/x86/entry/entry_32.S | 24 +++++++
arch/x86/entry/entry_64.S | 30 ++++++--
arch/x86/include/asm/hw_irq.h | 5 +-
arch/x86/kernel/apic/apic.c | 36 ++++++----
arch/x86/kernel/apic/io_apic.c | 46 ++++++++++++
arch/x86/kernel/apic/vector.c | 4 +-
arch/x86/kernel/head64.c | 20 +++---
arch/x86/kernel/idt.c | 3 +-
arch/x86/kernel/irq.c | 2 +-
drivers/base/cacheinfo.c | 3 +-
drivers/base/firmware_loader/fallback.c | 2 +-
drivers/base/regmap/regmap-irq.c | 6 ++
drivers/clk/ti/clkctrl.c | 7 +-
drivers/crypto/nx/nx-842-powernv.c | 8 ++-
drivers/crypto/talitos.c | 99 +++++++++++---------------
drivers/crypto/talitos.h | 30 ++++++++
drivers/firmware/efi/efi-bgrt.c | 5 --
drivers/gpu/drm/udl/udl_drv.c | 56 ++++++++++++---
drivers/gpu/drm/udl/udl_drv.h | 9 +--
drivers/gpu/drm/udl/udl_fb.c | 12 ++--
drivers/gpu/drm/udl/udl_gem.c | 2 +-
drivers/gpu/drm/udl/udl_main.c | 35 +++------
drivers/hid/hid-ids.h | 2 +
drivers/hid/hid-multitouch.c | 4 ++
drivers/hid/hid-quirks.c | 1 +
drivers/input/mouse/synaptics.c | 1 +
drivers/irqchip/irq-gic-v3-its.c | 35 ++++++---
drivers/md/dm-table.c | 2 +-
drivers/md/dm-verity-target.c | 4 +-
drivers/net/ethernet/emulex/benet/be_ethtool.c | 28 ++++++--
drivers/net/ethernet/intel/e1000e/netdev.c | 21 +++---
drivers/net/ethernet/sis/sis900.c | 16 ++---
drivers/net/ppp/ppp_mppe.c | 1 +
drivers/pinctrl/mediatek/mtk-eint.c | 34 +++++----
drivers/pinctrl/pinctrl-mcp23s08.c | 8 +--
drivers/s390/cio/qdio_setup.c | 2 +
drivers/s390/cio/qdio_thinint.c | 5 +-
fs/afs/callback.c | 4 +-
fs/afs/internal.h | 2 +-
fs/afs/volume.c | 1 +
include/linux/cpuhotplug.h | 1 +
include/linux/kernel.h | 3 +-
include/uapi/linux/nilfs2_ondisk.h | 24 +++----
kernel/cpu.c | 3 +
kernel/events/core.c | 2 +-
kernel/irq/autoprobe.c | 6 +-
kernel/irq/chip.c | 6 ++
kernel/irq/cpuhotplug.c | 2 +-
kernel/irq/internals.h | 5 ++
kernel/irq/manage.c | 88 +++++++++++++++++------
56 files changed, 534 insertions(+), 267 deletions(-)
If a KVM guest is reset while running a nested guest, free_nested will
disable the shadow VMCS execution control in the vmcs01. However,
on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
the VMCS12 to the shadow VMCS which has since been freed.
This causes a vmptrld of a NULL pointer on my machine, but Jan reports
that the host hangs altogether. Let's see how much this trivial patch fixes.
Reported-by: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Liran Alon <liran.alon(a)oracle.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/vmx/nested.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4f23e34f628b..0f1378789bd0 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
{
secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
vmcs_write64(VMCS_LINK_POINTER, -1ull);
+ vmx->nested.need_vmcs12_to_shadow_sync = false;
}
static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
@@ -1341,6 +1342,9 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
unsigned long val;
int i;
+ if (WARN_ON(!shadow_vmcs))
+ return;
+
preempt_disable();
vmcs_load(shadow_vmcs);
@@ -1373,6 +1377,9 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
unsigned long val;
int i, q;
+ if (WARN_ON(!shadow_vmcs))
+ return;
+
vmcs_load(shadow_vmcs);
for (q = 0; q < ARRAY_SIZE(fields); q++) {
@@ -4436,7 +4443,6 @@ static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu)
/* copy to memory all shadowed fields in case
they were modified */
copy_shadow_to_vmcs12(vmx);
- vmx->nested.need_vmcs12_to_shadow_sync = false;
vmx_disable_shadow_vmcs(vmx);
}
vmx->nested.posted_intr_nv = -1;
--
1.8.3.1
The livelock can be triggered by the following pattern:
while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE),
indices)) {
...
for (i = 0; i < pagevec_count(&pvec); i++) {
index = indices[i];
...
}
index++; /* BUG */
}
Multi-order exceptional entries are not specially handled in
invalidate_inode_pages2_range(), which ends up in a livelock: both index 0
and index 1 find the same pmd entry, but the pmd is bound to index 0, so
index is set back to 0 again.
This introduces a helper to take the pmd entry's length into account when
deciding the next index.
Note that there are other users of the above pattern which don't need this
fix:
- dax_layout_busy_page
It's been fixed in commit d7782145e1ad
("filesystem-dax: Fix dax_layout_busy_page() livelock")
- truncate_inode_pages_range
This won't loop forever since the exceptional entries are immediately
removed from radix tree after the search.
Fixes: 642261a ("dax: add struct iomap based DAX PMD support")
Cc: <stable(a)vger.kernel.org> since 4.9 to 4.19
Signed-off-by: Liu Bo <bo.liu(a)linux.alibaba.com>
---
The problem is gone after commit f280bf092d48 ("page cache: Convert
find_get_entries to XArray"), but since the XArray seems too new to backport
to 4.19, I made this fix based on the radix tree implementation.
fs/dax.c | 19 +++++++++++++++++++
include/linux/dax.h | 8 ++++++++
mm/truncate.c | 26 ++++++++++++++++++++++++--
3 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index ac334bc..cd05337 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -764,6 +764,25 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
return __dax_invalidate_mapping_entry(mapping, index, false);
}
+pgoff_t dax_get_multi_order(struct address_space *mapping, pgoff_t index,
+ void *entry)
+{
+ struct radix_tree_root *pages = &mapping->i_pages;
+ pgoff_t nr_pages = 1;
+
+ if (!dax_mapping(mapping))
+ return nr_pages;
+
+ xa_lock_irq(pages);
+ entry = get_unlocked_mapping_entry(mapping, index, NULL);
+ if (entry)
+ nr_pages = 1UL << dax_radix_order(entry);
+ put_unlocked_mapping_entry(mapping, index, entry);
+ xa_unlock_irq(pages);
+
+ return nr_pages;
+}
+
static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev,
sector_t sector, size_t size, struct page *to,
unsigned long vaddr)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index a846184..f3c95c6 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -91,6 +91,8 @@ int dax_writeback_mapping_range(struct address_space *mapping,
struct page *dax_layout_busy_page(struct address_space *mapping);
bool dax_lock_mapping_entry(struct page *page);
void dax_unlock_mapping_entry(struct page *page);
+pgoff_t dax_get_multi_order(struct address_space *mapping, pgoff_t index,
+ void *entry);
#else
static inline bool bdev_dax_supported(struct block_device *bdev,
int blocksize)
@@ -134,6 +136,12 @@ static inline bool dax_lock_mapping_entry(struct page *page)
static inline void dax_unlock_mapping_entry(struct page *page)
{
}
+
+static inline pgoff_t dax_get_multi_order(struct address_space *mapping,
+ pgoff_t index, void *entry)
+{
+ return 1;
+}
#endif
int dax_read_lock(void);
diff --git a/mm/truncate.c b/mm/truncate.c
index 71b65aa..835911f 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -557,6 +557,8 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
indices)) {
+ pgoff_t nr_pages = 1;
+
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
@@ -568,6 +570,15 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
if (radix_tree_exceptional_entry(page)) {
invalidate_exceptional_entry(mapping, index,
page);
+ /*
+ * Account for multi-order entries at
+ * the end of the pagevec.
+ */
+ if (i < pagevec_count(&pvec) - 1)
+ continue;
+
+ nr_pages = dax_get_multi_order(mapping, index,
+ page);
continue;
}
@@ -607,7 +618,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
pagevec_remove_exceptionals(&pvec);
pagevec_release(&pvec);
cond_resched();
- index++;
+ index += nr_pages;
}
return count;
}
@@ -688,6 +699,8 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
indices)) {
+ pgoff_t nr_pages = 1;
+
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
@@ -700,6 +713,15 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
if (!invalidate_exceptional_entry2(mapping,
index, page))
ret = -EBUSY;
+ /*
+ * Account for multi-order entries at
+ * the end of the pagevec.
+ */
+ if (i < pagevec_count(&pvec) - 1)
+ continue;
+
+ nr_pages = dax_get_multi_order(mapping, index,
+ page);
continue;
}
@@ -739,7 +761,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
pagevec_remove_exceptionals(&pvec);
pagevec_release(&pvec);
cond_resched();
- index++;
+ index += nr_pages;
}
/*
* For DAX we invalidate page tables after invalidating radix tree. We
--
1.8.3.1
The patch titled
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
has been added to the -mm tree. Its filename is
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hmm-fix-bad-subpage-pointer-in-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hmm-fix-bad-subpage-pointer-in-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
When migrating an anonymous private page to a ZONE_DEVICE private page,
the source page->mapping and page->index fields are copied to the
destination ZONE_DEVICE struct page and the page_mapcount() is increased.
This is so rmap_walk() can be used to unmap and migrate the page back to
system memory. However, try_to_unmap_one() computes the subpage pointer
from a swap pte, which yields an invalid page pointer, and a kernel panic
results, such as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Currently, only single pages can be migrated to device private memory, so
no subpage computation is needed and subpage can simply be set to "page".
Link: http://lkml.kernel.org/r/20190719192955.30462-4-rcampbell@nvidia.com
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
+++ a/mm/rmap.c
@@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
*/
+ subpage = page;
goto discard;
}
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
mm-document-zone-device-struct-page-field-usage.patch
mm-hmm-fix-zone_device-anon-page-mapping-reuse.patch
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
The patch titled
Subject: mm/hmm: fix ZONE_DEVICE anon page mapping reuse
has been added to the -mm tree. Its filename is
mm-hmm-fix-zone_device-anon-page-mapping-reuse.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hmm-fix-zone_device-anon-page-m…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hmm-fix-zone_device-anon-page-m…
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/hmm: fix ZONE_DEVICE anon page mapping reuse
When a ZONE_DEVICE private page is freed, the page->mapping field can be
set. If this page is reused as an anonymous page, the previous value can
prevent the page from being inserted into the CPU's anon rmap table. For
example, when migrating a pte_none() page to device memory:
migrate_vma(ops, vma, start, end, src, dst, private)
migrate_vma_collect()
src[] = MIGRATE_PFN_MIGRATE
migrate_vma_prepare()
/* no page to lock or isolate so OK */
migrate_vma_unmap()
/* no page to unmap so OK */
ops->alloc_and_copy()
/* driver allocates ZONE_DEVICE page for dst[] */
migrate_vma_pages()
migrate_vma_insert_page()
page_add_new_anon_rmap()
__page_set_anon_rmap()
/* This check sees the page's stale mapping field */
if (PageAnon(page))
return
/* page->mapping is not updated */
The result is that the migration appears to succeed but a subsequent CPU
fault will be unable to migrate the page back to system memory or worse.
Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
pointer data doesn't affect future page use.
Link: http://lkml.kernel.org/r/20190719192955.30462-3-rcampbell@nvidia.com
Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Jan Kara <jack(a)suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/memremap.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
--- a/kernel/memremap.c~mm-hmm-fix-zone_device-anon-page-mapping-reuse
+++ a/kernel/memremap.c
@@ -397,6 +397,30 @@ void __put_devmap_managed_page(struct pa
mem_cgroup_uncharge(page);
+ /*
+ * When a device_private page is freed, the page->mapping field
+ * may still contain a (stale) mapping value. For example, the
+ * lower bits of page->mapping may still identify the page as
+ * an anonymous page. Ultimately, this entire field is just
+ * stale and wrong, and it will cause errors if not cleared.
+ * One example is:
+ *
+ * migrate_vma_pages()
+ * migrate_vma_insert_page()
+ * page_add_new_anon_rmap()
+ * __page_set_anon_rmap()
+ * ...checks page->mapping, via PageAnon(page) call,
+ * and incorrectly concludes that the page is an
+ * anonymous page. Therefore, it incorrectly,
+ * silently fails to set up the new anon rmap.
+ *
+ * For other types of ZONE_DEVICE pages, migration is either
+ * handled differently or not done at all, so there is no need
+ * to clear page->mapping.
+ */
+ if (is_device_private_page(page))
+ page->mapping = NULL;
+
page->pgmap->ops->page_free(page);
} else if (!count)
__put_page(page);
_
The patch titled
Subject: mm: document zone device struct page field usage
has been added to the -mm tree. Its filename is
mm-document-zone-device-struct-page-field-usage.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-document-zone-device-struct-pag…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-document-zone-device-struct-pag…
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm: document zone device struct page field usage
Struct page for ZONE_DEVICE private pages uses the page->mapping and
page->index fields while the source anonymous pages are migrated to device
private memory. This is so rmap_walk() can find the page when migrating
the ZONE_DEVICE private page back to system memory. ZONE_DEVICE pmem
backed fsdax pages also use the page->mapping and page->index fields when
files are mapped into a process address space.
Restructure struct page and add comments to make this more clear.
Link: http://lkml.kernel.org/r/20190719192955.30462-2-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm_types.h | 42 +++++++++++++++++++++++++------------
1 file changed, 29 insertions(+), 13 deletions(-)
--- a/include/linux/mm_types.h~mm-document-zone-device-struct-page-field-usage
+++ a/include/linux/mm_types.h
@@ -76,13 +76,35 @@ struct page {
* avoid collision and false-positive PageTail().
*/
union {
- struct { /* Page cache and anonymous pages */
- /**
- * @lru: Pageout list, eg. active_list protected by
- * pgdat->lru_lock. Sometimes used as a generic list
- * by the page owner.
- */
- struct list_head lru;
+ struct { /* Page cache, anonymous, ZONE_DEVICE pages */
+ union {
+ /**
+ * @lru: Pageout list, e.g., active_list
+ * protected by pgdat->lru_lock. Sometimes
+ * used as a generic list by the page owner.
+ */
+ struct list_head lru;
+ /**
+ * ZONE_DEVICE pages are never on the lru
+ * list so they reuse the list space.
+ * ZONE_DEVICE private pages are counted as
+ * being mapped so the @mapping and @index
+ * fields are used while the page is migrated
+ * to device private memory.
+ * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
+ * use the @mapping and @index fields when pmem
+ * backed DAX files are mapped.
+ */
+ struct {
+ /**
+ * @pgmap: Points to the hosting
+ * device page map.
+ */
+ struct dev_pagemap *pgmap;
+ /** @zone_device_data: opaque data. */
+ void *zone_device_data;
+ };
+ };
/* See page-flags.h for PAGE_MAPPING_FLAGS */
struct address_space *mapping;
pgoff_t index; /* Our offset within mapping. */
@@ -155,12 +177,6 @@ struct page {
spinlock_t ptl;
#endif
};
- struct { /* ZONE_DEVICE pages */
- /** @pgmap: Points to the hosting device page map. */
- struct dev_pagemap *pgmap;
- void *zone_device_data;
- unsigned long _zd_pad_1; /* uses mapping */
- };
/** @rcu_head: You can use this to free a page by RCU. */
struct rcu_head rcu_head;
_
The patch titled
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
has been removed from the -mm tree. Its filename was
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
When migrating a ZONE_DEVICE private page from device memory to system
memory, the subpage pointer is initialized from a swap pte, which yields
an invalid page pointer. A kernel panic results, such as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Initialize subpage correctly before calling page_remove_rmap().
Link: http://lkml.kernel.org/r/20190709223556.28908-1-rcampbell@nvidia.com
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
+++ a/mm/rmap.c
@@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
*/
+ subpage = page;
goto discard;
}
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
When migrating an anonymous private page to a ZONE_DEVICE private page,
the source page->mapping and page->index fields are copied to the
destination ZONE_DEVICE struct page and the page_mapcount() is increased.
This is so rmap_walk() can be used to unmap and migrate the page back to
system memory. However, try_to_unmap_one() computes the subpage pointer
from a swap pte, which yields an invalid page pointer, and a kernel panic
results, such as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Currently, only single pages can be migrated to device private memory, so
no subpage computation is needed and subpage can simply be set to "page".
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..ec1af8b60423 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
*/
+ subpage = page;
goto discard;
}
--
2.20.1
When a ZONE_DEVICE private page is freed, the page->mapping field can be
set. If this page is reused as an anonymous page, the previous value can
prevent the page from being inserted into the CPU's anon rmap table.
For example, when migrating a pte_none() page to device memory:
migrate_vma(ops, vma, start, end, src, dst, private)
migrate_vma_collect()
src[] = MIGRATE_PFN_MIGRATE
migrate_vma_prepare()
/* no page to lock or isolate so OK */
migrate_vma_unmap()
/* no page to unmap so OK */
ops->alloc_and_copy()
/* driver allocates ZONE_DEVICE page for dst[] */
migrate_vma_pages()
migrate_vma_insert_page()
page_add_new_anon_rmap()
__page_set_anon_rmap()
/* This check sees the page's stale mapping field */
if (PageAnon(page))
return
/* page->mapping is not updated */
The result is that the migration appears to succeed but a subsequent CPU
fault will be unable to migrate the page back to system memory or worse.
Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
pointer data doesn't affect future page use.
Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Jan Kara <jack(a)suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
---
kernel/memremap.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index bea6f887adad..238ae5d0ae8a 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -408,6 +408,10 @@ void __put_devmap_managed_page(struct page *page)
mem_cgroup_uncharge(page);
+ /* Clear anonymous page mapping to prevent stale pointers */
+ if (is_device_private_page(page))
+ page->mapping = NULL;
+
page->pgmap->ops->page_free(page);
} else if (!count)
__put_page(page);
--
2.20.1