This is the start of the stable review cycle for the 4.19.197 release. There are 34 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.197-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.19.197-rc1
Tony Lindgren tony@atomide.com clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
Tony Lindgren tony@atomide.com clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
Tony Lindgren tony@atomide.com clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support
afzal mohammed afzal.mohd.ma@gmail.com ARM: OMAP: replace setup_irq() by request_irq()
Alper Gun alpergun@google.com KVM: SVM: Call SEV Guest Decommission if ASID binding fails
Juergen Gross jgross@suse.com xen/events: reset active flag for lateeoi events later
Petr Mladek pmladek@suse.com kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
Petr Mladek pmladek@suse.com kthread_worker: split code for canceling the delayed work timer
Anson Huang Anson.Huang@nxp.com ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment
David Rientjes rientjes@google.com KVM: SVM: Periodically schedule when unregistering regions on destroy
Tahsin Erdogan trdgn@amazon.com ext4: eliminate bogus error in ext4_data_block_valid_rcu()
Christian König christian.koenig@amd.com drm/nouveau: fix dma_address check for CPU/GPU sync
ManYi Li limanyi@uniontech.com scsi: sr: Return appropriate error code when disk is ejected
Hugh Dickins hughd@google.com mm, futex: fix shared futex pgoff on shmem huge page
Hugh Dickins hughd@google.com mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
Hugh Dickins hughd@google.com mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): get vma_address_end() earlier
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): use goto instead of while (1)
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): add a level of indentation
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): crossing page table boundary
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): settle PageHuge on entry
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): use page for pvmw->page
Yang Shi shy828301@gmail.com mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
Hugh Dickins hughd@google.com mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
Jue Wang juew@google.com mm/thp: fix page_address_in_vma() on file THP tails
Hugh Dickins hughd@google.com mm/thp: fix vma_address() if virtual address below file offset
Hugh Dickins hughd@google.com mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
Hugh Dickins hughd@google.com mm/thp: make is_huge_zero_pmd() safe and quicker
Hugh Dickins hughd@google.com mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
Miaohe Lin linmiaohe@huawei.com mm/rmap: use page_not_mapped in try_to_unmap()
Miaohe Lin linmiaohe@huawei.com mm/rmap: remove unneeded semicolon in page_not_mapped()
Alex Shi alex.shi@linux.alibaba.com mm: add VM_WARN_ON_ONCE_PAGE() macro
-------------
Diffstat:
Makefile | 4 +- arch/arm/boot/dts/dra7.dtsi | 11 ++ arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 4 - arch/arm/mach-omap1/pm.c | 13 ++- arch/arm/mach-omap1/time.c | 10 +- arch/arm/mach-omap1/timer32k.c | 10 +- arch/arm/mach-omap2/board-generic.c | 4 +- arch/arm/mach-omap2/timer.c | 181 +++++++++++++++++++++++---------- arch/x86/kvm/svm.c | 33 ++++-- drivers/clk/ti/clk-7xx.c | 1 + drivers/gpu/drm/nouveau/nouveau_bo.c | 4 +- drivers/scsi/sr.c | 2 + drivers/xen/events/events_base.c | 23 ++++- fs/ext4/block_validity.c | 4 +- include/linux/cpuhotplug.h | 1 + include/linux/huge_mm.h | 8 +- include/linux/hugetlb.h | 16 --- include/linux/mm.h | 3 + include/linux/mmdebug.h | 13 +++ include/linux/pagemap.h | 13 +-- include/linux/rmap.h | 3 +- kernel/futex.c | 2 +- kernel/kthread.c | 77 +++++++++----- mm/huge_memory.c | 56 +++++----- mm/hugetlb.c | 5 +- mm/internal.h | 53 +++++++--- mm/memory.c | 41 ++++++++ mm/page_vma_mapped.c | 160 ++++++++++++++++++----------- mm/pgtable-generic.c | 4 +- mm/rmap.c | 48 +++++---- mm/truncate.c | 43 ++++---- 31 files changed, 546 insertions(+), 304 deletions(-)
From: Alex Shi alex.shi@linux.alibaba.com
[ Upstream commit a4055888629bc0467d12d912cd7c90acdf3d9b12 part ]
Add VM_WARN_ON_ONCE_PAGE() macro.
Link: https://lkml.kernel.org/r/1604283436-18880-3-git-send-email-alex.shi@linux.a... Signed-off-by: Alex Shi alex.shi@linux.alibaba.com Acked-by: Michal Hocko mhocko@suse.com Acked-by: Hugh Dickins hughd@google.com Acked-by: Johannes Weiner hannes@cmpxchg.org Cc: Vladimir Davydov vdavydov.dev@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: original commit was titled mm/memcg: warning on !memcg after readahead page charged which included uses of this macro in mm/memcontrol.c: here omitted.
Signed-off-by: Hugh Dickins hughd@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/mmdebug.h | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h index 2ad72d2c8cc5..5d0767cb424a 100644 --- a/include/linux/mmdebug.h +++ b/include/linux/mmdebug.h @@ -37,6 +37,18 @@ void dump_mm(const struct mm_struct *mm); BUG(); \ } \ } while (0) +#define VM_WARN_ON_ONCE_PAGE(cond, page) ({ \ + static bool __section(".data.once") __warned; \ + int __ret_warn_once = !!(cond); \ + \ + if (unlikely(__ret_warn_once && !__warned)) { \ + dump_page(page, "VM_WARN_ON_ONCE_PAGE(" __stringify(cond)")");\ + __warned = true; \ + WARN_ON(1); \ + } \ + unlikely(__ret_warn_once); \ +}) + #define VM_WARN_ON(cond) (void)WARN_ON(cond) #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond) #define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format) @@ -48,6 +60,7 @@ void dump_mm(const struct mm_struct *mm); #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond) #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond) #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond) +#define VM_WARN_ON_ONCE_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond) #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond) #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond) #endif
From: Miaohe Lin linmiaohe@huawei.com
[ Upstream commit e0af87ff7afcde2660be44302836d2d5618185af ]
Remove extra semicolon without any functional change intended.
Link: https://lkml.kernel.org/r/20210127093425.39640-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin linmiaohe@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/rmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/rmap.c b/mm/rmap.c index 1bd94ea62f7f..69ce68616cbf 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1729,7 +1729,7 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags) static int page_not_mapped(struct page *page) { return !page_mapped(page); -}; +}
/** * try_to_munlock - try to munlock a page
From: Miaohe Lin linmiaohe@huawei.com
[ Upstream commit b7e188ec98b1644ff70a6d3624ea16aadc39f5e0 ]
page_mapcount_is_zero() calculates accurately how many mappings a hugepage has in order to check against 0 only. This is a waste of cpu time. We can do this via page_not_mapped() to save some possible atomic_read cycles. Remove the function page_mapcount_is_zero() as it's not used anymore and move page_not_mapped() above try_to_unmap() to avoid identifier undeclared compilation error.
Link: https://lkml.kernel.org/r/20210130084904.35307-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin linmiaohe@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/rmap.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c index 69ce68616cbf..70872d5b203c 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1682,9 +1682,9 @@ static bool invalid_migration_vma(struct vm_area_struct *vma, void *arg) return is_vma_temporary_stack(vma); }
-static int page_mapcount_is_zero(struct page *page) +static int page_not_mapped(struct page *page) { - return !total_mapcount(page); + return !page_mapped(page); }
/** @@ -1702,7 +1702,7 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags) struct rmap_walk_control rwc = { .rmap_one = try_to_unmap_one, .arg = (void *)flags, - .done = page_mapcount_is_zero, + .done = page_not_mapped, .anon_lock = page_lock_anon_vma_read, };
@@ -1726,11 +1726,6 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags) return !page_mapcount(page) ? true : false; }
-static int page_not_mapped(struct page *page) -{ - return !page_mapped(page); -} - /** * try_to_munlock - try to munlock a page * @page: the page to be munlocked
From: Hugh Dickins hughd@google.com
[ Upstream commit 99fa8a48203d62b3743d866fc48ef6abaee682be ]
Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.
Here is v2 batch of long-standing THP bug fixes that I had not got around to sending before, but prompted now by Wang Yugui's report https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and they have done no harm, but have *not* fixed that issue: something more is needed and I have no idea of what.
This patch (of 7):
Stressing huge tmpfs page migration racing hole punch often crashed on the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y kernel; or shortly afterwards, on a bad dereference in __split_huge_pmd_locked() when DEBUG_VM=n. They forgot to allow for pmd migration entries in the non-anonymous case.
Full disclosure: those particular experiments were on a kernel with more relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the vanilla kernel: it is conceivable that stricter locking happens to avoid those cases, or makes them less likely; but __split_huge_pmd_locked() already allowed for pmd migration entries when handling anonymous THPs, so this commit brings the shmem and file THP handling into line.
And while there: use old_pmd rather than _pmd, as in the following blocks; and make it clearer to the eye that the !vma_is_anonymous() block is self-contained, making an early return after accounting for unmapping.
Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp") Signed-off-by: Hugh Dickins hughd@google.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Yang Shi shy828301@gmail.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: "Matthew Wilcox (Oracle)" willy@infradead.org Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Alistair Popple apopple@nvidia.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Zi Yan ziy@nvidia.com Cc: Miaohe Lin linmiaohe@huawei.com Cc: Minchan Kim minchan@kernel.org Cc: Jue Wang juew@google.com Cc: Peter Xu peterx@redhat.com Cc: Jan Kara jack@suse.cz Cc: Shakeel Butt shakeelb@google.com Cc: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: this commit made intervening cleanups in pmdp_huge_clear_flush() redundant: here it's rediffed to skip them.
Signed-off-by: Hugh Dickins hughd@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- mm/huge_memory.c | 27 ++++++++++++++++++--------- mm/pgtable-generic.c | 4 ++-- 2 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7c374c0fcf0c..ce6ca09840bd 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, count_vm_event(THP_SPLIT_PMD);
if (!vma_is_anonymous(vma)) { - _pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd); + old_pmd = pmdp_huge_clear_flush_notify(vma, haddr, pmd); /* * We are going to unmap this huge page. So * just go ahead and zap it @@ -2134,16 +2134,25 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, zap_deposited_table(mm, pmd); if (vma_is_dax(vma)) return; - page = pmd_page(_pmd); - if (!PageDirty(page) && pmd_dirty(_pmd)) - set_page_dirty(page); - if (!PageReferenced(page) && pmd_young(_pmd)) - SetPageReferenced(page); - page_remove_rmap(page, true); - put_page(page); + if (unlikely(is_pmd_migration_entry(old_pmd))) { + swp_entry_t entry; + + entry = pmd_to_swp_entry(old_pmd); + page = migration_entry_to_page(entry); + } else { + page = pmd_page(old_pmd); + if (!PageDirty(page) && pmd_dirty(old_pmd)) + set_page_dirty(page); + if (!PageReferenced(page) && pmd_young(old_pmd)) + SetPageReferenced(page); + page_remove_rmap(page, true); + put_page(page); + } add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); return; - } else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) { + } + + if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) { /* * FIXME: Do we want to invalidate secondary mmu by calling * mmu_notifier_invalidate_range() see comments below inside diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index cf2af04b34b9..36770fcdc358 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -125,8 +125,8 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address, { pmd_t pmd; VM_BUG_ON(address & ~HPAGE_PMD_MASK); - VM_BUG_ON((pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) && - !pmd_devmap(*pmdp)) || !pmd_present(*pmdp)); + VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) && + !pmd_devmap(*pmdp)); pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp); flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE); return pmd;
From: Hugh Dickins hughd@google.com
[ Upstream commit 3b77e8c8cde581dadab9a0f1543a347e24315f11 ]
Most callers of is_huge_zero_pmd() supply a pmd already verified present; but a few (notably zap_huge_pmd()) do not - it might be a pmd migration entry, in which the pfn is encoded differently from a present pmd: which might pass the is_huge_zero_pmd() test (though not on x86, since L1TF forced us to protect against that); or perhaps even crash in pmd_page() applied to a swap-like entry.
Make it safe by adding pmd_present() check into is_huge_zero_pmd() itself; and make it quicker by saving huge_zero_pfn, so that is_huge_zero_pmd() will not need to do that pmd_page() lookup each time.
__split_huge_pmd_locked() checked pmd_trans_huge() before: that worked, but is unnecessary now that is_huge_zero_pmd() checks present.
Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp") Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: Alistair Popple apopple@nvidia.com Cc: Jan Kara jack@suse.cz Cc: Jue Wang juew@google.com Cc: "Matthew Wilcox (Oracle)" willy@infradead.org Cc: Miaohe Lin linmiaohe@huawei.com Cc: Minchan Kim minchan@kernel.org Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Oscar Salvador osalvador@suse.de Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Shakeel Butt shakeelb@google.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/huge_mm.h | 8 +++++++- mm/huge_memory.c | 5 ++++- 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index e375f2249f52..becf9b1eae5a 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -224,6 +224,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, extern vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t orig_pmd);
extern struct page *huge_zero_page; +extern unsigned long huge_zero_pfn;
static inline bool is_huge_zero_page(struct page *page) { @@ -232,7 +233,7 @@ static inline bool is_huge_zero_page(struct page *page)
static inline bool is_huge_zero_pmd(pmd_t pmd) { - return is_huge_zero_page(pmd_page(pmd)); + return READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd) && pmd_present(pmd); }
static inline bool is_huge_zero_pud(pud_t pud) @@ -342,6 +343,11 @@ static inline bool is_huge_zero_page(struct page *page) return false; }
+static inline bool is_huge_zero_pmd(pmd_t pmd) +{ + return false; +} + static inline bool is_huge_zero_pud(pud_t pud) { return false; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index ce6ca09840bd..82ed62775c00 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -62,6 +62,7 @@ static struct shrinker deferred_split_shrinker;
static atomic_t huge_zero_refcount; struct page *huge_zero_page __read_mostly; +unsigned long huge_zero_pfn __read_mostly = ~0UL;
bool transparent_hugepage_enabled(struct vm_area_struct *vma) { @@ -93,6 +94,7 @@ static struct page *get_huge_zero_page(void) __free_pages(zero_page, compound_order(zero_page)); goto retry; } + WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
/* We take additional reference here. It will be put back by shrinker */ atomic_set(&huge_zero_refcount, 2); @@ -142,6 +144,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink, if (atomic_cmpxchg(&huge_zero_refcount, 1, 0) == 1) { struct page *zero_page = xchg(&huge_zero_page, NULL); BUG_ON(zero_page == NULL); + WRITE_ONCE(huge_zero_pfn, ~0UL); __free_pages(zero_page, compound_order(zero_page)); return HPAGE_PMD_NR; } @@ -2152,7 +2155,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, return; }
- if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) { + if (is_huge_zero_pmd(*pmd)) { /* * FIXME: Do we want to invalidate secondary mmu by calling * mmu_notifier_invalidate_range() see comments below inside
From: Hugh Dickins hughd@google.com
[ Upstream commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 ]
Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE (!unmap_success): with dump_page() showing mapcount:1, but then its raw struct page output showing _mapcount ffffffff i.e. mapcount 0.
And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all indicative of some mapcount difficulty in development here perhaps. But the !CONFIG_DEBUG_VM path handles the failures correctly and silently.
I believe the problem is that once a racing unmap has cleared pte or pmd, try_to_unmap_one() may skip taking the page table lock, and emerge from try_to_unmap() before the racing task has reached decrementing mapcount.
Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC to the options, and passing that from unmap_page().
When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same for both: the slight overhead added should rarely matter, except perhaps if splitting sparsely-populated multiply-mapped shmem. Once confident that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated.
Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers") Signed-off-by: Hugh Dickins hughd@google.com Cc: Alistair Popple apopple@nvidia.com Cc: Jan Kara jack@suse.cz Cc: Jue Wang juew@google.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: "Matthew Wilcox (Oracle)" willy@infradead.org Cc: Miaohe Lin linmiaohe@huawei.com Cc: Minchan Kim minchan@kernel.org Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Oscar Salvador osalvador@suse.de Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Shakeel Butt shakeelb@google.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: upstream TTU_SYNC 0x10 takes the value which 5.11 commit 013339df116c ("mm/rmap: always do TTU_IGNORE_ACCESS") freed. It is very tempting to backport that commit (as 5.10 already did) and make no change here; but on reflection, good as that commit is, I'm reluctant to include any possible side-effect of it in this series.
Signed-off-by: Hugh Dickins hughd@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/rmap.h | 3 ++- mm/huge_memory.c | 2 +- mm/page_vma_mapped.c | 11 +++++++++++ mm/rmap.c | 17 ++++++++++++++++- 4 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index d7d6d4eb1794..91ccae946716 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -98,7 +98,8 @@ enum ttu_flags { * do a final flush if necessary */ TTU_RMAP_LOCKED = 0x80, /* do not grab rmap lock: * caller holds it */ - TTU_SPLIT_FREEZE = 0x100, /* freeze pte under splitting thp */ + TTU_SPLIT_FREEZE = 0x100, /* freeze pte under splitting thp */ + TTU_SYNC = 0x200, /* avoid racy checks with PVMW_SYNC */ };
#ifdef CONFIG_MMU diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 82ed62775c00..78c1ad5f8109 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2430,7 +2430,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, static void unmap_page(struct page *page) { enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | - TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD; + TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC; bool unmap_success;
VM_BUG_ON_PAGE(!PageHead(page), page); diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 11df03e71288..08e283ad4660 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -208,6 +208,17 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) pvmw->ptl = NULL; } } else if (!pmd_present(pmde)) { + /* + * If PVMW_SYNC, take and drop THP pmd lock so that we + * cannot return prematurely, while zap_huge_pmd() has + * cleared *pmd but not decremented compound_mapcount(). + */ + if ((pvmw->flags & PVMW_SYNC) && + PageTransCompound(pvmw->page)) { + spinlock_t *ptl = pmd_lock(mm, pvmw->pmd); + + spin_unlock(ptl); + } return false; } if (!map_pte(pvmw)) diff --git a/mm/rmap.c b/mm/rmap.c index 70872d5b203c..5df055654e63 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1348,6 +1348,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, unsigned long start = address, end; enum ttu_flags flags = (enum ttu_flags)arg;
+ /* + * When racing against e.g. zap_pte_range() on another cpu, + * in between its ptep_get_and_clear_full() and page_remove_rmap(), + * try_to_unmap() may return false when it is about to become true, + * if page table locking is skipped: use TTU_SYNC to wait for that. + */ + if (flags & TTU_SYNC) + pvmw.flags = PVMW_SYNC; + /* munlock has nothing to gain from examining un-locked vmas */ if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED)) return true; @@ -1723,7 +1732,13 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags) else rmap_walk(page, &rwc);
- return !page_mapcount(page) ? true : false; + /* + * When racing against e.g. zap_pte_range() on another cpu, + * in between its ptep_get_and_clear_full() and page_remove_rmap(), + * try_to_unmap() may return false when it is about to become true, + * if page table locking is skipped: use TTU_SYNC to wait for that. + */ + return !page_mapcount(page); }
/**
From: Hugh Dickins hughd@google.com
[ Upstream commit 494334e43c16d63b878536a26505397fce6ff3a2 ]
Running certain tests with a DEBUG_VM kernel would crash within hours, on the total_mapcount BUG() in split_huge_page_to_list(), while trying to free up some memory by punching a hole in a shmem huge page: split's try_to_unmap() was unable to find all the mappings of the page (which, on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
When that BUG() was changed to a WARN(), it would later crash on the VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in mm/internal.h:vma_address(), used by rmap_walk_file() for try_to_unmap().
vma_address() is usually correct, but there's a wraparound case when the vm_start address is unusually low, but vm_pgoff not so low: vma_address() chooses max(start, vma->vm_start), but that decides on the wrong address, because start has become almost ULONG_MAX.
Rewrite vma_address() to be more careful about vm_pgoff; move the VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can be safely used from page_mapped_in_vma() and page_address_in_vma() too.
Add vma_address_end() to apply similar care to end address calculation, in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one(); though it raises a question of whether callers would do better to supply pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.
An irritation is that their apparent generality breaks down on KSM pages, which cannot be located by the page->index that page_to_pgoff() uses: as commit 4b0ece6fa016 ("mm: migrate: fix remove_migration_pte() for ksm pages") once discovered. I dithered over the best thing to do about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both vma_address() and vma_address_end(); though the only place in danger of using it on them was try_to_unmap_one().
Sidenote: vma_address() and vma_address_end() now use compound_nr() on a head page, instead of thp_size(): to make the right calculation on a hugetlbfs page, whether or not THPs are configured. try_to_unmap() is used on hugetlbfs pages, but perhaps the wrong calculation never mattered.
Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com Fixes: a8fa41ad2f6f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of") Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Jan Kara jack@suse.cz Cc: Jue Wang juew@google.com Cc: "Matthew Wilcox (Oracle)" willy@infradead.org Cc: Miaohe Lin linmiaohe@huawei.com Cc: Minchan Kim minchan@kernel.org Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Oscar Salvador osalvador@suse.de Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Shakeel Butt shakeelb@google.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: fixed up conflicts on intervening thp_size(), and mmu_notifier_range initializations; substitute for compound_nr().
Signed-off-by: Hugh Dickins hughd@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- mm/internal.h | 53 ++++++++++++++++++++++++++++++++------------ mm/page_vma_mapped.c | 16 +++++-------- mm/rmap.c | 14 ++++++------ 3 files changed, 52 insertions(+), 31 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h index 397183c8fe47..3a2e973138d3 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -331,27 +331,52 @@ static inline void mlock_migrate_page(struct page *newpage, struct page *page) extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
/* - * At what user virtual address is page expected in @vma? + * At what user virtual address is page expected in vma? + * Returns -EFAULT if all of the page is outside the range of vma. + * If page is a compound head, the entire compound page is considered. */ static inline unsigned long -__vma_address(struct page *page, struct vm_area_struct *vma) +vma_address(struct page *page, struct vm_area_struct *vma) { - pgoff_t pgoff = page_to_pgoff(page); - return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); + pgoff_t pgoff; + unsigned long address; + + VM_BUG_ON_PAGE(PageKsm(page), page); /* KSM page->index unusable */ + pgoff = page_to_pgoff(page); + if (pgoff >= vma->vm_pgoff) { + address = vma->vm_start + + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); + /* Check for address beyond vma (or wrapped through 0?) */ + if (address < vma->vm_start || address >= vma->vm_end) + address = -EFAULT; + } else if (PageHead(page) && + pgoff + (1UL << compound_order(page)) - 1 >= vma->vm_pgoff) { + /* Test above avoids possibility of wrap to 0 on 32-bit */ + address = vma->vm_start; + } else { + address = -EFAULT; + } + return address; }
+/* + * Then at what user virtual address will none of the page be found in vma? + * Assumes that vma_address() already returned a good starting address. + * If page is a compound head, the entire compound page is considered. + */ static inline unsigned long -vma_address(struct page *page, struct vm_area_struct *vma) +vma_address_end(struct page *page, struct vm_area_struct *vma) { - unsigned long start, end; - - start = __vma_address(page, vma); - end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1); - - /* page should be within @vma mapping range */ - VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma); - - return max(start, vma->vm_start); + pgoff_t pgoff; + unsigned long address; + + VM_BUG_ON_PAGE(PageKsm(page), page); /* KSM page->index unusable */ + pgoff = page_to_pgoff(page) + (1UL << compound_order(page)); + address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); + /* Check for address beyond vma (or wrapped through 0?) */ + if (address < vma->vm_start || address > vma->vm_end) + address = vma->vm_end; + return address; }
#else /* !CONFIG_MMU */ diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 08e283ad4660..bb7488e86bea 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -224,18 +224,18 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (!map_pte(pvmw)) goto next_pte; while (1) { + unsigned long end; + if (check_pte(pvmw)) return true; next_pte: /* Seek to next pte only makes sense for THP */ if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page)) return not_found(pvmw); + end = vma_address_end(pvmw->page, pvmw->vma); do { pvmw->address += PAGE_SIZE; - if (pvmw->address >= pvmw->vma->vm_end || - pvmw->address >= - __vma_address(pvmw->page, pvmw->vma) + - hpage_nr_pages(pvmw->page) * PAGE_SIZE) + if (pvmw->address >= end) return not_found(pvmw); /* Did we cross page table boundary? */ if (pvmw->address % PMD_SIZE == 0) { @@ -273,14 +273,10 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma) .vma = vma, .flags = PVMW_SYNC, }; - unsigned long start, end; - - start = __vma_address(page, vma); - end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
- if (unlikely(end < vma->vm_start || start >= vma->vm_end)) + pvmw.address = vma_address(page, vma); + if (pvmw.address == -EFAULT) return 0; - pvmw.address = max(start, vma->vm_start); if (!page_vma_mapped_walk(&pvmw)) return 0; page_vma_mapped_walk_done(&pvmw); diff --git a/mm/rmap.c b/mm/rmap.c index 5df055654e63..ff56c460af53 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -686,7 +686,6 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags) */ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma) { - unsigned long address; if (PageAnon(page)) { struct anon_vma *page__anon_vma = page_anon_vma(page); /* @@ -701,10 +700,8 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma) return -EFAULT; } else return -EFAULT; - address = __vma_address(page, vma); - if (unlikely(address < vma->vm_start || address >= vma->vm_end)) - return -EFAULT; - return address; + + return vma_address(page, vma); }
pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) @@ -896,7 +893,7 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma, * We have to assume the worse case ie pmd for invalidation. Note that * the page can not be free from this function. */ - end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page))); + end = vma_address_end(page, vma); mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
while (page_vma_mapped_walk(&pvmw)) { @@ -1378,7 +1375,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, * Note that the page can not be free in this function as call of * try_to_unmap() must hold a reference on the page. */ - end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page))); + end = PageKsm(page) ? + address + PAGE_SIZE : vma_address_end(page, vma); if (PageHuge(page)) { /* * If sharing is possible, start and end will be adjusted @@ -1835,6 +1833,7 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc, struct vm_area_struct *vma = avc->vma; unsigned long address = vma_address(page, vma);
+ VM_BUG_ON_VMA(address == -EFAULT, vma); cond_resched();
if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg)) @@ -1889,6 +1888,7 @@ static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc, pgoff_start, pgoff_end) { unsigned long address = vma_address(page, vma);
+ VM_BUG_ON_VMA(address == -EFAULT, vma); cond_resched();
if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
From: Jue Wang juew@google.com
[ Upstream commit 31657170deaf1d8d2f6a1955fbc6fa9d228be036 ]
Anon THP tails were already supported, but memory-failure may need to use page_address_in_vma() on file THP tails, which its page->mapping check did not permit: fix it.
hughd adds: no current usage is known to hit the issue, but this does fix a subtle trap in a general helper: best fixed in stable sooner than later.
Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Signed-off-by: Jue Wang juew@google.com Signed-off-by: Hugh Dickins hughd@google.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Reviewed-by: Yang Shi shy828301@gmail.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Jan Kara jack@suse.cz Cc: Miaohe Lin linmiaohe@huawei.com Cc: Minchan Kim minchan@kernel.org Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Oscar Salvador osalvador@suse.de Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Shakeel Butt shakeelb@google.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/rmap.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c index ff56c460af53..699f445e3e78 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -695,11 +695,11 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma) if (!vma->anon_vma || !page__anon_vma || vma->anon_vma->root != page__anon_vma->root) return -EFAULT; - } else if (page->mapping) { - if (!vma->vm_file || vma->vm_file->f_mapping != page->mapping) - return -EFAULT; - } else + } else if (!vma->vm_file) { + return -EFAULT; + } else if (vma->vm_file->f_mapping != compound_head(page)->mapping) { return -EFAULT; + }
return vma_address(page, vma); }
From: Hugh Dickins hughd@google.com
[ Upstream commit 22061a1ffabdb9c3385de159c5db7aac3a4df1cc ]
There is a race between THP unmapping and truncation, when truncate sees pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it, but before its page_remove_rmap() gets to decrement compound_mapcount: generating false "BUG: Bad page cache" reports that the page is still mapped when deleted. This commit fixes that, but not in the way I hoped.
The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK) instead of unmap_mapping_range() in truncate_cleanup_page(): it has often been an annoyance that we usually call unmap_mapping_range() with no pages locked, but there apply it to a single locked page. try_to_unmap() looks more suitable for a single locked page.
However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page): it is used to insert THP migration entries, but not used to unmap THPs. Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB needs are different, I'm too ignorant of the DAX cases, and couldn't decide how far to go for anon+swap. Set that aside.
The second attempt took a different tack: make no change in truncate.c, but modify zap_huge_pmd() to insert an invalidated huge pmd instead of clearing it initially, then pmd_clear() between page_remove_rmap() and unlocking at the end. Nice. But powerpc blows that approach out of the water, with its serialize_against_pte_lookup(), and interesting pgtable usage. It would need serious help to get working on powerpc (with a minor optimization issue on s390 too). Set that aside.
Just add an "if (page_mapped(page)) synchronize_rcu();" or other such delay, after unmapping in truncate_cleanup_page()? Perhaps, but though that's likely to reduce or eliminate the number of incidents, it would give less assurance of whether we had identified the problem correctly.
This successful iteration introduces "unmap_mapping_page(page)" instead of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, with an addition to details. Then zap_pmd_range() watches for this case, and does spin_unlock(pmd_lock) if so - just like page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but safe.
Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to assert its interface; but currently that's only used to make sure that page->mapping is stable, and zap_pmd_range() doesn't care if the page is locked or not. Along these lines, in invalidate_inode_pages2_range() move the initial unmap_mapping_range() out from under page lock, before then calling unmap_mapping_page() under page lock if still mapped.
Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com Fixes: fc127da085c2 ("truncate: handle file thp") Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Yang Shi shy828301@gmail.com Cc: Alistair Popple apopple@nvidia.com Cc: Jan Kara jack@suse.cz Cc: Jue Wang juew@google.com Cc: "Matthew Wilcox (Oracle)" willy@infradead.org Cc: Miaohe Lin linmiaohe@huawei.com Cc: Minchan Kim minchan@kernel.org Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Oscar Salvador osalvador@suse.de Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Shakeel Butt shakeelb@google.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: fixed up call to truncate_cleanup_page() in truncate_inode_pages_range(). Use hpage_nr_pages() in unmap_mapping_page().
Signed-off-by: Hugh Dickins hughd@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/mm.h | 3 +++ mm/memory.c | 41 +++++++++++++++++++++++++++++++++++++++++ mm/truncate.c | 43 +++++++++++++++++++------------------------ 3 files changed, 63 insertions(+), 24 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index f6ecf41aea83..c736c677b876 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1338,6 +1338,7 @@ struct zap_details { struct address_space *check_mapping; /* Check page->mapping if set */ pgoff_t first_index; /* Lowest page->index to unmap */ pgoff_t last_index; /* Highest page->index to unmap */ + struct page *single_page; /* Locked page to be unmapped */ };
struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr, @@ -1428,6 +1429,7 @@ extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma, extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, unsigned long address, unsigned int fault_flags, bool *unlocked); +void unmap_mapping_page(struct page *page); void unmap_mapping_pages(struct address_space *mapping, pgoff_t start, pgoff_t nr, bool even_cows); void unmap_mapping_range(struct address_space *mapping, @@ -1448,6 +1450,7 @@ static inline int fixup_user_fault(struct task_struct *tsk, BUG(); return -EFAULT; } +static inline void unmap_mapping_page(struct page *page) { } static inline void unmap_mapping_pages(struct address_space *mapping, pgoff_t start, pgoff_t nr, bool even_cows) { } static inline void unmap_mapping_range(struct address_space *mapping, diff --git a/mm/memory.c b/mm/memory.c index c2011c51f15d..49b546cdce0d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1439,7 +1439,18 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb, else if (zap_huge_pmd(tlb, vma, pmd, addr)) goto next; /* fall through */ + } else if (details && details->single_page && + PageTransCompound(details->single_page) && + next - addr == HPAGE_PMD_SIZE && pmd_none(*pmd)) { + spinlock_t *ptl = pmd_lock(tlb->mm, pmd); + /* + * Take and drop THP pmd lock so that we cannot return + * prematurely, while zap_huge_pmd() has cleared *pmd, + * but not yet decremented compound_mapcount(). + */ + spin_unlock(ptl); } + /* * Here there can be other concurrent MADV_DONTNEED or * trans huge page faults running, and if the pmd is @@ -2924,6 +2935,36 @@ static inline void unmap_mapping_range_tree(struct rb_root_cached *root, } }
+/** + * unmap_mapping_page() - Unmap single page from processes. + * @page: The locked page to be unmapped. + * + * Unmap this page from any userspace process which still has it mmaped. + * Typically, for efficiency, the range of nearby pages has already been + * unmapped by unmap_mapping_pages() or unmap_mapping_range(). But once + * truncation or invalidation holds the lock on a page, it may find that + * the page has been remapped again: and then uses unmap_mapping_page() + * to unmap it finally. + */ +void unmap_mapping_page(struct page *page) +{ + struct address_space *mapping = page->mapping; + struct zap_details details = { }; + + VM_BUG_ON(!PageLocked(page)); + VM_BUG_ON(PageTail(page)); + + details.check_mapping = mapping; + details.first_index = page->index; + details.last_index = page->index + hpage_nr_pages(page) - 1; + details.single_page = page; + + i_mmap_lock_write(mapping); + if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))) + unmap_mapping_range_tree(&mapping->i_mmap, &details); + i_mmap_unlock_write(mapping); +} + /** * unmap_mapping_pages() - Unmap pages from processes. * @mapping: The address space containing pages to be unmapped. diff --git a/mm/truncate.c b/mm/truncate.c index 71b65aab8077..43c73db17a0a 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -175,13 +175,10 @@ void do_invalidatepage(struct page *page, unsigned int offset, * its lock, b) when a concurrent invalidate_mapping_pages got there first and * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space. */ -static void -truncate_cleanup_page(struct address_space *mapping, struct page *page) +static void truncate_cleanup_page(struct page *page) { - if (page_mapped(page)) { - pgoff_t nr = PageTransHuge(page) ? HPAGE_PMD_NR : 1; - unmap_mapping_pages(mapping, page->index, nr, false); - } + if (page_mapped(page)) + unmap_mapping_page(page);
if (page_has_private(page)) do_invalidatepage(page, 0, PAGE_SIZE); @@ -226,7 +223,7 @@ int truncate_inode_page(struct address_space *mapping, struct page *page) if (page->mapping != mapping) return -EIO;
- truncate_cleanup_page(mapping, page); + truncate_cleanup_page(page); delete_from_page_cache(page); return 0; } @@ -364,7 +361,7 @@ void truncate_inode_pages_range(struct address_space *mapping, pagevec_add(&locked_pvec, page); } for (i = 0; i < pagevec_count(&locked_pvec); i++) - truncate_cleanup_page(mapping, locked_pvec.pages[i]); + truncate_cleanup_page(locked_pvec.pages[i]); delete_from_page_cache_batch(mapping, &locked_pvec); for (i = 0; i < pagevec_count(&locked_pvec); i++) unlock_page(locked_pvec.pages[i]); @@ -703,6 +700,16 @@ int invalidate_inode_pages2_range(struct address_space *mapping, continue; }
+ if (!did_range_unmap && page_mapped(page)) { + /* + * If page is mapped, before taking its lock, + * zap the rest of the file in one hit. + */ + unmap_mapping_pages(mapping, index, + (1 + end - index), false); + did_range_unmap = 1; + } + lock_page(page); WARN_ON(page_to_index(page) != index); if (page->mapping != mapping) { @@ -710,23 +717,11 @@ int invalidate_inode_pages2_range(struct address_space *mapping, continue; } wait_on_page_writeback(page); - if (page_mapped(page)) { - if (!did_range_unmap) { - /* - * Zap the rest of the file in one hit. - */ - unmap_mapping_pages(mapping, index, - (1 + end - index), false); - did_range_unmap = 1; - } else { - /* - * Just zap this page - */ - unmap_mapping_pages(mapping, index, - 1, false); - } - } + + if (page_mapped(page)) + unmap_mapping_page(page); BUG_ON(page_mapped(page)); + ret2 = do_launder_page(mapping, page); if (ret2 == 0) { if (!invalidate_complete_page2(mapping, page))
From: Yang Shi shy828301@gmail.com
[ Upstream commit 504e070dc08f757bccaed6d05c0f53ecbfac8a23 ]
When debugging the bug reported by Wang Yugui [1], try_to_unmap() may fail, but the first VM_BUG_ON_PAGE() just checks page_mapcount() however it may miss the failure when head page is unmapped but other subpage is mapped. Then the second DEBUG_VM BUG() that check total mapcount would catch it. This may incur some confusion.
As this is not a fatal issue, so consolidate the two DEBUG_VM checks into one VM_WARN_ON_ONCE_PAGE().
[1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com Signed-off-by: Yang Shi shy828301@gmail.com Reviewed-by: Zi Yan ziy@nvidia.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Signed-off-by: Hugh Dickins hughd@google.com Cc: Alistair Popple apopple@nvidia.com Cc: Jan Kara jack@suse.cz Cc: Jue Wang juew@google.com Cc: "Matthew Wilcox (Oracle)" willy@infradead.org Cc: Miaohe Lin linmiaohe@huawei.com Cc: Minchan Kim minchan@kernel.org Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Oscar Salvador osalvador@suse.de Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Shakeel Butt shakeelb@google.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: fixed up variables and split_queue_lock in split_huge_page_to_list(), and conflict on ttu_flags in unmap_page().
Signed-off-by: Hugh Dickins hughd@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- mm/huge_memory.c | 24 +++++++----------------- 1 file changed, 7 insertions(+), 17 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 78c1ad5f8109..4400957d8e4e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2431,15 +2431,15 @@ static void unmap_page(struct page *page) { enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC; - bool unmap_success;
VM_BUG_ON_PAGE(!PageHead(page), page);
if (PageAnon(page)) ttu_flags |= TTU_SPLIT_FREEZE;
- unmap_success = try_to_unmap(page, ttu_flags); - VM_BUG_ON_PAGE(!unmap_success, page); + try_to_unmap(page, ttu_flags); + + VM_WARN_ON_ONCE_PAGE(page_mapped(page), page); }
static void remap_page(struct page *page) @@ -2698,7 +2698,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) struct pglist_data *pgdata = NODE_DATA(page_to_nid(head)); struct anon_vma *anon_vma = NULL; struct address_space *mapping = NULL; - int count, mapcount, extra_pins, ret; + int extra_pins, ret; bool mlocked; unsigned long flags; pgoff_t end; @@ -2760,7 +2760,6 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
mlocked = PageMlocked(page); unmap_page(head); - VM_BUG_ON_PAGE(compound_mapcount(head), head);
/* Make sure the page is not on per-CPU pagevec as it takes pin */ if (mlocked) @@ -2786,9 +2785,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
/* Prevent deferred_split_scan() touching ->_refcount */ spin_lock(&pgdata->split_queue_lock); - count = page_count(head); - mapcount = total_mapcount(head); - if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) { + if (page_ref_freeze(head, 1 + extra_pins)) { if (!list_empty(page_deferred_list(head))) { pgdata->split_queue_len--; list_del(page_deferred_list(head)); @@ -2804,16 +2801,9 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) } else ret = 0; } else { - if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { - pr_alert("total_mapcount: %u, page_count(): %u\n", - mapcount, count); - if (PageTail(page)) - dump_page(head, NULL); - dump_page(page, "total_mapcount(head) > 0"); - BUG(); - } spin_unlock(&pgdata->split_queue_lock); -fail: if (mapping) +fail: + if (mapping) xa_unlock(&mapping->i_pages); spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags); remap_page(head);
From: Hugh Dickins hughd@google.com
[ Upstream commit f003c03bd29e6f46fef1b9a8e8d636ac732286d5 ]
Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes".
I've marked all of these for stable: many are merely cleanups, but I think they are much better before the main fix than after.
This patch (of 11):
page_vma_mapped_walk() cleanup: sometimes the local copy of pvwm->page was used, sometimes pvmw->page itself: use the local copy "page" throughout.
Link: https://lkml.kernel.org/r/589b358c-febc-c88e-d4c2-7834b37fa7bf@google.com Link: https://lkml.kernel.org/r/88e67645-f467-c279-bf5e-af4b5c6b13eb@google.com Signed-off-by: Hugh Dickins hughd@google.com Reviewed-by: Alistair Popple apopple@nvidia.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Peter Xu peterx@redhat.com Cc: Yang Shi shy828301@gmail.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Matthew Wilcox willy@infradead.org Cc: Ralph Campbell rcampbell@nvidia.com Cc: Zi Yan ziy@nvidia.com Cc: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page_vma_mapped.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index bb7488e86bea..d146f6d31a62 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -151,7 +151,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pvmw->pte) goto next_pte;
- if (unlikely(PageHuge(pvmw->page))) { + if (unlikely(PageHuge(page))) { /* when pud is not present, pte will be NULL */ pvmw->pte = huge_pte_offset(mm, pvmw->address, PAGE_SIZE << compound_order(page)); @@ -213,8 +213,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) * cannot return prematurely, while zap_huge_pmd() has * cleared *pmd but not decremented compound_mapcount(). */ - if ((pvmw->flags & PVMW_SYNC) && - PageTransCompound(pvmw->page)) { + if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) { spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
spin_unlock(ptl); @@ -230,9 +229,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) return true; next_pte: /* Seek to next pte only makes sense for THP */ - if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page)) + if (!PageTransHuge(page) || PageHuge(page)) return not_found(pvmw); - end = vma_address_end(pvmw->page, pvmw->vma); + end = vma_address_end(page, pvmw->vma); do { pvmw->address += PAGE_SIZE; if (pvmw->address >= end)
From: Hugh Dickins hughd@google.com
[ Upstream commit 6d0fd5987657cb0c9756ce684e3a74c0f6351728 ]
page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of the way at the start, so no need to worry about it later.
Link: https://lkml.kernel.org/r/e31a483c-6d73-a6bb-26c5-43c3b880a2@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Peter Xu peterx@redhat.com Cc: Alistair Popple apopple@nvidia.com Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Matthew Wilcox willy@infradead.org Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page_vma_mapped.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index d146f6d31a62..036ce20bb154 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -148,10 +148,11 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pvmw->pmd && !pvmw->pte) return not_found(pvmw);
- if (pvmw->pte) - goto next_pte; - if (unlikely(PageHuge(page))) { + /* The only possible mapping was handled on last iteration */ + if (pvmw->pte) + return not_found(pvmw); + /* when pud is not present, pte will be NULL */ pvmw->pte = huge_pte_offset(mm, pvmw->address, PAGE_SIZE << compound_order(page)); @@ -164,6 +165,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) return not_found(pvmw); return true; } + + if (pvmw->pte) + goto next_pte; restart: pgd = pgd_offset(mm, pvmw->address); if (!pgd_present(*pgd)) @@ -229,7 +233,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) return true; next_pte: /* Seek to next pte only makes sense for THP */ - if (!PageTransHuge(page) || PageHuge(page)) + if (!PageTransHuge(page)) return not_found(pvmw); end = vma_address_end(page, pvmw->vma); do {
From: Hugh Dickins hughd@google.com
[ Upstream commit 3306d3119ceacc43ea8b141a73e21fea68eec30c ]
page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then use it in subsequent tests, instead of repeatedly dereferencing pointer.
Link: https://lkml.kernel.org/r/53fbc9d-891e-46b2-cb4b-468c3b19238e@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Peter Xu peterx@redhat.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page_vma_mapped.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 036ce20bb154..7e1db77b096a 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -187,18 +187,19 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) pmde = READ_ONCE(*pvmw->pmd); if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { pvmw->ptl = pmd_lock(mm, pvmw->pmd); - if (likely(pmd_trans_huge(*pvmw->pmd))) { + pmde = *pvmw->pmd; + if (likely(pmd_trans_huge(pmde))) { if (pvmw->flags & PVMW_MIGRATION) return not_found(pvmw); - if (pmd_page(*pvmw->pmd) != page) + if (pmd_page(pmde) != page) return not_found(pvmw); return true; - } else if (!pmd_present(*pvmw->pmd)) { + } else if (!pmd_present(pmde)) { if (thp_migration_supported()) { if (!(pvmw->flags & PVMW_MIGRATION)) return not_found(pvmw); - if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) { - swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd); + if (is_migration_entry(pmd_to_swp_entry(pmde))) { + swp_entry_t entry = pmd_to_swp_entry(pmde);
if (migration_entry_to_page(entry) != page) return not_found(pvmw);
From: Hugh Dickins hughd@google.com
[ Upstream commit e2e1d4076c77b3671cf8ce702535ae7dee3acf89 ]
page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to follow the same "return not_found, return not_found, return true" pattern as the block above it (note: returning not_found there is never premature, since existence or prior existence of huge pmd guarantees good alignment).
Link: https://lkml.kernel.org/r/378c8650-1488-2edf-9647-32a53cf2e21@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Peter Xu peterx@redhat.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page_vma_mapped.c | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 7e1db77b096a..e32ed7912923 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -194,24 +194,22 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pmd_page(pmde) != page) return not_found(pvmw); return true; - } else if (!pmd_present(pmde)) { - if (thp_migration_supported()) { - if (!(pvmw->flags & PVMW_MIGRATION)) - return not_found(pvmw); - if (is_migration_entry(pmd_to_swp_entry(pmde))) { - swp_entry_t entry = pmd_to_swp_entry(pmde); + } + if (!pmd_present(pmde)) { + swp_entry_t entry;
- if (migration_entry_to_page(entry) != page) - return not_found(pvmw); - return true; - } - } - return not_found(pvmw); - } else { - /* THP pmd was split under us: handle on pte level */ - spin_unlock(pvmw->ptl); - pvmw->ptl = NULL; + if (!thp_migration_supported() || + !(pvmw->flags & PVMW_MIGRATION)) + return not_found(pvmw); + entry = pmd_to_swp_entry(pmde); + if (!is_migration_entry(entry) || + migration_entry_to_page(entry) != page) + return not_found(pvmw); + return true; } + /* THP pmd was split under us: handle on pte level */ + spin_unlock(pvmw->ptl); + pvmw->ptl = NULL; } else if (!pmd_present(pmde)) { /* * If PVMW_SYNC, take and drop THP pmd lock so that we
From: Hugh Dickins hughd@google.com
[ Upstream commit 448282487483d6fa5b2eeeafaa0acc681e544a9c ]
page_vma_mapped_walk() cleanup: adjust the test for crossing page table boundary - I believe pvmw->address is always page-aligned, but nothing else here assumed that; and remember to reset pvmw->pte to NULL after unmapping the page table, though I never saw any bug from that.
Link: https://lkml.kernel.org/r/799b3f9c-2a9e-dfef-5d89-26e9f76fd97@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page_vma_mapped.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index e32ed7912923..1fccadd0eb6d 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -240,16 +240,16 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pvmw->address >= end) return not_found(pvmw); /* Did we cross page table boundary? */ - if (pvmw->address % PMD_SIZE == 0) { - pte_unmap(pvmw->pte); + if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) { if (pvmw->ptl) { spin_unlock(pvmw->ptl); pvmw->ptl = NULL; } + pte_unmap(pvmw->pte); + pvmw->pte = NULL; goto restart; - } else { - pvmw->pte++; } + pvmw->pte++; } while (pte_none(*pvmw->pte));
if (!pvmw->ptl) {
From: Hugh Dickins hughd@google.com
[ Upstream commit b3807a91aca7d21c05d5790612e49969117a72b9 ]
page_vma_mapped_walk() cleanup: add a level of indentation to much of the body, making no functional change in this commit, but reducing the later diff when this is all converted to a loop.
[hughd@google.com: : page_vma_mapped_walk(): add a level of indentation fix] Link: https://lkml.kernel.org/r/7f817555-3ce1-c785-e438-87d8efdcaf26@google.com
Link: https://lkml.kernel.org/r/efde211-f3e2-fe54-977-ef481419e7f3@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page_vma_mapped.c | 105 ++++++++++++++++++++++--------------------- 1 file changed, 55 insertions(+), 50 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 1fccadd0eb6d..819945678264 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -169,62 +169,67 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pvmw->pte) goto next_pte; restart: - pgd = pgd_offset(mm, pvmw->address); - if (!pgd_present(*pgd)) - return false; - p4d = p4d_offset(pgd, pvmw->address); - if (!p4d_present(*p4d)) - return false; - pud = pud_offset(p4d, pvmw->address); - if (!pud_present(*pud)) - return false; - pvmw->pmd = pmd_offset(pud, pvmw->address); - /* - * Make sure the pmd value isn't cached in a register by the - * compiler and used as a stale value after we've observed a - * subsequent update. - */ - pmde = READ_ONCE(*pvmw->pmd); - if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { - pvmw->ptl = pmd_lock(mm, pvmw->pmd); - pmde = *pvmw->pmd; - if (likely(pmd_trans_huge(pmde))) { - if (pvmw->flags & PVMW_MIGRATION) - return not_found(pvmw); - if (pmd_page(pmde) != page) - return not_found(pvmw); - return true; - } - if (!pmd_present(pmde)) { - swp_entry_t entry; + { + pgd = pgd_offset(mm, pvmw->address); + if (!pgd_present(*pgd)) + return false; + p4d = p4d_offset(pgd, pvmw->address); + if (!p4d_present(*p4d)) + return false; + pud = pud_offset(p4d, pvmw->address); + if (!pud_present(*pud)) + return false;
- if (!thp_migration_supported() || - !(pvmw->flags & PVMW_MIGRATION)) - return not_found(pvmw); - entry = pmd_to_swp_entry(pmde); - if (!is_migration_entry(entry) || - migration_entry_to_page(entry) != page) - return not_found(pvmw); - return true; - } - /* THP pmd was split under us: handle on pte level */ - spin_unlock(pvmw->ptl); - pvmw->ptl = NULL; - } else if (!pmd_present(pmde)) { + pvmw->pmd = pmd_offset(pud, pvmw->address); /* - * If PVMW_SYNC, take and drop THP pmd lock so that we - * cannot return prematurely, while zap_huge_pmd() has - * cleared *pmd but not decremented compound_mapcount(). + * Make sure the pmd value isn't cached in a register by the + * compiler and used as a stale value after we've observed a + * subsequent update. */ - if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) { - spinlock_t *ptl = pmd_lock(mm, pvmw->pmd); + pmde = READ_ONCE(*pvmw->pmd);
- spin_unlock(ptl); + if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { + pvmw->ptl = pmd_lock(mm, pvmw->pmd); + pmde = *pvmw->pmd; + if (likely(pmd_trans_huge(pmde))) { + if (pvmw->flags & PVMW_MIGRATION) + return not_found(pvmw); + if (pmd_page(pmde) != page) + return not_found(pvmw); + return true; + } + if (!pmd_present(pmde)) { + swp_entry_t entry; + + if (!thp_migration_supported() || + !(pvmw->flags & PVMW_MIGRATION)) + return not_found(pvmw); + entry = pmd_to_swp_entry(pmde); + if (!is_migration_entry(entry) || + migration_entry_to_page(entry) != page) + return not_found(pvmw); + return true; + } + /* THP pmd was split under us: handle on pte level */ + spin_unlock(pvmw->ptl); + pvmw->ptl = NULL; + } else if (!pmd_present(pmde)) { + /* + * If PVMW_SYNC, take and drop THP pmd lock so that we + * cannot return prematurely, while zap_huge_pmd() has + * cleared *pmd but not decremented compound_mapcount(). + */ + if ((pvmw->flags & PVMW_SYNC) && + PageTransCompound(page)) { + spinlock_t *ptl = pmd_lock(mm, pvmw->pmd); + + spin_unlock(ptl); + } + return false; } - return false; + if (!map_pte(pvmw)) + goto next_pte; } - if (!map_pte(pvmw)) - goto next_pte; while (1) { unsigned long end;
From: Hugh Dickins hughd@google.com
[ Upstream commit 474466301dfd8b39a10c01db740645f3f7ae9a28 ]
page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte, and use "goto this_pte", in place of the "while (1)" loop at the end.
Link: https://lkml.kernel.org/r/a52b234a-851-3616-2525-f42736e8934@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page_vma_mapped.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 819945678264..470f19c917ad 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -139,6 +139,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) { struct mm_struct *mm = pvmw->vma->vm_mm; struct page *page = pvmw->page; + unsigned long end; pgd_t *pgd; p4d_t *p4d; pud_t *pud; @@ -229,10 +230,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) } if (!map_pte(pvmw)) goto next_pte; - } - while (1) { - unsigned long end; - +this_pte: if (check_pte(pvmw)) return true; next_pte: @@ -261,6 +259,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) pvmw->ptl = pte_lockptr(mm, pvmw->pmd); spin_lock(pvmw->ptl); } + goto this_pte; } }
From: Hugh Dickins hughd@google.com
[ Upstream commit a765c417d876cc635f628365ec9aa6f09470069a ]
page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the start, rather than later at next_pte.
It's a little unnecessary overhead on the first call, but makes for a simpler loop in the following commit.
Link: https://lkml.kernel.org/r/4542b34d-862f-7cb4-bb22-e0df6ce830a2@google.com Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page_vma_mapped.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 470f19c917ad..7239c38e99e2 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -167,6 +167,15 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) return true; }
+ /* + * Seek to next pte only makes sense for THP. + * But more important than that optimization, is to filter out + * any PageKsm page: whose page->index misleads vma_address() + * and vma_address_end() to disaster. + */ + end = PageTransCompound(page) ? + vma_address_end(page, pvmw->vma) : + pvmw->address + PAGE_SIZE; if (pvmw->pte) goto next_pte; restart: @@ -234,10 +243,6 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (check_pte(pvmw)) return true; next_pte: - /* Seek to next pte only makes sense for THP */ - if (!PageTransHuge(page)) - return not_found(pvmw); - end = vma_address_end(page, pvmw->vma); do { pvmw->address += PAGE_SIZE; if (pvmw->address >= end)
From: Hugh Dickins hughd@google.com
[ Upstream commit a9a7504d9beaf395481faa91e70e2fd08f7a3dde ]
Running certain tests with a DEBUG_VM kernel would crash within hours, on the total_mapcount BUG() in split_huge_page_to_list(), while trying to free up some memory by punching a hole in a shmem huge page: split's try_to_unmap() was unable to find all the mappings of the page (which, on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
Crash dumps showed two tail pages of a shmem huge page remained mapped by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end of a long unmapped range; and no page table had yet been allocated for the head of the huge page to be mapped into.
Although designed to handle these odd misaligned huge-page-mapped-by-pte cases, page_vma_mapped_walk() falls short by returning false prematurely when !pmd_present or !pud_present or !p4d_present or !pgd_present: there are cases when a huge page may span the boundary, with ptes present in the next.
Restructure page_vma_mapped_walk() as a loop to continue in these cases, while keeping its layout much as before. Add a step_forward() helper to advance pvmw->address across those boundaries: originally I tried to use mm's standard p?d_addr_end() macros, but hit the same crash 512 times less often: because of the way redundant levels are folded together, but folded differently in different configurations, it was just too difficult to use them correctly; and step_forward() is simpler anyway.
Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page_vma_mapped.c | 34 +++++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 7239c38e99e2..b7a8009c1549 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -111,6 +111,13 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw) return pfn_in_hpage(pvmw->page, pfn); }
+static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size) +{ + pvmw->address = (pvmw->address + size) & ~(size - 1); + if (!pvmw->address) + pvmw->address = ULONG_MAX; +} + /** * page_vma_mapped_walk - check if @pvmw->page is mapped in @pvmw->vma at * @pvmw->address @@ -179,16 +186,22 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (pvmw->pte) goto next_pte; restart: - { + do { pgd = pgd_offset(mm, pvmw->address); - if (!pgd_present(*pgd)) - return false; + if (!pgd_present(*pgd)) { + step_forward(pvmw, PGDIR_SIZE); + continue; + } p4d = p4d_offset(pgd, pvmw->address); - if (!p4d_present(*p4d)) - return false; + if (!p4d_present(*p4d)) { + step_forward(pvmw, P4D_SIZE); + continue; + } pud = pud_offset(p4d, pvmw->address); - if (!pud_present(*pud)) - return false; + if (!pud_present(*pud)) { + step_forward(pvmw, PUD_SIZE); + continue; + }
pvmw->pmd = pmd_offset(pud, pvmw->address); /* @@ -235,7 +248,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
spin_unlock(ptl); } - return false; + step_forward(pvmw, PMD_SIZE); + continue; } if (!map_pte(pvmw)) goto next_pte; @@ -265,7 +279,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) spin_lock(pvmw->ptl); } goto this_pte; - } + } while (pvmw->address < end); + + return false; }
/**
From: Hugh Dickins hughd@google.com
[ Upstream commit a7a69d8ba88d8dcee7ef00e91d413a4bd003a814 ]
Aha! Shouldn't that quick scan over pte_none()s make sure that it holds ptlock in the PVMW_SYNC case? That too might have been responsible for BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though I've never seen any.
Link: https://lkml.kernel.org/r/1bdf384c-8137-a149-2a1e-475a4791c3c@google.com Link: https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/ Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hugh Dickins hughd@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Tested-by: Wang Yugui wangyugui@e16-tech.com Cc: Alistair Popple apopple@nvidia.com Cc: Matthew Wilcox willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Will Deacon will@kernel.org Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page_vma_mapped.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index b7a8009c1549..edca78609318 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -272,6 +272,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) goto restart; } pvmw->pte++; + if ((pvmw->flags & PVMW_SYNC) && !pvmw->ptl) { + pvmw->ptl = pte_lockptr(mm, pvmw->pmd); + spin_lock(pvmw->ptl); + } } while (pte_none(*pvmw->pte));
if (!pvmw->ptl) {
From: Hugh Dickins hughd@google.com
[ Upstream commit fe19bd3dae3d15d2fbfdb3de8839a6ea0fe94264 ]
If more than one futex is placed on a shmem huge page, it can happen that waking the second wakes the first instead, and leaves the second waiting: the key's shared.pgoff is wrong.
When 3.11 commit 13d60f4b6ab5 ("futex: Take hugepages into account when generating futex_key"), the only shared huge pages came from hugetlbfs, and the code added to deal with its exceptional page->index was put into hugetlb source. Then that was missed when 4.8 added shmem huge pages.
page_to_pgoff() is what others use for this nowadays: except that, as currently written, it gives the right answer on hugetlbfs head, but nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific hugetlb_basepage_index() on PageHuge tails as well as on head.
Yes, it's unconventional to declare hugetlb_basepage_index() there in pagemap.h, rather than in hugetlb.h; but I do not expect anything but page_to_pgoff() ever to need it.
[akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope]
Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Reported-by: Neel Natu neelnatu@google.com Signed-off-by: Hugh Dickins hughd@google.com Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Acked-by: Thomas Gleixner tglx@linutronix.de Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Zhang Yi wetpzy@gmail.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Mike Kravetz mike.kravetz@oracle.com Cc: Ingo Molnar mingo@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Darren Hart dvhart@infradead.org Cc: Davidlohr Bueso dave@stgolabs.net Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: leave redundant #include <linux/hugetlb.h> in kernel/futex.c, to avoid conflict over the header files included.
Signed-off-by: Hugh Dickins hughd@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/hugetlb.h | 16 ---------------- include/linux/pagemap.h | 13 +++++++------ kernel/futex.c | 2 +- mm/hugetlb.c | 5 +---- 4 files changed, 9 insertions(+), 27 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index c129c1c14c5f..2df83a659818 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -477,17 +477,6 @@ static inline int hstate_index(struct hstate *h) return h - hstates; }
-pgoff_t __basepage_index(struct page *page); - -/* Return page->index in PAGE_SIZE units */ -static inline pgoff_t basepage_index(struct page *page) -{ - if (!PageCompound(page)) - return page->index; - - return __basepage_index(page); -} - extern int dissolve_free_huge_page(struct page *page); extern int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn); @@ -582,11 +571,6 @@ static inline int hstate_index(struct hstate *h) return 0; }
-static inline pgoff_t basepage_index(struct page *page) -{ - return page->index; -} - static inline int dissolve_free_huge_page(struct page *page) { return 0; diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index b1bd2186e6d2..33b63b2a163f 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -403,7 +403,7 @@ static inline struct page *read_mapping_page(struct address_space *mapping, }
/* - * Get index of the page with in radix-tree + * Get index of the page within radix-tree (but not for hugetlb pages). * (TODO: remove once hugetlb pages will have ->index in PAGE_SIZE) */ static inline pgoff_t page_to_index(struct page *page) @@ -422,15 +422,16 @@ static inline pgoff_t page_to_index(struct page *page) return pgoff; }
+extern pgoff_t hugetlb_basepage_index(struct page *page); + /* - * Get the offset in PAGE_SIZE. - * (TODO: hugepage should have ->index in PAGE_SIZE) + * Get the offset in PAGE_SIZE (even for hugetlb pages). + * (TODO: hugetlb pages should have ->index in PAGE_SIZE) */ static inline pgoff_t page_to_pgoff(struct page *page) { - if (unlikely(PageHeadHuge(page))) - return page->index << compound_order(page); - + if (unlikely(PageHuge(page))) + return hugetlb_basepage_index(page); return page_to_index(page); }
diff --git a/kernel/futex.c b/kernel/futex.c index 526ebcff5a0a..3c67da9b8408 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -719,7 +719,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
key->both.offset |= FUT_OFF_INODE; /* inode-based key */ key->shared.i_seq = get_inode_sequence_number(inode); - key->shared.pgoff = basepage_index(tail); + key->shared.pgoff = page_to_pgoff(tail); rcu_read_unlock(); }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c69f12e4c149..ebcf26bc4cd4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1391,15 +1391,12 @@ int PageHeadHuge(struct page *page_head) return get_compound_page_dtor(page_head) == free_huge_page; }
-pgoff_t __basepage_index(struct page *page) +pgoff_t hugetlb_basepage_index(struct page *page) { struct page *page_head = compound_head(page); pgoff_t index = page_index(page_head); unsigned long compound_idx;
- if (!PageHuge(page_head)) - return page_index(page); - if (compound_order(page_head) >= MAX_ORDER) compound_idx = page_to_pfn(page) - page_to_pfn(page_head); else
From: ManYi Li limanyi@uniontech.com
[ Upstream commit 7dd753ca59d6c8cc09aa1ed24f7657524803c7f3 ]
Handle a reported media event code of 3. This indicates that the media has been removed from the drive and user intervention is required to proceed. Return DISK_EVENT_EJECT_REQUEST in that case.
Link: https://lore.kernel.org/r/20210611094402.23884-1-limanyi@uniontech.com Signed-off-by: ManYi Li limanyi@uniontech.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/sr.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index 45c8bf39ad23..acf0c244141f 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -216,6 +216,8 @@ static unsigned int sr_get_events(struct scsi_device *sdev) return DISK_EVENT_EJECT_REQUEST; else if (med->media_event_code == 2) return DISK_EVENT_MEDIA_CHANGE; + else if (med->media_event_code == 3) + return DISK_EVENT_EJECT_REQUEST; return 0; }
From: Christian König christian.koenig@amd.com
[ Upstream commit d330099115597bbc238d6758a4930e72b49ea9ba ]
AGP for example doesn't have a dma_address array.
Signed-off-by: Christian König christian.koenig@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Link: https://patchwork.freedesktop.org/patch/msgid/20210614110517.1624-1-christia... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 7214022dfb91..d230536e7086 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -512,7 +512,7 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo) struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm; int i;
- if (!ttm_dma) + if (!ttm_dma || !ttm_dma->dma_address) return;
/* Don't waste time looping if the object is coherent */ @@ -532,7 +532,7 @@ nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo) struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm; int i;
- if (!ttm_dma) + if (!ttm_dma || !ttm_dma->dma_address) return;
/* Don't waste time looping if the object is coherent */
From: Tahsin Erdogan trdgn@amazon.com
Mainline commit ce9f24cccdc0 ("ext4: check journal inode extents more carefully") enabled validity checks for journal inode's data blocks. This change got ported to stable branches, but the backport for 4.19 has a bug where it will flag an error even when system block entry's inode number matches journal inode.
The way error is reported is also problematic because it updates the superblock without following journaling rules. This may result in superblock checksum errors if the superblock is in the process of being committed but has a previously calculated checksum that doesn't include the bogus error update.
This patch eliminates the bogus error by trying to match how other backports were implemented, which is to flag an error only when inode numbers mismatch.
Fixes: commit a75a5d163857 ("ext4: check journal inode extents more carefully") Signed-off-by: Tahsin Erdogan trdgn@amazon.com Cc: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/block_validity.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -171,8 +171,10 @@ static int ext4_data_block_valid_rcu(str else if (start_blk >= (entry->start_blk + entry->count)) n = n->rb_right; else { + if (entry->ino == ino) + return 1; sbi->s_es->s_last_error_block = cpu_to_le64(start_blk); - return entry->ino == ino; + return 0; } } return 1;
From: David Rientjes rientjes@google.com
commit 7be74942f184fdfba34ddd19a0d995deb34d4a03 upstream.
There may be many encrypted regions that need to be unregistered when a SEV VM is destroyed. This can lead to soft lockups. For example, on a host running 4.15:
watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348] CPU: 206 PID: 194348 Comm: t_virtual_machi RIP: 0010:free_unref_page_list+0x105/0x170 ... Call Trace: [<0>] release_pages+0x159/0x3d0 [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd] [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd] [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd] [<0>] kvm_arch_destroy_vm+0x47/0x200 [<0>] kvm_put_kvm+0x1a8/0x2f0 [<0>] kvm_vm_release+0x25/0x30 [<0>] do_exit+0x335/0xc10 [<0>] do_group_exit+0x3f/0xa0 [<0>] get_signal+0x1bc/0x670 [<0>] do_signal+0x31/0x130
Although the CLFLUSH is no longer issued on every encrypted region to be unregistered, there are no other changes that can prevent soft lockups for very large SEV VMs in the latest kernel.
Periodically schedule if necessary. This still holds kvm->lock across the resched, but since this only happens when the VM is destroyed this is assumed to be acceptable.
Signed-off-by: David Rientjes rientjes@google.com Message-Id: alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com [iwamatsu: adjust filename.] Reference: CVE-2020-36311 Signed-off-by: Nobuhiro Iwamatsu (CIP) nobuhiro1.iwamatsu@toshiba.co.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1954,6 +1954,7 @@ static void sev_vm_destroy(struct kvm *k list_for_each_safe(pos, q, head) { __unregister_enc_region_locked(kvm, list_entry(pos, struct enc_region, list)); + cond_resched(); } }
From: Anson Huang Anson.Huang@nxp.com
commit 4521de30fbb3f5be0db58de93582ebce72c9d44f upstream.
The vdd3p0 LDO's input should be from external USB VBUS directly, NOT PMIC's power supply, the vdd3p0 LDO's target output voltage can be controlled by SW, and it requires input voltage to be high enough, with incorrect power supply assigned, if the power supply's voltage is lower than the LDO target output voltage, it will return fail and skip the LDO voltage adjustment, so remove the power supply assignment for vdd3p0 to avoid such scenario.
Fixes: 93385546ba36 ("ARM: dts: imx6qdl-sabresd: Assign corresponding power supply for LDOs") Signed-off-by: Anson Huang Anson.Huang@nxp.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Nobuhiro Iwamatsu (CIP) nobuhiro1.iwamatsu@toshiba.co.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 4 ---- 1 file changed, 4 deletions(-)
--- a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi +++ b/arch/arm/boot/dts/imx6qdl-sabresd.dtsi @@ -675,10 +675,6 @@ vin-supply = <&vgen5_reg>; };
-®_vdd3p0 { - vin-supply = <&sw2_reg>; -}; - ®_vdd2p5 { vin-supply = <&vgen5_reg>; };
From: Petr Mladek pmladek@suse.com
commit 34b3d5344719d14fd2185b2d9459b3abcb8cf9d8 upstream.
Patch series "kthread_worker: Fix race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync()".
This patchset fixes the race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync() including proper return value handling.
This patch (of 2):
Simple code refactoring as a preparation step for fixing a race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().
It does not modify the existing behavior.
Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com Signed-off-by: Petr Mladek pmladek@suse.com Cc: jenhaochen@google.com Cc: Martin Liu liumartin@google.com Cc: Minchan Kim minchan@google.com Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Oleg Nesterov oleg@redhat.com Cc: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/kthread.c | 46 +++++++++++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 17 deletions(-)
--- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -1010,6 +1010,33 @@ void kthread_flush_work(struct kthread_w EXPORT_SYMBOL_GPL(kthread_flush_work);
/* + * Make sure that the timer is neither set nor running and could + * not manipulate the work list_head any longer. + * + * The function is called under worker->lock. The lock is temporary + * released but the timer can't be set again in the meantime. + */ +static void kthread_cancel_delayed_work_timer(struct kthread_work *work, + unsigned long *flags) +{ + struct kthread_delayed_work *dwork = + container_of(work, struct kthread_delayed_work, work); + struct kthread_worker *worker = work->worker; + + /* + * del_timer_sync() must be called to make sure that the timer + * callback is not running. The lock must be temporary released + * to avoid a deadlock with the callback. In the meantime, + * any queuing is blocked by setting the canceling counter. + */ + work->canceling++; + spin_unlock_irqrestore(&worker->lock, *flags); + del_timer_sync(&dwork->timer); + spin_lock_irqsave(&worker->lock, *flags); + work->canceling--; +} + +/* * This function removes the work from the worker queue. Also it makes sure * that it won't get queued later via the delayed work's timer. * @@ -1023,23 +1050,8 @@ static bool __kthread_cancel_work(struct unsigned long *flags) { /* Try to cancel the timer if exists. */ - if (is_dwork) { - struct kthread_delayed_work *dwork = - container_of(work, struct kthread_delayed_work, work); - struct kthread_worker *worker = work->worker; - - /* - * del_timer_sync() must be called to make sure that the timer - * callback is not running. The lock must be temporary released - * to avoid a deadlock with the callback. In the meantime, - * any queuing is blocked by setting the canceling counter. - */ - work->canceling++; - spin_unlock_irqrestore(&worker->lock, *flags); - del_timer_sync(&dwork->timer); - spin_lock_irqsave(&worker->lock, *flags); - work->canceling--; - } + if (is_dwork) + kthread_cancel_delayed_work_timer(work, flags);
/* * Try to remove the work from a worker list. It might either
From: Petr Mladek pmladek@suse.com
commit 5fa54346caf67b4b1b10b1f390316ae466da4d53 upstream.
The system might hang with the following backtrace:
schedule+0x80/0x100 schedule_timeout+0x48/0x138 wait_for_common+0xa4/0x134 wait_for_completion+0x1c/0x2c kthread_flush_work+0x114/0x1cc kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144 kthread_cancel_delayed_work_sync+0x18/0x2c xxxx_pm_notify+0xb0/0xd8 blocking_notifier_call_chain_robust+0x80/0x194 pm_notifier_call_chain_robust+0x28/0x4c suspend_prepare+0x40/0x260 enter_state+0x80/0x3f4 pm_suspend+0x60/0xdc state_store+0x108/0x144 kobj_attr_store+0x38/0x88 sysfs_kf_write+0x64/0xc0 kernfs_fop_write_iter+0x108/0x1d0 vfs_write+0x2f4/0x368 ksys_write+0x7c/0xec
It is caused by the following race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync():
CPU0 CPU1
Context: Thread A Context: Thread B
kthread_mod_delayed_work() spin_lock() __kthread_cancel_work() spin_unlock() del_timer_sync() kthread_cancel_delayed_work_sync() spin_lock() __kthread_cancel_work() spin_unlock() del_timer_sync() spin_lock()
work->canceling++ spin_unlock spin_lock() queue_delayed_work() // dwork is put into the worker->delayed_work_list
spin_unlock()
kthread_flush_work() // flush_work is put at the tail of the dwork
wait_for_completion()
Context: IRQ
kthread_delayed_work_timer_fn() spin_lock() list_del_init(&work->node); spin_unlock()
BANG: flush_work is not longer linked and will never get proceed.
The problem is that kthread_mod_delayed_work() checks work->canceling flag before canceling the timer.
A simple solution is to (re)check work->canceling after __kthread_cancel_work(). But then it is not clear what should be returned when __kthread_cancel_work() removed the work from the queue (list) and it can't queue it again with the new @delay.
The return value might be used for reference counting. The caller has to know whether a new work has been queued or an existing one was replaced.
The proper solution is that kthread_mod_delayed_work() will remove the work from the queue (list) _only_ when work->canceling is not set. The flag must be checked after the timer is stopped and the remaining operations can be done under worker->lock.
Note that kthread_mod_delayed_work() could remove the timer and then bail out. It is fine. The other canceling caller needs to cancel the timer as well. The important thing is that the queue (list) manipulation is done atomically under worker->lock.
Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com Fixes: 9a6b06c8d9a220860468a ("kthread: allow to modify delayed kthread work") Signed-off-by: Petr Mladek pmladek@suse.com Reported-by: Martin Liu liumartin@google.com Cc: jenhaochen@google.com Cc: Minchan Kim minchan@google.com Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: Oleg Nesterov oleg@redhat.com Cc: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/kthread.c | 35 ++++++++++++++++++++++++----------- 1 file changed, 24 insertions(+), 11 deletions(-)
--- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -1037,8 +1037,11 @@ static void kthread_cancel_delayed_work_ }
/* - * This function removes the work from the worker queue. Also it makes sure - * that it won't get queued later via the delayed work's timer. + * This function removes the work from the worker queue. + * + * It is called under worker->lock. The caller must make sure that + * the timer used by delayed work is not running, e.g. by calling + * kthread_cancel_delayed_work_timer(). * * The work might still be in use when this function finishes. See the * current_work proceed by the worker. @@ -1046,13 +1049,8 @@ static void kthread_cancel_delayed_work_ * Return: %true if @work was pending and successfully canceled, * %false if @work was not pending */ -static bool __kthread_cancel_work(struct kthread_work *work, bool is_dwork, - unsigned long *flags) +static bool __kthread_cancel_work(struct kthread_work *work) { - /* Try to cancel the timer if exists. */ - if (is_dwork) - kthread_cancel_delayed_work_timer(work, flags); - /* * Try to remove the work from a worker list. It might either * be from worker->work_list or from worker->delayed_work_list. @@ -1105,11 +1103,23 @@ bool kthread_mod_delayed_work(struct kth /* Work must not be used with >1 worker, see kthread_queue_work() */ WARN_ON_ONCE(work->worker != worker);
- /* Do not fight with another command that is canceling this work. */ + /* + * Temporary cancel the work but do not fight with another command + * that is canceling the work as well. + * + * It is a bit tricky because of possible races with another + * mod_delayed_work() and cancel_delayed_work() callers. + * + * The timer must be canceled first because worker->lock is released + * when doing so. But the work can be removed from the queue (list) + * only when it can be queued again so that the return value can + * be used for reference counting. + */ + kthread_cancel_delayed_work_timer(work, &flags); if (work->canceling) goto out; + ret = __kthread_cancel_work(work);
- ret = __kthread_cancel_work(work, true, &flags); fast_queue: __kthread_queue_delayed_work(worker, dwork, delay); out: @@ -1131,7 +1141,10 @@ static bool __kthread_cancel_work_sync(s /* Work must not be used with >1 worker, see kthread_queue_work(). */ WARN_ON_ONCE(work->worker != worker);
- ret = __kthread_cancel_work(work, is_dwork, &flags); + if (is_dwork) + kthread_cancel_delayed_work_timer(work, &flags); + + ret = __kthread_cancel_work(work);
if (worker->current_work != work) goto out_fast;
From: Juergen Gross jgross@suse.com
commit 3de218ff39b9e3f0d453fe3154f12a174de44b25 upstream.
In order to avoid a race condition for user events when changing cpu affinity reset the active flag only when EOI-ing the event.
This is working fine as all user events are lateeoi events. Note that lateeoi_ack_mask_dynirq() is not modified as there is no explicit call to xen_irq_lateeoi() expected later.
Cc: stable@vger.kernel.org Reported-by: Julien Grall julien@xen.org Fixes: b6622798bc50b62 ("xen/events: avoid handling the same event on two cpus at the same time") Tested-by: Julien Grall julien@xen.org Signed-off-by: Juergen Gross jgross@suse.com Reviewed-by: Boris Ostrovsky boris.ostrvsky@oracle.com Link: https://lore.kernel.org/r/20210623130913.9405-1-jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/xen/events/events_base.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-)
--- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -524,6 +524,9 @@ static void xen_irq_lateeoi_locked(struc }
info->eoi_time = 0; + + /* is_active hasn't been reset yet, do it now. */ + smp_store_release(&info->is_active, 0); do_unmask(info, EVT_MASK_REASON_EOI_PENDING); }
@@ -1780,10 +1783,22 @@ static void lateeoi_ack_dynirq(struct ir struct irq_info *info = info_for_irq(data->irq); evtchn_port_t evtchn = info ? info->evtchn : 0;
- if (VALID_EVTCHN(evtchn)) { - do_mask(info, EVT_MASK_REASON_EOI_PENDING); - ack_dynirq(data); - } + if (!VALID_EVTCHN(evtchn)) + return; + + do_mask(info, EVT_MASK_REASON_EOI_PENDING); + + if (unlikely(irqd_is_setaffinity_pending(data)) && + likely(!irqd_irq_disabled(data))) { + do_mask(info, EVT_MASK_REASON_TEMPORARY); + + clear_evtchn(evtchn); + + irq_move_masked_irq(data); + + do_unmask(info, EVT_MASK_REASON_TEMPORARY); + } else + clear_evtchn(evtchn); }
static void lateeoi_mask_ack_dynirq(struct irq_data *data)
From: Alper Gun alpergun@google.com
commit 934002cd660b035b926438244b4294e647507e13 upstream.
Send SEV_CMD_DECOMMISSION command to PSP firmware if ASID binding fails. If a failure happens after a successful LAUNCH_START command, a decommission command should be executed. Otherwise, guest context will be unfreed inside the AMD SP. After the firmware will not have memory to allocate more SEV guest context, LAUNCH_START command will begin to fail with SEV_RET_RESOURCE_LIMIT error.
The existing code calls decommission inside sev_unbind_asid, but it is not called if a failure happens before guest activation succeeds. If sev_bind_asid fails, decommission is never called. PSP firmware has a limit for the number of guests. If sev_asid_binding fails many times, PSP firmware will not have resources to create another guest context.
Cc: stable@vger.kernel.org Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command") Reported-by: Peter Gonda pgonda@google.com Signed-off-by: Alper Gun alpergun@google.com Reviewed-by: Marc Orr marcorr@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Message-Id: 20210610174604.2554090-1-alpergun@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/svm.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-)
--- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1791,9 +1791,25 @@ static void sev_asid_free(struct kvm *kv __sev_asid_free(sev->asid); }
-static void sev_unbind_asid(struct kvm *kvm, unsigned int handle) +static void sev_decommission(unsigned int handle) { struct sev_data_decommission *decommission; + + if (!handle) + return; + + decommission = kzalloc(sizeof(*decommission), GFP_KERNEL); + if (!decommission) + return; + + decommission->handle = handle; + sev_guest_decommission(decommission, NULL); + + kfree(decommission); +} + +static void sev_unbind_asid(struct kvm *kvm, unsigned int handle) +{ struct sev_data_deactivate *data;
if (!handle) @@ -1811,15 +1827,7 @@ static void sev_unbind_asid(struct kvm * sev_guest_df_flush(NULL); kfree(data);
- decommission = kzalloc(sizeof(*decommission), GFP_KERNEL); - if (!decommission) - return; - - /* decommission handle */ - decommission->handle = handle; - sev_guest_decommission(decommission, NULL); - - kfree(decommission); + sev_decommission(handle); }
static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr, @@ -6469,8 +6477,10 @@ static int sev_launch_start(struct kvm *
/* Bind ASID to this guest */ ret = sev_bind_asid(kvm, start->handle, error); - if (ret) + if (ret) { + sev_decommission(start->handle); goto e_free_session; + }
/* return handle to userspace */ params.handle = start->handle;
From: afzal mohammed afzal.mohd.ma@gmail.com
commit b75ca5217743e4d7076cf65e044e88389e44318d upstream.
request_irq() is preferred over setup_irq(). Invocations of setup_irq() occur after memory allocators are ready.
Per tglx[1], setup_irq() existed in olden days when allocators were not ready by the time early interrupts were initialized.
Hence replace setup_irq() by request_irq().
[1] https://lkml.kernel.org/r/alpine.DEB.2.20.1710191609480.1971@nanos
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Keerthy j-keerthy@ti.com Cc: Tero Kristo kristo@kernel.org Signed-off-by: afzal mohammed afzal.mohd.ma@gmail.com Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/mach-omap1/pm.c | 13 ++++++------- arch/arm/mach-omap1/time.c | 10 +++------- arch/arm/mach-omap1/timer32k.c | 10 +++------- arch/arm/mach-omap2/timer.c | 11 +++-------- 4 files changed, 15 insertions(+), 29 deletions(-)
--- a/arch/arm/mach-omap1/pm.c +++ b/arch/arm/mach-omap1/pm.c @@ -610,11 +610,6 @@ static irqreturn_t omap_wakeup_interrupt return IRQ_HANDLED; }
-static struct irqaction omap_wakeup_irq = { - .name = "peripheral wakeup", - .handler = omap_wakeup_interrupt -}; -
static const struct platform_suspend_ops omap_pm_ops = { @@ -627,6 +622,7 @@ static const struct platform_suspend_ops static int __init omap_pm_init(void) { int error = 0; + int irq;
if (!cpu_class_is_omap1()) return -ENODEV; @@ -670,9 +666,12 @@ static int __init omap_pm_init(void) arm_pm_idle = omap1_pm_idle;
if (cpu_is_omap7xx()) - setup_irq(INT_7XX_WAKE_UP_REQ, &omap_wakeup_irq); + irq = INT_7XX_WAKE_UP_REQ; else if (cpu_is_omap16xx()) - setup_irq(INT_1610_WAKE_UP_REQ, &omap_wakeup_irq); + irq = INT_1610_WAKE_UP_REQ; + if (request_irq(irq, omap_wakeup_interrupt, 0, "peripheral wakeup", + NULL)) + pr_err("Failed to request irq %d (peripheral wakeup)\n", irq);
/* Program new power ramp-up time * (0 for most boards since we don't lower voltage when in deep sleep) --- a/arch/arm/mach-omap1/time.c +++ b/arch/arm/mach-omap1/time.c @@ -155,15 +155,11 @@ static irqreturn_t omap_mpu_timer1_inter return IRQ_HANDLED; }
-static struct irqaction omap_mpu_timer1_irq = { - .name = "mpu_timer1", - .flags = IRQF_TIMER | IRQF_IRQPOLL, - .handler = omap_mpu_timer1_interrupt, -}; - static __init void omap_init_mpu_timer(unsigned long rate) { - setup_irq(INT_TIMER1, &omap_mpu_timer1_irq); + if (request_irq(INT_TIMER1, omap_mpu_timer1_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "mpu_timer1", NULL)) + pr_err("Failed to request irq %d (mpu_timer1)\n", INT_TIMER1); omap_mpu_timer_start(0, (rate / HZ) - 1, 1);
clockevent_mpu_timer1.cpumask = cpumask_of(0); --- a/arch/arm/mach-omap1/timer32k.c +++ b/arch/arm/mach-omap1/timer32k.c @@ -148,15 +148,11 @@ static irqreturn_t omap_32k_timer_interr return IRQ_HANDLED; }
-static struct irqaction omap_32k_timer_irq = { - .name = "32KHz timer", - .flags = IRQF_TIMER | IRQF_IRQPOLL, - .handler = omap_32k_timer_interrupt, -}; - static __init void omap_init_32k_timer(void) { - setup_irq(INT_OS_TIMER, &omap_32k_timer_irq); + if (request_irq(INT_OS_TIMER, omap_32k_timer_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "32KHz timer", NULL)) + pr_err("Failed to request irq %d(32KHz timer)\n", INT_OS_TIMER);
clockevent_32k_timer.cpumask = cpumask_of(0); clockevents_config_and_register(&clockevent_32k_timer, --- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -92,12 +92,6 @@ static irqreturn_t omap2_gp_timer_interr return IRQ_HANDLED; }
-static struct irqaction omap2_gp_timer_irq = { - .name = "gp_timer", - .flags = IRQF_TIMER | IRQF_IRQPOLL, - .handler = omap2_gp_timer_interrupt, -}; - static int omap2_gp_timer_set_next_event(unsigned long cycles, struct clock_event_device *evt) { @@ -383,8 +377,9 @@ static void __init omap2_gp_clockevent_i &clockevent_gpt.name, OMAP_TIMER_POSTED); BUG_ON(res);
- omap2_gp_timer_irq.dev_id = &clkev; - setup_irq(clkev.irq, &omap2_gp_timer_irq); + if (request_irq(clkev.irq, omap2_gp_timer_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", &clkev)) + pr_err("Failed to request irq %d (gp_timer)\n", clkev.irq);
__omap_dm_timer_int_enable(&clkev, OMAP_TIMER_INT_OVERFLOW);
From: Tony Lindgren tony@atomide.com
commit 52762fbd1c4778ac9b173624ca0faacd22ef4724 upstream.
We can move the TI dmtimer clockevent and clocksource to live under drivers/clocksource if we rely only on the clock framework, and handle the module configuration directly in the clocksource driver based on the device tree data.
This removes the early dependency with system timers to the interconnect related code, and we can probe pretty much everything else later on at the module_init level.
Let's first add a new driver for timer-ti-dm-systimer based on existing arch/arm/mach-omap2/timer.c. Then let's start moving SoCs to probe with device tree data while still keeping the old timer.c. And eventually we can just drop the old timer.c.
Let's take the opportunity to switch to use readl/writel as pointed out by Daniel Lezcano daniel.lezcano@linaro.org. This allows further clean-up of the timer-ti-dm code the a lot of the shared helpers can just become static to the non-syster related code.
Note the boards can optionally configure different timer source clocks if needed with assigned-clocks and assigned-clock-parents.
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Keerthy j-keerthy@ti.com Cc: Tero Kristo kristo@kernel.org [tony@atomide.com: backported to 4.19.y] Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/mach-omap2/timer.c | 109 ++++++++++++++++++++++++++------------------ 1 file changed, 65 insertions(+), 44 deletions(-)
--- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -64,15 +64,28 @@
/* Clockevent code */
-static struct omap_dm_timer clkev; -static struct clock_event_device clockevent_gpt; - /* Clockevent hwmod for am335x and am437x suspend */ static struct omap_hwmod *clockevent_gpt_hwmod;
/* Clockesource hwmod for am437x suspend */ static struct omap_hwmod *clocksource_gpt_hwmod;
+struct dmtimer_clockevent { + struct clock_event_device dev; + struct omap_dm_timer timer; +}; + +static struct dmtimer_clockevent clockevent; + +static struct omap_dm_timer *to_dmtimer(struct clock_event_device *clockevent) +{ + struct dmtimer_clockevent *clkevt = + container_of(clockevent, struct dmtimer_clockevent, dev); + struct omap_dm_timer *timer = &clkevt->timer; + + return timer; +} + #ifdef CONFIG_SOC_HAS_REALTIME_COUNTER static unsigned long arch_timer_freq;
@@ -84,10 +97,11 @@ void set_cntfreq(void)
static irqreturn_t omap2_gp_timer_interrupt(int irq, void *dev_id) { - struct clock_event_device *evt = &clockevent_gpt; - - __omap_dm_timer_write_status(&clkev, OMAP_TIMER_INT_OVERFLOW); + struct dmtimer_clockevent *clkevt = dev_id; + struct clock_event_device *evt = &clkevt->dev; + struct omap_dm_timer *timer = &clkevt->timer;
+ __omap_dm_timer_write_status(timer, OMAP_TIMER_INT_OVERFLOW); evt->event_handler(evt); return IRQ_HANDLED; } @@ -95,7 +109,9 @@ static irqreturn_t omap2_gp_timer_interr static int omap2_gp_timer_set_next_event(unsigned long cycles, struct clock_event_device *evt) { - __omap_dm_timer_load_start(&clkev, OMAP_TIMER_CTRL_ST, + struct omap_dm_timer *timer = to_dmtimer(evt); + + __omap_dm_timer_load_start(timer, OMAP_TIMER_CTRL_ST, 0xffffffff - cycles, OMAP_TIMER_POSTED);
return 0; @@ -103,22 +119,26 @@ static int omap2_gp_timer_set_next_event
static int omap2_gp_timer_shutdown(struct clock_event_device *evt) { - __omap_dm_timer_stop(&clkev, OMAP_TIMER_POSTED, clkev.rate); + struct omap_dm_timer *timer = to_dmtimer(evt); + + __omap_dm_timer_stop(timer, OMAP_TIMER_POSTED, timer->rate); + return 0; }
static int omap2_gp_timer_set_periodic(struct clock_event_device *evt) { + struct omap_dm_timer *timer = to_dmtimer(evt); u32 period;
- __omap_dm_timer_stop(&clkev, OMAP_TIMER_POSTED, clkev.rate); + __omap_dm_timer_stop(timer, OMAP_TIMER_POSTED, timer->rate);
- period = clkev.rate / HZ; + period = timer->rate / HZ; period -= 1; /* Looks like we need to first set the load value separately */ - __omap_dm_timer_write(&clkev, OMAP_TIMER_LOAD_REG, 0xffffffff - period, + __omap_dm_timer_write(timer, OMAP_TIMER_LOAD_REG, 0xffffffff - period, OMAP_TIMER_POSTED); - __omap_dm_timer_load_start(&clkev, + __omap_dm_timer_load_start(timer, OMAP_TIMER_CTRL_AR | OMAP_TIMER_CTRL_ST, 0xffffffff - period, OMAP_TIMER_POSTED); return 0; @@ -132,26 +152,17 @@ static void omap_clkevt_idle(struct cloc omap_hwmod_idle(clockevent_gpt_hwmod); }
-static void omap_clkevt_unidle(struct clock_event_device *unused) +static void omap_clkevt_unidle(struct clock_event_device *evt) { + struct omap_dm_timer *timer = to_dmtimer(evt); + if (!clockevent_gpt_hwmod) return;
omap_hwmod_enable(clockevent_gpt_hwmod); - __omap_dm_timer_int_enable(&clkev, OMAP_TIMER_INT_OVERFLOW); + __omap_dm_timer_int_enable(timer, OMAP_TIMER_INT_OVERFLOW); }
-static struct clock_event_device clockevent_gpt = { - .features = CLOCK_EVT_FEAT_PERIODIC | - CLOCK_EVT_FEAT_ONESHOT, - .rating = 300, - .set_next_event = omap2_gp_timer_set_next_event, - .set_state_shutdown = omap2_gp_timer_shutdown, - .set_state_periodic = omap2_gp_timer_set_periodic, - .set_state_oneshot = omap2_gp_timer_shutdown, - .tick_resume = omap2_gp_timer_shutdown, -}; - static const struct of_device_id omap_timer_match[] __initconst = { { .compatible = "ti,omap2420-timer", }, { .compatible = "ti,omap3430-timer", }, @@ -361,44 +372,54 @@ static void __init omap2_gp_clockevent_i const char *fck_source, const char *property) { + struct dmtimer_clockevent *clkevt = &clockevent; + struct omap_dm_timer *timer = &clkevt->timer; int res;
- clkev.id = gptimer_id; - clkev.errata = omap_dm_timer_get_errata(); + timer->id = gptimer_id; + timer->errata = omap_dm_timer_get_errata(); + clkevt->dev.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT; + clkevt->dev.rating = 300; + clkevt->dev.set_next_event = omap2_gp_timer_set_next_event; + clkevt->dev.set_state_shutdown = omap2_gp_timer_shutdown; + clkevt->dev.set_state_periodic = omap2_gp_timer_set_periodic; + clkevt->dev.set_state_oneshot = omap2_gp_timer_shutdown; + clkevt->dev.tick_resume = omap2_gp_timer_shutdown;
/* * For clock-event timers we never read the timer counter and * so we are not impacted by errata i103 and i767. Therefore, * we can safely ignore this errata for clock-event timers. */ - __omap_dm_timer_override_errata(&clkev, OMAP_TIMER_ERRATA_I103_I767); + __omap_dm_timer_override_errata(timer, OMAP_TIMER_ERRATA_I103_I767);
- res = omap_dm_timer_init_one(&clkev, fck_source, property, - &clockevent_gpt.name, OMAP_TIMER_POSTED); + res = omap_dm_timer_init_one(timer, fck_source, property, + &clkevt->dev.name, OMAP_TIMER_POSTED); BUG_ON(res);
- if (request_irq(clkev.irq, omap2_gp_timer_interrupt, - IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", &clkev)) - pr_err("Failed to request irq %d (gp_timer)\n", clkev.irq); - - __omap_dm_timer_int_enable(&clkev, OMAP_TIMER_INT_OVERFLOW); - - clockevent_gpt.cpumask = cpu_possible_mask; - clockevent_gpt.irq = omap_dm_timer_get_irq(&clkev); - clockevents_config_and_register(&clockevent_gpt, clkev.rate, + clkevt->dev.cpumask = cpu_possible_mask; + clkevt->dev.irq = omap_dm_timer_get_irq(timer); + + if (request_irq(timer->irq, omap2_gp_timer_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", clkevt)) + pr_err("Failed to request irq %d (gp_timer)\n", timer->irq); + + __omap_dm_timer_int_enable(timer, OMAP_TIMER_INT_OVERFLOW); + + clockevents_config_and_register(&clkevt->dev, timer->rate, 3, /* Timer internal resynch latency */ 0xffffffff);
if (soc_is_am33xx() || soc_is_am43xx()) { - clockevent_gpt.suspend = omap_clkevt_idle; - clockevent_gpt.resume = omap_clkevt_unidle; + clkevt->dev.suspend = omap_clkevt_idle; + clkevt->dev.resume = omap_clkevt_unidle;
clockevent_gpt_hwmod = - omap_hwmod_lookup(clockevent_gpt.name); + omap_hwmod_lookup(clkevt->dev.name); }
- pr_info("OMAP clockevent source: %s at %lu Hz\n", clockevent_gpt.name, - clkev.rate); + pr_info("OMAP clockevent source: %s at %lu Hz\n", clkevt->dev.name, + timer->rate); }
/* Clocksource code */
From: Tony Lindgren tony@atomide.com
commit 3efe7a878a11c13b5297057bfc1e5639ce1241ce upstream.
There is a timer wrap issue on dra7 for the ARM architected timer. In a typical clock configuration the timer fails to wrap after 388 days.
To work around the issue, we need to use timer-ti-dm timers instead.
Let's prepare for adding support for percpu timers by adding a common dmtimer_clkevt_init_common() and call it from __omap_sync32k_timer_init(). This patch makes no intentional functional changes.
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Keerthy j-keerthy@ti.com Cc: Tero Kristo kristo@kernel.org [tony@atomide.com: backported to 4.19.y] Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/mach-omap2/timer.c | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-)
--- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -368,18 +368,21 @@ void tick_broadcast(const struct cpumask } #endif
-static void __init omap2_gp_clockevent_init(int gptimer_id, - const char *fck_source, - const char *property) +static void __init dmtimer_clkevt_init_common(struct dmtimer_clockevent *clkevt, + int gptimer_id, + const char *fck_source, + unsigned int features, + const struct cpumask *cpumask, + const char *property, + int rating, const char *name) { - struct dmtimer_clockevent *clkevt = &clockevent; struct omap_dm_timer *timer = &clkevt->timer; int res;
timer->id = gptimer_id; timer->errata = omap_dm_timer_get_errata(); - clkevt->dev.features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT; - clkevt->dev.rating = 300; + clkevt->dev.features = features; + clkevt->dev.rating = rating; clkevt->dev.set_next_event = omap2_gp_timer_set_next_event; clkevt->dev.set_state_shutdown = omap2_gp_timer_shutdown; clkevt->dev.set_state_periodic = omap2_gp_timer_set_periodic; @@ -397,19 +400,15 @@ static void __init omap2_gp_clockevent_i &clkevt->dev.name, OMAP_TIMER_POSTED); BUG_ON(res);
- clkevt->dev.cpumask = cpu_possible_mask; + clkevt->dev.cpumask = cpumask; clkevt->dev.irq = omap_dm_timer_get_irq(timer);
- if (request_irq(timer->irq, omap2_gp_timer_interrupt, - IRQF_TIMER | IRQF_IRQPOLL, "gp_timer", clkevt)) - pr_err("Failed to request irq %d (gp_timer)\n", timer->irq); + if (request_irq(clkevt->dev.irq, omap2_gp_timer_interrupt, + IRQF_TIMER | IRQF_IRQPOLL, name, clkevt)) + pr_err("Failed to request irq %d (gp_timer)\n", clkevt->dev.irq);
__omap_dm_timer_int_enable(timer, OMAP_TIMER_INT_OVERFLOW);
- clockevents_config_and_register(&clkevt->dev, timer->rate, - 3, /* Timer internal resynch latency */ - 0xffffffff); - if (soc_is_am33xx() || soc_is_am43xx()) { clkevt->dev.suspend = omap_clkevt_idle; clkevt->dev.resume = omap_clkevt_unidle; @@ -559,7 +558,12 @@ static void __init __omap_sync32k_timer_ { omap_clk_init(); omap_dmtimer_init(); - omap2_gp_clockevent_init(clkev_nr, clkev_src, clkev_prop); + dmtimer_clkevt_init_common(&clockevent, clkev_nr, clkev_src, + CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT, + cpu_possible_mask, clkev_prop, 300, "clockevent"); + clockevents_config_and_register(&clockevent.dev, clockevent.timer.rate, + 3, /* Timer internal resynch latency */ + 0xffffffff);
/* Enable the use of clocksource="gp_timer" kernel parameter */ if (use_gptimer_clksrc || gptimer)
From: Tony Lindgren tony@atomide.com
commit 25de4ce5ed02994aea8bc111d133308f6fd62566 upstream.
There is a timer wrap issue on dra7 for the ARM architected timer. In a typical clock configuration the timer fails to wrap after 388 days.
To work around the issue, we need to use timer-ti-dm percpu timers instead.
Let's configure dmtimer3 and 4 as percpu timers by default, and warn about the issue if the dtb is not configured properly.
For more information, please see the errata for "AM572x Sitara Processors Silicon Revisions 1.1, 2.0":
https://www.ti.com/lit/er/sprz429m/sprz429m.pdf
The concept is based on earlier reference patches done by Tero Kristo and Keerthy.
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Keerthy j-keerthy@ti.com Cc: Tero Kristo kristo@kernel.org [tony@atomide.com: backported to 4.19.y] Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/dra7.dtsi | 11 +++++++ arch/arm/mach-omap2/board-generic.c | 4 +- arch/arm/mach-omap2/timer.c | 53 +++++++++++++++++++++++++++++++++++- drivers/clk/ti/clk-7xx.c | 1 include/linux/cpuhotplug.h | 1 5 files changed, 67 insertions(+), 3 deletions(-)
--- a/arch/arm/boot/dts/dra7.dtsi +++ b/arch/arm/boot/dts/dra7.dtsi @@ -48,6 +48,7 @@
timer { compatible = "arm,armv7-timer"; + status = "disabled"; /* See ARM architected timer wrap erratum i940 */ interrupts = <GIC_PPI 13 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>, <GIC_PPI 14 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>, <GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(2) | IRQ_TYPE_LEVEL_LOW)>, @@ -910,6 +911,8 @@ reg = <0x48032000 0x80>; interrupts = <GIC_SPI 33 IRQ_TYPE_LEVEL_HIGH>; ti,hwmods = "timer2"; + clock-names = "fck"; + clocks = <&l4per_clkctrl DRA7_TIMER2_CLKCTRL 24>; };
timer3: timer@48034000 { @@ -917,6 +920,10 @@ reg = <0x48034000 0x80>; interrupts = <GIC_SPI 34 IRQ_TYPE_LEVEL_HIGH>; ti,hwmods = "timer3"; + clock-names = "fck"; + clocks = <&l4per_clkctrl DRA7_TIMER3_CLKCTRL 24>; + assigned-clocks = <&l4per_clkctrl DRA7_TIMER3_CLKCTRL 24>; + assigned-clock-parents = <&timer_sys_clk_div>; };
timer4: timer@48036000 { @@ -924,6 +931,10 @@ reg = <0x48036000 0x80>; interrupts = <GIC_SPI 35 IRQ_TYPE_LEVEL_HIGH>; ti,hwmods = "timer4"; + clock-names = "fck"; + clocks = <&l4per_clkctrl DRA7_TIMER4_CLKCTRL 24>; + assigned-clocks = <&l4per_clkctrl DRA7_TIMER4_CLKCTRL 24>; + assigned-clock-parents = <&timer_sys_clk_div>; };
timer5: timer@48820000 { --- a/arch/arm/mach-omap2/board-generic.c +++ b/arch/arm/mach-omap2/board-generic.c @@ -330,7 +330,7 @@ DT_MACHINE_START(DRA74X_DT, "Generic DRA .init_late = dra7xx_init_late, .init_irq = omap_gic_of_init, .init_machine = omap_generic_init, - .init_time = omap5_realtime_timer_init, + .init_time = omap3_gptimer_timer_init, .dt_compat = dra74x_boards_compat, .restart = omap44xx_restart, MACHINE_END @@ -353,7 +353,7 @@ DT_MACHINE_START(DRA72X_DT, "Generic DRA .init_late = dra7xx_init_late, .init_irq = omap_gic_of_init, .init_machine = omap_generic_init, - .init_time = omap5_realtime_timer_init, + .init_time = omap3_gptimer_timer_init, .dt_compat = dra72x_boards_compat, .restart = omap44xx_restart, MACHINE_END --- a/arch/arm/mach-omap2/timer.c +++ b/arch/arm/mach-omap2/timer.c @@ -42,6 +42,7 @@ #include <linux/platform_device.h> #include <linux/platform_data/dmtimer-omap.h> #include <linux/sched_clock.h> +#include <linux/cpu.h>
#include <asm/mach/time.h> #include <asm/smp_twd.h> @@ -421,6 +422,53 @@ static void __init dmtimer_clkevt_init_c timer->rate); }
+static DEFINE_PER_CPU(struct dmtimer_clockevent, dmtimer_percpu_timer); + +static int omap_gptimer_starting_cpu(unsigned int cpu) +{ + struct dmtimer_clockevent *clkevt = per_cpu_ptr(&dmtimer_percpu_timer, cpu); + struct clock_event_device *dev = &clkevt->dev; + struct omap_dm_timer *timer = &clkevt->timer; + + clockevents_config_and_register(dev, timer->rate, 3, ULONG_MAX); + irq_force_affinity(dev->irq, cpumask_of(cpu)); + + return 0; +} + +static int __init dmtimer_percpu_quirk_init(void) +{ + struct dmtimer_clockevent *clkevt; + struct clock_event_device *dev; + struct device_node *arm_timer; + struct omap_dm_timer *timer; + int cpu = 0; + + arm_timer = of_find_compatible_node(NULL, NULL, "arm,armv7-timer"); + if (of_device_is_available(arm_timer)) { + pr_warn_once("ARM architected timer wrap issue i940 detected\n"); + return 0; + } + + for_each_possible_cpu(cpu) { + clkevt = per_cpu_ptr(&dmtimer_percpu_timer, cpu); + dev = &clkevt->dev; + timer = &clkevt->timer; + + dmtimer_clkevt_init_common(clkevt, 0, "timer_sys_ck", + CLOCK_EVT_FEAT_ONESHOT, + cpumask_of(cpu), + "assigned-clock-parents", + 500, "percpu timer"); + } + + cpuhp_setup_state(CPUHP_AP_OMAP_DM_TIMER_STARTING, + "clockevents/omap/gptimer:starting", + omap_gptimer_starting_cpu, NULL); + + return 0; +} + /* Clocksource code */ static struct omap_dm_timer clksrc; static bool use_gptimer_clksrc __initdata; @@ -565,6 +613,9 @@ static void __init __omap_sync32k_timer_ 3, /* Timer internal resynch latency */ 0xffffffff);
+ if (soc_is_dra7xx()) + dmtimer_percpu_quirk_init(); + /* Enable the use of clocksource="gp_timer" kernel parameter */ if (use_gptimer_clksrc || gptimer) omap2_gptimer_clocksource_init(clksrc_nr, clksrc_src, @@ -592,7 +643,7 @@ void __init omap3_secure_sync32k_timer_i #endif /* CONFIG_ARCH_OMAP3 */
#if defined(CONFIG_ARCH_OMAP3) || defined(CONFIG_SOC_AM33XX) || \ - defined(CONFIG_SOC_AM43XX) + defined(CONFIG_SOC_AM43XX) || defined(CONFIG_SOC_DRA7XX) void __init omap3_gptimer_timer_init(void) { __omap_sync32k_timer_init(2, "timer_sys_ck", NULL, --- a/drivers/clk/ti/clk-7xx.c +++ b/drivers/clk/ti/clk-7xx.c @@ -733,6 +733,7 @@ const struct omap_clkctrl_data dra7_clkc static struct ti_dt_clk dra7xx_clks[] = { DT_CLK(NULL, "timer_32k_ck", "sys_32k_ck"), DT_CLK(NULL, "sys_clkin_ck", "timer_sys_clk_div"), + DT_CLK(NULL, "timer_sys_ck", "timer_sys_clk_div"), DT_CLK(NULL, "sys_clkin", "sys_clkin1"), DT_CLK(NULL, "atl_dpll_clk_mux", "atl_cm:0000:24"), DT_CLK(NULL, "atl_gfclk_mux", "atl_cm:0000:26"), --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -118,6 +118,7 @@ enum cpuhp_state { CPUHP_AP_ARM_L2X0_STARTING, CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING, CPUHP_AP_ARM_ARCH_TIMER_STARTING, + CPUHP_AP_OMAP_DM_TIMER_STARTING, CPUHP_AP_ARM_GLOBAL_TIMER_STARTING, CPUHP_AP_JCORE_TIMER_STARTING, CPUHP_AP_ARM_TWD_STARTING,
On Fri, 09 Jul 2021 15:20:16 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.197 release. There are 34 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.197-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.19.197-rc1
Tony Lindgren tony@atomide.com clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
Tony Lindgren tony@atomide.com clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
Tony Lindgren tony@atomide.com clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support
afzal mohammed afzal.mohd.ma@gmail.com ARM: OMAP: replace setup_irq() by request_irq()
Alper Gun alpergun@google.com KVM: SVM: Call SEV Guest Decommission if ASID binding fails
Juergen Gross jgross@suse.com xen/events: reset active flag for lateeoi events later
Petr Mladek pmladek@suse.com kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
Petr Mladek pmladek@suse.com kthread_worker: split code for canceling the delayed work timer
Anson Huang Anson.Huang@nxp.com ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment
David Rientjes rientjes@google.com KVM: SVM: Periodically schedule when unregistering regions on destroy
Tahsin Erdogan trdgn@amazon.com ext4: eliminate bogus error in ext4_data_block_valid_rcu()
Christian König christian.koenig@amd.com drm/nouveau: fix dma_address check for CPU/GPU sync
ManYi Li limanyi@uniontech.com scsi: sr: Return appropriate error code when disk is ejected
Hugh Dickins hughd@google.com mm, futex: fix shared futex pgoff on shmem huge page
Hugh Dickins hughd@google.com mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
Hugh Dickins hughd@google.com mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): get vma_address_end() earlier
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): use goto instead of while (1)
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): add a level of indentation
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): crossing page table boundary
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): settle PageHuge on entry
Hugh Dickins hughd@google.com mm: page_vma_mapped_walk(): use page for pvmw->page
Yang Shi shy828301@gmail.com mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
Hugh Dickins hughd@google.com mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
Jue Wang juew@google.com mm/thp: fix page_address_in_vma() on file THP tails
Hugh Dickins hughd@google.com mm/thp: fix vma_address() if virtual address below file offset
Hugh Dickins hughd@google.com mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
Hugh Dickins hughd@google.com mm/thp: make is_huge_zero_pmd() safe and quicker
Hugh Dickins hughd@google.com mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
Miaohe Lin linmiaohe@huawei.com mm/rmap: use page_not_mapped in try_to_unmap()
Miaohe Lin linmiaohe@huawei.com mm/rmap: remove unneeded semicolon in page_not_mapped()
Alex Shi alex.shi@linux.alibaba.com mm: add VM_WARN_ON_ONCE_PAGE() macro
Diffstat:
Makefile | 4 +- arch/arm/boot/dts/dra7.dtsi | 11 ++ arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 4 - arch/arm/mach-omap1/pm.c | 13 ++- arch/arm/mach-omap1/time.c | 10 +- arch/arm/mach-omap1/timer32k.c | 10 +- arch/arm/mach-omap2/board-generic.c | 4 +- arch/arm/mach-omap2/timer.c | 181 +++++++++++++++++++++++---------- arch/x86/kvm/svm.c | 33 ++++-- drivers/clk/ti/clk-7xx.c | 1 + drivers/gpu/drm/nouveau/nouveau_bo.c | 4 +- drivers/scsi/sr.c | 2 + drivers/xen/events/events_base.c | 23 ++++- fs/ext4/block_validity.c | 4 +- include/linux/cpuhotplug.h | 1 + include/linux/huge_mm.h | 8 +- include/linux/hugetlb.h | 16 --- include/linux/mm.h | 3 + include/linux/mmdebug.h | 13 +++ include/linux/pagemap.h | 13 +-- include/linux/rmap.h | 3 +- kernel/futex.c | 2 +- kernel/kthread.c | 77 +++++++++----- mm/huge_memory.c | 56 +++++----- mm/hugetlb.c | 5 +- mm/internal.h | 53 +++++++--- mm/memory.c | 41 ++++++++ mm/page_vma_mapped.c | 160 ++++++++++++++++++----------- mm/pgtable-generic.c | 4 +- mm/rmap.c | 48 +++++---- mm/truncate.c | 43 ++++---- 31 files changed, 546 insertions(+), 304 deletions(-)
All tests passing for Tegra ...
Test results for stable-v4.19: 10 builds: 10 pass, 0 fail 22 boots: 22 pass, 0 fail 40 tests: 40 pass, 0 fail
Linux version: 4.19.197-rc1-gdf520a4397b2 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 7/9/21 7:20 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.197 release. There are 34 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.197-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
Hi Greg,
On Fri, Jul 09, 2021 at 03:20:16PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.197 release. There are 34 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
Build test: mips (gcc version 11.1.1 20210702): 63 configs -> no failure arm (gcc version 11.1.1 20210702): 116 configs -> no new failure arm64 (gcc version 11.1.1 20210702): 2 configs -> no failure x86_64 (gcc version 10.2.1 20210110): 2 configs -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression.
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
On Fri, 9 Jul 2021 at 18:51, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.19.197 release. There are 34 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.197-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 4.19.197-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-4.19.y * git commit: df520a4397b2a281e4b2ac05b0c85f8875c3ea11 * git describe: v4.19.196-35-gdf520a4397b2 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.19.y/build/v4.19....
## No regressions (compared to v4.19.196-25-g7bbc9654862c)
## No fixes (compared to v4.19.196-25-g7bbc9654862c)
## Test result summary total: 82675, pass: 62074, fail: 2967, skip: 15186, xfail: 2448,
## Build Summary * arm: 97 total, 97 passed, 0 failed * arm64: 25 total, 25 passed, 0 failed * dragonboard-410c: 2 total, 2 passed, 0 failed * hi6220-hikey: 2 total, 2 passed, 0 failed * i386: 15 total, 15 passed, 0 failed * juno-r2: 2 total, 2 passed, 0 failed * mips: 39 total, 39 passed, 0 failed * s390: 9 total, 9 passed, 0 failed * sparc: 9 total, 9 passed, 0 failed * x15: 2 total, 2 passed, 0 failed * x86: 2 total, 2 passed, 0 failed * x86_64: 15 total, 15 passed, 0 failed
## Test suites summary * fwts * igt-gpu-tools * install-android-platform-tools-r2600 * kselftest- * kselftest-android * kselftest-bpf * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-vsyscall-mode-native- * kselftest-vsyscall-mode-none- * kselftest-x86 * kselftest-zram * kvm-unit-tests * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * rcutorture * ssuite * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
On Fri, Jul 09, 2021 at 03:20:16PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.197 release. There are 34 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
Build results: total: 155 pass: 155 fail: 0 Qemu test results: total: 424 pass: 424 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
Hi!
This is the start of the stable review cycle for the 4.19.197 release. There are 34 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
On 2021/7/9 21:20, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.197 release. There are 34 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.197-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
Tested on arm64 and x86 for 4.19.197-rc1,
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Branch: linux-4.19.y Version: 4.19.197-rc1 Commit: df520a4397b2a281e4b2ac05b0c85f8875c3ea11 Compiler: gcc version 7.3.0 (GCC)
arm64: -------------------------------------------------------------------- Testcase Result Summary: total: 8858 passed: 8858 failed: 0 timeout: 0 --------------------------------------------------------------------
x86: -------------------------------------------------------------------- Testcase Result Summary: total: 8858 passed: 8858 failed: 0 timeout: 0 --------------------------------------------------------------------
Tested-by: Hulk Robot hulkrobot@huawei.com
linux-stable-mirror@lists.linaro.org