This is the start of the stable review cycle for the 4.9.275 release. There are 9 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.275-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.9.275-rc1
Juergen Gross jgross@suse.com xen/events: reset active flag for lateeoi events later
Petr Mladek pmladek@suse.com kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
Petr Mladek pmladek@suse.com kthread_worker: split code for canceling the delayed work timer
Christian König christian.koenig@amd.com drm/nouveau: fix dma_address check for CPU/GPU sync
ManYi Li limanyi@uniontech.com scsi: sr: Return appropriate error code when disk is ejected
Hugh Dickins hughd@google.com mm, futex: fix shared futex pgoff on shmem huge page
Yang Shi shy828301@gmail.com mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
Alex Shi alex.shi@linux.alibaba.com mm: add VM_WARN_ON_ONCE_PAGE() macro
Michal Hocko mhocko@kernel.org include/linux/mmdebug.h: make VM_WARN* non-rvals
-------------
Diffstat:
 Makefile                             |  4 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c |  4 +-
 drivers/scsi/sr.c                    |  2 +
 drivers/xen/events/events_base.c     | 23 +++++++++--
 include/linux/hugetlb.h              | 15 -------
 include/linux/mmdebug.h              | 21 ++++++++--
 include/linux/pagemap.h              | 13 +++---
 kernel/futex.c                       |  2 +-
 kernel/kthread.c                     | 77 ++++++++++++++++++++++++------------
 mm/huge_memory.c                     | 29 +++++---------
 mm/hugetlb.c                         |  5 +--
 11 files changed, 112 insertions(+), 83 deletions(-)
From: Michal Hocko mhocko@kernel.org
[ Upstream commit 91241681c62a5a690c88eb2aca027f094125eaac ]
At present the construct
if (VM_WARN(...))
will compile OK with CONFIG_DEBUG_VM=y and will fail with CONFIG_DEBUG_VM=n. The reason is that VM_{WARN,BUG}* have always been special wrt. {WARN/BUG}* and never generate any code when DEBUG_VM is disabled. So we cannot really use it in conditionals.
We considered changing things so that this construct works in both cases but that might cause unwanted code generation with CONFIG_DEBUG_VM=n. It is safer and simpler to make the build fail in both cases.
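For illustration, the kind of caller this change turns into a hard build error in both configurations (hypothetical code, not from the tree):

	/* Hypothetical caller, for illustration only. */
	static void check_page(struct page *page)
	{
		/*
		 * Built with CONFIG_DEBUG_VM=y before this patch (WARN_ON()
		 * returns the condition) but never with =n, where VM_WARN_ON()
		 * expands to BUILD_BUG_ON_INVALID() and yields no value.
		 * After this patch the (void) cast makes it a build error
		 * with =y as well; only statement-style use remains legal.
		 */
		if (VM_WARN_ON(!page))
			return;
	}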
[akpm@linux-foundation.org: changelog]
Signed-off-by: Michal Hocko mhocko@suse.com
Reviewed-by: Andrew Morton akpm@linux-foundation.org
Cc: Stephen Rothwell sfr@canb.auug.org.au
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/mmdebug.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 451a811f48f2..deaba1cc3cfc 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -36,10 +36,10 @@ void dump_mm(const struct mm_struct *mm);
 			BUG();						\
 		}							\
 	} while (0)
-#define VM_WARN_ON(cond) WARN_ON(cond)
-#define VM_WARN_ON_ONCE(cond) WARN_ON_ONCE(cond)
-#define VM_WARN_ONCE(cond, format...) WARN_ONCE(cond, format)
-#define VM_WARN(cond, format...) WARN(cond, format)
+#define VM_WARN_ON(cond) (void)WARN_ON(cond)
+#define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
+#define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format)
+#define VM_WARN(cond, format...) (void)WARN(cond, format)
 #else
 #define VM_BUG_ON(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_BUG_ON_PAGE(cond, page) VM_BUG_ON(cond)
From: Alex Shi alex.shi@linux.alibaba.com
[ Upstream commit a4055888629bc0467d12d912cd7c90acdf3d9b12 part ]
Add VM_WARN_ON_ONCE_PAGE() macro.
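With CONFIG_DEBUG_VM=y the macro warns once per call site and dumps the offending page's state; later hits are silent. The next patch in this series uses it like this:

	/* Usage as in the next patch of this series: warn once and dump the
	 * page state if any subpage is still mapped after unmap_page(). */
	VM_WARN_ON_ONCE_PAGE(page_mapped(page), page);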
Link: https://lkml.kernel.org/r/1604283436-18880-3-git-send-email-alex.shi@linux.a...
Signed-off-by: Alex Shi alex.shi@linux.alibaba.com
Acked-by: Michal Hocko mhocko@suse.com
Acked-by: Hugh Dickins hughd@google.com
Acked-by: Johannes Weiner hannes@cmpxchg.org
Cc: Vladimir Davydov vdavydov.dev@gmail.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: the original commit was titled "mm/memcg: warning on !memcg after readahead page charged" and included uses of this macro in mm/memcontrol.c, omitted here.
Signed-off-by: Hugh Dickins hughd@google.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/mmdebug.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index deaba1cc3cfc..d1fb3bbff37a 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -36,6 +36,18 @@ void dump_mm(const struct mm_struct *mm);
 			BUG();						\
 		}							\
 	} while (0)
+#define VM_WARN_ON_ONCE_PAGE(cond, page)	({			\
+	static bool __section(".data.once") __warned;			\
+	int __ret_warn_once = !!(cond);					\
+									\
+	if (unlikely(__ret_warn_once && !__warned)) {			\
+		dump_page(page, "VM_WARN_ON_ONCE_PAGE(" __stringify(cond)")");\
+		__warned = true;					\
+		WARN_ON(1);						\
+	}								\
+	unlikely(__ret_warn_once);					\
+})
+
 #define VM_WARN_ON(cond) (void)WARN_ON(cond)
 #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
 #define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format)
@@ -47,6 +59,7 @@ void dump_mm(const struct mm_struct *mm);
 #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
 #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #endif
From: Yang Shi shy828301@gmail.com
[ Upstream commit 504e070dc08f757bccaed6d05c0f53ecbfac8a23 ]
When debugging the bug reported by Wang Yugui [1], try_to_unmap() may fail, but the first VM_BUG_ON_PAGE() only checks page_mapcount(), so it may miss the failure when the head page is unmapped but another subpage is still mapped. The second DEBUG_VM BUG(), which checks total mapcount, would then catch it. This can cause some confusion.

As this is not a fatal issue, consolidate the two DEBUG_VM checks into a single VM_WARN_ON_ONCE_PAGE().
[1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com
Signed-off-by: Yang Shi shy828301@gmail.com
Reviewed-by: Zi Yan ziy@nvidia.com
Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Signed-off-by: Hugh Dickins hughd@google.com
Cc: Alistair Popple apopple@nvidia.com
Cc: Jan Kara jack@suse.cz
Cc: Jue Wang juew@google.com
Cc: "Matthew Wilcox (Oracle)" willy@infradead.org
Cc: Miaohe Lin linmiaohe@huawei.com
Cc: Minchan Kim minchan@kernel.org
Cc: Naoya Horiguchi naoya.horiguchi@nec.com
Cc: Oscar Salvador osalvador@suse.de
Cc: Peter Xu peterx@redhat.com
Cc: Ralph Campbell rcampbell@nvidia.com
Cc: Shakeel Butt shakeelb@google.com
Cc: Wang Yugui wangyugui@e16-tech.com
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: fixed up variables, split_queue_lock, tree_lock in split_huge_page_to_list(); adapted to early version of unmap_page().
Signed-off-by: Hugh Dickins hughd@google.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 mm/huge_memory.c | 29 ++++++++++-------------------
 1 file changed, 10 insertions(+), 19 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 14cd0ef33b62..177ca028b986 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1891,7 +1891,7 @@ static void unmap_page(struct page *page)
 {
 	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
 		TTU_RMAP_LOCKED;
-	int i, ret;
+	int i;
 
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 
@@ -1899,15 +1899,16 @@ static void unmap_page(struct page *page)
 		ttu_flags |= TTU_MIGRATION;
 
 	/* We only need TTU_SPLIT_HUGE_PMD once */
-	ret = try_to_unmap(page, ttu_flags | TTU_SPLIT_HUGE_PMD);
-	for (i = 1; !ret && i < HPAGE_PMD_NR; i++) {
+	try_to_unmap(page, ttu_flags | TTU_SPLIT_HUGE_PMD);
+	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		/* Cut short if the page is unmapped */
 		if (page_count(page) == 1)
 			return;
 
-		ret = try_to_unmap(page + i, ttu_flags);
+		try_to_unmap(page + i, ttu_flags);
 	}
-	VM_BUG_ON_PAGE(ret, page + i - 1);
+
+	VM_WARN_ON_ONCE_PAGE(page_mapped(page), page);
 }
 
 static void remap_page(struct page *page)
@@ -2137,7 +2138,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
-	int count, mapcount, extra_pins, ret;
+	int extra_pins, ret;
 	bool mlocked;
 	unsigned long flags;
 	pgoff_t end;
@@ -2200,7 +2201,6 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 
 	mlocked = PageMlocked(page);
 	unmap_page(head);
-	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
 	/* Make sure the page is not on per-CPU pagevec as it takes pin */
 	if (mlocked)
@@ -2226,9 +2226,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 
 	/* Prevent deferred_split_scan() touching ->_refcount */
 	spin_lock(&pgdata->split_queue_lock);
-	count = page_count(head);
-	mapcount = total_mapcount(head);
-	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
+	if (page_ref_freeze(head, 1 + extra_pins)) {
 		if (!list_empty(page_deferred_list(head))) {
 			pgdata->split_queue_len--;
 			list_del(page_deferred_list(head));
@@ -2239,16 +2237,9 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		__split_huge_page(page, list, end, flags);
 		ret = 0;
 	} else {
-		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
-			pr_alert("total_mapcount: %u, page_count(): %u\n",
-					mapcount, count);
-			if (PageTail(page))
-				dump_page(head, NULL);
-			dump_page(page, "total_mapcount(head) > 0");
-			BUG();
-		}
 		spin_unlock(&pgdata->split_queue_lock);
-fail:		if (mapping)
+fail:
+		if (mapping)
 			spin_unlock(&mapping->tree_lock);
 		spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
 		remap_page(head);
From: Hugh Dickins hughd@google.com
[ Upstream commit fe19bd3dae3d15d2fbfdb3de8839a6ea0fe94264 ]
If more than one futex is placed on a shmem huge page, it can happen that waking the second wakes the first instead, and leaves the second waiting: the key's shared.pgoff is wrong.
When 3.11 commit 13d60f4b6ab5 ("futex: Take hugepages into account when generating futex_key") was added, the only shared huge pages came from hugetlbfs, and the code added to deal with its exceptional page->index was put into hugetlb source. Then that was missed when 4.8 added shmem huge pages.
page_to_pgoff() is what others use for this nowadays: except that, as currently written, it gives the right answer on hugetlbfs head, but nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific hugetlb_basepage_index() on PageHuge tails as well as on head.
Yes, it's unconventional to declare hugetlb_basepage_index() there in pagemap.h, rather than in hugetlb.h; but I do not expect anything but page_to_pgoff() ever to need it.
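To make the unit difference concrete, here is the patched helper with an illustrative example in the comment (the example numbers are the editor's, not from the commit):

	/*
	 * A 2MB shmem THP at file offset 2MB has head->index == 512 in
	 * PAGE_SIZE units, so tail i must yield 512 + i -- which
	 * page_to_index() computes.  hugetlbfs keeps ->index in
	 * huge-page-size units instead, so both head and tail pages are
	 * routed through hugetlb_basepage_index() to be rescaled into
	 * PAGE_SIZE units.
	 */
	static inline pgoff_t page_to_pgoff(struct page *page)
	{
		if (unlikely(PageHuge(page)))	/* hugetlbfs head or tail */
			return hugetlb_basepage_index(page);
		return page_to_index(page);	/* shmem THP and regular pages */
	}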
[akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope]
Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Reported-by: Neel Natu neelnatu@google.com
Signed-off-by: Hugh Dickins hughd@google.com
Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org
Acked-by: Thomas Gleixner tglx@linutronix.de
Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com
Cc: Zhang Yi wetpzy@gmail.com
Cc: Mel Gorman mgorman@techsingularity.net
Cc: Mike Kravetz mike.kravetz@oracle.com
Cc: Ingo Molnar mingo@redhat.com
Cc: Peter Zijlstra peterz@infradead.org
Cc: Darren Hart dvhart@infradead.org
Cc: Davidlohr Bueso dave@stgolabs.net
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Note on stable backport: leave redundant #include <linux/hugetlb.h> in kernel/futex.c, to avoid conflict over the header files included. Resolved trivial conflicts in include/linux/hugetlb.h.
Signed-off-by: Hugh Dickins hughd@google.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/hugetlb.h | 15 ---------------
 include/linux/pagemap.h | 13 +++++++------
 kernel/futex.c          |  2 +-
 mm/hugetlb.c            |  5 +----
 4 files changed, 9 insertions(+), 26 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8dd365c65478..6417bc845db5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -451,17 +451,6 @@ static inline int hstate_index(struct hstate *h)
 	return h - hstates;
 }
 
-pgoff_t __basepage_index(struct page *page);
-
-/* Return page->index in PAGE_SIZE units */
-static inline pgoff_t basepage_index(struct page *page)
-{
-	if (!PageCompound(page))
-		return page->index;
-
-	return __basepage_index(page);
-}
-
 extern int dissolve_free_huge_pages(unsigned long start_pfn,
 				    unsigned long end_pfn);
 static inline bool hugepage_migration_supported(struct hstate *h)
@@ -529,10 +518,6 @@ static inline unsigned int pages_per_huge_page(struct hstate *h)
 #define hstate_index_to_shift(index) 0
 #define hstate_index(h) 0
 
-static inline pgoff_t basepage_index(struct page *page)
-{
-	return page->index;
-}
 #define dissolve_free_huge_pages(s, e) 0
 #define hugepage_migration_supported(h) false
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 35f4c4d9c405..8672291633dd 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -374,7 +374,7 @@ static inline struct page *read_mapping_page(struct address_space *mapping,
 }
 
 /*
- * Get index of the page with in radix-tree
+ * Get index of the page within radix-tree (but not for hugetlb pages).
  * (TODO: remove once hugetlb pages will have ->index in PAGE_SIZE)
  */
 static inline pgoff_t page_to_index(struct page *page)
@@ -393,15 +393,16 @@ static inline pgoff_t page_to_index(struct page *page)
 	return pgoff;
 }
 
+extern pgoff_t hugetlb_basepage_index(struct page *page);
+
 /*
- * Get the offset in PAGE_SIZE.
- * (TODO: hugepage should have ->index in PAGE_SIZE)
+ * Get the offset in PAGE_SIZE (even for hugetlb pages).
+ * (TODO: hugetlb pages should have ->index in PAGE_SIZE)
  */
 static inline pgoff_t page_to_pgoff(struct page *page)
 {
-	if (unlikely(PageHeadHuge(page)))
-		return page->index << compound_order(page);
-
+	if (unlikely(PageHuge(page)))
+		return hugetlb_basepage_index(page);
 	return page_to_index(page);
 }
 
diff --git a/kernel/futex.c b/kernel/futex.c
index 324fb85c8904..b3823736af6f 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -717,7 +717,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
 
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
 		key->shared.i_seq = get_inode_sequence_number(inode);
-		key->shared.pgoff = basepage_index(tail);
+		key->shared.pgoff = page_to_pgoff(tail);
 		rcu_read_unlock();
 	}
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b7215b0807ca..de89e9295f6c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1380,15 +1380,12 @@ int PageHeadHuge(struct page *page_head)
 	return get_compound_page_dtor(page_head) == free_huge_page;
 }
 
-pgoff_t __basepage_index(struct page *page)
+pgoff_t hugetlb_basepage_index(struct page *page)
 {
 	struct page *page_head = compound_head(page);
 	pgoff_t index = page_index(page_head);
 	unsigned long compound_idx;
 
-	if (!PageHuge(page_head))
-		return page_index(page);
-
 	if (compound_order(page_head) >= MAX_ORDER)
 		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
 	else
From: ManYi Li limanyi@uniontech.com
[ Upstream commit 7dd753ca59d6c8cc09aa1ed24f7657524803c7f3 ]
Handle a reported media event code of 3. This indicates that the media has been removed from the drive and user intervention is required to proceed. Return DISK_EVENT_EJECT_REQUEST in that case.
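Expressed as a table of the GET EVENT STATUS NOTIFICATION media event codes (sketch of the logic only; the driver itself keeps the if/else chain shown in the diff below):

	/* Media-class event codes and the disk events they map to
	 * after this patch: */
	switch (med->media_event_code) {
	case 1:				/* eject button pressed */
		return DISK_EVENT_EJECT_REQUEST;
	case 2:				/* new media arrived */
		return DISK_EVENT_MEDIA_CHANGE;
	case 3:				/* media removed, user action needed */
		return DISK_EVENT_EJECT_REQUEST;
	default:			/* 0: no change */
		return 0;
	}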
Link: https://lore.kernel.org/r/20210611094402.23884-1-limanyi@uniontech.com
Signed-off-by: ManYi Li limanyi@uniontech.com
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/scsi/sr.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 67a73ea0a615..5e51a39a0c27 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -216,6 +216,8 @@ static unsigned int sr_get_events(struct scsi_device *sdev)
 		return DISK_EVENT_EJECT_REQUEST;
 	else if (med->media_event_code == 2)
 		return DISK_EVENT_MEDIA_CHANGE;
+	else if (med->media_event_code == 3)
+		return DISK_EVENT_EJECT_REQUEST;
 	return 0;
 }
From: Christian König christian.koenig@amd.com
[ Upstream commit d330099115597bbc238d6758a4930e72b49ea9ba ]
AGP, for example, doesn't have a dma_address array.
Signed-off-by: Christian König christian.koenig@amd.com
Acked-by: Alex Deucher alexander.deucher@amd.com
Link: https://patchwork.freedesktop.org/patch/msgid/20210614110517.1624-1-christia...
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index a2e6a81669e7..94b7798bdea4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -447,7 +447,7 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)
 	struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm;
 	int i;
 
-	if (!ttm_dma)
+	if (!ttm_dma || !ttm_dma->dma_address)
 		return;
 
 	/* Don't waste time looping if the object is coherent */
@@ -467,7 +467,7 @@ nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)
 	struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm;
 	int i;
 
-	if (!ttm_dma)
+	if (!ttm_dma || !ttm_dma->dma_address)
 		return;
 
 	/* Don't waste time looping if the object is coherent */
From: Petr Mladek pmladek@suse.com
commit 34b3d5344719d14fd2185b2d9459b3abcb8cf9d8 upstream.
Patch series "kthread_worker: Fix race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync()".
This patchset fixes the race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync(), including proper return value handling.
This patch (of 2):
Simple code refactoring as a preparation step for fixing a race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().
It does not modify the existing behavior.
Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com
Signed-off-by: Petr Mladek pmladek@suse.com
Cc: jenhaochen@google.com
Cc: Martin Liu liumartin@google.com
Cc: Minchan Kim minchan@google.com
Cc: Nathan Chancellor nathan@kernel.org
Cc: Nick Desaulniers ndesaulniers@google.com
Cc: Oleg Nesterov oleg@redhat.com
Cc: Tejun Heo tj@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/kthread.c | 46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -952,6 +952,33 @@ void kthread_flush_work(struct kthread_w
 EXPORT_SYMBOL_GPL(kthread_flush_work);
 
 /*
+ * Make sure that the timer is neither set nor running and could
+ * not manipulate the work list_head any longer.
+ *
+ * The function is called under worker->lock. The lock is temporary
+ * released but the timer can't be set again in the meantime.
+ */
+static void kthread_cancel_delayed_work_timer(struct kthread_work *work,
+					      unsigned long *flags)
+{
+	struct kthread_delayed_work *dwork =
+		container_of(work, struct kthread_delayed_work, work);
+	struct kthread_worker *worker = work->worker;
+
+	/*
+	 * del_timer_sync() must be called to make sure that the timer
+	 * callback is not running. The lock must be temporary released
+	 * to avoid a deadlock with the callback. In the meantime,
+	 * any queuing is blocked by setting the canceling counter.
+	 */
+	work->canceling++;
+	spin_unlock_irqrestore(&worker->lock, *flags);
+	del_timer_sync(&dwork->timer);
+	spin_lock_irqsave(&worker->lock, *flags);
+	work->canceling--;
+}
+
+/*
  * This function removes the work from the worker queue. Also it makes sure
  * that it won't get queued later via the delayed work's timer.
  *
@@ -965,23 +992,8 @@ static bool __kthread_cancel_work(struct
 			      unsigned long *flags)
 {
 	/* Try to cancel the timer if exists. */
-	if (is_dwork) {
-		struct kthread_delayed_work *dwork =
-			container_of(work, struct kthread_delayed_work, work);
-		struct kthread_worker *worker = work->worker;
-
-		/*
-		 * del_timer_sync() must be called to make sure that the timer
-		 * callback is not running. The lock must be temporary released
-		 * to avoid a deadlock with the callback. In the meantime,
-		 * any queuing is blocked by setting the canceling counter.
-		 */
-		work->canceling++;
-		spin_unlock_irqrestore(&worker->lock, *flags);
-		del_timer_sync(&dwork->timer);
-		spin_lock_irqsave(&worker->lock, *flags);
-		work->canceling--;
-	}
+	if (is_dwork)
+		kthread_cancel_delayed_work_timer(work, flags);
 
 	/*
 	 * Try to remove the work from a worker list. It might either
From: Petr Mladek pmladek@suse.com
commit 5fa54346caf67b4b1b10b1f390316ae466da4d53 upstream.
The system might hang with the following backtrace:
	schedule+0x80/0x100
	schedule_timeout+0x48/0x138
	wait_for_common+0xa4/0x134
	wait_for_completion+0x1c/0x2c
	kthread_flush_work+0x114/0x1cc
	kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
	kthread_cancel_delayed_work_sync+0x18/0x2c
	xxxx_pm_notify+0xb0/0xd8
	blocking_notifier_call_chain_robust+0x80/0x194
	pm_notifier_call_chain_robust+0x28/0x4c
	suspend_prepare+0x40/0x260
	enter_state+0x80/0x3f4
	pm_suspend+0x60/0xdc
	state_store+0x108/0x144
	kobj_attr_store+0x38/0x88
	sysfs_kf_write+0x64/0xc0
	kernfs_fop_write_iter+0x108/0x1d0
	vfs_write+0x2f4/0x368
	ksys_write+0x7c/0xec
It is caused by the following race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync():
CPU0					CPU1

Context: Thread A			Context: Thread B

kthread_mod_delayed_work()
  spin_lock()
  __kthread_cancel_work()
    spin_unlock()
    del_timer_sync()
					kthread_cancel_delayed_work_sync()
					  spin_lock()
					  __kthread_cancel_work()
					    spin_unlock()
					    del_timer_sync()
					    spin_lock()

					  work->canceling++
					  spin_unlock
    spin_lock()
  queue_delayed_work()
    // dwork is put into the worker->delayed_work_list
  spin_unlock()

					  kthread_flush_work()
					    // flush_work is put at the tail of the dwork

					  wait_for_completion()

Context: IRQ

  kthread_delayed_work_timer_fn()
    spin_lock()
    list_del_init(&work->node);
    spin_unlock()

BANG: flush_work is no longer linked and will never be processed.
The problem is that kthread_mod_delayed_work() checks the work->canceling flag before canceling the timer.

A simple solution would be to (re)check work->canceling after __kthread_cancel_work(). But then it is not clear what should be returned when __kthread_cancel_work() has removed the work from the queue (list) but cannot queue it again with the new @delay.
The return value might be used for reference counting. The caller has to know whether a new work has been queued or an existing one was replaced.
The proper solution is that kthread_mod_delayed_work() will remove the work from the queue (list) _only_ when work->canceling is not set. The flag must be checked after the timer is stopped and the remaining operations can be done under worker->lock.
Note that kthread_mod_delayed_work() could remove the timer and then bail out. It is fine. The other canceling caller needs to cancel the timer as well. The important thing is that the queue (list) manipulation is done atomically under worker->lock.
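Condensed, the ordering the patch establishes in kthread_mod_delayed_work() looks like this (editor's sketch; the diff below has the real code):

	spin_lock_irqsave(&worker->lock, flags);

	/* 1. Stop the timer first; this temporarily drops worker->lock. */
	kthread_cancel_delayed_work_timer(work, &flags);

	/* 2. Recheck under the re-acquired lock: another caller may be
	 *    canceling the work; if so, leave it alone. */
	if (!work->canceling) {
		/* 3. Unlink and requeue atomically under worker->lock, so
		 *    the return value reliably says whether a pending work
		 *    was replaced. */
		ret = __kthread_cancel_work(work);
		__kthread_queue_delayed_work(worker, dwork, delay);
	}

	spin_unlock_irqrestore(&worker->lock, flags);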
Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com
Fixes: 9a6b06c8d9a2 ("kthread: allow to modify delayed kthread work")
Signed-off-by: Petr Mladek pmladek@suse.com
Reported-by: Martin Liu liumartin@google.com
Cc: jenhaochen@google.com
Cc: Minchan Kim minchan@google.com
Cc: Nathan Chancellor nathan@kernel.org
Cc: Nick Desaulniers ndesaulniers@google.com
Cc: Oleg Nesterov oleg@redhat.com
Cc: Tejun Heo tj@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/kthread.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -979,8 +979,11 @@ static void kthread_cancel_delayed_work_
 }
 
 /*
- * This function removes the work from the worker queue. Also it makes sure
- * that it won't get queued later via the delayed work's timer.
+ * This function removes the work from the worker queue.
+ *
+ * It is called under worker->lock. The caller must make sure that
+ * the timer used by delayed work is not running, e.g. by calling
+ * kthread_cancel_delayed_work_timer().
  *
  * The work might still be in use when this function finishes. See the
  * current_work proceed by the worker.
@@ -988,13 +991,8 @@ static void kthread_cancel_delayed_work_
  * Return: %true if @work was pending and successfully canceled,
  *	%false if @work was not pending
  */
-static bool __kthread_cancel_work(struct kthread_work *work, bool is_dwork,
-				  unsigned long *flags)
+static bool __kthread_cancel_work(struct kthread_work *work)
 {
-	/* Try to cancel the timer if exists. */
-	if (is_dwork)
-		kthread_cancel_delayed_work_timer(work, flags);
-
 	/*
 	 * Try to remove the work from a worker list. It might either
 	 * be from worker->work_list or from worker->delayed_work_list.
@@ -1047,11 +1045,23 @@ bool kthread_mod_delayed_work(struct kth
 	/* Work must not be used with >1 worker, see kthread_queue_work() */
 	WARN_ON_ONCE(work->worker != worker);
 
-	/* Do not fight with another command that is canceling this work. */
+	/*
+	 * Temporary cancel the work but do not fight with another command
+	 * that is canceling the work as well.
+	 *
+	 * It is a bit tricky because of possible races with another
+	 * mod_delayed_work() and cancel_delayed_work() callers.
+	 *
+	 * The timer must be canceled first because worker->lock is released
+	 * when doing so. But the work can be removed from the queue (list)
+	 * only when it can be queued again so that the return value can
+	 * be used for reference counting.
+	 */
+	kthread_cancel_delayed_work_timer(work, &flags);
 	if (work->canceling)
 		goto out;
+	ret = __kthread_cancel_work(work);
 
-	ret = __kthread_cancel_work(work, true, &flags);
 fast_queue:
 	__kthread_queue_delayed_work(worker, dwork, delay);
 out:
@@ -1073,7 +1083,10 @@ static bool __kthread_cancel_work_sync(s
 	/* Work must not be used with >1 worker, see kthread_queue_work(). */
 	WARN_ON_ONCE(work->worker != worker);
 
-	ret = __kthread_cancel_work(work, is_dwork, &flags);
+	if (is_dwork)
+		kthread_cancel_delayed_work_timer(work, &flags);
+
+	ret = __kthread_cancel_work(work);
 
 	if (worker->current_work != work)
 		goto out_fast;
From: Juergen Gross jgross@suse.com
commit 3de218ff39b9e3f0d453fe3154f12a174de44b25 upstream.
In order to avoid a race condition for user events when changing cpu affinity, reset the active flag only when EOI-ing the event.
This is working fine as all user events are lateeoi events. Note that lateeoi_ack_mask_dynirq() is not modified as there is no explicit call to xen_irq_lateeoi() expected later.
Cc: stable@vger.kernel.org
Reported-by: Julien Grall julien@xen.org
Fixes: b6622798bc50 ("xen/events: avoid handling the same event on two cpus at the same time")
Tested-by: Julien Grall julien@xen.org
Signed-off-by: Juergen Gross jgross@suse.com
Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com
Link: https://lore.kernel.org/r/20210623130913.9405-1-jgross@suse.com
Signed-off-by: Juergen Gross jgross@suse.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/xen/events/events_base.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -533,6 +533,9 @@ static void xen_irq_lateeoi_locked(struc
 	}
 
 	info->eoi_time = 0;
+
+	/* is_active hasn't been reset yet, do it now. */
+	smp_store_release(&info->is_active, 0);
 	do_unmask(info, EVT_MASK_REASON_EOI_PENDING);
 }
 
@@ -1778,10 +1781,22 @@ static void lateeoi_ack_dynirq(struct ir
 	struct irq_info *info = info_for_irq(data->irq);
 	evtchn_port_t evtchn = info ? info->evtchn : 0;
 
-	if (VALID_EVTCHN(evtchn)) {
-		do_mask(info, EVT_MASK_REASON_EOI_PENDING);
-		ack_dynirq(data);
-	}
+	if (!VALID_EVTCHN(evtchn))
+		return;
+
+	do_mask(info, EVT_MASK_REASON_EOI_PENDING);
+
+	if (unlikely(irqd_is_setaffinity_pending(data)) &&
+	    likely(!irqd_irq_disabled(data))) {
+		do_mask(info, EVT_MASK_REASON_TEMPORARY);
+
+		clear_evtchn(evtchn);
+
+		irq_move_masked_irq(data);
+
+		do_unmask(info, EVT_MASK_REASON_TEMPORARY);
+	} else
+		clear_evtchn(evtchn);
 }
 
 static void lateeoi_mask_ack_dynirq(struct irq_data *data)
On Fri, 09 Jul 2021 15:18:27 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.275 release. There are 9 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.275-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.9.275-rc1
Juergen Gross jgross@suse.com xen/events: reset active flag for lateeoi events later
Petr Mladek pmladek@suse.com kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
Petr Mladek pmladek@suse.com kthread_worker: split code for canceling the delayed work timer
Christian König christian.koenig@amd.com drm/nouveau: fix dma_address check for CPU/GPU sync
ManYi Li limanyi@uniontech.com scsi: sr: Return appropriate error code when disk is ejected
Hugh Dickins hughd@google.com mm, futex: fix shared futex pgoff on shmem huge page
Yang Shi shy828301@gmail.com mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
Alex Shi alex.shi@linux.alibaba.com mm: add VM_WARN_ON_ONCE_PAGE() macro
Michal Hocko mhocko@kernel.org include/linux/mmdebug.h: make VM_WARN* non-rvals
Diffstat:
 Makefile                             |  4 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c |  4 +-
 drivers/scsi/sr.c                    |  2 +
 drivers/xen/events/events_base.c     | 23 +++++++++--
 include/linux/hugetlb.h              | 15 -------
 include/linux/mmdebug.h              | 21 ++++++++--
 include/linux/pagemap.h              | 13 +++---
 kernel/futex.c                       |  2 +-
 kernel/kthread.c                     | 77 ++++++++++++++++++++++++------------
 mm/huge_memory.c                     | 29 +++++---------
 mm/hugetlb.c                         |  5 +--
 11 files changed, 112 insertions(+), 83 deletions(-)
All tests passing for Tegra ...
Test results for stable-v4.9:
    8 builds:	8 pass, 0 fail
    16 boots:	16 pass, 0 fail
    32 tests:	32 pass, 0 fail

Linux version:	4.9.275-rc1-g972f4299f6ca
Boards tested:	tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 7/9/21 7:18 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.275 release. There are 9 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.275-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks,
-- Shuah
On Fri, 9 Jul 2021 at 18:48, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.9.275 release. There are 9 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.275-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build
* kernel: 4.9.275-rc1
* git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git branch: linux-4.9.y
* git commit: 972f4299f6ca84d948d7be286aeca2e200c4c62c
* git describe: v4.9.274-10-g972f4299f6ca
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.9.y/build/v4.9.27...
## No regressions (compared to v4.9.274-7-g901e917fb1e9)
## No fixes (compared to v4.9.274-7-g901e917fb1e9)
## Test result summary
total: 65597, pass: 51002, fail: 595, skip: 11784, xfail: 2216
## Build Summary
* arm: 97 total, 97 passed, 0 failed
* arm64: 24 total, 24 passed, 0 failed
* dragonboard-410c: 1 total, 1 passed, 0 failed
* hi6220-hikey: 1 total, 1 passed, 0 failed
* i386: 14 total, 14 passed, 0 failed
* juno-r2: 1 total, 1 passed, 0 failed
* mips: 36 total, 36 passed, 0 failed
* sparc: 9 total, 9 passed, 0 failed
* x15: 1 total, 1 passed, 0 failed
* x86: 1 total, 1 passed, 0 failed
* x86_64: 14 total, 14 passed, 0 failed
## Test suites summary
* fwts
* igt-gpu-tools
* install-android-platform-tools-r2600
* kselftest-android
* kselftest-bpf
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-drivers
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-lkdtm
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-vm
* kselftest-x86
* kselftest-zram
* kvm-unit-tests
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* packetdrill
* perf
* ssuite
* v4l2-compliance
--
Linaro LKFT
https://lkft.linaro.org
On Fri, Jul 09, 2021 at 03:18:27PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.275 release. There are 9 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
Build results:
	total: 163 pass: 163 fail: 0
Qemu test results:
	total: 382 pass: 382 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On 7/9/2021 6:18 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.275 release. There are 9 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.275-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB, using 32-bit and 64-bit ARM kernels:
Tested-by: Florian Fainelli f.fainelli@gmail.com