On Thu, Aug 22, 2019 at 3:21 PM Andrew Morton <akpm(a)linux-foundation.org> wrote:
>
> On Wed, 21 Aug 2019 11:26:25 +0800 Joseph Qi <joseph.qi(a)linux.alibaba.com> wrote:
>
> > Only the first call to the poll syscall delivers POLLPRI to the
> > user correctly. After that, the user never receives the event
> > signal again.
> >
> > Reproduce case:
> > 1. Get the monitor code in Documentation/accounting/psi.txt
> > 2. Run it, and wait for the event to trigger.
> > 3. Kill and restart the process.
> >
> > The question is why we can end up with poll_scheduled = 1 but the work
> > not running (which would reset it to 0). And the answer is because the
> > scheduling side sees group->poll_kworker under RCU protection and then
> > schedules it, but here we cancel the work and destroy the worker. The
> > cancel needs to pair with resetting the poll_scheduled flag.
>
> Should this be backported into -stable kernels?

Adding GregKH and stable(a)vger.kernel.org

I was able to cleanly apply this patch to stable master and
linux-5.2.y branches (these are the only branches that have psi
triggers).

Greg, Andrew got this patch into -mm tree. Please advise on how we
should proceed to land it in stable 5.2.y and master.

Thanks,
Suren.
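
For readers following the quoted explanation, here is a minimal user-space
sketch of why the cancel has to pair with resetting the flag. It is not the
kernel code: struct group_model, schedule_poll_work(), run_poll_work() and
destroy_worker() are hypothetical stand-ins, and C11 atomics replace the
kernel's atomic_t helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two pieces of state discussed in the thread. */
struct group_model {
	atomic_int poll_scheduled;	/* 1 while poll work is believed queued */
	bool work_pending;		/* stand-in for the queued work item */
};

/* Scheduling side: queue the work only if the flag was not already set. */
bool schedule_poll_work(struct group_model *g)
{
	int expected = 0;

	assert(g != NULL);
	if (!atomic_compare_exchange_strong(&g->poll_scheduled, &expected, 1))
		return false;	/* flag already 1: assume work is on its way */
	g->work_pending = true;
	return true;
}

/* The work itself: running it is what normally resets the flag. */
void run_poll_work(struct group_model *g)
{
	assert(g != NULL);
	if (!g->work_pending)
		return;
	g->work_pending = false;
	atomic_store(&g->poll_scheduled, 0);
}

/*
 * Teardown: cancel the pending work. If reset_flag is false, the flag is
 * left at 1 and no later schedule_poll_work() call can ever queue work.
 */
void destroy_worker(struct group_model *g, bool reset_flag)
{
	assert(g != NULL);
	g->work_pending = false;
	if (reset_flag)
		atomic_store(&g->poll_scheduled, 0);
}
```

In this model, calling destroy_worker(&g, false) after a successful
schedule_poll_work() leaves every later schedule attempt returning false,
which matches the "only the first poll works" symptom. Passing true
restores normal behaviour.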
The patch titled
Subject: mm, page_owner: handle THP splits correctly
has been added to the -mm tree. Its filename is
mm-page_owner-handle-thp-splits-correctly.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_owner-handle-thp-splits-co…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_owner-handle-thp-splits-co…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, page_owner: handle THP splits correctly
The THP splitting path is missing the split_page_owner() call that
split_page() has. As a result, split THP pages are wrongly reported in
the page_owner file as order-9 pages. Furthermore, when the former head
page is freed, the remaining former tail pages are not listed in the
page_owner file at all. This patch fixes that by adding the
split_page_owner() call into __split_huge_page().
Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz
Fixes: a9627bc5e34e ("mm/page_owner: introduce split_page_owner and replace manual handling")
Reported-by: Kirill A. Shutemov <kirill(a)shutemov.name>
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/huge_memory.c~mm-page_owner-handle-thp-splits-correctly
+++ a/mm/huge_memory.c
@@ -32,6 +32,7 @@
#include <linux/shmem_fs.h>
#include <linux/oom.h>
#include <linux/numa.h>
+#include <linux/page_owner.h>
#include <asm/tlb.h>
#include <asm/pgalloc.h>
@@ -2516,6 +2517,9 @@ static void __split_huge_page(struct pag
}
ClearPageCompound(head);
+
+ split_page_owner(head, HPAGE_PMD_ORDER);
+
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
/* Additional pin to swap cache */
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-page_owner-handle-thp-splits-correctly.patch
mm-page_owner-record-page-owner-for-each-subpage.patch
mm-page_owner-keep-owner-info-when-freeing-the-page.patch
mm-page_owner-debug_pagealloc-save-and-dump-freeing-stack-trace.patch
mm-compaction-clear-total_migratefree_scanned-before-scanning-a-new-zone-fix-2.patch
mm-reclaim-cleanup-should_continue_reclaim.patch
mm-compaction-raise-compaction-priority-after-it-withdrawns.patch
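
A rough user-space model of the reporting problem described in the
changelog above, in case it helps reviewers. It is only a sketch:
struct owner_model and the model_*() helpers are hypothetical, and the
assumption that a split gives every former tail page its own order-0
record is mine, not something the changelog states.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_HPAGE_ORDER 9	/* order-9 as quoted in the changelog */

/* Hypothetical per-page owner record. */
struct owner_model {
	unsigned int order;	/* allocation order recorded for this page */
	int tracked;		/* nonzero if the page would be listed */
};

/* Allocation records owner info on the head page only. */
void model_alloc(struct owner_model *recs, unsigned int order)
{
	recs[0].order = order;
	recs[0].tracked = 1;
}

/* Split: the head becomes order-0 and every tail gets its own record. */
void model_split_page_owner(struct owner_model *recs, unsigned int order)
{
	size_t i, nr = (size_t)1 << order;

	assert(recs[0].tracked);
	recs[0].order = 0;
	for (i = 1; i < nr; i++)
		recs[i] = recs[0];
}

/* Count the pages that a page_owner-style dump would list. */
size_t model_count_listed(const struct owner_model *recs, size_t nr)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (recs[i].tracked)
			n++;
	return n;
}
```

Without the split step the model keeps a single order-9 record, and
clearing it (freeing the former head) leaves nothing listed. With the split
step, the 511 former tail pages stay listed after the head is freed.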
The patch titled
Subject: userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
has been added to the -mm tree. Its filename is
userfaultfd_release-always-remove-uffd-flags-and-clear-vm_userfaultfd_ctx.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/userfaultfd_release-always-remove-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/userfaultfd_release-always-remove-…
------------------------------------------------------
From: Oleg Nesterov <oleg(a)redhat.com>
Subject: userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even
if mm->core_state != NULL.
Otherwise a page fault can see userfaultfd_missing() == T and use an
already freed userfaultfd_ctx.
Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
Fixes: 04f5866e41fb ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Reviewed-by: Andrea Arcangeli <aarcange(a)redhat.com>
Tested-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
--- a/fs/userfaultfd.c~userfaultfd_release-always-remove-uffd-flags-and-clear-vm_userfaultfd_ctx
+++ a/fs/userfaultfd.c
@@ -880,6 +880,7 @@ static int userfaultfd_release(struct in
/* len == 0 means wake all */
struct userfaultfd_wake_range range = { .len = 0, };
unsigned long new_flags;
+ bool still_valid;
WRITE_ONCE(ctx->released, true);
@@ -895,8 +896,7 @@ static int userfaultfd_release(struct in
* taking the mmap_sem for writing.
*/
down_write(&mm->mmap_sem);
- if (!mmget_still_valid(mm))
- goto skip_mm;
+ still_valid = mmget_still_valid(mm);
prev = NULL;
for (vma = mm->mmap; vma; vma = vma->vm_next) {
cond_resched();
@@ -907,19 +907,20 @@ static int userfaultfd_release(struct in
continue;
}
new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
- prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
- new_flags, vma->anon_vma,
- vma->vm_file, vma->vm_pgoff,
- vma_policy(vma),
- NULL_VM_UFFD_CTX);
- if (prev)
- vma = prev;
- else
- prev = vma;
+ if (still_valid) {
+ prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
+ new_flags, vma->anon_vma,
+ vma->vm_file, vma->vm_pgoff,
+ vma_policy(vma),
+ NULL_VM_UFFD_CTX);
+ if (prev)
+ vma = prev;
+ else
+ prev = vma;
+ }
vma->vm_flags = new_flags;
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
}
-skip_mm:
up_write(&mm->mmap_sem);
mmput(mm);
wakeup:
_
Patches currently in -mm which might be from oleg(a)redhat.com are
userfaultfd_release-always-remove-uffd-flags-and-clear-vm_userfaultfd_ctx.patch
aio-simplify-read_events.patch
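
The control-flow change in the userfaultfd_release() patch above can be
summarised with a small user-space sketch: the merge step becomes
conditional, while clearing the flags and the context pointer happens for
every matching VMA. Everything here (struct vma_model, the MODEL_* flag
values, model_release()) is hypothetical and only mirrors the shape of the
loop, not the real mm data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for VM_UFFD_MISSING / VM_UFFD_WP. */
#define MODEL_VM_UFFD_MISSING	0x1UL
#define MODEL_VM_UFFD_WP	0x2UL

/* Hypothetical, heavily reduced VMA. */
struct vma_model {
	unsigned long flags;
	const void *uffd_ctx;	/* NULL plays the role of NULL_VM_UFFD_CTX */
};

/*
 * Walk all VMAs that belong to ctx. Merging is attempted only when
 * still_valid is true, but the uffd flags and ctx are always cleared so
 * that no VMA keeps pointing at a context that is about to be freed.
 * Returns the number of VMAs that were cleared.
 */
size_t model_release(struct vma_model *vmas, size_t nr, const void *ctx,
		     bool still_valid, int *merge_attempts)
{
	size_t i, cleared = 0;

	assert(vmas != NULL && merge_attempts != NULL);
	for (i = 0; i < nr; i++) {
		if (vmas[i].uffd_ctx != ctx)
			continue;
		if (still_valid)
			(*merge_attempts)++;	/* stand-in for vma_merge() */
		vmas[i].flags &= ~(MODEL_VM_UFFD_MISSING | MODEL_VM_UFFD_WP);
		vmas[i].uffd_ctx = NULL;
		cleared++;
	}
	return cleared;
}
```

The old code corresponds to returning early when still_valid is false,
which in this model would leave uffd_ctx dangling on every matching VMA.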
Currently, we don't call dma_set_max_seg_size() for i915 because we
intentionally do not limit the segment length that the device supports.
However, this results in a warning being emitted if we try to map
anything larger than SZ_64K on a kernel with CONFIG_DMA_API_DEBUG_SG
enabled:
[ 7.751926] DMA-API: i915 0000:00:02.0: mapping sg segment longer
than device claims to support [len=98304] [max=65536]
[ 7.751934] WARNING: CPU: 5 PID: 474 at kernel/dma/debug.c:1220
debug_dma_map_sg+0x20f/0x340
This was originally brought up on
https://bugs.freedesktop.org/show_bug.cgi?id=108517 , and the consensus
there was that it wasn't really useful to set a limit (and that dma-debug
isn't really all that useful for i915 in the first place). Unfortunately
though, CONFIG_DMA_API_DEBUG_SG is enabled in the debug configs for
various distro kernels. Since a WARN_ON() will disable automatic problem
reporting (and cause any CI with said option enabled to start
complaining), we really should just fix the problem.
Note that, as Chris Wilson and I discussed, the other solution for this
would be to make DMA-API not make such assumptions when a driver hasn't
explicitly set a maximum segment size. However, the commit which
originally introduced this behavior, commit 78c47830a5cb
("dma-debug: check scatterlist segments"), explicitly mentions this
assumption and how it applies to devices with no segment size limit:
Conversely, devices which are less limited than the rather
conservative defaults, or indeed have no limitations at all
(e.g. GPUs with their own internal MMU), should be encouraged to
set appropriate dma_parms, as they may get more efficient DMA
mapping performance out of it.
So unless there are any concerns (I'm open to discussion!), let's just
follow suit and call dma_set_max_seg_size() with UINT_MAX as our limit
to silence any warnings.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: <stable(a)vger.kernel.org> # v4.18+
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0b81e0b64393..a1475039d182 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3152,6 +3152,11 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct intel_gt *gt)
if (ret)
return ret;
+ /* We don't have a max segment size, so set it to the max so sg's
+ * debugging layer doesn't complain
+ */
+ dma_set_max_seg_size(ggtt->vm.dma, UINT_MAX);
+
if ((ggtt->vm.total - 1) >> 32) {
DRM_ERROR("We never expected a Global GTT with more than 32bits"
" of address space! Found %lldM!\n",
--
2.21.0
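
To illustrate the check that the i915 changelog above is working around,
here is a small user-space sketch. It assumes that a device which never set
a limit is treated as having the 64K default quoted in the warning, and
that the debug check simply compares the segment length against that limit.
struct dev_model and the model_*() helpers are hypothetical names, not the
DMA API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Default limit assumed when no limit was set ([max=65536] in the log). */
#define MODEL_DEFAULT_MAX_SEG 65536u

/* Hypothetical device state: has a limit been set, and what is it? */
struct dev_model {
	bool has_limit;
	unsigned int max_seg_size;
};

/* Fall back to the conservative default when no limit was ever set. */
unsigned int model_get_max_seg_size(const struct dev_model *d)
{
	assert(d != NULL);
	return d->has_limit ? d->max_seg_size : MODEL_DEFAULT_MAX_SEG;
}

/* Stand-in for what the driver does in the patch. */
void model_set_max_seg_size(struct dev_model *d, unsigned int size)
{
	assert(d != NULL);
	d->has_limit = true;
	d->max_seg_size = size;
}

/* The debug check: would mapping a segment of this length warn? */
bool model_seg_would_warn(const struct dev_model *d, unsigned int len)
{
	return len > model_get_max_seg_size(d);
}
```

With a fresh device, model_seg_would_warn(&dev, 98304) is true, matching
the [len=98304] [max=65536] line in the log. After
model_set_max_seg_size(&dev, UINT_MAX) no unsigned int length can exceed
the limit, so the check stays quiet.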