The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 373c4557d2aa362702c4c2d41288fb1e54990b7c Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Tue, 14 Nov 2017 01:03:44 +0100
Subject: [PATCH] mm/pagewalk.c: report holes in hugetlb ranges
This matters at least for the mincore syscall, which will otherwise copy
uninitialized memory from the page allocator to userspace. It is
probably also a correctness error for /proc/$pid/pagemap, but I haven't
tested that.
Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has
no effect because the caller already checks for that.
This only reports holes in hugetlb ranges to callers who have specified
a hugetlb_entry callback.
This issue was found using an AFL-based fuzzer.
v2:
- don't crash on ->pte_hole==NULL (Andrew Morton)
- add Cc stable (Andrew Morton)
Fixes: 1e25a271c8ac ("mincore: apply page table walker on do_mincore()")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 8bd4afa83cb8..23a3e415ac2c 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -188,8 +188,12 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
do {
next = hugetlb_entry_end(h, addr, end);
pte = huge_pte_offset(walk->mm, addr & hmask, sz);
- if (pte && walk->hugetlb_entry)
+
+ if (pte)
err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
+ else if (walk->pte_hole)
+ err = walk->pte_hole(addr, next, walk);
+
if (err)
break;
} while (addr = next, addr != end);
When I added entry_SYSCALL_64_after_hwframe, I left TRACE_IRQS_OFF
before it. This means that users of entry_SYSCALL_64_after_hwframe
were responsible for invoking TRACE_IRQS_OFF, and the one and only
user (added in the same commit) got it wrong.
I think this would manifest as a warning if a Xen PV guest with
CONFIG_DEBUG_LOCKDEP=y were used with context tracking. (The
context tracking bit is to cause lockdep to get invoked before we
turn IRQs back on.) I haven't tested that for real yet because I
can't get a kernel configured like that to boot at all on Xen PV.
I've reported it upstream. The problem seems to be that Xen PV is
missing early #UD handling, is hitting some WARN, and we rely on
Move TRACE_IRQS_OFF below the label.
Cc: stable(a)vger.kernel.org
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Juergen Gross <jgross(a)suse.com>
Fixes: 8a9949bc71a7 ("x86/xen/64: Rearrange the SYSCALL entries")
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
---
arch/x86/entry/entry_64.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a2b30ec69497..5063ed1214dd 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -148,8 +148,6 @@ ENTRY(entry_SYSCALL_64)
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
- TRACE_IRQS_OFF
-
/* Construct struct pt_regs on stack */
pushq $__USER_DS /* pt_regs->ss */
pushq PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
@@ -170,6 +168,8 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */
UNWIND_HINT_REGS extra=0
+ TRACE_IRQS_OFF
+
/*
* If we need to do entry work or if we guess we'll need to do
* exit work, go straight to the slow path.
--
2.13.6
This patch converts several network drivers to use smp_rmb
rather than read_barrier_depends. The initial issue was
discovered with ixgbe on a Power machine which resulted
in skb list corruption due to fetching a stale skb pointer.
More details can be found in the ixgbe patch description.
Changes since v1:
- Remove NULLing of tx_buffer->skb in the ixgbe patch
Brian King (7):
ixgbe: Fix skb list corruption on Power systems
i40e: Use smp_rmb rather than read_barrier_depends
ixgbevf: Use smp_rmb rather than read_barrier_depends
igbvf: Use smp_rmb rather than read_barrier_depends
igb: Use smp_rmb rather than read_barrier_depends
fm10k: Use smp_rmb rather than read_barrier_depends
i40evf: Use smp_rmb rather than read_barrier_depends
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
8 files changed, 8 insertions(+), 8 deletions(-)
--
1.8.3.1
The patch titled
Subject: IB/core: disable memory registration of fileystem-dax vmas
has been added to the -mm tree. Its filename is
ib-core-disable-memory-registration-of-fileystem-dax-vmas.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/ib-core-disable-memory-registratio…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/ib-core-disable-memory-registratio…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: IB/core: disable memory registration of fileystem-dax vmas
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow RDMA to create long standing memory registrations against
filesytem-dax vmas.
Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwilli…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Jason Gunthorpe <jgunthorpe(a)obsidianresearch.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/infiniband/core/umem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN drivers/infiniband/core/umem.c~ib-core-disable-memory-registration-of-fileystem-dax-vmas drivers/infiniband/core/umem.c
--- a/drivers/infiniband/core/umem.c~ib-core-disable-memory-registration-of-fileystem-dax-vmas
+++ a/drivers/infiniband/core/umem.c
@@ -191,7 +191,7 @@ struct ib_umem *ib_umem_get(struct ib_uc
sg_list_start = umem->sg_head.sgl;
while (npages) {
- ret = get_user_pages(cur_base,
+ ret = get_user_pages_longterm(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof (struct page *)),
gup_flags, page_list, vma_list);
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages-v3.patch
mm-switch-to-define-pmd_write-instead-of-__have_arch_pmd_write.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths-v3.patch
mm-replace-pmd_write-with-pmd_access_permitted-in-fault-gup-paths.patch
mm-replace-pte_write-with-pte_access_permitted-in-fault-gup-paths.patch
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
mm-introduce-get_user_pages_longterm.patch
mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
v4l2-disable-filesystem-dax-mapping-support.patch
ib-core-disable-memory-registration-of-fileystem-dax-vmas.patch
The patch titled
Subject: v4l2: disable filesystem-dax mapping support
has been added to the -mm tree. Its filename is
v4l2-disable-filesystem-dax-mapping-support.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/v4l2-disable-filesystem-dax-mappin…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/v4l2-disable-filesystem-dax-mappin…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: v4l2: disable filesystem-dax mapping support
V4L2 memory registrations are incompatible with filesystem-dax that needs
the ability to revoke dma access to a mapping at will, or otherwise allow
the kernel to wait for completion of DMA. The filesystem-dax
implementation breaks the traditional solution of truncate of active file
backed mappings since there is no page-cache page we can orphan to sustain
ongoing DMA.
If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.
Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jan Kara <jack(a)suse.cz>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jason Gunthorpe <jgunthorpe(a)obsidianresearch.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff -puN drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support drivers/media/v4l2-core/videobuf-dma-sg.c
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support
+++ a/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked
dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
data, size, dma->nr_pages);
- err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+ err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
flags, dma->pages, NULL);
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
- dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+ dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+ dma->nr_pages);
return err < 0 ? err : -EINVAL;
}
return 0;
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages-v3.patch
mm-switch-to-define-pmd_write-instead-of-__have_arch_pmd_write.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths-v3.patch
mm-replace-pmd_write-with-pmd_access_permitted-in-fault-gup-paths.patch
mm-replace-pte_write-with-pte_access_permitted-in-fault-gup-paths.patch
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
mm-introduce-get_user_pages_longterm.patch
mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
v4l2-disable-filesystem-dax-mapping-support.patch
ib-core-disable-memory-registration-of-fileystem-dax-vmas.patch
The patch titled
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
has been added to the -mm tree. Its filename is
mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-fail-get_vaddr_frames-for-files…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-fail-get_vaddr_frames-for-files…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow V4L2, Exynos, and other frame vector users to create long
standing / irrevocable memory registrations against filesytem-dax vmas.
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jason Gunthorpe <jgunthorpe(a)obsidianresearch.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/frame_vector.c | 4 ++++
1 file changed, 4 insertions(+)
diff -puN mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings mm/frame_vector.c
--- a/mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings
+++ a/mm/frame_vector.c
@@ -53,6 +53,10 @@ int get_vaddr_frames(unsigned long start
ret = -EFAULT;
goto out;
}
+
+ if (vma_is_fsdax(vma))
+ return -EOPNOTSUPP;
+
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
vec->got_ref = true;
vec->is_pfns = false;
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages-v3.patch
mm-switch-to-define-pmd_write-instead-of-__have_arch_pmd_write.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths-v3.patch
mm-replace-pmd_write-with-pmd_access_permitted-in-fault-gup-paths.patch
mm-replace-pte_write-with-pte_access_permitted-in-fault-gup-paths.patch
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
mm-introduce-get_user_pages_longterm.patch
mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
v4l2-disable-filesystem-dax-mapping-support.patch
ib-core-disable-memory-registration-of-fileystem-dax-vmas.patch
The patch titled
Subject: mm: introduce get_user_pages_longterm
has been added to the -mm tree. Its filename is
mm-introduce-get_user_pages_longterm.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-introduce-get_user_pages_longte…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-introduce-get_user_pages_longte…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: introduce get_user_pages_longterm
Patch series "introduce get_user_pages_longterm()", v2.
Here is a new get_user_pages api for cases where a driver intends to keep
an elevated page count indefinitely. This is distinct from usages like
iov_iter_get_pages where the elevated page counts are transient. The
iov_iter_get_pages cases immediately turn around and submit the pages to a
device driver which will put_page when the i/o operation completes (under
kernel control).
In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime of
the block / page and needs reasonable limits on how long it can wait for
pages in a mapping to become idle.
Fixing filesystems to actually wait for dax pages to be idle before blocks
from a truncate/hole-punch operation are repurposed is saved for a later
patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings were
supported by the kernel. The behavior regression this policy change
implies is one of the reasons we maintain the "dax enabled. Warning:
EXPERIMENTAL, use at your own risk" notification when mounting a
filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
This patch (of 4):
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow long standing memory registrations against filesytem-dax
vmas. Device-dax vmas do not have this problem and are explicitly
allowed.
This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and V4L2).
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jason Gunthorpe <jgunthorpe(a)obsidianresearch.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/fs.h | 14 +++++++++
include/linux/mm.h | 13 ++++++++
mm/gup.c | 64 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+)
diff -puN include/linux/fs.h~mm-introduce-get_user_pages_longterm include/linux/fs.h
--- a/include/linux/fs.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/fs.h
@@ -3194,6 +3194,20 @@ static inline bool vma_is_dax(struct vm_
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
}
+static inline bool vma_is_fsdax(struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ if (!vma->vm_file)
+ return false;
+ if (!vma_is_dax(vma))
+ return false;
+ inode = file_inode(vma->vm_file);
+ if (inode->i_mode == S_IFCHR)
+ return false; /* device-dax */
+ return true;
+}
+
static inline int iocb_flags(struct file *file)
{
int res = 0;
diff -puN include/linux/mm.h~mm-introduce-get_user_pages_longterm include/linux/mm.h
--- a/include/linux/mm.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/mm.h
@@ -1380,6 +1380,19 @@ long get_user_pages_locked(unsigned long
unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
+#ifdef CONFIG_FS_DAX
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas);
+#else
+static inline long get_user_pages_longterm(unsigned long start,
+ unsigned long nr_pages, unsigned int gup_flags,
+ struct page **pages, struct vm_area_struct **vmas)
+{
+ return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+#endif /* CONFIG_FS_DAX */
+
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
diff -puN mm/gup.c~mm-introduce-get_user_pages_longterm mm/gup.c
--- a/mm/gup.c~mm-introduce-get_user_pages_longterm
+++ a/mm/gup.c
@@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start,
}
EXPORT_SYMBOL(get_user_pages);
+#ifdef CONFIG_FS_DAX
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas_arg)
+{
+ struct vm_area_struct **vmas = vmas_arg;
+ struct vm_area_struct *vma_prev = NULL;
+ long rc, i;
+
+ if (!pages)
+ return -EINVAL;
+
+ if (!vmas) {
+ vmas = kzalloc(sizeof(struct vm_area_struct *) * nr_pages,
+ GFP_KERNEL);
+ if (!vmas)
+ return -ENOMEM;
+ }
+
+ rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+
+ for (i = 0; i < rc; i++) {
+ struct vm_area_struct *vma = vmas[i];
+
+ if (vma == vma_prev)
+ continue;
+
+ vma_prev = vma;
+
+ if (vma_is_fsdax(vma))
+ break;
+ }
+
+ /*
+ * Either get_user_pages() failed, or the vma validation
+ * succeeded, in either case we don't need to put_page() before
+ * returning.
+ */
+ if (i >= rc)
+ goto out;
+
+ for (i = 0; i < rc; i++)
+ put_page(pages[i]);
+ rc = -EOPNOTSUPP;
+out:
+ if (vmas != vmas_arg)
+ kfree(vmas);
+ return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+#endif /* CONFIG_FS_DAX */
+
/**
* populate_vma_page_range() - populate a range of pages in the vma.
* @vma: target vma
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages-v3.patch
mm-switch-to-define-pmd_write-instead-of-__have_arch_pmd_write.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths-v3.patch
mm-replace-pmd_write-with-pmd_access_permitted-in-fault-gup-paths.patch
mm-replace-pte_write-with-pte_access_permitted-in-fault-gup-paths.patch
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
mm-introduce-get_user_pages_longterm.patch
mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
v4l2-disable-filesystem-dax-mapping-support.patch
ib-core-disable-memory-registration-of-fileystem-dax-vmas.patch