6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alistair Popple <apopple@nvidia.com>
[ Upstream commit 7851bf649d423edd7286b292739f2eefded3d35c ]
Patch series "fs/dax: Fix ZONE_DEVICE page reference counts", v9.
Device and FS DAX pages have always maintained their own page reference counts without following the normal rules for page reference counting. In particular pages are considered free when the refcount hits one rather than zero and refcounts are not added when mapping the page.
Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary mechanism for allowing GUP to hold references on the page (see get_dev_pagemap). However there doesn't seem to be any reason why FS DAX pages need their own reference counting scheme.
By treating the refcounts on these pages the same way as normal pages we can remove a lot of special checks. In particular pXd_trans_huge() becomes the same as pXd_leaf(), although I haven't made that change here. It also frees up a valuable SW-defined PTE bit on architectures that have devmap PTE bits defined.
It also almost certainly allows further clean-up of the devmap managed functions, but I have left that as a future improvement. It also enables support for compound ZONE_DEVICE pages, which is one of my primary motivators for doing this work.
This patch (of 20):
FS DAX requires file systems to call into the DAX layout prior to unlinking inodes to ensure there is no ongoing DMA or other remote access to the direct mapped page. The fuse file system implements fuse_dax_break_layouts() to do this which includes a comment indicating that passing dmap_end == 0 leads to unmapping of the whole file.
However this is not true - passing dmap_end == 0 will not unmap anything before dmap_start, and furthermore dax_layout_busy_page_range() will not scan any of the range to see if there may be ongoing DMA access to the range. Fix this by passing -1 for dmap_end to fuse_dax_break_layouts(), so that the entire file range is passed to dax_layout_busy_page_range() and invalidated.
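The difference between the two end values can be illustrated with a small userspace model. This is only a sketch, not the kernel implementation: model_range_covers() is a hypothetical helper, and it assumes the range is treated as an inclusive byte range [start, end] held in an unsigned 64-bit type, as the u64 dmap_end parameter suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Passing -1 where a u64 is expected yields the largest 64-bit value. */
static_assert((uint64_t)-1 == UINT64_MAX, "-1 converts to UINT64_MAX");

/*
 * Hypothetical model, not kernel code: report whether the byte at 'off'
 * falls inside the inclusive range [start, end] (inclusive end is an
 * assumption of this sketch).
 */
static bool model_range_covers(uint64_t start, uint64_t end, uint64_t off)
{
	return off >= start && off <= end;
}
```

Under these assumptions, model_range_covers(0, 0, 4096) is false, so an end of 0 leaves everything past the first byte outside the range, while model_range_covers(0, -1, 4096) is true because -1 converts to UINT64_MAX and the range spans every offset.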
Link: https://lkml.kernel.org/r/cover.8068ad144a7eea4a813670301f4d2a86a8e68ec4.174...
Link: https://lkml.kernel.org/r/f09a34b6c40032022e4ddee6fadb7cc676f08867.174071340...
Fixes: 6ae330cad6ef ("virtiofs: serialize truncate/punch_hole and dax fault path")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Balbir Singh <balbirs@nvidia.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linmiaohe <linmiaohe@huawei.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/fuse/dax.c  | 1 -
 fs/fuse/dir.c  | 2 +-
 fs/fuse/file.c | 4 ++--
 3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 6e71904c396f1..dc28c28654d93 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -681,7 +681,6 @@ static int __fuse_dax_break_layouts(struct inode *inode, bool *retry,
 			0, 0, fuse_wait_dax_page(inode));
 }
 
-/* dmap_end == 0 leads to unmapping of whole file */
 int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start,
 				  u64 dmap_end)
 {
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index de31cb8eb7201..c431abbf48e66 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1712,7 +1712,7 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 	if (FUSE_IS_DAX(inode) && is_truncate) {
 		filemap_invalidate_lock(mapping);
 		fault_blocked = true;
-		err = fuse_dax_break_layouts(inode, 0, 0);
+		err = fuse_dax_break_layouts(inode, 0, -1);
 		if (err) {
 			filemap_invalidate_unlock(mapping);
 			return err;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 0df1311afb87d..723dd9b94e567 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -240,7 +240,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
 
 	if (dax_truncate) {
 		filemap_invalidate_lock(inode->i_mapping);
-		err = fuse_dax_break_layouts(inode, 0, 0);
+		err = fuse_dax_break_layouts(inode, 0, -1);
 		if (err)
 			goto out_inode_unlock;
 	}
@@ -3020,7 +3020,7 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 	inode_lock(inode);
 	if (block_faults) {
 		filemap_invalidate_lock(inode->i_mapping);
-		err = fuse_dax_break_layouts(inode, 0, 0);
+		err = fuse_dax_break_layouts(inode, 0, -1);
 		if (err)
 			goto out;
 	}