The following series implements... Changes since v3 [1]:
* Kill IS_DAX() in favor of explicit IS_FSDAX() and IS_DEVDAX() helpers. Jan noted, "having IS_DAX() and IS_FSDAX() doing almost the same, just not exactly the same, is IMHO a recipe for confusion", and I agree. A nice side effect of this elimination is a cleanup to remove occasions of "#ifdef CONFIG_FS_DAX" in C files, it is all moved to header files now. (Jan)
---
The vfio interface, like RDMA, wants to setup long term (indefinite) pins of the pages backing an address range so that a guest or userspace driver can perform DMA to the with physical address. Given that this pinning may lead to filesystem operations deadlocking in the filesystem-dax case, the pinning request needs to be rejected.
The longer term fix for vfio, RDMA, and any other long term pin user, is to provide a 'pin with lease' mechanism. Similar to the leases that are hold for pNFS RDMA layouts, this userspace lease gives the kernel a way to notify userspace that the block layout of the file is changing and the kernel is revoking access to pinned pages.
Related to this change is the discovery that vma_is_fsdax() was causing device-dax inode detection to fail. That lead to series of fixes and cleanups to make sure that S_DAX is defined correctly in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case.
---
Dan Williams (12): dax: fix vma_is_fsdax() helper dax: introduce IS_DEVDAX() and IS_FSDAX() ext2, dax: finish implementing dax_sem helpers ext2, dax: define ext2_dax_*() infrastructure in all cases ext4, dax: define ext4_dax_*() infrastructure in all cases ext2, dax: replace IS_DAX() with IS_FSDAX() ext4, dax: replace IS_DAX() with IS_FSDAX() xfs, dax: replace IS_DAX() with IS_FSDAX() mm, dax: replace IS_DAX() with IS_DEVDAX() or IS_FSDAX() fs, dax: kill IS_DAX() dax: fix S_DAX definition vfio: disable filesystem-dax page pinning
drivers/vfio/vfio_iommu_type1.c | 18 ++++++++++++++-- fs/ext2/ext2.h | 6 +++++ fs/ext2/file.c | 19 +++++------------ fs/ext2/inode.c | 10 ++++----- fs/ext4/file.c | 18 +++++----------- fs/ext4/inode.c | 4 ++-- fs/ext4/ioctl.c | 2 +- fs/ext4/super.c | 2 +- fs/iomap.c | 2 +- fs/xfs/xfs_file.c | 14 ++++++------- fs/xfs/xfs_ioctl.c | 4 ++-- fs/xfs/xfs_iomap.c | 6 +++-- fs/xfs/xfs_reflink.c | 2 +- include/linux/dax.h | 12 ++++++++--- include/linux/fs.h | 43 ++++++++++++++++++++++++++++----------- mm/fadvise.c | 3 ++- mm/filemap.c | 4 ++-- mm/huge_memory.c | 4 +++- mm/madvise.c | 3 ++- 19 files changed, 102 insertions(+), 74 deletions(-)
Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on device-dax instances when those are meant to be explicitly allowed.
Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm") Cc: stable@vger.kernel.org Reported-by: Gerd Rausch gerd.rausch@oracle.com Acked-by: Jane Chu jane.chu@oracle.com Reported-by: Haozhong Zhang haozhong.zhang@intel.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 2a815560fda0..79c413985305 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3198,7 +3198,7 @@ static inline bool vma_is_fsdax(struct vm_area_struct *vma) if (!vma_is_dax(vma)) return false; inode = file_inode(vma->vm_file); - if (inode->i_mode == S_IFCHR) + if (S_ISCHR(inode->i_mode)) return false; /* device-dax */ return true; }
The current IS_DAX() helper that checks the S_DAX inode flag is ambiguous, and currently has the broken assumption that the S_DAX flag is only non-zero in the CONFIG_FS_DAX=y case. In preparation for defining S_DAX to non-zero in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, introduce two explicit helpers to replace IS_DAX().
Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org (supporter:XFS FILESYSTEM) Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reported-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- include/linux/fs.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 79c413985305..bd0c46880572 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1909,6 +1909,28 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ (inode)->i_rdev == WHITEOUT_DEV)
+static inline bool IS_DEVDAX(struct inode *inode) +{ + if (!IS_ENABLED(CONFIG_DEV_DAX)) + return false; + if ((inode->i_flags & S_DAX) == 0) + return false; + if (!S_ISCHR(inode->i_mode)) + return false; + return true; +} + +static inline bool IS_FSDAX(struct inode *inode) +{ + if (!IS_ENABLED(CONFIG_FS_DAX)) + return false; + if ((inode->i_flags & S_DAX) == 0) + return false; + if (S_ISCHR(inode->i_mode)) + return false; + return true; +} + static inline bool HAS_UNMAPPED_ID(struct inode *inode) { return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
On Mon 26-02-18 20:20:05, Dan Williams wrote:
The current IS_DAX() helper that checks the S_DAX inode flag is ambiguous, and currently has the broken assumption that the S_DAX flag
I don't think S_DAX flag is really ambiguous. It is just that in CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, the compiler is not able to figure out some calls behind IS_DAX() are dead code and so the kernel won't compile / link. Or is there any other problem I'm missing?
If I'm indeed right, then please tell this in the changelog and don't talk about abstract ambiguity of S_DAX flag.
As much as I'd prefer to solve link-time problems with stubs instead of relying on dead-code elimination, I can live with split macros so once the changelog is settled, feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
is only non-zero in the CONFIG_FS_DAX=y case. In preparation for defining S_DAX to non-zero in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, introduce two explicit helpers to replace IS_DAX().
Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org (supporter:XFS FILESYSTEM) Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reported-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com
include/linux/fs.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 79c413985305..bd0c46880572 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1909,6 +1909,28 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ (inode)->i_rdev == WHITEOUT_DEV) +static inline bool IS_DEVDAX(struct inode *inode) +{
- if (!IS_ENABLED(CONFIG_DEV_DAX))
return false;
- if ((inode->i_flags & S_DAX) == 0)
return false;
- if (!S_ISCHR(inode->i_mode))
return false;
- return true;
+}
+static inline bool IS_FSDAX(struct inode *inode) +{
- if (!IS_ENABLED(CONFIG_FS_DAX))
return false;
- if ((inode->i_flags & S_DAX) == 0)
return false;
- if (S_ISCHR(inode->i_mode))
return false;
- return true;
+}
static inline bool HAS_UNMAPPED_ID(struct inode *inode) { return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
dax_sem_{up,down}_write_sem() allow the ext2 dax semaphore to be compiled out in the CONFIG_FS_DAX=n case. However there are still some open coded uses of the semaphore. Add dax_sem_{up_read,down_read,_is_locked}() helpers and convert all open-coded usages of the semaphore to the helpers.
Cc: Jan Kara jack@suse.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/ext2/ext2.h | 6 ++++++ fs/ext2/file.c | 5 ++--- fs/ext2/inode.c | 4 +--- 3 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index 032295e1d386..7ceb29733cdb 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -711,9 +711,15 @@ struct ext2_inode_info { #ifdef CONFIG_FS_DAX #define dax_sem_down_write(ext2_inode) down_write(&(ext2_inode)->dax_sem) #define dax_sem_up_write(ext2_inode) up_write(&(ext2_inode)->dax_sem) +#define dax_sem_is_locked(ext2_inode) rwsem_is_locked(&(ext2_inode)->dax_sem) +#define dax_sem_down_read(ext2_inode) down_read(&(ext2_inode)->dax_sem) +#define dax_sem_up_read(ext2_inode) up_read(&(ext2_inode)->dax_sem) #else #define dax_sem_down_write(ext2_inode) #define dax_sem_up_write(ext2_inode) +#define dax_sem_is_locked(ext2_inode) (true) +#define dax_sem_down_read(ext2_inode) +#define dax_sem_up_read(ext2_inode) #endif
/* diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 09640220fda8..1c7ea1bcddde 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -91,18 +91,17 @@ static ssize_t ext2_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) static int ext2_dax_fault(struct vm_fault *vmf) { struct inode *inode = file_inode(vmf->vma->vm_file); - struct ext2_inode_info *ei = EXT2_I(inode); int ret;
if (vmf->flags & FAULT_FLAG_WRITE) { sb_start_pagefault(inode->i_sb); file_update_time(vmf->vma->vm_file); } - down_read(&ei->dax_sem); + dax_sem_down_read(EXT2_I(inode));
ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops);
- up_read(&ei->dax_sem); + dax_sem_up_read(EXT2_I(inode)); if (vmf->flags & FAULT_FLAG_WRITE) sb_end_pagefault(inode->i_sb); return ret; diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 9b2ac55ac34f..e04295e99d90 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -1187,9 +1187,7 @@ static void __ext2_truncate_blocks(struct inode *inode, loff_t offset) blocksize = inode->i_sb->s_blocksize; iblock = (offset + blocksize-1) >> EXT2_BLOCK_SIZE_BITS(inode->i_sb);
-#ifdef CONFIG_FS_DAX - WARN_ON(!rwsem_is_locked(&ei->dax_sem)); -#endif + WARN_ON(!dax_sem_is_locked(ei));
n = ext2_block_to_path(inode, iblock, offsets, NULL); if (n == 0)
On Mon 26-02-18 20:20:11, Dan Williams wrote:
dax_sem_{up,down}_write_sem() allow the ext2 dax semaphore to be compiled out in the CONFIG_FS_DAX=n case. However there are still some open coded uses of the semaphore. Add dax_sem_{up_read,down_read,_is_locked}() helpers and convert all open-coded usages of the semaphore to the helpers.
Just one nit below. With that fixed you can add:
Reviewed-by: Jan Kara jack@suse.cz
Cc: Jan Kara jack@suse.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com
fs/ext2/ext2.h | 6 ++++++ fs/ext2/file.c | 5 ++--- fs/ext2/inode.c | 4 +--- 3 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index 032295e1d386..7ceb29733cdb 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -711,9 +711,15 @@ struct ext2_inode_info { #ifdef CONFIG_FS_DAX #define dax_sem_down_write(ext2_inode) down_write(&(ext2_inode)->dax_sem) #define dax_sem_up_write(ext2_inode) up_write(&(ext2_inode)->dax_sem) +#define dax_sem_is_locked(ext2_inode) rwsem_is_locked(&(ext2_inode)->dax_sem) +#define dax_sem_down_read(ext2_inode) down_read(&(ext2_inode)->dax_sem) +#define dax_sem_up_read(ext2_inode) up_read(&(ext2_inode)->dax_sem) #else #define dax_sem_down_write(ext2_inode) #define dax_sem_up_write(ext2_inode) +#define dax_sem_is_locked(ext2_inode) (true)
This is a bit dangerous as depending on the use of dax_sem_is_locked() you'd need this to return true or false. E.g. assume we'd have a place where we'd like to do:
WARN_ON(dax_sem_is_locked(inode));
or just
if (dax_sem_is_locked(inode)) bail out due to congestion...
How I'd solve this is that I'd define:
#define assert_dax_sem_locked(ext2_inode) WARN_ON(rwsem_is_locked(&(ext2_inode)->dax_sem)
and just as do {} while (0) for !CONFIG_FS_DAX.
Honza
In preparation for fixing S_DAX to be defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, move the definition of these routines outside of the "#ifdef CONFIG_FS_DAX" guard. This is also a coding-style fix to move all ifdef handling to header files rather than in the source. The compiler will still be able to determine that all the related code can be discarded in the CONFIG_FS_DAX=n case.
Cc: Jan Kara jack@suse.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/ext2/file.c | 8 -------- include/linux/dax.h | 10 ++++++++-- 2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 1c7ea1bcddde..5ac98d074323 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -29,7 +29,6 @@ #include "xattr.h" #include "acl.h"
-#ifdef CONFIG_FS_DAX static ssize_t ext2_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = iocb->ki_filp->f_mapping->host; @@ -128,9 +127,6 @@ static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma) vma->vm_flags |= VM_MIXEDMAP; return 0; } -#else -#define ext2_file_mmap generic_file_mmap -#endif
/* * Called when filp is released. This happens when all file descriptors @@ -162,19 +158,15 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
static ssize_t ext2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { -#ifdef CONFIG_FS_DAX if (IS_DAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_read_iter(iocb, to); -#endif return generic_file_read_iter(iocb, to); }
static ssize_t ext2_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { -#ifdef CONFIG_FS_DAX if (IS_DAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_write_iter(iocb, from); -#endif return generic_file_write_iter(iocb, from); }
diff --git a/include/linux/dax.h b/include/linux/dax.h index 0185ecdae135..47edbce4fc52 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -93,8 +93,6 @@ void dax_flush(struct dax_device *dax_dev, void *addr, size_t size); void dax_write_cache(struct dax_device *dax_dev, bool wc); bool dax_write_cache_enabled(struct dax_device *dax_dev);
-ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, - const struct iomap_ops *ops); int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t *pfnp, int *errp, const struct iomap_ops *ops); int dax_finish_sync_fault(struct vm_fault *vmf, enum page_entry_size pe_size, @@ -107,6 +105,8 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping, int __dax_zero_page_range(struct block_device *bdev, struct dax_device *dax_dev, sector_t sector, unsigned int offset, unsigned int length); +ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, + const struct iomap_ops *ops); #else static inline int __dax_zero_page_range(struct block_device *bdev, struct dax_device *dax_dev, sector_t sector, @@ -114,6 +114,12 @@ static inline int __dax_zero_page_range(struct block_device *bdev, { return -ENXIO; } + +static inline ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, + const struct iomap_ops *ops) +{ + return -ENXIO; +} #endif
static inline bool dax_mapping(struct address_space *mapping)
On Mon 26-02-18 20:20:16, Dan Williams wrote:
In preparation for fixing S_DAX to be defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, move the definition of these routines outside of the "#ifdef CONFIG_FS_DAX" guard. This is also a coding-style fix to move all ifdef handling to header files rather than in the source. The compiler will still be able to determine that all the related code can be discarded in the CONFIG_FS_DAX=n case.
Cc: Jan Kara jack@suse.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com
Looks good. You can add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
fs/ext2/file.c | 8 -------- include/linux/dax.h | 10 ++++++++-- 2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 1c7ea1bcddde..5ac98d074323 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -29,7 +29,6 @@ #include "xattr.h" #include "acl.h" -#ifdef CONFIG_FS_DAX static ssize_t ext2_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = iocb->ki_filp->f_mapping->host; @@ -128,9 +127,6 @@ static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma) vma->vm_flags |= VM_MIXEDMAP; return 0; } -#else -#define ext2_file_mmap generic_file_mmap -#endif /*
- Called when filp is released. This happens when all file descriptors
@@ -162,19 +158,15 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync) static ssize_t ext2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { -#ifdef CONFIG_FS_DAX if (IS_DAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_read_iter(iocb, to); -#endif return generic_file_read_iter(iocb, to); } static ssize_t ext2_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { -#ifdef CONFIG_FS_DAX if (IS_DAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_write_iter(iocb, from); -#endif return generic_file_write_iter(iocb, from); } diff --git a/include/linux/dax.h b/include/linux/dax.h index 0185ecdae135..47edbce4fc52 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -93,8 +93,6 @@ void dax_flush(struct dax_device *dax_dev, void *addr, size_t size); void dax_write_cache(struct dax_device *dax_dev, bool wc); bool dax_write_cache_enabled(struct dax_device *dax_dev); -ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops);
int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t *pfnp, int *errp, const struct iomap_ops *ops); int dax_finish_sync_fault(struct vm_fault *vmf, enum page_entry_size pe_size, @@ -107,6 +105,8 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping, int __dax_zero_page_range(struct block_device *bdev, struct dax_device *dax_dev, sector_t sector, unsigned int offset, unsigned int length); +ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops);
#else static inline int __dax_zero_page_range(struct block_device *bdev, struct dax_device *dax_dev, sector_t sector, @@ -114,6 +114,12 @@ static inline int __dax_zero_page_range(struct block_device *bdev, { return -ENXIO; }
+static inline ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops)
+{
- return -ENXIO;
+} #endif static inline bool dax_mapping(struct address_space *mapping)
In preparation for fixing S_DAX to be defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, move the definition of these routines outside of the "#ifdef CONFIG_FS_DAX" guard. This is also a coding-style fix to move all ifdef handling to header files rather than in the source. The compiler will still be able to determine that all the related code can be discarded in the CONFIG_FS_DAX=n case.
Cc: Jan Kara jack@suse.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/ext4/file.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index fb6f023622fe..51854e7608f0 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -34,7 +34,6 @@ #include "xattr.h" #include "acl.h"
-#ifdef CONFIG_FS_DAX static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = file_inode(iocb->ki_filp); @@ -60,7 +59,6 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) file_accessed(iocb->ki_filp); return ret; } -#endif
static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { @@ -70,10 +68,8 @@ static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (!iov_iter_count(to)) return 0; /* skip atime */
-#ifdef CONFIG_FS_DAX if (IS_DAX(file_inode(iocb->ki_filp))) return ext4_dax_read_iter(iocb, to); -#endif return generic_file_read_iter(iocb, to); }
@@ -179,7 +175,6 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) return iov_iter_count(from); }
-#ifdef CONFIG_FS_DAX static ssize_t ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) { @@ -208,7 +203,6 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) ret = generic_write_sync(iocb, ret); return ret; } -#endif
static ssize_t ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
On Mon 26-02-18 20:20:21, Dan Williams wrote:
In preparation for fixing S_DAX to be defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, move the definition of these routines outside of the "#ifdef CONFIG_FS_DAX" guard. This is also a coding-style fix to move all ifdef handling to header files rather than in the source. The compiler will still be able to determine that all the related code can be discarded in the CONFIG_FS_DAX=n case.
Cc: Jan Kara jack@suse.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com
Looks good. You can add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
fs/ext4/file.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index fb6f023622fe..51854e7608f0 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -34,7 +34,6 @@ #include "xattr.h" #include "acl.h" -#ifdef CONFIG_FS_DAX static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = file_inode(iocb->ki_filp); @@ -60,7 +59,6 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) file_accessed(iocb->ki_filp); return ret; } -#endif static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { @@ -70,10 +68,8 @@ static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (!iov_iter_count(to)) return 0; /* skip atime */ -#ifdef CONFIG_FS_DAX if (IS_DAX(file_inode(iocb->ki_filp))) return ext4_dax_read_iter(iocb, to); -#endif return generic_file_read_iter(iocb, to); } @@ -179,7 +175,6 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) return iov_iter_count(from); } -#ifdef CONFIG_FS_DAX static ssize_t ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) { @@ -208,7 +203,6 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) ret = generic_write_sync(iocb, ret); return ret; } -#endif static ssize_t ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous.
Cc: Jan Kara jack@suse.com Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/ext2/file.c | 6 +++--- fs/ext2/inode.c | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 5ac98d074323..702a36df6c01 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -119,7 +119,7 @@ static const struct vm_operations_struct ext2_dax_vm_ops = {
static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma) { - if (!IS_DAX(file_inode(file))) + if (!IS_FSDAX(file_inode(file))) return generic_file_mmap(file, vma);
file_accessed(file); @@ -158,14 +158,14 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
static ssize_t ext2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { - if (IS_DAX(iocb->ki_filp->f_mapping->host)) + if (IS_FSDAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_read_iter(iocb, to); return generic_file_read_iter(iocb, to); }
static ssize_t ext2_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { - if (IS_DAX(iocb->ki_filp->f_mapping->host)) + if (IS_FSDAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_write_iter(iocb, from); return generic_file_write_iter(iocb, from); } diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index e04295e99d90..72284f9fd034 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -733,7 +733,7 @@ static int ext2_get_blocks(struct inode *inode, goto cleanup; }
- if (IS_DAX(inode)) { + if (IS_FSDAX(inode)) { /* * We must unmap blocks before zeroing so that writeback cannot * overwrite zeros with stale data from block device page cache. @@ -940,7 +940,7 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter) loff_t offset = iocb->ki_pos; ssize_t ret;
- if (WARN_ON_ONCE(IS_DAX(inode))) + if (WARN_ON_ONCE(IS_FSDAX(inode))) return -EIO;
ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block); @@ -1294,7 +1294,7 @@ static int ext2_setsize(struct inode *inode, loff_t newsize)
inode_dio_wait(inode);
- if (IS_DAX(inode)) { + if (IS_FSDAX(inode)) { error = iomap_zero_range(inode, newsize, PAGE_ALIGN(newsize) - newsize, NULL, &ext2_iomap_ops);
On Mon 26-02-18 20:20:27, Dan Williams wrote:
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous.
Cc: Jan Kara jack@suse.com Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com
fs/ext2/file.c | 6 +++--- fs/ext2/inode.c | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-)
Looks good. You can add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 5ac98d074323..702a36df6c01 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -119,7 +119,7 @@ static const struct vm_operations_struct ext2_dax_vm_ops = { static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma) {
- if (!IS_DAX(file_inode(file)))
- if (!IS_FSDAX(file_inode(file))) return generic_file_mmap(file, vma);
file_accessed(file); @@ -158,14 +158,14 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync) static ssize_t ext2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) {
- if (IS_DAX(iocb->ki_filp->f_mapping->host))
- if (IS_FSDAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_read_iter(iocb, to); return generic_file_read_iter(iocb, to);
} static ssize_t ext2_file_write_iter(struct kiocb *iocb, struct iov_iter *from) {
- if (IS_DAX(iocb->ki_filp->f_mapping->host))
- if (IS_FSDAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_write_iter(iocb, from); return generic_file_write_iter(iocb, from);
} diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index e04295e99d90..72284f9fd034 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -733,7 +733,7 @@ static int ext2_get_blocks(struct inode *inode, goto cleanup; }
- if (IS_DAX(inode)) {
- if (IS_FSDAX(inode)) { /*
- We must unmap blocks before zeroing so that writeback cannot
- overwrite zeros with stale data from block device page cache.
@@ -940,7 +940,7 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter) loff_t offset = iocb->ki_pos; ssize_t ret;
- if (WARN_ON_ONCE(IS_DAX(inode)))
- if (WARN_ON_ONCE(IS_FSDAX(inode))) return -EIO;
ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block); @@ -1294,7 +1294,7 @@ static int ext2_setsize(struct inode *inode, loff_t newsize) inode_dio_wait(inode);
- if (IS_DAX(inode)) {
- if (IS_FSDAX(inode)) { error = iomap_zero_range(inode, newsize, PAGE_ALIGN(newsize) - newsize, NULL, &ext2_iomap_ops);
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous.
Cc: Jan Kara jack@suse.com Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/ext4/file.c | 12 +++++------- fs/ext4/inode.c | 4 ++-- fs/ext4/ioctl.c | 2 +- fs/ext4/super.c | 2 +- 4 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 51854e7608f0..561ea843b458 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -48,7 +48,7 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) * Recheck under inode lock - at this point we are sure it cannot * change anymore */ - if (!IS_DAX(inode)) { + if (!IS_FSDAX(inode)) { inode_unlock_shared(inode); /* Fallback to buffered IO in case we cannot support DAX */ return generic_file_read_iter(iocb, to); @@ -68,7 +68,7 @@ static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (!iov_iter_count(to)) return 0; /* skip atime */
- if (IS_DAX(file_inode(iocb->ki_filp))) + if (IS_FSDAX(file_inode(iocb->ki_filp))) return ext4_dax_read_iter(iocb, to); return generic_file_read_iter(iocb, to); } @@ -216,10 +216,8 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) return -EIO;
-#ifdef CONFIG_FS_DAX - if (IS_DAX(inode)) + if (IS_FSDAX(inode)) return ext4_dax_write_iter(iocb, from); -#endif if (!o_direct && (iocb->ki_flags & IOCB_NOWAIT)) return -EOPNOTSUPP;
@@ -361,11 +359,11 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma) * We don't support synchronous mappings for non-DAX files. At least * until someone comes with a sensible use case. */ - if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC)) + if (!IS_FSDAX(file_inode(file)) && (vma->vm_flags & VM_SYNC)) return -EOPNOTSUPP;
file_accessed(file); - if (IS_DAX(file_inode(file))) { + if (IS_FSDAX(file_inode(file))) { vma->vm_ops = &ext4_dax_vm_ops; vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE; } else { diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c94780075b04..1879b33aa391 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3858,7 +3858,7 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter) return 0;
/* DAX uses iomap path now */ - if (WARN_ON_ONCE(IS_DAX(inode))) + if (WARN_ON_ONCE(IS_FSDAX(inode))) return 0;
trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter)); @@ -4076,7 +4076,7 @@ static int ext4_block_zero_page_range(handle_t *handle, if (length > max || length < 0) length = max;
- if (IS_DAX(inode)) { + if (IS_FSDAX(inode)) { return iomap_zero_range(inode, from, length, NULL, &ext4_iomap_ops); } diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 7e99ad02f1ba..040fc6570ddb 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -790,7 +790,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) "Online defrag not supported with bigalloc"); err = -EOPNOTSUPP; goto mext_out; - } else if (IS_DAX(inode)) { + } else if (IS_FSDAX(inode)) { ext4_msg(sb, KERN_ERR, "Online defrag not supported with DAX"); err = -EOPNOTSUPP; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 39bf464c35f1..933e12940181 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1161,7 +1161,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len, if (inode->i_ino == EXT4_ROOT_INO) return -EPERM;
- if (WARN_ON_ONCE(IS_DAX(inode) && i_size_read(inode))) + if (WARN_ON_ONCE(IS_FSDAX(inode) && i_size_read(inode))) return -EINVAL;
res = ext4_convert_inline_data(inode);
On Mon 26-02-18 20:20:32, Dan Williams wrote:
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous.
Cc: Jan Kara jack@suse.com Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com
Looks good. You can add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
fs/ext4/file.c | 12 +++++------- fs/ext4/inode.c | 4 ++-- fs/ext4/ioctl.c | 2 +- fs/ext4/super.c | 2 +- 4 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 51854e7608f0..561ea843b458 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -48,7 +48,7 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) * Recheck under inode lock - at this point we are sure it cannot * change anymore */
- if (!IS_DAX(inode)) {
- if (!IS_FSDAX(inode)) { inode_unlock_shared(inode); /* Fallback to buffered IO in case we cannot support DAX */ return generic_file_read_iter(iocb, to);
@@ -68,7 +68,7 @@ static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (!iov_iter_count(to)) return 0; /* skip atime */
- if (IS_DAX(file_inode(iocb->ki_filp)))
- if (IS_FSDAX(file_inode(iocb->ki_filp))) return ext4_dax_read_iter(iocb, to); return generic_file_read_iter(iocb, to);
} @@ -216,10 +216,8 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) return -EIO; -#ifdef CONFIG_FS_DAX
- if (IS_DAX(inode))
- if (IS_FSDAX(inode)) return ext4_dax_write_iter(iocb, from);
-#endif if (!o_direct && (iocb->ki_flags & IOCB_NOWAIT)) return -EOPNOTSUPP; @@ -361,11 +359,11 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma) * We don't support synchronous mappings for non-DAX files. At least * until someone comes with a sensible use case. */
- if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
- if (!IS_FSDAX(file_inode(file)) && (vma->vm_flags & VM_SYNC)) return -EOPNOTSUPP;
file_accessed(file);
- if (IS_DAX(file_inode(file))) {
- if (IS_FSDAX(file_inode(file))) { vma->vm_ops = &ext4_dax_vm_ops; vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE; } else {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c94780075b04..1879b33aa391 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3858,7 +3858,7 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter) return 0; /* DAX uses iomap path now */
- if (WARN_ON_ONCE(IS_DAX(inode)))
- if (WARN_ON_ONCE(IS_FSDAX(inode))) return 0;
trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter)); @@ -4076,7 +4076,7 @@ static int ext4_block_zero_page_range(handle_t *handle, if (length > max || length < 0) length = max;
- if (IS_DAX(inode)) {
- if (IS_FSDAX(inode)) { return iomap_zero_range(inode, from, length, NULL, &ext4_iomap_ops); }
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 7e99ad02f1ba..040fc6570ddb 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -790,7 +790,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) "Online defrag not supported with bigalloc"); err = -EOPNOTSUPP; goto mext_out;
} else if (IS_DAX(inode)) {
} else if (IS_FSDAX(inode)) { ext4_msg(sb, KERN_ERR, "Online defrag not supported with DAX"); err = -EOPNOTSUPP;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 39bf464c35f1..933e12940181 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1161,7 +1161,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len, if (inode->i_ino == EXT4_ROOT_INO) return -EPERM;
- if (WARN_ON_ONCE(IS_DAX(inode) && i_size_read(inode)))
- if (WARN_ON_ONCE(IS_FSDAX(inode) && i_size_read(inode))) return -EINVAL;
res = ext4_convert_inline_data(inode);
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous.
Cc: Jan Kara jack@suse.com Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/xfs/xfs_file.c | 14 +++++++------- fs/xfs/xfs_ioctl.c | 4 ++-- fs/xfs/xfs_iomap.c | 6 +++--- fs/xfs/xfs_reflink.c | 2 +- 4 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 9ea08326f876..46a098b90fd0 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -288,7 +288,7 @@ xfs_file_read_iter( if (XFS_FORCED_SHUTDOWN(mp)) return -EIO;
- if (IS_DAX(inode)) + if (IS_FSDAX(inode)) ret = xfs_file_dax_read(iocb, to); else if (iocb->ki_flags & IOCB_DIRECT) ret = xfs_file_dio_aio_read(iocb, to); @@ -726,7 +726,7 @@ xfs_file_write_iter( if (XFS_FORCED_SHUTDOWN(ip->i_mount)) return -EIO;
- if (IS_DAX(inode)) + if (IS_FSDAX(inode)) ret = xfs_file_dax_write(iocb, from); else if (iocb->ki_flags & IOCB_DIRECT) { /* @@ -1045,7 +1045,7 @@ __xfs_filemap_fault( }
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); - if (IS_DAX(inode)) { + if (IS_FSDAX(inode)) { pfn_t pfn;
ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops); @@ -1070,7 +1070,7 @@ xfs_filemap_fault( { /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, PE_SIZE_PTE, - IS_DAX(file_inode(vmf->vma->vm_file)) && + IS_FSDAX(file_inode(vmf->vma->vm_file)) && (vmf->flags & FAULT_FLAG_WRITE)); }
@@ -1079,7 +1079,7 @@ xfs_filemap_huge_fault( struct vm_fault *vmf, enum page_entry_size pe_size) { - if (!IS_DAX(file_inode(vmf->vma->vm_file))) + if (!IS_FSDAX(file_inode(vmf->vma->vm_file))) return VM_FAULT_FALLBACK;
/* DAX can shortcut the normal fault path on write faults! */ @@ -1124,12 +1124,12 @@ xfs_file_mmap( * We don't support synchronous mappings for non-DAX files. At least * until someone comes with a sensible use case. */ - if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC)) + if (!IS_FSDAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC)) return -EOPNOTSUPP;
file_accessed(filp); vma->vm_ops = &xfs_file_vm_ops; - if (IS_DAX(file_inode(filp))) + if (IS_FSDAX(file_inode(filp))) vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE; return 0; } diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 89fb1eb80aae..234279ff66ce 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1108,9 +1108,9 @@ xfs_ioctl_setattr_dax_invalidate( }
/* If the DAX state is not changing, we have nothing to do here. */ - if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode)) + if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_FSDAX(inode)) return 0; - if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode)) + if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_FSDAX(inode)) return 0;
/* lock, flush and invalidate mapping in preparation for flag change */ diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 66e1edbfb2b2..cf794d429aec 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -241,7 +241,7 @@ xfs_iomap_write_direct( * the reserve block pool for bmbt block allocation if there is no space * left but we need to do unwritten extent conversion. */ - if (IS_DAX(VFS_I(ip))) { + if (IS_FSDAX(VFS_I(ip))) { bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO; if (imap->br_state == XFS_EXT_UNWRITTEN) { tflags |= XFS_TRANS_RESERVE; @@ -952,7 +952,7 @@ static inline bool imap_needs_alloc(struct inode *inode, return !nimaps || imap->br_startblock == HOLESTARTBLOCK || imap->br_startblock == DELAYSTARTBLOCK || - (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN); + (IS_FSDAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN); }
static inline bool need_excl_ilock(struct xfs_inode *ip, unsigned flags) @@ -988,7 +988,7 @@ xfs_file_iomap_begin( return -EIO;
if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) && - !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) { + !IS_FSDAX(inode) && !xfs_get_extsz_hint(ip)) { /* Reserve delalloc blocks for regular writeback. */ return xfs_file_iomap_begin_delay(inode, offset, length, iomap); } diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 270246943a06..a126e00e05e3 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1351,7 +1351,7 @@ xfs_reflink_remap_range( goto out_unlock;
/* Don't share DAX file data for now. */ - if (IS_DAX(inode_in) || IS_DAX(inode_out)) + if (IS_FSDAX(inode_in) || IS_FSDAX(inode_out)) goto out_unlock;
ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out,
On Mon 26-02-18 20:20:37, Dan Williams wrote:
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous.
Cc: Jan Kara jack@suse.com Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com
Looks good. You can add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
fs/xfs/xfs_file.c | 14 +++++++------- fs/xfs/xfs_ioctl.c | 4 ++-- fs/xfs/xfs_iomap.c | 6 +++--- fs/xfs/xfs_reflink.c | 2 +- 4 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 9ea08326f876..46a098b90fd0 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -288,7 +288,7 @@ xfs_file_read_iter( if (XFS_FORCED_SHUTDOWN(mp)) return -EIO;
- if (IS_DAX(inode))
- if (IS_FSDAX(inode)) ret = xfs_file_dax_read(iocb, to); else if (iocb->ki_flags & IOCB_DIRECT) ret = xfs_file_dio_aio_read(iocb, to);
@@ -726,7 +726,7 @@ xfs_file_write_iter( if (XFS_FORCED_SHUTDOWN(ip->i_mount)) return -EIO;
- if (IS_DAX(inode))
- if (IS_FSDAX(inode)) ret = xfs_file_dax_write(iocb, from); else if (iocb->ki_flags & IOCB_DIRECT) { /*
@@ -1045,7 +1045,7 @@ __xfs_filemap_fault( } xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
- if (IS_DAX(inode)) {
- if (IS_FSDAX(inode)) { pfn_t pfn;
ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops); @@ -1070,7 +1070,7 @@ xfs_filemap_fault( { /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, PE_SIZE_PTE,
IS_DAX(file_inode(vmf->vma->vm_file)) &&
IS_FSDAX(file_inode(vmf->vma->vm_file)) && (vmf->flags & FAULT_FLAG_WRITE));
} @@ -1079,7 +1079,7 @@ xfs_filemap_huge_fault( struct vm_fault *vmf, enum page_entry_size pe_size) {
- if (!IS_DAX(file_inode(vmf->vma->vm_file)))
- if (!IS_FSDAX(file_inode(vmf->vma->vm_file))) return VM_FAULT_FALLBACK;
/* DAX can shortcut the normal fault path on write faults! */ @@ -1124,12 +1124,12 @@ xfs_file_mmap( * We don't support synchronous mappings for non-DAX files. At least * until someone comes with a sensible use case. */
- if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
- if (!IS_FSDAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC)) return -EOPNOTSUPP;
file_accessed(filp); vma->vm_ops = &xfs_file_vm_ops;
- if (IS_DAX(file_inode(filp)))
- if (IS_FSDAX(file_inode(filp))) vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE; return 0;
} diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 89fb1eb80aae..234279ff66ce 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1108,9 +1108,9 @@ xfs_ioctl_setattr_dax_invalidate( } /* If the DAX state is not changing, we have nothing to do here. */
- if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
- if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_FSDAX(inode)) return 0;
- if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
- if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_FSDAX(inode)) return 0;
/* lock, flush and invalidate mapping in preparation for flag change */ diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 66e1edbfb2b2..cf794d429aec 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -241,7 +241,7 @@ xfs_iomap_write_direct( * the reserve block pool for bmbt block allocation if there is no space * left but we need to do unwritten extent conversion. */
- if (IS_DAX(VFS_I(ip))) {
- if (IS_FSDAX(VFS_I(ip))) { bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO; if (imap->br_state == XFS_EXT_UNWRITTEN) { tflags |= XFS_TRANS_RESERVE;
@@ -952,7 +952,7 @@ static inline bool imap_needs_alloc(struct inode *inode, return !nimaps || imap->br_startblock == HOLESTARTBLOCK || imap->br_startblock == DELAYSTARTBLOCK ||
(IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
(IS_FSDAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
} static inline bool need_excl_ilock(struct xfs_inode *ip, unsigned flags) @@ -988,7 +988,7 @@ xfs_file_iomap_begin( return -EIO; if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
/* Reserve delalloc blocks for regular writeback. */ return xfs_file_iomap_begin_delay(inode, offset, length, iomap); }!IS_FSDAX(inode) && !xfs_get_extsz_hint(ip)) {
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 270246943a06..a126e00e05e3 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1351,7 +1351,7 @@ xfs_reflink_remap_range( goto out_unlock; /* Don't share DAX file data for now. */
- if (IS_DAX(inode_in) || IS_DAX(inode_out))
- if (IS_FSDAX(inode_in) || IS_FSDAX(inode_out)) goto out_unlock;
ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out,
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for the DEVDAX and FSDAX sub-cases of DAX functionality.
Cc: Jan Kara jack@suse.com Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com --- include/linux/fs.h | 16 +++++++--------- mm/fadvise.c | 3 ++- mm/filemap.c | 4 ++-- mm/huge_memory.c | 4 +++- mm/madvise.c | 3 ++- 5 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h index bd0c46880572..4a5aa8761011 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3208,21 +3208,19 @@ static inline bool io_is_direct(struct file *filp)
static inline bool vma_is_dax(struct vm_area_struct *vma) { - return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host); + struct inode *inode; + + if (!vma->vm_file) + return false; + inode = vma->vm_file->f_mapping->host; + return IS_FSDAX(inode) || IS_DEVDAX(inode); }
static inline bool vma_is_fsdax(struct vm_area_struct *vma) { - struct inode *inode; - if (!vma->vm_file) return false; - if (!vma_is_dax(vma)) - return false; - inode = file_inode(vma->vm_file); - if (S_ISCHR(inode->i_mode)) - return false; /* device-dax */ - return true; + return IS_FSDAX(file_inode(vma->vm_file)); }
static inline int iocb_flags(struct file *file) diff --git a/mm/fadvise.c b/mm/fadvise.c index 767887f5f3bf..00d9317636a2 100644 --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -55,7 +55,8 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
bdi = inode_to_bdi(mapping->host);
- if (IS_DAX(inode) || (bdi == &noop_backing_dev_info)) { + if (IS_FSDAX(inode) || IS_DEVDAX(inode) + || (bdi == &noop_backing_dev_info)) { switch (advice) { case POSIX_FADV_NORMAL: case POSIX_FADV_RANDOM: diff --git a/mm/filemap.c b/mm/filemap.c index 693f62212a59..4bc4e067ebf2 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2357,7 +2357,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) * DAX files, so don't bother trying. */ if (retval < 0 || !count || iocb->ki_pos >= size || - IS_DAX(inode)) + IS_FSDAX(inode)) goto out; }
@@ -3225,7 +3225,7 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) * not succeed (even if it did, DAX does not handle dirty * page-cache pages correctly). */ - if (written < 0 || !iov_iter_count(from) || IS_DAX(inode)) + if (written < 0 || !iov_iter_count(from) || IS_FSDAX(inode)) goto out;
status = generic_perform_write(file, from, pos = iocb->ki_pos); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 87ab9b8f56b5..ed238936e29b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -529,10 +529,12 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { loff_t off = (loff_t)pgoff << PAGE_SHIFT; + struct inode *inode;
if (addr) goto out; - if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD)) + inode = filp->f_mapping->host; + if (!IS_FSDAX(inode) || !IS_ENABLED(CONFIG_FS_DAX_PMD)) goto out;
addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE); diff --git a/mm/madvise.c b/mm/madvise.c index 4d3c922ea1a1..bdb83cf018b1 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -275,6 +275,7 @@ static long madvise_willneed(struct vm_area_struct *vma, unsigned long start, unsigned long end) { struct file *file = vma->vm_file; + struct inode *inode = file_inode(file);
*prev = vma; #ifdef CONFIG_SWAP @@ -293,7 +294,7 @@ static long madvise_willneed(struct vm_area_struct *vma, return -EBADF; #endif
- if (IS_DAX(file_inode(file))) { + if (IS_FSDAX(inode) || IS_DEVDAX(inode)) { /* no bad return value, but ignore advice */ return 0; }
On Mon 26-02-18 20:20:42, Dan Williams wrote:
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for the DEVDAX and FSDAX sub-cases of DAX functionality.
Cc: Jan Kara jack@suse.com Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com
Just one nit below. With that fixed you can add:
Reviewed-by: Jan Kara jack@suse.cz
@@ -3208,21 +3208,19 @@ static inline bool io_is_direct(struct file *filp) static inline bool vma_is_dax(struct vm_area_struct *vma) {
- return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
- struct inode *inode;
- if (!vma->vm_file)
return false;
- inode = vma->vm_file->f_mapping->host;
When changing this, use file_inode() here as well?
Honza
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all the remaining IS_DAX() usages to use explicit tests for FSDAX.
Cc: Jan Kara jack@suse.com Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/iomap.c | 2 +- include/linux/dax.h | 2 +- include/linux/fs.h | 3 +-- 3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/iomap.c b/fs/iomap.c index afd163586aa0..fe379d8949fd 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -377,7 +377,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, offset = pos & (PAGE_SIZE - 1); /* Within page */ bytes = min_t(loff_t, PAGE_SIZE - offset, count);
- if (IS_DAX(inode)) + if (IS_FSDAX(inode)) status = iomap_dax_zero(pos, offset, bytes, iomap); else status = iomap_zero(inode, pos, offset, bytes, iomap); diff --git a/include/linux/dax.h b/include/linux/dax.h index 47edbce4fc52..ce520e932adc 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -124,7 +124,7 @@ static inline ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
static inline bool dax_mapping(struct address_space *mapping) { - return mapping->host && IS_DAX(mapping->host); + return mapping->host && IS_FSDAX(mapping->host); }
struct writeback_control; diff --git a/include/linux/fs.h b/include/linux/fs.h index 4a5aa8761011..8021f10068d3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1903,7 +1903,6 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags #define IS_IMA(inode) ((inode)->i_flags & S_IMA) #define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT) #define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC) -#define IS_DAX(inode) ((inode)->i_flags & S_DAX) #define IS_ENCRYPTED(inode) ((inode)->i_flags & S_ENCRYPTED)
#define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ @@ -3203,7 +3202,7 @@ extern int file_update_time(struct file *file);
static inline bool io_is_direct(struct file *filp) { - return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host); + return (filp->f_flags & O_DIRECT) || IS_FSDAX(filp->f_mapping->host); }
static inline bool vma_is_dax(struct vm_area_struct *vma)
On Mon 26-02-18 20:20:48, Dan Williams wrote:
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all the remaining IS_DAX() usages to use explicit tests for FSDAX.
Cc: Jan Kara jack@suse.com Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams dan.j.williams@intel.com
Looks good. You can add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
fs/iomap.c | 2 +- include/linux/dax.h | 2 +- include/linux/fs.h | 3 +-- 3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/iomap.c b/fs/iomap.c index afd163586aa0..fe379d8949fd 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -377,7 +377,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, offset = pos & (PAGE_SIZE - 1); /* Within page */ bytes = min_t(loff_t, PAGE_SIZE - offset, count);
if (IS_DAX(inode))
else status = iomap_zero(inode, pos, offset, bytes, iomap);if (IS_FSDAX(inode)) status = iomap_dax_zero(pos, offset, bytes, iomap);
diff --git a/include/linux/dax.h b/include/linux/dax.h index 47edbce4fc52..ce520e932adc 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -124,7 +124,7 @@ static inline ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, static inline bool dax_mapping(struct address_space *mapping) {
- return mapping->host && IS_DAX(mapping->host);
- return mapping->host && IS_FSDAX(mapping->host);
} struct writeback_control; diff --git a/include/linux/fs.h b/include/linux/fs.h index 4a5aa8761011..8021f10068d3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1903,7 +1903,6 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags #define IS_IMA(inode) ((inode)->i_flags & S_IMA) #define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT) #define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC) -#define IS_DAX(inode) ((inode)->i_flags & S_DAX) #define IS_ENCRYPTED(inode) ((inode)->i_flags & S_ENCRYPTED) #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ @@ -3203,7 +3202,7 @@ extern int file_update_time(struct file *file); static inline bool io_is_direct(struct file *filp) {
- return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
- return (filp->f_flags & O_DIRECT) || IS_FSDAX(filp->f_mapping->host);
} static inline bool vma_is_dax(struct vm_area_struct *vma)
Make sure S_DAX is defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case. Otherwise vma_is_dax() may incorrectly return false in the Device-DAX case.
Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig hch@lst.de Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 8021f10068d3..ae8d2495f51e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1859,7 +1859,7 @@ struct super_operations { #define S_IMA 1024 /* Inode has an associated IMA struct */ #define S_AUTOMOUNT 2048 /* Automount/referral quasi-directory */ #define S_NOSEC 4096 /* no suid or xattr security attributes */ -#ifdef CONFIG_FS_DAX +#if IS_ENABLED(CONFIG_FS_DAX) || IS_ENABLED(CONFIG_DEV_DAX) #define S_DAX 8192 /* Direct Access, avoiding the page cache */ #else #define S_DAX 0 /* Make all the DAX code disappear */
Filesystem-DAX is incompatible with 'longterm' page pinning. Without page cache indirection a DAX mapping maps filesystem blocks directly. This means that the filesystem must not modify a file's block map while any page in a mapping is pinned. In order to prevent the situation of userspace holding of filesystem operations indefinitely, disallow 'longterm' Filesystem-DAX mappings.
RDMA has the same conflict and the plan there is to add a 'with lease' mechanism to allow the kernel to notify userspace that the mapping is being torn down for block-map maintenance. Perhaps something similar can be put in place for vfio.
Note that xfs and ext4 still report:
"DAX enabled. Warning: EXPERIMENTAL, use at your own risk"
...at mount time, and resolving the dax-dma-vs-truncate problem is one of the last hurdles to remove that designation.
Acked-by: Alex Williamson alex.williamson@redhat.com Cc: Michal Hocko mhocko@suse.com Cc: Christoph Hellwig hch@lst.de Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Reported-by: Haozhong Zhang haozhong.zhang@intel.com Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O") Signed-off-by: Dan Williams dan.j.williams@intel.com --- drivers/vfio/vfio_iommu_type1.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index e30e29ae4819..45657e2b1ff7 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, { struct page *page[1]; struct vm_area_struct *vma; + struct vm_area_struct *vmas[1]; int ret;
if (mm == current->mm) { - ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE), - page); + ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE), + page, vmas); } else { unsigned int flags = 0;
@@ -351,7 +352,18 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
down_read(&mm->mmap_sem); ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, - NULL, NULL); + vmas, NULL); + /* + * The lifetime of a vaddr_get_pfn() page pin is + * userspace-controlled. In the fs-dax case this could + * lead to indefinite stalls in filesystem operations. + * Disallow attempts to pin fs-dax pages via this + * interface. + */ + if (ret > 0 && vma_is_fsdax(vmas[0])) { + ret = -EOPNOTSUPP; + put_page(page[0]); + } up_read(&mm->mmap_sem); }
On Mon, Feb 26, 2018 at 08:19:54PM -0800, Dan Williams wrote:
The following series implements... Changes since v3 [1]:
- Kill IS_DAX() in favor of explicit IS_FSDAX() and IS_DEVDAX() helpers. Jan noted, "having IS_DAX() and IS_FSDAX() doing almost the same, just not exactly the same, is IMHO a recipe for confusion", and I agree. A nice side effect of this elimination is a cleanup to remove occasions of "#ifdef CONFIG_FS_DAX" in C files, it is all moved to header files now. (Jan)
Dan, can you please stop sending random patches in a patch set to random lists? Your patchsets are hitting 4 or 5 different procmail filters here and so it gets split across several different mailing list buckets. It's really annoying to have to go reconstruct every patch set you send back into a single series in a single bucket....
Can you please fix up your patch set sending again?
-Dave.
linux-stable-mirror@lists.linaro.org