Changes since v4 [1]: * Fix the changelog of "dax: introduce IS_DEVDAX() and IS_FSDAX()" to better clarify the need for new helpers (Jan) * Replace dax_sem_is_locked() with dax_sem_assert_held() (Jan) * Use file_inode() in vma_is_dax() (Jan) * Resend the full series to linux-xfs@ (Dave) * Collect Jan's Reviewed-by
[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-February/014271.html ---
The vfio interface, like RDMA, wants to setup long term (indefinite) pins of the pages backing an address range so that a guest or userspace driver can perform DMA to the with physical address. Given that this pinning may lead to filesystem operations deadlocking in the filesystem-dax case, the pinning request needs to be rejected.
The longer term fix for vfio, RDMA, and any other long term pin user, is to provide a 'pin with lease' mechanism. Similar to the leases that are hold for pNFS RDMA layouts, this userspace lease gives the kernel a way to notify userspace that the block layout of the file is changing and the kernel is revoking access to pinned pages.
Related to this change is the discovery that vma_is_fsdax() was causing device-dax inode detection to fail. That lead to series of fixes and cleanups to make sure that S_DAX is defined correctly in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case.
---
Dan Williams (12): dax: fix vma_is_fsdax() helper dax: introduce IS_DEVDAX() and IS_FSDAX() ext2, dax: finish implementing dax_sem helpers ext2, dax: define ext2_dax_*() infrastructure in all cases ext4, dax: define ext4_dax_*() infrastructure in all cases ext2, dax: replace IS_DAX() with IS_FSDAX() ext4, dax: replace IS_DAX() with IS_FSDAX() xfs, dax: replace IS_DAX() with IS_FSDAX() mm, dax: replace IS_DAX() with IS_DEVDAX() or IS_FSDAX() fs, dax: kill IS_DAX() dax: fix S_DAX definition vfio: disable filesystem-dax page pinning
drivers/vfio/vfio_iommu_type1.c | 18 ++++++++++++++-- fs/ext2/ext2.h | 6 +++++ fs/ext2/file.c | 19 +++++------------ fs/ext2/inode.c | 10 ++++----- fs/ext4/file.c | 18 +++++----------- fs/ext4/inode.c | 4 ++-- fs/ext4/ioctl.c | 2 +- fs/ext4/super.c | 2 +- fs/iomap.c | 2 +- fs/xfs/xfs_file.c | 14 ++++++------- fs/xfs/xfs_ioctl.c | 4 ++-- fs/xfs/xfs_iomap.c | 6 +++-- fs/xfs/xfs_reflink.c | 2 +- include/linux/dax.h | 12 ++++++++--- include/linux/fs.h | 43 ++++++++++++++++++++++++++++----------- mm/fadvise.c | 3 ++- mm/filemap.c | 4 ++-- mm/huge_memory.c | 4 +++- mm/madvise.c | 3 ++- 19 files changed, 102 insertions(+), 74 deletions(-)
Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on device-dax instances when those are meant to be explicitly allowed.
Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm") Cc: stable@vger.kernel.org Reported-by: Gerd Rausch gerd.rausch@oracle.com Acked-by: Jane Chu jane.chu@oracle.com Reported-by: Haozhong Zhang haozhong.zhang@intel.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 2a815560fda0..79c413985305 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3198,7 +3198,7 @@ static inline bool vma_is_fsdax(struct vm_area_struct *vma) if (!vma_is_dax(vma)) return false; inode = file_inode(vma->vm_file); - if (inode->i_mode == S_IFCHR) + if (S_ISCHR(inode->i_mode)) return false; /* device-dax */ return true; }
Looks good,
Reviewed-by: Christoph Hellwig hch@lst.de
The current IS_DAX() helper that checks if a file is in DAX mode serves two purposes. It is a control flow branch condition for DAX vs non-DAX paths and it is a mechanism to perform dead code elimination. The dead code elimination is required in the CONFIG_FS_DAX=n case since there are symbols in fs/dax.c that will be elided. While the dead code elimination can be addressed with nop stubs for the fs/dax.c symbols that does not address the need for a DAX control flow helper where fs/dax.c symbols are not involved.
Moreover, the control flow changes, in some cases, need to be cognizant of whether the DAX file is a typical file or a Device-DAX special file. Introduce IS_DEVDAX() and IS_FSDAX() to simultaneously address the file-type control flow and dead-code elimination use cases. IS_DAX() will be deleted after all sites are converted to use the file-type specific helper.
Note, this change is also a pre-requisite for fixing the definition of the S_DAX inode flag in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case. The flag needs to be defined, non-zero, if either DAX facility is enabled.
Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reported-by: Jan Kara jack@suse.cz Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- include/linux/fs.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 79c413985305..bd0c46880572 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1909,6 +1909,28 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ (inode)->i_rdev == WHITEOUT_DEV)
+static inline bool IS_DEVDAX(struct inode *inode) +{ + if (!IS_ENABLED(CONFIG_DEV_DAX)) + return false; + if ((inode->i_flags & S_DAX) == 0) + return false; + if (!S_ISCHR(inode->i_mode)) + return false; + return true; +} + +static inline bool IS_FSDAX(struct inode *inode) +{ + if (!IS_ENABLED(CONFIG_FS_DAX)) + return false; + if ((inode->i_flags & S_DAX) == 0) + return false; + if (S_ISCHR(inode->i_mode)) + return false; + return true; +} + static inline bool HAS_UNMAPPED_ID(struct inode *inode) { return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
On Thu, Mar 01, 2018 at 07:53:44PM -0800, Dan Williams wrote:
The current IS_DAX() helper that checks if a file is in DAX mode serves two purposes. It is a control flow branch condition for DAX vs non-DAX paths and it is a mechanism to perform dead code elimination. The dead code elimination is required in the CONFIG_FS_DAX=n case since there are symbols in fs/dax.c that will be elided. While the dead code elimination can be addressed with nop stubs for the fs/dax.c symbols that does not address the need for a DAX control flow helper where fs/dax.c symbols are not involved.
Moreover, the control flow changes, in some cases, need to be cognizant of whether the DAX file is a typical file or a Device-DAX special file. Introduce IS_DEVDAX() and IS_FSDAX() to simultaneously address the file-type control flow and dead-code elimination use cases. IS_DAX() will be deleted after all sites are converted to use the file-type specific helper.
Note, this change is also a pre-requisite for fixing the definition of the S_DAX inode flag in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case. The flag needs to be defined, non-zero, if either DAX facility is enabled.
Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reported-by: Jan Kara jack@suse.cz Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com
include/linux/fs.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 79c413985305..bd0c46880572 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1909,6 +1909,28 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ (inode)->i_rdev == WHITEOUT_DEV) +static inline bool IS_DEVDAX(struct inode *inode) +{
- if (!IS_ENABLED(CONFIG_DEV_DAX))
return false;
- if ((inode->i_flags & S_DAX) == 0)
return false;
- if (!S_ISCHR(inode->i_mode))
return false;
- return true;
+}
+static inline bool IS_FSDAX(struct inode *inode) +{
- if (!IS_ENABLED(CONFIG_FS_DAX))
return false;
I echo Jan's complaint from the last round that the dead code elimination here is subtle, as compared to:
#if IS_ENABLED(CONFIG_FS_DAX) static inline bool IS_FSDAX(struct inode *inode) { ... } #else # define IS_FSDAX(inode) (false) #endif
But I guess even with that we're relying on dead code elimination higher up in the call stack...
- if ((inode->i_flags & S_DAX) == 0)
return false;
- if (S_ISCHR(inode->i_mode))
return false;
I'm curious, do we have character devices with S_DAX set?
I /think/ we're expecting that only block/char devices and files will ever have S_DAX set, so IS_FSDAX is only true for block devices and files. Right?
(A comment here about why S_ISCHR->false here would be helpful.)
--D
- return true;
+}
static inline bool HAS_UNMAPPED_ID(struct inode *inode) { return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
-- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 2, 2018 at 9:45 AM, Darrick J. Wong darrick.wong@oracle.com wrote:
On Thu, Mar 01, 2018 at 07:53:44PM -0800, Dan Williams wrote:
The current IS_DAX() helper that checks if a file is in DAX mode serves two purposes. It is a control flow branch condition for DAX vs non-DAX paths and it is a mechanism to perform dead code elimination. The dead code elimination is required in the CONFIG_FS_DAX=n case since there are symbols in fs/dax.c that will be elided. While the dead code elimination can be addressed with nop stubs for the fs/dax.c symbols that does not address the need for a DAX control flow helper where fs/dax.c symbols are not involved.
Moreover, the control flow changes, in some cases, need to be cognizant of whether the DAX file is a typical file or a Device-DAX special file. Introduce IS_DEVDAX() and IS_FSDAX() to simultaneously address the file-type control flow and dead-code elimination use cases. IS_DAX() will be deleted after all sites are converted to use the file-type specific helper.
Note, this change is also a pre-requisite for fixing the definition of the S_DAX inode flag in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case. The flag needs to be defined, non-zero, if either DAX facility is enabled.
Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reported-by: Jan Kara jack@suse.cz Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com
include/linux/fs.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 79c413985305..bd0c46880572 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1909,6 +1909,28 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ (inode)->i_rdev == WHITEOUT_DEV)
+static inline bool IS_DEVDAX(struct inode *inode) +{
if (!IS_ENABLED(CONFIG_DEV_DAX))
return false;
if ((inode->i_flags & S_DAX) == 0)
return false;
if (!S_ISCHR(inode->i_mode))
return false;
return true;
+}
+static inline bool IS_FSDAX(struct inode *inode) +{
if (!IS_ENABLED(CONFIG_FS_DAX))
return false;
I echo Jan's complaint from the last round that the dead code elimination here is subtle, as compared to:
#if IS_ENABLED(CONFIG_FS_DAX) static inline bool IS_FSDAX(struct inode *inode) { ... } #else # define IS_FSDAX(inode) (false) #endif
But I guess even with that we're relying on dead code elimination higher up in the call stack...
If IS_FSDAX() was only a dead-code elimination mechanism rather than a runtime branch condition then I agree. Otherwise I think IS_ENABLED() is suitable and not subtle, especially when used in a header file.
if ((inode->i_flags & S_DAX) == 0)
return false;
if (S_ISCHR(inode->i_mode))
return false;
I'm curious, do we have character devices with S_DAX set?
Yes, Device-DAX, see:
ab68f2622136 /dev/dax, pmem: direct access to persistent memory
I /think/ we're expecting that only block/char devices and files will ever have S_DAX set, so IS_FSDAX is only true for block devices and files. Right?
We had S_DAX on block-devices for a short while, but deleted it and went with the Device-DAX interface instead. So it's only regular files and /dev/daxX.Y nodes these days.
(A comment here about why S_ISCHR->false here would be helpful.)
Ok.
The current IS_DAX() helper that checks if a file is in DAX mode serves two purposes. It is a control flow branch condition for DAX vs non-DAX paths and it is a mechanism to perform dead code elimination. The dead code elimination is required in the CONFIG_FS_DAX=n case since there are symbols in fs/dax.c that will be elided. While the dead code elimination can be addressed with nop stubs for the fs/dax.c symbols that does not address the need for a DAX control flow helper where fs/dax.c symbols are not involved.
Moreover, the control flow changes, in some cases, need to be cognizant of whether the DAX file is a typical file or a Device-DAX special file. Introduce IS_DEVDAX() and IS_FSDAX() to simultaneously address the file-type control flow and dead-code elimination use cases. IS_DAX() will be deleted after all sites are converted to use the file-type specific helper.
Note, this change is also a pre-requisite for fixing the definition of the S_DAX inode flag in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case. The flag needs to be defined, non-zero, if either DAX facility is enabled.
Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reported-by: Jan Kara jack@suse.cz Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- Changes since v5: * add comments to clarify the S_ISCHR() checks (Darrick)
include/linux/fs.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 79c413985305..751975b8b29b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1909,6 +1909,30 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ (inode)->i_rdev == WHITEOUT_DEV)
+static inline bool IS_DEVDAX(struct inode *inode) +{ + if (!IS_ENABLED(CONFIG_DEV_DAX)) + return false; + if ((inode->i_flags & S_DAX) == 0) + return false; + /* regular files with S_DAX are filesystem-dax instances */ + if (!S_ISCHR(inode->i_mode)) + return false; + return true; +} + +static inline bool IS_FSDAX(struct inode *inode) +{ + if (!IS_ENABLED(CONFIG_FS_DAX)) + return false; + if ((inode->i_flags & S_DAX) == 0) + return false; + /* character devices with S_DAX are device-dax instances */ + if (S_ISCHR(inode->i_mode)) + return false; + return true; +} + static inline bool HAS_UNMAPPED_ID(struct inode *inode) { return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
On Fri, Mar 02, 2018 at 11:06:36AM -0800, Dan Williams wrote:
The current IS_DAX() helper that checks if a file is in DAX mode serves two purposes. It is a control flow branch condition for DAX vs non-DAX paths and it is a mechanism to perform dead code elimination. The dead code elimination is required in the CONFIG_FS_DAX=n case since there are symbols in fs/dax.c that will be elided. While the dead code elimination can be addressed with nop stubs for the fs/dax.c symbols that does not address the need for a DAX control flow helper where fs/dax.c symbols are not involved.
Moreover, the control flow changes, in some cases, need to be cognizant of whether the DAX file is a typical file or a Device-DAX special file. Introduce IS_DEVDAX() and IS_FSDAX() to simultaneously address the file-type control flow and dead-code elimination use cases. IS_DAX() will be deleted after all sites are converted to use the file-type specific helper.
Note, this change is also a pre-requisite for fixing the definition of the S_DAX inode flag in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case. The flag needs to be defined, non-zero, if either DAX facility is enabled.
Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reported-by: Jan Kara jack@suse.cz Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com
Changes since v5:
- add comments to clarify the S_ISCHR() checks (Darrick)
Looks ok, Reviewed-by: Darrick J. Wong darrick.wong@oracle.com
--D
include/linux/fs.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 79c413985305..751975b8b29b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1909,6 +1909,30 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ (inode)->i_rdev == WHITEOUT_DEV) +static inline bool IS_DEVDAX(struct inode *inode) +{
- if (!IS_ENABLED(CONFIG_DEV_DAX))
return false;
- if ((inode->i_flags & S_DAX) == 0)
return false;
- /* regular files with S_DAX are filesystem-dax instances */
- if (!S_ISCHR(inode->i_mode))
return false;
- return true;
+}
+static inline bool IS_FSDAX(struct inode *inode) +{
- if (!IS_ENABLED(CONFIG_FS_DAX))
return false;
- if ((inode->i_flags & S_DAX) == 0)
return false;
- /* character devices with S_DAX are device-dax instances */
- if (S_ISCHR(inode->i_mode))
return false;
- return true;
+}
static inline bool HAS_UNMAPPED_ID(struct inode *inode) { return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid);
-- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
+static inline bool IS_DEVDAX(struct inode *inode) +{
- if (!IS_ENABLED(CONFIG_DEV_DAX))
return false;
- if ((inode->i_flags & S_DAX) == 0)
return false;
- if (!S_ISCHR(inode->i_mode))
return false;
- return true;
+}
+static inline bool IS_FSDAX(struct inode *inode) +{
- if (!IS_ENABLED(CONFIG_FS_DAX))
return false;
- if ((inode->i_flags & S_DAX) == 0)
return false;
- if (S_ISCHR(inode->i_mode))
return false;
- return true;
Encoding the is char device or not thing here is just nasty. I think this is going entirely in the wrong direction.
dax_sem_{up,down}_write_sem() allow the ext2 dax semaphore to be compiled out in the CONFIG_FS_DAX=n case. However there are still some open coded uses of the semaphore. Add dax_sem_{up_read,down_read}() and dax_sem_assert_held() helpers. Use them to convert all open-coded usages of the semaphore to the helpers.
Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/ext2/ext2.h | 6 ++++++ fs/ext2/file.c | 5 ++--- fs/ext2/inode.c | 4 +--- 3 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index 032295e1d386..203c31dfe549 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -711,9 +711,15 @@ struct ext2_inode_info { #ifdef CONFIG_FS_DAX #define dax_sem_down_write(ext2_inode) down_write(&(ext2_inode)->dax_sem) #define dax_sem_up_write(ext2_inode) up_write(&(ext2_inode)->dax_sem) +#define dax_sem_assert_held(ei) WARN_ON(!rwsem_is_locked(&(ei)->dax_sem)) +#define dax_sem_down_read(ext2_inode) down_read(&(ext2_inode)->dax_sem) +#define dax_sem_up_read(ext2_inode) up_read(&(ext2_inode)->dax_sem) #else #define dax_sem_down_write(ext2_inode) #define dax_sem_up_write(ext2_inode) +#define dax_sem_assert_held(ext2_inode) +#define dax_sem_down_read(ext2_inode) +#define dax_sem_up_read(ext2_inode) #endif
/* diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 09640220fda8..1c7ea1bcddde 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -91,18 +91,17 @@ static ssize_t ext2_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) static int ext2_dax_fault(struct vm_fault *vmf) { struct inode *inode = file_inode(vmf->vma->vm_file); - struct ext2_inode_info *ei = EXT2_I(inode); int ret;
if (vmf->flags & FAULT_FLAG_WRITE) { sb_start_pagefault(inode->i_sb); file_update_time(vmf->vma->vm_file); } - down_read(&ei->dax_sem); + dax_sem_down_read(EXT2_I(inode));
ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops);
- up_read(&ei->dax_sem); + dax_sem_up_read(EXT2_I(inode)); if (vmf->flags & FAULT_FLAG_WRITE) sb_end_pagefault(inode->i_sb); return ret; diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 9b2ac55ac34f..4783db0e4873 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -1187,9 +1187,7 @@ static void __ext2_truncate_blocks(struct inode *inode, loff_t offset) blocksize = inode->i_sb->s_blocksize; iblock = (offset + blocksize-1) >> EXT2_BLOCK_SIZE_BITS(inode->i_sb);
-#ifdef CONFIG_FS_DAX - WARN_ON(!rwsem_is_locked(&ei->dax_sem)); -#endif + dax_sem_assert_held(ei);
n = ext2_block_to_path(inode, iblock, offsets, NULL); if (n == 0)
In preparation for fixing S_DAX to be defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, move the definition of these routines outside of the "#ifdef CONFIG_FS_DAX" guard. This is also a coding-style fix to move all ifdef handling to header files rather than in the source. The compiler will still be able to determine that all the related code can be discarded in the CONFIG_FS_DAX=n case.
Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/ext2/file.c | 8 -------- include/linux/dax.h | 10 ++++++++-- 2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 1c7ea1bcddde..5ac98d074323 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -29,7 +29,6 @@ #include "xattr.h" #include "acl.h"
-#ifdef CONFIG_FS_DAX static ssize_t ext2_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = iocb->ki_filp->f_mapping->host; @@ -128,9 +127,6 @@ static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma) vma->vm_flags |= VM_MIXEDMAP; return 0; } -#else -#define ext2_file_mmap generic_file_mmap -#endif
/* * Called when filp is released. This happens when all file descriptors @@ -162,19 +158,15 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
static ssize_t ext2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { -#ifdef CONFIG_FS_DAX if (IS_DAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_read_iter(iocb, to); -#endif return generic_file_read_iter(iocb, to); }
static ssize_t ext2_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { -#ifdef CONFIG_FS_DAX if (IS_DAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_write_iter(iocb, from); -#endif return generic_file_write_iter(iocb, from); }
diff --git a/include/linux/dax.h b/include/linux/dax.h index 0185ecdae135..47edbce4fc52 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -93,8 +93,6 @@ void dax_flush(struct dax_device *dax_dev, void *addr, size_t size); void dax_write_cache(struct dax_device *dax_dev, bool wc); bool dax_write_cache_enabled(struct dax_device *dax_dev);
-ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, - const struct iomap_ops *ops); int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, pfn_t *pfnp, int *errp, const struct iomap_ops *ops); int dax_finish_sync_fault(struct vm_fault *vmf, enum page_entry_size pe_size, @@ -107,6 +105,8 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping, int __dax_zero_page_range(struct block_device *bdev, struct dax_device *dax_dev, sector_t sector, unsigned int offset, unsigned int length); +ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, + const struct iomap_ops *ops); #else static inline int __dax_zero_page_range(struct block_device *bdev, struct dax_device *dax_dev, sector_t sector, @@ -114,6 +114,12 @@ static inline int __dax_zero_page_range(struct block_device *bdev, { return -ENXIO; } + +static inline ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, + const struct iomap_ops *ops) +{ + return -ENXIO; +} #endif
static inline bool dax_mapping(struct address_space *mapping)
In preparation for fixing S_DAX to be defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, move the definition of these routines outside of the "#ifdef CONFIG_FS_DAX" guard. This is also a coding-style fix to move all ifdef handling to header files rather than in the source. The compiler will still be able to determine that all the related code can be discarded in the CONFIG_FS_DAX=n case.
Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/ext4/file.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index fb6f023622fe..51854e7608f0 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -34,7 +34,6 @@ #include "xattr.h" #include "acl.h"
-#ifdef CONFIG_FS_DAX static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = file_inode(iocb->ki_filp); @@ -60,7 +59,6 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) file_accessed(iocb->ki_filp); return ret; } -#endif
static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { @@ -70,10 +68,8 @@ static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (!iov_iter_count(to)) return 0; /* skip atime */
-#ifdef CONFIG_FS_DAX if (IS_DAX(file_inode(iocb->ki_filp))) return ext4_dax_read_iter(iocb, to); -#endif return generic_file_read_iter(iocb, to); }
@@ -179,7 +175,6 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) return iov_iter_count(from); }
-#ifdef CONFIG_FS_DAX static ssize_t ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) { @@ -208,7 +203,6 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) ret = generic_write_sync(iocb, ret); return ret; } -#endif
static ssize_t ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous.
Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/ext2/file.c | 6 +++--- fs/ext2/inode.c | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/ext2/file.c b/fs/ext2/file.c index 5ac98d074323..702a36df6c01 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -119,7 +119,7 @@ static const struct vm_operations_struct ext2_dax_vm_ops = {
static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma) { - if (!IS_DAX(file_inode(file))) + if (!IS_FSDAX(file_inode(file))) return generic_file_mmap(file, vma);
file_accessed(file); @@ -158,14 +158,14 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
static ssize_t ext2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { - if (IS_DAX(iocb->ki_filp->f_mapping->host)) + if (IS_FSDAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_read_iter(iocb, to); return generic_file_read_iter(iocb, to); }
static ssize_t ext2_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { - if (IS_DAX(iocb->ki_filp->f_mapping->host)) + if (IS_FSDAX(iocb->ki_filp->f_mapping->host)) return ext2_dax_write_iter(iocb, from); return generic_file_write_iter(iocb, from); } diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 4783db0e4873..5352207da9d5 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -733,7 +733,7 @@ static int ext2_get_blocks(struct inode *inode, goto cleanup; }
- if (IS_DAX(inode)) { + if (IS_FSDAX(inode)) { /* * We must unmap blocks before zeroing so that writeback cannot * overwrite zeros with stale data from block device page cache. @@ -940,7 +940,7 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter) loff_t offset = iocb->ki_pos; ssize_t ret;
- if (WARN_ON_ONCE(IS_DAX(inode))) + if (WARN_ON_ONCE(IS_FSDAX(inode))) return -EIO;
ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block); @@ -1294,7 +1294,7 @@ static int ext2_setsize(struct inode *inode, loff_t newsize)
inode_dio_wait(inode);
- if (IS_DAX(inode)) { + if (IS_FSDAX(inode)) { error = iomap_zero_range(inode, newsize, PAGE_ALIGN(newsize) - newsize, NULL, &ext2_iomap_ops);
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous.
Cc: "Theodore Ts'o" tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/ext4/file.c | 12 +++++------- fs/ext4/inode.c | 4 ++-- fs/ext4/ioctl.c | 2 +- fs/ext4/super.c | 2 +- 4 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 51854e7608f0..561ea843b458 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -48,7 +48,7 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) * Recheck under inode lock - at this point we are sure it cannot * change anymore */ - if (!IS_DAX(inode)) { + if (!IS_FSDAX(inode)) { inode_unlock_shared(inode); /* Fallback to buffered IO in case we cannot support DAX */ return generic_file_read_iter(iocb, to); @@ -68,7 +68,7 @@ static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to) if (!iov_iter_count(to)) return 0; /* skip atime */
- if (IS_DAX(file_inode(iocb->ki_filp))) + if (IS_FSDAX(file_inode(iocb->ki_filp))) return ext4_dax_read_iter(iocb, to); return generic_file_read_iter(iocb, to); } @@ -216,10 +216,8 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) return -EIO;
-#ifdef CONFIG_FS_DAX - if (IS_DAX(inode)) + if (IS_FSDAX(inode)) return ext4_dax_write_iter(iocb, from); -#endif if (!o_direct && (iocb->ki_flags & IOCB_NOWAIT)) return -EOPNOTSUPP;
@@ -361,11 +359,11 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma) * We don't support synchronous mappings for non-DAX files. At least * until someone comes with a sensible use case. */ - if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC)) + if (!IS_FSDAX(file_inode(file)) && (vma->vm_flags & VM_SYNC)) return -EOPNOTSUPP;
file_accessed(file); - if (IS_DAX(file_inode(file))) { + if (IS_FSDAX(file_inode(file))) { vma->vm_ops = &ext4_dax_vm_ops; vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE; } else { diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c94780075b04..1879b33aa391 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3858,7 +3858,7 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter) return 0;
/* DAX uses iomap path now */ - if (WARN_ON_ONCE(IS_DAX(inode))) + if (WARN_ON_ONCE(IS_FSDAX(inode))) return 0;
trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter)); @@ -4076,7 +4076,7 @@ static int ext4_block_zero_page_range(handle_t *handle, if (length > max || length < 0) length = max;
- if (IS_DAX(inode)) { + if (IS_FSDAX(inode)) { return iomap_zero_range(inode, from, length, NULL, &ext4_iomap_ops); } diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 7e99ad02f1ba..040fc6570ddb 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -790,7 +790,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) "Online defrag not supported with bigalloc"); err = -EOPNOTSUPP; goto mext_out; - } else if (IS_DAX(inode)) { + } else if (IS_FSDAX(inode)) { ext4_msg(sb, KERN_ERR, "Online defrag not supported with DAX"); err = -EOPNOTSUPP; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 39bf464c35f1..933e12940181 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1161,7 +1161,7 @@ static int ext4_set_context(struct inode *inode, const void *ctx, size_t len, if (inode->i_ino == EXT4_ROOT_INO) return -EPERM;
- if (WARN_ON_ONCE(IS_DAX(inode) && i_size_read(inode))) + if (WARN_ON_ONCE(IS_FSDAX(inode) && i_size_read(inode))) return -EINVAL;
res = ext4_convert_inline_data(inode);
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous.
Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/xfs/xfs_file.c | 14 +++++++------- fs/xfs/xfs_ioctl.c | 4 ++-- fs/xfs/xfs_iomap.c | 6 +++--- fs/xfs/xfs_reflink.c | 2 +- 4 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 9ea08326f876..46a098b90fd0 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -288,7 +288,7 @@ xfs_file_read_iter( if (XFS_FORCED_SHUTDOWN(mp)) return -EIO;
- if (IS_DAX(inode)) + if (IS_FSDAX(inode)) ret = xfs_file_dax_read(iocb, to); else if (iocb->ki_flags & IOCB_DIRECT) ret = xfs_file_dio_aio_read(iocb, to); @@ -726,7 +726,7 @@ xfs_file_write_iter( if (XFS_FORCED_SHUTDOWN(ip->i_mount)) return -EIO;
- if (IS_DAX(inode)) + if (IS_FSDAX(inode)) ret = xfs_file_dax_write(iocb, from); else if (iocb->ki_flags & IOCB_DIRECT) { /* @@ -1045,7 +1045,7 @@ __xfs_filemap_fault( }
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); - if (IS_DAX(inode)) { + if (IS_FSDAX(inode)) { pfn_t pfn;
ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops); @@ -1070,7 +1070,7 @@ xfs_filemap_fault( { /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, PE_SIZE_PTE, - IS_DAX(file_inode(vmf->vma->vm_file)) && + IS_FSDAX(file_inode(vmf->vma->vm_file)) && (vmf->flags & FAULT_FLAG_WRITE)); }
@@ -1079,7 +1079,7 @@ xfs_filemap_huge_fault( struct vm_fault *vmf, enum page_entry_size pe_size) { - if (!IS_DAX(file_inode(vmf->vma->vm_file))) + if (!IS_FSDAX(file_inode(vmf->vma->vm_file))) return VM_FAULT_FALLBACK;
/* DAX can shortcut the normal fault path on write faults! */ @@ -1124,12 +1124,12 @@ xfs_file_mmap( * We don't support synchronous mappings for non-DAX files. At least * until someone comes with a sensible use case. */ - if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC)) + if (!IS_FSDAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC)) return -EOPNOTSUPP;
file_accessed(filp); vma->vm_ops = &xfs_file_vm_ops; - if (IS_DAX(file_inode(filp))) + if (IS_FSDAX(file_inode(filp))) vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE; return 0; } diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 89fb1eb80aae..234279ff66ce 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1108,9 +1108,9 @@ xfs_ioctl_setattr_dax_invalidate( }
/* If the DAX state is not changing, we have nothing to do here. */ - if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode)) + if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_FSDAX(inode)) return 0; - if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode)) + if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_FSDAX(inode)) return 0;
/* lock, flush and invalidate mapping in preparation for flag change */ diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 66e1edbfb2b2..cf794d429aec 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -241,7 +241,7 @@ xfs_iomap_write_direct( * the reserve block pool for bmbt block allocation if there is no space * left but we need to do unwritten extent conversion. */ - if (IS_DAX(VFS_I(ip))) { + if (IS_FSDAX(VFS_I(ip))) { bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO; if (imap->br_state == XFS_EXT_UNWRITTEN) { tflags |= XFS_TRANS_RESERVE; @@ -952,7 +952,7 @@ static inline bool imap_needs_alloc(struct inode *inode, return !nimaps || imap->br_startblock == HOLESTARTBLOCK || imap->br_startblock == DELAYSTARTBLOCK || - (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN); + (IS_FSDAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN); }
static inline bool need_excl_ilock(struct xfs_inode *ip, unsigned flags) @@ -988,7 +988,7 @@ xfs_file_iomap_begin( return -EIO;
if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) && - !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) { + !IS_FSDAX(inode) && !xfs_get_extsz_hint(ip)) { /* Reserve delalloc blocks for regular writeback. */ return xfs_file_iomap_begin_delay(inode, offset, length, iomap); } diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 270246943a06..a126e00e05e3 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1351,7 +1351,7 @@ xfs_reflink_remap_range( goto out_unlock;
/* Don't share DAX file data for now. */ - if (IS_DAX(inode_in) || IS_DAX(inode_out)) + if (IS_FSDAX(inode_in) || IS_FSDAX(inode_out)) goto out_unlock;
ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out,
On Thu, Mar 01, 2018 at 07:54:16PM -0800, Dan Williams wrote:
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for FSDAX since DAX is ambiguous.
Cc: "Darrick J. Wong" darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com
Looks ok, Reviewed-by: Darrick J. Wong darrick.wong@oracle.com
--D
fs/xfs/xfs_file.c | 14 +++++++------- fs/xfs/xfs_ioctl.c | 4 ++-- fs/xfs/xfs_iomap.c | 6 +++--- fs/xfs/xfs_reflink.c | 2 +- 4 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 9ea08326f876..46a098b90fd0 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -288,7 +288,7 @@ xfs_file_read_iter( if (XFS_FORCED_SHUTDOWN(mp)) return -EIO;
- if (IS_DAX(inode))
- if (IS_FSDAX(inode)) ret = xfs_file_dax_read(iocb, to); else if (iocb->ki_flags & IOCB_DIRECT) ret = xfs_file_dio_aio_read(iocb, to);
@@ -726,7 +726,7 @@ xfs_file_write_iter( if (XFS_FORCED_SHUTDOWN(ip->i_mount)) return -EIO;
- if (IS_DAX(inode))
- if (IS_FSDAX(inode)) ret = xfs_file_dax_write(iocb, from); else if (iocb->ki_flags & IOCB_DIRECT) { /*
@@ -1045,7 +1045,7 @@ __xfs_filemap_fault( } xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
- if (IS_DAX(inode)) {
- if (IS_FSDAX(inode)) { pfn_t pfn;
ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops); @@ -1070,7 +1070,7 @@ xfs_filemap_fault( { /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, PE_SIZE_PTE,
IS_DAX(file_inode(vmf->vma->vm_file)) &&
IS_FSDAX(file_inode(vmf->vma->vm_file)) && (vmf->flags & FAULT_FLAG_WRITE));
} @@ -1079,7 +1079,7 @@ xfs_filemap_huge_fault( struct vm_fault *vmf, enum page_entry_size pe_size) {
- if (!IS_DAX(file_inode(vmf->vma->vm_file)))
- if (!IS_FSDAX(file_inode(vmf->vma->vm_file))) return VM_FAULT_FALLBACK;
/* DAX can shortcut the normal fault path on write faults! */ @@ -1124,12 +1124,12 @@ xfs_file_mmap( * We don't support synchronous mappings for non-DAX files. At least * until someone comes with a sensible use case. */
- if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
- if (!IS_FSDAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC)) return -EOPNOTSUPP;
file_accessed(filp); vma->vm_ops = &xfs_file_vm_ops;
- if (IS_DAX(file_inode(filp)))
- if (IS_FSDAX(file_inode(filp))) vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE; return 0;
} diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 89fb1eb80aae..234279ff66ce 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1108,9 +1108,9 @@ xfs_ioctl_setattr_dax_invalidate( } /* If the DAX state is not changing, we have nothing to do here. */
- if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
- if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_FSDAX(inode)) return 0;
- if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
- if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_FSDAX(inode)) return 0;
/* lock, flush and invalidate mapping in preparation for flag change */ diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 66e1edbfb2b2..cf794d429aec 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -241,7 +241,7 @@ xfs_iomap_write_direct( * the reserve block pool for bmbt block allocation if there is no space * left but we need to do unwritten extent conversion. */
- if (IS_DAX(VFS_I(ip))) {
- if (IS_FSDAX(VFS_I(ip))) { bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO; if (imap->br_state == XFS_EXT_UNWRITTEN) { tflags |= XFS_TRANS_RESERVE;
@@ -952,7 +952,7 @@ static inline bool imap_needs_alloc(struct inode *inode, return !nimaps || imap->br_startblock == HOLESTARTBLOCK || imap->br_startblock == DELAYSTARTBLOCK ||
(IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
(IS_FSDAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
} static inline bool need_excl_ilock(struct xfs_inode *ip, unsigned flags) @@ -988,7 +988,7 @@ xfs_file_iomap_begin( return -EIO; if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
/* Reserve delalloc blocks for regular writeback. */ return xfs_file_iomap_begin_delay(inode, offset, length, iomap); }!IS_FSDAX(inode) && !xfs_get_extsz_hint(ip)) {
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 270246943a06..a126e00e05e3 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1351,7 +1351,7 @@ xfs_reflink_remap_range( goto out_unlock; /* Don't share DAX file data for now. */
- if (IS_DAX(inode_in) || IS_DAX(inode_out))
- if (IS_FSDAX(inode_in) || IS_FSDAX(inode_out)) goto out_unlock;
ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out,
-- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all IS_DAX() usages to use explicit tests for the DEVDAX and FSDAX sub-cases of DAX functionality.
Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- include/linux/fs.h | 16 +++++++--------- mm/fadvise.c | 3 ++- mm/filemap.c | 4 ++-- mm/huge_memory.c | 4 +++- mm/madvise.c | 3 ++- 5 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h index bd0c46880572..33e859e7d100 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3208,21 +3208,19 @@ static inline bool io_is_direct(struct file *filp)
static inline bool vma_is_dax(struct vm_area_struct *vma) { - return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host); + struct inode *inode; + + if (!vma->vm_file) + return false; + inode = file_inode(vma->vm_file); + return IS_FSDAX(inode) || IS_DEVDAX(inode); }
static inline bool vma_is_fsdax(struct vm_area_struct *vma) { - struct inode *inode; - if (!vma->vm_file) return false; - if (!vma_is_dax(vma)) - return false; - inode = file_inode(vma->vm_file); - if (S_ISCHR(inode->i_mode)) - return false; /* device-dax */ - return true; + return IS_FSDAX(file_inode(vma->vm_file)); }
static inline int iocb_flags(struct file *file) diff --git a/mm/fadvise.c b/mm/fadvise.c index 767887f5f3bf..00d9317636a2 100644 --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -55,7 +55,8 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
bdi = inode_to_bdi(mapping->host);
- if (IS_DAX(inode) || (bdi == &noop_backing_dev_info)) { + if (IS_FSDAX(inode) || IS_DEVDAX(inode) + || (bdi == &noop_backing_dev_info)) { switch (advice) { case POSIX_FADV_NORMAL: case POSIX_FADV_RANDOM: diff --git a/mm/filemap.c b/mm/filemap.c index 693f62212a59..4bc4e067ebf2 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2357,7 +2357,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) * DAX files, so don't bother trying. */ if (retval < 0 || !count || iocb->ki_pos >= size || - IS_DAX(inode)) + IS_FSDAX(inode)) goto out; }
@@ -3225,7 +3225,7 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) * not succeed (even if it did, DAX does not handle dirty * page-cache pages correctly). */ - if (written < 0 || !iov_iter_count(from) || IS_DAX(inode)) + if (written < 0 || !iov_iter_count(from) || IS_FSDAX(inode)) goto out;
status = generic_perform_write(file, from, pos = iocb->ki_pos); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 87ab9b8f56b5..ed238936e29b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -529,10 +529,12 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) { loff_t off = (loff_t)pgoff << PAGE_SHIFT; + struct inode *inode;
if (addr) goto out; - if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD)) + inode = filp->f_mapping->host; + if (!IS_FSDAX(inode) || !IS_ENABLED(CONFIG_FS_DAX_PMD)) goto out;
addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE); diff --git a/mm/madvise.c b/mm/madvise.c index 4d3c922ea1a1..bdb83cf018b1 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -275,6 +275,7 @@ static long madvise_willneed(struct vm_area_struct *vma, unsigned long start, unsigned long end) { struct file *file = vma->vm_file; + struct inode *inode = file_inode(file);
*prev = vma; #ifdef CONFIG_SWAP @@ -293,7 +294,7 @@ static long madvise_willneed(struct vm_area_struct *vma, return -EBADF; #endif
- if (IS_DAX(file_inode(file))) { + if (IS_FSDAX(inode) || IS_DEVDAX(inode)) { /* no bad return value, but ignore advice */ return 0; }
On Thu, Mar 01, 2018 at 07:54:22PM -0800, Dan Williams wrote:
static inline bool vma_is_dax(struct vm_area_struct *vma) {
- return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
- struct inode *inode;
- if (!vma->vm_file)
return false;
- inode = file_inode(vma->vm_file);
- return IS_FSDAX(inode) || IS_DEVDAX(inode);
If you look at the definition of IS_FSDAX and IS_DEVDAX this is going to evaluate into some bullshit code.
In preparation for fixing the broken definition of S_DAX in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case, convert all the remaining IS_DAX() usages to use explicit tests for FSDAX.
Cc: Matthew Wilcox mawilcox@microsoft.com Cc: Ross Zwisler ross.zwisler@linux.intel.com Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- fs/iomap.c | 2 +- include/linux/dax.h | 2 +- include/linux/fs.h | 3 +-- 3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/iomap.c b/fs/iomap.c index afd163586aa0..fe379d8949fd 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -377,7 +377,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, offset = pos & (PAGE_SIZE - 1); /* Within page */ bytes = min_t(loff_t, PAGE_SIZE - offset, count);
- if (IS_DAX(inode)) + if (IS_FSDAX(inode)) status = iomap_dax_zero(pos, offset, bytes, iomap); else status = iomap_zero(inode, pos, offset, bytes, iomap); diff --git a/include/linux/dax.h b/include/linux/dax.h index 47edbce4fc52..ce520e932adc 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -124,7 +124,7 @@ static inline ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
static inline bool dax_mapping(struct address_space *mapping) { - return mapping->host && IS_DAX(mapping->host); + return mapping->host && IS_FSDAX(mapping->host); }
struct writeback_control; diff --git a/include/linux/fs.h b/include/linux/fs.h index 33e859e7d100..b2b2e15d227b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1903,7 +1903,6 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags #define IS_IMA(inode) ((inode)->i_flags & S_IMA) #define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT) #define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC) -#define IS_DAX(inode) ((inode)->i_flags & S_DAX) #define IS_ENCRYPTED(inode) ((inode)->i_flags & S_ENCRYPTED)
#define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ @@ -3203,7 +3202,7 @@ extern int file_update_time(struct file *file);
static inline bool io_is_direct(struct file *filp) { - return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host); + return (filp->f_flags & O_DIRECT) || IS_FSDAX(filp->f_mapping->host); }
static inline bool vma_is_dax(struct vm_area_struct *vma)
Make sure S_DAX is defined in the CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case. Otherwise vma_is_dax() may incorrectly return false in the Device-DAX case.
Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig hch@lst.de Cc: stable@vger.kernel.org Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Dan Williams dan.j.williams@intel.com --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h index b2b2e15d227b..1242511b1c46 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1859,7 +1859,7 @@ struct super_operations { #define S_IMA 1024 /* Inode has an associated IMA struct */ #define S_AUTOMOUNT 2048 /* Automount/referral quasi-directory */ #define S_NOSEC 4096 /* no suid or xattr security attributes */ -#ifdef CONFIG_FS_DAX +#if IS_ENABLED(CONFIG_FS_DAX) || IS_ENABLED(CONFIG_DEV_DAX) #define S_DAX 8192 /* Direct Access, avoiding the page cache */ #else #define S_DAX 0 /* Make all the DAX code disappear */
Filesystem-DAX is incompatible with 'longterm' page pinning. Without page cache indirection a DAX mapping maps filesystem blocks directly. This means that the filesystem must not modify a file's block map while any page in a mapping is pinned. In order to prevent the situation of userspace holding of filesystem operations indefinitely, disallow 'longterm' Filesystem-DAX mappings.
RDMA has the same conflict and the plan there is to add a 'with lease' mechanism to allow the kernel to notify userspace that the mapping is being torn down for block-map maintenance. Perhaps something similar can be put in place for vfio.
Note that xfs and ext4 still report:
"DAX enabled. Warning: EXPERIMENTAL, use at your own risk"
...at mount time, and resolving the dax-dma-vs-truncate problem is one of the last hurdles to remove that designation.
Acked-by: Alex Williamson alex.williamson@redhat.com Cc: Michal Hocko mhocko@suse.com Cc: Christoph Hellwig hch@lst.de Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Reported-by: Haozhong Zhang haozhong.zhang@intel.com Tested-by: Haozhong Zhang haozhong.zhang@intel.com Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O") Signed-off-by: Dan Williams dan.j.williams@intel.com --- drivers/vfio/vfio_iommu_type1.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index e30e29ae4819..45657e2b1ff7 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, { struct page *page[1]; struct vm_area_struct *vma; + struct vm_area_struct *vmas[1]; int ret;
if (mm == current->mm) { - ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE), - page); + ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE), + page, vmas); } else { unsigned int flags = 0;
@@ -351,7 +352,18 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
down_read(&mm->mmap_sem); ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, - NULL, NULL); + vmas, NULL); + /* + * The lifetime of a vaddr_get_pfn() page pin is + * userspace-controlled. In the fs-dax case this could + * lead to indefinite stalls in filesystem operations. + * Disallow attempts to pin fs-dax pages via this + * interface. + */ + if (ret > 0 && vma_is_fsdax(vmas[0])) { + ret = -EOPNOTSUPP; + put_page(page[0]); + } up_read(&mm->mmap_sem); }
I really don't like these IS_DEVDAX and IS_FSDAX flags. We should stop pretending DAX is a global per-inode choice and get rid of these magic flags entirely. So please convert the instances inside the various file systems to checking the file system mount options instead.
For the core ones we'll need to differentiate:
- the checks in generic_file_read_iter and __generic_file_write_iter seem to not be needed anymore at all since we stopped abusing the direct I/O code for DAX, so they should probably be removed. - io_is_direct is a weird check and should probably just go away, as there is not point in always setting IOCB_DIRECT for DAX I/O - fadvise should either become a file op, or a flag on the inode that fadvice is supported instead of the nasty noop_backing_dev_info or DAX check. - Ditto for madvise - vma_is_dax should probably be replaced with a VMA flag. - thp_get_unmapped_area I don't really understand why we have a dax check there. - dax_mapping will be much harder to sort out.
But all these DAX flags certainly look like a major hodge podge to me.
On Fri, Mar 2, 2018 at 2:10 PM, Christoph Hellwig hch@lst.de wrote:
I really don't like these IS_DEVDAX and IS_FSDAX flags. We should stop pretending DAX is a global per-inode choice and get rid of these magic flags entirely. So please convert the instances inside the various file systems to checking the file system mount options instead.
For the core ones we'll need to differentiate:
- the checks in generic_file_read_iter and __generic_file_write_iter seem to not be needed anymore at all since we stopped abusing the direct I/O code for DAX, so they should probably be removed.
- io_is_direct is a weird check and should probably just go away, as there is not point in always setting IOCB_DIRECT for DAX I/O
- fadvise should either become a file op, or a flag on the inode that fadvice is supported instead of the nasty noop_backing_dev_info or DAX check.
- Ditto for madvise
- vma_is_dax should probably be replaced with a VMA flag.
- thp_get_unmapped_area I don't really understand why we have a dax check there.
- dax_mapping will be much harder to sort out.
But all these DAX flags certainly look like a major hodge podge to me.
They are indeed a hodge-podge. The problem is that the current IS_DAX() is broken. So I'd like to propose fixing IS_DAX() with IS_FSDAX() + IS_DEVDAX() for 4.16-rc4 and queue up these wider reworks you propose for the next merge window.
Acceptable?
On Fri, Mar 02, 2018 at 02:21:40PM -0800, Dan Williams wrote:
They are indeed a hodge-podge. The problem is that the current IS_DAX() is broken. So I'd like to propose fixing IS_DAX() with IS_FSDAX() + IS_DEVDAX() for 4.16-rc4 and queue up these wider reworks you propose for the next merge window.
The only thing broken about IS_DAX are the code elimination games based on the CONFIG_* flags. Remove those and just add proper stubs for the dax routines and everything will be fine for now until we can kill that inode flag.
IS_FSDAX and IS_DEVDAX on the other hand are a giant mess that isn't helping anyone.
On Fri, Mar 2, 2018 at 2:57 PM, Christoph Hellwig hch@lst.de wrote:
On Fri, Mar 02, 2018 at 02:21:40PM -0800, Dan Williams wrote:
They are indeed a hodge-podge. The problem is that the current IS_DAX() is broken. So I'd like to propose fixing IS_DAX() with IS_FSDAX() + IS_DEVDAX() for 4.16-rc4 and queue up these wider reworks you propose for the next merge window.
The only thing broken about IS_DAX are the code elimination games based on the CONFIG_* flags. Remove those and just add proper stubs for the dax routines and everything will be fine for now until we can kill that inode flag.
IS_FSDAX and IS_DEVDAX on the other hand are a giant mess that isn't helping anyone.
Ok, I'll take another shot at something suitable for 4.16, but without these new helpers...
On Fri, Mar 2, 2018 at 3:49 PM, Dan Williams dan.j.williams@intel.com wrote:
On Fri, Mar 2, 2018 at 2:57 PM, Christoph Hellwig hch@lst.de wrote:
On Fri, Mar 02, 2018 at 02:21:40PM -0800, Dan Williams wrote:
They are indeed a hodge-podge. The problem is that the current IS_DAX() is broken. So I'd like to propose fixing IS_DAX() with IS_FSDAX() + IS_DEVDAX() for 4.16-rc4 and queue up these wider reworks you propose for the next merge window.
The only thing broken about IS_DAX are the code elimination games based on the CONFIG_* flags. Remove those and just add proper stubs for the dax routines and everything will be fine for now until we can kill that inode flag.
IS_FSDAX and IS_DEVDAX on the other hand are a giant mess that isn't helping anyone.
Ok, I'll take another shot at something suitable for 4.16, but without these new helpers...
I'll drop patches 2-11 for now, and just get the high priority fixes in for the next rc.
linux-stable-mirror@lists.linaro.org