This is part of a longer set of changes to clean up the last remaining bits for the y2038 conversion. In XFS, three distinct problems need to be addressed:
1. The use of time_t in kernel sources -- I'm in the process of removing all of them so we can remove the definition itself, making it harder to write new y2038-unsafe code. This part is trivially done as a side-effect of the other two.
2. The use of time_t in a user API header for ioctls. When building against a new 32-bit libc with 64-bit time_t, the structures no longer match and we get incorrect data from ioctls. Unfortunately, there is no good way to fix XFS_IOC_FSBULKSTAT. I considered different approaches and in the end came up with three variants that are all part of this series. The idea is to pick just one, of course.
3. On-disk timestamps hitting the y2038 limit. This applies to both inode timestamps and quota data. Both are extended to 40 bits, with the minimum timestamp still being year 1902, and the maximum getting extended from 2038 to 36744.
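The quoted limits can be checked with a little arithmetic. The sketch below assumes one encoding that is consistent with them: the existing signed 32-bit seconds field plus a hypothetical unsigned 8-bit extension. The actual on-disk layout is defined by the later patches in the series and may differ; all names here are illustrative.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patches): combine a signed 32-bit
 * seconds field with an assumed unsigned 8-bit extension into one
 * 64-bit seconds value.
 */
static int64_t ts40_decode(int32_t lo, uint8_t hi)
{
	return (int64_t)lo + ((int64_t)hi << 32);
}

/* Smallest and largest representable values under this assumption. */
static int64_t ts40_min(void)
{
	return ts40_decode(INT32_MIN, 0);
}

static int64_t ts40_max(void)
{
	return ts40_decode(INT32_MAX, UINT8_MAX);
}

/* Rough calendar year, using the mean Gregorian year of 31556952 s. */
static int64_t ts40_approx_year(int64_t sec)
{
	return 1970 + sec / 31556952;
}
```

With this layout the lower bound stays at the 32-bit minimum, and the upper bound moves to 2^40 - 2^31 - 1 seconds after the epoch, which lands in the year 36744.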
Please review and let me know which of the ioctl API changes makes the most sense. I have not done any actual runtime testing on the patches, so this is clearly too late for the next merge window, but I hope to get it all merged for v5.6.
Arnd
Arnd Bergmann (5):
  xfs: [variant A] avoid time_t in user api
  xfs: [variant B] add time64 version of xfs_bstat
  xfs: [variant C] avoid i386-misaligned xfs_bstat
  xfs: extend inode format for 40-bit timestamps
  xfs: use 40-bit quota time limits

 fs/xfs/libxfs/xfs_dquot_buf.c   |   6 +-
 fs/xfs/libxfs/xfs_format.h      |  11 +-
 fs/xfs/libxfs/xfs_fs.h          |  37 +++++-
 fs/xfs/libxfs/xfs_inode_buf.c   |  28 +++--
 fs/xfs/libxfs/xfs_inode_buf.h   |   1 +
 fs/xfs/libxfs/xfs_log_format.h  |   6 +-
 fs/xfs/libxfs/xfs_trans_inode.c |   3 +-
 fs/xfs/xfs_dquot.c              |  29 +++--
 fs/xfs/xfs_inode.c              |   3 +-
 fs/xfs/xfs_inode_item.c         |  10 +-
 fs/xfs/xfs_ioctl.c              | 195 +++++++++++++++++++++++++++++++-
 fs/xfs/xfs_ioctl.h              |  12 ++
 fs/xfs/xfs_ioctl32.c            | 160 +++++---------------------
 fs/xfs/xfs_ioctl32.h            |  26 ++---
 fs/xfs/xfs_iops.c               |   3 +-
 fs/xfs/xfs_itable.c             |   2 +-
 fs/xfs/xfs_qm.c                 |  18 ++-
 fs/xfs/xfs_qm.h                 |   6 +-
 fs/xfs/xfs_qm_syscalls.c        |  16 ++-
 fs/xfs/xfs_quotaops.c           |   6 +-
 fs/xfs/xfs_super.c              |   2 +-
 fs/xfs/xfs_trans_dquot.c        |  17 ++-
 22 files changed, 387 insertions(+), 210 deletions(-)
The ioctl definitions for XFS_IOC_SWAPEXT, XFS_IOC_FSBULKSTAT and XFS_IOC_FSBULKSTAT_SINGLE are part of libxfs and based on time_t.
The definition for time_t differs between current kernels and coming 32-bit libc variants that define it as 64-bit. For most ioctls, that means the kernel has to be able to handle two different command codes based on the different structure sizes.
The same solution could be applied for XFS_IOC_SWAPEXT, but it would not work for XFS_IOC_FSBULKSTAT and XFS_IOC_FSBULKSTAT_SINGLE because the structure with the time_t is passed through an indirect pointer, and the command number itself is based on struct xfs_fsop_bulkreq, which does not differ based on time_t.
This means any solution that can be applied requires a change of the ABI definition in the xfs_fs.h header file, as well as doing the same change in any user application that contains a copy of this header.
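Why the command number helps for XFS_IOC_SWAPEXT but not for the bulkstat calls can be sketched as follows. DEMO_IOWR is a simplified model of the asm-generic _IOWR() layout, and every struct here is a hypothetical stand-in, not the real XFS definition.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of the asm-generic _IOWR() layout: direction in the
 * top 2 bits, argument size in the next 14, then type and number.
 */
#define DEMO_IOWR(type, nr, argtype) \
	((3u << 30) | ((uint32_t)sizeof(argtype) << 16) | \
	 ((uint32_t)(type) << 8) | (uint32_t)(nr))

/* Hypothetical stat records with 32-bit and 64-bit seconds. */
struct demo_stat32 { int32_t tv_sec; int32_t tv_nsec; };
struct demo_stat64 { int64_t tv_sec; int32_t tv_nsec; };

/* Embedding the record changes the argument size, hence the command. */
struct demo_swap32 { int64_t version; struct demo_stat32 st; };
struct demo_swap64 { int64_t version; struct demo_stat64 st; };

/* Passing the record by pointer hides its size from the command code. */
struct demo_bulkreq { void *ubuffer; int32_t icount; };

static uint32_t demo_swap_cmd32(void)
{
	return DEMO_IOWR('X', 109, struct demo_swap32);
}

static uint32_t demo_swap_cmd64(void)
{
	return DEMO_IOWR('X', 109, struct demo_swap64);
}

static uint32_t demo_bulk_cmd(void)
{
	return DEMO_IOWR('X', 101, struct demo_bulkreq);
}
```

The two swap commands differ, so a kernel could tell the layouts apart. The bulk request command depends only on the request structure, whichever record layout the buffer behind ubuffer uses.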
The usual solution would be to define a replacement structure and use conditional compilation for the ioctl command codes to use one or the other, such as
#define XFS_IOC_FSBULKSTAT_OLD _IOWR('X', 101, struct xfs_fsop_bulkreq)
#define XFS_IOC_FSBULKSTAT_NEW _IOWR('X', 129, struct xfs_fsop_bulkreq)
#define XFS_IOC_FSBULKSTAT ((sizeof(time_t) == sizeof(__kernel_long_t)) ? \
			    XFS_IOC_FSBULKSTAT_OLD : XFS_IOC_FSBULKSTAT_NEW)
After this, the kernel would be able to implement both XFS_IOC_FSBULKSTAT_OLD and XFS_IOC_FSBULKSTAT_NEW handlers on 32-bit architectures with the correct ABI for either definition of time_t.
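A minimal user-space sketch of that scheme, with plain numbers in place of the real command codes and `long` standing in for __kernel_long_t (both are assumptions made for illustration):

```c
#include <time.h>

/* Hypothetical command numbers, mirroring 101 and 129 from the text. */
#define DEMO_FSBULKSTAT_OLD 101u
#define DEMO_FSBULKSTAT_NEW 129u

/* User space picks one at compile time, based on the libc's time_t. */
#define DEMO_FSBULKSTAT \
	((sizeof(time_t) == sizeof(long)) ? \
	 DEMO_FSBULKSTAT_OLD : DEMO_FSBULKSTAT_NEW)

enum demo_layout {
	DEMO_LAYOUT_LONG,	/* tv_sec is a native long */
	DEMO_LAYOUT_TIME64,	/* tv_sec is 64-bit everywhere */
	DEMO_LAYOUT_UNKNOWN,
};

/* Kernel-side view: each command number implies one record layout. */
static enum demo_layout demo_layout_for(unsigned int nr)
{
	switch (nr) {
	case DEMO_FSBULKSTAT_OLD:
		return DEMO_LAYOUT_LONG;
	case DEMO_FSBULKSTAT_NEW:
		return DEMO_LAYOUT_TIME64;
	default:
		return DEMO_LAYOUT_UNKNOWN;
	}
}
```

An application built against a libc with 64-bit time_t on a 32-bit architecture would then issue the NEW command without any source change.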
However, as long as two observations are true, a much simpler solution can be used:
1. xfsprogs is the only user space project that has a copy of this header
2. xfsprogs already has a replacement for all three affected ioctl commands, based on the xfs_bulkstat structure to pass 64-bit timestamps regardless of the architecture
Based on those assumptions, changing xfs_bstime to use __kernel_long_t instead of time_t in both the kernel and in xfsprogs preserves the current ABI for any libc definition of time_t and solves the problem of passing 64-bit timestamps to 32-bit user space.
If either of the two assumptions is invalid, more discussion is needed for coming up with a way to fix as much of the affected user space code as possible.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 fs/xfs/libxfs/xfs_fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index e9371a8e0e26..4c4330f6e653 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -324,7 +324,7 @@ typedef struct xfs_growfs_rt {
  * Structures returned from ioctl XFS_IOC_FSBULKSTAT & XFS_IOC_FSBULKSTAT_SINGLE
  */
 typedef struct xfs_bstime {
-	time_t		tv_sec;		/* seconds		*/
+	__kernel_long_t tv_sec;		/* seconds		*/
 	__s32		tv_nsec;	/* and nanoseconds	*/
 } xfs_bstime_t;
On Tue, Nov 12, 2019 at 01:09:06PM +0100, Arnd Bergmann wrote:
However, as long as two observations are true, a much simpler solution can be used:
- xfsprogs is the only user space project that has a copy of this header
We can't guarantee that.
- xfsprogs already has a replacement for all three affected ioctl commands, based on the xfs_bulkstat structure to pass 64-bit timestamps regardless of the architecture
XFS_IOC_BULKSTAT replaces XFS_IOC_FSBULKSTAT directly, and can replace XFS_IOC_FSBULKSTAT_SINGLE indirectly, so that is easy. Most users actually use the new one now through libfrog, although I found a user of the direct ioctl in the xfs_io tool, which could easily be fixed as well.
XFS_IOC_SWAPEXT does not have a direct replacement, but the timestamp is only used to verify that the file did not change vs the previous stat. So not being able to represent > 2038 times is not a real problem anyway.
At some point we should probably look into a file system independent defrag ioctl anyway, at which point we can deprecate XFS_IOC_SWAPEXT.
Based on those assumptions, changing xfs_bstime to use __kernel_long_t instead of time_t in both the kernel and in xfsprogs preserves the current ABI for any libc definition of time_t and solves the problem of passing 64-bit timestamps to 32-bit user space.
As said above, they are not entirely true, but I still think this patch is the right thing to do, if only to get the time_t out of the ABI.
Reviewed-by: Christoph Hellwig <hch@lst.de>
On Tue, Nov 12, 2019 at 03:16:00PM +0100, Christoph Hellwig wrote:
On Tue, Nov 12, 2019 at 01:09:06PM +0100, Arnd Bergmann wrote:
However, as long as two observations are true, a much simpler solution can be used:
- xfsprogs is the only user space project that has a copy of this header
We can't guarantee that.
- xfsprogs already has a replacement for all three affected ioctl commands, based on the xfs_bulkstat structure to pass 64-bit timestamps regardless of the architecture
XFS_IOC_BULKSTAT replaces XFS_IOC_FSBULKSTAT directly, and can replace XFS_IOC_FSBULKSTAT_SINGLE indirectly, so that is easy. Most users actually use the new one now through libfrog, although I found a user of the direct ioctl in the xfs_io tool, which could easily be fixed as well.
Agreed, XFS_IOC_BULKSTAT is the replacement for the two FSBULKSTAT variants. The only question in my mind for the old ioctls is whether we should return EOVERFLOW if the timestamp would overflow? Or just truncate the results?
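The two options for the old ioctls could look roughly like this. It is a user-space sketch with hypothetical helper names, not code from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Option 1: refuse seconds values that do not fit into 32 bits. */
static int demo_sec_to_s32_checked(int64_t sec, int32_t *out)
{
	if (sec < INT32_MIN || sec > INT32_MAX)
		return -EOVERFLOW;
	*out = (int32_t)sec;
	return 0;
}

/*
 * Option 2: silently keep the low 32 bits. The final conversion to
 * int32_t is implementation-defined in ISO C; GCC and Clang wrap
 * modulo 2^32.
 */
static int32_t demo_sec_to_s32_truncated(int64_t sec)
{
	return (int32_t)(uint32_t)sec;
}
```

The checked variant fails loudly only once a post-2038 timestamp is actually stored on disk. The truncating variant never fails and returns a wrong time instead.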
XFS_IOC_SWAPEXT does not have a direct replacement, but the timestamp is only used to verify that the file did not change vs the previous stat. So not being able to represent > 2038 times is not a real problem anyway.
Won't it become a problem when the tv_sec comparison in xfs_swap_extents is type-promoted to the larger type but lacks the upper bits?
I guess we could force the check to the lower 32 bits on the assumption that those are the most likely to change due to a file write.
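A small sketch of the difference between the promoted comparison and one forced to the lower 32 bits. The helper names are hypothetical; the real check lives in xfs_swap_extents and is not shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive check: the 32-bit user value is sign-extended to 64 bits. */
static bool demo_sec_equal_promoted(int64_t kernel_sec, int32_t user_sec)
{
	return kernel_sec == (int64_t)user_sec;
}

/* Alternative: compare only the low 32 bits of both values. */
static bool demo_sec_equal_low32(int64_t kernel_sec, int32_t user_sec)
{
	return (uint32_t)kernel_sec == (uint32_t)user_sec;
}
```

For an unchanged file whose ctime is past the 32-bit limit, the promoted check reports a mismatch although nothing was modified, while the low-32 check still matches. A write usually changes the low bits, so the low-32 check would still catch it.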
I kinda want to do a SWAPEXT v5, fwiw....
At some point we should probably look into a file system independent defrag ioctl anyway, at which point we can deprecate XFS_IOC_SWAPEXT.
Based on those assumptions, changing xfs_bstime to use __kernel_long_t instead of time_t in both the kernel and in xfsprogs preserves the current ABI for any libc definition of time_t and solves the problem of passing 64-bit timestamps to 32-bit user space.
As said above, they are not entirely true, but I still think this patch is the right thing to do, if only to get the time_t out of the ABI.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Seconded, Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
On Wed, Nov 13, 2019 at 6:08 AM Darrick J. Wong darrick.wong@oracle.com wrote:
On Tue, Nov 12, 2019 at 03:16:00PM +0100, Christoph Hellwig wrote:
On Tue, Nov 12, 2019 at 01:09:06PM +0100, Arnd Bergmann wrote:
However, as long as two observations are true, a much simpler solution can be used:
- xfsprogs is the only user space project that has a copy of this header
We can't guarantee that.
- xfsprogs already has a replacement for all three affected ioctl commands, based on the xfs_bulkstat structure to pass 64-bit timestamps regardless of the architecture
XFS_IOC_BULKSTAT replaces XFS_IOC_FSBULKSTAT directly, and can replace XFS_IOC_FSBULKSTAT_SINGLE indirectly, so that is easy. Most users actually use the new one now through libfrog, although I found a user of the direct ioctl in the xfs_io tool, which could easily be fixed as well.
Agreed, XFS_IOC_BULKSTAT is the replacement for the two FSBULKSTAT variants. The only question in my mind for the old ioctls is whether we should return EOVERFLOW if the timestamp would overflow? Or just truncate the results?
I think neither of these would be particularly helpful: the result is that users see no change in behavior until it's actually too late and the timestamps have overrun.
If we take variant A and just fix the ABI to 32-bit time_t, it is important that all user space stops using these ioctls and moves to the v5 interfaces instead (including SWAPEXT, I guess).
Something along the lines of this change would work:
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d50135760622..87318486c96e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -830,6 +830,23 @@ xfs_fsinumbers_fmt(
 	return xfs_ibulk_advance(breq, sizeof(struct xfs_inogrp));
 }
 
+/* disallow y2038-unsafe ioctls with CONFIG_COMPAT_32BIT_TIME=n */
+static bool xfs_have_compat_bstat_time32(unsigned int cmd)
+{
+	if (IS_ENABLED(CONFIG_COMPAT_32BIT_TIME))
+		return true;
+
+	if (IS_ENABLED(CONFIG_64BIT) && !in_compat_syscall())
+		return true;
+
+	if (cmd == XFS_IOC_FSBULKSTAT_SINGLE ||
+	    cmd == XFS_IOC_FSBULKSTAT ||
+	    cmd == XFS_IOC_SWAPEXT)
+		return false;
+
+	return true;
+}
+
 STATIC int
 xfs_ioc_fsbulkstat(
 	xfs_mount_t		*mp,
@@ -850,6 +867,9 @@ xfs_ioc_fsbulkstat(
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (!xfs_have_compat_bstat_time32(cmd))
+		return -EINVAL;
+
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
@@ -2051,6 +2071,11 @@ xfs_ioc_swapext(
 	struct fd	f, tmp;
 	int		error = 0;
 
+	if (!xfs_have_compat_bstat_time32(XFS_IOC_SWAPEXT)) {
+		error = -EINVAL;
+		goto out;
+	}
+
 	/* Pull information for the target fd */
 	f = fdget((int)sxp->sx_fdtarget);
 	if (!f.file) {
This way, at least users that intentionally turn off CONFIG_COMPAT_32BIT_TIME run into the broken application right away, which forces them to upgrade or fix the code to use the v5 ioctl.
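The decision made by that helper can be modelled in user space. Plain booleans stand in for the kernel's CONFIG_* checks and in_compat_syscall(), and all names are hypothetical.

```c
#include <stdbool.h>

enum demo_cmd {
	DEMO_CMD_FSBULKSTAT,
	DEMO_CMD_FSBULKSTAT_SINGLE,
	DEMO_CMD_SWAPEXT,
	DEMO_CMD_OTHER,
};

/* Stand-ins for compile-time and per-call kernel state. */
struct demo_env {
	bool compat_32bit_time;	/* models CONFIG_COMPAT_32BIT_TIME */
	bool kernel_64bit;	/* models CONFIG_64BIT */
	bool compat_syscall;	/* models in_compat_syscall() */
};

/* Returns true if the (possibly y2038-unsafe) command may proceed. */
static bool demo_have_time32(const struct demo_env *env, enum demo_cmd cmd)
{
	if (env->compat_32bit_time)
		return true;
	if (env->kernel_64bit && !env->compat_syscall)
		return true;
	/* Only the three time_t-based commands are rejected. */
	return cmd == DEMO_CMD_OTHER;
}
```

Native 64-bit callers are never affected. The old commands are refused only for 32-bit tasks on a kernel built without 32-bit time_t compatibility.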
Based on those assumptions, changing xfs_bstime to use __kernel_long_t instead of time_t in both the kernel and in xfsprogs preserves the current ABI for any libc definition of time_t and solves the problem of passing 64-bit timestamps to 32-bit user space.
As said above, they are not entirely true, but I still think this patch is the right thing to do, if only to get the time_t out of the ABI.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Seconded, Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Thanks!
Arnd
On Wed, Nov 13, 2019 at 02:42:24PM +0100, Arnd Bergmann wrote:
On Wed, Nov 13, 2019 at 6:08 AM Darrick J. Wong darrick.wong@oracle.com wrote:
On Tue, Nov 12, 2019 at 03:16:00PM +0100, Christoph Hellwig wrote:
On Tue, Nov 12, 2019 at 01:09:06PM +0100, Arnd Bergmann wrote:
However, as long as two observations are true, a much simpler solution can be used:
- xfsprogs is the only user space project that has a copy of this header
We can't guarantee that.
- xfsprogs already has a replacement for all three affected ioctl commands, based on the xfs_bulkstat structure to pass 64-bit timestamps regardless of the architecture
XFS_IOC_BULKSTAT replaces XFS_IOC_FSBULKSTAT directly, and can replace XFS_IOC_FSBULKSTAT_SINGLE indirectly, so that is easy. Most users actually use the new one now through libfrog, although I found a user of the direct ioctl in the xfs_io tool, which could easily be fixed as well.
Agreed, XFS_IOC_BULKSTAT is the replacement for the two FSBULKSTAT variants. The only question in my mind for the old ioctls is whether we should return EOVERFLOW if the timestamp would overflow? Or just truncate the results?
I think neither of these would be particularly helpful: the result is that users see no change in behavior until it's actually too late and the timestamps have overrun.
If we take variant A and just fix the ABI to 32-bit time_t, it is important that all user space stops using these ioctls and moves to the v5 interfaces instead (including SWAPEXT, I guess).
Something along the lines of this change would work:
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d50135760622..87318486c96e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -830,6 +830,23 @@ xfs_fsinumbers_fmt(
 	return xfs_ibulk_advance(breq, sizeof(struct xfs_inogrp));
 }
 
+/* disallow y2038-unsafe ioctls with CONFIG_COMPAT_32BIT_TIME=n */
+static bool xfs_have_compat_bstat_time32(unsigned int cmd)
+{

Wouldn't we want a test here like:

	if (!xfs_sb_version_hasbigtimestamps(&mp->m_sb))
		return true;

Since date overflow isn't a problem for existing xfs with 32-bit timestamps, right?

+	if (IS_ENABLED(CONFIG_COMPAT_32BIT_TIME))

Heh, I didn't know that existed.

+		return true;
+
+	if (IS_ENABLED(CONFIG_64BIT) && !in_compat_syscall())
+		return true;
+
+	if (cmd == XFS_IOC_FSBULKSTAT_SINGLE ||
+	    cmd == XFS_IOC_FSBULKSTAT ||
+	    cmd == XFS_IOC_SWAPEXT)
+		return false;
+
+	return true;
+}
+
 STATIC int
 xfs_ioc_fsbulkstat(
 	xfs_mount_t		*mp,
@@ -850,6 +867,9 @@ xfs_ioc_fsbulkstat(
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (!xfs_have_compat_bstat_time32(cmd))
+		return -EINVAL;
+
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
@@ -2051,6 +2071,11 @@ xfs_ioc_swapext(
 	struct fd	f, tmp;
 	int		error = 0;
 
+	if (!xfs_have_compat_bstat_time32(XFS_IOC_SWAPEXT)) {
+		error = -EINVAL;
+		goto out;
+	}
+
 	/* Pull information for the target fd */
 	f = fdget((int)sxp->sx_fdtarget);
 	if (!f.file) {
This way, at least users that intentionally turn off CONFIG_COMPAT_32BIT_TIME run into the broken application right away, which forces them to upgrade or fix the code to use the v5 ioctl.
Sounds reasonable.
--D
Based on those assumptions, changing xfs_bstime to use __kernel_long_t instead of time_t in both the kernel and in xfsprogs preserves the current ABI for any libc definition of time_t and solves the problem of passing 64-bit timestamps to 32-bit user space.
As said above, they are not entirely true, but I still think this patch is the right thing to do, if only to get the time_t out of the ABI.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Seconded, Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Thanks!
Arnd
On Wed, Nov 13, 2019 at 5:34 PM Darrick J. Wong darrick.wong@oracle.com wrote:
On Wed, Nov 13, 2019 at 02:42:24PM +0100, Arnd Bergmann wrote:
On Wed, Nov 13, 2019 at 6:08 AM Darrick J. Wong darrick.wong@oracle.com wrote:
On Tue, Nov 12, 2019 at 03:16:00PM +0100, Christoph Hellwig wrote:
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d50135760622..87318486c96e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -830,6 +830,23 @@ xfs_fsinumbers_fmt(
 	return xfs_ibulk_advance(breq, sizeof(struct xfs_inogrp));
 }
 
+/* disallow y2038-unsafe ioctls with CONFIG_COMPAT_32BIT_TIME=n */
+static bool xfs_have_compat_bstat_time32(unsigned int cmd)
+{

Wouldn't we want a test here like:

	if (!xfs_sb_version_hasbigtimestamps(&mp->m_sb))
		return true;
Since date overflow isn't a problem for existing xfs with 32-bit timestamps, right?
I'd say probably not.
This would be for a kernel that intentionally only supports y2038-safe user space to ensure that we don't get any surprises later. In that configuration, I think we're still better off not allowing broken ioctls regardless of whether the file system is also broken. ;-)
if (IS_ENABLED(CONFIG_COMPAT_32BIT_TIME))
Heh, I didn't know that existed.
It barely does. At the moment it is always enabled on all architectures except for 32-bit riscv, but I submitted a patch to make it user selectable last week.
Arnd
The ioctl definitions for XFS_IOC_SWAPEXT, XFS_IOC_FSBULKSTAT and XFS_IOC_FSBULKSTAT_SINGLE are part of libxfs and based on time_t.
The definition for time_t differs between current kernels and coming 32-bit libc variants that define it as 64-bit. For most ioctls, that means the kernel has to be able to handle two different command codes based on the different structure sizes.
The same solution is applied for XFS_IOC_SWAPEXT, but it does not work for XFS_IOC_FSBULKSTAT and XFS_IOC_FSBULKSTAT_SINGLE because the structure with the time_t is passed through an indirect pointer, and the command number itself is based on struct xfs_fsop_bulkreq, which does not differ based on time_t.
The best workaround I could come up with is to change the header file to define new command numbers with the same structure and have users pick one or the other at compile-time based on the time_t definition in the C library, like:
#define XFS_IOC_FSBULKSTAT_OLD _IOWR('X', 101, struct xfs_fsop_bulkreq)
#define XFS_IOC_FSBULKSTAT_NEW _IOWR('X', 129, struct xfs_fsop_bulkreq)
#define XFS_IOC_FSBULKSTAT ((sizeof(time_t) == sizeof(__kernel_long_t)) ? \
			    XFS_IOC_FSBULKSTAT_OLD : XFS_IOC_FSBULKSTAT_NEW)
The native xfs_ioctl now handles both the time32 and the time64 version of the xfs_bstat data structure, and this gets called indirectly by the compat code implementing the xfs_fsop_bulkreq commands. For x86, we still need another implementation to deal with the broken alignment, so the existing code is left in ioctl32 but changed to now handle the misaligned time64 structure.
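The alignment problem mentioned for x86 can be reproduced with a small stand-alone sketch. The structs are hypothetical, and `__attribute__((packed))` is a GCC/Clang extension; this is the same attribute that __compat_packed expands to in the patch below.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit seconds followed by 32-bit nanoseconds, natural layout. */
struct demo_bstime_natural {
	int64_t tv_sec;
	int32_t tv_nsec;
};

/* The same members with padding and alignment removed. */
struct demo_bstime_packed {
	int64_t tv_sec;
	int32_t tv_nsec;
} __attribute__((packed));

/* Records embedding the timestamp after a 32-bit member. */
struct demo_rec_natural {
	int32_t before;
	struct demo_bstime_natural t;
	int64_t after;
};

struct demo_rec_packed {
	int32_t before;
	struct demo_bstime_packed t;
	int64_t after;
} __attribute__((packed));

static size_t demo_after_offset_natural(void)
{
	return offsetof(struct demo_rec_natural, after);
}

static size_t demo_after_offset_packed(void)
{
	return offsetof(struct demo_rec_packed, after);
}
```

On i386 a 64-bit integer inside a struct is only 4-byte aligned, so the natural layout there matches the packed one. On x86-64 the natural layout inserts padding, which is why the compat path needs its own packed definition.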
Based on the requirement to change the header file, a much simpler fix would be to change the time_t reference to 'long' and always keep passing the shorter timestamp as on 32-bit applications using the old ioctl, forcing code changes to use the v5 API to access post-y2038 timestamps, and still requiring the updated header when building with a new C library.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
Note: I have not tested this patch beyond compiling it, for now this is for discussion, to decide which approach makes more sense.
---
 fs/xfs/libxfs/xfs_fs.h |  19 +++-
 fs/xfs/xfs_ioctl.c     | 195 +++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_ioctl.h     |  12 +++
 fs/xfs/xfs_ioctl32.c   |  82 ++++++++++-------
 fs/xfs/xfs_ioctl32.h   |  26 +++---
 5 files changed, 278 insertions(+), 56 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 4c4330f6e653..9310576a45e5 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -324,8 +324,13 @@ typedef struct xfs_growfs_rt {
  * Structures returned from ioctl XFS_IOC_FSBULKSTAT & XFS_IOC_FSBULKSTAT_SINGLE
  */
 typedef struct xfs_bstime {
-	__kernel_long_t tv_sec;		/* seconds		*/
+#ifdef __KERNEL__
+	__s64		tv_sec;		/* seconds		*/
 	__s32		tv_nsec;	/* and nanoseconds	*/
+#else
+	time_t		tv_sec;		/* seconds		*/
+	__s32		tv_nsec;	/* and nanoseconds	*/
+#endif
 } xfs_bstime_t;
 
 struct xfs_bstat {
@@ -775,8 +780,8 @@ struct xfs_scrub_metadata {
  * ioctl commands that replace IRIX syssgi()'s
  */
 #define XFS_IOC_FSGEOMETRY_V1	     _IOR ('X', 100, struct xfs_fsop_geom_v1)
-#define XFS_IOC_FSBULKSTAT	     _IOWR('X', 101, struct xfs_fsop_bulkreq)
-#define XFS_IOC_FSBULKSTAT_SINGLE    _IOWR('X', 102, struct xfs_fsop_bulkreq)
+#define XFS_IOC_FSBULKSTAT_OLD	     _IOWR('X', 101, struct xfs_fsop_bulkreq)
+#define XFS_IOC_FSBULKSTAT_SINGLE_OLD _IOWR('X', 102, struct xfs_fsop_bulkreq)
 #define XFS_IOC_FSINUMBERS	     _IOWR('X', 103, struct xfs_fsop_bulkreq)
 #define XFS_IOC_PATH_TO_FSHANDLE     _IOWR('X', 104, struct xfs_fsop_handlereq)
 #define XFS_IOC_PATH_TO_HANDLE	     _IOWR('X', 105, struct xfs_fsop_handlereq)
@@ -805,6 +810,14 @@ struct xfs_scrub_metadata {
 #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 126, struct xfs_fsop_geom)
 #define XFS_IOC_BULKSTAT	     _IOR ('X', 127, struct xfs_bulkstat_req)
 #define XFS_IOC_INUMBERS	     _IOR ('X', 128, struct xfs_inumbers_req)
+#define XFS_IOC_FSBULKSTAT_NEW	     _IOWR('X', 129, struct xfs_fsop_bulkreq)
+#define XFS_IOC_FSBULKSTAT_SINGLE_NEW _IOWR('X', 130, struct xfs_fsop_bulkreq)
+
+#define XFS_IOC_FSBULKSTAT ((sizeof(time_t) == sizeof(__kernel_long_t)) ? \
+			    XFS_IOC_FSBULKSTAT_OLD : XFS_IOC_FSBULKSTAT_NEW)
+#define XFS_IOC_FSBULKSTAT_SINGLE ((sizeof(time_t) == sizeof(__kernel_long_t)) ? \
+			    XFS_IOC_FSBULKSTAT_SINGLE_OLD : XFS_IOC_FSBULKSTAT_SINGLE_NEW)
+
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d58f0d6a699e..d50135760622 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -24,6 +24,7 @@
 #include "xfs_export.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
+#include "xfs_ioctl.h"
 #include "xfs_trans.h"
 #include "xfs_acl.h"
 #include "xfs_btree.h"
@@ -713,7 +714,96 @@ xfs_ioc_space(
 	return error;
 }
 
+/*
+ * Structures returned from ioctl XFS_IOC_FSBULKSTAT_TIME32 & XFS_IOC_FSBULKSTAT_SINGLE_TIME32
+ */
+struct xfs_bstime32 {
+	__s32		tv_sec;		/* seconds		*/
+	__s32		tv_nsec;	/* and nanoseconds	*/
+};
+
+struct xfs_bstat_time32 {
+	__u64		bs_ino;		/* inode number			*/
+	__u16		bs_mode;	/* type and mode		*/
+	__u16		bs_nlink;	/* number of links		*/
+	__u32		bs_uid;		/* user id			*/
+	__u32		bs_gid;		/* group id			*/
+	__u32		bs_rdev;	/* device value			*/
+	__s32		bs_blksize;	/* block size			*/
+	__s64		bs_size;	/* file size			*/
+	struct xfs_bstime32 bs_atime;	/* access time			*/
+	struct xfs_bstime32 bs_mtime;	/* modify time			*/
+	struct xfs_bstime32 bs_ctime;	/* inode change time		*/
+	int64_t		bs_blocks;	/* number of blocks		*/
+	__u32		bs_xflags;	/* extended flags		*/
+	__s32		bs_extsize;	/* extent size			*/
+	__s32		bs_extents;	/* number of extents		*/
+	__u32		bs_gen;		/* generation count		*/
+	__u16		bs_projid_lo;	/* lower part of project id	*/
+	__u16		bs_forkoff;	/* inode fork offset in bytes	*/
+	__u16		bs_projid_hi;	/* higher part of project id	*/
+	uint16_t	bs_sick;	/* sick inode metadata		*/
+	uint16_t	bs_checked;	/* checked inode metadata	*/
+	unsigned char	bs_pad[2];	/* pad space, unused		*/
+	__u32		bs_cowextsize;	/* cow extent size		*/
+	__u32		bs_dmevmask;	/* DMIG event mask		*/
+	__u16		bs_dmstate;	/* DMIG state info		*/
+	__u16		bs_aextents;	/* attribute number of extents	*/
+} __compat_packed; /* packing for x86-64 compat mode */
+
+/* Convert bulkstat (v5) to bstat (v1) with 32-bit time_t. */
+void
+xfs_bulkstat_to_bstat_time32(
+	struct xfs_mount		*mp,
+	struct xfs_bstat_time32		*bs1,
+	const struct xfs_bulkstat	*bstat)
+{
+	/* memset is needed here because of padding holes in the structure. */
+	memset(bs1, 0, sizeof(*bs1));
+	bs1->bs_ino = bstat->bs_ino;
+	bs1->bs_mode = bstat->bs_mode;
+	bs1->bs_nlink = bstat->bs_nlink;
+	bs1->bs_uid = bstat->bs_uid;
+	bs1->bs_gid = bstat->bs_gid;
+	bs1->bs_rdev = bstat->bs_rdev;
+	bs1->bs_blksize = bstat->bs_blksize;
+	bs1->bs_size = bstat->bs_size;
+	bs1->bs_atime.tv_sec = bstat->bs_atime;
+	bs1->bs_mtime.tv_sec = bstat->bs_mtime;
+	bs1->bs_ctime.tv_sec = bstat->bs_ctime;
+	bs1->bs_atime.tv_nsec = bstat->bs_atime_nsec;
+	bs1->bs_mtime.tv_nsec = bstat->bs_mtime_nsec;
+	bs1->bs_ctime.tv_nsec = bstat->bs_ctime_nsec;
+	bs1->bs_blocks = bstat->bs_blocks;
+	bs1->bs_xflags = bstat->bs_xflags;
+	bs1->bs_extsize = XFS_FSB_TO_B(mp, bstat->bs_extsize_blks);
+	bs1->bs_extents = bstat->bs_extents;
+	bs1->bs_gen = bstat->bs_gen;
+	bs1->bs_projid_lo = bstat->bs_projectid & 0xFFFF;
+	bs1->bs_forkoff = bstat->bs_forkoff;
+	bs1->bs_projid_hi = bstat->bs_projectid >> 16;
+	bs1->bs_sick = bstat->bs_sick;
+	bs1->bs_checked = bstat->bs_checked;
+	bs1->bs_cowextsize = XFS_FSB_TO_B(mp, bstat->bs_cowextsize_blks);
+	bs1->bs_dmevmask = 0;
+	bs1->bs_dmstate = 0;
+	bs1->bs_aextents = bstat->bs_aextents;
+}
+
 /* Return 0 on success or positive error */
+int
+xfs_fsbulkstat_time32_one_fmt(
+	struct xfs_ibulk		*breq,
+	const struct xfs_bulkstat	*bstat)
+{
+	struct xfs_bstat_time32		bs1;
+
+	xfs_bulkstat_to_bstat_time32(breq->mp, &bs1, bstat);
+	if (copy_to_user(breq->ubuffer, &bs1, sizeof(bs1)))
+		return -EFAULT;
+	return xfs_ibulk_advance(breq, sizeof(struct xfs_bstat_time32));
+}
+
 int
 xfs_fsbulkstat_one_fmt(
 	struct xfs_ibulk		*breq,
@@ -789,18 +879,40 @@ xfs_ioc_fsbulkstat(
 	 * is a special case because it has traditionally meant "first inode
 	 * in filesystem".
 	 */
-	if (cmd == XFS_IOC_FSINUMBERS) {
+	switch (cmd) {
+	case XFS_IOC_FSINUMBERS:
 		breq.startino = lastino ? lastino + 1 : 0;
 		error = xfs_inumbers(&breq, xfs_fsinumbers_fmt);
 		lastino = breq.startino - 1;
-	} else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE) {
+		break;
+	case XFS_IOC_FSBULKSTAT_SINGLE_OLD:
+		if (!IS_ENABLED(CONFIG_64BIT)) {
+			breq.startino = lastino;
+			breq.icount = 1;
+			error = xfs_bulkstat_one(&breq, xfs_fsbulkstat_time32_one_fmt);
+			break;
+		}
+		/* Fallthrough */
+	case XFS_IOC_FSBULKSTAT_SINGLE_NEW:
 		breq.startino = lastino;
 		breq.icount = 1;
 		error = xfs_bulkstat_one(&breq, xfs_fsbulkstat_one_fmt);
-	} else {	/* XFS_IOC_FSBULKSTAT */
+		break;
+	case XFS_IOC_FSBULKSTAT_OLD:
+		if (!IS_ENABLED(CONFIG_64BIT)) {
+			breq.startino = lastino ? lastino + 1 : 0;
+			error = xfs_bulkstat(&breq, xfs_fsbulkstat_time32_one_fmt);
+			lastino = breq.startino - 1;
+			break;
+		}
+		/* Fallthrough */
+	case XFS_IOC_FSBULKSTAT_NEW:
 		breq.startino = lastino ? lastino + 1 : 0;
 		error = xfs_bulkstat(&breq, xfs_fsbulkstat_one_fmt);
 		lastino = breq.startino - 1;
+		break;
+	default:
+		error = -EINVAL;
 	}
 	if (error)
@@ -2093,6 +2205,74 @@ xfs_ioc_setlabel(
 	return error;
 }
 
+static int get_xfs_bstime32(struct xfs_bstime *bstime,
+			    struct xfs_bstime32 __user *bstime32)
+{
+	struct xfs_bstime32 t;
+
+	if (copy_from_user(&t, bstime32, sizeof(t)))
+		return -EFAULT;
+
+	*bstime = (struct xfs_bstime){
+		.tv_sec = t.tv_sec,
+		.tv_nsec = t.tv_nsec,
+	};
+
+	return 0;
+}
+
+static int get_xfs_bstat_time32(struct xfs_bstat *bstat,
+				struct xfs_bstat_time32 __user *bstat32)
+{
+	if (copy_from_user(bstat, bstat32,
+			   offsetof(struct xfs_bstat, bs_atime)))
+		return -EFAULT;
+
+	if (get_xfs_bstime32(&bstat->bs_atime, &bstat32->bs_atime) ||
+	    get_xfs_bstime32(&bstat->bs_mtime, &bstat32->bs_mtime) ||
+	    get_xfs_bstime32(&bstat->bs_ctime, &bstat32->bs_ctime))
+		return -EFAULT;
+
+	if (copy_from_user(&bstat->bs_blocks, &bstat32->bs_blocks,
+			   sizeof(struct xfs_bstat) -
+			   offsetof(struct xfs_bstat, bs_blocks)))
+		return -EFAULT;
+
+	return 0;
+}
+
+/*
+ * Structure passed to XFS_IOC_SWAPEXT_TIME32
+ */
+struct xfs_swapext_time32
+{
+	int64_t		sx_version;	/* version */
+	int64_t		sx_fdtarget;	/* fd of target file */
+	int64_t		sx_fdtmp;	/* fd of tmp file */
+	xfs_off_t	sx_offset;	/* offset into file */
+	xfs_off_t	sx_length;	/* leng from offset */
+	char		sx_pad[16];	/* pad space, unused */
+	struct xfs_bstat_time32 sx_stat;/* stat of target b4 copy */
+};
+#define XFS_IOC_SWAPEXT_TIME32	_IOWR('X', 109, struct xfs_swapext_time32)
+
+static int get_xfs_swapext(struct xfs_swapext *sxp, unsigned int cmd, void __user *arg)
+{
+	struct xfs_swapext_time32 *sxp32 = arg;
+	int ret;
+
+	if (cmd == XFS_IOC_SWAPEXT) {
+		ret = copy_from_user(sxp, arg, sizeof(struct xfs_swapext));
+		return ret ? -EFAULT : 0;
+	}
+
+	ret = copy_from_user(sxp, arg, offsetof(struct xfs_swapext, sx_stat));
+	if (ret)
+		return -EFAULT;
+
+	return get_xfs_bstat_time32(&sxp->sx_stat, &sxp32->sx_stat);
+}
+
 /*
  * Note: some of the ioctl's return positive numbers as a
  * byte count indicating success, such as readlink_by_handle.
@@ -2149,8 +2329,10 @@ xfs_file_ioctl(
 		return 0;
 	}
 
-	case XFS_IOC_FSBULKSTAT_SINGLE:
-	case XFS_IOC_FSBULKSTAT:
+	case XFS_IOC_FSBULKSTAT_SINGLE_OLD:
+	case XFS_IOC_FSBULKSTAT_OLD:
+	case XFS_IOC_FSBULKSTAT_SINGLE_NEW:
+	case XFS_IOC_FSBULKSTAT_NEW:
 	case XFS_IOC_FSINUMBERS:
 		return xfs_ioc_fsbulkstat(mp, cmd, arg);
 
@@ -2242,10 +2424,11 @@ xfs_file_ioctl(
 	case XFS_IOC_ATTRMULTI_BY_HANDLE:
 		return xfs_attrmulti_by_handle(filp, arg);
 
+	case XFS_IOC_SWAPEXT_TIME32:
 	case XFS_IOC_SWAPEXT: {
 		struct xfs_swapext	sxp;
 
-		if (copy_from_user(&sxp, arg, sizeof(xfs_swapext_t)))
+		if (get_xfs_swapext(&sxp, cmd, arg))
 			return -EFAULT;
 		error = mnt_want_write_file(filp);
 		if (error)
diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h
index 654c0bb1bcf8..318ff243258f 100644
--- a/fs/xfs/xfs_ioctl.h
+++ b/fs/xfs/xfs_ioctl.h
@@ -83,6 +83,18 @@ struct xfs_inogrp;
 
 int xfs_fsbulkstat_one_fmt(struct xfs_ibulk *breq,
 			   const struct xfs_bulkstat *bstat);
+int xfs_fsbulkstat_time32_one_fmt(struct xfs_ibulk *breq,
+			   const struct xfs_bulkstat *bstat);
 int xfs_fsinumbers_fmt(struct xfs_ibulk *breq, const struct xfs_inumbers *igrp);
+/*
+ * On intel, even if sizes match, alignment and/or padding may differ.
+ */
+#if defined(CONFIG_IA64) || defined(CONFIG_X86_64)
+#define BROKEN_X86_ALIGNMENT
+#define __compat_packed __attribute__((packed))
+#else
+#define __compat_packed
+#endif
+
 #endif
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 1e08bf79b478..2ea7d3e12b4b 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -98,28 +98,18 @@ xfs_fsinumbers_fmt_compat(
 	return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_inogrp));
 }
 
-#else
-#define xfs_fsinumbers_fmt_compat xfs_fsinumbers_fmt
-#endif	/* BROKEN_X86_ALIGNMENT */
-
+/* struct xfs_bstat has differing alignment on intel */
 STATIC int
 xfs_ioctl32_bstime_copyin(
 	xfs_bstime_t		*bstime,
 	compat_xfs_bstime_t	__user *bstime32)
 {
-	compat_time_t		sec32;	/* tv_sec differs on 64 vs. 32 */
-
-	if (get_user(sec32,		&bstime32->tv_sec)	||
+	if (get_user(bstime->tv_sec,	&bstime32->tv_sec)	||
 	    get_user(bstime->tv_nsec,	&bstime32->tv_nsec))
 		return -EFAULT;
-	bstime->tv_sec = sec32;
 	return 0;
 }
 
-/*
- * struct xfs_bstat has differing alignment on intel, & bstime_t sizes
- * everywhere
- */
 STATIC int
 xfs_ioctl32_bstat_copyin(
 	struct xfs_bstat		*bstat,
@@ -158,10 +148,7 @@ xfs_bstime_store_compat(
 	compat_xfs_bstime_t	__user *p32,
 	const xfs_bstime_t	*p)
 {
-	__s32			sec32;
-
-	sec32 = p->tv_sec;
-	if (put_user(sec32, &p32->tv_sec) ||
+	if (put_user(p->tv_sec, &p32->tv_sec) ||
 	    put_user(p->tv_nsec, &p32->tv_nsec))
 		return -EFAULT;
 	return 0;
@@ -206,6 +193,10 @@ xfs_fsbulkstat_one_fmt_compat(
 	return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_bstat));
 }
 
+#else
+#define xfs_fsinumbers_fmt_compat xfs_fsinumbers_fmt
+#endif	/* BROKEN_X86_ALIGNMENT */
+
 /* copied from xfs_ioctl.c */
 STATIC int
 xfs_compat_ioc_fsbulkstat(
@@ -229,7 +220,12 @@ xfs_compat_ioc_fsbulkstat(
 	 * functions and structure size are the correct ones to use ...
 	 */
 	inumbers_fmt_pf		inumbers_func = xfs_fsinumbers_fmt_compat;
-	bulkstat_one_fmt_pf	bs_one_func = xfs_fsbulkstat_one_fmt_compat;
+	bulkstat_one_fmt_pf	bs_one_func_old = xfs_fsbulkstat_time32_one_fmt;
+	bulkstat_one_fmt_pf	bs_one_func_new = xfs_fsbulkstat_one_fmt;
+
+#ifdef BROKEN_X86_ALIGNMENT
+	bs_one_func_new = xfs_fsbulkstat_one_fmt_compat;
+#endif
 
 #ifdef CONFIG_X86_X32
 	if (in_x32_syscall()) {
@@ -242,7 +238,7 @@ xfs_compat_ioc_fsbulkstat(
 		 * x32 userspace expects.
 		 */
 		inumbers_func = xfs_fsinumbers_fmt;
-		bs_one_func = xfs_fsbulkstat_one_fmt;
+		bs_one_func_old = xfs_fsbulkstat_one_fmt;
 	}
 #endif
 
@@ -289,21 +285,37 @@ xfs_compat_ioc_fsbulkstat(
 	 * is a special case because it has traditionally meant "first inode
 	 * in filesystem".
 	 */
-	if (cmd == XFS_IOC_FSINUMBERS_32) {
+	switch (cmd) {
+	case XFS_IOC_FSINUMBERS_32:
 		breq.startino = lastino ? lastino + 1 : 0;
 		error = xfs_inumbers(&breq, inumbers_func);
 		lastino = breq.startino - 1;
-	} else if (cmd == XFS_IOC_FSBULKSTAT_SINGLE_32) {
+		break;
+	case XFS_IOC_FSBULKSTAT_SINGLE_OLD32:
 		breq.startino = lastino;
 		breq.icount = 1;
-		error = xfs_bulkstat_one(&breq, bs_one_func);
+		error = xfs_bulkstat_one(&breq, bs_one_func_old);
 		lastino = breq.startino;
-	} else if (cmd == XFS_IOC_FSBULKSTAT_32) {
+		break;
+	case XFS_IOC_FSBULKSTAT_OLD32:
 		breq.startino = lastino ? lastino + 1 : 0;
-		error = xfs_bulkstat(&breq, bs_one_func);
+		error = xfs_bulkstat(&breq, bs_one_func_old);
 		lastino = breq.startino - 1;
-	} else {
+		break;
+	case XFS_IOC_FSBULKSTAT_SINGLE_NEW32:
+		breq.startino = lastino;
+		breq.icount = 1;
+		error = xfs_bulkstat_one(&breq, bs_one_func_new);
+		lastino = breq.startino;
+		break;
+	case XFS_IOC_FSBULKSTAT_NEW32:
+		breq.startino = lastino ? lastino + 1 : 0;
+		error = xfs_bulkstat(&breq, bs_one_func_new);
+		lastino = breq.startino - 1;
+		break;
+	default:
 		error = -EINVAL;
+		break;
 	}
 	if (error)
 		return error;
@@ -548,7 +560,9 @@ xfs_file_compat_ioctl(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	void			__user *arg = compat_ptr(p);
+#if defined(BROKEN_X86_ALIGNMENT)
 	int			error;
+#endif
 
 	trace_xfs_file_compat_ioctl(ip);
 
@@ -596,13 +610,6 @@ xfs_file_compat_ioctl(
 		mnt_drop_write_file(filp);
 		return error;
 	}
-#endif
-	/* long changes size, but xfs only copiese out 32 bits */
-	case XFS_IOC_GETXFLAGS_32:
-	case XFS_IOC_SETXFLAGS_32:
-	case XFS_IOC_GETVERSION_32:
-		cmd = _NATIVE_IOC(cmd, long);
-		return xfs_file_ioctl(filp, cmd, p);
 	case XFS_IOC_SWAPEXT_32: {
 		struct xfs_swapext	  sxp;
 		struct compat_xfs_swapext __user *sxu = arg;
@@ -619,8 +626,17 @@ xfs_file_compat_ioctl(
 		mnt_drop_write_file(filp);
 		return error;
 	}
-	case XFS_IOC_FSBULKSTAT_32:
-	case XFS_IOC_FSBULKSTAT_SINGLE_32:
+#endif
+	/* long changes size, but xfs only copiese out 32 bits */
+	case XFS_IOC_GETXFLAGS_32:
+	case XFS_IOC_SETXFLAGS_32:
+	case XFS_IOC_GETVERSION_32:
+		cmd = _NATIVE_IOC(cmd, long);
+		return xfs_file_ioctl(filp, cmd, p);
+	case XFS_IOC_FSBULKSTAT_OLD32:
+	case XFS_IOC_FSBULKSTAT_SINGLE_OLD32:
+	case XFS_IOC_FSBULKSTAT_NEW32:
+	case XFS_IOC_FSBULKSTAT_SINGLE_NEW32:
 	case XFS_IOC_FSINUMBERS_32:
 		return xfs_compat_ioc_fsbulkstat(mp, cmd, arg);
 	case XFS_IOC_FD_TO_HANDLE_32:
diff --git a/fs/xfs/xfs_ioctl32.h b/fs/xfs/xfs_ioctl32.h
index 7985344d3aa6..d7050f4360e9 100644
--- a/fs/xfs/xfs_ioctl32.h
+++ b/fs/xfs/xfs_ioctl32.h
@@ -21,20 +21,11 @@
 #define XFS_IOC_SETXFLAGS_32	FS_IOC32_SETFLAGS
 #define XFS_IOC_GETVERSION_32	FS_IOC32_GETVERSION
-/* - * On intel, even if sizes match, alignment and/or padding may differ. - */ -#if defined(CONFIG_IA64) || defined(CONFIG_X86_64) -#define BROKEN_X86_ALIGNMENT -#define __compat_packed __attribute__((packed)) -#else -#define __compat_packed -#endif - +#ifdef BROKEN_X86_ALIGNMENT typedef struct compat_xfs_bstime { - compat_time_t tv_sec; /* seconds */ + __s64 tv_sec; /* seconds */ __s32 tv_nsec; /* and nanoseconds */ -} compat_xfs_bstime_t; +} __compat_packed compat_xfs_bstime_t;
struct compat_xfs_bstat { __u64 bs_ino; /* inode number */ @@ -62,6 +53,7 @@ struct compat_xfs_bstat { __u16 bs_dmstate; /* DMIG state info */ __u16 bs_aextents; /* attribute number of extents */ } __compat_packed; +#endif
struct compat_xfs_fsop_bulkreq { compat_uptr_t lastip; /* last inode # pointer */ @@ -70,10 +62,14 @@ struct compat_xfs_fsop_bulkreq { compat_uptr_t ocount; /* output count pointer */ };
-#define XFS_IOC_FSBULKSTAT_32 \
+#define XFS_IOC_FSBULKSTAT_OLD32 \
 	_IOWR('X', 101, struct compat_xfs_fsop_bulkreq)
-#define XFS_IOC_FSBULKSTAT_SINGLE_32 \
+#define XFS_IOC_FSBULKSTAT_SINGLE_OLD32 \
 	_IOWR('X', 102, struct compat_xfs_fsop_bulkreq)
+#define XFS_IOC_FSBULKSTAT_NEW32 \
+	_IOWR('X', 129, struct compat_xfs_fsop_bulkreq)
+#define XFS_IOC_FSBULKSTAT_SINGLE_NEW32 \
+	_IOWR('X', 130, struct compat_xfs_fsop_bulkreq)
 #define XFS_IOC_FSINUMBERS_32 \
 	_IOWR('X', 103, struct compat_xfs_fsop_bulkreq)
@@ -98,6 +94,7 @@ typedef struct compat_xfs_fsop_handlereq { #define XFS_IOC_READLINK_BY_HANDLE_32 \ _IOWR('X', 108, struct compat_xfs_fsop_handlereq)
+#ifdef BROKEN_X86_ALIGNMENT /* The bstat field in the swapext struct needs translation */ typedef struct compat_xfs_swapext { int64_t sx_version; /* version */ @@ -110,6 +107,7 @@ typedef struct compat_xfs_swapext { } __compat_packed compat_xfs_swapext_t;
#define XFS_IOC_SWAPEXT_32 _IOWR('X', 109, struct compat_xfs_swapext) +#endif
typedef struct compat_xfs_fsop_attrlist_handlereq { struct compat_xfs_fsop_handlereq hreq; /* handle interface structure */
The change to the ioctl commands requires updating the header file. If we do that, the alignment problem on i386 could be solved for the new format by explicitly aligning all 64-bit members to sizeof(time_t). This complicates the header file definition, but saves the code needed to deal with a third format.
The native ioctl simply deals with both the time32 and the time64 version, and the latter is the same across all 64-bit architectures and all 32-bit user space with 64-bit time_t, including i386 and x32.
If we decide to do this, this change should be folded into "xfs: [variant B] add time64 version of xfs_bstat".
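To illustrate the alignment argument, here is a minimal user-space sketch. The names demo_bstime, demo_bstat_head and DEMO_ALIGN8 are hypothetical, and the alignment is hard-coded to 8 bytes instead of the sizeof(time_t) used by the patch, so this only models the case of a 64-bit time_t. It relies on the GCC/Clang aligned attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's __TIME_ALIGN, fixed at 8 bytes. */
#define DEMO_ALIGN8 __attribute__((aligned(8)))

/* Cut-down model of xfs_bstime with a 64-bit seconds field. */
struct demo_bstime {
	int64_t tv_sec DEMO_ALIGN8;	/* seconds */
	int32_t tv_nsec;		/* and nanoseconds */
};

/* Cut-down model of the first members of xfs_bstat. */
struct demo_bstat_head {
	uint64_t bs_ino DEMO_ALIGN8;
	uint16_t bs_mode;
	uint16_t bs_nlink;
	uint32_t bs_uid;
	uint32_t bs_gid;
	uint32_t bs_rdev;
	int32_t  bs_blksize;
	int64_t  bs_size DEMO_ALIGN8;	/* would sit at 28 on i386 without it */
	struct demo_bstime bs_atime;
};

/* Offset of bs_size; forced to a multiple of 8 on every ABI. */
static size_t demo_bs_size_offset(void)
{
	size_t off = offsetof(struct demo_bstat_head, bs_size);

	assert(off % 8 == 0);
	return off;
}
```

With the attribute in place, bs_size lands at offset 32 and demo_bstime occupies 16 bytes regardless of whether the compiler's natural alignment for 64-bit integers is 4 or 8, which is the property the new format depends on.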
Signed-off-by: Arnd Bergmann arnd@arndb.de --- fs/xfs/libxfs/xfs_fs.h | 20 +++++-- fs/xfs/xfs_ioctl32.c | 120 +---------------------------------------- 2 files changed, 18 insertions(+), 122 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 9310576a45e5..e95807d223ad 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -319,6 +319,17 @@ typedef struct xfs_growfs_rt { __u32 extsize; /* new realtime extent size, fsblocks */ } xfs_growfs_rt_t;
+/*
+ * on i386, u64 has 32-bit alignment, which makes the traditional xfs_bstat
+ * different from other architectures. Aligning all 64-bit members to
+ * sizeof(time_t) means that the new version with 64-bit time_t is padded
+ * the same as on all other architectures.
+ */
+#ifdef __i386__
+#define __TIME_ALIGN __attribute__((aligned(sizeof(time_t))))
+#else
+#define __TIME_ALIGN
+#endif
/* * Structures returned from ioctl XFS_IOC_FSBULKSTAT & XFS_IOC_FSBULKSTAT_SINGLE @@ -328,24 +339,24 @@ typedef struct xfs_bstime { __s64 tv_sec; /* seconds */ __s32 tv_nsec; /* and nanoseconds */ #else - time_t tv_sec; /* seconds */ + time_t tv_sec __TIME_ALIGN; /* seconds */ __s32 tv_nsec; /* and nanoseconds */ #endif } xfs_bstime_t;
struct xfs_bstat { - __u64 bs_ino; /* inode number */ + __u64 bs_ino __TIME_ALIGN; /* inode number */ __u16 bs_mode; /* type and mode */ __u16 bs_nlink; /* number of links */ __u32 bs_uid; /* user id */ __u32 bs_gid; /* group id */ __u32 bs_rdev; /* device value */ __s32 bs_blksize; /* block size */ - __s64 bs_size; /* file size */ + __s64 bs_size __TIME_ALIGN; /* file size */ xfs_bstime_t bs_atime; /* access time */ xfs_bstime_t bs_mtime; /* modify time */ xfs_bstime_t bs_ctime; /* inode change time */ - int64_t bs_blocks; /* number of blocks */ + int64_t bs_blocks __TIME_ALIGN; /* number of blocks */ __u32 bs_xflags; /* extended flags */ __s32 bs_extsize; /* extent size */ __s32 bs_extents; /* number of extents */ @@ -362,6 +373,7 @@ struct xfs_bstat { __u16 bs_dmstate; /* DMIG state info */ __u16 bs_aextents; /* attribute number of extents */ }; +#undef __TIME_ALIGN
/* New bulkstat structure that reports v5 features and fixes padding issues */ struct xfs_bulkstat { diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c index 2ea7d3e12b4b..8059cdc2d5ca 100644 --- a/fs/xfs/xfs_ioctl32.c +++ b/fs/xfs/xfs_ioctl32.c @@ -98,101 +98,6 @@ xfs_fsinumbers_fmt_compat( return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_inogrp)); }
-/* struct xfs_bstat has differing alignment on intel */ -STATIC int -xfs_ioctl32_bstime_copyin( - xfs_bstime_t *bstime, - compat_xfs_bstime_t __user *bstime32) -{ - if (get_user(bstime->tv_sec, &bstime32->tv_sec) || - get_user(bstime->tv_nsec, &bstime32->tv_nsec)) - return -EFAULT; - return 0; -} - -STATIC int -xfs_ioctl32_bstat_copyin( - struct xfs_bstat *bstat, - struct compat_xfs_bstat __user *bstat32) -{ - if (get_user(bstat->bs_ino, &bstat32->bs_ino) || - get_user(bstat->bs_mode, &bstat32->bs_mode) || - get_user(bstat->bs_nlink, &bstat32->bs_nlink) || - get_user(bstat->bs_uid, &bstat32->bs_uid) || - get_user(bstat->bs_gid, &bstat32->bs_gid) || - get_user(bstat->bs_rdev, &bstat32->bs_rdev) || - get_user(bstat->bs_blksize, &bstat32->bs_blksize) || - get_user(bstat->bs_size, &bstat32->bs_size) || - xfs_ioctl32_bstime_copyin(&bstat->bs_atime, &bstat32->bs_atime) || - xfs_ioctl32_bstime_copyin(&bstat->bs_mtime, &bstat32->bs_mtime) || - xfs_ioctl32_bstime_copyin(&bstat->bs_ctime, &bstat32->bs_ctime) || - get_user(bstat->bs_blocks, &bstat32->bs_size) || - get_user(bstat->bs_xflags, &bstat32->bs_size) || - get_user(bstat->bs_extsize, &bstat32->bs_extsize) || - get_user(bstat->bs_extents, &bstat32->bs_extents) || - get_user(bstat->bs_gen, &bstat32->bs_gen) || - get_user(bstat->bs_projid_lo, &bstat32->bs_projid_lo) || - get_user(bstat->bs_projid_hi, &bstat32->bs_projid_hi) || - get_user(bstat->bs_forkoff, &bstat32->bs_forkoff) || - get_user(bstat->bs_dmevmask, &bstat32->bs_dmevmask) || - get_user(bstat->bs_dmstate, &bstat32->bs_dmstate) || - get_user(bstat->bs_aextents, &bstat32->bs_aextents)) - return -EFAULT; - return 0; -} - -/* XFS_IOC_FSBULKSTAT and friends */ - -STATIC int -xfs_bstime_store_compat( - compat_xfs_bstime_t __user *p32, - const xfs_bstime_t *p) -{ - if (put_user(p->tv_sec, &p32->tv_sec) || - put_user(p->tv_nsec, &p32->tv_nsec)) - return -EFAULT; - return 0; -} - -/* Return 0 on success or positive error (to xfs_bulkstat()) */ -STATIC int 
-xfs_fsbulkstat_one_fmt_compat( - struct xfs_ibulk *breq, - const struct xfs_bulkstat *bstat) -{ - struct compat_xfs_bstat __user *p32 = breq->ubuffer; - struct xfs_bstat bs1; - struct xfs_bstat *buffer = &bs1; - - xfs_bulkstat_to_bstat(breq->mp, &bs1, bstat); - - if (put_user(buffer->bs_ino, &p32->bs_ino) || - put_user(buffer->bs_mode, &p32->bs_mode) || - put_user(buffer->bs_nlink, &p32->bs_nlink) || - put_user(buffer->bs_uid, &p32->bs_uid) || - put_user(buffer->bs_gid, &p32->bs_gid) || - put_user(buffer->bs_rdev, &p32->bs_rdev) || - put_user(buffer->bs_blksize, &p32->bs_blksize) || - put_user(buffer->bs_size, &p32->bs_size) || - xfs_bstime_store_compat(&p32->bs_atime, &buffer->bs_atime) || - xfs_bstime_store_compat(&p32->bs_mtime, &buffer->bs_mtime) || - xfs_bstime_store_compat(&p32->bs_ctime, &buffer->bs_ctime) || - put_user(buffer->bs_blocks, &p32->bs_blocks) || - put_user(buffer->bs_xflags, &p32->bs_xflags) || - put_user(buffer->bs_extsize, &p32->bs_extsize) || - put_user(buffer->bs_extents, &p32->bs_extents) || - put_user(buffer->bs_gen, &p32->bs_gen) || - put_user(buffer->bs_projid, &p32->bs_projid) || - put_user(buffer->bs_projid_hi, &p32->bs_projid_hi) || - put_user(buffer->bs_forkoff, &p32->bs_forkoff) || - put_user(buffer->bs_dmevmask, &p32->bs_dmevmask) || - put_user(buffer->bs_dmstate, &p32->bs_dmstate) || - put_user(buffer->bs_aextents, &p32->bs_aextents)) - return -EFAULT; - - return xfs_ibulk_advance(breq, sizeof(struct compat_xfs_bstat)); -} - #else #define xfs_fsinumbers_fmt_compat xfs_fsinumbers_fmt #endif /* BROKEN_X86_ALIGNMENT */ @@ -221,11 +126,6 @@ xfs_compat_ioc_fsbulkstat( */ inumbers_fmt_pf inumbers_func = xfs_fsinumbers_fmt_compat; bulkstat_one_fmt_pf bs_one_func_old = xfs_fsbulkstat_time32_one_fmt; - bulkstat_one_fmt_pf bs_one_func_new = xfs_fsbulkstat_one_fmt; - -#ifdef BROKEN_X86_ALIGNMENT - bs_one_func_new = xfs_fsbulkstat_one_fmt_compat; -#endif
#ifdef CONFIG_X86_X32 if (in_x32_syscall()) { @@ -305,12 +205,12 @@ xfs_compat_ioc_fsbulkstat( case XFS_IOC_FSBULKSTAT_SINGLE_NEW32: breq.startino = lastino; breq.icount = 1; - error = xfs_bulkstat_one(&breq, bs_one_func_new); + error = xfs_bulkstat_one(&breq, xfs_fsbulkstat_one_fmt); lastino = breq.startino; break; case XFS_IOC_FSBULKSTAT_NEW32: breq.startino = lastino ? lastino + 1 : 0; - error = xfs_bulkstat(&breq, bs_one_func_new); + error = xfs_bulkstat(&breq, xfs_fsbulkstat_one_fmt); lastino = breq.startino - 1; break; default: @@ -610,22 +510,6 @@ xfs_file_compat_ioctl( mnt_drop_write_file(filp); return error; } - case XFS_IOC_SWAPEXT_32: { - struct xfs_swapext sxp; - struct compat_xfs_swapext __user *sxu = arg; - - /* Bulk copy in up to the sx_stat field, then copy bstat */ - if (copy_from_user(&sxp, sxu, - offsetof(struct xfs_swapext, sx_stat)) || - xfs_ioctl32_bstat_copyin(&sxp.sx_stat, &sxu->sx_stat)) - return -EFAULT; - error = mnt_want_write_file(filp); - if (error) - return error; - error = xfs_ioc_swapext(&sxp); - mnt_drop_write_file(filp); - return error; - } #endif /* long changes size, but xfs only copiese out 32 bits */ case XFS_IOC_GETXFLAGS_32:
XFS is the only major file system that lacks timestamps beyond year 2038, and is already being deployed in systems that may have to be supported beyond that time.
Fortunately, the inode format still has a few reserved bits that can be used to extend the current format. There are two bits in the nanosecond portion that could be used in the same way that ext4 does, extending the timestamps until year 2378, as well as 12 unused bytes after the already allocated fields.
There are four timestamps that need to be extended, so using four bytes out of the reserved space gets us all the way until year 36744, by extending the current 1902-2038 range with another 255 epochs, which seems to be a reasonable range.
I am not sure whether this change to the inode format requires a new version for the inode. All existing file system images remain compatible, while mounting a file system with extended timestamps beyond 2038 on an older kernel would report that timestamp incorrectly in the 1902 through 2038 range, matching the traditional Linux behavior of wrapping timestamps.
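As a rough illustration of the range arithmetic, the following user-space sketch shows one self-consistent way to split a 64-bit seconds value into the traditional 32-bit field plus one extra byte. All demo_* names are hypothetical, and the encoding is an assumption for illustration; it is not necessarily identical to the open-coded expressions in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Range covered by 32 low bits plus 8 high bits, as described above. */
#define DEMO_TS40_MIN ((int64_t)INT32_MIN)
#define DEMO_TS40_MAX ((int64_t)INT32_MAX + 255 * INT64_C(0x100000000))

/* Hypothetical on-disk pair: traditional seconds field plus one byte. */
struct demo_ts40 {
	uint32_t lo;	/* low 32 bits, read as signed for epoch 0 */
	uint8_t  hi;	/* number of 2^32-second epochs to add */
};

/* Read the low word as a signed 32-bit value, portably. */
static int64_t demo_ts40_lo_signed(uint32_t lo)
{
	return lo >= UINT32_C(0x80000000) ?
		(int64_t)lo - INT64_C(0x100000000) : (int64_t)lo;
}

static struct demo_ts40 demo_ts40_encode(int64_t sec)
{
	struct demo_ts40 ts;

	assert(sec >= DEMO_TS40_MIN && sec <= DEMO_TS40_MAX);
	ts.lo = (uint32_t)sec;	/* unsigned conversion wraps modulo 2^32 */
	/* The difference is a non-negative multiple of 2^32. */
	ts.hi = (uint8_t)((sec - demo_ts40_lo_signed(ts.lo)) /
			  INT64_C(0x100000000));
	return ts;
}

static int64_t demo_ts40_decode(struct demo_ts40 ts)
{
	return demo_ts40_lo_signed(ts.lo) +
	       (int64_t)ts.hi * INT64_C(0x100000000);
}
```

In this scheme a pre-1970 value such as -1 keeps hi == 0, so it reads back the same on a kernel that only knows the 32-bit field. The maximum is 511 * 2^31 - 1 seconds, roughly 34774 years after 1970, which is consistent with the year 36744 upper bound.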
Signed-off-by: Arnd Bergmann arnd@arndb.de --- fs/xfs/libxfs/xfs_format.h | 6 +++++- fs/xfs/libxfs/xfs_inode_buf.c | 28 ++++++++++++++++++++-------- fs/xfs/libxfs/xfs_inode_buf.h | 1 + fs/xfs/libxfs/xfs_log_format.h | 6 +++++- fs/xfs/libxfs/xfs_trans_inode.c | 3 ++- fs/xfs/xfs_inode.c | 3 ++- fs/xfs/xfs_inode_item.c | 10 +++++++--- fs/xfs/xfs_iops.c | 3 ++- fs/xfs/xfs_itable.c | 2 +- fs/xfs/xfs_super.c | 2 +- 10 files changed, 46 insertions(+), 18 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index c968b60cee15..dc8d160775fb 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -883,7 +883,11 @@ typedef struct xfs_dinode { __be64 di_lsn; /* flush sequence */ __be64 di_flags2; /* more random flags */ __be32 di_cowextsize; /* basic cow extent size for file */ - __u8 di_pad2[12]; /* more padding for future expansion */ + __u8 di_atime_hi; /* upper 8 bits of di_atime */ + __u8 di_mtime_hi; /* upper 8 bits of di_mtime */ + __u8 di_ctime_hi; /* upper 8 bits of di_ctime */ + __u8 di_crtime_hi; /* upper 8 bits of di_crtime */ + __u8 di_pad2[8]; /* more padding for future expansion */
/* fields only written to during inode creation */ xfs_timestamp_t di_crtime; /* time created */ diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 28ab3c5255e1..4989b6f1ac6f 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -228,16 +228,19 @@ xfs_inode_from_disk( to->di_flushiter = be16_to_cpu(from->di_flushiter);
/* - * Time is signed, so need to convert to signed 32 bit before - * storing in inode timestamp which may be 64 bit. Otherwise - * a time before epoch is converted to a time long after epoch - * on 64 bit systems. + * The supported time range starts at INT_MIN, corresponding to + * year 1902. With the traditional low 32 bits, this ends in + * year 2038, the extra 8 bits extend it by another 255 epochs + * of 136.1 years each, up to year 36744. */ - inode->i_atime.tv_sec = (int)be32_to_cpu(from->di_atime.t_sec); + inode->i_atime.tv_sec = be32_to_cpu(from->di_atime.t_sec) + + ((u64)from->di_atime_hi << 32); inode->i_atime.tv_nsec = (int)be32_to_cpu(from->di_atime.t_nsec); - inode->i_mtime.tv_sec = (int)be32_to_cpu(from->di_mtime.t_sec); + inode->i_mtime.tv_sec = (int)be32_to_cpu(from->di_mtime.t_sec) + + ((u64)from->di_mtime_hi << 32); inode->i_mtime.tv_nsec = (int)be32_to_cpu(from->di_mtime.t_nsec); - inode->i_ctime.tv_sec = (int)be32_to_cpu(from->di_ctime.t_sec); + inode->i_ctime.tv_sec = (int)be32_to_cpu(from->di_ctime.t_sec) + + ((u64)from->di_ctime_hi << 32); inode->i_ctime.tv_nsec = (int)be32_to_cpu(from->di_ctime.t_nsec); inode->i_generation = be32_to_cpu(from->di_gen); inode->i_mode = be16_to_cpu(from->di_mode); @@ -256,7 +259,8 @@ xfs_inode_from_disk( if (to->di_version == 3) { inode_set_iversion_queried(inode, be64_to_cpu(from->di_changecount)); - to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec); + to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec) + + ((u64)from->di_crtime_hi << 32); to->di_crtime.t_nsec = be32_to_cpu(from->di_crtime.t_nsec); to->di_flags2 = be64_to_cpu(from->di_flags2); to->di_cowextsize = be32_to_cpu(from->di_cowextsize); @@ -284,10 +288,13 @@ xfs_inode_to_disk(
memset(to->di_pad, 0, sizeof(to->di_pad)); to->di_atime.t_sec = cpu_to_be32(inode->i_atime.tv_sec); + to->di_atime_hi = upper_32_bits(inode->i_atime.tv_sec); to->di_atime.t_nsec = cpu_to_be32(inode->i_atime.tv_nsec); to->di_mtime.t_sec = cpu_to_be32(inode->i_mtime.tv_sec); + to->di_mtime_hi = upper_32_bits(inode->i_mtime.tv_sec); to->di_mtime.t_nsec = cpu_to_be32(inode->i_mtime.tv_nsec); to->di_ctime.t_sec = cpu_to_be32(inode->i_ctime.tv_sec); + to->di_ctime_hi = upper_32_bits(inode->i_ctime.tv_sec); to->di_ctime.t_nsec = cpu_to_be32(inode->i_ctime.tv_nsec); to->di_nlink = cpu_to_be32(inode->i_nlink); to->di_gen = cpu_to_be32(inode->i_generation); @@ -307,6 +314,7 @@ xfs_inode_to_disk( if (from->di_version == 3) { to->di_changecount = cpu_to_be64(inode_peek_iversion(inode)); to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec); + to->di_crtime_hi = upper_32_bits(from->di_crtime.t_sec); to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec); to->di_flags2 = cpu_to_be64(from->di_flags2); to->di_cowextsize = cpu_to_be32(from->di_cowextsize); @@ -338,10 +346,13 @@ xfs_log_dinode_to_disk( memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
to->di_atime.t_sec = cpu_to_be32(from->di_atime.t_sec); + to->di_atime_hi = from->di_atime_hi; to->di_atime.t_nsec = cpu_to_be32(from->di_atime.t_nsec); to->di_mtime.t_sec = cpu_to_be32(from->di_mtime.t_sec); + to->di_mtime_hi = from->di_mtime_hi; to->di_mtime.t_nsec = cpu_to_be32(from->di_mtime.t_nsec); to->di_ctime.t_sec = cpu_to_be32(from->di_ctime.t_sec); + to->di_ctime_hi = from->di_ctime_hi; to->di_ctime.t_nsec = cpu_to_be32(from->di_ctime.t_nsec);
to->di_size = cpu_to_be64(from->di_size); @@ -359,6 +370,7 @@ xfs_log_dinode_to_disk( if (from->di_version == 3) { to->di_changecount = cpu_to_be64(from->di_changecount); to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec); + to->di_crtime_hi = from->di_crtime_hi; to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec); to->di_flags2 = cpu_to_be64(from->di_flags2); to->di_cowextsize = cpu_to_be32(from->di_cowextsize); diff --git a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h index ab0f84165317..49556e1898da 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.h +++ b/fs/xfs/libxfs/xfs_inode_buf.h @@ -38,6 +38,7 @@ struct xfs_icdinode { uint32_t di_cowextsize; /* basic cow extent size for file */
xfs_ictimestamp_t di_crtime; /* time created */ + uint8_t di_crtime_hi; /* upper 8 bits of di_crtime */ };
/* diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index e5f97c69b320..c17e7c6511ff 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -414,7 +414,11 @@ struct xfs_log_dinode { xfs_lsn_t di_lsn; /* flush sequence */ uint64_t di_flags2; /* more random flags */ uint32_t di_cowextsize; /* basic cow extent size for file */ - uint8_t di_pad2[12]; /* more padding for future expansion */ + uint8_t di_atime_hi; /* upper 8 bits of di_atime */ + uint8_t di_mtime_hi; /* upper 8 bits of di_mtime */ + uint8_t di_ctime_hi; /* upper 8 bits of di_ctime */ + uint8_t di_crtime_hi; /* upper 8 bits of di_crtime */ + uint8_t di_pad2[8]; /* more padding for future expansion */
/* fields only written to during inode creation */ xfs_ictimestamp_t di_crtime; /* time created */ diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c index a9ad90926b87..419356eec52c 100644 --- a/fs/xfs/libxfs/xfs_trans_inode.c +++ b/fs/xfs/libxfs/xfs_trans_inode.c @@ -67,7 +67,8 @@ xfs_trans_ichgtime( if (flags & XFS_ICHGTIME_CHG) inode->i_ctime = tv; if (flags & XFS_ICHGTIME_CREATE) { - ip->i_d.di_crtime.t_sec = (int32_t)tv.tv_sec; + ip->i_d.di_crtime.t_sec = lower_32_bits(tv.tv_sec); + ip->i_d.di_crtime_hi = upper_32_bits(tv.tv_sec); ip->i_d.di_crtime.t_nsec = (int32_t)tv.tv_nsec; } } diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 18f4b262e61c..c0d9d568ea4f 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -845,7 +845,8 @@ xfs_ialloc( inode_set_iversion(inode, 1); ip->i_d.di_flags2 = 0; ip->i_d.di_cowextsize = 0; - ip->i_d.di_crtime.t_sec = (int32_t)tv.tv_sec; + ip->i_d.di_crtime.t_sec = lower_32_bits(tv.tv_sec); + ip->i_d.di_crtime_hi = upper_32_bits(tv.tv_sec); ip->i_d.di_crtime.t_nsec = (int32_t)tv.tv_nsec; }
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index bb8f076805b9..338188a5a698 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -314,11 +314,14 @@ xfs_inode_to_log_dinode(
memset(to->di_pad, 0, sizeof(to->di_pad)); memset(to->di_pad3, 0, sizeof(to->di_pad3)); - to->di_atime.t_sec = inode->i_atime.tv_sec; + to->di_atime.t_sec = lower_32_bits(inode->i_atime.tv_sec); + to->di_atime_hi = upper_32_bits(inode->i_atime.tv_sec); to->di_atime.t_nsec = inode->i_atime.tv_nsec; - to->di_mtime.t_sec = inode->i_mtime.tv_sec; + to->di_mtime.t_sec = lower_32_bits(inode->i_mtime.tv_sec); + to->di_mtime_hi = upper_32_bits(inode->i_mtime.tv_sec); to->di_mtime.t_nsec = inode->i_mtime.tv_nsec; - to->di_ctime.t_sec = inode->i_ctime.tv_sec; + to->di_ctime.t_sec = lower_32_bits(inode->i_ctime.tv_sec); + to->di_ctime_hi = upper_32_bits(inode->i_ctime.tv_sec); to->di_ctime.t_nsec = inode->i_ctime.tv_nsec; to->di_nlink = inode->i_nlink; to->di_gen = inode->i_generation; @@ -341,6 +344,7 @@ xfs_inode_to_log_dinode( if (from->di_version == 3) { to->di_changecount = inode_peek_iversion(inode); to->di_crtime.t_sec = from->di_crtime.t_sec; + to->di_crtime_hi = from->di_crtime_hi; to->di_crtime.t_nsec = from->di_crtime.t_nsec; to->di_flags2 = from->di_flags2; to->di_cowextsize = from->di_cowextsize; diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index fe285d123d69..72d40ae1e91f 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -516,7 +516,8 @@ xfs_vn_getattr( if (ip->i_d.di_version == 3) { if (request_mask & STATX_BTIME) { stat->result_mask |= STATX_BTIME; - stat->btime.tv_sec = ip->i_d.di_crtime.t_sec; + stat->btime.tv_sec = ip->i_d.di_crtime.t_sec + + ((u64)ip->i_d.di_crtime_hi << 32); stat->btime.tv_nsec = ip->i_d.di_crtime.t_nsec; } } diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index 884950adbd16..ea4bf4475727 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -97,7 +97,7 @@ xfs_bulkstat_one_int( buf->bs_mtime_nsec = inode->i_mtime.tv_nsec; buf->bs_ctime = inode->i_ctime.tv_sec; buf->bs_ctime_nsec = inode->i_ctime.tv_nsec; - buf->bs_btime = dic->di_crtime.t_sec; + buf->bs_btime = dic->di_crtime.t_sec + ((u64)dic->di_crtime_hi << 
32); buf->bs_btime_nsec = dic->di_crtime.t_nsec; buf->bs_gen = inode->i_generation; buf->bs_mode = inode->i_mode; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 8d1df9f8be07..2adfe1039693 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1665,7 +1665,7 @@ xfs_fs_fill_super( sb->s_max_links = XFS_MAXLINK; sb->s_time_gran = 1; sb->s_time_min = S32_MIN; - sb->s_time_max = S32_MAX; + sb->s_time_max = S32_MAX + 255 * 0x100000000ull; sb->s_iflags |= SB_I_CGROUPWB;
set_posix_acl_flag(sb);
Amir just sent another patch dealing with the time stamps. I'd suggest you chime in on the discussion in that thread.
On Tue, Nov 12, 2019 at 4:16 PM Christoph Hellwig hch@lst.de wrote:
Amir just sent another patch dealing with the time stamps. I'd suggest you chime in on the discussion in that thread.
That's right, I just posted the ext4-style extension to 34 bits yesterday [1], but I like your version so much better, so I will withdraw mine.
Sorry I did not CC you, Deepa, or the y2038 list. I did not think you were going to actually deal with specific filesystems.
I'd also like to hear people's thoughts about the migration process. Should the new feature be ro_compat as I defined it or incompat?
If all agree that Arnd's format change is preferred, I can assist with xfsprogs patches, tests or whatnot.
Thanks, Amir.
[1] https://lore.kernel.org/linux-xfs/20191112082524.GA18779@infradead.org/T/#mf...
On Tue, Nov 12, 2019 at 4:02 PM Amir Goldstein amir73il@gmail.com wrote:
On Tue, Nov 12, 2019 at 4:16 PM Christoph Hellwig hch@lst.de wrote:
Amir just sent another patch dealing with the time stamps. I'd suggest you chime in on the discussion in that thread.
That's right, I just posted the ext4-style extension to 34 bits yesterday [1], but I like your version so much better, so I will withdraw mine.
Thanks! I guess we probably want part of both in the end. I considered adding wrappers for encoding and decoding the timestamp like yours but in the end went with open-coding that part. The difference is pretty minimal, should we leave it with the open-coded addition?
One part that I was missing (as described in my changelog) is any versioning or feature flag, including the dynamically set sb->s_time_max value. From looking at your patch, I guess this is something we want.
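For what it's worth, a feature-gated limit could look roughly like the sketch below. This is a user-space model only: demo_time_max(), demo_clamp_time() and the has_ext_timestamps parameter are hypothetical names, and how the flag would actually be stored in the superblock is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TIME_MAX_32 ((int64_t)INT32_MAX)
#define DEMO_TIME_MAX_40 ((int64_t)INT32_MAX + 255 * INT64_C(0x100000000))

/* Upper timestamp limit, depending on a hypothetical feature flag. */
static int64_t demo_time_max(bool has_ext_timestamps)
{
	int64_t max = has_ext_timestamps ? DEMO_TIME_MAX_40 : DEMO_TIME_MAX_32;

	assert(max >= DEMO_TIME_MAX_32);
	return max;
}

/* Clamp a seconds value the way a per-superblock limit would. */
static int64_t demo_clamp_time(int64_t sec, bool has_ext_timestamps)
{
	int64_t max = demo_time_max(has_ext_timestamps);

	if (sec > max)
		return max;
	if (sec < (int64_t)INT32_MIN)
		return (int64_t)INT32_MIN;
	return sec;
}
```

Without the flag, a value past 2038 is clamped to S32_MAX; with it, the same value passes through unchanged up to the 40-bit limit.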
Sorry I did not CC you, Deepa, or the y2038 list. I did not think you were going to actually deal with specific filesystems.
I was not going to, but in the process of cleaning up the remaining ioctls I came to xfs last week and thought it would be silly to extend the uapi but not the file system ;-)
I'd also like to hear people's thoughts about the migration process. Should the new feature be ro_compat as I defined it or incompat?
If all agree that Arnd's format change is preferred, I can assist with xfsprogs patches, tests or whatnot.
Awesome, that would be great indeed!
Arnd
On Tue, Nov 12, 2019 at 01:09:09PM +0100, Arnd Bergmann wrote:
XFS is the only major file system that lacks timestamps beyond year 2038, and is already being deployed in systems that may have to be supported beyond that time.
Fortunately, the inode format still has a few reserved bits that can be used to extend the current format. There are two bits in the nanosecond portion that could be used in the same way that ext4 does, extending the timestamps until year 2378, as well as 12 unused bytes after the already allocated fields.
There are four timestamps that need to be extended, so using four bytes out of the reserved space gets us all the way until year 36744, by extending the current 1902-2038 range with another 255 epochs, which seems to be a reasonable range.
I am not sure whether this change to the inode format requires a new version for the inode. All existing file system images remain compatible, while mounting a file system with extended timestamps beyond 2038 on an older kernel would report that timestamp incorrectly in the 1902 through 2038 range, matching the traditional Linux behavior of wrapping timestamps.
Signed-off-by: Arnd Bergmann arnd@arndb.de
This is basically what I proposed ~5 years or so ago and posted a patch to implement it in an early y2038 discussion with you. I just mentioned that very patch in my response to Amir's timestamp extension patchset, pointing out that this isn't the way we want to proceed with >y2038 on-disk support.
https://lore.kernel.org/linux-xfs/20191112161242.GA19334@infradead.org/T/#ma...
I'd suggest taking the discussion there....
Cheers,
Dave.
The quota handling in xfs is based around an in-memory representation of time_t, which overflows in year 2038 on 32-bit architectures, and an on-disk representation of __be32, which overflows in year 2106 based on interpreting the values as unsigned.
Extend both to allow for much longer times: the in-memory representation should just use time64_t and the on-disk representation has to live with the spare bits in struct xfs_disk_dquot. As there is an unused 32-bit field, and three time limits in it, allocating 8 bits per timeout seems appropriate.
Note: the quotactl() syscall is not affected by this, it has its own struct fs_disk_quota that may need a similar conversion.
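The split of a quota timer into the existing 32-bit field plus 8 extra bits can be sketched in user space as follows. The demo_* names are hypothetical, and the sketch keeps values in host byte order, whereas the on-disk d_btimer and friends are __be32 and go through cpu_to_be32()/be32_to_cpu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest value that fits in 32 + 8 bits. */
#define DEMO_TIMER_MAX ((UINT64_C(1) << 40) - 1)

/* Hypothetical model of one timer in struct xfs_disk_dquot. */
struct demo_dq_timer {
	uint32_t lo;	/* traditional 32-bit timer field */
	uint8_t  high;	/* upper 8 bits taken from the old d_pad0 */
};

static struct demo_dq_timer demo_timer_split(uint64_t t)
{
	struct demo_dq_timer d;

	assert(t <= DEMO_TIMER_MAX);
	d.lo = (uint32_t)(t & UINT32_C(0xffffffff));
	d.high = (uint8_t)(t >> 32);
	return d;
}

static uint64_t demo_timer_join(struct demo_dq_timer d)
{
	return (uint64_t)d.lo + ((uint64_t)d.high << 32);
}

/* A timer counts as unset only if both parts are zero. */
static bool demo_timer_is_set(struct demo_dq_timer d)
{
	return d.lo != 0 || d.high != 0;
}
```

A timer of exactly 2^32 seconds has a zero low word but is still set, which is why the checks in the patch test both the 32-bit field and the new high byte.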
Signed-off-by: Arnd Bergmann arnd@arndb.de --- fs/xfs/libxfs/xfs_dquot_buf.c | 6 +++--- fs/xfs/libxfs/xfs_format.h | 5 ++++- fs/xfs/xfs_dquot.c | 29 ++++++++++++++++++++--------- fs/xfs/xfs_qm.c | 18 ++++++++++++------ fs/xfs/xfs_qm.h | 6 +++--- fs/xfs/xfs_qm_syscalls.c | 16 +++++++++++----- fs/xfs/xfs_quotaops.c | 6 +++--- fs/xfs/xfs_trans_dquot.c | 17 +++++++++++------ 8 files changed, 67 insertions(+), 36 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_dquot_buf.c b/fs/xfs/libxfs/xfs_dquot_buf.c index e8bd688a4073..ee59c539f9ab 100644 --- a/fs/xfs/libxfs/xfs_dquot_buf.c +++ b/fs/xfs/libxfs/xfs_dquot_buf.c @@ -75,17 +75,17 @@ xfs_dquot_verify(
if (ddq->d_blk_softlimit && be64_to_cpu(ddq->d_bcount) > be64_to_cpu(ddq->d_blk_softlimit) && - !ddq->d_btimer) + !ddq->d_btimer && !ddq->d_btimer_high) return __this_address;
if (ddq->d_ino_softlimit && be64_to_cpu(ddq->d_icount) > be64_to_cpu(ddq->d_ino_softlimit) && - !ddq->d_itimer) + !ddq->d_itimer && !ddq->d_itimer_high) return __this_address;
if (ddq->d_rtb_softlimit && be64_to_cpu(ddq->d_rtbcount) > be64_to_cpu(ddq->d_rtb_softlimit) && - !ddq->d_rtbtimer) + !ddq->d_rtbtimer && !ddq->d_rtbtimer_high) return __this_address;
return NULL; diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index dc8d160775fb..83bd5166c0ee 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1168,7 +1168,10 @@ typedef struct xfs_disk_dquot { __be32 d_btimer; /* similar to above; for disk blocks */ __be16 d_iwarns; /* warnings issued wrt num inodes */ __be16 d_bwarns; /* warnings issued wrt disk blocks */ - __be32 d_pad0; /* 64 bit align */ + __u8 d_itimer_high; /* upper bits of d_itimer */ + __u8 d_btimer_high; /* upper bits of d_btimer */ + __u8 d_rtbtimer_high;/* upper bits of d_rtbtimer */ + __u8 d_pad0; /* 64 bit align */ __be64 d_rtb_hardlimit;/* absolute limit on realtime blks */ __be64 d_rtb_softlimit;/* preferred limit on RT disk blks */ __be64 d_rtbcount; /* realtime blocks owned */ diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index aeb95e7391c1..15b5a339f6df 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -116,6 +116,8 @@ xfs_qm_adjust_dqtimers( xfs_mount_t *mp, xfs_disk_dquot_t *d) { + time64_t timer; + ASSERT(d->d_id);
#ifdef DEBUG @@ -130,15 +132,17 @@ xfs_qm_adjust_dqtimers( be64_to_cpu(d->d_rtb_hardlimit)); #endif
- if (!d->d_btimer) { + if (!d->d_btimer && !d->d_btimer_high) { if ((d->d_blk_softlimit && (be64_to_cpu(d->d_bcount) > be64_to_cpu(d->d_blk_softlimit))) || (d->d_blk_hardlimit && (be64_to_cpu(d->d_bcount) > be64_to_cpu(d->d_blk_hardlimit)))) { - d->d_btimer = cpu_to_be32(get_seconds() + - mp->m_quotainfo->qi_btimelimit); + timer = ktime_get_real_seconds() + + mp->m_quotainfo->qi_btimelimit; + d->d_btimer = cpu_to_be32(lower_32_bits(timer)); + d->d_btimer_high = (u8)upper_32_bits(timer); } else { d->d_bwarns = 0; } @@ -150,18 +154,21 @@ xfs_qm_adjust_dqtimers( (be64_to_cpu(d->d_bcount) <= be64_to_cpu(d->d_blk_hardlimit)))) { d->d_btimer = 0; + d->d_btimer_high = 0; } }
- if (!d->d_itimer) { + if (!d->d_itimer && !d->d_itimer_high) { if ((d->d_ino_softlimit && (be64_to_cpu(d->d_icount) > be64_to_cpu(d->d_ino_softlimit))) || (d->d_ino_hardlimit && (be64_to_cpu(d->d_icount) > be64_to_cpu(d->d_ino_hardlimit)))) { - d->d_itimer = cpu_to_be32(get_seconds() + - mp->m_quotainfo->qi_itimelimit); + timer = ktime_get_real_seconds() + + mp->m_quotainfo->qi_itimelimit; + d->d_itimer = cpu_to_be32(lower_32_bits(timer)); + d->d_itimer_high = (u8)upper_32_bits(timer); } else { d->d_iwarns = 0; } @@ -173,18 +180,21 @@ xfs_qm_adjust_dqtimers( (be64_to_cpu(d->d_icount) <= be64_to_cpu(d->d_ino_hardlimit)))) { d->d_itimer = 0; + d->d_itimer_high = 0; } }
- if (!d->d_rtbtimer) { + if (!d->d_rtbtimer && !d->d_rtbtimer_high) { if ((d->d_rtb_softlimit && (be64_to_cpu(d->d_rtbcount) > be64_to_cpu(d->d_rtb_softlimit))) || (d->d_rtb_hardlimit && (be64_to_cpu(d->d_rtbcount) > be64_to_cpu(d->d_rtb_hardlimit)))) { - d->d_rtbtimer = cpu_to_be32(get_seconds() + - mp->m_quotainfo->qi_rtbtimelimit); + timer = ktime_get_real_seconds() + + mp->m_quotainfo->qi_rtbtimelimit; + d->d_rtbtimer = cpu_to_be32(lower_32_bits(timer)); + d->d_rtbtimer_high = (u8)upper_32_bits(timer); } else { d->d_rtbwarns = 0; } @@ -196,6 +206,7 @@ xfs_qm_adjust_dqtimers( (be64_to_cpu(d->d_rtbcount) <= be64_to_cpu(d->d_rtb_hardlimit)))) { d->d_rtbtimer = 0; + d->d_rtbtimer_high = 0; } } } diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index ecd8ce152ab1..afd0384850f9 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -613,12 +613,15 @@ xfs_qm_init_timelimits( * a user or group before he or she can not perform any * more writing. If it is zero, a default is used. */ - if (ddqp->d_btimer) - qinf->qi_btimelimit = be32_to_cpu(ddqp->d_btimer); - if (ddqp->d_itimer) - qinf->qi_itimelimit = be32_to_cpu(ddqp->d_itimer); - if (ddqp->d_rtbtimer) - qinf->qi_rtbtimelimit = be32_to_cpu(ddqp->d_rtbtimer); + if (ddqp->d_btimer || ddqp->d_btimer_high) + qinf->qi_btimelimit = be32_to_cpu(ddqp->d_btimer) + + ((u64)ddqp->d_btimer_high << 32); + if (ddqp->d_itimer || ddqp->d_itimer_high) + qinf->qi_itimelimit = be32_to_cpu(ddqp->d_itimer) + + ((u64)ddqp->d_itimer_high << 32); + if (ddqp->d_rtbtimer || ddqp->d_rtbtimer_high) + qinf->qi_rtbtimelimit = be32_to_cpu(ddqp->d_rtbtimer) + + ((u64)ddqp->d_rtbtimer_high << 32); if (ddqp->d_bwarns) qinf->qi_bwarnlimit = be16_to_cpu(ddqp->d_bwarns); if (ddqp->d_iwarns) @@ -867,8 +870,11 @@ xfs_qm_reset_dqcounts( ddq->d_icount = 0; ddq->d_rtbcount = 0; ddq->d_btimer = 0; + ddq->d_btimer_high = 0; ddq->d_itimer = 0; + ddq->d_itimer_high = 0; ddq->d_rtbtimer = 0; + ddq->d_rtbtimer_high = 0; ddq->d_bwarns = 0; ddq->d_iwarns = 0; 
 		ddq->d_rtbwarns = 0;
diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h
index b41b75089548..4742686d522e 100644
--- a/fs/xfs/xfs_qm.h
+++ b/fs/xfs/xfs_qm.h
@@ -64,9 +64,9 @@ typedef struct xfs_quotainfo {
 	struct xfs_inode	*qi_pquotaip;	/* project quota inode */
 	struct list_lru	 qi_lru;
 	int		 qi_dquots;
-	time_t		 qi_btimelimit;	 /* limit for blks timer */
-	time_t		 qi_itimelimit;	 /* limit for inodes timer */
-	time_t		 qi_rtbtimelimit;/* limit for rt blks timer */
+	time64_t	 qi_btimelimit;	 /* limit for blks timer */
+	time64_t	 qi_itimelimit;	 /* limit for inodes timer */
+	time64_t	 qi_rtbtimelimit;/* limit for rt blks timer */
 	xfs_qwarncnt_t	 qi_bwarnlimit;	 /* limit for blks warnings */
 	xfs_qwarncnt_t	 qi_iwarnlimit;	 /* limit for inodes warnings */
 	xfs_qwarncnt_t	 qi_rtbwarnlimit;/* limit for rt blks warnings */
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index da7ad0383037..8d7d075d7779 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -500,15 +500,18 @@ xfs_qm_scall_setqlim(
 	 */
 	if (newlim->d_fieldmask & QC_SPC_TIMER) {
 		q->qi_btimelimit = newlim->d_spc_timer;
-		ddq->d_btimer = cpu_to_be32(newlim->d_spc_timer);
+		ddq->d_btimer = cpu_to_be32(lower_32_bits(newlim->d_spc_timer));
+		ddq->d_btimer_high = (u8)upper_32_bits(newlim->d_spc_timer);
 	}
 	if (newlim->d_fieldmask & QC_INO_TIMER) {
 		q->qi_itimelimit = newlim->d_ino_timer;
-		ddq->d_itimer = cpu_to_be32(newlim->d_ino_timer);
+		ddq->d_itimer = cpu_to_be32(lower_32_bits(newlim->d_ino_timer));
+		ddq->d_itimer_high = (u8)upper_32_bits(newlim->d_ino_timer);
 	}
 	if (newlim->d_fieldmask & QC_RT_SPC_TIMER) {
 		q->qi_rtbtimelimit = newlim->d_rt_spc_timer;
 		ddq->d_rtbtimer = cpu_to_be32(newlim->d_rt_spc_timer);
+		ddq->d_rtbtimer_high = (u8)upper_32_bits(newlim->d_rt_spc_timer);
 	}
 	if (newlim->d_fieldmask & QC_SPC_WARNS)
 		q->qi_bwarnlimit = newlim->d_spc_warns;
@@ -623,8 +626,10 @@ xfs_qm_scall_getquota_fill_qc(
 	dst->d_ino_softlimit = be64_to_cpu(dqp->q_core.d_ino_softlimit);
 	dst->d_space = XFS_FSB_TO_B(mp, dqp->q_res_bcount);
 	dst->d_ino_count = dqp->q_res_icount;
-	dst->d_spc_timer = be32_to_cpu(dqp->q_core.d_btimer);
-	dst->d_ino_timer = be32_to_cpu(dqp->q_core.d_itimer);
+	dst->d_spc_timer = be32_to_cpu(dqp->q_core.d_btimer) +
+				((u64)dqp->q_core.d_btimer_high << 32);
+	dst->d_ino_timer = be32_to_cpu(dqp->q_core.d_itimer) +
+				((u64)dqp->q_core.d_itimer_high << 32);
 	dst->d_ino_warns = be16_to_cpu(dqp->q_core.d_iwarns);
 	dst->d_spc_warns = be16_to_cpu(dqp->q_core.d_bwarns);
 	dst->d_rt_spc_hardlimit =
@@ -632,7 +637,8 @@ xfs_qm_scall_getquota_fill_qc(
 	dst->d_rt_spc_softlimit =
 		XFS_FSB_TO_B(mp, be64_to_cpu(dqp->q_core.d_rtb_softlimit));
 	dst->d_rt_space = XFS_FSB_TO_B(mp, dqp->q_res_rtbcount);
-	dst->d_rt_spc_timer = be32_to_cpu(dqp->q_core.d_rtbtimer);
+	dst->d_rt_spc_timer = be32_to_cpu(dqp->q_core.d_rtbtimer) +
+				((u64)dqp->q_core.d_rtbtimer_high << 32);
 	dst->d_rt_spc_warns = be16_to_cpu(dqp->q_core.d_rtbwarns);
 	/*
diff --git a/fs/xfs/xfs_quotaops.c b/fs/xfs/xfs_quotaops.c
index cd6c7210a373..96c3818b27ad 100644
--- a/fs/xfs/xfs_quotaops.c
+++ b/fs/xfs/xfs_quotaops.c
@@ -37,9 +37,9 @@ xfs_qm_fill_state(
 	tstate->flags |= QCI_SYSFILE;
 	tstate->blocks = ip->i_d.di_nblocks;
 	tstate->nextents = ip->i_d.di_nextents;
-	tstate->spc_timelimit = q->qi_btimelimit;
-	tstate->ino_timelimit = q->qi_itimelimit;
-	tstate->rt_spc_timelimit = q->qi_rtbtimelimit;
+	tstate->spc_timelimit = (u32)q->qi_btimelimit;
+	tstate->ino_timelimit = (u32)q->qi_itimelimit;
+	tstate->rt_spc_timelimit = (u32)q->qi_rtbtimelimit;
 	tstate->spc_warnlimit = q->qi_bwarnlimit;
 	tstate->ino_warnlimit = q->qi_iwarnlimit;
 	tstate->rt_spc_warnlimit = q->qi_rtbwarnlimit;
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index 16457465833b..6efca54e0edb 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -580,7 +580,7 @@ xfs_trans_dqresv(
 {
 	xfs_qcnt_t		hardlimit;
 	xfs_qcnt_t		softlimit;
-	time_t			timer;
+	time64_t		timer;
 	xfs_qwarncnt_t		warns;
 	xfs_qwarncnt_t		warnlimit;
 	xfs_qcnt_t		total_count;
@@ -600,7 +600,8 @@ xfs_trans_dqresv(
 		softlimit = be64_to_cpu(dqp->q_core.d_blk_softlimit);
 		if (!softlimit)
 			softlimit = defq->bsoftlimit;
-		timer = be32_to_cpu(dqp->q_core.d_btimer);
+		timer = be32_to_cpu(dqp->q_core.d_btimer) +
+			((u64)dqp->q_core.d_btimer_high << 32);
 		warns = be16_to_cpu(dqp->q_core.d_bwarns);
 		warnlimit = dqp->q_mount->m_quotainfo->qi_bwarnlimit;
 		resbcountp = &dqp->q_res_bcount;
@@ -612,7 +613,8 @@ xfs_trans_dqresv(
 		softlimit = be64_to_cpu(dqp->q_core.d_rtb_softlimit);
 		if (!softlimit)
 			softlimit = defq->rtbsoftlimit;
-		timer = be32_to_cpu(dqp->q_core.d_rtbtimer);
+		timer = be32_to_cpu(dqp->q_core.d_rtbtimer) +
+			((u64)dqp->q_core.d_rtbtimer_high << 32);
 		warns = be16_to_cpu(dqp->q_core.d_rtbwarns);
 		warnlimit = dqp->q_mount->m_quotainfo->qi_rtbwarnlimit;
 		resbcountp = &dqp->q_res_rtbcount;
@@ -635,7 +637,8 @@ xfs_trans_dqresv(
 				goto error_return;
 			}
 			if (softlimit && total_count > softlimit) {
-				if ((timer != 0 && get_seconds() > timer) ||
+				if ((timer != 0 &&
+				     ktime_get_real_seconds() > timer) ||
 				    (warns != 0 && warns >= warnlimit)) {
 					xfs_quota_warn(mp, dqp,
 						       QUOTA_NL_BSOFTLONGWARN);
@@ -647,7 +650,8 @@ xfs_trans_dqresv(
 		}
 		if (ninos > 0) {
 			total_count = be64_to_cpu(dqp->q_core.d_icount) + ninos;
-			timer = be32_to_cpu(dqp->q_core.d_itimer);
+			timer = be32_to_cpu(dqp->q_core.d_itimer) +
+				((u64)dqp->q_core.d_itimer_high << 32);
 			warns = be16_to_cpu(dqp->q_core.d_iwarns);
 			warnlimit = dqp->q_mount->m_quotainfo->qi_iwarnlimit;
 			hardlimit = be64_to_cpu(dqp->q_core.d_ino_hardlimit);
@@ -662,7 +666,8 @@ xfs_trans_dqresv(
 				goto error_return;
 			}
 			if (softlimit && total_count > softlimit) {
-				if ((timer != 0 && get_seconds() > timer) ||
+				if ((timer != 0 &&
+				     ktime_get_real_seconds() > timer) ||
 				     (warns != 0 && warns >= warnlimit)) {
 					xfs_quota_warn(mp, dqp,
 						       QUOTA_NL_ISOFTLONGWARN);
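
Note for reviewers: the hunks above repeat one pattern, splitting a 64-bit seconds value into the existing 32-bit timer field plus a new 8-bit "_high" field, and recombining the two on read. The userspace sketch below shows only that arithmetic. It is not part of the patch: struct timer40 and the timer40_*() helpers are hypothetical names, the big-endian conversion (cpu_to_be32/be32_to_cpu) is left out, and it assumes the high field is an unsigned 8-bit value, as the (u8) casts suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one d_*timer / d_*timer_high field pair. */
struct timer40 {
	uint32_t lo;	/* low 32 bits, the existing d_*timer field */
	uint8_t  hi;	/* bits 32..39, the new d_*timer_high field */
};

/* Split a seconds value, mirroring lower_32_bits() and (u8)upper_32_bits(). */
static struct timer40 timer40_pack(int64_t seconds)
{
	struct timer40 t;

	/* Only values in [0, 2^40) survive the round trip in this sketch. */
	assert(seconds >= 0 && seconds < ((int64_t)1 << 40));
	t.lo = (uint32_t)seconds;
	t.hi = (uint8_t)((uint64_t)seconds >> 32);
	return t;
}

/* Recombine, mirroring be32_to_cpu(lo) + ((u64)hi << 32) in the patch. */
static int64_t timer40_unpack(struct timer40 t)
{
	return (int64_t)(t.lo + ((uint64_t)t.hi << 32));
}

/* A timer counts as unset only when both halves are zero. */
static int timer40_is_unset(struct timer40 t)
{
	return !t.lo && !t.hi;
}
```

The last helper is the reason the "is the timer running" tests now check both fields: a value such as exactly 2^32 seconds has a zero low word but a non-zero high byte, and testing only the low word would treat it as unset.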