This is the basic framework for the 64 bit time migration for filesystems. There might be some changes or additions required as I start adapting to filesystems.
This gives the basic high level concept so that we can start discussing.
Actual changes to vfs and other file systems will be in a separate series.
Changes since v1: * struct inode_time added unconditionally * uniform use of CONFIG_FS_USES_64BIT_TIME
Deepa Dinamani (4): fs: vfs: add accessors for inode times fs: Add new data type for inode times fs: Add support for 64 bit time fs: macros and functions support 64 bit time
include/linux/fs.h | 42 +++++++++++++++++++++++++++++++++++------- include/linux/stat.h | 12 +++++++++--- include/linux/time.h | 2 ++ include/linux/time64.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ kernel/time/time.c | 28 ++++++++++++++++++++++++++++ 5 files changed, 124 insertions(+), 10 deletions(-)
This is in preparation for the y2038 changes for the vfs. vfs inode timestamps overflow after 2038.
This abstracts the timestamp representation so that any logic to convert between the struct inode timestamps and other interfaces can be placed here.
The plan is to globally change all references to these types through macros only. So when the actually internal representation changes, it will be transparent everywhere outside of the macros.
The internal representation changes will support larger timestamp range.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com --- include/linux/fs.h | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h index 14ffad4..ab8799c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -679,6 +679,11 @@ struct inode { void *i_private; /* fs or device private pointer */ };
+#define FS_INODE_SET_XTIME(xtime, inode, ts) \ + ((inode)->i_##xtime = (ts)) +#define FS_INODE_GET_XTIME(xtime, inode) \ + ((inode)->i_##xtime) + static inline int inode_unhashed(struct inode *inode) { return hlist_unhashed(&inode->i_hash);
On Sunday 06 December 2015 22:04:03 Deepa Dinamani wrote:
This is in preparation for the y2038 changes for the vfs. vfs inode timestamps overflow after 2038.
This abstracts the timestamp representation so that any logic to convert between the struct inode timestamps and other interfaces can be placed here.
The plan is to globally change all references to these types through macros only. So when the actually internal representation changes, it will be transparent everywhere outside of the macros.
The internal representation changes will support larger timestamp range.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Hi Deepa,
Can you give some more background how you plan to use these macros? With the current version, you seem to depend on the 'ts' argument to be the same type as i_##xtime, so we'd still have to change the all at the same time, right?
Arnd
The current representation of inode times in struct inode: struct timespec is not y2038 safe.
The 64 bit counterpart of struct timespec: struct timespec64 suffers from the shortcoming that the data type sizes are different on 32 bit and 64 bit systems.
Introduce a new struct inode_time to overcome the above limitations.
Also add time conversion api's between struct timespec64 and struct inode_time. This is required as the 64-bit time functions typically return struct timespec64 types.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com --- include/linux/time64.h | 45 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+)
diff --git a/include/linux/time64.h b/include/linux/time64.h index 367d5af..bb574b8 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -13,6 +13,16 @@ typedef __s64 time64_t; #if __BITS_PER_LONG == 64 # define timespec64 timespec #define itimerspec64 itimerspec + +/* + * Internal kernel representation of inode time fields. + * This structure is not exposed to userspace. + * Use struct timespec64 representation for all userspace. + */ +struct inode_time { + time64_t tv_sec; + s32 tv_nsec; +} __aligned(4) __packed; #else struct timespec64 { time64_t tv_sec; /* seconds */ @@ -24,6 +34,7 @@ struct itimerspec64 { struct timespec64 it_value; };
+#define inode_time timespec64 #endif
/* Parameters used to convert the timespec values: */ @@ -42,6 +53,28 @@ struct itimerspec64 {
#if __BITS_PER_LONG == 64
+static inline struct inode_time +timespec64_to_inode_time(const struct timespec64 ts64) +{ + struct inode_time ret; + + ret.tv_sec = ts64.tv_sec; + ret.tv_nsec = ts64.tv_nsec; + + return ret; +} + +static inline struct timespec64 +inode_time_to_timespec64(const struct inode_time itime) +{ + struct timespec64 ret; + + ret.tv_sec = itime.tv_sec; + ret.tv_nsec = itime.tv_nsec; + + return ret; +} + static inline struct timespec timespec64_to_timespec(const struct timespec64 ts64) { return ts64; @@ -76,6 +109,18 @@ static inline struct itimerspec64 itimerspec_to_itimerspec64(struct itimerspec *
#else
+static inline struct inode_time +timespec64_to_inode_time(const struct timespec64 ts64) +{ + return ts64; +} + +static inline struct timespec64 +inode_time_to_timespec64(const struct inode_time itime) +{ + return itime; +} + static inline struct timespec timespec64_to_timespec(const struct timespec64 ts64) { struct timespec ret;
On Sunday 06 December 2015 22:04:04 Deepa Dinamani wrote:
The current representation of inode times in struct inode: struct timespec is not y2038 safe.
The 64 bit counterpart of struct timespec: struct timespec64 suffers from the shortcoming that the data type sizes are different on 32 bit and 64 bit systems.
Introduce a new struct inode_time to overcome the above limitations.
Also add time conversion api's between struct timespec64 and struct inode_time. This is required as the 64-bit time functions typically return struct timespec64 types.
I was hoping we could avoid adding another time type for the file systems. What is the problem with the different sizes in timespec64? Only the nanosecond portion is different, and we sanitize the values in there when coming from user space so we are guaranteed that the nanoseconds are zero or a positive less than 1 billion, so it always fits in either one.
Arnd
The current representation of inode times in struct inode, struct iattr, struct kstat: struct timespec are not y2038 safe.
Add provision to convert them to use 64 bit times in the future.
struct inode uses struct inode time to maintain same size for times across 32 bit and 64 bit architectures. structs iattr and kstat have no such requirement since these data types are only used to pass values in and out of vfs layer.
In addition, inode_time is defined as packed and aligned to a 4 byte boundary to make the structure use 12 bytes each instead of 16 bytes. This will help save RAM space as inode structure is cached in memory. The other structures are transient and such a change would not be beneficial.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com --- include/linux/fs.h | 33 ++++++++++++++++++++++++++------- include/linux/stat.h | 12 +++++++++--- 2 files changed, 35 insertions(+), 10 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h index ab8799c..c56068f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -245,13 +245,19 @@ typedef void (dax_iodone_t)(struct buffer_head *bh_map, int uptodate); */ struct iattr { unsigned int ia_valid; - umode_t ia_mode; - kuid_t ia_uid; - kgid_t ia_gid; - loff_t ia_size; - struct timespec ia_atime; - struct timespec ia_mtime; - struct timespec ia_ctime; + umode_t ia_mode; + kuid_t ia_uid; + kgid_t ia_gid; + loff_t ia_size; +#ifdef CONFIG_FS_USES_64BIT_TIME + struct timespec64 ia_atime; + struct timespec64 ia_mtime; + struct timespec64 ia_ctime; +#else + struct timespec ia_atime; + struct timespec ia_mtime; + struct timespec ia_ctime; +#endif
/* * Not an attribute, but an auxiliary info for filesystems wanting to @@ -616,9 +622,15 @@ struct inode { }; dev_t i_rdev; loff_t i_size; +#ifdef CONFIG_FS_USES_64BIT_TIME + struct inode_time i_atime; + struct inode_time i_mtime; + struct inode_time i_ctime; +#else struct timespec i_atime; struct timespec i_mtime; struct timespec i_ctime; +#endif spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ unsigned short i_bytes; unsigned int i_blkbits; @@ -679,10 +691,17 @@ struct inode { void *i_private; /* fs or device private pointer */ };
+#ifdef CONFIG_FS_USES_64BIT_TIME +#define FS_INODE_SET_XTIME(xtime, inode, ts64) \ + ((inode)->i_##xtime = timespec64_to_inode_time(ts64)) +#define FS_INODE_GET_XTIME(xtime, inode) \ + (inode_time_to_timespec64((inode)->i_##xtime)) +#else #define FS_INODE_SET_XTIME(xtime, inode, ts) \ ((inode)->i_##xtime = (ts)) #define FS_INODE_GET_XTIME(xtime, inode) \ ((inode)->i_##xtime) +#endif
static inline int inode_unhashed(struct inode *inode) { diff --git a/include/linux/stat.h b/include/linux/stat.h index 075cb0c..e3443a9 100644 --- a/include/linux/stat.h +++ b/include/linux/stat.h @@ -27,9 +27,15 @@ struct kstat { kgid_t gid; dev_t rdev; loff_t size; - struct timespec atime; - struct timespec mtime; - struct timespec ctime; +#ifdef CONFIG_FS_USES_64BIT_TIME + struct timespec64 atime; + struct timespec64 mtime; + struct timespec64 ctime; +#else + struct timespec atime; + struct timespec mtime; + struct timespec ctime; +#endif unsigned long blksize; unsigned long long blocks; };
On Sunday 06 December 2015 22:04:05 Deepa Dinamani wrote:
The current representation of inode times in struct inode, struct iattr, struct kstat: struct timespec are not y2038 safe.
Add provision to convert them to use 64 bit times in the future.
struct inode uses struct inode time to maintain same size for times across 32 bit and 64 bit architectures. structs iattr and kstat have no such requirement since these data types are only used to pass values in and out of vfs layer.
Ok, ignore my comment on patch 1, I should have read all the way until here ;-)
I think you can safely merge patch 1 into patch 3, it will actually be clearer that way.
In addition, inode_time is defined as packed and aligned to a 4 byte boundary to make the structure use 12 bytes each instead of 16 bytes. This will help save RAM space as inode structure is cached in memory. The other structures are transient and such a change would not be beneficial.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Ah, this also explains the use of inode_time: it's much nicer if we don't need it in the file systems but only in struct inode.
@@ -616,9 +622,15 @@ struct inode { }; dev_t i_rdev; loff_t i_size; +#ifdef CONFIG_FS_USES_64BIT_TIME
- struct inode_time i_atime;
- struct inode_time i_mtime;
- struct inode_time i_ctime;
+#else struct timespec i_atime; struct timespec i_mtime; struct timespec i_ctime; +#endif spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ unsigned short i_bytes; unsigned int i_blkbits; @@ -679,10 +691,17 @@ struct inode { void *i_private; /* fs or device private pointer */ }; +#ifdef CONFIG_FS_USES_64BIT_TIME +#define FS_INODE_SET_XTIME(xtime, inode, ts64) \
- ((inode)->i_##xtime = timespec64_to_inode_time(ts64))
+#define FS_INODE_GET_XTIME(xtime, inode) \
- (inode_time_to_timespec64((inode)->i_##xtime))
+#else #define FS_INODE_SET_XTIME(xtime, inode, ts) \ ((inode)->i_##xtime = (ts)) #define FS_INODE_GET_XTIME(xtime, inode) \ ((inode)->i_##xtime) +#endif
So this makes a lot of sense now. If it's the only use of timespec64_to_inode_time/inode_time_to_timespec64, you could also open-code that in the macro and avoid introducing the macro to start with.
We have had some discussion about limiting the range of the times in the inode to the range allowed by the file system at some point, so leaving them as inline functions will make it easier to extend them later that way.
static inline int inode_unhashed(struct inode *inode) { diff --git a/include/linux/stat.h b/include/linux/stat.h index 075cb0c..e3443a9 100644 --- a/include/linux/stat.h +++ b/include/linux/stat.h @@ -27,9 +27,15 @@ struct kstat { kgid_t gid; dev_t rdev; loff_t size;
- struct timespec atime;
- struct timespec mtime;
- struct timespec ctime;
+#ifdef CONFIG_FS_USES_64BIT_TIME
- struct timespec64 atime;
- struct timespec64 mtime;
- struct timespec64 ctime;
+#else
- struct timespec atime;
- struct timespec mtime;
- struct timespec ctime;
+#endif unsigned long blksize; unsigned long long blocks; };
Arnd
On Mon, Dec 7, 2015 at 1:30 AM, Arnd Bergmann arnd@arndb.de wrote:
On Sunday 06 December 2015 22:04:05 Deepa Dinamani wrote:
The current representation of inode times in struct inode, struct iattr, struct kstat: struct timespec are not y2038 safe.
Add provision to convert them to use 64 bit times in the future.
struct inode uses struct inode time to maintain same size for times across 32 bit and 64 bit architectures. structs iattr and kstat have no such requirement since these data types are only used to pass values in and out of vfs layer.
Ok, ignore my comment on patch 1, I should have read all the way until here ;-)
I think you can safely merge patch 1 into patch 3, it will actually be clearer that way.
will do
In addition, inode_time is defined as packed and aligned to a 4 byte boundary to make the structure use 12 bytes each instead of 16 bytes. This will help save RAM space as inode structure is cached in memory. The other structures are transient and such a change would not be beneficial.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Ah, this also explains the use of inode_time: it's much nicer if we don't need it in the file systems but only in struct inode.
This is the plan. I did not have to use this inside of vfs anywhere else.
@@ -616,9 +622,15 @@ struct inode { }; dev_t i_rdev; loff_t i_size; +#ifdef CONFIG_FS_USES_64BIT_TIME
struct inode_time i_atime;
struct inode_time i_mtime;
struct inode_time i_ctime;
+#else struct timespec i_atime; struct timespec i_mtime; struct timespec i_ctime; +#endif spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ unsigned short i_bytes; unsigned int i_blkbits; @@ -679,10 +691,17 @@ struct inode { void *i_private; /* fs or device private pointer */ };
+#ifdef CONFIG_FS_USES_64BIT_TIME +#define FS_INODE_SET_XTIME(xtime, inode, ts64) \
((inode)->i_##xtime = timespec64_to_inode_time(ts64))
+#define FS_INODE_GET_XTIME(xtime, inode) \
(inode_time_to_timespec64((inode)->i_##xtime))
+#else #define FS_INODE_SET_XTIME(xtime, inode, ts) \ ((inode)->i_##xtime = (ts)) #define FS_INODE_GET_XTIME(xtime, inode) \ ((inode)->i_##xtime) +#endif
So this makes a lot of sense now. If it's the only use of timespec64_to_inode_time/inode_time_to_timespec64, you could also open-code that in the macro and avoid introducing the macro to start with.
This is how I had it to begin with. But, I think having this in separate function makes the code more modular. Are we trying to minimize number of functions?
We have had some discussion about limiting the range of the times in the inode to the range allowed by the file system at some point, so leaving them as inline functions will make it easier to extend them later that way.
Macros could be extended to call these functions or directly include range checking from superblock meta data?
I do not prefer using macros in general. But, in this instance I think it makes sense as we can avoid having 3 functions because of ## operator.
We could also probably pass in the actual inode_times to static inline functions and that could so away with macros. Is this what you meant?
-Deepa
On Monday 07 December 2015 10:29:35 Deepa Dinamani wrote:
On Mon, Dec 7, 2015 at 1:30 AM, Arnd Bergmann arnd@arndb.de wrote:
On Sunday 06 December 2015 22:04:05 Deepa Dinamani wrote:
In addition, inode_time is defined as packed and aligned to a 4 byte boundary to make the structure use 12 bytes each instead of 16 bytes. This will help save RAM space as inode structure is cached in memory. The other structures are transient and such a change would not be beneficial.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Ah, this also explains the use of inode_time: it's much nicer if we don't need it in the file systems but only in struct inode.
This is the plan. I did not have to use this inside of vfs anywhere else.
Ok, good.
@@ -616,9 +622,15 @@ struct inode { }; dev_t i_rdev; loff_t i_size; +#ifdef CONFIG_FS_USES_64BIT_TIME
struct inode_time i_atime;
struct inode_time i_mtime;
struct inode_time i_ctime;
+#else struct timespec i_atime; struct timespec i_mtime; struct timespec i_ctime; +#endif spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ unsigned short i_bytes; unsigned int i_blkbits; @@ -679,10 +691,17 @@ struct inode { void *i_private; /* fs or device private pointer */ };
+#ifdef CONFIG_FS_USES_64BIT_TIME +#define FS_INODE_SET_XTIME(xtime, inode, ts64) \
((inode)->i_##xtime = timespec64_to_inode_time(ts64))
+#define FS_INODE_GET_XTIME(xtime, inode) \
(inode_time_to_timespec64((inode)->i_##xtime))
+#else #define FS_INODE_SET_XTIME(xtime, inode, ts) \ ((inode)->i_##xtime = (ts)) #define FS_INODE_GET_XTIME(xtime, inode) \ ((inode)->i_##xtime) +#endif
So this makes a lot of sense now. If it's the only use of timespec64_to_inode_time/inode_time_to_timespec64, you could also open-code that in the macro and avoid introducing the macro to start with.
This is how I had it to begin with. But, I think having this in separate function makes the code more modular. Are we trying to minimize number of functions?
Generally the optimization should prefer readability and performance. I was thinking here that the code is easier to follow if there is only one intermediate step that we introduce rather than two.
Another option would be to drop the structure entirely and put the members directly into the inode as i_atime_sec/i_atime_nsec/i_mtime_sec/ etc. That would also avoid marking the 64-bit members as 'packed', which may be inefficient on some CPU architectures.
We have had some discussion about limiting the range of the times in the inode to the range allowed by the file system at some point, so leaving them as inline functions will make it easier to extend them later that way.
Macros could be extended to call these functions or directly include range checking from superblock meta data?
I do not prefer using macros in general. But, in this instance I think it makes sense as we can avoid having 3 functions because of ## operator.
We could also probably pass in the actual inode_times to static inline functions and that could so away with macros. Is this what you meant?
Yes, that would be one way to handle it. As long as we have a macro or inline function abstraction, we can hide it in there.
Not needing macros might actually be nice too, so we could have a set of inline functions instead, like
static inline struct timespec64 inode_get_atime(struct inode *inode) { return (struct timespec64) { .tv_sec = inode->i_atime_sec, .tv_nsec = inode->i_atime_nsec, }; }
static inline void inode_set_atime(struct inode *inode, struct timespec64 *t) { inode->i_atime_sec = t->tv_sec; inode->i_atime_nsec = 0; }
There are only three of them, so it's not much more code than the macros, and those are certainly more readable and easier to extend.
There is also some code already in timespec_trunc that limits the precision to the resolution supported by the file system in the setattr functions. We can extend timespec_trunc to also limit the range, as long as it is used consistently.
Arnd
@@ -616,9 +622,15 @@ struct inode { }; dev_t i_rdev; loff_t i_size; +#ifdef CONFIG_FS_USES_64BIT_TIME
struct inode_time i_atime;
struct inode_time i_mtime;
struct inode_time i_ctime;
+#else struct timespec i_atime; struct timespec i_mtime; struct timespec i_ctime; +#endif spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ unsigned short i_bytes; unsigned int i_blkbits; @@ -679,10 +691,17 @@ struct inode { void *i_private; /* fs or device private pointer */ };
+#ifdef CONFIG_FS_USES_64BIT_TIME +#define FS_INODE_SET_XTIME(xtime, inode, ts64) \
((inode)->i_##xtime = timespec64_to_inode_time(ts64))
+#define FS_INODE_GET_XTIME(xtime, inode) \
(inode_time_to_timespec64((inode)->i_##xtime))
+#else #define FS_INODE_SET_XTIME(xtime, inode, ts) \ ((inode)->i_##xtime = (ts)) #define FS_INODE_GET_XTIME(xtime, inode) \ ((inode)->i_##xtime) +#endif
So this makes a lot of sense now. If it's the only use of timespec64_to_inode_time/inode_time_to_timespec64, you could also open-code that in the macro and avoid introducing the macro to start with.
This is how I had it to begin with. But, I think having this in separate function makes the code more modular. Are we trying to minimize number of functions?
Generally the optimization should prefer readability and performance. I was thinking here that the code is easier to follow if there is only one intermediate step that we introduce rather than two.
Another option would be to drop the structure entirely and put the members directly into the inode as i_atime_sec/i_atime_nsec/i_mtime_sec/ etc. That would also avoid marking the 64-bit members as 'packed', which may be inefficient on some CPU architectures.
I considered this option. This would need weird way of adding the fields: iatime_sec; imtime_sec; ictime_sec; iatime_nsec; iatime_nsec; ictime_nsec;
And based on if anybody rearranges these fields while adding new ones, we might have different kind of packing. While the struct is guaranteed to have 12 bytes always.
We have had some discussion about limiting the range of the times in the inode to the range allowed by the file system at some point, so leaving them as inline functions will make it easier to extend them later that way.
Macros could be extended to call these functions or directly include range checking from superblock meta data?
I do not prefer using macros in general. But, in this instance I think it makes sense as we can avoid having 3 functions because of ## operator.
We could also probably pass in the actual inode_times to static inline functions and that could so away with macros. Is this what you meant?
Yes, that would be one way to handle it. As long as we have a macro or inline function abstraction, we can hide it in there.
Not needing macros might actually be nice too, so we could have a set of inline functions instead, like
static inline struct timespec64 inode_get_atime(struct inode *inode) { return (struct timespec64) { .tv_sec = inode->i_atime_sec, .tv_nsec = inode->i_atime_nsec, }; }
static inline void inode_set_atime(struct inode *inode, struct timespec64 *t) { inode->i_atime_sec = t->tv_sec; inode->i_atime_nsec = 0; }
There are only three of them, so it's not much more code than the macros, and those are certainly more readable and easier to extend.
I can go ahead and change it to use only inline functions instead of macros. I was thinking of this:
static inline struct timespec64 inode_get_inode_time(struct inode_time atime)
These can have definitions similar to that of inode_time and ts64 conversions I have now.
I will pair them with the inode struct as well and not put them in time64.h.
There is also some code already in timespec_trunc that limits the precision to the resolution supported by the file system in the setattr functions. We can extend timespec_trunc to also limit the range, as long as it is used consistently.
Ok. Yes, this seems like a good place to do the range checking. I will keep this in mind when I post an update.
Here is my plan:
1.Update the patches according to above comments. 2.Global vfs substitutions with inline functions. 3. Will extend ext4 and fat to support these inline functions as well. 4. Will use these file systems to do some testing. 5. Send out all the above patches for review together.
-Deepa
On Friday 11 December 2015 20:47:02 Deepa Dinamani wrote:
So this makes a lot of sense now. If it's the only use of timespec64_to_inode_time/inode_time_to_timespec64, you could also open-code that in the macro and avoid introducing the macro to start with.
This is how I had it to begin with. But, I think having this in separate function makes the code more modular. Are we trying to minimize number of functions?
Generally the optimization should prefer readability and performance. I was thinking here that the code is easier to follow if there is only one intermediate step that we introduce rather than two.
Another option would be to drop the structure entirely and put the members directly into the inode as i_atime_sec/i_atime_nsec/i_mtime_sec/ etc. That would also avoid marking the 64-bit members as 'packed', which may be inefficient on some CPU architectures.
I considered this option. This would need weird way of adding the fields: iatime_sec; imtime_sec; ictime_sec; iatime_nsec; iatime_nsec; ictime_nsec;
Yes, this is what I had in mind. In David Howells' xstat syscall series, they are done the same way, including in user space. The main advantage is that all fields are naturally aligned, so they take a couple of cycles less to access than a misaligned 64-bit field on certain architectures.
And based on if anybody rearranges these fields while adding new ones, we might have different kind of packing. While the struct is guaranteed to have 12 bytes always.
The inode structure is generally highly optimized, we wouldn't mindlessly rearrange things in general. We can definitely try out both, and it's easy to change afterwards when more comments come in, as long as the accessor is able to cope with either format.
Yes, that would be one way to handle it. As long as we have a macro or inline function abstraction, we can hide it in there.
Not needing macros might actually be nice too, so we could have a set of inline functions instead, like
static inline struct timespec64 inode_get_atime(struct inode *inode) { return (struct timespec64) { .tv_sec = inode->i_atime_sec, .tv_nsec = inode->i_atime_nsec, }; }
static inline void inode_set_atime(struct inode *inode, struct timespec64 *t) { inode->i_atime_sec = t->tv_sec; inode->i_atime_nsec = 0; }
There are only three of them, so it's not much more code than the macros, and those are certainly more readable and easier to extend.
I can go ahead and change it to use only inline functions instead of macros. I was thinking of this:
static inline struct timespec64 inode_get_inode_time(struct inode_time atime)
These can have definitions similar to that of inode_time and ts64 conversions I have now.
This would replace the inode_time_to_timespec64() function, and have the same definition, right?
I will pair them with the inode struct as well and not put them in time64.h.
It's nice how you get around naming the specific time field with a string concatenating macro like this, but I wonder if we also need to pass the inode here. We probably need to pass the inode or the superblock in the place where we assign the time, to be able to limit the range and resolution to the file system's capabilities, but I don't know if we also might need the inode when getting the data out again. Probably not, but we should think about it some more (and not add it unless we believe that it's actually needed).
With your approach, another advantage is how any code that assigns i_?time to current_fs_time() will not have to be modified, while separate fields do need the modification.
There is also some code already in timespec_trunc that limits the precision to the resolution supported by the file system in the setattr functions. We can extend timespec_trunc to also limit the range, as long as it is used consistently.
Ok. Yes, this seems like a good place to do the range checking. I will keep this in mind when I post an update.
Here is my plan:
1.Update the patches according to above comments. 2.Global vfs substitutions with inline functions. 3. Will extend ext4 and fat to support these inline functions as well. 4. Will use these file systems to do some testing. 5. Send out all the above patches for review together.
Two more thoughts:
I keep thinking that we need to do the same change for all three structures (inode, iattr, kstat), so we only have to modify each file system once. You currently use timespec64 in iattr and kstat, which is nicer in principle, but it means that we need an extra set of abtractions to read/write those. If we go with your inode_time method, it's probably best to do it in all three structures, while using separate fields, I guess we would need an extra layer of macros around them.
It would also be nice to have a generic way to get/set just the seconds portion with a helper, so we don't need to use a timespec64 at all in file systems that simply deal with whole seconds.
Arnd
On Mon, Dec 14, 2015 at 3:52 AM, Arnd Bergmann arnd@arndb.de wrote:
On Friday 11 December 2015 20:47:02 Deepa Dinamani wrote:
So this makes a lot of sense now. If it's the only use of timespec64_to_inode_time/inode_time_to_timespec64, you could also open-code that in the macro and avoid introducing the macro to start with.
This is how I had it to begin with. But, I think having this in separate function makes the code more modular. Are we trying to minimize number of functions?
Generally the optimization should prefer readability and performance. I was thinking here that the code is easier to follow if there is only one intermediate step that we introduce rather than two.
Another option would be to drop the structure entirely and put the members directly into the inode as i_atime_sec/i_atime_nsec/i_mtime_sec/ etc. That would also avoid marking the 64-bit members as 'packed', which may be inefficient on some CPU architectures.
I considered this option. This would need weird way of adding the fields: iatime_sec; imtime_sec; ictime_sec; iatime_nsec; iatime_nsec; ictime_nsec;
Yes, this is what I had in mind. In David Howells' xstat syscall series, they are done the same way, including in user space. The main advantage is that all fields are naturally aligned, so they take a couple of cycles less to access than a misaligned 64-bit field on certain architectures.
And based on if anybody rearranges these fields while adding new ones, we might have different kind of packing. While the struct is guaranteed to have 12 bytes always.
The inode structure is generally highly optimized, we wouldn't mindlessly rearrange things in general. We can definitely try out both, and it's easy to change afterwards when more comments come in, as long as the accessor is able to cope with either format.
Ok. In that case I can change it to be this way. If we change only the first field to be packed, this would also be a hack since we know that there will be three of them. Making the whole struct packed could take more performance hit as you said. I did not check if compiler could guess the size of second field because of aligned. I'm yet to do more research/ look at disassembly.
The only advantage of the structure is grouping of the two fields: sec and nsec together. So I would say it increases readability.
But, there are filesystems that access them this way as individual fields. So we could pick performance in this case?
Yes, that would be one way to handle it. As long as we have a macro or inline function abstraction, we can hide it in there.
Not needing macros might actually be nice too, so we could have a set of inline functions instead, like
static inline struct timespec64 inode_get_atime(struct inode *inode) { return (struct timespec64) { .tv_sec = inode->i_atime_sec, .tv_nsec = inode->i_atime_nsec, }; }
static inline void inode_set_atime(struct inode *inode, struct timespec64 *t) { inode->i_atime_sec = t->tv_sec; inode->i_atime_nsec = 0; }
There are only three of them, so it's not much more code than the macros, and those are certainly more readable and easier to extend.
I can go ahead and change it to use only inline functions instead of macros. I was thinking of this:
static inline struct timespec64 inode_get_inode_time(struct inode_time atime)
These can have definitions similar to that of inode_time and ts64 conversions I have now.
This would replace the inode_time_to_timespec64() function, and have the same definition, right?
I will pair them with the inode struct as well and not put them in time64.h.
It's nice how you get around naming the specific time field with a string concatenating macro like this, but I wonder if we also need to pass the inode here. We probably need to pass the inode or the superblock in the place where we assign the time, to be able to limit the range and resolution to the file system's capabilities, but I don't know if we also might need the inode when getting the data out again. Probably not, but we should think about it some more (and not add it unless we believe that it's actually needed).
I was thinking if there is a way to only use current_fs_time() to assign times and all the range checking could be done this way. I was able to do this in all of vfs code except one function: simple_set_acl() This I'm still looking into.
I don't know if range checking is needed for get functions. It might only be needed if it is updated between set and get. I will check if this is possible.
With your approach, another advantage is how any code that assigns i_?time to current_fs_time() will not have to be modified, while separate fields do need the modification.
There is also some code already in timespec_trunc that limits the precision to the resolution supported by the file system in the setattr functions. We can extend timespec_trunc to also limit the range, as long as it is used consistently.
Ok. Yes, this seems like a good place to do the range checking. I will keep this in mind when I post an update.
Here is my plan:
1.Update the patches according to above comments. 2.Global vfs substitutions with inline functions. 3. Will extend ext4 and fat to support these inline functions as well. 4. Will use these file systems to do some testing. 5. Send out all the above patches for review together.
Two more thoughts:
I keep thinking that we need to do the same change for all three structures (inode, iattr, kstat), so we only have to modify each file system once. You currently use timespec64 in iattr and kstat, which is nicer in principle, but it means that we need an extra set of abtractions to read/write those. If we go with your inode_time method, it's probably best to do it in all three structures, while using separate fields, I guess we would need an extra layer of macros around them.
I do not plan to use macros for iattr and kstat. This I kind of got lucky because it uses same field names as timespec ( it is timespec in 64 bit arch).
I'm modifying them concurrently and do not have a separate step. I could post the vfs patch as its done except for the one function I mentioned above.
It would also be nice to have a generic way to get/set just the seconds portion with a helper, so we don't need to use a timespec64 at all in file systems that simply deal with whole seconds.
I only noticed this problem when I started with ext4. Even though they allow higher granularity, they seem to be treating sec and nsec field as independent values. I need to do a couple of more fs to do this correctly. This is why I picked FAT next.
-Deepa
On Monday 14 December 2015 19:20:24 Deepa Dinamani wrote:
On Mon, Dec 14, 2015 at 3:52 AM, Arnd Bergmann arnd@arndb.de wrote:
On Friday 11 December 2015 20:47:02 Deepa Dinamani wrote:
And based on if anybody rearranges these fields while adding new ones, we might have different kind of packing. While the struct is guaranteed to have 12 bytes always.
The inode structure is generally highly optimized, we wouldn't mindlessly rearrange things in general. We can definitely try out both, and it's easy to change afterwards when more comments come in, as long as the accessor is able to cope with either format.
Ok. In that case I can change it to be this way. If we change only the first field to be packed, this would also be a hack since we know that there will be three of them. Making the whole struct packed could take more performance hit as you said. I did not check if compiler could guess the size of second field because of aligned. I'm yet to do more research/ look at disassembly.
The only advantage of the structure is grouping of the two fields: sec and nsec together. So I would say it increases readability.
But, there are filesystems that access them this way as individual fields. So we could pick performance in this case?
In any case, we should only ever need to mark the structure as __attibute__((packed, aligned(4))), and the effect of marking the first member or the entire struct is exactly the same then.
Yes, that would be one way to handle it. As long as we have a macro or inline function abstraction, we can hide it in there.
Not needing macros might actually be nice too, so we could have a set of inline functions instead, like
static inline struct timespec64 inode_get_atime(struct inode *inode) { return (struct timespec64) { .tv_sec = inode->i_atime_sec, .tv_nsec = inode->i_atime_nsec, }; }
static inline void inode_set_atime(struct inode *inode, struct timespec64 *t) { inode->i_atime_sec = t->tv_sec; inode->i_atime_nsec = 0; }
There are only three of them, so it's not much more code than the macros, and those are certainly more readable and easier to extend.
I can go ahead and change it to use only inline functions instead of macros. I was thinking of this:
static inline struct timespec64 inode_get_inode_time(struct inode_time atime)
These can have definitions similar to that of inode_time and ts64 conversions I have now.
This would replace the inode_time_to_timespec64() function, and have the same definition, right?
I will pair them with the inode struct as well and not put them in time64.h.
It's nice how you get around naming the specific time field with a string concatenating macro like this, but I wonder if we also need to pass the inode here. We probably need to pass the inode or the superblock in the place where we assign the time, to be able to limit the range and resolution to the file system's capabilities, but I don't know if we also might need the inode when getting the data out again. Probably not, but we should think about it some more (and not add it unless we believe that it's actually needed).
I was thinking if there is a way to only use current_fs_time() to assign times and all the range checking could be done this way. I was able to do this in all of vfs code except one function: simple_set_acl() This I'm still looking into.
I think simple_set_acl() is wrong, it should use current_fs_time() as well, and that is a patch we probably want to merge independently.
I don't know if range checking is needed for get functions. It might only be needed if it is updated between set and get. I will check if this is possible.
I've been thinking about it some more. We clearly don't need a resolution or range check when reading the inode from disk, as there won't be any out-of-range values by definition. We also don't want any timestamps to appear in the inode structure that are different after writing to disk and reading it back, and I checked that all file systems that implement their own getattr function just copy the fields from the inode into kstat, so it has to be done when setting the inode fields.
There may be a few more files that currently don't check the resolution when assigning the current time in the setattr function, or when storing the current time, and we should fix them as we get there to do both checks.
Two more thoughts:
I keep thinking that we need to do the same change for all three structures (inode, iattr, kstat), so we only have to modify each file system once. You currently use timespec64 in iattr and kstat, which is nicer in principle, but it means that we need an extra set of abtractions to read/write those. If we go with your inode_time method, it's probably best to do it in all three structures, while using separate fields, I guess we would need an extra layer of macros around them.
I do not plan to use macros for iattr and kstat. This I kind of got lucky because it uses same field names as timespec ( it is timespec in 64 bit arch).
As I mentioned in our meeting today, that only works for file systems that access the tv_sec/tv_nsec members directly, or that use the simple_getattr/generic_fillattr and simple_setattr/setattr_copy helpers to access them.
One counterexample I found is the cifs_set_file_info() function that passes attrs->ia_atime() into another function that takes a timespec argument. There are only a couple of those, but we need to have a strategy for handling them.
It would also be nice to have a generic way to get/set just the seconds portion with a helper, so we don't need to use a timespec64 at all in file systems that simply deal with whole seconds.
I only noticed this problem when I started with ext4. Even though they allow higher granularity, they seem to be treating sec and nsec field as independent values. I need to do a couple of more fs to do this correctly. This is why I picked FAT next.
Ok.
Arnd
The current file system macros and time functions are not y2038 safe because of the use of struct timespec.
Change these to use timespec64.
The change will be globally enabled along with all the other 64-bit time file system changes.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com --- include/linux/fs.h | 4 ++++ include/linux/time.h | 2 ++ include/linux/time64.h | 5 +++++ kernel/time/time.c | 28 ++++++++++++++++++++++++++++ 4 files changed, 39 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h index c56068f..27bb983 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1399,7 +1399,11 @@ struct super_block { struct list_head s_inodes; /* all inodes */ };
+#ifdef CONFIG_FS_USES_64BIT_TIME +extern struct timespec64 current_fs_time(struct super_block *sb); +#else extern struct timespec current_fs_time(struct super_block *sb); +#endif
/* * Snapshotting support. diff --git a/include/linux/time.h b/include/linux/time.h index beebe3a..40fe4af 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -125,8 +125,10 @@ static inline bool timeval_valid(const struct timeval *tv)
extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
+#ifndef CONFIG_FS_USES_64BIT_TIME #define CURRENT_TIME (current_kernel_time()) #define CURRENT_TIME_SEC ((struct timespec) { get_seconds(), 0 }) +#endif
/* Some architectures do not supply their own clocksource. * This is mainly the case in architectures that get their diff --git a/include/linux/time64.h b/include/linux/time64.h index bb574b8..62f2b1a 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -37,6 +37,11 @@ struct itimerspec64 { #define inode_time timespec64 #endif
+#ifdef CONFIG_FS_USES_64BIT_TIME +#define CURRENT_TIME (current_kernel_time64()) +#define CURRENT_TIME_SEC ((struct timespec64) { ktime_get_seconds(), 0 }) +#endif + /* Parameters used to convert the timespec values: */ #define MSEC_PER_SEC 1000L #define USEC_PER_MSEC 1000L diff --git a/kernel/time/time.c b/kernel/time/time.c index 86751c6..92b6db1 100644 --- a/kernel/time/time.c +++ b/kernel/time/time.c @@ -237,11 +237,19 @@ SYSCALL_DEFINE1(adjtimex, struct timex __user *, txc_p) * Return the current time truncated to the time granularity supported by * the fs. */ +#ifdef CONFIG_FS_USES_64BIT_TIME +struct timespec64 current_fs_time(struct super_block *sb) +{ + struct timespec64 now = current_kernel_time64(); + return timespec64_trunc(now, sb->s_time_gran); +} +#else struct timespec current_fs_time(struct super_block *sb) { struct timespec now = current_kernel_time(); return timespec_trunc(now, sb->s_time_gran); } +#endif EXPORT_SYMBOL(current_fs_time);
/* @@ -294,6 +302,25 @@ EXPORT_SYMBOL(jiffies_to_usecs); * Truncate a timespec to a granularity. Always rounds down. gran must * not be 0 nor greater than a second (NSEC_PER_SEC, or 10^9 ns). */ +#ifdef CONFIG_FS_USES_64BIT_TIME +/* TODO: Change name to timespec64_trunc after the global switch to + * CONFIG_FS_USES_64BIT_TIME. + */ +struct timespec64 timespec_trunc(struct timespec64 t, unsigned gran) +{ + /* Avoid division in the common cases 1 ns and 1 s. */ + if (gran == 1) { + /* nothing */ + } else if (gran == NSEC_PER_SEC) { + t.tv_nsec = 0; + } else if (gran > 1 && gran < NSEC_PER_SEC) { + t.tv_nsec -= t.tv_nsec % gran; + } else { + WARN(1, "illegal file time granularity: %u", gran); + } + return t; +} +#else struct timespec timespec_trunc(struct timespec t, unsigned gran) { /* Avoid division in the common cases 1 ns and 1 s. */ @@ -308,6 +335,7 @@ struct timespec timespec_trunc(struct timespec t, unsigned gran) } return t; } +#endif EXPORT_SYMBOL(timespec_trunc);
/*
On Sunday 06 December 2015 22:04:06 Deepa Dinamani wrote:
The current file system macros and time functions are not y2038 safe because of the use of struct timespec.
Change these to use timespec64.
The change will be globally enabled along with all the other 64-bit time file system changes.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Looks good.
Arnd