This is a preparation patch to add range checking for inode timestamps.
These range checks will be used to clamp timestamps to filesystem allowed ranges.
Individual filesystems do not have the same on-disk format as the in-memory inodes. Range checking and clamping the times assigned to inodes will help keep the in-memory and on-disk timestamps in sync.
Extend struct super_block to include information about the max and min inode times each filesystem can hold. These are dependent on the on-disk format of filesystems.
Every time a new superblock is created, make sure that the superblock max and min timestamp fields are assigned invalid values.
Another series will initialize these fields to appropriate values for every filesystem.
The values are currently ignored. The exact policy and behavior will be decided in a separate patch.
MAX_INVALID_VFS_TIME and MIN_INVALID_VFS_TIME are initialized to S64_MIN and S64_MAX respectively so that even if one of the fields is uninitialized, it can be detected by using the condition max_time < min_time.
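To illustrate, a validity check built on these reversed defaults could look like this (the helper name is hypothetical, not part of this patch):

	/* Sketch: with the reversed defaults, max < min holds only for a
	 * superblock whose filesystem never set its timestamp limits. */
	static inline bool sb_time_limits_valid(struct super_block *sb)
	{
		return sb->s_time_max >= sb->s_time_min;
	}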
The original idea for the feature comes from the discussion: https://lkml.org/lkml/2014/5/30/669
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
---
The intention is to include this as part of 4.6 so that the follow on patches can go into 4.7.
The series and the plan have been discussed with Arnd Bergmann.
 fs/super.c         |  2 ++
 include/linux/fs.h | 13 ++++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c
index 1182af8..d70a8f6 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -239,6 +239,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	s->s_maxbytes = MAX_NON_LFS;
 	s->s_op = &default_op;
 	s->s_time_gran = 1000000000;
+	s->s_time_max = MAX_INVALID_VFS_TIME;
+	s->s_time_min = MIN_INVALID_VFS_TIME;
 	s->cleancache_poolid = CLEANCACHE_NO_POOL;

 	s->s_shrink.seeks = DEFAULT_SEEKS;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1af4727..15b41e6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
 #define MAX_LFS_FILESIZE	((loff_t)0x7fffffffffffffffLL)
 #endif

+#define MAX_VFS_TIME	S64_MAX
+#define MIN_VFS_TIME	S64_MIN
+
+#define MAX_INVALID_VFS_TIME	S64_MIN
+#define MIN_INVALID_VFS_TIME	S64_MAX
+
 #define FL_POSIX	1
 #define FL_FLOCK	2
 #define FL_DELEG	4	/* NFSv4 delegation */
@@ -1343,7 +1349,12 @@ struct super_block {
 	/* Granularity of c/m/atime in ns. Cannot be worse than a second */
-	u32 s_time_gran;
+	u32 s_time_gran;
+	/* Max and min values for timestamps
+	 * according to the range supported by filesystems.
+	 */
+	time64_t s_time_max;
+	time64_t s_time_min;

 	/*
 	 * The next field is for VFS *only*. No filesystems have any business
On Wednesday 02 March 2016 07:51:34 Deepa Dinamani wrote:
This is a preparation patch to add range checking for inode timestamps.
These range checks will be used to clamp timestamps to filesystem allowed ranges.
Individual filesystems do not have the same on-disk format as the in-memory inodes. Range checking and clamping the times assigned to inodes will help keep the in-memory and on-disk timestamps in sync.
Extend struct super_block to include information about the max and min inode times each filesystem can hold. These are dependent on the on-disk format of filesystems.
Every time a new superblock is created, make sure that the superblock max and min timestamp fields are assigned invalid values.
Another series will initialize these fields to appropriate values for every filesystem.
The values are currently ignored. The exact policy and behavior will be decided in a separate patch.
MAX_INVALID_VFS_TIME and MIN_INVALID_VFS_TIME are initialized to S64_MIN and S64_MAX respectively so that even if one of the fields is uninitialized, it can be detected by using the condition max_time < min_time.
The original idea for the feature comes from the discussion: https://lkml.org/lkml/2014/5/30/669
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
The intention is to include this as part of 4.6 so that the follow on patches can go into 4.7.
The series and the plan have been discussed with Arnd Bergmann.
Acked-by: Arnd Bergmann arnd@arndb.de
 fs/super.c         |  2 ++
 include/linux/fs.h | 13 ++++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c
index 1182af8..d70a8f6 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -239,6 +239,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	s->s_maxbytes = MAX_NON_LFS;
 	s->s_op = &default_op;
 	s->s_time_gran = 1000000000;
+	s->s_time_max = MAX_INVALID_VFS_TIME;
+	s->s_time_min = MIN_INVALID_VFS_TIME;
 	s->cleancache_poolid = CLEANCACHE_NO_POOL;

 	s->s_shrink.seeks = DEFAULT_SEEKS;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1af4727..15b41e6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
 #define MAX_LFS_FILESIZE	((loff_t)0x7fffffffffffffffLL)
 #endif

+#define MAX_VFS_TIME	S64_MAX
+#define MIN_VFS_TIME	S64_MIN
+
+#define MAX_INVALID_VFS_TIME	S64_MIN
+#define MIN_INVALID_VFS_TIME	S64_MAX
+
 #define FL_POSIX	1
 #define FL_FLOCK	2
 #define FL_DELEG	4	/* NFSv4 delegation */
@@ -1343,7 +1349,12 @@ struct super_block {
 	/* Granularity of c/m/atime in ns. Cannot be worse than a second */
-	u32 s_time_gran;
+	u32 s_time_gran;
+	/* Max and min values for timestamps
+	 * according to the range supported by filesystems.
+	 */
+	time64_t s_time_max;
+	time64_t s_time_min;

 	/*
 	 * The next field is for VFS *only*. No filesystems have any business
On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
MAX_INVALID_VFS_TIME and MIN_INVALID_VFS_TIME are initialized to S64_MIN and S64_MAX respectively so that even if one of the fields is uninitialized, it can be detected by using the condition max_time < min_time.
I can't work out what MIN/MAX_INVALID_VFS_TIME is supposed to mean when I see it in the code. Does it mean a time that lies in the range MIN_INVALID_VFS_TIME > time > MAX_INVALID_VFS_TIME is invalid (unlikely, but that's the obvious reading :)?
Or that time < MIN_INVALID_VFS_TIME is invalid? Or is it valid? I can't tell...
Normally limits are specified by "min valid" and "max valid" defines, which are pretty clear in their meaning. Like:
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
 #define MAX_LFS_FILESIZE	((loff_t)0x7fffffffffffffffLL)
 #endif

+#define MAX_VFS_TIME	S64_MAX
+#define MIN_VFS_TIME	S64_MIN
These. Anything outside these ranges is invalid.
As such, I think this is wrong for 32 bit systems as the min/max VFS times right now are S32_MIN/S32_MAX...
Cheers,
Dave.
On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
MAX_INVALID_VFS_TIME and MIN_INVALID_VFS_TIME are initialized to S64_MIN and S64_MAX respectively so that even if one of the fields is uninitialized, it can be detected by using the condition max_time < min_time.
I can't work out what MIN/MAX_INVALID_VFS_TIME is supposed to mean when I see it in the code. Does it mean a time that lies in the range MIN_INVALID_VFS_TIME > time > MAX_INVALID_VFS_TIME is invalid (unlikely, but that's the obvious reading :)?
Or that time < MIN_INVALID_VFS_TIME is invalid? Or is it valid? I can't tell...
Normally limits are specified by "min valid" and "max valid" defines, which are pretty clear in their meaning. Like:
Hmm, I had meant to comment on this in my private discussion.
I agree this needs some more explanation in the source code. The idea is that once we have modified all file systems to correctly set the actual limits, we can then detect any newly added file system that forgets to set them, so we don't get accidental incorrect limits.
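As a sketch of that detection (placement and message are illustrative, not part of the posted patch), a mount-time check could be as simple as:

	/* Warn when a filesystem has left the invalid defaults in place. */
	if (sb->s_time_max < sb->s_time_min)
		pr_warn("%s: filesystem did not set timestamp limits\n",
			sb->s_type->name);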
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
 #define MAX_LFS_FILESIZE	((loff_t)0x7fffffffffffffffLL)
 #endif

+#define MAX_VFS_TIME	S64_MAX
+#define MIN_VFS_TIME	S64_MIN
These. Anything outside these ranges is invalid.
As such, I think this is wrong for 32 bit systems as the min/max VFS times right now are S32_MIN/S32_MAX...
I think the file system should always set the actual on-disk format limits in the superblock, regardless of the word size of the CPU architecture.
We already support 32-bit time_t in compat tasks on 64-bit architectures, and we will support 64-bit time_t in normal tasks on 32-bit architectures in the future, and there is no need to range-check the timestamps written to disk.
The one range-check that we probably want for current 32-bit tasks is in the vfs_fstatat/vfs_getattr code, so a kernel that internally handles post-2038 timestamps can limit them to S32_MAX instead of wrapping.
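A minimal sketch of that clamp, assuming a struct kstat that already carries 64-bit seconds (placement and policy are still open):

	/* Cap post-2038 timestamps for a 32-bit time_t consumer instead
	 * of letting the value wrap. */
	if (stat->mtime.tv_sec > S32_MAX)
		stat->mtime.tv_sec = S32_MAX;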
Arnd
On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
MAX_INVALID_VFS_TIME and MIN_INVALID_VFS_TIME are initialized to S64_MIN and S64_MAX respectively so that even if one of the fields is uninitialized, it can be detected by using the condition max_time < min_time.
I can't work out what MIN/MAX_INVALID_VFS_TIME is supposed to mean when I see it in the code. Does it mean a time that lies in the range MIN_INVALID_VFS_TIME > time > MAX_INVALID_VFS_TIME is invalid (unlikely, but that's the obvious reading :)?
Or that time < MIN_INVALID_VFS_TIME is invalid? Or is it valid? I can't tell...
Normally limits are specified by "min valid" and "max valid" defines, which are pretty clear in their meaning. Like:
Hmm, I had meant to comment on this in my private discussion.
I agree this needs some more explanation in the source code. The idea is that once we have modified all file systems to correctly set the actual limits, we can then detect any newly added file system that forgets to set them, so we don't get accidental incorrect limits.
But if a superblock supports full 64 bit time resolution (i.e. MIN_VFS_TIME to MAX_VFS_TIME) then you can't tell the difference and will warn inappropriately.
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
 #define MAX_LFS_FILESIZE	((loff_t)0x7fffffffffffffffLL)
 #endif

+#define MAX_VFS_TIME	S64_MAX
+#define MIN_VFS_TIME	S64_MIN
These. Anything outside these ranges is invalid.
As such, I think this is wrong for 32 bit systems as the min/max VFS times right now are S32_MIN/S32_MAX...
I think the file system should always set the actual on-disk format limits in the superblock, regardless of the word size of the CPU architecture.
Eventually, yes. But right now, 32 bit systems are limited by the platform architecture. Hence no matter what the filesystem on-disk format supports, these are going to be hard limits until filesystems start modifying the defaults to indicate what their on-disk format supports.
i.e. 32 bit systems should default to 32 bit time limits until the filesystem developers come along and verify that conversion from their on-disk format to 64 bit time works correctly. Then they set their real limits which may (or may not) be >y2038 compatible, and this means that filesystems that have not been converted to >y2038 compatible do not need any modifications to behave correctly on 32 bit systems...
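In alloc_super(), the conservative default described here might look like this (a sketch, not the posted patch):

	/* Default to the 32-bit epoch range; a converted filesystem
	 * widens these in its fill_super() to its real on-disk limits. */
	s->s_time_min = S32_MIN;
	s->s_time_max = S32_MAX;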
We already support 32-bit time_t in compat tasks on 64-bit architectures, and we will support 64-bit time_t in normal tasks on 32-bit architectures in the future, and there is no need to range-check the timestamps written to disk.
The one range-check that we probably want for current 32-bit tasks is in the vfs_fstatat/vfs_getattr code, so a kernel that internally handles post-2038 timestamps can limit them to S32_MAX instead of wrapping.
Those should return EOVERFLOW rather than an incorrect timestamp, like we do for other 64 bit fields stat returns that can't fit in 32 bit stat fields. That's a clear indication of a process that should be using stat64() instead.
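A sketch of that, following the idiom the 32-bit stat conversion already uses for inode numbers:

	/* Refuse a timestamp that cannot be represented in a 32-bit time
	 * field rather than returning a truncated value. */
	if (stat->mtime.tv_sec != (s32)stat->mtime.tv_sec)
		return -EOVERFLOW;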
Cheers,
Dave.
I agree this needs some more explanation in the source code. The idea is that once we have modified all file systems to correctly set the actual limits, we can then detect any newly added file system that forgets to set them, so we don't get accidental incorrect limits.
But if a superblock supports full 64 bit time resolution (i.e. MIN_VFS_TIME to MAX_VFS_TIME) then you can't tell the difference and will warn inappropriately.
No, this will not happen. The max and min values are deliberately reversed for the invalid defaults, and, as mentioned in the commit text, the condition for catching an invalid range in the vfs layer is max_time < min_time.
I will add a comment in the code to clarify.
Quoting from the patch:
+#define MAX_INVALID_VFS_TIME	S64_MIN
+#define MIN_INVALID_VFS_TIME	S64_MAX
I will post a v2 which removes these macros and makes use of MAX/MIN_VFS_TIME macros instead.
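For illustration, the v2 initialization might then look something like this (a sketch, assuming the INVALID macros are simply dropped):

	/* The valid-range macros assigned swapped, so max_time < min_time
	 * still flags a filesystem that never set its limits. */
	s->s_time_max = MIN_VFS_TIME;
	s->s_time_min = MAX_VFS_TIME;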
-Deepa
On Thursday 03 March 2016 10:45:11 Dave Chinner wrote:
On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
 #define MAX_LFS_FILESIZE	((loff_t)0x7fffffffffffffffLL)
 #endif

+#define MAX_VFS_TIME	S64_MAX
+#define MIN_VFS_TIME	S64_MIN
These. Anything outside these ranges is invalid.
As such, I think this is wrong for 32 bit systems as the min/max VFS times right now are S32_MIN/S32_MAX...
I think the file system should always set the actual on-disk format limits in the superblock, regardless of the word size of the CPU architecture.
Eventually, yes. But right now, 32 bit systems are limited by the platform architecture. Hence no matter what the filesystem on-disk format supports, these are going to be hard limits until filesystems start modifying the defaults to indicate what their on-disk format supports.
The purpose of the patch for now is to allow file systems to set the defaults; we are not checking them yet, and won't until all file systems are converted, hence the defaults are set to invalid values.
Our initial idea was to set the defaults for minimum and maximum to zero, which would have made this clearer, but unfortunately that doesn't allow us to detect whether a file system actually supports time stamps at all: some file systems hardcode all timestamps to zero (1970) because they don't store them on disk.
How about setting the defaults for minimum and maximum to '-1'? That would allow us to detect whether file systems have set this correctly, and it would be less confusing to the reader.
i.e. 32 bit systems should default to 32 bit time limits until the filesystem developers come along and verify that conversion from their on-disk format to 64 bit time works correctly. Then they set their real limits which may (or may not) be >y2038 compatible, and this means that filesystems that have not been converted to >y2038 compatible do not need any modifications to behave correctly on 32 bit systems...
We can set the limits in the superblock if you like, but I would not do it by changing the constants here:
Most file systems have arbitrary other maximum dates that they need to set (on 64-bit systems), e.g. 2028 (iso9660), 2040 (hfsplus), 2106 (ceph), 2107 (fat, cifs), 2262 (logfs), 2514 (ext4).
The filesystem-specific code should just set whatever it actually supports, and then we can find a way to limit that further for 32-bit tasks.
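As an illustration, a converted filesystem's fill_super() might publish its format's range like this (the bounds below are placeholders, not any real format's):

	/* Advertise the on-disk timestamp range at mount time. */
	sb->s_time_min = 0;		/* 1970, illustrative */
	sb->s_time_max = 0xffffffffLL;	/* unsigned 32-bit seconds, ~2106 */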
We already support 32-bit time_t in compat tasks on 64-bit architectures, and we will support 64-bit time_t in normal tasks on 32-bit architectures in the future, and there is no need to range-check the timestamps written to disk.
The one range-check that we probably want for current 32-bit tasks is in the vfs_fstatat/vfs_getattr code, so a kernel that internally handles post-2038 timestamps can limit them to S32_MAX instead of wrapping.
Those should return EOVERFLOW rather than an incorrect timestamp, like we do for other 64 bit fields stat returns that can't fit in 32 bit stat fields. That's a clear indication of a process that should be using stat64() instead.
On most 32-bit architectures, stat64 still has a 32-bit time value, and even if it is 64-bit wide, glibc turns it into a 32-bit number when copying to its version of 'struct stat64' that differs from the kernel's and contains a 'struct timespec'.
I don't care if we return EOVERFLOW or give some other indication of the overflow, but this is fundamentally a separate problem from large file support, and we need a new syscall for it.
Arnd
On Thu, Mar 03, 2016 at 03:08:22PM +0100, Arnd Bergmann wrote:
On Thursday 03 March 2016 10:45:11 Dave Chinner wrote:
On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
 #define MAX_LFS_FILESIZE	((loff_t)0x7fffffffffffffffLL)
 #endif

+#define MAX_VFS_TIME	S64_MAX
+#define MIN_VFS_TIME	S64_MIN
These. Anything outside these ranges is invalid.
As such, I think this is wrong for 32 bit systems as the min/max VFS times right now are S32_MIN/S32_MAX...
I think the file system should always set the actual on-disk format limits in the superblock, regardless of the word size of the CPU architecture.
Eventually, yes. But right now, 32 bit systems are limited by the platform architecture. Hence no matter what the filesystem on-disk format supports, these are going to be hard limits until filesystems start modifying the defaults to indicate what their on-disk format supports.
The purpose of the patch for now is to allow file systems to set the defaults; we are not checking them yet, and won't until all file systems are converted, hence the defaults are set to invalid values.
Wrong way around, IMO. Set the defaults to be limiting first, then as filesystems are converted the limits get expanded.
How about setting the defaults for minimum and maximum to '-1'? That would allow us to detect whether file systems have set this correctly, and it would be less confusing to the reader.
Perhaps, but I'm yet to understand why we even care if filesystems have set limits or not? If the filesystem has not set its own limits, then surely we should assume that the filesystem is not y2038 compatible and have the default limits set appropriately for that?
i.e. 32 bit systems should default to 32 bit time limits until the filesystem developers come along and verify that conversion from their on-disk format to 64 bit time works correctly. Then they set their real limits which may (or may not) be >y2038 compatible, and this means that filesystems that have not been converted to >y2038 compatible do not need any modifications to behave correctly on 32 bit systems...
We can set the limits in the superblock if you like, but I would not do it by changing the constants here:
Most file systems have arbitrary other maximum dates that they need to set (on 64-bit systems), e.g. 2028 (iso9660), 2040 (hfsplus), 2106 (ceph), 2107 (fat, cifs), 2262 (logfs), 2514 (ext4).
Yes, and that needs to be done when the filesystem is converted to use the proper timespec64 representation.
i.e. the "provide timestamp limit support" patchset starts out by limiting all times on all filesystems on all platforms to 32 bit times. Then after each filesystem is converted to use the VFS native 64 bit timestamps, we ensure that the filesystem sets it's known-valid timestamp limits in the superblock to correctly indicate the range of timestamps they support.
Hence by the end of the patchset we have the situation where filesystems that are not >y2038 compatible are limited to 32 bit timestamps, and those that can represent larger dates expose only the exact dates they can support to the VFS.
The filesystem-specific code should just set whatever it actually supports, and then we can find a way to limit that further for 32-bit tasks.
Yes, 32 bit tasks is something for the syscall layer to care about.
But that doesn't change the fact that the *default timestamp range* should be "no >y2038 support", and it is up to the filesystem to override that with its own limits if they are different.
We already support 32-bit time_t in compat tasks on 64-bit architectures, and we will support 64-bit time_t in normal tasks on 32-bit architectures in the future, and there is no need to range-check the timestamps written to disk.
The one range-check that we probably want for current 32-bit tasks is in the vfs_fstatat/vfs_getattr code, so a kernel that internally handles post-2038 timestamps can limit them to S32_MAX instead of wrapping.
Those should return EOVERFLOW rather than an incorrect timestamp, like we do for other 64 bit fields stat returns that can't fit in 32 bit stat fields. That's a clear indication of a process that should be using stat64() instead.
On most 32-bit architectures, stat64 still has a 32-bit time value, and even if it is 64-bit wide, glibc turns it into a 32-bit number when copying to its version of 'struct stat64' that differs from the kernel's and contains a 'struct timespec'.
So stat64 also returns EOVERFLOW to indicate to the application that it needs to use a different, 64 bit time aware syscall.
I don't care if we return EOVERFLOW or give some other indication of the overflow, but this is fundamentally a separate problem from large file support, and we need a new syscall for it.
Yes, but you seem to have missed my point that we don't need to care about 32 bit times, whether it be syscalls or tasks, at the VFS/filesystem level. That's all a syscall layer issue. i.e. vfs_fstatat() only uses 64 bit times, and only the places where struct kstat contents get converted to whatever structure userspace has supplied (e.g. cp_new_stat()) need to detect/care about timestamp overflows, as they already do with inode numbers and link counts.
Cheers,
Dave.
On Fri, Mar 4, 2016 at 6:40 AM, Dave Chinner david@fromorbit.com wrote:
On Thu, Mar 03, 2016 at 03:08:22PM +0100, Arnd Bergmann wrote:
On Thursday 03 March 2016 10:45:11 Dave Chinner wrote:
On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
 #define MAX_LFS_FILESIZE	((loff_t)0x7fffffffffffffffLL)
 #endif

+#define MAX_VFS_TIME	S64_MAX
+#define MIN_VFS_TIME	S64_MIN
These. Anything outside these ranges is invalid.
As such, I think this is wrong for 32 bit systems as the min/max VFS times right now are S32_MIN/S32_MAX...
I think the file system should always set the actual on-disk format limits in the superblock, regardless of the word size of the CPU architecture.
Eventually, yes. But right now, 32 bit systems are limited by the platform architecture. Hence no matter what the filesystem on-disk format supports, these are going to be hard limits until filesystems start modifying the defaults to indicate what their on-disk format supports.
The purpose of the patch for now is to allow file systems to set the defaults; we are not checking them yet, and won't until all file systems are converted, hence the defaults are set to invalid values.
Wrong way around, IMO. Set the defaults to be limiting first, then as filesystems are converted the limits get expanded.
The picture will not be clear until you see how we use these values; until then, they are just values. You should get a better picture of what we are trying to do, and the discussion will make more sense, when we post the patches that use these. For now we are just trying to get some initial infrastructure in place so that we can proceed to post the subsequent patches.
Here are the approaches that were considered and why this one was picked:
Approach 1
Use default values in the vfs layer. This was the first version considered (Arnd has a patch from me).
Steps involved:
a. Use default values to initialize limits in alloc_super()
b. implement vfs policy
c. Each filesystem chooses to fill in their actual values, at its own pace.
d. Initialize to invalid defaults in vfs.
Cons:
a. There are unmaintained filesystems that will never be changed.
b. What about the filesystems that support ranges narrower or wider than S32_MIN..S32_MAX, e.g. ceph, fat, iso9660? vfs policies will not be safe here.
c. Wherever these values are clamped, it will lead to changing every filesystem twice, before and after the right limits are filled in.
Approach 2
Use invalid values in the vfs layer.
Steps involved:
a. Use invalid values to initialize limits in alloc_super()
b. implement vfs policy, but only limit it to filesystems which fill in an actual range.
c. We change each filesystem to fill in their actual values.
d. Maybe make it compulsory to have a valid range, according to the policy chosen.
This approach was chosen because it addresses all the cons in approach 1.
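A sketch of step b (names and placement illustrative only): policy is applied only where a filesystem has published a real range, so unconverted filesystems are left untouched:

	/* Only clamp when the superblock carries a valid range. */
	if (sb->s_time_min <= sb->s_time_max)
		timestamp = clamp_t(time64_t, timestamp,
				    sb->s_time_min, sb->s_time_max);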
-Deepa
On Friday 04 March 2016 12:10:08 Dave Chinner wrote:
On Thu, Mar 03, 2016 at 03:08:22PM +0100, Arnd Bergmann wrote:
On Thursday 03 March 2016 10:45:11 Dave Chinner wrote:
On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
Eventually, yes. But right now, 32 bit systems are limited by the platform architecture. Hence no matter what the filesystem on-disk format supports, these are going to be hard limits until filesystems start modifying the defaults to indicate what their on-disk format supports.
The purpose of the patch for now is to allow file systems to set the defaults; we are not checking them yet, and won't until all file systems are converted, hence the defaults are set to invalid values.
Wrong way around, IMO. Set the defaults to be limiting first, then as filesystems are converted the limits get expanded.
That would break perfectly working 64-bit systems, along with 32-bit systems that have the new syscalls, unless we intentionally treat them differently. :(
How about setting the defaults for minimum and maximum to '-1'? That would allow us to detect whether file systems have set this correctly, and it would be less confusing to the reader.
Perhaps, but I'm yet to understand why we even care if filesystems have set limits or not? If the filesystem has not set its own limits, then surely we should assume that the filesystem is not y2038 compatible and have the default limits set appropriately for that?
I don't think we can make any such assumption. Out of the ~50 file systems that store timestamps on disk or on the network, I only found eight that use the 1902..2038 range that matches the 32-bit time_t.
i.e. 32 bit systems should default to 32 bit time limits until the filesystem developers come along and verify that conversion from their on-disk format to 64 bit time works correctly. Then they set their real limits which may (or may not) be >y2038 compatible, and this means that filesystems that have not been converted to >y2038 compatible do not need any modifications to behave correctly on 32 bit systems...
We can set the limits in the superblock if you like, but I would not do it by changing the constants here:
Most file systems have arbitrary other maximum dates that they need to set (on 64-bit systems), e.g. 2028 (iso9660), 2040 (hfsplus), 2106 (ceph), 2107 (fat, cifs), 2262 (logfs), 2514 (ext4).
Yes, and that needs to be done when the filesystem is converted to use the proper timespec64 representation.
No, it's completely unrelated to the timespec64 change, other than the fact that you asked for the limits to be implemented as a result of the y2038 patch series. This is absolutely reasonable because we are touching the same code, and we have to audit it anyway.
The filesystem-specific code should just set whatever it actually supports, and then we can find a way to limit that further for 32-bit tasks.
Yes, 32 bit tasks is something for the syscall layer to care about.
But that doesn't change the fact that the *default timestamp range* should be "no >y2038 support", and it is up to the filesystem to override that with its own limits if they are different.
I still think the opposite approach makes more sense here: right now, we have 64-bit systems that have file systems with arbitrary limitations working in random ways, most of them correctly supporting at least time stamps between 1970 and 2106, some of them supporting much wider ranges.
The 32-bit systems are currently all broken beyond 2038, and they have always been that way.
Changing everything to use timespec64 brings 32-bit systems to the same level of functionality as 64-bit systems, and extends the maximum supported timestamp range to what each file system can store on disk.
Adding the range checking ensures that we don't run into whatever limit a file system has in an uncontrolled way, but get a specific error, or controlled clamping of the range. We would need this even if all 32-bit systems would die out tomorrow and we only worried about 64-bit machines.
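For instance (a sketch only; the helper name is hypothetical), a controlled check at the point a timestamp enters an inode might be:

	/* Report a specific error instead of wrapping silently. */
	static inline int fs_time_check(struct super_block *sb, time64_t t)
	{
		if (t < sb->s_time_min || t > sb->s_time_max)
			return -EOVERFLOW;
		return 0;
	}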
On most 32-bit architectures, stat64 still has a 32-bit time value, and even if it is 64-bit wide, glibc turns it into a 32-bit number when copying to its version of 'struct stat64' that differs from the kernel's and contains a 'struct timespec'.
So stat64 also returns EOVERFLOW to indicate to the application that it needs to use a different, 64 bit time aware syscall.
Right, except that for an application programmer this is not visible. The application won't be able to call a replacement system call, but it has to be built with -D_TIME_BITS=64 so all time_t uses get changed to 64-bit.
I believe for LFS support, user space was allowed to call a stat64() glibc function, but that won't be provided here.
I don't care if we return EOVERFLOW or give some other indication of the overflow, but this is fundamentally a separate problem from large file support, and we need a new syscall for it.
Yes, but you seem to have missed my point that we don't need to care about 32 bit times, whether it be syscalls or tasks, at the VFS/filesystem level. That's all a syscall layer issue. i.e. vfs_fstatat() only uses 64 bit times, and only the places where struct kstat contents get converted to whatever structure userspace has supplied (e.g. cp_new_stat()) need to detect/care about timestamp overflows, as they already do with inode numbers and link counts.
I think that was exactly the point I was making.
Arnd
On Thu, Mar 3, 2016 at 8:08 AM, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 03 March 2016 10:45:11 Dave Chinner wrote:
On Thu, Mar 03, 2016 at 12:07:42AM +0100, Arnd Bergmann wrote:
On Thursday 03 March 2016 09:19:05 Dave Chinner wrote:
On Wed, Mar 02, 2016 at 07:51:34AM -0800, Deepa Dinamani wrote:
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -927,6 +927,12 @@ static inline struct file *get_file(struct file *f)
 #define MAX_LFS_FILESIZE	((loff_t)0x7fffffffffffffffLL)
 #endif

+#define MAX_VFS_TIME	S64_MAX
+#define MIN_VFS_TIME	S64_MIN
These. Anything outside these ranges is invalid.
As such, I think this is wrong for 32 bit systems as the min/max VFS times right now are S32_MIN/S32_MAX...
I think the file system should always set the actual on-disk format limits in the superblock, regardless of the word size of the CPU architecture.
Eventually, yes. But right now, 32 bit systems are limited by the platform architecture. Hence no matter what the filesystem on-disk format supports, these are going to be hard limits until filesystems start modifying the defaults to indicate what their on-disk format supports.
The purpose of the patch for now is to allow file systems to set the defaults; we are not checking them yet, and won't until all file systems are converted, hence the defaults are set to invalid values.
Our initial idea was to set the defaults for minimum and maximum to zero, which would have made this clearer, but unfortunately that doesn't allow us to detect whether a file system actually supports time stamps at all: some file systems hardcode all timestamps to zero (1970) because they don't store them on disk.
How about setting the defaults for minimum and maximum to '-1'? That would allow us to detect whether file systems have set this correctly, and it would be less confusing to the reader.
i.e. 32 bit systems should default to 32 bit time limits until the filesystem developers come along and verify that conversion from their on-disk format to 64 bit time works correctly. Then they set their real limits which may (or may not) be >y2038 compatible, and this means that filesystems that have not been converted to >y2038 compatible do not need any modifications to behave correctly on 32 bit systems...
We can set the limits in the superblock if you like, but I would not do it by changing the constants here:
Most file systems have arbitrary other maximum dates that they need to set (on 64-bit systems), e.g. 2028 (iso9660), 2040 (hfsplus), 2106 (ceph), 2107 (fat, cifs), 2262 (logfs), 2514 (ext4).
I am puzzled why 2107 would be the maximum date for cifs. My calculation comes to a maximum date of approximately the year 60,000 AD for 64 bit DCE time (cifs.ko gets DCE time back in all time fields when using CIFS, SMB2 or SMB3, except for the oldest dialects of CIFS). DCE time is 100-nanosecond intervals since 1601 (see the definition of FILETIME in section 2.3.3 of https://msdn.microsoft.com/en-us/library/cc230324.aspx). And shouldn't this be the same for NTFS, as they use a similar DCE time internally?
The filesystem-specific code should just set whatever it actually supports, and then we can find a way to limit that further for 32-bit tasks.
On Friday 04 March 2016 00:31:09 Steve French wrote:
i.e. 32 bit systems should default to 32 bit time limits until the filesystem developers come along and verify that conversion from their on-disk format to 64 bit time works correctly. Then they set their real limits which may (or may not) be >y2038 compatible, and this means that filesystems that have not been converted to >y2038 compatible do not need any modifications to behave correctly on 32 bit systems...
We can set the limits in the superblock if you like, but I would not do it by changing the constants here:
Most file systems have arbitrary other maximum dates that they need to set (on 64-bit systems), e.g. 2028 (iso9660), 2040 (hfsplus), 2106 (ceph), 2107 (fat, cifs), 2262 (logfs), 2514 (ext4).
I am puzzled why 2107 would be the maximum date for cifs. My calculation comes to a maximum date of approximately the year 60,000 AD for 64 bit DCE time (cifs.ko gets DCE time back in all time fields when using CIFS, SMB2 or SMB3, except for the oldest dialects of CIFS). DCE time is 100-nanosecond intervals since 1601 (see the definition of FILETIME in section 2.3.3 of https://msdn.microsoft.com/en-us/library/cc230324.aspx). And shouldn't this be the same for NTFS, as they use a similar DCE time internally?
The value I was looking up in my table at http://kernelnewbies.org/y2038 indeed referred to the ancient SMB timestamps that use a 7-bit year number starting in 1980, just as fat has.
The year I have listed in the table for modern cifs and ntfs is 30328, which assumes that it's a signed 64-bit multiple of 100 nanoseconds, and that's what I see in cifs_NTtimeToUnix() as well. The documentation you point to describes it as unsigned, which matches your 60,000AD date, so before we add the limits, we should try to clarify which one was intended.
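For reference, the conversion under discussion has this shape (a sketch of the arithmetic, not the actual cifs.ko code; div_u64 is the kernel's 64-bit division helper):

	#define NT_UNIX_EPOCH_DELTA	11644473600LL	/* seconds from 1601 to 1970 */

	/* FILETIME counts 100ns ticks since 1601-01-01; whether the tick
	 * count is signed or unsigned decides the maximum year
	 * (roughly 30,000 vs 60,000, as discussed above). */
	static inline time64_t nt_time_to_unix(u64 nt)
	{
		return (time64_t)div_u64(nt, 10000000) - NT_UNIX_EPOCH_DELTA;
	}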
Arnd