This is a preparation patch to add range checking for inode timestamps.
Extend struct super_block to include information about the max and min inode times each filesystem can hold. These are dependent on the on-disk format of filesystems.
These range checks will be used to clamp timestamps to filesystem allowed ranges.
Individual filesystems do not have the same on disk format as the in memory inodes. Range checking and clamping times assigned to inodes will help keep in memory and on-disk timestamps to be in sync.
Every time a new superblock is created, make sure that the superblock max and min timestamp fields are assigned invalid values.
Another series will initialize these fields to appropriate values for every filesystem.
The values are currently ignored. The exact policy and behavior will be decided in a separate patch.
max and min times are initialized to MIN_VFS_TIME and MAX_VFS_TIME respectively so that even if one of the fields is uninitialized, it can be detected by using the condition max_time < min_time.
The original idea for the feature comes from the discussion: https://lkml.org/lkml/2014/5/30/669
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com ---
The intention is to include this as part of 4.6 so that the follow on patches can go into 4.7.
The series and the plan have been discussed with Arnd Bergmann.
Changes from v1: * Delete INVALID macros, use VFS_TIME macros directly. * Add comment in alloc_super() to explain range checking. * Reword the commit text to reflect the above.
fs/super.c | 7 +++++++ include/linux/fs.h | 12 +++++++++++- 2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c index 1182af8..37ec188 100644 --- a/fs/super.c +++ b/fs/super.c @@ -239,6 +239,13 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags) s->s_maxbytes = MAX_NON_LFS; s->s_op = &default_op; s->s_time_gran = 1000000000; + /* + * Assign a default empty range [MAX_VFS_TIME, MIN_VFS_TIME]. + * This will help VFS detect filesystems that do not populate + * these fields in the superblock. + */ + s->s_time_min = MAX_VFS_TIME; + s->s_time_max = MIN_VFS_TIME; s->cleancache_poolid = CLEANCACHE_NO_POOL;
s->s_shrink.seeks = DEFAULT_SEEKS; diff --git a/include/linux/fs.h b/include/linux/fs.h index 1af4727..cee8f99 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -927,6 +927,9 @@ static inline struct file *get_file(struct file *f) #define MAX_LFS_FILESIZE ((loff_t)0x7fffffffffffffffLL) #endif
+#define MIN_VFS_TIME S64_MIN +#define MAX_VFS_TIME S64_MAX + #define FL_POSIX 1 #define FL_FLOCK 2 #define FL_DELEG 4 /* NFSv4 delegation */ @@ -1343,7 +1346,14 @@ struct super_block {
/* Granularity of c/m/atime in ns. Cannot be worse than a second */ - u32 s_time_gran; + u32 s_time_gran; + + /* + * Max and min values for timestamps + * according to the range supported by filesystems. + */ + time64_t s_time_min; + time64_t s_time_max;
/* * The next field is for VFS *only*. No filesystems have any business
On Wednesday 02 March 2016 23:50:01 Deepa Dinamani wrote:
/*
* Assign a default empty range [MAX_VFS_TIME, MIN_VFS_TIME].
* This will help VFS detect filesystems that do not populate
* these fields in the superblock.
*/
s->s_time_min = MAX_VFS_TIME;
s->s_time_max = MIN_VFS_TIME; s->cleancache_poolid = CLEANCACHE_NO_POOL;
I think this is still less clear than it should be, even with the comment explaining it.
As I just replied to Dave, how about initializing both to -1 instead?
Arnd
On Thu, Mar 3, 2016 at 7:40 PM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 02 March 2016 23:50:01 Deepa Dinamani wrote:
/*
* Assign a default empty range [MAX_VFS_TIME, MIN_VFS_TIME].
* This will help VFS detect filesystems that do not populate
* these fields in the superblock.
*/
s->s_time_min = MAX_VFS_TIME;
s->s_time_max = MIN_VFS_TIME; s->cleancache_poolid = CLEANCACHE_NO_POOL;
I think this is still less clear than it should be, even with the comment explaining it.
As I just replied to Dave, how about initializing both to -1 instead?
This was one of my first ideas. The reason I did not choose this is because 1. [-1, -1] is a valid range theoretically. 2. -1 would now be treated as a value.
It is true that no filesystem is using -1 as a limit. But, I did not feel right excluding a perfectly good value. Choosing [1,1] or any other number except probably 0 would work the same way.
This is the reason that I decided an invalid range should be represented by an invalid tuple and not a single value. My premise was that a range where min_time > max_time is invalid. My first choice here as in the first patch I shared with you privately was [1,-1]. But, the condition min_time < max_time can easily be achieved by changing just one of the values. For instance, [1, 2016]. But, this doesn't necessarily mean that the filesystem explicitly set both the values. For this reason, the invalid range I chose is [max_time, min_time]. Of course [min_time, min_time] and [max_time, max_time] ranges are also considered valid in this approach.
This guarantees 1. That both min and max values are assigned by each filesystem. 2. The range is universally invalid.
-Deepa
On Thursday 03 March 2016 21:08:25 Deepa Dinamani wrote:
On Thu, Mar 3, 2016 at 7:40 PM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 02 March 2016 23:50:01 Deepa Dinamani wrote:
/*
* Assign a default empty range [MAX_VFS_TIME, MIN_VFS_TIME].
* This will help VFS detect filesystems that do not populate
* these fields in the superblock.
*/
s->s_time_min = MAX_VFS_TIME;
s->s_time_max = MIN_VFS_TIME; s->cleancache_poolid = CLEANCACHE_NO_POOL;
I think this is still less clear than it should be, even with the comment explaining it.
As I just replied to Dave, how about initializing both to -1 instead?
This was one of my first ideas. The reason I did not choose this is because
- [-1, -1] is a valid range theoretically.
- -1 would now be treated as a value.
It is true that no filesystem is using -1 as a limit. But, I did not feel right excluding a perfectly good value. Choosing [1,1] or any other number except probably 0 would work the same way.
This is the reason that I decided an invalid range should be represented by an invalid tuple and not a single value. My premise was that a range where min_time > max_time is invalid.
Right.
My first choice here as in the first patch I shared with you privately was [1,-1]. But, the condition min_time < max_time can easily be achieved by changing just one of the values. For instance, [1, 2016]. But, this doesn't necessarily mean that the filesystem explicitly set both the values.
I see where you are coming from here, but I think we could also assume some minimum level of competence from the file system developers, so I would not have quite described it this way ;-)
For this reason, the invalid range I chose is [max_time, min_time]. Of course [min_time, min_time] and [max_time, max_time] ranges are also considered valid in this approach.
This guarantees
- That both min and max values are assigned by each filesystem.
- The range is universally invalid.
Ok, at least it clearly shows the intention that it cannot possibly be valid, unlike the '0' or '-1' defaults.
Arnd
On Thu, Mar 3, 2016 at 8:03 AM, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 03 March 2016 21:08:25 Deepa Dinamani wrote:
On Thu, Mar 3, 2016 at 7:40 PM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 02 March 2016 23:50:01 Deepa Dinamani wrote:
/*
* Assign a default empty range [MAX_VFS_TIME, MIN_VFS_TIME].
* This will help VFS detect filesystems that do not populate
* these fields in the superblock.
*/
s->s_time_min = MAX_VFS_TIME;
s->s_time_max = MIN_VFS_TIME; s->cleancache_poolid = CLEANCACHE_NO_POOL;
I think this is still less clear than it should be, even with the comment explaining it.
As I just replied to Dave, how about initializing both to -1 instead?
This was one of my first ideas. The reason I did not choose this is because
- [-1, -1] is a valid range theoretically.
- -1 would now be treated as a value.
It is true that no filesystem is using -1 as a limit. But, I did not feel right excluding a perfectly good value. Choosing [1,1] or any other number except probably 0 would work the same way.
This is the reason that I decided an invalid range should be represented by an invalid tuple and not a single value. My premise was that a range where min_time > max_time is invalid.
Right.
My first choice here as in the first patch I shared with you privately was [1,-1]. But, the condition min_time < max_time can easily be achieved by changing just one of the values. For instance, [1, 2016]. But, this doesn't necessarily mean that the filesystem explicitly set both the values.
I see where you are coming from here, but I think we could also assume some minimum level of competence from the file system developers, so I would not have quite described it this way ;-)
Sorry, I did not mean to offend anybody. I was thinking it could avoid accidental update of only one of the bounds. The illustration I gave was meant to correlate with only setting s->s_time_max = U32_MAX (in 2106).
- Deepa
On 03/03/2016 01:50 AM, Deepa Dinamani wrote:
This is a preparation patch to add range checking for inode timestamps.
Extend struct super_block to include information about the max and min inode times each filesystem can hold. These are dependent on the on-disk format of filesystems.
These range checks will be used to clamp timestamps to filesystem allowed ranges.
I talked to Deepa and Arnd last week (in person!) and offered to take a look at this discussion and provide some input. I don't claim expertise, but maybe I have some familiarity that might help.
Individual filesystems do not have the same on disk format as the in memory inodes. Range checking and clamping times assigned to inodes will help keep in memory and on-disk timestamps to be in sync.
Every time a new superblock is created, make sure that the superblock max and min timestamp fields are assigned invalid values.
Another series will initialize these fields to appropriate values for every filesystem.
The values are currently ignored. The exact policy and behavior will be decided in a separate patch.
max and min times are initialized to MIN_VFS_TIME and MAX_VFS_TIME respectively so that even if one of the fields is uninitialized, it can be detected by using the condition max_time < min_time.
The original idea for the feature comes from the discussion: https://lkml.org/lkml/2014/5/30/669
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
The intention is to include this as part of 4.6 so that the follow on patches can go into 4.7.
The series and the plan have been discussed with Arnd Bergmann.
Changes from v1:
- Delete INVALID macros, use VFS_TIME macros directly.
- Add comment in alloc_super() to explain range checking.
- Reword the commit text to reflect the above.
fs/super.c | 7 +++++++ include/linux/fs.h | 12 +++++++++++- 2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c index 1182af8..37ec188 100644 --- a/fs/super.c +++ b/fs/super.c @@ -239,6 +239,13 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags) s->s_maxbytes = MAX_NON_LFS; s->s_op = &default_op; s->s_time_gran = 1000000000;
- /*
* Assign a default empty range [MAX_VFS_TIME, MIN_VFS_TIME].
* This will help VFS detect filesystems that do not populate
* these fields in the superblock.
*/
- s->s_time_min = MAX_VFS_TIME;
- s->s_time_max = MIN_VFS_TIME;
Dave found the symbol name you used before confusing, and I did too. I know what you're doing here--assigning the max as a min and the min as a max--but your comment doesn't do a good job of describing it, and the result is still in my view confusing.
Your comment says "assign these values" and asserts it allows detection of something. But you should say something more like:
It is invalid for s_time_min to be greater than s_time_max. We use this condition to represent a file system whose supported on-disk time range has not been properly initialized.
As far as the particular values, Arnd's suggestion that they (or at least s_time_max) be initialized to -1 seems reasonable, provided time64_t is signed (it is, but there are no comments at its definition to say it must be).
s->cleancache_poolid = CLEANCACHE_NO_POOL; s->s_shrink.seeks = DEFAULT_SEEKS; diff --git a/include/linux/fs.h b/include/linux/fs.h index 1af4727..cee8f99 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -927,6 +927,9 @@ static inline struct file *get_file(struct file *f) #define MAX_LFS_FILESIZE ((loff_t)0x7fffffffffffffffLL) #endif +#define MIN_VFS_TIME S64_MIN +#define MAX_VFS_TIME S64_MAX
Dave said he thought these should have their 32-bit counterpart values instead--at least for 32-bit tasks.
I believe his point is that we can't assume a file system implementation will properly convert between on-disk time format and a 64-bit time value until it is verified to do so. By using 32-bit max/min values in this case, the imposed limit is no different from the current 32-bit time_t limit.
However, this might be missing Deepa's assignment of an explicitly invalid combination of s_time_{min,max} values. At this point, there is no enforcement in place anyway, but I think the purpose of doing that was to know when a file system had not recorded its own limits, which would then trigger generic code to (probably) warn and then handle it specially (possibly using 32-bit limits as Dave suggested).
Looking at this patch alone, it's a little hard to envision what exactly is going to be done with these limit values. On the other hand, if we wish to handle this generically, it's pretty clear we need to have each file system type expose the on-disk time value limits to which it is constrained.
I personally think that defining these fields is a necessary step, and they should be 64 bits (to represent the One True VFS time). I believe Arnd told me it needed to be signed, and that should be noted in comments where defined. Their max and min values should therefore be S64_MAX and S64_MIN respectively. A particular file system type must convert between its own time and VFS time, and care needs to be taken to ensure none of the VFS changes assume such conversion occurs automatically.
Defining an expressly invalid combination of these values (with max < min) is important; a single invalid value is not enough (because a file system should be allowed to support all representable values).
It will be good to find out a little more about how exactly these *might* be used, even if the policy is not yet fully worked out.
So with all of that as background, I guess I recommend this patch (with perhaps very minor changes) get accepted. But I'd like to know if Dave has a different suggestion for how to support a unified VFS notion of time while generically handling file system specific differences in on-disk time representation.
I hope this helps.
-Alex
#define FL_POSIX 1 #define FL_FLOCK 2 #define FL_DELEG 4 /* NFSv4 delegation */ @@ -1343,7 +1346,14 @@ struct super_block { /* Granularity of c/m/atime in ns. Cannot be worse than a second */
- u32 s_time_gran;
- u32 s_time_gran;
- /*
* Max and min values for timestamps
* according to the range supported by filesystems.
*/
- time64_t s_time_min;
- time64_t s_time_max;
/* * The next field is for VFS *only*. No filesystems have any business