On Saturday 16 January 2016 12:14:22 Andreas Dilger wrote:
> >>
> >> Sure, and nfs is a pain because of all it's internal use of
> >> timespecs, too.
> >
> > lustre is probably the worst.
>
> Lustre currently only has one-second granularity in a 64-bit field,
> so it doesn't really care about the difference between timespec or
> timespec64 at all.
>
> The only other uses are for measuring relative times, so the 64-bitness
> shouldn't really matter.
>
> Could you please point out what issues exist so they can be fixed.
It's not really a bug that needs to be fixed, but more the general
issue of referencing inode->i_?time and attr->ia_?time and passing
them around. When we change the types in the inode and iattr from
timespec to timespec64, all assigments need to be modified, and lustre
has more of those assignments than any other file system I'm aware of.
Arnd
This is an update to Arnd Bergmann's RFC patch series:
https://lkml.org/lkml/2014/5/30/669 .
The syscalls and runtime libraries will be handled in separate patch series.
The filling of max and min timestamps for individual filesystems can be
leveraged from the patch series:
https://lkml.org/lkml/2015/11/20/413
1. Objective:
To transition all file system code to use 64 bit time.
This translates to using timespec64 across the fs code for any
timestamp representation.
2. Problem Description:
struct timespec cannot represent times after year 2038 on 32 bit
systems.
The alternative to timespec in the kernel world is timespec64 to get
around the above problem. This will be the UAPI exposed interface.
3. Design objectives:
The goal of this approach was to come up with small manageable patches
to address the above problem.
Preferably, a single patch per filesystem that can be merged independently.
Also, a more generic approach that all the individual filesystems could follow
was preferred.
4. Solution:
The solution incorporated in the patch series involves stages defined below:
4.1. CONFIG_FS_USES_64BIT_TIME
This new config is defined to #ifdef code that is required to support 64 bit
times.
4.2. struct inode times:
The struct inode saves {a,c,m}timestamps as struct timespec.
This leads to 2 problems:
a. The size of the structure depends on whether the machine is 32/ 64 bit.
b. y2038 problem described above.
struct timespec64 also has the same problem as in (a) above.
Choosing scalar types to store these timestamps(time64 and s32) solves both the
above problems.
Adding accessors to access these timestamps would hide the internal data type
so that this can be changed according to config in (4.1).
4.3. struct inode_timespec:
Use inode_timespec for all other timestamp representation throughout VFS and
individual filesystems.
inode_timespec is aliased to timespec or timespec64 based on config defined
in (4.1).
Using timespec64 in this stage would require ifdef-ing the code, which is
spread throughout the code.
4.4. Enable config in (4.1).
4.5. Convert inode_timespec to timespec64:
Drop all references to inode_timespec in (4.3).
Replace it with timespec64.
5. Alternate Solution:
Steps involved are:
5.1. Change VFS code to handle both timespec64 and timespec:
There are a few API's in the VFS that take timestamps as arguments:
generic_update_time(), inode->i_op->update_time(), lease_get_mtime(),
fstack_copy_attr_all(), setattr_copy(), generic_fillattr.
The attr and kstat properties could have accessors like inode.
But, the other functions will have to maintain copies or be updated
simultaenously along with other affecting filesystems.
5.2.struct timespec64:
Change individual fs to use timespec64.
5.3. struct inode times:
The struct inode saves {a,c,m}timestamps as struct timespec.
This leads to 2 problems:
a. The size of the structure depends on whether the machine is 32/ 64 bit.
b. y2038 problem described above.
struct timespec64 also has the same problem as in (a) above.
Choosing scalar types to store these timestamps(time64 and s32) solves both the
above problems.
Change individual filesystems to use macros to access inode times.
Inode macros can assume conversion from timespec64 always.
5.4. VFS:
Change vfs code also as above in (1) and (2).
5.5. Drop timespec
This involves dropping support for any api/ struct changes to make vfs use only
timespec64 for timestamps.
6. Rationale:
The advantage of the method described in (5) is that we do not have
inode_timespec aliases only to be dropped later.
But, the method suffers from disadvantages:
a. As mentioned in (5.1), our process is affected as all the filesystems
using each api must be changed simultaenously or VFS should have a copy.
b. While individual filesystems are being changed, VFS will have to have
2 copies of a few apis. This will mean that at this time, any new code
being added might use either. This might lead to confusion.
Misc:
7. Range check:
Patches include range check and clamping of timestamps.
This topic did not have a conclusion on the previous RFC.
7.1. Rationale
The method incorporated in the patch series is based on following principles:
a. Linux does not impose any fixed format for on-disk inodes.
LKML discussion is still ongoing concerning the best handling of file systems
used or updated after their expiration date.
b. Linux cannot surmise the side effects to a file system because of the
wrong timestamps as each fs saves timestamps differently.
c. Individual filesystems must be able to say what to do when timestamps
are clamped.
7.2. Solution
Based on the above principles, the solution is described below:
7.2.1. There are 2 instances that the solution handles differently:
a. While mounting a filesystem:
A filesystem that has already exceeded the range of its timestamp fields.
b. While doing operations on a mounted filesystem:
Timestamps start getting clamped after the filesystem is mounted.
7.2.2. In both the above cases, a function is invoked as per the callbacks registered
by filesystems.
8. Testing
This is a proof of concept implementation.
I want to get some feedback before I convert rest of the file systems.
I've done some initial testing based on the patches below on x86 64 bit arch.
Testing was mainly done on the root filesystem.
mount, stat, touch, read, write system calls were used for testing timestamp clamps and
other functionality.
Patches 8-15 are only included to provide a complete picture.
Deepa Dinamani (15):
fs: add Kconfig entry CONFIG_FS_USES_64BIT_TIME
vfs: Change all structures to support 64 bit time
kernel: time: Add macros and functions to support 64 bit time
vfs: Add support for vfs code to use 64 bit time
fs: cifs: Add support for cifs to use 64 bit time
fs: fat: convert fat to 64 bit time
fs: ext4: convert to use 64 bit time
fs: Enable 64 bit time
fs: cifs: replace inode_timespec with timespec64
fs: fat: replace inode_timespec with timespec64
fs: ext4: replace inode_timespec with timespec64
vfs: remove inode_timespec and timespec references
kernel: time: change inode_timespec to timespec64
vfs: Remove inode_timespec aliases
fs: Drop CONFIG_FS_USES_64BIT_TIME
fs/attr.c | 15 ++---
fs/bad_inode.c | 10 ++-
fs/binfmt_misc.c | 7 +-
fs/cifs/cache.c | 16 +++--
fs/cifs/cifsencrypt.c | 2 +-
fs/cifs/cifsglob.h | 6 +-
fs/cifs/cifsproto.h | 9 +--
fs/cifs/cifssmb.c | 17 +++--
fs/cifs/file.c | 9 ++-
fs/cifs/inode.c | 65 ++++++++++++-------
fs/cifs/netmisc.c | 26 ++++----
fs/ext4/acl.c | 3 +-
fs/ext4/ext4.h | 44 +++++++------
fs/ext4/extents.c | 25 ++++++--
fs/ext4/ialloc.c | 9 ++-
fs/ext4/inline.c | 10 ++-
fs/ext4/inode.c | 16 +++--
fs/ext4/ioctl.c | 16 +++--
fs/ext4/namei.c | 40 ++++++++----
fs/ext4/super.c | 6 +-
fs/ext4/xattr.c | 2 +-
fs/fat/dir.c | 7 +-
fs/fat/fat.h | 8 ++-
fs/fat/file.c | 10 ++-
fs/fat/inode.c | 46 ++++++++++----
fs/fat/misc.c | 7 +-
fs/fat/namei_msdos.c | 40 +++++++-----
fs/fat/namei_vfat.c | 41 ++++++++----
fs/inode.c | 53 +++++++++++-----
fs/libfs.c | 50 ++++++++++++---
fs/locks.c | 5 +-
fs/nsfs.c | 6 +-
fs/pipe.c | 6 +-
fs/posix_acl.c | 2 +-
fs/stack.c | 6 +-
fs/stat.c | 6 +-
fs/super.c | 10 +++
fs/utimes.c | 6 +-
include/linux/fs.h | 101 +++++++++++++++++++++++++----
include/linux/fs_stack.h | 9 +--
include/linux/stat.h | 6 +-
include/linux/time64.h | 4 ++
kernel/time/time.c | 162 ++++++++++++++++++++++++++++++++++++++++++-----
43 files changed, 691 insertions(+), 253 deletions(-)
--
1.9.1
32 bit systems using 'struct timeval' will break in the year 2038, so
we replace the code appropriately.
This patch replaces struct timeval and do_gettimeofday() with
monotonic ktime_get_ns(). Since we are interested in microseconds
portion of the time, we can use NSEC_PER_USEC macro but this would
lead to expensive division for a frequently used function.
Alternatively a bit shift has been done which performs a division of
1024 instead of 1000 which slightly alters the value returned by this
function.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606(a)gmail.com>
---
drivers/scsi/bfa/bfa_cs.h | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/scsi/bfa/bfa_cs.h b/drivers/scsi/bfa/bfa_cs.h
index 91a8aa3..daee860 100644
--- a/drivers/scsi/bfa/bfa_cs.h
+++ b/drivers/scsi/bfa/bfa_cs.h
@@ -32,13 +32,7 @@
#define BFA_TRC_MAX (4 * 1024)
#endif
-#define BFA_TRC_TS(_trcm) \
- ({ \
- struct timeval tv; \
- \
- do_gettimeofday(&tv); \
- (tv.tv_sec*1000000+tv.tv_usec); \
- })
+#define BFA_TRC_TS(_trcm) (ktime_get_ns() >> 10)
#ifndef BFA_TRC_TS
#define BFA_TRC_TS(_trcm) ((_trcm)->ticks++)
--
1.9.1
32 bit systems using 'time_t' will break in the year 2038, so
we modify the code appropriately.
This patch removes the cast to 'time_t' in the assignment statement
since we are eventually removing the time_t definition from the kernel
as an effort to solve the y2038 problem. This change will avoid the
build error but the code is still broken and requires a change in the
ioctl interface.
Further, since the variable io_profile_start_time will break in 2038
or 2106 depending on user space interpreting it as signed or unsigned,
comments have been added to highlight the same.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606(a)gmail.com>
---
Only apply this patch if it's seen as acceptable that the
io_profile_start_time remains truncated to 32 bits in
IOCMD_ITNIM_GET_IOPROFILE. If this is something that needs to
be fixed by adding a replacement vendor command, leave the
cast in place as a reminder.
drivers/scsi/bfa/bfa_defs_svc.h | 4 ++++
drivers/scsi/bfa/bfa_fcpim.c | 3 ++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/bfa/bfa_defs_svc.h b/drivers/scsi/bfa/bfa_defs_svc.h
index 638f441f..8ab7964 100644
--- a/drivers/scsi/bfa/bfa_defs_svc.h
+++ b/drivers/scsi/bfa/bfa_defs_svc.h
@@ -1211,6 +1211,10 @@ struct bfa_itnim_ioprofile_s {
u32 clock_res_mul;
u32 clock_res_div;
u32 index;
+ /*
+ * Overflow in 2038 or 2106 depending on user space interpreting it as
+ * signed or unsigned.
+ */
u32 io_profile_start_time; /* IO profile start time */
u32 iocomps[BFA_IOBUCKET_MAX]; /* IO completed */
struct bfa_itnim_latency_s io_latency;
diff --git a/drivers/scsi/bfa/bfa_fcpim.c b/drivers/scsi/bfa/bfa_fcpim.c
index 6730340..56df8d0 100644
--- a/drivers/scsi/bfa/bfa_fcpim.c
+++ b/drivers/scsi/bfa/bfa_fcpim.c
@@ -1478,7 +1478,8 @@ bfa_itnim_get_ioprofile(struct bfa_itnim_s *itnim,
return BFA_STATUS_IOPROFILE_OFF;
itnim->ioprofile.index = BFA_IOBUCKET_MAX;
- itnim->ioprofile.io_profile_start_time = (u32)(time_t)
+ /* io_profile_start_time will overflow in 2038 or 2106 */
+ itnim->ioprofile.io_profile_start_time = (u32)
bfa_io_profile_start_time(itnim->bfa);
itnim->ioprofile.clock_res_mul = bfa_io_lat_clock_res_mul;
itnim->ioprofile.clock_res_div = bfa_io_lat_clock_res_div;
--
1.9.1
32 bit systems using 'struct timeval' will break in the year 2038, so
we modify the code appropriately.
We only need to find the elapsed seconds rather than absolute time,
and we only care about full seconds, so it's better to use monotonic
times, so using ktime_get_seconds() makes the code more efficient
and more robust against a concurrent settimeofday()
Since we need monotonic time only, stats_reset_time variable
does not need to be changed from u32 to u64 type.
After the conversion we get a harmless compiler warning about the
possible use of an uninitialised warning. gcc is wrong here and the
code is actually ok. Introducing a temporary status_ok variable to
store the condition avoids the warning.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606(a)gmail.com>
---
drivers/scsi/bfa/bfa_svc.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/bfa/bfa_svc.c b/drivers/scsi/bfa/bfa_svc.c
index b3668e9..6521896 100644
--- a/drivers/scsi/bfa/bfa_svc.c
+++ b/drivers/scsi/bfa/bfa_svc.c
@@ -3081,7 +3081,6 @@ bfa_fcport_attach(struct bfa_s *bfa, void *bfad, struct bfa_iocfc_cfg_s *cfg,
struct bfa_fcport_s *fcport = BFA_FCPORT_MOD(bfa);
struct bfa_port_cfg_s *port_cfg = &fcport->cfg;
struct bfa_fcport_ln_s *ln = &fcport->ln;
- struct timeval tv;
fcport->bfa = bfa;
ln->fcport = fcport;
@@ -3094,8 +3093,7 @@ bfa_fcport_attach(struct bfa_s *bfa, void *bfad, struct bfa_iocfc_cfg_s *cfg,
/*
* initialize time stamp for stats reset
*/
- do_gettimeofday(&tv);
- fcport->stats_reset_time = tv.tv_sec;
+ fcport->stats_reset_time = ktime_get_seconds();
fcport->stats_dma_ready = BFA_FALSE;
/*
@@ -3345,16 +3343,17 @@ __bfa_cb_fcport_stats_get(void *cbarg, bfa_boolean_t complete)
struct bfa_cb_pending_q_s *cb;
struct list_head *qe, *qen;
union bfa_fcport_stats_u *ret;
+ bool status_ok = (fcport->stats_status == BFA_STATUS_OK);
if (complete) {
- struct timeval tv;
- if (fcport->stats_status == BFA_STATUS_OK)
- do_gettimeofday(&tv);
+ time64_t timestamp;
+ if (status_ok)
+ timestamp = ktime_get_seconds();
list_for_each_safe(qe, qen, &fcport->stats_pending_q) {
bfa_q_deq(&fcport->stats_pending_q, &qe);
cb = (struct bfa_cb_pending_q_s *)qe;
- if (fcport->stats_status == BFA_STATUS_OK) {
+ if (status_ok) {
ret = (union bfa_fcport_stats_u *)cb->data;
/* Swap FC QoS or FCoE stats */
if (bfa_ioc_get_fcmode(&fcport->bfa->ioc))
@@ -3364,7 +3363,7 @@ __bfa_cb_fcport_stats_get(void *cbarg, bfa_boolean_t complete)
bfa_fcport_fcoe_stats_swap(&ret->fcoe,
&fcport->stats->fcoe);
ret->fcoe.secs_reset =
- tv.tv_sec - fcport->stats_reset_time;
+ timestamp - fcport->stats_reset_time;
}
}
bfa_cb_queue_status(fcport->bfa, &cb->hcb_qe,
--
1.9.1