Introduction
This patch series is aimed at getting rid of CURRENT_TIME and CURRENT_TIME_SEC macros.
The idea for the series evolved from my discussions with Arnd Bergmann.
This was originally part of the RFC series[2]: https://lkml.org/lkml/2016/1/7/20 (under discussion).
Dave Chinner suggested moving bug fixes out of the feature series to keep the original series simple.
There are 354 occurrences of the above macros in the kernel. The series will be divided into 4 or 5 parts to keep the parts manageable, so that each part can be reviewed and merged independently. This is part 1 of the series.
Motivation
The macros CURRENT_TIME and CURRENT_TIME_SEC are primarily used for filesystem timestamps. But they are not accurate, as they do not clamp values to the filesystem's timestamp range, nor do they truncate the nanoseconds value to the granularity required by the filesystem.
This series is also ancillary to another upcoming series[2] that attempts to transition filesystem timestamps to use 64 bit time to make them y2038 safe.
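The truncation step mentioned above can be sketched in userspace C. This is only an illustration: fs_time_trunc() and the gran_ns parameter are hypothetical names chosen for this example, not the kernel's implementation, and the granularity is assumed to lie between 1 ns and 1 s.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: round tv_nsec down to a multiple of gran_ns,
 * the timestamp granularity a filesystem can store (1 ns .. 1 s).
 */
static struct timespec fs_time_trunc(struct timespec t, long gran_ns)
{
	assert(gran_ns >= 1 && gran_ns <= NSEC_PER_SEC);

	if (gran_ns == NSEC_PER_SEC)
		t.tv_nsec = 0;                    /* whole seconds only */
	else
		t.tv_nsec -= t.tv_nsec % gran_ns; /* drop the sub-granule part */
	return t;
}
```

A filesystem with 100 ns granularity would, under these assumptions, store 123456789 ns as 123456700 ns; the seconds field is left untouched.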
There will also be another series[3] to add range checks and clamping to filesystem time functions that are meant to substitute the above macros.
Solution
CURRENT_TIME macro has an equivalent function:
struct timespec current_fs_time(struct super_block *sb)
These will be the changes to the above function:
1. Function will return the type y2038 safe timespec64 in [2].
2. Function will use y2038 safe 64 bit functions in [2].
3. Function will be extended to perform range checks in [3].
A new function will be added to substitute for CURRENT_TIME_SEC macro in the current series:
struct timespec current_fs_time_sec(struct super_block *sb)
These will be the changes to the above function:
1. Function will return the type y2038 safe timespec64 in [2].
2. Function will use y2038 safe 64 bit functions in [2].
3. Function will be extended to perform range checks in [3].
Any use of these macros outside of filesystem timestamps will be replaced by function calls to appropriate time functions.
Deepa Dinamani (10):
  fs: Add current_fs_time_sec() function
  vfs: Replace CURRENT_TIME by current_fs_time()
  fs: cifs: Replace CURRENT_TIME with current_fs_time()
  fs: cifs: Replace CURRENT_TIME with ktime_get_real_ts()
  fs: cifs: Replace CURRENT_TIME by get_seconds
  fs: ext4: Replace CURRENT_TIME_SEC with current_fs_time_sec()
  fs: ext4: Replace CURRENT_TIME with ext4_current_time()
  fs: ceph: replace CURRENT_TIME by current_fs_time()
  fs: ceph: Replace CURRENT_TIME by ktime_get_real_ts()
  fs: btrfs: Replace CURRENT_TIME by current_fs_time()

 fs/btrfs/file.c        |  4 ++--
 fs/btrfs/inode.c       | 25 +++++++++++++------------
 fs/btrfs/ioctl.c       |  8 ++++----
 fs/btrfs/root-tree.c   |  2 +-
 fs/btrfs/transaction.c |  7 +++++--
 fs/btrfs/xattr.c       |  2 +-
 fs/ceph/file.c         |  4 ++--
 fs/ceph/inode.c        |  2 +-
 fs/ceph/mds_client.c   |  2 +-
 fs/ceph/xattr.c        |  4 ++--
 fs/cifs/cifsencrypt.c  |  4 +++-
 fs/cifs/cifssmb.c      | 10 +++++-----
 fs/cifs/inode.c        | 15 +++++++--------
 fs/ext4/ext4.h         |  2 +-
 fs/ext4/super.c        |  2 +-
 fs/libfs.c             | 21 +++++++++++++--------
 fs/nsfs.c              |  3 ++-
 fs/pipe.c              |  3 ++-
 fs/posix_acl.c         |  2 +-
 include/linux/fs.h     |  5 +++++
 20 files changed, 72 insertions(+), 55 deletions(-)
This is in preparation for the series that transitions filesystem timestamps to use 64 bit time and hence make them y2038 safe.
The function is meant to replace CURRENT_TIME_SEC macro. The macro CURRENT_TIME_SEC does not represent filesystem times correctly as it cannot perform range checks. current_fs_time_sec() will be extended to include these.
CURRENT_TIME_SEC is also not y2038 safe. current_fs_time_sec() will be transitioned to use 64 bit time along with vfs in a separate series.
The function is inline for now to maintain similar performance to that of the macro.
The function takes super block as a parameter to allow for future range checking of filesystem timestamps.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: Alexander Viro viro@zeniv.linux.org.uk
Cc: linux-fsdevel@vger.kernel.org
---
 include/linux/fs.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6a75571..4af612f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1407,6 +1407,11 @@ struct super_block {
 
 extern struct timespec current_fs_time(struct super_block *sb);
 
+static inline struct timespec current_fs_time_sec(struct super_block *sb)
+{
+	return (struct timespec) { get_seconds(), 0 };
+}
+
 /*
  * Snapshotting support.
  */
The CURRENT_TIME macro is not sufficient for filesystem timestamps since it does not truncate the values according to filesystem granularity.
simple_link(), simple_unlink() and simple_rename() only need a single call to current_fs_time() since they do not span filesystems.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: Alexander Viro viro@zeniv.linux.org.uk
Cc: linux-fsdevel@vger.kernel.org
---
 fs/libfs.c     | 21 +++++++++++++--------
 fs/nsfs.c      |  3 ++-
 fs/pipe.c      |  3 ++-
 fs/posix_acl.c |  2 +-
 4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 0ca80b2..e3a8ed5 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -237,7 +237,7 @@ struct dentry *mount_pseudo(struct file_system_type *fs_type, char *name,
 	 */
 	root->i_ino = 1;
 	root->i_mode = S_IFDIR | S_IRUSR | S_IWUSR;
-	root->i_atime = root->i_mtime = root->i_ctime = CURRENT_TIME;
+	root->i_atime = root->i_mtime = root->i_ctime = current_fs_time(s);
 	dentry = __d_alloc(s, &d_name);
 	if (!dentry) {
 		iput(root);
@@ -267,7 +267,8 @@ int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *den
 {
 	struct inode *inode = d_inode(old_dentry);
 
-	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+	inode->i_ctime = dir->i_ctime =
+		dir->i_mtime = current_fs_time(dir->i_sb);
 	inc_nlink(inode);
 	ihold(inode);
 	dget(dentry);
@@ -301,7 +302,8 @@ int simple_unlink(struct inode *dir, struct dentry *dentry)
 {
 	struct inode *inode = d_inode(dentry);
 
-	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+	inode->i_ctime = dir->i_ctime =
+		dir->i_mtime = current_fs_time(dir->i_sb);
 	drop_nlink(inode);
 	dput(dentry);
 	return 0;
@@ -340,8 +342,9 @@ int simple_rename(struct inode *old_dir, struct dentry *old_dentry,
 		inc_nlink(new_dir);
 	}
 
-	old_dir->i_ctime = old_dir->i_mtime = new_dir->i_ctime =
-		new_dir->i_mtime = inode->i_ctime = CURRENT_TIME;
+	old_dir->i_ctime = old_dir->i_mtime =
+		new_dir->i_ctime = new_dir->i_mtime =
+		inode->i_ctime = current_fs_time(inode->i_sb);
 
 	return 0;
 }
@@ -492,7 +495,7 @@ int simple_fill_super(struct super_block *s, unsigned long magic,
 	 */
 	inode->i_ino = 1;
 	inode->i_mode = S_IFDIR | 0755;
-	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	inode->i_atime = inode->i_mtime = inode->i_ctime = current_fs_time(s);
 	inode->i_op = &simple_dir_inode_operations;
 	inode->i_fop = &simple_dir_operations;
 	set_nlink(inode, 2);
@@ -518,7 +521,8 @@ int simple_fill_super(struct super_block *s, unsigned long magic,
 			goto out;
 		}
 		inode->i_mode = S_IFREG | files->mode;
-		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+		inode->i_atime = inode->i_mtime =
+			inode->i_ctime = current_fs_time(s);
 		inode->i_fop = files->ops;
 		inode->i_ino = i;
 		d_add(dentry, inode);
@@ -1064,7 +1068,8 @@ struct inode *alloc_anon_inode(struct super_block *s)
 	inode->i_uid = current_fsuid();
 	inode->i_gid = current_fsgid();
 	inode->i_flags |= S_PRIVATE;
-	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	inode->i_atime = inode->i_mtime =
+		inode->i_ctime = current_fs_time(inode->i_sb);
 	return inode;
 }
 EXPORT_SYMBOL(alloc_anon_inode);
diff --git a/fs/nsfs.c b/fs/nsfs.c
index 8f20d60..f0e18b2 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -82,7 +82,8 @@ slow:
 		return ERR_PTR(-ENOMEM);
 	}
 	inode->i_ino = ns->inum;
-	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
+	inode->i_mtime = inode->i_atime =
+		inode->i_ctime = current_fs_time(inode->i_sb);
 	inode->i_flags |= S_IMMUTABLE;
 	inode->i_mode = S_IFREG | S_IRUGO;
 	inode->i_fop = &ns_file_operations;
diff --git a/fs/pipe.c b/fs/pipe.c
index c1c1b26..ad3fc8d 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -699,7 +699,8 @@ static struct inode * get_pipe_inode(void)
 	inode->i_mode = S_IFIFO | S_IRUSR | S_IWUSR;
 	inode->i_uid = current_fsuid();
 	inode->i_gid = current_fsgid();
-	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	inode->i_atime = inode->i_mtime =
+		inode->i_ctime = current_fs_time(inode->i_sb);
 
 	return inode;
 
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 711dd51..778a27e 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -859,7 +859,7 @@ int simple_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 			acl = NULL;
 	}
 
-	inode->i_ctime = CURRENT_TIME;
+	inode->i_ctime = current_fs_time(inode->i_sb);
 	set_cached_acl(inode, type, acl);
 	return 0;
 }
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead.
Change signature of helper cifs_all_info_to_fattr since it now needs both super_block and cifs_sb_info.
Note: The inode timestamps read from the server are assumed to have correct granularity and range.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: Steve French sfrench@samba.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
---
 fs/cifs/inode.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index aeb26db..fa72359 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -320,9 +320,8 @@ cifs_create_dfs_fattr(struct cifs_fattr *fattr, struct super_block *sb)
 	fattr->cf_mode = S_IFDIR | S_IXUGO | S_IRWXU;
 	fattr->cf_uid = cifs_sb->mnt_uid;
 	fattr->cf_gid = cifs_sb->mnt_gid;
-	fattr->cf_atime = CURRENT_TIME;
-	fattr->cf_ctime = CURRENT_TIME;
-	fattr->cf_mtime = CURRENT_TIME;
+	fattr->cf_atime = fattr->cf_ctime =
+		fattr->cf_mtime = current_fs_time(sb);
 	fattr->cf_nlink = 2;
 	fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL;
 }
@@ -584,9 +583,10 @@ static int cifs_sfu_mode(struct cifs_fattr *fattr, const unsigned char *path,
 /* Fill a cifs_fattr struct with info from FILE_ALL_INFO */
 static void
 cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
-		       struct cifs_sb_info *cifs_sb, bool adjust_tz,
+		       struct super_block *sb, bool adjust_tz,
 		       bool symlink)
 {
+	struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
 	struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
 
 	memset(fattr, 0, sizeof(*fattr));
@@ -597,7 +597,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
 	if (info->LastAccessTime)
 		fattr->cf_atime = cifs_NTtimeToUnix(info->LastAccessTime);
 	else
-		fattr->cf_atime = CURRENT_TIME;
+		fattr->cf_atime = current_fs_time(sb);
 
 	fattr->cf_ctime = cifs_NTtimeToUnix(info->ChangeTime);
 	fattr->cf_mtime = cifs_NTtimeToUnix(info->LastWriteTime);
@@ -657,7 +657,6 @@ cifs_get_file_info(struct file *filp)
 	FILE_ALL_INFO find_data;
 	struct cifs_fattr fattr;
 	struct inode *inode = file_inode(filp);
-	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
 	struct cifsFileInfo *cfile = filp->private_data;
 	struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
 	struct TCP_Server_Info *server = tcon->ses->server;
@@ -669,7 +668,7 @@ cifs_get_file_info(struct file *filp)
 	rc = server->ops->query_file_info(xid, tcon, &cfile->fid, &find_data);
 	switch (rc) {
 	case 0:
-		cifs_all_info_to_fattr(&fattr, &find_data, cifs_sb, false,
+		cifs_all_info_to_fattr(&fattr, &find_data, inode->i_sb, false,
 				       false);
 		break;
 	case -EREMOTE:
@@ -751,7 +750,7 @@ cifs_get_inode_info(struct inode **inode, const char *full_path,
 	}
 
 	if (!rc) {
-		cifs_all_info_to_fattr(&fattr, data, cifs_sb, adjust_tz,
+		cifs_all_info_to_fattr(&fattr, data, sb, adjust_tz,
 				       symlink);
 	} else if (rc == -EREMOTE) {
 		cifs_create_dfs_fattr(&fattr, sb);
This is in preparation for the series that transitions filesystem timestamps to use 64 bit time and hence make them y2038 safe.
CURRENT_TIME macro will be deleted before merging the aforementioned series.
Filesystem times will use current_fs_time() instead of CURRENT_TIME. Use ktime_get_real_ts() here as this is not filesystem time. ktime_get_real_ts() returns the timestamp in ns which can be used to calculate network time for NTLMv2 authentication timestamp.
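As background for why nanosecond resolution is useful here: NT timestamps count 100 ns intervals since 1601-01-01 UTC. The following is a minimal userspace sketch of that conversion. unix_to_nt_time() and NT_EPOCH_OFFSET_SEC are names made up for this example; the in-kernel cifs_UnixTimeToNT() is not shown in this patch and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds between 1601-01-01 and 1970-01-01 (134774 days). */
#define NT_EPOCH_OFFSET_SEC 11644473600ULL

/*
 * Hypothetical helper: convert a Unix time (seconds plus nanoseconds)
 * into NT time, i.e. 100 ns ticks since 1601-01-01 UTC.
 */
static uint64_t unix_to_nt_time(int64_t sec, long nsec)
{
	assert(nsec >= 0 && nsec < 1000000000L);

	return ((uint64_t)sec + NT_EPOCH_OFFSET_SEC) * 10000000ULL +
	       (uint64_t)nsec / 100;
}
```

Under these assumptions the Unix epoch maps to 116444736000000000 ticks, and sub-second precision down to 100 ns survives the conversion.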
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: Steve French sfrench@samba.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
---
 fs/cifs/cifsencrypt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
index d411654..f86e07d 100644
--- a/fs/cifs/cifsencrypt.c
+++ b/fs/cifs/cifsencrypt.c
@@ -460,6 +460,7 @@ find_timestamp(struct cifs_ses *ses)
 	unsigned char *blobptr;
 	unsigned char *blobend;
 	struct ntlmssp2_name *attrptr;
+	struct timespec ts;
 
 	if (!ses->auth_key.len || !ses->auth_key.response)
 		return 0;
@@ -484,7 +485,8 @@ find_timestamp(struct cifs_ses *ses)
 		blobptr += attrsize; /* advance attr value */
 	}
 
-	return cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));
+	ktime_get_real_ts(&ts);
+	return cpu_to_le64(cifs_UnixTimeToNT(ts));
 }
 
 static int calc_ntlmv2_hash(struct cifs_ses *ses, char *ntlmv2_hash,
This is in preparation for the series that transitions filesystem timestamps to use 64 bit time and hence make them y2038 safe.
CURRENT_TIME macro will be deleted before merging the aforementioned series.
Filesystems will use current_fs_time() instead of CURRENT_TIME. Use get_seconds() here as this is not filesystem time. Only the seconds portion of the timestamp is necessary for timezone calculation using server time.
Assume that the difference between server and client times lies in the range INT_MIN..INT_MAX. This is valid because it is the difference between the current times on the server and the client, and the largest timezone difference is on the order of one day.
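The seconds-only calculation can be sketched in userspace C as follows. The division and remainder steps mirror the code touched by this patch; the value of MIN_TZ_ADJ (15 minutes), the round-to-nearest step, and the sign handling are assumptions made for this illustration, and round_tz_adjust() is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

#define MIN_TZ_ADJ (15 * 60) /* assumed: timezones differ in 15-minute steps */

/*
 * Hypothetical helper: estimate the timezone offset, in seconds, from the
 * difference between client and server clocks, rounded to the nearest
 * multiple of MIN_TZ_ADJ. The difference is assumed to fit in an int.
 */
static int round_tz_adjust(long long client_sec, long long server_sec)
{
	int val = (int)(client_sec - server_sec);
	int seconds = abs(val);
	int result = (seconds / MIN_TZ_ADJ) * MIN_TZ_ADJ;
	int remain = seconds % MIN_TZ_ADJ;

	if (remain >= MIN_TZ_ADJ / 2)   /* assumed rounding rule */
		result += MIN_TZ_ADJ;
	return val < 0 ? -result : result;
}
```

With these assumptions, a client clock 3620 seconds ahead of the server yields an offset of 3600, and a client 3500 seconds behind yields -3600. Sub-second precision plays no role in the result, which is why only the seconds value is needed.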
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: Steve French sfrench@samba.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
---
 fs/cifs/cifssmb.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 90b4f9f..1a9e43d 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -478,14 +478,14 @@ decode_lanman_negprot_rsp(struct TCP_Server_Info *server, NEGOTIATE_RSP *pSMBr)
 		 * this requirement.
 		 */
 		int val, seconds, remain, result;
-		struct timespec ts, utc;
-		utc = CURRENT_TIME;
+		struct timespec ts;
+		unsigned long utc = get_seconds();
 		ts = cnvrtDosUnixTm(rsp->SrvTime.Date,
 				    rsp->SrvTime.Time, 0);
 		cifs_dbg(FYI, "SrvTime %d sec since 1970 (utc: %d) diff: %d\n",
-			 (int)ts.tv_sec, (int)utc.tv_sec,
-			 (int)(utc.tv_sec - ts.tv_sec));
-		val = (int)(utc.tv_sec - ts.tv_sec);
+			 (int)ts.tv_sec, (int)utc,
+			 (int)(utc - ts.tv_sec));
+		val = (int)(utc - ts.tv_sec);
 		seconds = abs(val);
 		result = (seconds / MIN_TZ_ADJ) * MIN_TZ_ADJ;
 		remain = seconds % MIN_TZ_ADJ;
The macro CURRENT_TIME_SEC does not represent filesystem times correctly as it cannot perform range checks. current_fs_time_sec() will be extended to include this.
CURRENT_TIME_SEC is also not y2038 safe. current_fs_time_sec() will be transitioned to use 64 bit time along with vfs in a separate series.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: "Theodore Ts'o" tytso@mit.edu
Cc: Andreas Dilger adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org
---
 fs/ext4/ext4.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0662b28..8dd04f8 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1489,7 +1489,7 @@ static inline struct ext4_inode_info *EXT4_I(struct inode *inode)
 static inline struct timespec ext4_current_time(struct inode *inode)
 {
 	return (inode->i_sb->s_time_gran < NSEC_PER_SEC) ?
-		current_fs_time(inode->i_sb) : CURRENT_TIME_SEC;
+		current_fs_time(inode->i_sb) : current_fs_time_sec(inode->i_sb);
 }
 
 static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use ext4_current_time() instead which is appropriate for ext4 timestamps.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: "Theodore Ts'o" tytso@mit.edu
Cc: Andreas Dilger adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org
---
 fs/ext4/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3ed01ec..5e6c866 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5139,7 +5139,7 @@ static int ext4_quota_off(struct super_block *sb, int type)
 	handle = ext4_journal_start(inode, EXT4_HT_QUOTA, 1);
 	if (IS_ERR(handle))
 		goto out;
-	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
 	ext4_mark_inode_dirty(handle, inode);
 	ext4_journal_stop(handle);
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: "Yan, Zheng" zyan@redhat.com
Cc: Sage Weil sage@redhat.com
Cc: Ilya Dryomov idryomov@gmail.com
Cc: ceph-devel@vger.kernel.org
---
 fs/ceph/file.c  | 4 ++--
 fs/ceph/inode.c | 2 +-
 fs/ceph/xattr.c | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 86a9c38..9b338ff 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -783,7 +783,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 	int num_pages = 0;
 	int flags;
 	int ret;
-	struct timespec mtime = CURRENT_TIME;
+	struct timespec mtime = current_fs_time(inode->i_sb);
 	size_t count = iov_iter_count(iter);
 	loff_t pos = iocb->ki_pos;
 	bool write = iov_iter_rw(iter) == WRITE;
@@ -988,7 +988,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 	int flags;
 	int check_caps = 0;
 	int ret;
-	struct timespec mtime = CURRENT_TIME;
+	struct timespec mtime = current_fs_time(inode->i_sb);
 	size_t count = iov_iter_count(from);
 
 	if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index fb4ba2e..63d0198 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1959,7 +1959,7 @@ int ceph_setattr(struct dentry *dentry, struct iattr *attr)
 	if (dirtied) {
 		inode_dirty_flags = __ceph_mark_dirty_caps(ci, dirtied,
 							   &prealloc_cf);
-		inode->i_ctime = CURRENT_TIME;
+		inode->i_ctime = current_fs_time(inode->i_sb);
 	}
 
 	release &= issued;
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 819163d..1e1c00a 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -999,7 +999,7 @@ retry:
 		dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_XATTR_EXCL,
 					       &prealloc_cf);
 		ci->i_xattrs.dirty = true;
-		inode->i_ctime = CURRENT_TIME;
+		inode->i_ctime = current_fs_time(inode->i_sb);
 	}
 
 	spin_unlock(&ci->i_ceph_lock);
@@ -1136,7 +1136,7 @@ retry:
 	dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_XATTR_EXCL,
 				       &prealloc_cf);
 	ci->i_xattrs.dirty = true;
-	inode->i_ctime = CURRENT_TIME;
+	inode->i_ctime = current_fs_time(inode->i_sb);
 	spin_unlock(&ci->i_ceph_lock);
 	if (lock_snap_rwsem)
 		up_read(&mdsc->snap_rwsem);
On Feb 3, 2016, at 14:07, Deepa Dinamani deepa.kernel@gmail.com wrote:
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com Cc: "Yan, Zheng" zyan@redhat.com Cc: Sage Weil sage@redhat.com Cc: Ilya Dryomov idryomov@gmail.com Cc: ceph-devel@vger.kernel.org
applied, thanks
Yan, Zheng
fs/ceph/file.c | 4 ++-- fs/ceph/inode.c | 2 +- fs/ceph/xattr.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 86a9c38..9b338ff 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -783,7 +783,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter, int num_pages = 0; int flags; int ret;
- struct timespec mtime = CURRENT_TIME;
+ struct timespec mtime = current_fs_time(inode->i_sb);
 size_t count = iov_iter_count(iter);
 loff_t pos = iocb->ki_pos;
 bool write = iov_iter_rw(iter) == WRITE;
@@ -988,7 +988,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, int flags; int check_caps = 0; int ret;
- struct timespec mtime = CURRENT_TIME;
+ struct timespec mtime = current_fs_time(inode->i_sb);
 size_t count = iov_iter_count(from);
if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index fb4ba2e..63d0198 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -1959,7 +1959,7 @@ int ceph_setattr(struct dentry *dentry, struct iattr *attr) if (dirtied) { inode_dirty_flags = __ceph_mark_dirty_caps(ci, dirtied, &prealloc_cf);
- inode->i_ctime = CURRENT_TIME;
+ inode->i_ctime = current_fs_time(inode->i_sb);
}
release &= issued;
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c index 819163d..1e1c00a 100644 --- a/fs/ceph/xattr.c +++ b/fs/ceph/xattr.c @@ -999,7 +999,7 @@ retry: dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_XATTR_EXCL, &prealloc_cf); ci->i_xattrs.dirty = true;
- inode->i_ctime = CURRENT_TIME;
+ inode->i_ctime = current_fs_time(inode->i_sb);
}
spin_unlock(&ci->i_ceph_lock);
@@ -1136,7 +1136,7 @@ retry: dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_XATTR_EXCL, &prealloc_cf); ci->i_xattrs.dirty = true;
- inode->i_ctime = CURRENT_TIME;
+ inode->i_ctime = current_fs_time(inode->i_sb);
 spin_unlock(&ci->i_ceph_lock);
 if (lock_snap_rwsem)
 up_read(&mdsc->snap_rwsem);
-- 1.9.1
This is in preparation for the series that transitions filesystem timestamps to use 64 bit time and hence make them y2038 safe.
CURRENT_TIME macro will be deleted before merging the aforementioned series.
Filesystems will use current_fs_time() instead of CURRENT_TIME. Use ktime_get_real_ts() here as this is not filesystem time. ktime_get_real_ts() returns the timestamp in ns which can be used to calculate MDS request timestamp.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: "Yan, Zheng" zyan@redhat.com
Cc: Sage Weil sage@redhat.com
Cc: Ilya Dryomov idryomov@gmail.com
Cc: ceph-devel@vger.kernel.org
---
 fs/ceph/mds_client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index e7b130a..348b22e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1721,7 +1721,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
 	init_completion(&req->r_safe_completion);
 	INIT_LIST_HEAD(&req->r_unsafe_item);
 
-	req->r_stamp = CURRENT_TIME;
+	ktime_get_real_ts(&req->r_stamp);
 
 	req->r_op = op;
 	req->r_direct_mode = mode;
On Wed, Feb 3, 2016 at 2:07 PM, Deepa Dinamani deepa.kernel@gmail.com wrote:
This is in preparation for the series that transitions filesystem timestamps to use 64 bit time and hence make them y2038 safe.
CURRENT_TIME macro will be deleted before merging the aforementioned series.
Filesystems will use current_fs_time() instead of CURRENT_TIME. Use ktime_get_real_ts() here as this is not filesystem time. ktime_get_real_ts() returns the timestamp in ns which can be used to calculate MDS request timestamp.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com Cc: "Yan, Zheng" zyan@redhat.com Cc: Sage Weil sage@redhat.com Cc: Ilya Dryomov idryomov@gmail.com Cc: ceph-devel@vger.kernel.org
fs/ceph/mds_client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index e7b130a..348b22e 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1721,7 +1721,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode) init_completion(&req->r_safe_completion); INIT_LIST_HEAD(&req->r_unsafe_item);
- req->r_stamp = CURRENT_TIME;
+ ktime_get_real_ts(&req->r_stamp);
I think we should use current_fs_time() here. I have squashed the change into another patch.
req->r_op = op; req->r_direct_mode = mode;
-- 1.9.1
On Wed, Feb 03, 2016 at 10:34:00PM +0800, Yan, Zheng wrote:
On Wed, Feb 3, 2016 at 2:07 PM, Deepa Dinamani deepa.kernel@gmail.com wrote:
This is in preparation for the series that transitions filesystem timestamps to use 64 bit time and hence make them y2038 safe.
CURRENT_TIME macro will be deleted before merging the aforementioned series.
Filesystems will use current_fs_time() instead of CURRENT_TIME. Use ktime_get_real_ts() here as this is not filesystem time. ktime_get_real_ts() returns the timestamp in ns which can be used to calculate MDS request timestamp.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com Cc: "Yan, Zheng" zyan@redhat.com Cc: Sage Weil sage@redhat.com Cc: Ilya Dryomov idryomov@gmail.com Cc: ceph-devel@vger.kernel.org
fs/ceph/mds_client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index e7b130a..348b22e 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1721,7 +1721,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode) init_completion(&req->r_safe_completion); INIT_LIST_HEAD(&req->r_unsafe_item);
- req->r_stamp = CURRENT_TIME;
+ ktime_get_real_ts(&req->r_stamp);
I think we should use current_fs_time() here. I have squashed the change into another patch.
Ok. I missed this commit b8e69066d8afa8d2670dc697252ff0e5907aafad earlier which says that the r_stamp is used as ctime now. I had assumed that this is a message timestamp.
I was not able to find any documentation on what the server does with the message sent by the client. Where can I find that?
So, this should actually look like
req->r_stamp = current_fs_time(mdsc->fsc->sb);
Let me know if you want me to resend.
-Deepa
On Wednesday 03 February 2016 08:17:23 Deepa Dinamani wrote:
On Wed, Feb 03, 2016 at 10:34:00PM +0800, Yan, Zheng wrote:
On Wed, Feb 3, 2016 at 2:07 PM, Deepa Dinamani deepa.kernel@gmail.com wrote:
--- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1721,7 +1721,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode) init_completion(&req->r_safe_completion); INIT_LIST_HEAD(&req->r_unsafe_item);
- req->r_stamp = CURRENT_TIME;
+ ktime_get_real_ts(&req->r_stamp);
I think we should use current_fs_time() here. I have squashed the change into another patch.
Ok. I missed this commit b8e69066d8afa8d2670dc697252ff0e5907aafad earlier which says that the r_stamp is used as ctime now. I had assumed that this is a message timestamp.
I was not able to find any documentation on what the server does with the message sent by the client. Where can I find that?
So, this should actually look like
req->r_stamp = current_fs_time(mdsc->fsc->sb);
Let me know if you want me to resend.
I see that the timestamp is sent using
ceph_encode_copy(&p, &req->r_stamp, sizeof(req->r_stamp));
What happens with the timestamp across reboots if we change the type? I assume the data will not be used across reboots; if it is, we already have a problem on machines that can boot both big-endian and little-endian kernels, or that can boot both 32-bit and 64-bit kernels.
Arnd
On Feb 4, 2016, at 05:27, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 03 February 2016 08:17:23 Deepa Dinamani wrote:
On Wed, Feb 03, 2016 at 10:34:00PM +0800, Yan, Zheng wrote:
On Wed, Feb 3, 2016 at 2:07 PM, Deepa Dinamani deepa.kernel@gmail.com wrote:
--- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1721,7 +1721,7 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode) init_completion(&req->r_safe_completion); INIT_LIST_HEAD(&req->r_unsafe_item);
- req->r_stamp = CURRENT_TIME;
+ ktime_get_real_ts(&req->r_stamp);
I think we should use current_fs_time() here. I have squashed the change into another patch.
Ok. I missed this commit b8e69066d8afa8d2670dc697252ff0e5907aafad earlier which says that the r_stamp is used as ctime now. I had assumed that this is a message timestamp.
I was not able to find any documentation on what the server does with the message sent by the client. Where can I find that?
So, this should actually look like
req->r_stamp = current_fs_time(mdsc->fsc->sb);
Let me know if you want me to resend.
I have already squashed the change into patch 8
I see that the timestamp is sent using
ceph_encode_copy(&p, &req->r_stamp, sizeof(req->r_stamp));
this code is outdated, current code is:
{ struct ceph_timespec ts; ceph_encode_timespec(&ts, &req->r_stamp); ceph_encode_copy(&p, &ts, sizeof(ts)); }
What happens with the timestamp across reboots if we change the type? I assume the data will not be used across reboots; if it is, we already have a problem on machines that can boot both big-endian and little-endian kernels, or that can boot both 32-bit and 64-bit kernels.
Arnd
On Thursday 04 February 2016 10:00:19 Yan, Zheng wrote:
On Feb 4, 2016, at 05:27, Arnd Bergmann arnd@arndb.de wrote:
{ struct ceph_timespec ts; ceph_encode_timespec(&ts, &req->r_stamp); ceph_encode_copy(&p, &ts, sizeof(ts)); }
Ok, that does make the behavior consistent on all architectures, but leads to a different question:
struct ceph_timespec { __le32 tv_sec; __le32 tv_nsec; } __attribute__ ((packed));
How do you define ceph_timespec, is tv_sec supposed to be signed or unsigned?
It seems that you treat it as signed, meaning you interpret times from the server as being in the [1902..2038] range, rather than the [1970..2106] range:
static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) { ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec); ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec); }
Is that intentional and documented? If yes, what is your plan to deal with y2038 support?
Arnd
On Thu, Feb 4, 2016 at 9:30 AM, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 04 February 2016 10:00:19 Yan, Zheng wrote:
On Feb 4, 2016, at 05:27, Arnd Bergmann arnd@arndb.de wrote:
{ struct ceph_timespec ts; ceph_encode_timespec(&ts, &req->r_stamp); ceph_encode_copy(&p, &ts, sizeof(ts)); }
Ok, that does make the behavior consistent on all architectures, but leads to a different question:
struct ceph_timespec { __le32 tv_sec; __le32 tv_nsec; } __attribute__ ((packed));
How do you define ceph_timespec, is tv_sec supposed to be signed or unsigned?
It seems that you treat it as signed, meaning you interpret times from the server as being in the [1902..2038] range, rather than the [1970..2106] range:
static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) { ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec); ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec); }
Is that intentional and documented? If yes, what is your plan to deal with y2038 support?
tv_sec is used as a time_t, so signed. The problem is that ceph_timespec is not only passed over the wire, but is also stored on disk, part of quite a few other data structures. The plan is to eventually switch to a 64-bit tv_sec and tv_nsec, bump the version on all the structures that contain it and add a cluster-wide feature bit to deal with older clients. We've recently had a discussion about this, so it may even happen in a not so distant future, but no promises ;)
Thanks,
Ilya
On Thursday 04 February 2016 10:01:31 Ilya Dryomov wrote:
On Thu, Feb 4, 2016 at 9:30 AM, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 04 February 2016 10:00:19 Yan, Zheng wrote:
On Feb 4, 2016, at 05:27, Arnd Bergmann arnd@arndb.de wrote:
static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) { ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec); ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec); }
Is that intentional and documented? If yes, what is your plan to deal with y2038 support?
tv_sec is used as a time_t, so signed. The problem is that ceph_timespec is not only passed over the wire, but is also stored on disk, part of quite a few other data structures.
That is only part of the issue though:
Most file systems that store a timespec on disk define the function differently:
static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) { ts->tv_sec = (time_t)(u32)le32_to_cpu(tv->tv_sec); ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec); }
On systems that have a 64-bit time_t, the 1902..1970 interval (0xffffffff80000000..0xffffffffffffffff) and the 2038..2106 interval (0x0000000080000000..0x00000000ffffffff) are written as the same 32-bit numbers, so when reading back you have to decide which interpretation you want, and your cast to __kernel_time_t means that you get the first representation on both 32-bit and 64-bit systems.
On systems with a 32-bit time_t, this is the only option you have anyway, and some other file systems (ext2/3/4, xfs, ...) made the same decision in order to behave in a consistent way independent of what kernel (32-bit or 64-bit) you use. This is generally a reasonable goal, but it means that you get the overflow in 2038 rather than 2106.
Alex Elder changed the ceph behavior in 2013 to be the same way, but from the changelog of c3f56102f28d ("libceph: validate timespec conversions"), I guess this was not intentional, as he was also adding a comparison against U32_MAX, which should have been S32_MAX.
A lot of other file systems (jfs, jffs2, hpfs, minix) apparently prefer the 1970..2106 interpretation of time values.
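To make the two interpretations concrete, here is a minimal user-space sketch (not kernel code) of how the same 32-bit on-disk seconds value decodes under each reading. All function names are made up for illustration. The encode helper shows the kind of range check that matches the signed reading, i.e. a comparison against the signed 32-bit limits rather than the unsigned maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed reading: values with the top bit set are before 1970 (roughly 1902..2038 overall). */
static int64_t decode_sec_signed(uint32_t raw)
{
	return raw >= 0x80000000u ? (int64_t)raw - 4294967296LL : (int64_t)raw;
}

/* Unsigned reading: the same bit patterns are after 2038 (1970..2106 overall). */
static int64_t decode_sec_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}

/* Range check that is consistent with the signed reading. */
static bool fits_signed_32(int64_t sec)
{
	return sec >= INT32_MIN && sec <= INT32_MAX;
}

/* Hypothetical encoder: refuse values the signed reading cannot represent. */
static uint32_t encode_sec_signed(int64_t sec)
{
	assert(fits_signed_32(sec));
	return (uint32_t)(uint64_t)sec;	/* keep the low 32 bits */
}
```

With this sketch, the raw value 0x80000000 decodes to -2147483648 under the signed reading and to 2147483648 under the unsigned one, while every value up to 0x7fffffff decodes identically in both.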
Ok. We have a (rough) plan to deal with file systems that don't support extended time stamps in the meantime, so depending on user preferences we would either allow them to be used as before with times clamped to the 2038 overflow date, or only mounted readonly for users that want to ensure their systems can survive without regressions in 2038.
Arnd
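The clamping option mentioned above could look roughly like the following user-space sketch. The struct and function names are hypothetical, and the limits in the usage note are just the signed 32-bit range discussed in this thread; nothing here is taken from an actual kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-filesystem limits, in seconds since the epoch. */
struct fs_time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Clamp a timestamp into the range the filesystem can store. */
static int64_t clamp_fs_seconds(int64_t sec, const struct fs_time_range *r)
{
	assert(r->min_sec <= r->max_sec);
	if (sec < r->min_sec)
		return r->min_sec;
	if (sec > r->max_sec)
		return r->max_sec;
	return sec;
}
```

For a filesystem limited to signed 32-bit seconds, the range would be { INT32_MIN, INT32_MAX }, so any time past the 2038 overflow date would be stored as INT32_MAX.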
I dug up the email conversation about it, although I think Adam has done more work than it indicates: http://www.spinics.net/lists/ceph-devel/msg27900.html. I can't speak to any kernel-specific issues, but this kind of transition while maintaining wire compatibility with older code is something we've done a lot; it shouldn't be a big deal even in the kernel, where we're slightly less prolific with such things. :)
-Greg
On the kernel side, the interesting part is to figure out whether the other end can support the new format, and to set the limit in the superblock accordingly. Once you have determined that both sides support the extended timestamps, sending a timestamp beyond 2038 must not fail or cause incorrect data.
On the wire protocol, you could consider extending the timestamps in the same way as ext4, as you already have nanosecond timestamps, and you can use the upper two bits of the nanoseconds to extend the seconds field to 34 bits, giving you a range of valid times between 1902 and 2446, though if you have to make an incompatible change anyway, going to 64 bit is easier.
Arnd
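The 34-bit extension described above can be sketched in user space as follows. This is only an illustration of the idea as worded in the message (two spare upper bits of a 32-bit nanoseconds field extend the signed 32-bit seconds); the struct, names and exact bit placement are assumptions and are not claimed to match ext4's on-disk layout or any ceph format.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_MASK	0x3fffffffu		/* low 30 bits hold nanoseconds */
#define EPOCH_SHIFT	30			/* top 2 bits extend the seconds */
#define TWO_POW_32	4294967296LL
#define WIRE_SEC_MIN	(-2147483648LL)		/* about 1902 */
#define WIRE_SEC_MAX	15032385535LL		/* 2^34 - 2^31 - 1, about 2446 */

/* Hypothetical wire format: two 32-bit fields, as in the existing layout. */
struct wire_time {
	uint32_t sec;
	uint32_t nsec_epoch;
};

/* Read 32 bits as a signed value without relying on implementation-defined casts. */
static int64_t as_signed_32(uint32_t v)
{
	return v >= 0x80000000u ? (int64_t)v - TWO_POW_32 : (int64_t)v;
}

static struct wire_time wire_encode(int64_t sec, uint32_t nsec)
{
	struct wire_time w;
	uint32_t low = (uint32_t)(uint64_t)sec;	/* low 32 bits of seconds */
	uint32_t epoch;

	assert(sec >= WIRE_SEC_MIN && sec <= WIRE_SEC_MAX);
	assert(nsec < 1000000000u);

	/* sec == signed(low) + epoch * 2^32, with epoch in 0..3 */
	epoch = (uint32_t)((sec - as_signed_32(low)) / TWO_POW_32);
	w.sec = low;
	w.nsec_epoch = nsec | (epoch << EPOCH_SHIFT);
	return w;
}

static int64_t wire_decode_sec(struct wire_time w)
{
	int64_t epoch = w.nsec_epoch >> EPOCH_SHIFT;

	return as_signed_32(w.sec) + epoch * TWO_POW_32;
}

static uint32_t wire_decode_nsec(struct wire_time w)
{
	return w.nsec_epoch & NSEC_MASK;
}
```

One property of this layout is that any time inside the old signed 32-bit range encodes with the two extra bits set to zero, so both fields carry the same values as in the existing format.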
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: Chris Mason clm@fb.com
Cc: Josef Bacik jbacik@fb.com
Cc: David Sterba dsterba@suse.com
Cc: linux-btrfs@vger.kernel.org
---
 fs/btrfs/file.c        |  4 ++--
 fs/btrfs/inode.c       | 25 +++++++++++++------------
 fs/btrfs/ioctl.c       |  8 ++++----
 fs/btrfs/root-tree.c   |  2 +-
 fs/btrfs/transaction.c |  7 +++++--
 fs/btrfs/xattr.c       |  2 +-
 6 files changed, 26 insertions(+), 22 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 098bb8f..610f569 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2544,7 +2544,7 @@ out_trans:
 		goto out_free;

 	inode_inc_iversion(inode);
-	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	inode->i_mtime = inode->i_ctime = current_fs_time(inode->i_sb);

 	trans->block_rsv = &root->fs_info->trans_block_rsv;
 	ret = btrfs_update_inode(trans, root, inode);
@@ -2794,7 +2794,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 		if (IS_ERR(trans)) {
 			ret = PTR_ERR(trans);
 		} else {
-			inode->i_ctime = CURRENT_TIME;
+			inode->i_ctime = current_fs_time(inode->i_sb);
 			i_size_write(inode, actual_end);
 			btrfs_ordered_update_i_size(inode, actual_end, NULL);
 			ret = btrfs_update_inode(trans, root, inode);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e28f3d4..59c0e22 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4013,7 +4013,8 @@ err:
 	btrfs_i_size_write(dir, dir->i_size - name_len * 2);
 	inode_inc_iversion(inode);
 	inode_inc_iversion(dir);
-	inode->i_ctime = dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+	inode->i_ctime = dir->i_mtime =
+		dir->i_ctime = current_fs_time(inode->i_sb);
 	ret = btrfs_update_inode(trans, root, dir);
 out:
 	return ret;
@@ -4156,7 +4157,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,

 	btrfs_i_size_write(dir, dir->i_size - name_len * 2);
 	inode_inc_iversion(dir);
-	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+	dir->i_mtime = dir->i_ctime = current_fs_time(dir->i_sb);
 	ret = btrfs_update_inode_fallback(trans, root, dir);
 	if (ret)
 		btrfs_abort_transaction(trans, root, ret);
@@ -5588,7 +5589,7 @@ static struct inode *new_simple_dir(struct super_block *s,
 	inode->i_op = &btrfs_dir_ro_inode_operations;
 	inode->i_fop = &simple_dir_operations;
 	inode->i_mode = S_IFDIR | S_IRUGO | S_IWUSR | S_IXUGO;
-	inode->i_mtime = CURRENT_TIME;
+	inode->i_mtime = current_fs_time(inode->i_sb);
 	inode->i_atime = inode->i_mtime;
 	inode->i_ctime = inode->i_mtime;
 	BTRFS_I(inode)->i_otime = inode->i_mtime;
@@ -6160,7 +6161,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 	inode_init_owner(inode, dir, mode);
 	inode_set_bytes(inode, 0);

-	inode->i_mtime = CURRENT_TIME;
+	inode->i_mtime = current_fs_time(inode->i_sb);
 	inode->i_atime = inode->i_mtime;
 	inode->i_ctime = inode->i_mtime;
 	BTRFS_I(inode)->i_otime = inode->i_mtime;
@@ -6273,7 +6274,8 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
 	btrfs_i_size_write(parent_inode, parent_inode->i_size +
 			   name_len * 2);
 	inode_inc_iversion(parent_inode);
-	parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
+	parent_inode->i_mtime = parent_inode->i_ctime =
+		current_fs_time(parent_inode->i_sb);
 	ret = btrfs_update_inode(trans, root, parent_inode);
 	if (ret)
 		btrfs_abort_transaction(trans, root, ret);
@@ -6491,7 +6493,7 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	BTRFS_I(inode)->dir_index = 0ULL;
 	inc_nlink(inode);
 	inode_inc_iversion(inode);
-	inode->i_ctime = CURRENT_TIME;
+	inode->i_ctime = current_fs_time(inode->i_sb);
 	ihold(inode);
 	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);

@@ -9234,7 +9236,6 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	struct btrfs_root *dest = BTRFS_I(new_dir)->root;
 	struct inode *new_inode = d_inode(new_dentry);
 	struct inode *old_inode = d_inode(old_dentry);
-	struct timespec ctime = CURRENT_TIME;
 	u64 index = 0;
 	u64 root_objectid;
 	int ret;
@@ -9331,9 +9332,9 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	inode_inc_iversion(old_dir);
 	inode_inc_iversion(new_dir);
 	inode_inc_iversion(old_inode);
-	old_dir->i_ctime = old_dir->i_mtime = ctime;
-	new_dir->i_ctime = new_dir->i_mtime = ctime;
-	old_inode->i_ctime = ctime;
+	old_dir->i_ctime = old_dir->i_mtime =
+	new_dir->i_ctime = new_dir->i_mtime =
+	old_inode->i_ctime = current_fs_time(old_dir->i_sb);

 	if (old_dentry->d_parent != new_dentry->d_parent)
 		btrfs_record_unlink_dir(trans, old_dir, old_inode, 1);
@@ -9358,7 +9359,7 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,

 	if (new_inode) {
 		inode_inc_iversion(new_inode);
-		new_inode->i_ctime = CURRENT_TIME;
+		new_inode->i_ctime = current_fs_time(new_inode->i_sb);
 		if (unlikely(btrfs_ino(new_inode) ==
 			     BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)) {
 			root_objectid = BTRFS_I(new_inode)->location.objectid;
@@ -9836,7 +9837,7 @@ next:
 		*alloc_hint = ins.objectid + ins.offset;

 		inode_inc_iversion(inode);
-		inode->i_ctime = CURRENT_TIME;
+		inode->i_ctime = current_fs_time(inode->i_sb);
 		BTRFS_I(inode)->flags |= BTRFS_INODE_PREALLOC;
 		if (!(mode & FALLOC_FL_KEEP_SIZE) &&
 		    (actual_len > inode->i_size) &&
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 952172c..6f35d9c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -347,7 +347,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)

 	btrfs_update_iflags(inode);
 	inode_inc_iversion(inode);
-	inode->i_ctime = CURRENT_TIME;
+	inode->i_ctime = current_fs_time(inode->i_sb);
 	ret = btrfs_update_inode(trans, root, inode);

 	btrfs_end_transaction(trans, root);
@@ -443,7 +443,7 @@ static noinline int create_subvol(struct inode *dir,
 	struct btrfs_root *root = BTRFS_I(dir)->root;
 	struct btrfs_root *new_root;
 	struct btrfs_block_rsv block_rsv;
-	struct timespec cur_time = CURRENT_TIME;
+	struct timespec cur_time = current_fs_time(dir->i_sb);
 	struct inode *inode;
 	int ret;
 	int err;
@@ -3148,7 +3148,7 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,

 	inode_inc_iversion(inode);
 	if (!no_time_update)
-		inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+		inode->i_mtime = inode->i_ctime = current_fs_time(inode->i_sb);
 	/*
 	 * We round up to the block size at eof when determining which
 	 * extents to clone above, but shouldn't round up the file size.
@@ -4956,7 +4956,7 @@ static long _btrfs_ioctl_set_received_subvol(struct file *file,
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_root_item *root_item = &root->root_item;
 	struct btrfs_trans_handle *trans;
-	struct timespec ct = CURRENT_TIME;
+	struct timespec ct = current_fs_time(inode->i_sb);
 	int ret = 0;
 	int received_uuid_changed;

diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 7cf8509..0d3a4ee 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -488,7 +488,7 @@ void btrfs_update_root_times(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root)
 {
 	struct btrfs_root_item *item = &root->root_item;
-	struct timespec ct = CURRENT_TIME;
+	struct timespec ct = current_fs_time(root->ino_cache_inode->i_sb);

 	spin_lock(&root->root_item_lock);
 	btrfs_set_root_ctransid(item, trans->transid);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b6031ce..37562d6 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1333,7 +1333,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	struct dentry *dentry;
 	struct extent_buffer *tmp;
 	struct extent_buffer *old;
-	struct timespec cur_time = CURRENT_TIME;
+	struct timespec cur_time;
 	int ret = 0;
 	u64 to_reserve = 0;
 	u64 index = 0;
@@ -1381,6 +1381,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	parent_root = BTRFS_I(parent_inode)->root;
 	record_root_in_trans(trans, parent_root);

+	cur_time = current_fs_time(parent_inode->i_sb);
+
 	/*
 	 * insert the directory item
 	 */
@@ -1523,7 +1525,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,

 	btrfs_i_size_write(parent_inode, parent_inode->i_size +
 					 dentry->d_name.len * 2);
-	parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
+	parent_inode->i_mtime = parent_inode->i_ctime =
+		current_fs_time(parent_inode->i_sb);
 	ret = btrfs_update_inode_fallback(trans, parent_root, parent_inode);
 	if (ret) {
 		btrfs_abort_transaction(trans, root, ret);
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 6c68d63..f2a20d5 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -249,7 +249,7 @@ int __btrfs_setxattr(struct btrfs_trans_handle *trans,
 		goto out;

 	inode_inc_iversion(inode);
-	inode->i_ctime = CURRENT_TIME;
+	inode->i_ctime = current_fs_time(inode->i_sb);
 	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);
 	ret = btrfs_update_inode(trans, root, inode);
 	BUG_ON(ret);
On Tue, Feb 02, 2016 at 10:07:50PM -0800, Deepa Dinamani wrote:
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: Chris Mason clm@fb.com
Cc: Josef Bacik jbacik@fb.com
Cc: David Sterba dsterba@suse.com
Cc: linux-btrfs@vger.kernel.org
Reviewed-by: David Sterba dsterba@suse.com
There's no actual change for btrfs as it uses granularity 1 which is a no-op and equivalent to CURRENT_TIME.
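To illustrate why a granularity of 1 makes the conversion a no-op, here is a small user-space sketch of truncating a timestamp to a filesystem's granularity in nanoseconds. The struct and function names are invented for this example; it mirrors the general idea of granularity truncation rather than reproducing the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u

/* Hypothetical stand-in for a timestamp with nanosecond resolution. */
struct fs_ts {
	int64_t sec;
	uint32_t nsec;
};

/* Round the nanoseconds down to a multiple of gran (given in nanoseconds). */
static struct fs_ts trunc_to_granularity(struct fs_ts t, uint32_t gran)
{
	assert(t.nsec < NSEC_PER_SEC);

	if (gran == 1) {
		/* nanosecond granularity: nothing to truncate */
	} else if (gran == NSEC_PER_SEC) {
		t.nsec = 0;		/* whole-second granularity */
	} else if (gran > 1 && gran < NSEC_PER_SEC) {
		t.nsec -= t.nsec % gran;
	}
	return t;
}
```

With gran set to 1 the timestamp comes back unchanged, which is the btrfs case described above; a filesystem with one-second granularity would instead get the nanoseconds cleared.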
The kernel tester found a NULL pointer dereference with this patch.
I think this is the fix:
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -488,7 +488,7 @@ void btrfs_update_root_times(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root)
 {
 	struct btrfs_root_item *item = &root->root_item;
-	struct timespec ct = current_fs_time(root->ino_cache_inode->i_sb);
+	struct timespec ct = current_fs_time(root->fs_info->sb);
I will test and re-post the patch.
-Deepa
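The difference between the two pointer paths can be shown with a user-space mock. The struct names below are simplified stand-ins invented for this sketch, not the real btrfs definitions; only the field names ino_cache_inode, fs_info, sb and i_sb are taken from the patch. The v2 changelog later in the thread explains that the inode cache is disabled by default, which is why the first path can hit a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified mock types, for illustration only. */
struct super_block_mock { int id; };
struct inode_mock { struct super_block_mock *i_sb; };
struct fs_info_mock { struct super_block_mock *sb; };

struct root_mock {
	struct inode_mock *ino_cache_inode;	/* may be NULL when the inode cache is off */
	struct fs_info_mock *fs_info;		/* assumed to be set for every root */
};

/* Reach the super block without touching the optional inode cache inode. */
static struct super_block_mock *root_to_sb(const struct root_mock *root)
{
	assert(root->fs_info != NULL);
	return root->fs_info->sb;
}
```

In this mock, root_to_sb() returns the super block even when ino_cache_inode is NULL, whereas following ino_cache_inode->i_sb in that state would dereference a NULL pointer.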
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: Chris Mason clm@fb.com
Cc: Josef Bacik jbacik@fb.com
Cc: David Sterba dsterba@suse.com
Cc: linux-btrfs@vger.kernel.org
---
changes since v1:
btrfs_update_root_times uses root->fs_info instead of root->ino_cache_inode to obtain struct super_block pointer from struct btrfs_root. The issue that was reported by the kernel tester was that a null pointer dereference occurred. This is because the inode cache is disabled by default and this inode pointer was null.
Inode cache inode pointer should not be used to access super block information.

 fs/btrfs/file.c        |  4 ++--
 fs/btrfs/inode.c       | 25 +++++++++++++------------
 fs/btrfs/ioctl.c       |  8 ++++----
 fs/btrfs/root-tree.c   |  2 +-
 fs/btrfs/transaction.c |  7 +++++--
 fs/btrfs/xattr.c       |  2 +-
 6 files changed, 26 insertions(+), 22 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 098bb8f..610f569 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2544,7 +2544,7 @@ out_trans:
 		goto out_free;

 	inode_inc_iversion(inode);
-	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	inode->i_mtime = inode->i_ctime = current_fs_time(inode->i_sb);

 	trans->block_rsv = &root->fs_info->trans_block_rsv;
 	ret = btrfs_update_inode(trans, root, inode);
@@ -2794,7 +2794,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 		if (IS_ERR(trans)) {
 			ret = PTR_ERR(trans);
 		} else {
-			inode->i_ctime = CURRENT_TIME;
+			inode->i_ctime = current_fs_time(inode->i_sb);
 			i_size_write(inode, actual_end);
 			btrfs_ordered_update_i_size(inode, actual_end, NULL);
 			ret = btrfs_update_inode(trans, root, inode);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e28f3d4..59c0e22 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4013,7 +4013,8 @@ err:
 	btrfs_i_size_write(dir, dir->i_size - name_len * 2);
 	inode_inc_iversion(inode);
 	inode_inc_iversion(dir);
-	inode->i_ctime = dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+	inode->i_ctime = dir->i_mtime =
+		dir->i_ctime = current_fs_time(inode->i_sb);
 	ret = btrfs_update_inode(trans, root, dir);
 out:
 	return ret;
@@ -4156,7 +4157,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,

 	btrfs_i_size_write(dir, dir->i_size - name_len * 2);
 	inode_inc_iversion(dir);
-	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+	dir->i_mtime = dir->i_ctime = current_fs_time(dir->i_sb);
 	ret = btrfs_update_inode_fallback(trans, root, dir);
 	if (ret)
 		btrfs_abort_transaction(trans, root, ret);
@@ -5588,7 +5589,7 @@ static struct inode *new_simple_dir(struct super_block *s,
 	inode->i_op = &btrfs_dir_ro_inode_operations;
 	inode->i_fop = &simple_dir_operations;
 	inode->i_mode = S_IFDIR | S_IRUGO | S_IWUSR | S_IXUGO;
-	inode->i_mtime = CURRENT_TIME;
+	inode->i_mtime = current_fs_time(inode->i_sb);
 	inode->i_atime = inode->i_mtime;
 	inode->i_ctime = inode->i_mtime;
 	BTRFS_I(inode)->i_otime = inode->i_mtime;
@@ -6160,7 +6161,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 	inode_init_owner(inode, dir, mode);
 	inode_set_bytes(inode, 0);

-	inode->i_mtime = CURRENT_TIME;
+	inode->i_mtime = current_fs_time(inode->i_sb);
 	inode->i_atime = inode->i_mtime;
 	inode->i_ctime = inode->i_mtime;
 	BTRFS_I(inode)->i_otime = inode->i_mtime;
@@ -6273,7 +6274,8 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
 	btrfs_i_size_write(parent_inode, parent_inode->i_size +
 			   name_len * 2);
 	inode_inc_iversion(parent_inode);
-	parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
+	parent_inode->i_mtime = parent_inode->i_ctime =
+		current_fs_time(parent_inode->i_sb);
 	ret = btrfs_update_inode(trans, root, parent_inode);
 	if (ret)
 		btrfs_abort_transaction(trans, root, ret);
@@ -6491,7 +6493,7 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	BTRFS_I(inode)->dir_index = 0ULL;
 	inc_nlink(inode);
 	inode_inc_iversion(inode);
-	inode->i_ctime = CURRENT_TIME;
+	inode->i_ctime = current_fs_time(inode->i_sb);
 	ihold(inode);
 	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);

@@ -9234,7 +9236,6 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	struct btrfs_root *dest = BTRFS_I(new_dir)->root;
 	struct inode *new_inode = d_inode(new_dentry);
 	struct inode *old_inode = d_inode(old_dentry);
-	struct timespec ctime = CURRENT_TIME;
 	u64 index = 0;
 	u64 root_objectid;
 	int ret;
@@ -9331,9 +9332,9 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	inode_inc_iversion(old_dir);
 	inode_inc_iversion(new_dir);
 	inode_inc_iversion(old_inode);
-	old_dir->i_ctime = old_dir->i_mtime = ctime;
-	new_dir->i_ctime = new_dir->i_mtime = ctime;
-	old_inode->i_ctime = ctime;
+	old_dir->i_ctime = old_dir->i_mtime =
+	new_dir->i_ctime = new_dir->i_mtime =
+	old_inode->i_ctime = current_fs_time(old_dir->i_sb);

 	if (old_dentry->d_parent != new_dentry->d_parent)
 		btrfs_record_unlink_dir(trans, old_dir, old_inode, 1);
@@ -9358,7 +9359,7 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,

 	if (new_inode) {
 		inode_inc_iversion(new_inode);
-		new_inode->i_ctime = CURRENT_TIME;
+		new_inode->i_ctime = current_fs_time(new_inode->i_sb);
 		if (unlikely(btrfs_ino(new_inode) ==
 			     BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)) {
 			root_objectid = BTRFS_I(new_inode)->location.objectid;
@@ -9836,7 +9837,7 @@ next:
 		*alloc_hint = ins.objectid + ins.offset;

 		inode_inc_iversion(inode);
-		inode->i_ctime = CURRENT_TIME;
+		inode->i_ctime = current_fs_time(inode->i_sb);
 		BTRFS_I(inode)->flags |= BTRFS_INODE_PREALLOC;
 		if (!(mode & FALLOC_FL_KEEP_SIZE) &&
 		    (actual_len > inode->i_size) &&
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 952172c..6f35d9c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -347,7 +347,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)

 	btrfs_update_iflags(inode);
 	inode_inc_iversion(inode);
-	inode->i_ctime = CURRENT_TIME;
+	inode->i_ctime = current_fs_time(inode->i_sb);
 	ret = btrfs_update_inode(trans, root, inode);

 	btrfs_end_transaction(trans, root);
@@ -443,7 +443,7 @@ static noinline int create_subvol(struct inode *dir,
 	struct btrfs_root *root = BTRFS_I(dir)->root;
 	struct btrfs_root *new_root;
 	struct btrfs_block_rsv block_rsv;
-	struct timespec cur_time = CURRENT_TIME;
+	struct timespec cur_time = current_fs_time(dir->i_sb);
 	struct inode *inode;
 	int ret;
 	int err;
@@ -3148,7 +3148,7 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,

 	inode_inc_iversion(inode);
 	if (!no_time_update)
-		inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+		inode->i_mtime = inode->i_ctime = current_fs_time(inode->i_sb);
 	/*
 	 * We round up to the block size at eof when determining which
 	 * extents to clone above, but shouldn't round up the file size.
@@ -4956,7 +4956,7 @@ static long _btrfs_ioctl_set_received_subvol(struct file *file,
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_root_item *root_item = &root->root_item;
 	struct btrfs_trans_handle *trans;
-	struct timespec ct = CURRENT_TIME;
+	struct timespec ct = current_fs_time(inode->i_sb);
 	int ret = 0;
 	int received_uuid_changed;

diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 7cf8509..a25f3b2 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -488,7 +488,7 @@ void btrfs_update_root_times(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root)
 {
 	struct btrfs_root_item *item = &root->root_item;
-	struct timespec ct = CURRENT_TIME;
+	struct timespec ct = current_fs_time(root->fs_info->sb);

 	spin_lock(&root->root_item_lock);
 	btrfs_set_root_ctransid(item, trans->transid);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b6031ce..37562d6 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1333,7 +1333,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	struct dentry *dentry;
 	struct extent_buffer *tmp;
 	struct extent_buffer *old;
-	struct timespec cur_time = CURRENT_TIME;
+	struct timespec cur_time;
 	int ret = 0;
 	u64 to_reserve = 0;
 	u64 index = 0;
@@ -1381,6 +1381,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	parent_root = BTRFS_I(parent_inode)->root;
 	record_root_in_trans(trans, parent_root);

+	cur_time = current_fs_time(parent_inode->i_sb);
+
 	/*
 	 * insert the directory item
 	 */
@@ -1523,7 +1525,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,

 	btrfs_i_size_write(parent_inode, parent_inode->i_size +
 					 dentry->d_name.len * 2);
-	parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
+	parent_inode->i_mtime = parent_inode->i_ctime =
+		current_fs_time(parent_inode->i_sb);
 	ret = btrfs_update_inode_fallback(trans, parent_root, parent_inode);
 	if (ret) {
 		btrfs_abort_transaction(trans, root, ret);
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 6c68d63..f2a20d5 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -249,7 +249,7 @@ int __btrfs_setxattr(struct btrfs_trans_handle *trans,
 		goto out;

 	inode_inc_iversion(inode);
-	inode->i_ctime = CURRENT_TIME;
+	inode->i_ctime = current_fs_time(inode->i_sb);
 	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);
 	ret = btrfs_update_inode(trans, root, inode);
 	BUG_ON(ret);
On Sat, Feb 06, 2016 at 11:57:21PM -0800, Deepa Dinamani wrote:
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead.
Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Cc: Chris Mason clm@fb.com
Cc: Josef Bacik jbacik@fb.com
Cc: David Sterba dsterba@suse.com
Cc: linux-btrfs@vger.kernel.org
Reviewed-by: David Sterba dsterba@suse.com
On Tuesday 02 February 2016 22:07:40 Deepa Dinamani wrote:
This patch series is aimed at getting rid of CURRENT_TIME and CURRENT_TIME_SEC macros.
The idea for the series evolved from my discussions with Arnd Bergmann.
This was originally part of the RFC series[2]: https://lkml.org/lkml/2016/1/7/20 (under discussion).
Dave Chinner suggested moving bug fixes out of the feature series to keep the original series simple.
There are 354 occurrences of the above macros in the kernel. The series will be divided into 4 or 5 parts to keep the parts manageable and so that each part could be reviewed and merged independently. This is part 1 of the series.
Looks very nice to me.
Motivation
The macros: CURRENT_TIME and CURRENT_TIME_SEC are primarily used for filesystem timestamps. But, they are not accurate as they do not perform clamping according to filesystem timestamps ranges, nor do they truncate the nanoseconds value to the granularity as required by the filesystem.
The series is also viewed as an ancillary to another upcoming series[2] that attempts to transition file system timestamps to use 64 bit time to make these y2038 safe.
There will also be another series[3] to add range checks and clamping to filesystem time functions that are meant to substitute the above macros.
Solution
CURRENT_TIME macro has an equivalent function:
struct timespec current_fs_time(struct super_block *sb)
These will be the changes to the above function:
- Function will return the type y2038 safe timespec64 in [2].
- Function will use y2038 safe 64 bit functions in [2].
- Function will be extended to perform range checks in [3].
I guess [2] and [3] are really independent of one another and can be done in either order, correct?
[2] will help to make 32-bit kernels work correctly on file systems that already support 64-bit timestamps internally, while [3] helps sanitize the behavior of file systems that cannot support that and that otherwise behave in unexpected ways on both 32-bit and 64-bit architectures.
Arnd
That is correct. [2] and [3] are independent of each other.
We might need one more series for bug fixes though. Or, we can merge these independently. I will discuss this with you separately.
-Deepa