current_time() is the last remaining caller of current_kernel_time64(),
which is a wrapper around ktime_get_coarse_real_ts64(). This calls the
latter directly for consistency with the rest of the kernel, which is
moving to the ktime_get_ family of time accessors, as now documented
in Documentation/core-api/timekeeping.rst.
An open question is whether we may want to actually call the more
accurate ktime_get_real_ts64() for file systems that save high-resolution
timestamps in their on-disk format. This would add a small overhead to
each update of the inode timestamps but would give them an actually
usable resolution better than one jiffy (1 to 10 milliseconds
normally). Experiments on a variety of hardware platforms show a typical
time of around 100 CPU cycles to read the cycle counter and calculate
the accurate time from that. On old platforms without a cycle counter,
this can be significantly higher, up to several microseconds to access
a hardware clock, but those have become very rare by now.
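As a rough userspace analogy (an assumption: it uses the POSIX clock_getres() interface rather than the in-kernel accessors discussed here), Linux exposes the coarse and the accurate realtime clocks as CLOCK_REALTIME_COARSE and CLOCK_REALTIME. Comparing their reported resolutions shows the one-jiffy granularity behind ktime_get_coarse_real_ts64(). The helper names are hypothetical.

```c
#define _GNU_SOURCE /* CLOCK_REALTIME_COARSE is Linux-specific */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds (hypothetical helper). */
static int64_t timespec_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Query the resolution of the coarse and of the accurate realtime clock.
 * Returns 0 on success, -1 if either clock_getres() call fails.
 */
static int realtime_resolutions(struct timespec *coarse, struct timespec *fine)
{
	if (clock_getres(CLOCK_REALTIME_COARSE, coarse) != 0)
		return -1;
	if (clock_getres(CLOCK_REALTIME, fine) != 0)
		return -1;
	return 0;
}
```

On a kernel with high-resolution timers, the coarse clock typically reports one tick (for example 4 ms at HZ=250) while the accurate clock reports 1 ns.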
I traced the original addition of the current_kernel_time() call to set
the nanosecond fields back to linux-2.5.48, where Andi Kleen added a
patch with subject "nanosecond stat timefields". Andi explains that the
motivation was to introduce as little overhead as possible back then. At
that time, reading the clock hardware was also more expensive, as most
architectures did not have a cycle counter.
One side effect of having more accurate inode timestamps would be having
to write out the inode every time that mtime/ctime/atime get touched on
most systems, whereas many file systems today only write it when the
timestamps have changed, i.e. at most once per jiffy unless something
else changes as well. That change would certainly be noticed in some
workloads, which alone is reason enough not to do it without a strong
justification, regardless of the cost of reading the time.
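The write-out argument can be modeled in a few lines: if the inode is only dirtied when the stored timestamp actually changes, a coarse clock limits that to once per tick, while a nanosecond clock dirties it on nearly every update. The struct and function below are hypothetical and model only that comparison, not any real file system.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical in-memory inode holding just an mtime. */
struct toy_inode {
	struct timespec mtime;
	unsigned int writebacks; /* how often the inode was dirtied */
};

/*
 * Update the mtime and report whether the inode must be written out:
 * only an actual change of the timestamp dirties it.
 */
static bool toy_touch_mtime(struct toy_inode *inode, struct timespec now)
{
	if (inode->mtime.tv_sec == now.tv_sec &&
	    inode->mtime.tv_nsec == now.tv_nsec)
		return false;

	inode->mtime = now;
	inode->writebacks++;
	return true;
}
```

Two updates within the same tick return true and then false; a finer timestamp for the second update would dirty the inode again.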
One thing we could still consider, however, would be to round the timestamps
from current_time() to multiples of NSEC_PER_JIFFY (e.g. full milliseconds
at HZ=1000) rather than keeping six or seven meaningless but confusing
digits at the end of the timestamp.
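Such rounding could look roughly like this userspace sketch, modeled on the logic of the kernel's timespec64_trunc() but using struct timespec and a hypothetical function name. The granularity would be the jiffy length in nanoseconds, e.g. 4000000 for an assumed HZ=250.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Truncate ts down to a multiple of gran nanoseconds. Valid granularities
 * are 1..NSEC_PER_SEC; any other value leaves ts unchanged in this sketch.
 */
static struct timespec truncate_timespec(struct timespec ts, long gran)
{
	if (gran == NSEC_PER_SEC)
		ts.tv_nsec = 0;                  /* whole seconds only */
	else if (gran > 1 && gran < NSEC_PER_SEC)
		ts.tv_nsec -= ts.tv_nsec % gran; /* drop sub-tick digits */
	/* gran == 1: nanosecond resolution, nothing to do */
	return ts;
}
```

For example, 5.123456789 s truncated to a 4 ms granularity becomes 5.120000000 s.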
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
--
changes in v2:
* wait for Documentation to get merged first, as Dave Chinner requested
* rewrite changelog based on discussion
---
fs/inode.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/inode.c b/fs/inode.c
index 462eb50b096f..c2dbab9a7cf5 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2105,7 +2105,9 @@ EXPORT_SYMBOL(timespec64_trunc);
*/
struct timespec64 current_time(struct inode *inode)
{
- struct timespec64 now = current_kernel_time64();
+ struct timespec64 now;
+
+ ktime_get_coarse_real_ts64(&now);
if (unlikely(!inode->i_sb)) {
WARN(1, "current_time() called with uninitialized super_block in the inode");
--
2.18.0
According to the official documentation for HFS+ [1], inode timestamps
are supposed to cover the time range from 1904 to 2040 as originally
used in classic MacOS.
The traditional Linux usage is to convert the timestamps into an unsigned
32-bit number based on the Unix epoch and from there to a time_t. On
32-bit systems, that wraps the time from 2038 to 1902, so the last
two years of the valid time range become garbled. On 64-bit systems,
all times before 1970 get turned into timestamps between 2038 and 2106,
which is more convenient but also different from the documented behavior.
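The two traditional Linux behaviors can be reproduced in a small sketch (hypothetical function names): both subtract the 1904-to-1970 offset as an unsigned 32-bit number, and then differ only in whether the result is interpreted as a signed 32-bit or as a 64-bit time_t. The signed case is written without relying on implementation-defined conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds between 1904-01-01 and 1970-01-01. */
#define HFS_EPOCH_OFFSET 2082844800U

/* 32-bit time_t: post-2038 times wrap to negative values. */
static int32_t hfs_to_unix_32bit(uint32_t mac)
{
	uint32_t d = mac - HFS_EPOCH_OFFSET; /* unsigned, wraps mod 2^32 */

	if (d <= INT32_MAX)
		return (int32_t)d;
	/* Equivalent to d - 2^32, computed without signed overflow. */
	return (int32_t)(d - 0x80000000u) + INT32_MIN;
}

/* 64-bit time_t: pre-1970 times land between 2040 and 2106. */
static int64_t hfs_to_unix_64bit(uint32_t mac)
{
	return (int64_t)(uint32_t)(mac - HFS_EPOCH_OFFSET);
}
```

An on-disk value of 0 (1904-01-01) maps to 1904 on 32-bit but to 2040-02-06 on 64-bit.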
Looking at the Darwin sources [2], it seems that MacOS is inconsistent in
yet another way: all timestamps are wrapped around to a 32-bit unsigned
number when written to the disk, but when read back, all numeric values
lower than 2082844800U are assumed to be invalid, so we cannot represent
the times before 1970 or the times after 2040.
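The asymmetric MacOS X behavior described above can be sketched as a pair of hypothetical helpers: the write side simply wraps to 32 bits, while the read side rejects any on-disk value below the 1970 offset. A time past 2040 therefore wraps on write and is read back as invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HFS_EPOCH_OFFSET 2082844800U

/* Read side: on-disk values below the 1970 offset are treated as invalid. */
static bool hfs_read_macosx(uint32_t mac, int64_t *unix_time)
{
	if (mac < HFS_EPOCH_OFFSET)
		return false;
	*unix_time = (int64_t)mac - HFS_EPOCH_OFFSET;
	return true;
}

/* Write side: wrap to an unsigned 32-bit number, as described above. */
static uint32_t hfs_write_macosx(int64_t unix_time)
{
	return (uint32_t)(unix_time + HFS_EPOCH_OFFSET);
}
```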
While all implementations seem to agree on the interpretation of values
between 1970 and 2038, they often differ on the exact range they support
when reading back values outside of the common range:
MacOS (traditional):          1904-2040
Apple Documentation:          1904-2040
MacOS X source comments:      1970-2040
MacOS X source code:          1970-2038
32-bit Linux:                 1902-2038
64-bit Linux:                 1970-2106
hfsfuse:                      1970-2040
hfsutils (32 bit, old libc):  1902-2038
hfsutils (32 bit, new libc):  1970-2106
hfsutils (64 bit):            1904-2040
hfsplus-utils:                1904-2040
hfsexplorer:                  1904-2040
7-zip:                        1904-2040
This changes Linux over to mostly the same behavior as described in the
code comment in MacOS X, disallowing all times before 1970 and after
2040, while still allowing times between 2038 and 2040 like most other
implementations do. Most importantly, it means we can have the same
behavior on 32-bit and 64-bit.
Cc: stable(a)vger.kernel.org
Link: [1] https://developer.apple.com/library/archive/technotes/tn/tn1150.html
Link: [2] https://opensource.apple.com/source/hfs/hfs-407.30.1/core/MacOSStubs.c.auto…
Suggested-by: Viacheslav Dubeyko <slava(a)dubeyko.com>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
v2: treat pre-1970 dates as invalid following MacOS X behavior,
reword and expand changelog text
---
fs/hfs/hfs_fs.h | 29 +++++++++++++++++++++++++----
fs/hfsplus/hfsplus_fs.h | 26 +++++++++++++++++++++++---
2 files changed, 48 insertions(+), 7 deletions(-)
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index 6d0783e2e276..1af998fb522e 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -246,14 +246,35 @@ extern void hfs_mark_mdb_dirty(struct super_block *sb);
* mac: unsigned big-endian since 00:00 GMT, Jan. 1, 1904
*
*/
-#define __hfs_u_to_mtime(sec) cpu_to_be32(sec + 2082844800U - sys_tz.tz_minuteswest * 60)
-#define __hfs_m_to_utime(sec) (be32_to_cpu(sec) - 2082844800U + sys_tz.tz_minuteswest * 60)
+static inline time64_t __hfs_m_to_utime(__be32 mt)
+{
+ time64_t ut = (u32)(be32_to_cpu(mt) - 2082844800U);
+
+ /*
+ * On-disk values below 2082844800U (before 1970, or times past
+ * 2040-02-06 06:28 that wrapped) are invalid, matching MacOS X.
+ */
+ if (be32_to_cpu(mt) < 2082844800U)
+ ut = 0;
+
+ return ut + sys_tz.tz_minuteswest * 60;
+}
+static inline __be32 __hfs_u_to_mtime(time64_t ut)
+{
+ ut -= sys_tz.tz_minuteswest * 60;
+
+ /*
+ * MacOS wraps "invalid" times after 2040 when writing back, so
+ * let's do the same here.
+ */
+ return cpu_to_be32(lower_32_bits(ut + 2082844800U));
+}
#define HFS_I(inode) (container_of(inode, struct hfs_inode_info, vfs_inode))
#define HFS_SB(sb) ((struct hfs_sb_info *)(sb)->s_fs_info)
-#define hfs_m_to_utime(time) (struct timespec){ .tv_sec = __hfs_m_to_utime(time) }
-#define hfs_u_to_mtime(time) __hfs_u_to_mtime((time).tv_sec)
+#define hfs_m_to_utime(time) (struct timespec){ .tv_sec = __hfs_m_to_utime(time) }
+#define hfs_u_to_mtime(time) __hfs_u_to_mtime((time).tv_sec)
#define hfs_mtime() __hfs_u_to_mtime(get_seconds())
static inline const char *hfs_mdb_name(struct super_block *sb)
diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index d9255abafb81..7f0943e540a0 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -530,9 +530,29 @@ int hfsplus_submit_bio(struct super_block *sb, sector_t sector, void *buf,
void **data, int op, int op_flags);
int hfsplus_read_wrapper(struct super_block *sb);
-/* time macros */
-#define __hfsp_mt2ut(t) (be32_to_cpu(t) - 2082844800U)
-#define __hfsp_ut2mt(t) (cpu_to_be32(t + 2082844800U))
+/* time helpers */
+static inline time64_t __hfsp_mt2ut(__be32 mt)
+{
+ time64_t ut = (u32)(be32_to_cpu(mt) - 2082844800U);
+
+ /*
+ * On-disk values below 2082844800U (before 1970, or times past
+ * 2040-02-06 06:28 that wrapped) are invalid, matching MacOS X.
+ */
+ if (be32_to_cpu(mt) < 2082844800U)
+ ut = 0;
+
+ return ut;
+}
+
+static inline __be32 __hfsp_ut2mt(time64_t ut)
+{
+ /*
+ * MacOS wraps "invalid" times after 2040 when writing back, so
+ * let's do the same here.
+ */
+ return cpu_to_be32(lower_32_bits(ut + 2082844800U));
+}
/* compatibility */
#define hfsp_mt2ut(t) (struct timespec){ .tv_sec = __hfsp_mt2ut(t) }
--
2.9.0
The ohci driver uses the get_seconds() function to implement the 32-bit
CSR_BUS_TIME register. This was added in 2010 by commit a48777e03ad5
("firewire: add CSR BUS_TIME support").
As get_seconds() returns a 32-bit value (on 32-bit architectures), it
seems like a good fit for that register, but it is also deprecated because
of the y2038/y2106 overflow problem, and should be replaced throughout
the kernel with either ktime_get_real_seconds() or ktime_get_seconds().
I'm using the latter here, which returns monotonic time. This has the
advantage of behaving better during concurrent settimeofday() updates
or leap second adjustments, and it won't overflow a 32-bit integer for
any realistic uptime; the downside of using CLOCK_MONOTONIC instead of
CLOCK_REALTIME is that the observed values are not related to external
clocks.
If we instead need UTC but can live with clock jumps or overflows,
then we should use ktime_get_real_seconds() instead, retaining the
existing behavior.
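For illustration, the bus_time initialization in update_bus_time() can be written as a pure function of the seconds source, which makes it easy to see that only the seconds value changes with this patch. The function names are hypothetical, and clock_gettime(CLOCK_MONOTONIC) is used as a userspace stand-in for ktime_get_seconds().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Same expression as the driver: clear the low 7 bits of the (truncated)
 * seconds value and take bit 6 from the cycle timer's seconds field.
 */
static uint32_t initial_bus_time(uint64_t seconds, uint32_t cycle_time_seconds)
{
	return ((uint32_t)seconds & ~0x7fu) | (cycle_time_seconds & 0x40);
}

/* Userspace stand-in for ktime_get_seconds(): monotonic seconds. */
static uint64_t monotonic_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec;
}
```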
Reviewed-by: Clemens Ladisch <clemens(a)ladisch.de>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
I notice that Stefan Richter has not been active on the mailing lists
since February 2018.
Andrew, could you pick it up in the meantime?
---
drivers/firewire/ohci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 45c048751f3b..5125841ea338 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -1765,7 +1765,7 @@ static u32 update_bus_time(struct fw_ohci *ohci)
if (unlikely(!ohci->bus_time_running)) {
reg_write(ohci, OHCI1394_IntMaskSet, OHCI1394_cycle64Seconds);
- ohci->bus_time = (lower_32_bits(get_seconds()) & ~0x7f) |
+ ohci->bus_time = (lower_32_bits(ktime_get_seconds()) & ~0x7f) |
(cycle_time_seconds & 0x40);
ohci->bus_time_running = true;
}
--
2.9.0
As Mathieu pointed out, my conversion to time64_t was incorrect and resulted
in negative times being read from the RTC. The problem is that during the
conversion from a byte array to a time64_t, the 'unsigned char' variable
holding the top byte gets turned into a negative signed 32-bit integer
before being assigned to the 64-bit variable for any times after 1972.
This changes the logic to cast to an unsigned 32-bit number first for
the Macintosh time and then convert that to the Unix time, which then gives
us a time in the documented 1904..2040 year range. I decided not to use
the longer 1970..2106 range that other drivers use, for consistency with
the literal interpretation of the register, but that could be easily
changed if we decide we want to support any Mac after 2040.
Just to be on the safe side, I'm also adding a WARN_ON that will trigger
if either the year 2040 has come and is observed by this driver, or we
run into an RTC that got set back to a pre-1970 date for some reason
(the two are indistinguishable).
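The read path can be sketched in userspace with hypothetical helpers. Unlike the patch, which casts the summed int expression to u32, this sketch widens each byte to uint32_t before shifting, a slightly stricter variant that never shifts into the sign bit of an int. The validity check mirrors the WARN_ON condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTC_OFFSET 2082844800U /* seconds from 1904-01-01 to 1970-01-01 */

/* Assemble a big-endian 32-bit Mac time from four reply bytes. */
static uint32_t mac_time_from_bytes(const unsigned char b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Convert to Unix time; returns false where the driver would WARN:
 * after 2040, or an RTC set back before 1970.
 */
static bool mac_rtc_to_unix(uint32_t mac, int64_t *unix_time)
{
	*unix_time = (int64_t)mac - RTC_OFFSET;
	return mac >= RTC_OFFSET;
}
```

A reply starting with 0xd5 (a 2017 date) has the top bit set, which is exactly the case that previously turned negative.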
For the RTC write functions, Andreas found another problem: both
pmu_request() and cuda_request() are varargs functions, so changing
the type of the arguments passed into them from 32 bit to 64 bit
breaks the API for the set_rtc_time functions. This changes it
back to 32 bits.
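To see why the argument width matters, here is a hypothetical varargs packer in the style of cuda_request()/pmu_request(); its name and signature are assumptions, and only the pattern of fetching each byte with va_arg() is modeled. va_arg() must name the promoted type the caller actually passed, so a u32 shifted value works, while a 64-bit time64_t argument would be undefined behavior and, on 32-bit ABIs, shift all following arguments.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

/*
 * Store the low byte of each of the nbytes variadic arguments. Every
 * argument must be a 32-bit (unsigned) int; this sketch reads unsigned int
 * to match u32 arguments exactly.
 */
static void pack_request(unsigned char *buf, int nbytes, ...)
{
	va_list ap;

	va_start(ap, nbytes);
	for (int i = 0; i < nbytes; i++)
		buf[i] = (unsigned char)va_arg(ap, unsigned int);
	va_end(ap);
}

/* Pass a 32-bit RTC value byte by byte, as the set_rtc_time functions do. */
static void pack_rtc_time(unsigned char buf[4], uint32_t nowtime)
{
	pack_request(buf, 4, nowtime >> 24, nowtime >> 16, nowtime >> 8, nowtime);
}
```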
The same code exists in arch/m68k/ and is patched in an identical way now
in a separate patch.
Fixes: 5bfd643583b2 ("powerpc: use time64_t in read_persistent_clock")
Reported-by: Mathieu Malaterre <malat(a)debian.org>
Reported-by: Andreas Schwab <schwab(a)linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
arch/powerpc/platforms/powermac/time.c | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/platforms/powermac/time.c b/arch/powerpc/platforms/powermac/time.c
index 7c968e46736f..12e6e4d30602 100644
--- a/arch/powerpc/platforms/powermac/time.c
+++ b/arch/powerpc/platforms/powermac/time.c
@@ -42,7 +42,11 @@
#define DBG(x...)
#endif
-/* Apparently the RTC stores seconds since 1 Jan 1904 */
+/*
+ * Offset between Unix time (1970-based) and Mac time (1904-based). Cuda and PMU
+ * times wrap in 2040. If we need to handle later times, the read_time functions
+ * need to be changed to interpret wrapped times as post-2040.
+ */
#define RTC_OFFSET 2082844800
/*
@@ -97,8 +101,11 @@ static time64_t cuda_get_time(void)
if (req.reply_len != 7)
printk(KERN_ERR "cuda_get_time: got %d byte reply\n",
req.reply_len);
- now = (req.reply[3] << 24) + (req.reply[4] << 16)
- + (req.reply[5] << 8) + req.reply[6];
+ now = (u32)((req.reply[3] << 24) + (req.reply[4] << 16) +
+ (req.reply[5] << 8) + req.reply[6]);
+ /* it's either after year 2040, or the RTC has gone backwards */
+ WARN_ON(now < RTC_OFFSET);
+
return now - RTC_OFFSET;
}
@@ -106,10 +113,10 @@ static time64_t cuda_get_time(void)
static int cuda_set_rtc_time(struct rtc_time *tm)
{
- time64_t nowtime;
+ u32 nowtime;
struct adb_request req;
- nowtime = rtc_tm_to_time64(tm) + RTC_OFFSET;
+ nowtime = lower_32_bits(rtc_tm_to_time64(tm) + RTC_OFFSET);
if (cuda_request(&req, NULL, 6, CUDA_PACKET, CUDA_SET_TIME,
nowtime >> 24, nowtime >> 16, nowtime >> 8,
nowtime) < 0)
@@ -140,8 +147,12 @@ static time64_t pmu_get_time(void)
if (req.reply_len != 4)
printk(KERN_ERR "pmu_get_time: got %d byte reply from PMU\n",
req.reply_len);
- now = (req.reply[0] << 24) + (req.reply[1] << 16)
- + (req.reply[2] << 8) + req.reply[3];
+ now = (u32)((req.reply[0] << 24) + (req.reply[1] << 16) +
+ (req.reply[2] << 8) + req.reply[3]);
+
+ /* it's either after year 2040, or the RTC has gone backwards */
+ WARN_ON(now < RTC_OFFSET);
+
return now - RTC_OFFSET;
}
@@ -149,10 +160,10 @@ static time64_t pmu_get_time(void)
static int pmu_set_rtc_time(struct rtc_time *tm)
{
- time64_t nowtime;
+ u32 nowtime;
struct adb_request req;
- nowtime = rtc_tm_to_time64(tm) + RTC_OFFSET;
+ nowtime = lower_32_bits(rtc_tm_to_time64(tm) + RTC_OFFSET);
if (pmu_request(&req, NULL, 5, PMU_SET_RTC, nowtime >> 24,
nowtime >> 16, nowtime >> 8, nowtime) < 0)
return -ENXIO;
--
2.9.0
While working on extended range for last_error/first_error timestamps,
I noticed that the endianness is wrong: we access the little-endian
fields in struct ext4_super_block as native-endian when we print them.
This adds a special case in ext4_attr_show() and ext4_attr_store()
to byteswap the superblock fields if needed.
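On a big-endian CPU, reading the on-disk bytes 78 56 34 12 through an unsigned int pointer yields 0x78563412 instead of 0x12345678. A portable userspace sketch of what le32_to_cpup()/cpu_to_le32() achieve, using hypothetical helper names, assembles the value byte by byte so the result does not depend on the host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 32-bit field regardless of host endianness. */
static uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a little-endian 32-bit field regardless of host endianness. */
static void store_le32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)v;
	p[1] = (unsigned char)(v >> 8);
	p[2] = (unsigned char)(v >> 16);
	p[3] = (unsigned char)(v >> 24);
}
```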
In older kernels, this code was part of super.c; it got moved to sysfs.c
in linux-4.4.
Cc: stable(a)vger.kernel.org
Fixes: 52c198c6820f ("ext4: add sysfs entry showing whether the fs contains errors")
Reviewed-by: Andreas Dilger <adilger(a)dilger.ca>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
fs/ext4/sysfs.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index f34da0bb8f17..b970a200f20c 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -274,8 +274,12 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
case attr_pointer_ui:
if (!ptr)
return 0;
- return snprintf(buf, PAGE_SIZE, "%u\n",
- *((unsigned int *) ptr));
+ if (a->attr_ptr == ptr_ext4_super_block_offset)
+ return snprintf(buf, PAGE_SIZE, "%u\n",
+ le32_to_cpup(ptr));
+ else
+ return snprintf(buf, PAGE_SIZE, "%u\n",
+ *((unsigned int *) ptr));
case attr_pointer_atomic:
if (!ptr)
return 0;
@@ -308,7 +312,10 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
ret = kstrtoul(skip_spaces(buf), 0, &t);
if (ret)
return ret;
- *((unsigned int *) ptr) = t;
+ if (a->attr_ptr == ptr_ext4_super_block_offset)
+ *((__le32 *) ptr) = cpu_to_le32(t);
+ else
+ *((unsigned int *) ptr) = t;
return len;
case attr_inode_readahead:
return inode_readahead_blks_store(sbi, buf, len);
--
2.9.0
This is a mostly unchanged copy of a series I sent back in April for
an initial review. All the earlier syscall patches that Deepa or I sent
got merged now, and this is the largest chunk of remaining patches.
Changes this time are:
- This is actually tested with the LTP syscalls test suite,
both before and after the CONFIG_64BIT_TIME change (which is not
included here). I have created a patch series for musl libc to use
64-bit time_t and change all the system calls over to the new entry
points for this. The only bugs I found during that testing were in
later parts of the conversion that I have not posted yet.
- I rewrote the sys_io_getevents conversion after the
introduction of sys_io_pgetevents. We obviously don't need to have
two of each, so we will only provide sys_io_pgetevents() with 64-bit
time_t but not sys_io_getevents(), which the libc can implement on
top of the former.
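A minimal sketch of what such a libc wrapper could look like, assuming a 64-bit Linux host (where struct timespec matches the kernel's __kernel_timespec) and headers that define SYS_io_pgetevents. Passing a NULL signal-set argument makes io_pgetevents() behave like io_getevents(). The wrapper name is hypothetical and not an actual libc function.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/aio_abi.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical io_getevents() replacement built on io_pgetevents(). */
static long getevents_via_pgetevents(aio_context_t ctx, long min_nr, long nr,
				     struct io_event *events,
				     struct timespec *timeout)
{
#ifdef SYS_io_pgetevents
	/* NULL sigset: no signal mask change, same as io_getevents() */
	return syscall(SYS_io_pgetevents, ctx, min_nr, nr, events, timeout,
		       NULL);
#else
	errno = ENOSYS; /* headers predate io_pgetevents (Linux 4.18) */
	return -1;
#endif
}
```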
- While we have Deepa's POSIX timer conversion merged now, we
still need to decide on how we want to do the replacement
ABI for getitimer()/setitimer(). Like getrusage()/waitid() and
clock_adjtime() and unlike the system calls I'm posting here,
there is no one obvious ABI.
- For ppoll()/pselect6(), the ABI is fairly clear, but the
implementation still needs to be done. I tested with a simple
prototype based on the existing compat code, but we can
probably improve that. This is something that Deepa still
wants to work on.
- Finally, Christoph Hellwig objected[1] to the idea of reusing the
compat_ namespace for the 32-bit native case. Changing that
would be a departure from our plans so far[2], and would make
some things end up differently. Until we have decided on how this
is to be done, I've decided to not change the code for this
post. We can clearly rename all the symbols and I've implemented
that in [3] for the current linux-next (not including the
series here). This is something we can definitely do, but I'd
need to know soon whether we can merge this series unchanged
for 4.19 or if I should rebase it on top of that patch with the
alternative naming.
Arnd
---
Previous cover letter announcement below, see [4] for the full
series:
After the first timekeeping series from Deepa (merged into -tip now)
and my follow-up for IPC system calls, this is a third set of system
call conversions following the same principle.
Most of the changes are straightforward, so I'm grouping them into a
larger series even though the system calls are mostly unrelated to one
another. After this series, the remaining calls that need to be changed
are getrusage()/waitid(), pselect6()/ppoll(), timer{,fd}_{get,set}time()
and getitimer()/setitimer(). Those will be sent separately, once they
have matured enough.
To put the changes into perspective, a list of all system calls that
require changes is available in a spreadsheet[5] and I have made
another experimental patch that changes over x86[6] and arm[7] to
actually use them.
Link [1] https://lore.kernel.org/lkml/20180712082034.GA8802@infradead.org/
Link [2] https://lwn.net/Articles/643234/
Link [3] https://lore.kernel.org/lkml/20180713133204.3123939-1-arnd@arndb.de/
Link [4] https://lore.kernel.org/lkml/20180425160311.2718314-1-arnd@arndb.de/
Link [5] https://docs.google.com/spreadsheets/d/1HCYwHXxs48TsTb6IGUduNjQnmfRvMPzCN6T…
Link [6] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/commit/…
Link [7] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/commit/…
Arnd Bergmann (17):
y2038: compat: Move common compat types to asm-generic/compat.h
y2038: Remove newstat family from default syscall set
y2038: Remove stat64 family from default syscall set
asm-generic: Remove unneeded __ARCH_WANT_SYS_LLSEEK macro
asm-generic: Remove empty asm/unistd.h
y2038: Change sys_utimensat() to use __kernel_timespec
y2038: Compile utimes()/futimesat() conditionally
y2038: utimes: Rework #ifdef guards for compat syscalls
y2038: futex: Move compat implementation into futex.c
y2038: futex: Add support for __kernel_timespec
y2038: Prepare sched_rr_get_interval for __kernel_timespec
y2038: aio: Prepare sys_io_{p,}getevents for __kernel_timespec
y2038: socket: Convert recvmmsg to __kernel_timespec
y2038: socket: Add compat_sys_recvmmsg_time64
y2038: signal: Change rt_sigtimedwait to use __kernel_timespec
y2038: Make compat_sys_rt_sigtimedwait usable on 32-bit
y2038: signal: Add compat_sys_rt_sigtimedwait_time64
arch/alpha/include/asm/unistd.h | 2 +
arch/arc/include/uapi/asm/unistd.h | 1 +
arch/arm/include/asm/unistd.h | 4 +-
arch/arm64/include/asm/compat.h | 20 +--
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/c6x/include/uapi/asm/unistd.h | 1 +
arch/h8300/include/uapi/asm/unistd.h | 1 +
arch/hexagon/include/uapi/asm/unistd.h | 1 +
arch/ia64/include/asm/unistd.h | 3 +
arch/m68k/include/asm/unistd.h | 2 +-
arch/microblaze/include/asm/unistd.h | 2 +-
arch/mips/include/asm/compat.h | 22 +---
arch/mips/include/asm/unistd.h | 3 +-
arch/nds32/include/uapi/asm/unistd.h | 1 +
arch/nios2/include/uapi/asm/unistd.h | 1 +
arch/openrisc/include/uapi/asm/unistd.h | 1 +
arch/parisc/include/asm/compat.h | 18 +--
arch/parisc/include/asm/unistd.h | 3 +-
arch/powerpc/include/asm/compat.h | 18 +--
arch/powerpc/include/asm/unistd.h | 3 +-
arch/s390/include/asm/compat.h | 18 +--
arch/s390/include/asm/unistd.h | 3 +-
arch/sh/include/asm/unistd.h | 2 +-
arch/sparc/include/asm/compat.h | 19 +--
arch/sparc/include/asm/unistd.h | 3 +-
arch/unicore32/include/uapi/asm/unistd.h | 1 +
arch/x86/include/asm/compat.h | 19 +--
arch/x86/include/asm/unistd.h | 3 +-
arch/xtensa/include/asm/unistd.h | 2 +-
fs/aio.c | 77 ++++++++++--
fs/read_write.c | 2 +-
fs/stat.c | 3 +
fs/utimes.c | 59 +++++----
include/asm-generic/compat.h | 24 +++-
include/asm-generic/unistd.h | 13 --
include/linux/compat.h | 12 +-
include/linux/compat_time.h | 5 +
include/linux/futex.h | 8 --
include/linux/socket.h | 19 ++-
include/linux/syscalls.h | 25 ++--
include/uapi/asm-generic/unistd.h | 2 +
kernel/Makefile | 3 -
kernel/futex.c | 207 +++++++++++++++++++++++++++++--
kernel/futex_compat.c | 202 ------------------------------
kernel/sched/core.c | 4 +-
kernel/signal.c | 68 ++++++++--
kernel/sys_ni.c | 1 +
net/compat.c | 16 +--
net/socket.c | 55 ++++++--
50 files changed, 524 insertions(+), 461 deletions(-)
delete mode 100644 include/asm-generic/unistd.h
delete mode 100644 kernel/futex_compat.c
--
2.9.0