On Wednesday 22 April 2015 10:45:23 Thomas Gleixner wrote:
On Tue, 21 Apr 2015, Thomas Gleixner wrote:
So we could save one translation step if we implement new syscalls which have a scalar nsec interface instead of the timespec/timeval cruft and let user space do the translation to whatever it wants.
So
sys_clock_nanosleep(const clockid_t which_clock, int flags, const struct timespec __user *expires, struct timespec __user *remainder)
would get the new syscall variant:
sys_clock_nanosleep_ns(const clockid_t which_clock, int flags, const s64 expires, s64 __user *remainder)
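For illustration, here is a minimal user-space sketch of what the scalar interface means for a caller, assuming the translation is done on top of the existing POSIX clock_nanosleep(). The name clock_nanosleep_ns() is hypothetical (no such syscall or libc function exists); it only mirrors the proposed signature. The division in the nanoseconds-to-timespec step is the expensive conversion discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Split a signed nanosecond count into a normalized timespec
 * (0 <= tv_nsec < NSEC_PER_SEC). Needs a 64-bit division, which
 * is costly on 32-bit CPUs without native 64-bit divide. */
static struct timespec ns_to_timespec(int64_t ns)
{
	struct timespec ts;
	int64_t sec = ns / NSEC_PER_SEC;
	int64_t rem = ns % NSEC_PER_SEC;

	if (rem < 0) {		/* C division truncates toward zero */
		sec--;
		rem += NSEC_PER_SEC;
	}
	ts.tv_sec = (time_t)sec;
	ts.tv_nsec = (long)rem;
	return ts;
}

/* The reverse direction is just a multiply and an add. */
static int64_t timespec_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Hypothetical scalar wrapper: same semantics as clock_nanosleep(),
 * but expiry and remainder are nanoseconds. Returns 0 or an error
 * number, like clock_nanosleep() itself. */
int clock_nanosleep_ns(clockid_t clk, int flags, int64_t expires,
		       int64_t *remainder)
{
	struct timespec req = ns_to_timespec(expires);
	struct timespec rem = { 0, 0 };
	int ret = clock_nanosleep(clk, flags, &req, remainder ? &rem : NULL);

	/* rem is only filled in for an interrupted relative sleep */
	if (ret == EINTR && remainder)
		*remainder = timespec_to_ns(&rem);
	return ret;
}
```

With a real scalar syscall, both conversions would disappear from this path entirely; here they just move into user space.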
As you might expect, there are a number of complications with this approach:
- John Stultz likes to point out that it's easier to do one change at a time, so extending the interface to 64-bit has less potential for breaking things than a more fundamental change. I think it's useful to drop a lot of the syscalls when a more modern version is around (e.g. let libc implement usleep and nanosleep through clock_nanosleep), but keep the syscalls as close to the known-working 64-bit versions as we can.
- The inode timestamp related syscalls (stat, utimes and variants thereof) require the full range of time64_t and cannot use ktime_t.
- Converting between timespec types of different sizes is cheap, and converting timespec to ktime_t is still relatively cheap, but converting ktime_t to timespec is rather expensive (at least eight 32-bit multiplies, plus a few shifts and additions if you don't have 64-bit arithmetic).
- ioctls that pass a timespec need to keep doing that, or they would require a source-level change in user space instead of just a recompile.
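The inode timestamp point is a range problem. Assuming ktime_t is a signed 64-bit nanosecond count, as in the kernel, it only reaches INT64_MAX / 10^9 = 9223372036 seconds, about 292 years either side of 1970 (roughly up to the year 2262), while a 64-bit seconds value goes far beyond that. The helper time64_fits_ktime() below is hypothetical; __builtin_mul_overflow and __builtin_add_overflow are GCC/Clang builtins.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Largest whole second count a signed 64-bit nanosecond value can hold:
 * 9223372036 s, i.e. about 292 years after the epoch. */
static const int64_t KTIME_MAX_SEC = INT64_MAX / NSEC_PER_SEC;

/* Hypothetical check: can (sec, nsec), with sec as a time64_t-like
 * value, be stored as ktime_t-like signed 64-bit nanoseconds? */
static bool time64_fits_ktime(int64_t sec, int32_t nsec, int64_t *out)
{
	int64_t ns;

	if (__builtin_mul_overflow(sec, NSEC_PER_SEC, &ns))
		return false;
	if (__builtin_add_overflow(ns, (int64_t)nsec, &ns))
		return false;
	*out = ns;
	return true;
}
```

Any file timestamp past that boundary (which stat/utimes must be able to carry) simply has no ktime_t representation.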
I personally would welcome such an interface, as it makes user space programming simpler. Just (re)arming a periodic nanosleep based on an absolute expiry time is horribly stupid today:
	struct timespec expires;
	....
	while () {
		expires.tv_nsec += period.tv_nsec;
		expires.tv_sec += period.tv_sec;
		normalize_timespec(&expires);
		sys_clock_nanosleep(CLOCK_ID, ABS, &expires, NULL);
	}
So with a scalar interface this would reduce to:
	s64 expires;
	....
	while () {
		expires += period;
		sys_clock_nanosleep_ns(CLOCK_ID, ABS, expires, NULL);
	}
There is a difference in both text and storage size, plus we avoid the two translation steps (one translation step on 64-bit).
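To make the comparison concrete, a minimal sketch that advances the same absolute deadline in both representations and checks that they agree; no sleeping is done, only the deadline arithmetic is compared. normalize_timespec() is named in the mail but not defined there, so the version below is an assumption: after adding two normalized timespecs, tv_nsec is below 2 * NSEC_PER_SEC, so a single carry is enough.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Assumed behaviour of the helper from the mail: one carry from
 * tv_nsec into tv_sec restores 0 <= tv_nsec < NSEC_PER_SEC. */
static void normalize_timespec(struct timespec *ts)
{
	if (ts->tv_nsec >= NSEC_PER_SEC) {
		ts->tv_nsec -= NSEC_PER_SEC;
		ts->tv_sec++;
	}
}

/* timespec version: two additions plus a normalization per period. */
static struct timespec advance_ts(struct timespec expires,
				  struct timespec period, int n)
{
	for (int i = 0; i < n; i++) {
		expires.tv_nsec += period.tv_nsec;
		expires.tv_sec += period.tv_sec;
		normalize_timespec(&expires);
	}
	return expires;
}

/* Scalar version: a single 64-bit addition per period. */
static int64_t advance_ns(int64_t expires, int64_t period, int n)
{
	for (int i = 0; i < n; i++)
		expires += period;
	return expires;
}
```

For example, starting at 10.9 s with a 250 ms period, three periods give {11, 650000000} in the timespec version and 11650000000 in the scalar one.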
We should probably look at it separately for each syscall. It's quite possible that we find a number of them for which it helps and others for which it hurts, so we need to see the big picture.
There are also a few other calls that will never need 64-bit time_t because the range is limited by the need to only ever pass relative timeouts (select, poll, io_getevents, recvmmsg, clock_getres, rt_sigtimedwait, sched_rr_get_interval, getrusage, waitid, semtimedop, sysinfo), so we could actually leave them using a 32-bit structure and have the libc do the conversion.
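A sketch of what "have the libc do the conversion" could look like for one of these relative-timeout calls. Both structures are hypothetical stand-ins (not real kernel or libc types). Since the timeout is relative, a 32-bit tv_sec still covers about 68 years; clamping anything longer, and rejecting negative or unnormalized input, are design assumptions here, not something the thread specifies.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical 32-bit layout a syscall might keep for relative timeouts. */
struct timespec32_rel {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* Hypothetical stand-in for a user-space timespec with 64-bit time_t. */
struct timespec64_user {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Convert a relative timeout before passing it to the kernel.
 * Returns 0 or -EINVAL. Timeouts beyond INT32_MAX seconds
 * (about 68 years) are clamped (assumed to be acceptable). */
static int rel_timeout_to_32(const struct timespec64_user *in,
			     struct timespec32_rel *out)
{
	if (in->tv_sec < 0 || in->tv_nsec < 0 || in->tv_nsec >= 1000000000)
		return -EINVAL;
	if (in->tv_sec > INT32_MAX) {
		out->tv_sec = INT32_MAX;
		out->tv_nsec = 999999999;
	} else {
		out->tv_sec = (int32_t)in->tv_sec;
		out->tv_nsec = (int32_t)in->tv_nsec;
	}
	return 0;
}
```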
I know that this is non-portable, but OTOH if I look at the non-portable mechanisms which are used by databases, Java VMs and other apps which exist to squeeze the last cycles out of the system, there is certainly some value in that.
The portable/spec-conforming apps can still use the timespec/timeval mechanisms, with user space doing the translation.
There is one caveat though: sys_clock_gettime and sys_gettimeofday will still need a syscall_timespec64 variant. We have no double translation steps there because we maintain the timespec representation in the timekeeping code for performance reasons to avoid the division in the syscall interface. But everything else can do nicely without the timespec cruft.
We really should talk to libc folks and high performance users about this before blindly adding a gazillion new timespec64-based interfaces.
I've started a list of affected syscalls at https://docs.google.com/spreadsheets/d/1HCYwHXxs48TsTb6IGUduNjQnmfRvMPzCN6T_...
Still adding more calls and descriptions; let me know if you want edit permissions.
Arnd