On Thu, Nov 14, 2019 at 1:38 AM Christian Brauner <christian.brauner@ubuntu.com> wrote:
On Wed, Nov 13, 2019 at 11:02:12AM +0100, Arnd Bergmann wrote:
On Tue, Nov 12, 2019 at 10:09 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote:
On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote:
Question: should we also rename 'struct rusage' into 'struct __kernel_rusage' here, to make them completely unambiguous?
The patch looks ok to me. I must confess I last looked into rusage long ago, so the __kernel_timespec type used in uapi made me nervous at first, but then I found that this type is defined in the time_types.h uapi header, so userspace should be safe. I also like the idea of __kernel_rusage, but definitely on top of the series.
There are clearly too many time types at the moment, but I'm in the process of throwing out the ones we no longer need now.
I do have a number of patches implementing other variants for the syscall, and I suppose that if we end up adding __kernel_rusage, that would have to go with a set of syscalls using 64-bit seconds/nanoseconds rather than the old 32/64-bit seconds/microseconds. I don't know what other changes remain that anyone would want from sys_waitid() now that it does support pidfd.
If there is still a need for a new waitid() replacement, that should take that new __kernel_rusage I think, but until then I hope we are fine with today's getrusage+waitid based on the current struct rusage.
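To make the seconds/nanoseconds versus seconds/microseconds difference concrete, here is a minimal sketch of the conversion such a structure would imply. The type sketch_timespec64 and the helper timeval_to_sketch_ts64() are hypothetical names made up for illustration; they only mirror the shape of __kernel_timespec (two 64-bit fields) and are not part of any existing or proposed uapi.

```c
#include <stdint.h>
#include <sys/time.h>

/*
 * Hypothetical illustration type: a 64-bit seconds/nanoseconds pair,
 * shaped like __kernel_timespec. Not a real kernel or libc definition.
 */
struct sketch_timespec64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/*
 * Widen a seconds/microseconds timeval (as used by today's struct rusage)
 * into the 64-bit seconds/nanoseconds form.
 */
struct sketch_timespec64 timeval_to_sketch_ts64(const struct timeval *tv)
{
	struct sketch_timespec64 ts;

	ts.tv_sec = (int64_t)tv->tv_sec;
	ts.tv_nsec = (int64_t)tv->tv_usec * 1000; /* usec -> nsec */
	return ts;
}
```

The conversion is lossless in this direction; going back from nanoseconds to microseconds would truncate.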
Note that glibc does _not_ expose the rusage argument, i.e. most of userspace is unaware that waitid() does allow you to get rusage information. So users first need to know that waitid() has an rusage argument and then need to call the waitid() syscall directly.
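For reference, calling the syscall directly looks roughly like the sketch below. The helper name reap_with_rusage() is made up for illustration. It assumes a Linux system where SYS_waitid is defined and where the userspace struct rusage has the same layout as the kernel's (true on common 64-bit glibc setups; not something to rely on with 64-bit time_t on 32-bit architectures).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits with 'code', then reap it
 * through the raw waitid syscall so the kernel fills in *ru.
 * Returns 0 on success, -1 on failure.
 */
int reap_with_rusage(int code, siginfo_t *info, struct rusage *ru)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);

	/*
	 * The glibc waitid() wrapper takes four arguments; the syscall
	 * itself takes a fifth one, a struct rusage pointer.
	 */
	while (syscall(SYS_waitid, P_PID, pid, info, WEXITED, ru) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return 0;
}
```

After a successful call, info->si_status holds the exit code and ru->ru_utime / ru->ru_stime hold the CPU time of the reaped child.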
On architectures that don't have a wait4 syscall (riscv32 for now), glibc uses waitid to implement wait4 and wait3.
BSD has wait6() to return separate rusage structures for 'self' and 'children', but I could not find any application (using the freebsd sources and debian code search) that actually uses that information, so there might not be any demand for that.
Speaking specifically for Linux now, I think that rusage does not actually expose the information most relevant users are interested in. On Linux nowadays it is _way_ more interesting to retrieve stats relative to the cgroup the task lived in etc. Doing a git grep -i rusage in the systemd source code shows that rusage is used _nowhere_. And I consider an init system to be the most likely candidate to be interested in rusage.
I looked at a couple of implementations of time(1); this is one example that sometimes uses wait3(), though other implementations just call getrusage() in the parent process before the fork/exec. None of them actually seem to report better than millisecond resolution, so there is not a strict reason to do a timespec replacement for these.
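A minimal sketch of the wait3() flavour, to show why microsecond fields are already more than these tools use. This is not taken from any particular time(1) implementation; time_command() and timeval_to_ms() are hypothetical helper names, and the millisecond rounding is just what the tools I looked at appear to do.

```c
#define _DEFAULT_SOURCE
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Truncate a seconds/microseconds timeval to whole milliseconds. */
long timeval_to_ms(const struct timeval *tv)
{
	return (long)tv->tv_sec * 1000L + (long)tv->tv_usec / 1000L;
}

/*
 * Hypothetical helper: run argv[0] with arguments argv, collect the
 * child's rusage via wait3(), and report user/system CPU time in ms.
 * Returns the child's exit status (127 if exec failed), or -1 on error
 * or if the child did not exit normally.
 */
int time_command(char *const argv[], long *user_ms, long *sys_ms)
{
	struct rusage ru;
	int status;
	pid_t pid, w;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127); /* only reached if exec failed */
	}

	/* wait3() reaps any child, so loop until we get ours. */
	while ((w = wait3(&status, 0, &ru)) != pid) {
		if (w < 0)
			return -1;
	}

	*user_ms = timeval_to_ms(&ru.ru_utime);
	*sys_ms = timeval_to_ms(&ru.ru_stime);
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The getrusage() variant would instead read RUSAGE_CHILDREN in the parent and use plain waitpid(); either way the reported values come out of the existing microsecond fields.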
Arnd