On Wed, Feb 22, 2017 at 9:05 AM, Albert ARIBAUD albert.aribaud@3adev.fr wrote:
Hi all,
I have produced a fifth draft of what will eventually become the Y2038 design document:
https://sourceware.org/glibc/wiki/Y2038ProofnessDesign?rev=115
Relative to the previous draft:
It makes explicit that the implementation should allow the same application source code to build unchanged whether the default for time size is 32-bit (_TIME_BITS undefined or unequal to 64) or 64-bit (_TIME_BITS defined equal to 64).
Considerations on security issues have been added (thanks to Carlos O'Donell).
The Timestamps and IOCTLs sections have been expanded.
Implementation examples for types have been added.
Implementation examples for APIs have been added.
As always, comments welcome.
I found a few minor inaccuracies:
You have classified sched_rr_get_interval() as y2038-compatible, but getrusage() as incompatible. I think these both belong in the same category, either incompatible or a third category: we pass only intervals here, so there won't be an overflow, but redefining timeval still results in an incompatible ABI unless the structures are redefined in terms of 32-bit types instead of time_t/timeval/timespec.
I've discussed the kernel side for "Y2038-incompatible socket timestamping" with Deep a while ago, and I think we came to a new conclusion about what would be the best approach. I'll let her comment here.
For "Y2038-compatible types", please clarify whether time32_t and time64_t (and related types) are internal-only types or visible to applications through header files. I assume they are internal only, but it is not 100% clear. Related to that, what is the expected definition of time32_t on architectures that never had a 32-bit time_t, such as existing 64-bit architectures? Is it left undefined and all code referring to time32_t compiled conditionally?
In "Y2038-compatible struct timespec", replace "microseconds" with "nanoseconds. Also, it's worth pointing out the known problems with the padding: - on big-endian systems, any code doing a variation of "struct timespec ts = { seconds, nanos };" is broken because it assigns the nanoseconds to the wrong struct member. The example code is nonconforming as neither POSIX nor C11 require a particular order of the struct members, but I could also easily find examples of existing programs doing this. Note that NetBSD, OpenBSD and Windows have no padding but do use 64-bit time_t. - If the padding is uninitialized, we have to explicitly zero it before calling a kernel function that assumes the 64-bit layout. This can be done in glibc before calling into the kernel, or at the kernel entry (where my last patch set does it), but it is awkward either way. Unfortunately, there doesn't seem to be a good solution here for either of the two problems. Maybe someone else has more ideas. Using a pointer type for the padding would at least cause a compile-time warning for broken code, other solutions might require GCC extensions or break C11 in another way.
I'll comment on the kernel/glibc incompatibilities section tomorrow, need to collect my thoughts there some more.
Arnd
Hi Arnd,
On Wed, 22 Feb 2017 16:39:53 +0100, Arnd Bergmann arnd@arndb.de wrote :
On Wed, Feb 22, 2017 at 9:05 AM, Albert ARIBAUD albert.aribaud@3adev.fr wrote:
Hi all,
I have produced a fifth draft of what will eventually become the Y2038 design document:
https://sourceware.org/glibc/wiki/Y2038ProofnessDesign?rev=115
Relative to the previous draft:
It makes explicit that the implementation should allow the same application source code to build unchanged whether the default for time size is 32-bit (_TIME_BITS undefined or unequal to 64) or 64-bit (_TIME_BITS defined equal to 64).
Considerations on security issues have been added (thanks to Carlos O'Donell).
The Timestamps and IOCTLs sections have been expanded.
Implementation examples for types have been added.
Implementation examples for APIs have been added.
As always, comments welcome.
I found a few minor inaccuracies:
You have classified sched_rr_get_interval() as y2038-compatible, but getrusage() as incompatible. I think these both belong in the same category, either incompatible or a third category: we pass only intervals here, so there won't be an overflow, but redefining timeval still results in an incompatible ABI unless the structures are redefined in terms of 32-bit types instead of time_t/timeval/timespec.
Actually, sched_rr_get_interval deserves a fourth category, known as 'in a quantum state', because depending on where you look in the document, it is either compatible or incompatible, due to a copy-paste failure. :)
Joking aside, thanks for pointing it out and raising the point. My opinion is as follows:
1. Right now sched_rr_get_interval takes a struct timespec as an argument;
2. One goal of the project is that source code which works with 32-bit time should work unchanged if compiled with default time size set to 64 bits;
3. This means the 64-bit version of sched_rr_get_interval should keep expecting a (now 64-bit) 'struct timespec';
4. This in turn makes the ABIs of the 32- and 64-bit versions of sched_rr_get_interval mutually incompatible, since the actual types involved would be either the 32-bit time 'struct timespec' or the 64-bit time 'struct timespec64';
5. Ergo, sched_rr_get_interval should be classified as Y2038-incompatible.
I've discussed the kernel side for "Y2038-incompatible socket timestamping" with Deep a while ago, and I think we came to a new conclusion about what would be the best approach. I'll let her comment here.
For "Y2038-compatible types", please clarify whether time32_t and time64_t (and related types) are internal-only types or visible to applications through header files. I assume they are internal only, but it is not 100% clear. Related to that, what is the expected definition of time32_t on architectures that never had a 32-bit time_t, such as existing 64-bit architectures? Is it left undefined and all code referring to time32_t compiled conditionally?
(written after reading Joseph's reply, and type names adapted accordingly)
Yes, there would be two internal types, __time_t and __time64_t, whose sizes are invariant with respect to the default time size, whereas time_t, in the user-facing public API, would be defined as either __time_t or __time64_t depending on which time bit size the user code chooses.
For instance, with difftime:
- the existing, 32-bit-time version would be defined as double __difftime(__time_t time1, __time_t time0) ...
- the new, 64-bit-time version would be defined as double __difftime64(__time64_t time1, __time64_t time0);
- for user code which does not define _TIME_BITS=64 at compile time, GLIBC would emit [what amounts to] the following declarations:

    double __difftime(__time_t time1, __time_t time0);
    typedef __time_t time_t;
    __REDIRECT(difftime, (time_t time1, time_t time0), __difftime)

  so that when the user code says "time_t t1, t2; ...; difftime(t1, t2);" this amounts to "__time_t t1, t2; ...; __difftime(t1, t2);".
- for user code which defines _TIME_BITS=64 at compile time, GLIBC would emit [what amounts to] the following declarations:

    double __difftime64(__time64_t time1, __time64_t time0);
    typedef __time64_t time_t;
    __REDIRECT(difftime, (time_t time1, time_t time0), __difftime64)

  so that when the user code says "time_t t1, t2; ...; difftime(t1, t2);" this turns into "__time64_t t1, t2; ...; __difftime64(t1, t2);".
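Condensed into a single header fragment, the scheme sketched above could look like this; a sketch under the thread's assumptions: __REDIRECT and __THROW exist in glibc's <sys/cdefs.h>, but the type definitions shown here stand in for what is really spread across several bits/ headers, and __time64_t is the name proposed in this thread, not an existing glibc type:

    /* Sketch only; not the actual glibc implementation. */
    #include <sys/cdefs.h>

    typedef long int __time_t;           /* 32-bit on the targets discussed */
    typedef long long int __time64_t;    /* always 64-bit */

    #if defined _TIME_BITS && _TIME_BITS == 64
    typedef __time64_t time_t;
    /* difftime calls are redirected at link time to __difftime64. */
    extern double __REDIRECT (difftime, (time_t __t1, time_t __t0),
                              __difftime64) __THROW;
    #else
    typedef __time_t time_t;
    extern double __REDIRECT (difftime, (time_t __t1, time_t __t0),
                              __difftime) __THROW;
    #endif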
(should I rename the current __time_t to a more explicit __time32_t?)
As far as 64-bit architectures are concerned:
- pure 64-bit architectures already have a 64-bit time_t, and are out of the scope of my project; a 64-bit GLIBC is assumed to be Y2038-compatible as far as APIs go (there may be bugs, though; again, if I see any, I'll raise a GLIBC issue, but outside of this project).
- this leaves the case of a 64-bit architecture kernel providing a 32-bit ABI to 32-bit code. I am not planning on supporting such a scenario.
In "Y2038-compatible struct timespec", replace "microseconds" with "nanoseconds.
Oops. Fixed in revision 118.
Also, it's worth pointing out the known problems with the padding:
- on big-endian systems, any code doing a variation of "struct timespec ts = { seconds, nanos };" is broken because it assigns the nanoseconds to the wrong struct member. The example code is nonconforming as neither POSIX nor C11 require a particular order of the struct members, but I could also easily find examples of existing programs doing this. Note that NetBSD, OpenBSD and Windows have no padding but do use 64-bit time_t.
- If the padding is uninitialized, we have to explicitly zero it before calling a kernel function that assumes the 64-bit layout. This can be done in glibc before calling into the kernel, or at the kernel entry (where my last patch set does it), but it is awkward either way.
Unfortunately, there doesn't seem to be a good solution here for either of the two problems. Maybe someone else has more ideas. Using a pointer type for the padding would at least cause a compile-time warning for broken code, other solutions might require GCC extensions or break C11 in another way.
Agreed on the whole line, and I will add these points in the document.
However, this makes me consider an option which would keep source code as happy as possible: keep tv_nsec a long even in struct timespec64 (so the struct would be 12 bytes: 8-byte tv_sec, 4-byte tv_nsec).
It would be ABI-incompatible with 64-bit code, but would conform to POSIX, and the exact same user source code would then compile equally well in all three cases (32-bit time on a 32-bit arch, 64-bit time on a 32-bit arch, and a 64-bit arch), including the struct initializers you mentioned above.
There would be a rough edge left when running 32-bit-arch, 64-bit-time user code over a 64-bit-arch GLIBC and kernel, because then we'd have to copy between 64-bit and 32-bit nanosecond fields; but then again, that is not a scenario I am aiming for.
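For concreteness, the layout proposed here would be something like the following (a sketch; __time64_t is the internal name discussed above, and whether the compiler adds 4 bytes of tail padding after tv_nsec is ABI-specific, as discussed in the reply further down):

    struct timespec64 {
            __time64_t tv_sec;    /* 8 bytes */
            long       tv_nsec;   /* 4 bytes on a 32-bit arch */
    };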
I'll comment on the kernel/glibc incompatibilities section tomorrow, need to collect my thoughts there some more.
Thanks!
Arnd
Cordialement, Albert ARIBAUD 3ADEV
On Wed, 22 Feb 2017, Albert ARIBAUD wrote:
- this leaves the case of a 64-bit architecture kernel providing a 32-bit ABI to 32-bit code. I am not planning on supporting such a scenario.
There are several such ABIs (ILP32 ABI for a 64-bit architecture) already supported in glibc, e.g. MIPS n32, and all of them that don't already have support for 64-bit time_t will need to have it added.
Hi Joseph,
On Wed, 22 Feb 2017 20:55:39 +0000, Joseph Myers joseph@codesourcery.com wrote :
On Wed, 22 Feb 2017, Albert ARIBAUD wrote:
- this leaves the case of a 64-bit architecture kernel providing a 32-bit ABI to 32-bit code. I am not planning on supporting such a scenario.
There are several such ABIs (ILP32 ABI for a 64-bit architecture) already supported in glibc, e.g. MIPS n32, and all of them that don't already have support for 64-bit time_t will need to have it added.
I /was/ not planning on supporting such a scenario. :)
More seriously: is there already a list of these ABIs?
Cordialement, Albert ARIBAUD 3ADEV
On Wed, 22 Feb 2017, Albert ARIBAUD wrote:
Hi Joseph,
On Wed, 22 Feb 2017 20:55:39 +0000, Joseph Myers joseph@codesourcery.com wrote :
On Wed, 22 Feb 2017, Albert ARIBAUD wrote:
- this leaves the case of a 64-bit architecture kernel providing a 32-bit ABI to 32-bit code. I am not planning on supporting such a scenario.
There are several such ABIs (ILP32 ABI for a 64-bit architecture) already supported in glibc, e.g. MIPS n32, and all of them that don't already have support for 64-bit time_t will need to have it added.
I /was/ not planning on supporting such a scenario. :)
More seriously: is there already a list of these ABIs?
I'm not convinced that is a meaningful question. You can identify ABIs with 32-bit time_t by e.g. building for different ABIs with build-many-glibcs.py then compiling a test program with each compiler and corresponding options. Some such ABIs may require a 64-bit kernel. Some may work with both 32-bit and 64-bit kernels (e.g. 32-bit x86). Some may be for 32-bit-only architectures.
In all cases, whether the kernel is working with a 32-bit or 64-bit address space is not relevant; you work with the syscall ABI as it is, plus whatever additions are made to it in the course of the Y2038 work.
That said, I think the following ABIs use 64-bit register size in userspace while being ILP32 ABIs. You'll need to examine the code more closely in each case to determine what size time_t is, and to what extent if any 64-bit registers are involved in the syscall ABI. There may be other cases where 64-bit registers can in fact be used for what's normally considered a 32-bit ABI (e.g. people have done some work on being able to use 64-bit registers for 32-bit powerpc code, although the registers are 32-bit for all purposes in the function-calling ABI).
AArch64 ILP32 (not yet in glibc)
MIPS n32
TileGX32
x86_64 x32
Hi Joseph,
On Wed, 22 Feb 2017 21:16:29 +0000, Joseph Myers joseph@codesourcery.com wrote :
On Wed, 22 Feb 2017, Albert ARIBAUD wrote:
Hi Joseph,
On Wed, 22 Feb 2017 20:55:39 +0000, Joseph Myers joseph@codesourcery.com wrote :
On Wed, 22 Feb 2017, Albert ARIBAUD wrote:
[...]
There are several such ABIs (ILP32 ABI for a 64-bit architecture) already supported in glibc, e.g. MIPS n32, and all of them that don't already have support for 64-bit time_t will need to have it added.
I /was/ not planning on supporting such a scenario. :)
More seriously: is there already a list of these ABIs?
I'm not convinced that is a meaningful question. You can identify ABIs with 32-bit time_t by e.g. building for different ABIs with build-many-glibcs.py then compiling a test program with each compiler and corresponding options. Some such ABIs may require a 64-bit kernel. Some may work with both 32-bit and 64-bit kernels (e.g. 32-bit x86). Some may be for 32-bit-only architectures.
In all cases, whether the kernel is working with a 32-bit or 64-bit address space is not relevant; you work with the syscall ABI as it is, plus whatever additions are made to it in the course of the Y2038 work.
That said, I think the following ABIs use 64-bit register size in userspace while being ILP32 ABIs. You'll need to examine the code more closely in each case to determine what size time_t is, and to what extent if any 64-bit registers are involved in the syscall ABI. There may be other cases where 64-bit registers can in fact be used for what's normally considered a 32-bit ABI (e.g. people have done some work on being able to use 64-bit registers for 32-bit powerpc code, although the registers are 32-bit for all purposes in the function-calling ABI).
AArch64 ILP32 (not yet in glibc)
MIPS n32
TileGX32
x86_64 x32
Thanks. I'll look into these.
Cordialement, Albert ARIBAUD 3ADEV
On Wed, Feb 22, 2017 at 10:16 PM, Joseph Myers joseph@codesourcery.com wrote:
On Wed, 22 Feb 2017, Albert ARIBAUD wrote:
On Wed, 22 Feb 2017 20:55:39 +0000, Joseph Myers joseph@codesourcery.com wrote :
On Wed, 22 Feb 2017, Albert ARIBAUD wrote:
- this leaves the case of a 64-bit architecture kernel providing a 32-bit ABI to 32-bit code. I am not planning on supporting such a scenario.
There are several such ABIs (ILP32 ABI for a 64-bit architecture) already supported in glibc, e.g. MIPS n32, and all of them that don't already have support for 64-bit time_t will need to have it added.
I /was/ not planning on supporting such a scenario. :)
More seriously: is there already a list of these ABIs?
That said, I think the following ABIs use 64-bit register size in userspace while being ILP32 ABIs. You'll need to examine the code more closely in each case to determine what size time_t is, and to what extent if any 64-bit registers are involved in the syscall ABI. There may be other cases where 64-bit registers can in fact be used for what's normally considered a 32-bit ABI (e.g. people have done some work on being able to use 64-bit registers for 32-bit powerpc code, although the registers are 32-bit for all purposes in the function-calling ABI).
AArch64 ILP32 (not yet in glibc)
MIPS n32
TileGX32
x86_64 x32
I don't think the register size matters at all here, any ILP32 ABI running on a 64-bit kernel is affected the same way. We know that Linux defines __kernel_time_t as __kernel_long_t, which is the same as 'long' on all ABIs other than x32, so all 32-bit ABIs that can potentially run on a 64-bit architecture are affected. At the moment this is arm (both aarch32 and in the future aarch64), mips (both o32 and n32), parisc, powerpc, s390, sparc, tile and x86 (i386-only, not x32), with risc-v getting added in the near future, likely before the kernel supports 64-bit time_t.
Any part of the ABI that is supported on 32-bit kernels must also be supported on 64-bit kernels in a compatible way (some drivers have ioctls that don't work in this configuration, but I consider those minor bugs that we can fix if someone cares enough about a particular driver). To do this, the 32-bit kernel ABIs will provide a __kernel_timespec that has a layout matching the 64-bit native timespec, the question here is how to make the conversion between the glibc-provided timespec and the kernel-internal timespec as easy as possible. My goal here is to ensure that any driver that takes a timespec argument in an ioctl will get it right if it supports the compat_ioctl at all, and not have a slightly incompatible ABI between 32-bit and 64-bit kernels running the same user space binary.
Arnd
On 2/22/2017 4:16 PM, Joseph Myers wrote:
That said, I think the following ABIs use 64-bit register size in userspace while being ILP32 ABIs. You'll need to examine the code more closely in each case to determine what size time_t is, and to what extent if any 64-bit registers are involved in the syscall ABI. There may be other cases where 64-bit registers can in fact be used for what's normally considered a 32-bit ABI (e.g. people have done some work on being able to use 64-bit registers for 32-bit powerpc code, although the registers are 32-bit for all purposes in the function-calling ABI).
AArch64 ILP32 (not yet in glibc)
MIPS n32
TileGX32
x86_64 x32
TILE-Gx ILP32 mode uses only the low 32 bits of registers for all syscalls. It uses the full 64-bit register size for "long long" and the ABI generally permits passing and returning such values in a single register; we just don't ever do that with the kernel ABI.
On Wed, Feb 22, 2017 at 7:33 PM, Albert ARIBAUD albert.aribaud@3adev.fr wrote:
On Wed, 22 Feb 2017 16:39:53 +0100, Arnd Bergmann arnd@arndb.de wrote :
On Wed, Feb 22, 2017 at 9:05 AM, Albert ARIBAUD albert.aribaud@3adev.fr wrote:
- Ergo, sched_rr_get_interval should be classified as Y2038-incompatible.
Ok.
I've discussed the kernel side for "Y2038-incompatible socket timestamping" with Deep a while ago, and I think we came to a new conclusion about what would be the best approach. I'll let her comment here.
For "Y2038-compatible types", please clarify whether time32_t and time64_t (and related types) are internal-only types or visible to applications through header files. I assume they are internal only, but it is not 100% clear. Related to that, what is the expected definition of time32_t on architectures that never had a 32-bit time_t, such as existing 64-bit architectures? Is it left undefined and all code referring to time32_t compiled conditionally?
(written after reading Joseph's reply, and type names adapted accordingly)
Yes, there would be two internal types, __time_t and __time64_t with sizes invariant with respect to default type size, whereas time_t, in the user facing public API, would be defined as either __time_t or __time64_t depending on which time bit size the user code would choose
Ok, good.
As far as 64-bit architectures are concerned:
- pure 64-bit architectures already have a 64-bit time_t, and are out of the scope of my project; a 64-bit GLIBC is assumed to be Y2038- compatible as far as APIs go (there may be bugs though; again, if I see any, I'll raise an GLIC issue but outside of this project).
My question here was about the way it gets handled internally in glibc, as we want to compile the same source code on both 32-bit and 64-bit architectures. When the existing code that uses time_t gets changed to __time32_t, we probably need to do one of these:
- leave __time32_t undefined on 64-bit architectures and don't compile any of the files referencing it, but always provide the version that uses __time64_t internally for the redirect. This is probably the cleanest.
- leave the internal name as __time_t rather than __time32_t for all architectures, leave it defined as 'long' (with a special case for x32), and provide the __time64_t variant only on targets that have a 32-bit __time_t.
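A sketch of the first option, using a hypothetical feature macro (__ARCH_HAS_TIME32 is an assumption for illustration, not an existing glibc macro):

    #ifdef __ARCH_HAS_TIME32            /* hypothetical macro */
    /* Only on architectures that ever had a 32-bit time_t. */
    typedef long int __time32_t;
    extern double __difftime (__time32_t time1, __time32_t time0);
    #endif

    /* Always provided; backs the redirect on every architecture. */
    extern double __difftime64 (__time64_t time1, __time64_t time0);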
- this leaves the case of a 64-bit architecture kernel providing a 32-bit ABI to 32-bit code. I am not planning on supporting such a scenario.
That is not what I was interested in here, and seems to have started a whole new sub-thread, see my reply there.
Also, it's worth pointing out the known problems with the padding:
- on big-endian systems, any code doing a variation of "struct timespec ts = { seconds, nanos };" is broken because it assigns the nanoseconds to the wrong struct member. The example code is nonconforming as neither POSIX nor C11 require a particular order of the struct members, but I could also easily find examples of existing programs doing this. Note that NetBSD, OpenBSD and Windows have no padding but do use 64-bit time_t.
- If the padding is uninitialized, we have to explicitly zero it before calling a kernel function that assumes the 64-bit layout. This can be done in glibc before calling into the kernel, or at the kernel entry (where my last patch set does it), but it is awkward either way.
Unfortunately, there doesn't seem to be a good solution here for either of the two problems. Maybe someone else has more ideas. Using a pointer type for the padding would at least cause a compile-time warning for broken code, other solutions might require GCC extensions or break C11 in another way.
Agreed on the whole line, and I will add these points in the document.
However, this makes me consider an option which would keep source code as happy as possible: keep tv_nsec a long even in struct timespec64 (so the struct would be 12 bytes: 8-byte tv_sec, 4-byte tv_nsec).
The size is actually architecture-specific: most 32-bit ABIs would insert 4 bytes of implicit padding at the end, leaving it compatible with the 64-bit definition on little-endian but not big-endian systems, while x86-32 would be a rare exception that uses a 12-byte structure with 4-byte alignment. That leads to slightly different structure layouts between the most common architecture on one side and everything else on the other.
It would be ABI-incompatible with 64-bit code, but would conform to POSIX, and the exact same user source code would then compile equally well in all three cases (32-bit time on a 32-bit arch, 64-bit time on a 32-bit arch, and a 64-bit arch), including the struct initializers you mentioned above.
There would be a rough edge left when running 32-bit-arch, 64-bit-time user code over a 64-bit-arch GLIBC and kernel, because then we'd have to copy between 64-bit and 32-bit nanosecond fields; but then again, that is not a scenario I am aiming for.
The kernel will use a layout that matches the 64-bit architecture, and use the same layout on all architectures, so this would imply having to copy it everywhere; the alternative is something that works only sometimes, or is broken in very subtle ways. I think if we end up with nanoseconds that are sometimes in a different bit position between kernel and user space, we should mix up all of the structure, e.g. like this:

    struct timespec {
            void *__padding;
            long tv_nsec;
            time_t tv_sec;
    };
this would cause a build bug for anyone trying to initialize the first two members as integers, and it would cause an obvious runtime failure if anyone tries to interpret the first four or eight bytes as epoch seconds.
The disadvantage would be that we have to modify the contents both for data passed into the kernel and out of it, compared to the version you are proposing right now, which only requires special treatment of the padding data when passing it into the kernel.
Arnd
On Wed, 22 Feb 2017, Arnd Bergmann wrote:
In "Y2038-compatible struct timespec", replace "microseconds" with "nanoseconds. Also, it's worth pointing out the known problems with the padding:
- on big-endian systems, any code doing a variation of "struct timespec ts = { seconds, nanos };" is broken because it assigns the nanoseconds to the wrong struct member. The example code is nonconforming as neither POSIX nor C11 require a particular order of the struct members, but I could also easily find examples of existing programs doing this. Note that NetBSD, OpenBSD and Windows have no padding but do use 64-bit time_t.
- If the padding is uninitialized, we have to explicitly zero it before calling a kernel function that assumes the 64-bit layout. This can be done in glibc before calling into the kernel, or at the kernel entry (where my last patch set does it), but it is awkward either way.
Unfortunately, there doesn't seem to be a good solution here for either of the two problems. Maybe someone else has more ideas. Using a pointer type for the padding would at least cause a compile-time warning for broken code, other solutions might require GCC extensions or break C11 in another way.
You could declare the padding as an unnamed bit-field "int : 32;" to avoid it affecting initialization. But that complicates zeroing it (in cases where you're zeroing the original struct - when it's a read/write parameter passed to the kernel - rather than a copy), as you can no longer refer to it by name to assign zero to it; maybe you'd need to arrange for it to be named inside glibc but unnamed for user code.
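As a sketch, on a hypothetical big-endian 32-bit ABI where the kernel expects a 64-bit tv_nsec slot, that would give:

    struct timespec {
            __time64_t tv_sec;
            int : 32;             /* unnamed bit-field as padding */
            long tv_nsec;
    };

    /* Initializer lists skip unnamed bit-fields, so this now assigns
       nanos to tv_nsec even on big-endian: */
    struct timespec ts = { seconds, nanos };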
On Thu, Feb 23, 2017 at 12:11 AM, Joseph Myers joseph@codesourcery.com wrote:
On Wed, 22 Feb 2017, Arnd Bergmann wrote:
In "Y2038-compatible struct timespec", replace "microseconds" with "nanoseconds. Also, it's worth pointing out the known problems with the padding:
- on big-endian systems, any code doing a variation of "struct timespec ts = { seconds, nanos };" is broken because it assigns the nanoseconds to the wrong struct member. The example code is nonconforming as neither POSIX nor C11 require a particular order of the struct members, but I could also easily find examples of existing programs doing this. Note that NetBSD, OpenBSD and Windows have no padding but do use 64-bit time_t.
- If the padding is uninitialized, we have to explicitly zero it before calling a kernel function that assumes the 64-bit layout. This can be done in glibc before calling into the kernel, or at the kernel entry (where my last patch set does it), but it is awkward either way.
Unfortunately, there doesn't seem to be a good solution here for either of the two problems. Maybe someone else has more ideas. Using a pointer type for the padding would at least cause a compile-time warning for broken code, other solutions might require GCC extensions or break C11 in another way.
You could declare the padding as an unnamed bit-field "int : 32;" to avoid it affecting initialization. But that complicates zeroing it (in cases where you're zeroing the original struct - when it's a read/write parameter passed to the kernel - rather than a copy), as you can no longer refer to it by name to assign zero to it; maybe you'd need to arrange for it to be named inside glibc but unnamed for user code.
I had thought of that as well, but wasn't sure if we could rely on bitfields to do the right thing across C standard versions and non-gcc compilers. I experimentally found that gcc-4.9 and higher accept that version without a -Wpedantic warning when using --std=c99 or --std=c11, but they warn about it with --std=c89 or --std=gnu89. Older gcc versions as far back as gcc-3.4 (the oldest I could try) always warn about anonymous bitfields with -pedantic, but I could still silence that warning using the __extension__ keyword. I think that's good enough.
With that problem out of the way, we'd still need to figure out where to clear the padding on the way from libc into the kernel. I think the tradeoff is:
Pro glibc:
- IIRC already have code to do this for x32, and could do it the same way on all 32-bit architectures. (I could not find where this is done though, maybe I dreamed it up?)
- We don't slow down the normal 64-bit kernel syscall path for every get_user_timespec64() by checking whether we are called from a 32-bit or 64-bit task to decide whether we should clear the upper bits or not.
Pro kernel:
- No need to copy the data structure in user space to avoid writing into read-only variables
- Can handle both the ioctl and syscall paths with the same helper function, no need for random ioctl users to clear the upper bits
Arnd
On Thu, 23 Feb 2017, Arnd Bergmann wrote:
You could declare the padding as an unnamed bit-field "int : 32;" to avoid it affecting initialization. But that complicates zeroing it (in cases where you're zeroing the original struct - when it's a read/write parameter passed to the kernel - rather than a copy), as you can no longer refer to it by name to assign zero to it; maybe you'd need to arrange for it to be named inside glibc but unnamed for user code.
I had thought of that as well, but wasn't sure if we could rely on bitfields to do the right thing across C standard versions and non-gcc compilers. I experimentally found that gcc-4.9 and higher accept that version without a -Wpedantic warning when using --std=c99 or --std=c11, but they warn about it with --std=c89 or --std=gnu89. Older gcc versions as far back as gcc-3.4 (the oldest I could try) always warn about anonymous bitfields with -pedantic, but I could still silence that warning using the __extension__ keyword. I think that's good enough.
What's the warning you see, with what exact testcase? Unnamed bit-fields (provided the declared type is int / signed int / unsigned int, not any other integer or enum type), and initializers ignoring them, date back at least to the third public review draft of C89 (13 May 1988), which is the oldest version I have to hand.
(Of course 64-bit time_t on 32-bit systems requires long long, but glibc headers already rely on that C99 feature anyway.)
Pro glibc:
- IIRC already have code to do this for x32, and could do it the same way on all 32-bit architectures. (I could not find where this is done though, maybe I dreamed it up?)
glibc doesn't have code for it; I think musl does.
On Thu, Feb 23, 2017 at 6:31 PM, Joseph Myers joseph@codesourcery.com wrote:
On Thu, 23 Feb 2017, Arnd Bergmann wrote:
You could declare the padding as an unnamed bit-field "int : 32;" to avoid it affecting initialization. But that complicates zeroing it (in cases where you're zeroing the original struct - when it's a read/write parameter passed to the kernel - rather than a copy), as you can no longer refer to it by name to assign zero to it; maybe you'd need to arrange for it to be named inside glibc but unnamed for user code.
I had thought of that as well, but wasn't sure if we could rely on bitfields to do the right thing across C standard versions and non-gcc compilers. I experimentally found that gcc-4.9 and higher accept that version without a -Wpedantic warning when using --std=c99 or --std=c11, but they warn about it with --std=c89 or --std=gnu89. Older gcc versions as far back as gcc-3.4 (the oldest I could try) always warn about anonymous bitfields with -pedantic, but I could still silence that warning using the __extension__ keyword. I think that's good enough.
What's the warning you see, with what exact testcase? Unnamed bit-fields (provided the declared type is int / signed int / unsigned int, not any other integer or enum type), and initializers ignoring them, date back at least to the third public review draft of C89 (13 May 1988), which is the oldest version I have to hand.
Sorry, my mistake, I used "long : 32", which causes "warning: type of bit-field '<anonymous>' is a GCC extension [-Wpedantic]". With 'int' it works fine.
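A minimal test case showing the difference when compiled with -pedantic:

    struct s {
            long : 32;    /* warning: type of bit-field '<anonymous>'
                             is a GCC extension [-Wpedantic] */
            int : 32;     /* accepted silently; plain C89/C99/C11 */
    };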
Pro glibc:
- IIRC already have code to do this for x32, and could do it the same way on all 32-bit architectures. (I could not find where this is done though, maybe I dreamed it up?)
glibc doesn't have code for it; I think musl does.
Ok, that explains what I was thinking.
Arnd
Hi Arnd,
On Thu, 23 Feb 2017 21:24:22 +0100, Arnd Bergmann arnd@arndb.de wrote :
On Thu, Feb 23, 2017 at 6:31 PM, Joseph Myers joseph@codesourcery.com wrote:
On Thu, 23 Feb 2017, Arnd Bergmann wrote:
[...]
I had thought of that as well, but wasn't sure if we could rely on bitfields to do the right thing across C standard versions and non-gcc compilers. I experimentally found that gcc-4.9 and higher accept that version without a -Wpedantic warning when using --std=c99 or --std=c11, but they warn about it with --std=c89 or --std=gnu89. Older gcc versions as far back as gcc-3.4 (the oldest I could try) always warn about anonymous bitfields with -pedantic, but I could still silence that warning using the __extension__ keyword. I think that's good enough.
What's the warning you see, with what exact testcase? Unnamed bit-fields (provided the declared type is int / signed int / unsigned int, not any other integer or enum type), and initializers ignoring them, date back at least to the third public review draft of C89 (13 May 1988), which is the oldest version I have to hand.
Sorry, my mistake, I used "long : 32", which causes "warning: type of bit-field '<anonymous>' is a GCC extension [-Wpedantic]". With 'int' it works fine.
So IIUC (and assuming we keep tv_nsec a long in APIs) we /could/ pad struct timespec with an anonymous bitfield and still be compatible with (most) existing user source code, and it is considered acceptable (but should be duly documented). Added this in rev. 122 of the design doc.
Pro glibc:
- IIRC already have code to do this for x32, and could do it the same way on all 32-bit architectures. (I could not find where this is done though, maybe I dreamed it up?)
glibc doesn't have code for it; I think musl does.
Ok, that explains what I was thinking.
So that's one 'pro GLIBC' less, right?
Arnd
Cordialement, Albert ARIBAUD 3ADEV
On Mon, Feb 27, 2017 at 12:02 PM, Albert ARIBAUD albert.aribaud@3adev.fr wrote:
Hi Arnd,
On Thu, 23 Feb 2017 21:24:22 +0100, Arnd Bergmann arnd@arndb.de wrote :
On Thu, Feb 23, 2017 at 6:31 PM, Joseph Myers joseph@codesourcery.com wrote:
On Thu, 23 Feb 2017, Arnd Bergmann wrote:
[...]
I had thought of that as well, but wasn't sure if we could rely on bitfields to do the right thing across C standard versions and non-gcc compilers. I experimentally found that gcc-4.9 and higher accept that version without a -Wpedantic warning when using --std=c99 or --std=c11, but they warn about it with --std=c89 or --std=gnu89. Older gcc versions as far back as gcc-3.4 (the oldest I could try) always warn about anonymous bitfields with -pedantic, but I could still silence that warning using the __extension__ keyword. I think that's good enough.
What's the warning you see, with what exact testcase? Unnamed bit-fields (provided the declared type is int / signed int / unsigned int, not any other integer or enum type), and initializers ignoring them, date back at least to the third public review draft of C89 (13 May 1988), which is the oldest version I have to hand.
Sorry, my mistake, I used "long : 32", which causes "warning: type of bit-field '<anonymous>' is a GCC extension [-Wpedantic]". With 'int' it works fine.
So IIUC (and assuming we keep tv_nsec a long in APIs) we /could/ pad struct timespec with an anonymous bitfield and still be compatible with (most) existing user source code, and it is considered acceptable (but should be duly documented). Added this in rev. 122 of the design doc.
This solves the problem in user space, but it still leaves the other problem that Zack and I mentioned on the user/kernel boundary: we have to zero out the padding since at least 64-bit kernels with 32-bit user space would interpret the padding as nanoseconds over 1 billion. We may also want to make the kernel behavior consistent between 32-bit and 64-bit kernels, so the kernel will either always ignore the padding (which is a bit more work for the kernel) or it will always reject nonzero upper bits of the nanoseconds (as 64-bit kernels do today for native tasks, this means more work in 32-bit glibc, or using 64-bit nanoseconds as Zack suggested).
Right now, I have this in the kernel, to do the zeroing whenever we copy a __kernel_timespec from user space into a timespec64 in the kernel:
    typedef __s64 __kernel_time64_t;

    struct __kernel_timespec {
            __kernel_time64_t tv_sec;
            __s64 tv_nsec;
    };

    int get_timespec64(struct timespec64 *ts,
                       const struct __kernel_timespec __user *uts)
    {
            struct __kernel_timespec kts;
            int ret;

            ret = copy_from_user(&kts, uts, sizeof(kts));
            if (ret)
                    return -EFAULT;

            ts->tv_sec = kts.tv_sec;
            if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall())
                    kts.tv_nsec &= 0xfffffffful;
            ts->tv_nsec = kts.tv_nsec;

            return 0;
    }
This should catch anything that passes in a timespec by itself (clock_settime, clock_nanosleep, timer_settime, timerfd_settime, ppoll, pselect6, io_getevents, recvmmsg, mq_timedsend, mq_timedreceive, semtimedop, utimensat, rt_sigtimedwait, futex and some ioctls), but not structures that embed a timespec (fortunately no syscalls, very few ioctls). The external function call plus the in_compat_syscall() check will add a couple of cycles on 64-bit architectures, for both 32-bit and 64-bit tasks. We could optimize this slightly using
    if (IS_ENABLED(CONFIG_64BIT))
            kts.tv_nsec &= current->tv_nsec_mask;
Arnd
On Mon, 27 Feb 2017, Arnd Bergmann wrote:
kernels, so the kernel will either always ignore the padding (which is a bit more work for the kernel) or it will always reject nonzero upper bits of the nanoseconds (as 64-bit kernels do today for native tasks, this means more work in 32-bit glibc, or using 64-bit nanoseconds as Zack suggested).
Note that the rejection for native tasks (ones where the userspace "long" type used for tv_nsec is 64-bit) is required. For example, for utimensat, POSIX requires an EINVAL error if "Either of the times argument structures specified a tv_nsec value that was neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or greater than or equal to 1000 million.". (There are some cases where glibc deals with such rejection itself, if e.g. it's converting an absolute timeout to a relative timeout in order to pass the latter to the kernel.)
On Mon, Feb 27, 2017 at 7:45 PM, Joseph Myers joseph@codesourcery.com wrote:
On Mon, 27 Feb 2017, Arnd Bergmann wrote:
kernels, so the kernel will either always ignore the padding (which is a bit more work for the kernel) or it will always reject nonzero upper bits of the nanoseconds (as 64-bit kernels do today for native tasks, this means more work in 32-bit glibc, or using 64-bit nanoseconds as Zack suggested).
Note that the rejection for native tasks (ones where the userspace "long" type used for tv_nsec is 64-bit) is required. For example, for utimensat, POSIX requires an EINVAL error if "Either of the times argument structures specified a tv_nsec value that was neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or greater than or equal to 1000 million.". (There are some cases where glibc deals with such rejection itself, if e.g. it's converting an absolute timeout to a relative timeout in order to pass the latter to the kernel.)
Sure, I was not suggesting that this check would be changed for 64-bit tasks. Aside from the clear violation of the relevant POSIX definition, that would also be a subtle ABI change that could lead to breaking existing application code.
What I meant above is that the kernel would either always treat new 32-bit tasks the same way as 64-bit tasks, both under native 32-bit kernels and emulated on 64-bit kernels, and leave the zeroing to user space, or we do the extra work I described in the kernel, to make it easier for a libc implementation to use 'long tv_nsec' without having to zero-pad or copy each timespec, again for both native and compat 32-bit tasks.
Arnd
Hi Arnd,
On Mon, 27 Feb 2017 20:46:30 +0100, Arnd Bergmann arnd@arndb.de wrote :
On Mon, Feb 27, 2017 at 7:45 PM, Joseph Myers joseph@codesourcery.com wrote:
On Mon, 27 Feb 2017, Arnd Bergmann wrote:
kernels, so the kernel will either always ignore the padding (which is a bit more work for the kernel) or it will always reject nonzero upper bits of the nanoseconds (as 64-bit kernels do today for native tasks, this means more work in 32-bit glibc, or using 64-bit nanoseconds as Zack suggested).
Note that the rejection for native tasks (ones where the userspace "long" type used for tv_nsec is 64-bit) is required. For example, for utimensat, POSIX requires an EINVAL error if "Either of the times argument structures specified a tv_nsec value that was neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or greater than or equal to 1000 million.". (There are some cases where glibc deals with such rejection itself, if e.g. it's converting an absolute timeout to a relative timeout in order to pass the latter to the kernel.)
Sure, I was not suggesting that this check would be changed for 64-bit tasks. Aside from the clear violation of the relevant POSIX definition, that would also be a subtle ABI change that could lead to breaking existing application code.
What I meant above is that the kernel would either always treat new 32-bit tasks the same way as 64-bit tasks, both under native 32-bit kernels and emulated on 64-bit kernels, and leave the zeroing to user space, or we do the extra work I described in the kernel, to make it easier for a libc implementation to use 'long tv_nsec' without having to zero-pad or copy each timespec, again for both native and compat 32-bit tasks.
Arnd
Maybe a silly question: We all assume that userland code always goes through GLIBC to invoke kernel syscalls which handle struct timespecs.
There is the possibility, at least theoretically, that userland code invokes these syscalls directly.
Should we consider this possibility, or do we deem it purely theoretical?
If we consider that not all userland syscalls are made through GLIBC, then even if GLIBC checked and cleaned all the structs it receives from userland and passes to the kernel, it seems to me that the kernel still has to check the struct timespecs it receives, because they might not come from GLIBC and thus might not have been cleaned by it.
My feeling is that we should check the struct timespecs in the kernel systematically, and in GLIBC only when using them in computations (as opposed to passing them to the kernel).
Cordialement, Albert ARIBAUD 3ADEV
On Mon, Feb 27, 2017 at 9:06 PM, Albert ARIBAUD albert.aribaud@3adev.fr wrote:
Maybe a silly question: We all assume that userland code always goes through GLIBC to invoke kernel syscalls which handle struct timespecs.
There is the possibility, at least theoretically, that userland code invokes these syscalls directly.
Should we consider this possibility, or do we deem it purely theoretical?
If we consider that not all userland syscalls are made through GLIBC, then even if GLIBC checked and cleaned all the structs it receives from userland and passes to the kernel, it seems to me that the kernel still has to check the struct timespecs it receives, because they might not come from GLIBC and thus might not have been cleaned by it.
My feeling is that we should check the struct timespecs in the kernel systematically, and in GLIBC only when using them in computations (as opposed to passing them to the kernel).
The kernel syscall interface does not always adhere to POSIX, and when a user calls into the syscalls directly, they should follow the interface that is documented in the kernel man pages instead.
If we decide to have the kernel interface done the way that Zack suggested based on the x32 precedent (and Linus before him when the x32 decision was made[1]), we'd leave it up to any libc implementation to either use the same timespec definition as the kernel, or use the POSIX definition and do the zero-pad in the syscall wrapper.
Arnd
Hi Arnd,
On Mon, 27 Feb 2017 21:17:41 +0100, Arnd Bergmann arnd@arndb.de wrote :
On Mon, Feb 27, 2017 at 9:06 PM, Albert ARIBAUD albert.aribaud@3adev.fr wrote:
Maybe a silly question: We all assume that userland code always goes through GLIBC to invoke kernel syscalls which handle struct timespecs.
There is the possibility, at least theoretically, that userland code invokes these syscalls directly.
Should we consider this possibility, or do we deem it purely theoretical?
If we consider that not all userland syscalls are made through GLIBC, then even if GLIBC checked and cleaned all the structs it receives from userland and passes to the kernel, it seems to me that the kernel still has to check the struct timespecs it receives, because they might not come from GLIBC and thus might not have been cleaned by it.
My feeling is that we should check the struct timespecs in the kernel systematically, and in GLIBC only when using them in computations (as opposed to passing them to the kernel).
The kernel syscall interface does not always adhere to POSIX, and when a user calls into the syscalls directly, they should follow the interface that is documented in the kernel man pages instead.
If we decide to have the kernel interface done the way that Zack suggested based on the x32 precedent (and Linus before him when the x32 decision was made[1]), we'd leave it up to any libc implementation to either use the same timespec definition as the kernel, or use the POSIX definition and do the zero-pad in the syscall wrapper.
Arnd
IIUC, on the kernel side there is a will to move away from POSIX (1) here and just make tv_nsec 64 bits with no padding when time_t is 64-bit.
If that is so, then there is little point in not doing the same on the GLIBC side; similar definitions of struct timespec for a given time size would remove any padding questions.
Basically, the only issues remaining would then be in the application code itself. As long as tv_nsec is a signed 64-bit integer but holds values which fit within a signed 32-bit integer, application code reading from and writing to tv_nsec has no reason to fail; the only problem I see is if some weird code implicitly depends on tv_nsec's type being 'long'.
So, essentially: do we move away from POSIX regarding tv_nsec, and define it as a signed 64-bit integer?
(1) or possibly try to move POSIX.
Cordialement, Albert ARIBAUD 3ADEV
On 03/08/2017 02:28 AM, Albert ARIBAUD wrote:
the only problem I see is if some weird code implicitly depends on tv_nsec's type being 'long'.
Although I have not found "weird code" like that in GNU applications I help maintain, I did find one instance of "weird code" back in 2012 when I was first looking into the problem: an example program in Kerrisk's book "The Linux Programming Interface" http://man7.org/tlpi/code/online/dist/timers/t_clock_nanosleep.c.html, which contains this:
printf("... Remaining: %ld.%09ld", (long) remain.tv_sec, remain.tv_nsec);
This fails on ABIs where tv_nsec is 'long long' and where 'long long' is passed differently from 'long'. (The code also fails on these ABIs if remain.tv_sec exceeds LONG_MAX, of course.)
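A variant that stays correct whatever integer types the two members have would cast both fields explicitly:

    printf("... Remaining: %lld.%09lld",
           (long long) remain.tv_sec, (long long) remain.tv_nsec);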
So, although I'm mildly inclined to think it's OK to change tv_nsec to be wider than long, I suspect this change will break a few lower-quality applications in minor ways, and this is not a cost that we can entirely sweep under the rug.
On Wed, 8 Mar 2017, Albert ARIBAUD wrote:
IIUC, on the kernel side there is a will to move away from POSIX (1) here and just make tv_nsec 64 bits with no padding when time_t is 64-bit.
In that case, that simply means it's glibc's responsibility to avoid the x32 mistake, and copy timespec values passed to the kernel to clear padding while staying compatible with POSIX. (Which would probably also provide the infrastructure required to fix the x32 bug in glibc along the lines used in musl.)
On 03/08/2017 09:10 PM, Joseph Myers wrote:
On Wed, 8 Mar 2017, Albert ARIBAUD wrote:
IIUC, on the kernel side there is a will to move away from POSIX (1) here and just make tv_nsec 64 bits with no padding when time_t is 64-bit.
In that case, that simply means it's glibc's responsibility to avoid the x32 mistake, and copy timespec values passed to the kernel to clear padding while staying compatible with POSIX. (Which would probably also provide the infrastructure required to fix the x32 bug in glibc along the lines used in musl.)
That's not going to work because struct timespec can be hidden in ioctl arguments, ancillary data (cmsg), and Netlink messages. It's an uphill battle to lie to userspace about the kernel ABI, so we really shouldn't do that.
Thanks, Florian
On Thu, 9 Mar 2017, Florian Weimer wrote:
On 03/08/2017 09:10 PM, Joseph Myers wrote:
On Wed, 8 Mar 2017, Albert ARIBAUD wrote:
IIUC, on the kernel side there is a will to move away from POSIX (1) here and just make tv_nsec 64 bits with no padding when time_t is 64-bit.
In that case, that simply means it's glibc's responsibility to avoid the x32 mistake, and copy timespec values passed to the kernel to clear padding while staying compatible with POSIX. (Which would probably also provide the infrastructure required to fix the x32 bug in glibc along the lines used in musl.)
That's not going to work because struct timespec can be hidden in ioctl arguments, ancillary data (cmsg), and Netlink messages. It's an uphill battle to lie to userspace about the kernel ABI, so we really shouldn't do that.
Then the kernel should follow POSIX in the processing of timespec structures from userspace if they don't want their 64-bit timespec interfaces ignored.
On Mon, Feb 27, 2017 at 2:46 PM, Arnd Bergmann arnd@arndb.de wrote:
What I meant above is that the kernel would either always treat new 32-bit tasks the same way as 64-bit tasks, both under native 32-bit kernels and emulated on 64-bit kernels, and leave the zeroing to user space, or we do the extra work I described in the kernel, to make it easier for a libc implementation to use 'long tv_nsec' without having to zero-pad or copy each timespec, again for both native and compat 32-bit tasks.
I don't think user-space zeroing is practical, especially if we don't want to deviate from POSIX (which I still think is the best solution overall).
Since the kernel has to have the code to check for out-of-range values anyway, though, it could postpone looking at the task type until it notices that the value is out of range. That is, instead of doing
    if (32-bit task) {
            if (low 32 bits out of range) { error; }
    } else {
            if (64-bit quantity out of range) { error; }
    }
it should do
    if (64-bit quantity out of range) {
            if (64-bit task)
                    error;
            else if (low 32 bits out of range)
                    error;
    }
That way only code that doesn't zero the entire structure is penalized.
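Folded into the get_timespec64() helper quoted earlier in the thread, this check order could look something like the following sketch (names taken from Arnd's snippet above; not actual kernel code):

    ts->tv_sec = kts.tv_sec;
    if (kts.tv_nsec < 0 || kts.tv_nsec >= NSEC_PER_SEC) {
            /* Only out-of-range values pay for the task-type check. */
            if (IS_ENABLED(CONFIG_64BIT) && !in_compat_syscall())
                    return -EINVAL;
            kts.tv_nsec &= 0xfffffffful;    /* keep the low 32 bits */
            if (kts.tv_nsec >= NSEC_PER_SEC)
                    return -EINVAL;
    }
    ts->tv_nsec = kts.tv_nsec;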
zw
On Mon, Feb 27, 2017 at 9:59 PM, Zack Weinberg zackw@panix.com wrote:
On Mon, Feb 27, 2017 at 2:46 PM, Arnd Bergmann arnd@arndb.de wrote:
What I meant above is that the kernel would either always treat new 32-bit tasks the same way as 64-bit tasks, both under native 32-bit kernels and emulated on 64-bit kernels, and leave the zeroing to user space, or we do the extra work I described in the kernel, to make it easier for a libc implementation to use 'long tv_nsec' without having to zero-pad or copy each timespec, again for both native and compat 32-bit tasks.
I don't think user-space zeroing is practical, especially if we don't want to deviate from POSIX (which I still think is the best solution overall).
Since the kernel has to have the code to check for out-of-range values anyway, though, it could postpone looking at the task type until it notices that the value is out of range. That is, instead of doing
    if (32-bit task) {
            if (low 32 bits out of range) { error; }
    } else {
            if (64-bit quantity out of range) { error; }
    }
it should do
    if (64-bit quantity out of range) {
            if (64-bit task)
                    error;
            else if (low 32 bits out of range)
                    error;
    }
That way only code that doesn't zero the entire structure is penalized.
Good idea, thanks! We currently do the checking separately from the copying, but doing the two together is a good cleanup and also helps the places that currently lack a range check (typically in drivers). We have three different checks on a timespec at the moment (no check on tv_sec range, tv_sec must be positive, or tv_sec must be within the 400 year range of 64-bit nanoseconds), but the check for the nanoseconds is always the same.
Arnd
On Mon, 27 Feb 2017, Arnd Bergmann wrote:
Good idea, thanks! We currently do the checking separately from the copying, but doing the two together is a good cleanup and also helps the places that currently lack a range check (typically in drivers). We have three different checks on a timespec at the moment (no check on tv_sec range, tv_sec must be positive, or tv_sec must be within the 400 year range of 64-bit nanoseconds), but the check for the nanoseconds is always the same.
I'd expect two different versions of the nanoseconds check (one for utimes functions that allows UTIME_OMIT and UTIME_NOW, one for other uses that doesn't). (utimes functions are also a case where a negative tv_sec is perfectly reasonable, but EINVAL is needed if the value is outside a filesystem-specific range - "A new file timestamp would be a value whose tv_sec component is not a value supported by the file system.".)
On Mon, Feb 27, 2017 at 10:19 PM, Joseph Myers joseph@codesourcery.com wrote:
On Mon, 27 Feb 2017, Arnd Bergmann wrote:
Good idea, thanks! We currently do the checking separately from the copying, but doing the two together is a good cleanup and also helps the places that currently lack a range check (typically in drivers). We have three different checks on a timespec at the moment (no check on tv_sec range, tv_sec must be positive, or tv_sec must be within the 400 year range of 64-bit nanoseconds), but the check for the nanoseconds is always the same.
I'd expect two different versions of the nanoseconds check (one for utimes functions that allows UTIME_OMIT and UTIME_NOW, one for other uses that doesn't).
Ok, good to know, I had forgotten about those.
(utimes functions are also a case where a negative tv_sec is perfectly reasonable, but EINVAL is needed if the value is outside a filesystem-specific range - "A new file timestamp would be a value whose tv_sec component is not a value supported by the file system.".)
Right, this is being worked on already: we are actually adding filesystem-specific range checking here. Based on the capabilities of the on-disk format, we might have different minimum and maximum values, e.g. 1902..2038 (ext3), 1970..2106 (afs), 1980..2099 (fat), 1902..2446 (ext4) or TIME_T_MIN..TIME_T_MAX (btrfs).
This was never handled correctly in Linux, and we usually just overflow the times today, resulting in more or less random times when reading the data back.
Arnd
On Wed, Feb 22, 2017 at 4:39 PM, Arnd Bergmann arnd@arndb.de wrote:
I'll comment on the kernel/glibc incompatibilities section tomorrow, need to collect my thoughts there some more.
For build-time dependencies, any changes we make to the kernel headers should only add things for new glibc and leave the existing definitions unchanged as long as _TIME_BITS is not set to 64 on a 32-bit architecture.
In the few cases that can't be handled by simply adding new definitions, we need to replace e.g.
#define SIOCGSTAMPNS 0x8907
with something like
    #define SIOCGSTAMPNS_OLD    0x8907
    #define SIOCGSTAMPNS_TIME64 _IOR(0x89, 0x07, struct timespec)
    #define SIOCGSTAMPNS (sizeof(time_t) == sizeof(__kernel_long_t) ? \
                          SIOCGSTAMPNS_OLD : SIOCGSTAMPNS_TIME64)
This will define the constant to the traditional value for all existing libc implementations as well as new libc built without __USE_TIME_BITS64, but will use a newly defined value on new glibc with __USE_TIME_BITS64. If anyone tries to use the constant without first having included <time.h>, it will cause a compile-time error, which is probably the best we can do here (better than using an incorrect value).
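Hypothetical application-side use of the conditional macro (sockfd is assumed to be a socket with a received packet); the point is that the sizeof(time_t) test in the macro only compiles if <time.h> has been included first:

    #include <time.h>        /* required: the macro tests sizeof(time_t) */
    #include <sys/ioctl.h>
    #include <stdio.h>

    struct timespec ts;
    if (ioctl(sockfd, SIOCGSTAMPNS, &ts) < 0)
            perror("SIOCGSTAMPNS");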
There are other cases where we actually need to check _TIME_BITS, e.g.
    #if defined(__KERNEL__) || \
        (defined(_TIME_BITS) && (_TIME_BITS > __BITS_PER_LONG))
    struct input_timeval {
            __u32 tv_sec;   /* __u32 overflows in y2106 */
            __s32 tv_usec;
    };
    #else
    #define input_timeval timeval
    #endif

    struct input_event {
            struct input_timeval time;
            __u16 type;
            __u16 code;
            __s32 value;
    };
    #undef input_timeval
Here, we cannot change the binary layout of input_event, as the kernel has no idea what kind of user space is running, but we have to prevent new user space from using the definition based on an incompatible 'struct timeval'. In this case, it is safe to assume that either _TIME_BITS has been defined (on a new libc, after including time.h to see 'struct timeval'), or that we are on an old libc with the traditional definition of timeval. Again, this breaks a rare corner case: user space that assumes input_event->time is a struct timeval will now get a compile failure for incompatible types, which I see no way around.
Both of the examples above will break silently when using an old version of the kernel header with _TIME_BITS=64. I would suggest to not allow building support for 64-bit time_t on a glibc with old kernel headers because of this, but I don't know how hard that is to do in glibc. It is probably safe to assume that kernel headers that define __NR_CLOCK_GETTIME64 in asm/unistd.h have a reasonable set of definitions in their other headers as well.
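A compile-time guard along these lines could catch the mismatch early (assuming, as above, that the 64-bit syscall number is exported as __NR_CLOCK_GETTIME64):

#include <asm/unistd.h>

/* Hypothetical guard against old kernel headers; the macro name is
 * the one assumed in the paragraph above. */
#if defined __USE_TIME_BITS64 && !defined __NR_CLOCK_GETTIME64
# error "_TIME_BITS=64 requires kernel headers with 64-bit time support"
#endif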
For run-time dependencies, glibc can try to support old kernels with new builds, but this support will not cover any interfaces that are not wrapped by glibc. With the above example of SIOCGSTAMPNS, an application calling this ioctl on an old kernel will run into an error with errno=EINVAL. This can be mitigated by emulating a reasonable subset of the affected ioctl/fcntl/sockopt/etc. calls in glibc, but it would be hard to guarantee that this works for every kernel subsystem and device driver.
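A glibc-side mitigation could look roughly like this sketch, reusing the SIOCGSTAMPNS_* definitions from the earlier example; the wrapper name, the old timespec layout and the fallback policy are assumptions for illustration, not existing glibc code:

#include <errno.h>
#include <time.h>
#include <sys/ioctl.h>

/* 32-bit kernel timespec layout of the old ABI (assumption) */
struct __old_kernel_timespec {
        long tv_sec;
        long tv_nsec;
};

/* Sketch: try the 64-bit command first; on EINVAL from an old
 * kernel, retry with the 32-bit command and widen the result. */
int __gstampns_fallback(int fd, struct timespec *ts)
{
        struct __old_kernel_timespec old;
        int ret = ioctl(fd, SIOCGSTAMPNS_TIME64, ts);

        if (ret == 0 || errno != EINVAL)
                return ret;

        ret = ioctl(fd, SIOCGSTAMPNS_OLD, &old);
        if (ret == 0) {
                ts->tv_sec = old.tv_sec;
                ts->tv_nsec = old.tv_nsec;
        }
        return ret;
}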
The other interesting runtime case is a kernel that intentionally drops 32-bit time_t support in combination with an application that uses the _TIME_BITS=32 compatibility interfaces. In this case, we want any syscalls to fail rather than be emulated through the 64-bit calls. This should be the normal behavior with the implementation you describe, but I think it should be documented that this is intentional.
Arnd
On Thu, 23 Feb 2017, Arnd Bergmann wrote:
I would suggest to not allow building support for 64-bit time_t on a glibc with old kernel headers because of this, but I don't know how hard that is to do in glibc. It is probably safe to
Increasing the minimum kernel headers version globally in glibc is easy (just change LIBC_LINUX_VERSION in sysdeps/unix/sysv/linux/configure.ac). It's also probably less controversial than increasing the minimum runtime version (we require 3.2 or later headers unconditionally, but still allow 2.6.32 kernels at runtime on x86_64 and x86).
Increasing the minimum just for architectures that currently have 32-bit time_t, without changing the runtime minimum, would be trickier.
On Thu, Feb 23, 2017 at 6:34 PM, Joseph Myers joseph@codesourcery.com wrote:
On Thu, 23 Feb 2017, Arnd Bergmann wrote:
I would suggest to not allow building support for 64-bit time_t on a glibc with old kernel headers because of this, but I don't know how hard that is to do in glibc. It is probably safe to
Increasing the minimum kernel headers version globally in glibc is easy (just change LIBC_LINUX_VERSION in sysdeps/unix/sysv/linux/configure.ac). It's also probably less controversial than increasing the minimum runtime version (we require 3.2 or later headers unconditionally, but still allow 2.6.32 kernels at runtime on x86_64 and x86).
Increasing the minimum just for architectures that currently have 32-bit time_t, without changing the runtime minimum, would be trickier.
Ok, so I guess we have to change the minimum kernel header version anyway to add any new syscalls, right? It looks like the only syscall that glibc uses from post-3.2 headers is renameat2, and it only does this when __NR_renameat is not defined (implying that __NR_renameat2 must be present).
Arnd
On Thu, 23 Feb 2017, Arnd Bergmann wrote:
Ok, so I guess we have to change the minimum kernel header version anyway to add any new syscalls, right? It looks like the only syscall that glibc uses from post-3.2 headers is renameat2, and it only does this when __NR_renameat is not defined (implying that __NR_renameat2 must be present).
Normally we handle the case of older headers conditionally (with calls to new interfaces falling back to old syscalls if possible, which is the case for the 64-bit time_t work, or returning ENOSYS errors otherwise) - just as the case of new headers with glibc supporting an older kernel at runtime needs runtime conditionals (64-bit time_t interfaces trying the new syscalls, and falling back to the old syscalls whenever possible if those give ENOSYS errors). Although the headers version can be increased without increasing the runtime minimum, most commonly it's only increased when the runtime minimum is increased as well.
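The runtime conditional might look like the following sketch for clock_gettime; the __NR_clock_gettime64 name, the internal __timespec64 type and the INLINE_SYSCALL usage are assumptions for illustration, not current glibc code:

#include <errno.h>
#include <time.h>
#include <sysdep.h>     /* glibc-internal INLINE_SYSCALL */

/* Sketch of the "try new syscall, fall back on ENOSYS" pattern;
   __timespec64 stands for glibc's internal 64-bit timespec.  */
int
__clock_gettime64 (clockid_t clk, struct __timespec64 *ts)
{
#ifdef __NR_clock_gettime64
  int ret = INLINE_SYSCALL (clock_gettime64, 2, clk, ts);
  if (ret == 0 || errno != ENOSYS)
    return ret;
#endif
  /* Old kernel (or old headers): use the 32-bit syscall and widen;
     this path cannot represent post-2038 times.  */
  struct timespec ts32;
  int ret32 = INLINE_SYSCALL (clock_gettime, 2, clk, &ts32);
  if (ret32 == 0)
    {
      ts->tv_sec = ts32.tv_sec;
      ts->tv_nsec = ts32.tv_nsec;
    }
  return ret32;
}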
I've discussed the kernel side for "Y2038-incompatible socket timestamping" with Deepa a while ago, and I think we came to new conclusions for what would be the best approach. I'll let her comment here.
There is a small error in the way SO_TIMESTAMPING is described below:
SO_TIMESTAMPING uses an integer bitfield which is not sensitive to time bit-size.
SO_TIMESTAMPING uses the below struct as its cmsg data:

struct scm_timestamping {
        struct timespec ts[3];
};
Timestamp options can also be differentiated based on the length argument in the cmsg. But, like the timeout options, since we do not want to rely on the lengths of the data, we agreed that all the timestamping options will have 64-bit time_t versions, as you mention.
The kernel will use a macro that Arnd suggested:

#define SO_TIMESTAMPNS ((sizeof(time_t) > sizeof(__kernel_long_t)) ? \
                        SO_TIMESTAMPNS64 : SO_TIMESTAMPNS32)
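For illustration, an application enabling nanosecond timestamps is then source-compatible in both modes (the function name is made up):

#include <time.h>
#include <sys/socket.h>

/* Sketch: with the macro above, SO_TIMESTAMPNS resolves to the
 * 32-bit or 64-bit option value at compile time. */
int enable_ns_timestamps(int sockfd)
{
        int on = 1;

        return setsockopt(sockfd, SOL_SOCKET, SO_TIMESTAMPNS,
                          &on, sizeof(on));
}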
Your comments on SO_RCVTIMEO and SO_SNDTIMEO are consistent with what we had agreed upon before.
-Deepa
Hi Deepa,
On Tue, 28 Feb 2017 11:07:02 -0800, Deepa Dinamani deepa.kernel@gmail.com wrote :
I've discussed the kernel side for "Y2038-incompatible socket timestamping" with Deepa a while ago, and I think we came to new conclusions for what would be the best approach. I'll let her comment here.
There is a small error in the way SO_TIMESTAMPING is described below:
SO_TIMESTAMPING uses an integer bitfield which is not sensitive to time bit-size.
SO_TIMESTAMPING uses the below struct as its cmsg data:

struct scm_timestamping {
        struct timespec ts[3];
};
I think we're talking about two different things: the integer bitfield I am talking about is the argument to setsockopt, described as such in Documentation/networking/timestamping.txt; the struct scm_timestamping you're talking about is the ancillary data read through recvmsg() when some SO_TIMESTAMPING options were set.
As far as recvmsg / sendmsg / cmsg are concerned, I /think/ there are no dependencies per se on time_t size; it only comes into play when/if CMSG_DATA() is type-cast to (a pointer to) a type which involves time_t, and such casts are done at application, not GLIBC, level.
This is AFAICT the case here: struct scm_timestamping from Linux is not (re)defined by GLIBC; application code which uses this struct will have taken it from Linux directly (and will have to be consistent about time_t between its Linux and GLIBC usage).
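To illustrate where that cast happens (use_timestamp is a hypothetical consumer):

#include <time.h>
#include <sys/socket.h>
#include <linux/errqueue.h>     /* struct scm_timestamping, from Linux */

extern void use_timestamp(const struct timespec *ts);   /* hypothetical */

/* Sketch: the time_t dependency enters only at this application-level
 * cast; the struct layout comes from the kernel headers, not GLIBC. */
void read_timestamps(struct msghdr *msg)
{
        struct cmsghdr *cmsg;

        for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
             cmsg = CMSG_NXTHDR(msg, cmsg)) {
                if (cmsg->cmsg_level == SOL_SOCKET &&
                    cmsg->cmsg_type == SCM_TIMESTAMPING) {
                        struct scm_timestamping *tss =
                                (struct scm_timestamping *)CMSG_DATA(cmsg);
                        use_timestamp(&tss->ts[0]);
                }
        }
}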
Timestamp options can also be differentiated based on the length argument in the cmsg. But, like the timeout options, since we do not want to rely on the lengths of the data, we agreed that all the timestamping options will have 64-bit time_t versions, as you mention.
The kernel will use a macro that Arnd suggested:

#define SO_TIMESTAMPNS ((sizeof(time_t) > sizeof(__kernel_long_t)) ? \
                        SO_TIMESTAMPNS64 : SO_TIMESTAMPNS32)
Your comments on SO_RCVTIMEO and SO_SNDTIMEO are consistent with what we had agreed upon before.
-Deepa
Thanks for your comments!
Cordialement, Albert ARIBAUD 3ADEV
There is a small error in the way SO_TIMESTAMPING is described below:
SO_TIMESTAMPING uses an integer bitfield which is not sensitive to time bit-size.
SO_TIMESTAMPING uses the below struct as its cmsg data:

struct scm_timestamping {
        struct timespec ts[3];
};
I think we're talking about two different things: the integer bitfield I am talking about is the argument to setsockopt, described as such in Documentation/networking/timestamping.txt; the struct scm_timestamping you're talking about is the ancillary data read through recvmsg() when some SO_TIMESTAMPING options were set.
According to your draft:
SO_TIMESTAMP uses a struct timeval as its cmsg data;
SO_TIMESTAMPNS uses a struct timespec as its cmsg data;
SO_TIMESTAMPING uses an integer bitfield which is not sensitive to time bit-size.
Here you describe the OOB data format for both SO_TIMESTAMP and SO_TIMESTAMPNS, but for SO_TIMESTAMPING you are describing the setsockopt options instead. I was just pointing out this discrepancy.
As far as recvmsg / sendmsg / cmsg are concerned, I /think/ there are no dependencies per se on time_t size; it only comes into play when/if CMSG_DATA() is type-cast to (a pointer to) a type which involves time_t, and such casts are done at application, not GLIBC, level.
I was referring to the earlier discussion here: we considered using a flag in recvmsg/sendmsg to indicate the time_t size, and we decided not to rely on this.
-Deepa