On Wed, 11 Mar 2020 13:52:00 +0100, Arnd Bergmann wrote:
As discussed before, I tried using the rebootstrap tool [1] to see what problems come up once the entire distro gets rebuilt. Based on Lukasz' recommendation, I tried the 'y2038_edge' branch with his experimental glibc patches [2], using commit c2de7ee9461 dated 2020-02-17.
Here is a rough summary of what I tried, what worked, and what problems I ran into:
[...]
Actually building a time64 version of glibc turned out to be harder, including some issues discussed on the libc mailing list [5]:
- Always setting -D_TIME_BITS=64 in the global compiler flags for the distro breaks both the native 64-bit (x86_64) build and the 32-bit build, as glibc itself expects to be built without this.
This seems like a small issue, but glibc should probably either strip it from CFLAGS in its build system or at least detect it at configure time and error out, so that the resulting breakage is not confusing.
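A minimal sketch of the kind of configure-time check suggested here, assuming the build system compiles a small probe file with the distro's CFLAGS before building glibc itself. The function name `time_bits_probe_ok` is hypothetical, and this is not glibc's actual configure logic.

```c
/* Hypothetical configure-time probe: compile this file with the same
   CFLAGS that will be used to build glibc. If the distro injected
   -D_TIME_BITS=64 globally, compilation stops here with a clear message
   instead of failing later in some unrelated part of the build. */
#ifdef _TIME_BITS
# error "_TIME_BITS must not be set when building glibc itself; remove -D_TIME_BITS=64 from CFLAGS"
#endif

/* Present only so the probe has something to link and call;
   reaching this function means the flags passed the check. */
int time_bits_probe_ok(void)
{
    return 1;
}
```

Compiled with `-D_TIME_BITS=64`, the file fails with the `#error` message; compiled with clean flags, it builds and the probe returns 1.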
- Removing the time32 symbols from the glibc shared object did not work, because they are still used heavily internally and by the testsuite.
That they're used internally sounds like a major problem; anywhere they're being used internally potentially has hidden Y2038 bugs. This is also why I'm concerned about glibc's approach of not building itself with _TIME_BITS=64, and just undefining it or doing something else in the wrapper files for the legacy time32 symbols.
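To illustrate why internal use of the time32 paths is a concern, here is a sketch of a legacy wrapper that does its work in 64-bit time and narrows only at the ABI boundary, reporting `EOVERFLOW` instead of silently truncating. The names `demo_time32_t` and `demo_narrow_time32` are hypothetical, not glibc internals. Any internal caller that still goes through a narrowing path like this stops working once the clock passes `INT32_MAX` seconds after the epoch (2038-01-19 03:14:07 UTC).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical 32-bit time type seen by legacy time32 callers. */
typedef int32_t demo_time32_t;

/* Returns nonzero if t is representable as a signed 32-bit time value. */
static int demo_fits_time32(int64_t t)
{
    return t >= INT32_MIN && t <= INT32_MAX;
}

/* Narrow a 64-bit time value for a time32 caller.
   On overflow, set errno to EOVERFLOW and return -1 rather than
   storing a wrapped (and therefore wrong) value; *out is untouched. */
int demo_narrow_time32(int64_t t64, demo_time32_t *out)
{
    if (!demo_fits_time32(t64)) {
        errno = EOVERFLOW;
        return -1;
    }
    *out = (demo_time32_t)t64;
    return 0;
}
```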
- I tried converting all the internal symbols to use the time64 variants with the correct types (e.g. __clock_gettime64() instead of __clock_gettime()), but then ran into a lot of APIs that take timespec/timeval/... arguments and pass them down into internal functions. These all seem to be bugs that require adding a time64 version of the external ABI.
This also sounds bad. The set of functions that need time64 versions has little to do with the syscalls that needed changing; it is rather a matter of which functions have time_t-derived types in their public interfaces. I think it would be useful to compare current glibc patches against musl's "nm -D libc.so | grep time64", which has 63 lines. There may be more functions glibc needs time64 versions of because of additional functionality it supports, but if it's lacking any of the ones musl has, that's probably indicative of a bug. I'm attaching the list.
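The reason every function with a time_t-derived type in its public interface needs its own time64 entry point is that the argument's layout changes. The sketch below uses hypothetical stand-in structs for `struct timespec` on a 32-bit target before and after the switch (real glibc definitions differ in detail, e.g. in where the padding goes). In this sketch, a caller built with 64-bit time passes a 16-byte object to a symbol compiled to read 8 bytes. Widening inside the library is easy; it is the exported symbol that has to be duplicated. All `demo_*` names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct timespec on a 32-bit target. */
struct demo_timespec32 {      /* old ABI: 32-bit seconds */
    int32_t tv_sec;
    int32_t tv_nsec;
};

struct demo_timespec64 {      /* new ABI: 64-bit seconds, explicit padding */
    int64_t tv_sec;
    int32_t tv_nsec;
    int32_t pad;
};

/* Widening for the legacy entry point: always lossless. */
struct demo_timespec64 demo_widen_timespec(const struct demo_timespec32 *ts)
{
    struct demo_timespec64 out = { ts->tv_sec, ts->tv_nsec, 0 };
    return out;
}

/* Hypothetical internal implementation that only understands time64. */
static int64_t demo_internal_to_ms(const struct demo_timespec64 *ts)
{
    return ts->tv_sec * 1000 + ts->tv_nsec / 1000000;
}

/* Two exported entry points for the same operation: the legacy symbol
   widens and forwards, the time64 symbol is called directly by code
   built with 64-bit time. Neither can stand in for the other, because
   the pointed-to structs have different sizes and field offsets. */
int64_t demo_to_ms(const struct demo_timespec32 *ts)
{
    struct demo_timespec64 wide = demo_widen_timespec(ts);
    return demo_internal_to_ms(&wide);
}

int64_t demo_to_ms_time64(const struct demo_timespec64 *ts)
{
    return demo_internal_to_ms(ts);
}
```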
- The nptl and sunrpc portions have numerous interfaces with 'timeval' or 'timespec' arguments that each cause an ABI break.
nptl is essential, but I think sunrpc is pure legacy ABI and not intended to be linkable in the future.
- stat()/fstat()/lstat(), nanosleep(), wait3()/wait4(), ppoll_chk() are some of the other interfaces that take a time_t based argument and need to grow a time64 version to avoid an ABI mismatch.
And this requires a decision on whether to keep the __xstat framework with a new _STAT_VER or to add a new symbol.
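To make the two options concrete, here is a hypothetical sketch of the versioned-entry-point style that `__xstat(_STAT_VER, ...)` follows: one exported function takes a layout version and fills whichever struct the caller was compiled against. All names, version numbers, and struct layouts here are invented for illustration and do not match glibc's. The alternative is simply a second exported symbol that takes the new struct type.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical version numbers, baked into each caller at compile time. */
#define DEMO_STAT_VER_TIME32 1
#define DEMO_STAT_VER_TIME64 2

struct demo_stat32 { int32_t st_mtime_sec; };   /* old layout */
struct demo_stat64 { int64_t st_mtime_sec; };   /* new layout */

/* Stand-in for the value the kernel reports; always 64-bit internally.
   4102444800 is 2100-01-01 00:00:00 UTC. */
static const int64_t demo_kernel_mtime = 4102444800LL;

/* One exported symbol, dispatching on the layout version. */
int demo_xstat(int ver, void *buf)
{
    switch (ver) {
    case DEMO_STAT_VER_TIME32:
        if (demo_kernel_mtime < INT32_MIN || demo_kernel_mtime > INT32_MAX) {
            errno = EOVERFLOW;   /* old layout cannot hold this time */
            return -1;
        }
        ((struct demo_stat32 *)buf)->st_mtime_sec = (int32_t)demo_kernel_mtime;
        return 0;
    case DEMO_STAT_VER_TIME64:
        ((struct demo_stat64 *)buf)->st_mtime_sec = demo_kernel_mtime;
        return 0;
    default:
        errno = EINVAL;          /* unknown layout version */
        return -1;
    }
}
```

The versioned style keeps a single symbol but hides the layout choice in a number compiled into every caller; a separate symbol makes the ABI change visible in the dynamic symbol table, which is what makes a comparison like the `nm -D` one above possible.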
I have spent more time on this now than I had planned, and don't expect to do further work on it anytime soon, but I hope my summary is useful to others who are going to need this later. I can obviously share my patches and build artifacts if anyone needs them. There are two additional approaches that would likely get a Debian bootstrap further, but that I have not tried as they were previously dismissed:
It's really amazing how much time you put into this. Thank you!!
Rich