As discussed before, I tried using the rebootstrap tool [1] to see what problems come up once the entire distro gets rebuilt. Based on Lukasz' recommendation, I tried the 'y2038_edge' branch with his experimental glibc patches [2], using commit c2de7ee9461 dated 2020-02-17.
Here is a rough summary of what I tried, what worked, and what problems I ran into:
* Building a Debian package from this was fairly straightforward, using the 2.31 branch in the package git repository[3] after replacing the debian/patches/git-updates.diff file with one generated from [2] and disabling the hurd patches because of conflicts.
* After installing the modified x86 glibc package, I ran into a runtime bug in [4], which needs to pass AT_FDCWD instead of 0 to avoid ENOTDIR errors.
* Bootstrapping a regular time32 Debian armhf with this libc took me a few days to get right, but that was mostly for getting familiar with rebootstrap and running into known issues unrelated to time64 or the glibc changes.
* Actually building a time64 version of glibc turned out to be harder, including some issues discussed on the libc mailing list[5]:
- Always setting -D_TIME_BITS=64 in the global compiler flags for the distro breaks both the native 64-bit (x86_64) build and the 32-bit build, as glibc itself expects to be built without this.
- Removing the time32 symbols from the glibc shared object did not work as they are still used (a lot) internally, and by the testsuite.
- I tried converting all the internal symbols to use the time64 variants with the correct types (e.g. __clock_gettime64() instead of __clock_gettime()), but then ran into a lot of APIs that take timespec/timeval/... arguments and pass them down into internal functions. These seem to all be bugs that require adding a time64 version of the external ABI.
- After I abandoned that approach, I continued with a simple patch to features.h that sets _TIME_BITS/_FILE_OFFSET_BITS based on '#if !defined _LIBC && __TIMESIZE == 32', which ignores the bugs I found earlier but got me a lot further.
- Building the i386 glibc with that patch, I ran into over 150 testsuite failures [6]. This looked like a fundamental mistake on my side, but after I looked into a few of the failures, most seemed to be either glibc or testsuite bugs that have to be addressed individually. I considered giving up at this point, but as Lukasz has said that he had successfully built a working system using Yocto, I kept going anyway and marked these all as expected failures in the Debian package.
* There are a couple of noteworthy issues in glibc-y2038 I'd like to point out in particular, though I'm sure these are not the only important ones:
- The clock_nanosleep() prototype needed a '__THROW' annotation to complete the build.
- The nptl and sunrpc portions have numerous interfaces with 'timeval' or 'timespec' arguments that each cause an ABI break.
- stat()/fstat()/lstat(), nanosleep(), wait3()/wait4(), ppoll_chk() are some of the other interfaces that take a time_t based argument and need to grow a time64 version to avoid an ABI mismatch.
- The timeval prototype appears to be broken, as it's missing padding on architectures without native alignment of __time64 (e.g. i386) and on all big-endian architectures.
- some testcases hang in futex_wait() or clock_nanosleep() because of incorrect timeout arguments, presumably from type mismatches.
* There is an open question regarding the name of the Debian architecture. For my experiments, I kept using the 'armhf' name unmodified, though there seems to be a general feeling that using a different name would be required to address the broad incompatibilities between time32 and time64 versions of all the libraries in the distro. Gradually changing them won't work because of the timeline and the number of affected libraries. However, the new name of the distro also implies having a distinct target triplet, which must then be known by glibc along with everything else using config.guess/config.sub. I expect this topic to require a lot more discussion.
* Continuing with the rebootstrap build despite the known glibc issues and the open question on the architecture name went surprisingly well. Only two out of the 152 source packages I built had compile-time problems:
- building the final gcc failed in libsanitizer, which has compile-time checks to ensure some libc data structures have the expected layout. It noticed that 'struct timeb' and 'struct dirent' are different based on _TIME_BITS and _FILE_OFFSET_BITS. I disabled the checks to be able to continue. To do this properly, the library has to learn about the new data structures as well. I opened a bug report against the library[7].
- libpreludecpp12 failed to build because of checks for changes in the exported functions, which are different with time64. I disabled the checks. Once we have agreed on a new Debian architecture name, the symbols can be made arch specific.
* After everything was built, I tried installing the packages into a chroot with qemu-debootstrap, which failed because I had configured the glibc to assume it's running on a new kernel while the qemu-user binary I had lacks the new syscalls. I believe this is fixed in upstream qemu, but did not try that.
* Trying to install again, I used a clean debian-arm64 installation running in qemu-system-aarch64, and attempted installing the armhf packages using a regular debootstrap, running the 32-bit binaries in compat mode of a recent arm64 kernel. This partially worked and I could chroot into the system and use a shell, but ultimately the debootstrap did not complete because of errors. I saw that 'tar' had failed because of the stat() glibc ABI mismatch breaking its private gnulib fdutimens() implementation, and this is where I gave up.
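The AT_FDCWD problem mentioned near the top of this list can be reproduced in isolation. The following is only a minimal sketch and not the code from [4]: it uses fstatat() as a representative *at() call, the helper names are made up for illustration, and a descriptor for /dev/null stands in for the stray 0 (which would normally be stdin and likewise not a directory).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Look up a relative path against dirfd.
 * Returns 0 on success, or the errno value of the failed fstatat(). */
int stat_relative(int dirfd, const char *relpath)
{
    struct stat st;

    return fstatat(dirfd, relpath, &st, 0) == 0 ? 0 : errno;
}

/* Stand-in for the buggy case: the dirfd argument is an open descriptor
 * that does not refer to a directory, so a relative lookup cannot work. */
int stat_via_non_directory(const char *relpath)
{
    int fd = open("/dev/null", O_RDONLY);
    int err;

    if (fd < 0)
        return errno;
    err = stat_relative(fd, relpath);   /* expected to yield ENOTDIR */
    close(fd);
    return err;
}
```

Calling stat_via_non_directory(".") should return ENOTDIR, while stat_relative(AT_FDCWD, ".") resolves the path against the current working directory and returns 0.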
I have spent more time on this now than I had planned, and don't expect to do further work on it anytime soon, but I hope my summary is useful to others that are going to need this later. I can obviously share my patches and build artifacts if anyone needs them. There are two additional approaches that would likely get a Debian bootstrap further, but that I have not tried as they were previously dismissed:
* Adding a time64 armhf as a separate (incompatible) target in glibc that defines __TIMESIZE==64 and a 64-bit __time_t would avoid most of the remaining ABI issues and put armhf-time64 in the same category as riscv32 and arc, but this idea was so far rejected by the glibc maintainers. Depending on how hard this turns out to be, it could be used to get to the point of self-hosting though, and help find time64 related bugs in the rest of the distro.
* Doing the bootstrap using a musleabihf target instead of gnueabihf would avoid the current issues internal to glibc-y2038, but instead lead to new problems with packages that do not currently work with musl. Adelie Linux has shown that it's already possible to build a useful distro using musl and time64[8], and this would sidestep the question of the target triplet. While it would also help find and fix additional bugs in packages, and make an interesting unofficial Debian target, I don't see it replacing the existing armhf port any time soon.
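Whichever of these targets is used, it can help to confirm which time_t ABI a given toolchain and set of flags actually produces before debugging individual packages. The probe below is a rough sketch for that purpose; the function names are invented for illustration and are not part of glibc or musl.

```c
#include <stdint.h>
#include <time.h>

/* Nonzero if time_t is at least 64 bits wide in this build. */
int time_t_is_64bit(void)
{
    return sizeof(time_t) >= 8;
}

/* Nonzero if time_t can hold 2^31, the first second after
 * 2038-01-19 03:14:07 UTC. */
int time_t_survives_2038(void)
{
    int64_t first_bad = INT64_C(2147483648);
    /* With a signed 32-bit time_t this conversion does not preserve the
     * value (the result is implementation-defined), so the round trip
     * below fails. */
    time_t t = (time_t)first_bad;

    return (int64_t)t == first_bad;
}
```

Building this once with the distro default flags and once with -D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64 shows whether the flags take effect for a particular libc. On a native 64-bit target both functions return nonzero either way.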
For additional information about the Debian plans, see the article on LWN[9] that summarizes the discussion started by Steve McIntyre [10].
Arnd
[1] https://wiki.debian.org/HelmutGrohne/rebootstrap
[2] https://github.com/lmajewski/y2038_glibc/tree/y2038_edge
[3] https://salsa.debian.org/glibc-team/glibc/-/tree/glibc-2.31
[4] https://github.com/lmajewski/y2038_glibc/commit/2f72ea2b6f6ee
[5] https://sourceware.org/pipermail/libc-alpha/2020-February/111375.html
[6] https://pastebin.com/fJYV2stF
[7] https://bugs.llvm.org/show_bug.cgi?id=45138
[8] https://wiki.adelielinux.org/wiki/Project:Time64
[9] https://lwn.net/Articles/812767/
[10] https://lwn.net/ml/debian-devel/20200204131410.GF3043@tack.einval.com/