This is a series I worked on with Baolin in 2017 and 2018, but we
never quite managed to finish up the last pieces. During the
ALSA developer meetup at ELC-E 2018 in Edinburgh, a decision was
made to go with this approach for keeping best compatibility
with existing source code, and then I failed to follow up by
resending the patches.
Now I have patches for all remaining time_t uses in the kernel,
so it's absolutely time to revisit them. I have done more
review of the patches myself and found a couple of minor issues
that I have fixed up, otherwise the series is still the same as
before.
Conceptually, the idea of these patches is:
- 64-bit applications should see no changes at all, neither
compile-time nor run-time.
- 32-bit code compiled with a 64-bit time_t currently
does not work with ALSA, and requires kernel changes and/or
sound/asound.h changes
- Most 32-bit code using these interfaces will work correctly
on a modified kernel, with or without the uapi header changes.
- 32-bit code using SNDRV_TIMER_IOCTL_TREAD requires the
updated header file for 64-bit time_t support
- 32-bit i386 user space with 64-bit time_t is broken for
SNDRV_PCM_IOCTL_STATUS, SNDRV_RAWMIDI_IOCTL_STATUS and
SNDRV_PCM_IOCTL_SYNC_PTR because of i386 alignment. This is also
addressed by the updated uapi header.
- PCM mmap is currently supported on native x86 kernels
(both 32-bit and 64-bit) but not for compat mode. This series breaks
the 32-bit native mmap support for 32-bit time_t, but instead allows
it for 64-bit time_t on both native and compat kernels. This seems to
be the best trade-off, as mmap support is optional already, and most
32-bit code runs in compat mode anyway.
- I've tried to avoid breaking compilation of 32-bit code
as much as possible. Anything that does break however is likely code
that is already broken on 64-bit time_t and needs source changes to
fix them.
I hope I addressed all review comments by now, so please pull this
for linux-5.6.
A git branch with the same contents is available for testing at [1].
Arnd
[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git y2038-alsa-v7
[2] https://lore.kernel.org/lkml/CAK8P3a2Os66+iwQYf97qh05W2JP8rmWao8zmKoHiXqVHv…
Changes since v6: (Arnd):
- Add a patch to update the API versions
- Hide a timespec reference in #ifndef __KERNEL__ to remove the
last reference to time_t
- Use a more readable way to do padding and describe it in the
changelog
- Rebase to linux-5.5-rc1, changing include/sound/soc-component.h
and sound/drivers/aloop.c as needed.
Changes since v5 (Arnd):
- Rebased to linux-5.4-rc4
- Updated to completely remove timespec and time_t references from alsa
- found and fixed a few bugs
Changes since v4 (Baolin):
- Add patch 5 to change trigger_tstamp member of struct snd_pcm_runtime.
- Add patch 8 to change internal timespec.
- Add more explanation in commit message.
- Use ktime_get_real_ts64() in patch 6.
- Split common code out into a separate function in patch 6.
- Fix tu->tread bug in patch 6 and remove #if __BITS_PER_LONG == 64 macro.
Changes since v3:
- Move struct snd_pcm_status32 to pcm.h file.
- Modify comments and commit message.
- Add new patch2 ~ patch6.
Changes since v2:
- Renamed all structures to make clear.
- Remove CONFIG_X86_X32 macro and introduced new compat_snd_pcm_status64_x86_32.
Changes since v1:
- Add one macro for struct snd_pcm_status_32 which only active in 32bits kernel.
- Convert pcm_compat.c to use struct snd_pcm_status_64.
- Convert pcm_native.c to use struct snd_pcm_status_64.
---
Arnd Bergmann (3):
ALSA: move snd_pcm_ioctl_sync_ptr_compat into pcm_native.c
ALSA: add new 32-bit layout for snd_pcm_mmap_status/control
ALSA: bump uapi version numbers
Baolin Wang (6):
ALSA: Replace timespec with timespec64
ALSA: Avoid using timespec for struct snd_timer_status
ALSA: Avoid using timespec for struct snd_ctl_elem_value
ALSA: Avoid using timespec for struct snd_pcm_status
ALSA: Avoid using timespec for struct snd_rawmidi_status
ALSA: Avoid using timespec for struct snd_timer_tread
include/sound/pcm.h | 74 ++++++--
include/sound/soc-component.h | 4 +-
include/sound/timer.h | 4 +-
include/uapi/sound/asound.h | 145 +++++++++++++--
sound/core/pcm.c | 12 +-
sound/core/pcm_compat.c | 282 ++++++++----------------------
sound/core/pcm_lib.c | 38 ++--
sound/core/pcm_native.c | 226 +++++++++++++++++++++---
sound/core/rawmidi.c | 132 +++++++++++---
sound/core/rawmidi_compat.c | 87 +++------
sound/core/timer.c | 229 ++++++++++++++++++------
sound/core/timer_compat.c | 62 +------
sound/drivers/aloop.c | 2 +-
sound/pci/hda/hda_controller.c | 10 +-
sound/soc/intel/skylake/skl-pcm.c | 4 +-
15 files changed, 817 insertions(+), 494 deletions(-)
--
2.20.0
As discussed before, I tried using the rebootstrap tool [1] to see what
problems come up once the entire distro gets rebuilt. Based on Lukasz'
recommendation, I tried the 'y2038_edge' branch with his experimental
glibc patches [2], using commit c2de7ee9461 dated 2020-02-17.
Here is a rough summary of what I tried, what worked, and what problems
I ran into:
* Building a Debian package from this was fairly straightforward, using
the 2.31 branch in the package git repository[3] after replacing the
debian/patches/git-updates.diff file with one generated from [2] and
disabling the hurd patches because of conflicts.
* After installing the modified x86 glibc package, I ran into a runtime
bug in [4], which needs to pass AT_FDCWD instead of 0 to avoid
ENOTDIR errors.
* Bootstrapping a regular time32 Debian armhf with this libc took me
a few days to get right, but that was mostly for getting familiar
with rebootstrap and running into known issues unrelated to time64
or the glibc changes.
* Actually building a time64 version of glibc turned out to be
harder, including some issues discussed on the libc mailing list[5]:
- Always setting -D_TIME_BITS=64 in the global compiler flags for
the distro breaks both the native 64-bit (x86_64) build and the
32-bit build, as glibc itself expects to be built without this.
- Removing the time32 symbols from the glibc shared object did not
work as they are still used (a lot) internally, and by the testsuite.
- I tried converting all the internal symbols to use the time64
variants with the correct types (e.g. __clock_gettime64() instead
of __clock_gettime()), but then ran into a lot of APIs that take
timespec/timeval/... arguments and pass them down into internal
functions. These seem to all be bugs that require adding a time64
version of the external ABI.
- After I abandoned that approach, I continued with a simple
patch to features.h that sets _TIME_BITS/_FILE_OFFSET_BITS based on
'#if !defined _LIBC && __TIMESIZE == 32', which ignores the bugs I
found earlier but got me a lot further.
- Building the i386 glibc with that patch, I ran into over 150
testsuite failures [6]. This looked like there was a fundamental
mistake on my side, but after I looked into a few of the failures,
most seemed to be either glibc or testsuite bugs that have to be
addressed individually. I considered giving up at this point,
but as Lukasz has said that he had successfully built a working
system using Yocto, I kept going anyway and marked these all as
expected failures in the debian package.
* There are a couple of noteworthy issues in glibc-y2038 I'd like to
point out in particular, though I'm sure these are not the only
important ones:
- The clock_nanosleep() prototype needed a '__THROW' annotation
to complete the build.
- The nptl and sunrpc portions have numerous interfaces with
'timeval' or 'timespec' arguments that each cause an ABI break.
- stat()/fstat()/lstat(), nanosleep(), wait3()/wait4(), ppoll_chk()
are some of the other interfaces that take a time_t based
argument and need to grow a time64 version to avoid an ABI mismatch.
- The timeval prototype appears to be broken, as it's missing
padding on architectures without native alignment of __time64
(e.g. i386) and on all big-endian architectures.
- some testcases hang in futex_wait() or clock_nanosleep()
because of incorrect timeout arguments, presumably from type
mismatches.
* There is an open question regarding the name of the Debian
architecture. For my experiments, I kept using the 'armhf' name
unmodified, though there seems to be a general feeling that using a
different name would be required to address the broad incompatibilities
between time32 and time64 versions of all the libraries in the
distro. Gradually changing them won't work because of the timeline and
the number of affected libraries. However, the new name of the distro
also implies having a distinct target triplet, which must then be known
by glibc along with everything else using config.guess/config.sub. I
expect this topic to require a lot more discussion.
* Continuing with the rebootstrap build despite the known glibc issues
and the open question on the architecture name went surprisingly
well, only two out of the 152 source packages I built had
compile-time problems:
- building the final gcc failed in libsanitizer, which has
compile-time checks to ensure some libc data structures have the
expected layout. It noticed that 'struct timeb' and 'struct dirent'
are different based on _TIME_BITS and _FILE_OFFSET_BITS. I disabled
the checks to be able to continue. To this properly, the library
has to learn about the new data structures as well. I opened a
bug report against the library[7].
- libpreludecpp12 failed to build because of checks for changes
in the exported functions, which are different with time64.
I disabled the checks. Once we have agreed on a new debian
architecture name, the symbols can be made arch specific.
* After everything was built, I tried installing the packages into
a chroot with qemu-debootstrap, which failed because I had
configured the glibc to assume it's running on a new kernel
while the qemu-user binary I had lacks the new syscalls.
I believe this is fixed in upstream qemu, but did not try that.
* Trying to install again I used a clean debian-arm64 installation
running in qemu-system-aarch64, and attempted installing the
armhf packages using a regular debootstrap, running the 32-bit
binaries in compat mode of a recent arm64 kernel. This partially
worked and I could chroot into the system and use a shell, but
ultimately the debootstrap did not complete because of errors.
I saw that 'tar' had failed because of the stat() glibc ABI mismatch
breaking its private gnulib fdutimens() implementation, and this is
where I gave up.
I have spent more time on this now than I had planned, and don't expect
to do further work on it anytime soon, but I hope my summary is useful
to others that are going to need this later. I can obviously share
my patches and build artifacts if anyone needs them. There are two
additional approaches that would likely get a Debian bootstrap further,
but that I have not tried as they were previously dismissed:
* Adding a time64 armhf as a separate (incompatible) target in glibc
that defines __TIMESIZE==64 and a 64-bit __time_t would avoid
most of the remaining ABI issues and put armhf-time64 in the same
category as riscv32 and arc, but this idea was so far rejected by the
glibc maintainers. Depending on how hard this turns out to be,
it could be used to get to the point of self-hosting though, and
help find time64 related bugs in the rest of the distro.
* Doing the bootstrap using a musleabihf target instead of gnueabihf
would avoid the current issues internal to glibc-y2038, but instead
lead to new problems with packages that do not currently work with
musl. Adelie Linux has shown that it's already possible to build
a useful distro using musl and time64[8], and this would
sidestep the question of the target triplet. While it would also
help find and fix additional bugs in packages, and make an
interesting unoffical Debian target, I don't see it replacing
the existing armhf port any time soon.
For additional information about the Debian plans, see the
article on LWN[9] that summarizes the discussion started by
Steve McIntyre [10].
Arnd
[1] https://wiki.debian.org/HelmutGrohne/rebootstrap
[2] https://github.com/lmajewski/y2038_glibc/tree/y2038_edge
[3] https://salsa.debian.org/glibc-team/glibc/-/tree/glibc-2.31
[4] https://github.com/lmajewski/y2038_glibc/commit/2f72ea2b6f6ee
[5] https://sourceware.org/pipermail/libc-alpha/2020-February/111375.html
[6] https://pastebin.com/fJYV2stF
[7] https://bugs.llvm.org/show_bug.cgi?id=45138
[8] https://wiki.adelielinux.org/wiki/Project:Time64
[9] https://lwn.net/Articles/812767/
[10] https://lwn.net/ml/debian-devel/20200204131410.GF3043@tack.einval.com/
On Mon, Mar 16, 2020 at 11:12 PM Lukasz Majewski <lukma(a)denx.de> wrote:
> > On Fri, Mar 13, 2020 at 9:22 PM Rich Felker <dalias(a)libc.org> wrote:
> > > > - Removing the time32 symbols from the glibc shared object did
> > > > not work as they are still used (a lot) internally, and by the
> > > > testsuite.
> > >
> > > That they're used internally sounds like a major problem; anywhere
> > > they're being used internally potentially has hidden Y2038 bugs.
> > > This is also why I'm concerned about glibc's approach of not
> > > building itself with _TIME_BITS=64, and just undefining it or doing
> > > something else in the wrapper files for the legacy time32 symbols.
> >
> > I thought this was the long-term plan. Working on the ABI first and
> > then changing the implementation may help speed up the timeline
> > before distro-level work can start, but OTOH removing all the 32-bit
> > codepaths from the implementation first makes it more likely to find
> > all relevant bits.
>
> If I understood the question correctly - the problem is with having
> glibc ABI consistent. This requires having 64 bit types for relevant
> functions. For example the __clock_settime64 accepts struct
> __timespec64 parameter which:
>
> - Is aliased to "normal" struct timespec on machines with
> __WORDSIZE==64 (x32 is a special case)
>
> - The struct __timespec64 is used on 32 bit machines
>
> As a result the glibc is ready to handle 64 bit time always (with
> clock_settime on __WORDSIZE==64 or clock_settime64 otherwise), as
> exported struct timespec fields size vary depending on the machine for
> which glibc is built.
I think we all understand the need to duplicate each interface that
passes a data type derived from time_t, and how the aliasing works,
The point above is purely for the internal implementation. The approach
that I have picked for the kernel and Rich did for musl was that
internal code never sees the old __time_t definition for any data
structure or function call, those are only used to define the wrappers
for 32-bit architectures that provide the legacy interfaces.
Arnd
[some mailing lists appear to have classified the earlier mail as spam,
it was quite long and contained a lot of links. See
https://lists.debian.org/debian-arm/2020/03/msg00032.html for the
start of the thread if you did not get that]
On Wed, Mar 11, 2020 at 3:37 PM Lukasz Majewski <lukma(a)denx.de> wrote:
> > - stat()/fstat()/lstat(), nanosleep(), wait3()/wait4(), ppoll_chk()
> > are some of the other interfaces that take a time_t based
> > argument and need to grow a time64 version to avoid an ABI
> > mismatch.
>
> The stat() and friends will use statx internally, which supports 64 bit
> time from the outset.
> Unfortunately, it hasn't been yet converted.
>
> As statx was added in 4.1 (IIRC) - after the minimal supported Linux
> kernel version is bumped to this version (from 3.2 as now) it all will
> be fixed.
The problem I had with these was not on the kernel API side (I
still have CONFIG_COMPAT_32BIT_TIME enabled for now) but
on the application side. In particular, the 'struct stat' definition
(when __USE_XOPEN2K8 is defined) contains
struct timespec st_atim;
and similar fields that are interpreted using 64-bit time_t in the
application including the header, but with 32-bit time_t inside of
the ___fxstat64() implementation in glibc. The problem is caused
by the mismatched ABI, not by the time_t overflow, in the same
way that happens in the nptl library callers and in the other interfaces
I mentioned.
This is also what seems to cause most of the testcase failures
when the tests are built with __TIME_BITS=64.
> > - The timeval prototype appears to be broken, as it's missing
> > padding on architectures without native alignment of __time64
> > (e.g. i386) and on all big-endian architectures.
> >
>
> You mean the one "exported" to the system or one, which is internal to
> glibc (from ./include/time.h)?
I mean in the installed headers, where I get (after preprocessing)
typedef long long __time64_t;
typedef long __suseconds_t;
struct timeval
{
__time64_t tv_sec;
__suseconds_t tv_usec;
};
On i386 and m68k, this leads to a 12 byte structure when the kernel
interfaces expect a 16 byte structure. All other 32-bit architectures add
four byte padding at the end, so the size is correct, but on big-endian
systems, kernel also expects padding *before* tv_usec, in the same
way as the timespec definition does. IIRC the best way to handle this
is with a 64-bit suseconds_t, e.g. by adding a __suseconds64_t
defined the same way as __time64_t.
> > I have spent more time on this now than I had planned, and don't
> > expect to do further work on it anytime soon, but I hope my summary
> > is useful to others that are going to need this later. I can
> > obviously share my patches and build artifacts if anyone needs them.
>
> Could you upload them to any server? (kernel.org or github)?
I have uploaded the modified debian-glibc and rebootstrap
sources to https://git.linaro.org/people/arnd/ now, this should
be all that's needed to recreate the build, using these steps:
- build an x86-64 debian glibc-2.31 package (binary plus source)
based on the glibc package data
- create a pbuilder instance with that available as a source to apt
- log into the pbuilder and run the modified bootstrap.sh according
to information on the pbuilder web page.
The binary packages I created are not as useful, as they would
not work with any build of glibc, neither the version I built, nor
any fixed one. I could find a way to send that to you in private,
but it's hundreds of megabytes.
Unfortunately I lost the build logs during a crash yesterday.
> > There are two additional approaches that would likely get a Debian
> > bootstrap further, but that I have not tried as they were previously
> > dismissed:
> >
> > * Adding a time64 armhf as a separate (incompatible) target in glibc
> > that defines __TIMESIZE==64 and a 64-bit __time_t would avoid
> > most of the remaining ABI issues and put armhf-time64 in the same
> > category as riscv32 and arc, but this idea was so far rejected by
> > the glibc maintainers.
>
> As fair as I know riscv32 and arc will use generic syscall interface.
> The arm32 bit doesn't support it - so the code from those two
> aforementioned ports will not be used.
The differences between the generic syscall interface and the arm
version are much smaller than ABI differences between the ABIs
for 32-bit __time_t and the __time_t == __time64_t version.
In particular, as such a new port could mandate a minimum kernel
of v5.1, it could just use all the time64 syscalls, the split sysvipc
and statx as a baseline.
Arnd