Hi Wookey,
I've finally completed a first draft of the write-up of the toolchain implications of multiarch paths that we discussed in Prague. Sorry it took a while, but it got a lot longer than I expected :-/
I'd appreciate any feedback and comments!
Multiarch paths and toolchain implications
== Overview and goals ==
Binary files in packages are usually platform-specific, that is, they work only on the architecture they were built for. Therefore, the packaging system provides platform-specific versions of them. Currently, these versions will install platform-specific files to the same file system locations, which implies that only one of them can be installed on a system at a time.
The goal of the "multiarch" effort is to lift this limitation, and allow multiple platform-specific versions of the same package to be installed into the same file system at the same time. In addition, each package should install to the same file system locations no matter on which host architecture the installation is performed (that is, no rewriting of path names during installation).
This approach could solve a number of existing situations that are not handled well by today's packaging mechanisms:
- Systems able to run binaries of more than one ISA natively.
- Support for multiple incompatible ABI variants on the same ISA.
- Support for processor-optimized ABI-compatible library variants.
- NFS file system images exported to hosts of different architectures.
- Target file systems for ISA emulators etc.
- Development packages for cross-compilation.
In order to support this, platform-specific versions of a multiarch package must have the property that for each file, it is either 100% identical across platforms, or else it must be installed to separate locations in the file system.
The latter is the case at least for executable files, shared libraries, static libraries and object files, and to some extent maybe header files. This means that in a multiarch world, such files must move to locations in the file system different from where they are now. This raises a variety of issues to be solved; in particular, most of the existing locations are defined by the FHS and/or are assumed to have well-known values by various system tools.
In this document, I want to focus on the impact of file system hierarchy changes on two tasks in particular:
- loading/running an executable
- building an executable from source
In the following two sections, I'll provide details on how file system paths are currently handled in these two areas. In the final section, I'll discuss suggestions on how to extend the current behavior to support multiarch paths.
== Loading/running an executable ==
Running a new executable starts with the execve() system call. The Linux kernel supports execution of a variety of executable types; the most commonly used are:
- native ELF executables
- ELF executables for a secondary native ISA (32-bit on 64-bit)
- #! scripts
- user-defined execution handlers (via binfmt_misc)
The binary itself is passed via a full (or relative) pathname to the execve call; the kernel does not make file system hierarchy assumptions. By convention, callers of execve usually search well-known path locations (via the PATH environment variable) when locating executables. How to adapt these conventions for multiarch is beyond the scope of this document.
With #! scripts and binfmt_misc handlers, the kernel will involve a user-space helper to start execution. The location of these handlers themselves and secondary files they in turn may require is provided by user space (e.g. in the #! line, or in the parameters installed into the binfmt_misc file system). Again, adapting these path names is beyond the scope of this document.
For native ELF executables, there are two additional classes of files involved in the initial load process: the ELF interpreter (dynamic loader), and shared libraries required by the executable.
The ELF interpreter name is provided in the PT_INTERP program header of the ELF executable to be loaded; the kernel makes no file name assumptions here. This program header is generated by the linker when performing the final link of a dynamically linked executable; it uses the file name passed via the -dynamic-linker argument. (Note that while the linker will fall back to some hard-coded path if that argument is missing, on many Linux platforms this default is in fact incorrect and does not correspond to an ELF interpreter actually installed in the file system in current distributions. Passing a correct -dynamic-linker argument is therefore mandatory.)
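Just to illustrate how little magic is involved, here is a quick Python sketch (not part of any proposal) that pulls the PT_INTERP string out of a 64-bit little-endian ELF file. readelf -l shows the same information; the point is simply that the interpreter location is nothing but a literal string stored inside the binary:

    import struct, sys

    PT_INTERP = 3

    def read_pt_interp(path):
        # Return the interpreter path stored in the PT_INTERP program
        # header, or None if there is none (assumes ELF64, little-endian).
        with open(path, "rb") as f:
            ident = f.read(16)
            if ident[:4] != b"\x7fELF" or ident[4] != 2:   # ELFCLASS64 only
                return None
            f.seek(32)                                     # e_phoff
            (e_phoff,) = struct.unpack("<Q", f.read(8))
            f.seek(54)                                     # e_phentsize, e_phnum
            e_phentsize, e_phnum = struct.unpack("<HH", f.read(4))
            for i in range(e_phnum):
                f.seek(e_phoff + i * e_phentsize)
                p_type, _, p_offset, _, _, p_filesz = \
                    struct.unpack("<IIQQQQ", f.read(40))
                if p_type == PT_INTERP:
                    f.seek(p_offset)
                    return f.read(p_filesz).rstrip(b"\0").decode()
        return None

    if __name__ == "__main__":
        print(read_pt_interp(sys.argv[1]))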
In normal builds, the -dynamic-linker switch is passed to the linker by the GCC compiler driver. This in turn gets the proper argument to be used on the target platform from the specs file; the (correct) default value is hard-coded into the GCC platform back-end sources. On bi-arch platforms, GCC will automatically choose the correct variant depending on compile options like -m32 or -m64. Again, the logic to do so is hard-coded into the back-end. Unfortunately, various bi-arch platforms use different schemes today:
    amd64: /lib/ld-linux.so.2   vs. /lib64/ld-linux-x86-64.so.2
    ia64:  /lib/ld-linux.so.2   vs. /lib/ld-linux-ia64.so.2
    mips:  /lib/ld.so.1         vs. /lib64/ld.so.1
    ppc:   /lib/ld.so.1         vs. /lib64/ld64.so.1
    s390:  /lib/ld.so.1         vs. /lib/ld64.so.1
    sparc: /lib/ld-linux.so.2   vs. /lib64/ld-linux.so.2
Once the dynamic interpreter is loaded, it will go on and load dynamic libraries required by the executable. For this discussion, we will consider only the case where the interpreter is ld.so as provided by glibc.
As opposed to the kernel, glibc does in fact *search* for libraries, and makes a variety of path name assumptions while doing so. It will consider paths encoded via -rpath, the LD_LIBRARY_PATH environment variable, and knows of certain hard-coded system library directories. It also provides a mechanism to automatically choose the best out of a number of libraries available on the system, depending on which capabilities the hardware / OS provides.
Specifically, glibc determines a list of search directory prefixes, and a list of capability suffixes. The directory prefixes are:
- any directory named in the (deprecated) DT_RPATH dynamic tag of the requesting object, or, recursively, any parent object (note that DT_RPATH is ignored if DT_RUNPATH is also present)
- any directory listed in the LD_LIBRARY_PATH environment variable
- any directory named in the DT_RUNPATH dynamic tag of the requesting object (only)
- the system directories, which on Linux are hard-coded to:
    * /lib$(biarch_suffix)
    * /usr/lib$(biarch_suffix)
  where $(biarch_suffix) may be "64" on 64-bit bi-arch platforms.
The capability suffixes are determined from the following list of capabilities:
- For each hardware capability that is present on the hardware, as indicated by a bit set in the AT_HWCAP auxiliary vector entry, and is considered "important" according to glibc's hard-coded list of important hwcaps (platform-dependent), a well-known string provided by glibc's platform back-end.
- For each "extra" hardware capability present on the hardware, as indicated by a GNU NOTE section in the vDSO provided by the kernel, a string provided in that same note.
- A string identifying the platform as a whole, as provided by the kernel via the AT_PLATFORM auxiliary vector entry.
- For each software capability supported by glibc and the kernel, a well-known string. The only such capability supported today is "tls", indicating support for thread-local storage.
The full list of capability suffixes is created from the list of supported capabilities by forming every sub-sequence. For example, if the platform is "i686" and it supports the important hwcap "sse2" as well as TLS, the list of suffixes is:
    sse2/i686/tls
    sse2/i686
    sse2
    i686/tls
    i686
    tls
    <empty>
The total list of directories to be searched is then formed by concatenating every directory prefix with every capability suffix. Various caching mechanisms are employed to reduce the run-time overhead of this large search space.
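To make the search list construction concrete, here is a rough Python sketch. It approximates the sub-sequence generation (glibc actually builds these from bit masks internally, so the exact ordering may differ) and then forms the prefix/suffix cross product described above:

    from itertools import combinations

    def capability_suffixes(caps):
        # Every order-preserving sub-sequence of the capability list,
        # longest first, plus the empty suffix.  (Approximation of the
        # glibc scheme, not its literal algorithm.)
        suffixes = []
        for n in range(len(caps), 0, -1):
            for combo in combinations(caps, n):
                suffixes.append("/".join(combo))
        suffixes.append("")                    # the <empty> suffix
        return suffixes

    def search_directories(prefixes, caps):
        # Total search list: every directory prefix combined with
        # every capability suffix.
        dirs = []
        for prefix in prefixes:
            for suffix in capability_suffixes(caps):
                dirs.append(prefix + "/" + suffix if suffix else prefix)
        return dirs

    for d in search_directories(["/lib", "/usr/lib"], ["sse2", "i686", "tls"]):
        print(d)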
Note: This method of searching capability suffixes is employed only by glibc at run time; it is unknown to the toolchain at compile time. This implies that an executable will have been linked against the "base" version of a library, and the "capability-specific" version of the library is only substituted at run time. Therefore, all capability-specific versions must be ABI-compatible with the base version; in particular they must provide the same soname and symbol versions, and they must use compatible function calling conventions.
== Building an executable from source ==
For this discussion, we only consider GCC and the GNU toolchain, installed into the usual locations as system toolchain, and in the absence of any special-purpose options (-B) or environment variables (GCC_EXEC_PREFIX, COMPILER_PATH, LIBRARY_PATH ...).
However, we do consider the three typical modes of operation:
- native compilation
- cross-compilation using a "traditional" toolchain install
- cross-compilation using a sysroot
During the build process, the toolchain performs a number of searches for files. In particular, it looks for (executable) components of the toolchain itself; for include files; for startup object files; and for static and dynamic libraries.
In doing so, the GNU toolchain considers locations derived from any of the following "roots":
- Private GCC installation directories
    * /usr/lib/gcc
    * /usr/libexec/gcc
  These hold both GCC components and target-specific headers and libraries provided by GCC itself.
- GNU toolchain installation directories
    * /usr/$(target)
  These directories hold files used across multiple components of the GNU toolchain, including the compiler and binutils. In addition, they may also hold target libraries and headers; in particular for libraries traditionally considered part of the toolchain, like newlib for embedded systems. In fact, in the "traditional" installation of a GNU cross-toolchain, *all* default target libraries and headers are found here. However, the toolchain directories are always consulted for native compilation as well (if present)!
- The install "prefix" This is what is specified via the --prefix configure option. The toolchain will look for headers and libraries under that root, allowing for building and installing of multiple software packages that depend on each other into a shared prefix. (This is really only relevant when the toolchain is *not* installed as system toolchain, e.g. in a setup where you provide a GNU toolchain + separately built GNU packages on a non-GNU system in some distinct non-system directory.)
- System directories
    * /lib$(biarch_suffix)
    * /usr/lib$(biarch_suffix)
    * /usr/local/lib$(biarch_suffix)
    * /usr/include
    * /usr/local/include
  These are default locations defined by the OS and hard-coded into the toolchain sources. The examples above are those used for Linux. On bi-arch platforms, $(biarch_suffix) may be the suffix "64". These directories are used only for native compilation; however, in a "sysroot" cross-compiler, they are used for cross-compilation as well, prefixed by the sysroot directory.
In addition to the base directory paths referred to above, the GNU toolchain supports the so-called "multilib" mechanism. This is intended to provide support for multiple incompatible ABIs on a single platform. It is implemented by the GCC back-end having hard-coded information about which compiler options cause an incompatible ABI change, and a hard-coded "multilib" directory suffix name corresponding to each such option. For example, on PowerPC the -msoft-float option is associated with the multilib suffix "nof", which means libraries using the soft-float ABI (passing floating-point values in integer registers) can be provided in directories like:
    /usr/lib/gcc/powerpc-linux/4.4.4/nof
    /usr/lib/nof
The multilib suffix is appended to all directories searched for libraries by GCC and passed via -L options to the linker. The linker itself does not have any particular knowledge of multilibs, and will continue to consult its default search directories if a library is not found in the -L paths. If multiple orthogonal ABI-changing options are used in a single compilation, multiple multilib suffixes can be used in series.
As a special consideration, some compiler options may correspond to multiple incompatible ABIs that are already supported by the OS, but using directory names different from what GCC would use internally. As a typical example, on bi-arch systems the OS will normally provide the default 64-bit libraries in /usr/lib64, while also providing 32-bit libraries in /usr/lib. For GCC, on the other hand, 64-bit is the default (meaning no multilib suffix), while the -m32 option is associated with the multilib suffix "32".
To solve this problem, the GCC back-end may provide a secondary OS multilib suffix, which is used in place of the primary multilib suffix for all library directories derived from *system* paths as opposed to GCC paths. For example, in the typical bi-arch setup, the -m32 option is associated with the OS multilib suffix "../lib". Given that the primary system library directory is /usr/lib64 on such systems, this has the effect of causing the toolchain to search
    /usr/lib64/gcc/powerpc64-linux/4.4.4
    /usr/lib64
for default compilations, and
    /usr/lib64/gcc/powerpc64-linux/4.4.4/32
    /usr/lib64/../lib (i.e. /usr/lib)
for -m32 compilations.
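To spell out the path arithmetic, here is a small Python sketch (the directory names are just the examples from above, and the helper name is made up):

    import os.path

    def library_search_dirs(gcc_libdir, system_libdir, multi, multi_os):
        # GCC appends its internal multilib suffix ($(multi)) to its own
        # directories, but the OS multilib suffix ($(multi_os)) to the
        # system directories.
        return [os.path.normpath(os.path.join(gcc_libdir, multi)),
                os.path.normpath(os.path.join(system_libdir, multi_os))]

    # Default (64-bit) compilation: no suffixes at all.
    print(library_search_dirs("/usr/lib64/gcc/powerpc64-linux/4.4.4",
                              "/usr/lib64", "", ""))
    # -m32 compilation: GCC multilib suffix "32", OS multilib suffix "../lib".
    print(library_search_dirs("/usr/lib64/gcc/powerpc64-linux/4.4.4",
                              "/usr/lib64", "32", "../lib"))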
The following rules specify in detail which directories are searched at which phase of compilation. The following parameters are used:
$(target) GNU target triple (as specified at configure time)
$(version) GCC version number
$(prefix) Determined at configure time, usually /usr or /usr/local
$(libdir) Determined at configure time, usually $(prefix)/lib
$(libexecdir) Determined at configure time, usually $(prefix)/libexec
$(tooldir) GNU toolchain directory, usually $(prefix)/$(target)
$(gcc_gxx_include_dir) Location of C++ header files. Determined at configure time:
    * If --with-gxx-include-dir is given, the specified directory
    * Otherwise, if --enable-version-specific-runtime-libs is given: $(libdir)/gcc/$(target)/$(version)/include/c++
    * Otherwise, for all cross-compilers (including with sysroot!): $(tooldir)/include/c++/$(version)
    * Otherwise, for native compilers: $(prefix)/include/c++/$(version)
$(multi) Multilib suffix appended for GCC include and library directories
$(multi_os) OS multilib suffix appended for system library directories
$(sysroot) Sysroot directory (empty for native compilers)
Directories searched by the compiler driver for executables (cc1, as, ...):
1. GCC directories:
    $(libexecdir)/gcc/$(target)/$(version)
    $(libexecdir)/gcc/$(target)
    $(libdir)/gcc/$(target)/$(version)
    $(libdir)/gcc/$(target)
2. Toolchain directories:
    $(tooldir)/bin/$(target)/$(version)
    $(tooldir)/bin
Directories searched by the compiler for include files:
1. G++ directories (when compiling C++ code only):
    $(gcc_gxx_include_dir)
    $(gcc_gxx_include_dir)/$(target)[/$(multi)]
    $(gcc_gxx_include_dir)/backward
2. Prefix directories (if distinct from system directories):
    [native only] $(prefix)/include
3. GCC directories:
    $(libdir)/gcc/$(target)/$(version)/include
    $(libdir)/gcc/$(target)/$(version)/include-fixed
4. Toolchain directories: [cross only]
    $(tooldir)/sys-include
    $(tooldir)/include
5. System directories:
    [native/sysroot] $(sysroot)/usr/local/include
    [native/sysroot] $(sysroot)/usr/include
Directories searched by the compiler driver for startup files (crt*.o):
1. GCC directories:
    $(libdir)/gcc/$(target)/$(version)[/$(multi)]
2. Toolchain directories:
    $(tooldir)/lib/$(target)/$(version)[/$(multi_os)]
    $(tooldir)/lib[/$(multi_os)]
3. Prefix directories:
    [native only] $(libdir)/$(target)/$(version)[/$(multi_os)]
    [native only] $(libdir)[/$(multi_os)]
4. System directories:
    [native/sysroot] $(sysroot)/lib/$(target)/$(version)[/$(multi_os)]
    [native/sysroot] $(sysroot)/lib[/$(multi_os)]
    [native/sysroot] $(sysroot)/usr/lib/$(target)/$(version)[/$(multi_os)]
    [native/sysroot] $(sysroot)/usr/lib[/$(multi_os)]
Directories searched by the linker for libraries:
In addition to these directories built-in to the linker, if the linker is invoked via the compiler driver, it will also search the same list of directories specified above for startup files, because those are implicitly passed in via -L options by the driver.
Also, when searching for dependencies of shared libraries, the linker will attempt to mimic the search order used by the dynamic linker, including DT_RPATH/DT_RUNPATH and LD_LIBRARY_PATH lookups.
1. Prefix directories (if distinct from system directories):
    [native only] $(libdir)$(biarch_suffix)
2. Toolchain directories:
    [native/cross] $(tooldir)/lib$(biarch_suffix)
3. System directories:
    [native/sysroot] $(sysroot)/usr/local/lib$(biarch_suffix)
    [native/sysroot] $(sysroot)/usr/lib$(biarch_suffix)
    [native/sysroot] $(sysroot)/lib$(biarch_suffix)
4. Non-biarch directories (if distinct from the above):
    [native only] $(libdir)
    [native/cross] $(tooldir)/lib
    [native/sysroot] $(sysroot)/usr/local/lib
    [native/sysroot] $(sysroot)/usr/lib
    [native/sysroot] $(sysroot)/lib
== Multiarch impact on the toolchain ==
The current multiarch proposal is to move the system library directories to a new path including the GNU target triplet; that is, instead of using
    /lib
    /usr/lib
the system library directories are now called
    /lib/$(multiarch)
    /usr/lib/$(multiarch)
At this point, there is no provision for multiarch executable or header file installation.
What are the effects of this renaming on the toolchain, following the discussion above?
* ELF interpreter
The ELF interpreter would now reside in a new location, e.g.
    /lib/$(multiarch)/ld-linux.so.2
This allows interpreters for different architectures to be installed simultaneously, and removes the need for the various bi-arch hacks.
This change would imply modifying the GCC back-end, and possibly the binutils ld default as well (even though that default is currently not normally used), so that new executables are built using the new ELF interpreter install path.
Caveats:
Any executable built with the new ELF interpreter will absolutely not run on a system that does not provide the multiarch install location of the interpreter. (This is probably OK.)
Executables built with the old ELF interpreter will not run on a system that *only* provides the multiarch install location. This is clearly *not* OK. To provide backwards compatibility, even a multiarch-capable system will need to install ELF interpreters at the old locations as well, possibly via symlinks. (Note that any given system can only be compatible in this way with *one* architecture, except for lucky circumstances.)
As the multiarch string $(multiarch) is now embedded into each and every executable file, it becomes an invariant part of the platform ABI and needs to be strictly standardized. GNU target triplets as used today generally seem to provide too much flexibility and too many underspecified components to serve in such a role, at least without some additional requirements.
* Shared library search paths
According to the primary multiarch assumption, the system library search paths are modified to include the multiarch target string:
    * /lib/$(multiarch)
    * /usr/lib/$(multiarch)
This requires modifications to glibc's ld.so loader (can possibly be provided via platform back-end changes).
Backwards compatibility most likely requires that both the new multiarch location and the old location are searched.
Open questions:
+ How are -rpath encoded paths to be handled?
  Option A: They are used by ld.so as-is. This implies that in a fully multiarch system, every *user* of -rpath needs to update their paths to include the multiarch target. Also, multiarch targets are once again directly embedded into executables.
  Option B: ld.so automatically appends the multiarch target string to the path as encoded in DT_RPATH/DT_RUNPATH. This may break backwards compatibility.
  Option C: ld.so searches both the -rpath as-is, and also with the multiarch target string appended.
+ How is LD_LIBRARY_PATH to be handled? The options here are basically analogous to the -rpath case.
+ What is the interaction between the multiarch string and capability suffixes (hwcaps etc.) supported by ld.so? The most straightforward option seems to be to just leave the capability mechanism unchanged, that is, ld.so would continue to append the capability suffixes to all directory search paths (which potentially already include a multiarch suffix). This implies that different ABI-compatible but capability-optimized versions of a library share the same multiarch prefix, but use different capability suffixes.
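Purely as an illustration of that interplay, the resulting lookup order could be sketched as follows. The multiarch string, the capability list, and the fallback to the plain (non-multiarch) directory are assumptions here, corresponding to an "option C"-style backwards compatibility, not settled design:

    def expand_search_path(dirs, multiarch, cap_suffixes):
        # For each search directory, try the multiarch subdirectory first
        # and the plain directory as a fallback; within each, append the
        # capability suffixes before the bare directory itself.
        out = []
        for d in dirs:
            for base in (d + "/" + multiarch, d):
                for cap in cap_suffixes:
                    out.append(base + "/" + cap)
                out.append(base)
        return out

    for d in expand_search_path(["/lib", "/usr/lib"],
                                "i386-linux",
                                ["sse2/tls", "sse2", "tls"]):
        print(d)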
* GCC and toolchain directory paths
The core multiarch spec only refers to system directories. What about directories provided by GCC and the toolchain? Note that for a cross-development setup, we may have various combinations of host and target architectures. In this case $(host-multiarch) refers to the multiarch identifier for the host architecture, while $(target-multiarch) refers to the one for the target architecture. In addition $(target) refers to the GNU target triplet as currently used by GCC paths (which may or may not be equal to $(target-multiarch), depending on how the latter will end up being standardized).
+ GCC private directories
The GCC default installation already distinguishes files that are independent of the host architecture (in /usr/lib/gcc) from those that are dependent on the host architecture (in /usr/libexec/gcc). In both cases, the target architecture is already explicitly encoded in the path names. Thus it would appear that we'd simply have to move the libexec paths to include the multiarch string in a straightforward manner:
    /usr/lib/gcc/$(target)/$(version)/...
    /usr/libexec/$(host-multiarch)/gcc/$(target)/$(version)/...
This assumes that two cross-compilers to the same target running on different hosts can share /usr/lib/gcc, which may not be fully possible (because two cross-compilers may build slightly different versions of target libraries due to optimization differences). In this case, the whole of /usr/lib/gcc could be moved to multiarch as well:
    /usr/lib/$(host-multiarch)/gcc/$(target)/$(version)/...
    /usr/libexec/$(host-multiarch)/gcc/$(target)/$(version)/...
The alternative would be to package them separately, with the host-independent files always coming from the same package (presumably itself produced on a native system, or else one "master" host system).
Note that if -in the first stage of multiarch- we do not support parallel installation of *binaries*, we may not need to do anything for the GCC directories.
+ Toolchain directories
The /usr/$(target) directory used by various toolchain components is somewhat of a mixture of target- vs. host-dependent files:
    /usr/$(target)/bin            executable files in host architecture, targeting target architecture (e.g. cross-assembler, cross-linker binaries)
    /usr/$(target)/sys-include    target headers (only for cross-builds)
    /usr/$(target)/include        native toolchain header files for target (like bfd.h)
    /usr/$(target)/lib            target libraries + native toolchain libraries for target (like libbfd.a)
    /usr/$(target)/lib/ldscripts  linker scripts used by the cross-linker
In a full multiarch setup, the only directory that would require the multiarch suffix is probably bin:
    /usr/$(target)/bin/$(host-multiarch)
As discussed above, if we want to support different versions of target libraries for the same target, as compiled with differently hosted cross-compilers, we might also have to multiarch the lib directory:
    /usr/$(target)/lib/$(host-multiarch)
Yet another option might be to require multiarch systems to always use the sysroot cross-compile option, and not support the toolchain directory for target libraries in the first place. [Or at least, have no distribution package ever install any target library into the toolchain lib directory ...]
+ System directories
For these, the primary multiarch rules would apply. In a sysroot configuration, the sysroot prefix is applied as usual (after the multiarch paths have been determined as for a native system). Note that we need to apply the *target* multiarch string here; in case this is different from the target triplet, the toolchain would have to have explicit knowledge of those names:
    /lib/$(target-multiarch)
    /usr/lib/$(target-multiarch)
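A trivial sketch of that composition (the multiarch name and sysroot location are purely illustrative):

    def system_libdirs(sysroot, target_multiarch):
        # Multiarch paths are formed as for a native system; the sysroot
        # prefix (empty for native compilers) is then prepended.
        return [sysroot + "/lib/" + target_multiarch,
                sysroot + "/usr/lib/" + target_multiarch]

    print(system_libdirs("", "arm-linux-gnueabi"))                  # native
    print(system_libdirs("/opt/arm-sysroot", "arm-linux-gnueabi"))  # sysroot cross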
+ Prefix directories
For completeness' sake, we need to define how prefix directories are handled in a GCC build that is both multiarch-enabled *and* not installed into the system prefix. The most straightforward solution would be to apply multiarch suffixes to the prefix directories as well:
    $(prefix)/lib/$(target-multiarch)
    $(prefix)/include/$(target-multiarch)   [ if include is multiarch'ed ... ]
* Multiarch and cross-compilation
Using the paths as discussed in the previous section, we have several options for how to perform cross-compilation on a multiarch system.
The most obvious option is to build a cross-compiler with sysroot equal to "/". This means that the compiler will use target libraries and header files as installed by unmodified distribution multiarch packages for the target architecture. This should ideally be the default cross-compilation setup on a multi-arch system.
In addition, it is still possible to build cross-compilers with a different sysroot, which may be useful if you want to install target libraries you build yourself into a non-system directory (and do not want to require root access for doing so).
Questions:
+ Should we still support "traditional" cross-compilation using the toolchain directory to hold target libraries/headers?
This is probably no longer really useful. On the other hand, it probably doesn't really hurt either ...
+ What about header files in a multiarch system?
The current multiarch spec does not provide for multiple locations for header files. This works only if headers are identical across all targets. This is usually true, and where it isn't, it can be enforced by use of conditional compilation. In fact, the latter is how things currently work on traditional bi-arch distributions.
In the alternative, the toolchain could also provide for multiarch'ed header directories along the lines of
    /usr/local/include/$(target-multiarch)
    /usr/include/$(target-multiarch)
which are included in addition to (and before) the usual directories.
It will most likely not be necessary to do this for the GCC and toolchain directory include paths, as they are already target-specific.
* Multiarch and multilib
The multilib mechanism provides a way to support multiple incompatible ABI versions on the same ISA. In a multiarch world, this is supposed to be handled by different multiarch prefixes, to enable use of the package management system to handle libraries for all those variants. How can we reconcile the two systems?
It would appear that the way forward is based on the existing "OS multilib suffix" mechanism. GCC already expects to need to handle naming conventions provided by the OS for where incompatible versions are to be found.
In a multiarch system, the straightforward solution would appear to be to use the multiarch names as-is as OS multilib suffixes. In fact, this could even handle the *default* multiarch name without requiring any further changes to GCC.
For example, in a bi-arch amd64 setup, the GCC back-end might register "x86_64-linux" as default OS multilib suffix, and "i386-linux" as OS multilib suffix triggered by the (ABI changing) -m32 option. Without any further change, GCC would now search /usr/lib/x86_64-linux or /usr/lib/i386-linux as appropriate, depending on the command line options used. (This assumes that the main libdir is /usr/lib, i.e. no bi-arch configure options were used.)
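In other words, the mechanism boils down to something like the following (a hypothetical table, just to spell out the idea):

    # Hypothetical association the amd64 back-end might register:
    # compiler option -> OS multilib suffix (= multiarch name).
    OS_MULTILIB_SUFFIX = {"default": "x86_64-linux", "-m32": "i386-linux"}

    def system_libdir(option="default", libdir="/usr/lib"):
        # The suffix is appended to the system libdir exactly like any
        # other OS multilib suffix.
        return libdir + "/" + OS_MULTILIB_SUFFIX[option]

    print(system_libdir())        # /usr/lib/x86_64-linux
    print(system_libdir("-m32"))  # /usr/lib/i386-linux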
Note that GCC would still use its own internal multilib suffixes for its private libraries, but that seems to be OK.
Caveat: This would imply that those multilib names, and their association with compiler options, become hard-coded in the GCC back-end. However, this seems to have already been necessary (see above).
== Summary ==
From the preceding discussion, it would appear a multiarch setup allowing parallel installation of run-time libraries and development packages, and thus providing support for running binaries of different native ISAs and ABI variants as well as cross-compilation, might be feasible by implementing the following set of changes. Note that parallel installation of main *executable* files by multiarch packages is not (yet) supported.
* Define well-known multiarch suffix names for each supported ISA/ABI combination. Well-known in particular means it is allowed for them to be hard-coded in toolchain source files, as well as in executable files and libraries built by such a toolchain.
* Change GCC back-ends to use the well-known multiarch suffix as OS multilib suffix, depending on target and ABI-changing options. Also include multiarch suffix in ELF interpreter name passed to ld. (Probably need to change ld default directory search paths as well.)
* Change the dynamic loader ld.so to optionally append the multiarch suffix (as a constant string pre-determined at ld.so build time) after each directory search path, before further appending capability suffixes. (See above as to open questions about -rpath and LD_LIBRARY_PATH.)
* Change package build/install rules to install libraries and ld.so into multiarch library directories (not covered in this document). Change system installation to provide for backward-compatibility fallbacks (e.g. symbolic links to the ELF interpreter).
* If capability-optimized ISA/ABI-compatible library variants are desired, they can be built just as today, only under the (same) multiarch suffix. They could be packaged either within a single package, or else using multiple packages (of the same multiarch type).
* Enforce platform-independent header files across all packages. (In the alternative, provide for multiarch include paths in GCC.)
* Build cross-compiler packages with --with-sysroot=/
I'd appreciate any feedback or comments! Have I missed something?
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
Awesome analysis!
On Sat, Jul 31, 2010, Ulrich Weigand wrote:
$(version) GCC version number
So I think you analyzed the upstream toolchain behavior, and I think Debian/Ubuntu toolchains cheat in some areas; for some directories which would use $(version) we use $(major).$(minor) instead, and we have a $(version) -> $(major).$(minor) symlink. This doesn't really relate to the multiarch topic, but it reminds me that we ought to fix the distro divergences so that it's easier to swap an upstream toolchain with a Debian/Ubuntu one and vice-versa.
I don't remember where this particular setup comes from; it might be to avoid overly strict or painful dependencies.
Executables built with the old ELF interpreter will not run on a system that *only* provides the multiarch install location. This is clearly *not* OK. To provide backwards compatibility, even a multiarch-capable system will need to install ELF interpreters at the old locations as well, possibly via symlinks. (Note that any given system can only be compatible in this way with *one* architecture, except for lucky circumstances.)
I see two ways around this; we could patch the kernel to add a dynamic prefix before the runtime-linker path depending on the executable contents (typically depending on the arch), or more elegantly we could have a generic loader which checks the architecture of the target ELF file before calling the arch-specific loader. This loader would be linked to from all the old locations.
The reason I'm thinking of patching the kernel is because binfmt_misc is already out there and allows special behavior when encountering binary files from other architectures (or any binary pattern really).
Option C: ld.so searches both the -rpath as is, and also with multiarch target string appended.
This is a risk for cross-builds; the native version might be picked up. While this doesn't seem much of a risk for cross-compilation to an entirely different architecture (e.g. x86 to ARM), consider cross-builds from x86-64 to x86, or from EABI + hard-float to EABI + soft-float.
BTW, the CodeSourcery patchset contains a "directory poisoning" feature which seems quite useful to detect these cases early.
Thanks again for your writeup!
Loïc Minier loic.minier@linaro.org wrote:
Awesome analysis!
Thanks!
So I think you analyzed the upstream toolchain behavior
Yes, that's true.
and I think Debian/Ubuntu toolchains cheat in some areas; for some directories which would use $(version) we use $(major).$(minor) instead, and we have a $(version) -> $(major).$(minor) symlink. This doesn't really relate to the multiarch topic, but it reminds me that we ought to fix the distro divergences so that it's easier to swap an upstream toolchain with a Debian/Ubuntu one and vice-versa.
Agreed. Not sure what this particular divergence helps ...
Executables built with the old ELF interpreter will not run on a system that *only* provides the multiarch install location. This is clearly *not* OK. To provide backwards compatibility, even a multiarch-capable system will need to install ELF interpreters at the old locations as well, possibly via symlinks. (Note that any given system can only be compatible in this way with *one* architecture, except for lucky circumstances.)
I see two ways around this; we could patch the kernel to add a dynamic prefix before the runtime-linker path depending on the executable contents (typically depending on the arch),
This seems awkward. The ELF interpreter location is encoded as a full path, which is not interpreted in any way by the kernel. We'd either have to encode particular filesystem layout knowledge into the kernel here, or else add a prefix at the very beginning (or end?), which doesn't correspond to the scheme suggested for multiarch.
If we go down that route, it might be easier to use tricks like bind- mounting the correct ld.so for this architecture at the default location during early startup or something ...
However, I'd have thought the whole point of the multiarch scheme was to *avoid* having to play filename remapping tricks, but instead make all filenames explicit.
or more elegantly we could have a generic loader which checks the architecture of the target ELF file before calling the arch-specific loader. This loader would be linked to from all the old locations.
Well, but then what architecture would that generic loader be in? In the end, it has to be *something* the kernel understands to load natively.
The reason I'm thinking of patching the kernel is because binfmt_misc is already out there and allows special behavior when encountering binary files from other architectures (or any binary pattern really).
But binfmt_misc only works because in the end it falls back to the built- in native ELF loader. (You can install arbitrary handlers, but the handlers themselves must in the end be something the kernel already knows how to load.)
Option C: ld.so searches both the -rpath as is, and also with multiarch target string appended.
This is a risk for cross-builds; the native version might be picked up. While this doesn't seem much of a risk for cross-compilation to an entirely different architecture (e.g. x86 to ARM), consider cross-builds from x86-64 to x86, or from EABI + hard-float to EABI + soft-float.
That's one of the fundamental design questions: do we want to make sure only multiarch libraries for the correct arch can ever be found, or do we rather want to make sure on a default install, libraries that have not yet been converted to multiarch can also be found (even taking the chance that they might turn out be of the wrong architecture / variant) ...
BTW, the CodeSourcery patchset contains a "directory poisoning" feature which seems quite useful to detect these cases early.
Yes, that's during compile time. I understand the reason for this is more to catch bad include paths manually specified in packages. Not sure if during load time the same concerns apply.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On Mon, Aug 02, 2010, Ulrich Weigand wrote:
Agreed. Not sure what this particular divergence helps ...
So Matthias mentioned that gnat and gcc are not always using the same minor version, and so it helps bootstrap them to have the common bits installable without overly strict dependencies. I wonder whether we could do that properly upstream. We should chat with Matthias on the next occasion and write down a plan.
I see two ways around this; we could patch the kernel to add a dynamic prefix before the runtime-linker path depending on the executable contents (typically depending on the arch),
This seems awkward.
Agreed
or more elegantly we could have a generic loader which checks the architecture of the target ELF file before calling the arch-specific loader. This loader would be linked to from all the old locations.
Well, but then what architecture would that generic loader be in? In the end, it has to be *something* the kernel understands to load natively.
Currently with binfmt_misc when the kernel loads a binary it will check whether it's the native architecture and if it is load the ELF dynamic linker referenced in the binary; if it matches one of the regexps from binfmt_misc, such as the binary pattern for ARM ELF binaries, it will call the binfmt interpreter instead, e.g. qemu-arm, and in this case qemu-arm will load the ELF runtime linker of the target binary to run the binary inside the CPU emulation.
So I think this should just work; the kernel will call the native ELF loader of the current arch for binaries for the current arch, and will load QEMU which will load and emulate the ELF loader for the emulated arch in the other cases.
Perhaps I should work with Steve at prototyping this to make sure this works.
The reason I'm thinking of patching the kernel is because binfmt_misc is already out there and allows special behavior when encountering binary files from other architectures (or any binary pattern really).
But binfmt_misc only works because in the end it falls back to the built- in native ELF loader. (You can install arbitrary handlers, but the handlers themselves must in the end be something the kernel already knows how to load.)
Is your point that we should disable the qemu loader for the native architecture? I certainly agree we need to!
Yes, that's during compile time. I understand the reason for this is more to catch bad include paths manually specified in packages. Not sure if during load time the same concerns apply.
Ok; I kind of agree that runtime is a different story
Loïc Minier loic.minier@linaro.org wrote on 08/02/2010 05:30:05 PM:
or more elegantly we could have a generic loader which checks the architecture of the target ELF file before calling the arch-specific loader. This loader would be linked to from all the old locations.
Well, but then what architecture would that generic loader be in? In the end, it has to be *something* the kernel understands to load natively.
Currently with binfmt_misc when the kernel loads a binary it will check whether it's the native architecture and if it is load the ELF dynamic linker referenced in the binary; if it matches one of the regexps from binfmt_misc, such as the binary pattern for ARM ELF binaries, it will call the binfmt interpreter instead, e.g. qemu-arm, and in this case qemu-arm will load the ELF runtime linker of the target binary to run the binary inside the CPU emulation.
Well, my point is that *qemu-arm* is itself an ELF binary, and the kernel must already know how to handle that. We can have user-space handlers to load secondary architectures that way -- but we cannot have a user-space handler required to load the *primary* architecture; how would that handler itself get loaded?
So I think this should just work; the kernel will call the native ELF loader of the current arch for binaries for the current arch, and will load QEMU which will load and emulate the ELF loader for the emulated arch in the other cases.
Maybe I misunderstood something else about your point then, so let's try and take a step back. Today, the location of the ELF loader is embedded into the executable itself, using a full pathname like /lib/ld.so.1. In a multiarch world, this pathname would violate packaging rules, because there are multiple different per-architecture versions of this file.
Thus I assumed the straightforward multiarch solution would be to move this file to multiarch locations like /lib/$(multiarch)/ld.so.1, which would require this new location to be embedded into all binaries.
I understood you to propose an alternative solution that would keep the old ELF interpreter name (/lib/ld.so.1) embedded in executables, and keep them working by installing some "common" loader at this location.
This caused me to wonder what that "common" loader was supposed to be, given that the kernel (for *any* architecture) would be required to be able to load that loader itself natively ...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On Mon, Aug 02, 2010, Ulrich Weigand wrote:
Maybe I misunderstood something else about your point then, so let's try and take a step back. Today, the location of the ELF loader is embedded into the executable itself, using a full pathname like /lib/ld.so.1. In a multiarch world, this pathname would violate packaging rules, because there are multiple different per-architecture versions of this file.
Thus I assumed the straightforward multiarch solution would be to move this file to multiarch locations like /lib/$(multiarch)/ld.so.1, which would require this new location to be embedded into all binaries.
Yes, I agree with this plan
I understood you to propose an alternative solution that would keep the old ELF interpreter name (/lib/ld.so.1) embedded in executables, and keep them working by installing some "common" loader at this location.
Ah no, I intended us to move to /lib/$(multiarch)/ld.so.1, but for compatibility with executables from other distros and pre-multiarch world, we need to provide /lib/ld* loaders. And since the current /lib/ld* names clash across architectures, I was proposing to replace /lib/ld* with a clever wrapper that calls the proper /lib/$(multiarch)/ld.so.1 depending on the architecture of the ELF file to load.
Loïc Minier loic.minier@linaro.org wrote:
I understood you to propose an alternative solution that would keep the old ELF interpreter name (/lib/ld.so.1) embedded in executables, and keep them working by installing some "common" loader at this location.
Ah no, I intended us to move to /lib/$(multiarch)/ld.so.1, but for compatibility with executables from other distros and pre-multiarch world, we need to provide /lib/ld* loaders.
OK, I see.
And since the current /lib/ld* names clash across architectures, I was proposing to replace /lib/ld* with a clever wrapper that calls the proper /lib/$(multiarch)/ld.so.1 depending on the architecture of the ELF file to load.
So now we get back to my original question: what file type would that "clever wrapper" be? The kernel can only load an ELF interpreter that is itself an ELF file of the native architecture, so that wrapper would have to be that. However, this means that we've once again violated multiarch rules ...
If we have to install different native versions of the clever wrapper, we might just as well install the original native ELF interpreters -- that's neither better nor worse from a multiarch rules perspective.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On Mon, Aug 02, 2010, Ulrich Weigand wrote:
So now we get back to my original question: what file type would that "clever wrapper" be? The kernel can only load an ELF interpreter that is itself an ELF file of the native architecture, so that wrapper would have to be that. However, this means that we've once again violated multiarch rules ...
Oh absolutely, it would be native to the current architecture of the kernel, and would be installed in a multiarch directory too.
If we have to install different native versions of the clever wrapper, we might just as well install the original native ELF interpreters -- that's neither better nor worse from a multiarch rules perspective.
Hmm right; doesn't give us anything more
Loïc Minier loic.minier@linaro.org wrote:
If we have to install different native versions of the clever wrapper, we might just as well install the original native ELF interpreters -- that's neither better nor worse from a multiarch rules perspective.
Hmm right; doesn't give us anything more
OK, then we're all in agreement again :-)
Now this point is where the suggestion to use something like a bind mount on startup comes in. That way, there would be no violation of the multiarch rules, because /lib/ld.so.1 would not be part of any package, and in fact not even part of any file system on disk, but simply be present in the in-memory mount table.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 02/08/10 20:16, Ulrich Weigand wrote:
Now this point is where the suggestion to use something like a bind mount on startup comes in. That way, there would be no violation of the multiarch rules, because /lib/ld.so.1 would not be part of any package, and in fact not even part of any file system on disk, but simply be present in the in-memory mount table.
In many ways, this would be an elegant solution to the problem.
The problem that I foresee is that a) old programs cannot be used as "foreign" binaries on a multiarch system, and b) there's nothing to stop new programs being "accidentally" linked against the deprecated locations, and likewise not working as "foreign" binaries.
I suggest teaching the kernel to rewrite that path when it finds a non-existent interpreter. Presumably the kernel can "know" what multiarch corresponds to the traditional ABI for any given ELF flags.
Just my suggestion.
Andrew
Andrew Stubbs ams@codesourcery.com wrote:
I suggest teaching the kernel to rewrite that path when it finds a non-existent interpreter. Presumably the kernel can "know" what multiarch corresponds to the traditional ABI for any given ELF flags.
The problem with this is that it brings namespace policy into the kernel, which kernel folks have been very opposed to in the past.
And I guess that's for good reason. You probably don't want to have to update your kernel in order to support a new ABI / different dynamic linker / or just some different filesystem layout that some userspace use case comes up with for whatever reason ...
If there's kernel support needed, the only way that would be acceptable is if all naming policy involved is actually configured from user space. That's another point that would be satisfied by a bind mount solution ...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 02.08.2010 14:00, Ulrich Weigand wrote:
Loïc Minier loic.minier@linaro.org wrote:
and I think Debian/Ubuntu toolchains cheat in some areas; for some directories which would use $(version) we use $(major).$(minor) instead, and we have a $(version) -> $(major).$(minor) symlink. This doesn't really relate to the multiarch topic, but it reminds me that we ought to fix the distro divergences so that it's easier to swap an upstream toolchain with a Debian/Ubuntu one and vice-versa.
Agreed. Not sure what this particular divergence helps ...
this is no "cheating". It makes the packages robust. Remember that some frontends are built from different source packages and that a gnat-4.4 (4.4.4) still needs to be buildable with a gnat-4.4 (4.4.3) and an already updated gcc-4.4 (4.4.4). The directory cannot just be changed because the name/version is still exposed with the -V option. There was some discussion to drop this one altogether, then something like a version_alias corresponding to the target_alias could be introduced.
Of course linaro could build all frontends from one source, but then the two following issues have to be addressed:
- gcj/libjava has to be built in arm mode even if gcc defaults to thumb mode.
- build gnat from the linaro sources (this may be a problem with the bootstrap compiler, didn't investigate yet).
Matthias
Matthias Klose doko@ubuntu.com wrote on 08/02/2010 06:25:58 PM:
this is no "cheating". It makes the packages robust. Remember that some frontends are built from different source packages and that a gnat-4.4 (4.4.4) still needs to be buildable with a gnat-4.4 (4.4.3) and an already updated gcc-4.4 (4.4.4).
So the problem is that you want to support a setup where a "gcc" driver installed from a 4.4.4 build can still call and run a "gnat1" binary installed from a 4.4.3 build? That will most likely work.
But it still seems a bit fragile to me; in general, there's no guarantee that if you intermix 4.4.4 and 4.4.3 components in that way, everything actually works (that's why they use different directories in the first place).
If you want to have separate packages, a cleaner way would appear to be to make them fully self-contained, e.g. have them each provide their own driver that can be called separately.
Of course linaro could build all frontends from one source, but then the two following issues have to be addressed:
- gcj/libjava has to be built in arm mode even if gcc defaults to thumb mode.
- build gnat from the linaro sources (this may be a problem with the bootstrap compiler, didn't investigate yet).
These sound like problems that ought to be addressed in any case ...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 02.08.2010 21:12, Ulrich Weigand wrote:
Matthias Klose doko@ubuntu.com wrote on 08/02/2010 06:25:58 PM:
this is no "cheating". It makes the packages robust. Remember that some frontends are built from different source packages and that a gnat-4.4 (4.4.4) still needs to be buildable with a gnat-4.4 (4.4.3) and an already updated gcc-4.4 (4.4.4).
So the problem is that you want to support a setup where a "gcc" driver installed from a 4.4.4 build can still call and run a "gnat1" binary installed from a 4.4.3 build? That will most likely work.
No, gnat (4.4.3) has still to work, if gcc (4.4.4) is already installed.
But it still seems a bit fragile to me; in general, there's no guarantee that if you intermix 4.4.4 and 4.4.3 components in that way, everything actually works (that's why they use different directories in the first place).
Then I would need to change this internal path with every source change. I don't see this as fragile as long as it is ensured that we ship with the different frontends built from the same patchsets/sources. Note that further restrictions are made by package dependencies.
If you want to have separate packages, a cleaner way would appear to be to make them fully self-contained, e.g. have them each provide their own driver that can be called separately.
I don't understand that. I don't have a problem with the driver, but with the compiler (gnat1). Having the packages self-contained creates another problem in that you get file conflicts for files like collect2, various .o files etc.
Matthias
Matthias Klose doko@ubuntu.com wrote on 08/02/2010 09:38:49 PM:
On 02.08.2010 21:12, Ulrich Weigand wrote:
Matthias Klose doko@ubuntu.com wrote on 08/02/2010 06:25:58 PM:
So the problem is that you want to support a setup where a "gcc" driver installed from a 4.4.4 build can still call and run a "gnat1" binary installed from a 4.4.3 build? That will most likely work.
No, gnat (4.4.3) has still to work, if gcc (4.4.4) is already installed.
OK, where I said "gcc", the same applies also to "gnat", the Ada compiler driver. The reason why a 4.4.3 gnat would fail if a 4.4.4 gcc is installed is that it wouldn't find things like collect2, libgcc, crt*.o etc. Right?
But it still seems a bit fragile to me; in general, there's no guarantee that if you intermix 4.4.4 and 4.4.3 components in that way, everything actually works (that's why they use different directories in the first place).
Then I would need to change this internal path with every source change. I don't see this as fragile as long as it is ensured that we ship with the different frontends built from the same patchsets/sources. Note that further restrictions are made by package dependencies.
The issues I'm thinking of are things like: suppose the 4.4.4 middle-end adds code that generates calls to some new libgcc library function, which itself was added with the 4.4.4 libgcc. If you now mix-and-match components so that a compiler built from the 4.4.4 sources and using the new middle-end is used together with a libgcc built from the 4.4.3 sources, things will break.
It seems that this does not actually occur in the usage model you describe, since you apparently always update the core (libgcc etc.) *first*. I'm not sure if this is actually guaranteed by the package dependencies though. If it is, then I don't see any real problems with that approach ...
If you want to have separate packages, a cleaner way would appear to be to make them fully self-contained, e.g. have them each provide their own driver that can be called separately.
I don't understand that. I don't have a problem with the driver, but with the compiler (gnat1). Having the packages self-contained creates another problem in that you get file conflicts for files like collect2, various .o files etc.
What I was thinking of is along the lines of having gcc-base-4.4.3 and gcc-base-4.4.4 packages that hold the base files (crt*o, libgcc, collect2 ...), such that you can install *multiple* of the base packages at the same time. This way you could have a gcc-4.4.4 (depending on gcc-base-4.4.4) and a gnat-4.4.3 (depending on gcc-base-4.4.3) all installed at the same time.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 04.08.2010 16:55, Ulrich Weigand wrote:
Matthias Klose doko@ubuntu.com wrote on 08/02/2010 09:38:49 PM:
On 02.08.2010 21:12, Ulrich Weigand wrote:
Matthias Klose doko@ubuntu.com wrote on 08/02/2010 06:25:58 PM:
So the problem is that you want to support a setup where a "gcc" driver installed from a 4.4.4 build can still call and run a "gnat1" binary installed from a 4.4.3 build? That will most likely work.
No, gnat (4.4.3) has still to work, if gcc (4.4.4) is already installed.
OK, where I said "gcc", the same applies also for "gnat", the Ada compiler driver. The reason for why a 4.4.3 gnat would fail if 4.4.4 gcc is installed is that it wouldn't find things like collect2, libgcc, crt*.o etc. Right?
yes
But it still seems a bit fragile to me; in general, there's no guarantee that if you intermix 4.4.4 and 4.4.3 components in that way, everything actually works (that's why they use different directories in the first place).
Then I would need to change this internal path with every source change. I don't see this as fragile as long as it is ensured that we ship the different frontends built from the same patchsets/sources. Note that further restrictions are made by package dependencies.
The issues I'm thinking of are things like: suppose the 4.4.4 middle-end adds code that generates calls to some new libgcc library function, which itself was added with the 4.4.4 libgcc. If you now mix-and-match components so that a compiler built from the 4.4.4 sources and using the new middle-end is used together with a libgcc built from the 4.4.3 sources, things will break.
libgcc is always built from the sources which get uploaded first.
It seems that this does not actually occur in the usage model you describe, since you apparently always update the core (libgcc etc.) *first*. I'm not sure if this is actually guaranteed by the package dependencies though. If it is, then I don't see any real problems with that approach ...
If you want to have separate packages, a cleaner way would appear to be to make them fully self-contained, e.g. have them each provide their own driver that can be called separately.
I don't understand that. I don't have a problem with the driver, but with the compiler (gnat1). Having the packages self-contained creates another problem in that you get file conflicts for files like collect2, various .o files etc.
What I was thinking of is along the lines of having gcc-base-4.4.3 and gcc-base-4.4.4 packages that hold the base files (crt*.o, libgcc, collect2 ...), such that you can install *multiple* of the base packages at the same time. This way you could have a gcc-4.4.4 (depending on gcc-base-4.4.4) and a gnat-4.4.3 (depending on gcc-base-4.4.3) all installed at the same time.
sure, you could have separate packages for subminor versions, and introduce a new dependency package for the minor version (gcc-4.4-defaults), but I don't see how this would help within the context of the distribution.
Matthias
Matthias Klose doko@ubuntu.com wrote:
On 04.08.2010 16:55, Ulrich Weigand wrote:
The issues I'm thinking of are things like: suppose the 4.4.4 middle-end adds code that generates calls to some new libgcc library function, which itself was added with the 4.4.4 libgcc. If you now mix-and-match components so that a compiler built from the 4.4.4 sources and using the new middle-end is used together with a libgcc built from the 4.4.3 sources, things will break.
libgcc is always built from the sources which get uploaded first.
Ah, OK. It seems this should work fine then.
sure, you could have separate packages for subminor versions, and introduce a new dependency package for the minor version (gcc-4.4-defaults), but I don't see how this would help within the context of the distribution.
Going back, the question I was trying to answer is how to set up packages such that they can use the original upstream directory naming scheme, but still allow the package build sequences that you need. My suggestion was simply about a possible way to achieve that; nothing more.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 31/07/10 19:01, Ulrich Weigand wrote:
I've finally completed a first of draft the write-up of toolchain implications of multiarch paths that we discussed in Prague. Sorry it took a while, but it got a lot longer than I expected :-/
I'd appreciate any feedback and comments!
Thanks Ulrich, that's an excellent document. :)
You didn't mention anything about the HWCAP stuff, though? I think we need to capture the discussion we had about "multiarch" == "ABI", and "multiarch" != "hardware features".
Andrew
Andrew Stubbs ams@codesourcery.com wrote on 08/02/2010 01:35:01 PM:
On 31/07/10 19:01, Ulrich Weigand wrote:
I've finally completed a first of draft the write-up of toolchain implications of multiarch paths that we discussed in Prague. Sorry it
took
a while, but it got a lot longer than I expected :-/
I'd appreciate any feedback and comments!
Thanks Ulrich, that's an excellent document. :)
You didn't mention anything about the HWCAP stuff, though? I think we need to capture the discussion we had about "multiarch" == "ABI", and "multiarch" != "hardware features".
The second half of the section "Loading/running an executable" is about the HWCAP stuff (look for "capability suffix"). In the summary I have this point:
* If capability-optimized ISA/ABI-compatible library variants are desired, they can be built just as today, only under the (same) multiarch suffix. They could be packaged either within a single package, or else using multiple packages (of the same multiarch type).
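As a purely hypothetical illustration of what that could look like on disk (the triplet and hwcap names here are examples only, not anything specified in the document):

  /usr/lib/arm-linux-gnueabi/libfoo.so.1         baseline build
  /usr/lib/arm-linux-gnueabi/neon/libfoo.so.1    NEON-optimized build of the same ABI

The dynamic linker would prefer the variant from the hwcap subdirectory on hardware that advertises the corresponding capability, and fall back to the baseline library otherwise.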
If you feel this could be made clearer, I'd appreciate any suggestions :-)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 02/08/10 12:46, Ulrich Weigand wrote:
The second half of the section "Loading/running an executable" is about the HWCAP stuff (look for "capability suffix"). In the summary I have this point:
- If capability-optimized ISA/ABI-compatible library variants are desired, they can be built just as today, only under the (same) multiarch suffix. They could be packaged either within a single package, or else using multiple packages (of the same multiarch type).
If you feel this could be made clearer, I'd appreciate any suggestions :-)
OK, I'm clearly blind and incapable of performing a text search competently (I swear I did one)!
It is buried a little deep, but it is there. I guess I'd like to see a flow of how a binary loads libraries:
1. User launches binary.
2. Kernel selects a suitable execution environment (native/qemu).
3. Kernel reads .interp and loads the multiarch dynamic linker: /lib/${multiarch}/ld.so.
4. Dynamic linker uses HWCAP to find the most appropriate libc.so.
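Purely as an illustration of step 4 (a minimal sketch, not something taken from the document): on Linux the kernel passes the HWCAP bits to every new process in the ELF auxiliary vector, and the dynamic linker consults them when deciding whether a capability-optimized library variant may be used. A process can dump its own view of those bits like this:

/* Minimal sketch: print the AT_HWCAP bits the kernel handed to this
 * process via the ELF auxiliary vector.  The dynamic linker reads the
 * same information when choosing between baseline and hwcap-optimized
 * library variants. */
#include <stdio.h>
#include <elf.h>    /* AT_HWCAP, AT_NULL */
#include <link.h>   /* ElfW() */

int main(void)
{
    FILE *f = fopen("/proc/self/auxv", "rb");
    if (f == NULL) {
        perror("fopen /proc/self/auxv");
        return 1;
    }

    ElfW(auxv_t) entry;
    while (fread(&entry, sizeof entry, 1, f) == 1 && entry.a_type != AT_NULL) {
        if (entry.a_type == AT_HWCAP)
            printf("AT_HWCAP = %#lx\n", (unsigned long) entry.a_un.a_val);
    }

    fclose(f);
    return 0;
}

Compiled and run natively, this prints the raw capability mask; which named subdirectories (if any) ld.so derives from it is an implementation detail of the dynamic linker.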
Anyway, that's just my personal taste. The information is there (at least if I read it at any time other than Monday morning), so I think the document is good.
We should post it on the Linaro wiki, probably.
Andrew
Andrew Stubbs ams@codesourcery.com wrote:
It is buried a little deep, but it is there. I guess I'd like to see a flow of how a binary loads libraries:
1. User launches binary.
2. Kernel selects a suitable execution environment (native/qemu).
3. Kernel reads .interp and loads the multiarch dynamic linker: /lib/${multiarch}/ld.so.
4. Dynamic linker uses HWCAP to find the most appropriate libc.so.
I thought that's basically the flow of the "Loading/running an executable" section ... I've added sub-section headers to maybe make it a bit clearer.
We should post it on the Linaro wiki, probably.
It's now on: https://wiki.linaro.org/WorkingGroups/ToolChain/MultiarchPaths
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
+++ Ulrich Weigand [2010-08-02 16:13 +0200]:
We should post it on the Linaro wiki, probably.
It's now on: https://wiki.linaro.org/WorkingGroups/ToolChain/MultiarchPaths
As this is actually a much wider issue than just Linaro (and because it is better presented to other interested parties from the more neutral ground of Debian), I've moved it to http://wiki.debian.org/Multiarch/Spec
(and deleted the original so we don't get two diverging versions).
Hope that's not considered rude, Ulrich. (great bit of work capturing all that good stuff - thank you).
Wookey
On Mon, Aug 02, 2010, Wookey wrote:
https://wiki.linaro.org/WorkingGroups/ToolChain/MultiarchPaths
http://wiki.debian.org/Multiarch/Spec (and deleted the original so we don't get two diverging versions).
I've added a redirect so that people reading the list archives can follow the link.
Wookey wookey@wookware.org wrote on 08/02/2010 10:40:23 PM:
Hope that's not considered rude, Ulrich. (great bit of work capturing all that good stuff - thank you).
No problem, that's fine with me. Thanks!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294