[ Also posted to debian-arm; not cross-posted to avoid subscription complaints... ]
Hi folks,
We're currently carrying patches in glibc in Debian (and Ubuntu) that I wrote which are used to work out whether an ELF binary is hard-float or soft-float. We're using these to allow us to do the right thing on a multi-arch system, which is to pick a consistent set of binaries (programs and libraries) at runtime; if you try to mix binaries using different ABIs, you're prone to all kinds of weird and wonderful results but generally badness occurs.
Upstream glibc have generally not been welcoming of these patches, and I understand this; the approach taken (reading ARM-specific build attributes) is far from clean and doesn't fit well in the design of ld.so in particular. So, I've been looking into alternative methods for achieving the goal of identifying ABI. After a couple of false starts and discussion with some of the helpful toolchain and ABI folks in ARM, I think we have a solution that will work well in the long term. I just wish we'd thought about this *way* back when we first started the armhf port, as it would have been much easier to work on and standardise this back then. Modulo availability of time machines, there's not much we can do on that front... :-)
What I'm proposing is to use two new values in the OSABI field in the ELF header:
#define ELFOSABI_LINUX_ARM_AEABI_SF 65
#define ELFOSABI_LINUX_ARM_AEABI_HF 66
and use these values in the future for soft- and hard-float binaries so that we can unambiguously identify them.
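As a rough illustration of what a consumer of the two values proposed above would do, the sketch below reads the OSABI byte (index 7 of e_ident) from an ELF header and classifies the image. The function names and the make_ident helper are invented for this example, and the values 65 and 66 are only the ones proposed in this message, not assigned ones.

```python
# Values proposed in this thread; they are a proposal, not an assigned
# part of any ELF specification.
ELFOSABI_LINUX_ARM_AEABI_SF = 65
ELFOSABI_LINUX_ARM_AEABI_HF = 66

EI_OSABI = 7  # index of the OSABI byte within e_ident


def float_abi_from_osabi(header: bytes) -> str:
    """Classify an ELF image by the OSABI byte of its header."""
    if len(header) < 16 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    osabi = header[EI_OSABI]
    if osabi == ELFOSABI_LINUX_ARM_AEABI_SF:
        return "soft-float"
    if osabi == ELFOSABI_LINUX_ARM_AEABI_HF:
        return "hard-float"
    return "unknown"


def make_ident(osabi: int) -> bytes:
    """Build a 16-byte e_ident for a 32-bit little-endian ELF (demo helper)."""
    # magic, ELFCLASS32, ELFDATA2LSB, EV_CURRENT, OSABI, ABI version, padding
    return b"\x7fELF" + bytes([1, 1, 1, osabi, 0]) + bytes(7)
```

The check is a single byte comparison on data that is always at the start of the file, which is the main attraction of this approach for a dynamic loader.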
There's already precedent for binaries using different values in this field, with support in glibc for parsing and understanding them. Adding more possible values is quite easy, assuming that the maintainers are amenable. I'm about to post a similar message there.
I have a plan of attack for how to make a staged switch over, deliberately to minimise any potential compatibility problems. See the attached doc for that. It's deliberately not very specific in terms of timeline, as that's something I'm hoping to get feedback about. Comments very welcome; please point out if you think there are problems with this approach, or if there are any more implementations of toolchain / linker that will need to be addressed.
Cheers,
On 2 August 2012 17:43, Steve McIntyre steve.mcintyre@linaro.org wrote:
[ Also posted to debian-arm; not cross-posted to avoid subscription complaints... ]
Hi folks,
We're currently carrying patches in glibc in Debian (and Ubuntu) that I wrote which are used to work out whether an ELF binary is hard-float or soft-float. We're using these to allow us to do the right thing on a multi-arch system, which is to pick a consistent set of binaries (programs and libraries) at runtime; if you try to mix binaries using different ABIs, you're prone to all kinds of weird and wonderful results but generally badness occurs.
Upstream glibc have generally not been welcoming of these patches, and I understand this; the approach taken (reading ARM-specific build attributes) is far from clean and doesn't fit well in the design of ld.so in particular.
Nevertheless, the tags in the .ARM.attributes section are the standard, published way to identify FP ABI as well as a number of other properties that might be relevant to a linker.
So, I've been looking into alternative methods for achieving the goal of identifying ABI. After a couple of false starts and discussion with some of the helpful toolchain and ABI folks in ARM, I think we have a solution that will work well in the long term. I just wish we'd thought about this *way* back when we first started the armhf port, as it would have been much easier to work on and standardise this back then. Modulo availability of time machines, there's not much we can do on that front... :-)
What I'm proposing is to use two new values in the OSABI field in the ELF header:
#define ELFOSABI_LINUX_ARM_AEABI_SF 65
#define ELFOSABI_LINUX_ARM_AEABI_HF 66
and use these values in the future for soft- and hard-float binaries so that we can unambiguously identify them.
What happens if this value doesn't match the Tag_ABI_VFP_args value in .ARM.attributes? If the same information is present in multiple places, sooner or later someone will manage to create a file with a mismatch.
This approach is also not scalable. Suppose one day it becomes desirable to add ld.so awareness of some other feature. Then you'd need two more values to cover all the combinations. Add another feature, and you need 8 values. You get the picture.
There's already precedent for binaries using different values in this field, with support in glibc for parsing and understanding them. Adding more possible values is quite easy, assuming that the maintainers are amenable. I'm about to post a similar message there.
I really think the only sane thing to do is fix glibc so it can fetch the attributes from their standard locations.
On Thu, Aug 02, 2012 at 06:39:33PM +0100, Mans Rullgard wrote:
On 2 August 2012 17:43, Steve McIntyre steve.mcintyre@linaro.org wrote:
[ Also posted to debian-arm; not cross-posted to avoid subscription complaints... ]
Hi folks,
We're currently carrying patches in glibc in Debian (and Ubuntu) that I wrote which are used to work out whether an ELF binary is hard-float or soft-float. We're using these to allow us to do the right thing on a multi-arch system, which is to pick a consistent set of binaries (programs and libraries) at runtime; if you try to mix binaries using different ABIs, you're prone to all kinds of weird and wonderful results but generally badness occurs.
Upstream glibc have generally not been welcoming of these patches, and I understand this; the approach taken (reading ARM-specific build attributes) is far from clean and doesn't fit well in the design of ld.so in particular.
Nevertheless, the tags in the .ARM.attributes section are the standard, published way to identify FP ABI as well as a number of other properties that might be relevant to a linker.
Not according to the glibc folks, no. They're prepared to look at the attributes at other times, but not in ld.so. And it's their code.
So, I've been looking into alternative methods for achieving the goal of identifying ABI. After a couple of false starts and discussion with some of the helpful toolchain and ABI folks in ARM, I think we have a solution that will work well in the long term. I just wish we'd thought about this *way* back when we first started the armhf port, as it would have been much easier to work on and standardise this back then. Modulo availability of time machines, there's not much we can do on that front... :-)
What I'm proposing is to use two new values in the OSABI field in the ELF header:
#define ELFOSABI_LINUX_ARM_AEABI_SF 65
#define ELFOSABI_LINUX_ARM_AEABI_HF 66
and use these values in the future for soft- and hard-float binaries so that we can unambiguously identify them.
What happens if this value doesn't match the Tag_ABI_VFP_args value in .ARM.attributes? If the same information is present in multiple places, sooner or later someone will manage to create a file with a mismatch.
They'll have to work quite hard to do that, and they get to keep the pieces when they do; I'm expecting to get binutils to do the right thing in terms of setting the OSABI field when it creates ELF objects, using the same information that goes into Tag_ABI_VFP_args.
This approach is also not scalable. Suppose one day it becomes desirable to add ld.so awareness of some other feature. Then you'd need two more values to cover all the combinations. Add another feature, and you need 8 values. You get the picture.
I understand that, but I don't share the same concern. There's space for a lot of values here (65->254). In a lot of years, we've barely touched this area. Intuitively for me, the OSABI field is the *correct* place to encode information about the ABI that's in use. It's the whole point of the field...
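To put the two positions above in numbers: the range 65 to 254 holds 190 values, while n independent two-state features need 2^n distinct values if every combination gets its own number. The sketch below works that out; the function names are made up for this illustration, and it assumes each feature is a simple yes/no property.

```python
def values_needed(binary_features: int) -> int:
    """Distinct OSABI values needed to enumerate every combination."""
    return 2 ** binary_features


def max_binary_features(available_values: int) -> int:
    """Largest number of yes/no features whose combinations still fit."""
    n = 0
    while values_needed(n + 1) <= available_values:
        n += 1
    return n


# Size of the range mentioned above: 65 through 254 inclusive.
FREE_OSABI_VALUES = 254 - 65 + 1
```

With these definitions, values_needed(3) is 8, matching the count quoted above, and max_binary_features(FREE_OSABI_VALUES) is 7, since 128 combinations fit in 190 values but 256 do not.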
There's already precedent for binaries using different values in this field, with support in glibc for parsing and understanding them. Adding more possible values is quite easy, assuming that the maintainers are amenable. I'm about to post a similar message there.
I really think the only sane thing to do is fix glibc so it can fetch the attributes from their standard locations.
I've already proposed (and written code for) that, and they refused to accept that method.
Cheers,
On 2 August 2012 19:00, Steve McIntyre steve.mcintyre@linaro.org wrote:
On Thu, Aug 02, 2012 at 06:39:33PM +0100, Mans Rullgard wrote:
On 2 August 2012 17:43, Steve McIntyre steve.mcintyre@linaro.org wrote:
[ Also posted to debian-arm; not cross-posted to avoid subscription complaints... ]
Hi folks,
We're currently carrying patches in glibc in Debian (and Ubuntu) that I wrote which are used to work out whether an ELF binary is hard-float or soft-float. We're using these to allow us to do the right thing on a multi-arch system, which is to pick a consistent set of binaries (programs and libraries) at runtime; if you try to mix binaries using different ABIs, you're prone to all kinds of weird and wonderful results but generally badness occurs.
Upstream glibc have generally not been welcoming of these patches, and I understand this; the approach taken (reading ARM-specific build attributes) is far from clean and doesn't fit well in the design of ld.so in particular.
Nevertheless, the tags in the .ARM.attributes section are the standard, published way to identify FP ABI as well as a number of other properties that might be relevant to a linker.
Not according to the glibc folks, no.
But that's not for them to decide.
They're prepared to look at the attributes at other times, but not in ld.so. And it's their code.
Yes, it is their code. That makes it their duty to see to it that it conforms to published specs (or accept patches that do so).
So, I've been looking into alternative methods for achieving the goal of identifying ABI. After a couple of false starts and discussion with some of the helpful toolchain and ABI folks in ARM, I think we have a solution that will work well in the long term. I just wish we'd thought about this *way* back when we first started the armhf port, as it would have been much easier to work on and standardise this back then. Modulo availability of time machines, there's not much we can do on that front... :-)
What I'm proposing is to use two new values in the OSABI field in the ELF header:
#define ELFOSABI_LINUX_ARM_AEABI_SF 65
#define ELFOSABI_LINUX_ARM_AEABI_HF 66
and use these values in the future for soft- and hard-float binaries so that we can unambiguously identify them.
What happens if this value doesn't match the Tag_ABI_VFP_args value in .ARM.attributes? If the same information is present in multiple places, sooner or later someone will manage to create a file with a mismatch.
They'll have to work quite hard to do that, and they get to keep the pieces when they do; I'm expecting to get binutils to do the right thing in terms of setting the OSABI field when it creates ELF objects, using the same information that goes into Tag_ABI_VFP_args.
I have seen stranger bugs in binutils.
This approach is also not scalable. Suppose one day it becomes desirable to add ld.so awareness of some other feature. Then you'd need two more values to cover all the combinations. Add another feature, and you need 8 values. You get the picture.
I understand that, but I don't share the same concern. There's space for a lot of values here (65->254). In a lot of years, we've barely touched this area. Intuitively for me, the OSABI field is the *correct* place to encode information about the ABI that's in use. It's the whole point of the field...
The problem is that it is *one* field. There are at least a dozen attributes that need to be encoded somewhere, and clearly 8 bits is not sufficient.
Since I was not involved in the creation of the ELF spec, I can't comment on the intent of that field. That said, the name OSABI gets me thinking of user/kernel interfaces, not so much the kind of things we're dealing with here.
There's already precedent for binaries using different values in this field, with support in glibc for parsing and understanding them. Adding more possible values is quite easy, assuming that the maintainers are amenable. I'm about to post a similar message there.
I really think the only sane thing to do is fix glibc so it can fetch the attributes from their standard locations.
I've already proposed (and written code for) that, and they refused to accept that method.
Then they are just being silly. Perhaps they need a visit from the Board of Education (aka clue-bat).
Hammering a square reality peg into a round glibc hole is not the right way to do things.
Steve McIntyre wrote:
Upstream glibc have generally not been welcoming of these patches, and I understand this; the approach taken (reading ARM-specific build attributes) is far from clean and doesn't fit well in the design of ld.so in particular.
Nevertheless, the tags in the .ARM.attributes section are the standard, published way to identify FP ABI as well as a number of other properties that might be relevant to a linker.
Not according to the glibc folks, no. They're prepared to look at the attributes at other times, but not in ld.so. And it's their code.
[snip]
I really think the only sane thing to do is fix glibc so it can fetch the attributes from their standard locations.
I've already proposed (and written code for) that, and they refused to accept that method.
Unfortunately I haven't really followed this discussion, so I don't know the reasons why they didn't like your code.
My suspicion would be that it relates to the fundamental distinction in ELF between link-time and load-time fields: the ELF format holds both contents that are supposed to be manipulated on-file by tools like a linker, and contents that are supposed to be accessed in-memory by tools like a dynamic loader.
glibc / ld.so --quite reasonably in my opinion-- would use only those parts of ELF that are readily available at run time. This includes everything available off a program header (and thus part of the memory-mapped part of the file), but *not* things available only in a section.
Unfortunately, the .ARM.attributes section is of the second class. It is available to link-time tools that operate on an ELF file, but not readily available to run-time tools like ld.so that operate on the memory-mapped image. Having ld.so bypass this very fundamental separation between link-time and run-time by itself opening the ELF file and reading pieces that are not memory-mapped really seems not a good idea.
However, there are ways to solve this type of problem, which usually involve changing the linker to make a piece of information that used to be available only at link-time also available at run-time. Typically this means moving the section into the memory-mapped part of the file, and also covering it by a new program header so that it can be found by run-time tools.
This is done e.g. for the .note.ABI-tag section (which serves a similar purpose as .ARM.attributes). While this is primarily a *section* used at link time, it is also made available as part of the NOTE program header in the memory-mapped part at run-time (and this is where ld.so actually uses it).
My suggestion to address this issue would therefore be to have the linker create a new program header ARM_ATTRIBUTES to cover the .ARM.attributes section (and move it into the memory-mapped area also covered by a LOAD program header). ld.so would then be able to use its contents just as it today uses .note.ABI-tag.
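A minimal sketch of the run-time side of this suggestion, assuming a 32-bit little-endian image: walk the program header table and return the file offset and size of the first entry of a given type. The type number PT_ARM_ATTRIBUTES_HYPOTHETICAL is invented here (picked from the processor-specific range), since no value is assigned in this thread, and find_segment is an illustrative name rather than anything in glibc.

```python
import struct

PT_NOTE = 4  # standard ELF segment type, as used for .note.ABI-tag

# Hypothetical type number for the suggested ARM_ATTRIBUTES program header.
# No value is assigned in the thread; this one is for illustration only.
PT_ARM_ATTRIBUTES_HYPOTHETICAL = 0x70000002

# Elf32_Ehdr and Elf32_Phdr layouts, little-endian.
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")
ELF32_PHDR = struct.Struct("<8I")


def find_segment(image: bytes, wanted_type: int):
    """Return (file offset, file size) of the first matching segment, or None."""
    fields = ELF32_EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        phdr = ELF32_PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, p_offset, p_filesz = phdr[0], phdr[1], phdr[4]
        if p_type == wanted_type:
            return p_offset, p_filesz
    return None
```

Only the ELF header and the program header table are consulted, so nothing from the section view of the file is needed.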
This solution obviously still requires all programs to be recompiled before they present the new program header. However, it avoids the two drawbacks of your method Mans pointed out:
- there is no duplication of data (there is a bit of extra meta data in the form of the new program header, but the actual data covered by it is not duplicated)
- there are no future extensibility issues due to the use of a single byte
Thoughts?
Best Regards
Ulrich Weigand
On 2 August 2012 20:26, Ulrich Weigand Ulrich.Weigand@de.ibm.com wrote:
Steve McIntyre wrote:
Upstream glibc have generally not been welcoming of these patches, and I understand this; the approach taken (reading ARM-specific build attributes) is far from clean and doesn't fit well in the design of ld.so in particular.
Nevertheless, the tags in the .ARM.attributes section are the standard, published way to identify FP ABI as well as a number of other properties that might be relevant to a linker.
Not according to the glibc folks, no. They're prepared to look at the attributes at other times, but not in ld.so. And it's their code.
[snip]
I really think the only sane thing to do is fix glibc so it can fetch the attributes from their standard locations.
I've already proposed (and written code for) that, and they refused to accept that method.
Unfortunately I haven't really followed this discussion, so I don't know the reasons why they didn't like your code.
My suspicion would be that it relates to the fundamental distinction in ELF between link-time and load-time fields: the ELF format holds both contents that are supposed to be manipulated on-file by tools like a linker, and contents that are supposed to be accessed in-memory by tools like a dynamic loader.
glibc / ld.so --quite reasonably in my opinion-- would use only those parts of ELF that are readily available at run time. This includes everything available off a program header (and thus part of the memory-mapped part of the file), but *not* things available only in a section.
Unfortunately, the .ARM.attributes section is of the second class. It is available to link-time tools that operate on an ELF file, but not readily available to run-time tools like ld.so that operate on the memory-mapped image. Having ld.so bypass this very fundamental separation between link-time and run-time by itself opening the ELF file and reading pieces that are not memory-mapped really seems not a good idea.
Thanks for that explanation.
However, there are ways to solve this type of problem, which usually involve changing the linker to make a piece of information that used to be available only at link-time also available at run-time. Typically this means moving the section into the memory-mapped part of the file, and also covering it by a new program header so that it can be found by run-time tools.
This is done e.g. for the .note.ABI-tag section (which serves a similar purpose as .ARM.attributes). While this is primarily a *section* used at link time, it is also made available as part of the NOTE program header in the memory-mapped part at run-time (and this is where ld.so actually uses it).
My suggestion to address this issue would therefore be to have the linker create a new program header ARM_ATTRIBUTES to cover the .ARM.attributes section (and move it into the memory-mapped area also covered by a LOAD program header). ld.so would then be able to use its contents just as it today uses .note.ABI-tag.
This solution obviously still requires all programs to be recompiled before they present the new program header.
Steve's proposal also requires recompiling everything, so there's no difference there.
However, it avoids the two drawbacks of your method Mans pointed out:
- there is no duplication of data (there is a bit of extra meta data in the form of the new program header, but the actual data covered by it is not duplicated)
- there are no future extensibility issues due to the use of a single byte
Thoughts?
Sounds like a good plan to me.
On Thu, Aug 02, 2012 at 09:20:35PM +0100, Mans Rullgard wrote:
On 2 August 2012 20:26, Ulrich Weigand Ulrich.Weigand@de.ibm.com wrote:
This solution obviously still requires all programs to be recompiled before they present the new program header.
Steve's proposal also requires recompiling everything, so there's no difference there.
Well, over time yes. I spelled out a plan for an extended migration to avoid a rebuild-the-world moment...
Cheers,
On Thu, Aug 02, 2012 at 09:26:56PM +0200, Ulrich Weigand wrote:
Steve McIntyre wrote:
I really think the only sane thing to do is fix glibc so it can fetch the attributes from their standard locations.
I've already proposed (and written code for) that, and they refused to accept that method.
Unfortunately I haven't really followed this discussion, so I don't know the reasons why they didn't like your code.
My suspicion would be that it relates to the fundamental distinction in ELF between link-time and load-time fields: the ELF format holds both contents that are supposed to be manipulated on-file by tools like a linker, and contents that are supposed to be accessed in-memory by tools like a dynamic loader.
Exactly, yes.
glibc / ld.so --quite reasonably in my opinion-- would use only those parts of ELF that are readily available at run time. This includes everything available off a program header (and thus part of the memory-mapped part of the file), but *not* things available only in a section.
Unfortunately, the .ARM.attributes section is of the second class. It is available to link-time tools that operate on an ELF file, but not readily available to run-time tools like ld.so that operate on the memory-mapped image. Having ld.so bypass this very fundamental separation between link-time and run-time by itself opening the ELF file and reading pieces that are not memory-mapped really seems not a good idea.
A number of people have said this, but I've yet to see any explanation beyond "I have a bad feeling", tbh. Where I plumbed it in to ld.so in glibc, nearby functions are already calling __lseek and __libc_read. Don't get me wrong, the code I've added is not especially clean! But it works, and I'm not seeing any ill-effects from this approach.
However, there are ways to solve this type of problem, which usually involve changing the linker to make a piece of information that used to be available only at link-time also available at run-time. Typically this means moving the section into the memory-mapped part of the file, and also covering it by a new program header so that it can be found by run-time tools.
This is done e.g. for the .note.ABI-tag section (which serves a similar purpose as .ARM.attributes). While this is primarily a *section* used at link time, it is also made available as part of the NOTE program header in the memory-mapped part at run-time (and this is where ld.so actually uses it).
My suggestion to address this issue would therefore be to have the linker create a new program header ARM_ATTRIBUTES to cover the .ARM.attributes section (and move it into the memory-mapped area also covered by a LOAD program header). ld.so would then be able to use its contents just as it today uses .note.ABI-tag.
This solution obviously still requires all programs to be recompiled before they present the new program header. However, it avoids the two drawbacks of your method Mans pointed out:
- there is no duplication of data (there is a bit of extra meta data in the form of the new program header, but the actual data covered by it is not duplicated)
- there are no future extensibility issues due to the use of a single byte
Thoughts?
Hmmm, maybe. Others have said that parsing the attributes is slow and not designed for runtime use like this... I'd been investigating defining and using the existing PT_ARM_ARCHEXT segment instead, but the ARM ABI folks seem happier to use existing ELF header fields. Hence the proposal to use OSABI. BUT: the glibc folks have already appropriated that field for their own nefarious^Wends, e.g. using OSABI_GNU to denote the use of ifunc. Now looking at using some bits in the e_flags field instead...
I don't really care all *that* much about how this is done, but some consistency across groups would be really helpful sometimes. :-/
Cheers,
On 02/08/12 18:39, Mans Rullgard wrote:
Nevertheless, the tags in the .ARM.attributes section are the standard, published way to identify FP ABI as well as a number of other properties that might be relevant to a linker.
1) The attributes are only visible in the section view (as used by linkable object files). You can't rely on that being present in an executable image.
2) The encoding has been arranged for density, not performance; it's unsuitable for low-cost look-ups when searching a chain of libraries.
Attributes were never intended for use at run time; IMO it would be a mistake to try and coerce them into such a role.
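To make the point about density concrete: tags and integer values in the build attributes are ULEB128-encoded, so a reader has to decode and step over every preceding entry to reach the one it wants. The sketch below is deliberately simplified; it assumes a flat run of tag/value pairs where every value is a ULEB128 integer, and ignores the section's format-version byte, vendor subsections, and string-valued tags that a real parser must handle. The function names are invented for this illustration.

```python
TAG_ABI_VFP_ARGS = 28  # Tag_ABI_VFP_args in the ARM build attributes


def read_uleb128(data: bytes, pos: int):
    """Decode one ULEB128 integer at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear marks the last byte
            return result, pos
        shift += 7


def find_tag(attrs: bytes, wanted: int):
    """Linear scan over tag/value pairs (simplification: all values ULEB128)."""
    pos = 0
    while pos < len(attrs):
        tag, pos = read_uleb128(attrs, pos)
        value, pos = read_uleb128(attrs, pos)
        if tag == wanted:
            return value
    return None
```

There is no index and no fixed record size, so the lookup cost grows with the number of attributes present, and it is paid again for every library examined.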
R.
On 3 August 2012 12:00, Richard Earnshaw rearnsha@arm.com wrote:
On 02/08/12 18:39, Mans Rullgard wrote:
Nevertheless, the tags in the .ARM.attributes section are the standard, published way to identify FP ABI as well as a number of other properties that might be relevant to a linker.
- The attributes are only visible in the section view (as used by linkable object files). You can't rely on that being present in an executable image.
If not present, assuming it is compatible should be good enough.
- The encoding has been arranged for density, not performance; it's unsuitable for low-cost look-ups when searching a chain of libraries.
Attributes were never intended for use at run time; IMO it would be a mistake to try and coerce them into such a role.
Well, clearly nobody thought that information would be used _at all_ at runtime. Now people apparently want that, so something has to change.
Duplicating the attribute information in the 8-bit OSABI field is not going to work since there are more than 8 attributes.
If directly examining the attributes section is out of the question, perhaps the e_flags field would be a better option than OSABI, being clearly designated as a collection of flags and also having more bits currently unused.
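A sketch of how the e_flags alternative might look, assuming a 32-bit little-endian header where e_flags sits at byte offset 36. The two bit masks are chosen here purely for illustration; the thread does not settle on any bit assignments, and the function names are made up.

```python
import struct

E_FLAGS_OFFSET = 36  # offset of e_flags within Elf32_Ehdr

# Illustrative bit assignments only; nothing in the thread fixes these.
FLAG_FLOAT_SOFT_HYPOTHETICAL = 0x200
FLAG_FLOAT_HARD_HYPOTHETICAL = 0x400


def read_e_flags(header: bytes) -> int:
    """Extract the 32-bit e_flags word from a little-endian ELF32 header."""
    (flags,) = struct.unpack_from("<I", header, E_FLAGS_OFFSET)
    return flags


def float_abi_from_flags(flags: int) -> str:
    """Classify by the two illustrative bits; anything else is unknown."""
    hard = bool(flags & FLAG_FLOAT_HARD_HYPOTHETICAL)
    soft = bool(flags & FLAG_FLOAT_SOFT_HYPOTHETICAL)
    if hard and not soft:
        return "hard-float"
    if soft and not hard:
        return "soft-float"
    return "unknown"
```

Unlike an enumerated OSABI value, each additional yes/no property costs one more bit rather than doubling the number of values, and the other bits of the word are left undisturbed.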
Keep in mind that changing the ELF header will require an update to the ARM ELF specification, and is likely to cause compatibility problems with other toolchain vendors until they make the corresponding changes. Using existing sections does not have this problem.
On Fri, Aug 03, 2012 at 01:03:38PM +0100, Mans Rullgard wrote:
If directly examining the attributes section is out of the question, perhaps the e_flags field would be a better option than OSABI, being clearly designated as a collection of flags and also having more bits currently unused.
That's my next fallback...
Keep in mind that changing the ELF header will require an update to the ARM ELF specification, and is likely to cause compatibility problems with other toolchain vendors until they make the corresponding changes. Using existing sections does not have this problem.
We'll have to update the ARM ELF ABI docs, yes. This is expected, and I'm discussing it with the ABI folks.
Cheers,
Perhaps this was brought up before, but is there a use case for mixing ABIs in the same ELF file? For instance, one could implement a portable ARM EABI binary not calling any float function but just dlopen()ing this or that library depending on which ABI variant it detects.
(It's an unlikely scenario, but I wanted to bring it up in case the plan would prevent this and could easily be made not to.)
On Fri, Aug 03, 2012 at 01:11:23PM +0200, Loïc Minier wrote:
Perhaps this was brought up before, but is there a use case for mixing ABIs in the same ELF file? For instance, one could implement a portable ARM EABI binary not calling any float function but just dlopen()ing this or that library depending on which ABI variant it detects.
(It's an unlikely scenario, but I wanted to bring it up in case the plan would prevent this and could easily be made not to.)
We've discussed it in the past, and while you possibly *could* do it, it's such an edge case that nobody cared. Likely to be fragile, too.
Cheers,
On 3 August 2012 12:48, Steve McIntyre steve.mcintyre@linaro.org wrote:
On Fri, Aug 03, 2012 at 01:11:23PM +0200, Loïc Minier wrote:
Perhaps this was brought up before, but is there a use case for mixing ABIs in the same ELF file? For instance, one could implement a portable ARM EABI binary not calling any float function but just dlopen()ing this or that library depending on which ABI variant it detects.
(It's an unlikely scenario, but I wanted to bring it up in case the plan would prevent this and could easily be made not to.)
We've discussed it in the past, and while you possibly *could* do it, it's such an edge case that nobody cared. Likely to be fragile, too.
If it really becomes necessary, an override flag to dlopen() should be simple enough to implement.
linaro-toolchain@lists.linaro.org