Hello,
I've been following up on the discussion we had on Monday regarding stack alignment, and noticed that I had mis-remembered the current state of affairs. Ramana asked me on Tuesday to provide a write-up of the actual status, so here we go ...
To summarize the background of the problem: on ARM, the incoming stack pointer is only guaranteed to be aligned to an 8-byte boundary. This means that objects on the stack (local variables, spill slots, temporaries etc.) cannot easily be aligned to more than 8 bytes. This can potentially cause problems in two situations:
1) The object's default alignment (according to its type) is larger than 8 bytes
2) The object has a forced non-default alignment that is larger than 8 bytes
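As an illustration only, here is a minimal C sketch of one object from each situation. It assumes GCC's generic vector extension (`vector_size`) and `__attribute__ ((aligned))`; the names `v4si`, `is_aligned` and `locals_are_16_aligned` are hypothetical. With only an 8-byte stack pointer guarantee, the compiler must do extra work for both locals to land on a 16-byte boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Situation 1: alignment comes from the type. A 16-byte generic vector
   defaults to 16-byte alignment on typical GCC targets (and, per the
   text above, currently on ARM too). */
typedef int v4si __attribute__ ((vector_size (16)));

/* True if p is a multiple of align (align must be a power of two). */
static bool is_aligned (const void *p, uintptr_t align)
{
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Hypothetical check that both kinds of local really got 16 bytes. */
static bool locals_are_16_aligned (void)
{
  v4si vec = { 1, 2, 3, 4 };                    /* situation 1 */
  char buf[32] __attribute__ ((aligned (16)));  /* situation 2 */
  buf[0] = (char) vec[0];
  return is_aligned (&vec, 16) && is_aligned (buf, 16);
}
```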
The first situation should in theory never appear, since according to the ARM ABI all types have a default alignment of at most 8 bytes. However, due to the current mix-up in GCC, vector types are actually treated by GCC as having a 16-byte alignment requirement.
The second situation can only appear with local variables that are declared using __attribute__ ((aligned)).
We had discussed on Monday that we need to fix the second situation, since this can always occur and is supported on other platforms. By doing so, we would then automatically fix the first situation as well.
However, this reasoning turns out to be incorrect. GCC currently has *two* completely separate mechanisms for aligning objects on the stack beyond the stack pointer alignment guaranteed by the ABI:
A) Re-alignment of the full stack frame. This is what is used by the Intel back-end (and only the Intel back-end). At function entry, generated code will align the stack pointer itself to whatever is necessary to fulfil alignment requirements of all objects on the stack. This may necessitate follow-on changes: the frame pointer, if there is one, will likewise need to be aligned at runtime. Also, since incoming stack arguments are now no longer at a fixed offset relative to the stack pointer *or* frame pointer in some cases, we might need an extra register as argument pointer. This method allows extra alignment for *any* object on the stack, but needs significant back-end support in order to be enabled on any non-Intel architecture.
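The arithmetic behind method A) can be sketched in C. This is a simplified model, not GCC's actual prologue code: `realign_sp`, `struct realigned_frame` and `enter_frame` are hypothetical names. It shows why stack arguments end up at a run-time-dependent distance from the re-aligned stack pointer, which is what can force a separate argument pointer.

```c
#include <stdint.h>

/* Round a stack pointer down to a multiple of align (a power of two).
   The stack grows downwards, so rounding down only adds padding below
   the frame; it never overlaps the caller's frame. */
static uintptr_t realign_sp (uintptr_t sp, uintptr_t align)
{
  return sp & ~(align - 1);
}

/* Hypothetical model of a re-aligned frame. */
struct realigned_frame
{
  uintptr_t incoming_sp;  /* incoming stack arguments are found here */
  uintptr_t aligned_sp;   /* locals are addressed from here */
};

static struct realigned_frame enter_frame (uintptr_t incoming_sp,
                                           uintptr_t frame_size,
                                           uintptr_t align)
{
  struct realigned_frame f;
  f.incoming_sp = incoming_sp;
  /* The padding added by rounding depends on the run-time value of
     incoming_sp, so aligned_sp - incoming_sp is not a constant. */
  f.aligned_sp = realign_sp (incoming_sp - frame_size, align);
  return f;
}
```

With an 8-byte-aligned incoming pointer and a 16-byte requirement, the distance from `aligned_sp` back to the arguments differs by 8 between calls, so a fixed offset from the stack pointer (or from a frame pointer derived from it) no longer works.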
B) Dynamic allocation of selected stack variables. This is implemented by common code with no involvement of the back-end. In effect, the code in cfgexpand.c:expand_stack_vars that decides on how to allocate local variables on the stack will remove all variables that require extra alignment and place them into an extra structure. Generated prologue code will then in effect dynamically allocate and align that structure on the stack, and just store a pointer to it as "variable" into the normal stack frame. All other areas of the frame are unaffected. Since this method just simulates code the programmer could have written themselves using alloca, it does not require *any* back-end support and is enabled by default everywhere. However, it only works for regular local variables, and not for any other objects on the stack.
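Since method B) simulates code a programmer could write by hand, here is a sketch of such hand-written code. It assumes `alloca` from the non-standard `<alloca.h>` (available with glibc and the BSDs); `align_up` and `use_overaligned_local` are hypothetical names. It is meant as the source-level equivalent of what the compiler does for a local like `char buf[64] __attribute__ ((aligned (32)))`, not as GCC's generated code.

```c
#include <alloca.h>   /* non-standard, but provided by glibc and BSDs */
#include <stdint.h>
#include <string.h>

/* Round p up to the next multiple of align (a power of two). */
static void *align_up (void *p, uintptr_t align)
{
  return (void *) (((uintptr_t) p + align - 1) & ~(align - 1));
}

static int use_overaligned_local (void)
{
  enum { SIZE = 64, ALIGN = 32 };
  /* Over-allocate by ALIGN - 1 bytes so an aligned SIZE-byte block
     always fits; only the pointer buf lives in the normal frame. */
  char *buf = align_up (alloca (SIZE + ALIGN - 1), ALIGN);
  memset (buf, 0, SIZE);
  buf[SIZE - 1] = 42;
  /* 1 if the block is correctly aligned and fully usable. */
  return ((uintptr_t) buf % ALIGN == 0) && buf[SIZE - 1] == 42;
}
```

Nothing here needs target support beyond ordinary dynamic stack allocation, which is why the mechanism works everywhere. It also shows the limitation: only objects that the code names explicitly can be moved this way, and spill slots and temporaries are not such objects.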
Objects on the stack *except* local variables always use default alignment. Since on most platforms, except Intel and *currently* ARM, the ABI stack pointer alignment is sufficient to implement default alignments, method B) as above is able to fulfil all stack alignments. Intel uses method A), so they're also OK. In effect, it's only ARM due to the vector type alignment problem that runs into the situation that neither method works.
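The argument of this paragraph can be restated as a small decision function. This is a hypothetical model of the reasoning, not GCC code; `enum object_kind` and `method_b_suffices` are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of stack objects. */
enum object_kind { LOCAL_VARIABLE, OTHER_STACK_OBJECT };

/* Can method B) alone satisfy this object's alignment? */
static bool method_b_suffices (enum object_kind kind,
                               size_t required_align,
                               size_t abi_stack_align)
{
  if (required_align <= abi_stack_align)
    return true;                   /* no extra alignment needed at all */
  return kind == LOCAL_VARIABLE;   /* B) can only move local variables */
}
```

Today on ARM, a spill slot for a vector (16-byte default alignment, 8-byte stack guarantee) is the failing case. Once vector types are capped at 8 bytes, every non-local object asks for at most 8 bytes and the function returns true for all inputs that can occur.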
Under those circumstances, given that:
- we want to fix vector type alignment in order to become ABI compliant
- once we've fixed this, we're in the same situation as other platforms, and method B) already fixes stack alignment problems
- implementing method A) is therefore both quite involved *and* actually superfluous
I'd now rather recommend that we *don't* try to implement method A) (full stack-frame re-alignment) on ARM.
Comments?
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 18/04/12 18:36, Ulrich Weigand wrote:
Hello,
I've been following up on the discussion we had on Monday regarding stack alignment, and noticed that I had mis-remembered the current state of affairs. Ramana asked me on Tuesday to provide a write-up of the actual status, so here we go ...
To summarize the background of the problem: on ARM, the incoming stack pointer is only guaranteed to be aligned to an 8 byte boundary. This means that objects on the stack (local variables, spill slots, temporaries etc.) cannot easily be aligned to more than 8 bytes. This can potentially cause problems in two situations:
1) The object's default alignment (according to its type) is larger than 8 bytes
2) The object has a forced non-default alignment that is larger than 8 bytes
The first situation should in theory never appear, since according to the ARM ABI all types have a default alignment of at most 8 bytes. However, due to the current mix-up in GCC, vector types actually are considered to have a 16-byte alignment requirement in GCC.
The second situation can only appear with local variables that are declared using __attribute__ ((aligned)).
We had discussed on Monday that we need to fix the second situation, since this can always occur and is supported on other platforms. By doing so, we would then automatically fix the first situation as well.
However, this reasoning turns out to be incorrect. There are currently in GCC *two* completely separate mechanisms that can be used to align objects on the stack to larger than the ABI guaranteed stack pointer alignment:
A) Re-alignment of the full stack frame. This is what is used by the Intel back-end (and only the Intel back-end). At function entry, generated code will align the stack pointer itself to whatever is necessary to fulfil alignment requirements of all objects on the stack. This may necessitate follow-on changes: the frame pointer, if there is one, will likewise need to be aligned at runtime. Also, since incoming stack arguments are now no longer at a fixed offset relative to the stack pointer *or* frame pointer in some cases, we might need an extra register as argument pointer. This method allows extra alignment for *any* object on the stack, but needs significant back-end support in order to be enabled on any non-Intel architecture.
B) Dynamic allocation of selected stack variables. This is implemented by common code with no involvement of the back-end. In effect, the code in cfgexpand.c:expand_stack_vars that decides on how to allocate local variables on the stack will remove all variables that require extra alignment and place them into an extra structure. Generated prologue code will then in effect dynamically allocate and align that structure on the stack, and just store a pointer to it as "variable" into the normal stack frame. All other areas of the frame are unaffected. Since this method just simulates code the programmer could have written themselves using alloca, it does not require *any* back-end support and is enabled by default everywhere. However, it only works for regular local variables, and not for any other objects on the stack.
I read the C11 standard briefly a few months back, and I believe that B) is all that is needed there. The standard does not allow an alignment specifier on function parameters, so function arguments never need to be over-aligned.
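A short C11 sketch of the distinction: an ordinary local may request extra alignment with `alignas` (exactly the kind of object method B handles), while the same specifier on a parameter is a constraint violation (C11 6.7.5). It assumes the implementation supports a 32-byte extended alignment, as GCC does; `local_is_overaligned` is a hypothetical name.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Allowed: an automatic object with an extended alignment. */
static int local_is_overaligned (void)
{
  alignas (32) unsigned char buf[64];
  buf[0] = 0;
  return (uintptr_t) buf % 32 == 0 && buf[0] == 0;
}

/* Not allowed (C11 6.7.5 constraint):
       void f (alignas (32) int x);
   so parameters only ever need their type's default alignment. */
```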
Objects on the stack *except* local variables always use default alignment. Since on most platforms, except Intel and *currently* ARM, the ABI stack pointer alignment is sufficient to implement default alignments, method B) as above is able to fulfil all stack alignments. Intel uses method A), so they're also OK. In effect, it's only ARM due to the vector type alignment problem that runs into the situation that neither method works.
Under those circumstances, given that:
- we want to fix vector type alignment in order to become ABI compliant
- once we've fixed this, we're in the same situation as other platforms and
method B) already fixes stack alignment problems
- implementing method A) is therefore both quite involved *and* actually
superfluous
I'd now rather recommend that we *don't* try to implement method A) (full stack-frame re-alignment) on ARM.
Comments?
Yes, sounds like the right solution to me.
Technically, GCC's vector mechanism allows the creation of vectors of any size, each aligned to its own size. We only run into problems when that size exceeds the maximum alignment. Such values, when passed by value to functions, would also have to be over-aligned. I think that if we were to continue supporting such non-standard types, we would have to change the rules to pass them by reference, with the caller making a copy. We'd still need to deal with the 16-byte vectors somehow, though.
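A hedged sketch of the "pass by reference, caller copies" idea, using a 32-byte GCC generic vector. The names `v8si`, `sum_ends` and `caller` are hypothetical, and this only models the convention at source level; it is not a proposed ABI implementation. The point is that the over-aligned copy is an ordinary local of the caller, which method B can align, so the callee never needs an over-aligned incoming argument slot.

```c
/* 32-byte generic vector; by default GCC aligns it to its size on
   typical targets, which can exceed __BIGGEST_ALIGNMENT__. */
typedef int v8si __attribute__ ((vector_size (32)));

/* Callee receives a pointer instead of the vector itself. */
static int sum_ends (const v8si *v)
{
  return (*v)[0] + (*v)[7];
}

static int caller (void)
{
  v8si value = { 1, 2, 3, 4, 5, 6, 7, 8 };
  v8si copy = value;   /* caller-made copy: a plain local */
  return sum_ends (&copy);
}
```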
So overall, I think the only practical solution is to limit vectors to 8-byte alignment.
R.
linaro-toolchain@lists.linaro.org