Update on stack (re-)alignment issues - linaro-toolchain

18 Apr 2012


      Hello,
I've been following up on the discussion we had on Monday regarding stack
alignment, and noticed that I had mis-remembered the current state of
affairs.  Ramana asked me on Tuesday to provide a write-up of the actual
status, so here we go ...
To summarize the background of the problem:  on ARM, the incoming stack
pointer is only guaranteed to be aligned to an 8 byte boundary.  This means
that objects on the stack (local variables, spill slots, temporaries etc.)
cannot easily be aligned to more than 8 bytes.  This can potentially cause
problems in two situations:
1) The object's default alignment (according to its type) is larger than 8
bytes
2) The object has a forced non-default alignment that is larger than 8
bytes
The first situation should in theory never appear, since according to the
ARM ABI all types have a default alignment of at most 8 bytes.   However,
due to the current mix-up in GCC, vector types actually are considered to
have a 16-byte alignment requirement in GCC.
The second situation can only appear with local variables that are declared
using attribute ((aligned)).
We had discussed on Monday that we need to fix the second situation, since
this can always occur and is supported on other platforms.   By doing so,
we would then automatically fix the first situation as well.
However, this reasoning turns out to be incorrect.  There are currently in
GCC *two* completely separate mechanisms that can be used to align objects
on the stack to larger than the ABI guaranteed stack pointer alignment:
A) Re-alignment of the full stack frame.  This is what is used by the Intel
back-end (and only the Intel back-end).  At function entry, generated code
will align the stack pointer itself to whatever is necessary to fulfil
alignment requirements of all objects on the stack.  This may necessitate
follow-on changes: the frame pointer, if there is one, will likewise need
to be aligned at runtime.  Also, since incoming stack arguments are now no
longer at a fixed offset relative to the stack pointer *or* frame pointer
in some cases, we might need an extra register as argument pointer.  This
method allows extra alignment for *any* object on the stack, but needs
significant back-end support in order to be enabled on any non-Intel
architecture.
B) Dynamic allocation of selected stack variables.  This is implemented by
common code with no involvement of the back-end.  In effect, the code in
cfgexpand.c:expand_stack_vars that decides on how to allocate local
variables on the stack will remove all variables that require extra
alignment and place them into an extra structure.  Generated prologue code
will then in effect dynamically allocate and align that structure on the
stack, and just store a pointer to it as "variable" into the normal stack
frame.  All other areas of the frame are unaffected.  Since this method
just simulates code the programmer could have written themselves using
alloca, it does not require *any* back-end support and is enabled by
default everywhere.  However, it only works for regular local variables,
and not for any other objects on the stack.
Objects on the stack *except* local variables always use default alignment.
Since on most platforms, except Intel and *currently* ARM, the ABI stack
pointer alignment is sufficient to implement default alignments, method B)
as above is able to fulfil all stack alignments.   Intel uses method A), so
they're also OK.   In effect, it's only ARM due to the vector type
alignment problem that runs into the situation that neither method works.
Under those circumstances, given that:
- we want to fix vector type alignment in order to become ABI compliant
- once we've fixed this, we're in the same situation as other platforms and
method B) already fixes stack alignment problems
- implementing method A) is therefore both quite involved *and* actually
superfluous
I'd now rather recommend that we *don't* try to implement method A)  (full
stack-frame re-alignment) on ARM.
Comments?
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
  IBM Deutschland Research & Development GmbH
  Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk
Wittkopp
  Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294