Re: Update on stack (re-)alignment issues

19 Apr 2012

On 18/04/12 18:36, Ulrich Weigand wrote:
...
Hello,
I've been following up on the discussion we had on Monday regarding stack
alignment, and noticed that I had mis-remembered the current state of
affairs.  Ramana asked me on Tuesday to provide a write-up of the actual
status, so here we go ...
To summarize the background of the problem:  on ARM, the incoming stack
pointer is only guaranteed to be aligned to an 8 byte boundary.  This means
that objects on the stack (local variables, spill slots, temporaries etc.)
cannot easily be aligned to more than 8 bytes.  This can potentially cause
problems in two situations:

1) The object's default alignment (according to its type) is larger than 8
   bytes
2) The object has a forced non-default alignment that is larger than 8
   bytes
The first situation should in theory never appear, since according to the
ARM ABI all types have a default alignment of at most 8 bytes.  However,
due to the current mix-up in GCC, vector types are actually treated as
having a 16-byte alignment requirement.
...
The second situation can only appear with local variables that are declared
using __attribute__ ((aligned)).
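A small C illustration of the two situations, written for GCC (clang accepts the same extensions). The names `v4si`, `is_aligned` and the other helpers are hypothetical. The vector alignment reported depends on the target and compiler version: on x86-64 GCC a 16-byte generic vector gets a 16-byte default alignment, which is the kind of value that exceeds the 8 bytes the ARM stack pointer guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Situation 1: a GCC generic vector type whose *default* alignment,
   derived from its 16-byte size, can exceed 8 bytes. */
typedef int v4si __attribute__ ((vector_size (16)));

/* True if P is a multiple of ALIGN; ALIGN must be a power of two. */
static int is_aligned (const void *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Default alignment the compiler assigns to the vector type. */
static size_t vector_default_alignment (void)
{
  return _Alignof (v4si);
}

/* Situation 2: a local variable with a forced, non-default alignment.
   The compiler must place BUF on a 32-byte boundary even though a plain
   char array only needs 1-byte alignment. */
static int forced_local_is_aligned (void)
{
  char buf[64] __attribute__ ((aligned (32)));
  return is_aligned (buf, 32);
}
```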
We had discussed on Monday that we need to fix the second situation, since
this can always occur and is supported on other platforms.   By doing so,
we would then automatically fix the first situation as well.
However, this reasoning turns out to be incorrect.  There are currently in
GCC *two* completely separate mechanisms that can be used to align objects
on the stack to larger than the ABI guaranteed stack pointer alignment:
A) Re-alignment of the full stack frame.  This is what is used by the Intel
back-end (and only the Intel back-end).  At function entry, generated code
will align the stack pointer itself to whatever is necessary to fulfil
alignment requirements of all objects on the stack.  This may necessitate
follow-on changes: the frame pointer, if there is one, will likewise need
to be aligned at runtime.  Also, since incoming stack arguments are now no
longer at a fixed offset relative to the stack pointer *or* frame pointer
in some cases, we might need an extra register as argument pointer.  This
method allows extra alignment for *any* object on the stack, but needs
significant back-end support in order to be enabled on any non-Intel
architecture.
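To make the bookkeeping of method A concrete, here is a hypothetical model in plain C that treats addresses as integers. It is not GCC code; `realign_prologue`, `align_down`, `align_up` and the struct fields are made-up names. It assumes a downward-growing stack (as on ARM), so realignment rounds the stack pointer down. Because the amount of padding depends on the runtime value of the incoming stack pointer, the caller's stack arguments are only reachable through a saved copy of it, which is the role of the extra argument-pointer register.

```c
#include <assert.h>
#include <stdint.h>

struct realigned_frame
{
  uintptr_t arg_pointer; /* copy of the incoming SP; stack arguments sit at
                            fixed offsets above it */
  uintptr_t frame_base;  /* incoming SP rounded down to MAX_ALIGN */
  uintptr_t sp;          /* SP after allocating the padded frame */
};

/* Round ADDR down to a power-of-two boundary. */
static uintptr_t align_down (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return addr & ~(align - 1);
}

/* Round N up to a power-of-two multiple. */
static uintptr_t align_up (uintptr_t n, uintptr_t align)
{
  return align_down (n + align - 1, align);
}

/* Model of a realigning prologue: the frame base (and thus any frame
   pointer) is aligned at runtime, and the frame size is rounded up so
   the stack pointer stays aligned after the allocation. */
static struct realigned_frame
realign_prologue (uintptr_t incoming_sp, uintptr_t max_align,
                  uintptr_t frame_size)
{
  struct realigned_frame f;
  f.arg_pointer = incoming_sp;
  f.frame_base = align_down (incoming_sp, max_align);
  f.sp = f.frame_base - align_up (frame_size, max_align);
  return f;
}
```

With an 8-byte-aligned incoming SP of 0x1008 the realignment inserts 8 bytes of padding; with 0x1000 it inserts none. No fixed offset from the new frame therefore reaches the caller's arguments.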
B) Dynamic allocation of selected stack variables.  This is implemented by
common code with no involvement of the back-end.  In effect, the code in
cfgexpand.c:expand_stack_vars that decides on how to allocate local
variables on the stack will remove all variables that require extra
alignment and place them into an extra structure.  Generated prologue code
will then in effect dynamically allocate and align that structure on the
stack, and just store a pointer to it as "variable" into the normal stack
frame.  All other areas of the frame are unaffected.  Since this method
just simulates code the programmer could have written themselves using
alloca, it does not require *any* back-end support and is enabled by
default everywhere.  However, it only works for regular local variables,
and not for any other objects on the stack.
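Since method B only simulates what a programmer could write with alloca, a hand-written analogue is easy to sketch. This is an approximation under stated assumptions, not what expand_stack_vars actually emits: `struct overaligned_vars`, `EXTRA_ALIGN` and `sum_scaled` are hypothetical, and `alloca` is a widespread but non-standard extension (declared in `<alloca.h>` on glibc and the BSDs). Only the pointer `v` needs to live in the ordinary, 8-byte-aligned frame.

```c
#include <alloca.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical grouping of all locals that need more than the ABI's
   8-byte stack alignment. */
struct overaligned_vars
{
  double a[4];
};

#define EXTRA_ALIGN ((uintptr_t) 32)

static double sum_scaled (double scale)
{
  /* Over-allocate by EXTRA_ALIGN - 1 bytes so that an aligned block of
     the required size is guaranteed to fit inside the allocation. */
  unsigned char *raw =
    alloca (sizeof (struct overaligned_vars) + EXTRA_ALIGN - 1);

  /* Round the address up to the next EXTRA_ALIGN boundary.  This pointer
     is all the normal stack frame needs to hold. */
  struct overaligned_vars *v = (struct overaligned_vars *)
    (((uintptr_t) raw + EXTRA_ALIGN - 1) & ~(EXTRA_ALIGN - 1));
  assert ((uintptr_t) v % EXTRA_ALIGN == 0);

  double sum = 0.0;
  for (int i = 0; i < 4; i++)
    {
      v->a[i] = scale * i;
      sum += v->a[i];
    }
  return sum; /* the alloca block is released when the function returns */
}
```

Calling `sum_scaled (2.0)` fills the aligned array with 0, 2, 4 and 6 and returns their sum.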
I read the C11 standard briefly a few months back, and I believe that B)
is all that is needed there.  The standard excludes over-aligning
function arguments.
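For comparison, the C11 side of this: an alignment specifier is permitted on an ordinary local variable, which method B can satisfy, while the constraints in C11 §6.7.5 forbid one on a function parameter (among other declarations). The function name below is hypothetical, and 32 is assumed to be an extended alignment the compiler supports for locals, as GCC does.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Allowed: a local object with an extended alignment request. */
static int alignas_local_ok (void)
{
  alignas (32) unsigned char block[32];
  return ((uintptr_t) block % 32) == 0;
}

/* Not allowed: C11 6.7.5 forbids alignment specifiers on parameters,
   so a conforming compiler must reject a declaration such as
     void take (alignas (32) int x);  */
```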
...
Objects on the stack *except* local variables always use default alignment.
Since on most platforms, except Intel and *currently* ARM, the ABI stack
pointer alignment is sufficient to implement default alignments, method B)
as above is able to fulfil all stack alignments.  Intel uses method A), so
they're also OK.  In effect, ARM is the only target that, because of the
vector type alignment problem, runs into a situation where neither method
works.
Under those circumstances, given that:
- we want to fix vector type alignment in order to become ABI compliant
- once we've fixed this, we're in the same situation as other platforms and
  method B) already fixes stack alignment problems
- implementing method A) is therefore both quite involved *and* actually
  superfluous
I'd now rather recommend that we *don't* try to implement method A)  (full
stack-frame re-alignment) on ARM.
Comments?
Yes, sounds like the right solution to me.
Technically, GCC's vector mechanism allows the creation of any size of
vector, which will be aligned to the size of the vector.  We only run
into problems when that size exceeds the maximum alignment.  Such values
passed by value to functions should also be over-aligned.  I think if we
were to continue supporting such non-standard types we would have to
change the rules to pass them by reference and have the caller make a copy.
We'd still need to deal with the 16-byte vectors somehow though.
So overall, I think the only practical solution is to limit vectors to
8-byte alignment.
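At the source level, the effect of limiting a 16-byte vector to 8-byte alignment can be previewed with GCC's `aligned` attribute on a typedef, which (unlike on a struct member) is documented as able to lower alignment. This is only an illustration of the resulting layout, assuming a GCC-compatible compiler; the change discussed here is to the ARM back end's default, and `v4si_align8`, `struct limited_layout` and the helper functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* 16-byte generic vector with its natural, size-derived alignment. */
typedef int v4si __attribute__ ((vector_size (16)));

/* Same vector size, but alignment capped at 8 bytes; on a typedef,
   GCC's aligned attribute may decrease alignment. */
typedef int v4si_align8 __attribute__ ((vector_size (16), aligned (8)));

/* With the cap, the vector can follow a char at offset 8 rather than 16. */
struct limited_layout
{
  char tag;
  v4si_align8 v;
};

static size_t natural_vector_alignment (void)
{
  return _Alignof (v4si);
}

static size_t limited_vector_alignment (void)
{
  return _Alignof (v4si_align8);
}
```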
R.
...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
  IBM Deutschland Research & Development GmbH
  Chairwoman of the Supervisory Board: Martina Koederitz | Management: Dirk Wittkopp
  Registered office: Böblingen | Register court: Amtsgericht Stuttgart, HRB 243294