[TCWG] Split-stack for aarch64

13 Jun 2016


Hi all,

First, sorry for the long message, but I am kind of stuck on an issue
with my split-stack work for aarch64, so any fresh eyes to check whether
I am doing things properly would be really helpful.

I pushed my current work to local gcc [1] and glibc [2] git branches.
The current code shows no issues with the existing C tests, but there
is one elusive Go test that fails for some reason.

The glibc patch is pretty simple: it adds a tcbhead_t field
(__private_ss) that holds the per-thread split-stack value.  Unlike
the stack protector, split-stack requires a pointer per thread, and
that pointer is read in *every* function prologue.  Direct TCB access
is therefore the faster option (one instruction fewer than TLS
initial-exec).  I plan to expand on why I decided to use TCB access
later, but in short the advantages are:

 1. It is faster than TLS initial-exec
 2. It does not require any static or dynamic relocation

The rest of the patch just adds a versioned symbol so that both static
and dynamic linking fail against an older glibc (to prevent split-stack
binaries from running on unsupported glibc versions).

The GCC patch is more complex, but it follows the split-stack support
already implemented for other architectures (x86, powerpc64, s390).
Basically, you add hooks to generate the required prologue and other
bits (C varargs require some work) and add some runtime support in
libgcc (morestack.S).

The split-stack idea is basically this: say you have a function
that requires a very large stack allocation that might fail at
runtime (due to the ulimit -s limit).  Split-stack adds instrumentation
that checks whether the stack allocation will fail, based on the stored
per-thread value, and allocates stack segments as required.

So basically a function would be instrumented as follows (the stack
grows towards lower addresses):

function foo
  ss := TCB::__private_ss
  if SP - stack_allocation_size < ss
    call __morestack
  // function code
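
As a hedged illustration of the prologue check, here is a small Go
model.  It assumes a stack that grows towards lower addresses (as on
aarch64); the function name needsMoreStack and the example addresses
are made up for this sketch and are not part of the patch.

```go
package main

import "fmt"

// needsMoreStack models the split-stack prologue check for a stack that
// grows towards lower addresses. sp is the current stack pointer,
// frameSize the number of bytes the function is about to allocate, and
// limit the lowest usable address of the current segment (the per-thread
// value the post keeps in the TCB). All names here are hypothetical.
func needsMoreStack(sp, frameSize, limit uintptr) bool {
	// Guard against wrap-around when the frame is larger than sp itself.
	if frameSize > sp {
		return true
	}
	return sp-frameSize < limit
}

func main() {
	const limit = 0x1000 // hypothetical segment limit

	fmt.Println(needsMoreStack(0x2000, 0x800, limit))  // frame fits in the segment
	fmt.Println(needsMoreStack(0x2000, 0x1800, limit)) // frame would cross the limit
}
```

The explicit wrap-around guard matters only in the model, where sp is an
arbitrary unsigned number.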

What __morestack basically does is create a new stack segment with
some slack (using platform-neutral code), change the stack pointer,
and continue running the function.  So the call stack for a function
that called __morestack looks like:

foo
 \_ __morestack
     \_ // function code

When the function code finishes executing (including all nested
function calls), it returns to __morestack, which restores the old
stack pointer and arguments.

Now, this is the most straightforward usage of __morestack.  However,
the Go language has a construct [3] that lets a function register a
callback, called at the end of its scope, that can 'recover' from some
runtime execution failures.

And this is the remaining Go test that fails [4].  What it basically
does is run a set of tests that allocate different structures and try
to access them in different invalid ways, to check that accessing a
known null pointer is caught by the runtime (Go adds null pointer
checks for some constructs).

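
For readers less familiar with the construct, here is a minimal,
self-contained Go sketch of defer plus recover catching a nil pointer
access.  The helper name panics is made up for this example; it is not
taken from the test.

```go
package main

import "fmt"

// panics reports whether calling fn triggers a run-time panic. The
// deferred closure runs when panics returns; recover() yields a non-nil
// value only while a panic is in flight, and it stops the unwinding.
func panics(fn func()) (recovered bool) {
	defer func() {
		if err := recover(); err != nil {
			recovered = true
		}
	}()
	fn()
	return false
}

func main() {
	var p *int // nil pointer

	fmt.Println(panics(func() { *p = 1 })) // store through nil is caught by the runtime
	fmt.Println(panics(func() {}))         // nothing goes wrong, nothing to recover
}
```

The test in question follows the same pattern, but reports a bug when
recover() returns nil, that is, when the invalid access did not panic.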
func main() {
        ok := true
        for _, tt := range tests {
                func() {
                        defer func() {
                                if err := recover(); err == nil {
                                        println(tt.name, "did not panic")
                                        ok = false
                                }
                        }()
                        tt.fn()
                }()
        }
        if !ok {
                println("BUG")
        }
}

// [...]

var tests = []struct{
        name string
        fn func()
}{
        // [...]
        {"*bigstructp", func() { use(*bigstructp) }},
        // [...]
}

// [...]

type BigStruct struct {
        i int
        j float64
        k string
        x [128<<20]byte
        l []byte
}

So basically here it tries to allocate a very big structure (BigStruct,
about 128 MB) on the stack, and since the current stack does not have
enough room for it, it will need to call __morestack.
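
To put a number on it, the following Go sketch computes the size of a
struct with the same layout and compares it against an assumed 8 MiB
stack limit (a common `ulimit -s` default on Linux; the constant and
the helper name are assumptions for this example).

```go
package main

import (
	"fmt"
	"unsafe"
)

// BigStruct mirrors the layout of the type used by the test; only its
// size matters here.
type BigStruct struct {
	i int
	j float64
	k string
	x [128 << 20]byte
	l []byte
}

// stackLimit is an assumed 8 MiB stack limit, not a value from the patch.
const stackLimit = 8 << 20

// bigStructSize returns the size of BigStruct in bytes. unsafe.Sizeof is
// evaluated at compile time, so no 128 MiB value is ever created.
func bigStructSize() uintptr {
	return unsafe.Sizeof(BigStruct{})
}

func main() {
	size := bigStructSize()
	fmt.Println("size in bytes:", size)
	fmt.Println("at least 128 MiB:", size >= 128<<20)
	fmt.Println("exceeds assumed stack limit:", size > stackLimit)
}
```

A single by-value copy of the structure is therefore far larger than
such a limit, which is why the prologue check has to send the function
through __morestack.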

Now, if you have had the patience to read this far: the way GCCGO
handles this is by throwing an exception to unwind the stack, which
relies on CFI directives in both the generated code and morestack to
handle the unwinding correctly.

So if GCC generates the unwind information for the objects and
__morestack has the correct unwind information, it should work.  I
therefore presume my patch is failing either to define the correct
exception handler directives in morestack.S or to generate the correct
__morestack call.

The __morestack call is emitted in 'aarch64_expand_split_stack_prologue'
in my patch as:
--
+  /* Call __morestack with a non-standard call procedure: x10 will hold
+     the requested stack pointer and x11 the required stack size to be
+     copied.  */
+  args_size = crtl->args.size >= 0 ? crtl->args.size : 0;
+  reg11 = gen_rtx_REG (DImode, R11_REGNUM);
+  emit_move_insn (reg11, GEN_INT (args_size));
+  use_reg (&call_fusage, reg11);
+
+  /* Set up a minimum frame pointer to call __morestack.  The SP is not
+     save on x29 prior so in __morestack x29 points to the called SP.  */
+  aarch64_pushwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, 16);
+
+  insn = emit_call_insn (gen_call (gen_rtx_MEM (DImode, morestack_ref),
+                                  const0_rtx, const0_rtx));
+  add_function_usage_to (insn, call_fusage);
+
+  reg29 = gen_rtx_REG (Pmode, R29_REGNUM);
+  cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg29, cfi_ops);
+  reg30 = gen_rtx_REG (Pmode, R30_REGNUM);
+  cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg30, cfi_ops);
+  insn = emit_insn (aarch64_gen_loadwb_pair (DImode, stack_pointer_rtx,
+                                            reg29, reg30, 16));
+
+  /* Reset the CFA to be SP + FRAME_SIZE.  */
+  new_cfa = stack_pointer_rtx;
+  cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, new_cfa, cfi_ops);
+  REG_NOTES (insn) = cfi_ops;
+  RTX_FRAME_RELATED_P (insn) = 1;
+
+  emit_use (gen_rtx_REG (DImode, LR_REGNUM));
+
+  emit_insn (gen_split_stack_return ());
--

I do not add any stack frame allocation for the call, so that might be
a source of issues.

Another issue might be that the unwinding directives in morestack.S do
not follow the ABI correctly.  I am revising them using GCC-generated
exception examples.

[1] https://git.linaro.org/toolchain/gcc.git/shortlog/refs/heads/linaro-local/az...
[2] https://git.linaro.org/toolchain/glibc.git/shortlog/refs/heads/azanella/spli...
[3] https://blog.golang.org/defer-panic-and-recover
[4] gcc/testsuite/go.test/test/nilptr2.go
