Hi all,
First, sorry for the long message, but I am a bit stuck on an issue with my split-stack work for aarch64, so any fresh eyes to check whether I am doing things properly would be really helpful.
I pushed my current work to local gcc [1] and glibc [2] git branches. The current code shows no issues with the existing C tests, but there is one elusive Go test that fails for some reason.
The glibc patch is pretty simple: it adds a tcbhead_t field (__private_ss) that holds the per-thread split-stack value. Unlike the stack protector, split-stack requires a per-thread pointer, and that pointer is read in *every* function prologue, so the fastest access is directly through the TCB (one instruction fewer than TLS initial-exec). I plan to elaborate later on why I decided to use TCB access, but in short the advantages are:
1. It is faster than TLS initial-exec.
2. It does not require any static or dynamic relocation.
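To make the TCB-access argument concrete, here is a minimal C sketch. It assumes a layout modeled on aarch64's two-word tcbhead_t (dtv plus a private pointer) with __private_ss appended; that placement is an assumption, and the real field order is whatever the glibc patch [2] defines. A _Thread_local object stands in for the real thread pointer so the sketch runs on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of aarch64's tcbhead_t with the split-stack field
   appended.  The placement of __private_ss is an assumption.  */
typedef struct
{
  void *dtv;              /* dynamic thread vector */
  void *private;          /* existing private slot */
  uintptr_t __private_ss; /* per-thread split-stack limit (assumed slot) */
} tcbhead_t;

/* On aarch64 the TCB is addressed through TPIDR_EL0 (for example via
   __builtin_thread_pointer ()).  This _Thread_local object is only a
   stand-in so the sketch runs anywhere.  */
static _Thread_local tcbhead_t fake_tcb;

static tcbhead_t *
thread_pointer (void)
{
  return &fake_tcb;
}

/* In the real TCB the field sits at a constant offset from the thread
   pointer, so reading it is one load from that base: no GOT slot and no
   relocation.  (The stand-in above is itself TLS, so this sketch does
   not demonstrate that saving.)  */
static uintptr_t
split_stack_limit (void)
{
  return thread_pointer ()->__private_ss;
}

static void
set_split_stack_limit (uintptr_t limit)
{
  thread_pointer ()->__private_ss = limit;
}

/* Both preceding fields are pointer-sized, so the offset is two words.  */
_Static_assert (offsetof (tcbhead_t, __private_ss) == 2 * sizeof (void *),
                "unexpected __private_ss offset");
```

Because the offset is a compile-time constant, the prologue can reach the value with the thread pointer as base, which is where the "no relocation" property comes from.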
The rest of the patch just adds a versioned symbol so that both static and dynamic linking fail against an older glibc (to prevent split-stack binaries from running on unsupported glibc versions).
The GCC patch is more complex, but it follows the existing split-stack support for other architectures (x86, powerpc64, s390). Basically, you add hooks to generate the required prologue and other bits (C varargs require some work) and add runtime support in libgcc (morestack.S).
The split-stack idea is basically this: say you have a function that requires a very large stack allocation that might fail at runtime (due to the ulimit -s limit). Split-stack adds instrumentation that checks, against the per-thread stack limit, whether the allocation would overflow the current stack segment, and allocates new stack segments as required.
So basically a function would be instrumented as:
function foo
  ss := TCB::__private_ss
  if SP - stack_allocation_size < ss
    call __morestack
  // function code
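The prologue test can be modeled in a few lines of C. This is a sketch, not the code GCC emits: needs_morestack is a hypothetical name, and it assumes (as on x86_64) a downward-growing stack where ss holds the lowest usable address of the current segment, so the question is whether SP minus the frame size would drop below ss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the split-stack prologue test.  Assumes the stack grows down
   and that ss is the lowest usable address of the current segment.  */
static bool
needs_morestack (uintptr_t sp, uintptr_t frame_size, uintptr_t ss)
{
  /* A frame larger than SP itself cannot fit; checking this first also
     keeps the subtraction below from wrapping around.  */
  if (frame_size > sp)
    return true;
  /* Call __morestack if the new SP would fall below the segment limit.  */
  return sp - frame_size < ss;
}
```

A frame that ends exactly at the limit still fits; anything that would go below it takes the __morestack path.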
What __morestack basically does is create a new stack segment with some slack (using platform-neutral code), switch the stack pointer to it, and continue running the function. So the call stack of a function that called __morestack looks like:
foo
 \_ __morestack
     \_ // function code
When the function code finishes its execution (including all nested function calls), it returns to __morestack, which restores the old stack pointer and arguments.
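The allocate/switch/run/restore sequence can be sketched as a toy C model. Everything here is hypothetical (the struct, the slack value, and the use of a callback for "the rest of the function"); libgcc's real implementation is morestack.S plus the platform-neutral generic-morestack.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SEGMENT_SLACK 4096 /* hypothetical extra room per new segment */

struct stack_state
{
  uintptr_t sp;    /* simulated stack pointer */
  uintptr_t limit; /* simulated __private_ss (lowest usable address) */
  int depth;       /* number of extra segments currently in use */
};

typedef void (*body_fn) (struct stack_state *);

static void
morestack (struct stack_state *st, size_t frame_size, body_fn body)
{
  size_t seg_size = frame_size + SEGMENT_SLACK;
  unsigned char *seg = malloc (seg_size);
  assert (seg != NULL);

  /* Remember the caller's stack pointer and limit.  */
  uintptr_t old_sp = st->sp;
  uintptr_t old_limit = st->limit;

  /* Switch to the new segment: the stack grows down, so SP starts at
     the top and the limit is the bottom.  */
  st->sp = (uintptr_t) (seg + seg_size);
  st->limit = (uintptr_t) seg;
  st->depth++;

  /* Run the rest of the function on the new segment.  */
  body (st);

  /* The body returned here, so restore the old state.  This sketch
     simply frees the segment.  */
  st->depth--;
  st->sp = old_sp;
  st->limit = old_limit;
  free (seg);
}

/* Example body: records how many extra segments are live while it runs.  */
static int observed_depth;

static void
record_depth (struct stack_state *st)
{
  observed_depth = st->depth;
}
```

The key property is the last step: because the function body returns into __morestack, the old stack pointer and limit are restored before control goes back to the original caller.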
This is the most straightforward use of __morestack. However, Go provides a construct [3] (defer/recover) that lets a function register a callback that runs when the function returns, from which the program can 'recover' from a runtime failure.
This is the remaining Go test that fails [4]. It runs a set of subtests that use different structures and access them in different invalid ways, to check that dereferencing a known nil pointer is caught by the runtime (Go adds nil pointer checks for some constructs).
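For readers less familiar with Go, the control flow of defer plus recover can be approximated in C with setjmp/longjmp. This is only an analogy with hypothetical names (toy_panic, run_and_recover): gccgo does not use longjmp, it throws an exception and unwinds the stack, which is exactly why the unwind information matters for this bug.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Innermost active "recover point", analogous to a deferred function
   that calls recover().  */
static jmp_buf *current_recover_point;

/* Analogue of panic(): jump to the innermost recover point.  */
static void
toy_panic (void)
{
  assert (current_recover_point != NULL);
  longjmp (*current_recover_point, 1);
}

/* Run fn and report whether it "panicked" (and was recovered) instead of
   returning normally.  */
static bool
run_and_recover (void (*fn) (void))
{
  jmp_buf here;
  jmp_buf *saved = current_recover_point;
  bool panicked = false;

  current_recover_point = &here;
  if (setjmp (here) == 0)
    fn ();           /* normal path */
  else
    panicked = true; /* reached via toy_panic */
  current_recover_point = saved;
  return panicked;
}

static void panics (void) { toy_panic (); }
static void returns_normally (void) { }
```

run_and_recover plays the role of each per-test closure: a subtest that does not panic is the "did not panic" case the Go test reports as a bug.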
func main() {
	ok := true
	for _, tt := range tests {
		func() {
			defer func() {
				if err := recover(); err == nil {
					println(tt.name, "did not panic")
					ok = false
				}
			}()
			tt.fn()
		}()
	}
	if !ok {
		println("BUG")
	}
}

var tests = []struct {
	name string
	fn   func()
}{
	...
	{"*bigstructp", func() { use(*bigstructp) }},
	...

type BigStruct struct {
	i int
	j float64
	k string
	x [128 << 20]byte
	l []byte
}
So here the test tries to allocate a very large structure on the stack (BigStruct, a little over 128 MB), and since the current stack segment has no room for it, the function needs to call __morestack.
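Some quick arithmetic shows why this test has to go through __morestack. The C mirror below approximates BigStruct on a 64-bit target, assuming Go's usual representations (int is 64 bits, a string is pointer plus length, a slice is pointer, length and capacity) and an 8 MiB default for ulimit -s, which is common on Linux but not universal. The struct and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed Go string and slice headers on a 64-bit target.  */
struct go_string { const uint8_t *ptr; int64_t len; };
struct go_slice { uint8_t *ptr; int64_t len; int64_t cap; };

/* Approximate C mirror of the test's BigStruct.  */
struct big_struct
{
  int64_t i;
  double j;
  struct go_string k;
  uint8_t x[128 << 20]; /* 134217728 bytes = 128 MiB */
  struct go_slice l;
};

/* Typical, not guaranteed, default for `ulimit -s` on Linux.  */
#define DEFAULT_STACK_LIMIT (8u << 20)
```

The x field alone is 128 << 20 = 134,217,728 bytes, sixteen times an 8 MiB stack, so any frame that holds a copy of a BigStruct cannot fit in the initial stack.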
Now, if you have had the patience to read this far: gccgo implements recover by throwing an exception to unwind the stack, and it relies on CFI directives in both the generated code and __morestack to handle the unwinding correctly.
So if GCC generates the unwind information for the objects and __morestack has correct unwind information, the test should pass. I therefore presume my patch fails either to define the correct exception-handling directives in morestack.S or to generate the correct __morestack call.
In my patch, the __morestack call is generated in 'aarch64_expand_split_stack_prologue' as:
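One way to exercise the unwind side in isolation is to walk the stack with libgcc's _Unwind_Backtrace from <unwind.h>, which uses the same CFI-driven machinery as the exception gccgo throws. The sketch below only counts frames in ordinary C code; the helper names are hypothetical, and calling a similar walker from code running on a __morestack segment is a suggestion, not something the patch does. Compile with -funwind-tables if your target does not emit unwind tables for C by default.

```c
#include <assert.h>
#include <unwind.h>

/* Callback for _Unwind_Backtrace: count one frame and keep walking.  */
static _Unwind_Reason_Code
count_frame (struct _Unwind_Context *ctx, void *arg)
{
  int *count = arg;
  (void) ctx;
  ++*count;
  return _URC_NO_REASON;
}

/* Number of frames the unwinder can walk from here using .eh_frame CFI.  */
static int __attribute__ ((noinline))
count_frames (void)
{
  int count = 0;
  _Unwind_Backtrace (count_frame, &count);
  return count;
}

/* The volatile temporaries prevent tail calls, which would remove a
   frame from the walk.  */
static int __attribute__ ((noinline))
frames_direct (void)
{
  volatile int n = count_frames ();
  return n;
}

static int __attribute__ ((noinline))
frames_nested (void)
{
  volatile int n = frames_direct ();
  return n;
}
```

With correct CFI, one extra call level yields one extra frame. If a walk that passes through a particular frame stops early or reports a different count, the CFI for that frame is the first suspect.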
--
+  /* Call __morestack with a non-standard call procedure: x10 will hold
+     the requested stack pointer and x11 the required stack size to be
+     copied. */
+  args_size = crtl->args.size >= 0 ? crtl->args.size : 0;
+  reg11 = gen_rtx_REG (DImode, R11_REGNUM);
+  emit_move_insn (reg11, GEN_INT (args_size));
+  use_reg (&call_fusage, reg11);
+
+  /* Set up a minimum frame pointer to call __morestack. The SP is not
+     save on x29 prior so in __morestack x29 points to the called SP. */
+  aarch64_pushwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, 16);
+
+  insn = emit_call_insn (gen_call (gen_rtx_MEM (DImode, morestack_ref),
+                                   const0_rtx, const0_rtx));
+  add_function_usage_to (insn, call_fusage);
+
+  reg29 = gen_rtx_REG (Pmode, R29_REGNUM);
+  cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg29, cfi_ops);
+  reg30 = gen_rtx_REG (Pmode, R30_REGNUM);
+  cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg30, cfi_ops);
+  insn = emit_insn (aarch64_gen_loadwb_pair (DImode, stack_pointer_rtx,
+                                             reg29, reg30, 16));
+
+  /* Reset the CFA to be SP + FRAME_SIZE. */
+  new_cfa = stack_pointer_rtx;
+  cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, new_cfa, cfi_ops);
+  REG_NOTES (insn) = cfi_ops;
+  RTX_FRAME_RELATED_P (insn) = 1;
+
+  emit_use (gen_rtx_REG (DImode, LR_REGNUM));
+
+  emit_insn (gen_split_stack_return ());
--
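What the callee side of this non-standard call has to do with the size passed in x11 can be modeled in C: allocate the new segment, copy that many bytes of stack-passed arguments to its top, and set SP there with the 16-byte alignment AAPCS64 requires. The helper and struct names are hypothetical and this is not the code in morestack.S; it only illustrates the copy and alignment arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Result of switching to a new segment (hypothetical helper type).  */
struct new_frame
{
  unsigned char *segment; /* base of the newly allocated segment */
  unsigned char *sp;      /* new stack pointer, 16-byte aligned */
};

/* Allocate segment_size bytes, copy args_size bytes of stack-passed
   arguments from old_sp to the top of the segment, return the new SP.  */
static struct new_frame
switch_with_args (const unsigned char *old_sp, size_t args_size,
                  size_t segment_size)
{
  struct new_frame f;

  /* Leave room for the copy plus up to 15 bytes of alignment padding.  */
  assert (args_size + 15 <= segment_size);
  f.segment = malloc (segment_size);
  assert (f.segment != NULL);

  /* The stack grows down: the arguments go just below the top of the
     segment, and SP is rounded down to a 16-byte boundary.  */
  uintptr_t top = (uintptr_t) (f.segment + segment_size);
  uintptr_t sp = (top - args_size) & ~(uintptr_t) 15;
  f.sp = (unsigned char *) sp;

  /* Copy the incoming stack arguments so the callee finds them at the
     same offsets from its new SP as it would have on the old stack.  */
  memcpy (f.sp, old_sp, args_size);
  return f;
}
```

If args_size is zero, as for a function with no stack-passed arguments, only the alignment step remains.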
I do not allocate any stack frame for the call, so that might be a source of issues.
Another possible issue is that the unwinding directives in morestack.S do not follow the ABI correctly. I am reviewing them against GCC-generated exception-handling examples.
[1] https://git.linaro.org/toolchain/gcc.git/shortlog/refs/heads/linaro-local/az...
[2] https://git.linaro.org/toolchain/glibc.git/shortlog/refs/heads/azanella/spli...
[3] https://blog.golang.org/defer-panic-and-recover
[4] gcc/testsuite/go.test/test/nilptr2.go