Hi,
I've stumbled across an assembler error message that I don't understand.
bl1/aarch64/bl1_exceptions.S: Assembler messages:
bl1/aarch64/bl1_exceptions.S:53: Error: non-constant expression in
".if" statement
It occurs when building ARM Trusted Firmware with aarch64-linux-gnu-gcc
that ships with Ubuntu 16.04. It does _not_ occur with an older Linaro
toolchain. More details in [1].
The .align directives in the vector_base and vector_entry macros _do_
make a difference. Are they the cause of the problem? Would you
recommend writing the code differently? Or is it a compiler bug?
[1] https://github.com/ARM-software/tf-issues/issues/401
Thanks,
--
Jerome
Hi all,
First sorry for the long message, but I am kinda stuck on an issue
with my split-stack work for aarch64, so any new eyes to check if
I am doing things properly would be really helpful.
I pushed my current work on a local gcc git [1] and glibc [2] branches.
The current code show not issue with the placed C tests, but there
is one elusive GO tests that fails for some reason.
The glibc patch is pretty simple: it adds a tcbhead_t field
(__private_ss) which will be used to hold the split stack thread
value. Different from stack protector, split-stack requires
a pointer per thread and it is used frequently on *every* function
prologue. So faster access is through TCB direct access (one
instruction less than TLS initial-exec). I plan to digress a little
more about why I decided to use TCB access, but in a short the advantages
are:
1. It is faster than TLS initial-exec
2. Does not require any static or dynamic relocation
The rest of patch is just to add a versioned symbol so either static
or dynamic linking fails for an older glibc (to prevent split-stack
binaries to run on non-supported glibcs).
The GCC patch is more complex, but it follows the already implemented
split-stack support on other architectures (x86, powerpc64, s390).
Basically you add hooks to generate the required prologue and other
bits (C varargs requires some work) and add some runtime support on
libgcc (morestack.S).
Split-stack idea is basically as this: let say you have a function
that requires a very large stack allocation that might fail at
runtime (due ulimit -s limit). Split-stack add some instrumentation
that check if the stack allocation will fail based on initial
value and allocates stack segments as required.
So basically a function would be instrumented as:
function foo
ss := TCB::__private_ss
if SP + stack_allocation_size > ss
call __morestack
// function code
What __morestack basically does is create a new stack segment with
some slack (using a platform neutral code), change the stack pointer
and continue run the function. So a stack frame for a function
that called __morestack is as:
foo
\_ __morestack
\_ // function code
And when the function code finished its execution (including all
possible function calls), it returns to __morestack so it restore
the old stack pointer and arguments.
Now, this is the most straightforward usage of __morestack. However
GO language allows a construct [3] that allows a function to register
a callback that is called at end of its scope that allows to 'recover'
from some runtime execution failure.
And this is the remaining GO tests that fails [4]. What it basically
does is run a set of tests that allocate some different structure and
try to access it in different invalid way to check if accessing a
know null pointer is caught by the runtime (GO adds null pointer checks
for some constructs).
9 func main() {
10 ok := true
11 for _, tt := range tests {
12 func() {
13 defer func() {
14 if err := recover(); err == nil {
15 println(tt.name, "did not panic")
16 ok = false
17 }
18 }()
19 tt.fn()
20 }()
21 }
22 if !ok {
23 println("BUG")
24 }
25 }
41 var tests = []struct{
42 name string
43 fn func()
44 }{
76 {"*bigstructp", func() { use(*bigstructp) }},
108 type BigStruct struct {
109 i int
110 j float64
111 k string
112 x [128<<20]byte
113 l []byte
114 }
So basically here it tries to allocate a very big structure (BigStruct with
about 128 MBs) on stack and since it does not have stack allocation it will
need to call __morestack.
Now, if have patient to read until now, the way GCCGO does that is by
throwing an exception to unwind the stack and to add some CFI directives in
both generated code and morestack to correct handling the unwinding.
So if GCC generates the unwind information for the objects and if __morestack
have the correct unwind information it should, so I presume my patch is
failing in either define the correct exception handler directives in
morestack.S or I am failing in generate the correct __morestack call.
The __morestack call is done at 'aarch64_expand_split_stack_prologue' in
my patch as:
--
+ /* Call __morestack with a non-standard call procedure: x10 will hold
+ the requested stack pointer and x11 the required stack size to be
+ copied. */
+ args_size = crtl->args.size >= 0 ? crtl->args.size : 0;
+ reg11 = gen_rtx_REG (DImode, R11_REGNUM);
+ emit_move_insn (reg11, GEN_INT (args_size));
+ use_reg (&call_fusage, reg11);
+
+ /* Set up a minimum frame pointer to call __morestack. The SP is not
+ save on x29 prior so in __morestack x29 points to the called SP. */
+ aarch64_pushwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, 16);
+
+ insn = emit_call_insn (gen_call (gen_rtx_MEM (DImode, morestack_ref),
+ const0_rtx, const0_rtx));
+ add_function_usage_to (insn, call_fusage);
+
+ reg29 = gen_rtx_REG (Pmode, R29_REGNUM);
+ cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg29, cfi_ops);
+ reg30 = gen_rtx_REG (Pmode, R30_REGNUM);
+ cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg30, cfi_ops);
+ insn = emit_insn (aarch64_gen_loadwb_pair (DImode, stack_pointer_rtx,
+ reg29, reg30, 16));
+
+ /* Reset the CFA to be SP + FRAME_SIZE. */
+ new_cfa = stack_pointer_rtx;
+ cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, new_cfa, cfi_ops);
+ REG_NOTES (insn) = cfi_ops;
+ RTX_FRAME_RELATED_P (insn) = 1;
+
+ emit_use (gen_rtx_REG (DImode, LR_REGNUM));
+
+ emit_insn (gen_split_stack_return ());
--
I do not add any stack frame allocation for the call, so it might a source
of issues.
Another issue might in morestack.S unwinding directives that is not following
the ABI correctly. I am revising it using GCC generated exceptions examples.
[1] https://git.linaro.org/toolchain/gcc.git/shortlog/refs/heads/linaro-local/a…
[2] https://git.linaro.org/toolchain/glibc.git/shortlog/refs/heads/azanella/spl…
[3] https://blog.golang.org/defer-panic-and-recover
[4] gcc/testsuite/go.test/test/nilptr2.go
=== This Week ===
Support for watchpoint un-alligned watchpoint addresses on LLDB
AArch64 [TCWG-367] [3/10]
-- Implementation successful without the need for instruction emulation.
-- Watchpoint hit detection fixed with ptrace reporting hit address correctly.
-- Used up watchpoint HitAddress function to return back the real
address to host.
Investigation of Nexus devices kernel issues.[TCWG-622] [2/10]
-- Figured out steps to build custom kernel for Neuxs7.
-- Kernel fails to boot though.
LLDB hardware watchpoint capability test and skip watchpoint testing
[TCWG-622] [2/10]
-- Submitted a patch to report back correct reason for watchpoint
creation failure.
Miscellaneous [3/10]
-- Meetings, emails, discussions etc.
-- A look into LLDB xfail decorators to fail on the basis of arm ABI.
-- Out of office on Thursday 9th June for visa interview.
=== Next Week ===
Support for watchpoint un-alligned watchpoint addresses on LLDB
AArch64 [TCWG-367]
-- Finish up work, test and submit for review upstream.
LLDB hardware watchpoint capability test and skip watchpoint testing [TCWG-622]
-- Patch review and further progress.
Miscellaneous
-- Testsuite Makefile.rulez update still pending, hopefully will be
done this week.
-- Xfail decorator updates to xfail based on ARM ABI.
=== This Week ===
Investigation of Nexus devices kernel issues.[TCWG-622] [6/10]
-- Figured out source trees for various android devices and checked
out kernel code.
-- Comparison between watchpoint implementation of different kerenel
sources supported by Nexus devices.
-- Figured a way to burn a custom rom with updated kernel on Nexus 7.
-- Figured Kconfig issue that was causing watchpoints not to be
detected on Nexus 7.
-- Watchpoint installation still doesnt work on Nexus 7.
LLDB buildbot updates and maintenance [TCWG-241] [2/10]
-- Upgraded cmake version after migration to new version.
-- Migrating local buildbot setup (builders and testers) to ubuntu xenial 16.04
-- Buildbot restart after power outage.
Miscellaneous [2/10]
-- Meetings, emails, discussions etc.
-- Out of office on Tuesday 31st 2016 for visa documents preparation.
=== Next Week ===
Support for watchpoint un-alligned watchpoint addresses on LLDB
AArch64 [TCWG-367]
-- Continue work on hardware watchpoint support.
Investigation of Nexus devices kernel issues.[TCWG-622]
-- Figured out steps to build custom kernel for Neuxs7.
LLDB hardware watchpoint capability test and skip watchpoint testing [TCWG-622]
-- Investigate possible ways to solve this issue.
Miscellaneous
-- Out of office on Thursday 9th June for visa interview.
* One day off on Thu [2/10]
# Progress #
* TCWG-639, non-8-byte-aligned watchpoint is missed on aarch64. [4/10].
It is caused by limited kernel BAS support. RedHat reports this
problem and posts a patch. Good to see they start to contribute to
aarch64 GDB, but I don't like the patch. Spend much time writing a
prototype to prove their patch is not perfect. We agree to fix this
problem by extend RSP fortunately.
* TCWG-333, [3/10] I follow the way how mips does, and fix some bugs.
* TCWG-547, TCWG-518, blocked in upstream review.
* Misc, meetings. [1/10]
# Plan #
* TCWG-333, TCWG-556.
* Ping TCWG-547, TCWG-518.
--
Yao
== Progress ==
o Extended Validation (4/10)
- Investigate and reported Glibc/libsanitizer issue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71445
- Looked at benchmark job issues
- Discuss benchmark/lava notification mechanism
o Linaro GCC (1/10)
- Merged FSF GCC 5.4 branch
o Upstream GCC (3/10)
- ARMv8.1 libatomic investigation:
seems to work fine when enabling multilib on AArch64
need to investigate completeness of the support.
o Misc (2/10)
* Various meetings and discussions.
* Backlfip patch reviews
== Plan ==
o Backport reviews and June snapshots
o Continue libatomic
o Benchmarking job fixes.
== This Week ==
* LTO (5/10)
a) committed r237207 to fix unresolved test-cases added in r236503 (1/10)
b) move increase_alignment pass to regular pass (3/10)
- Patch iterations based on suggestions from Honza and Richard
- Understanding lto streaming and decl merging code
c) TCWG-548 (1/10)
- WIP patch
* TCWG-319 (1/10)
- updated patch submitted upstream:
- changing vect_hw_misalign causes regression for vect-align-1.c
* Benchmarking (1/10)
- Benchmark successfully ran for linaro-gcc-5 submitted by Yvan
- Benchmark for fsf-gcc-6 failed both with prebuilt route and letting
job build the benchmark
* Misc (3/10)
- Meetings
- Travel to Mumbai for Schengen Visa appointment
== Next Week ==
- Try to get patch committed for increase_alignment
- TCWG-319: Look at vect-align-1.c regression
- TCWG-548: Complete implementation and validate patch.
- ipa-vrp: Experiment with Kugan's patch.
== Progress ==
* Validation
- disk full on one builder caused problem during the validation of
the long series of backports. We'll have to refine the cleanup policy.
- switch to docker on-hold until the builders are upgraded
* ABE
- looked a bit at the gdb/gdbserver issue
* Backports
- used a script to process the spreadsheet and submitted all the
backports that backflip was able to complete in batch mode (a lot!)
- fsf-5-branch merge review
* GCC
- regression reports on trunk, chasing problems that occur in corner
case configurations
- One of the Neon intrinsics tests I committed recently wrongly
executed v8 Neon code on v7 HW: I didn't catch this earlier because of
a bug in qemu, promptly fixed by Peter Maydell
- neon-testgen.ml removal delayed because other cleanup is desirable
before (fixing neon* effective-target tests on-going)
- started looking at PR 67591 (ARM v8 Thumb IT blocks deprecated)
* Support
- started looking at bug 2315
* Misc (conf-calls, meetings, emails, ...)
== Next ==
* Validation:
- patch reviews
- look at disk management/cleanup
* Backports
- check validation results
* GCC
- monitor trunk regressions
- fix "check_vect" guard in gcc.dg/vect tests
- fix neon* effective-target
- hopefully test-neongen.ml removal
- pr 67591
- advsimd tests
== Progress ==
TCWG-607 Initial ARM port for LLD committed upstream. Hello World on
an ARM with an ARM only gcc libc is possible, but not much else.
TCWG-611 Initial Thumb support sent for upstream review. Interworking
is possible at the BLX level but full interworking support
(veneers/thunks) isn't there yet. With this patch it will be possible
to do Hello World with a recent Linaro gcc release.
TCWG-634 llvm-mc putting out R_ARM_THM_PC24 for B<cond>.W instead of
R_ARM_THM_PC19. Simple fix now committed upstream.
== Plans ==
Interworking thunks for lld. The existing thunk design is very simple
and only supports one type of thunk per target. For interworking we
need at least two (ARM to Thumb) and (Thumb to ARM).
I think it might be possible to fit a basic implementation just good
enough for ARMv7a to be correct into the existing mechanism, however
just one more thunk type will need a more sophisticated design.
Current thought is to try and implement the basic design to learn a
bit more about the mechanism as it will hopefully not take too long.