On 12 December 2013 21:59, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Will Newton will.newton@linaro.org writes:
On 12 December 2013 21:02, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Hi,
Thanks for the response.
Will Newton will.newton@linaro.org writes:
On 12 December 2013 08:00, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Hi all,
I have a bit of a strange one. I'm not after a full solution, just any hints that quickly come to mind :)
After a few simple patches I have a build of mongodb for aarch64 (built with gcc-4.8). However, all of the test binaries that the build spits out immediately segfault. gdb-ing shows that they segfault inside this macro:
TSP_DECLARE(OwnedOstreamVector, threadOstreamCache);
This expands to:
    # define TSP_DECLARE(T,p) \
        extern __thread T* _ ## p; \
        template<> inline T* TSP<T>::get() const { return _ ## p; } \
        extern TSP<T> p;
And indeed, it's mongo::TSP<mongo::OwnedPointerVector<...> >::get() const that we're segfaulting in. This is the disassembly of this function (at -O0) with the faulting instruction marked:
       0x00000000004b4b6c <+0>:  stp  x29, x30, [sp,#-32]!
       0x00000000004b4b70 <+4>:  mov  x29, sp
       0x00000000004b4b74 <+8>:  str  x0, [x29,#16]
       0x00000000004b4b78 <+12>: adrp x0, 0x64c000
       0x00000000004b4b7c <+16>: ldr  x0, [x0,#776]
       0x00000000004b4b80 <+20>: nop
       0x00000000004b4b84 <+24>: nop
       0x00000000004b4b88 <+28>: mrs  x1, tpidr_el0
       0x00000000004b4b8c <+32>: add  x0, x1, x0
    => 0x00000000004b4b90 <+36>: ldr  x0, [x0]
       0x00000000004b4b94 <+40>: ldp  x29, x30, [sp],#32
       0x00000000004b4b98 <+44>: ret
And the registers:
    (gdb) info registers
    x0             0x7fb863fd70     548554407280
This value looks surprisingly large if it is an offset from TP (x1).
Yeah, it does a bit doesn't it.
    (gdb) p/x $x0 - $x1
    $9 = 0x648680
(not really a suspicious number)
I guess I don't understand the adrp code. My understanding is that:
0x00000000004b4b78 <+12>: adrp x0, 0x64c000
would result in 0x4b4000 + 0x64c000 in x0 and then
The disassembler may have done this for you; would 0x64c000 make more sense?
Yes, indeed.
0x00000000004b4b7c <+16>: ldr x0, [x0,#776]
reads from 0x4b4000 + 0x64c000 + 776 but
    (gdb) x 0x4b4000 + 0x64c000 + 776
    0xb00308: Cannot access memory at address 0xb00308
(I'm not sure if the disassembly for adrp has the immediate shifted or not, but anyway:
    (gdb) x 0x4b4000 + (0x64c000<<12) + 776
    0x4c4b4308: Cannot access memory at address 0x4c4b4308)
So I'm clearly missing something here...
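For anyone following the arithmetic, here is a minimal sketch of what adrp computes. It assumes, as suggested in the thread, that the disassembler prints the already-resolved page address rather than the raw immediate. The helper name adrp_result is made up for illustration; the PC and operand come from the disassembly in this thread.

```python
# Sketch of AArch64 ADRP semantics. adrp_result is a hypothetical helper,
# not something taken from gdb or binutils.
PAGE_MASK = ~0xFFF

def adrp_result(pc, imm21):
    """Value ADRP writes: the 4 KiB page of PC plus a signed page count."""
    return (pc & PAGE_MASK) + (imm21 << 12)

pc = 0x4B4B78            # address of the adrp instruction
shown_target = 0x64C000  # operand as printed by the disassembler

# Assumption: the printed operand is the resolved page address, so the
# encoded immediate is the page delta between that target and the PC page.
imm21 = (shown_target - (pc & PAGE_MASK)) >> 12

page = adrp_result(pc, imm21)
slot = page + 776        # address read by "ldr x0, [x0,#776]"

print(hex(imm21), hex(page), hex(slot))
```

Under that assumption the ldr reads from 0x64c308, not from 0x4b4000 + 0x64c000 + 776, which would explain why the two gdb attempts hit unmapped memory.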
x1 0x7fb7ff76f0 548547819248
Have you tried printing the memory at this address? It looks like it is probably ok...
Yeah, it's fine.
I guess that means that the thread pointer is probably correct.
It's plausible, at least :)
    (gdb) x/20g $x1
    0x7fb7ff76f0: 0x0000007fb7ff7e28 0x0000000000000000
    0x7fb7ff7700: 0x0000000000000000 0x0000000000000000
    0x7fb7ff7710: 0x0000000000000000 0x0000000000000000
    0x7fb7ff7720: 0x0000000000000000 0x0000007fb7e5ce50
    0x7fb7ff7730: 0x0000007fb7e5fff8 0x0000000000000000
    0x7fb7ff7740: 0x0000007fb7e1bab8 0x0000007fb7e1b4b8
    0x7fb7ff7750: 0x0000007fb7e1c3b8 0x0000007fb7e5c550
    0x7fb7ff7760: 0x0000000000000000 0x0000000000000000
    0x7fb7ff7770: 0x0000000000000000 0x0000000000000000
    0x7fb7ff7780: 0x0000000000000000 0x0000000000000000
I guess I see three things that could be wrong:
- The operand to "adrp x0, 0x64c000"[1]
- The operand to "ldr x0, [x0,#776]"
Is there a dynamic reloc for this GOT slot?
How would I tell? :)
Generally, the TLS code loads the TP and then loads an offset from the GOT. The dynamic linker fixes up that GOT slot based on a dynamic relocation, which should reference the correct symbol.
I would guess that 0x64c000 is the base of the GOT and 776 is the offset into it (but I could be wrong). objdump -h will give you the layout of the sections, objdump -R will dump the relocations.
So I get this:
    $ objdump -h build/linux2/normal/mongo/base/counter_test | grep got --context=2
     23 .dynamic      00000220  000000000064b160  000000000064b160  0023b160  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     24 .got          00001c78  000000000064b380  000000000064b380  0023b380  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     25 .data         00000130  000000000064d000  000000000064d000  0023d000  2**4
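A quick sanity check on that layout, as a rough sketch: the VMA and size values are copied from the objdump -h output, and section_of is a hypothetical helper written only for this check.

```python
# Section layout copied from the objdump -h output: name -> (VMA, size).
sections = {
    ".dynamic": (0x64B160, 0x220),
    ".got": (0x64B380, 0x1C78),
    ".data": (0x64D000, 0x130),
}

def section_of(addr):
    """Return the name of the listed section containing addr, or None."""
    for name, (vma, size) in sections.items():
        if vma <= addr < vma + size:
            return name
    return None

got_vma, got_size = sections[".got"]
got_end = got_vma + got_size

print(hex(got_end))                # first address past .got
print(section_of(0x64C000 + 776))  # slot read by the ldr
```

So 0x64c000 looks like a page inside .got rather than its base (the section starts at 0x64b380), and the slot at 0x64c000 + 776 does fall within .got.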
And objdump -C -R gives this: http://paste.ubuntu.com/6563640/
This would seem to be the relevant entry:
000000000064ccb8 R_AARCH64_TLS_TPREL64 mongo::_threadOstreamCache
But I don't know what the offset means here and how it relates to the 776 in "ldr x0, [x0,#776]". 0x64c000 + 776 is 0x64c308 which is
000000000064c308 R_AARCH64_GLOB_DAT vtable for boost::program_options::typed_value<unsigned int, char>
This looks wrong.
which is just random, but I don't know if that's a valid thing to be looking at :-) That said, if we examine the memory at 0x64ccb8 and interpret it as an offset against tpidr_el0 things *seem* to make sense:
    (gdb) x 0x64ccb8
    0x64ccb8: 0x00000010
    (gdb) x/g $x1 + 0x10
    0x7fb7ff7700: 0x0000000000000000
The correct value for this tls pointer at this point in time _is_ in fact NULL, but obviously this could happen just by chance :-)
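Putting the numbers from this thread side by side, as a rough sketch rather than a diagnosis: all addresses are taken from the gdb and objdump output in this thread, and the variable names are illustrative. It assumes the full 64-bit contents of the slot at 0x64ccb8 are 0x10, since gdb only showed the low word.

```python
PAGE = 0x64C000            # page produced by the adrp
slot_used = PAGE + 776     # GOT slot the ldr actually reads
slot_tprel = 0x64CCB8      # slot with R_AARCH64_TLS_TPREL64 for mongo::_threadOstreamCache

# Immediate the ldr would need in order to hit the TPREL slot from this page.
expected_imm = slot_tprel - PAGE

tp = 0x7FB7FF76F0          # x1, i.e. tpidr_el0
x0_at_fault = 0x7FB863FD70 # x0 at the faulting "ldr x0, [x0]"

# x0 = tp + (value loaded from slot_used), so this recovers that value.
loaded = x0_at_fault - tp

tprel = 0x10               # assumed 64-bit contents of the slot at 0x64ccb8
expected_addr = tp + tprel # where the TLS variable would live if tprel is right

print(hex(slot_used), expected_imm, hex(loaded), hex(expected_addr))
```

If the relocation listing is right, the load would need an offset of 3256 rather than 776 to reach the TPREL slot. The value that was actually loaded, 0x648680, looks like an address inside the binary image rather than a small TP-relative offset, which fits with the slot at 0x64c308 holding a GLOB_DAT entry.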
Still, looks a bit like a toolchain bug to me. This is with g++ 4.8 from trusty fwiw.
I would be inclined to agree. Is there a simple way to reproduce the build?
(although I don't think I will have time to look at it until the new year)