Hi all,
I have a bit of a strange one. I'm not after a full solution, just any hints that quickly come to mind :)
After a few simple patches I have a build of mongodb for aarch64 (built with gcc-4.8). However, all of the test binaries that the build spits out immediately segfault. gdb-ing shows that they segfault inside this macro:
TSP_DECLARE(OwnedOstreamVector, threadOstreamCache);
This expands to:
# define TSP_DECLARE(T,p) \
    extern __thread T* _ ## p; \
    template<> inline T* TSP<T>::get() const { return _ ## p; } \
    extern TSP<T> p;
And indeed, it's mongo::TSP<mongo::OwnedPointerVector<...> >::get() const that we're segfaulting in. This is the disassembly of this function (at -O0) with the faulting instruction marked:
   0x00000000004b4b6c <+0>:  stp x29, x30, [sp,#-32]!
   0x00000000004b4b70 <+4>:  mov x29, sp
   0x00000000004b4b74 <+8>:  str x0, [x29,#16]
   0x00000000004b4b78 <+12>: adrp x0, 0x64c000
   0x00000000004b4b7c <+16>: ldr x0, [x0,#776]
   0x00000000004b4b80 <+20>: nop
   0x00000000004b4b84 <+24>: nop
   0x00000000004b4b88 <+28>: mrs x1, tpidr_el0
   0x00000000004b4b8c <+32>: add x0, x1, x0
=> 0x00000000004b4b90 <+36>: ldr x0, [x0]
   0x00000000004b4b94 <+40>: ldp x29, x30, [sp],#32
   0x00000000004b4b98 <+44>: ret
And the registers:
(gdb) info registers
x0    0x7fb863fd70        548554407280
x1    0x7fb7ff76f0        548547819248
x2    0x0                 0
x3    0x7fb7fc11b8        548547596728
x4    0x1                 1
x5    0x0                 0
x6    0x50                80
x7    0x0                 0
x8    0x0                 0
x9    0x6165727473676f4c  7018141438804717388
x10   0x0                 0
x11   0x0                 0
x12   0x2                 2
x13   0x10                16
x14   0x0                 0
x15   0x7fb7e5e590        548546143632
x16   0x64b3d8            6599640
x17   0x7fb7f667d0        548547225552
x18   0x7fffffdab0        549755804336
x19   0x7fffffed50        549755809104
x20   0xb                 11
x21   0xb                 11
x22   0x6500b0            6619312
x23   0x650070            6619248
x24   0x7fffffff          2147483647
x25   0x64db40            6609728
x26   0x7fffffeda0        549755809184
x27   0x653d00            6634752
x28   0x7fffffe750        549755807568
x29   0x7fffffe4d0        549755806928
x30   0x4b4ed4            4935380
sp    0x7fffffe4d0        0x7fffffe4d0
pc    0x4b4b90            0x4b4b90 <mongo::TSP<mongo::OwnedPointerVector<std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> > > >::get() const+36>
cpsr  0x20000000          536870912
fpsr  0x0                 0
fpcr  0x0                 0
If I recompile this object file without -fPIC, it works.
I guess I see three things that could be wrong:
1) The operand to "adrp x0, 0x64c000"[1]
2) The operand to "ldr x0, [x0,#776]"
3) The value of tpidr_el0
Oh, and I guess:
4) The setup of tls has gone wrong and the address in x0 _ought_ to be accessible but isn't for some reason.
Any hints on which of these seems most likely to be the culprit?
Cheers, mwh
[1] FWIW, objdump reports 0x64c000 as "_GLOBAL_OFFSET_TABLE_+0x2d0" (not sure why that doesn't show up in gdb's disassembly).
On 12 December 2013 08:00, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
(gdb) info registers x0 0x7fb863fd70 548554407280
This value looks surprisingly large if it is an offset from TP (x1).
x1 0x7fb7ff76f0 548547819248
Have you tried printing the memory at this address? It looks like it is probably ok...
- The operand to "adrp x0, 0x64c000"[1]
- The operand to "ldr x0, [x0,#776]"
Is there a dynamic reloc for this GOT slot?
Hi,
Thanks for the response.
Will Newton will.newton@linaro.org writes:
On 12 December 2013 08:00, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
(gdb) info registers x0 0x7fb863fd70 548554407280
This value looks surprisingly large if it is an offset from TP (x1).
Yeah, it does a bit doesn't it.
(gdb) p/x $x0 - $x1
$9 = 0x648680
(not really a suspicious number)
I guess I don't understand the adrp code. My understanding is that:
0x00000000004b4b78 <+12>: adrp x0, 0x64c000
would result in 0x4b4000 + 0x64c000 in x0 and then
0x00000000004b4b7c <+16>: ldr x0, [x0,#776]
reads from 0x4b4000 + 0x64c000 + 776 but
(gdb) x 0x4b4000 + 0x64c000 + 776
0xb00308: Cannot access memory at address 0xb00308
(I'm not sure if the disassembly for adrp has the immediate shifted or not, but anyway:
(gdb) x 0x4b4000 + (0x64c000<<12) + 776
0x4c4b4308: Cannot access memory at address 0x4c4b4308)
So I'm clearly missing something here...
x1 0x7fb7ff76f0 548547819248
Have you tried printing the memory at this address? It looks like it is probably ok...
Yeah, it's fine.
(gdb) x/20g $x1
0x7fb7ff76f0: 0x0000007fb7ff7e28 0x0000000000000000
0x7fb7ff7700: 0x0000000000000000 0x0000000000000000
0x7fb7ff7710: 0x0000000000000000 0x0000000000000000
0x7fb7ff7720: 0x0000000000000000 0x0000007fb7e5ce50
0x7fb7ff7730: 0x0000007fb7e5fff8 0x0000000000000000
0x7fb7ff7740: 0x0000007fb7e1bab8 0x0000007fb7e1b4b8
0x7fb7ff7750: 0x0000007fb7e1c3b8 0x0000007fb7e5c550
0x7fb7ff7760: 0x0000000000000000 0x0000000000000000
0x7fb7ff7770: 0x0000000000000000 0x0000000000000000
0x7fb7ff7780: 0x0000000000000000 0x0000000000000000
The end of /proc/$pid/maps looks like this:
7fb7fd3000-7fb7fee000 r-xp 00000000 08:01 4330216 /lib/aarch64-linux-gnu/ld-2.17.so
7fb7ff3000-7fb7ffc000 rwxp 00000000 00:00 0
7fb7ffc000-7fb7ffe000 r-xp 00000000 00:00 0 [vdso]
7fb7ffe000-7fb7fff000 r-xp 0001b000 08:01 4330216 /lib/aarch64-linux-gnu/ld-2.17.so
7fb7fff000-7fb8001000 rwxp 0001c000 08:01 4330216 /lib/aarch64-linux-gnu/ld-2.17.so
7ffffdf000-8000000000 rwxp 00000000 00:00 0 [stack]
So $x1 is within a random 36k map and $x0 is off in la la land between a bit of ld-2.17.so and the stack.
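The check against the maps output can be sketched in Python. This is only an illustration: MAPS is the excerpt quoted in this message (the end of the file, not all of it), and find_mapping is a hypothetical helper name, not part of gdb or any library.

```python
# Excerpt of /proc/$pid/maps as quoted in this message (not the full file).
MAPS = """\
7fb7fd3000-7fb7fee000 r-xp 00000000 08:01 4330216 /lib/aarch64-linux-gnu/ld-2.17.so
7fb7ff3000-7fb7ffc000 rwxp 00000000 00:00 0
7fb7ffc000-7fb7ffe000 r-xp 00000000 00:00 0 [vdso]
7fb7ffe000-7fb7fff000 r-xp 0001b000 08:01 4330216 /lib/aarch64-linux-gnu/ld-2.17.so
7fb7fff000-7fb8001000 rwxp 0001c000 08:01 4330216 /lib/aarch64-linux-gnu/ld-2.17.so
7ffffdf000-8000000000 rwxp 00000000 00:00 0 [stack]
"""


def find_mapping(maps_text, addr):
    """Return the maps line whose [start, end) range contains addr, or None."""
    for line in maps_text.splitlines():
        start, end = (int(x, 16) for x in line.split()[0].split("-"))
        if start <= addr < end:
            return line
    return None


X1 = 0x7fb7ff76f0  # tpidr_el0, from "info registers"
X0 = 0x7fb863fd70  # address used by the faulting load

print(find_mapping(MAPS, X1))  # the anonymous rwxp mapping
print(find_mapping(MAPS, X0))  # None: no quoted mapping covers this address
```

The anonymous mapping that holds $x1 spans 0x9000 bytes, which is where the 36k figure comes from.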
- The operand to "adrp x0, 0x64c000"[1]
- The operand to "ldr x0, [x0,#776]"
Is there a dynamic reloc for this GOT slot?
How would I tell? :)
Cheers, mwh
Michael Hudson-Doyle michael.hudson@linaro.org writes:
I guess I don't understand the adrp code. My understanding is that:
0x00000000004b4b78 <+12>: adrp x0, 0x64c000
would result in 0x4b4000 + 0x64c000 in x0 and then
0x00000000004b4b7c <+16>: ldr x0, [x0,#776]
reads from 0x4b4000 + 0x64c000 + 776 but
(gdb) x 0x4b4000 + 0x64c000 + 776 0xb00308: Cannot access memory at address 0xb00308
(I'm not sure if the disassembly for adrp has the immediate shifted or not, but anyway:
Oh, I see the disassembly calculates the address...
(gdb) x/g 0x64c000 + 776
0x64c308: 0x0000000000648680
(gdb) p *((long long*)(0x64c000 + 776)) == ($x0 - $x1)
$3 = true
So that bit makes sense now.
Cheers, mwh
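For reference, the ADRP arithmetic behind the confusion can be written out. This is a sketch based on the architectural definition of ADRP (the 4 KiB page of the PC plus an immediate counted in pages); the adrp function name and the derived immediate are illustrative and do not appear in the thread. The point is that gdb prints the already-resolved target page, so nothing further should be added to 0x64c000.

```python
PAGE_MASK = 0xFFF


def adrp(pc, imm_pages):
    """Model ADRP: clear the low 12 bits of pc, then add imm_pages 4 KiB pages."""
    return (pc & ~PAGE_MASK) + (imm_pages << 12)


pc = 0x4B4B78      # address of the adrp instruction in the disassembly
target = 0x64C000  # operand as printed by gdb (already resolved)

# Immediate that would have to be encoded to reach the printed target
# (derived here for illustration, not taken from the thread).
imm_pages = (target - (pc & ~PAGE_MASK)) >> 12
assert adrp(pc, imm_pages) == target

got_slot = target + 776  # address read by "ldr x0, [x0,#776]"
print(hex(imm_pages), hex(got_slot))  # 0x198 0x64c308
```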
On 12 December 2013 21:02, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
I guess I don't understand the adrp code. My understanding is that:
0x00000000004b4b78 <+12>: adrp x0, 0x64c000
would result in 0x4b4000 + 0x64c000 in x0 and then
The disassembler may have done this for you, would 0x64c000 make more sense?
x1 0x7fb7ff76f0 548547819248
Have you tried printing the memory at this address? It looks like it is probably ok...
Yeah, it's fine.
I guess that means that the thread pointer is probably correct.
Is there a dynamic reloc for this GOT slot?
How would I tell? :)
Generally the TLS code will load the TP then load an offset from the GOT that the dynamic linker has fixed up based on a dynamic relocation which should reference the correct symbol etc.
I would guess that 0x64c000 is the base of the GOT and 776 is the offset into it (but I could be wrong). objdump -h will give you the layout of the sections, objdump -R will dump the relocations.
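The access sequence described here can be modelled in a few lines. This is a simplified sketch, not the dynamic linker's actual logic: got is a plain dictionary standing in for GOT memory, and tls_address is a hypothetical name. The only values used are ones already shown in the thread: the thread pointer from x1 and the 8-byte value gdb read at 0x64c308.

```python
def tls_address(tp, got, slot):
    """TLS access as described above: thread pointer plus the offset held in a GOT slot."""
    return tp + got[slot]


tp = 0x7FB7FF76F0           # x1, read via "mrs x1, tpidr_el0"
got = {0x64C308: 0x648680}  # value gdb printed for the slot the code loads from

addr = tls_address(tp, got, 0x64C308)
print(hex(addr))  # 0x7fb863fd70, the value of x0 at the faulting "ldr x0, [x0]"
```

So the generated code is doing what the model says; the open question is whether the slot it reads holds a TLS offset at all, which is what the relocation dump should show.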
Will Newton will.newton@linaro.org writes:
The disassembler may have done this for you, would 0x64c000 make more sense?
Yes, indeed.
I guess that means that the thread pointer is probably correct.
It's plausible, at least :)
Generally the TLS code will load the TP then load an offset from the GOT that the dynamic linker has fixed up based on a dynamic relocation which should reference the correct symbol etc.
I would guess that 0x64c000 is the base of the GOT and 776 is the offset into it (but I could be wrong). objdump -h will give you the layout of the sections, objdump -R will dump the relocations.
So I get this:
$ objdump -h build/linux2/normal/mongo/base/counter_test | grep got --context=2
 23 .dynamic 00000220 000000000064b160 000000000064b160 0023b160 2**3
             CONTENTS, ALLOC, LOAD, DATA
 24 .got     00001c78 000000000064b380 000000000064b380 0023b380 2**3
             CONTENTS, ALLOC, LOAD, DATA
 25 .data    00000130 000000000064d000 000000000064d000 0023d000 2**4
And objdump -C -R gives this: http://paste.ubuntu.com/6563640/
This would seem to be the relevant entry:
000000000064ccb8 R_AARCH64_TLS_TPREL64 mongo::_threadOstreamCache
But I don't know what the offset means here and how it relates to the 776 in "ldr x0, [x0,#776]". 0x64c000 + 776 is 0x64c308 which is
000000000064c308 R_AARCH64_GLOB_DAT vtable for boost::program_options::typed_value<unsigned int, char>
which is just random, but I don't know if that's a valid thing to be looking at :-) That said, if we examine the memory at 0x64ccb8 and interpret it as an offset against tpidr_el0 things *seem* to make sense:
(gdb) x 0x64ccb8
0x64ccb8: 0x00000010
(gdb) x/g $x1 + 0x10
0x7fb7ff7700: 0x0000000000000000
The correct value for this tls pointer at this point in time _is_ in fact NULL, but obviously this could happen just by chance :-)
Still, looks a bit like a toolchain bug to me. This is with g++ 4.8 from trusty fwiw.
Cheers, mwh
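The comparison made by hand in this message can be sketched as follows. The two relocation lines and the .got bounds are the ones quoted above; parse_relocs is a hypothetical helper that assumes the three-column OFFSET TYPE VALUE layout of the objdump -R listing.

```python
RELOCS = """\
000000000064ccb8 R_AARCH64_TLS_TPREL64 mongo::_threadOstreamCache
000000000064c308 R_AARCH64_GLOB_DAT vtable for boost::program_options::typed_value<unsigned int, char>
"""


def parse_relocs(text):
    """Map slot address -> (relocation type, symbol) for each relocation line."""
    relocs = {}
    for line in text.splitlines():
        offset, rtype, symbol = line.split(None, 2)
        relocs[int(offset, 16)] = (rtype, symbol)
    return relocs


relocs = parse_relocs(RELOCS)
loaded_slot = 0x64C000 + 776  # slot the generated code actually reads

# Slot that carries the TLS relocation for the variable being accessed.
tls_slot = next(addr for addr, (rtype, sym) in relocs.items()
                if rtype == "R_AARCH64_TLS_TPREL64"
                and sym == "mongo::_threadOstreamCache")

got_start, got_size = 0x64B380, 0x1C78  # .got address and size from objdump -h


def in_got(addr):
    """True if addr lies inside the .got section."""
    return got_start <= addr < got_start + got_size


print(relocs[loaded_slot][0])                 # R_AARCH64_GLOB_DAT, not a TLS relocation
print(in_got(loaded_slot), in_got(tls_slot))  # True True
print(hex(tls_slot - 0x64C000))               # 0xcb8: offset the ldr would need
print(hex(0x7FB7FF76F0 + 0x10))               # 0x7fb7ff7700: tp plus the value at tls_slot
```

Both slots sit inside .got, so the page computed by adrp looks fine; it is the #776 immediate (0x308 rather than 0xcb8) that selects the wrong entry.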
On 12 December 2013 21:59, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
But I don't know what the offset means here and how it relates to the 776 in "ldr x0, [x0,#776]". 0x64c000 + 776 is 0x64c308 which is
000000000064c308 R_AARCH64_GLOB_DAT vtable for boost::program_options::typed_value<unsigned int, char>
This looks wrong.
Still, looks a bit like a toolchain bug to me. This is with g++ 4.8 from trusty fwiw.
I would be inclined to agree. Is there a simple way to reproduce the build?
(although I don't think I will have time to look at it until the new year)
Will Newton will.newton@linaro.org writes:
This looks wrong.
Yeah, it does. Also poking around at 0x64c308 shows something that looks very much like a vtable for a class called typed_value...
I would be inclined to agree. Is there a simple way to reproduce the build?
Ha. No, I've only seen this when compiling all of mongodb, which takes a pretty long time on hw. I'll certainly let you know if I can come up with something smaller. I'll also try with 4.9.
(although I don't think I will have time to look at it until the new year)
No worries, a bug on launchpad.net/linaro-gcc is the right way to track this properly?
Cheers, mwh
On 12 December 2013 23:14, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Ha. No, I've only seen this when compiling all of mongodb, which takes a pretty long time on hw. I'll certainly let you know if I can come up with something smaller. I'll also try with 4.9.
Small is good, but we do have access to hardware so at least it won't be days waiting for the model. ;-)
(although I don't think I will have time to look at it until the new year)
No worries, a bug on launchpad.net/linaro-gcc is the right way to track this properly?
Yeah I think so, although I think it will actually be a binutils bug.
Will Newton will.newton@linaro.org writes:
On 12 December 2013 23:14, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Will Newton will.newton@linaro.org writes:
On 12 December 2013 21:59, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Will Newton will.newton@linaro.org writes:
[sniiiip]
I would guess that 0x64c000 is the base of the GOT and 776 is the offset into it (but I could be wrong). objdump -h will give you the layout of the sections, objdump -R will dump the relocations.
So I get this:
$ objdump -h build/linux2/normal/mongo/base/counter_test | grep got --context=2 23 .dynamic 00000220 000000000064b160 000000000064b160 0023b160 2**3 CONTENTS, ALLOC, LOAD, DATA 24 .got 00001c78 000000000064b380 000000000064b380 0023b380 2**3 CONTENTS, ALLOC, LOAD, DATA 25 .data 00000130 000000000064d000 000000000064d000 0023d000 2**4
And objdump -C -R gives this: http://paste.ubuntu.com/6563640/
This would seem to be the relevant entry:
000000000064ccb8 R_AARCH64_TLS_TPREL64 mongo::_threadOstreamCache
But I don't know what the offset means here and how it relates to the 776 in "ldr x0, [x0,#776]". 0x64c000 + 776 is 0x64c308 which is
000000000064c308 R_AARCH64_GLOB_DAT vtable for boost::program_options::typed_value<unsigned int, char>
This looks wrong.
Yeah, it does. Also poking around at 0x64c308 shows something that looks very much like a vtable for a class called typed_value...
which is just random, but I don't know if that's a valid thing to be looking at :-) That said, if we examine the memory at 0x64ccb8 and interpret it as an offset against tpidr_el0 things *seem* to make sense:
(gdb) x 0x64ccb8 0x64ccb8: 0x00000010 (gdb) x/g $x1 + 0x10 0x7fb7ff7700: 0x0000000000000000
The correct value for this tls pointer at this point in time _is_ in fact NULL, but obviously this could happen just by chance :-)
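The same check can be written down as plain arithmetic. This is only a sketch using the register values from the original post (x1 holds tpidr_el0, x0 is the faulting address) and the word gdb read at 0x64ccb8; nothing here comes from the toolchain itself.

```python
TPIDR_EL0 = 0x7FB7FF76F0    # x1 in the register dump (thread pointer)
FAULT_ADDR = 0x7FB863FD70   # x0 at the faulting "ldr x0, [x0]"
TPREL_VALUE = 0x10          # word gdb read at 0x64ccb8

# What the code added to the thread pointer before faulting.
bogus_offset = FAULT_ADDR - TPIDR_EL0

# Where the load would have gone had it used the relocated slot.
expected_addr = TPIDR_EL0 + TPREL_VALUE

print(f"offset actually used:  {bogus_offset:#x}")
print(f"offset from the reloc: {TPREL_VALUE:#x}")
print(f"expected TLS address:  {expected_addr:#x}")
```

The offset actually used is a large, address-like number rather than a small TLS offset, which is what one would expect if the code loaded a pointer from an unrelated GOT slot.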
Still, looks a bit like a toolchain bug to me. This is with g++ 4.8 from trusty fwiw.
I would be inclined to agree. Is there a simple way to reproduce the build?
Ha. No, I've only seen this when compiling all of mongodb, which takes a pretty long time on hw. I'll certainly let you know if I can come up with something smaller. I'll also try with 4.9.
Small is good, but we do have access to hardware so at least it won't be days waiting for the model. ;-)
(although I don't think I will have time to look at it until the new year)
No worries, a bug on launchpad.net/linaro-gcc is the right way to track this properly?
Yeah I think so, although I think it will actually be a binutils bug.
Aaah, you might be onto something there. I built myself a cross gcc-4.8 today and it appeared to compile things correctly (I didn't actually get to run it, but the objdump poking looked right) and I got a bit worried that this was all down to some cosmic ray / corruption when I first compiled it. But, the scripts I cargo culted just compile binutils from git tip, so if the bug is in binutils...
Cheers, mwh
Michael Hudson-Doyle michael.hudson@linaro.org writes:
Aaah, you might be onto something there. I built myself a cross gcc-4.8 today and it appeared to compile things correctly (I didn't actually get to run it, but the objdump poking looked right) and I got a bit worried that this was all down to some cosmic ray / corruption when I first compiled it. But, the scripts I cargo culted just compile binutils from git tip, so if the bug is in binutils...
So I still don't know what's going on, exactly, but I have a debug build of binutils now and some clues. It still only happens on real hardware, not cross compiling on my laptop, but I think I have an idea as to why. This might be complete crack, but anyway.
I think it's to do with the order of things within the GOT.
When I cross compile, sort the relocations by address, then count up the number of relocations of each type, it looks like this:
$ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
      4
    496 R_AARCH64_GLOB_DAT
      1 R_AARCH64_TLS_TPREL64
    103 R_AARCH64_GLOB_DAT
    305 R_AARCH64_JUMP_SLOT
     12 R_AARCH64_COPY
      1 RELOCATION
      2
In this case, the code and the relocation agree on where the thread local variable is.
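For anyone who wants the same run-length view without the shell pipeline, here is a rough Python equivalent. It is a sketch, not a drop-in replacement: it assumes each record looks like "<hex offset> <R_* type> <symbol>", and the sample lines are made up for illustration apart from the two entries quoted earlier in the thread.

```python
from itertools import groupby


def reloc_runs(objdump_lines):
    """Return [(count, type), ...] for runs of relocation types in address order."""
    records = []
    for line in objdump_lines:
        fields = line.split()
        # Keep only "<hex offset> <R_* type> <symbol>" records; skip headers.
        if len(fields) >= 2 and fields[1].startswith("R_"):
            records.append((int(fields[0], 16), fields[1]))
    records.sort()  # sort by address, like "LC_ALL=C sort" on fixed-width hex
    return [(len(list(group)), rtype)
            for rtype, group in groupby(records, key=lambda rec: rec[1])]


# Hypothetical sample: the JUMP_SLOT and trailing GLOB_DAT lines are invented.
sample = [
    "DYNAMIC RELOCATION RECORDS",
    "OFFSET           TYPE              VALUE",
    "000000000064ccb8 R_AARCH64_TLS_TPREL64  mongo::_threadOstreamCache",
    "000000000064b3a0 R_AARCH64_JUMP_SLOT  hypothetical_func_b",
    "000000000064c308 R_AARCH64_GLOB_DAT  vtable for boost::program_options::typed_value<unsigned int, char>",
    "000000000064b398 R_AARCH64_JUMP_SLOT  hypothetical_func_a",
    "000000000064ccc0 R_AARCH64_GLOB_DAT  hypothetical_data",
]

for count, rtype in reloc_runs(sample):
    print(f"{count:7d} {rtype}")
```

Grouping consecutive types after sorting by address is what makes the position of the JUMP_SLOT block relative to the TLS entry visible at a glance.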
When I compile natively, it looks like this:
(t-mwhudson)ubuntu@arm64:~/src/mongo$ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | uniq -c
      4
    295 R_AARCH64_JUMP_SLOT
    496 R_AARCH64_GLOB_DAT
      1 R_AARCH64_TLS_TPREL64
    104 R_AARCH64_GLOB_DAT
     12 R_AARCH64_COPY
      1 RELOCATION
      2
And the code and the relocation disagree on where the thread local variable is -- by 298 * sizeof(void*). Which is almost (but I admit, not exactly) the number of JUMP_SLOTs that are, in this case, before the TLS variable in the GOT. When I compiled in a different way, there were only 160 JUMP_SLOTs before the TLS reloc, and the code and relocation disagreed by 163 slots.
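One possible reading of the "almost, but not exactly" gap, which nobody in the thread confirms: in both builds the mismatch is the JUMP_SLOT count plus three, and three happens to be the number of reserved header entries that normally sit at the start of .got.plt on AArch64. The sketch below only checks that arithmetic; the reserved-slot count and the 8-byte slot size are assumptions.

```python
SLOT_SIZE = 8               # assumption: sizeof(void*) on aarch64
GOTPLT_RESERVED_SLOTS = 3   # assumption: reserved entries at the start of .got.plt

# (JUMP_SLOTs before the TLS reloc, observed mismatch in slots), both from the thread.
observations = [(295, 298), (160, 163)]

results = []
for jump_slots, mismatch_slots in observations:
    predicted = jump_slots + GOTPLT_RESERVED_SLOTS
    results.append(predicted == mismatch_slots)
    print(f"{jump_slots} JUMP_SLOTs + {GOTPLT_RESERVED_SLOTS} reserved = {predicted} slots "
          f"({predicted * SLOT_SIZE} bytes), observed {mismatch_slots}")
```

If that reading is right, the mismatch would be the size of everything placed ahead of the regular GOT entries, rather than entries being inserted late.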
So is it possible somehow that the GOT has these JUMP_SLOTs inserted into it after the relocation for the TLS has been written out? I don't really see how but maybe this rings a bell...
Cheers, mwh
On 16 Dec 2013 16:37, "Michael Hudson-Doyle" michael.hudson@linaro.org wrote:
So is it possible somehow that the GOT has these JUMP_SLOTs inserted into it after the relocation for the TLS has been written out?
I guess I really mean "between patching the code and writing the relocs out" here.
I don't really see how but maybe this rings a bell...
Cheers, mwh
On 16 December 2013 03:36, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
So is it possible somehow that the GOT has these JUMP_SLOTs inserted into it after the relocation for the TLS has been written out? I don't really see how but maybe this rings a bell...
Indeed it does. ;-)
A similar issue was caused by commit 692e2b8bcdd8325ebfbe1daace87100d53d15ad6 (which adds ifunc support to the aarch64 ld backend) but was intended to be fixed by the rework of the same code in 1419bbe5712af8a43f1d63feb32687649563426d. However I was never actually able to reproduce the failure case (I saw binaries that were broken so I know it could happen) so the fix was somewhat speculative. Hence I am very interested in finding a reproducible case where this GOT entry misordering happens!
Will Newton will.newton@linaro.org writes:
A similar issue was caused by commit 692e2b8bcdd8325ebfbe1daace87100d53d15ad6 (which adds ifunc support to the aarch64 ld backend) but was intended to be fixed by the rework of the same code in 1419bbe5712af8a43f1d63feb32687649563426d. However I was never actually able to reproduce the failure case (I saw binaries that were broken so I know it could happen) so the fix was somewhat speculative. Hence I am very interested in finding a reproducible case where this GOT entry misordering happens!
I'm possibly doing something wrong, but I've tried compiling the suspect binary with both binutils git tip and the commit before 692e2b8bc, and both had the problem. So I guess it's something else, or I wasn't testing what I thought I was testing.
Cheers, mwh
Michael Hudson-Doyle michael.hudson@linaro.org writes:
I'm possibly doing something wrong, but I've tried compiling the suspect binary with both binutils git tip and the commit before 692e2b8bc, and both had the problem. So I guess it's something else, or I wasn't testing what I thought I was testing.
Argh, I wasn't testing what I thought I was testing... trying again.
Cheers, mwh
Michael Hudson-Doyle michael.hudson@linaro.org writes:
Argh, I wasn't testing what I thought I was testing... trying again.
Ah... found it! This is the code that determines the offset to patch into the code (elfnn-aarch64.c line 3845):
  value = (symbol_got_offset (input_bfd, h, r_symndx)
           + globals->root.sgot->output_section->vma
           + globals->root.sgot->output_section->output_offset);
and this is the code that determines the offset as written into the relocation (elfnn-aarch64.c line 4248):
  off = symbol_got_offset (input_bfd, h, r_symndx);
  ...
  rela.r_offset = globals->root.sgot->output_section->vma +
    globals->root.sgot->output_offset + off;
Can you see the difference? The former is "root.sgot->output_section->output_offset", the latter is "root.sgot->output_offset".
This suggests the rather obvious attached patch. I haven't tested this exact patch, but it's an obvious translation from a patch to 692e2b8bcdd8325ebfbe1daace87100d53d15ad6^ which does work. I also haven't tested the second hunk at all, but it seems plausible...
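To make the difference between the two expressions concrete, here is a toy Python model. It is only a sketch: the Section class is a hypothetical stand-in for the three BFD section fields involved, and the numbers are illustrative (the .got VMA is the one objdump -h printed earlier in the thread, the 298-slot placement and the 0x10 GOT offset are made up for the example).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Section:
    """Toy stand-in for the three BFD section fields used in the quoted code."""
    vma: int = 0
    output_offset: int = 0
    output_section: Optional["Section"] = None


def code_value(sgot: Section, got_off: int) -> int:
    # Mirrors the expression near line 3845 (address patched into the code).
    out = sgot.output_section
    return got_off + out.vma + out.output_offset


def reloc_offset(sgot: Section, got_off: int) -> int:
    # Mirrors the expression near line 4248 (address written into the relocation).
    return sgot.output_section.vma + sgot.output_offset + got_off


# Illustrative values only, see the note above the block.
output_got = Section(vma=0x64B380)
sgot = Section(output_offset=298 * 8, output_section=output_got)
got_off = 0x10  # hypothetical result of symbol_got_offset()

mismatch = reloc_offset(sgot, got_off) - code_value(sgot, got_off)
print(f"code:  {code_value(sgot, got_off):#x}")
print(f"reloc: {reloc_offset(sgot, got_off):#x}")
print(f"mismatch: {mismatch} bytes = {mismatch // 8} slots")
```

In this model the two results differ by exactly sgot.output_offset, and they agree whenever that offset is zero, which would explain why a build that lays the GOT out differently shows no problem.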
Cheers, mwh
On 17 December 2013 07:53, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Can you see the difference? The former is "root.sgot->output_section->output_offset", the latter is "root.sgot->output_offset".
Yes, that does look a bit odd.
This suggests the rather obvious attached patch. I haven't tested this exact patch, but it's an obvious translation from a patch to 692e2b8bcdd8325ebfbe1daace87100d53d15ad6^ which does work. I also haven't tested the second hunk at all, but it seems plausible...
Thanks for your analysis, the fix does look plausible indeed. ;-)
Have you verified it fixes the problem you were seeing?
I'm about to disappear to sunnier climes for three weeks but I'll definitely look at it when I get back. I've added Marcus to CC in case he isn't reading this list.
Will Newton will.newton@linaro.org writes:
On 17 December 2013 07:53, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Can you see the difference? The former is "root.sgot->output_section->output_offset", the latter is "root.sgot->output_offset".
Yes, that does look a bit odd.
Yes. And one is the difference between the reloc and the code value and the other is zero...
Have you verified it fixes the problem you were seeing?
To be super correct, I have not verified that the patch I sent you, when applied to binutils tip, fixes the problem. But a patch that's basically the same when applied to a slightly random commit from June results in working binaries (and the unpatched version does not).
I'm about to disappear to sunnier climes
One advantage of the southern hemisphere: my climes are already sunny...
for three weeks but I'll definitely look at it when I get back. I've added Marcus to CC in case he isn't reading this list.
Cool. Would it be useful to report the bug in https://sourceware.org/bugzilla/ as well?
Cheers, mwh
+Ryan, +Kugan,
On 17 December 2013 08:45, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Cool. Would it be useful to report the bug in https://sourceware.org/bugzilla/ as well?
Yes please.
Ryan or Kugan can you look at fixing this please?
Thanks,
Matt
On 17/12/13 20:38, Matthew Gretton-Dann wrote:
Ryan or Kugan can you look at fixing this please?
OK, I will look at it.
Thanks, Kugan
Matthew Gretton-Dann matthew.gretton-dann@linaro.org writes:
On 17 December 2013 08:45, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Cool. Would it be useful to report the bug in https://sourceware.org/bugzilla/ as well?
Yes please.
https://sourceware.org/bugzilla/show_bug.cgi?id=16340
Cheers, mwh
linaro-toolchain@lists.linaro.org