Hi Ramana, Ulrich. Could I have some help with an unexpected testsuite failure while backporting Carrot's adddi patch? testsuite/gcc.misc-tests/gcov-7.c builds and runs but aborts during leave() due to unexpected results.
The merge request is here: https://code.launchpad.net/~michaelh1/gcc-linaro/core-adddi/+merge/113111
The testsuite diff is here: http://ex.seabright.co.nz/build/gcc-linaro-4.7+bzr115001~michaelh1~core-addd...
The build tree is at: cbuild@tcpanda02.v:/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/build
The failing and working versions are on tcpanda02 as ~/gcov-7.exe and ~/gcov-7-ok.
Here's the details:
- The test is fine when built from the command line
- The test is fine on the hard float Precise build
- The failing binary works fine when run on Precise
- The disassembled body (not libraries) is identical modulo changes in addresses
- The fault goes away with a static linking via adding "--tool_opts '-static'"
- The fault persists with binutils 2.22
- The fault persists with the eglibc 2.15 loader
I assume the testsuite picks up a different libgcc and libgcov somehow which gives a different executable. It's strange that the static linked version is fine, and that the failing binary works fine on a different host.
Could you have a poke in the build tree?
-- Michael
Michael Hope michael.hope@linaro.org wrote:
> Here's the details:
> - The test is fine when built from the command line
This is weird in particular. It probably means that you built it in a way where it picks up system libgcc and/or libgcov instead of the versions just built in the compiler tree ...
Did you set GCC_EXEC_PREFIX?
> - The test is fine on the hard float Precise build
> - The failing binary works fine when run on Precise
> - The disassembled body (not libraries) is identical modulo changes in addresses
> - The fault goes away with a static linking via adding "--tool_opts '-static'"
> - The fault persists with binutils 2.22
> - The fault persists with the eglibc 2.15 loader
> I assume the testsuite picks up a different libgcc and libgcov somehow which gives a different executable. It's strange that the static linked version is fine, and that the failing binary works fine on a different host.
> Could you have a poke in the build tree?
Unfortunately ex.seabright.co.nz doesn't appear to be accessible at the moment.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 19 July 2012 04:31, Ulrich Weigand Ulrich.Weigand@de.ibm.com wrote:
> Michael Hope michael.hope@linaro.org wrote:
> > Here's the details:
> > - The test is fine when built from the command line
> This is weird in particular. It probably means that you built it in a way where it picks up system libgcc and/or libgcov instead of the versions just built in the compiler tree ...
> Did you set GCC_EXEC_PREFIX?
I used the spawn line from gcc.log which includes a -B.
I've looked further and I'm flummoxed. I ran the testsuite with --tool_opts='-save-temps -v'. gcov-7.o is identical to the command line version, as you'd expect. I took the collect2 line from the verbose log and ran that manually[1] and still got a test that passes. I ran strace to track which static libraries are being used and everything is under the build tree, except /usr/lib/crt{1,n,i}.o and libc_nonshared.a. I specifically checked libgcc.a and libgcov.a and both are being pulled from build/gcc.
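The strace check described above could be scripted roughly as follows. This is only a sketch: it assumes strace prints open()/openat() calls in its usual `open("path", flags) = fd` form, and the function name, the build root, and the sample log lines are all made up for illustration.

```python
import re

# Matches lines such as: open("/usr/lib/crt1.o", O_RDONLY) = 7
# Assumption: the usual strace text format for open/openat calls.
OPEN_RE = re.compile(r'open(?:at)?\((?:AT_FDCWD, )?"([^"]+)"[^)]*\)\s*=\s*(-?\d+)')

def opened_outside(strace_text, build_root, suffixes=(".a", ".o")):
    """List successfully opened objects/archives that are not under build_root."""
    outside = []
    for line in strace_text.splitlines():
        match = OPEN_RE.search(line)
        if not match:
            continue
        path, result = match.group(1), int(match.group(2))
        if result < 0:
            continue  # failed open (e.g. ENOENT), the file was not used
        if path.endswith(suffixes) and not path.startswith(build_root):
            outside.append(path)
    return outside

# Hypothetical strace excerpt, not taken from the real build.
sample_log = "\n".join([
    'open("/build/gcc/libgcov.a", O_RDONLY) = 5',
    'open("/build/gcc/libgcc.a", O_RDONLY) = 6',
    'open("/usr/lib/crt1.o", O_RDONLY) = 7',
    'open("/usr/lib/libfoo.a", O_RDONLY) = -1 ENOENT (No such file or directory)',
])
print(opened_outside(sample_log, "/build/"))
```

Anything the function reports is a candidate for a library picked up from the system rather than from the build tree.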
-- Michael
[1] the epic
COMPILER_PATH=/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/build/gcc/:/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/install/libexec/gcc/arm-linux-gnueabi/4.7.1/:/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/install/libexec/gcc/ \
LIBRARY_PATH=/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/build/gcc/:/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/install/lib/gcc/arm-linux-gnueabi/4.7.1/:/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/install/lib/gcc/:/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/install/lib/gcc/arm-linux-gnueabi/4.7.1/../../../:/lib/:/usr/lib/ \
COLLECT_GCC_OPTIONS="'-B' '/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/build/gcc/' '-save-temps' '-v' '-fprofile-arcs' '-ftest-coverage' '-o' './gcov-7.exe' '-march=armv7-a' '-mtune=cortex-a9' '-mfloat-abi=softfp' '-mfpu=neon' '-mthumb' '-mtls-dialect=gnu'" \
/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/build/gcc/collect2 \
  --eh-frame-hdr -dynamic-linker /lib/ld-linux.so.3 -X -m armelf_linux_eabi -nostdlib -o ./gcov-7.exe \
  /usr/lib/crt1.o /usr/lib/crti.o \
  /scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/build/gcc/crtbegin.o \
  -L/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/build/gcc \
  -L/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/install/lib/gcc/arm-linux-gnueabi/4.7.1 \
  -L/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/install/lib/gcc \
  -L/scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/install/lib/gcc/arm-linux-gnueabi/4.7.1/../../.. \
  -fix-cortex-a8 gcov-7.o -lgcov -lgcc -L/usr/lib/arm-linux-gnueabi \
  --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed \
  /scratch/cbuild/slave/slaves/tcpanda02/gcc-linaro-4.7+bzr115001~michaelh1~core-adddi/gcc/default/build/gcc/crtend.o \
  /usr/lib/crtn.o
Michael Hope michael.hope@linaro.org wrote on 19.07.2012 00:27:31:
> On 19 July 2012 04:31, Ulrich Weigand Ulrich.Weigand@de.ibm.com wrote:
> > Michael Hope michael.hope@linaro.org wrote:
> > > Here's the details:
> > > - The test is fine when built from the command line
> > This is weird in particular. It probably means that you built it in a way where it picks up system libgcc and/or libgcov instead of the versions just built in the compiler tree ...
> > Did you set GCC_EXEC_PREFIX?
> I used the spawn line from gcc.log which includes a -B.
> I've looked further and I'm flummoxed. I ran the testsuite with --tool_opts='-save-temps -v'. gcov-7.o is identical to the command line version, as you'd expect. I took the collect2 line from the verbose log and ran that manually[1] and still got a test that passes. I ran strace to track which static libraries are being used and everything is under the build tree, except /usr/lib/crt{1,n,i}.o and libc_nonshared.a. I specifically checked libgcc.a and libgcov.a and both are being pulled from build/gcc.
Not sure what the difference is, specifically. But as mentioned in our call today, during testsuite runs GCC_EXEC_PREFIX *is* set, and that may modify the behaviour in some ways ...
If you look into the gcc build directory under testsuite/gcc/site.exp, you'll find that it sets TEST_GCC_EXEC_PREFIX. When running GCC under the test suite, it will use this setting as the value of GCC_EXEC_PREFIX.
When running the compile on the command line, I'd recommend manually setting GCC_EXEC_PREFIX to the same value to avoid differences ...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Michael Hope michael.hope@linaro.org wrote:
> Hi Ramana, Ulrich. Could I have some help with an unexpected testsuite failure while backporting Carrot's adddi patch? testsuite/gcc.misc-tests/gcov-7.c builds and runs but aborts during leave() due to unexpected results.
> The failing and working versions are on tcpanda02 as ~/gcov-7.exe and ~/gcov-7-ok.
OK, so I understand why ~/gcov-7.exe aborts. What actually happens is:
- The program runs through its test successfully, until the final exit.
- Within exit, gcov_exit processing is done.
- Within the gcov_exit subroutine __gcov_rewrite, a *tail* call to fseek is done (which is translated into a branch).
- This branch is supposed to go to a PLT stub, but instead goes haywire.
- Execution gets rerouted to somewhere in the main code.
- This in turn leads it to *again* check its proper exit conditions.
- This test now fails, leading to the abort.
In the good case ~/gcov-7-ok, the fseek PLT is called correctly. This (good case) looks like:
00008f70 <__gcov_rewrite>:
    8f70: b538        push {r3, r4, r5, lr}
    8f72: 4b0a        ldr r3, [pc, #40] ; (8f9c <__gcov_rewrite+0x2c>)
    8f74: 447b        add r3, pc
    8f76: 699a        ldr r2, [r3, #24]
    8f78: 2a00        cmp r2, #0
    8f7a: dd0c        ble.n 8f96 <__gcov_rewrite+0x26>
    8f7c: 2400        movs r4, #0
    8f7e: f04f 35ff   mov.w r5, #4294967295
    8f82: 4621        mov r1, r4
    8f84: 6818        ldr r0, [r3, #0]
    8f86: 4622        mov r2, r4
    8f88: 605c        str r4, [r3, #4]
    8f8a: 609c        str r4, [r3, #8]
    8f8c: 619d        str r5, [r3, #24]
    8f8e: e8bd 4038   ldmia.w sp!, {r3, r4, r5, lr}
    8f92: f7ff bc1d   b.w 87d0 <_init+0xc4>
    8f96: f7ff ebd6   blx 8744 <_init+0x38>
00008718 <.plt>:
    8718: e52de004    push {lr} ; (str lr, [sp, #-4]!)
    871c: e59fe004    ldr lr, [pc, #4] ; 8728 <_init+0x1c>
    8720: e08fe00e    add lr, pc, lr
    8724: e5bef008    ldr pc, [lr, #8]!
    8728: 000099d8    ldrdeq r9, [r0], -r8
    [snip]
    87d0: 4778        bx pc
    87d2: 46c0        nop ; (mov r8, r8)
    87d4: e28fc600    add ip, pc, #0
    87d8: e28cca09    add ip, ip, #36864 ; 0x9000
    87dc: e5bcf964    ldr pc, [ip, #2404]! ; 0x964
Note how at 0x8f92 (Thumb mode!) we have a branch to 0x87d0, which is a Thumb stub that switches to ARM mode and calls into the main ARM PLT stub at 0x87d4.
In the bad case we have instead:
00008fdc <__gcov_rewrite>:
    8fdc: b538        push {r3, r4, r5, lr}
    8fde: 4b0a        ldr r3, [pc, #40] ; (9008 <__gcov_rewrite+0x2c>)
    8fe0: 447b        add r3, pc
    8fe2: 699a        ldr r2, [r3, #24]
    8fe4: 2a00        cmp r2, #0
    8fe6: dd0c        ble.n 9002 <__gcov_rewrite+0x26>
    8fe8: 2400        movs r4, #0
    8fea: f04f 35ff   mov.w r5, #4294967295
    8fee: 4621        mov r1, r4
    8ff0: 6818        ldr r0, [r3, #0]
    8ff2: 4622        mov r2, r4
    8ff4: 605c        str r4, [r3, #4]
    8ff6: 609c        str r4, [r3, #8]
    8ff8: 619d        str r5, [r3, #24]
    8ffa: e8bd 4038   ldmia.w sp!, {r3, r4, r5, lr}
    8ffe: f000 bec7   b.w 9d90 <__fstat+0x10>
    9002: f7ff ebd6   blx 87b0 <_init+0x38>
    9d90: f7fe bd56   b.w 8840 <_init+0xc8>
00008784 <.plt>:
    8784: e52de004    push {lr} ; (str lr, [sp, #-4]!)
    8788: e59fe004    ldr lr, [pc, #4] ; 8794 <_init+0x1c>
    878c: e08fe00e    add lr, pc, lr
    8790: e5bef008    ldr pc, [lr, #8]!
    [snip]
    883c: 4778        bx pc
    883e: 46c0        nop ; (mov r8, r8)
    8840: e28fc600    add ip, pc, #0
    8844: e28cca09    add ip, ip, #36864 ; 0x9000
    8848: e5bcf94c    ldr pc, [ip, #2380]! ; 0x94c
Here, the branch at 0x8ffe first goes to an extra stub at 0x9d90, which in turn branches directly to 0x8840 -- but this is the ARM version of the PLT stub, which now gets executed in Thumb mode!
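The branch targets in the two listings can be re-derived from the raw halfwords. The following is a minimal sketch, assuming the standard Thumb-2 B.W (T4) encoding; the function names are mine and nothing here is taken from binutils.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def thumb2_bw_target(addr, hw1, hw2):
    """Return the target of a Thumb-2 B.W (encoding T4) located at addr."""
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0xD000) != 0x9000:
        raise ValueError("not a B.W (T4) encoding")
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    i1 = 1 - (j1 ^ s)  # I1 = NOT(J1 XOR S)
    i2 = 1 - (j2 ^ s)  # I2 = NOT(J2 XOR S)
    imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    # In Thumb state the PC reads as the instruction address plus 4.
    return addr + 4 + sign_extend(imm, 25)

# Good case: direct branch to the Thumb entry of the PLT stub.
good = thumb2_bw_target(0x8F92, 0xF7FF, 0xBC1D)
# Bad case: branch to the extra stub, which then branches into the PLT.
stub = thumb2_bw_target(0x8FFE, 0xF000, 0xBEC7)
final = thumb2_bw_target(stub, 0xF7FE, 0xBD56)

print(hex(good), hex(stub), hex(final))
# The Thumb entry ("bx pc") of the bad-case stub sits at 0x883c, so a
# final target 4 bytes later lands on the ARM instructions instead.
print(final - 0x883C)
```

The decoded chain 0x8ffe, 0x9d90, 0x8840 matches the disassembly, and the 4-byte offset past 0x883c is exactly the skipped `bx pc; nop` pair.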
The reason for this extra stub turns out to be a Cortex-A8 chip erratum: note how the b.w is a Thumb-2 4-byte instruction where its first two bytes are in a different page than its last two?
Such instructions are apparently not executed correctly if the branch goes to a target in the first of those pages. The linker therefore has code to check for this case, and insert an extra branch stub to compensate for this problem. When adding this extra stub, in the special case that the target was a PLT stub, the code forgot to check whether we need to also compensate for Thumb mode.
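The condition described above can be written down as a small check. This is a sketch of the description in this mail only, assuming 4 KiB pages; the actual erratum and the linker's scan have further conditions that are not modelled here, and the function names are hypothetical.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

def spans_page_boundary(addr, length=4):
    """True if an instruction of `length` bytes at addr crosses a page boundary."""
    return addr // PAGE_SIZE != (addr + length - 1) // PAGE_SIZE

def hits_erratum_pattern(branch_addr, target):
    """True if a 4-byte branch straddles two pages and targets the first one."""
    first_page = branch_addr // PAGE_SIZE
    return spans_page_boundary(branch_addr) and target // PAGE_SIZE == first_page

# Bad case: b.w at 0x8ffe straddles 0x8000/0x9000 and targets the PLT in 0x8000.
print(hits_erratum_pattern(0x8FFE, 0x883C))
# Good case: b.w at 0x8f92 lies entirely within one page.
print(hits_erratum_pattern(0x8F92, 0x87D0))
```

Only the failing binary places the tail call at an address where the check fires, which is consistent with small layout changes (different libraries, static linking) making the fault appear or disappear.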
This bug actually was fixed by Richard Sandiford here: http://cygwin.com/ml/binutils/2011-04/msg00177.html
The fix got included into mainline here:
2011-05-06 Richard Sandiford richard.sandiford@linaro.org
* elf32-arm.c (cortex_a8_erratum_scan): If the stub is a Thumb branch to a PLT entry, redirect it to the PLT's Thumb entry point.
binutils 2.21 still has the bug, but binutils 2.22 is fixed.
Therefore I don't understand this:
> - The fault persists with binutils 2.22
Did you perform the final link step creating the gcov-7.exe with the linker from binutils 2.22 ?
Do you have a version of that executable built this way that still shows the problem? I'd like to have a look at that ...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
On 19 July 2012 06:29, Ulrich Weigand Ulrich.Weigand@de.ibm.com wrote:
> binutils 2.21 still has the bug, but binutils 2.22 is fixed.
Good, that's what I expected. Adding a --tool_opts '-Wl,-no-fix-cortex-a8' makes the test pass.
> Therefore I don't understand this:
> > - The fault persists with binutils 2.22
> Did you perform the final link step creating the gcov-7.exe with the linker from binutils 2.22?
I must have missed the collect-ld wrapper. Changing the absolute path to the new temporary binutils lets the test pass.
We have an explanation for the fault. I don't know why I can't reproduce it from the command line but will have to live with that.
-- Michael
linaro-toolchain@lists.linaro.org