Hello,
We have been using Linaro GCC 5.x[1] and valgrind.
When the optimizer is turned on valgrind complains about writes beyond the current stack pointer. With the optimizer off, the problem report goes away.
I have my own conclusion about what is going on but I won't bias you with it. Here are the facts:
All files and logs are attached as a 10K tar.gz, if it survives this mailing list.
test.c:

#include <stdio.h>

int main(int argc,char** argv)
{
    int i;

    for (i = 1; i < argc; i++) {
        printf("argument is %s\n", argv[i]);
    }
    return 0;
}

$ arm-linux-gnueabihf-gcc -march=armv7ve -marm -mfpu=neon \
    -mfloat-abi=hard -mcpu=cortex-a15 -O2 -g \
    -o test-fail test.c

$ valgrind --leak-resolution=high --track-origins=yes \
    --trace-children=yes --leak-check=full --error-limit=no \
    ./test-fail arg1 arg2 arg3
==20011== Memcheck, a memory error detector
==20011== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==20011== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==20011== Command: ./test-fail arg1 arg2 arg3
==20011==
==20011== Invalid write of size 4
==20011==    at 0x10300: main (test.c:4)
==20011==  Address 0xbdbfcb58 is on thread 1's stack
==20011==  24 bytes below stack pointer
==20011==
000102f8 <main>:
   102f8: e3500001  cmp   r0, #1
   102fc: da000014  ble   10354 <main+0x5c>
   10300: e16d41f8  strd  r4, [sp, #-24]!  ; 0xffffffe8
          ^^^^^^^^ Complaint is here
   10304: e1a05001  mov   r5, r1
   10308: e3a04001  mov   r4, #1
   1030c: e1cd60f8  strd  r6, [sp, #8]
   10310: e300748c  movw  r7, #1164  ; 0x48c
   10314: e1a06000  mov   r6, r0
   10318: e3407001  movt  r7, #1
   1031c: e58d8010  str   r8, [sp, #16]
   10320: e58de014  str   lr, [sp, #20]
   10324: e2844001  add   r4, r4, #1
   10328: e5b51004  ldr   r1, [r5, #4]!
   1032c: e1a00007  mov   r0, r7
   10330: ebffffe4  bl    102c8 printf@plt
   10334: e1560004  cmp   r6, r4
   10338: 1afffff9  bne   10324 <main+0x2c>
   1033c: e1cd40d0  ldrd  r4, [sp]
   10340: e3a00000  mov   r0, #0
   10344: e1cd60d8  ldrd  r6, [sp, #8]
   10348: e59d8010  ldr   r8, [sp, #16]
   1034c: e28dd014  add   sp, sp, #20
   10350: e49df004  pop   {pc}  ; (ldr pc, [sp], #4)
   10354: e3a00000  mov   r0, #0
   10358: e12fff1e  bx    lr
Without the optimizer, the code looks different and valgrind does not issue any errors.
000103d8 <main>:
   103d8: e52db008  str   fp, [sp, #-8]!
          ^^^^^^^ Valgrind does not complain about this
   103dc: e58de004  str   lr, [sp, #4]
   103e0: e28db004  add   fp, sp, #4
   103e4: e24dd010  sub   sp, sp, #16
   103e8: e50b0010  str   r0, [fp, #-16]
   103ec: e50b1014  str   r1, [fp, #-20]  ; 0xffffffec
   103f0: e3a03001  mov   r3, #1
   103f4: e50b3008  str   r3, [fp, #-8]
   103f8: ea00000b  b     1042c <main+0x54>
   103fc: e51b3008  ldr   r3, [fp, #-8]
   10400: e1a03103  lsl   r3, r3, #2
   10404: e51b2014  ldr   r2, [fp, #-20]  ; 0xffffffec
   10408: e0823003  add   r3, r2, r3
   1040c: e5933000  ldr   r3, [r3]
   10410: e1a01003  mov   r1, r3
   10414: e30004a4  movw  r0, #1188  ; 0x4a4
   10418: e3400001  movt  r0, #1
   1041c: ebffffa9  bl    102c8 printf@plt
   10420: e51b3008  ldr   r3, [fp, #-8]
   10424: e2833001  add   r3, r3, #1
   10428: e50b3008  str   r3, [fp, #-8]
   1042c: e51b2008  ldr   r2, [fp, #-8]
   10430: e51b3010  ldr   r3, [fp, #-16]
   10434: e1520003  cmp   r2, r3
   10438: baffffef  blt   103fc <main+0x24>
   1043c: e3a03000  mov   r3, #0
   10440: e1a00003  mov   r0, r3
   10444: e24bd004  sub   sp, fp, #4
   10448: e59db000  ldr   fp, [sp]
   1044c: e28dd004  add   sp, sp, #4
   10450: e49df004  pop   {pc}  ; (ldr pc, [sp], #4)
[1] 5.3-2016.02 for Yocto-project and cross-compile 5.2 on the ARM target "since Linaro hasn’t yet fixed building 5.3 from recipes yet." Both versions give the same results for this test program.
----------------
William A. Mills
Chief Technologist, Open Solutions, SDO
Texas Instruments, Inc.
20450 Century Blvd
Germantown MD 20878
240-643-0836
On Thu, Jun 9, 2016 at 2:22 PM, William Mills wmills@ti.com wrote:
When the optimizer is turned on valgrind complains about writes beyond the current stack pointer. With the optimizer off, the problem report goes away.
000102f8 <main>:
   102f8: e3500001  cmp   r0, #1
   102fc: da000014  ble   10354 <main+0x5c>
   10300: e16d41f8  strd  r4, [sp, #-24]!  ; 0xffffffe8
          ^^^^^^^^ Complaint is here
This optimization is called shrink-wrapping. It involves moving the function prologue/epilogue inside an outermost if statement, so that we can avoid allocating a stack frame when we don't need it. It can be disabled with -fno-shrink-wrap. Perhaps valgrind has special support to detect stack writes inside a prologue, and this support is failing when a function is shrink-wrapped because it can't identify where the prologue is.
Jim
This looks like a valgrind bug to me.
I can reproduce the problem with this simple program, which shows the issue at any optimisation level.
int main () { asm volatile ("" : : : "r4", "r5"); return 0; }
[on my raspberry pi, with the system gcc]
$ gcc test.c -mtune=cortex-a15 -marm
$ valgrind ./a.out
==15850== Memcheck, a memory error detector
==15850== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==15850== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==15850== Command: ./a.out
==15850==
==15850== Invalid write of size 4
==15850==    at 0x103E8: main (in /home/cgb23/a.out)
==15850==  Address 0xbdcf34a4 is just below the stack ptr.  To suppress, use: --workaround-gcc296-bugs=yes
...

000103e8 <main>:
   103e8: e16d40fc  strd  r4, [sp, #-12]!
   103ec: e58db008  str   fp, [sp, #8]
   103f0: e28db008  add   fp, sp, #8
   103f4: e3a03000  mov   r3, #0
   103f8: e1a00003  mov   r0, r3
   103fc: e24bd008  sub   sp, fp, #8
   10400: e1cd40d0  ldrd  r4, [sp]
   10404: e59db008  ldr   fp, [sp, #8]
   10408: e28dd00c  add   sp, sp, #12
   1040c: e12fff1e  bx    lr
Without looking at the valgrind sources, I'd guess that valgrind isn't handling the strd instruction correctly. "size 4" obviously isn't correct for the strd, and it also may not be accounting for the writeback of the stack pointer correctly. Looking at google, I found this bug report to the valgrind mailing list: https://sourceforge.net/p/valgrind/mailman/message/34632852/. It seems to relate to the same issue, but did not attract any attention. A brief look at the attached patch suggests that the problem is related to the way valgrind handles writes to the stack with negative offsets and writeback.
The suggested --workaround-gcc296-bugs=yes option does seem to suppress the error. Alternatively, since the compiler will only use STRD/LDRD in the prologue and epilogue when compiling for cores with an out-of-order microarchitecture, you can work around the problem by compiling with -mcpu=cortex-a7, in which case it will use PUSH and POP instead.
linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain
On 06/09/2016 09:07 PM, Charles Baylis wrote:
Without looking at the valgrind sources, I'd guess that valgrind isn't handling the strd instruction correctly.
Yes, this was my conclusion as well.
"size 4" obviously isn't correct for the strd, and it also may not be accounting for the writeback of the stack pointer correctly. Looking at google, I found this bug report to the valgrind mailing list: https://sourceforge.net/p/valgrind/mailman/message/34632852/. It seems to relate to the same issue, but did not attract any attention. A brief look at the attached patch suggests that the problem is related to the way valgrind handles writes to the stack with negative offsets and writeback.
Thanks for the patch pointer. I looked at the patch. The special casing of -8 in the original code looks like a hack to me. The patch looks right to me. It just removes the special casing of -8 and does the same for all negative values. The comment is wrong. The logic is handling the [SP, #-k]! form (Note the -> ! <-). Negative values w/o the SP update would still generate an error.
Will the compiler ever generate:
    strd Rd, [SP, Rm]!
or
    strd Rd, [SP, Rm, LSL #k]!
where Rm is negative (or at all)?
Valgrind would currently not handle these cases at all.