Yao Qi wrote:
Hi, We are looking for possible improvements and optimizations to Thumb-2 code size. Currently, I am running some benchmarks with the compilation flags "-Os -march=armv7-a -mthumb", and hope to find something interesting that we can improve. Besides that, do you have any ideas on this topic, or any observations on Thumb-2 code whose size we could probably reduce?
Any thoughts on this are appreciated.
I found some new possible improvements. Your comments on them are welcome. See more details in https://wiki.linaro.org/YaoQi/Sandbox/Thumb2SizeOptimize
10. Replace multiple vldr by vldm

Observed in bezier01float/bez.o,

   8:   f100 0438   add.w   r4, r0, #56     ; 0x38
   c:   b085        sub     sp, #20
   e:   2600        movs    r6, #0
  10:   e03d        b.n     8e <interpolatePoints+0x8e>
  12:   e954 2302   ldrd    r2, r3, [r4, #-8]
  16:   2500        movs    r5, #0
  18:   ed14 ab0e   vldr    d10, [r4, #-56] ; 0xffffffc8   // <--
  1c:   ed14 bb0c   vldr    d11, [r4, #-48] ; 0xffffffd0   // <--
  20:   ed14 cb0a   vldr    d12, [r4, #-40] ; 0xffffffd8   // <--
  24:   ed14 db08   vldr    d13, [r4, #-32] ; 0xffffffe0   // <--
  28:   e9cd 2300   strd    r2, r3, [sp]
  2c:   ed14 eb06   vldr    d14, [r4, #-24] ; 0xffffffe8   // <--

The five marked vldr instructions load d10-d14 from consecutive addresses ([r4, #-56] through [r4, #-24]), so they can be replaced by one vldm.
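As a rough illustration of the check such a merge needs, the C sketch below tests whether a sequence of vldr loads uses the same base register, consecutive D registers, and offsets that step by 8 bytes. The struct vldr_insn type and both function names are hypothetical and are not part of GCC. Note that vldm takes no immediate offset, so the base register must already hold the address of the first slot (r4 - 56 in the listing above); the byte count here ignores any instruction needed to set that up.

```c
#include <stddef.h>

/* Hypothetical model of one "vldr dN, [rB, #off]" instruction. */
struct vldr_insn {
    int dreg;    /* destination D register number, e.g. 10 for d10 */
    int base;    /* base core register number, e.g. 4 for r4 */
    int offset;  /* byte offset from the base register */
};

/* Length of the longest prefix of insns[0..n) that loads consecutive
 * D registers from consecutive 8-byte slots off the same base register,
 * i.e. the run that a single vldm could cover. Returns 0 when n == 0. */
size_t vldm_run_length(const struct vldr_insn *insns, size_t n)
{
    if (n == 0)
        return 0;

    size_t len = 1;
    while (len < n
           && insns[len].base == insns[0].base
           && insns[len].dreg == insns[len - 1].dreg + 1
           && insns[len].offset == insns[len - 1].offset + 8)
        len++;
    return len;
}

/* Bytes saved by replacing a run of len 4-byte vldr encodings with one
 * 4-byte vldm. Assumption: no extra instruction is needed to form the
 * vldm base address. */
size_t vldm_bytes_saved(size_t len)
{
    return len > 1 ? (len - 1) * 4 : 0;
}
```

For the five loads in the listing (d10-d14 at offsets -56 to -24 from r4), vldm_run_length() reports a run of 5 and vldm_bytes_saved() reports 16 bytes, before accounting for any base-address setup.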
11. Replace str/ldr by memcpy

Observed in bezier01fixed/pointio.o:outputPoints()

00000000 <outputPoints>:
   0:   e92d 4ff0   stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   4:   4604        mov     r4, r0
   6:   b089        sub     sp, #36         ; 0x24
   8:   2600        movs    r6, #0
   a:   460f        mov     r7, r1
   c:   e025        b.n     5a <outputPoints+0x5a>
   e:   68e3        ldr     r3, [r4, #12]
  10:   2500        movs    r5, #0
  12:   e894 0e00   ldmia.w r4, {r9, sl, fp}
  16:   9303        str     r3, [sp, #12]
  18:   6923        ldr     r3, [r4, #16]
  1a:   9304        str     r3, [sp, #16]
  1c:   6963        ldr     r3, [r4, #20]
  1e:   9305        str     r3, [sp, #20]
  20:   69a3        ldr     r3, [r4, #24]
  22:   9306        str     r3, [sp, #24]
  24:   69e3        ldr     r3, [r4, #28]
  26:   9307        str     r3, [sp, #28]

Code size will be smaller if we replace these ldr/str pairs by a call to memcpy().
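To make the comparison concrete, here is a minimal C sketch of the two ways such a copy can be written at the source level. The struct point_record type is hypothetical (the real type in pointio.c is not shown here); it is sized at eight words only because the listing copies offsets 0 through 28. The five ldr/str pairs in the listing occupy 20 bytes. Whether a memcpy() call ends up smaller depends on the argument setup it needs, and GCC treats memcpy as a builtin, so it may still expand either form inline.

```c
#include <string.h>

/* Hypothetical 32-byte record standing in for the copied object. */
struct point_record {
    int field[8];
};

/* Struct assignment: the compiler decides how to expand the copy,
 * for example as a series of ldr/str pairs. */
void copy_by_assignment(struct point_record *dst,
                        const struct point_record *src)
{
    *dst = *src;
}

/* Explicit library call: one call site instead of per-word loads and
 * stores, if the compiler keeps it as a call. */
void copy_by_memcpy(struct point_record *dst,
                    const struct point_record *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Compiling both functions with "-Os -march=armv7-a -mthumb" and comparing the object sizes is one way to see which form the compiler picks for a given copy length.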
12. uxth/sxth

Observed in automotive/idctrn01/bmark.c

short unPack( unsigned char c )
{
    /* Only want lower four bit nibble */
    c = c & (unsigned char)0x0F ;

    if( c > 7 )
    {
        /* Negative nibble */
        return( ( short )( c - 16 ) ) ;
    }
    else
    {
        /* positive nibble */
        return( ( short )c ) ;
    }
}

GCC produces code like this,

00000024 <unPack>:
  24:   f000 000f   and.w   r0, r0, #15
  28:   2807        cmp     r0, #7
  2a:   d901        bls.n   30 <unPack+0xc>
  2c:   3810        subs    r0, #16
  2e:   b280        uxth    r0, r0      <--[1]
  30:   b200        sxth    r0, r0      <--[2]
  32:   4770        bx      lr

Are instructions [1] and [2] redundant? If they are, we can safely remove both.
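One way to probe the question is to model the listing in C and compare r0 with and without the two extensions for every possible input byte. This is only a sketch of the register values: uxth() and sxth() below are hand-written models of the instructions, and all function names are made up for illustration. It says nothing about what the compiler's own analysis can prove.

```c
#include <stdint.h>

/* Model of "uxth r0, r0": zero-extend the low 16 bits. */
static uint32_t uxth(uint32_t r)
{
    return r & 0xFFFFu;
}

/* Model of "sxth r0, r0": sign-extend the low 16 bits. */
static uint32_t sxth(uint32_t r)
{
    uint32_t low = r & 0xFFFFu;
    return (low & 0x8000u) ? (low | 0xFFFF0000u) : low;
}

/* r0 at "bx lr" when following the listing as generated. */
uint32_t unpack_with_extends(uint8_t c)
{
    uint32_t r0 = c & 15u;   /* and.w r0, r0, #15 */
    if (r0 > 7u) {           /* cmp r0, #7 ; bls.n skips this block */
        r0 -= 16u;           /* subs r0, #16 */
        r0 = uxth(r0);       /* [1] */
    }
    return sxth(r0);         /* [2] */
}

/* r0 at "bx lr" with instructions [1] and [2] deleted. */
uint32_t unpack_without_extends(uint8_t c)
{
    uint32_t r0 = c & 15u;
    if (r0 > 7u)
        r0 -= 16u;
    return r0;
}

/* Number of inputs in 0..255 for which the two versions differ. */
int count_differences(void)
{
    int diffs = 0;
    for (unsigned c = 0; c < 256; c++)
        if (unpack_with_extends((uint8_t)c)
            != unpack_without_extends((uint8_t)c))
            diffs++;
    return diffs;
}
```

In this model count_differences() returns 0: after the mask r0 is in 0..15, so the subtraction yields a value in -8..-1 that is already sign-extended, and the uxth/sxth pair leaves it unchanged.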