* Goal
The goal of this work is to find Thumb-2 code size improvements on FSF GCC trunk.
* Methodology
** Build FSF GCC trunk with and without hardfp, run benchmarks including EEMBC, SPEC2000, and Dhrystone, and inspect the generated assembly for possible size improvements.
** Get input and suggestions from ARM experts.
** Search open PRs in GCC bugzilla.
* Results
Each item is tracked on Launchpad and is listed with the following elements:
** Cause: whether the cause of the problem is known or unknown
** Difficulty: estimated implementation difficulty
** Recommendation: Yao's recommendation for the next step on that bug
1. LP:633233 Push/pop low register rather than high register when keeping stack alignment
   As Richard E. pointed out, this was implemented in gcc-4.5 in 2009, but Yao still sees r8 being used on FSF GCC trunk.
   Cause: Possibly a regression, if the problem does not occur on gcc-4.5.
   Difficulty: Easy. A regression should not be hard to fix.
   Recommendation: Fix this regression, if it is one.
2. LP:633243 Improve regrename to make use of low registers
   Input was received from Bernd S. and Julian B. An initial implementation has been suggested by Bernd S.
   Cause: The current regrename pass in GCC treats high and low registers equally, even though many Thumb-2 instructions have 16-bit encodings only for low registers (r0-r7).
   Difficulty: Medium.
   Recommendation: Implement it as Bernd suggested, and benchmark to see how much code size improves.
3. LP:634682 Redundant uxth/sxth insns are generated
   Cause: Unknown.
   Difficulty: Unknown.
   Recommendation: No recommendation so far.
4. LP:634696 Function is not inlined properly with -Os
   In consumer/cjpeg/jmemmgr.c, GCC inlines out_of_memory() with -Os, which increases code size.
   Cause: Unknown.
   Difficulty: Unknown.
   Recommendation: Teach GCC to inline more conservatively when -Os is enabled.
5. GCC PR40730 LP:634731 Redundant memory load
6. LP:634738 Inefficient code to extract the least significant bits from an integer value
   GCC PR40697 covers this for Thumb-1. The same problem exists in Thumb-2.
   Cause: Unknown.
   Difficulty: Medium.
   Recommendation: Fix it in a similar way to GCC PR40697.
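As an illustration of the kind of source involved, the sketch below keeps the low bits of a value in two equivalent ways. The function names and the 23-bit width are assumptions chosen for illustration; they are not taken from the Launchpad entry. The sketch only shows that the two forms compute the same value; how many instructions GCC needs for either form is what the bug is about.

```c
#include <stdint.h>

/* Keep the low 23 bits by masking with a constant (hypothetical example). */
uint32_t low_bits_mask(uint32_t value)
{
    return value & UINT32_C(0x007FFFFF);
}

/* Same result using a shift pair, which needs no mask constant. */
uint32_t low_bits_shift(uint32_t value)
{
    return (uint32_t)(value << 9) >> 9;
}
```

Comparing the assembly generated for both functions with -Os is one way to check whether the compiler picks the shorter sequence regardless of how the source is written.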
7. LP:634891 Replace load/store by memcpy more aggressively
   Difficulty: Should be easy.
   Recommendation: A possible fix is to reduce the threshold value when -Os is enabled.
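A minimal sketch of the pattern, assuming the load/store sequences in question come from block copies such as structure assignment. The struct layout and function names are hypothetical, and the size at which GCC switches from inline loads and stores to a memcpy call is not stated here.

```c
#include <string.h>

/* Hypothetical 64-byte structure, large enough that an inline copy
   takes a noticeable number of load/store instructions. */
struct packet {
    int header[4];
    int payload[12];
};

/* Structure assignment: the compiler chooses between an inline
   load/store sequence and a call to memcpy. */
void copy_by_assignment(struct packet *dst, const struct packet *src)
{
    *dst = *src;
}

/* Explicit library call, equivalent in effect to the assignment above. */
void copy_by_memcpy(struct packet *dst, const struct packet *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Both functions have the same effect, so with -Os the compiler is free to pick whichever form is smaller.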
8. LP:637220 Allocate local variables with fewer instructions
   GCC PR40657 is about this kind of problem and has been fixed. A similar problem exists on GCC with hardfp.
   Cause: Unknown.
   Difficulty: Unknown.
   Recommendation: No recommendation so far.
9. GCC PR 43721 Failure to optimize (a/b) and (a%b) into a single __aeabi_idivmod call
   Difficulty: Medium or easy.
   Recommendation: None so far.
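A minimal sketch of the source pattern: the quotient and the remainder of the same operands are both needed. On ARM EABI targets the run-time helper __aeabi_idivmod returns both results from one call, so a single call could serve both expressions. The struct and function names below are hypothetical; the second function uses the standard div() only as a portable point of comparison.

```c
#include <stdlib.h>

/* Hypothetical result pair, for illustration only. */
struct quot_rem {
    int quot;
    int rem;
};

/* Both a/b and a%b on the same operands: the case the PR describes. */
struct quot_rem divide_both(int a, int b)
{
    struct quot_rem r;
    r.quot = a / b;
    r.rem = a % b;
    return r;
}

/* Same results through the standard library's div(). */
struct quot_rem divide_with_div(int a, int b)
{
    div_t d = div(a, b);
    struct quot_rem r = { d.quot, d.rem };
    return r;
}
```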
10. LP:637814 Combine add/move to add
    LP:637882 Combine ldr/mov to ldr
    Possible improvements have been found, but it is not yet known how to implement them.
    Cause: Unknown.
    Difficulty: Unknown.
    Recommendation: None so far.
11. LP:638014 Replace memset by memclr when the 2nd parameter is zero
    Difficulty: Easy.
    Recommendation: No recommendation so far.
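A sketch of the call pattern, assuming memclr refers to the ARM run-time ABI helper __aeabi_memclr(dest, n). That helper takes two arguments where memset takes three, so a call site would not need to set up the zero fill value. Because the helper exists only on ARM EABI targets, the sketch uses a hypothetical portable stand-in named clear_bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical portable stand-in for __aeabi_memclr(dest, n):
   zero n bytes starting at dest. */
static void clear_bytes(void *dest, size_t n)
{
    unsigned char *p = dest;
    while (n--)
        *p++ = 0;
}

/* What the source typically contains: three arguments, fill value 0. */
void reset_with_memset(unsigned char *buf, size_t n)
{
    memset(buf, 0, n);
}

/* What the compiler could emit instead: a two-argument clear call. */
void reset_with_memclr(unsigned char *buf, size_t n)
{
    clear_bytes(buf, n);
}
```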
12. LP:625233 Merge constant pools for small functions
    Cause: Unknown.
    Difficulty: Medium.
    Recommendation: None so far.
13. LP:638935 Replace multiple vldr by vldm
    Several vldr insns that access consecutive addresses can be replaced by a single vldm. This is not specific to Thumb-2, but it is related to code size optimization.
    Cause: Unknown.
    Difficulty: Medium.
    Recommendation: None so far.
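A sketch of source code that reads consecutive floating-point values, which is the situation where separate vldr loads could in principle be merged into one vldm. The function name is hypothetical, and whether GCC emits four separate vldr insns for this exact function is an assumption, not something taken from the Launchpad entry.

```c
/* Reads four doubles stored at consecutive addresses.
   With hardfp, each element may be loaded by its own vldr;
   a single vldm could load all four registers at once. */
double sum4(const double values[4])
{
    return values[0] + values[1] + values[2] + values[3];
}
```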