Re: Help on merging two patches

5 Nov 2010


On Wed, 2010-11-03 at 17:39 +0800, Yao Qi wrote:
...
Hi,
I am backporting some patches from FSF mainline, which may improve Linaro
4.5 gcc on thumb2 speed.
The first one is by Richard E., "Improve optimization to transform
TST into LSLS":
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02518.html
After it is applied to the Linaro 4.5 tree, the EEMBC speed numbers get
worse, while code size is reduced to some extent.  The code difference
looks like this:
  6801      	ldr	r1, [r0, #0]
  f831 3013 	ldrh.w	r3, [r1, r3, lsl #1]
-f413 6f00 	tst.w	r3, #2048	; 0x800
-f43f af41 	beq.w	cc <t_run_test+0xcc>
+0518      	lsls	r0, r3, #20
+f57f af44 	bpl.w	cc <t_run_test+0xcc>
  4610      	mov	r0, r2
After reading the Cortex-A8 TRM, I can't find the exact cycle timing of
lsls.  With Chung-Lin's help, we feel that lsls should be slower than tst,
but we don't have any evidence to prove it.  If anyone is familiar with the
ARM microarchitecture, help is welcome.  If our assumption is correct, we
could change this patch into an optimization applied only when optimizing
for size.
The second patch is Bernd's "Fix an if statement in arm_rtx_costs_1"
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg02096.html
After this patch is applied, the EEMBC benchmark numbers do not change.
Shall we merge this patch into the Linaro 4.5 tree?  I am inclined to merge
it, but if you have concerns about this patch, let us discuss them here.

So I have no reason to expect lsls to ever take longer to execute than
tst.  I suspect what you are seeing here is some unfortunate side effect
that can't be explained from the small code snippet.  One example would
be BTAC aliasing, but there could be other reasons for this
happening.
So overall, I'd expect the change to be a Good Thing (tm), but there's
always the chance that individual blocks of code may run more slowly.
R.
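
For readers following the thread, the two instruction sequences in the quoted diff test the same bit, which is why the transformation is valid.  The sketch below models this in Python; the function names are made up for illustration and are not taken from the patch or from GCC.  tst.w with #2048 checks bit 11 and beq branches when it is clear; lsls by 20 moves bit 11 into bit 31 (the N flag) and bpl branches when that is clear.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit ARM register


def branch_taken_tst(r3: int) -> bool:
    """Model 'tst.w r3, #2048; beq': taken when bit 11 is clear (Z set)."""
    return (r3 & 0x800) == 0


def branch_taken_lsls(r3: int) -> bool:
    """Model 'lsls r0, r3, #20; bpl': the shift moves bit 11 to bit 31,
    so N is clear (branch taken) exactly when bit 11 was clear."""
    shifted = (r3 << 20) & MASK32
    return (shifted & 0x80000000) == 0


if __name__ == "__main__":
    # r3 is loaded with ldrh, so only 16-bit values occur in this snippet.
    for value in range(0x10000):
        assert branch_taken_tst(value) == branch_taken_lsls(value)
    print("both sequences agree for all 16-bit values")
```

The encodings in the diff also show where the size saving comes from: lsls is a 16-bit instruction (0518) while tst.w is a 32-bit one (f413 6f00).  Unlike tst, lsls writes a destination register (r0 here), so the shifted value has to go somewhere that is dead at that point.  The model says nothing about cycle timing.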
