Re: AND vs UXTB

3 Aug 2012


On Fri, Aug 3, 2012 at 3:53 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
...
> On 03/08/12 13:49, Mans Rullgard wrote:
...
>> I have noticed gcc has a preference for generating UXTB instructions
>> when an AND with #255 would do the same thing.  This is bad, because
>> on A9 UXTB has two cycles latency compared to one cycle for AND.  On
>> A8 both instructions have one cycle latency.
>
> UXTB on the other hand is a 16-bit instruction, whereas AND is a 32-bit one.
> Of the cores I'm aware of, only A9 has this performance anomaly.

While you are at it, please also consider blacklisting UXTAB
instruction variants when tuning for Cortex-A9 unless optimizing for
size.
I was fairly confident that I had a feature request in gcc bugzilla
about this, but apparently this is not the case. My bad.
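
For illustration only, here is a minimal C sketch of the two source-level
forms under discussion. The function names are hypothetical and not taken
from gcc or from this thread. Which instruction a given compiler version
actually emits for each form is not guaranteed; a compiler may well
canonicalize both to the same instruction, so check the generated assembly
for your own toolchain and tuning options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero-extend the low byte with a narrowing cast.
 * This is the semantic operation that UXTB performs. */
static uint32_t low_byte_cast(uint32_t x)
{
    return (uint8_t)x;
}

/* The same result written as a mask, i.e. AND with #255. */
static uint32_t low_byte_mask(uint32_t x)
{
    return x & 0xFFu;
}

/* Add the zero-extended low byte of x to acc.
 * This is the semantic operation that UXTAB performs. */
static uint32_t add_low_byte(uint32_t acc, uint32_t x)
{
    return acc + (x & 0xFFu);
}

/* Self-check: the cast and mask forms agree on a few sample values
 * (the samples are arbitrary, chosen only for this sketch). */
static void check_forms_agree(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0xFFu, 0x100u, 0x12345678u, 0xFFFFFFFFu
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(low_byte_cast(samples[i]) == low_byte_mask(samples[i]));
        assert(add_low_byte(0u, samples[i]) == low_byte_mask(samples[i]));
    }
}
```

Since both forms compute the same value, the choice between UXTB and AND
is purely a code size versus latency trade-off of the kind described above.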
-- 
Best regards,
Siarhei Siamashka
