On 25/03/11 21:48, Diane Holt wrote:
> I hope you don't mind me sending you mail, but I'm a bit stuck...I've
> been told I need the Linaro 4.5.2 toolchain because it has some "neon
> optimizations" that the CS 4.5.1 doesn't have.
In general, you'd be better addressing these questions on the Linaro
Toolchain mailing list: linaro-toolchain(a)lists.linaro.org (I've copied
it in).
Not least because I'm on vacation for the next week. :)
> Unfortunately, the Linaro
> 4.5.2 that's available for download (already built) won't work in my
> Scratchbox environment, since it was compiled against a glibc that's too
> new. The CS 4.5.1 works fine -- but I'm not allowed to use it, because
> of the neon stuff.
The CS and Linaro compilers are really very similar, but CodeSourcery
has not made a release since the autumn, so Linaro will have some extra
features.
> Do you know whether CS actually does have (or will have) the same neon
> optimizations Linaro has?
It depends which optimizations you are referring to. The existing CS
release had the latest improvements at the time it was released, and I
believe that the upcoming release will probably be very similar to
Linaro (at least, with respect to ARMv7 - there'll be many differences
for other architecture variants), but I'm not promising that.
Sorry if that's a bit vague, but the contents of the next CS release
are still not finalised.
> If it doesn't (and won't), then I'm going to have to build the Linaro
> one from source. Unfortunately, I've not been able to find any detailed
> information on how to go about doing that. Do you know if that's
> documented anywhere?
Are you talking about building a native compiler, or a cross-compiler? The
former is very simple (provided you have all the dependencies), while
the latter is more involved.
Here's the recipe to build a native compiler:
    tar xf gcc-linaro.....tar.bz2
    mkdir objdir
    cd objdir
    ../gcc-linaro....../configure --prefix=<your-install-path> <opts>
    make bootstrap
    make install
You can copy the configure <opts> from another compiler using 'gcc -v',
and './configure --help' in the source tree should tell you what they mean.
If you want to build a cross compiler, I suggest you look at crosstool
or crosstool-ng, or OpenEmbedded. Building cross-toolchains is non-trivial.
Hope that helps.
Andrew
Hi All,
After installing the Linaro toolchain with apt-get on Ubuntu, I compiled
uboot for an ARM1136 SoC with the -march=armv5 option, and it built
successfully. When I ran that uboot on the target boards, however, the
system failed with "undefined instructions". I checked the Linaro
toolchain's options, which are:
#arm-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabi/4.5.2/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro
4.5.2-5ubuntu2~ppa1'
--with-bugurl=file:///usr/share/doc/gcc-4.5/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.5 --enable-shared --enable-multiarch
--enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/arm-linux-gnueabi/include/c++/4.5.2
--libdir=/usr/lib --enable-nls --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin
--enable-gold --enable-ld=default --with-plugin-ld=ld.gold
--enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a
--with-float=softfp --with-fpu=vfpv3-d16 --with-mode=thumb
--disable-werror --enable-checking=release
--program-prefix=arm-linux-gnueabi-
--includedir=/usr/arm-linux-gnueabi/include --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=arm-linux-gnueabi
--with-headers=/usr/arm-linux-gnueabi/include
--with-libs=/usr/arm-linux-gnueabi/lib
Thread model: posix
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-5ubuntu2~ppa1)
The important options are "--with-arch=armv7-a --with-float=softfp
--with-fpu=vfpv3-d16". I just want to ask whether these options prevent
arm-linux-gnueabi-gcc from supporting older architectures. If so, that
seems at odds with the gcc documentation at
http://gcc.gnu.org/install/configure.html:
"
--with-cpu=cpu
--with-cpu-32=cpu
--with-cpu-64=cpu
Specify which cpu variant the compiler should generate code for by
default. cpu will be used as the default value of the -mcpu= switch.
This option is only supported on some targets, including ARM, i386,
M68k, PowerPC, and SPARC. The --with-cpu-32 and --with-cpu-64 options
specify separate default CPUs for 32-bit and 64-bit modes; these
options are only supported for i386, x86-64 and PowerPC.
--with-schedule=cpu
--with-arch=cpu
--with-arch-32=cpu
--with-arch-64=cpu
--with-tune=cpu
--with-tune-32=cpu
--with-tune-64=cpu
--with-abi=abi
--with-fpu=type
--with-float=type
These configure options provide default values for the
-mschedule=, -march=, -mtune=, -mabi=, and -mfpu= options and for
-mhard-float or -msoft-float. As with --with-cpu, which switches will
be accepted and acceptable values of the arguments depend on the
target.
"
These are only default values for later compilation. Users should be
able to switch to other values by setting other options. So why did
arm-linux-gnueabi-gcc still emit "undefined instructions" for the arm1136
with "-march=armv5"? (In fact the arm1136 is armv6.)
Then I built a toolchain myself from the Linaro gcc-linaro-4.4-2011.02-0
sources; the options are simple:
#arm-none-linux-gnueabi-gcc -v
Using built-in specs.
Target: arm-none-linux-gnueabi
Configured with: ../gcc-linaro-4.4-2011.02-0/configure
--target=arm-none-linux-gnueabi
--prefix=/home/vmuser/development/toolchain/build-toolchain/tools
--enable-languages=c,c++ --disable-libgomp
Thread model: posix
gcc version 4.4.5 (Linaro GCC 4.4-2011.02-0)
When I compiled uboot again with this toolchain, it worked.
So why does the toolchain I built myself support more architectures? And
what performance do I lose with my build?
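One way to narrow this kind of problem down is to check what the compiler actually targets once the command-line options and the configured defaults are combined. The following is a minimal sketch, assuming GCC's predefined ARM macros (__arm__, __thumb__, __thumb2__); the function name build_target_summary is made up for illustration. It only describes the translation unit it is compiled in, not any prebuilt library that is linked in afterwards.

```c
#include <stddef.h>

/*
 * Reports the instruction-set mode this translation unit was compiled
 * for, based on macros that GCC predefines for ARM targets.
 * build_target_summary is a hypothetical helper name, not part of any
 * toolchain.
 */
const char *build_target_summary(void)
{
#if defined(__thumb2__)
    /* Thumb mode on an architecture that has Thumb-2. */
    return "ARM target, Thumb-2 instruction set";
#elif defined(__thumb__)
    /* Thumb mode without Thumb-2. */
    return "ARM target, Thumb-1 instruction set";
#elif defined(__arm__)
    /* Classic 32-bit ARM instruction set. */
    return "ARM target, ARM instruction set";
#else
    /* Built with a compiler that does not target 32-bit ARM. */
    return "not a 32-bit ARM target";
#endif
}
```

Compiling this once with each toolchain, using the same -march flags as the uboot build, would show whether the two end up in the same instruction-set mode.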
Thanks
Barry
Hello,
* Submitted merge requests for SMS patch to gcc-linaro and gcc-linaro/4.6.
* Testing SMS patch which extends the current implementation to consider
  loops that contain instructions with REG_INC_NOTE.
* Filed PRs 48336 48380 for recent fails of trunk on ARM.
* Had a chat with Ramana about the DENbench benchmarks, directions and findings.
* Filed PR 745743 in linaro gcc-bugzilla
Thanks,
Revital
Hi,
* continued bringing patches upstream
  - auto-detection of vector size - committed
  - changing default vector size to 128 - submitted and testing the
    final version
  - if-conversion improvement - submitted and now testing the final version
* gcc-linaro-4.6
  - submitted a merge request for store sink patch (this patch is
    already upstream)
Ira
For reference. We know that the NEON intrinsics in GCC have issues.
I came across this page:
http://hilbert-space.de/?p=22
which has a colour to greyscale conversion done using intrinsics.
gcc-linaro-4.5-2011.03-0 does poorly because it saves intermediate
values on the stack. The core of the loop is:
.L3:
mov ip, r4
vld3.8 {d16-d18}, [r6]
vstmia r4, {d16-d18}
ldmia ip!, {r0, r1, r2, r3}
mov sl, r9
adds r7, r7, #1
adds r6, r6, #24
stmia sl!, {r0, r1, r2, r3}
fldd d16, [sp, #24]
fldd d18, [sp, #32]
ldmia ip, {r0, r1}
vmull.u8 q8, d16, d19
stmia sl, {r0, r1}
vmlal.u8 q8, d18, d20
fldd d18, [sp, #40]
vmlal.u8 q8, d18, d21
vshrn.i16 d16, q8, #8
vst1.8 {d16}, [r5]
adds r5, r5, #8
cmp r8, r7
bgt .L3
llvm-2.9~svn128540 does much better:
vld3.8 {d20, d21, d22}, [r1]!
add r3, r3, #1
cmp r3, r2
vmull.u8 q12, d21, d16
vmlal.u8 q12, d20, d17
vmlal.u8 q12, d22, d18
vshrn.i16 d19, q12, #8
vst1.8 {d19}, [r0]!
blt .LBB0_1
and may actually be better than the hand-written assembler on Nils's
page because it schedules the loop comparison earlier.
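For readers without the page to hand, the loop in question has roughly the following shape. This is a sketch rather than Nils's exact code: the function name rgb_to_grey and the weights 77/151/28 (which sum to 256) are assumptions, and the NEON path is only compiled when the compiler defines __ARM_NEON__ or __ARM_NEON. Elsewhere the scalar loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Assumed channel weights; they sum to 256 so a shift by 8 normalises. */
enum { W_R = 77, W_G = 151, W_B = 28 };

/* Converts `pixels` packed 3-byte pixels in src to 1-byte grey in dst. */
void rgb_to_grey(uint8_t *dst, const uint8_t *src, size_t pixels)
{
    size_t i = 0;

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    const uint8x8_t wr = vdup_n_u8(W_R);
    const uint8x8_t wg = vdup_n_u8(W_G);
    const uint8x8_t wb = vdup_n_u8(W_B);

    /* Eight pixels per iteration: one vld3.8, three widening
     * multiplies, one narrowing shift, one vst1.8. */
    for (; i + 8 <= pixels; i += 8) {
        uint8x8x3_t rgb = vld3_u8(src + 3 * i);
        uint16x8_t acc = vmull_u8(rgb.val[0], wr);
        acc = vmlal_u8(acc, rgb.val[1], wg);
        acc = vmlal_u8(acc, rgb.val[2], wb);
        vst1_u8(dst + i, vshrn_n_u16(acc, 8));
    }
#endif

    /* Scalar tail, and the whole loop on targets without NEON. */
    for (; i < pixels; i++) {
        unsigned sum = W_R * src[3 * i]
                     + W_G * src[3 * i + 1]
                     + W_B * src[3 * i + 2];
        dst[i] = (uint8_t)(sum >> 8);
    }
}
```

The intrinsic calls map one-to-one onto the vld3.8, vmull.u8, vmlal.u8, vshrn.i16 and vst1.8 instructions in the listings above, so any extra loads and stores in the generated loop come from the compiler rather than from the source.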
Richard S, were you looking into this?
-- Michael
Hi there. A reminder that today's call has shifted due to the
European daylight savings change. It's now at 0800 UTC which is 9 am
in the UK, 10 am in central Europe, and 10 am in Israel.
-- Michael
== Last week ==
* PR46934: Thumb-1 ICE, small fix in the "casesi" jump-table expand
code. Quickly approved and committed upstream.
* Enhance XOR patch for gcc/simplify-rtx.c. Updated comments and
committed upstream.
* PR48250 / CS Issue #9845 / Launchpad #723185. Unaligned DImode reload
under NEON. Submitted patch upstream, but still need to do some more
verification that older pre-ARMv5TE cases are safe. Should complete this
week.
* Working on a type of ICE seen currently on upstream trunk, a few
testcases failing under '-O3 -g'. It seems VTA related, but also might
have something to do with register elimination not fully done for
(var_location (entry_value ...)) expressions, leaving [afp+#num] memory
addresses existing in debug insns after reload. Still investigating.
* Launchpad #689887, ICE in get_arm_condition_code(). Pushed a merge
request to Linaro 4.5 for this patch. Also another LP#742961 appeared as
another case of this ICE...
* Still working on (what I think should be) the last of the CoreMark
ARMv6 regressions. The problem is to combine uxtb+cmp into ands #255.
This could be done by adding (set (cc) (compare (zero_extend...)))
patterns, implemented by ands assembly, but still looking if this can be
done (probably more elegantly) by something like CANONICALIZE_COMPARISON
(replacing compare operands) in the ARM backend.
* Launchpad #736007, ICE immed_double_const under -mfpu=neon -g. Some
discussion on gcc-patches about this, still unclear on what should be
done...
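The uxtb+cmp item above concerns source of roughly the following shape. This is a minimal sketch with made-up function names; which instructions are actually emitted depends on the compiler version and options, so the comments describe the possibilities under discussion, not guaranteed output.

```c
#include <stdint.h>

/*
 * Tests the low byte through a narrowing cast. A compiler may implement
 * this as a zero-extend (uxtb) followed by a compare against zero.
 */
int low_byte_is_zero_cast(uint32_t x)
{
    return (uint8_t)x == 0;
}

/*
 * The same test written with an explicit mask, the form that
 * corresponds to a single flag-setting AND with #255.
 */
int low_byte_is_zero_mask(uint32_t x)
{
    return (x & 255u) == 0;
}
```

Both functions compute the same result for every input, which is why folding the first form into the second is a valid transformation.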
== This week ==
* Push forward on above issues.
Committed Dan's RVCT interoperation patch, both upstream and to Linaro
GCC 4.6.
Adjusted Bernd's "Discourage NEON on Cortex-A8" patch following Richard
Earnshaw's comments, and reposted upstream. The new version was
approved, and committed. I've also submitted a merge proposal to Linaro
GCC 4.6.
Dropped Tom's patch for marking small strings read-only. This
optimization seems to have no visible effect for ARM in GCC 4.6. I'll
leave it to Tom to forward-port, if it's still meaningful for MIPS.
Julian has committed the patch for lp:675347, so I've submitted merge
requests to both Linaro GCC 4.5 and 4.6.
Bernd has posted the shrink wrapping patches upstream. I've posted this
info in all the relevant Linaro tracking tickets.
Talked Revital Eres through the Bazaar/Launchpad merge request system.
Tried to understand why GCC 4.6 does not use multiply-and-accumulate
efficiently, when used with 64-bit values. It seems that the compiler
sometimes uses (subreg:SI (reg:DI ...)) and sometimes just uses a plain
(reg:SI ..) and those don't combine to give useful patterns, but I
haven't got to the bottom of it yet.
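The kind of code in question is presumably a 64-bit accumulation of 32-bit products, which ARM can express with a single SMLAL instruction per step. The following is a sketch under that assumption; the function name dot_product_64 is made up for illustration and is not taken from any test case mentioned here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Accumulates 32x32->64-bit signed products into a 64-bit sum.
 * Each iteration is a candidate for one multiply-accumulate-long
 * instruction, provided the compiler recognises the pattern.
 */
int64_t dot_product_64(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Widen one operand first so the multiply is done in 64 bits. */
        acc += (int64_t)a[i] * b[i];
    }
    return acc;
}
```

Inspecting the assembly for a function like this with -O2 -S is one way to see whether the multiply and the 64-bit add are combined or emitted separately.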
Tested an FSF GCC 4.6 snapshot from the 23rd. All well, so I've merged
it to the Linaro GCC 4.6 branch.
* Future Absence
Away Monday 28th to Friday 1st April.
----
Upstream patches requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html