Hi,
I'm trying to use a pre-built version of the Linaro toolchain as a cross-compiler on Ubuntu 11.04 on our 64-bit server.
I got it from http://people.linaro.org/~michaelh/incoming/binaries/.
When I run arm-linux-gnueabi-gcc to compile a C source file, it says "No such file or directory".
The steps are as below:
1. Unpack gcc-linaro-arm-linux-gnueabi-2011.12-20111219+bzr2309~linux.tar.bz2.
2. Rename gcc-linaro-arm-linux-gnueabi-2011.12-20111219+bzr2309~linux to arm-fsl-linux-gnueabi.
3. Copy to ~/toolchain_ltib/gcc-linaro-4.6.3-glibc-2.13-singlelib-2011.12/
4. Use arm-linux-gnueabi-gcc to compile my code.
The output is:
r65388@shlinux3:~/MEMCPYBM$ ~/toolchain_ltib/gcc-linaro-4.6.3-glibc-2.13-singlelib-2011.12/arm-fsl-linux-gnueabi/bin/arm-linux-gnueabi-gcc
-bash: /home/r65388/toolchain_ltib/gcc-linaro-4.6.3-glibc-2.13-singlelib-2011.12/arm-fsl-linux-gnueabi/bin/arm-linux-gnueabi-gcc: No such file or directory
I have installed lsb 4.0 and libc6 2.13.
r65388@shlinux3:~/MEMCPYBM$ dpkg -s lsb
Package: lsb
Status: install ok installed
Priority: extra
Section: misc
Installed-Size: 48
Maintainer: Ubuntu Developers <ubuntu-devel-discuss(a)lists.ubuntu.com>
Architecture: all
Version: 4.0-0ubuntu16
...
r65388@shlinux3:~/MEMCPYBM$ dpkg -s libc6-dev
Package: libc6-dev
Status: install ok installed
Multi-Arch: same
Priority: optional
Section: libdevel
Installed-Size: 11888
Maintainer: Ubuntu Developers <ubuntu-devel-discuss(a)lists.ubuntu.com>
Architecture: amd64
Source: eglibc
Version: 2.13-20ubuntu5
Provides: libc-dev
Depends: libc6 (= 2.13-20ubuntu5), libc-dev-bin (= 2.13-20ubuntu5), linux-libc-dev
Recommends: gcc | c-compiler
Suggests: glibc-doc, manpages-dev
Breaks: binutils (<< 2.20.1-1), binutils-gold (<< 2.20.1-11), cmake (<< 2.8.4+dfsg.1-5), gcc-4.4 (<< 4.4.6-3ubuntu1), gcc-4.4-base (<< 4.4.6-3ubuntu1), gcc-4.5 (<< 4.5.3-1ubuntu2), gcc-4.5-base (<< 4.5.3-1ubuntu2), gcc-4.6 (<< 4.6.0-12), gcj-4.4-base (<< 4.4.6-2ubuntu2), gcj-4.5-base (<< 4.5.3-1ubuntu2), gnat-4.4-base (<< 4.4.6-1ubuntu3), libhwloc-dev (<< 1.2-3), libjna-java (<< 3.2.7-4), liblouis-dev (<< 2.3.0-2), liblouisxml-dev (<< 2.4.0-2), make (<< 3.81-8.1), pkg-config (<< 0.26-1)
...
Do you know why?
How can I fix this issue?
Thanks~~
Yours
Terry
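A common cause of "No such file or directory" for a file that clearly exists is that the kernel cannot find the ELF program interpreter (dynamic loader) named inside the binary, for example when a 32-bit executable is run on a 64-bit system without the 32-bit runtime libraries. Whether that applies to this toolchain is an assumption, not something confirmed in the message above. The sketch below is a hypothetical helper (describe_elf is not part of any toolchain) that reads the ELF header and reports the word size and the requested interpreter, so the guess can be checked:

```python
import struct

def describe_elf(data: bytes) -> dict:
    """Return the ELF word size and the requested program interpreter, if any."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if is64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)
    interp = None
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, off)
        if p_type != 3:                    # 3 = PT_INTERP
            continue
        if is64:
            (p_offset,) = struct.unpack_from(endian + "Q", data, off + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", data, off + 32)
        else:
            (p_offset,) = struct.unpack_from(endian + "I", data, off + 4)
            (p_filesz,) = struct.unpack_from(endian + "I", data, off + 16)
        interp = data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return {"bits": 64 if is64 else 32, "interp": interp}
```

Calling describe_elf(open(path, "rb").read()) on the arm-linux-gnueabi-gcc binary and then checking whether the reported interpreter path exists on the host would show whether a missing loader is the problem.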
Hi Ramana,
as you pointed out, in the gcc.dg/vect/vect-double-reduc-6.c test case,
using compiler options as described in PR 51819, we see the following
inefficient code generation:
vmov.32 r2, d28[0]          @ 57  vec_extractv4si      [length = 4]
vmov.32 r1, d22[0]          @ 84  vec_extractv4si      [length = 4]
str     r2, [r0, #4]        @ 58  *thumb2_movsi_vfp/7  [length = 4]
vmov.32 r3, d0[0]           @ 111 vec_extractv4si      [length = 4]
str     r1, [r0, #8]        @ 85  *thumb2_movsi_vfp/7  [length = 4]
vst1.32 {d2[0]}, [r0:64]    @ 31  neon_vst1_lanev4si   [length = 4]
str     r3, [r0, #12]       @ 112 *thumb2_movsi_vfp/7  [length = 4]
bx      lr                  @ 120 *thumb2_return       [length = 12]
(The :64 alignment in vst1.32 is incorrect; that is the actual problem in
PR 51819, which is now fixed.)
The reason for this particular code sequence turns out to be as follows:
The middle end tries to store the LSB vector lane to memory, and uses the
vec_extract named pattern to do so. This pattern currently only supports
an "s_register_operand" destination, and is implemented via vmov to a core
register. The contents of that register are then stored to memory. Now
why does any vst1 instruction show up? This is because combine is able to
merge the vec_extract back into the store pattern and ends up with a
pattern that matches neon_vst1_lanev4si. Note that the latter pattern is
actually intended to implement NEON built-ins (vst1_lane_... etc).
Now there seem to be two problems with this scenario:
First of all, the neon_vst1_lane<mode> patterns seem to be actually
incorrect on big-endian systems due to lane-numbering problems. As I
understand it, all NEON intrinsics are supposed to take lane numbers
according to the NEON ordering scheme, while the vec_select RTX pattern is
defined to take lane numbers according to the in-memory order. Those
disagree in the big-endian case. All other patterns implementing NEON
intrinsics therefore avoid using vec_select, and instead resort to using
special UNSPEC codes -- the sole exception to this happens to be
neon_vst1_lane<mode>. It would appear that this is actually incorrect, and
the pattern ought to use a UNSPEC_VST1_LANE unspec instead (that UNSPEC
code is already defined, but nowhere used).
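To make the lane-numbering mismatch concrete, here is a toy model. It assumes, as a simplification that is not taken from the patch, that for a single 64-bit vector on big-endian the in-memory element order is simply the reverse of the NEON lane order; quad-word vectors are more involved and are not modelled. The function name is hypothetical.

```python
def memory_to_neon_lane(index: int, nunits: int, big_endian: bool) -> int:
    """Map an in-memory element index (vec_select numbering) to a NEON lane.

    Assumption: on big-endian the two numberings are mirror images within
    one 64-bit vector; on little-endian they coincide.
    """
    if not 0 <= index < nunits:
        raise ValueError("element index out of range")
    return nunits - 1 - index if big_endian else index
```

Under this model, a pattern that passes an intrinsic's lane number straight into vec_select picks the intended element on little-endian but the mirrored one on big-endian, which is the kind of disagreement described above.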
Now if we make that change, then the above code sequence will contain no
vst1 any more. But in any case, expanding first into a vec_extract
followed by a store pattern, only to rely on combine to merge them back
together, is a suboptimal approach. One obvious drawback is that the
auto-inc-dec pass runs before reload, and therefore only sees plain stores
-- with no reason whatsoever to attempt to introduce post-inc operations.
Also, just in general it normally works out best to allow the final choice
between register and memory operands to be made in reload ...
Therefore, I think the vec_extract patterns ought to support *both*
register and memory destination operands, and implement those via vmov or
vst1 in final code generation, as appropriate. This means that we can make
the choice in reload, guided as usual by alternative ordering and/or
penalties -- for example, we can choose to reload the address and still use
vst1 over reloading the contents to a core register and then using an
offsetted store.
Finally, this approach will also allow the auto-inc-dec pass to do a better
job. The current in-tree pass unfortunately doesn't manage this, but with
Richard's proposed replacement, assisted by a tweak to the cost function to
make sure the (future) address reload is "priced in" correctly, I'm now
seeing what appears to be the optimal code sequence:
vst1.32 {d6[0]}, [r0:64]!   @ 30  vec_extractv4si/1  [length = 4]
vst1.32 {d22[0]}, [r0]!     @ 56  vec_extractv4si/1  [length = 4]
vst1.32 {d2[0]}, [r0:64]!   @ 82  vec_extractv4si/1  [length = 4]
vst1.32 {d4[0]}, [r0]       @ 108 vec_extractv4si/1  [length = 4]
bx      lr                  @ 116 *thumb2_return     [length = 12]
(Again the :64 is wrong; it's already fixed on mainline but I haven't
pulled that change in yet.)
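As a rough illustration of why the post-increment form is equivalent, the following sketch models memory as a byte array and stores four 32-bit lanes either at fixed offsets from a base address (like the str with an immediate offset) or through a pointer that is bumped after every store (like vst1.32 with "!"). The function names and the little-endian packing are assumptions for the illustration only; the real sequence skips the writeback on the last store.

```python
import struct

def store_with_offsets(lanes, base=0):
    """Store each lane at base + 4*i, leaving the base untouched."""
    mem = bytearray(base + 4 * len(lanes))
    for i, value in enumerate(lanes):
        struct.pack_into("<I", mem, base + 4 * i, value)
    return mem

def store_with_postinc(lanes, base=0):
    """Store each lane at the current pointer, then advance it by 4 bytes."""
    mem = bytearray(base + 4 * len(lanes))
    r0 = base
    for value in lanes:
        struct.pack_into("<I", mem, r0, value)
        r0 += 4  # post-increment writeback
    return mem, r0
```

Both functions leave the same bytes in memory; the post-increment version needs no separate offset per store and no intermediate core register in the real instruction sequence.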
The attached patch implements the above-mentioned changes. Any comments?
I'll try to get some performance numbers as well before moving forward with
the patch ...
(As an aside, it might likewise be helpful to update the vec_set patterns
to allow for memory operands, implemented via vld1.)
(See attached file: diff-gcc-arm-vecextractmem)
B.t.w. I'm wondering how I can properly test:
- that the NEON intrinsics still work
- that everything works on big-endian
Any suggestions would be appreciated!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
Hi Andrew. gcc-4.7~svn183693 just finished running through the auto
builders and builds and tests in the ARM, Cortex-A9, i686, and x86_64
configurations.
You could base the Linaro 4.7 branch off that. It's r114897 in bzr
and, suitably, Sandra is the author.
-- Michael
For your records:
Loïc wrote a script that pushes the revisions from a GCC branch into
the current development focus. This makes branching on a shared
repository much faster. The script is here:
http://bazaar.launchpad.net/~linaro-toolchain-dev/cbuild/tools/view/head:/e…
and it runs daily on a cronjob as the cbuild user on the 'apus' EC2
instance (phew).
We'll need to update this when 4.7 comes along.
-- Michael
I've gone through the documentation on the wiki about how to get
going with KVM on Fast Models, to clean it up and reorganise it
and add some of the missing bits (notably how to set up a KVM
guest kernel and filesystem). It's now at:
https://wiki.linaro.org/PeterMaydell/KVM/HowTo
The only minor point I'd still like to address is that at the
moment we document the old-style "build your kernel and arguments
into an .axf file" boot-wrapper, because my changes to support
specifying them at model runtime haven't yet gone into the
boot-wrapper git repo. When they do land I'll update the wiki.
(Yes, technically TCWG2011-A15-KVM says "one page summary" but
I thought splitting it into four pages was much clearer :-))
-- PMM
Ken, Åsa: could you add a -O0 and -O1 build to the size and benchmark
results? I'm looking at the writeup and it would be interesting to
contrast the speed/size of -O0 with -O2.
Ta,
-- Michael
Continued work on 64-bit shifts in core registers. This has now been
posted to gcc-patches, and is awaiting review.
64-bit shifts in NEON are also working correctly, but the register
allocator chooses not to use them most of the time. I've begun trying to
work out why, but it's quite involved in ira-costs.c and will take some
unpicking, I think.
Attempted to create a Linaro GCC 4.7 branch, but my test build failed,
so that'll have to wait until it's stabilized a little.
Hi Marcin, Ricardo. How is the work on pre-built sysroots coming
along? I'd like to use/reference them in the next binary toolchain
release.
I'm looking for:
* Scripts that produce the sysroots
* A README that covers what they contain and how to reproduce them
* Test plan
* An official tagged branch holding the above
* A tarball release of the above
* A tarball release of the different sysroots
* Done in a way so they easily integrate with the binary builds[1]
and are useful to others as a sysroot
* Relocatable
* Usable on win32 and Linux
all hosted somewhere. Zhenqiang and I can test and give you feedback.
For reference, here's the README for the binary builds:
http://launchpadlibrarian.net/90998258/README.txt
Here's the simple script I used to make a libc only sysroot:
http://bazaar.launchpad.net/~linaro-toolchain-dev/crosstool-ng/linaro/view/…
I used chdist as I don't know multistrap. I guessed and added
build-dep support to download the build dependencies. It also fixes
up the absolute symlinks to relative.
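The symlink fix-up can be sketched roughly as follows. This is a hypothetical illustration, not the script linked above: the function name is made up, and it assumes that an absolute link target such as /lib/libc.so.6 is meant to resolve inside the sysroot.

```python
import os

def relativize_symlinks(sysroot: str) -> int:
    """Rewrite absolute symlinks under sysroot as relative ones.

    Returns the number of links that were rewritten.
    """
    sysroot = os.path.abspath(sysroot)
    changed = 0
    # os.walk does not follow symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(sysroot):
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            target = os.readlink(link)
            if not os.path.isabs(target):
                continue  # already relative, leave it alone
            # Interpret the absolute target relative to the sysroot root.
            inside = os.path.join(sysroot, target.lstrip("/"))
            os.remove(link)
            os.symlink(os.path.relpath(inside, dirpath), link)
            changed += 1
    return changed
```

With relative links the sysroot can be unpacked anywhere without the links pointing at the host's own /lib or /usr.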
-- Michael
[1] https://launchpad.net/linaro-toolchain-binaries
=== Progress ===
* Fixed PR48308 on FSF trunk. Needs backporting to FSF GCC 4.6 branch
* Fixed a number of failing testcases on trunk.
* Read up on partial-partial PRE. Slow progress, but getting a handle
on the theory now. A couple of approaches are being benchmarked.
* Debugged Andrew's issues with 64-bit shifts. Nice that Skype screen
sharing works well on Ubuntu.
* Started notes for Connect 2012.q1.
* Looked into the strd / strexd failure on the testcase in trunk.
Looked at a small patch to implement sync_lock_releasedi for ARM but
needs some more time and effort. Filed issue
https://bugs.launchpad.net/gcc-linaro/+bug/922474
=== Plans ===
* Finish 1x AFDS
* Continue with partial-partial PRE.
* Finish backport of fix for PR50313 to appropriate branches
* Start preparing for Connect 2012.q1
* Do something about the PGO and ABI patches next week.
=== Absences ===
* Feb 6-10: Linaro Connect Q1.12.
* Feb 13-18: Holiday.