On 6 November 2012 02:48, Rob Herring <robherring2(a)gmail.com> wrote:
>
> On 11/05/2012 05:13 AM, Russell King - ARM Linux wrote:
> > On Mon, Nov 05, 2012 at 10:48:50AM +0000, Dave Martin wrote:
> >> On Thu, Oct 25, 2012 at 05:08:16PM +0200, Johannes Stezenbach wrote:
> >>> On Thu, Oct 25, 2012 at 09:25:06AM -0500, Rob Herring wrote:
> >>>> On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
> >>>>> On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
> >>>>>> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> >>>>>>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> >>>>>>>
> >>>>>>>> While v6 can support unaligned accesses, it is optional and current
> >>>>>>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
> >>>>>>>> v6.
> >>>>>>>
> >>>>>>> not true according to the gcc changes page
> >>>>>>
> >>>>>> What are you going to believe: documentation or what the compiler
> >>>>>> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
> >>>>>> support backported and 4.7.2, unaligned accesses are emitted for v7
> >>>>>> only. I guess default here means it is the default unless you change the
> >>>>>> default in your build of gcc.
> >>>>>
> >>>>> Since ARMv6 can handle unaligned access in the same way as ARMv7
> >>>>> it seems a clear bug in gcc which might hopefully get fixed.
> >>>>> Thus in this case I think it is reasonable to follow the
> >>>>> gcc documentation, otherwise the code would break for ARMv6
> >>>>> when gcc gets fixed.
> >>>>
> >>>> But the compiler can't assume the state of the U bit. I think it is
> >>>> still legal on v6 to not support unaligned accesses, but on v7 it is
> >>>> required. All the standard v6 ARM cores support it, but I'm not sure
> >>>> about custom cores or if there are SOCs with buses that don't support
> >>>> unaligned accesses properly.
> >>>
> >>> Well, I read the "...since Linux version 2.6.28" comment
> >>> in the gcc changes page in the way that they assume the
> >>> U-bit is set. (Although I'm not sure it really is???)
> >>
> >> Actually, the kernel checks the arch version and the U bit on boot,
> >> and chooses the appropriate setting for the A bit depending on the
> >> result. (See arch/arm/mm/alignment.c:alignment_init().)
> >
> > That is in the kernel itself, _after_ the decompressor has run. It is
> > not relevant to any discussion about the decompressor.
> >
> >> Currently, we depend on the CPU reset behaviour or firmware/
> >> bootloader to set the U bit for v6, but the behaviour should be
> >> correct either way, though unaligned accesses will obviously
> >> perform (much) better with U=1.
> >
> > Will someone _PLEASE_ address my initial comments against this patch
> > in light of the fact that it's now been proven _NOT_ to be just a V7
> > issue, rather than everyone seemingly burying their heads in the sand
> > over this.
>
> I tried adding -munaligned-access on a v6 build and still get byte
> accesses rather than unaligned word accesses. So this does seem to be a
> v7 only issue based on what gcc will currently produce. Copying Michael
> Hope who can hopefully provide some insight on why v6 unaligned accesses
> are not enabled.
This looks like a bug. Unaligned access is enabled for armv6 but
seems to only take effect for cores with Thumb-2. Here's a test case
both with unaligned field access and unaligned block copy:
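struct foo
{
    char a;
    int b;
    struct
    {
        int x[3];
    } c;
} __attribute__((packed));

/* Unaligned field access: b sits at byte offset 1. */
int get_field(struct foo *p)
{
    return p->b;
}

/* Unaligned block copy: c sits at byte offset 5. Returns nothing. */
void copy_block(struct foo *p, struct foo *q)
{
    p->c = q->c;
}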
With -march=armv7-a you get the correct unaligned word accesses (bar is
get_field and baz is copy_block):
bar:
    ldr r0, [r0, #1] @ unaligned @ 11 unaligned_loadsi/2 [length = 4]
    bx lr @ 21 *arm_return [length = 12]
baz:
    str r4, [sp, #-4]! @ 25 *push_multi [length = 4]
    mov r2, r0 @ 2 *arm_movsi_vfp/1 [length = 4]
    ldr r4, [r1, #5]! @ unaligned @ 9 unaligned_loadsi/2 [length = 4]
    ldr ip, [r1, #4] @ unaligned @ 10 unaligned_loadsi/2 [length = 4]
    ldr r1, [r1, #8] @ unaligned @ 11 unaligned_loadsi/2 [length = 4]
    str r4, [r2, #5] @ unaligned @ 12 unaligned_storesi/2 [length = 4]
    str ip, [r2, #9] @ unaligned @ 13 unaligned_storesi/2 [length = 4]
    str r1, [r2, #13] @ unaligned @ 14 unaligned_storesi/2 [length = 4]
    ldmfd sp!, {r4}
    bx lr
With -march=armv6 you get a byte-by-byte field access and a correct
unaligned block copy:
bar:
    ldrb r1, [r0, #2] @ zero_extendqisi2
    ldrb r3, [r0, #1] @ zero_extendqisi2
    ldrb r2, [r0, #3] @ zero_extendqisi2
    ldrb r0, [r0, #4] @ zero_extendqisi2
    orr r3, r3, r1, asl #8
    orr r3, r3, r2, asl #16
    orr r0, r3, r0, asl #24
    bx lr
baz:
    str r4, [sp, #-4]!
    mov r2, r0
    ldr r4, [r1, #5]! @ unaligned
    ldr ip, [r1, #4] @ unaligned
    ldr r1, [r1, #8] @ unaligned
    str r4, [r2, #5] @ unaligned
    str ip, [r2, #9] @ unaligned
    str r1, [r2, #13] @ unaligned
    ldmfd sp!, {r4}
    bx lr
readelf -A shows that the compiler planned to use unaligned access in
both. My suspicion is that GCC is using the extv pattern to extract
the field from memory, and that pattern is only enabled for Thumb-2
capable cores.
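As a rough illustration of the two code shapes in this test case, here is a
minimal C sketch. It assumes a 4-byte int and GCC-style attributes. The
helper names load_le32_bytewise and load32_memcpy are made up for this
example and come from neither GCC nor the kernel. The first helper mirrors
the ldrb/orr sequence in the armv6 output; the second is the usual portable
way to express an unaligned load, which a compiler is free to lower to a
single word load where the target allows it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the test case: b lands at offset 1, c at offset 5. */
struct foo {
    char a;
    int b;
    struct {
        int x[3];
    } c;
} __attribute__((packed));

/* Hypothetical helper: assemble a little-endian 32-bit value one byte
 * at a time, the shape of the four ldrb plus three orr instructions. */
static uint32_t load_le32_bytewise(const unsigned char *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: portable unaligned load in host byte order.
 * memcpy makes no alignment assumption about p. */
static uint32_t load32_memcpy(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Checking offsetof(struct foo, b) and offsetof(struct foo, c) against the
#1 and #5 offsets in the listings is a quick way to confirm that the
packed layout is what the compiler actually used.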
I've logged PR55218. We'll discuss it at our next meeting.
-- Michael
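For readers following the A bit / U bit part of the quoted thread, the
boot-time decision Dave Martin describes (check the architecture version
and the U bit, then choose the A bit) can be sketched as below. This is a
simplified illustration, not the code in arch/arm/mm/alignment.c. The
function name choose_alignment_cr is hypothetical, and setting A in the
fallback case is an assumption made for the example. The bit positions are
those of the ARM system control register: A is bit 1 and U is bit 22.

```c
#include <assert.h>
#include <stdint.h>

#define CR_A (1u << 1)   /* alignment fault checking enable */
#define CR_U (1u << 22)  /* ARMv6 unaligned access model enable */

/* Hypothetical helper: given the architecture version and the current
 * control register value, return the value with the A bit chosen. */
static uint32_t choose_alignment_cr(int arch_version, uint32_t cr)
{
    if (arch_version >= 6 && (cr & CR_U)) {
        /* Hardware handles unaligned accesses: do not fault on them. */
        cr &= ~CR_A;
    } else {
        /* Assumption for this sketch: keep faulting so that a fixup
         * handler can emulate the access. */
        cr |= CR_A;
    }
    return cr;
}
```

As Russell King points out above, a check like this only runs in the
kernel proper, so it says nothing about the state the decompressor sees.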
== Progress ==
* Maintenance
- Fixing ARM buildbots, poking people to fix bugs, keeping them green
- http://llvm.org/viewvc/llvm-project?view=rev&revision=173510
* Cost Model
- Fixing some bugs on the generic code
- http://llvm.org/viewvc/llvm-project?view=rev&revision=173691
- Adding some simple free cast (plus some infrastructure)
- http://llvm.org/viewvc/llvm-project?view=rev&revision=173849
* LLVM
- Investigating APFloat issue on Chromebook (bad libraries?)
- Clang miscompiles and shows the same symptoms; will play with options
next week
- AArch64 back-end in, to be built by default
* LAVA
- Down to the last three errors, all due to the include path
('bits/predefs.h' file not found)
- libc6-dev + libstdc++-dev have no effect, problem doesn't show on
buildbots
- Testing heating problem with multiple images (only 12.02 is good)
- Testing other boards, other images (with Dave)
* Friday Holiday
== Plan ==
* Try a bit more on the APFloat issue on the Chromebook, but I think it's
just a bad distro (ChrUbuntu), since no one else has this problem. Has anyone
put a Linaro image on a Chromebook?
* Continue working on getting faster builds on LAVA (quad-core origen,
Arndale, etc) with Dave Pigot.
* Continue micro-benchmarking the vectorization and updating the
cost-model. Start discussing the side-effects that are not modelled at all.
The Linaro Toolchain Working Group and Platform Team are pleased to
announce the 2013.01 release of the Linaro Toolchain Binaries, a
pre-built version of Linaro GCC and Linaro GDB that runs on generic
Linux or Windows and targets the glibc Linaro Evaluation Build.
Uses include:
* Cross compiling ARM applications from your laptop
* Remote debugging
* Building the Linux kernel for your board
What's included:
* Linaro GCC 4.7 2013.01
* Linaro GDB 7.5 2012.12
* A statically linked gdbserver
* A system root
* Manuals under share/doc/
The system root contains the basic header files and libraries to link
your programs against.
The Linux version is supported on Ubuntu 10.04.3 and 12.04, Debian
6.0.2, Fedora 16, openSUSE 12.1, Red Hat Enterprise Linux Workstation
5.7 and later, and should run on any Linux Standard Base 3.0
compatible distribution. Please see the README about running on
x86_64 hosts.
The Windows version is supported on Windows XP Pro SP3, Windows Vista
Business SP2, and Windows 7 Pro SP1.
The binaries and build scripts are available from:
https://launchpad.net/linaro-toolchain-binaries/trunk/2013.01
Need help? Ask a question on https://ask.linaro.org/
Already on Launchpad? Submit a bug at
https://bugs.launchpad.net/linaro-toolchain-binaries
On IRC? See us on #linaro on Freenode.
Other ways that you can contact us or get involved are listed at
https://wiki.linaro.org/GettingInvolved.
Hi,
I have a few armv7 assembly tests. I'm trying to compile these using the linaro aarch64 toolchain and I'm getting errors.
Is there any specific flag that I have to pass to enable backward compatibility to allow v7 assembly to be compiled for a v8 model?
reset.s: Assembler messages:
reset.s:32: Error: operand 1 should be an integer register -- `mov r0,#0'
reset.s:33: Error: unknown mnemonic `mcr' -- `mcr p15,0,R0,C13,c0,1'
reset.s:36: Error: unknown mnemonic `mrc' -- `mrc p15,0,r0,c1,c0,0'
reset.s:40: Error: operand 1 should be a SIMD vector register -- `orr r0,r0,#0x00001000'
....
Relevant assembly code:
....
_reset:
// init Context ID Register
MOV r0, #0
MCR p15, 0, R0, C13, c0, 1
// Enable Instruction cache
mrc p15, 0, r0, c1, c0, 0
/* set bits:
12 = I i-cache
*/
orr r0, r0, #0x00001000
mcr p15, 0, r0, c1, c0, 0
.....
This is my assembler command: aarch64-linux-gnu-as -march=armv8-a+fp --keep-locals -o "reset.o" "reset.s"
Thanks,
Kalai
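A note on the errors above: the file uses 32-bit ARM mnemonics and register
names (r0, mcr, mrc), while the AArch64 instruction set has a different
register file and no mcr/mrc coprocessor instructions, so the AArch64
assembler rejects them. The only part of the snippet that carries over
unchanged is the bit manipulation itself. A minimal C sketch of that step
follows. The names SCTLR_I and sctlr_enable_icache are made up for this
example, and the privileged register read and write are deliberately left
out because they depend on the target instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 12 = I (instruction cache enable), as in the comment in reset.s. */
#define SCTLR_I (1u << 12)

/* Hypothetical helper: the "orr r0, r0, #0x00001000" step. It sets the
 * I bit and leaves every other bit of the register value untouched. */
static uint32_t sctlr_enable_icache(uint32_t sctlr)
{
    return sctlr | SCTLR_I;
}
```

Keeping the register access in a small target-specific routine and the bit
logic in shared code is one way to reuse tests like these across v7 and v8
builds.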
== Progress ==
* 64-bit ops in Neon: waiting for upstream.
* vectorizer cost model: initial activation with unaligned load/store
cost equal to aligned ones; benchmarking shows no significant
difference.
* smin-umin: a few benchmarks show a few unexpected regressions (10-15%).
* setting up spec2k on local board
* tcpanda heat problems: GCC built OK. Don't know how hot it became.
== Next ==
* handle upstream feedback on 64-bit bitops in Neon, if any.
* analyze regressions in smin-umin
* check if more tuning of the vectorizer cost model is desirable.
* finish local board setup
* tcpanda: run gcc testsuite to check heat
== Progress ==
* Boehm GC AArch64 support:
- Tested on Foundation model
- Patches sent to mailing list
- Boehm GC has been accepted and merged into mainline
- Libatomic_ops under review, some improvements are needed.
== Next ==
* Boehm GC AArch64 support:
- Fix libatomic_ops for mainline merge
* Start gc sections support for AArch64 binutils
* Review roster
Summary:
* Investigated Automotive benchmark performance at different branch costs.
Details:
1. Automotive benchmark performance analysis for different branch costs
on Pandaboard ES.
* Designed small test cases that simulate bitmnp01 to compare the
performance of ITTT and conditional branches. Test results show:
- If branch prediction does not work (the code is put in a
function), ITTT is always better than a conditional branch.
- If branch prediction works (the code is inlined in the loop
body), a conditional branch is better than ITTT in most cases.
* Code alignment has a big impact on tblook01. By default, the IT block
performs better. When __attribute__((aligned (16))) is added to
function t_run_test, the conditional branch performs better than the
IT block.
2. Prepare Linaro toolchain binary release.
* Update Linaro crosstool-ng local patches due to the fix of
lp:1067766 in source package.
* Spawn all builds and smoke tests.
Plan:
* Investigate SPEC2k performance for different branch costs.
* Work with Bero on the 2013.01 toolchain binary release.
Planned leave:
* Feb. 9 - 15: Chinese Spring Festival.
Best Regards!
-Zhenqiang
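The kind of small test case described in the details above might look like
the following sketch. It is not the actual bitmnp01 or tblook01 code: the
function names and the arithmetic are invented for illustration. One
function is written with an if, which the compiler may turn into a
conditional branch. The other is written as a select, which is easier to
predicate (an IT block on Thumb-2). Which form the compiler really emits
depends on the branch cost settings, so the generated assembly should be
checked. The noinline attribute corresponds to the "code in a function"
case, and aligned(16) mirrors the alignment experiment.

```c
#include <assert.h>

/* Written with an if: the compiler may emit a conditional branch. */
__attribute__((noinline))
static unsigned step_branch(unsigned acc, unsigned x)
{
    if (x & 1u) {
        acc += x;
        acc ^= 0x55u;
        acc <<= 1;
    }
    return acc;
}

/* Written as a select: three dependent operations, then a choice. */
__attribute__((noinline, aligned(16)))
static unsigned step_select(unsigned acc, unsigned x)
{
    unsigned t = ((acc + x) ^ 0x55u) << 1;
    return (x & 1u) ? t : acc;
}

/* Run both forms over the same inputs and report whether they agree.
 * A timing loop would wrap calls like these. */
static int forms_agree(unsigned n)
{
    unsigned a = 0, b = 0;
    for (unsigned x = 0; x < n; x++) {
        a = step_branch(a, x);
        b = step_select(b, x);
        if (a != b)
            return 0;
    }
    return 1;
}
```

Verifying that both forms compute the same result before timing them keeps
a performance difference from being mistaken for a behavioural one.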
== Progress ==
* Buildbot
- Taking buildbot to Linaro
- Had wireless/GPU overheating, disabled kernel modules
- Running smoothly again (most of the time)
- Debugging errors that only appear on ARM.
* Building and Testing LLVM
- Compiling on Intel with only the ARM backend helps a lot
- Sent a call for action asking people to clean up cross-compilation failures
* LAVA
- Progress on LAVA LLVM job
- Got it checking out, configuring and building
- Got PASS/FAIL/SKIP patterns working
- https://validation.linaro.org/lava-server/scheduler/job/46027
- Need to get a patch from a specific place to apply
* Cost Model
- Re-wrote table lookup patch a few times, finally in for good
- http://llvm.org/viewvc/llvm-project?rev=173382&view=rev
- Studying costs of instructions, all seem good enough
- Better approach now is to change the target description (less code, more
gain)
* EuroLLVM
- 136 people so far
== Plan ==
* Test distcc (or similar) on Pandas
* Get a buildbot running with cross-compilation
* Internal git repository for LAVA LLVM job
* Confirm Linaro's sponsorship for EuroLLVM
* Continue cost model changes in between
== Background ==
* Monitor list for ARM changes
* Monitor buildbot for failures
Activity:
* calls and meetings (about 20% of my working week this week ;-))
* finished rebasing and testing the KVM QEMU patches (thanks
to Pawel for getting me an updated RTSM device tree), sent
out updated version to go with -v17 kernel
* minor qemu maintenance patches (including a minor cfi01
flash model bugfix)
* trying to track down issues running a 3.8-rc4 vexpress
kernel on QEMU. Among other things:
* looks like we need to emulate some more of the oscillator
and voltage config registers now (if only to make the
kernel a bit quieter)
* the kernel doesn't like the way qemu's boot loader puts
the DTB blob after the initrd but beginning in the same
page as the initrd ends [free_initrd_mem will trash memory
outside the initrd proper but inside that last page]
* a15 reports the wrong board model number
-- PMM
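The DTB placement problem in the list above comes down to page arithmetic.
A minimal sketch follows, assuming 4 KiB pages and an exclusive end
address for the initrd. The names page_align_up and dtb_shares_initrd_page
are invented for this example and are not taken from QEMU or the kernel.
If the DTB starts below the page-aligned end of the initrd, it sits in the
page that freeing the initrd would release.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Round an address up to the next page boundary (no-op if aligned). */
static uint32_t page_align_up(uint32_t addr)
{
    return (addr + PAGE_SIZE - 1u) & ~(PAGE_SIZE - 1u);
}

/* Hypothetical check: initrd_end is one past the last initrd byte and
 * dtb_start >= initrd_end. Returns 1 if the DTB begins inside the page
 * in which the initrd ends. */
static int dtb_shares_initrd_page(uint32_t initrd_end, uint32_t dtb_start)
{
    return dtb_start < page_align_up(initrd_end);
}
```

Under these assumptions, a loader that places the DTB at
page_align_up(initrd_end) rather than directly after the initrd avoids the
overlap.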
Dear All,
Is it possible to build for AArch64 hardware in 32-bit mode? For example,
an x86 64-bit machine can run a 32-bit OS, and the machine stays
compatible with 32-bit binaries.
In the same way, is it possible to run a 32-bit kernel on an AArch64
(ARMv8) machine and use a 32-bit toolchain?
If the answer is yes, can I build such a toolchain myself, or is there an
option for this in the Linaro cross-compiler available from
https://launchpad.net/linaro-toolchain-binaries/+milestone/2012.12
Thanks