== Progress ==
* February 4.7 release:
- Released a respin of 4.7 2013.02, which fixes an issue with
multiarch on x32 and kfreebsd builds.
* Boehm GC AArch64 support:
- Fixed 128-bit atomic load/store and 'compare and swap' functions
- Testsuite is now OK
* AArch64 porting meeting:
- No new requirements
* Infrastructure:
- Wifi now usable on my laptop
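
For readers unfamiliar with the 'compare and swap' primitive mentioned above, here is a minimal sketch in C using GCC's __atomic builtins. It operates on a 64-bit word for portability rather than on the 128-bit values the report refers to, and the names cas_u64 and atomic_add_u64 are hypothetical; they are not part of libatomic_ops or the Boehm GC.

```c
#include <stdint.h>

/* Hypothetical helper: atomically replace *addr with new_val if it still
   holds old_val. Returns nonzero on success, zero otherwise. */
static int cas_u64(uint64_t *addr, uint64_t old_val, uint64_t new_val)
{
    return __atomic_compare_exchange_n(addr, &old_val, new_val,
                                       0 /* strong variant */,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Hypothetical helper: lock-free addition built from a CAS retry loop.
   Returns the value observed just before the addition took effect. */
static uint64_t atomic_add_u64(uint64_t *addr, uint64_t delta)
{
    uint64_t seen;
    do {
        seen = __atomic_load_n(addr, __ATOMIC_SEQ_CST);
    } while (!cas_u64(addr, seen, seen + delta));
    return seen;
}
```

The retry loop is the usual way to build a read-modify-write operation out of a bare compare and swap: if another thread changes the word between the load and the swap, the swap fails and the loop tries again.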
== Next ==
* vacation
* Connect
== Progress ==
* smin-umin: waiting for benchmark results with 'coalesce-vars' patch
reverted on trunk.
* libsanitizer: its backtrace printing facility relies on unwinding
info not present by default in binaries. Adding -funwind-tables
improves the results in GCC testsuite.
There is still an interaction between runtest/qemu and isatty() which
confuses dejagnu. Forcing libsanitizer's internal_isatty to return 0
fixes it. TBC.
* vectorizer cost model: backporting to 4.7 required removing a part,
because 4.7 lacks the new vectorizer infrastructure (arm_add_stmt_cost).
* 'turnoff 64bits ops in Neon': waiting for benchmark results after
backporting on 4.7.
* internal tasks
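
The isatty() workaround described above can be sketched in C as follows: code that reports to a file descriptor commonly asks isatty() whether a terminal is attached, and an override that forces the answer to 0 mirrors the effect of making internal_isatty return 0. The names report_is_tty and force_no_tty are hypothetical and this is not libsanitizer's code.

```c
#include <unistd.h>

/* Hypothetical override: when nonzero, behave as if no terminal is attached. */
static int force_no_tty = 0;

/* Returns 1 if fd refers to a terminal and the override is off, else 0. */
static int report_is_tty(int fd)
{
    if (force_no_tty)
        return 0;
    return isatty(fd) != 0;
}
```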
== Next ==
* holidays next week
* Connect week after
Progress:
* updated various Virtualization category cards in cards.linaro.org
with my comments and clarifications
* rebased qemu-linaro on upstream 1.4.0
* upstream code review: pl330 and others
* prompted by LP:1129571 into another look at the linux-user threading
issues. I dusted off an ancient patch I'd written to address one
part of these, rebased it and rewrote it to work properly.
* KVM/ARM kernel patches are now upstream, so we can submit the QEMU
patches; started on a final rebase, polish and test
Absences:
* NB: I now work a 4 day week, excluding Wednesdays
* 4-8 March: Linaro Connect Asia (Hong Kong)
-- PMM
Hi folks,
Attached is the Linpack benchmark, which I ran with GCC and Clang, with
and without vectorization (though most of the loops are not vectorized).
Reading the output of the LLVM loop vectorizer, it also doesn't do much;
the net gain is due to the basic-block vectorizer. Does GCC have a similar
concept?
The results are also attached.
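
To make the distinction in the question concrete, here is a minimal sketch in C of the two code shapes involved. daxpy is the central loop of Linpack and is the kind of code a loop vectorizer targets; add4 is straight-line code with no loop, the kind a basic-block vectorizer targets. GCC's counterpart of the latter is its SLP vectorizer, controlled by -ftree-slp-vectorize. Whether either function is actually vectorized depends on the compiler version, target and flags, and add4 is a hypothetical example that is not taken from the attached benchmark.

```c
#include <stddef.h>

/* y[i] += a * x[i]: a candidate for loop vectorization. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Four independent additions with no loop: a candidate for
   basic-block (SLP) vectorization. */
void add4(double *restrict out, const double *restrict a,
          const double *restrict b)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```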
cheers,
--renato
The Linaro Toolchain Working Group announces the 2013.02-01 Linaro GCC 4.7
release. This is a respin of the 2013.02 release because of an issue with
multiarch for x32 and kfreebsd builds in the previous one.
Please find the original 2013.02 announcement below.
The Linaro Toolchain Working Group is pleased to announce the 2013.02
release of both Linaro GCC 4.7 and Linaro GCC 4.6.
Linaro GCC 4.7 2013.02 is the eleventh release in the 4.7 series. Based
off the latest GCC 4.7.2+svn195745 release, it includes ARM-focused
performance improvements and bug fixes.
Interesting changes include:
* Updates to GCC 4.7.2+svn195745
* Includes arm/aarch64-4.7-branch up to svn revision 195716
* Support for Cortex-A7 backported from trunk
Linaro GCC 4.6 2013.02 is the 24th release in the 4.6 series. Based
off the latest GCC 4.6.3+svn195744 release, this is the eleventh release
after entering maintenance.
Interesting changes include:
* Updates to 4.6.3+svn195744
The source tarballs are available from:
https://launchpad.net/gcc-linaro/+milestone/4.7-2013.02
https://launchpad.net/gcc-linaro/+milestone/4.6-2013.02
Downloads are available from the Linaro GCC page on Launchpad:
https://launchpad.net/gcc-linaro
More information on the features and issues is available from the
release pages:
https://launchpad.net/gcc-linaro/4.7/4.7-2013.02
https://launchpad.net/gcc-linaro/4.6/4.6-2013.02
Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Bugs: https://bugs.launchpad.net/gcc-linaro/
Questions? https://ask.linaro.org/
On 6 November 2012 02:48, Rob Herring <robherring2(a)gmail.com> wrote:
>
> On 11/05/2012 05:13 AM, Russell King - ARM Linux wrote:
> > On Mon, Nov 05, 2012 at 10:48:50AM +0000, Dave Martin wrote:
> >> On Thu, Oct 25, 2012 at 05:08:16PM +0200, Johannes Stezenbach wrote:
> >>> On Thu, Oct 25, 2012 at 09:25:06AM -0500, Rob Herring wrote:
> >>>> On 10/25/2012 09:16 AM, Johannes Stezenbach wrote:
> >>>>> On Thu, Oct 25, 2012 at 07:41:45AM -0500, Rob Herring wrote:
> >>>>>> On 10/25/2012 04:34 AM, Johannes Stezenbach wrote:
> >>>>>>> On Thu, Oct 11, 2012 at 07:43:22AM -0500, Rob Herring wrote:
> >>>>>>>
> >>>>>>>> While v6 can support unaligned accesses, it is optional and current
> >>>>>>>> compilers won't emit unaligned accesses. So we don't clear the A bit for
> >>>>>>>> v6.
> >>>>>>>
> >>>>>>> not true according to the gcc changes page
> >>>>>>
> >>>>>> What are you going to believe: documentation or what the compiler
> >>>>>> emitted? At least for ubuntu/linaro 4.6.3 which has the unaligned access
> >>>>>> support backported and 4.7.2, unaligned accesses are emitted for v7
> >>>>>> only. I guess default here means it is the default unless you change the
> >>>>>> default in your build of gcc.
> >>>>>
> >>>>> Since ARMv6 can handle unaligned access in the same way as ARMv7
> >>>>> it seems a clear bug in gcc which might hopefully get fixed.
> >>>>> Thus in this case I think it is reasonable to follow the
> >>>>> gcc documentation, otherwise the code would break for ARMv6
> >>>>> when gcc gets fixed.
> >>>>
> >>>> But the compiler can't assume the state of the U bit. I think it is
> >>>> still legal on v6 to not support unaligned accesses, but on v7 it is
> >>>> required. All the standard v6 ARM cores support it, but I'm not sure
> >>>> about custom cores or if there are SOCs with buses that don't support
> >>>> unaligned accesses properly.
> >>>
> >>> Well, I read the "...since Linux version 2.6.28" comment
> >>> in the gcc changes page in the way that they assume the
> >>> U-bit is set. (Although I'm not sure it really is???)
> >>
> >> Actually, the kernel checks the arch version and the U bit on boot,
> >> and chooses the appropriate setting for the A bit depending on the
> >> result. (See arch/arm/mm/alignment.c:alignment_init().)
> >
> > That is in the kernel itself, _after_ the decompressor has run. It is
> > not relevant to any discussion about the decompressor.
> >
> >> Currently, we depend on the CPU reset behaviour or firmware/
> >> bootloader to set the U bit for v6, but the behaviour should be
> >> correct either way, though unaligned accesses will obviously
> >> perform (much) better with U=1.
> >
> > Will someone _PLEASE_ address my initial comments against this patch
> > in light of the fact that it's now been proven _NOT_ to be just a V7
> > issue, rather than everyone seemingly burying their heads in the sand
> > over this.
>
> I tried adding -munaligned-access on a v6 build and still get byte
> accesses rather than unaligned word accesses. So this does seem to be a
> v7 only issue based on what gcc will currently produce. Copying Michael
> Hope who can hopefully provide some insight on why v6 unaligned accesses
> are not enabled.
This looks like a bug. Unaligned access is enabled for armv6 but
seems to only take effect for cores with Thumb-2. Here's a test case
both with unaligned field access and unaligned block copy:
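struct foo
{
    char a;
    int b;      /* byte offset 1: unaligned */
    struct
    {
        int x[3];
    } c;        /* byte offset 5: unaligned */
} __attribute__((packed));

/* Unaligned field access (labelled "bar" in the listings below). */
int get_field(struct foo *p)
{
    return p->b;
}

/* Unaligned block copy (labelled "baz" in the listings below). */
void copy_block(struct foo *p, struct foo *q)
{
    p->c = q->c;
}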
With -march=armv7-a you get the correct code:
bar:
        ldr r0, [r0, #1] @ unaligned @ 11 unaligned_loadsi/2 [length = 4]
        bx lr @ 21 *arm_return [length = 12]
baz:
        str r4, [sp, #-4]! @ 25 *push_multi [length = 4]
        mov r2, r0 @ 2 *arm_movsi_vfp/1 [length = 4]
        ldr r4, [r1, #5]! @ unaligned @ 9 unaligned_loadsi/2 [length = 4]
        ldr ip, [r1, #4] @ unaligned @ 10 unaligned_loadsi/2 [length = 4]
        ldr r1, [r1, #8] @ unaligned @ 11 unaligned_loadsi/2 [length = 4]
        str r4, [r2, #5] @ unaligned @ 12 unaligned_storesi/2 [length = 4]
        str ip, [r2, #9] @ unaligned @ 13 unaligned_storesi/2 [length = 4]
        str r1, [r2, #13] @ unaligned @ 14 unaligned_storesi/2 [length = 4]
        ldmfd sp!, {r4}
        bx lr
With -march=armv6 you get a byte-by-byte field access and a correct
unaligned block copy:
bar:
        ldrb r1, [r0, #2] @ zero_extendqisi2
        ldrb r3, [r0, #1] @ zero_extendqisi2
        ldrb r2, [r0, #3] @ zero_extendqisi2
        ldrb r0, [r0, #4] @ zero_extendqisi2
        orr r3, r3, r1, asl #8
        orr r3, r3, r2, asl #16
        orr r0, r3, r0, asl #24
        bx lr
baz:
        str r4, [sp, #-4]!
        mov r2, r0
        ldr r4, [r1, #5]! @ unaligned
        ldr ip, [r1, #4] @ unaligned
        ldr r1, [r1, #8] @ unaligned
        str r4, [r2, #5] @ unaligned
        str ip, [r2, #9] @ unaligned
        str r1, [r2, #13] @ unaligned
        ldmfd sp!, {r4}
        bx lr
readelf -A shows that the compiler planned to use unaligned access in
both. My suspicion is that GCC is using the extv pattern to extract
the field from memory, and that pattern is only enabled for Thumb-2
capable cores.
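
As a sketch of what the two code sequences compute, the C below reads the same unaligned 32-bit field in three ways: through the packed struct (as in the test case), byte by byte with shifts (the shape of the armv6 listing, which assembles a little-endian value), and through memcpy, a common portable idiom. The helper names load_le32_bytes and load_u32_memcpy are hypothetical, and the sketch assumes a 32-bit int and a compiler that supports the GCC packed attribute.

```c
#include <stdint.h>
#include <string.h>

struct foo {
    char a;
    int b;                      /* byte offset 1, so unaligned */
    struct { int x[3]; } c;     /* byte offset 5, so unaligned */
} __attribute__((packed));

/* Packed-struct access: the compiler chooses the instruction sequence. */
int get_field(const struct foo *p)
{
    return p->b;
}

/* Byte-by-byte little-endian load, mirroring the ldrb/orr sequence. */
uint32_t load_le32_bytes(const unsigned char *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* memcpy-based load in native byte order; valid at any alignment. */
uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

All three are functionally equivalent on a little-endian target; the difference discussed in this thread is only in which instructions the compiler emits for them.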
I've logged PR55218. We'll discuss it at our next meeting.
-- Michael
== Progress ==
* smin-umin: spawned build jobs for gcc-trunk with 'coalesce-vars'
patch reverted (from A.Oliva), so that I can then run benchmarks to
compare its effect with the one observed on gcc-4.7.
* libasan: thanks to Peter, I am able to run sample programs under
qemu. Ran GCC testsuite, observed some failures, to be investigated.
* vectorizer cost model: committed upstream in 4.8.
* Released gcc-linaro-4.6-2013.02 and gcc-linaro-4.7-2013.02.
* backported 'turnoff 64 bits ops in Neon' from upstream to
gcc-linaro-4.7. Waiting for builds to complete to launch benchmark
jobs.
* internal tasks
== Next ==
* smin-umin: look at benchmark results if available
* libasan: analyse testsuite results
* 64bits ops in Neon: look at benchmark results if available
* get more codec samples
Christophe.
== Progress ==
* February merge 4.6 and 4.7
- Another issue raised by Matthias, fix merged.
- The merge request exposed a new failure on i686 (ld seg. fault)
which looks like the ones we had on i686 before the release,
re-spawned the job to see if we can reproduce.
* Boehm GC AArch64 support:
- Fixed some defects, identified some cases not implementable with
atomic builtins, worked on inline asm versions.
- re-installed working environment to be able to test the gcc
integration (thanks to Matt)
* AArch64 porting meeting:
- No new requirements.
== Next ==
* Review roster.
* Boehm GC AArch64 support:
- fix libatomic_ops
- validate GCC's Boehm gc integration
* libunwind AArch64 support:
- continue.
== This Week ==
* Got the remote Pandaboard RootFS updates, thanks to Dave again.
* Got the SSH tunnels going for remote GDB test suite executions, though a
single remote run takes an eternity to complete.
* Analysed the GDB test suite log files and separated out the failures and
the other test cases that did not pass, for further analysis.
* Preliminary analysis done on GDB logs for remote configuration on ARM in
comparison with native configuration.
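
As an illustration of separating failures from the other non-passing results, the following C sketch classifies a single result line by its DejaGnu status prefix (PASS:, FAIL:, XFAIL: and so on). The function classify_line and the enum are hypothetical; this is not the method used for the analysis above.

```c
#include <string.h>

enum result { RES_PASS, RES_FAIL, RES_OTHER, RES_NOT_A_RESULT };

/* Classify one line of a DejaGnu summary or log file by its status prefix. */
enum result classify_line(const char *line)
{
    /* Statuses that are neither a plain pass nor a plain failure. */
    static const char *const other[] = {
        "XPASS: ", "XFAIL: ", "KPASS: ", "KFAIL: ",
        "UNRESOLVED: ", "UNTESTED: ", "UNSUPPORTED: "
    };

    if (strncmp(line, "PASS: ", 6) == 0)
        return RES_PASS;
    if (strncmp(line, "FAIL: ", 6) == 0)
        return RES_FAIL;
    for (size_t i = 0; i < sizeof other / sizeof other[0]; i++)
        if (strncmp(line, other[i], strlen(other[i])) == 0)
            return RES_OTHER;
    return RES_NOT_A_RESULT;
}
```

Counting the return values per file for the native and the remote runs gives a quick first comparison before looking at individual test names.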
== Next week ==
* Try to complete log file analysis and post results.
* Try to run GDB test suite using QEMU.
* Follow up on Hong Kong visa application
--
*** Sorry about the spam; I got the subject wrong in the first email.