linaro-toolchain

linaro-toolchain@lists.linaro.org


by Andrew Stubbs

Hi Michael,

I have finally managed to complete the release process. It wasn't quite as smooth as I would have liked, but we seem to have got there!

Notes:

- Ramana's VCVT patch caused an Android problem. This was reverted right before the release.
- The initial release spin and test went without a hitch.
- There was an additional test failure in the GCC testsuite, but this turns out to be because the snapshot date "20121201" happens to contain the string "120". Interestingly, this will also be true for most of 2012.
- The ubutest runs seem to have some problems: all of the glibc and python builds have failed with a message about libgcc. Since this has hit both 4.5 and 4.6 simultaneously, I'm assuming it's environmental and not caused by a new toolchain bug. The rest of the compilation appears fine.
- The benchmarking seems fine on A9, but I couldn't find results for the others, although the scheduler lists the jobs.
- The upload to Launchpad was somewhat problematic. Uploading 4.5 took two attempts. Uploading 4.6 failed about 6 times (at 20 minutes or so each) before I tried from another machine with a faster uplink. That went first time.

Andrew
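The "most of 2012" remark can be checked with a small sketch. It assumes the failing test simply matches the bare substring "120" anywhere in a YYYYMMDD stamp; the helper names below are hypothetical and the real testsuite pattern is not shown in the message.

```c
#include <stdio.h>
#include <string.h>

/* Days per month for 2012, which is a leap year. */
static const int days_in_month_2012[12] = {
    31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* Return non-zero if the YYYYMMDD stamp contains the substring "120".
 * Assumption: the failing test matches this bare substring. */
int stamp_contains_120(int year, int month, int day)
{
    char stamp[32];
    snprintf(stamp, sizeof stamp, "%04d%02d%02d", year, month, day);
    return strstr(stamp, "120") != NULL;
}

/* Count the days of 2012 whose stamp contains "120". */
int count_matching_days_2012(void)
{
    int count = 0;
    for (int month = 1; month <= 12; month++) {
        for (int day = 1; day <= days_in_month_2012[month - 1]; day++) {
            count += stamp_contains_120(2012, month, day) ? 1 : 0;
        }
    }
    return count;
}
```

Under that assumption, count_matching_days_2012() comes to 284 of the 366 days: every stamp from January to September starts with "20120", and a few days in November and December match as well.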

14 years

Linaro GCC 4.6 & 4.5 2011.12 released

by Andrew Stubbs

The Linaro Toolchain Working Group is pleased to announce the 2011.12 release of both Linaro GCC 4.6 and Linaro GCC 4.5.

Linaro GCC 4.6 2011.12 is the tenth release in the 4.6 series. Based off the latest GCC 4.6.2+svn181866, it contains a range of vectoriser performance improvements and general bug fixes.

Interesting changes include:

* Updates to 4.6.2+svn181866
* Generic tuning support for Big-endian platforms.
* SLP support for operations with arbitrary numbers of operands.
* SLP support for conditions.
* Pattern recognition support in basic-block SLP.
* Enhancements to mixed-size condition pattern recognition.
* Support for 64bit __sync* primitives on ARM.
* Unaligned block-move support for ARMv7.
* Added Cortex-A15 integer pipeline tuning.

Linaro GCC 4.5 2011.12 is the sixteenth release in the 4.5 series. Based off the latest GCC 4.5.3+svn181877, this is a maintenance focused release.

Interesting changes in 4.5 include:

* Updates to 4.5.3+svn181877

The source tarballs are available from:
  https://launchpad.net/gcc-linaro/+milestone/4.6-2011.12
  https://launchpad.net/gcc-linaro/+milestone/4.5-2011.12

Downloads are available from the Linaro GCC page on Launchpad:
  https://launchpad.net/gcc-linaro

More information on the features and issues is available from the release page:
  https://launchpad.net/gcc-linaro/4.6/4.6-2011.12
  https://launchpad.net/gcc-linaro/4.5/4.5-2011.12

Mailing list: http://lists.linaro.org/mailman/listinfo/linaro-toolchain

Bugs: https://bugs.launchpad.net/gcc-linaro/

Questions? https://ask.linaro.org/

Interested in commercial support? Inquire at support(a)linaro.org
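For readers unfamiliar with the __sync* item above, here is a minimal sketch of the kind of code it concerns: the standard GCC __sync builtins applied to a 64-bit object. The counter and function names are hypothetical, and nothing here says how the release implements these operations on ARM.

```c
#include <stdint.h>

/* Hypothetical shared counter; 64 bits wide on every target. */
static uint64_t event_count;

/* Atomically add n and return the value held before the addition. */
uint64_t count_events(uint64_t n)
{
    return __sync_fetch_and_add(&event_count, n);
}

/* Atomically replace the counter with desired if it currently equals
 * expected; returns non-zero when the swap happened. */
int swap_if_equal(uint64_t expected, uint64_t desired)
{
    return __sync_bool_compare_and_swap(&event_count, expected, desired);
}
```

On a target without 64-bit support for these builtins, code like this typically fails at link time with an unresolved reference to an out-of-line helper, which is why the item matters for code that counts in 64 bits.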

14 years

[ACTIVITY] December 4-8

by Ira Rosen

Hi,

- fixed PR 51285
- continued looking at the alignment issue, ran Michael's script with different options, tested Ramana's preliminary patch for vld1/vst1, and my "don't peel for low loop bounds" patch

Ira

14 years

Linaro QEMU 2011.12 released

by Peter Maydell

The Linaro Toolchain Working Group is pleased to announce the release of Linaro QEMU 2011.12.

Linaro QEMU 2011.12 is the latest monthly release of qemu-linaro. Based off upstream (trunk) QEMU, it includes a number of ARM-focused bug fixes and enhancements.

New in this month's release:

- There are no Linaro-specific changes of note in this release
- This release is based on the upstream QEMU 1.0 release. (Note that future qemu-linaro releases will continue to track upstream trunk; the release dates for upstream and our release just happened to be conveniently aligned in this case.)

Known issues:

- Graphics do not work for OMAP3 based models (beagle, overo) with 11.10 Linaro images.
- This release of qemu-linaro is known not to work on ARM hosts. (See bugs #883133, #883136)

The source tarball is available at:
  https://launchpad.net/qemu-linaro/+milestone/2011.12

More information on Linaro QEMU is available at:
  https://launchpad.net/qemu-linaro

14 years

Linaro GDB 7.3 2011.12 released

by Ulrich Weigand

The Linaro Toolchain Working Group is pleased to announce the release of Linaro GDB 7.3.

Linaro GDB 7.3 2011.12 is the fourth release in the 7.3 series. Based off the latest GDB 7.3.1, it includes a number of ARM-focused bug fixes and enhancements.

This release contains:

* Update to GDB 7.3.1 code base
* Support single-stepping atomic operations (LDREX/STREX sequences)

The source tarball is available at:
  https://launchpad.net/gdb-linaro/+milestone/7.3-2011.12

More information on Linaro GDB is available at:
  https://launchpad.net/gdb-linaro

14 years

Release notes for GCC 4.6

by Andrew Stubbs

Hi all, I've copied all those who made commits to GCC 4.6 this month. Could you please give me a sentence or two for the release notes? Thanks Andrew

14 years

Effect of alignment and peeling on vectorised loops

by Michael Hope

I had a play with the vectoriser to see how peeling, unrolling, and alignment affected the performance of simple memory bound loops. The short story is:

* For fixed length loops, don't peel
* Performance is the same for 8 byte aligned arrays and up
* Performance is very similar for unaligned arrays
* vld1 is as fast as vldmia
* vld1 with specified alignment is much faster than vld1

The loop is the rather ugly and artificial::

    void op(struct aints * __restrict out, const struct aints * __restrict in)
    {
      for (int i = 0; i < COUNT; i++) {
        out->v[i] = (in->v[i] * 173) | in->v[i];
      }
    }

where `struct aints` is an aligned structure. I couldn't figure out how to use an aligned typedef of ints without still introducing a runtime check. I assume I was running into some type of runtime alias checking. This compiled into::

        vmov.i32  q10, #173
        add       r3, r0, #5
    0:
        vldmia    r1!, {d16-d17}
        vmul.i32  q9, q8, q10
        vorr      q8, q9, q8
        vstmia    r0!, {d16-d17}
        cmp       r0, r3
        bne       0b

I then lied to the compiler by changing the actual alignment at runtime. See:
  http://people.linaro.org/~michaelh/incoming/runtime-offset.png

The performance didn't change for actual alignments of 8, 16, or 32 bytes.

I then converted the loop into one using vld1 and fed it smaller alignments. See:
  http://people.linaro.org/~michaelh/incoming/small-offsets.png

The throughput falls into two camps: one of alignments 1, 2, or 4 and one of 8, 16, 32. The throughput is very similar for both camps but has some strange dropoffs at 24 words, around 48 words, and around 96 words. The terminal throughput at 300 words and above is within 0.5 %.

I then converted the vld1 and vst1 to specify an alignment of 64 bits. See:
  http://people.linaro.org/~michaelh/incoming/set-alignment.png

This improved the throughput in all cases, and by 14 % for more than 50 words. This graph also shows the overhead of the runtime peeling check. The blue line is the vectoriser version, which is slower to pick up due to the greater per-call overhead.
I then went back to the vectoriser and changed the alignment of the struct to cause peeling to turn on and off. See:
  http://people.linaro.org/~michaelh/incoming/unroll.png

At 200 words, the version without peeling is 2.9 % faster. This is partly due to a fixed count loop turning into a runtime count due to unknown alignment.

This run also showed the effect of loop unrolling. The loop seems to be unrolled for loops of <= 64 words and drops off in performance past around 8 words. When the unrolling finally drops out, performance increases by 101 %.

Raw results and the test cases are available in lp:~linaro-toolchain-dev/linaro-toolchain-benchmarks/private-runs

A graph of all results is at:
  http://people.linaro.org/~michaelh/incoming/everything.png

The usual caveats apply: this test was all in L1, only on the A9, and very artificial.

-- Michael
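A self-contained version of the loop from this message can be sketched as below. The post does not show the definition of `struct aints` or the value of COUNT, so the 16-byte alignment and COUNT of 200 here are assumptions chosen for illustration; the real test cases are in the benchmarks branch named above.

```c
/* Assumed element count; the post sweeps a range of sizes. */
#define COUNT 200

/* Assumed definition: the post only says that struct aints is an
 * aligned structure. Changing the aligned() value is one way to make
 * the vectoriser decide for or against peeling. */
struct aints {
    int v[COUNT];
} __attribute__((aligned(16)));

/* The loop under test: memory bound, one multiply and one OR per word. */
void op(struct aints *__restrict out, const struct aints *__restrict in)
{
    for (int i = 0; i < COUNT; i++) {
        out->v[i] = (in->v[i] * 173) | in->v[i];
    }
}
```

Because the trip count is a compile-time constant and the structure carries its own alignment, the compiler has what it needs to skip both the runtime alignment check and the peeled prologue, which is the "fixed length loops, don't peel" case.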

14 years

Re: Static Library startup

by Dave Martin

> On Mon, Dec 5, 2011 at 1:40 AM, Tom Gall <tom.gall(a)linaro.org> wrote:
> > I probably know the answer to this already but ...
> >
> > For shared libs one can define and use something like:
> >
> > void __attribute__ ((constructor)) my_init(void);
> > void __attribute__ ((destructor)) my_fini(void);
> >
> > Which of course allows your lib to run code just after the library is
> > loaded and just before the library is going to be unloaded. This helps
> > keep out cruft such as the following out of your design:
> >
> > PleaseCallThisLibraryFunctionFirstOrThereWillBeAnErrorWhichYouWillHitCausingYouToPostToTheMailingListAskingTheSameQuestionThatHasBeenAsked1000sOfTimes();
> >
> > Yeah .. you know the function. I don't like it either.
> >
> > Unfortunately this doesn't work when people link in the .a from your
> > lib. Libs like libjpeg-turbo in theory should never ever need to be
> > linked in that fashion but consider the browsers who link to the
> > universe instead of using system shared libs.

On Mon, Dec 05, 2011 at 04:19:11PM +0800, Kito Cheng wrote:
> Here is some triky way for this problem, you can put the constructor
> and destructor to the source file which contain necessary function
> call in your libraries to enforce the linker to archive your
> constructor and destructor.
>
> However if this solution is not work for your situation, you can apply
> the patch in attach for build script to enable the
> LOCAL_WHOLE_STATIC_LIBRARIES for executable,
>
> After patch you can just add a line in your Android.mk :
>
> LOCAL_WHOLE_STATIC_LIBRARIES += libfoo
>
> The most disadvantage of this way is you should always link libfoo by
> LOCAL_WHOLE_STATIC_LIBRARIES...and this patch don't send to linaro and
> aosp yet.

[...]

Part of the problem here is that .a libraries lack the dependency and linkage metadata that shared libraries have.

-2) Put up with the need to call an explicit initialisation function for the library.
A lot of commonly-used libraries require an initialisation call, and I'm not sure it causes that much of a problem in practice...

-1) Put a C++ wrapper around just enough of your library such that your constructor/destructor code is recognised as a needed static constructor/destructor by the toolchain.

I can't think of a very nice way of doing this, so I won't elaborate on it... It's also not really a solution, since you still need to pull in a dummy static object from somewhere in order to cause the constructor and destructor to get called.

0) libtool or similar may help solve this problem, but I don't know much about this -- also, for solving the problem, that approach only works if users of your library link via libtool.

1) One hacky approach is to rename your library to libmylib-real.a, and then replace libmylib.a with a linker script which pulls in the needed constructor as well as the real library:

libmylib.a:
    EXTERN(__mylib_constructor)
    INPUT(/path/to/libmylib-real.a)

This works, providing that __mylib_constructor is external (normally, you would be able to have the constructor function static, but it needs to be externally visible in order to be pulled in in this way).

2) Another way of doing a similar thing is to mark __mylib_constructor as undefined in all the objects that make up the library.

Unfortunately, there seems to be no obvious way of doing that: the assembler generates undefined symbol references automatically for unresolved references at assembly time. There's no way to force the existence of an undefined symbol without an actual reference to it. objcopy/elfedit don't seem to support adding such a symbol either. It would be simple to write a tool to add the undefined symbol reference (such tools may exist already), but binutils doesn't seem to provide this for you. The plausible-looking -u option to gcc doesn't do anything unless doing a link.
One other way of doing it without a special tool is to insert a bogus relocation into the text section of each object with an assembler .reloc directive specifying relocation type R_<arch>_NONE. There isn't really a portable way to do that, though. The name of the relocation changes per-arch, and some arches have other quirks (on ARM for example, .reloc cannot refer to the current location, but seems instead to need to refer to a defined symbol which is a non-zero distance away from the location counter).

One advantage to this approach is that your .a file looks just like any other .a file. Also, you can include that dependency in only those objects which really require the library to be initialised (normally, this is not a huge benefit though, since probably most of your objects _do_ require the library to be initialised).

A disadvantage (other than portability problems) is that, like (1), the constructor symbol must be external (not static)... so it pollutes the symbol table and isn't protected against people calling it directly. You can create a dummy symbol instead of referring to the constructor symbol directly though -- this solves the second problem.

3) Finally, you can split your constructor/destructor code out into a separate .o file (say mylib-ctors.o), and use the linker script trick from (1) to forcibly include this object when linking:

libmylib.a:
    INPUT(/path/to/mylib-ctors.o /path/to/mylib-real.a)

This avoids some of the disadvantages of the other approaches, but you still end up with a strange-looking library which is really a linker script.

This is closer to how the C library traditionally solves the problem (i.e., the crt*.o stuff). libc.so also tends to be a linker script, which deals with the fact that some parts of libc must be statically linked from a separate library when linking to -lc.

Obviously, approaches (1)..(3) all suffer from arch or toolchain portability problems (or both).
(The GNU/GCC __constructor__ thing is obviously a portability problem in itself, if you're minded to care about it.)

Cheers
---Dave
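To make the discussion concrete, here is a minimal sketch of an external constructor/destructor pair, plus one "actual reference" that an object file could carry to drag the constructor's archive member into a static link. This is a variant of the idea in (2), not the .reloc trick itself: it costs one pointer of data per object. The names mylib_ready, mylib_force_ctor and mylib_is_ready are hypothetical, and everything is shown in one file only for brevity.

```c
/* Hypothetical library state, set up by the constructor. */
static int mylib_ready;

/* External (not static) so that a linker script EXTERN() or a reference
 * from another object can name it. */
void __attribute__((constructor)) __mylib_constructor(void)
{
    mylib_ready = 1;
}

void __attribute__((destructor)) __mylib_destructor(void)
{
    mylib_ready = 0;
}

/* In a real library this pointer would live in each *other* object that
 * needs the library initialised, next to an extern declaration of
 * __mylib_constructor. The reference leaves an undefined symbol in that
 * object, so pulling the object from the .a also pulls the member that
 * defines the constructor. "used" stops the compiler discarding it. */
static void (*const mylib_force_ctor)(void) __attribute__((used)) =
    __mylib_constructor;

/* Lets callers observe whether the constructor has run. */
int mylib_is_ready(void)
{
    return mylib_ready;
}
```

With this in place, a program that calls mylib_is_ready() from main should see 1 without any explicit initialisation call. As noted in the message, the constructor symbol stays externally visible, so nothing stops a caller invoking it directly.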

14 years

[ACTIVITY] 28th November - 2nd December

by Andrew Stubbs

* Linaro GCC

Continued work on 64-bit shift / extend / etc. in NEON. I have posted an RFC to the gcc-patches list in the hope of getting some feedback on how best to fix this. No response yet. Hopefully some of the Linaro guys are at least looking at it ...

Merged FSF GCC 4.5 and 4.6 into the Linaro GCC release branches prior to the release next week.

Set more benchmarking work running in my ongoing investigation into generic tuning.

Did a dry run of the extra release testing Michael normally does. It failed. Michael says he's fixed it now, but I know how to do my bit, so fingers crossed.

* Other

Experienced some IT/connectivity outages within Mentor. Resolved now.

14 years

Activity - Week ending 2nd Dec 2011.

by Ramana Radhakrishnan

=== Progress ===

* Off sick on Monday
* Systematic testing duty - few Aarch64 issues.
* Linaro patch review duty.
* Tested my vcvt fixed point patch and close to committing.
* Worked for some time on movw / movt for symbol references rather than constant pools. While this gives nice benefits, it's a code size hog and needs further investigation.
* PGO patch being tested finally and should go back up for review.

=== Plans ===

* Release week next week.
* Start looking at partial_partial PRE.
* Finish committing my backlog of patches.

=== Absences ===

* Dec 19 - 31st Dec - Tentatively booked
* Feb 6-10 : Linaro Connect Q1.12

14 years
