linaro-toolchain November 2010

linaro-toolchain@lists.linaro.org

27 participants
56 discussions

by Michael Hope

Hi there. Could everyone in the toolchain working group start sending their activity reports to this list please? Put [ACTIVITY] at the start of the subject line so that they can be filtered. Ta, -- Michael

15 years, 6 months

Status reports

by Michael Hope

Hi there. Attached are the status reports from the Toolchain WG members for last week.

-- Michael

Ken Werner --

Hi Michael,

* got access to the internal wiki/calendar/email :)
* continued to set up the borrowed vexpress board
* upgraded to the Linaro 10.11 release
* encountered various issues until I found that the /etc/hosts is empty (#674090)
* learned that the SD card issue is a known problem (#632798)
* the network interface sometimes dies if stressed (Matt was able to reproduce this)
* the disabled CONFIG_SWAP is being tracked as #672656
* sometimes the entire system hangs (when under heavy load?)
* David noticed that /proc/cpuinfo lacks neon support (but his string benchmark/testcase ran fine)
* wondering why the kernel reports only about 800 BogoMIPS while it's around 2k on the panda board
* started to work on the atomic memory operations item
* identified the relevant GCC patches
* still looking for a good way to verify the GCC support
* posted a patch on the glibc-ports ml with regard to #643171

David A Gilbert --

I managed to try Ken Werner's Versatile Express board with an A9MP tile; the shape of the graphs matches that from the Panda, but the raw performance is down by a factor of about 3 - I'm guessing it's clocked lower for some reason. It confirms however that the Neon behaviour I was seeing with memset is not Panda/OMAP4 specific; no one has replied to my post to linaro-toolchain. It's a difficult situation in that my fastest memset on Beagle is with Neon, and my fastest on A9 is without Neon - what would you select on?

I've just finished writing memchr tests and my first crack at a faster version; I realised I could use the same trick that I had used for strlen and it works nicely - it seems to be about 50% faster than the libc version; I've not tested against any other versions yet.

Paul McKenney hasn't replied yet about the OSSC stuff, but apparently he's out travelling and back next week; so I'll catch him then.

I tried preloading my faster memset into ghostscript, but found it was blatantly ignoring it - I think the memset is being called from somewhere inside libc; I managed to get xdeb to cross build me a libc but haven't yet got my changes into it.

My order for a USB hard drive for my beagle seems to have been delayed by the supplier; I'm pushing this but it's starting to be a bit of a pain.

Richard Sandiford --

== Last Week ==

* Pinged my GAS fix for Thumb PLT branches to locally-defined symbols. Committed it to binutils trunk and 2.21 branch after approval. This fixes the libgcc.so build failure that I was seeing with GOLD.
* Worked on a patch to fix GOLD's handling of non-function references to weak undefined symbols. This ended up touching every backend (i386, x86_64, ARM, Power and SPARC) and was quite invasive, so it took a while in the end. Committed to binutils trunk after approval.
* Ran more tests, both with -marm and -mthumb. I'm getting identical GCC test results (including gfortran and objc) for GOLD and BFD ld, so I think we're at the stage where GOLD is a viable replacement for the BFD linker.

== Next Week ==

* I'll start looking at the IFUNC support.
* I'll take another look at launchpad bug 665598.

Peter Maydell --

Progress:
* qemu: more cleanup of signal handler VFP patchset; I think I just need to add iwmmx support and it's good
* qemu: VCVT: found yet another bug, did final patchset cleanup: submitted to upstream list [8 patch series]
* qemu: submitted a trivial patch to fix a problem where __get_user/__put_user macros had an unnecessary local var which could clash with a var being used by the macro user
* set up a tree on git.linaro.org which we can use for a branch to make pull requests for ARM qemu fixes
* did a rough estimate of time to do an Eagle qemu model (6 months + testing/bug fixing time)

Issues:
* lost some time to a problem where Linux VMs stopped being able to talk to the LDAP server; however I have a workaround and IT are investigating

Meetings:
* toolchain, toolchain standup, pdsw-tools, PD doughnuts

Plans
- attend Meego conference in Dublin (Nov 15-18 inc travel) http://conference2010.meego.com/
- start on qemu consolidation by upstreaming various ARMv7 correctness fixes

Ira Rosen --

Here is this week's report:

1. BeagleBoard installed, now "playing" with it
2. Continued to work on auto-detection of vector size
3. Looked into mixed vector sizes
4. Learning about vld and vst instructions

It looks like I won't be able to participate in Wed calls, since I am alone with the kids on Wednesday evenings.

15 years, 6 months

Assembler bug blocking Thumb-2 kernel builds

by Dave Martin

Hi all,

I've hit a probable assembler bug trying to build a Thumb-2 kernel. Trying to assemble the attached file, I get:

    arch/arm/kernel/relocate_kernel.S: Assembler messages:
    arch/arm/kernel/relocate_kernel.S:10: Error: invalid offset, value too big (0xFFFFFFFFFFFFFFFC)
    arch/arm/kernel/relocate_kernel.S:11: Error: invalid offset, value too big (0xFFFFFFFFFFFFFFFC)
    arch/arm/kernel/relocate_kernel.S:58: Error: invalid offset, value too big (0xFFFFFFFFFFFFFFFC)
    arch/arm/kernel/relocate_kernel.S:59: Error: invalid offset, value too big (0xFFFFFFFFFFFFFFFC)

The code appears correct and reasonable, except that there should be a .align directive before the data words at the end of the file (but adding this doesn't fix the error).

The problem goes away when assembling in ARM mode (i.e., without -mthumb), or when the .globl lines associated with the affected target symbols are deleted.

I believe this may already be tracked by CodeSourcery as issue #8775 (?)

Has anyone hit this issue before? Is it fixed upstream? Any help much appreciated.

Cheers
---Dave

15 years, 6 months

A9 Neon confusion

by David Gilbert

Hi,

I've been looking at some basic libc routine optimisation and have a curious problem with memset and wondered if anyone can offer some insights. Some graphs and links to code are on https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialMemset

I've written a simple memset in both a with and without Neon variety and tested them on a Beagle(C4) and a Panda board and I'm finding that the Neon version is faster than the non-neon version (a bit) on the Beagle but a LOT slower on the Panda - and I'd like to understand why it's slower than the non-neon version - I'm guessing it's some form of cache interaction.

The graphs on that page are all generated by timing a loop that repeatedly memsets the same area of memory; the X axis is the size of the memset. Prior to the test loop the area is read into cache (I came to the conclusion the A8 didn't write allocate?). There are two variants of the graphs - absolute in MB/s on Y, and a relative set (below the absolute) that are relative to the performance of the libc routines. (The ones below those pairs are just older versions).

If you look at the top left graph on that page you can see that on the Beagle (left) my Neon routine beats my Thumb routine a bit (both beating libc). If you look on the top right you see the Panda performance with my Thumb code being the fastest and generally following libc, but the Neon code (red line) topping out at about 2.5GB/s which is substantially below the peak of the libc and ARM code.

The core loop of the Neon code (see the bzr link for the full thing) is:

    4:
        subs r4,r4,#32
        vst2.8 {d0,d1,d2,d3}, [ r3:256 ]!
        bne 4b

while the core of the non-Neon version is:

    4:
        subs r4,r4,#16
        stmia r3!,{r1,r5,r6,r7}
        bne 4b

I've also tried vst1 and vstm in the neon loop and it still won't match the non-Neon version.

All suggestions welcome, plus I'd appreciate if anyone can suggest which particular limit it's hitting - does anyone have figures for the theoretical bus and L1 and L2 write bandwidths for a Panda (and Beagle)?

Thanks in advance,
Dave
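The measurement described in this message (warm the buffer into cache, then time repeated memsets of the same area and report MB/s) can be sketched in C roughly as follows. This is an illustrative sketch only, not the benchmark code linked from the wiki page: the names `memset_fn` and `memset_throughput_mb_s` are hypothetical, and the empty `__asm__` barrier is a GCC/Clang extension assumed here to stop the compiler from discarding the repeated stores.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical type: any routine with the standard memset signature. */
typedef void *(*memset_fn)(void *, int, size_t);

/* Sink so the cache-warming read pass is not optimised away. */
static volatile unsigned char cache_warm_sink;

/* Returns throughput in MB/s for `iterations` calls of fn over buf[0..size). */
double memset_throughput_mb_s(memset_fn fn, unsigned char *buf,
                              size_t size, unsigned long iterations)
{
    struct timespec start, end;
    unsigned char sum = 0;

    assert(fn != NULL && buf != NULL && size > 0);

    /* Untimed warm-up: initialise the area, then read it into cache. */
    fn(buf, 0, size);
    for (size_t i = 0; i < size; i++)
        sum += buf[i];
    cache_warm_sink = sum;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long n = 0; n < iterations; n++) {
        fn(buf, (int)(n & 0xff), size);
        /* Compiler barrier (GCC/Clang extension): keep every store. */
        __asm__ volatile("" : : "r"(buf) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double seconds = (double)(end.tv_sec - start.tv_sec)
                   + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    if (seconds <= 0.0)
        return 0.0; /* timer too coarse for this run */
    return ((double)size * (double)iterations) / (1024.0 * 1024.0) / seconds;
}
```

Calling it once per buffer size, for both the libc `memset` and a candidate routine, would give the absolute and relative curves described above; on some older C libraries `clock_gettime` may need `-lrt` at link time.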

15 years, 6 months

Draft of next week's public review

by Michael Hope

Hi there. I've uploaded a draft of the slides and notes for next week's public review at:

http://bazaar.launchpad.net/~linaro-toolchain-wg/+junk/publicreview1105/fil…

'Toolchain Public Review 11.05.odp' is a set of slides I'll talk to. The first 15-20 minutes will go through these to describe our focus and goals and how they tie together the blueprints and priorities.

The rest of the session will go through the current blueprints and priorities. See:

Toolchain Blueprints (short).pdf

for the summary version and:

Toolchain Blueprints (long).pdf

for the long version. The long version is interesting if you can't find a particular tool or technology. It may be small enough to be called out as a single work item.

These are only a draft, but I realised I haven't shared the plans with the rest of the group very well and Monday's meeting won't be the best. I'm on holiday tomorrow but feel free to send me any comments,

-- Michael

15 years, 6 months

Reviewing blueprints for the TSC

by Michael Hope

Hi there. I've been going through the blueprints in preparation for next week's TSC review. The top level topics are good, and I'd like to have the rest of the engineering blueprints checked over and updated to match what we talked about at the summit.

Ira, could you please create blueprints for the areas you plan to look into? Anything that will take longer than a month should have a blueprint. It's worth having a catch-all blueprint for anything left over. Please add these as a dependency to:

https://blueprints.launchpad.net/linaro/+spec/tr-toolchain-neon-performance

Zach, could you check:

https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/ltrace-support
https://blueprints.launchpad.net/linaro/+spec/tr-toolchain-openocd

Peter, there's a whole range of QEMU ones that could do with a pass over.

For more about the review, see:

https://wiki.linaro.org/Releases/1105/PublicPlanReview

For a list of all of the toolchain blueprints, see:

http://ex.seabright.co.nz/helpers/blueprints#toolchain

-- Michael

15 years, 6 months

Mixed vector sizes

by Ira Rosen

Hi,

I started to look into mixed vector sizes (in the same loop). My main reason for this was to allow widening and narrowing instructions, which have different vector sizes for src and dest, to work properly. My example was widen_mult (int = short * short); I thought its implementation was not optimal. But now that I have a working GCC mainline for ARM, I see that it works just fine.

    short ub[], ua[];
    int c[];

    for (i = 0; i < n; i++)
      c[i] = ub[i] * ua[i];

is compiled as:

    .L11:
        add r1, r1, #1
        vldmia r4!, {d18-d19}
        cmp r5, r1
        vldmia ip!, {d16-d17}
        vmull.s16 q10, d18, d16
        vstr d20, [r3, #-32]
        vstr d21, [r3, #-24]
        vmull.s16 q8, d19, d17
        vstr d16, [r3, #-16]
        vstr d17, [r3, #-8]
        add r3, r3, #32
        bhi .L11

which looks good to me, at least from the vmull point of view. Does anyone have an example where mixed vector size instructions are not used properly?

Another reason for mixed sizes could be cases where only part of the loop can be vectorized with the wider vectors. I don't know how common this is.

Are there any other reasons to implement mixed vector sizes? I understand that this can be a useful feature, I am just not sure it's the most important one.

Thanks,
Ira
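For anyone wanting to reproduce the widening-multiply case, a self-contained version of the loop might look like the sketch below. The function name `widen_mult` and its parameter names are hypothetical, and the `restrict` qualifiers are an assumption added here to tell the compiler the arrays do not overlap; whether a given GCC version emits vmull.s16 for it depends on the target flags and optimisation level used.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Widening multiply: each 16-bit input pair produces a 32-bit result.
 * The operands are promoted to int before the multiply, so the product
 * is not truncated to 16 bits.
 */
void widen_mult(int *restrict c, const short *restrict a,
                const short *restrict b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}
```

The source and destination element sizes differ (16-bit in, 32-bit out), which is what makes this a mixed-vector-size case for the vectorizer.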

15 years, 6 months

Backport criteria

by Michael Hope

I've been going through the ChangeLog for the release and am having trouble justifying some of the changes brought in. In particular:

* -fstrict-volatile-bitfields, which is more appropriate for bare metal/kernel code
* Cortex-M4 support
* C locale support in libstdc++-v3

The march/mcpu clean up is OK but marginal.

Our focus is time-based performance on the Cortex-A series, with an implied preference for applications over kernel/bare metal. This is a very narrow view, but every non-performance line of code we bring in can also bring in a bug.

Any thoughts? For those who are looking at using our toolchain, is earlier access to other toolchain improvements interesting?

-- Michael
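As background on why -fstrict-volatile-bitfields matters mostly to bare metal and kernel code: the option asks GCC to access a volatile bit-field with a single access of the width of the field's declared type, which is what memory-mapped hardware registers usually require. The sketch below is purely illustrative; the register layout `struct ctrl_reg` and the helper `ctrl_configure` are hypothetical and do not come from any real device or from the patches discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit control register described with volatile bit-fields. */
struct ctrl_reg {
    volatile uint32_t enable : 1;
    volatile uint32_t mode   : 3;
    volatile uint32_t count  : 12;
};

/*
 * Each assignment below is a read-modify-write of the containing word.
 * With -fstrict-volatile-bitfields the compiler is asked to use accesses
 * of the declared type's width (32 bits here) rather than a narrower one.
 */
void ctrl_configure(struct ctrl_reg *reg, unsigned mode, unsigned count)
{
    assert(reg != NULL);
    reg->mode = mode & 0x7u;
    reg->count = count & 0xfffu;
    reg->enable = 1;
}
```

Ordinary application code rarely depends on the access width of a bit-field, which is consistent with the concern raised above that the change has little to do with Cortex-A application performance.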

15 years, 6 months

Upstream GCC feature freeze

by Andrew Stubbs

Hi all,

As you may or may not know, upstream GCC has now entered 'stage 3' of its development cycle. This will last until spring. This means that they are only accepting bug fixes and documentation improvements. New features and any performance improvements must wait until GCC 4.6 branches, prior to release, and GCC 4.7 development opens.

During this process, our usual preferred work flow (upstream first) will not work, so we'll have to do something else.

Here's my proposal:

* Create a new Launchpad branch for GCC 4.6.
* Synchronize this branch with upstream regularly
  * once per week, perhaps.
* Try to get upstream approval for all new patches in the usual way
  * on the understanding that they won't be applied until stage 1
  * bug fixes are unaffected and may commit as usual.
* Commit all pending patches to our own 4.6 branch
  * and backport them to our 4.5 branch, of course.
* Usual "no test regressions" policy applies to our own patches
  * but beware regressions from merges from upstream.
  * we may want to track the clean 4.6 test results for comparison

This is little different to what we do with the 4.5 release branch now.

Thoughts?

Andrew

15 years, 6 months

Linaro GCC 4.5 2010-11 released

by Michael Hope

The Linaro Toolchain Working Group is pleased to announce the latest release of Linaro GCC 4.5.

Linaro GCC 4.5 is the fourth release in the 4.5 series. Based off the latest GCC 4.5.1+svn164911, it includes many ARM-focused performance improvements and bug fixes.

Interesting changes include:

* Various NEON related fixes
* Performance improvements
* A clean up of some of the testsuite test cases
* An updated version of the __sync multicore primitives
* Improvements in data packing when optimising for size
* C locale support in libstdc++-v3

This release adds the new option -fstrict-volatile-bitfields and enables it by default on ARM. See doc/invoke.texi for more information.

The source tarball is available from:

https://launchpad.net/gcc-linaro/+milestone/4.5-2010.11-0

Downloads are available from the Linaro GCC page on Launchpad:

https://launchpad.net/gcc-linaro

Note that there were no changes to the 4.4 series.

-- Michael
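For readers unfamiliar with the __sync multicore primitives mentioned in the change list, the sketch below shows two of the GCC builtins in use. It only illustrates the existing builtin interface (`__sync_add_and_fetch` and `__sync_bool_compare_and_swap`); the helper names `counter_increment` and `claim_flag` are hypothetical, and nothing here reflects what specifically changed in this release.

```c
#include <assert.h>
#include <stddef.h>

/* Atomically add 1 to *counter and return the new value. */
long counter_increment(long *counter)
{
    assert(counter != NULL);
    return __sync_add_and_fetch(counter, 1);
}

/*
 * Atomically change *flag from 0 to 1.
 * Returns non-zero if this caller made the change, 0 if the flag was
 * already set by someone else.
 */
int claim_flag(int *flag)
{
    assert(flag != NULL);
    return __sync_bool_compare_and_swap(flag, 0, 1);
}
```

Both builtins act as full memory barriers, so they are safe to call from several threads or cores sharing the same variable.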

15 years, 6 months
