linaro-toolchain October 2011

linaro-toolchain@lists.linaro.org

19 participants
55 discussions

by Ira Rosen

Hi,

* Finished a presentation for the NEON forum. Revital and Richard kindly agreed to take a look and gave me some valuable comments. Thanks!
* widen-shifts:
  - While preparing the presentation I found some room for improvement in the pattern detection, so I implemented it. It gave an additional 13% speedup on rgb24tobgr16.
  - Ramana suggested a way to check the constant operand of vshll.
  Testing these two things on ARM.
* SLP improvements:
  - Implemented a patch that swaps operands where necessary to make the operations isomorphic, and supports loads with different offsets. Testing it now.
  - The three relevant libav loops now get vectorized, giving a 42%-57% speedup.

Next week: holidays, half days Sunday-Wednesday and Thursday.

Ira
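The operand-swapping part of the SLP patch is easiest to see on a tiny basic block. The C below is an illustrative sketch, not code from libav or from the patch, and `add_pairs` is a hypothetical name. Both statements do the same work, but the second lists its operands in the opposite order, so the two only become isomorphic (and thus packable into one two-lane vector add) once the vectorizer swaps the operands of the commutative `+`.

```c
#include <stdint.h>

/* Two adjacent statements forming a candidate SLP group.
   The second writes c[1] + b[1] instead of b[1] + c[1]. Because + is
   commutative, swapping its operands makes both statements isomorphic
   (load b, load c, add, store), so they can be combined into a single
   vector add over a[0..1]. */
void add_pairs(int32_t *restrict a, const int32_t *restrict b,
               const int32_t *restrict c)
{
    a[0] = b[0] + c[0];
    a[1] = c[1] + b[1];   /* operands in swapped order */
}
```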

Binary toolchain questions summary

by Michael Hope

Here's my summary from Monday's meeting on the harder parts of binary toolchains.

Using a 4.6 compiler against a 4.5-based sysroot such as Natty:
* libgcc and libstdc++ are part of the compiler
* The compiler expects features that are in the corresponding runtime
* You can't reliably run or validate against an earlier runtime

The solution is to upgrade the runtime on the sysroot to 4.6. 4.6 is backwards compatible. Ubuntu did this with Maverick and it caused no problems, although problems such as Debian #622783 have been seen.

Multiarch:
* The Ubuntu multiarch patch should work with a sysroot
* Multiarch and multilib should work together

Multilib:
* The current ARM multilib rules are old and not very relevant
* Multilib means you need multiple sysroots as well
* Skip multilib for the first release

Other:
* Anything we support in cross we should support native first
* Check that we don't have to directly supply the source that goes with the binary sysroot

-- Michael

How to install an ARM cross compiler

by Andrew Stubbs

Saw this, thought it might be interesting if we want to point people at something in future:

http://playterm.org/r/install-an-arm-cross-compiler-1316950150

Maybe we could record some more detailed stuff ourselves?

Andrew

[Activity] Week ending 7th Oct. 2011

by Ramana Radhakrishnan

=== Progress ===
* Out of office for a day.
* Wrote a quick patch to use vcvt.f32.s32 with fractional bits where we can. Tested with no regressions; need to commit this after upstream review.
* Desk move and packing for that.
* Looked at bug report LP 836588, which appears to go away with -fno-gcse. Needs more digging.

=== Plans ===
* Look at LP 836588.
* Finish the auto-inc-dec pipeline scheduling work.
* Clear out some of the old patches (POST_MODIFY_DISP for vfp, BRANCH_COST).
* Settle into new desk.
* Again a short week, as I'm away for 2 days for an internal training event.

Meetings:
* 1-1s
* TCWG calls

Absences:
* 5th October - out of office.
* 13th - 14th October - out of office - internal training event.
* 31st Oct - 4th Nov - Linaro Summit Orlando - travel booked.
* 08 Nov - 11 Nov - tentatively booked.
* 19th Dec - 31st Dec - tentatively booked.
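For reference, this is the kind of C source whose semantics match a vcvt.f32.s32 with fractional bits. It is a sketch only: the post does not show what the patch actually matches, and `q16_to_float` and `q16_to_float_array` are hypothetical names. Under the default round-to-nearest mode, dividing by a power of two such as 65536.0f is exact, so converting and then scaling rounds only once, just as a single fixed-point conversion with 16 fractional bits does.

```c
#include <stdint.h>

/* Convert a signed Q16.16 fixed-point value to float.
   (float)x rounds once; dividing by 65536.0f (2^16) is exact, so the
   result equals a single fixed-point-to-float conversion with 16
   fractional bits, e.g. "vcvt.f32.s32 d0, d0, #16" on NEON. */
float q16_to_float(int32_t x)
{
    return (float)x / 65536.0f;
}

/* Array form: a loop of this shape is a candidate for a vector
   vcvt with fractional bits. */
void q16_to_float_array(float *restrict out, const int32_t *restrict in,
                        int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (float)in[i] / 65536.0f;
}
```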

[ACTIVITY] Weekly status

by Richard Sandiford

== Last week ==
* Patch review.
* Backported the second attempt to fix the get_arm_condition_code ICE.
* Worked on -fsched-pressure. Experimented with various combinations of ideas. This is giving some good results (e.g. a 2x improvement in libav's put_h264_qpel8_hv_lowpass_8) but needs a bit more work to fix some outliers.

I'll be away from 18th Oct to 14th Nov.

Richard

Native build performance

by Michael Hope

Random data for the day:

Dave Pigott has installed some new PandaBoard build machines in the validation lab. They're identical to mine except that root is on USB flash instead of NFS, and they have a much faster flash drive for the build area. The time taken to bootstrap and test gcc-linaro-2011.09 with C, C++, Fortran, and LTO is:
* ursa3, ursa4 (Toshiba USB stick): 301 minutes build / 369 minutes test
* ursa2 (no-name USB stick): 324 minutes build / 422 minutes test
* tcpanda (fast USB stick): 274 minutes build / 265 minutes test

So the new combo gives a 1.38x faster build. I'm surprised, as I thought the build was CPU bound. I'd hate to see what building on an SD card is like.

Note that /tmp is in RAM, /scratch is ext4, the new boards use noatime, and the kernel doesn't have the new USB performance fix.

-- Michael
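The post does not say which pair of figures gives the 1.38x. One reading that reproduces it (an assumption, not stated in the post) is the combined build-plus-test time of tcpanda against ursa2:

$$\frac{324 + 422}{274 + 265} = \frac{746}{539} \approx 1.38$$

By the same measure tcpanda is about 1.24x faster than ursa3/ursa4 (670/539), while comparing build time alone against ursa2 gives about 1.18x (324/274).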

[ACTIVITY] weekly status

by Revital Eres

Continued working on estimating register pressure with SMS:
- Discussed the current approach with Richard, which gave useful leads.
- Started to implement this approach.
- Doing experiments on libav microbench.

Re: Use of memcpy() in libpng

by jbowler＠acm.org

The unaligned accesses in libpng are, for the large copies, a bug. Our attempt to align the row buffer to a 16-byte boundary was off by one, so we end up always misaligning it. I've posted a patch on the png-mng-implement list:

http://sourceforge.net/mailarchive/message.php?msg_id=28194444

The time spent in memcpy() is probably an illusion. The data out of zlib gets copied to one row buffer, where it is unfiltered (if necessary); then a copy is made in a separate buffer that is only used for the filter handling. If you test using images with large rows (I don't know what pngbench does), the copy buffer may well get flushed out of the second-level cache between rows, and then the memcpy will stall bringing it back in. If you have machine-level profiling you may see this as a massive time spike on some probably unrelated instruction which just happens to be in the PC when the stall stops everything.

Anyway, I have several ideas of how to avoid the copy when it isn't required.

John Bowler <jbowler(a)acm.org>

-----Original Message-----
From: Glenn Randers-Pehrson [mailto:glennrp@gmail.com]
Sent: Monday, October 03, 2011 1:15 PM
To: PNG/MNG implementation discussion list
Subject: [png-mng-implement] Use of memcpy() in libpng

[Fwd from linaro-toolchain list]

Re: Use of memcpy() in libpng
David Gilbert Tue, 27 Sep 2011 06:20:14 -0700

On 27 September 2011 14:16, Christian Robottom Reis <k...(a)linaro.org> wrote:
> On Tue, Sep 27, 2011 at 09:47:33AM +0100, Ramana Radhakrishnan wrote:
>> On 26 September 2011 21:51, Michael Hope <michael.h...(a)linaro.org> wrote:
>> > Saw this on the linaro-multimedia list:
>> >
>> > http://lists.linaro.org/pipermail/linaro-multimedia/2011-September/000074.html
>> >
>> > libpng spends a significant amount of time in memcpy(). This might
>> > tie in with Ramana's investigation or the unaligned access work by
>> > allowing more memcpy()s to be inlined.
>>
>> It's the unaligned access and the change / improvements to the memcpy
>> that *might* help in this case. But that of course depends on the
>> compiler knowing when it can do such a thing. Of course what might be
>> more interesting is the kind of workload analysis that Dave's done in
>> the past with memcpy to know what the alignment and size of the
>> buffer being copied is.
>
> If you guys could take a look at this there is a potential requirement
> for the MMWG around libpng optimization; we could fit this in along
> with other work (possible vectorizing, etc) on that component.

It wouldn't take long to analyse the memcpy calls - life would be easier if we had the test program and some details on things like what size of images were used in these benchmarks.

Dave
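The row-buffer bug John describes is an alignment off-by-one. As a hypothetical sketch (not libpng's code; the post does not say exactly where the off-by-one was), the C below aligns the byte that memcpy() will actually start from, rather than the start of the allocation. It assumes a one-byte header in front of the pixel data; PNG rows do carry a leading filter-type byte, but `row_start`, `HEADER` and `ROW_ALIGN` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

#define ROW_ALIGN 16  /* desired alignment of the pixel data */
#define HEADER    1   /* assumed bookkeeping bytes before the pixels */

/* Round p up to the next multiple of align (align must be a power of two). */
static unsigned char *align_up(unsigned char *p, size_t align)
{
    uintptr_t u = (uintptr_t)p;
    u = (u + (align - 1)) & ~(uintptr_t)(align - 1);
    return (unsigned char *)u;
}

/* raw must provide at least ROW_ALIGN + HEADER + row_bytes bytes.
   Returns the address of the header such that the pixel data at
   (result + HEADER) is ROW_ALIGN-aligned. Aligning raw itself instead
   would leave the pixels HEADER bytes off the boundary on every row. */
unsigned char *row_start(unsigned char *raw)
{
    unsigned char *data = align_up(raw + HEADER, ROW_ALIGN);
    return data - HEADER;
}
```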

[ACTIVITY] 3rd - 7th October

by Andrew Stubbs

Continued work on my constant reuse optimizations, though not much this week. I've now fixed some issues with the ARM size-costs code that were causing it to wildly over-estimate the cost of a MOVT instruction. I'll have to post this upstream sometime soon.

Took another look at the shift-amount bug. Discussed the issue with Paul Brook. I've now fixed the original bug, and the new bug introduced by Paul's original fix, and committed that upstream. I still need to backport it to Linaro GCC though, and the latent bug that Richard S spotted is still being analysed.

Did a merge from FSF 4.5 & 4.6 to Linaro, and pushed them to the Launchpad branches for testing.

Began work benchmarking different setups for the generic tuning patches. I had a lot of trouble setting up SPEC2000, though. Hopefully these issues are now resolved, with some help from Michael, and I have established some baseline figures on both A8 and A9 to work from.

No progress on native tuning. I'm still waiting for upstream review.

In other news: Mentor's contract with Linaro has now been extended for another 6 months. :)
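To make the MOVT costing point concrete, here is a deliberately simplified count of the instructions needed to build a 32-bit constant in ARM (A32) state on ARMv7. It is an illustration under stated assumptions, not GCC's ARM cost code: it ignores Thumb-2 immediate encodings and literal-pool loads, and `const_insn_count` is a hypothetical name. The point it shows is that with MOVW/MOVT available, no constant needs more than two instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is an A32 data-processing immediate: an 8-bit value
   rotated right by an even amount (0, 2, ..., 30). */
static bool is_arm_modified_imm(uint32_t x)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotating left by rot undoes a rotate-right by rot. */
        uint32_t v = rot ? (x << rot) | (x >> (32 - rot)) : x;
        if (v <= 0xFF)
            return true;
    }
    return false;
}

/* Simplified instruction count for materialising x (illustration only):
     MOV  #imm    if x is a modified immediate
     MVN  #imm    if ~x is a modified immediate
     MOVW #imm16  if x fits in 16 bits
     MOVW + MOVT  otherwise (two instructions, never more) */
int const_insn_count(uint32_t x)
{
    if (is_arm_modified_imm(x) || is_arm_modified_imm(~x))
        return 1;
    if (x <= 0xFFFF)
        return 1;
    return 2;
}
```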

[ACTIVITY] 2011-10-07

by David Gilbert

== String Routines ==
* Built and tested a newlib with my memchr in it - ready to go after a bit of tidy-up.
* Followed up on my eglibc patch submission with a comment suggesting the use of --with-cpu, pointing back at the previous discussion.

== 64 Bit atomics ==
* Updated the gcc patch based on Ramana's comments, retested and posted a new version.
  - Lost half a day to a failing SD card in our Panda.

== QEMU ==
* Posted a patch that made one variable thread-local using __thread, which fixes multi-threaded user-mode ARM programs (e.g. firefox); this seems to have mutated on the list into a patch for more general thread-local support.

Dave
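The 64-bit atomics work concerns GCC's atomic builtins on 64-bit operands. A minimal sketch of such code, using the long-standing `__sync` builtins (the wrappers `add64` and `cas64` are hypothetical names): on ARMv7 the compiler can expand these inline as an LDREXD/STREXD retry loop, while without that support it has to fall back on an out-of-line call. The tests exercise a carry from the low 32 bits into the high 32 bits, which is exactly what a non-atomic pair of 32-bit updates could tear.

```c
#include <stdint.h>

/* Atomically add delta to *p; returns the value *p held before the add. */
uint64_t add64(uint64_t *p, uint64_t delta)
{
    return __sync_fetch_and_add(p, delta);
}

/* Atomically replace *p with desired if it still equals expected.
   Returns the value *p held before the attempt, so the caller can
   compare it with expected to see whether the swap happened. */
uint64_t cas64(uint64_t *p, uint64_t expected, uint64_t desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}
```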
