linaro-toolchain November 2010

linaro-toolchain@lists.linaro.org

27 participants
56 discussions

by Yao Qi

== Linaro GCC ==

* LP:634738: First, fix this in the combiner. Tried the other approach (without changes to arm.md) suggested by Andrew S: fix arm_gen_constant so that in some cases it generates lsl/lsr rather than loading a constant. Some of the code in arm.c was written in 1998 and is hard to understand, with few comments. While doing this, found that some lsl/lsr pairs can be replaced by ubfx. Use gen_extzv_t2 when arm_arch_thumb2 is true to transform lsl + lsr to ubfx. Two patches are ready.

* LP:633243: Got build failures on FSF trunk for arm-none-linux-gnueabi. Tested patch on FSF trunk 2010-10-21. No regression.

* LP:638935: The predicate "vfp_register_operand" should return true for VFP_D0_D7_REGS registers. Fixed. The predicates {store,load}_multiple_operation assume the mode is SImode and the size of data is 4. Fixed them to accept multiple VFP operations. Wrote three new test cases for the stm/fldm/fstm patterns. Tested patch on FSF trunk 2010-10-21. No regression.

* SMS on thumb2. Discussed the doloop pattern for thumb2 back and forth with Revital Eres. The doloop pattern is not recognized so far on thumb2. Revital has a fix to the thumb2_cbz pattern. After this fix, the doloop pattern should be recognized.

* Pinged the ARM fix for PR45701 on gcc-patches for the fifth time. Still no reply.

== This Week ==

* Look at regressions of the ldm/stm backport on Linaro GCC 4.5.
* Internal review of patch to LP:638935
* Try SMS on thumb2 for EEMBC, if Revital's thumb2_cbz pattern fix is accepted upstream.

--
Yao (齐尧)
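As an illustration of the lsl + lsr to ubfx transformation mentioned above, here is a minimal Python sketch that models the three operations on 32-bit values and checks that they agree. It is not taken from the patches; the helper names (`lsl`, `lsr`, `ubfx`, `extract_with_shifts`) are made up for this example, and the model assumes the usual semantics of UBFX (extract `width` bits starting at bit `lsb`, zero-extended).

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit ARM register


def lsl(x, n):
    """Logical shift left on a 32-bit value (hypothetical helper)."""
    return (x << n) & MASK32


def lsr(x, n):
    """Logical shift right on a 32-bit value (hypothetical helper)."""
    return (x & MASK32) >> n


def ubfx(x, lsb, width):
    """Model of UBFX: extract `width` bits starting at bit `lsb`, zero-extended."""
    return ((x & MASK32) >> lsb) & ((1 << width) - 1)


def extract_with_shifts(x, lsb, width):
    """The same field extracted with an lsl followed by an lsr."""
    # Shift the field up to the top of the register, discarding higher bits,
    # then shift it back down so it ends at bit 0.
    return lsr(lsl(x, 32 - lsb - width), 32 - width)


if __name__ == "__main__":
    value = 0xDEADBEEF
    print(hex(ubfx(value, 8, 12)))
    print(hex(extract_with_shifts(value, 8, 12)))
```

Both calls extract bits 8 to 19 of the value. The equivalence only holds for `width >= 1` and `lsb + width <= 32`, which is also the range the sketch assumes for a valid bitfield.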

15 years, 5 months

New data centre host

by Michael Hope

Hi there. Our Versatile Express has been installed in the data centre and is available for use. See: https://wiki.linaro.org/WorkingGroups/ToolChain/Hardware for the details. If you're a member of the Toolchain WG then you should already have an account. Dave is currently using this machine for benchmarking. Until we get more hardware, please use IRC or email to manage access. -- Michael

15 years, 5 months

[ACTIVITY] 22nd - 28th November

by Andrew Stubbs

Reviewed Yao's patch for AND optimization. Some back and forth on the best way to tackle this problem.

LP:663939 - thumb2 constant loading
 - backported my patches to GCC 4.5
 - awaiting review

LP:595479 - .eh_frame broken.
 - Discovered this problem had been fixed (with Thomas' patch) since August, and has also been fixed upstream, albeit with an alternative patch. Nothing to do here.

LP:641379 - bitfields poorly optimized.
 - analysed the problem. The code in cse.c that is supposed to fix this does not recognise the case.
 - created a patch and tested it for both GCC 4.6 and 4.5.
 - awaiting review

LP:674146 - dpkg segfault.
 - started looking at this, but Chung-Lin took it first.

While trying to reproduce LP:674146, I discovered that my IGEPv2 had a corrupted rootfs, again. I only fixed it last week, so I looked into it more deeply. It seems the SD card has developed at least one bad block. Reformatted, scanned and reinstalled the files from backup. I think the problem was caused by the daily apt package download (it was always those files that were corrupt), so I've disabled that. I've also disabled access-time-stamps. If it happens again I will have to consider using a different underlying filesystem format.

LP:643479 / CS Issue:8610 - Multiply and accumulate optimization
 - created patches for both issues.
 - both were machine description subtleties.
 - backported the patches to 4.5
 - the patches apply and work fine, but ...
 - found an extra problem with redundant moves
 - awaiting review

GCC 4.6
 - Created a new Launchpad series and branch to track GCC 4.6 development.
 - Set up the CS internal build config.
 - Tried to build the latest checkout and failed
 - glibc problem still unfixed - Jie has reported it now.
 - libquadmath build fails

Merged FSF GCC trunk (pre-4.5.2) into Linaro GCC 4.5 tree.

Merged the outstanding Launchpad merge requests into GCC 4.5. The testing showed regressions, so I backed out most of the merges and did them in smaller batches. Chung-Lin and Richard's patches passed the testing, so that leaves Yao's as the problem patch. I didn't get time to test this assertion this week.

----------------------------------
Next week: Vacation.

15 years, 5 months

Re: GCC Optimization Brain Storming Session

by Steven Bosscher

Andrew Stubbs wrote:
> * Instruction set coverage.
>   - Are there any ARM/Thumb2 instructions that we are not taking
>     advantage of? [2]
>   - Do we test that we use the instructions we do have? [3]

There is no general framework to test instruction set coverage. The only way to find out, really, is to create some test cases where you expect the compiler to produce a certain insn.

Is there a list of all ARM/Thumb2 instructions and the ones implemented in the GCC ARM machine descriptions?

> * Constant pools
>   - it might be a very handy space optimization to have small
>     functions share one constant pool, but the way the passes work one
>     function at a time makes this hard. (LP:625233)

There are also passes working on the entire program, or a partition. Isn't it more a question of how to group and process functions that are candidates for sharing a constant pool with a neighbor? Are there algorithms for this kind of pool sharing in the academic or ARM-specific literature?

Other suggestions for the discussion:

* Better use of conditional execution.
  - No idea how much this really helps for ARM, but there are bug reports about missed opportunities from time to time, so...
  - How to model conditional execution before register allocation?
  - How to exploit opportunities better in GCC (ifcvt is inadequate and too late in the pipeline).
  - Also look at LLVM here, it appears to have a better cost model for if-conversion than GCC (taking into account a target-dependent branch misprediction penalty, for example).

* Basic block re-ordering for speed/size.
  - The existing basic block reordering pass in GCC implements only a reordering strategy for speed.
  - The pass does not run at all for functions optimized for size.

* Comparing ARM cost models and param settings to x86_64
  - Compare, for some set of functions/benchmarks, the results of estimate_num_insns, estimate_operator_cost, and estimate_move_cost, between ARM and x86_64. Rationalize or fix any significant differences. See whether heuristics based on these functions require tuning for ARM.
  - Go through params.def and see if there are further ARM tuning opportunities. There are more than 100 DEFPARAMs and many of them guide heuristics but have only been tuned on x86_64. (There is set_default_param_value, but most backends do not change the defaults.)

Hoping this is helpful,

Ciao!
Steven

15 years, 5 months

Silverbell dchroots

by Christian Robottom Reis

David G. requested a few packages installed on silverbell today (the quad-A9 VE porter machine we host in the datacenter). We got dchroots instead:

----- Forwarded message from LaMont Jones via RT <rt(a)admin.canonical.com> -----

Date: Fri, 26 Nov 2010 21:02:22 +0000
Subject: [rt.admin.canonical.com #42662] Simple package installs on silverbell

On Fri Nov 26 17:08:28 2010, kiko(a)canonical.com wrote:
> Hi there,
>
> Could we get installed on silverbell:
>
>     build-essential
>     debhelper
>     fakeroot
>
> And could we get deb-src's added to sources.list and do an apt-get
> update to allow us to apt-get source certain packages? This is for
> simple compilation benchmarks. Thanks!

Dchroot environments have been created on silverbell for both maverick and natty. Within the chroot, you can sudo apt-get install to install packages. apt-get update/dist-upgrade and installs that cause package removal will require a GSA to do them.

To build for maverick:

    dchroot -c maverick
    (and then build however you want...)

lamont

----- End forwarded message -----

--
Christian Robottom Reis | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP   | [ +1] 612 216 4935 | http://async.com.br/~kiko

15 years, 5 months

[ACTIVITY] weekly status

by Ken Werner

Hi,

* the ARM __sync_* glibc-ports patch was accepted upstream
* posted a proposal for consolidating sync primitives, but stdatomic seems to be the future
* used my small gcc testsuite patch to verify __sync_* support of gcc-linaro
* created: https://wiki.linaro.org/WorkingGroups/ToolChain/AtomicMemoryOperations
* looked into GOMP support on ARM:
  - #pragma omp atomic results in proper asm code (dmb, ldrex, strex, dmb)
  - #pragma omp flush results in a DMB instruction
  - #pragma omp barrier results in a call to GOMP_barrier (I'm not sure if this is the desired behavior)
* started to look into #681138

Regards
Ken

15 years, 5 months

[ACTIVITY] 2010-11-26

by David Gilbert

Hand crafted a simple strchr and compared it with libc's:

https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialStrchr

Interestingly, it's significantly faster than libc's on A9s, but on A8s it's slower for large sizes. I've not really looked at why yet; my implementation is just the absolute simplest thumb-2 version.

Did some ltrace profiling to see what typical strchr and strlen sizes were, and got a bit surprised at some of the typical behaviours (lots of cases where strchr is being used in loops to see if another string contains any one of a set of characters, a few cases of strchr being called with Null strings, and the corner case in the spec that allows you to call strchr with \0 as the character to search for).

Trying some other benchmarks (pybench spends very little time in libc; package builds of simple packages seem to have a more interesting mix of libc use).

Sorting out some of the red tape for contributing.

Dave
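To make the strchr behaviours from the ltrace profiling concrete, here is a small Python reference model. It is only a sketch of the C semantics, not the thumb-2 implementation from the wiki page: the string is modelled as `bytes` that must contain a terminating NUL, the result is an index (or `None`) instead of a pointer, and the names `strchr_model` and `contains_any` are invented for this example. "Null strings" is read here as empty strings, which is an assumption.

```python
def strchr_model(s, c):
    """Index of the first byte equal to c in NUL-terminated s, or None.

    s must contain a terminating 0 byte; c is converted to a byte value
    the way C converts the int argument to char.
    """
    c &= 0xFF
    i = 0
    while True:
        b = s[i]
        if b == c:
            # This test comes first, so searching for 0 finds the terminator.
            return i
        if b == 0:
            return None
        i += 1


def contains_any(s, charset):
    """The loop pattern seen in the profile: scan s, calling strchr on a set."""
    i = 0
    while s[i] != 0:
        if strchr_model(charset, s[i]) is not None:
            return True
        i += 1
    return False


if __name__ == "__main__":
    print(strchr_model(b"hello\0", ord("l")))   # ordinary search
    print(strchr_model(b"hello\0", 0))          # search for the terminator
    print(strchr_model(b"\0", ord("a")))        # empty string
    print(contains_any(b"a=b\0", b"=:\0"))      # set-membership loop
```

Comparing the searched-for byte before testing for the terminator is what makes the `\0` corner case fall out naturally, and the `contains_any` loop shows why many of the profiled calls are on very short strings.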

15 years, 5 months

Notes on mixing D16/D32 code

by Michael Hope

It's a bit of a newbie question, but I've been wondering if you can intermix hard float VFPv3-D16 code with VFPv3-D32 code. You can.

According to the ABI:
 * d0-d7 are used for floating point parameters, no matter if you are D16 or D32
 * d0-d7 are not preserved across function calls
 * d8-d15 must be preserved across function calls
 * d16-d31, if present, are not preserved across function calls

The scenarios are:

A D32 function calls a D16 function:
 * The first 8 double-precision parameters are passed in D0-D7
 * Any remaining are passed on the stack
 * The D16 function doesn't know about D16-D31 and doesn't use them, so they happen to survive the call, although the caller cannot rely on that in general

A D16 function calls a D32 function:
 * The first 8 double-precision parameters are passed in D0-D7
 * Any remaining are passed on the stack
 * The D32 function may clobber D16-D31, but the D16 caller never keeps anything there, so nothing is lost

A D32 function (A) calls a D16 function (B) which calls a D32 function (C):
 * Parameters are OK, as above
 * B doesn't use D16-D31
 * C may clobber D16-D31, but A already has to treat them as clobbered across any call, so A saves whatever it needs before calling B

-- Michael

15 years, 5 months

[ACTIVITY] report week 47

by Peter Maydell

(short week: only three days)

RAG:
Red:
Amber:
Green: qemu: initial pull req sent; vfp-in-sighandlers patchset sent

Milestones:
                              | Planned    | Estimate   | Actual     |
finish virtio-system          | 2010-08-27 | postponed  |            |
get valgrind into linaro PPA  | 2010-09-15 | 2010-09-28 | 2010-09-28 |
complete a qemu-maemo update  | 2010-09-24 | 2010-09-22 | 2010-09-22 |
finish testing PCI patches    | 2010-10-01 | 2010-10-22 | 2010-10-18 |

Progress:
 * qemu: final polish on a patchset for saving/restoring VFP and iWMMXT registers across linux-user mode signal handlers; patch series sent to mailing list
 * qemu: sent a pull request for a small set of ARM fixes (make SMC undef; fix PXHxx; fix saturating add/sub; fix VCVT)
 * reviewed arm semihosting SYS_GET_CMDLINE patch v2
 * I now have enough qemu patches in flight that I'm tracking them at https://wiki.linaro.org/PeterMaydell/QemuPatchStatus (simple manual list for now, hopefully will be sufficient)

Meetings: toolchain, pdsw-tools

Plans
 - qemu consolidation

Absences: (complete to end of 2010)
Thu/Fri 25-26 Nov; Fri 17 Dec - Tue 4 Jan inclusive.
(Dallas Linaro sprint 9-15 Jan.)

15 years, 5 months

__sync barriers

by Richard Sandiford

For the record, the thing I half-remembered on the call was:

    http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00697.html

and:

    http://gcc.gnu.org/ml/gcc-patches/2009-09/msg02112.html

The problem is that all __sync operations besides __sync_lock_test_and_set and __sync_lock_release are defined to be full barriers. Using something like __sync_val_compare_and_swap for __arch_compare_and_exchange_val_*_acq and __arch_compare_and_exchange_val_*_rel may on some architectures be too heavyweight, since those macros only need acquire/after and release/before barriers.

See in particular:

    http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00928.html

from the first thread, where the feeling was that the future wasn't these __sync builtins, but the new C and C++ atomic memory support.

Probably already known, sorry. I just wasn't sure that trying to convert everyone (not just ARM) to __sync_* was necessarily going to go down well.

Richard

15 years, 5 months
