linaro-toolchain May 2011

linaro-toolchain@lists.linaro.org

26 participants
54 discussions

by Chung-Lin Tang

== Last week ==
* At Linaro@UDS; I am still typing this in Budapest. Sparingly did some work between sessions.
* PR42017, ARM LR register not being used. Discussed the patch with Richard Sandiford at LDS. Re-tested a bit and about to resend a revised patch according to his suggestion.
* LP:748138, redirect_jump() ICE. Committed patch to CS stable and trunk. Submitted merge request to Linaro 4.5 branch.
* LP:689887. Got some suggestions from Revital on how to debug the bootstrap failure caused by my patch, will look into applying it.

== This week ==
* Taking Monday off, I'll be flying back to Taiwan on Tuesday.
* Continue with issues after getting home.

14 years, 1 month

[ACTIVITY] 9th - 13th May

by Andrew Stubbs

Spent the whole week attending Linaro@UDS. Any other activity this week was squeezed into the space between (interesting) sessions.

Finished making the suggested changes to my Thumb2 constants patch, and posted it back upstream. This is pre-approved, but can't be committed until after the addw/subw patch.
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg05195.html

Merged all my outstanding approved merge requests to the release branches in time for next week's release.

----

Upstream patches requiring review:

* NEON scheduling patch
  http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html

* ARM Thumb2 addw/subw support.
  http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg03783.html

14 years, 1 month

[ACTIVITY] report week 19

by Peter Maydell

RAG:
Red:
Amber:
Green: 1105 work item status 99% complete with 2 weeks to go

Current Milestones:
                             | Planned    | Estimate   | Actual     |
qemu-linaro 2011-05          | 2011-05-19 | 2011-05-19 | n/a        |
close out 1105 blueprints    | 2011-05-28 | 2011-05-28 |            |
complete 1111 planning       | 2011-05-28 | 2011-05-28 |            |

Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03          | 2011-03-08 | 2011-03-08 | 2011-03-08 |
qemu-linaro 2011-04          | 2011-04-21 | 2011-04-21 | 2011-04-21 |

== merge-correctness-fixes ==
* some of my pending patches have been applied; a number of others are still under discussion or need further work/testing

== other ==
* We won't be making a qemu-linaro 2011-05 release, since there are no changes since the 2011-04 release (due to a combination of the Easter holiday and UDS week).
* Attended UDS
* almost all 1105 work items either complete or confirmed postponed to next cycle
* Good progress on fleshing out blueprints for next cycle: https://wiki.linaro.org/PeterMaydell/Qemu1111

Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus

Absences:
(maybe) 15-16 August: QEMU/KVM strand at LinuxCon NA, Vancouver [LinuxCon proper follows on 17-19th]

14 years, 1 month

Idea for auto-increment performance improvement

by Richard Sandiford

Last week, Ramana pointed me at an upstream bug report about the inefficient code that GCC generates for vzip, vuzp and vtrn:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48941

It was filed not long after the Neon seminar at the summit; I'm not sure whether that was a coincidence or not.

I attached a patch to the bug last week and will test it this week. However, a cut-down version shows up another problem that isn't related specifically to intrinsics. Given:

    #include <arm_neon.h>

    void
    foo (float32x4x2_t *__restrict dst, float32x4_t *__restrict src, int n)
    {
      while (n--)
        {
          dst[0] = vzipq_f32 (src[0], src[1]);
          dst[1] = vzipq_f32 (src[2], src[3]);
          dst += 2;
          src += 4;
        }
    }

GCC produces:

            cmp     r2, #0
            bxeq    lr
    .L3:
            vldmia  r1, {d16-d17}
            vldr    d18, [r1, #16]
            vldr    d19, [r1, #24]
            vldr    d20, [r1, #32]
            vldr    d21, [r1, #40]
            vldr    d22, [r1, #48]
            vldr    d23, [r1, #56]
            add     r3, r0, #32
            vzip.32 q8, q9
            vzip.32 q10, q11
            subs    r2, r2, #1
            vstmia  r0, {d16-d19}
            add     r1, r1, #64
            vstmia  r3, {d20-d23}
            add     r0, r0, #64
            bne     .L3
            bx      lr

We're missing many auto-increment opportunities here. I think this is due to the limitations of GCC's auto-inc-dec pass rather than to a problem in the ARM port itself. I think there are two main areas for improvement:

- The pass only tries to use auto-incs in cases where there is a separate addition and memory access. It doesn't try to handle cases where there are two consecutive memory accesses of the form *base and *(base + size), even if the address costs make it clear that post-increments would be a win.

- The pass uses a backward scan rather than a forward scan, which makes it harder to spot chains of more than two accesses.

FWIW, I've got fairly specific ideas about how to do this. Unfortunately, the pass is in need of some TLC before it's easy to make changes. So in terms of work items, how about:

1. Clean up the auto-inc pass so that it's easier to modify
2. Investigate improvements to the pass
3. Submit the changes upstream
4. Backport the changes to the Linaro branches

I wrote some patches for (1) last week.

I'd estimate it's about 2 weeks' work for (1) and (2). (3) and (4) would hopefully be background tasks. The aim would be for something like:

    .L3:
            vldmia  r1!, {d16-d17}
            vldmia  r1!, {d18-d19}
            vldmia  r1!, {d20-d21}
            vldmia  r1!, {d22-d23}
            vzip.32 q8, q9
            vzip.32 q10, q11
            subs    r2, r2, #1
            vstmia  r0!, {d16-d19}
            vstmia  r0!, {d20-d23}
            bne     .L3
            bx      lr

This should help with auto-vectorised code, as well as normal core code.

(Combining the vldmias and vstmias is a different topic. The fact that this particular example could be implemented using one load and one store is to some extent coincidental.)

Richard
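The first limitation (consecutive accesses of the form *base and *(base + size)) can be pictured at the source level. What follows is only a minimal sketch in portable C rather than NEON intrinsics; the function names copy4_offset and copy4_postinc are hypothetical, and the comments describe the addressing a compiler might pick, not what GCC is guaranteed to emit.

```c
#include <assert.h>
#include <stddef.h>

/* Offset form: every access is *(base + constant) and each base is
   bumped once per iteration.  This mirrors the [r1, #16], [r1, #24]
   style of addressing in the loop GCC currently produces.  */
void
copy4_offset (float *dst, const float *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  while (n--)
    {
      dst[0] = src[0];
      dst[1] = src[1];
      dst[2] = src[2];
      dst[3] = src[3];
      dst += 4;
      src += 4;
    }
}

/* Post-increment form: each access advances its own pointer, so no
   separate addition remains.  This is the shape the "r1!" and "r0!"
   writeback addressing corresponds to.  */
void
copy4_postinc (float *dst, const float *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  while (n--)
    {
      *dst++ = *src++;
      *dst++ = *src++;
      *dst++ = *src++;
      *dst++ = *src++;
    }
}
```

Both functions compute the same result; the difference is purely in how the addresses are expressed, which is the rewrite the pass would have to discover for itself from the offset form.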

14 years, 1 month

[ACTIVITY] 2011-05-13

by David Gilbert

== String routines ==
* Gave up on perf on silverbell and redid it on ursa2; now have a full set of perf figures and have updated the workload report to show the spec binaries that use significant time in libc and the routines they spend it in; a handful of tests spend very significant amounts of time in libm.
* Have ltrace results from about 75% of spec - some of the others are fighting a bit
* Optimised the non-neon memcpy; it's now quite respectable except in one or two cases (2 byte misaligned, and for some odd reason source offset by 8 bytes, destination by 12 is way down on any other combination)

(Current result graphs here https://wiki.linaro.org/Internal/People/DaveGilbert?action=AttachFile&do=ge… )

Dave
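Since the odd cases above depend on the combination of source and destination offsets, one way to exercise a copy routine is to sweep every pair of offsets. The sketch below is hypothetical (it is not the harness used for the graphs), checks correctness only rather than timing, and the names copy_fn, byte_copy and check_copy_offsets are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same signature as memcpy, so the libc routine can be passed in too.  */
typedef void *(*copy_fn) (void *, const void *, size_t);

/* Plain byte loop, used here only as a reference implementation.  */
void *
byte_copy (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}

/* Run COPY for every (source offset, destination offset) pair below
   MAX_OFF and return how many pairs gave a wrong result or wrote
   outside the destination range.  */
int
check_copy_offsets (copy_fn copy, size_t max_off, size_t len)
{
  enum { BUF = 512 };
  unsigned char src[BUF], dst[BUF];
  int failures = 0;

  assert (max_off + len <= (size_t) BUF);
  for (size_t i = 0; i < (size_t) BUF; i++)
    src[i] = (unsigned char) (i * 7 + 1);

  for (size_t soff = 0; soff < max_off; soff++)
    for (size_t doff = 0; doff < max_off; doff++)
      {
        int bad = 0;
        memset (dst, 0xAA, sizeof dst);          /* guard pattern */
        copy (dst + doff, src + soff, len);
        if (memcmp (dst + doff, src + soff, len) != 0)
          bad = 1;
        for (size_t i = 0; i < (size_t) BUF; i++)
          if ((i < doff || i >= doff + len) && dst[i] != 0xAA)
            bad = 1;                             /* wrote out of range */
        failures += bad;
      }
  return failures;
}
```

Calling check_copy_offsets (memcpy, 16, 100) covers, among others, the source-offset-8, destination-offset-12 pair; a timing loop could be wrapped around the same sweep.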

14 years, 1 month

[ACTIVITY] May 8-12

by Ira Rosen

Hi,

* continued looking into ffmpeg/libavcodec:
  - dcadsp.c - the inner loop contains reverse accesses which are not supported on Neon. I think we can handle them using vrev and vswp.
  - a lot of loops have unknown memory stride. I am exploring a possibility of a combination of scalar loads and vmov into a vector register, but it is probably too expensive.
* looking into telecom/conven

Ira
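The vrev plus vswp idea for reverse accesses can be modelled in scalar C. This is only a sketch under assumptions: it takes "vrev" to mean VREV64 on 32-bit lanes of a four-lane vector, the loop shape is invented rather than copied from dcadsp.c, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar stand-in for VREV64.32 on a 4-lane vector: reverse the two
   lanes inside each 64-bit half.  */
void
rev64_pairs (float v[4])
{
  float t;
  t = v[0]; v[0] = v[1]; v[1] = t;
  t = v[2]; v[2] = v[3]; v[3] = t;
}

/* Scalar stand-in for VSWP applied to the two 64-bit halves.  */
void
swap_halves (float v[4])
{
  float t;
  t = v[0]; v[0] = v[2]; v[2] = t;
  t = v[1]; v[1] = v[3]; v[3] = t;
}

/* out[i] = a[i] * b[n - 1 - i], processed four elements at a time.
   The block of b is loaded in forward order and then reversed in
   "registers".  N must be a multiple of 4.  */
void
mul_reversed (float *out, const float *a, const float *b, size_t n)
{
  assert (n % 4 == 0);
  for (size_t i = 0; i < n; i += 4)
    {
      float v[4];
      const float *p = b + (n - 4 - i);   /* forward load of the block */
      for (int k = 0; k < 4; k++)
        v[k] = p[k];
      rev64_pairs (v);                    /* [b1 b0 b3 b2] */
      swap_halves (v);                    /* [b3 b2 b1 b0] */
      for (int k = 0; k < 4; k++)
        out[i + k] = a[i + k] * v[k];
    }
}
```

The two shuffles together reverse all four lanes, so a reverse access becomes an ordinary forward vector load followed by two register-only operations.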

14 years, 1 month

[ACTIVITY] May.02 -- May.08

by Chung-Lin Tang

== Last week ==
* Launchpad #748138: "ICE in redirect_jump, at jump.c:1443". Related to shrink-wrap, discussed a bit with Bernd off-list. Sent fix today (Mon.) to gnu-internal; will need to merge to Linaro.
* CoreMark combine canonicalize compares patch set: bootstrapped and tested with clean results on powerpc, added comments and updated upstream submission. Machine independent parts okayed by Jeff Law, now committed upstream. ARM parts still pending review.
* Compiled back-list of upstream patches, and sent to patches(a)linaro.org
* Traveled to Budapest, Hungary for Linaro Developer Summit on Saturday.

== This week ==
* Linaro Developer Summit at Budapest all week.

14 years, 1 month

[ACTIVITY] May 2 - May 6

by Ulrich Weigand

== GDB ==
* Committed support for NEON registers in core dumps (bug #615972) to Linaro GDB (not yet in mainline).
* Investigated root cause of bug #615996 (gdb.cp/templates.exp) and started exploring ways to fix it.

== GCC ==
* Committed fix for bug #759409 (Profiled bootstrap fails in GCC 4.5) to FSF GCC 4.5 branch and Linaro GCC 4.5.

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294

14 years, 1 month

[ACTIVITY] 3rd - 6th May

by Andrew Stubbs

Worked on the ARM 16 -> 64-bit multiply-and-accumulate problem. Bernd kindly provided a prototype patch to help. I've tried to understand what needs to be done, but I didn't have enough time to get to the bottom of it. So far, I think I know why the existing code doesn't work, and I think I have a way forward. It does appear that the real problem ought to be solved in the tree optimizers, though.

Committed the FSF GCC 4.5.3 merge to the Linaro 4.5 branch. Testing did not show any trouble.

Matthias requested an additional 4.5 merge to pick up a new bug fix, so I've done the merge, and submitted the merge request for testing.

Committed Maxim's compound conditionals optimization patch - a merge from Linaro GCC 4.5.

There was some confusion caused by the lp:gcc-linaro/4.6 branch history accidentally getting re-written. After some discussion on #bzr I managed to figure out what happened, posted a warning to the linaro-toolchain mailing list, and changed the branch configuration to prevent it happening again.

Committed Mark Shinwell's BRANCH_COST patch to Linaro GCC 4.6 - another merge from GCC 4.5.

Merged from FSF GCC 4.6 to Linaro 4.6 and submitted the patch for testing.

Richard Earnshaw approved my recent Thumb2 constants patch, but only if I modify it slightly. I've begun work on the changes, but I still need to test them. I won't be able to commit them until the ADDW/SUBW patch has been approved.

Ramana has reviewed my EABI half-precision function names patch, and discovered that the return types are wrong. I have no idea how this happened - the changes are deliberate so they must have been based on something, but I no longer have the same documents I had when I did the work, and it clearly doesn't match my current ones. In any case, the changes make no practical difference as function return values are always as wide as a register anyway.

* Other

Public holiday on Monday.

* Next week

I will be attending UDS in Budapest from 8th - 14th May. I shall continue to read my email, but will not be attending any calls.

----

Upstream patches requiring review:

* NEON scheduling patch
  http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html

* ARM Thumb2 addw/subw support.
  http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg03783.html
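For readers unfamiliar with the 16 -> 64-bit multiply-and-accumulate case mentioned at the top of this report, the source pattern is roughly the following. This is an assumption about the kind of code involved, not the test case from the actual work; the function name dot16 is hypothetical. ARM's SMLALxy instructions (for example SMLALBB) perform a 16x16-bit multiply with a 64-bit accumulate, which is what a compiler would ideally select for the loop body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of 16-bit products accumulated in 64 bits.  Each product is
   formed in 32 bits (it cannot overflow for int16_t inputs) and is
   then widened to 64 bits by the addition.  */
int64_t
dot16 (const int16_t *a, const int16_t *b, size_t n)
{
  int64_t acc = 0;

  assert (n == 0 || (a != NULL && b != NULL));
  for (size_t i = 0; i < n; i++)
    acc += (int32_t) a[i] * b[i];
  return acc;
}
```

Recognising that the narrow operands, the widening and the accumulate form one operation is the kind of matching that the report suggests may belong in the tree optimizers.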

14 years, 1 month

[ACTIVITY] 2011-05-06

by David Gilbert

== Bug fighting ==
* Tracked bug 774175 (apt segfault on armel on oneiric) down to the cortex-a8 branch erratum bug that we found as part of the bug jam a few weeks ago (affecting the more obscure vtk package) - Richard's existing binutils fix should fix this.

== String routines ==
* Struggled to get 'perf' to give sane results from profiling spec; some of the samples are obviously being associated with the wrong process somewhere along the way (e.g. it's showing significant samples in the sh process but in a library that's used by the actual benchmark).
* latrace on spec still running on ursa2
* Wrote a non-neon memcpy; as expected its aligned performance is very similar to libc/kernel - it's a bit faster in some places but slower in some odd places (e.g. n*32+1 bytes is a lot slower for some reason). It's also really bad on misaligned cases; I tried to take advantage of the v7's ability to do misaligned loads, but they really are quite slow.

Dave

14 years, 1 month
