linaro-toolchain February 2011

linaro-toolchain@lists.linaro.org

27 participants
66 discussions

by Revital1 Eres

Hello,

I've implemented a patch for SMS to support targets whose doloop part is not decoupled from the rest of the loop's instructions (decoupling is the current assumption of SMS). ARM is an example of such a target: the loop's instructions might use the CC register, which is also used in the doloop part.

I'm now testing the patch on ARM and on other targets that have a doloop pattern.

Thanks,
Revital
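To make the decoupling assumption concrete, here is a minimal sketch of the check it implies. This is not GCC's modulo-sched.c code. The doloop part (on ARM, a flag-setting decrement such as `subs` followed by a conditional branch) counts as decoupled only if no other loop instruction reads or writes a register it defines, and no other instruction writes a register it reads. The register numbering, `struct insn_regs`, `REG_CC` and `doloop_decoupled_p` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register numbering: bit N of a mask stands for register N. */
enum { REG_R0 = 0, REG_R1 = 1, REG_R2 = 2, REG_CC = 16 };

#define REG_BIT(r) ((uint32_t) 1 << (r))

/* Hypothetical summary of one instruction: registers read and written. */
struct insn_regs {
  uint32_t uses;
  uint32_t defs;
};

/* Return true if the doloop part (counter update plus branch) shares no
   register dependence with the rest of the loop body.  */
static bool
doloop_decoupled_p (const struct insn_regs *body, size_t n_body,
                    const struct insn_regs *doloop, size_t n_doloop)
{
  uint32_t doloop_defs = 0, doloop_uses = 0;

  assert (body != NULL || n_body == 0);
  assert (doloop != NULL || n_doloop == 0);

  for (size_t i = 0; i < n_doloop; i++)
    {
      doloop_defs |= doloop[i].defs;
      doloop_uses |= doloop[i].uses;
    }

  for (size_t i = 0; i < n_body; i++)
    {
      /* The body reads or clobbers something the doloop part sets
         (e.g. CC on ARM).  */
      if ((body[i].uses | body[i].defs) & doloop_defs)
        return false;
      /* The body clobbers something the doloop part reads.  */
      if (body[i].defs & doloop_uses)
        return false;
    }
  return true;
}
```

For an ARM-style doloop (`subs r2, r2, #1` defining r2 and CC, then `bne` using CC), a body consisting of `add r0, r0, r1` is decoupled, while a body containing `cmp r0, r1` (which defines CC) is not.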

[ACTIVITY] February 20-24

by Ira Rosen

Hi,

* Vectorizer cost model
  - implemented builtin_vectorization_cost for NEON
  - added register spilling considerations to the cost model
  - started testing/tuning on EEMBC Telecom and DenBench (for now I have only two examples for spilling: fdct_int32 in mp4encode, which shouldn't get vectorized, and viterbi, which should)
* Measured the vectorization impact on Telecom autcor: it's about 5x (initially I got a run-time segfault, but the bug is already fixed on GCC trunk; I'll have to check gcc-linaro-4.5 as well)
* NEON vs. non-NEON degradation
  - started to look at aes. There are 6 loops that get vectorized with 4.6 (due to this patch http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01927.html, which allows cond_expr in number-of-loop-iterations expressions, and the vzip/vuzp patch), but not with gcc-linaro-4.5. Of course, that doesn't explain the degradation.
  - I don't understand the mp4decodepsnr improvement, since I don't see any loops or basic blocks vectorized.

Ira
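The idea of adding register spilling considerations to a vectorizer cost model can be illustrated with a toy decision function. This is a minimal sketch under loudly stated assumptions, not the NEON builtin_vectorization_cost hook: the `loop_cost` struct, the flat `SPILL_COST`, and the premise that spills start once more vector values are live than there are NEON quadword registers (q0-q15) are all hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-loop cost summary, in abstract cost units.  */
struct loop_cost {
  unsigned scalar_cost;   /* cost of one scalar iteration */
  unsigned vector_cost;   /* cost of one vector iteration, ignoring spills */
  unsigned vf;            /* vectorization factor (elements per vector) */
  unsigned live_vectors;  /* vector values live at the same time */
};

/* Assumption: NEON provides 16 quadword registers, q0-q15.  */
#define NEON_Q_REGS 16u
/* Hypothetical cost of spilling and reloading one vector per iteration.  */
#define SPILL_COST 4u

/* Vector cost plus a penalty for every live vector beyond the register file. */
static unsigned
vector_cost_with_spills (const struct loop_cost *c)
{
  unsigned spills = c->live_vectors > NEON_Q_REGS
                    ? c->live_vectors - NEON_Q_REGS : 0;
  return c->vector_cost + spills * SPILL_COST;
}

/* Vectorize only if one vector iteration, including estimated spill code,
   is cheaper than the VF scalar iterations it replaces.  */
static bool
profitable_to_vectorize_p (const struct loop_cost *c)
{
  assert (c->vf > 0);
  return vector_cost_with_spills (c) < c->scalar_cost * c->vf;
}
```

With a scalar cost of 10, a vector cost of 12 and a vectorization factor of 4, a loop with 8 live vectors is judged profitable (12 < 40), but the same loop with 24 live vectors picks up 8 spills (12 + 32 = 44) and is rejected, which is the kind of distinction the fdct_int32 and viterbi examples are meant to exercise.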

Weekly Activity report

by Mounir Bsaibes

Please find the compilation of the weekly activity reports at the following link; if I have missed anyone, let me know.

https://wiki.linaro.org/WorkingGroups/ToolChain/ActivityReports/2011-02-18

In your report, please try to stick to the following format, to help keep the reports consistent:

== Topic ==
* item 1
* item 2

See https://wiki.linaro.org/Process/Reporting

Thanks & Regards,
Mounir

Improving the code generated for vld and vst intrinsics

by Richard Sandiford

One of the vectorisation discussions from last year was about the poor code GCC generates for vld{2,3,4}_*() and vst{2,3,4}_*(). It forces the result of the loads onto the stack, then loads the individual pieces from there. It does the same thing in reverse for stores.

I think there are two major problems here:

1. The result of the vld*() is a record type such as:

       typedef struct int16x4x3_t
       {
         int16x4_t val[3];
       } int16x4x3_t;

   Ideally, we'd like one of these structures to be stored in a pseudo register. However, the ARM port currently limits in-register record types to 64 bits, so something this big is always given BLKmode and stored on the stack.

   A simple "fix" for this is to increase MAX_FIXED_MODE_SIZE. That would do the right thing for the structures in arm_neon.h, but wouldn't be safe in general.

2. The vld*() returns its value as a single integer mode (such as EImode), while uses of the value will typically be in a vector mode such as V4SI. CANNOT_CHANGE_MODE_CLASS doesn't allow direct "mode-punning" between the two in VFP_REGS, so this again forces the punning to be done on the stack. The code in question is:

       /* FPA registers can't do subreg as all values are reformatted to internal
          precision.  VFP registers may only be accessed in the mode they
          were set.  */
       #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)       \
         (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)           \
          ? reg_classes_intersect_p (FPA_REGS, (CLASS))        \
            || reg_classes_intersect_p (VFP_REGS, (CLASS))     \

   However, the VFP restriction appears to be specific to VFPv1 (thanks to Peter for the archaeology) and isn't a problem for v6+. In that case, removing this restriction is an important optimisation.
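The size arithmetic behind problem (1) can be checked on any host. In this C11 sketch the `mock_*` types are hypothetical stand-ins for the arm_neon.h types (four or eight int16_t lanes per vector), and the widths of EImode (24 bytes) and XImode (64 bytes) are assumed from the ARM port's mode definitions; the 64-bit figure is the in-register limit described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for arm_neon.h types, so this compiles anywhere:
   a 64-bit D-register vector and a 128-bit Q-register vector.  */
typedef struct { int16_t lane[4]; } mock_int16x4_t;
typedef struct { int16_t lane[8]; } mock_int16x8_t;

/* Same shape as the record types returned by vld3_s16 and vld4q_s16.  */
typedef struct { mock_int16x4_t val[3]; } mock_int16x4x3_t;
typedef struct { mock_int16x8_t val[4]; } mock_int16x8x4_t;

#define CURRENT_LIMIT_BITS 64        /* current in-register record limit */
#define EIMODE_BITS (24 * 8)         /* assumed: EImode is 24 bytes */
#define XIMODE_BITS (64 * 8)         /* assumed: XImode is 64 bytes */

#define BITS(type) (sizeof (type) * 8)

/* Three D registers fill EImode exactly...  */
static_assert (BITS (mock_int16x4x3_t) == EIMODE_BITS,
               "vld3 of D registers is 192 bits");
/* ...which is well over the 64-bit limit, hence BLKmode today.  */
static_assert (BITS (mock_int16x4x3_t) > CURRENT_LIMIT_BITS,
               "too big for the current limit");
/* The largest structure (four Q registers) is exactly XImode, which is why
   GET_MODE_BITSIZE (XImode) is the natural value to try.  */
static_assert (BITS (mock_int16x8x4_t) == XIMODE_BITS,
               "vld4q is 512 bits");
```

If any of these assumptions were wrong for a given host ABI (for example, unexpected struct padding), the program would fail to compile rather than silently report a wrong size.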
I tried the patch below on the following simple testcase (not necessarily sensible!):

    #include "arm_neon.h"

    void
    foo (uint16_t *a)
    {
      uint16x4x3_t x, y;

      x = vld3_u16 (a);
      y = vld3_u16 (a + 12);
      x.val[0] = vadd_u16 (x.val[0], y.val[0]);
      x.val[1] = vadd_u16 (x.val[1], y.val[1]);
      x.val[2] = vadd_u16 (x.val[2], y.val[2]);
      vst3_u16 (a, x);
    }

Before the patch, -O2 produced:

    sub      sp, sp, #48
    add      r3, r0, #24
    vld3.16  {d16-d18}, [r3]
    vld3.16  {d20-d22}, [r0]
    add      r3, sp, #24
    vstmia   sp, {d20-d22}
    vstmia   r3, {d16-d18}
    fldd     d19, [sp, #8]
    fldd     d16, [sp, #0]
    fldd     d17, [sp, #24]
    fldd     d20, [sp, #32]
    vadd.i16 d18, d16, d17
    vadd.i16 d17, d19, d20
    fldd     d19, [sp, #16]
    fldd     d20, [sp, #40]
    vadd.i16 d16, d19, d20
    fstd     d18, [sp, #0]
    fstd     d17, [sp, #8]
    fstd     d16, [sp, #16]
    vldmia   sp, {d16-d18}
    vst3.16  {d16-d18}, [r0]
    add      sp, sp, #48
    bx       lr

After the patch we get:

    vld3.16  {d24-d26}, [r0]
    add      r3, r0, #24
    vld3.16  {d20-d22}, [r3]
    vmov     q8, q12  @ ti
    vadd.i16 d17, d17, d21
    vadd.i16 d16, d24, d20
    vadd.i16 d18, d26, d22
    vst3.16  {d16-d18}, [r0]
    bx       lr

The VMOV is a bit disappointing, and needs further investigation.

The first hunk fixes (2), and I think is correct. The second hunk hacks around (1), and isn't suitable in itself. I'll next try to make arm_neon.h use built-in record types that are explicitly EImode, which should remove the need to change MAX_FIXED_MODE_SIZE.

Richard

    Index: gcc/gcc/config/arm/arm.h
    ===================================================================
    --- gcc.orig/gcc/config/arm/arm.h
    +++ gcc/gcc/config/arm/arm.h
    @@ -1171,10 +1171,12 @@ enum reg_class
     /* FPA registers can't do subreg as all values are reformatted to internal
        precision.  VFP registers may only be accessed in the mode they
        were set.  */
    -#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)      \
    -  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)          \
    -   ? reg_classes_intersect_p (FPA_REGS, (CLASS))       \
    -     || reg_classes_intersect_p (VFP_REGS, (CLASS))    \
    +#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)              \
    +  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)                  \
    +   ? (reg_classes_intersect_p (FPA_REGS, (CLASS))              \
    +      || (TARGET_VFP                                           \
    +          && reg_classes_intersect_p (VFP_REGS, (CLASS))       \
    +          && arm_fpu_desc->rev == 1))                          \
        : 0)
     
     /* The class value for index registers, and the one for base regs.  */
    @@ -2458,4 +2460,6 @@ enum arm_builtins
        instruction.  */
     #define MAX_LDM_STM_OPS 4
     
    +#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (XImode)
    +
     #endif /* ! GCC_ARM_H */

Unavailable for a bit

by Michael Hope

Hi there. We've had an earthquake. Family and friends are fine, but I'll be unavailable for a few days. Services on ex.seabright.co.nz are down. I'll cancel Wednesday's standup call.

See you soon,

-- Michael

[ACTIVITY] Feb 14 - Feb 17

by Ulrich Weigand

== GDB ==
* Working with Will Deacon, identified the root cause of GDB problems running on Versatile Express in SMP mode, and verified that the errata workaround fixes the problem
* Finished testing the GDB HW watchpoints patch on vexpress; submitted the complete patch set for mainline inclusion
* Reviewed Yao's mainline patch to enable displaced stepping in Thumb mode

Best Regards

Ulrich Weigand

--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk Wittkopp
Registered office: Böblingen | Registration court: Amtsgericht Stuttgart, HRB 243294

[ACTIVITY] Feb 14 -- Feb 20

by Chung-Lin Tang

== Last week ==
* PR46178, PR46002: both upstream issues related to the priority coloring mode of IRA. Both patches submitted; the first is already approved and committed. Vladimir M. did mention that the priority algorithm will be removed once his newer "cover class-less" patches go in during stage1. Anyway, I got more familiar with IRA during the process, and the patches will still be applicable to 4.5/4.6.
* PR43872: incorrectly aligned VLAs under ARM. This turned out to be a one-line fix. Submitted upstream, awaiting approval.
* Discussed SMS and ARM doloop pattern issues with Revital Eres on email/IRC.
* Launchpad #721021: Linaro GCC ICE under -mtune=xscale. Investigated a bit; I did not see the ICE immediately, but GCC went into an infinite loop (Khem Raj, the reporter, says it runs for a while and then ICEs).
* Coremark ARMv5TE vs. ARMv7-A performance regression: reproduced consistently using our own Tegra boards. Investigated and seem to have found something; will post more detailed findings later.

== This week ==
* Coremark investigation.
* More GCC issues.
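The PR43872 item concerns VLA alignment. A runtime check of the property at stake might look like the following hypothetical sketch (it is not the testcase from the PR): an odd-sized VLA is allocated first to disturb the stack pointer, then a VLA of long long is checked against alignof (long long), which the AAPCS sets to 8 bytes on ARM. The function name `vla_is_aligned` is illustrative only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if a VLA of N long longs, allocated after a PAD-byte VLA,
   is aligned to the element type's required alignment.  */
static int
vla_is_aligned (size_t pad, size_t n)
{
  assert (pad > 0 && n > 0);   /* zero-length VLAs are undefined */
  char padding[pad];           /* odd-sized VLA to perturb the stack */
  long long buf[n];            /* the VLA whose alignment is checked */

  padding[0] = (char) pad;
  buf[0] = (long long) n;
  return (uintptr_t) (void *) buf % alignof (long long) == 0;
}
```

Sweeping the padding size over a small range is a cheap way to hit the misaligned case if the compiler rounds the stack adjustment incorrectly.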

Continuous build results

by Michael Hope

Hi there.

I've created a new mailing list for the automated build results of Linaro GCC, Linaro GDB, and (once weekly) FSF 4.5 and 4.6. You can subscribe by going to:
 https://launchpad.net/~linaro-toolchain-builds

I've subscribed everyone in the toolchain working group. If you'd like to get these emails, go to:
 https://launchpad.net/people/+me/+editemails
and set 'Linaro Toolchain Builds' to 'Preferred address'.

Any build failures will also go to linaro-toolchain(a)l.linaro.org.

The archive is here:
 https://lists.launchpad.net/linaro-toolchain-builds/

You can also track the crude state of the build hosts here:
 http://ex.seabright.co.nz/helpers/scheduler
and watch the status scroll by on #linaro-cbuild on Freenode.

This was kicked off due to the problems with the 2011.01 release. The follow-up is tracked here:
 https://blueprints.launchpad.net/gcc-linaro/+spec/incident-followup-1

-- Michael

[ACTIVITY] 14th - 19th February

by Andrew Stubbs

== GCC ==

Posted 2 of our 4.5 patches upstream.

My latest 4.6 build and test completed, so I've pushed an update to the bzr branch. The branch is now up to mainline state as of the 12th.

Merged 3 4.5 patches into Linaro GCC 4.6. Upstream review isn't happening, so I've decided to commit them anyway.

The last upload (FSF mainline as of 12th Feb) will therefore become the baseline I'm going to use for Linaro GCC 4.6.

Began benchmarking the questionable patches with EEMBC before forward-porting them. Michael Hope has given me access to one of his A9 Panda boards in New Zealand. This ought to have been straightforward, but of course it wasn't. It took me a while to convince myself I was getting meaningful results and testing the right thing. Also, the A9 seemed to be able to complete the configured iterations in 'zero' time, which fooled me for a while. I think I now have a setup that works. It seems to run very slowly sometimes, though; something to do with SSH?

----
Upstream patches requiring review:
* Thumb2 constants: http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* Kazu's VFP testcases: http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00128.html
* Jie's thumb2 testcase fix: http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00670.html
* ARM EABI half-precision functions: http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak: http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
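A common guard against benchmarks that finish in 'zero' time is to calibrate the iteration count until a run lasts long enough to measure. The sketch below shows that general technique with POSIX clock_gettime; it is not the EEMBC harness's own mechanism, and `workload` and `calibrate_iterations` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <time.h>

/* Hypothetical workload standing in for one benchmark iteration; the
   volatile sink stops the compiler from deleting the loop.  */
static volatile unsigned long sink;

static void
workload (void)
{
  for (unsigned i = 0; i < 1000; i++)
    sink += i;
}

static double
elapsed_seconds (const struct timespec *start, const struct timespec *end)
{
  return (double) (end->tv_sec - start->tv_sec)
         + (double) (end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Double the iteration count until one timed run lasts at least
   MIN_SECONDS, so fast hardware cannot finish in "zero" time.
   Returns the iteration count used; *SECONDS receives the run time.  */
static unsigned long
calibrate_iterations (double min_seconds, double *seconds)
{
  unsigned long iterations = 1;

  assert (min_seconds > 0.0 && seconds != NULL);
  for (;;)
    {
      struct timespec start, end;

      clock_gettime (CLOCK_MONOTONIC, &start);
      for (unsigned long i = 0; i < iterations; i++)
        workload ();
      clock_gettime (CLOCK_MONOTONIC, &end);

      *seconds = elapsed_seconds (&start, &end);
      if (*seconds >= min_seconds)
        return iterations;
      iterations *= 2;
    }
}
```

Using a monotonic clock avoids surprises if the board's wall clock is adjusted (for example by NTP) during a long run.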

[ACTIVITY] report week 07

by Peter Maydell

RAG:
 Red:
 Amber:
 Green: DATE/QEMU conference place confirmed, travel booked

Current Milestones:
                              | Planned    | Estimate   | Actual
 qemu-linaro 2011-03          | 2011-03-08 | 2011-03-08 |

Historical Milestones:
 finish virtio-system         | 2010-08-27 | postponed  |
 finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18
 successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16
 finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off
 first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08

* maintain-beagle-models:
  + implemented missing epoll syscalls for qemu usermode, submitted upstream
    https://bugs.launchpad.net/qemu-linaro/+bug/644961
  + tracked down the problem causing the serial console to break: the new Linux driver uses some extra features of the UART which we weren't modelling
    https://bugs.launchpad.net/qemu-linaro/+bug/714600
* merge-correctness-fixes:
  + reworked the VZIP/VUZP patch as per review comments, resubmitted
  + reviewed CL's latest shift patches, added fixes of my own for large shift counts and overlapping src/dest regs, submitted a 10-patch rolled-up series
  + reviewed a patch for adding cp15 VA-PA translation ops
  + reviewed various versions of the vrecpe/vrsqrte patches from CL
* versatile-express model:
  B Labs kindly made available their Versatile Express board model:
    https://github.com/bbalban/qemu/commits/universal-branch
  and I've spent a few days getting it to boot a Linaro kernel, fixing a few bugs and cleaning up the patchset in preparation for upstreaming it. This included discovering a bug in qemu's SD card model which was causing Linux not to be able to detect cards on PL181, and resulting in spurious qemu warnings on omap3:
    https://bugs.launchpad.net/qemu-linaro/+bug/714606
* other:
  + ARM architecture Q&A for modelling engineers
  + booked travel/hotel for QEMU conference
* meetings: toolchain, PDSW-tools, PD comms, Linaro-in-ARM network infrastructure, pdsw-doughnuts and 1st birthday celebration

Current qemu patch status is tracked here:
 https://wiki.linaro.org/PeterMaydell/QemuPatchStatus

Absences:
 17/18 March: QEMU Users Forum, Grenoble
 Holiday: 22 Apr - 2 May
 9-13 May: UDS, Budapest (maybe)
 ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
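For readers unfamiliar with the VZIP/VUZP semantics that the reworked patch has to get right, here is a plain C model of the 16-bit D-register forms, written from the architectural description rather than taken from QEMU's translator. The `dreg16` type and the `vzip16`/`vuzp16` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 64-bit D register viewed as four 16-bit lanes, lane 0 first.  */
typedef struct { uint16_t lane[4]; } dreg16;

/* VZIP.16 Dd, Dm: interleave the lanes of Dd and Dm.  Dd receives the low
   half of the interleaved sequence, Dm the high half.  */
static void
vzip16 (dreg16 *d, dreg16 *m)
{
  uint16_t seq[8];

  assert (d != NULL && m != NULL);
  for (int i = 0; i < 4; i++)
    {
      seq[2 * i] = d->lane[i];
      seq[2 * i + 1] = m->lane[i];
    }
  memcpy (d->lane, seq, sizeof d->lane);
  memcpy (m->lane, seq + 4, sizeof m->lane);
}

/* VUZP.16 Dd, Dm: treat Dd:Dm as one eight-element sequence (Dd first) and
   de-interleave it: even-numbered elements go to Dd, odd ones to Dm.  */
static void
vuzp16 (dreg16 *d, dreg16 *m)
{
  uint16_t seq[8];

  assert (d != NULL && m != NULL);
  memcpy (seq, d->lane, sizeof d->lane);
  memcpy (seq + 4, m->lane, sizeof m->lane);
  for (int i = 0; i < 4; i++)
    {
      d->lane[i] = seq[2 * i];
      m->lane[i] = seq[2 * i + 1];
    }
}
```

Starting from Dd = {0, 1, 2, 3} and Dm = {10, 11, 12, 13}, vzip16 gives Dd = {0, 10, 1, 11} and Dm = {2, 12, 3, 13}, and vuzp16 undoes it. Both functions read all inputs into a temporary before writing, since each output register depends on lanes from both inputs.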
