linaro-toolchain December 2010

linaro-toolchain@lists.linaro.org

37 participants
63 discussions

by Ira Rosen

- Continued looking into NEON special loads and stores. - Benchmarks: concentrated on EEMBC Telecom: - autcor gets vectorized - viterbi, besides strided data accesses, needs to sink conditional stores to allow if-conversion and make the main loop vectorizable. Since the potential here is 4x, I think it's worthwhile to work on this. - conven, fbital also have control-flow issue, but much more complicated than viterbi - fft has a problem with loop count, I would like to investigate this a bit more - diffmeasure doesn't seem to have vectorization potential - Fixed GCC PR 46663 on trunk, testing the fix for 4.3, 4.4, 4.5.

14 years, 7 months

[PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

by Julian Brown

Hi, Here's a work-in-progress patch which fixes many execution failures seen in big-endian mode when -mvectorize-with-neon-quad is in effect (which is soon to be the default, if we stick to the current plan). But, it's pretty hairy, and I'm not at all convinced it's not working "for the wrong reason" in a couple of places. I'm mainly posting to gauge opinions on what we should do in big-endian mode. This patch works with the assumption that quad-word vectors in big-endian mode are in "vldm" order (i.e. with constituent double-words in little-endian order: see previous discussions). But, that's pretty confusing, leads to less than optimal code, and is bound to cause more problems in the future. So I'm not sure how much effort to expend on making it work right, given that we might be throwing that vector ordering away in the future (at least in some cases: see below). The "problem" patterns are as follows. * Full-vector shifts: these don't work with big-endian vldm-order quad vectors. For now, I've disabled them, although they could potentially be implemented using vtbl (at some cost). * Widening moves (unpacks) & widening multiplies: when widening from D-reg to Q-reg size, we must swap double-words in the result (I've done this with vext). This seems to work fine, but what "hi" and "lo" refer to is rather muddled (in my head!). Also they should be expanders instead of emitting multiple assembler insns. * Narrowing moves: implemented by "open-coded" permute & vmovn (for 2x D-reg -> D-reg), or 2x vmovn and vrev64.32 for Q-regs (as suggested by Paul). These seem to work fine. * Reduction operations: when reducing Q-reg values, GCC currently tries to extract the result from the "wrong half" of the reduced vector. The fix in the attached patch is rather dubious, but seems to work (I'd like to understand why better). We can sort those bits out, but the question is, do we want to go that route? Vectors are used in three quite distinct ways by GCC: 1. By the vectorizer. 2. By the NEON intrinsics. 3. By the "generic vector" support. For the first of these, I think we can get away with changing the vectorizer to use explicit "array" loads and stores (i.e. vldN/vstN), so that vector registers will hold elements in memory order -- so, all the contortions in the attached patch will be unnecessary. ABI issues are irrelevant, since vectors are "invisible" at the source code layer generally, including at ABI boundaries. For the second, intrinsics, we should do exactly what the user requests: so, vectors are essentially treated as opaque objects. This isn't a problem as such, but might mean that instruction patterns written using "canonical" RTL for the vectorizer can't be shared with intrinsics when the order of elements matters. (I'm not sure how many patterns this would refer to at present; possibly none.) The third case would continue to use "vldm" ordering, so if users inadvertantly write code such as: res = vaddq_u32 (*foo, bar); instead of writing an explicit vld* intrinsic (for the load of *foo), the result might be different from what they expect. It'd be nice to diagnose such code as erroneous, but that's another issue. The important observation is that vectors from case 1 and from cases 2/3 never interact: it's quite safe for them to use different element orderings, without extensive changes to GCC infrastructure (i.e., multiple internal representations). I don't think I quite realised this previously. So, anyway, back to the patch in question. The choices are, I think: 1. Apply as-is (after I've ironed out the wrinkles), and then remove the "ugly" bits at a later point when vectorizer "array load/store" support is implemented. 2. Apply a version which simply disables all the troublesome patterns until the same support appears. Apologies if I'm retreading old ground ;-). (The CANNOT_CHANGE_MODE_CLASS fragment is necessary to generate good code for the quad-word vec_pack_trunc_<mode> pattern. It would eventually be applied as a separate patch.) Thoughts? Julian ChangeLog gcc/ * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Allow changing mode of vector registers. * config/arm/neon.md (vec_shr_<mode>, vec_shl_<mode>): Disable in big-endian mode. (reduc_splus_<mode>, reduc_smin_<mode>, reduc_smax_<mode>) (reduc_umin_<mode>, reduc_umax_<mode>) (neon_vec_unpack<US>_lo_<mode>, neon_vec_unpack<US>_hi_<mode>) (neon_vec_<US>mult_lo_<mode>, neon_vec_<US>mult_hi_<mode>) (vec_pack_trunc_<mode>, neon_vec_pack_trunc_<mode>): Handle big-endian mode for quad-word vectors.

14 years, 7 months

Building a cross compiler

by Michael Hope

Hi there. I've had a few questions recently about how to build a cross compiler, so I took a stab at writing the steps down in a Makefile. See: https://code.launchpad.net/~michaelh1/+junk/cross-build Hopefully it's easy to follow. It uses a binary sysroot and gives you vanilla binutils 2.20 and Linaro GCC 2010.11 in a good enough way that you can cross-compile for Maverick. The script is minimal and trades readability for flexibility. Note that Marcin's cross compiler packages or the Embedian toolchains are a better way to go, but if you want to see the steps involved check out the script. Marcin or Matthias, would you mind reviewing it? -- Michael

14 years, 7 months

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

linaro-toolchain December 2010