I've been spending this week playing around with various representations
of the v{ld,st}{1,2,3,4}{,_lane} operations. I agree with Ira that the
best representation would be to use built-in functions.
One concern in the original discussion was that the optimisers might
move the original MEM_REFs away from the call. I don't think that's
a problem though. For loads, we can simply treat the whole of the
accessed memory as an array, and pass the array by value. If we do that,
then the call would just look like:
__builtin_load_lanes (MEM_REF[(elem[N] *)ADDR])
(where, despite the C notation, the MEM_REF accesses the whole of elem[N]).
It is of course possible in principle for the tree optimisers to replace
this MEM_REF with another, equivalent, one, but that's OK semantically.
It isn't possible for the optimisers to replace it with something like
an SSA name, because arrays can't be stored in gimple registers.
__builtin_load_lanes would then be used like this:
combined_vectors = __builtin_load_lanes (...);
vector1 = ...extract first vector from combined_vectors...
vector2 = ...extract second vector from combined_vectors...
....
So combined_vectors only exists for load and extract operations.
The question then is: what type should it have? (At this point I'm
just talking about types, not modes.) The main possibilities seemed to be:
1. an integer type
Pros
* Gimple registers can store integers.
Cons
* As Julian points out, GCC doesn't really support integer types
that are wider than 2 HOST_WIDE_INTs. It would be good to
remove that restriction, but it might be a lot of work, and it
isn't something we'd want to take on as part of this project.
* We're not really using the type as an integer.
* The combination of the integer type and the __builtin_load_lanes
array argument wouldn't be enough to determine the correct
load operation. __builtin_load_lanes would need something
like a vector count (N => vldN) argument as well.
2. a combined vector type
Pros
* Gimple registers can store vectors.
Cons
* For vld3, this would mean creating vector types with non-power-
of-two vectors. GCC doesn't support those yet, and you get
ICEs as soon as you try to use them. (Remember that this is
all about types, not modes.)
It _might_ be interesting to implement this support, but as
above, it would be a lot of work. It also raises some semantic
questions, such as: what is the alignment of the new vectors?
Which leads to...
* The alignment of the type would be strange. E.g. suppose
we're loading N*2 uint32_ts into N vectors of 2 elements each.
The types and alignments would be:
    N   type         alignment
    2   uint32x4_t   16
    3   uint32x6_t   8 (if we follow the convention for modes)
    4   uint32x8_t   32
We don't need alignments greater than 8 in our intended use;
16 and 32 are overkill.
* We're not really using the type as a single vector,
but as a collection of vectors.
* The combination of the vector type and the __builtin_load_lanes
array argument wouldn't be enough to determine the correct
load operation. __builtin_load_lanes would need something
like a vector count (N => vldN) argument as well.
3. an array of vectors type
Pros
* No support for new GCC features (large integers or non-power-of-two
vectors) is needed.
* The alignment of the type would be taken from the alignment of the
individual vectors, which is correct.
* It accurately reflects how the loaded value is going to be used.
* The type uniquely identifies the correct load operation,
without need for additional arguments. (This is minor.)
Cons
* Gimple registers can't store array values.
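The alignment point for option 3 mirrors C's own rule for arrays, which can be checked with a host compiler. This is a minimal sketch that assumes GCC or Clang. It uses the generic vector extension (vector_size) as a stand-in for uint32x2_t so that it builds without arm_neon.h, and the typedef names are hypothetical. It only asserts relationships between alignments, because the absolute alignment of a vector type depends on the target.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof (C11) */
#include <stdint.h>

/* Stand-in for uint32x2_t: two 32-bit lanes, 8 bytes (GCC/Clang
   vector extension; an assumption so the sketch builds on any host). */
typedef uint32_t v2u32 __attribute__((vector_size(8)));

/* Option 3 result types for vld2, vld3 and vld4 of uint32x2_t. */
typedef v2u32 v2u32_x2[2];
typedef v2u32 v2u32_x3[3];
typedef v2u32 v2u32_x4[4];

/* C11: _Alignof applied to an array type gives the alignment of the
   element type, so the combined value never needs more alignment than
   a single vector, whatever N is. */
static_assert(alignof(v2u32_x2) == alignof(v2u32), "vld2 result alignment");
static_assert(alignof(v2u32_x3) == alignof(v2u32), "vld3 result alignment");
static_assert(alignof(v2u32_x4) == alignof(v2u32), "vld4 result alignment");

/* The size is exactly N vectors, including the N=3 case, which has no
   power-of-two vector type to fall back on. */
static_assert(sizeof(v2u32_x3) == 3 * sizeof(v2u32), "vld3 result size");
```

A combined uint32x6_t cannot be written this way at all, because GCC rejects a vector_size whose element count is not a power of two. That is the same limitation described under option 2.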
So I think the only disadvantage of using an array of vectors is that the
result can never be a gimple register. But that isn't much of a disadvantage
really; the things we care about are the individual vectors, which can
of course be treated as gimple registers. I think our tracking of memory
values is good enough for combined_vectors to be treated as such
(even though, with the back-end changes we talked about earlier,
they will actually be stored in RTL registers).
So how about the following functions? (Forgive the pascally syntax.)
__builtin_load_lanes (REF : array N*M of X)
returns array N of vector M of X
maps to vldN
in practice, the result would be used in assignments of the form:
vectorX = ARRAY_REF <result, X>
__builtin_store_lanes (VECTORS : array N of vector M of X)
returns array N*M of X
maps to vstN
in practice, the argument would be populated by assignments of the form:
ARRAY_REF <VECTORS, X> = vectorX
__builtin_load_lane (REF : array N of X,
VECTORS : array N of vector M of X,
LANE : integer)
returns array N of vector M of X
maps to vldN_lane
__builtin_store_lane (VECTORS : array N of vector M of X,
LANE : integer)
returns array N of X
maps to vstN_lane
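A scalar C model of the four operations for 32-bit elements makes the proposed semantics concrete. This is a sketch, not GCC code, and it makes several assumptions:

* The function names and the vec_u32/vecs_u32 types are hypothetical.
* The "array N of vector M of X" value is wrapped in a struct so that C can pass and return it by value.
* The lane ordering assumes the usual de-interleaving behaviour of NEON VLDn/VSTn.
* The two store variants write through a pointer instead of returning the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bounds: N is the structure size (the N in vldN),
   M the number of lanes per vector. */
enum { MAX_N = 4, MAX_M = 4 };

typedef struct { uint32_t lane[MAX_M]; } vec_u32;  /* vector M of X */
typedef struct { vec_u32 v[MAX_N]; } vecs_u32;     /* array N of vector M of X */

/* load_lanes (vldN): read N*M elements and de-interleave them, so that
   element i of structure j ends up in lane j of vector i. */
static vecs_u32 load_lanes_u32(const uint32_t *mem, size_t n, size_t m)
{
    vecs_u32 r = {0};
    assert(n >= 1 && n <= MAX_N && m >= 1 && m <= MAX_M);
    for (size_t j = 0; j < m; j++)
        for (size_t i = 0; i < n; i++)
            r.v[i].lane[j] = mem[j * n + i];
    return r;
}

/* store_lanes (vstN): the inverse, interleaving the vectors back into
   N*M consecutive elements. */
static void store_lanes_u32(uint32_t *mem, vecs_u32 vecs, size_t n, size_t m)
{
    assert(n >= 1 && n <= MAX_N && m >= 1 && m <= MAX_M);
    for (size_t j = 0; j < m; j++)
        for (size_t i = 0; i < n; i++)
            mem[j * n + i] = vecs.v[i].lane[j];
}

/* load_lane (vldN_lane): load a single N-element structure into lane
   LANE of each vector; every other lane keeps its incoming value. */
static vecs_u32 load_lane_u32(const uint32_t *mem, vecs_u32 vecs,
                              size_t n, size_t lane)
{
    assert(n >= 1 && n <= MAX_N && lane < MAX_M);
    for (size_t i = 0; i < n; i++)
        vecs.v[i].lane[lane] = mem[i];
    return vecs;
}

/* store_lane (vstN_lane): write lane LANE of each vector out as a
   single N-element structure. */
static void store_lane_u32(uint32_t *mem, vecs_u32 vecs,
                           size_t n, size_t lane)
{
    assert(n >= 1 && n <= MAX_N && lane < MAX_M);
    for (size_t i = 0; i < n; i++)
        mem[i] = vecs.v[i].lane[lane];
}
```

Each function depends only on its arguments, which matches the point that every operation can be expanded independently. In the model, the extraction step corresponds to plain member access. For example, after `vecs_u32 combined = load_lanes_u32(mem, 3, 2);`, the statement `vec_u32 vector1 = combined.v[0];` plays the role of ARRAY_REF picking one vector out of the combined value.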
Note that each operation can be expanded independently. The expansion
doesn't rely on preceding or following statements.
I've hacked up the prototype below as a proof of concept. It includes
changes to the C parser to allow these functions to be created in the
original source code. This is throw-away code though; it would never
be submitted.
I've also included a simple test case and the output I get from it.
The output looks pretty good; there's not even the stray VMOV that
I saw with the intrinsics earlier in the week.
(Note that if you'd like to try this yourself, you'll need the patch
I posted on Monday as well.)
What do you think? Obviously this discussion needs to move to gcc@ at
some point, but I wanted to make sure this was vaguely sane first.
Richard
Temporarily took over Tech Lead of the Toolchain Working Group while
Michael Hope recovers from the Christchurch earthquake. (He's fine, but
unable to work.) This didn't actually require any action, in the end.
Michael returned to work towards the end of the week.
Forward ported, benchmarked, and posted one of Mark Shinwell's NEON
patches upstream.
Further benchmarking was not possible as the Panda board I was using is
located in Christchurch, NZ.
Merged and tested the FSF GCC 4.5 branch into Linaro GCC. There were a
couple of test regressions in the Fortran testsuite, so I've filed bug
lp:723086. The other test results were either the same or better.
Benchmarked the ARM A8 function/jump alignment patch to see what effect
it has in GCC 4.6. Found no measurable improvement in EEMBC. I suggest
dropping this patch.
Brought the patch tracker up-to-date, and entered tracking tickets for
all outstanding patches.
Merged FSF trunk to Linaro GCC 4.6.
Committed Jie's Thumb2 testcase fix to FSF GCC trunk. Thanks to Ramana
for using his newfound authority to approve it.
Investigated the suitability of several of the patches for
forward-porting. Corresponded with Bernd and Julian.
----
Upstream patches requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* Kazu's VFP testcases:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00128.html
* ARM EABI half-precision functions:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
== Last week ==
* Launchpad #721021 GCC ICE on ARM/XScale: identified as a case of
upstream PR45177; backported and pushed to Linaro.
* Launchpad #709453/CS Issue #7122: Neon vmov 0.0 issues; some progress
on my current WIP patch, but tests showed another 3 regressions; still
ongoing.
* Launchpad #711819/GCC PR47719: ICE in push_minipool_fix. Ramana
reminded me that the pool range attributes added by my patch had
actually been removed earlier by Bernd in the fix for PR43137. Discussed
and mostly concluded that we should add them back for now. Will re-submit
the patch with a testcase to gcc-patches this week.
* Coremark ARMv7-A regressions: still work in progress.
== This week ==
* TW public holiday Feb. 28 (Mon).
* Ping some of my upstream patch submissions.
* Finish off incomplete issues.
* Coremark regression investigation.
Hello Linaro toolchain guys,
I have a few questions regarding GCC's support for the ARM Cortex-M4.
I'm especially interested in the additional DSP instructions: are these supported, and how optimal is the code produced?
Thanks for your support,
Best Regards
Christian (ST-Ericsson)
Hi,
== Investigate developer tools ==
* Finished latrace investigation.
== PandaBoard ==
* The defective PandaBoard that was sent back in December is now repaired and
on my desk again. It doesn't show the behaviour of #708883 and works
flawlessly so far. :)
== libunwind ==
* Did some debugging of the test-async-sig testcase to get started with
libunwind. It deadlocks if you add "--enable-debug", since libunwind then
does printfs, which are not async-signal-safe.
* Sorted out which of Zach's patches are upstream and which are not.
* Started to learn about the different unwind methods that libunwind provides
on ARM.
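The general rule behind that test-async-sig deadlock is that a signal handler may only call async-signal-safe functions, and printf is not one of them. For example, printf can block on a stdio lock that the interrupted code already holds. The minimal POSIX sketch below shows the usual workaround of emitting debug output with write(2). It is not taken from libunwind, and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; volatile sig_atomic_t is the portable type for
   data shared between a handler and normal code. */
static volatile sig_atomic_t got_signal = 0;

/* Hypothetical debug helper that is safe to call from a signal handler:
   write() is on POSIX's async-signal-safe list, printf() is not.
   The length loop avoids relying on any libc string function. */
static void debug_write_signal_safe(const char *msg)
{
    size_t len = 0;
    while (msg[len] != '\0')
        len++;
    ssize_t ignored = write(STDERR_FILENO, msg, len); /* best effort */
    (void)ignored;
}

static void on_signal(int sig)
{
    (void)sig;
    debug_write_signal_safe("on_signal: caught signal\n");
    got_signal = 1;
}

/* Install the handler for SIGUSR1 and send the signal to ourselves.
   raise() does not return until the handler has run.
   Returns 0 if the handler ran, -1 otherwise. */
static int install_and_raise(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    if (raise(SIGUSR1) != 0)
        return -1;
    return got_signal ? 0 : -1;
}
```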
Regards
Ken
== ffi ==
* Sent variadic patch for libffi to libffi-discuss
* Worked through some suggestions from Chung-Lin, need to do some rework
== string routines ==
* memchr & strchr patch sent for inclusion in Ubuntu packages
* tried SQLite's benchmarks: they don't spend much time in the C library,
although a few % goes to memcpy and ~1% to memset (I also seem to have
found an SQLite test case failure on ARM, filed as bug 725052)
== porting jam ==
* There wasn't much jam-related traffic on #linaro
* I closed bug 635850 (fastdep FTBFS), which had already been fixed with
an explicit fix for ARM in the changelog, and bug 492336 (eglibc's
tst-eintr1 failing), which seems to work now, although it's not clear
when it was fixed.
* Looking at eglibc's test log there seem to be a bunch of others
that are failing and may well be worth investigating.
* bug 372121 (qemu/xargs stack/number of arguments limit) seems to
work OK; however, the reporter did say it was quite a fragile test, so
it needs more investigation to see whether the original cause has
actually been fixed.
== misc ==
* swapping notes with Peter on the PBX SD card investigation
Dave
RAG:
Red:
Amber:
Green:
Current Milestones:
                             | Planned    | Estimate   | Actual     |
qemu-linaro 2011-03          | 2011-03-08 | 2011-03-08 |            |
Historical Milestones:
finish virtio-system         | 2010-08-27 | postponed  |            |
finish testing PCI patches   | 2010-10-01 | 2010-10-22 | 2010-10-18 |
successful ARM qemu pull req | 2010-12-16 | 2010-12-16 | 2010-12-16 |
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release    | 2011-02-08 | 2011-02-08 | 2011-02-08 |
== maintain-beagle-models ==
* rebased qemu-linaro on upstream
* checked omap_uart model for any issues with enabling the extended
(non-16550A) features which the new Linux drivers need. Sent a MeeGo
merge request for a patchset which turns on the features and does
a little cleanup. Now in MeeGo and qemu-linaro.
== merge-correctness-fixes ==
* reviewed versions 5 and 6 of Christophe's vrecpe/vrsqrte patchset;
v6 was good and has now been committed
* sent a version of "dummy cp14 debug registers" patch upstream;
however I've realised it triggers a false positive in the
temp-leak debugging code in target-arm/translate.c
* wrote/sent a patch which moves this temp-leak debugging code
into TCG proper (which I think makes it much simpler and cleaner
and avoids the false positives mentioned above)
* some work on the cp15 performance counter registers. I now
have some code which I think is a fully architecturally valid
implementation of an "implements no events" core, except that
we don't implement the cycle count register.
* started testing/review of Adam's VA-to-PA translation regs patch.
In the course of this discovered that qemu unconditionally
implements an ARM940 cp15 WFI register which clashes with these;
submitted patch to add correct not-for-v6/v7 feature gating.
* sent out a patch fixing usermode seeks by a 32-bit guest on a 64-bit
host (based on a diagnosis and suggested fix by Eoghan Sherry)
* sent patch fixing compile error in vnc code
== vexpress model ==
* sent a patchset for fixing the MMC card detect wiring on
PBX upstream; this is needed for vexpress too
* finished vexpress cleanup and cross-checking against the docs; I
now have a patchset I'm happy to upstream and will post next week
== other ==
* took part in a PGP keysigning event with the Emdebian folks
* meetings: toolchain, PDSW-tools
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
17/18 March: QEMU Users Forum, Grenoble
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
== GDB ==
* Worked with Will Deacon and the Linaro kernel team to
make sure HW watchpoint and Versatile Express errata
fixes are included in the upcoming Linaro kernel release.
* Committed GDB HW watchpoint patches to mainline, and
backported them to Linaro GDB. This completes work on the
HW watchpoint blueprint.
* Worked on fixing the GDB part of #620611 (Unable to
backtrace out of vector page 0xffff0000). Posted
(two versions of) mainline patch for discussion.
* Worked on kernel patch for #615974 (Interrupted system
call handling).
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== This week ==
* Looked at the poor code generated for Neon load/store intrinsics.
Looked into the history behind the treatment of VFP registers by
CANNOT_CHANGE_MODE_CLASS. Peter confirmed that the restrictions
apply only to VFPv1. Wrote a patch to improve the code, which
partly overlapped with Julian's.
* Looked at how the operations should be represented at the tree level.
Experimented with various combinations of tree codes and types
to see which felt right. Wrote this up in the message I sent today.
== Next week ==
* More vectorisation.
* Submit some queued patches.
* Maybe some bug fixing. (I see there's a reload bug just waiting
to be claimed by a lucky developer.)
On holiday the following week.
Richard
Services at ex.seabright.co.nz are back up.
On Tue, Feb 22, 2011 at 10:06 PM, Michael Hope <michael.hope(a)linaro.org> wrote:
> Hi there. We've had an earthquake. Family and friends are fine but I'll be
> unavailable for a few days. Services on ex.seabright.co.nz are down. I'll
> cancel Wednesday's standup call.
>
> See you soon,
>
> -- Michael