Hi,
- vectorization of widening multiplication of unsigned types and
constants - committed to mainline
- fix vectorizer testsuite failures on ARM - submitted
- testing a patch to fix a bug in the vectorizer revealed by the
widen-mult patch
- testing a patch to fix bad peeling heuristic that causes
gcc.dg/vect/vect-72.c to fail on ARM
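
For illustration, the first item concerns loops of roughly the following shape. This is only a sketch: the function name and the constant 300 are made up for the example and are not taken from the committed patch or its testcases.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example loop: each 16-bit unsigned element is multiplied by a
   constant and the product is kept at 32 bits (a widening multiplication).
   A vectorizer that handles this can use widening vector multiplies instead
   of extending both operands to 32 bits first. */
void scale_u16_to_u32(uint32_t *restrict out, const uint16_t *restrict in,
                      size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint32_t)in[i] * 300u;
}
```

Whether a given compiler version actually vectorizes this loop depends on the target and flags.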
Ira
I've done a quick write-up on the (almost) continuous builds done in
the toolchain group:
https://wiki.linaro.org/WorkingGroups/Builds
It's high level and includes things like what branches we watch, how
often they get built, where the results go, and how things like
testsuite results are shared. This follows up on yesterday's email on
testsuite diffs.
-- Michael
Public Holiday on Monday.
Learned that Linaro are reducing their funding to just one CodeSourcery
engineer, myself. Spoke to Chung-Lin to break the news and reassign him
to other work. Chung-Lin will now be working on MIPS16 Eglibc porting.
Pinged my ADDW/SUBW patch, again. Ramana finally reviewed it, so I've
addressed his concerns and reposted. The corrected patch was approved,
so I've set it to test before committing.
Continued work on widening multiplies tree optimizations in GCC. Bernd
made it sound quite easy, but changing the type of some operations means
quite a lot of tweaking and reworking in the rest of the compiler's expand
routines. In particular, the widening stuff needs to be broken out of
expand_binop, and recast.
Merged, tested and committed the latest patches from FSF 4.5.
Merged, tested and committed the latest patches from FSF 4.6.
Richard Earnshaw approved my widening multiply RTL patch, so I've set
that to test in the Linaro test system.
Richard also approved my SMLALTB/SMLALTT patch. Set that testing also.
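
For readers unfamiliar with these instructions, here is a plain C model of what SMLALTB and SMLALTT compute, as I understand the architecture: a signed 16-bit half of each source register is multiplied and the product is added to a 64-bit accumulator. The helper names are made up for the example; this is a model of the instruction semantics, not code from the patch.

```c
#include <stdint.h>

/* Signed 16-bit halves of a 32-bit register value. The uint16_t -> int16_t
   conversion is implementation-defined in ISO C; GCC wraps modulo 2^16. */
static int16_t bottom_half(uint32_t r) { return (int16_t)(uint16_t)(r & 0xffffu); }
static int16_t top_half(uint32_t r)    { return (int16_t)(uint16_t)(r >> 16); }

/* SMLALTB: acc += top(rn) * bottom(rm), accumulated at 64 bits.
   (The real instruction wraps on overflow; this model assumes the sum fits.) */
static int64_t smlaltb(int64_t acc, uint32_t rn, uint32_t rm)
{
    return acc + (int64_t)top_half(rn) * bottom_half(rm);
}

/* SMLALTT: acc += top(rn) * top(rm), accumulated at 64 bits. */
static int64_t smlaltt(int64_t acc, uint32_t rn, uint32_t rm)
{
    return acc + (int64_t)top_half(rn) * top_half(rm);
}
```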
Responded to a question on ask.linaro.org.
----
Upstream patches requiring review:
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
Fixed an SMS patch following comments received in the gcc@ ml.
While testing the fix I discovered another issue: the latest mainline
ICEs with SMS flags while building libgcc on ARM configured with
--with-arch=armv7-a.
This new failure does not seem to be related to the above fix and I'm
now investigating it.
Looked at code generated for spec2006's libquantum, hmmer and
cactusADM_base benchmarks.
== String routines ==
* Wrote a hybrid ARM/Neon memcpy - it uses Neon for non-aligned cases or
large (>=128k) cases
* Polished up and sent out the write-up of the workload analysis of DENbench and SPEC
* Ran DENbench with all memcpy and memset variants, graphed up the results
- SPEC 2k6 is now cooking with the memcpy set - it'll take all weekend.
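
The selection rule for the hybrid memcpy can be sketched as below. This only shows the dispatch decision, not the copy loops. Assumptions: "128k" is read as 128 KiB, "non-aligned" is read as "either pointer is not 4-byte aligned", and all names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

enum copy_path { COPY_CORE, COPY_NEON };

#define LARGE_COPY_BYTES (128u * 1024u) /* assumption: "128k" means 128 KiB */
#define WORD_ALIGN       4u             /* assumption: alignment of interest */

/* Pick the NEON path for non-aligned pointers or large copies, and the
   core-register (ARM) path otherwise. */
static enum copy_path choose_copy_path(const void *dst, const void *src,
                                       size_t n)
{
    uintptr_t bits = (uintptr_t)dst | (uintptr_t)src;

    if ((bits & (WORD_ALIGN - 1u)) != 0 || n >= LARGE_COPY_BYTES)
        return COPY_NEON;
    return COPY_CORE;
}
```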
== 64 bit atomics ==
* Started looking through the GCC code at the existing non-64-bit atomic code;
I need to understand how registers work in DI mode and what's going to be
needed in terms of temporaries.
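
As a sketch of the kind of operation this work is aimed at, here are two 64-bit operations written with GCC's existing `__sync` builtins. The wrapper names are made up. Whether these expand inline or need library support depends on the target; on a 32-bit ARM core the 64-bit value lives in a pair of registers, which is why the DI mode handling matters.

```c
#include <stdint.h>

/* Atomically add delta to *counter and return the previous value. */
static uint64_t bump64(uint64_t *counter, uint64_t delta)
{
    return __sync_fetch_and_add(counter, delta);
}

/* Atomically replace *p with desired if it still holds expected.
   Returns nonzero on success. */
static int cas64(uint64_t *p, uint64_t expected, uint64_t desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}
```

A carry out of the low 32 bits (for example adding 1 to 0xffffffff) is the case where both halves must change together.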
Dave
== Progress ==
* Finished breaking down the Thumb2 performance blueprint
* Some patch review and bugzilla maintenance.
* Canonicalized vorn and vbic. A bootstrap failure was reported and fixed upstream.
* Rewrote parts of the DImode expanders and combined them into two
patterns with alternatives that get enabled based on the architecture
variant. While looking at the bug with adrs possibly going out of
range, I found what looks like a bug in const_ok_for_op in how it
attempts to generate code for a DImode move of 0xffffffff: this
can be implemented as a simple mvn but gets split into 3 instructions.
More explanations in the patch when it comes out.
* Thumb2 performance meeting this week.
* Talked to RichardS about A8 and Neon / auto-increment issues he was
seeing with scheduler descriptions and looked again at the A8 TRMs and
the examples.
* Looked at lrint and lrintf, which are C99 functions for rounding, and
created a prototype lrint and lrintf patch for GCC that now appears to
generate the vcvtr instructions.
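
The 0xffffffff case above turns on the ARM-state immediate rule: a data-processing immediate is an 8-bit value rotated right by an even amount. 0xffffffff does not fit that rule, but its complement (0) does, so a single mvn can load it. A minimal sketch of the check follows. The helper names are made up, and this is not GCC's const_ok_for_op; Thumb-2 uses different immediate rules that are not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by r bits (0 <= r < 32), avoiding the undefined shift by 32. */
static uint32_t rotl32(uint32_t v, unsigned r)
{
    return r == 0 ? v : (v << r) | (v >> (32u - r));
}

/* True if v is an 8-bit value rotated right by an even amount. Rotating v
   left by the same amount must then leave only the low 8 bits set. */
static bool arm_imm_encodable(uint32_t v)
{
    for (unsigned r = 0; r < 32; r += 2)
        if ((rotl32(v, r) & ~0xffu) == 0)
            return true;
    return false;
}

/* A single mov can load v; a single mvn can load v if ~v is encodable. */
static bool mov_ok(uint32_t v) { return arm_imm_encodable(v); }
static bool mvn_ok(uint32_t v) { return arm_imm_encodable(~v); }
```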
== Plans ==
* Spend some time on the VFP moves and look at ivopts for a bit.
* Finish testing and submit upstream my other patch with DImode moves
and cases where we are splitting more than necessary.
* Start looking at some of the T2 performance work items.
* Patch review. Finish TLS patch review.
* Try to get vcvtr working and tested with eglibc.
* Look at RichardS's comments and testcase for the A8.
Meetings:
* 1-1s
* Linaro calls.
[Short week: bank holiday]
RAG:
Red:
Amber:
Green:
Current Milestones:
|| Milestone || Planned || Estimate || Actual ||
||qemu-linaro-2011-06 || 2011-06-16 || 2011-06-16 || ||
Historical Milestones:
||finish qemu-cont-integn || 2011-01-25 || 2011-01-25 || handed off ||
||first qemu-linaro release || 2011-02-08 || 2011-02-08 || 2011-02-08 ||
||qemu-linaro 2011-03 || 2011-03-08 || 2011-03-08 || 2011-03-08 ||
||qemu-linaro 2011-04 || 2011-04-21 || 2011-04-21 || 2011-04-21 ||
||qemu-linaro 2011-05 || 2011-05-19 || 2011-05-19 || n/a ||
||close out 1105 blueprints || 2011-05-28 || 2011-05-28 || 2011-05-19 ||
||complete 1111 planning || 2011-05-28 || 2011-05-28 || 2011-05-27 ||
== upstream-omap3-patches ==
* started on disentangling the patchstack: submitted patches upstream
for a few standalone fixes. First few steps in a big job...
== omap3-usb-model ==
* added QEMU's USB OHCI model to the omap3/beagle; the kernel detects
the USB controller and hub but not any attached devices; more
debugging required
== other ==
* discussions about Android emulator
* office move
* QEMU 0.15 is not too far in the future: need to make sure all the
ARM stuff we want is in it
Meetings: standup, GSoC student
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
1-5 August: Linaro sprint 1111
(maybe) 15-16 August: QEMU/KVM strand at LinuxCon NA, Vancouver
[LinuxCon proper follows on 17-19th]
== This week ==
* Spent about half of the week on auto increment/decrement. There are two
execution failures left.
* Looked at assembly comparisons between the old pass and various forms
of the new pass. The results look reasonable.
* Ran DENbench and my libav microbenchmarks to measure the difference
in performance. Saw that some tests were repeatably worse.
* Looked into those tests and realised that they were being hit by the
lack of an address writeback model in the scheduler (a known limitation).
Dependent stores were being scheduled in a block at the end of the loop
because we said that the dependencies had 0 latency.
* Spent most of the rest of the week on fixing that limitation. One of the
difficulties is that define_bypass currently requires a complete list
of instruction reservations. This is difficult for things like writeback
because the result could in principle be used by many different instructions.
Decided to generalise define_bypass so that it can handle filename-style
globs.
* Wrote a patch to model writeback in NEON.
* Wrote a patch to model writeback in core instructions. However,
while doing this, I noticed that the behaviour I'm seeing on our
Cortex-A8 doesn't match what I'd expected from GCC's A8 scheduler
description (or the documentation). Talked with Ramana about it.
Distilled a benchmark.
* These scheduler changes didn't improve the DENbench and libav
scores much by themselves, but the combination of the scheduler
and auto inc/dec changes did produce noticeable improvements
in some libav benchmarks and rather smaller improvements in
some DENbench ones.
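
To show what filename-style globs buy for the define_bypass change, here is a small sketch using POSIX fnmatch() to select reservation names by pattern. The reservation names in the usage note are made up, and the actual GCC change need not use fnmatch(); this only illustrates the matching semantics.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Count the reservation names that a glob selects.
   fnmatch() returns 0 when the string matches the pattern. */
static size_t count_glob_matches(const char *glob,
                                 const char *const *names, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (fnmatch(glob, names[i], 0) == 0)
            hits++;
    return hits;
}
```

With hypothetical names such as "core_alu", "core_load", "neon_load" and "neon_store", the glob "neon_*" selects both NEON reservations without listing them one by one, which is the convenience wanted for writeback consumers.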
== Next week ==
* Finish scheduler work, in light of observed behaviour.
* More testing prior to submission.
I'm away the week of 13th June.
Richard