* Still having trouble with using multistrap/pdebuild-cross for
cross-compiling Firefox - it looks like only x86 packages get downloaded,
not armel. I have asked Wookey for advice, and he will try to reproduce the
build.
* Falling back to native compiling until the cross-compiling setup has been
sorted out. I will now take a look at how to pass different compiler options
to the Mozilla build system and how to build different parts of the program.
Best Regards
Åsa
(short week: 4 days)
RAG:
Red:
Amber:
Green: blog started :-) http://translatedcode.wordpress.com/
Current Milestones:
|| || Planned || Estimate || Actual ||
||a15-usermode-support || 2011-11-10 || 2011-11-10 || ||
||upstream-omap3-cleanup || 2011-11-10 || 2011-11-10 || ||
Historical Milestones:
||qemu-linaro-2011-07 || 2011-07-21 || 2011-07-21 || 2011-07-21 ||
||qemu-linaro 2011-08 || 2011-08-18 || 2011-08-18 || 2011-08-18 ||
||qemu-linaro 2011-09 || 2011-09-15 || 2011-09-15 || 2011-09-15 ||
||add-omap3-networking || 2011-10-13 || 2011-10-13 || 2011-10-13 ||
||a15-systemmode-planning || 2011-10-13 || 2011-10-13 || 2011-09-22 ||
== other ==
* upstream patch review, putting together pull requests
* more time spent on qemu on ARM host apparent memory corruption
bug (no luck yet :-(); found a Valgrind bug in the process,
though (KDE:284472). This ate up way too much of this week.
* A15 KVM planning work
* meetings etc
* moved over to patches.linaro for QEMU patch tracking
-- PMM
Hi,
* widening shifts - finally committed upstream
* SLP loads with different offsets and operand swaps - committed upstream
* SLP with multiple types - merged to gcc-linaro-4.6
* vectorizer stuff: patch review, test fixes, discussions, bug fix
* Ramana and I discussed what can be done with VEC_PERM_EXPR for NEON,
and created https://blueprints.launchpad.net/gcc-linaro/+spec/support-vec-perm
for this issue.
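
For readers unfamiliar with VEC_PERM_EXPR, here is a minimal scalar sketch of what a two-input permute computes, assuming the usual convention that each mask element selects from the concatenation of the two inputs, modulo the total number of lanes. The name vec_perm_model and the lane count of 4 are hypothetical and only for illustration; this is not GCC code and says nothing about how the NEON expansion would be done.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* hypothetical lane count, chosen for illustration */

/* Scalar model of a two-input permute: each mask element, reduced
 * modulo 2*LANES, indexes into the concatenation of a and b. */
static void vec_perm_model(const int32_t a[LANES], const int32_t b[LANES],
                           const uint32_t mask[LANES], int32_t out[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        uint32_t sel = mask[i] % (2 * LANES);
        out[i] = (sel < LANES) ? a[sel] : b[sel - LANES];
    }
}
```

With a mask of {0, 4, 1, 5} this interleaves the low halves of the two inputs, which is the kind of pattern a zip-style instruction could cover.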
Ira
Following on from last night's performance call, I had a look at how
64 bit integer operations are mapped to NEON instructions. The
summary is:
* add - fine
* subtract - fine
* bitwise and - fine
* bitwise or - fine
* bitwise xor - fine
* multiply - can't as the instruction tops out at 32 bits. Might be
able to compose using VMLAL
* div, mod - no instruction
* negate - instruction tops out at 32 bits, but could be turned into
vmov #0, vsub
* left shift constant - missing
* right shift constant - missing
* right arithmetic shift constant - missing
* left shift register - missing
* right shift register - tricky, as you do this as a left shift by the
negated register
* not - no instruction, but could be done through a vceq, #0?
* bitwise not - missing
I also noticed that the replicated constants aren't being used. A
pre-increment is done as a constant pool load followed by a vadd, but
could be done as a vmov #-1; vsub. The same goes for pre-decrement - it
could be done as a vmov #-1; vadd.
This seems worth blueprinting.
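
The rewrites suggested above rest on a few arithmetic identities, which can be checked in plain C. This is a scalar sketch only: the function names are hypothetical, vshl_u64_model is an assumed model of a register shift that takes a signed count (negative meaning shift right), and nothing here is what the compiler currently emits.

```c
#include <stdint.h>

/* Negate as "vmov #0; vsub": 0 - x. Unsigned arithmetic wraps, so
 * there is no overflow problem at INT64_MIN. */
static int64_t neg_via_sub(int64_t x)
{
    return (int64_t)(0 - (uint64_t)x);
}

/* Pre-increment as "vmov #-1; vsub": x - (-1). */
static int64_t inc_via_sub(int64_t x)
{
    return (int64_t)((uint64_t)x - UINT64_MAX);
}

/* Pre-decrement as "vmov #-1; vadd": x + (-1). */
static int64_t dec_via_add(int64_t x)
{
    return (int64_t)((uint64_t)x + UINT64_MAX);
}

/* Assumed model of a 64-bit shift by a signed register count:
 * a positive count shifts left, a negative count shifts right. */
static uint64_t vshl_u64_model(uint64_t x, int8_t n)
{
    if (n >= 64 || n <= -64)
        return 0;
    return (n >= 0) ? (x << n) : (x >> -n);
}

/* Logical right shift by a variable count (0..63), done as a left
 * shift by the negated count. */
static uint64_t shr_via_shl(uint64_t x, unsigned count)
{
    return vshl_u64_model(x, (int8_t)-(int)count);
}
```

The all-ones constant (#-1) is the interesting part: it is a replicated byte pattern, so it does not need a constant pool entry.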
-- Michael
limits-fndefn.c takes an impressively long time to run. On an idle
machine, -O3 -g -c takes 17:31 and -O2 -g -c takes The test already
has a dg-timeout-factor of 4 giving a total timeout of 20 minutes.
Removing the -g brings this down to 30 s. Keeping the -g and adding
-fno-var-tracking brings this down to 45 s.
We could bump the multiplier up to 8 but it's getting a bit
ridiculous. Any thoughts?
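
For reference, the headroom arithmetic can be sketched as below. The 300 s base timeout is an assumption inferred from the figures above (a factor of 4 giving 20 minutes), and the helper names are hypothetical.

```c
/* Total time allowed for a test: base timeout scaled by the
 * dg-timeout-factor. */
static unsigned total_timeout_s(unsigned base_s, unsigned factor)
{
    return base_s * factor;
}

/* Convert a mm:ss reading to seconds. */
static unsigned mmss_to_s(unsigned minutes, unsigned seconds)
{
    return minutes * 60 + seconds;
}

/* Seconds left before the timeout fires, or 0 if the run would
 * already have been killed. */
static unsigned headroom_s(unsigned runtime_s, unsigned base_s, unsigned factor)
{
    unsigned limit = total_timeout_s(base_s, factor);
    return (runtime_s < limit) ? (limit - runtime_s) : 0;
}
```

Under that assumption the 17:31 run at -O3 -g leaves about two and a half minutes of slack on an idle machine, and a factor of 8 would raise the limit to 40 minutes.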
-- Michael
== Last week and today ==
* Backported fix for returning std::pair<bool, bool>. Unfortunately
this showed up a regression on 4.5. I couldn't reproduce it cross,
and the testcase itself looks innocuous, so I'm wondering whether
the patch might trigger a miscompilation of cc1plus.
* Committed SMS register-scheduling patches upstream and backported
to Linaro 4.6.
* Most of the week spent on -fsched-pressure. Still trying a few
variations in order to get the right balance. (My local haifa-sched.c
now has about 20 new toggles.) Still feel like I'm making progress,
rather than hitting the point of diminishing returns.
Hope Connect goes well. See everyone in a few weeks' time.
Richard
Completed the 4.5 and 4.6 FSF to Linaro merges.
Spun the Linaro GCC release tarballs, uploaded them to the test farm,
and set off the test builds.
Continued looking at the constant reuse optimization. This time I've
built GCC itself with the new pass to see how many optimization
opportunities there are. This shook out a lot more small bugs, which was
useful.
Backported my negative-shifts patch to Linaro 4.6, pushed it to
Launchpad for testing, and then committed it to 4.6 once it was approved.
Experimented with running SPEC2K on A8 and A9 boards in order to
establish a baseline for the generic tuning tweaks. A short test doesn't
give much clue as to what can be achieved, and a long test takes way too
long. The problem is also complicated by the benchmarks where the A8
tuning works better on A9 than A9 tuning does. :S
Received a bug report (GCC bugzilla 50717) for my widening multiplies
patches. Analysed the problem, developed a patch, and posted it to
gcc-patches.
<Short week with 2 days gone on an internal training course>
=== Progress ===
* Some patch review.
* Spent time looking at LP 836588.
* Tried some different approaches for the vcvt.f64.s32 case and it
looks like the simple solution is the best one unfortunately :(
* 2 days off at internal training course.
=== Plans ===
* continue looking into LP 836588
* Patch review week.
* Work on getting vcvt.f* case done and finish some of the backlog.
Absences.
* 31st Oct - 4th Nov - Linaro Summit Orlando - Travel booked
* 08 Nov - 11 Nov - Tentatively booked
* Dec 19 - 31st Dec - Tentatively booked
I've just tried rerunning some benchmarks on my panda, which I
reinstalled recently, and am getting some odd behaviour:
The kernel is 3.0.0-1404-linaro-lt-omap
For example:
simple_strlen: ,102400, loops of ,62, bytes=6.054688 MB, transferred
in ,20324707.000000 ns, giving, 297.897898 MB/s
simple_strlen: ,102400, loops of ,32, bytes=3.125000 MB, transferred
in ,7904053.000000 ns, giving, 395.366782 MB/s
simple_strlen: ,102400, loops of ,16, bytes=1.562500 MB, transferred
in ,7354736.000000 ns, giving, 212.448142 MB/s
simple_strlen: ,102400, loops of ,8, bytes=0.781250 MB, transferred in
,91553.000000 ns, giving, 8533.308575 MB/s
simple_strlen: ,102400, loops of ,4, bytes=0.390625 MB, transferred in
,1495361.000000 ns, giving, 261.224547 MB/s
simple_strlen: ,102400, loops of ,2, bytes=0.195312 MB, transferred in
,1983643.000000 ns, giving, 98.461518 MB/s
Note the 8 byte one is apparently 40 times faster, and for true oddness:
smarter_strlen_ldrd: ,102400, loops of ,62, bytes=6.054688 MB,
transferred in ,3936768.000000 ns, giving, 1537.984331 MB/s
smarter_strlen_ldrd: ,102400, loops of ,32, bytes=3.125000 MB,
transferred in ,0.000000 ns, giving, inf MB/s
smarter_strlen_ldrd: ,102400, loops of ,16, bytes=1.562500 MB,
transferred in ,4180909.000000 ns, giving, 373.722557 MB/s
Now, while I like infinite transfer rates, I suspect they're wrong.
Anyone else seeing this?
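
As a sanity check on the figures, here is a minimal sketch of how the MB/s column appears to be derived, assuming 1 MB = 2^20 bytes (which matches the numbers above) and that the log reads as "N loops of M bytes". The function name is hypothetical and this is not the benchmark's own code; it just flags a zero elapsed time instead of dividing by it.

```c
#include <math.h>
#include <stdint.h>

/* Throughput in MB/s, with 1 MB = 2^20 bytes. Returns NAN when the
 * elapsed time is not positive, rather than reporting inf. */
static double throughput_mb_per_s(uint64_t loops, uint64_t bytes_per_loop,
                                  double elapsed_ns)
{
    if (!(elapsed_ns > 0.0))
        return NAN;
    double mb = (double)(loops * bytes_per_loop) / (1024.0 * 1024.0);
    return mb / (elapsed_ns * 1e-9);
}
```

Feeding in 102400 loops of 62 bytes over 20324707 ns gives roughly the 297.9 MB/s reported, so the arithmetic itself looks fine and the odd values come from the elapsed times.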
Dave