Hi,
After learning how to control MEM_ALIGN and, therefore, alignment
hints from the vectorizer, I was able to generate 64-bit hints (with
the help of Ramana's patches). I saw a 16% improvement on a benchmark
with stack variables, for which we now force alignment to 64 bits and
create alignment hints, instead of using peeling.
Ira
Hi,
* Finished a cross-compiler benchmark report covering the latest compilers in
the FSF and Linaro series. Will start storing results in the
linaro-toolchain-benchmarks bzr repository.
* Looking closer at EEMBC results, especially regressions between gcc-4.4
and gcc-4.6. Did runs with gcc-linaro-4.4 with -fno-unroll-loops. Will
continue analyzing and try to present the results clearly.
* Reviewed Michael's geomean implementation.
* I will be on Christmas holiday w52 and w01, will be back 9/1.
/Regards
Åsa
Hi there. Could the toolchain team please have a look through the
current GCC blueprints and update them? You can see a list and states
at:
http://apus.seabright.co.nz/helpers/backlog
and for gcc-linaro only at:
http://apus.seabright.co.nz/helpers/backlog/project/gcc-linaro
Please check for any that:
* are on your short-term todo list but aren't against your name
* have been started but are stuck in the backlog or todo
* are finished but not marked as such
* are blocked
* are duplicates or too undefined
* or are obsolete
I'm especially interested in:
* "slp-supported-ops"
* "sms-register-scheduling"
* "better-block-operations"
* "libraries-for-backlog"
* "backport-conditional-execution"
* "improve-peeling"
* "64-bit-sync-pimitives"
* "neon-strided-load-extract"
If you've finished a significant amount of work on one blueprint then
let me know. We can split that work out and push the rest back into
the backlog.
Also, let me know if you're blocked on final benchmarking. We can now
easily benchmark a merge request and see the difference.
-- Michael
Continued work on 64-bit neon operations. The negdi2 seems to be more
difficult than previously thought - vneg won't do it, and there's no way
to encode either "0-reg" or "not(reg)+1", so I'm shelving that idea for
the moment, and moving on to one_compldi2_neon, which ought to be
straightforward.
Did the entire Linaro GCC release process, in the absence of Michael
Hope, from source to announcement. The process didn't go as smoothly as
I'd have liked, but I got through it, mostly. Hopefully Michael won't be
travelling next time ...
Tried to figure out how to do 64-bit shifts using a QImode shift amount.
This promised to eliminate the unnecessary zero-extends, but it doesn't
work because neither iwmmxt nor neon registers are permitted to hold
QImode values (presumably changing this would have consequences
elsewhere?). Annoyingly, it's also not possible to put SImode values in
(most) neon registers, so I'm not sure quite how to optimize the values.
More investigation required.
Hi!
This week was spent doing internal ST-E work, but related to the Linaro
tcwg so I will give a short summary anyway.
I have taken the Linaro toolchain (prebuilt by the Android working group)
and used it in our internal Android build.
There were several build errors, as expected when going from gcc-4.4.3,
the default compiler in Android (Gingerbread), to gcc-4.6.2. Many
errors were solved with patches from the Linaro Android distribution.
Did some benchmarking related to web browsing:
ARMBBench (load and rendering of web pages) - gave me 4-6% improvement with
the Linaro toolchain.
SunSpider and BrowserMark (JavaScript) - gave me ~6% overall regression with
the Linaro toolchain. However, when zooming in on individual test cases -
SunSpider consists of ~25 tests in 9 categories - the results are really
scattered. A few tests contribute most of the regression. I am trying to
narrow things down to understand which parts of the code in v8 (the
JavaScript engine) cause the slowdown.
Best regards
Åsa
Continue working on the patch to estimate register pressure on SMS:
Addressing the comments received from Richard and Ayal.
Testing the patch on libav micro benchmarks.
Summary
* "make check-gcc" for linux gcc, cygwin gcc and native windows gcc.
Details:
1. "make check-gcc" on linux.
* One more failed case (gcc.dg/visibility-d) for the toolchain
generated from crosstool-ng based on embedded toolchain code base. But
logs show the .s files are the same.
2. "make check-gcc" on windows.
* Dir format issue:
Native Windows programs expect drive paths in the form c:, d:,
etc., but Cygwin presents them as /cygdrive/c, /cygdrive/d. A wrapper
is needed to convert them.
* qemu output in cygwin (Qemu-0.15.1-windows-Medium.zip from
http://lassauge.free.fr/qemu/)
qemu cannot print the result line (e.g. "*** EXIT 0") to the screen. A
wrapper is needed to handle this.
* "make check-gcc" for cygwin toolchain (build from scratch in cygwin).
You can run make check just as on linux.
* "make check-gcc" for pre-installed binary toolchain (installed as
native windows programs)
a. configure gcc from the source package. (Only the config* files and
the Makefile are needed, to make sure "make check" works.)
b. reset the TEST_GCC_EXEC_PREFIX (site.exp) to the correct dir
(INSTALL DIR) with the right format.
c. wrap gcc/xgcc to use the pre-installed gcc and change the dir format.
d. handle /usr/share/dejagnu/testglue.c (cp it to current test dir
or convert it to windows path)
Plan:
* Handle g++ test on windows.
* Work out a formal document or wiki page on how to "make check-gcc" on windows.
* Test and analyze the failed cases.
Best regards!
-Zhenqiang
PS:
1) qemu-system-arm.exe sample
#!/bin/sh
dir=`dirname "$0"`
run ()
{
# Change every /cygdrive/e to e:
para=`echo "$*" | sed -e 's/\/cygdrive\/e/e:/g'`
# arm.exe is the real qemu-system-arm.exe; it writes its
# output to stdout.txt or stderror.txt.
"$dir"/arm.exe $para | tee
# output to screen
cat "$dir"/stdout.txt
}
run "$@"
2) xgcc.exe sample
#!/bin/sh
run ()
{
# Change every /cygdrive/e to e:
para=`echo "$*" | sed -e 's/\/cygdrive\/e/e:/g'`
# Use a local copy of testglue.c rather than /usr/share/dejagnu/testglue.c
para=`echo "$para" | sed -e 's/\/usr\/share\/dejagnu\/testglue\.c/testglue.c/g'`
# run the test with preinstalled binary toolchain
#TBD: handle g++
arm-none-eabi-gcc.exe $para
}
run "$@"
3) TEST_GCC_EXEC_PREFIX in site.exp sample
# Toolchain is installed at e:/Dec/RC3.
set TEST_GCC_EXEC_PREFIX "e:/Dec/RC3/lib/gcc/"
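
The two wrapper samples above hard-code drive e:. A minimal sketch of a
more general conversion is given below, assuming every path has the form
/cygdrive/<letter>/... The function name to_win_path is hypothetical and
is not part of the wrappers above. On Cygwin itself, "cygpath -m" is
probably the more robust choice for a single path.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not taken from the wrappers above):
# rewrite every /cygdrive/<letter>/ prefix in a string to <letter>:/
# Only prefixes followed by a slash are converted, so a bare
# "/cygdrive/e" with nothing after it is left unchanged.
to_win_path ()
{
  printf '%s\n' "$1" | sed -e 's|/cygdrive/\([A-Za-z]\)/|\1:/|g'
}

# Example: convert a whole gcc command line in one go.
to_win_path "-B/cygdrive/e/Dec/RC3/lib/gcc/ /cygdrive/d/src/test.c"
```

Inside a wrapper this could replace the first sed call, for example
para=`to_win_path "$*"`, so that the same script works for a toolchain
installed on any drive.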
Hi,
* I've been debugging various errors and warnings that I encountered
with the binary CSL 2011.03 toolchain
* Fleshed out my recipe for the external toolchain; I now get a working
core-image-minimal that boots fine within qemu
* Debugged why cmake-based recipes (like libproxy) are having trouble
when compiling with an external toolchain
* Currently the libc is provided by the sysroot of the external
toolchain. This might not be ideal, and as time permits I'd like to find
a way to get eglibc built instead.
Regards
Ken
Task                              Planned      Estimated    Actual

Historical
~~~~~~~~~~
Connect 2011.q4 preparation       28/10/2011   28/10/2011   28/10/2011

Linaro Tasks
~~~~~~~~~~~~
Fully Investigate the O3          31/01/2012
performance regressions
Neon backend experiments          09/12/2011   14/12/2011
with alignment hints
and addressing mode work.
Investigate partial-partial       18/12/2011
PRE and regression with
bitmnp01
Writeup on the optimizations      31/12/2011
enabled with PGO
RAG :
RED : None
AMBER:
=== Progress ===
* The Android guys found a bug with the vcvt.f64.s32 instruction
coming out after my patch and I found a few assembler issues as well
during this process which are now fixed upstream.
* Backported the A15 patches into Linaro 4.6
* Assisted as needed with the release, which really wasn't too much
work for me other than the revert.
* Backported one part of the partial-partial PRE patch. Still looking into it.
* Did some analysis of the di-layout.c test failure;
RichardS has now fixed it in the middle-end.
* Wrote a patch to replace all vector mode aligned vldm / vstm with
equivalent vld1.64 and vst1.64 to allow more alignment hints to come
out of the compiler. Still not fully happy with it but it's looking
much better than the original hack.
=== Plans ===
* Continue looking at partial-partial PRE and try and understand it further.
* Flush out the neon patches that I've been accruing for the addressing
modes and see where we get to with alignment hints and vld1.64's.
* Look at movw's / movt's vs constant pools.
* Submit my PGO patch.
Absences.
* Dec 19 - Dec 31 : Tentatively booked
* Feb 6-10 : Linaro Connect Q1.12.
* Feb 11-15 : Holiday.