== This week ==
* TCWG-619
- v8 LTO build with different options for x86 and aarch64.
- Reported upstream v8 LTO build failure on ARM.
- Tried to build chromium with FSF gcc, the linaro binary release and
linaro-4.9-branch.
* PR49551
- Not able to reproduce ICE with latest trunk (r221871).
* Misc
- College assignments submission and term end.
== Next Week ==
* TCWG-619
- Build chromium with linaro-4.9-branch and trunk.
- Prepare stats for LTO build with different options for v8 on x86 and aarch64
- Try building chromium with LTO with FSF trunk for arm
* TCWG-639:
- Add enhancement to header file flattening script.
== Progress ==
Friday holiday
* Automation Framework (CARD-1378 2/10)
- Power cut in the office
- Fixing gateway, rebooting machines
- Mob management
* LLVM ARM Maintenance (CARD-1833 2/10)
- ARMTargetParser review
* Background (4/10)
- Code review, meetings, discussions, etc.
- All LLVM buildbots broken (one still)
- Trying to merge Android round/exception
- https://android-review.googlesource.com/#/c/125910/1
- Not that easy, will need bigger changes and tests to go in
== Plan ==
* Long holidays
* EuroLLVM
* Back on the 15th
Hi,
I did some tests on the following function
--- CUT HERE ---
int fibo(int n)
{
    if (n < 2) return 1;
    return (fibo(n-2) + fibo(n-1));
}
--- CUT HERE ---
and I discovered that it can be faster at -O2 than at -O3. This is with gcc 4.9.2.
Looking at the disassembly I see it is using FP registers to hold
integer values. The following is a small extract.
.L3:
        fmov    w0, s8
        sub     w25, w25, #1
        cmn     w25, #1
        add     w0, w0, w27
        fmov    s8, w0
        bne     .L19
        add     w0, w0, 1
        b       .L2
Recompiling with -mgeneral-regs-only generates a huge improvement.
The following are the times I get on various partner HW. I have
normalised the -O2 times to 1 second so that I do not disclose actual
partner performance data:
Partner 1: -O2 = 1sec, -O3 = 1.13sec, -O3 -mgeneral-regs-only = 0.72sec
Partner 2: -O2 = 1sec, -O3 = 0.68sec, -O3 -mgeneral-regs-only = 0.60sec
Partner 3: -O2 = 1sec, -O3 = 0.73sec, -O3 -mgeneral-regs-only = 0.68sec
Partner 4: -O2 = 1sec, -O3 = 0.83sec, -O3 -mgeneral-regs-only = 0.84sec
So, in general, -O3 does actually do better than -O2, and in nearly all
cases performance is better still if I stop it using FP registers for int
values (Partner 4 is essentially unchanged).
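
A minimal sketch of how a comparison like the one above could be set up,
assuming a simple CPU-time measurement with clock(). This is not the test
program from the tarball: time_fibo, normalise and self_check are
hypothetical helpers introduced only for illustration, and fibo is copied
from the function above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Function under test, as given above. */
int fibo(int n)
{
    if (n < 2) return 1;
    return (fibo(n-2) + fibo(n-1));
}

/* Hypothetical helper: sanity-check the result before timing anything. */
void self_check(void)
{
    assert(fibo(0) == 1);
    assert(fibo(10) == 89);
}

/* Hypothetical helper: CPU seconds spent in one call to fibo(n).
 * The result is stored through 'result' (if non-NULL) so the call
 * has an observable effect. */
double time_fibo(int n, int *result)
{
    clock_t start = clock();
    int r = fibo(n);
    clock_t end = clock();

    if (result != NULL)
        *result = r;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: express a time relative to the -O2 baseline,
 * so that the baseline itself becomes 1. */
double normalise(double seconds, double baseline_seconds)
{
    return seconds / baseline_seconds;
}
```

Building the same file three times (with -O2, -O3, and -O3
-mgeneral-regs-only), calling time_fibo from main with an n large enough to
run for a few seconds, and passing each time through normalise with the -O2
time as baseline would give figures in the same form as the list above.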
I have put a tarball of the test program along with 3 binaries and 3
disassemblies here:
http://people.linaro.org/~edward.nevill/fibo.tar
All the best,
Ed.
Hi,
I'm seeing the following build error trying to build from the current master
branch (1ac806b) of http://git.linaro.org/toolchain/binutils-gdb.
make[3]: *** No rule to make target `-L../zlib', needed by `run'. Stop.
make[3]: *** Waiting for unfinished jobs....
make[3]: Leaving directory `gdb/sim/arm'
The following commit predating the zlib changes appears to build without error.
b19a8f8545100a08ee2a64c05631aff6f651faa1
Thanks,
Chris
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
catomics - TCWG-436 [5/10]
* Got pointed at a suitable set of benchmarks, results still underwhelming
* However, patches were using relaxed atomics rather than no atomics at all
* Fiddled abe into building sysroots for me (I get libstdc++ that way)
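
To make the difference between relaxed atomics and no atomics at all
concrete, here is a minimal C11 sketch. It assumes the idea is to skip the
atomic read-modify-write when only one thread is running; it is not the
TCWG-436 patch, and multi_threaded, inc_relaxed and inc_conditional are
hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: would be set once a second thread is created. */
static bool multi_threaded = false;

/* Always atomic: a relaxed read-modify-write. No ordering is imposed,
 * but the increment itself is still one indivisible operation. */
int inc_relaxed(atomic_int *c)
{
    assert(c != NULL);
    return atomic_fetch_add_explicit(c, 1, memory_order_relaxed) + 1;
}

/* Conditionally not atomic: with a single thread, use a separate load
 * and store instead of a read-modify-write. The increment as a whole
 * is then not atomic, which is only safe while no other thread exists. */
int inc_conditional(atomic_int *c)
{
    assert(c != NULL);
    if (multi_threaded)
        return atomic_fetch_add_explicit(c, 1, memory_order_relaxed) + 1;

    int v = atomic_load_explicit(c, memory_order_relaxed) + 1;
    atomic_store_explicit(c, v, memory_order_relaxed);
    return v;
}
```

On most targets the relaxed load and store in the single-threaded path
typically compile to ordinary loads and stores, so a benchmark comparing
inc_relaxed with inc_conditional measures the cost of the atomic
read-modify-write itself rather than of any memory ordering.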
Misc - [5/10]
* Tidied up some 'perf shotgun' scripting from the juno cache
investigation, so I've got the tools for next time
* Started sorting out my backups - but didn't finish before build-01's
death destroyed a bunch of work
* Raised priority of sorting out my backups, now just a matter of
waiting on some large rsyncs
* Pieced my world back together on dev-01
=Plan=
Holiday Wednesday, public holidays next Friday and Monday
See how catomics do when we're conditionally-not-atomic-at-all
Investigate a bit to see whether there's a reason we were
using relaxed atomics
Resurrect Jira benchmarking on dev-01
* Will include some porting, scripts don't work out of the box on dev-01
One day off on Friday. [2/10]
# Progress #
* aarch64 gdb, the number of FAILs is reduced to 26 on aarch64-linux!
There are still 10+ FAILs that can be fixed. [4/10]
** TCWG-726, fails in gdb.base/break-interp.exp. Fixed.
Removed the prelink package from the juno board, as aarch64 isn't supported.
** TCWG-681, fails in savedregs.exp. Patch is committed.
** PR 18139. Patches are committed.
* arm gdb, 938 fails for -mfloat-abi=soft and 1014 fails for
-mfloat-abi=hard. Analysing fails. [2/10]
* GDB kernel-awareness meeting with ST. [1/10]
Understood the definition of "kernel-awareness", and will
discuss the design upstream later.
* TCWG-716, investigate LLDB perf testing. [1/10] Done.
** LLDB already has some support for performance testing, in
lldb/test/benchmarks/ and lldb/tools/lldb-perf.
** TestCompileRunToBreakpointTurnaround.py compares the speed of LLDB
and GDB, but in an incorrect way.
# Plan #
* Look more closely at the arm gdb test fails.
* Fix the rest of the aarch64 gdb fails.
--
Yao
== Issue ==
* none
== Progress ==
* Infrastructure and Validation (1/10)
* GCC Upstream (6/10)
- PR63587 and PR64871 committed in FSF 4.9 branch.
- PR64208 patch review is OK, but needs to be validated on an iWMMXT platform
(pinged some Marvell people).
- Submitted a fix for arm_subsi3_insn (alternatives issue). This is
a stage1 patch.
- Identified another insn which has alternatives issues in Thumb2.
* Release and Backports (1/10)
- Backflip maintenance
- 12 Backports for 2015.04 (CARD TCWG-699)
* Misc (2/10)
- Various meetings
- ST internal year review
== Plan ==
- Continue upstream work.
* ASAN/TSAN run on 42 bit VA Aarch64 (TCWG-634) (6/10)
Sent a patch that enables ASAN tests with the 64 bit allocator on
amd-01 (AMD Seattle). All ASAN tests pass in LLVM.
But on the juno platform the 39 bit VA does not have enough room for the
mapping, hence we need to stay on the 32 bit allocator.
Discussed with the ASAN community and it has been decided to use the 32 bit
allocator as default. They are not OK with having a mechanism to
detect the VA size and switch allocators based on that.
Started looking at failures on amd-01 (AMD Seattle) with the 32 bit allocator.
None of the ASAN tests ran when I switched to the 32 bit allocator on amd-01.
The reason is a spin mutex lock that waits for the memory
allocation to complete; an assertion failure makes it wait
indefinitely.
After fixing the map range the assertion failure is gone, but I keep
getting some failures with the 32 bit allocator "on".
Bug869: Continued to look at ABS_EXPR cases (2/10).
* Emails, meetings. (2/10)
* Linaro 1-1 with Christophe, Ryan, status meeting
* AMD meetings/event, 1-1 with AMD manager, status meeting.
* GCC mailing list.
== Plan ==
* Continue to fix TSAN/ASAN 32 bit allocator failures on amd-01.
* Bug869
== Progress ==
* Type promotion pass (zero/sign extension elimination) - TCWG-547 (2/10)
- Ran more benchmarks and gathered more data (will post the results)
- Need to run perf to analyse regressions
* Bug 1373 (1/10)
- Set up back-porting infrastructure
- Ran into some issues
* TCWG-486 (6/10)
- Discussed with Jim and identified the issues and possible fixes
- Getting closer to an acceptable fix
- Need to run benchmarking
* Misc (1/10)
- gcc-patches and gcc-bugs lists
== Plan ==
* TCWG-620 and TCWG-547