Hi there. A reminder that today's call has shifted due to the
European daylight savings change. It's now at 0800 UTC which is 9 am
in the UK, 10 am in central Europe, and 10 am in Israel.
-- Michael
== Last week ==
* PR46934: Thumb-1 ICE, small fix in the "casesi" jump-table expand
code. Quickly approved and committed upstream.
* Enhance XOR patch for gcc/simplify-rtx.c. Updated comments and
committed upstream.
* PR48250 / CS Issue #9845 / Launchpad #723185. Unaligned DImode reload
under NEON. Submitted patch upstream, but still need to do some more
verification that older pre-ARMv5TE cases are safe. Should complete this
week.
* Working on a type of ICE seen currently on upstream trunk, a few
testcases failing under '-O3 -g'. It seems VTA related, but also might
have something to do with register elimination not fully done for
(var_location (entry_value ...)) expressions, leaving [afp+#num] memory
addresses existing in debug insns after reload. Still investigating.
* Launchpad #689887, ICE in get_arm_condition_code(). Pushed a merge
request to Linaro 4.5 for this patch. Also another LP#742961 appeared as
another case of this ICE...
* Still working on (what I think should be) the last of the CoreMark
ARMv6 regressions. The problem is to combine uxtb+cmp into ands #255.
This could be done by adding (set (cc) (compare (zero_extend...)))
patterns, implemented by ands assembly, but still looking if this can be
done (probably more elegantly) by something like CANONICALIZE_COMPARISON
(replacing compare operands) in the ARM backend.
* Launchpad #736007, ICE immed_double_const under -mfpu=neon -g. Some
discussion on gcc-patches about this, still unclear on what should be
done...
== This week ==
* Push forward on above issues.
Committed Dan's RVCT interoperation patch, both upstream and to Linaro
GCC 4.6.
Adjusted Benrd's "Discourage NEON on Cortex-A8" patch following Richard
Earnshaw's comments, and reposted upstream. The new version was
approved, and committed. I've also submitted a merge proposal to Linaro
GCC 4.6.
Dropped Tom's patch for marking smalls strings read-only. This
optimization seems to have no visible effect for ARM in GCC 4.6. I'll
leave it it to Tom to forward-port, if it's still meaningful for MIPS.
Julian has committed the patch for lp:675347, so I've submitted merge
requests to both Linaro GCC 4.5 and 4.6.
Bernd has posted the shrink wrapping patches upstream. I've posted this
info in all the relevant Linaro tracking tickets.
Talked Revital Eres through the Bazaar/Launchpad merge request system.
Tried to understand why GCC 4.6 does not use multiply-and-accumulate
efficiently, when used with 64-bit values. It seems that the compiler
sometimes uses (subreg:SI (reg:DI ...)) and sometimes just uses a plain
(reg:SI ..) and those don't combine to give useful patterns, but I
haven't got to the bottom of it yet.
Tested an FSF GCC 4.6 snapshot from the 23rd. All well, so I've merged
it to the Linaro GCC 4.6 branch.
* Future Absence
Away Monday 28th to Friday 1st April.
----
Upstream patched requiring review:
* Thumb2 constants:
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00652.html
* ARM EABI half-precision functions
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00874.html
* ARM Thumb2 Spill Likely tweak
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00880.html
* NEON scheduling patch
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
Hi,
== libunwind ==
* modified the extbtl-parser to operate on the DWARF model directly
* this adds support for unwinding call stacks with mixed (DWARF and extbl)
frames on ARM
* did a few other fixes and cleanups
* posted the patches on the libunwind ml
* set up a tree on git.linaro.org
* attended a class on friday
Regards
Ken
== GDB ==
* Completed glibc patch to add ARM unwind tables to system call stubs
(bug #684218), patch committed upstream and backported to Ubuntu glibc.
* Posted kernel patch to fixes GDB inferior calls while stopped in a
restartable system call (bug #615974); waiting for review.
* Ongoing work to fix single-stepping over signal handlers (bug #615978).
* Implemented patch to fix single-stepping across bad ARM/Thumb boundary
(bug #667309); posted to mailing list for comments.
* Contributed two fixes for valgrind on ARM (to enable running GDB under
valgrind); both now accepted mainline.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
== This week ==
* Moved the discussion about the RTL and gimple representation of
strided loads/stores to the gcc@ list. Got some good feedback:
http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html
* Started a subdiscussion about the handling of modes:
http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html
This is a tricky one. I'll add more fuel to the fire next week.
* Committed two GCC patches to clean up the expand interface.
Dealt with the fallout (some expected, but unfortunately some not).
* Submitted two of the patches to improve code generation for
strided load/store intrinsics:
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01631.htmlhttp://gcc.gnu.org/ml/gcc-patches/2011-03/msg01634.html
* Spent a lot of the week reworking the way the load/store intrinsics
are handled, to fix both correctness and performance bugs. The new
rtl patterns should have the right form for the vectoriser.
Made what feels like good progress, but it's not complete yet.
* Sent separate R_ARM_IRELATIVE patch to glibc, after feedback from
glibc-ports.
* Booked flight and hotel for Budapest summit.
* Pinged unreviewed patches.
== Next week ==
* More intrinsics improvements. I think these are necessary to get good
code out of the vectoriser too.
Richard
== String routines ==
* Wrote a thumb optimised strchr
- As expected it's got nice performance for longer runs but at
sizes <16 bytes it's slower, and a lot of the strchr
calls are very short, so it's probably not of benefit in most cases
( https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialStrchr?ac…
)
* Wrote a neon-memcpy
- As previously found with memset, it performs well on A8 but
poorly on A9 - it does however do the case where
the source/destination isn't aligned quite well even on A9 ; the vld1
unaligned case works with relatively little penalty.
(it performs comparably to the Bionic implementation - mine is a
bit faster on shorter calls, Bionic is better
on longer uses - I think that's because they've got some careful use
of preloads where I have so far got none).
I'm on holiday up to and including 5th April.
Dave
== GCC ==
Progress:
* Investigated excessive VFP moves . Investigating ways forward.
* Went through some of the test results with 4.6 RC2 upstream - looking
through test results etc.
* Setup SPEC2k6 cross on my Linaro machine.
* Waiting for my new Panda board sometime next week.
* Some small bug fixes upstream. Need to rework a couple of
documentation patches after review.
Plans:
* Continue looking at excessive VFP moves.
* Continue to look at some patches upstream.
* Finish working through Thumb2 speed tickets.
* Set up new Panda board.
* Start looking at DENBench results and identify
potential speed up areas.
Meetings:
* 1-1s
* Linaro toolchain meeting
Absences:
* March 30th (maybe): WC Cricket Semi-final. (Ind v Pak)
* April 15 – 26 -> Booked Holiday.
* May 9-14 - LDS Budapest
RAG:
Red:
Amber:
Green:
Current Milestones:
| Planned | Estimate | Actual |
qemu-linaro 2011-04 | 2011-04-21 | 2011-04-21 | |
Historical Milestones:
finish qemu-cont-integration | 2011-01-25 | 2011-01-25 | handed off |
first qemu-linaro release | 2011-02-08 | 2011-02-08 | 2011-02-08 |
qemu-linaro 2011-03 | 2011-03-08 | 2011-03-08 | 2011-03-08 |
== maintain-beagle-models ==
* benchmarking/testing of the TCG locking fix: oddly benchmarks
seem to come out with less slowdown (1% or less) than a system
mode bootup/shutdown. (I used scimark and dhrystone. scimark
is the same speed, which is to be expected because we spend all
our time doing floating point emulation. I was expecting a bigger
perf hit on dhrystone, though.)
* submitted patches to make qemu fail cleanly if you ask for more RAM
than a board supports
== merge-correctness-fixes ==
* tested the Neon element load/store instructions; wrote patches to
fix UNDEF handling (which are blocked waiting for the patch pipeline
to be drained) and confirmed there aren't any other bugs.
There is a meego patch to use helper functions for multi-element
load/store which is apparently to avoid overflowing a TCG buffer:
need to test and upstream this.
* investigated android qemu tree for any missing correctness fixes:
looking through the changelog I think we have fixes upstream for
everything that was fixed in the android tree.
== other ==
* patch: fix versatilepb/realview handling of multiple nic options
* patch: better diagnosis of more nics requested than board supports
[this is needed to get the vexpress patch committed]
* reviewed a patch to add ARMv4/v4T support to qemu
(mostly consists of making sure we UNDEF in the right places)
* meetings: toolchain, standup, pdsw-tools, 1-2-1
Current qemu patch status is tracked here:
https://wiki.linaro.org/PeterMaydell/QemuPatchStatus
Absences:
Holiday: 22 Apr - 2 May
9-13 May: UDS, Budapest
(maybe) ~17-19 August: QEMU/KVM strand at LinuxCon NA, Vancouver
Hello,
Implemented a patch to apply SMS in the presence of instructions with
REG_INC_NOTE. (this occurs in telecom/autocor thus SMS needs to be run
with -fno-auto-inc-dec
flag to be applied)
Sent a merge request to gcc-linaro for the SMS patches.
Thanks to Andrew Stubbs for his help.
https://code.launchpad.net/~eres/gcc-linaro/SMS_doloop_for_ARM
I intend to send a request to gcc-linaro.4.6 as well.
Thanks,
Revital