# Progress #
* TCWG-545, Handle "branch-to-self" instruction in single stepping. [5/10]
Patches are posted upstream for review.
* TCWG-532, one patch is committed and one patch is posted for review. [2/10]
* Tweak ARM process record. [2/10]
Two patches are pushed in. Many test failures are fixed.
* FSF patches review. [1/10].
# Plan #
* Linaro Connect.
--
Yao
Hi,
I have just switched to gcc 5.2 from 4.9.2 and the code quality does seem to have improved significantly. For example, it now seems much better at using ldp/stp and it seems to have stopped gratuitous use of the SIMD registers.
However, I still have a few whinges:-)
See attached copy.c / copy.s (This is a performance critical function from OpenJDK)
pd_disjoint_words:
cmp x2, 8 <<< (1)
sub sp, sp, #64 <<< (2)
bhi .L2
cmp w2, 8 <<< (1)
bls .L15
.L2:
add sp, sp, 64 <<< (2)
(1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a 32 bit unsigned.
(2) Nowhere in the function does it store anything on the stack, so why
drop and restore the stack every time? Also, a minor quibble in the
disass: why does sub use #64 whereas add uses just '64'? (I appreciate this
is probably binutils, not gcc.)
.L15:
adrp x3, .L4
add x3, x3, :lo12:.L4
ldrb w2, [x3,w2,uxtw] <<< (3)
adr x3, .Lrtx4
add x2, x3, w2, sxtb #2
br x2
(3) Why use a byte table? This is not some sort of embedded system. Use
a word table and this becomes:
.L15:
adrp x3, .L4
add x3, x3, :lo12:.L4
ldr x2, [x3, x2, lsl #3]
br x2
An aligned word load takes exactly the same time as a byte load and we
save the faffing about calculating the address.
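To make the comparison in (3) concrete, here is a minimal C model of the two address calculations. It is only a sketch of the arithmetic, not compiler output: the function names are hypothetical, code addresses are modelled as plain integers, and the byte table is assumed to hold signed offsets in units of 4 bytes from a base label, as the `sxtb #2` operand suggests.

```c
#include <stdint.h>

/* Byte-table dispatch (models ldrb + adr + add ..., sxtb #2):
 * each entry is a signed byte offset, in units of 4 bytes,
 * relative to a base label.  Names here are illustrative only. */
uintptr_t target_from_byte_table(uintptr_t base, const int8_t *table,
                                 uint32_t index)
{
    /* sign-extend the byte, scale by 4, add to the base address */
    return base + (uintptr_t)((intptr_t)table[index] * 4);
}

/* Pointer-table dispatch (models ldr x2, [x3, x2, lsl #3]):
 * each entry already holds the full target address. */
uintptr_t target_from_pointer_table(const uintptr_t *table, uint32_t index)
{
    return table[index];
}
```

The byte table is eight times smaller per entry, which is presumably why the compiler picks it. The pointer table trades that space for one load with no follow-up arithmetic.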
.L10:
ldp x6, x7, [x0]
ldp x4, x5, [x0, 16]
ldp x2, x3, [x0, 32] <<< (4)
stp x2, x3, [x1, 32] <<< (4)
stp x6, x7, [x1]
stp x4, x5, [x1, 16]
(4) Seems to be something wrong with the load scheduler here? Why not
move the stp x2, x3 to the end? It does this repeatedly.
Unfortunately, as this function is performance critical, it means I will
probably end up doing it in inline assembler, which is time consuming,
error prone and non portable.
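The attached copy.c is not reproduced in this message. As a rough guess only, a function of the following shape would be expected to produce the pattern discussed above: a bounds check on the count, a jump table for small counts, and a run of loads and stores. The name `disjoint_words_sketch`, the use of `uint64_t` for a heap word, and the `memcpy` fallback are assumptions, not the actual OpenJDK source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the attached copy.c: copy `count`
 * 64-bit words between non-overlapping buffers.  Small counts
 * go through a fall-through switch, which compilers commonly
 * lower to a range check plus a jump table. */
void disjoint_words_sketch(const uint64_t *from, uint64_t *to, size_t count)
{
    switch (count) {
    case 8: to[7] = from[7]; /* fall through */
    case 7: to[6] = from[6]; /* fall through */
    case 6: to[5] = from[5]; /* fall through */
    case 5: to[4] = from[4]; /* fall through */
    case 4: to[3] = from[3]; /* fall through */
    case 3: to[2] = from[2]; /* fall through */
    case 2: to[1] = from[1]; /* fall through */
    case 1: to[0] = from[0]; /* fall through */
    case 0: break;
    default:
        /* larger copies: defer to the library routine */
        memcpy(to, from, count * sizeof(uint64_t));
        break;
    }
}
```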
* Whinge mode off
Ed
== Progress ==
o GCC dev. (7/10)
* Remote validation sanitizing:
- Implemented and tested a pure dejagnu fix (the current
implementation works fine for GCC but might be an issue in a different
context; a cleaner fix is almost done)
- Found a latent issue in GCC profiling test harness
* ARM and AArch64 backends LRA cleanup:
- Looked at the remaining artifacts, will prepare a patch for GCC 7
o Misc (3/10)
* Various meetings
* internal discussions
== Plan ==
o Finalize and submit dejagnu fix
Port to microinstance - TCWG-432 [7/10]
* Merged last few months of development back to benchmarking branch
* Restored support for multiple targets per builder
* Updated builder landed, altered jobs to work with it
** Removed assumption that host filesystem is non-persistent
** Stacked up test runs for the weekend
Transfer secret management to LAVA [1/10]
* LAVA jobs now use a within-LAVA key to access sources
Misc [2/10]
* Unsuccessful fiddling with heat-monitoring tools on Juno
* Usual background of mail and meetings
=Plan=
* Fallout from weekend test runs
** Some failures are occurring, need to investigate
* Update docs and Jenkins configs w.r.t. last week's activity
* Further investigation on a couple of LAVA issues that are causing me pain
** Un-deserializable bundles
** Inaccessible image reports
* Continue assessing target stability/looking at inconsistent results
== This week ==
* Bugzilla 69663 - [ARM] Implement overflow arithmetic standard names (6/10)
- Tested and posted SImode and DImode patch upstream
- Feedback recommended supporting thumb2 in addition to arm architectures
- Patch to support thumb2 fails on all thumb architectures;
investigating failures
* Bugzilla 70008 - [ARM] Reverse subtract with carry can be generated in
thumb2 mode (2/10)
- Created new bug, developed and successfully tested patch
- Fix posted upstream
* Bugzilla 70014 - [ARM] Predicate does not match constraint
(*subsi3_carryin_const) (1/10)
- Created new bug and patch
* Misc (1/10)
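For readers unfamiliar with the overflow arithmetic work in Bugzilla 69663: assuming the standard names in question are the addv/subv family that GCC uses when expanding its overflow-checking builtins, the user-visible side looks roughly like the sketch below. The wrapper names are hypothetical. The builtins themselves are available in GCC 5 and later.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit (SImode-sized) signed add.  Returns true if the
 * mathematically exact result does not fit in int32_t;
 * *out receives the wrapped result either way. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *out)
{
    return __builtin_add_overflow(a, b, out);
}

/* 64-bit (DImode-sized) signed subtract, same convention. */
bool checked_sub_i64(int64_t a, int64_t b, int64_t *out)
{
    return __builtin_sub_overflow(a, b, out);
}
```

With dedicated patterns in the backend, the compiler can in principle use the flag-setting add/subtract instructions directly instead of a generic compare sequence.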
== Next week ==
* Bugzilla 69663 - Cleanup by merging patterns using mode iterators,
submit upstream
* Bugzilla 70008 - Respond to upstream comments as appropriate
* Bugzilla 70014 - Post patch and respond to upstream comments
* Travel to Linaro Connect beginning March 3rd
== This Week ==
* LTO (6/10)
- TCWG-528:
a) reduced test-case for the case when decl node gets visited multiple times
b) updated patch not to walk artificial record decls (typeinfo
objects) as per Richard's suggestion.
submitted upstream, waiting for review.
- benchmarking: Aarch64 SPEC2006-int benchmarks complete
- looked at pr57703
- Slides
* setting up perf on chromebook (2/10)
- perf doc
- got perf running on chromebook by manually building it and applying a set of
(clumsy) workarounds.
- perf annotate shows no output and perf stat shows "not supported" for almost
all entries except "page faults"
- will give a try to dual boot chrubuntu on chromebook
* half-day sick leave (1/10)
- doctor's appointment for eye inflammation
* Misc (1/10)
- Meetings
== Next Week ==
- LTO
- TCWG-310
- look at jenkins tutorial in collaborate wiki
== Progress ==
* Support (4/10)
- Updating patch D17141 for Darwin, resubmitting, discussions.
- Understanding PR21778, may need changes to SLP
- Benchmarking some scheduler choices for A17
* Release (1/10)
- 3.8.0 RC3 validation
* Background (5/10)
- Code review, meetings, discussions, general support, etc.
- Sifting through CVs, interviews, etc.
# Progress #
* Support range stepping on arm-linux. TCWG-518. [5/10]
Post patch series about "the thread is stepping over breakpoint but
it spawns child thread". The fix is OK but the test case changes are
being reviewed.
The more I test my range stepping patches, the more existing bugs I
find. Looking at the bug "software single step the instruction
branch to self."
* AArch64 linux syscall for record/replay. TCWG-532. [1/10]
Patch is out for review.
* Fix some ARM reverse debugging bugs. TCWG-183. [1/10]
Patch is pushed in. The original implementation wasn't carefully
reviewed, so I am sure there are bugs somewhere else.
* Patch review on arm tracepoint support. [1/10]
One patch is approved, but I insist that another patch should be done
in the generic part instead of the ARM specific part; the author wants to do
it in the ARM specific part because he thinks it is simpler.
* Misc [2/10]
** Go through the Linux kernel awareness GDB patches quickly, the first
reaction is "split your patch, please".
** Go to London to collect my passport.
# Plan #
* Support range stepping on arm-linux. TCWG-518.
* TCWG-167, TCWG-532.
* Prepare for the Linaro Connect travel.
--
Yao