== This Week ==
* TCWG-856 (2/10)
- submitted patch to flatten cfgloop.h:
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00277.html
* TCWG-777 (4/10)
- Modified pass to not generate redundant stores
- Investigating ICE caused by the pass during gcc build
- Discussions for possible approaches with Christophe and Kugan
- Reading thru documentation on optabs and ccmp patches
* Misc (4/10)
- Patch sent upstream which fixes segfault in gcc for -dx option.
- Filed upstream binutils bug for "branch range out of error"
- Conference calls
- Travel to Mumbai for US Visa OFC appointment
== Next Week ==
- Word towards committing cfgloop.h flattening patch
- Continue working on TCWG-830, TCWG-777, TCWG-847
== Progress ==
* Maintenance (CARD-1833 4/10)
- ADD/SUB with negative immediates solved by a year old
patch from ARM, sigh. On to the next bug... :(
- Working on https://llvm.org/PR20700
* Buildbots (CARD-1823 2/10)
- Moving benchmark bot to CMake, fixing deepcopy bug in
environment that broke new builds
- Restarting a few bots that crashed
* Background (4/10)
- Code review, meetings, discussions, etc.
- A lot of code review this week...
- Blocking disrespectful web spiders in llvm.org
- Emacs now almost works as I expect
== Plan ==
* Continue PR20700
* Have a look at Polybench
* Look for some more bugs to fix
Benchmarking presentation [7/10]
* More reading
* Ran through a couple more drafts
Misc [3/10]
* Featuring a bug in my backup scripts that took ~1/10 to fix
=Plan=
Back to benchmark automation as main activity
Presentation in the background
== Progress ==
LLDB development
-- Support for running lldb on arm hard float abi targets [TCWG-855] [7/10]
-- Built lldb-server for armhf trusty chromebook
-- Figured out problem with lldb-server showing up i386-linux-gnu as
target triple.
-- Verfied load of arm-elf executable and breakpoint setting.
-- LLDB GDBserver dies while trying to run the target.
Miscellaneous [3/10]
-- Playing with highkey board and setup chromebook with armhf and armel
chroots on ssd.
-- Preparing document for Czech Republic visa
-- Meetings, emails, discussions etc.
== Plan ==
LLDB development
-- Further progress and try to fix run control on armhf targets [TCWG-855]
== This week ==
* TCWG-146 - Detect smin/umin idion (1/10)
- Patch sent upstream for approval
* TCWG-140 - Transform end of loop conditions to min_expr (4/10)
- Patch and investigating validation regressions
* TCWG-833 - Exploit Wide Add operations when appropriate (4/10)
- Investigation into why vectorizer does not exploit wide adds
* Misc (1/10)
- Conference calls
- Conference call with Kugan and Prathamesh to discuss GCC Git workflow
- Conference call with Charles and Prathamesh to discuss
autovectorization
== Next week ==
- Vacation
== Progress ==
(TCWG-831) post-indexed addressing [3/10]
. vectorization project kick-off call
. code browsing/reading to understand mailing list feedback about previous patch
(TCWG-775) NEON error messages [6/10]
. completed conversion of some ARM intrinsics to give same error
messages as AArch64 work
. reworked tests so they can be shared between AArch64, ARM.
. re-submitted previous patch with updated tests
Misc [1/10]
email, irc, gerrit reviews, connect travel booking, AArch64 qemu
big-endian experiment
== Plans ==
submit patch for work done so far on ARM NEON error messages
cortex-a53 workarounds
Benchmark automation - TCWG-360 [3/10]
* Created a partial Jenkins prototype
* Considered some security issues
Benchmarking presentation [5/10]
* Drafted some slides, did some reading
Misc [2/10]
=Plan=
More of the above
== Progress ==
* TCWG-849 (1/10)
- Committed improvement for VRP
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=225108
* Add REG_EQUAL for arm_emit_movpair (4/10)
- Posted patches for review
* TACT -TCWG-851 (3/10)
- Started with the small examples.
- Ran into an error while tuning; looking into it
* Git work flow for upstream patches -TCWG-848 (1/10)
- Had a chat with Michael and Prathamesh
- Tried the work-flow and now started documenting them
* Misc (1/10)
- gcc-patches, gcc-bugs list
- Meetings
== Plan ==
- GCC Bugs
- TACT driven optimization exploration for gcc
* TCWG-830 (4/10)
- Observing tree dumps
- Peeling for alignment happens at -O3 but not at -O2 -ftree-vectorize
Reason: in vect_enhance_data_refs_alignment() for:
a) -O2 -ftree-vectorize: max_allowed_peel == 0
b) -O3: max_allowed_peel == (unsigned) -1;
which equals UINT_MAX and therefore peeling gets allowed.
- Workaround: Pass -param vect-max-peeling-for-alignment=0
- Peeling for alignment with O2 can be enabled by passing
-fvect-cost-model (we don't want this!)
Reason:
opts.c:
/* Tune vectorization related parametees according to cost model. */
if (opts->x_flag_vect_cost_model == VECT_COST_MODEL_CHEAP)
{
maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS,
6, opts->x_param_values, opts_set->x_param_values);
maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIGNMENT_CHECKS,
0, opts->x_param_values, opts_set->x_param_values);
maybe_set_param_value (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
0, opts->x_param_values, opts_set->x_param_values);
}
The above if condition becomes false when -fvect-cost-model is passed.
- Proposed patch (untested): http://pastebin.com/ftp0mrwH
Patch follows the workaround and passes --param vect-max-peeling-for-alignment=0
if unaligned access is supported.
* TCWG-777 (4/10)
- Observing tree and rtl dumps
- Workaround: for -O1 pass -fno-tree-fre -fno-tree-dominator-opts
Test-case: http://pastebin.com/cjBcSpiT
Generated assembly at -O1 without workaround: http://pastebin.com/jmQGZhN9
Generated assembly at -O1 with workaround: http://pastebin.com/JGj05z66
Is that the expected output for no unnecessary temps in assembly with
workaround ?
Is it profitable over the assembly generated without workaround ?
- Approach currently taken:
a) New pass "remove-temps" (for lack of better name), after nrv (added
as last gimple pass).
b) Transforms:
if (ssa_var != 0)
to
new_ssa_var = SSA_NAME_DEF_STMT (ssa_var)
if (new_ssa_var != 0)
This "unfolds" cse on expressions within if, which was done by fre
(and if fre was disabled then by dom pass).
c) However this approach results in dead stores.
eg:
_8 = flags_7(D) & 1;
if (_8 != 0)
...
is transformed to:
_8 = flags_7(D) & 1;
_32 = flags_7(D) & 1;
if (_32 != 0)
...
so store to _8 is dead store.
I tried to run dse after remove-temps but that didn't work.
RTL 194r.jump eliminates the above dead store as "trivially dead insn".
However I don't think it's a good idea to have dead stores like these
in gimple and rely
on RTL to eliminate them. I could try to make the pass bit smarter to
not generate redundant stores like _32 != 0 in above case.
d) Patch (no intent to commit as-is): http://pastebin.com/AGXnSkrZ
Generated assembly at -O1 with the patch: http://pastebin.com/VmHCVpGC
Patch eliminates temporaries at -O1 but not at -O2.
I have not yet figured out the reason for that.
For if (flags & 1),
In dfinish pass for -O1, the generated RTL is from
zeroextractsi_compare0_scratch
while for -O2, the generated RTL is from andsi3_compare0
e) Is this a problem also on x86 ?
x86 generated assembly with -O1: http://pastebin.com/XMeTXXwK
* Misc (2/10)
- Getting familiar with vectorizer and NEON gcc intrinsics
- Reviewed git tutorials and starting preparation of git doc
- Conference calls
== Next Week ==
- Continue working on TCWG-830 and TCWG-777
- Header file flattening
- Travel to Mumbai on 2nd July (Thursday) for US Visa OFC appointment.