* TCWG-830 (4/10) - Observing tree dumps
- Peeling for alignment happens at -O3 but not at -O2 -ftree-vectorize.
  Reason: in vect_enhance_data_refs_alignment():
  a) -O2 -ftree-vectorize: max_allowed_peel == 0
  b) -O3: max_allowed_peel == (unsigned) -1, i.e. UINT_MAX, which means "no limit", so peeling is allowed.
- Workaround: pass --param vect-max-peeling-for-alignment=0
- Peeling for alignment can be enabled at -O2 by passing -fvect-cost-model (we don't want this!).
  Reason: opts.c contains:
    /* Tune vectorization related parametees according to cost model.  */
    if (opts->x_flag_vect_cost_model == VECT_COST_MODEL_CHEAP)
      {
        maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS, 6,
                               opts->x_param_values, opts_set->x_param_values);
        maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIGNMENT_CHECKS, 0,
                               opts->x_param_values, opts_set->x_param_values);
        maybe_set_param_value (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT, 0,
                               opts->x_param_values, opts_set->x_param_values);
      }
  Passing -fvect-cost-model selects the dynamic cost model (it is equivalent to -fvect-cost-model=dynamic), so the above condition becomes false and vect-max-peeling-for-alignment is not set to 0.
- Proposed patch (untested): http://pastebin.com/ftp0mrwH
  The patch follows the workaround: it passes --param vect-max-peeling-for-alignment=0 if unaligned access is supported.
* TCWG-777 (4/10) - Observing tree and RTL dumps
- Workaround: at -O1, pass -fno-tree-fre -fno-tree-dominator-opts
  Test case: http://pastebin.com/cjBcSpiT
  Generated assembly at -O1 without the workaround: http://pastebin.com/jmQGZhN9
  Generated assembly at -O1 with the workaround: http://pastebin.com/JGj05z66
  With the workaround, is this the expected output (no unnecessary temporaries in the assembly)? Is it more profitable than the assembly generated without the workaround?
- Approach currently taken:
a) A new pass, "remove-temps" (for lack of a better name), placed after nrv as the last GIMPLE pass.
b) Transforms:
     if (ssa_var != 0)
   into:
     new_ssa_var = <RHS of SSA_NAME_DEF_STMT (ssa_var)>;
     if (new_ssa_var != 0)
   This "unfolds" the CSE of expressions used in the if condition, which was done by FRE (or, when FRE is disabled, by the DOM pass).
c) However, this approach results in dead stores. E.g.:
     _8 = flags_7(D) & 1;
     if (_8 != 0) ...
   is transformed to:
     _8 = flags_7(D) & 1;
     _32 = flags_7(D) & 1;
     if (_32 != 0) ...
   so the store to _8 is dead. I tried running DSE after remove-temps, but it did not remove it. In RTL, the 194r.jump pass eliminates this dead store as a "trivially dead insn". However, I don't think it's a good idea to leave dead stores like these in GIMPLE and rely on RTL to eliminate them. I could try to make the pass a bit smarter so that it does not generate redundant copies like _32 in the above case.
d) Patch (no intent to commit as-is): http://pastebin.com/AGXnSkrZ
   Generated assembly at -O1 with the patch: http://pastebin.com/VmHCVpGC
   The patch eliminates temporaries at -O1 but not at -O2; I have not yet figured out why. For if (flags & 1), in the dfinish pass at -O1 the generated RTL comes from the zeroextractsi_compare0_scratch pattern, while at -O2 it comes from andsi3_compare0.
e) Is this also a problem on x86?
   x86 assembly generated at -O1: http://pastebin.com/XMeTXXwK
* Misc (2/10)
- Getting familiar with the vectorizer and GCC NEON intrinsics
- Reviewed git tutorials and started preparing the git doc
- Conference calls
== Next Week ==
- Continue working on TCWG-830 and TCWG-777
- Header file flattening
- Travel to Mumbai on 2nd July (Thursday) for the US visa OFC appointment