Re: O2 optimization with vectorize

11 Apr 2012


      "Singh, Ravi Kumar (Ravi)" Ravi.Singh@lsi.com wrote:
...
> None of the generated code contains the NEON instructions. Code
> generated with case 1 is taking 3000 cycles, and code generated with
> case 2 is taking 2500 cycles.
> Even if vectorization failed in case 1, it should not generate less
> efficient code than case 2. My belief was that the executables
> from both would take the same number of cycles; anything done for an
> unsuccessful vectorization attempt must be reverted if it did not succeed.

I suspect the reason vectorization fails is the direct reference
to the loop counter in the inner loop:
         index = j;
After vectorization, the loop counter is no longer available, so
code that accesses it as in your example usually cannot be
automatically vectorized.
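
To make the pattern concrete, here is a minimal sketch in C. The original test case is elided above, so both functions (argmax, maxval) are hypothetical stand-ins, not the poster's code. The first uses the loop counter as a value, the way "index = j" does; the second needs only the element values. Whether a given GCC release vectorizes either one depends on the version and target, so treat this as an illustration of the pattern rather than a statement about any particular compiler.

```c
#include <assert.h>

/* Hypothetical stand-in for the elided test case: return the position of
   the largest element.  The assignment "index = j" stores the loop
   counter itself, which is the pattern the reply suspects. */
int argmax(const int *a, int n)
{
    assert(n > 0);
    int best = a[0];
    int index = 0;
    for (int j = 1; j < n; j++) {
        if (a[j] > best) {
            best = a[j];
            index = j;      /* direct use of the loop counter as data */
        }
    }
    return index;
}

/* For contrast: a plain max reduction.  The counter j is used only to
   address a[j], never stored, so each iteration depends only on the
   element values. */
int maxval(const int *a, int n)
{
    assert(n > 0);
    int best = a[0];
    for (int j = 1; j < n; j++) {
        if (a[j] > best)
            best = a[j];
    }
    return best;
}
```

Compiling both functions with and without -ftree-vectorize and comparing the generated assembly is one way to see which loop, if any, the vectorizer accepted.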

As to why -ftree-vectorize still generates different code, that is
probably because the flag actually enables two other optimizations
that are distinct from the vectorizer, but usually enable it to do
a better job: if-conversion and store-sinking.

I suspect in your test case, if-conversion actually transforms the
if in the inner loop.  However, if the result is then still not
vectorizable, that transformation might happen to be a net loss ...
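
As a rough illustration of what these transformations aim for, the sketch below shows a branchy loop next to a hand-written branch-free equivalent. The function names (clamp_branchy, clamp_converted) and the clamping example are assumptions chosen for illustration; the compiler performs the rewrite on its internal representation, not on source code, and its actual output may differ.

```c
#include <assert.h>

/* Branchy form: control flow inside the loop body, with a store to
   out[i] on each arm of the if. */
void clamp_branchy(int *out, const int *in, int n, int limit)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (in[i] > limit)
            out[i] = limit;
        else
            out[i] = in[i];
    }
}

/* Hand-written approximation of the converted form: the condition
   becomes a select and the two stores are merged into a single
   unconditional store, leaving a straight-line loop body. */
void clamp_converted(int *out, const int *in, int n, int limit)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int v = in[i];
        out[i] = (v > limit) ? limit : v;
    }
}
```

The second form is easier to vectorize because the loop body has no branches. If the loop stays scalar anyway, the select is evaluated on every iteration, which can cost more than a well-predicted branch; that is one way the transformation could end up as a net loss.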

You can switch off those extra transformations while still enabling
vectorization using something like:
   -ftree-vectorize -fno-tree-loop-if-convert --param max-stores-to-sink=0
(Note that this might cause some loops that would otherwise have been
vectorized to become non-vectorizable ...)

Best Regards
Ulrich Weigand
--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
  IBM Deutschland Research & Development GmbH
  Chair of the Supervisory Board: Martina Koederitz | Management: Dirk Wittkopp
  Registered office: Böblingen | Court of registration: Amtsgericht Stuttgart, HRB 243294
