All,
With the code below, I tried a few compiler options and observed the following:
1) arm-linux-gnueabi-gcc -O2 -mcpu=cortex-a15 -mfpu=neon -ftree-vectorizer-verbose=6 -ftree-vectorize
Compiler throws following info messages:
foo.c:16: note: not vectorized: unsupported use in stmt.
foo.c:16: note: not vectorized: unsupported use in stmt.
foo.c:18: note: not vectorized: unsupported use in stmt.
foo.c:18: note: not vectorized: unsupported use in stmt.
2) -O2 -mcpu=cortex-a15 -mfpu=neon
Neither build contains any NEON instructions. The code generated in case 1 takes 3000 cycles, and the code generated in case 2 takes 2500 cycles.
Even if vectorization fails in case 1, it should not produce less efficient code than case 2. I expected both executables to take the same number of cycles: anything done in preparation for vectorization should be reverted if vectorization does not succeed.
###################################################################
#include <stdio.h>

#define SIZE1 20
#define SIZE2 26

unsigned int array[SIZE1][SIZE2];

void foo(void)
{
    unsigned int i, j;
    unsigned int max = 0;
    unsigned int index = 0;  /* declaration not shown in the post; assumed to be a local */

    for (i = 0; i < SIZE1; i++) {
        for (j = 0; j < SIZE2; j++) {
            if (array[i][j] > max) {
                max = array[i][j];
                index = j;
            }
        }
    }

    printf("Max value: %u Index: %u\n", max, index);
}
Regards RKS
"Singh, Ravi Kumar (Ravi)" Ravi.Singh@lsi.com wrote:
> None of the generated code contains the NEON instructions. Code generated with case 1 is taking 3000 cycles, and code generated by option 2 is taking 2500 cycles.
> Even if vectorization failed in case1, it should not generate more inefficient code than case 2. My belief was that the executables from both would take same cycles, any thing done for doing unsuccessful vectorization must be reverted if it did not succeed.
I suspect the reason vectorization fails is the direct reference to the loop counter in the inner loop: index = j;
After vectorization, the loop counter is no longer available, so code that accesses it as in your example usually cannot be automatically vectorized.
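As a rough illustration of that point, one possible rewrite (a sketch, not something proposed in this thread) splits the search into two passes: a pure max reduction that never uses the value of the loop counter, followed by a scalar scan for the column. The function names max_value and first_column_of are hypothetical, the sketch assumes index starts at 0, and whether GCC actually vectorizes the first loop depends on the compiler version and target.

```c
#define SIZE1 20
#define SIZE2 26

/* Pass 1: plain max reduction. The loop body never reads the value of
 * i or j except to address the array, which is the pattern a
 * vectorizer can usually handle. */
static unsigned int max_value(unsigned int a[SIZE1][SIZE2])
{
    unsigned int max = 0;
    for (unsigned int i = 0; i < SIZE1; i++) {
        for (unsigned int j = 0; j < SIZE2; j++) {
            if (a[i][j] > max)
                max = a[i][j];
        }
    }
    return max;
}

/* Pass 2: scalar scan that returns the column of the first element
 * (in row-major order) equal to value, or 0 if there is none. */
static unsigned int first_column_of(unsigned int a[SIZE1][SIZE2],
                                    unsigned int value)
{
    for (unsigned int i = 0; i < SIZE1; i++) {
        for (unsigned int j = 0; j < SIZE2; j++) {
            if (a[i][j] == value)
                return j;
        }
    }
    return 0;
}
```

Calling max_value(array) and then first_column_of(array, max) should give the same max and index as the original loop, because the strict > comparison there also keeps the column of the first occurrence of the maximum. The cost is a second pass over the data.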
As to why -ftree-vectorize still generates different code, that is probably because the flag actually enables two other optimizations that are distinct from the vectorizer, but usually enable it to do a better job: if-conversion and store-sinking.
I suspect in your test case, if-conversion actually transforms the if in the inner loop. However, if the result is then still not vectorizable, that transformation might happen to be a net loss ...
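To make the if-conversion point more concrete, here is a hand-written approximation of the transformation, assuming it turns the branch into unconditional selects. This is only a C-level sketch: the real pass works on GCC's internal representation, and the function names row_max_branchy and row_max_selects are made up for illustration.

```c
#define SIZE2 26

/* Branchy form, matching the inner loop of the posted code:
 * max and index are only written when a new maximum is found. */
static void row_max_branchy(const unsigned int *row,
                            unsigned int *max, unsigned int *index)
{
    for (unsigned int j = 0; j < SIZE2; j++) {
        if (row[j] > *max) {
            *max = row[j];
            *index = j;
        }
    }
}

/* Approximation of the if-converted form: the branch is replaced by
 * two selects that execute on every iteration. */
static void row_max_selects(const unsigned int *row,
                            unsigned int *max, unsigned int *index)
{
    unsigned int m = *max;
    unsigned int idx = *index;
    for (unsigned int j = 0; j < SIZE2; j++) {
        unsigned int v = row[j];
        int take = v > m;        /* condition computed as a value */
        m = take ? v : m;        /* select instead of a guarded store */
        idx = take ? j : idx;    /* still depends on the loop counter */
    }
    *max = m;
    *index = idx;
}
```

Both functions compute the same result. In the second form every iteration performs the selects whether or not a new maximum was found, which may be one reason the transformed loop is slower when it then stays scalar.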
You can switch off those extra transformations while still enabling vectorization using something like:

  -ftree-vectorize -fno-tree-if-conversion --param max-stores-to-sink=0

(Note that this might cause some loops that would otherwise have been vectorized to become non-vectorizable ...)
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
Ulrich,
If I disable the extra transformations as you suggested, my cycle count increases to 38xx, compared with 25xx for plain -O2.
Regards RKS
-----Original Message-----
From: Ulrich Weigand [mailto:Ulrich.Weigand@de.ibm.com]
Sent: Wednesday, April 11, 2012 10:15 AM
To: Singh, Ravi Kumar (Ravi)
Cc: linaro-toolchain@lists.linaro.org
Subject: Re: O2 optimization with vectorize
"Singh, Ravi Kumar (Ravi)" Ravi.Singh@lsi.com wrote on 11.04.2012 17:50:53:
> If I disable extra transformations as suggested by you my cycles increase to 38xx in comparison to -O2 25xx
Sorry, I misremembered the flag spelling. It should read:

  -ftree-vectorize -fno-tree-loop-if-convert --param max-stores-to-sink=0

This gets me identical code to just -O2 on your test case.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand