All,
With the code below, I tried a few compiler options and made the following observations:
1) arm-linux-gnueabi-gcc -O2 -mcpu=cortex-a15 -mfpu=neon -ftree-vectorizer-verbose=6 -ftree-vectorize
The compiler emits the following informational notes:
foo.c:16: note: not vectorized: unsupported use in stmt.
foo.c:16: note: not vectorized: unsupported use in stmt.
foo.c:18: note: not vectorized: unsupported use in stmt.
foo.c:18: note: not vectorized: unsupported use in stmt.
2) -O2 -mcpu=cortex-a15 -mfpu=neon
Neither build produces any NEON instructions in the generated code. The code generated in case 1 takes 3000 cycles, while the code generated in case 2 takes 2500 cycles.
Even if vectorization fails in case 1, the compiler should not generate less efficient code than in case 2. I expected both executables to take the same number of cycles: anything done during an unsuccessful vectorization attempt should be reverted.
###################################################################
#include <stdio.h>

#define SIZE1 20
#define SIZE2 26

unsigned int array[SIZE1][SIZE2];

void foo()
{
    unsigned int i, j;
    unsigned int max = 0;
    /* Column of the current maximum. The post does not show this
       declaration; a local unsigned int is assumed here. */
    unsigned int index = 0;

    for (i = 0; i < SIZE1; i++) {
        for (j = 0; j < SIZE2; j++) {
            if (array[i][j] > max) {
                max = array[i][j];
                index = j;
            }
        }
    }

    printf("Max value: %u Index: %u\n", max, index);
}
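For comparison, here is a minimal sketch (not part of the original test case) that separates the two things the loop does. The names `struct max_result`, `find_max_fused` and `find_max_two_pass` are hypothetical and introduced only for illustration. The first function mirrors the loop above, where `max` and `index` are updated under the same condition. The second computes the maximum as a plain reduction and then looks up the column of its first occurrence. A plain max reduction is a pattern auto-vectorizers commonly handle, but whether this GCC version vectorizes it for Cortex-A15 has not been checked here.

```c
#define SIZE1 20
#define SIZE2 26

unsigned int array[SIZE1][SIZE2];

/* Hypothetical result type, used only so both variants can be compared. */
struct max_result {
    unsigned int max;
    unsigned int index;
};

/* Same logic as the posted loop: max and column index are updated together. */
struct max_result find_max_fused(void)
{
    struct max_result r = { 0, 0 };
    unsigned int i, j;

    for (i = 0; i < SIZE1; i++) {
        for (j = 0; j < SIZE2; j++) {
            if (array[i][j] > r.max) {
                r.max = array[i][j];
                r.index = j;
            }
        }
    }
    return r;
}

/* Two passes: a plain max reduction, then a search for the column of the
   first element (in row-major order) that equals the maximum. */
struct max_result find_max_two_pass(void)
{
    struct max_result r = { 0, 0 };
    unsigned int i, j;

    /* Pass 1: only the maximum is carried across iterations. */
    for (i = 0; i < SIZE1; i++) {
        for (j = 0; j < SIZE2; j++) {
            if (array[i][j] > r.max)
                r.max = array[i][j];
        }
    }

    /* Pass 2: stop at the first occurrence of the maximum. */
    for (i = 0; i < SIZE1; i++) {
        for (j = 0; j < SIZE2; j++) {
            if (array[i][j] == r.max) {
                r.index = j;
                return r;
            }
        }
    }
    return r;
}
```

Both functions return the same max and index for any array contents, so each can be compiled with the two option sets above to compare the vectorizer notes and the generated code.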
Regards,
RKS