Hi,
* Worked on the peeling problem in eon (#831094). Wrote a patch that checks whether the number of vector iterations is going to be more than 2, and disables peeling otherwise. With this patch I see about a 1.5% regression with vectorization (and about 7% without it).
* I am thinking of extending the patch to handle an unknown number of iterations by creating a run-time check. The threshold could be set by a param. Another option would be to do it through the cost model, but it's hard to evaluate costs when misalignments are unknown (and, I think, the cost model handles known misalignment properly).
* Disabling peeling for low loop bounds also helps with one of the EEMBC benchmarks, for which vectorization with double-words is more beneficial than with quad-words. It turns out that we are able to force the alignment for double-words (and, therefore, avoid peeling), because we check that the required alignment (64 in this case) is less than or equal to BIGGEST_ALIGNMENT, where
  arm.h:#define BIGGEST_ALIGNMENT (ARM_DOUBLEWORD_ALIGN ? DOUBLEWORD_ALIGNMENT : 32)
  and
  arm.h:#define DOUBLEWORD_ALIGNMENT 64
  So, we can never force alignment for 128 bits on ARM. I wonder if that's a real limitation.
* Proposed three SLP patches to gcc-linaro, and merged two of them.
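
For illustration only, the check described in the first two items could be sketched roughly as below. This is not the actual patch: the function name peeling_worthwhile and the macro MIN_VECT_ITERS_FOR_PEELING are hypothetical, and the threshold of 2 is simply the value mentioned above (it could instead come from a param).

```c
#include <stdbool.h>

/* Hypothetical threshold: peel only if more than this many vector
   iterations are expected.  The value 2 is taken from the message;
   in a real patch it might be controlled by a param.  */
#define MIN_VECT_ITERS_FOR_PEELING 2

/* Return true if peeling for alignment looks worthwhile.
   NITERS is the scalar iteration count and VF the vectorization
   factor (scalar elements handled per vector iteration).  */
static bool
peeling_worthwhile (unsigned long niters, unsigned int vf)
{
  if (vf == 0)
    return false;  /* Not vectorized, so there is nothing to peel for.  */

  /* Number of full vector iterations the loop would execute.  */
  unsigned long vect_iters = niters / vf;

  return vect_iters > MIN_VECT_ITERS_FOR_PEELING;
}
```

When the iteration count is known at compile time, the comparison can be evaluated directly. When it is unknown, the same comparison would presumably be emitted as a run-time guard that selects between the peeled and the unpeeled version of the loop.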
Ira
On 24 November 2011 15:32, Ira Rosen <ira.rosen@linaro.org> wrote:
> - Disabling peeling for low loop bounds also helps with one of EEMBC benchmarks, for which vectorization with double-words is more beneficial than with quad-words. It turns out that we are able to force the alignment for double-words (and, therefore, avoid peeling), because we check that the required alignment (64 in this case) is less or equal to BIGGEST_ALIGNMENT, where
> arm.h:#define BIGGEST_ALIGNMENT (ARM_DOUBLEWORD_ALIGN ? DOUBLEWORD_ALIGNMENT : 32)
> and
> arm.h:#define DOUBLEWORD_ALIGNMENT 64
> So, we can never force alignment for 128 bits on ARM. I wonder if that's a real limitation.
I am sorry, I was wrong about this. We are trying to align a local variable in this benchmark, so the alignment must be no bigger than MAX_STACK_ALIGNMENT (and not MAX_OFILE_ALIGNMENT). From here the story is the same: MAX_STACK_ALIGNMENT is set to STACK_BOUNDARY, and
arm.h:#define STACK_BOUNDARY (ARM_DOUBLEWORD_ALIGN ? DOUBLEWORD_ALIGNMENT : 32)
As I learned from Richard, MAX_OFILE_ALIGNMENT is set in elfos.h to (((unsigned int) 1 << 28) * 8), so we have this problem only with stack variables.
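
To make the difference concrete, here is a small sketch of the limit check as I understand it. It is not the vectorizer's real code: can_force_alignment is a hypothetical helper, and the macro values are copied from the definitions quoted above under the assumption that ARM_DOUBLEWORD_ALIGN is true.

```c
#include <stdbool.h>

/* Values as quoted from arm.h and elfos.h, assuming
   ARM_DOUBLEWORD_ALIGN is true.  All alignments are in bits.  */
#define DOUBLEWORD_ALIGNMENT 64u
#define STACK_BOUNDARY       DOUBLEWORD_ALIGNMENT
#define MAX_STACK_ALIGNMENT  STACK_BOUNDARY
#define MAX_OFILE_ALIGNMENT  (((unsigned int) 1 << 28) * 8)

/* Hypothetical helper: can a variable be given REQUIRED_BITS of
   alignment?  Stack variables are limited by MAX_STACK_ALIGNMENT,
   everything else by MAX_OFILE_ALIGNMENT.  */
static bool
can_force_alignment (unsigned int required_bits, bool is_stack_var)
{
  unsigned int limit = is_stack_var ? MAX_STACK_ALIGNMENT
                                    : MAX_OFILE_ALIGNMENT;
  return required_bits <= limit;
}
```

With these values, a local variable can be forced to 64-bit alignment but not to 128-bit alignment, while a variable with static storage can get 128 bits, which matches the observation that only stack variables are affected.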
Ira
linaro-toolchain@lists.linaro.org