Revital Eres <revital.eres@linaro.org> writes:
Another issue is related to the regression I saw with SMS in libav's dsputil-ssd_int8_vs_int16_c. After consulting with Ayal, it seems that the regression was due to an avoidable dependence between accumulations. More specifically, we had the following case in the vector code:
  vec1 = vec1 + ...
  ...
  vec1 = vec1 + ...
  ...
  vec1 = vec1 + ...
  ...
  vec1 = vec1 + ...
To resolve this, I implemented a hack similar to the MVE optimization in the loop-unroller, as follows:
  vec1 = vec1 + ...
  ...
  vec2 = vec2 + ...
  ...
  vec3 = vec3 + ...
  ...
  vec4 = vec4 + ...
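
For illustration only, here is a minimal scalar sketch of the accumulator splitting described above. It is not the actual loop-unroller change: the function names sum_single and sum_split are hypothetical, plain integers stand in for the vector registers, and the partial sums are combined after the loop, which the pseudocode above does not show. Integer addition is used so that reordering the additions cannot change the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: one accumulator, so every addition has to
   wait for the result of the previous one. */
int32_t sum_single(const int16_t *a, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}

/* Hypothetical example: the same reduction unrolled by four, with one
   accumulator per unrolled copy.  The four additions in the loop body
   no longer depend on each other. */
int32_t sum_split(const int16_t *a, size_t n)
{
    int32_t acc1 = 0, acc2 = 0, acc3 = 0, acc4 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc1 += a[i];
        acc2 += a[i + 1];
        acc3 += a[i + 2];
        acc4 += a[i + 3];
    }

    /* Leftover elements when n is not a multiple of four. */
    for (; i < n; i++)
        acc1 += a[i];

    /* Combine the partial sums once, outside the loop. */
    return acc1 + acc2 + acc3 + acc4;
}
```

Both functions return the same value for the same input; the difference is only in the length of the dependence chain inside the loop.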
While I agree that's a useful transformation, do you have a few more details about the SMS regression? I assume both the non-SMS and SMS loops use the:
  vec1 = vec1 + ...
  ...
  vec1 = vec1 + ...
  ...
  vec1 = vec1 + ...
  ...
  vec1 = vec1 + ...
chain, so what makes the SMS version of it worse than the non-SMS version?
Richard