Following on from last night's performance call, I had a look at how 64-bit integer operations are mapped to NEON instructions. The summary is:
* add - fine
* subtract - fine
* bitwise and - fine
* bitwise or - fine
* bitwise xor - fine
* multiply - can't, as the instruction tops out at 32 bits. Might be able to compose using VMLAL
* div, mod - no instruction
* negate - instruction tops out at 32 bits, but could be turned into vmov #0; vsub
* left shift constant - missing
* right shift constant - missing
* right arithmetic shift constant - missing
* left shift register - missing
* right shift register - tricky, as you do this as a left shift by the negated register
* logical not - no instruction, but could be done through a vceq, #0?
* bitwise not - missing
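To make the compositions in the list concrete, here is a rough scalar model in portable C. It is a sketch of the arithmetic only, not the compiler's output or real NEON code: the function names are made up for illustration, each function handles a single 64-bit lane, and the shift helper restricts the count to -63..63 to stay within defined C behaviour (the hardware register shift accepts a wider range).

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit multiply composed from 32x32->64 widening multiplies, the
   shape a widening multiply / multiply-accumulate sequence would take.
   Returns the low 64 bits of the product. */
uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t low   = (uint64_t)a_lo * b_lo;  /* widening multiply */
    uint64_t cross = (uint64_t)a_lo * b_hi;  /* widening multiply */
    cross += (uint64_t)a_hi * b_lo;          /* widening multiply-accumulate */

    /* a_hi * b_hi only contributes to bits 64 and above, so it is dropped. */
    return low + (cross << 32);
}

/* Negate as "move zero, then subtract". */
uint64_t neg64(uint64_t x)
{
    uint64_t zero = 0;
    return zero - x;
}

/* Model of a shift by a signed per-lane count: a positive count shifts
   left, a negative count shifts right (logical). */
uint64_t shl64_signed(uint64_t x, int count)
{
    assert(count > -64 && count < 64);
    return count >= 0 ? x << count : x >> -count;
}

/* Right shift by a register amount, done as a left shift by the
   negated amount. */
uint64_t shr64_reg(uint64_t x, int count)
{
    return shl64_signed(x, -count);
}
```

For example, `shr64_reg(x, n)` first negates `n` and then performs the signed left shift, which is the extra step that makes the register right shift awkward to map.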
I also noticed that the replicated constants aren't being used. A pre-increment currently comes out as a constant pool load followed by a vadd, but could be done as vmov #-1; vsub. The same goes for pre-decrement: it could be done as vmov #-1; vadd.
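The identity behind the increment and decrement suggestion can be sketched in portable C. This assumes the 64-bit vector move-immediate can only produce constants whose bytes are each 0x00 or 0xFF, so all-ones (-1) is available as an immediate while +1 is not. The helper names are hypothetical and each function models a single 64-bit lane.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if every byte of v is 0x00 or 0xFF, i.e. the constant could be
   materialised by a byte-mask move-immediate instead of a constant
   pool load (assumed encoding rule, see the note above). */
bool is_bytemask_imm64(uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        uint8_t byte = (uint8_t)(v >> (8 * i));
        if (byte != 0x00 && byte != 0xFF)
            return false;
    }
    return true;
}

/* Pre-increment as x - (-1): move all-ones, then subtract. */
uint64_t preinc_via_sub(uint64_t x)
{
    const uint64_t minus_one = UINT64_MAX;  /* the #-1 immediate */
    return x - minus_one;
}

/* Pre-decrement as x + (-1): move all-ones, then add. */
uint64_t predec_via_add(uint64_t x)
{
    const uint64_t minus_one = UINT64_MAX;  /* the #-1 immediate */
    return x + minus_one;
}
```

Both helpers wrap modulo 2^64, matching ordinary unsigned 64-bit arithmetic, so the rewritten forms give the same result as adding or subtracting 1 directly.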
This seems worth blueprinting.
-- Michael
...and here's the missing test case.
-- Michael
On Wed, Oct 19, 2011 at 11:38 AM, Michael Hope michael.hope@linaro.org wrote:
linaro-toolchain@lists.linaro.org