Continued looking at constant reuse optimizations, as a background task. I've fiddled with the costs a bit more to remove false positives.
Continued benchmarking different generic tuning ideas. With each test run taking most of a day this is slow going.
Took Michael's rootfs that is used for all the toolchain testing and benchmarking, unpacked it, and repacked it so that it is compatible with "linaro-media-create", then tested that I could use it to run tests on LAVA successfully. I was hoping to use this for extra benchmarking bandwidth, but there's a permissions problem in the LAVA website software that means it's not yet possible to post private results to the system, so no proprietary benchmarks yet. I can still continue pipe-cleaning my process, and maybe run some benchmarks without actually reporting the results (or perhaps posting them somewhere write-only).
Began work on adding GCC support for 64-bit shifts with NEON. This is not quite as simple as it ought to be because a) it's inefficient to move a value to NEON registers just to do a shift, so it needs to detect where the value is, and b) right shifts are encoded as left shift by a negative amount, and negative shift amounts are normally considered undefined behaviour.
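For illustration, here is a minimal C model of point b). It assumes the register form of the NEON shift (VSHL.U64) reads its count as the signed low byte of the count register, so a negative count shifts right. The function names are hypothetical and this is only a sketch of the semantics, not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of an unsigned 64-bit NEON shift by register.
   Assumption: only bits [7:0] of the count are used, read as a signed
   value; positive counts shift left, negative counts shift right. */
static uint64_t model_vshl_u64(uint64_t value, int64_t count_reg)
{
    int shift = (int)(count_reg & 0xff);   /* low byte, 0..255 */
    if (shift > 127)
        shift -= 256;                      /* reinterpret as signed byte */

    if (shift >= 64 || shift <= -64)
        return 0;                          /* every bit is shifted out */
    if (shift >= 0)
        return value << shift;
    return value >> -shift;
}

/* A logical right shift by a variable amount, written the way the
   hardware wants it: negate the count, then "shift left". */
static uint64_t model_lshr_via_vshl(uint64_t value, unsigned amount)
{
    assert(amount < 64);                   /* C leaves larger shifts undefined */
    return model_vshl_u64(value, -(int64_t)amount);
}
```

The model guards the out-of-range cases explicitly because a plain C shift by a negative or too-large count is undefined, which is the same mismatch between C semantics and the instruction that makes the compiler work awkward.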
On Mon, Nov 21, 2011 at 11:50 PM, Andrew Stubbs andrew.stubbs@linaro.org wrote:
Continued looking at constant reuse optimizations, as a background task. I've fiddled with the costs a bit more to remove false positives.
Continued benchmarking different generic tuning ideas. With each test run taking most of a day this is slow going.
Took Michael's rootfs that is used for all the toolchain testing and benchmarking, unpacked it, and repacked it so that it is compatible with "linaro-media-create", then tested that I could use it to run tests on LAVA successfully. I was hoping to use this for extra benchmarking bandwidth, but there's a permissions problem in the LAVA website software that means it's not yet possible to post private results to the system, so no proprietary benchmarks yet. I can still continue pipe-cleaning my process, and maybe run some benchmarks without actually reporting the results (or perhaps posting them somewhere write-only).
We can rsync them to the validation machine as a step and only post a summary, such as the number of regressions or improvements, to LAVA.
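As a rough sketch of what such a summary could look like: the helper below counts regressions and improvements between two runs. The struct, the function name, the "higher score is better" convention and the noise threshold are all assumptions made for illustration; the thread does not describe the real results format.

```c
#include <stddef.h>

/* Hypothetical summary record: only these counts would be posted. */
struct summary {
    int regressions;
    int improvements;
    int unchanged;
};

/* Compare per-benchmark scores from a baseline run and a current run.
   Assumes higher scores are better and every baseline score is non-zero.
   `threshold` is a relative noise margin, e.g. 0.02 for 2%. */
static struct summary summarise(const double *base, const double *cur,
                                size_t n, double threshold)
{
    struct summary s = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        double change = (cur[i] - base[i]) / base[i];

        if (change < -threshold)
            s.regressions++;
        else if (change > threshold)
            s.improvements++;
        else
            s.unchanged++;
    }
    return s;
}
```

Only the three counters would need to leave the private machine; the raw scores stay wherever they were rsynced to.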
Begun work on adding GCC support for 64-bit shifts with NEON. This is not quite as simple as it ought to be because a) it's inefficient to move a value to NEON registers just to do a shift, so it needs to detect where the value is
Is this true for all 64 bit operations? What do the other operations currently do?
and b) right shifts are encoded as left shift by a negative amount, and negative shift amounts are normally considered undefined behaviour.
But the behaviour is defined for NEON, and this only appears at the assembly level - you implement a right shift by constant as the left shift by negative constant in the assembly.
-- Michael
On Mon 21 Nov 2011 21:26:03 GMT, Michael Hope wrote:
Took Michael's rootfs that is used for all the toolchain testing and benchmarking, unpacked it, and repacked it so that it is compatible with "linaro-media-create", then tested that I could use it to run tests on LAVA successfully. I was hoping to use this for extra benchmarking bandwidth, but there's a permissions problem in the LAVA website software that means it's not yet possible to post private results to the system, so no proprietary benchmarks yet. I can still continue pipe-cleaning my process, and maybe run some benchmarks without actually reporting the results (or perhaps posting them somewhere write-only).
We can rsync them to the validation machine as a step and only post a summary, such as the number of regressions or improvements, to LAVA.
Yeah, I've investigated pushing the private results to people.linaro.org, but to do so would require exposing an encryption key, and p.l.o does not permit use of .ssh/authorized_keys so I can't create a restricted one for the purpose. I could upload them to my home server, but that doesn't help anyone else. I could set something up to receive write-only file drops on p.l.o with a non-standard port number (if there's no firewall), but it might be harder to encrypt it, though that might not matter.
Begun work on adding GCC support for 64-bit shifts with NEON. This is not quite as simple as it ought to be because a) it's inefficient to move a value to NEON registers just to do a shift, so it needs to detect where the value is
Is this true for all 64 bit operations? What do the other operations currently do?
Basically, the neon 64-bit integer ops all provide two options: one for neon mode; and one for core-regs mode (that calls the normal 32-bit splitter). The decision which is used is essentially random - there's a marker on the fallback alternatives that slightly disparages that option (all else being equal), but it really just depends on where the register allocator happens to find room for a DImode value. Once the first register has been allocated, the patterns will tend to force the allocator to continue to use that mode for the rest of the algorithm, until it hits something that isn't provided or implemented by neon.
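To make the core-regs alternative concrete, here is a minimal C sketch of a 64-bit left shift built only from 32-bit operations, which is roughly the work a 32-bit split has to do. The function name is hypothetical and this is not the code GCC's splitter actually emits; it only shows why the fallback costs several instructions.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit logical left shift using only 32-bit halves.
   `lo` and `hi` are the low and high words of the input value. */
static uint64_t split_shl64(uint32_t lo, uint32_t hi, unsigned n)
{
    uint32_t out_lo, out_hi;

    assert(n < 64);                        /* same range as a DImode shift */
    if (n == 0) {
        out_lo = lo;                       /* avoid a 32-bit shift by 32 below */
        out_hi = hi;
    } else if (n < 32) {
        out_hi = (hi << n) | (lo >> (32 - n));  /* bits carried across words */
        out_lo = lo << n;
    } else {
        out_hi = lo << (n - 32);           /* low word moves into the high word */
        out_lo = 0;
    }
    return ((uint64_t)out_hi << 32) | out_lo;
}
```

With a variable count the three cases have to be handled at run time, whereas the NEON alternative is a single instruction once the value is already in a NEON register.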
I've found that DImode values that come from function parameters will almost never use neon - they're already allocated in core-regs, so it always prefers that option.
DImode values loaded from memory also tend to be loaded to core-regs in small test cases. I intend to try a few cases where core-regs are less available to see what happens then, but maybe we can do something to alter the allocation algorithm when neon is available.
BTW, in hard-float mode, are 64-bit integers still passed in core-regs? I expect so, since hard-float doesn't imply neon, but it would probably be a bonus if they were passed in neon registers.
and b) right shifts are encoded as left shift by a negative amount, and negative shift amounts are normally considered undefined behaviour.
But the behaviour is defined for NEON, and this only appears at the assembly level - you implement a right shift by constant as the left shift by negative constant in the assembly.
I'm not talking about shifts by constants, this is shifts by variable. True, GCC is probably more forgiving in that case because it's less able to reason about them, but if some value range propagation pass can determine that it's negative (now or in future) you never know what might happen.
Andrew
BTW, in hard-float mode, are 64-bit integers still passed in core-regs? I expect so, since hard-float doesn't imply neon, but it would probably be a bonus if they were passed in neon registers.
They are still passed in the core registers. It is worth noting in any case that the VFP and Neon units share a common register bank. The hard-float ABI addendum doesn't make any assumptions about the presence of a Neon unit, but it does assume the presence of at least 16 DP registers.
and b) right shifts are encoded as left shift by a negative amount, and negative shift amounts are normally considered undefined behaviour.
But the behaviour is defined for NEON, and this only appears at the assembly level - you implement a right shift by constant as the left shift by negative constant in the assembly.
I'm not talking about shifts by constants, this is shifts by variable. True, GCC is probably more forgiving in that case because it's less able to reason about them, but if some value range propagation pass can determine that it's negative (now or in future) you never know what might happen.
Well you should never represent a right shift of a DImode quantity as a neg followed by a left shift in RTL. Instead you ought to be retaining the right shift for as long as possible (until reload) and then split it into a neg followed by an operation that is represented with an UNSPEC.
cheers Ramana
On Tue, Nov 22, 2011 at 11:03 PM, Andrew Stubbs andrew.stubbs@linaro.org wrote:
On Mon 21 Nov 2011 21:26:03 GMT, Michael Hope wrote:
Took Michael's rootfs that is used for all the toolchain testing and benchmarking, unpacked it, and repacked it so that it is compatible with "linaro-media-create", then tested that I could use it to run tests on LAVA successfully. I was hoping to use this for extra benchmarking bandwidth, but there's a permissions problem in the LAVA website software that means it's not yet possible to post private results to the system, so no proprietary benchmarks yet. I can still continue pipe-cleaning my process, and maybe run some benchmarks without actually reporting the results (or perhaps posting them somewhere write-only).
We can rsync them to the validation machine as a step and only post a summary, such as the number of regressions or improvements, to LAVA.
Yeah, I've investigated pushing the private results to people.linaro.org, but to do so would require exposing an encryption key, and p.l.o does not permit use of .ssh/authorized_keys so I can't create a restricted one for the purpose. I could upload them to my home server, but that doesn't help anyone else. I could set something up to receive write-only file drops on p.l.o with a non-standard port number (if there's no firewall), but it might be harder to encrypt it, though that might not matter.
There's a machine on the same network with a role-based account. We can push to there and use a password-less key with host authentication and anything else that we can come up with to make up for the lack of password :)
Or perhaps a write-only rsyncd would be better?
-- Michael
linaro-toolchain@lists.linaro.org