Hi Ramana,
On 29 June 2016 at 17:03, Ramana Radhakrishnan Ramana.Radhakrishnan@arm.com wrote:
I'm curious about what workloads / benchmarks you considered for this activity - the traditional spec benchmarks don't really trigger anything in libatomic - so where do we see the improvements or none ?
First, let me clarify the purpose of this task: it was to evaluate and implement ARMv8.1 support in libatomic, not to evaluate the performance of the ARMv8.1 architecture. Sorry if that wasn't clear in this short weekly format.
Given this objective, I didn't consider benchmarking for this activity. My plan was to:
1. Verify that the __atomic builtins support the new ARMv8.1 atomic instructions.
2. Familiarize myself with the libatomic code base and build system, and check that the builtins are used.
3. Enable and implement the ifunc version of the library if needed.
My observations and conclusions are:
1. The __atomic builtins already fully support the new atomic instructions, and generate cas, swp and ld<op> as needed for data types up to 8 bytes.
2. libatomic uses the atomic builtins properly, so building the library for the ARMv8.1 architecture, or enabling multilib on AArch64, produces a libatomic that contains the expected code.
3. I don't see any benefit in implementing an ifunc version of the library that decides at runtime whether to use the LSE version: the operations up to 8 bytes are expanded inline at compile time, and the 16-byte versions are the same with or without LSE support.
Maybe I'm missing a use case or lack some background on libatomic usage here, and I'd be happy if you could give me some input.
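To illustrate the first observation, here is a minimal sketch of the kind of code I mean. The helper names are hypothetical and only the __atomic builtins are GCC's. Built with -march=armv8.1-a, I would expect each of these to expand inline to the corresponding LSE instruction (an ld<op>, swp and cas form respectively), and to a load/store-exclusive loop when built for plain ARMv8.0.

```c
#include <stdint.h>

/* Hypothetical helpers; only the __atomic builtins are real GCC API. */

/* Atomic add returning the previous value (candidate for an LSE ld<op>). */
uint64_t counter_fetch_add(uint64_t *p, uint64_t v)
{
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}

/* Atomic exchange returning the previous value (candidate for LSE swp). */
uint64_t slot_exchange(uint64_t *p, uint64_t v)
{
    return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
}

/* Strong compare-and-swap (candidate for LSE cas).
   Returns nonzero on success; on failure *expected receives the current value. */
int slot_compare_exchange(uint64_t *p, uint64_t *expected, uint64_t desired)
{
    return __atomic_compare_exchange_n(p, expected, desired,
                                       0 /* strong */,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

Since these are expanded by the compiler at each call site, nothing here goes through libatomic, which is why a runtime ifunc dispatch in the library would not affect them.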
Regarding the 16-byte versions: as I said, I recently noticed that LSE includes a CASP instruction, which might be used to implement a 128-bit compare-exchange builtin. However, if I understand the discussion in this bugzilla correctly, it might be better to wait for a new version of the architecture that provides a proper 128-bit instruction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814
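A quick way to see where the inline/library boundary sits for a given target is to ask the compiler directly. This is only a sketch: the function names are hypothetical, and the result for 16 bytes depends on the target and compiler, so I make no claim about its value here.

```c
/* Hypothetical helpers around the GCC builtin __atomic_always_lock_free.
   The size must be a compile-time constant; a null pointer argument
   means "assume typical alignment for an object of that size". */

/* Nonzero when 8-byte atomics are always expanded inline as lock-free code. */
int inline_lock_free_8(void)
{
    return __atomic_always_lock_free(8, 0);
}

/* Nonzero when 16-byte atomics are always lock-free on this target;
   when it is zero, 16-byte operations are left to libatomic. */
int inline_lock_free_16(void)
{
    return __atomic_always_lock_free(16, 0);
}
```

If the 16-byte query reports zero with and without LSE enabled, that matches my observation that the 16-byte versions in libatomic are unchanged by ARMv8.1.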
Thanks,
Yvan