On Wednesday, November 24, 2010 8:29:35 pm Peter Maydell wrote:
This wiki page came up during the toolchain call: https://wiki.linaro.org/Internal/People/KenWerner/AtomicMemoryOperations/
It gives the code generated for __sync_val_compare_and_swap as including a push {r4} / pop {r4} pair, because it uses too many temporaries to fit them all in the caller-saved (call-clobbered) registers. I think you can tweak it a bit to get rid of that:
# int __sync_val_compare_and_swap (int *mem, int old, int new);
# if the current value of *mem is old, then write new into *mem
# r0: mem, r1: old, r2: new
        mov     r3, r0          # move r0 into r3
        dmb     sy              # full memory barrier
.LSYT7:
        ldrex   r0, [r3]        # load (exclusive) from memory pointed to by r3 into r0
        cmp     r0, r1          # compare contents of r0 (mem) with r1 (old) -> updates the condition flags
        bne     .LSYB7          # branch to .LSYB7 if mem != old
        # This strex trashes the r0 we just loaded, but since we didn't take
        # the branch we know that r0 == r1
        strex   r0, r2, [r3]    # store r2 (new) into memory pointed to by r3 (mem);
                                # r0 contains 0 if the store was successful, otherwise 1
        teq     r0, #0          # compares contents of r0 with zero -> updates the condition flags
        bne     .LSYT7          # branch to .LSYT7 if r0 != 0 (if the store wasn't successful)
        # Move the value that was in memory into the right register to return it
        mov     r0, r1
        dmb     sy              # full memory barrier
.LSYB7:
        bx      lr              # return
I think you can do a similar trick with __sync_fetch_and_add (although you have to use a subtract to regenerate r0 from r1 and r2).
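Something like this, say (an untested sketch following the same pattern; the .LSYT8 label and exact scheduling are illustrative, not actual GCC output):

# int __sync_fetch_and_add (int *mem, int val);
# r0: mem, r1: val
        mov     r3, r0          # free up r0; r3 = mem
        dmb     sy              # full memory barrier
.LSYT8:
        ldrex   r0, [r3]        # r0 = old value of *mem
        add     r2, r0, r1      # r2 = old + val (the new value)
        strex   r0, r2, [r3]    # store r2; trashes r0 with the status code
        teq     r0, #0          # did the store succeed?
        bne     .LSYT8          # retry if not
        sub     r0, r2, r1      # regenerate the old value: r0 = new - val
        dmb     sy              # full memory barrier
        bx      lr              # return the old value

The sub works even if the add wrapped around, since the arithmetic is modulo 2^32 either way.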
On the other hand, I just looked at the gcc code that generates these sequences, and it's not simply dumping canned text out to the assembler, so maybe it's not worth the effort just to drop a stack push/pop.
Hi,
Attached is a small GCC patch that attempts to optimize the __sync_* builtins as described above. Since "or" and "(n)and" are not reversible (the old value cannot be recomputed from the result and the operand), the corresponding builtins still need the push/pop instructions; a sketch of the "or" case is below. Any suggestions or comments are welcome.
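For illustration, roughly what the non-reversible case looks like (untested sketch, not the patch's actual output; the .LSYT9 label is illustrative):

# int __sync_fetch_and_or (int *mem, int val);
# r0: mem, r1: val
        push    {r4}            # need one extra temporary
        mov     r3, r0          # r3 = mem
        dmb     sy              # full memory barrier
.LSYT9:
        ldrex   r0, [r3]        # r0 = old value of *mem
        orr     r4, r0, r1      # r4 = old | val; old can't be recovered from r4 and r1,
                                # so r0 has to stay live across the strex
        strex   r2, r4, [r3]    # status goes into r2 instead of trashing r0
        teq     r2, #0          # did the store succeed?
        bne     .LSYT9          # retry if not
        dmb     sy              # full memory barrier
        pop     {r4}
        bx      lr              # return the old value (still in r0)

Five values (old, val, new, address, status) are live at the strex, which is one more than the four argument/scratch registers the sequence otherwise uses, hence the push/pop.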
Regards,
Ken