Hi,
I have been comparing the stock gcc 5.2 and the Linaro 5.2 (Linaro GCC 5.2-2015.11-1) and have noticed a difference with the __sync intrinsics.
Here is the simple test case
--- cut here ---
int add_int(int add_value, int *dest)
{
  return __sync_add_and_fetch(dest, add_value);
}
--- cut here ---
Compiling with the stock gcc 5.2 (-S -O3) I get
---------
add_int:
.L2:
	ldaxr	w2, [x1]
	add	w2, w2, w0
	stlxr	w3, w2, [x1]
	cbnz	w3, .L2
	mov	w0, w2
	ret
---------
Whereas with Linaro gcc 5.2 I get

---------
add_int:
.L2:
	ldxr	w2, [x1]
	add	w2, w2, w0
	stlxr	w3, w2, [x1]
	cbnz	w3, .L2
	dmb	ish
	mov	w0, w2
	ret
---------
Why the extra (unnecessary?) memory barrier?
Also, is it worthwhile putting a prfm before the ldaxr? E.g.

add_int:
	prfm	pst1strm, [x1]
.L2:
	ldaxr	w2, [x1]
See the following thread
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html
All the best, Ed
Hi Ed,
On 9 March 2016 at 14:02, Edward Nevill edward.nevill@linaro.org wrote:
Why the extra (unnecessary?) memory barrier?
This is because the Linaro gcc-5-branch is in sync with the FSF gcc-5-branch, which contains a fix for this PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

As explained in the bugzilla entry and the patch submission, the restrictions are stronger on the __sync builtins than on the __atomic ones:

https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Hi Yvan,
On 9 March 2016 at 13:22, Yvan Roux yvan.roux@linaro.org wrote:
This is because the Linaro gcc-5-branch is in sync with the FSF gcc-5-branch, which contains a fix for this PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

As explained in the bugzilla entry and the patch submission, the restrictions are stronger on the __sync builtins than on the __atomic ones.
Thanks for that. Obviously we should be using the __atomic versions in Java, as I don't believe Java requires a full memory barrier here.
One curiosity: I recompiled for 8.1 LSE using the Linaro gcc 5.2 and got the following

/usr/local/linare-gcc-5.2/bin/gcc -S -O3 -march=armv8-a+lse test.c

add_int:
	ldaddal	w0, w0, [x1]
	add	w2, w0, w0
	mov	w0, w2
	ret
Why no memory barrier here? As far as I am aware, ldaddal has only acquire and release semantics; it does not implement a full barrier.
All the best, Ed.
/usr/local/linare-gcc-5.2/bin/gcc -S -O3 -march=armv8-a+lse test.c

add_int:
	ldaddal	w0, w0, [x1]
	add	w2, w0, w0
	mov	w0, w2
	ret
Am I going mad, or does this just return the contents of the memory location * 2?
	ldaddal	w0, w0, [x1]

returns the original contents of [x1] in w0,

	add	w2, w0, w0

doubles it, and

	mov	w0, w2

returns it.
I think __sync_add_and_fetch should return the updated contents.
Here is the test C code again.
int add_int(int add_value, int *dest)
{
  return __sync_add_and_fetch(dest, add_value);
}
Regards, Ed.
On 10 March 2016 at 19:42, Edward Nevill edward.nevill@linaro.org wrote:
/usr/local/linare-gcc-5.2/bin/gcc -S -O3 -march=armv8-a+lse test.c

add_int:
	ldaddal	w0, w0, [x1]
	add	w2, w0, w0
	mov	w0, w2
	ret
Hmm, that is not the code I get with our latest release candidate (the release should be out next week); it gives:

add_int:
	ldaddal	w0, w3, [x1]
	add	w2, w3, w0
	mov	w0, w2
	ret
On 10 March 2016 at 19:52, Yvan Roux yvan.roux@linaro.org wrote:
Ed, just for info: it was fixed by Andrew in trunk and backported into our branch at the end of December.
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01962.html
Also, is it worthwhile putting a prfm before the ldaxr? E.g.
There is already a thread upstream about this: https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00508.html
I reject adding this to -mcpu=generic as it will hurt ThunderX more than it will help. Prfm is single-issue on ThunderX, so it costs an extra cycle in each case. For the kernel, the prfm really should be patched out for ThunderX; I will propose a patch for that later on. The way ThunderX implements ldxr/stxr is much simpler than, say, Cortex-A57/A72, because the inner and outer shareability domains are the same, i.e. there will only be one coherence point, the L2. It is also why ThunderX exposes so many race conditions (well, that and the timeout for a write going to the coherence point being around 1024 cycles if there was no flush).
Thanks, Andrew Pinski
-----Original Message-----
From: linaro-toolchain [mailto:linaro-toolchain-bounces@lists.linaro.org] On Behalf Of Edward Nevill
Sent: Wednesday, March 9, 2016 5:02 AM
To: Linaro Toolchain Mailman List linaro-toolchain@lists.linaro.org
Subject: Some questions about the gcc __sync intrinsics