Hi,
in the example below I want to explicitly generate a "store exclusive pair" instruction with an asm statement:
    typedef struct {
        long unsigned int v1;
        long unsigned int v2;
    } mtype;

    int main ()
    {
        mtype val[2];
        val[0].v1 = 1234;
        val[0].v2 = 5678;
        int status;

        do {
            __asm__ __volatile__(
                " stxp %0, %2, %3, %1"
                : "=&r" (status), "=Q" (val[1])
                : "r" (val[0].v1), "r" (val[0].v2)
            );
        } while (status != 0);

        if (val[1].v1 == 1234 && val[1].v2 == 5678)
            return 0;
        return 1;
    }
The generated assembly is:
    .L7:
        ldr x0, [sp]
        ldr x1, [sp,8]
    .L3:
        add x3, sp, 16
        stxp x2, x0, x1, [x3]
        cbnz w2, .L7
The issue is that the assembler is not happy with register x2 being used to store the exclusive access status: it should be w2. Looking at constraints.md, though, there seems to be no constraint that requests the 32-bit version of the register. Any ideas?
Many thanks Yvan
-----Original Message----- From: linaro-toolchain-bounces@lists.linaro.org [mailto:linaro- toolchain-bounces@lists.linaro.org] On Behalf Of Yvan Roux Sent: 21 February 2013 15:54 To: linaro-toolchain@lists.linaro.org Subject: AArch64 asm statement question
IIRC it's just printed with %w, or an equivalent operand modifier, on AArch64. There is no need for a separate constraint for the w registers: W2 is the low half of X2 on AArch64, so if w2 is written, the upper half of x2 is automatically zeroed out.
So use stxp %w0, %2, %3, %1 in your inline asm, or look at how this is printed in the equivalent sync pattern. I'd look in iterators.md for some of the attributes to confirm this.
HTH Ramana
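As a small illustration of the %w operand modifier mentioned above, here is a sketch in GNU C. The helper name low_half_via_w is made up for this note, and the non-AArch64 branch is only a portable stand-in so the snippet builds on other hosts; neither comes from the thread.

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): copies the low 32 bits of
   `in` through the W view of the allocated registers. */
static uint64_t low_half_via_w(uint64_t in)
{
    uint64_t out;
#if defined(__aarch64__)
    /* The "r" constraint still allocates a general register; %w only
       changes how the operand is printed (w<N> instead of x<N>).
       Writing w<N> clears bits 63:32 of x<N>. */
    __asm__("mov %w0, %w1" : "=r"(out) : "r"(in));
#else
    /* Stand-in for non-AArch64 hosts (an assumption of this sketch). */
    out = (uint32_t)in;
#endif
    return out;
}
```

Applied to the original statement, printing operand 0 as %w0 should turn the generated instruction into stxp w2, x0, x1, [x3].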
linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Hey, thanks Ramana that's it :)
Just one more thing: there is a special internal constraint "Ump" on memory addresses for load/store pair operations, so is the Q constraint I used sufficient?
Thanks Yvan
You may already be aware of this, but as on AArch32, the architecture restricts which load and store operations are permitted between an LDXP and its STXP. This essentially means that any asm block that uses LDXP must also contain the matching STXP that depends on it. If you don't do this, the compiler may introduce arbitrary load/store operations (e.g. spills/reloads) that will kill your exclusive access and make the code unable to proceed.
R.
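A sketch of what "LDXP and STXP in the same asm block" can look like for a 128-bit store, with the status operand printed as %w2. The function name store_pair, the 16-byte alignment on the struct, and the non-AArch64 fallback are assumptions of this note rather than anything stated in the thread; memory ordering is deliberately not addressed.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the thread's mtype; the 16-byte alignment is an
   assumption added here because LDXP/STXP on a 64-bit register pair
   expect a 16-byte aligned address. */
typedef struct {
    _Alignas(16) uint64_t v1;
    uint64_t v2;
} mtype;

/* Hypothetical helper: retries the exclusive pair until STXP reports
   success (status == 0). */
static void store_pair(mtype *addr, uint64_t lo, uint64_t hi)
{
    assert(((uintptr_t)addr & 15u) == 0);
#if defined(__aarch64__)
    uint64_t old_lo, old_hi;   /* loaded only to claim the monitor */
    uint32_t status;
    do {
        /* Both instructions live in one asm statement, so the compiler
           cannot place spills or reloads between them. */
        __asm__ __volatile__(
            "ldxp %0, %1, %3\n\t"
            "stxp %w2, %4, %5, %3"
            : "=&r"(old_lo), "=&r"(old_hi), "=&r"(status), "+Q"(*addr)
            : "r"(lo), "r"(hi));
    } while (status != 0);
#else
    /* Non-atomic stand-in so the sketch builds on other hosts. */
    addr->v1 = lo;
    addr->v2 = hi;
#endif
}
```

The early-clobber outputs keep the loaded values and the status flag in registers distinct from the inputs and from the base address register.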
Hi Richard,
thanks for the reminder. My previous example was just an attempt to find the right asm statement constraints to generate a correct ldxp instruction. My real objective is to implement 128-bit single-copy atomic load/store, and to do this I use ldxp without any matching stxp for the atomic_load:

    __asm__ __volatile__(
        " ldxp %0, %1, [%2]"
        : "=&r" (res.v1), "=&r" (res.v2)
        : "r" (addr)
    );

and a "fake" ldxp with the "real" stxp for the atomic_store:

    do {
        __asm__ __volatile__(
            " ldxp %0, %1, %3\n"
            " stxp %w2, %4, %5, %3"
            : "=&r" (fake_val.v1), "=&r" (fake_val.v2), "=&r" (status), "+Q" (*addr)
            : "r" (value.v1), "r" (value.v2)
        );
    } while (status);

Do you think that is the right way to do it?
Thanks Yvan
On 22/02/13 09:54, Yvan Roux wrote:
do you think that it is the right way to do it ?
Sadly, no. Only the STXP instruction is single-copy atomic. To get atomicity on a read you need to repeat the sequence
    LDXP Xn, Xm, [addr]
    STXP Ws, Xn, Xm, [addr]
until the store succeeds.
However, this isn't needed on 32-bit LDXP sequences.
R.
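The read sequence described above could be sketched as follows: load the pair exclusively, store the same values back, and retry until the store succeeds. The name load_pair, the alignment on the struct, and the non-AArch64 fallback are assumptions of this note. Note that this form of load writes to the location, so it needs writable memory.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the thread's mtype; the 16-byte alignment is an
   assumption added for this sketch. */
typedef struct {
    _Alignas(16) uint64_t v1;
    uint64_t v2;
} mtype;

/* Hypothetical helper: returns a 128-bit snapshot of *addr. */
static mtype load_pair(mtype *addr)
{
    mtype res;
    assert(((uintptr_t)addr & 15u) == 0);
#if defined(__aarch64__)
    uint32_t status;
    do {
        /* STXP writes back exactly what LDXP just read; a zero status
           means the pair was read as a single unit. */
        __asm__ __volatile__(
            "ldxp %0, %1, %3\n\t"
            "stxp %w2, %0, %1, %3"
            : "=&r"(res.v1), "=&r"(res.v2), "=&r"(status), "+Q"(*addr));
    } while (status != 0);
#else
    /* Non-atomic stand-in so the sketch builds on other hosts. */
    res = *addr;
#endif
    return res;
}
```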