Thank you,

with -ftls-model=local-exec gcc emits code in which the offset from the thread pointer is fixed at link time (here 0x4000 + 0x168, applied by the two add instructions):

    threadedVar = 0xDEAD;
   1125c:    d53bd041     mrs    x1, tpidr_el0
   11260:    91401020     add    x0, x1, #0x4, lsl #12
   11264:    9105a000     add    x0, x0, #0x168
   11268:    529bd5a2     movz    w2, #0xdead
   1126c:    b9000002     str    w2, [x0]
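
For the archive: as far as I understand, the same model can also be requested for a single variable with GCC's tls_model variable attribute instead of the global flag. A minimal sketch, assuming u32 is a typedef for uint32_t (my original snippet did not show its definition) and using a hypothetical helper read_threaded_var() that exists only to make the result observable:

```c
#include <stdint.h>

/* Assumption: u32 is a plain 32-bit unsigned typedef. */
typedef uint32_t u32;

/* Request the local-exec TLS model for this variable only.
 * This is only valid when the variable ends up in the main
 * executable, not in a shared library. */
__thread u32 threadedVar __attribute__((tls_model("local-exec")));

void test(void)
{
    threadedVar = 0xDEAD;
}

/* Hypothetical helper, not part of the original code:
 * returns the calling thread's copy of threadedVar. */
u32 read_threaded_var(void)
{
    return threadedVar;
}
```

I have not compared the code generated this way against the -ftls-model=local-exec output above, so treat it as something to try rather than a confirmed equivalent.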

On Sun, Jul 14, 2013 at 10:30 AM, Pinski, Andrew <Andrew.Pinski@caviumnetworks.com> wrote:
Yes: either don't compile with -fPIC, or compile with -ftls-model=local-exec.

Thanks,
Andrew Pinski

From: linaro-toolchain-bounces@lists.linaro.org <linaro-toolchain-bounces@lists.linaro.org> on behalf of Vitali Sokhin <vitali.sokhin@gmail.com>
Sent: Sunday, July 14, 2013 12:21 AM
To: linaro-toolchain@lists.linaro.org
Subject: How to make gcc generate optimized code for statically linked TLS
 
Hello,

I use gcc-linaro-aarch64-linux-gnu-4.8 to compile my C code with thread-local variables.

Here is an example of my C code:

__thread u32 threadedVar;
void test(void)
{
    threadedVar = 0xDEAD;
}

gcc produces the following assembly to access my threaded variable:

    threadedVar = 0xDEAD;
    72b0:       d00000c0        adrp    x0, 21000
    72b4:       f945ac00        ldr     x0, [x0,#2904]
    72b8:       d503201f        nop
    72bc:       d503201f        nop
    72c0:       d53bd041        mrs     x1, tpidr_el0
    72c4:       529bd5a2        movz    w2, #0xdead
    72c8:       b8206822        str     w2, [x1,x0]


This assembly fits dynamically linked code, but in my case I have a statically linked application that does not load any additional modules.
Since I have exactly one TLS block containing all thread-local variables, gcc should be able to have the offset calculated at link time.

Can I make gcc produce the following assembly?

    threadedVar = 0xDEAD;
    72c0:       d53bd041        mrs     x1, tpidr_el0
    72c4:       529bd5a2        movz    w2, #0xdead
    72c8:       b8206822        str     w2, [x1,#offset_to_threadedVar]



Thank you,
          Vitali