Re: No usb on omap3: 4.6 toolchain issue?

20 Jun 2011


      On Saturday 18 June 2011, Nicolas Pitre wrote:
...
On Fri, 17 Jun 2011, Arnd Bergmann wrote:
...
I played with this a bit.
It looks like much more than a bit, thanks for the detailed analysis!
...
I think that GCC might have improved on this 
front.  It at least doesn't seem to produce the much suboptimal code 
compared to the plain C implementation as it used to do in the past.  
See my explanation and experiments below.
If everything else fails, we can make it depend on the gcc version and
e.g. use the inline assembly with 4.5 and higher, while we use the C code
that used to be fine with older versions.
...
...
The function in question is
#define __raw_writel(v,a) (*(volatile unsigned int __force   *)(a) = (v))
AFAICT, this should really be written as
static inline void __raw_writel(unsigned int v, volatile __iomem unsigned int *a)
{
   asm volatile("str %0, %1" : "=o" (*a) : "r" (v) : "memory");
}
What I don't know is whether this form is ideal in all cases (or even correct),
and what the constraints should be.
The problem here is that the assembly constraint doesn't let us specify 
which type of access this constraint is for.
Let's take the str instruction as an example.  It may use many different 
addressing modes:
str	r0, [r1]	# destination address in r1
   str	r0, [r1, r2]	# base address in r1 with offset in r2
   str	r0, [r1, #<i>]	# base address in r1 with an immediate offset
I believe that they are all part of the "o" constraint, which includes all
offsettable memory addresses.
...
The result is the same as gcc is smart enough to share the common iobase 
and spread the immediate offsets. This also provides better type 
checking.  But last time this was attempted, a large amount of code was 
then complaining due to the fact that sometimes the address is provided 
as a pointer, sometimes as an integer.  this deserves to be cleaned, but 
this should really be orthogonal to this issue and the move to a static 
inline be done first, and separately from the actual access 
implementation.
Good point. A lot of architectures have strict type checking here already,
so I don't think this will be much of a problem.
...
Looks like GCC is doing the right thing here.  
I can't say for sure, but I have a vague recollection of GCC doing it 
wrong in some distant past.  At the time the only thing that seemed to 
be OK in all cases with an inline asm implementation was:
static inline void writeh(unsigned int v, volatile unsigned short  *a)
{
        asm volatile ("strh\t%1, [%0]" : : "r" (a), "r" (v));
}
This would better use the '"=Q" (a)' form, to tell the compiler that
we are also writing into a. We cannot use the "=m" form because that
contains the pre/post-increment variants though.
...
which produced this:
foobar_init:
        mov     r1, #0
        strh    r1, [r0]
        mov     ip, #0
        add     r2, r0, #8
        strh    ip, [r2]
        add     r3, r0, #2048
        strh    ip, [r3]
        mvn     r2, #0
        add     r1, r0, #4
        strh    r2, [r1]
        mvn     r3, #0
        add     r0, r0, #12
        strh    r3, [r0]
        bx      lr
As you can see, this is far less optimal and larger.  And then GCC even 
gets confused wrt liveness of the value to store despite using -O2.  
Hence the implementation using pure C which produced far better code.
Yes, it would be good to avoid this.
...
...
If I understood Uli correctly, it should be possible to express this 
with gcc syntax, but of course there may be a catch somewhere.
Yes.  You don't want to use a memory clobber.  That tells GCC to forget 
everything about what it has cached into registers and forces it to 
reload everything from memory, including the iobase pointer if it comes 
from a struct device's private data.
Well, in case of readl/writel, we need the memory clobber for other reasons:
They might trigger a DMA operation, or the readl might be used to wait for
completion of a DMA operation. In either case, you need to have a barrier
to prevent the reordering of the MMIO instruction w.r.t the DMA memory that
it is referring to. On in-order CPUs, this does not always require a
synchronisation instruction, but it strill requires a compiler barrier,
which is typically expressed as a full memory clobber.
As far as I can tell, we don't need the memory clobber in case of
readl_relaxed()/writel_relaxed(), but we need it for the non-relaxed
versions.
...
As you can see, the iobase value is reloaded all the time needlessly.  
Another recollection of mine is that a while ago GCC was considering 
static inline functions like full memory barriers too, which doesn't 
seem to be the case anymore.
But for the constraints the following should be best:
static inline void writel(unsigned int v, volatile unsigned int *a)
{
        asm volatile ("str\t%1, %0" : : "+o" (*a) : "r" (v));
}
static inline unsigned int readl(volatile unsigned int *a)
{
   unsigned int v;
        asm volatile ("ldr\t%1, %0" : : "+o" (*a), "=r" (v));
   return v;
}
The bidirectional qualifier indicates that we're going to modify the 
memory pointed by a, and yet it is going to be different after that, 
which pretty much matches how hardware registers behave.
I don't see how the "+" is needed here, because the compiler won't
be accessing the pointers otherwise, and doesn't have the option of
omitting any accesses either.
Arnd

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: No usb on omap3: 4.6 toolchain issue?