On Tue, Jun 21, 2011 at 4:11 AM, Arnd Bergmann <arnd.bergmann@linaro.org> wrote:
On Monday 20 June 2011, Nicolas Pitre wrote:
On Mon, 20 Jun 2011, Arnd Bergmann wrote:
On Saturday 18 June 2011, Nicolas Pitre wrote:

Well, in case of readl/writel, we need the memory clobber for other reasons: they might trigger a DMA operation, or the readl might be used to wait for completion of a DMA operation. In either case, you need a barrier to prevent reordering of the MMIO instruction w.r.t. the DMA memory it refers to. On in-order CPUs this does not always require a synchronisation instruction, but it still requires a compiler barrier, which is typically expressed as a full memory clobber.
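(To make that concrete, here is a rough sketch, purely illustrative and not the actual asm/io.h code, of a relaxed accessor versus one that also acts as a compiler barrier through a "memory" clobber. A real readl() would additionally need a CPU barrier where DMA is involved.)

	/* Illustrative sketch only: a relaxed MMIO read that guarantees a
	 * single load instruction, versus one that is also a compiler
	 * barrier because of the "memory" clobber. */
	static inline unsigned int sketch_readl_relaxed(const volatile void *addr)
	{
		unsigned int v;
		asm volatile ("ldr	%0, [%1]" : "=r" (v) : "r" (addr));
		return v;
	}

	static inline unsigned int sketch_readl(const volatile void *addr)
	{
		unsigned int v;
		/* "memory" stops the compiler from reordering or caching
		 * memory accesses across the MMIO read; a CPU barrier
		 * (rmb/dsb) would still be needed on top for DMA ordering
		 * on some systems. */
		asm volatile ("ldr	%0, [%1]" : "=r" (v) : "r" (addr) : "memory");
		return v;
	}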
We don't have any of that now though.
We do have __iormb() in there, which is both a CPU and a compiler barrier. Unfortunately, in case of !CONFIG_ARM_DMA_MEM_BUFFERABLE it degrades to no barrier at all, which is a bug.
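(For reference, the composition in question is roughly the following; this is quoted from memory of arch/arm/include/asm/io.h, so treat it as a sketch rather than the exact source.)

	#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
	#define __iormb()	rmb()
	#define __iowmb()	wmb()
	#else
	/* degrades to nothing at all: not even a compiler barrier */
	#define __iormb()	do { } while (0)
	#define __iowmb()	do { } while (0)
	#endif

	#define readl(c)	({ u32 __v = readl_relaxed(c); __iormb(); __v; })
	#define writel(v,c)	({ __iowmb(); writel_relaxed(v,c); })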
And I think this would be much more sensible if the barrier were explicit, so as not to generate awful code as I demonstrated, in the vast majority of cases where there are no such issues.
Yes, we can recommend to people to use the _relaxed versions, especially for drivers that don't do any DMA.
And the fact that the inline asm is marked volatile should be enough to prevent reordering already, isn't it?
No. It only makes sure that the asm will get executed, but does not prevent the compiler from reordering other statements around it.
static inline unsigned int readl(volatile unsigned int *a)
{
	unsigned int v;
	asm volatile ("ldr\t%1, %0" : "+o" (*a), "=r" (v));
	return v;
}
The bidirectional qualifier indicates that we access the memory pointed to by a, and that its content may be different afterwards, which pretty much matches how hardware registers behave.
I don't see why the "+" is needed here, because the compiler won't be accessing the pointer otherwise, and doesn't have the option of omitting any accesses either.
It could prevent the compiler from assuming that the memory at that location still contains the same data that was just written/read, even if the volatile were omitted (note I'm not suggesting removing 'volatile').
But the compiler doesn't know the contents of the pointer anyway, so even if it assumes that it hasn't changed, that won't mean anything at all.
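(For anyone following along: the disagreement is only about the constraint on the memory operand. A minimal illustration, using the same operand style as Nicolas's example:)

	/* Input-only memory operand: the asm promises not to modify *a, so
	 * the compiler may keep whatever it knows about the value there. */
	asm volatile ("ldr\t%0, %1" : "=r" (v) : "o" (*a));

	/* Read-write ("+") memory operand: the asm may modify *a, so any
	 * cached knowledge about *a must be discarded after the statement. */
	asm volatile ("ldr\t%1, %0" : "+o" (*a), "=r" (v));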
Hi everyone. Sorry for the delay in replying.
To summarise the thread:
 * The kernel uses writel() and readl() for accessing device registers
 * These must result in a single memory access
 * They enforce memory ordering and require the programmer to ensure alignment
 * The ARM asm/io.h uses C constructs for read and write
 * The behaviour of __attribute__((packed)) changed in GCC 4.6, causing some word accesses to be split into byte-by-byte unaligned accesses (see the sketch below)
 * Most other architectures use inline assembly
 * ARM doesn't use inline assembly as the performance was poor in the past
 * Inline assembly in recent GCC is correct and performs well
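As a hypothetical example of the packed problem mentioned above: with a layout like this, GCC 4.6 no longer assumes the members are word-aligned and may emit four byte accesses instead of a single ldr/str, which breaks device register access. The struct and function names here are made up.

	struct chip_regs {			/* made-up register block */
		unsigned int ctrl;
		unsigned int status;
	} __attribute__((packed));

	static void set_ctrl(volatile struct chip_regs *regs)
	{
		regs->ctrl = 1;		/* may be split into four strb's by GCC 4.6 */
	}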
Based on Nico's reply, I had a play with the different ways of implementing register access. I tried base + offset such as:
void *iobase;
#define REG_A 0x10
...and struct-based layout:
struct regs {
	unsigned int reg_a;
	unsigned int reg_b;
};
and the different ways of holding the register pointer:
void init(void *iobase);
struct device { void *iobase; };
struct device { struct regs *regs; };
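(As a made-up illustration of how these combine with writel(), matching the snippets above:)

	void init_offset(void *iobase)
	{
		writel(0, iobase + REG_A);	/* base + #define'd offset */
	}

	void init_struct(struct device *dev)
	{
		writel(0, &dev->regs->reg_a);	/* register pointer held in a struct */
	}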
The relaxed inline assembly version works well in all cases, including handling the limits of the different offset encodings. The strict inline assembly version with a memory barrier is poor where iobase is inside a struct, but it's easy to work around by taking iobase into a register first, such as:
void init(struct device *dev)
{
	struct regs *r = dev->regs;
	writel(0, &r->reg_a);
	writel(42, &r->reg_b);
	...
I see no reason not to switch writel() and readl() to inline assembly. This gives you control over the otherwise undefined behaviour.
On a separate note, I'm hoping to land the unaligned access patches into next month's Linaro GCC 4.6 release. This would fix the problem on ARMv7 architectures but leave it lurking on earlier ones.
It seems that GCC is currently good enough for the job. The next steps on the kernel side are to audit any __attribute__((packed))'s, check __io{r,w}mb() when CONFIG_ARM_DMA_MEM_BUFFERABLE=n, and to trial changing to inline assembly.
Let me know if we can help.
-- Michael