On Tue, Jun 21, 2011 at 4:11 AM, Arnd Bergmann <arnd.bergmann@linaro.org> wrote:
On Monday 20 June 2011, Nicolas Pitre wrote:
On Mon, 20 Jun 2011, Arnd Bergmann wrote:
On Saturday 18 June 2011, Nicolas Pitre wrote:

Well, in case of readl/writel, we need the memory clobber for other reasons: they might trigger a DMA operation, or the readl might be used to wait for completion of a DMA operation. In either case, you need a barrier to prevent reordering of the MMIO instruction w.r.t. the DMA memory it refers to. On in-order CPUs this does not always require a synchronisation instruction, but it still requires a compiler barrier, which is typically expressed as a full memory clobber.
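(To make that concrete, here is a rough sketch, purely illustrative and not the actual asm/io.h code, of a relaxed accessor versus one that also acts as a compiler barrier through a "memory" clobber. A real readl() would additionally need a CPU barrier where DMA is involved.)

	/* Illustrative sketch only: a relaxed MMIO read that guarantees a
	 * single load instruction, versus one that is also a compiler
	 * barrier because of the "memory" clobber. */
	static inline unsigned int sketch_readl_relaxed(const volatile void *addr)
	{
		unsigned int v;
		asm volatile ("ldr	%0, [%1]" : "=r" (v) : "r" (addr));
		return v;
	}

	static inline unsigned int sketch_readl(const volatile void *addr)
	{
		unsigned int v;
		/* "memory" stops the compiler from reordering or caching
		 * memory accesses across the MMIO read; a CPU barrier
		 * (rmb/dsb) would still be needed on top for DMA ordering
		 * on some systems. */
		asm volatile ("ldr	%0, [%1]" : "=r" (v) : "r" (addr) : "memory");
		return v;
	}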
We don't have any of that now though.
We do have __iormb() in there, which is both a CPU and a compiler barrier. Unfortunately, in case of !CONFIG_ARM_DMA_MEM_BUFFERABLE it degrades to no barrier at all, which is a bug.
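(For reference, the composition in question is roughly the following; this is quoted from memory of arch/arm/include/asm/io.h, so treat it as a sketch rather than the exact source.)

	#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
	#define __iormb()	rmb()
	#define __iowmb()	wmb()
	#else
	/* degrades to nothing at all: not even a compiler barrier */
	#define __iormb()	do { } while (0)
	#define __iowmb()	do { } while (0)
	#endif

	#define readl(c)	({ u32 __v = readl_relaxed(c); __iormb(); __v; })
	#define writel(v,c)	({ __iowmb(); writel_relaxed(v,c); })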
And I think this would be much more sensible if the barrier were explicit, so as not to generate awful code as I demonstrated, in the vast majority of cases where there are no such issues.
Yes, we can recommend to people to use the _relaxed versions, especially for drivers that don't do any DMA.
And the fact that the inline asm is marked volatile should be enough to prevent reordering already, isn't it?
No. It only makes sure that the asm will get executed, but does not prevent the compiler from reordering other statements around it.
static inline unsigned int readl(volatile unsigned int *a)
{
	unsigned int v;
	asm volatile ("ldr\t%1, %0" : "+o" (*a), "=r" (v));
	return v;
}
The bidirectional qualifier indicates that we access the memory pointed to by a, and that its content may be different afterwards, which pretty much matches how hardware registers behave.
I don't see why the "+" is needed here, because the compiler won't be accessing the pointer otherwise, and doesn't have the option of omitting any accesses either.
It could prevent the compiler from assuming that the memory at that location still contains the same data that was just written/read, even if the volatile were omitted (note I'm not suggesting removing 'volatile').
But the compiler doesn't know the contents of the pointer anyway, so even if it assumes that it hasn't changed, that won't mean anything at all.
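(For anyone following along: the disagreement is only about the constraint on the memory operand. A minimal illustration, using the same operand style as Nicolas's example:)

	/* Input-only memory operand: the asm promises not to modify *a, so
	 * the compiler may keep whatever it knows about the value there. */
	asm volatile ("ldr\t%0, %1" : "=r" (v) : "o" (*a));

	/* Read-write ("+") memory operand: the asm may modify *a, so any
	 * cached knowledge about *a must be discarded after the statement. */
	asm volatile ("ldr\t%1, %0" : "+o" (*a), "=r" (v));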
Hi everyone. Sorry for the delay in replying.
To summarise the thread:
 * The kernel uses writel() and readl() for accessing device registers
 * These must result in a single memory access
 * They enforce memory ordering and require the programmer to ensure alignment
 * The ARM asm/io.h uses C constructs for read and write
 * The behaviour of __attribute__((packed)) changed in GCC 4.6, causing some word accesses to be split into byte-by-byte unaligned accesses (see the sketch below)
 * Most other architectures use inline assembly
 * ARM doesn't use inline assembly as the performance was poor in the past
 * Inline assembly in recent GCC is correct and performs well
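As a hypothetical example of the packed problem mentioned above: with a layout like this, GCC 4.6 no longer assumes the members are word-aligned and may emit four byte accesses instead of a single ldr/str, which breaks device register access. The struct and function names here are made up.

	struct chip_regs {			/* made-up register block */
		unsigned int ctrl;
		unsigned int status;
	} __attribute__((packed));

	static void set_ctrl(volatile struct chip_regs *regs)
	{
		regs->ctrl = 1;		/* may be split into four strb's by GCC 4.6 */
	}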
Based on Nico's reply, I had a play with the different ways of implementing register access. I tried base + offset such as:
void *iobase;
#define REG_A 0x10
...and struct-based layout:
struct regs {
	unsigned int reg_a;
	unsigned int reg_b;
};
and the different ways of holding the register pointer:
void init(void *iobase);
struct device { void *iobase; };
struct device { struct regs *regs; };
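(As a made-up illustration of how these combine with writel(), matching the snippets above:)

	void init_offset(void *iobase)
	{
		writel(0, iobase + REG_A);	/* base + #define'd offset */
	}

	void init_struct(struct device *dev)
	{
		writel(0, &dev->regs->reg_a);	/* register pointer held in a struct */
	}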
The relaxed inline assembly version works well in all cases, including handling the limits of the different offset encodings. The strict inline assembly version with a memory barrier is poor where iobase is inside a struct, but it's easy to work around by taking iobase into a register first, such as:
void init(struct device *dev)
{
	struct regs *r = dev->regs;
	writel(0, &r->reg_a);
	writel(42, &r->reg_b);
	...
I see no reason not to switch writel() and readl() to inline assembly. This gives you control over the otherwise undefined behaviour.
On a separate note, I'm hoping to land the unaligned access patches into next month's Linaro GCC 4.6 release. This would fix the problem on ARMv7 architectures but leave it lurking on earlier ones.
It seems that GCC is currently good enough for the job. The next steps on the kernel side are to audit any __attribute__((packed))'s, check __io{r,w}mb() when CONFIG_ARM_DMA_MEM_BUFFERABLE=n, and to trial changing to inline assembly.
Let me know if we can help.
-- Michael