Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP

25 Mar 2022


      Robin Murphy robin.murphy@arm.com writes:
...
On 2022-03-25 16:25, Toke Høiland-Jørgensen wrote:
...
Maxime Bizon mbizon@freebox.fr writes:
...
On Thu, 2022-03-24 at 12:26 -0700, Linus Torvalds wrote:
...
It's actually very natural in that situation to flush the caches from
the CPU side again. And so dma_sync_single_for_device() is a fairly
reasonable thing to do in that situation.
In the non-cache-coherent scenario, and assuming dma_map() did an
initial cache invalidation, you can write this:
rx_buffer_complete_1(buf)
{
   invalidate_cache(buf, size)
   if (!is_ready(buf))
   	return;
   <proceed with receive>
}
or
rx_buffer_complete_2(buf)
{
   if (!is_ready(buf)) {
   	invalidate_cache(buf, size)
   	return;
   }
   <proceed with receive>
}
The latter is preferred for performance because dma_map() did the
initial invalidate.
Of course you could write:
rx_buffer_complete_3(buf)
{
   invalidate_cache(buf, size)
   if
(!is_ready(buf)) {
   	invalidate_cache(buf, size)
   	return;
   }
<proceed with receive>
}
but it's a waste of CPU cycles
So I'd be very cautious assuming sync_for_cpu() and sync_for_device()
are both doing invalidation in existing implementation of arch DMA ops,
implementers may have taken some liberty around DMA-API to avoid
unnecessary cache operation (not to blame them).
I sense an implicit "and the driver can't (or shouldn't) influence
this" here, right?
Right, drivers don't get a choice of how a given DMA API implementation 
works.
...
...
For example looking at arch/arm/mm/dma-mapping.c, for DMA_FROM_DEVICE
sync_single_for_device()
   => __dma_page_cpu_to_dev()
     => dma_cache_maint_page(op=dmac_map_area)
       => cpu_cache.dma_map_area()
sync_single_for_cpu()
   => __dma_page_dev_to_cpu()
     =>
__dma_page_cpu_to_dev(op=dmac_unmap_area)
       =>
cpu_cache.dma_unmap_area()
dma_map_area() always does cache invalidate.
But for a couple of CPU variant, dma_unmap_area() is a noop, so
sync_for_cpu() does nothing.
Toke's patch will break ath9k on those platforms (mostly silent
breakage, rx corruption leading to bad performance)
Okay, so that would be bad obviously. So if I'm reading you correctly
(cf my question above), we can't fix this properly from the driver side,
and we should go with the partial SWIOTLB revert instead?
Do you have any other way of telling if DMA is idle, or temporarily
pausing it before the sync_for_cpu, such that you could honour the
notion of ownership transfer properly?
I'll go check with someone who has a better grasp of how the hardware
works, but I don't think so...
...
As mentioned elsewhere I suspect the only "real" fix if you really do
need to allow concurrent access is to use the coherent DMA API for
buffers rather than streaming mappings, but that's obviously some far
more significant surgery.
That would imply copying the packet data out of that (persistent)
coherent mapping each time we do a recv operation, though, right? That
would be quite a performance hit...
If all we need is a way to make dma_sync_single_for_cpu() guarantee a
cache invalidation, why can't we just add a separate version that does
that (dma_sync_single_for_cpu_peek() or something)? Using that with the
patch I posted earlier should be enough to resolve the issue, AFAICT?
-Toke

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP