Re: [REGRESSION] Recent swiotlb DMA_FROM_DEVICE fixes break ath9k-based AP

26 Mar 2022


Halil Pasic pasic@linux.ibm.com writes:
...
On Fri, 25 Mar 2022 11:27:41 +0000
Robin Murphy robin.murphy@arm.com wrote:
...
What muddies the waters a bit is that the opposite combination 
sync_for_cpu(DMA_TO_DEVICE) really *should* always be a no-op, and I for 
one have already made the case for eliding that in code elsewhere, but 
it doesn't necessarily hold for the inverse here, hence why I'm not sure 
there even is a robust common solution for peeking at a live 
DMA_FROM_DEVICE buffer.
In https://lkml.org/lkml/2022/3/24/739 I also argued that a robust
common solution for peeking at a live DMA_FROM_DEVICE buffer is
probably not possible, at least not with the current programming model
as described by Documentation/core-api/dma-api.rst.
Namely AFAIU the programming model is based on exclusive ownership: the
buffer is either owned by the device, which means CPU(s) are not allowed
to *access* it, or it is owned by the CPU(s), and the device is not
allowed to *access* it. Do we agree on this?
Considering what Linus said here https://lkml.org/lkml/2022/3/24/775
I understand that: if the idea that dma_sync_*_for_{cpu,device} always
transfers ownership to the cpu and device respectively is abandoned, 
and we re-define ownership in a sense that only the owner may write,
but non-owner is allowed to read, then it may be possible to make the
scenario under discussion work.
The scenario in pseudo code:
/* when invoked, the device might be doing DMA into buf */
rx_buf_complete(buf)
{
        prepare_peek(buf, DMA_FROM_DEVICE);
        if (!is_ready(buf)) {
                /* let the device gain the buffer again */
                peek_done_not_ready(buf, DMA_FROM_DEVICE);
                return false;
        }
        peek_done_ready(buf, DMA_FROM_DEVICE);
        process_buff(buf, DMA_FROM_DEVICE);
        return true;
}
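For illustration only, the scenario above can be modelled in plain userspace C. This is a rough sketch, not kernel code: struct rx_buf, the owner field, the status-word layout and the payload word are all made up for the example, and the "bounce" array merely stands in for a swiotlb bounce buffer that the device keeps writing to. The one assumption worth noting is that peek_done_not_ready() copies nothing back towards the device, since the CPU only read the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BUF_WORDS   16
#define STATUS_WORD 0      /* hypothetical: word 0 carries the "done" flag */
#define STATUS_DONE 0x1u

/* Ownership made explicit here; in the real API it lives only in our heads. */
enum owner { OWNER_DEVICE, OWNER_CPU };

/* Userspace stand-in for one bounced DMA_FROM_DEVICE mapping. */
struct rx_buf {
        uint32_t bounce[BUF_WORDS];  /* what the "device" writes into */
        uint32_t orig[BUF_WORDS];    /* what the driver reads */
        enum owner owner;
};

/* Refresh the CPU copy; ownership stays with the device. */
static void prepare_peek(struct rx_buf *b)
{
        memcpy(b->orig, b->bounce, sizeof(b->orig));
}

static bool is_ready(const struct rx_buf *b)
{
        return (b->orig[STATUS_WORD] & STATUS_DONE) != 0;
}

/* Assumption: the CPU only read, so nothing is copied back to bounce. */
static void peek_done_not_ready(struct rx_buf *b)
{
        assert(b->owner == OWNER_DEVICE);
}

/* Final sync of the completed buffer; the CPU takes ownership. */
static void peek_done_ready(struct rx_buf *b)
{
        memcpy(b->orig, b->bounce, sizeof(b->orig));
        b->owner = OWNER_CPU;
}

/* Processing is only legal once the CPU owns the buffer. */
static uint32_t process_buff(const struct rx_buf *b)
{
        assert(b->owner == OWNER_CPU);
        return b->orig[1];           /* hypothetical payload word */
}

static bool rx_buf_complete(struct rx_buf *b, uint32_t *payload)
{
        prepare_peek(b);
        if (!is_ready(b)) {
                peek_done_not_ready(b);
                return false;
        }
        peek_done_ready(b);
        *payload = process_buff(b);
        return true;
}
```

In this model a caller can invoke rx_buf_complete() repeatedly while the "device" fills bounce[]; each call sees the latest contents, and ownership only moves to the CPU once the done flag is observed.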
IMHO it is pretty obvious, that prepare_peek() has to update the
cpu copy of the data *without* transferring ownership to the CPU. Since
the owner is still the device, it is legit for the device to keep
modifying the buffer via DMA. In case of the swiotlb, we would copy the
content of the bounce buffer to the orig buffer, possibly after
invalidating caches, and for non-swiotlb we would just invalidate caches.
So prepare_peek() could actually be something like
dma_sync_single_for_cpu(buf, DMA_FROM_DEVICE,
                        DMA_ATTR_NO_OWNERSHIP_TRANSFER)
which would most likely end up being functionally the same as without the
flag, since my guess is that the ownership is only tracked in our
heads.
Well we also need to ensure that the CPU caches are properly invalidated
either in prepare_peek() or peek_done_not_ready(), so that the data is
not cached between subsequent peeks. This could translate to either
turning prepare_peek() into dma_sync_single_for_cpu(buf,
DMA_FROM_DEVICE, DMA_ATTR_NO_OWNERSHIP_TRANSFER_BUT_INVALIDATE_CACHES),
or it could turn peek_done_not_ready() into something that just
invalidates the cache.
I was also toying with the idea of having a copy-based peek helper like:
u32 data = dma_peek_word(buf, offset)
which leaves the ownership as-is, but copies out a single word from the
buffer at the given offset (from the bounce buffer or real buffer as
appropriate) without messing with the ownership notion. The trouble with
this idea is that ath9k reads two different words that are 44 bytes from
each other, so it would have to do two such calls, which would be racy :(
-Toke
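
The race with two separate peek calls can be shown with a small userspace model. This is only a sketch under stated assumptions: dma_peek_word() is the hypothetical helper from the mail, given an invented signature here; device_buf and device_write_status() stand in for the device and its DMA writes; and the two offsets are simply chosen 44 bytes apart. The device update is injected by hand between the two peeks to make the interleaving deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BUF_BYTES 64
#define OFF_A     0    /* hypothetical offsets, 44 bytes apart */
#define OFF_B     44

static uint8_t device_buf[BUF_BYTES];  /* memory the "device" owns */

/* The device writes both words as one logical status update. */
static void device_write_status(uint32_t seq)
{
        memcpy(device_buf + OFF_A, &seq, sizeof(seq));
        memcpy(device_buf + OFF_B, &seq, sizeof(seq));
}

/* Hypothetical dma_peek_word(): copy out one word, ownership untouched. */
static uint32_t dma_peek_word(const uint8_t *buf, size_t offset)
{
        uint32_t w;

        assert(offset + sizeof(w) <= BUF_BYTES);
        memcpy(&w, buf + offset, sizeof(w));
        return w;
}

/* Returns true if the two peeked words come from different updates. */
static bool peek_pair_is_torn(void)
{
        uint32_t a, b;

        device_write_status(1);
        a = dma_peek_word(device_buf, OFF_A);
        device_write_status(2);  /* device update lands between the peeks */
        b = dma_peek_word(device_buf, OFF_B);
        return a != b;
}
```

With nothing tying the two reads together, the caller can end up combining the first word of one update with the second word of the next, which is the raciness described above.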
