Hi Andy,

On 21.06.22 at 12:17, Andy.Hsieh wrote:
On 2/16/21 4:39 AM, Nicolas Dufresne wrote:
> On Monday, 15 February 2021 at 09:58 +0100, Christian König wrote:
>> Hi guys,
>>
>> we are currently working on Freesync and direct scan out from system
>> memory on AMD APUs in A+A laptops.
>>
>> One problem we stumbled over is that our display hardware needs to scan
>> out from uncached system memory and we currently don't have a way to 
>> communicate that through DMA-buf.
>>
>> For our specific use case at hand we are going to implement something 
>> driver specific, but the question is should we have something more 
>> generic for this?
> 
> Hopefully I'm getting this right, but this makes me think of a long standing
> issue I've met with Intel DRM and UVC driver. If I let the UVC driver allocate
> the buffer, and import the resulting DMABuf (cacheable memory written with a cpu
> copy in the kernel) into DRM, we can see cache artifacts being displayed. While
> if I use the DRM driver memory (dumb buffer in that case) it's clean because
> there is a driver specific solution to that.
> 
> There is no obvious way for a userspace application to know which way is right or
> wrong, and in fact it feels like the kernel could solve this somehow without having
> to inform userspace (perhaps).
> 
>>
>> After all, the system memory access pattern is a PCIe extension and as
>> such something generic.
>>
>> Regards,
>> Christian.
> 
> 

Hi All,

We also hit the UVC cache issue on an ARMv8 CPU in a Mediatek SoC when
exporting UVC buffers as dmabuf and feeding them to the DRM display with the
following GStreamer command:

# gst-launch-1.0 v4l2src device=/dev/video0 io-mode=dmabuf ! kmssink

The UVC driver uses videobuf2-vmalloc to allocate buffers and can export
them as dmabuf. But it fills the frame buffer on the CPU with memcpy() and
does not flush the cache. So if the display hardware reads the buffer
directly, the image shown on the screen contains cache artifacts.

Here are some experiments:

1. By doing some memory operations (e.g. devmem) while streaming from UVC,
   the issue is mitigated. I guess the cache lines get evicted sooner.
2. By replacing the memcpy() with memcpy_flushcache() in the UVC driver,
   the issue disappears.
3. By adding a .finish callback in videobuf2-vmalloc.c to flush the cache
   before returning the buffer, the issue disappears.

It seems a cache flush stage is missing in either UVC or the display driver.
We may also need some communication between the producer and the consumer,
so that they can decide who is responsible for flushing instead of flushing
the cache unconditionally and paying the performance cost.

Well, that's not what this mail thread was all about.

The issue you are facing is that somebody is forgetting to flush caches, but the issue discussed in this thread is that we have hardware which bypasses caches altogether.

As far as I can see, in your case UVC just allocates normal cached system memory through videobuf2-vmalloc, and it is perfectly valid to fill that using memcpy().

If some hardware then accesses those buffers while bypassing the CPU caches, it is the responsibility of the importing driver and/or the DMA subsystem to flush the caches accordingly.
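
To illustrate that split of responsibilities, here is a toy userspace model. It is only a sketch, not kernel code: struct toy_buf, cpu_fill(), flush_for_device() and device_scanout() are names made up for this mail, and the "cache" is just a second array standing in for a write-back CPU cache. The producer writes with a plain memcpy(), and the flush is a separate step owned by whoever hands the buffer to the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* Toy model (hypothetical): one buffer as seen in RAM and through a
 * write-back CPU cache. Real caches work per cache line; this does not. */
struct toy_buf {
    uint8_t ram[BUF_SIZE];   /* what a non-snooping device reads */
    uint8_t cache[BUF_SIZE]; /* what the CPU reads and writes */
    bool dirty;              /* cache holds data not yet written back */
};

/* Producer, standing in for the memcpy() in UVC: the write lands in the
 * cache only, which is perfectly valid for cached memory. */
static void cpu_fill(struct toy_buf *b, const uint8_t *src, size_t len)
{
    assert(len <= BUF_SIZE);
    memcpy(b->cache, src, len);
    b->dirty = true;
}

/* Cache maintenance, standing in for what the importing driver or the
 * DMA layer has to do before the device touches the buffer. */
static void flush_for_device(struct toy_buf *b)
{
    if (b->dirty) {
        memcpy(b->ram, b->cache, BUF_SIZE);
        b->dirty = false;
    }
}

/* Device scan-out: reads RAM directly and never looks at the cache. */
static void device_scanout(const struct toy_buf *b, uint8_t *out, size_t len)
{
    assert(len <= BUF_SIZE);
    memcpy(out, b->ram, len);
}
```

In this model, cpu_fill() followed directly by device_scanout() returns the old RAM contents. Calling flush_for_device() between the two makes the device see the new frame, and the producer code does not change.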

Regards,
Christian.


Regards,
Andy Hsieh
