On Tue, Feb 09, 2021 at 01:28:47PM +0530, Sumit Garg wrote:
Thanks Greg for your response.
On Tue, 9 Feb 2021 at 12:28, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
On Tue, Feb 09, 2021 at 11:39:25AM +0530, Sumit Garg wrote:
Hi Christoph, Greg,
Currently we are observing an incorrect address translation in the DMA direct mapping methods on the 5.4 stable kernel while sharing a dmabuf from one device to another, where both devices have their own coherent DMA memory pools.
What devices have this problem?
The problem is seen with V4L2 device drivers which are currently under development for the UniPhier PXs3 Reference Board from Socionext [1].
Ok, so it's not even a driver in the 5.4 kernel today, so there's nothing I can do here as there is no regression of the existing source tree.
Following is a brief description of the test framework:
The issue is observed while trying to construct a GStreamer pipeline leveraging the hardware video converter engine (VPE device) and the hardware video encode/decode engine (CODEC device), where we use the dmabuf framework for Zero-Copy.
Example GStreamer pipeline is:
  gst-launch-1.0 -v -e videotestsrc \
    ! video/x-raw, width=480, height=270, format=NV15 \
    ! v4l2convert device=/dev/vpe0 capture-io-mode=dmabuf-import \
    ! video/x-raw, width=480, height=270, format=NV12 \
    ! v4l2h265enc device=/dev/codec0 output-io-mode=dmabuf \
    ! video/x-h265, format=byte-stream, width=480, height=270 \
    ! filesink location=out.hevc
Using GStreamer's V4L2 plugin,
- v4l2convert controls VPE driver,
- v4l2h265enc controls CODEC driver.
In the above pipeline, VPE driver imports CODEC driver's DMABUF for Zero-Copy.
[1] arch/arm64/boot/dts/socionext/uniphier-pxs3-ref.dts
And why can't they just use 5.10 to solve this issue as that problem has always been present for them, right?
The drivers are currently under development, and Socionext has chosen the 5.4 stable kernel for their development. So I will let Obayashi-san answer whether it's possible for them to migrate to 5.10 instead.
Why pick a kernel that does not support the features they require? That seems very odd and unwise.
BTW, this problem belongs to the common code, so others may experience this issue as well.
Then they should move to 5.10 or newer as this just doesn't work on older kernels, right?
I was able to root-cause this issue: it is caused by an incorrect virt-to-phys translation, using virt_to_page(), for addresses belonging to vmalloc space. Looking at the mainline kernel, this patch [1] changes the address translation from virt-to-phys to dma-to-phys, which fixes the issue observed on the 5.4 stable kernel as well (minimal fix [2]).
So I would like to seek your suggestion on backporting to stable kernels (5.4 or earlier): should we backport the complete mainline commit [1], or should we just apply the minimal fix [2]?
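To illustrate the root cause described above, here is a minimal userspace sketch, not kernel code. It assumes a toy address layout: every TOY_* constant, the toy_vmalloc_pt table, and both function names are hypothetical and do not match real arm64 values. The point it tries to show is that linear-map arithmetic (the kind of calculation virt_to_page() relies on) only gives a meaningful result for linear-map addresses, while a vmalloc address needs a page-table lookup (which is what vmalloc_to_page() does in the kernel).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy layout; these are NOT real kernel constants. */
#define TOY_PAGE_SHIFT    12u
#define TOY_PAGE_MASK     ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_LINEAR_BASE   UINT64_C(0x1000000000) /* virtual start of the toy linear map */
#define TOY_PHYS_BASE     UINT64_C(0x80000000)   /* physical address backing that start */
#define TOY_VMALLOC_BASE  UINT64_C(0x2000000000) /* virtual start of the toy vmalloc area */
#define TOY_VMALLOC_PAGES 4u

/* Toy "page table": vmalloc page index -> physical page address.
 * The pages are deliberately scattered and non-contiguous. */
static const uint64_t toy_vmalloc_pt[TOY_VMALLOC_PAGES] = {
    UINT64_C(0x9000a000), UINT64_C(0x90003000),
    UINT64_C(0x9001f000), UINT64_C(0x90008000),
};

/* Fixed-offset translation: only meaningful inside the linear map. */
uint64_t toy_linear_virt_to_phys(uint64_t virt)
{
    return virt - TOY_LINEAR_BASE + TOY_PHYS_BASE;
}

/* Per-page lookup: what a vmalloc address actually requires. */
uint64_t toy_vmalloc_virt_to_phys(uint64_t virt)
{
    uint64_t off = virt - TOY_VMALLOC_BASE;
    uint64_t idx = off >> TOY_PAGE_SHIFT;

    assert(idx < TOY_VMALLOC_PAGES); /* address must lie in the toy vmalloc area */
    return toy_vmalloc_pt[idx] + (off & TOY_PAGE_MASK);
}
```

In this toy model, passing TOY_VMALLOC_BASE + 0x1010 to toy_linear_virt_to_phys() yields 0x1080001010, which is unrelated to the page actually backing the buffer, whereas toy_vmalloc_virt_to_phys() yields 0x90003010. Translating from the DMA address instead, as the mainline commit [1] does, avoids depending on which kind of virtual address the buffer happens to have.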
Whenever you try to create a "minimal" fix, 90% of the time it is wrong and does not work and I end up having to deal with the mess.
I agree with your concerns for having to apply a non-mainline commit onto a stable kernel.
What prevents you from doing the real thing here? Are the patches too big?
IMHO, yes, the mainline patch is big enough that it touches multiple architectures. But if that's the only preferred way, then I can backport the mainline patch instead.
And again, why not just use 5.10 for this hardware? What hardware is it?
Please see my response above.
If a feature in the kernel was not present on older kernels, trying to shoe-horn it into them is not wise at all. You pick a kernel version to reflect the features/options that you require, and it sounds like 5.4 just will not work for them, so to stick with that would be quite foolish.
Just move to 5.10, much simpler!
thanks,
greg k-h