Hi Nicolas,
On Wed, May 15, 2024 at 01:43:58PM -0400, nicolas.dufresne(a)collabora.corp-partner.google.com wrote:
> Le mardi 14 mai 2024 à 23:42 +0300, Laurent Pinchart a écrit :
> > > You'll hit the same limitation as we hit in GStreamer, which is that KMS drivers
> > > only offer allocation for render buffers and most of them are missing allocators
> > > for YUV buffers, even though they can import in these formats. (kms allocators,
> > > except dumb, which has other issues, are format aware).
> >
> > My experience on Arm platforms is that the KMS drivers offer allocation
> > for scanout buffers, not render buffers, and mostly using the dumb
> > allocator API. If the KMS device can scan out YUV natively, YUV buffer
> > allocation should be supported. Am I missing something here ?
>
> There are two APIs. Dumb is the legacy allocation API, only used by display
Is it legacy only ? I understand the dumb buffers API to be officially
supported, to allocate scanout buffers suitable for software rendering.
> drivers indeed, and the API does not include a pixel format or a modifier. The
> allocation of YUV buffer has been made through a small hack,
>
> bpp = number of bits per component (of luma plane if multiple planes)
> width = width
> height = height * X
>
> Where X will vary, "3 / 2" is used for 420 subsampling, "2" for 422 and "3" for
> 444. It is far from ideal, requires deep knowledge of each format in the
> application
I'm not sure I see that as an issue, but our experiences and use cases
may vary :-)
> and cannot allocate each plane separately.
For semi-planar or planar formats, unless I'm mistaken, you can either
allocate a single buffer and use it with appropriate offsets when
constructing your framebuffer (with DRM_IOCTL_MODE_ADDFB2), or allocate
one buffer per plane.
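
To make the single-buffer case concrete, here is a minimal sketch of the arithmetic for a semi-planar 4:2:0 format such as NV12. It is arithmetic only and issues no ioctl. The struct and function names are hypothetical, and it assumes 8 bits per component, an even height, and a pitch taken from what DRM_IOCTL_MODE_CREATE_DUMB returned for the allocation.

```c
#include <stdint.h>

/* Hypothetical helper type: the values an application would pass to the
 * dumb allocator and later to DRM_IOCTL_MODE_ADDFB2 for one NV12 frame. */
struct nv12_dumb_layout {
	uint32_t bpp;          /* bits per pixel requested from the dumb API */
	uint32_t alloc_width;  /* width requested from the dumb API */
	uint32_t alloc_height; /* height * 3 / 2 for 4:2:0 subsampling */
	uint32_t pitches[2];   /* bytes per line of the Y and CbCr planes */
	uint32_t offsets[2];   /* byte offset of each plane in the buffer */
};

/* Assumption: `pitch` is the bytes-per-line value the driver reported for
 * the dumb buffer; it may be larger than `width` because of alignment. */
static struct nv12_dumb_layout nv12_layout(uint32_t width, uint32_t height,
					   uint32_t pitch)
{
	struct nv12_dumb_layout l;

	l.bpp = 8;                       /* bits per luma component */
	l.alloc_width = width;
	l.alloc_height = height * 3 / 2; /* the "X = 3 / 2" hack for 420 */
	l.pitches[0] = pitch;
	l.pitches[1] = pitch;            /* width / 2 CbCr pairs, 2 bytes each */
	l.offsets[0] = 0;
	l.offsets[1] = pitch * height;   /* CbCr plane follows the Y plane */
	return l;
}
```

The one-buffer-per-plane alternative would instead make two allocations and use offset 0 for both planes.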
> The second is to use the driver specific allocation API. This is then abstracted
> by GBM. This allows allocating render buffers, notably with modifiers and/or use
> cases. But there is no support for YUV formats or multi-planar formats.
GBM is the way to go for render buffers indeed. It has been designed
with only graphics buffer management use cases in mind, so it's
unfortunately not an option as a generic allocator, at least in its
current form.
--
Regards,
Laurent Pinchart
On Mon, May 13, 2024 at 11:10:00AM -0400, Nicolas Dufresne wrote:
> Le lundi 13 mai 2024 à 11:34 +0300, Laurent Pinchart a écrit :
> > On Mon, May 13, 2024 at 10:29:22AM +0200, Maxime Ripard wrote:
> > > On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > > > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > > > Hi,
> > > > >
> > > > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > > > Shorter term, we have a problem to solve, and the best option we have
> > > > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > > > > buffer allocator helper in libcamera for the use case described above.
> > > > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > > > until we can do better.
> > > > >
> > > > > > Considering the security concerns raised on this thread with dmabuf heap
> > > > > > allocation not being restricted by quotas, you'd get what you want quickly with
> > > > > memfd + udmabuf instead (which is accounted already).
> > > > >
> > > > > > It was raised that distros don't enable udmabuf, but as stated there by Hans, in
> > > > > > any case distros need to take action to make the softISP work. This
> > > > > > alternative is easy and does not interfere in any way with your future plan or
> > > > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > > > safer memfd+udmabuf for the distro with security concerns.
> > > > >
> > > > > And for the long term plan, we can certainly get closer by fixing that issue
> > > > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > > > find common set of helpers to fix these exporters.
> > > >
> > > > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > > > about to suggest. Not just as a stopgap, but as the real official thing.
> > > >
> > > > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > > > adding the right accounting and then either let mlock rlimits or cgroups
> > > > kernel memory limits enforce good behavior.
> > >
> > > I think the main drawback with memfd is that it'll be broken for devices
> > > without an IOMMU, and while you said that it's uncommon for GPUs, it's
> > > definitely not for codecs and display engines.
> >
> > If the application wants to share buffers between the camera and a
> > display engine or codec, it should arguably not use the libcamera
> > FrameBufferAllocator, but allocate the buffers from the display or the
> > encoder. memfd wouldn't be used in that case.
> >
> > We need to eat our own dogfood though. If we want to push the
> > responsibility for buffer allocation in the buffer sharing case to the
> > application, we need to modify the cam application to do so when using
> > the KMS backend.
>
> Agreed, and the new dmabuf feedback on wayland can also be used on top of this.
>
> You'll hit the same limitation as we hit in GStreamer, which is that KMS drivers
> only offer allocation for render buffers and most of them are missing allocators
> for YUV buffers, even though they can import in these formats. (kms allocators,
> except dumb, which has other issues, are format aware).
My experience on Arm platforms is that the KMS drivers offer allocation
for scanout buffers, not render buffers, and mostly using the dumb
allocator API. If the KMS device can scan out YUV natively, YUV buffer
allocation should be supported. Am I missing something here ?
--
Regards,
Laurent Pinchart
The purpose of this patchset is for MediaTek secure video playback, and
also to enable other potential uses of this in the future. The 'restricted
dma-heap' will be used to allocate dma_buf objects that reference memory
in the secure world that is inaccessible/unmappable by the non-secure
(i.e. kernel/userspace) world. That memory will be used by the secure/
trusted world to store secure information (i.e. decrypted media content).
The dma_bufs allocated from the kernel will be passed to V4L2 for video
decoding (as input and output). They will also be used by the drm
system for rendering of the content.
This patchset adds two MediaTek restricted heaps and they will be used in
v4l2[1] and drm[2].
1) restricted_mtk_cm: secure chunk memory for MediaTek SVP (Secure Video
Path). The buffer is reserved for the secure world after bootup and it
is used for vcodec's ES/working buffer;
2) restricted_mtk_cma: secure CMA memory for MediaTek SVP. This buffer is
dynamically reserved for the secure world and is acquired when we start
playing secure videos. Once secure video playback is complete, the
CMA will be released. This heap is used for the vcodec's frame buffer.
[1] https://lore.kernel.org/linux-mediatek/20231206081538.17056-1-yunfei.dong@m…
[2] https://lore.kernel.org/all/20231223182932.27683-1-jason-jh.lin@mediatek.co…
Change note:
v4: 1) Rename the heap name from "secure" to "restricted", as suggested by
Simon/Pekka. There are still several "secure" strings in the MTK file
since we use the ARM platform, in which we call this "secure world"/
"secure command".
v3: https://lore.kernel.org/linux-mediatek/20231212024607.3681-1-yong.wu@mediat…
1) Separate the secure heap to a common file(secure_heap.c) and mtk
special file (secure_heap_mtk.c), and put all the tee related code
into our special file.
2) About dt-binding, Add "mediatek," prefix since this is Mediatek TEE
firmware definition.
3) Remove the normal CMA heap which is a draft for qcom.
Rebased on v6.7-rc1.
v2: https://lore.kernel.org/linux-mediatek/20231111111559.8218-1-yong.wu@mediat…
1) Move John's patches into the vcodec patchset since they use the new
dma heap interface directly.
https://lore.kernel.org/linux-mediatek/20231106120423.23364-1-yunfei.dong@m…
2) Reword the dt-binding description.
3) Rename the heap name from mtk_svp to secure_mtk_cm.
This means the current vcodec/DRM upstream code doesn't match this.
4) Add a normal CMA heap. Currently it should be considered a draft version.
5) Regarding the UUID, I still hard-code it, but put it in private
data, which allows others to set their own UUID. What's more, the UUID
is necessary for the session with the TEE. If we don't have it, we can't
communicate with the TEE, so the get_uuid interface, which tries
to make the UUID more generic, would not work. If there is another way to
make the UUID more general, please feel free to tell me.
v1: https://lore.kernel.org/linux-mediatek/20230911023038.30649-1-yong.wu@media…
Based on v6.6-rc1.
Yong Wu (7):
dt-bindings: reserved-memory: Add mediatek,dynamic-restricted-region
dma-buf: heaps: Initialize a restricted heap
dma-buf: heaps: restricted_heap: Add private heap ops
dma-buf: heaps: restricted_heap: Add dma_ops
dma-buf: heaps: restricted_heap: Add MediaTek restricted heap and
heap_init
dma-buf: heaps: restricted_heap_mtk: Add TEE memory service call
dma_buf: heaps: restricted_heap_mtk: Add a new CMA heap
.../mediatek,dynamic-restricted-region.yaml | 43 +++
drivers/dma-buf/heaps/Kconfig | 16 +
drivers/dma-buf/heaps/Makefile | 4 +-
drivers/dma-buf/heaps/restricted_heap.c | 237 +++++++++++++
drivers/dma-buf/heaps/restricted_heap.h | 43 +++
drivers/dma-buf/heaps/restricted_heap_mtk.c | 322 ++++++++++++++++++
6 files changed, 664 insertions(+), 1 deletion(-)
create mode 100644 Documentation/devicetree/bindings/reserved-memory/mediatek,dynamic-restricted-region.yaml
create mode 100644 drivers/dma-buf/heaps/restricted_heap.c
create mode 100644 drivers/dma-buf/heaps/restricted_heap.h
create mode 100644 drivers/dma-buf/heaps/restricted_heap_mtk.c
--
2.18.0
On Mon, May 13, 2024 at 11:06:24AM -0400, Nicolas Dufresne wrote:
> Le lundi 13 mai 2024 à 15:51 +0200, Maxime Ripard a écrit :
> > On Mon, May 13, 2024 at 09:42:00AM -0400, Nicolas Dufresne wrote:
> > > Le lundi 13 mai 2024 à 10:29 +0200, Maxime Ripard a écrit :
> > > > On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > > > > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > > > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > > > > Shorter term, we have a problem to solve, and the best option we have
> > > > > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > > > > > buffer allocator helper in libcamera for the use case described above.
> > > > > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > > > > until we can do better.
> > > > > >
> > > > > > > Considering the security concerns raised on this thread with dmabuf heap
> > > > > > > allocation not being restricted by quotas, you'd get what you want quickly with
> > > > > > memfd + udmabuf instead (which is accounted already).
> > > > > >
> > > > > > > It was raised that distros don't enable udmabuf, but as stated there by Hans, in
> > > > > > > any case distros need to take action to make the softISP work. This
> > > > > > > alternative is easy and does not interfere in any way with your future plan or
> > > > > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > > > > safer memfd+udmabuf for the distro with security concerns.
> > > > > >
> > > > > > And for the long term plan, we can certainly get closer by fixing that issue
> > > > > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > > > > find common set of helpers to fix these exporters.
> > > > >
> > > > > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > > > > about to suggest. Not just as a stopgap, but as the real official thing.
> > > > >
> > > > > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > > > > adding the right accounting and then either let mlock rlimits or cgroups
> > > > > kernel memory limits enforce good behavior.
> > > >
> > > > I think the main drawback with memfd is that it'll be broken for devices
> > > > without an IOMMU, and while you said that it's uncommon for GPUs, it's
> > > > definitely not for codecs and display engines.
> > >
> > > In the context of libcamera, the allocation and the alignment done to the video
> > > frame is done completely blindly. In that context, there is a lot more than just
> > > the allocation type that can go wrong and will lead to a memory copy. The upside
> > > of memfd is that the read cache will help speed up the copies if they are
> > > needed.
> >
> > dma-heaps provide cacheable buffers too...
>
> Yes, and that is why we have cache hints in V4L2 now. There is no clue that softISP code
> can read to make the right call. The required cache management is undefined
> until all the importers are known. I also don't think heaps currently care to
> adapt the dmabuf sync behaviour based on the different importers, or the
> addition of a new importer. On top of which, there is insufficient information
> on the device to really deduce what is needed.
>
> > > Another important point is that this is only used if the application hasn't
> > > provided frames. If your embedded application is non-generic, and you have
> > > permissions to access the right heap, the application can solve your specific
> > > issue. But in the generic Linux space, Linux kernel APIs are just insufficient
> > > for the "just work" scenario.
> >
> > ... but they also provide semantics around the memory buffers that no
> > other allocation API does. There's at least the mediatek secure playback
> > series and another one that I've started to work on to allocate ECC
> > protected or unprotected buffers that are just the right use case for
> > the heaps, and the target frameworks aren't.
>
> Let's agree we are both off topic now. The libcamera softISP is currently purely
> software, and cannot write to any form of protected memory. As for ECC, I would
> hope this usage will be coded in the application and that this application has
> been authorized to access the appropriate heaps.
>
> And finally, none of this fixes the issue that the heap allocations are not being
> accounted properly and allow an easy memory DoS. So uaccess should be granted
> with care, meaning that defaulting a "desktop" library to that means it will
> most of the time not work at all.
I think that issue should be fixed, regardless of whether or not we end
up using dma heaps for libcamera. If we do use them, maybe there will be
a higher incentive for somebody involved in this conversation to tackle
that problem first :-) And maybe, as a result, the rest of the Linux
community will consider with a more open mind usage of dma heaps on
desktop systems.
--
Regards,
Laurent Pinchart
On Mon, May 13, 2024 at 10:29:22AM +0200, Maxime Ripard wrote:
> On Wed, May 08, 2024 at 10:36:08AM +0200, Daniel Vetter wrote:
> > On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> > > Hi,
> > >
> > > Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > > > Shorter term, we have a problem to solve, and the best option we have
> > > > found so far is to rely on dma-buf heaps as a backend for the frame
> > > > > buffer allocator helper in libcamera for the use case described above.
> > > > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > > > until we can do better.
> > >
> > > > Considering the security concerns raised on this thread with dmabuf heap
> > > > allocation not being restricted by quotas, you'd get what you want quickly with
> > > memfd + udmabuf instead (which is accounted already).
> > >
> > > > It was raised that distros don't enable udmabuf, but as stated there by Hans, in
> > > > any case distros need to take action to make the softISP work. This
> > > > alternative is easy and does not interfere in any way with your future plan or
> > > the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> > > safer memfd+udmabuf for the distro with security concerns.
> > >
> > > And for the long term plan, we can certainly get closer by fixing that issue
> > > with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> > > find common set of helpers to fix these exporters.
> >
> > Yeah if this is just for softisp, then memfd + udmabuf is also what I was
> > about to suggest. Not just as a stopgap, but as the real official thing.
> >
> > udmabuf does kinda allow you to pin memory, but we can easily fix that by
> > adding the right accounting and then either let mlock rlimits or cgroups
> > kernel memory limits enforce good behavior.
>
> I think the main drawback with memfd is that it'll be broken for devices
> without an IOMMU, and while you said that it's uncommon for GPUs, it's
> definitely not for codecs and display engines.
If the application wants to share buffers between the camera and a
display engine or codec, it should arguably not use the libcamera
FrameBufferAllocator, but allocate the buffers from the display or the
encoder. memfd wouldn't be used in that case.
We need to eat our own dogfood though. If we want to push the
responsibility for buffer allocation in the buffer sharing case to the
application, we need to modify the cam application to do so when using
the KMS backend.
--
Regards,
Laurent Pinchart
From: Christian Brauner
> Sent: 10 May 2024 11:55
>
> > For the uapi issue you describe below my take would be that we should just
> > try, and hope that everyone's been dutifully using O_CLOEXEC. But maybe
> > I'm biased from the gpu world, where we've been hammering it in that
> > "O_CLOEXEC or bust" mantra since well over a decade. Really the only valid
>
> Oh, we're very much on the same page. All new file descriptor types that
> I've added over the years are O_CLOEXEC by default. IOW, you need to
> remove O_CLOEXEC explicitly (see pidfd as an example). And imho, any new
> fd type that's added should just be O_CLOEXEC by default.
For an fd that a shell redirect creates, you may want to be able to say
'this fd will have O_CLOEXEC set after the next exec'.
Also (possibly) a flag that can't be cleared once set and that
gets kept by dup() etc.
But maybe that is excessive?
I've certainly used:
# ip netns exec ns command 3</sys/class/net
in order to be able to (easily) read status for interfaces in the
default namespace and a specific namespace.
That would be hard if the O_CLOEXEC flag had got set by default.
(Especially without a shell builtin to clear it.)
David
On Thu, 9 May 2024 at 04:39, Christian Brauner <brauner(a)kernel.org> wrote:
>
> Not worth it without someone explaining in detail why imho. First pass
> should be to try and replace kcmp() in scenarios where it's obviously
> not needed or overkill.
Ack.
> I've added a CLASS(fd_raw) in a preliminary patch since we'll need that
> anyway which means that your comparison patch becomes even simpler imho.
> I've also added a selftest patch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.misc
LGTM.
Maybe worth adding an explicit test for "open same file, but two
separate opens, F_DUPFD_QUERY returns 0"? Just to clarify that "it's not
testing the file on the filesystem for equality, but the file pointer
itself".
Linus
On Tue, 7 May 2024 at 18:15, Bryan O'Donoghue
<bryan.odonoghue(a)linaro.org> wrote:
>
> On 07/05/2024 16:09, Dmitry Baryshkov wrote:
> > Ah, I see. Then why do you require the DMA-ble buffer at all? If you are
> > providing data to VPU or DRM, then you should be able to get the buffer
> > from the data-consuming device.
>
> Because we don't necessarily know what the consuming device is, if any.
>
> Could be VPU, could be Zoom/Hangouts via pipewire, could for argument
> sake be GPU or DSP.
>
> Also if we introduce a dependency on another device to allocate the
> output buffers - say always taking the output buffer from the GPU, then
> we've added another dependency which is more difficult to guarantee
> across different arches.
Yes. And it should be expected. It's a consumer who knows the
restrictions on the buffer. As I wrote, Zoom/Hangouts should not
require a DMA buffer at all. Applications should be able to allocate
the buffer out of the generic memory. GPUs might also have different
requirements. Consider GPUs with VRAM. It might be beneficial to
allocate a buffer out of VRAM rather than generic DMA mem.
--
With best wishes
Dmitry
On Tue, May 07, 2024 at 04:07:39PM -0400, Nicolas Dufresne wrote:
> Hi,
>
> Le mardi 07 mai 2024 à 21:36 +0300, Laurent Pinchart a écrit :
> > Shorter term, we have a problem to solve, and the best option we have
> > found so far is to rely on dma-buf heaps as a backend for the frame
> > buffer allocator helper in libcamera for the use case described above.
> > This won't work in 100% of the cases, clearly. It's a stop-gap measure
> > until we can do better.
>
> Considering the security concerns raised on this thread with dmabuf heap
> allocation not being restricted by quotas, you'd get what you want quickly with
> memfd + udmabuf instead (which is accounted already).
>
> It was raised that distros don't enable udmabuf, but as stated there by Hans, in
> any case distros need to take action to make the softISP work. This
> alternative is easy and does not interfere in any way with your future plan or
> the libcamera API. You could even have both dmabuf heap (for Raspbian) and the
> safer memfd+udmabuf for the distro with security concerns.
>
> And for the long term plan, we can certainly get closer by fixing that issue
> with accounting. This issue also applied to v4l2 io-ops, so it would be nice to
> find common set of helpers to fix these exporters.
Yeah if this is just for softisp, then memfd + udmabuf is also what I was
about to suggest. Not just as a stopgap, but as the real official thing.
udmabuf does kinda allow you to pin memory, but we can easily fix that by
adding the right accounting and then either let mlock rlimits or cgroups
kernel memory limits enforce good behavior.
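
For reference, the memfd + udmabuf path looks roughly like the sketch below. It is only an illustration: the helper names are made up, error handling is minimal, and it assumes /dev/udmabuf exists and is accessible, which is exactly the distro question raised above. The size has to be a multiple of the page size, and the memfd needs F_SEAL_SHRINK set before udmabuf will accept it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Hypothetical helper: create a memfd of `size` bytes that can no longer
 * shrink. Returns the memfd, or -1 on error. */
static int create_sealed_memfd(const char *name, size_t size)
{
	int fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0 ||
	    fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: wrap the memfd pages in a dma-buf.
 * Returns the dma-buf fd, or -1 if udmabuf is unavailable or refuses. */
static int export_udmabuf(int memfd, size_t size)
{
	struct udmabuf_create create = {
		.memfd = (__u32)memfd,
		.flags = UDMABUF_FLAGS_CLOEXEC,
		.offset = 0,
		.size = size,
	};
	int dev = open("/dev/udmabuf", O_RDWR | O_CLOEXEC);
	int buf;

	if (dev < 0)
		return -1;
	buf = ioctl(dev, UDMABUF_CREATE, &create);
	close(dev);
	return buf < 0 ? -1 : buf;
}
```

The memory behind the dma-buf is ordinary shmem owned by the calling process.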
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Sat, 4 May 2024 at 02:37, Christian Brauner <brauner(a)kernel.org> wrote:
>
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -244,13 +244,18 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
> if (!dmabuf || !dmabuf->resv)
> return EPOLLERR;
>
> + if (!get_file_active(&dmabuf->file))
> + return EPOLLERR;
[...]
I *really* don't think anything that touches dma-buf.c can possibly be right.
This is not a dma-buf.c bug.
This is purely an epoll bug.
Lookie here, the fundamental issue is that epoll can call '->poll()'
on a file descriptor that is being closed concurrently.
That means that *ANY* driver that relies on *any* data structure that
is managed by the lifetime of the 'struct file' will have problems.
Look, here's sock_poll():
    static __poll_t sock_poll(struct file *file, poll_table *wait)
    {
        struct socket *sock = file->private_data;
and that first line looks about as innocent as it possibly can, right?
Now, imagine that this is called from 'epoll' concurrently with the
file being closed for the last time (but it just hasn't _quite_
reached eventpoll_release() yet).
Now, imagine that the kernel is built with preemption, and the epoll
thread gets preempted _just_ before it loads 'file->private_data'.
Furthermore, the machine is under heavy load, and it just stays off
its CPU a long time.
Now, during this TOTALLY INNOCENT sock_poll(), in another thread, the
file closing completes, eventpoll_release() finishes, and the
preemption of the poll() thing just takes so long that you go through
an RCU period too, so that the actual file has been released too.
So now that totally innocent file->private_data load in the poll() is
probably going to get random data.
Yes, the file is allocated as SLAB_TYPESAFE_BY_RCU, so it's probably
still a file. Not guaranteed, even the slab will get fully free'd in
some situations. And yes, the above case is impossible to hit in
practice. You have to hit quite the small race window with an
operation that practically never happens in the first place.
But my point is that the fact that the problem with file->f_count
lifetimes happens for that dmabuf driver is not the fault of the
dmabuf code. Not at all.
It is *ENTIRELY* a bug in epoll, and the dmabuf code is probably just
easier to hit because it has a poll() function that does things that
have longer lifetimes than most things, and interacts more directly
with that f_count.
So I really don't understand why Al thinks this is "dmabuf does bad
things with f_count". It damn well does not. dma-buf is the GOOD GUY
here. It's doing things *PROPERLY*. It's taking refcounts like it damn
well should.
The fact that it takes ref-counts on something that the epoll code has
messed up is *NOT* its fault.
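
As a user-space analogy only (this is not kernel code, and every name in it is hypothetical), the "take a reference only if the count has not already dropped to zero" pattern that the quoted get_file_active() hunk appears to rely on can be sketched with C11 atomics. It illustrates the refcount side of the argument and does not model the RCU or slab reuse aspects mentioned above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for 'struct file'. */
struct object {
	atomic_long refcount;
};

/* Take a reference only if the object is still alive (count > 0).
 * A poller that loses the race with the final put gets `false` and must
 * not touch the object. */
static bool object_get_active(struct object *obj)
{
	long old = atomic_load(&obj->refcount);

	while (old != 0) {
		/* On failure, `old` is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&obj->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool object_put(struct object *obj)
{
	return atomic_fetch_sub(&obj->refcount, 1) == 1;
}
```

Whichever layer does the lookup without owning a reference is the one that has to use the conditional get.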
Linus