Hi all,
We had a discussion yesterday regarding ways in which Linaro can assist V4L2 development. One topic was that of sorting out memory providers like GEM and HWMEM.
Today I learned of yet another one: UMP from ARM.
http://blogs.arm.com/multimedia/249-making-the-mali-gpu-device-driver-open-s...
This is getting out of hand. I think that organizing a meeting to solve this mess should be on the top of the list. Companies keep on solving the same problem time and again, and since none of it enters the mainline kernel, any driver using it is also impossible to upstream.
All these memory-related modules have the same purpose: make it possible to allocate/reserve large amounts of memory and share it between different subsystems (primarily framebuffer, GPU and V4L).
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
I am currently aware of the following solutions floating around the net that all solve different parts of the problem:
In the kernel: GEM and TTM. Out-of-tree: HWMEM, UMP, CMA, VCM, CMEM, PMEM.
I'm sure that last list is incomplete.
Regards,
Hans
Dear Jonghun,
It would also be helpful to explain the original purpose of UMP (for the GPU, Mali) and the goal of using UMP in the multimedia stack. In particular, what is the final goal of UMP from LSI?
Also consider the previous GPU memory management approaches, e.g., SGX.
Thank you, Kyungmin Park
On Tue, Mar 8, 2011 at 5:13 PM, Hans Verkuil hverkuil@xs4all.nl wrote:
[snip]
Thanks for your interest.
As far as I know, the purpose of UMP is buffer sharing, especially inter-process. Maybe ARM can explain it in more detail.
High resolution video/image processing requires zero-copy operation. UMP allows zero-copy operation using a system-wide unique key, named a SecureID. UMP supports memory allocation (a custom memory allocator can be used) and assigns a SecureID to each buffer during allocation. A user virtual address for each process can then be created from the SecureID, so an application can access the buffer through its own virtual address. In this way applications can share the buffer without any copy operation.
For example, a video playback application can share the buffer even though it consists of multiple processes.
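To make the flow above concrete, here is a minimal sketch of that allocate/export/import pattern. The function and type names are hypothetical, not the actual UMP API; only the SecureID idea is taken from the description above.

/* Hypothetical sketch of SecureID-style sharing; none of these names
 * are the real UMP entry points. */
#include <stdint.h>
#include <stddef.h>

typedef uint32_t secure_id_t;
typedef struct shared_buffer shared_buffer;   /* opaque, allocator-owned */

extern shared_buffer *shared_buffer_allocate(size_t size);
extern secure_id_t shared_buffer_get_secure_id(shared_buffer *buf);
extern shared_buffer *shared_buffer_from_secure_id(secure_id_t id);
extern void *shared_buffer_map(shared_buffer *buf);
extern void shared_buffer_unmap(shared_buffer *buf);
extern void shared_buffer_release(shared_buffer *buf);
extern void send_id_over_ipc(secure_id_t id);
extern secure_id_t receive_id_over_ipc(void);

void producer(size_t size)
{
    shared_buffer *buf = shared_buffer_allocate(size);  /* custom allocator behind this */
    secure_id_t id = shared_buffer_get_secure_id(buf);  /* system-wide unique key */
    send_id_over_ipc(id);                                /* e.g. a socket or binder call */
}

void consumer(void)
{
    secure_id_t id = receive_id_over_ipc();
    shared_buffer *buf = shared_buffer_from_secure_id(id);
    void *ptr = shared_buffer_map(buf);   /* per-process virtual address */
    /* read/write the frame through ptr: no copy was made */
    (void)ptr;
    shared_buffer_unmap(buf);
    shared_buffer_release(buf);
}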
Best regards, Jonghun Han
-----Original Message-----
From: linux-media-owner@vger.kernel.org [mailto:linux-media-owner@vger.kernel.org] On Behalf Of Kyungmin Park
Sent: Tuesday, March 08, 2011 8:06 PM
To: Hans Verkuil
Cc: linaro-dev@lists.linaro.org; linux-media@vger.kernel.org; Jonghun Han
Subject: Re: Yet another memory provider: can linaro organize a meeting?
[snip]
Hi,
hwmem basically uses the same concept of a handle (or SecureID).
The problem with this approach is that the middleware must be aware of this handle and must provide a way to forward it between elements/components and the upper levels. Today that isn't the case in GStreamer (maybe it will be in 1.0), EGL, X... and the list isn't complete.
Does one solution natively provide a way to not use a handle and to only get a virtual address to manage in middleware?
While talking with the hwmem owners, I came to the idea that a solution could be to reserve, across all processes, a range of virtual addresses where only hwmem could mmap physical buffers, so the virtual address of the buffer could become the "handle" of the underlying buffer.
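A rough illustration of that idea follows. Everything here is hypothetical (the device name, the window base and the offset convention are invented); it only shows how a fixed, system-wide virtual address window would let the pointer itself act as the handle.

/* Hypothetical: /dev/hwmem, the window base and the offset scheme are
 * invented for illustration only. */
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHARED_WINDOW_BASE ((char *)0x40000000UL)  /* reserved in every process */

void *map_shared_buffer(size_t offset, size_t size)
{
    int fd = open("/dev/hwmem", O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    /* MAP_FIXED places the buffer at the same address in every process,
     * so the resulting pointer can be handed to middleware that only
     * understands plain virtual addresses. */
    void *ptr = mmap(SHARED_WINDOW_BASE + offset, size,
                     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
                     fd, (off_t)offset);
    close(fd);
    return ptr;
}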
Benjamin
2011/3/8 Jonghun Han jonghun.han@samsung.com
[snip]
Then this sounds to me a bit like GEM... (or maybe I should say DRM and either TTM/GEM below)? If you can pass buffers back and forth between the kernel and various processes by integer id, and then optionally read/write/mmap through some ioctls if needed, then the buffer sharing problem is solved. To me it sounds like how libdrm and libva above work. If the problem is already solved for video decode and render, then we just need to extend it to add camera.
So if it is explicitly about buffer sharing, and not buffer allocation, then it is still separate from what could/should fit beneath to allocate contiguous memory..
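For reference, this is roughly how GEM's global names already handle the cross-process part; a trimmed-down sketch, with buffer creation (which is driver-specific) and error handling left out.

/* Sketch of GEM global-name sharing using the generic DRM ioctls. */
#include <stdint.h>
#include <xf86drm.h>   /* libdrm: drmIoctl(), pulls in struct drm_gem_flink/open */

/* Process A: turn a per-fd GEM handle into a global name (an integer). */
uint32_t export_gem_name(int drm_fd, uint32_t gem_handle)
{
    struct drm_gem_flink flink = { .handle = gem_handle };

    drmIoctl(drm_fd, DRM_IOCTL_GEM_FLINK, &flink);
    return flink.name;            /* small integer, safe to pass over IPC */
}

/* Process B: turn the global name back into a local handle. */
uint32_t import_gem_name(int drm_fd, uint32_t name)
{
    struct drm_gem_open open_args = { .name = name };

    drmIoctl(drm_fd, DRM_IOCTL_GEM_OPEN, &open_args);
    return open_args.handle;      /* usable with driver-specific map ioctls */
}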
BR, -R
On Tue, Mar 8, 2011 at 6:08 AM, Jonghun Han jonghun.han@samsung.com wrote:
[snip]
On Tue, 2011-03-08 at 09:13 +0100, Hans Verkuil wrote:
Hi all,
We had a discussion yesterday regarding ways in which linaro can assist V4L2 development. One topic was that of sorting out memory providers like GEM and HWMEM.
Today I learned of yet another one: UMP from ARM.
http://blogs.arm.com/multimedia/249-making-the-mali-gpu-device-driver-open-s...
This is getting out of hand. I think that organizing a meeting to solve this mess should be on the top of the list. Companies keep on solving the same problem time and again and since none of it enters the mainline kernel any driver using it is also impossible to upstream.
All these memory-related modules have the same purpose: make it possible to allocate/reserve large amounts of memory and share it between different subsystems (primarily framebuffer, GPU and V4L).
I'm not sure that's the entire story regarding what the current allocators for GPU do. TTM and GEM create in kernel objects that can be passed between applications. TTM apparently has handling for VRAM (video RAM). GEM uses anonymous userspace memory that can be swapped out.
TTM: http://lwn.net/Articles/257417/ http://www.x.org/wiki/ttm http://nouveau.freedesktop.org/wiki/TTMMemoryManager?action=AttachFile&d... http://nouveau.freedesktop.org/wiki/TTMMemoryManager?action=AttachFile&d...
GEM: http://lwn.net/Articles/283798/
GEM vs. TTM: http://lwn.net/Articles/283793/
The current TTM and GEM allocators appear to have API and buffer processing and management functions tied in with memory allocation.
TTM has fences for event notification of buffer processing completion. (maybe something v4l2 can do with v4l2_events?)
GEM tries to avoid mapping buffers to userspace. (sounds like the v4l2 mem to mem API?)
Thanks to the good work of developers on the LMML in the past year or two, V4L2 has separated out some of that functionality from video buffer allocation:
video buffer queue management and userspace access (videobuf2)
memory to memory buffer transformation/movement (m2m)
event notification (VIDIOC_SUBSCRIBE_EVENT)
http://lwn.net/Articles/389081/ http://lwn.net/Articles/420512/
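As an aside, the event mechanism mentioned above is already quite small from the application's point of view; a minimal sketch, assuming a driver that issues VSYNC events, looks like this (error checking omitted):

#include <string.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

void wait_for_vsync(int fd)
{
    struct v4l2_event_subscription sub;
    struct v4l2_event ev;
    struct pollfd pfd = { .fd = fd, .events = POLLPRI };

    memset(&sub, 0, sizeof(sub));
    sub.type = V4L2_EVENT_VSYNC;
    ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub);   /* ask the driver for events */

    poll(&pfd, 1, -1);                         /* pending events show up as POLLPRI */

    memset(&ev, 0, sizeof(ev));
    ioctl(fd, VIDIOC_DQEVENT, &ev);            /* dequeue the event */
    /* ev.u.vsync.field identifies the field the vsync belongs to */
}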
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
The memory allocators and managers are ideally satisfying the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
I am currently aware of the following solutions floating around the net that all solve different parts of the problem:
In the kernel: GEM and TTM. Out-of-tree: HWMEM, UMP, CMA, VCM, CMEM, PMEM.
Prior to a meeting one would probably want to capture for each allocator:
1. What are the attributes of the memory allocated by this allocator?
2. For what domain was this allocator designed: GPU, video capture, video decoder, etc.
3. How are applications expected to use objects from this allocator?
4. What are the estimated sizes and lifetimes of objects that would be allocated by this allocator?
5. Beyond memory allocation, what other functionality is built into this allocator: buffer queue management, event notification, etc.?
6. Of the requirements that this allocator satisfies, what are the performance critical requirements?
Maybe there are better questions to ask.
Regards, Andy
Hi Andy,
On Tuesday 08 March 2011 15:01:10 Andy Walls wrote:
On Tue, 2011-03-08 at 09:13 +0100, Hans Verkuil wrote:
[snip]
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
The memory allocators and managers are ideally satisfying the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
In the embedded world, a very common use case is to capture video data from an ISP (V4L2+MC), process it in a DSP (V4L2+M2M, tidspbridge, ...) and display it on the GPU (OpenGL/ES). We need to be able to share a data buffer between the ISP and the DSP, and another buffer between the DSP and the GPU. If processing is not required, sharing a data buffer between the ISP and the GPU is required. Achieving zero-copy requires a single memory management solution used by the ISP, the DSP and the GPU.
Hi Laurent,
On Tue, 2011-03-08 at 16:52 +0100, Laurent Pinchart wrote:
Hi Andy,
[snip]
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
The memory allocators and managers are ideally satisfying the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
In the embedded world, a very common use case is to capture video data from an ISP (V4L2+MC), process it in a DSP (V4L2+M2M, tidspbridge, ...) and display it on the GPU (OpenGL/ES). We need to be able to share a data buffer between the ISP and the DSP, and another buffer between the DSP and the GPU. If processing is not required, sharing a data buffer between the ISP and the GPU is required. Achieving zero-copy requires a single memory management solution used by the ISP, the DSP and the GPU.
Ah. I guess I misunderstood what was meant by "memory provider" to some extent.
So what I read is a common way of providing in kernel persistent buffers (buffer objects? buffer entities?) for drivers and userspace applications to pass around by reference (no copies). Userspace may or may not want to see the contents of the buffer objects.
So I understand now why a single solution is desirable.
Regards, Andy
Hi Andy,
On Tuesday 08 March 2011 20:12:45 Andy Walls wrote:
On Tue, 2011-03-08 at 16:52 +0100, Laurent Pinchart wrote:
[snip]
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
The memory allocators and managers are ideally satisfying the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
In the embedded world, a very common use case is to capture video data from an ISP (V4L2+MC), process it in a DSP (V4L2+M2M, tidspbridge, ...) and display it on the GPU (OpenGL/ES). We need to be able to share a data buffer between the ISP and the DSP, and another buffer between the DSP and the GPU. If processing is not required, sharing a data buffer between the ISP and the GPU is required. Achieving zero-copy requires a single memory management solution used by the ISP, the DSP and the GPU.
Ah. I guess I misunderstood what was meant by "memory provider" to some extent.
So what I read is a common way of providing in kernel persistent buffers (buffer objects? buffer entities?) for drivers and userspace applications to pass around by reference (no copies). Userspace may or may not want to see the contents of the buffer objects.
Exactly. How that memory is allocated is irrelevant here, and we can have several different allocators as long as the buffer objects can be managed through a single API. That API will probably have to expose buffer properties related to allocation, in order for all components in the system to verify that the buffers are suitable for their needs, but the allocation process itself is irrelevant.
So I understand now why a single solution is desirable.
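As a purely hypothetical sketch (none of these names exist in the kernel), such an allocator-agnostic buffer object might expose the allocation properties importers care about while hiding the allocator behind a small ops table:

/* Hypothetical only: a buffer object that exposes its allocation
 * properties but keeps the allocator private behind an ops table. */
#include <linux/types.h>
#include <linux/kref.h>
#include <linux/device.h>
#include <linux/scatterlist.h>

struct shared_buffer_ops;

struct shared_buffer {
    struct kref refcount;
    size_t size;
    bool contiguous;                        /* physically contiguous? */
    bool cached;                            /* CPU-cacheable mapping? */
    struct sg_table *sgt;                   /* the pages, however they were allocated */
    const struct shared_buffer_ops *ops;    /* allocator-specific callbacks */
    void *priv;                             /* allocator private data */
};

struct shared_buffer_ops {
    int  (*map)(struct shared_buffer *buf, struct device *dev);
    void (*unmap)(struct shared_buffer *buf, struct device *dev);
    void (*release)(struct shared_buffer *buf);
};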
On 8 March 2011 20:23, Laurent Pinchart laurent.pinchart@ideasonboard.com wrote:
[snip]
Exactly,
It is important to know that there are three topics of discussion, each of which is a separate topic of its own:
1. The actual memory allocator
2. In-kernel API
3. Userland API
Explained:
1. This is how you acquire the actual physical or virtual memory, defrag, swap, etc. This can be enhanced by CMA, hotswap, memory regions or whatever, and the main topic for a system wide memory allocator does not deal much with how this is done.
2. In-kernel API is important from a device driver point of view in order to resolve buffers, pin memory when used (enable defrag when unpinned).
3. Userland API deals with alloc/free, import/export (IPC), security, and set-domain capabilities among others and is meant to pass buffers between processes in userland and enable no-copy data paths.
We need to resolve 2. and 3.
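To make 3. a bit more tangible, here is a purely invented sketch of what such a userland API could look like as a small ioctl set on a shared device node; all names and numbers are made up, the point is just alloc/free, export/import for IPC, and domain/cache control.

/* Invented for illustration: not an existing kernel interface. */
#include <linux/ioctl.h>
#include <linux/types.h>

struct membuf_alloc {
    __u64 size;        /* in:  requested size */
    __u32 flags;       /* in:  contiguous, cached, secure, ... */
    __u32 handle;      /* out: per-process handle */
};

struct membuf_export {
    __u32 handle;      /* in:  local handle */
    __u32 global_id;   /* out: system-wide id, safe to pass over IPC */
};

struct membuf_set_domain {
    __u32 handle;      /* in */
    __u32 domain;      /* in:  CPU, GPU, VPU, ...; triggers cache maintenance */
};

#define MEMBUF_IOC_ALLOC      _IOWR('M', 0, struct membuf_alloc)
#define MEMBUF_IOC_FREE       _IOW('M', 1, __u32)
#define MEMBUF_IOC_EXPORT     _IOWR('M', 2, struct membuf_export)
#define MEMBUF_IOC_IMPORT     _IOWR('M', 3, struct membuf_export)
#define MEMBUF_IOC_SET_DOMAIN _IOW('M', 4, struct membuf_set_domain)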
GEM/TTM is mentioned in this thread and there is an overlap of what is happening within DRM/DRI/GEM/TTM/KMS and V4L2. The whole idea behind DRM is to have one device driver for everything (well at least 2D/3D, video codecs, display output/composition), while on a SoC all this is on several drivers/IP's. A V4L2 device cannot resolve a GEM handle. GEM only lives inside one DRM device (AFAIK). GEM is also mainly for "dedicated memory-less" graphics cards while TTM mainly targets advanced Graphics Card with dedicated memory. From a SoC point of view DRM looks very "fluffy" and not quite slimmed for an embedded device, and you cannot get GEM/TTM without bringing in all of DRM/DRI. KMS on the other hand is very attractive as a framebuffer device replacer. It is not an easy task to decide on a multimedia user interface for a SoC vendor.
Uniting the frameworks within the kernel will likely fail (too big of a task) but a common system wide memory manager would for sure make life easier, enabling the possibility to pass buffers between drivers (and user-land as well). In order for no-copy to work on a system level the general multimedia infrastructure in user-land (i.e. GStreamer/X11/wayland/stagefright/flingers/etc) must also be aware of this memory manager and manage handles accordingly. This infrastructure in user-land puts the requirements on the userland API (3.).
I know that STE and ARM have a vision to have a hwmem/UMP-like API and that Linaro is one place to resolve this. As Jesse Barker mentioned earlier, Linaro has work ongoing on this topic (https://wiki.linaro.org/WorkingGroups/Middleware/Graphics/Projects/UnifiedMe...) and a V4L2 brainstorming meeting in Warsaw will likely bring this up as well. And GStreamer is also looking at this from a user-land point of view.
ARM and STE seem to agree on this, the V4L2 maestros seem to agree, GStreamer as well (I believe). How about Samsung (VCM)? TI (CMEM)? Freescale? The DRI community? Linus?
Jesse! any progress?
BR /Robert Fekete
On Tue, Mar 15, 2011 at 9:07 AM, Robert Fekete robert.fekete@linaro.org wrote:
[snip]
Robert (et al.),
Based upon the requirements in the link Robert (and I) posted, we are looking into what TTM changes would be needed to support those using the current UMP API as a sort of template, as this is something we have currently to hand (linux-linaro-2.6.38 based tree containing UMP kernel and Mali 400 device driver on git.linaro.org). We believe it would be similar to a mapping onto the HWMEM API, but if there are doubts there, I will happily add some additional work items to validate that. I will also be adding the V4L2 requirement to resolve a MM handle from a virtual address (we heard that one from Samsung in a separate thread).
On the KMS front, I have suggested to Scott (Linaro landing teams lead) that we push in that direction already and have proposed that we allocate resources in the graphics working group to support that (of course, contributions have to come from the SoC vendors to enable that work fully; thus the landing team involvement).
I would like to add an additional request for comments on the contents of our unified memory management position to make sure that it addresses everyone's concerns. Anyone in Linaro can edit it directly, but I will happily make "proxy" edits based upon email and IRC requests.
cheers, Jesse
On Tue, Mar 15, 2011 at 12:07 PM, Robert Fekete robert.fekete@linaro.org wrote:
[snip]
GEM/TTM is mentioned in this thread and there is an overlap of what is happening within DRM/DRI/GEM/TTM/KMS and V4L2. The whole idea behind DRM is to have one device driver for everything (well at least 2D/3D, video codecs, display output/composition), while on a SoC all this is on several drivers/IP's. A V4L2 device cannot resolve a GEM handle. GEM only lives inside one DRM device (AFAIK). GEM is also mainly for "dedicated memory-less" graphics cards while TTM mainly targets advanced Graphics Card with dedicated memory. From a SoC point of view DRM looks very "fluffy" and not quite slimmed for an embedded device, and you cannot get GEM/TTM without bringing in all of DRM/DRI. KMS on the other hand is very attractive as a framebuffer device replacer. It is not an easy task to decide on a multimedia user interface for a SoC vendor.
Modern GPUs are basically an SoC: 3D engine, video decode, hdmi packet engines, audio, dma engine, display blocks, etc. with a shared memory controller. Also the AMD fusion and Intel moorestown SoCs are not too different from ARM-based SoCs and we are supporting them with the drm. I expect we'll see the x86 and ARM/MIPS based SoCs continue to get closer together.
What are you basing your "fluffy" statement on? We recently merged a set of patches from qualcomm to support platform devices in the drm and Dave added support for USB devices. Qualcomm also has an open source drm for their snapdragon GPUs (although the userspace driver is closed) and they are using that on their SoCs.
Uniting the frameworks within the kernel will likely fail(too big of a task) but a common system wide memory manager would for sure make life easier enabling the possibility to pass buffers between drivers(and user-land as well). In order for No-copy to work on a system level the general multimedia infrastructure in User-land (i.e. Gstreamer/X11/wayland/stagefright/flingers/etc) must also be aware of this memory manager and manage handles accordingly. This infrastructure in user-land puts the requirements on the User land API (1.).
You don't have to use GEM or TTM as your memory manager for KMS or DRI, it's memory manager independent. That said, I don't really see why you couldn't use one of them for a central memory manager on an SoC; the sub drivers would just request buffers from the common memory manager. We are already working on support for sharing buffers between drm drivers for supporting hybrid laptops and crossfire (multi-gpu) type things. We already share buffers between multiple userspace acceleration drivers and the drm using the DRI protocol.
I know that STE and ARM has a vision to have a hwmem/ump alike API and that Linaro is one place to resolve this. As Jesse Barker mentioned earlier Linaro has work ongoing on this topic (https://wiki.linaro.org/WorkingGroups/Middleware/Graphics/Projects/UnifiedMe...) and a V4L2 brainstorming meeting in Warsaw will likely bring this up as well. And Gstreamer is also looking at this from a user-land point of view.
ARM, STE seems to agree on this, V4L2 maestros seems to agree, GStreamer as well(I believe), How about Samsung(vcm)? TI(cmem)? Freescale? DRI community? Linus?
FWIW, I have yet to see any v4l developers ever email the dri mailing list while discussing GEM, TTM, or the DRM, all the while conjecturing on aspects of it they admit to not fully understanding. For future reference, the address is: dri-devel@lists.freedesktop.org. We are happy to answer questions.
Alex
Sorry but I feel the discussion is a bit off the point. We're not going to compare the pros and cons of current code (GEM/TTM, HWMEM, UMP, CMA, VCM, CMEM, PMEM, etc.)
The real problem is to find a suitable unified memory management module for various kinds of HW components (including CPU, VPU, GPU, camera, FB/OVL, etc.), especially for ARM-based SoCs. Some HW requires a physically contiguous big chunk of memory (e.g. some VPU & OVL); while others could live with a DMA chain (e.g. some powerful GPUs have a built-in MMU).
So, what's current situation?
1) As Hans mentioned, there're GEM & TTM in upstream kernel, under the DRM framework (w/ KMS, etc.). This works fine on conventional (mostly Xorg-based) Linux distribution.
2) But DRM (or GEM/TTM) is still too heavy and complex for some embedded OSes, which only want a cheaper memory management module. So...
2.1) Google uses PMEM in Android - However PMEM was removed from upstream kernel for well-known reasons;
2.2) Qualcomm writes a hybrid KGSL based DRM+PMEM solution - However KGSL was shamed on the dri-devel list because of their closed user space binary.
2.3) ARM starts UMP/MaliDRM for both of Android and X11/DRI2 - This makes things even more complicated. (Therefore I personally think this is actually a shame for ARM to create another private SW. As a leader of Linaro, ARM should think more and coordinate with partners better to come up a unified solution to make our life easier.)
2.4) Other companies also have their own private solutions because nobody can get a STANDARD interface from upstream, including Marvell, TI, Freescale.
In general, it would be highly appreciated if Linaro guys could sit down together around a table, co-work with silicon vendors and upstream Linux kernel maintainers to make a unified (and cheaper than GEM/TTM/DRM) memory management module. This module should be reviewed carefully and strong enough to replace any other private memory manager mentioned above. It should replace PMEM for Android (with respect to Gralloc). And it could even be leveraged in DRM framework (as a primitive memory allocation provider under GEM).
Anyway, such a module is necessary, because user space applications cannot exchange enough information through a single virtual address (among different per-process virtual address spaces). Gstreamer, V4L and any other middleware could keep using a single virtual address within the same process. But a global handle/ID is also necessary for sharing buffers between processes.
Furthermore, besides those well-known basic features, some advanced APIs should be provided for applications to map the same physical memory region into another process, with 1) manageable, fine-grained CACHEable/BUFFERable attributes and a cache flush mechanism (for performance); 2) lock/unlock synchronization; 3) swap/migration ability (optional at the current stage, as those buffers are often expected to stay in RAM for better performance).
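As a sketch only (all names invented, complementing the ioctl sketch earlier in the thread), the lock/unlock and cache-attribute handling could look like this from the application side:

/* Invented API names; shows the intended usage pattern, not a real library. */
#include <stdint.h>
#include <stddef.h>

#define MEMBUF_CACHED 0x1

extern void *membuf_map(uint32_t global_id, unsigned int flags);
extern int   membuf_lock(uint32_t global_id);    /* serialize CPU vs. device access */
extern int   membuf_unlock(uint32_t global_id);  /* flush/invalidate caches as needed */

void cpu_fill(uint32_t global_id, size_t size, uint8_t value)
{
    uint8_t *p = membuf_map(global_id, MEMBUF_CACHED);

    membuf_lock(global_id);          /* device must not touch the buffer now */
    for (size_t i = 0; i < size; i++)
        p[i] = value;
    membuf_unlock(global_id);        /* caches flushed so the device sees the data */
}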
Finally, and most important, THIS MODULE SHOULD BE PUSHED UPSTREAM (sorry, please ignore all the nonsense I wrote above if we can achieve this) so that everyone treats it as a de facto, well supported memory management module. Then all companies could transition from their current private designs to this public one. And, let's cheer for the end of this damn chaos!
Thanks, Lea
On Wed, Mar 16, 2011 at 12:47 AM, Alex Deucher alexdeucher@gmail.com wrote:
[snip]
On Wed, Mar 16, 2011 at 4:37 PM, Li Li eggonlea@gmail.com wrote:
[snip]
Rough schedules.
1. Warsaw meeting (3/16~3/18): mostly V4L2 people and some SoC vendors. Build consensus among media developers and share the information. Please note that it's a V4L2 brainstorming meeting, so memory management is not the main issue.
2. ELC (4/11~4/13): DRM, DRI and V4L2 people. Discuss whether GEM/TTM is acceptable for non-x86 systems and find out which modules are acceptable. We studied GEM for our environment, but it's too huge and not of much benefit for us since the current frameworks are enough. What's missing is a generic memory passing mechanism. We need the generic memory passing interface, that's all.
3. Linaro (5/9~5/13): ARM, SoC vendors and V4L2 people. I hope several people will participate and that a small step is made toward the final goal.
Thank you, Kyungmin Park
Thanks, Lea
On Wed, Mar 16, 2011 at 12:47 AM, Alex Deucher alexdeucher@gmail.com wrote:
On Tue, Mar 15, 2011 at 12:07 PM, Robert Fekete robert.fekete@linaro.org wrote:
On 8 March 2011 20:23, Laurent Pinchart laurent.pinchart@ideasonboard.com wrote:
Hi Andy,
On Tuesday 08 March 2011 20:12:45 Andy Walls wrote:
On Tue, 2011-03-08 at 16:52 +0100, Laurent Pinchart wrote:
[snip]
> > It really shouldn't be that hard to get everyone involved together > > and settle on a single solution (either based on an existing > > proposal or create a 'the best of' vendor-neutral solution). > > "Single" might be making the problem impossibly hard to solve well. > One-size-fits-all solutions have a tendency to fall short on meeting > someone's critical requirement. I will agree that "less than n", for > some small n, is certainly desirable. > > The memory allocators and managers are ideally satisfying the > requirements imposed by device hardware, what userspace applications > are expected to do with the buffers, and system performance. (And > maybe the platform architecture, I/O bus, and dedicated video memory?)
In the embedded world, a very common use case is to capture video data from an ISP (V4L2+MC), process it in a DSP (V4L2+M2M, tidspbridge, ...) and display it on the GPU (OpenGL/ES). We need to be able to share a data buffer between the ISP and the DSP, and another buffer between the DSP and the GPU. If processing is not required, sharing a data buffer between the ISP and the GPU is required. Achieving zero-copy requires a single memory management solution used by the ISP, the DSP and the GPU.
Ah. I guess I misunderstood what was meant by "memory provider" to some extent.
So what I read is a common way of providing in kernel persistent buffers (buffer objects? buffer entities?) for drivers and userspace applications to pass around by reference (no copies). Userspace may or may not want to see the contents of the buffer objects.
Exactly. How that memory is allocated is irrelevant here, and we can have several different allocators as long as the buffer objects can be managed through a single API. That API will probably have to expose buffer properties related to allocation, in order for all components in the system to verify that the buffers are suitable for their needs, but the allocation process itself is irrelevant.
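As an illustration of the kind of allocation-related properties such a buffer object could expose, here is a rough sketch; the structure and field names are purely hypothetical, not an existing kernel interface:

/*
 * Hypothetical set of allocation-related properties a shared buffer
 * object could expose, so an importing driver (ISP, DSP, GPU, display)
 * can check whether the buffer meets its constraints.
 */
#include <linux/types.h>

struct shared_buf_props {
        size_t  size;           /* total size in bytes              */
        u32     alignment;      /* required start address alignment */
        bool    contiguous;     /* physically contiguous memory?    */
        bool    cached;         /* CPU-cacheable mapping?           */
        u32     mem_bank;       /* memory region/bank it came from  */
        u64     device_mask;    /* which devices can address it     */
};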
So I understand now why a single solution is desirable.
Exactly,
It is important to realize that there are three topics of discussion, each of which is a separate topic of its own:
1. The actual memory allocator
2. In-kernel API
3. Userland API
Explained:
1. The actual memory allocator: this is how you acquire the actual physical or virtual memory (defrag, swap, etc.). This can be enhanced by CMA, hotswap, memory regions or whatever; the main topic for a system-wide memory allocator does not deal much with how this is done.
2. The in-kernel API is important from a device driver point of view, in order to resolve buffers and pin memory while it is used (enabling defrag when unpinned).
3. The userland API deals with alloc/free, import/export (IPC), security, and set-domain capabilities among others, and is meant to pass buffers between processes in userland and enable no-copy data paths.
We need to resolve 2. and 3.
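For illustration only, points 2 and 3 could boil down to something like the following; every name here is invented and does not describe any existing in-kernel or ioctl interface:

/* 2. In-kernel API: a driver resolves a handle to a pinned buffer. */
#include <linux/types.h>
#include <linux/ioctl.h>

struct shared_buf;

struct shared_buf *shared_buf_get(u32 handle);   /* take a reference     */
int  shared_buf_pin(struct shared_buf *buf, dma_addr_t *dma_addr);
void shared_buf_unpin(struct shared_buf *buf);   /* defrag allowed again */
void shared_buf_put(struct shared_buf *buf);

/* 3. Userland API: ioctls on a character device for alloc/free and
 *    import/export of buffers between processes. */
struct sharedbuf_alloc { __u64 size; __u32 flags; __u32 handle; };
struct sharedbuf_xport { __u32 handle; __u32 global_id; };

#define SHAREDBUF_IOC_ALLOC  _IOWR('S', 0, struct sharedbuf_alloc)
#define SHAREDBUF_IOC_FREE   _IOW('S', 1, __u32)
#define SHAREDBUF_IOC_EXPORT _IOWR('S', 2, struct sharedbuf_xport)
#define SHAREDBUF_IOC_IMPORT _IOWR('S', 3, struct sharedbuf_xport)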
GEM/TTM is mentioned in this thread and there is an overlap of what is happening within DRM/DRI/GEM/TTM/KMS and V4L2. The whole idea behind DRM is to have one device driver for everything (well at least 2D/3D, video codecs, display output/composition), while on a SoC all this is on several drivers/IP's. A V4L2 device cannot resolve a GEM handle. GEM only lives inside one DRM device (AFAIK). GEM is also mainly for "dedicated memory-less" graphics cards while TTM mainly targets advanced Graphics Card with dedicated memory. From a SoC point of view DRM looks very "fluffy" and not quite slimmed for an embedded device, and you cannot get GEM/TTM without bringing in all of DRM/DRI. KMS on the other hand is very attractive as a framebuffer device replacer. It is not an easy task to decide on a multimedia user interface for a SoC vendor.
Modern GPUs are basically an SoC: 3D engine, video decode, hdmi packet engines, audio, dma engine, display blocks, etc. with a shared memory controller. Also the AMD fusion and Intel moorestown SoCs are not too different from ARM-based SoCs and we are supporting them with the drm. I expect we'll see the x86 and ARM/MIPS based SoCs continue to get closer together.
What are you basing your "fluffy" statement on? We recently merged a set of patches from qualcomm to support platform devices in the drm and Dave added support for USB devices. Qualcomm also has an open source drm for their snapdragon GPUs (although the userspace driver is closed) and they are using that on their SoCs.
Uniting the frameworks within the kernel will likely fail (too big of a task), but a common system-wide memory manager would for sure make life easier, enabling the possibility to pass buffers between drivers (and user-land as well). In order for no-copy to work on a system level, the general multimedia infrastructure in user-land (i.e. GStreamer/X11/Wayland/Stagefright/flingers/etc.) must also be aware of this memory manager and manage handles accordingly. This infrastructure in user-land puts the requirements on the userland API (3.).
You don't have to use GEM or TTM for as your memory manager for KMS or DRI, it's memory manager independent. That said, I don't really see why you couldn't use one of them for a central memory manager on an SoC; the sub drivers would just request buffers from the common memory manager. We are already working on support for sharing buffers between drm drivers for supporting hybrid laptops and crossfire (multi-gpu) type things. We already share buffers between multiple userspace acceleration drivers and the drm using the DRI protocol.
I know that STE and ARM have a vision of an hwmem/UMP-like API, and that Linaro is one place to resolve this. As Jesse Barker mentioned earlier, Linaro has work ongoing on this topic (https://wiki.linaro.org/WorkingGroups/Middleware/Graphics/Projects/UnifiedMe...) and a V4L2 brainstorming meeting in Warsaw will likely bring this up as well. And GStreamer is also looking at this from a user-land point of view.
ARM and STE seem to agree on this, the V4L2 maestros seem to agree, GStreamer as well (I believe). How about Samsung (VCM)? TI (CMEM)? Freescale? The DRI community? Linus?
FWIW, I have yet to see any v4l developers ever email the dri mailing list while discussing GEM, TTM, or the DRM, all the while conjecturing on aspects of it they admit to not fully understanding. For future reference, the address is: dri-devel@lists.freedesktop.org. We are happy to answer questions.
Alex
Jesse! any progress?
BR /Robert Fekete
On Wednesday, March 16, 2011 09:14:54 Kyungmin Park wrote:
Rough schedules.
1. Warsaw meeting (3/16~3/18): mostly V4L2 people and some SoC vendors.
Reach a consensus among media developers and share information. Please note that it's a V4L2 brainstorming meeting, so memory management is not the main issue.
I have asked all participants to the meeting to try and assemble requirements for their hardware in the next week.
2. ELC (4/11~4/13): DRM, DRI and V4L2 people.
Discuss whether GEM/TTM is acceptable for non-x86 systems and find out which modules are acceptable. We studied GEM for our environment, but it's too big and brings little benefit for us, since the current frameworks are enough. What is missing is a generic memory passing mechanism. We need the generic memory passing interface; that's all.
Who will be there? Is there a BoF or something similar organized?
3. Linaro (5/9~5/13): ARM, SoC vendors and V4L2 people.
I hope several people will participate and we can make a small step toward the final goal.
I should be able to join, at least for the part related to buffer pools and related topics.
Regards,
Hans
On Monday 21 March 2011 19:03:38 Hans Verkuil wrote:
On Wednesday, March 16, 2011 09:14:54 Kyungmin Park wrote:
Rough schedules.
- Warsaw meetings (3/16~3/18): mostly v4l2 person and some SoC vendors
Make a consensence at media developers. and share the information. Please note that it's v4l2 brainstorming meeting. so memory management is not the main issue.
I have asked all participants to the meeting to try and assemble requirements for their hardware in the next week.
- ELC (4/11~4/13): DRM, DRI and v4l2 person.
Discuss GEM/TTM is acceptable for non-X86 system and find out the which modules are acceptable. We studied the GEM for our environment. but it's too huge and not much benefit for us since current frameworks are enough. The missing is that no generic memory passing mechanism. We need the generic memory passing interface. that's all.
Who will be there? Is there a BoF or something similar organized?
- Linaro (5/9~5/13): ARM, SoC vendors and v4l2 persons.
I hope several person are anticipated and made a small step for final goal.
I should be able to join, at least for the part related to buffer pools and related topics.
Same for me. I might not join for the whole week, so it would be nice if we could draft an agenda in the near future.
On Wed, Mar 16, 2011 at 3:14 AM, Kyungmin Park kmpark@infradead.org wrote:
Rough schedules.
- Warsaw meetings (3/16~3/18): mostly v4l2 person and some SoC vendors
Make a consensence at media developers. and share the information. Please note that it's v4l2 brainstorming meeting. so memory management is not the main issue. 2. ELC (4/11~4/13): DRM, DRI and v4l2 person.
FYI, I should be at ELC, at least for a day or two. It would be nice, as Andy suggested on another thread, to carve out a timeslot to discuss in advance, because I'm not sure that I'll be able to be there the entire time.
BR, -R
Discuss GEM/TTM is acceptable for non-X86 system and find out the which modules are acceptable. We studied the GEM for our environment. but it's too huge and not much benefit for us since current frameworks are enough. The missing is that no generic memory passing mechanism. We need the generic memory passing interface. that's all. 3. Linaro (5/9~5/13): ARM, SoC vendors and v4l2 persons. I hope several person are anticipated and made a small step for final goal.
I'll be at ELC, as well as living in SF, so I'll be around before and after as well.
cheers, Jesse
On Fri, Mar 25, 2011 at 2:41 PM, Clark, Rob rob@ti.com wrote:
On Wed, Mar 16, 2011 at 3:14 AM, Kyungmin Park kmpark@infradead.org wrote:
Rough schedules.
- Warsaw meetings (3/16~3/18): mostly v4l2 person and some SoC vendors
Make a consensence at media developers. and share the information. Please note that it's v4l2 brainstorming meeting. so memory management is not the main issue. 2. ELC (4/11~4/13): DRM, DRI and v4l2 person.
Fyi, I should be at ELC, at least for a day or two.. it would be nice, as Andy suggested on other thread, to carve out a timeslot to discuss in advance, because I'm not sure that I'll be able to be there the entire time..
BR, -R
Discuss GEM/TTM is acceptable for non-X86 system and find out the which modules are acceptable. We studied the GEM for our environment. but it's too huge and not much benefit for us since current frameworks are enough. The missing is that no generic memory passing mechanism. We need the generic memory passing interface. that's all. 3. Linaro (5/9~5/13): ARM, SoC vendors and v4l2 persons. I hope several person are anticipated and made a small step for final
goal.
On Wed, Mar 16, 2011 at 3:37 AM, Li Li eggonlea@gmail.com wrote:
Sorry, but I feel the discussion is a bit off the point. We're not going to compare the pros and cons of the current code (GEM/TTM, HWMEM, UMP, CMA, VCM, CMEM, PMEM, etc.).
The real problem is to find a suitable unified memory management module for various kinds of hardware components (including CPU, VPU, GPU, camera, FB/OVL, etc.), especially on ARM-based SoCs. Some hardware requires a big physically contiguous chunk of memory (e.g. some VPUs and overlays), while other hardware can live with a scatter-gather DMA chain (e.g. a powerful GPU with a built-in MMU).
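To make that distinction concrete, here is a small sketch of the two allocation styles using the standard kernel DMA mapping API; the function is only illustrative and error handling is abbreviated:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int allocation_styles(struct device *dev, unsigned int nr_pages)
{
        dma_addr_t phys;
        void *cpu;
        struct sg_table sgt;

        /* a) Device without an MMU (many VPUs/overlays): one physically
         *    contiguous chunk. */
        cpu = dma_alloc_coherent(dev, 4 << 20, &phys, GFP_KERNEL);
        if (!cpu)
                return -ENOMEM;

        /* b) Device with its own MMU (many GPUs): a scatter-gather list of
         *    ordinary pages is enough; the device MMU makes it appear
         *    contiguous to the hardware. */
        if (sg_alloc_table(&sgt, nr_pages, GFP_KERNEL))
                return -ENOMEM;
        /* ...fill sgt with pages, then map it with dma_map_sg()... */

        return 0;
}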
So, what's current situation?
1) As Hans mentioned, there are GEM & TTM in the upstream kernel, under the DRM framework (with KMS, etc.). This works fine on conventional (mostly Xorg-based) Linux distributions.
2) But DRM (or GEM/TTM) is still too heavy and complex for some embedded OSes, which only want a cheaper memory management module. So...
2.1) Google uses PMEM in Android - however, PMEM was removed from the upstream kernel for well-known reasons;
2.2) Qualcomm wrote a hybrid KGSL-based DRM+PMEM solution - however, KGSL was shamed on the dri-devel list because of its closed user-space binary.
2.3) ARM started UMP/Mali DRM for both Android and X11/DRI2 - this makes things even more complicated. (Therefore I personally think it is actually a shame for ARM to create yet another piece of private software. As a leader of Linaro, ARM should think more and coordinate better with partners to come up with a unified solution that makes our lives easier.)
2.4) Other companies, including Marvell, TI and Freescale, also have their own private solutions, because nobody can get a STANDARD interface from upstream.
In general, it would be highly appreciated if the Linaro folks could sit down around a table and work together with silicon vendors and upstream Linux kernel maintainers to create a unified memory management module that is cheaper (lighter-weight) than GEM/TTM/DRM. This module should be reviewed carefully and be strong enough to replace any of the private memory managers mentioned above. It should be able to replace PMEM for Android (with respect to gralloc), and it could even be leveraged in the DRM framework (as a primitive memory allocation provider under GEM).
Anyway, such a module is necessary, because user-space applications cannot exchange enough information through a single virtual address (virtual address spaces are per-process). GStreamer, V4L and any other middleware could keep using a plain virtual address within the same process, but a global handle/ID is also necessary for sharing buffers between processes.
Furthermore, besides those well-known basic features, some advanced APIs should be provided so an application can map the same physical memory region into another process, with 1) fine-grained control of cacheable/bufferable attributes and a cache flush mechanism (for performance); 2) lock/unlock synchronization; 3) swap/migration ability (optional at the current stage, as those buffers are usually expected to stay in RAM for better performance).
Finally, and most importantly, THIS MODULE SHOULD BE PUSHED UPSTREAM (sorry, please ignore all the nonsense I wrote above if we can achieve this) so that everyone treats it as the de facto, well-supported memory management module. That way every company could transition from its current private design to this public one. And let's cheer for the end of this damn chaos!
FWIW, I don't know if a common memory management API is possible. On the GPU side we tried, but there ended up being too many weird hardware quirks from vendor to vendor (types of memory addressable, strange tiling formats, etc.). You might be able to come up with some kind of basic framework like TTM, but by the time you add the necessary quirks for various hw, it may be bigger than you want. That's why we have GEM and TTM and driver specific memory management ioctls in the drm.
Alex
Thanks, Lea
On Wed, Mar 16, 2011 at 12:47 AM, Alex Deucher alexdeucher@gmail.com wrote:
On Tue, Mar 15, 2011 at 12:07 PM, Robert Fekete robert.fekete@linaro.org wrote:
On 8 March 2011 20:23, Laurent Pinchart laurent.pinchart@ideasonboard.com wrote:
Hi Andy,
On Tuesday 08 March 2011 20:12:45 Andy Walls wrote:
On Tue, 2011-03-08 at 16:52 +0100, Laurent Pinchart wrote:
[snip]
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
The memory allocators and managers are ideally satisfying the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
In the embedded world, a very common use case is to capture video data from an ISP (V4L2+MC), process it in a DSP (V4L2+M2M, tidspbridge, ...) and display it on the GPU (OpenGL/ES). We need to be able to share a data buffer between the ISP and the DSP, and another buffer between the DSP and the GPU. If processing is not required, sharing a data buffer between the ISP and the GPU is required. Achieving zero-copy requires a single memory management solution used by the ISP, the DSP and the GPU.
Ah. I guess I misunderstood what was meant by "memory provider" to some extent.
So what I read is a common way of providing in kernel persistent buffers (buffer objects? buffer entities?) for drivers and userspace applications to pass around by reference (no copies). Userspace may or may not want to see the contents of the buffer objects.
Exactly. How that memory is allocated is irrelevant here, and we can have several different allocators as long as the buffer objects can be managed through a single API. That API will probably have to expose buffer properties related to allocation, in order for all components in the system to verify that the buffers are suitable for their needs, but the allocation process itself is irrelevant.
So I understand now why a single solution is desirable.
Exactly,
It is important to know that there are 3 topics of discussion which all are a separate topic of its own:
- The actual memory allocator
- In-kernel API
- Userland API
Explained:
- This is how you acquire the actual physical or virtual memory,
defrag, swap, etc. This can be enhanced by CMA, hotswap, memory regions or whatever and the main topic for a system wide memory allocator does not deal much with how this is done. 2. In-kernel API is important from a device driver point of view in order to resolve buffers, pin memory when used(enable defrag when unpinned) 3. Userland API deals with alloc/free, import/export(IPC), security, and set-domain capabilities among others and is meant to pass buffers between processes in userland and enable no-copy data paths.
We need to resolve 2. and 3.
GEM/TTM is mentioned in this thread and there is an overlap of what is happening within DRM/DRI/GEM/TTM/KMS and V4L2. The whole idea behind DRM is to have one device driver for everything (well at least 2D/3D, video codecs, display output/composition), while on a SoC all this is on several drivers/IP's. A V4L2 device cannot resolve a GEM handle. GEM only lives inside one DRM device (AFAIK). GEM is also mainly for "dedicated memory-less" graphics cards while TTM mainly targets advanced Graphics Card with dedicated memory. From a SoC point of view DRM looks very "fluffy" and not quite slimmed for an embedded device, and you cannot get GEM/TTM without bringing in all of DRM/DRI. KMS on the other hand is very attractive as a framebuffer device replacer. It is not an easy task to decide on a multimedia user interface for a SoC vendor.
Modern GPUs are basically an SoC: 3D engine, video decode, hdmi packet engines, audio, dma engine, display blocks, etc. with a shared memory controller. Also the AMD fusion and Intel moorestown SoCs are not too different from ARM-based SoCs and we are supporting them with the drm. I expect we'll see the x86 and ARM/MIPS based SoCs continue to get closer together.
What are you basing your "fluffy" statement on? We recently merged a set of patches from qualcomm to support platform devices in the drm and Dave added support for USB devices. Qualcomm also has an open source drm for their snapdragon GPUs (although the userspace driver is closed) and they are using that on their SoCs.
Uniting the frameworks within the kernel will likely fail(too big of a task) but a common system wide memory manager would for sure make life easier enabling the possibility to pass buffers between drivers(and user-land as well). In order for No-copy to work on a system level the general multimedia infrastructure in User-land (i.e. Gstreamer/X11/wayland/stagefright/flingers/etc) must also be aware of this memory manager and manage handles accordingly. This infrastructure in user-land puts the requirements on the User land API (1.).
You don't have to use GEM or TTM for as your memory manager for KMS or DRI, it's memory manager independent. That said, I don't really see why you couldn't use one of them for a central memory manager on an SoC; the sub drivers would just request buffers from the common memory manager. We are already working on support for sharing buffers between drm drivers for supporting hybrid laptops and crossfire (multi-gpu) type things. We already share buffers between multiple userspace acceleration drivers and the drm using the DRI protocol.
I know that STE and ARM has a vision to have a hwmem/ump alike API and that Linaro is one place to resolve this. As Jesse Barker mentioned earlier Linaro has work ongoing on this topic (https://wiki.linaro.org/WorkingGroups/Middleware/Graphics/Projects/UnifiedMe...) and a V4L2 brainstorming meeting in Warsaw will likely bring this up as well. And Gstreamer is also looking at this from a user-land point of view.
ARM, STE seems to agree on this, V4L2 maestros seems to agree, GStreamer as well(I believe), How about Samsung(vcm)? TI(cmem)? Freescale? DRI community? Linus?
FWIW, I have yet to see any v4l developers ever email the dri mailing list while discussing GEM, TTM, or the DRM, all the while conjecturing on aspects of it they admit to not fully understanding. For future reference, the address is: dri-devel@lists.freedesktop.org. We are happy to answer questions.
Alex
Jesse! any progress?
BR /Robert Fekete
Hi Alex,
On Wednesday 16 March 2011 17:09:45 Alex Deucher wrote:
On Wed, Mar 16, 2011 at 3:37 AM, Li Li eggonlea@gmail.com wrote:
Sorry but I feel the discussion is a bit off the point. We're not going to compare the pros and cons of current code (GEM/TTM, HWMEM, UMP, CMA, VCM, CMEM, PMEM, etc.)
The real problem is to find a suitable unified memory management module for various kinds of HW components (including CPU, VPU, GPU, camera, FB/OVL, etc.), especially for ARM based SOC. Some HW requires physical continuous big chunk of memory (e.g. some VPU & OVL); while others could live with DMA chain (e.g. some powerful GPU has built-in MMU).
So, what's current situation?
- As Hans mentioned, there're GEM & TTM in upstream kernel, under the
DRM framework (w/ KMS, etc.). This works fine on conventional (mostly Xorg-based) Linux distribution.
- But DRM (or GEM/TTM) is still too heavy and complex to some
embedded OS, which only want a cheaper memory management module. So...
2.1) Google uses PMEM in Android - However PMEM was removed from upstream kernel for well-known reasons;
2.2) Qualcomm writes a hybrid KGSL based DRM+PMEM solution - However KGSL was shamed in dri-devel list because their close user space binary.
2.3) ARM starts UMP/MaliDRM for both of Android and X11/DRI2 - This makes things even more complicated. (Therefore I personally think this is actually a shame for ARM to create another private SW. As a leader of Linaro, ARM should think more and coordinate with partners better to come up a unified solution to make our life easier.)
2.4) Other companies also have their own private solutions because nobody can get a STANDARD interface from upstream, including Marvell, TI, Freescale.
In general, it would be highly appreciated if Linaro guys could sit down together around a table, co-work with silicon vendors and upstream Linux kernel maintainers to make a unified (and cheaper than GEM/TTM/DRM) memory management module. This module should be reviewed carefully and strong enough to replace any other private memory manager mentioned above. It should replace PMEM for Android (with respect to Gralloc). And it could even be leveraged in DRM framework (as a primitive memory allocation provider under GEM).
Anyway, such a module is necessary, because user space application cannot exchange enough information by a single virtual address (among different per-process virtual address space). Gstreamer, V4L and any other middleware could remain using a single virtual address in the same process. But a global handler/ID is also necessary for sharing buffers between processes.
Furthermore, besides those well-known basic features, some advanced APIs should be provided for application to map the same physical memory region into another process, with 1) manageable fine CACHEable/BUFFERable attributes and cache flush mechanism (for performance); 2) lock/unlock synchronization; 3) swap/migration ability (optional in current stage, as those buffer are often expected to stay in RAM for better performance).
Finally, and the most important, THIS MODULE SHOULD BE PUSHED TO UPSTREAM (sorry, please ignore all the nonsense I wrote above if we can achieve this) so that everyone treat it as a de facto well supported memory management module. Thus all companies could transit from current private design to this public one. And, let's cheer for the end of this damn chaos!
FWIW, I don't know if a common memory management API is possible. On the GPU side we tried, but there ended up being too many weird hardware quirks from vendor to vendor (types of memory addressable, strange tiling formats, etc.). You might be able to come up with some kind of basic framework like TTM, but by the time you add the necessary quirks for various hw, it may be bigger than you want. That's why we have GEM and TTM and driver specific memory management ioctls in the drm.
I agree that we might not be able to use the same memory buffers for all devices, as they all have more or less complex requirements regarding the memory properties (type, alignment, ...). However, having a common API to pass buffers around between drivers and applications using a common ID would be highly interesting. I'm not sure how complex that would be, I might not have all the nasty small details in mind.
On Wed, Mar 16, 2011 at 1:49 PM, Laurent Pinchart laurent.pinchart@ideasonboard.com wrote:
Hi Alex,
On Wednesday 16 March 2011 17:09:45 Alex Deucher wrote:
On Wed, Mar 16, 2011 at 3:37 AM, Li Li eggonlea@gmail.com wrote:
Sorry but I feel the discussion is a bit off the point. We're not going to compare the pros and cons of current code (GEM/TTM, HWMEM, UMP, CMA, VCM, CMEM, PMEM, etc.)
The real problem is to find a suitable unified memory management module for various kinds of HW components (including CPU, VPU, GPU, camera, FB/OVL, etc.), especially for ARM based SOC. Some HW requires physical continuous big chunk of memory (e.g. some VPU & OVL); while others could live with DMA chain (e.g. some powerful GPU has built-in MMU).
So, what's current situation?
- As Hans mentioned, there're GEM & TTM in upstream kernel, under the
DRM framework (w/ KMS, etc.). This works fine on conventional (mostly Xorg-based) Linux distribution.
- But DRM (or GEM/TTM) is still too heavy and complex to some
embedded OS, which only want a cheaper memory management module. So...
2.1) Google uses PMEM in Android - However PMEM was removed from upstream kernel for well-known reasons;
2.2) Qualcomm writes a hybrid KGSL based DRM+PMEM solution - However KGSL was shamed in dri-devel list because their close user space binary.
2.3) ARM starts UMP/MaliDRM for both of Android and X11/DRI2 - This makes things even more complicated. (Therefore I personally think this is actually a shame for ARM to create another private SW. As a leader of Linaro, ARM should think more and coordinate with partners better to come up a unified solution to make our life easier.)
2.4) Other companies also have their own private solutions because nobody can get a STANDARD interface from upstream, including Marvell, TI, Freescale.
In general, it would be highly appreciated if Linaro guys could sit down together around a table, co-work with silicon vendors and upstream Linux kernel maintainers to make a unified (and cheaper than GEM/TTM/DRM) memory management module. This module should be reviewed carefully and strong enough to replace any other private memory manager mentioned above. It should replace PMEM for Android (with respect to Gralloc). And it could even be leveraged in DRM framework (as a primitive memory allocation provider under GEM).
Anyway, such a module is necessary, because user space application cannot exchange enough information by a single virtual address (among different per-process virtual address space). Gstreamer, V4L and any other middleware could remain using a single virtual address in the same process. But a global handler/ID is also necessary for sharing buffers between processes.
Furthermore, besides those well-known basic features, some advanced APIs should be provided for application to map the same physical memory region into another process, with 1) manageable fine CACHEable/BUFFERable attributes and cache flush mechanism (for performance); 2) lock/unlock synchronization; 3) swap/migration ability (optional in current stage, as those buffer are often expected to stay in RAM for better performance).
Finally, and the most important, THIS MODULE SHOULD BE PUSHED TO UPSTREAM (sorry, please ignore all the nonsense I wrote above if we can achieve this) so that everyone treat it as a de facto well supported memory management module. Thus all companies could transit from current private design to this public one. And, let's cheer for the end of this damn chaos!
FWIW, I don't know if a common memory management API is possible. On the GPU side we tried, but there ended up being too many weird hardware quirks from vendor to vendor (types of memory addressable, strange tiling formats, etc.). You might be able to come up with some kind of basic framework like TTM, but by the time you add the necessary quirks for various hw, it may be bigger than you want. That's why we have GEM and TTM and driver specific memory management ioctls in the drm.
I agree that we might not be able to use the same memory buffers for all devices, as they all have more or less complex requirements regarding the memory properties (type, alignment, ...). However, having a common API to pass buffers around between drivers and applications using a common ID would be highly interesting. I'm not sure how complex that would be, I might not have all the nasty small details in mind.
On the userspace side, we pass buffers around using the DRI protocol. Buffers are passed as handles, and the protocol is generic, however all of the relevant clients are GPU specific at this point. That may change as we work on support for sharing buffers between drivers for supporting things like hybrid laptops and multi-gpu rendering.
Alex
-- Regards,
Laurent Pinchart
Hi Alex,
On Tuesday 15 March 2011 17:47:47 Alex Deucher wrote:
[snip]
FWIW, I have yet to see any v4l developers ever email the dri mailing list while discussing GEM, TTM, or the DRM, all the while conjecturing on aspects of it they admit to not fully understanding. For future reference, the address is: dri-devel@lists.freedesktop.org. We are happy to answer questions.
Please don't see any malice there. Even though the topic has been on our table for quite some time now, we're only starting to actively work on it. The first step is to gather our requirements (this will likely be done this week, during the V4L2 brainstorming meeting in Warsaw). We will then of course contact DRM/DRI developers.
On Wed, Mar 16, 2011 at 4:52 AM, Laurent Pinchart laurent.pinchart@ideasonboard.com wrote:
Hi Alex,
On Tuesday 15 March 2011 17:47:47 Alex Deucher wrote:
[snip]
FWIW, I have yet to see any v4l developers ever email the dri mailing list while discussing GEM, TTM, or the DRM, all the while conjecturing on aspects of it they admit to not fully understanding. For future reference, the address is: dri-devel@lists.freedesktop.org. We are happy to answer questions.
Please don't see any malice there. Even though the topic has been on our table for quite some time now, we're only starting to actively work on it. The first step is to gather our requirements (this will likely be done this week, during the V4L2 brainstorming meeting in Warsaw). We will then of course contact DRM/DRI developers.
Sorry, it came out a little harsher than I wanted. I just want to avoid duplication of effort if possible.
Alex
Hi Alex,
On Wednesday 16 March 2011 17:00:03 Alex Deucher wrote:
On Wed, Mar 16, 2011 at 4:52 AM, Laurent Pinchart wrote:
On Tuesday 15 March 2011 17:47:47 Alex Deucher wrote:
[snip]
FWIW, I have yet to see any v4l developers ever email the dri mailing list while discussing GEM, TTM, or the DRM, all the while conjecturing on aspects of it they admit to not fully understanding. For future reference, the address is: dri-devel@lists.freedesktop.org. We are happy to answer questions.
Please don't see any malice there. Even though the topic has been on our table for quite some time now, we're only starting to actively work on it. The first step is to gather our requirements (this will likely be done this week, during the V4L2 brainstorming meeting in Warsaw). We will then of course contact DRM/DRI developers.
Sorry, it came out a little harsher than I wanted. I just want to avoid duplication of effort if possible.
No worries. I share your concerns about this. As long as everyone remains polite I have absolutely no issue with criticism :-)
On Tuesday 15 March 2011 17:07:10 Robert Fekete wrote:
On 8 March 2011 20:23, Laurent Pinchart wrote:
On Tuesday 08 March 2011 20:12:45 Andy Walls wrote:
On Tue, 2011-03-08 at 16:52 +0100, Laurent Pinchart wrote:
[snip]
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
The memory allocators and managers are ideally satisfying the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
In the embedded world, a very common use case is to capture video data from an ISP (V4L2+MC), process it in a DSP (V4L2+M2M, tidspbridge, ...) and display it on the GPU (OpenGL/ES). We need to be able to share a data buffer between the ISP and the DSP, and another buffer between the DSP and the GPU. If processing is not required, sharing a data buffer between the ISP and the GPU is required. Achieving zero-copy requires a single memory management solution used by the ISP, the DSP and the GPU.
Ah. I guess I misunderstood what was meant by "memory provider" to some extent.
So what I read is a common way of providing in kernel persistent buffers (buffer objects? buffer entities?) for drivers and userspace applications to pass around by reference (no copies). Userspace may or may not want to see the contents of the buffer objects.
Exactly. How that memory is allocated is irrelevant here, and we can have several different allocators as long as the buffer objects can be managed through a single API. That API will probably have to expose buffer properties related to allocation, in order for all components in the system to verify that the buffers are suitable for their needs, but the allocation process itself is irrelevant.
So I understand now why a single solution is desirable.
Exactly,
It is important to know that there are 3 topics of discussion which all are a separate topic of its own:
- The actual memory allocator
- In-kernel API
- Userland API
I think there's an agreement on this. Memory allocation and memory management must be separated, in order to have a single buffer management API working with several different memory providers. Given the wild creativity of hardware engineers, it's pretty much guaranteed that we'll see even more exotic memory allocation requirements in the future :-)
Explained:
- This is how you acquire the actual physical or virtual memory,
defrag, swap, etc. This can be enhanced by CMA, hotswap, memory regions or whatever and the main topic for a system wide memory allocator does not deal much with how this is done. 2. In-kernel API is important from a device driver point of view in order to resolve buffers, pin memory when used(enable defrag when unpinned) 3. Userland API deals with alloc/free, import/export(IPC), security, and set-domain capabilities among others and is meant to pass buffers between processes in userland and enable no-copy data paths.
We need to resolve 2. and 3.
GEM/TTM is mentioned in this thread and there is an overlap of what is happening within DRM/DRI/GEM/TTM/KMS and V4L2. The whole idea behind DRM is to have one device driver for everything (well at least 2D/3D, video codecs, display output/composition), while on a SoC all this is on several drivers/IP's. A V4L2 device cannot resolve a GEM handle. GEM only lives inside one DRM device (AFAIK). GEM is also mainly for "dedicated memory-less" graphics cards while TTM mainly targets advanced Graphics Card with dedicated memory. From a SoC point of view DRM looks very "fluffy" and not quite slimmed for an embedded device, and you cannot get GEM/TTM without bringing in all of DRM/DRI. KMS on the other hand is very attractive as a framebuffer device replacer. It is not an easy task to decide on a multimedia user interface for a SoC vendor.
Uniting the frameworks within the kernel will likely fail(too big of a task) but a common system wide memory manager would for sure make life easier enabling the possibility to pass buffers between drivers(and user-land as well). In order for No-copy to work on a system level the general multimedia infrastructure in User-land (i.e. Gstreamer/X11/wayland/stagefright/flingers/etc) must also be aware of this memory manager and manage handles accordingly. This infrastructure in user-land puts the requirements on the User land API (1.).
I know that STE and ARM have a vision of an hwmem/UMP-like API, and that Linaro is one place to resolve this. As Jesse Barker mentioned earlier, Linaro has work ongoing on this topic (https://wiki.linaro.org/WorkingGroups/Middleware/Graphics/Projects/UnifiedMemoryManagement) and a V4L2 brainstorming meeting in Warsaw will likely bring this up as well. And GStreamer is also looking at this from a user-land point of view.
I had a look at HWMEM yesterday. The API seems to go more or less in the right direction, but the allocator and memory managers are tightly integrated, so we'll need to solve that.
ARM, STE seems to agree on this, V4L2 maestros seems to agree, GStreamer as well(I believe), How about Samsung(vcm)? TI(cmem)? Freescale? DRI community? Linus?
I've asked TI who is responsible for CMEM, I'm waiting for an answer.
Jesse! any progress?
Hi all,
I should probably have mentioned earlier that I am planning a session for Linaro @ UDS for this in May.
cheers, Jesse
On Wed, Mar 16, 2011 at 1:49 AM, Laurent Pinchart < laurent.pinchart@ideasonboard.com> wrote:
On Tuesday 15 March 2011 17:07:10 Robert Fekete wrote:
On 8 March 2011 20:23, Laurent Pinchart wrote:
On Tuesday 08 March 2011 20:12:45 Andy Walls wrote:
On Tue, 2011-03-08 at 16:52 +0100, Laurent Pinchart wrote:
[snip]
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
The memory allocators and managers are ideally satisfying the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
In the embedded world, a very common use case is to capture video data from an ISP (V4L2+MC), process it in a DSP (V4L2+M2M, tidspbridge, ...) and display it on the GPU (OpenGL/ES). We need to be able to share a data buffer between the ISP and the DSP, and another buffer between the DSP and the GPU. If processing is not required, sharing a data buffer between the ISP and the GPU is required. Achieving zero-copy requires a single memory management solution used by the ISP, the DSP and the GPU.
Ah. I guess I misunderstood what was meant by "memory provider" to some extent.
So what I read is a common way of providing in kernel persistent buffers (buffer objects? buffer entities?) for drivers and userspace applications to pass around by reference (no copies). Userspace may or may not want to see the contents of the buffer objects.
Exactly. How that memory is allocated is irrelevant here, and we can have several different allocators as long as the buffer objects can be managed through a single API. That API will probably have to expose buffer properties related to allocation, in order for all components in the system to verify that the buffers are suitable for their needs, but the allocation process itself is irrelevant.
So I understand now why a single solution is desirable.
Exactly,
It is important to know that there are 3 topics of discussion which all are a separate topic of its own:
- The actual memory allocator
- In-kernel API
- Userland API
I think there's an agreement on this. Memory allocation and memory management must be separated, in order to have a single buffer management API working with several different memory providers. Given the wild creativity of hardware engineers, it's pretty much guaranteed that we'll see even more exotic memory allocation requirements in the future :-)
Explained:
- This is how you acquire the actual physical or virtual memory,
defrag, swap, etc. This can be enhanced by CMA, hotswap, memory regions or whatever and the main topic for a system wide memory allocator does not deal much with how this is done. 2. In-kernel API is important from a device driver point of view in order to resolve buffers, pin memory when used(enable defrag when unpinned) 3. Userland API deals with alloc/free, import/export(IPC), security, and set-domain capabilities among others and is meant to pass buffers between processes in userland and enable no-copy data paths.
We need to resolve 2. and 3.
GEM/TTM is mentioned in this thread and there is an overlap of what is happening within DRM/DRI/GEM/TTM/KMS and V4L2. The whole idea behind DRM is to have one device driver for everything (well at least 2D/3D, video codecs, display output/composition), while on a SoC all this is on several drivers/IP's. A V4L2 device cannot resolve a GEM handle. GEM only lives inside one DRM device (AFAIK). GEM is also mainly for "dedicated memory-less" graphics cards while TTM mainly targets advanced Graphics Card with dedicated memory. From a SoC point of view DRM looks very "fluffy" and not quite slimmed for an embedded device, and you cannot get GEM/TTM without bringing in all of DRM/DRI. KMS on the other hand is very attractive as a framebuffer device replacer. It is not an easy task to decide on a multimedia user interface for a SoC vendor.
Uniting the frameworks within the kernel will likely fail(too big of a task) but a common system wide memory manager would for sure make life easier enabling the possibility to pass buffers between drivers(and user-land as well). In order for No-copy to work on a system level the general multimedia infrastructure in User-land (i.e. Gstreamer/X11/wayland/stagefright/flingers/etc) must also be aware of this memory manager and manage handles accordingly. This infrastructure in user-land puts the requirements on the User land API (1.).
I know that STE and ARM have a vision of an hwmem/UMP-like API, and that Linaro is one place to resolve this. As Jesse Barker mentioned earlier, Linaro has work ongoing on this topic (https://wiki.linaro.org/WorkingGroups/Middleware/Graphics/Projects/UnifiedMemoryManagement) and a V4L2 brainstorming meeting in Warsaw will likely bring this up as well. And GStreamer is also looking at this from a user-land point of view.
I've had a look at HWMEM yesterday. The API seems to go more or less in the right direction, but the allocator and memory managers are tightly integrated, so we'll need to solve that.
ARM, STE seems to agree on this, V4L2 maestros seems to agree, GStreamer as well(I believe), How about Samsung(vcm)? TI(cmem)? Freescale? DRI community? Linus?
I've asked TI who is responsible for CMEM, I'm waiting for an answer.
Jesse! any progress?
-- Regards,
Laurent Pinchart
Hi all,
Here's what I've cobbled together tentatively from prior threads involving linaro-dev as well as folks from ARM, Samsung and ST-E:
https://wiki.linaro.org/WorkingGroups/Middleware/Graphics/Projects/UnifiedMe...
The current goals within the graphics working group are to map these API requirements to extant allocator functionality; in particular, we are looking at the current UMP drop with respect to TTM (it's what we have immediately to hand, but would be happy doing similar exercises with HWMEM, etc.). From there we can work out what needs to be done to add appropriate support to TTM (or another allocator).
Please let me know if I've missed anything.
cheers, Jesse
On Tue, Mar 8, 2011 at 6:01 AM, Andy Walls awalls@md.metrocast.net wrote:
On Tue, 2011-03-08 at 09:13 +0100, Hans Verkuil wrote:
Hi all,
We had a discussion yesterday regarding ways in which linaro can assist V4L2 development. One topic was that of sorting out memory providers like GEM and HWMEM.
Today I learned of yet another one: UMP from ARM.
http://blogs.arm.com/multimedia/249-making-the-mali-gpu-device-driver-open-s...
This is getting out of hand. I think that organizing a meeting to solve this mess should be on the top of the list. Companies keep on solving the same problem time and again and since none of it enters the mainline kernel any driver using it is also impossible to upstream.
All these memory-related modules have the same purpose: make it possible to allocate/reserve large amounts of memory and share it between different subsystems (primarily framebuffer, GPU and V4L).
I'm not sure that's the entire story regarding what the current allocators for GPU do. TTM and GEM create in kernel objects that can be passed between applications. TTM apparently has handling for VRAM (video RAM). GEM uses anonymous userspace memory that can be swapped out.
TTM: http://lwn.net/Articles/257417/ http://www.x.org/wiki/ttm
http://nouveau.freedesktop.org/wiki/TTMMemoryManager?action=AttachFile&d...
http://nouveau.freedesktop.org/wiki/TTMMemoryManager?action=AttachFile&d...
GEM: http://lwn.net/Articles/283798/
GEM vs. TTM: http://lwn.net/Articles/283793/
The current TTM and GEM allocators appear to have API and buffer processing and management functions tied in with memory allocation.
TTM has fences for event notification of buffer processing completion. (maybe something v4l2 can do with v4l2_events?)
GEM tries to avoid mapping buffers to userspace. (Sounds like the V4L2 mem-to-mem API?)
Thanks to the good work of developers on the LMML in the past year or two, V4L2 has separated out some of that functionality from video buffer allocation:
- video buffer queue management and userspace access (videobuf2)
- memory-to-memory buffer transformation/movement (m2m)
- event notification (VIDIOC_SUBSCRIBE_EVENT)
http://lwn.net/Articles/389081/ http://lwn.net/Articles/420512/
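For reference, the event mechanism mentioned above is already usable from user space roughly as follows; this uses the standard V4L2 event ioctls, and the choice of the VSYNC event is only an example:

#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int wait_for_vsync(int fd)
{
        struct v4l2_event_subscription sub;
        struct v4l2_event ev;
        struct pollfd pfd = { .fd = fd, .events = POLLPRI };

        memset(&sub, 0, sizeof(sub));
        sub.type = V4L2_EVENT_VSYNC;
        if (ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub) < 0)
                return -1;

        /* Events are signalled as POLLPRI; dequeue once one is pending. */
        if (poll(&pfd, 1, -1) < 0)
                return -1;
        if (ioctl(fd, VIDIOC_DQEVENT, &ev) < 0)
                return -1;

        return (int)ev.sequence;
}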
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
The memory allocators and managers are ideally satisfying the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
I am currently aware of the following solutions floating around the net that all solve different parts of the problem:
In the kernel: GEM and TTM. Out-of-tree: HWMEM, UMP, CMA, VCM, CMEM, PMEM.
Prior to a meeting one would probably want to capture for each allocator:
- What are the attributes of the memory allocated by this allocator?
- For what domain was this allocator designed: GPU, video capture, video decoder, etc.?
- How are applications expected to use objects from this allocator?
- What are the estimated sizes and lifetimes of objects that would be allocated by this allocator?
- Beyond memory allocation, what other functionality is built into this allocator: buffer queue management, event notification, etc.?
- Of the requirements that this allocator satisfies, what are the performance-critical requirements?
Maybe there are better questions to ask.
Regards, Andy
linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
On Tuesday, March 08, 2011 15:01:10 Andy Walls wrote:
On Tue, 2011-03-08 at 09:13 +0100, Hans Verkuil wrote:
Hi all,
We had a discussion yesterday regarding ways in which linaro can assist V4L2 development. One topic was that of sorting out memory providers like GEM and HWMEM.
Today I learned of yet another one: UMP from ARM.
http://blogs.arm.com/multimedia/249-making-the-mali-gpu-device-driver-open-s...
This is getting out of hand. I think that organizing a meeting to solve this mess should be on the top of the list. Companies keep on solving the same problem time and again and since none of it enters the mainline kernel any driver using it is also impossible to upstream.
All these memory-related modules have the same purpose: make it possible to allocate/reserve large amounts of memory and share it between different subsystems (primarily framebuffer, GPU and V4L).
I'm not sure that's the entire story regarding what the current allocators for GPU do. TTM and GEM create in kernel objects that can be passed between applications. TTM apparently has handling for VRAM (video RAM). GEM uses anonymous userspace memory that can be swapped out.
TTM: http://lwn.net/Articles/257417/ http://www.x.org/wiki/ttm http://nouveau.freedesktop.org/wiki/TTMMemoryManager?action=AttachFile&d... http://nouveau.freedesktop.org/wiki/TTMMemoryManager?action=AttachFile&d...
GEM: http://lwn.net/Articles/283798/
GEM vs. TTM: http://lwn.net/Articles/283793/
The current TTM and GEM allocators appear to have API and buffer processing and management functions tied in with memory allocation.
TTM has fences for event notification of buffer processing completion. (maybe something v4l2 can do with v4l2_events?)
GEM tries avoid mapping buffers to userspace. (sounds like the v4l2 mem to mem API?)
Thanks to the good work of developers on the LMML in the past year or two, V4L2 has separated out some of that functionality from video buffer allocation:
- video buffer queue management and userspace access (videobuf2)
- memory-to-memory buffer transformation/movement (m2m)
- event notification (VIDIOC_SUBSCRIBE_EVENT)
http://lwn.net/Articles/389081/ http://lwn.net/Articles/420512/
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
Actually, I think we really need one solution. I don't see how you can have it any other way without requiring e.g. gstreamer to support multiple 'solutions'.
The memory allocators and managers are ideally satisfying the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
We discussed this before at the V4L2 brainstorm meeting in Oslo. The idea was to have opaque buffer IDs (more like cookies) that both kernel and userspace can use. You have some standard operations you can do with that and tied to the buffer is the knowledge (probably a set of function pointers in practice) of how to do each operation. So a buffer referring to video memory might have different code to e.g. obtain the virtual address compared to a buffer residing in regular memory.
This way you would hide all these details while still having enough flexibility. It also allows you to support 'hidden' memory. It is my understanding that on some platforms (particularly those used for set-top boxes) the video buffers are in memory that is not accessible from the CPU (rights management related). But apparently you still have to be able to refer to it. I may be wrong, it's something I was told.
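A rough sketch of that cookie-plus-operations idea, with entirely made-up names (this is not an existing kernel structure), might look like:

#include <linux/types.h>

struct mem_buffer;

struct mem_buffer_ops {
        /* May return NULL for memory the CPU is not allowed to touch. */
        void       *(*get_vaddr)(struct mem_buffer *buf);
        dma_addr_t  (*get_dma_addr)(struct mem_buffer *buf);
        int         (*sync_for_cpu)(struct mem_buffer *buf);
        int         (*sync_for_device)(struct mem_buffer *buf);
};

struct mem_buffer {
        u32                             cookie; /* opaque ID shared with userspace */
        size_t                          size;
        const struct mem_buffer_ops     *ops;   /* filled in by the allocator      */
        void                            *priv;  /* allocator private data          */
};

A buffer backed by ordinary system RAM and one backed by device-private video memory would simply carry different ops, while users of the cookie never need to know the difference.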
I am currently aware of the following solutions floating around the net that all solve different parts of the problem:
In the kernel: GEM and TTM. Out-of-tree: HWMEM, UMP, CMA, VCM, CMEM, PMEM.
Prior to a meeting one would probably want to capture for each allocator:
- What are the attributes of the memory allocated by this allocator?
- For what domain was this allocator designed: GPU, video capture, video decoder, etc.?
- How are applications expected to use objects from this allocator?
- What are the estimated sizes and lifetimes of objects that would be allocated by this allocator?
- Beyond memory allocation, what other functionality is built into this allocator: buffer queue management, event notification, etc.?
- Of the requirements that this allocator satisfies, what are the performance-critical requirements?
Maybe there are better questions to ask.
It's a big topic with many competing and overlapping solutions. That really needs to change.
Regards,
Hans
Hi Hans,
On Tue, 2011-03-08 at 18:23 +0100, Hans Verkuil wrote:
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
Actually, I think we really need one solution. I don't see how you can have it any other way without requiring e.g. gstreamer to support multiple 'solutions'.
Thanks. Laurent's explanation sorted that out for me.
The memory allocators and managers are ideally satisfying the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
We discussed this before at the V4L2 brainstorm meeting in Oslo. The idea was to have opaque buffer IDs (more like cookies) that both kernel and userspace can use.
Sounds like System V Shared Memory IPC. It may be worth looking at the issues one can get with SYS V Shared Memory: obtaining the resource identifier, exhaustion of global resources, etc.
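For comparison, the System V pattern being referred to looks like this in user space: a well-known key resolves to a global segment ID that any process can attach. The file path below is only an example and error handling is omitted:

#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
        /* Both processes derive the same key from an agreed-upon, existing file. */
        key_t key = ftok("/tmp/example-key-file", 'V');
        int id = shmget(key, 1 << 20, IPC_CREAT | 0600);  /* global segment ID */
        void *addr = shmat(id, NULL, 0);                   /* per-process vaddr */

        /* ...use addr...; another process repeats shmget() + shmat() with the
         * same key to see the same memory. */
        shmdt(addr);
        return 0;
}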
You have some standard operations you can do with that and tied to the buffer is the knowledge (probably a set of function pointers in practice) of how to do each operation. So a buffer referring to video memory might have different code to e.g. obtain the virtual address compared to a buffer residing in regular memory.
That is interesting.
This way you would hide all these details while still have enough flexibility. It also allows you to support 'hidden' memory. It is my understanding that on some platforms (particular those used for set-top boxes) the video buffers are on memory that is not accessible from the CPU (rights management related). But apparently you still have to be able to refer to it.
I can see that's something one would need to do with key material stored inside any decent cryptographic engine (key material should not be extractable from the engine, ever). I guess it's needed for the video ciphertext and video plaintext in STB DRM to impede someone with physical access to the device from doing differential analysis on the buffers to extract the key.
I may be wrong, it's something I was told.
I am currently aware of the following solutions floating around the net that all solve different parts of the problem:
In the kernel: GEM and TTM. Out-of-tree: HWMEM, UMP, CMA, VCM, CMEM, PMEM.
Prior to a meeting one would probably want to capture for each allocator:
- What are the attributes of the memory allocated by this allocator?
- For what domain was this allocator designed: GPU, video capture, video decoder, etc.
- How are applications expected to use objects from this allocator?
- What are the estimated sizes and lifetimes of objects that would be allocated by this allocator?
- Beyond memory allocation, what other functionality is built into this allocator: buffer queue management, event notification, etc.?
- Of the requirements that this allocator satisfies, what are the performance-critical requirements?
Maybe there are better questions to ask.
It's a big topic with many competing and overlapping solutions. That really needs to change.
It also seems that the existing providers have different objectives.
From what I read, GEM could swap out buffers under system low memory
conditions, so the system still runs at the expense of video performance.
IIRC, TTM locks pages into memory.
With the per-buffer-type operations you mentioned, I guess the requirements that drive those sorts of conflicting design decisions can be satisfied by one mechanism?
Regards, Andy
On Tue, Mar 8, 2011 at 12:23 PM, Hans Verkuil hverkuil@xs4all.nl wrote:
On Tuesday, March 08, 2011 15:01:10 Andy Walls wrote:
On Tue, 2011-03-08 at 09:13 +0100, Hans Verkuil wrote:
Hi all,
We had a discussion yesterday regarding ways in which linaro can assist V4L2 development. One topic was that of sorting out memory providers like GEM and HWMEM.
Today I learned of yet another one: UMP from ARM.
http://blogs.arm.com/multimedia/249-making-the-mali-gpu-device-driver-open-s...
This is getting out of hand. I think that organizing a meeting to solve this mess should be on the top of the list. Companies keep on solving the same problem time and again and since none of it enters the mainline kernel any driver using it is also impossible to upstream.
All these memory-related modules have the same purpose: make it possible to allocate/reserve large amounts of memory and share it between different subsystems (primarily framebuffer, GPU and V4L).
I'm not sure that's the entire story regarding what the current allocators for GPU do. TTM and GEM create in-kernel objects that can be passed between applications. TTM apparently has handling for VRAM (video RAM). GEM uses anonymous userspace memory that can be swapped out.
TTM: http://lwn.net/Articles/257417/ http://www.x.org/wiki/ttm http://nouveau.freedesktop.org/wiki/TTMMemoryManager?action=AttachFile&d... http://nouveau.freedesktop.org/wiki/TTMMemoryManager?action=AttachFile&d...
GEM: http://lwn.net/Articles/283798/
GEM vs. TTM: http://lwn.net/Articles/283793/
The current TTM and GEM allocators appear to have API, buffer-processing, and buffer-management functions tied in with memory allocation.
TTM has fences for event notification of buffer processing completion. (maybe something v4l2 can do with v4l2_events?)
GEM tries to avoid mapping buffers to userspace. (sounds like the v4l2 mem to mem API?)
Thanks to the good work of developers on the LMML in the past year or two, V4L2 has separated out some of that functionality from video buffer allocation:
- video buffer queue management and userspace access (videobuf2)
- memory-to-memory buffer transformation/movement (m2m)
- event notification (VIDIOC_SUBSCRIBE_EVENT)
http://lwn.net/Articles/389081/ http://lwn.net/Articles/420512/
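(For illustration, subscribing to such events from userspace looks roughly like the sketch below; the event type is just an example and error handling is omitted.)

    /* Sketch: subscribe to and dequeue a V4L2 event. */
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    static void wait_for_event(int fd)   /* fd: an open V4L2 device node */
    {
        struct v4l2_event_subscription sub;
        struct v4l2_event ev;

        memset(&sub, 0, sizeof(sub));
        sub.type = V4L2_EVENT_EOS;               /* example event type */
        ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub);

        /* Events are reported as exceptions, so real code would poll() for
         * POLLPRI first; then dequeue the pending event. */
        ioctl(fd, VIDIOC_DQEVENT, &ev);
    }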
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
Actually, I think we really need one solution. I don't see how you can have it any other way without requiring e.g. gstreamer to support multiple 'solutions'.
The memory allocators and managers ideally satisfy the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
We discussed this before at the V4L2 brainstorm meeting in Oslo. The idea was to have opaque buffer IDs (more like cookies) that both kernel and userspace can use. You have some standard operations you can do with that, and tied to the buffer is the knowledge (probably a set of function pointers in practice) of how to do each operation. So a buffer referring to video memory might have different code to e.g. obtain the virtual address compared to a buffer residing in regular memory.
This way you would hide all these details while still having enough flexibility. It also allows you to support 'hidden' memory. It is my understanding that on some platforms (particularly those used for set-top boxes) the video buffers are in memory that is not accessible from the CPU (rights management related). But apparently you still have to be able to refer to it. I may be wrong; it's something I was told.
A related example is vram on GPUs. Often, the CPU can only mmap the region of vram that is covered by the PCI framebuffer BAR, but the GPU can access the entire vram pool. As such, in order to access the buffer using the CPU, you either have to migrate it to a mappable region of vram using the GPU (using a dma engine or a blit), or migrate the buffer to another memory pool (such as gart memory - system memory that is remapped into a linear aperture on the GPU).
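(In pseudo-driver terms, that CPU-access path might look like the sketch below; the function and domain names are made up for illustration, this is not actual DRM/TTM code.)

    /* Hypothetical sketch of "migrate before CPU access"; not real driver code. */
    int prepare_cpu_access(struct gpu_buffer *bo)
    {
        if (bo->domain == DOMAIN_VRAM && !in_cpu_visible_vram(bo)) {
            /* Either blit/DMA the buffer into the part of vram covered by
             * the PCI BAR, or evict it to GART-mapped system memory. */
            return migrate_buffer(bo, DOMAIN_GART);
        }
        return 0;   /* already CPU-mappable where it is */
    }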
Alex
I am currently aware of the following solutions floating around the net that all solve different parts of the problem:
In the kernel: GEM and TTM. Out-of-tree: HWMEM, UMP, CMA, VCM, CMEM, PMEM.
Prior to a meeting one would probably want to capture for each allocator:
- What are the attributes of the memory allocated by this allocator?
- For what domain was this allocator designed: GPU, video capture, video decoder, etc.
- How are applications expected to use objects from this allocator?
- What are the estimated sizes and lifetimes of objects that would be allocated by this allocator?
- Beyond memory allocation, what other functionality is built into this allocator: buffer queue management, event notification, etc.?
- Of the requirements that this allocator satisfies, what are the performance-critical requirements?
Maybe there are better questions to ask.
It's a big topic with many competing and overlapping solutions. That really needs to change.
Regards,
Hans
-- Hans Verkuil - video4linux developer - sponsored by Cisco
On Tue, Mar 8, 2011 at 9:01 AM, Andy Walls awalls@md.metrocast.net wrote:
On Tue, 2011-03-08 at 09:13 +0100, Hans Verkuil wrote:
Hi all,
We had a discussion yesterday regarding ways in which linaro can assist V4L2 development. One topic was that of sorting out memory providers like GEM and HWMEM.
Today I learned of yet another one: UMP from ARM.
http://blogs.arm.com/multimedia/249-making-the-mali-gpu-device-driver-open-s...
This is getting out of hand. I think that organizing a meeting to solve this mess should be on the top of the list. Companies keep on solving the same problem time and again and since none of it enters the mainline kernel any driver using it is also impossible to upstream.
All these memory-related modules have the same purpose: make it possible to allocate/reserve large amounts of memory and share it between different subsystems (primarily framebuffer, GPU and V4L).
I'm not sure that's the entire story regarding what the current allocators for GPU do. TTM and GEM create in-kernel objects that can be passed between applications. TTM apparently has handling for VRAM (video RAM). GEM uses anonymous userspace memory that can be swapped out.
TTM can handle pretty much any "type" of memory; it's just a basic memory manager. You specify things like cacheability attributes when you set up the pools.
Generally on GPUs we see 3 types of buffers:
1. Video ram connected to the GPU. Often some or all of that pool is not accessible by the CPU. The GPU usually provides a mechanism to migrate the buffer to a pool or region that is accessible to the CPU. Vram buffers are usually mapped uncached write-combined.
2. GART/GTT (Graphics Address Remapping Table) memory. This is DMAable system memory that is mapped into the GPU's address space and remapped to look linear to the GPU. It can either be cached or uncached pages depending on the GPU's capabilities and what the buffers are used for.
3. Unpinned system pages. Depending on the GPU, they have to be copied to DMAable memory before the GPU can access them.
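(A descriptor for those pools might be modelled as below; this is an invented illustration, not TTM's actual structures.)

    /* Invented illustration of the three buffer/pool types listed above. */
    enum gpu_mem_pool {
        POOL_VRAM,      /* on-card memory; may be partly or wholly CPU-invisible */
        POOL_GART,      /* DMAable system pages remapped to look linear to the GPU */
        POOL_SYSTEM,    /* unpinned pages; may need copying before GPU access */
    };

    struct gpu_pool_caps {
        enum gpu_mem_pool pool;
        bool cpu_mappable;      /* can the CPU mmap buffers placed here? */
        bool cached;            /* cached vs. uncached/write-combined mappings */
    };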
The DRI protocol (used for communication between GPU acceleration drivers) doesn't really care what the underlying memory manager is. It just passes around handles.
Alex
TTM: http://lwn.net/Articles/257417/ http://www.x.org/wiki/ttm http://nouveau.freedesktop.org/wiki/TTMMemoryManager?action=AttachFile&d... http://nouveau.freedesktop.org/wiki/TTMMemoryManager?action=AttachFile&d...
GEM: http://lwn.net/Articles/283798/
GEM vs. TTM: http://lwn.net/Articles/283793/
The current TTM and GEM allocators appear to have API, buffer-processing, and buffer-management functions tied in with memory allocation.
TTM has fences for event notification of buffer processing completion. (maybe something v4l2 can do with v4l2_events?)
GEM tries to avoid mapping buffers to userspace. (sounds like the v4l2 mem to mem API?)
Thanks to the good work of developers on the LMML in the past year or two, V4L2 has separated out some of that functionality from video buffer allocation:
- video buffer queue management and userspace access (videobuf2)
- memory-to-memory buffer transformation/movement (m2m)
- event notification (VIDIOC_SUBSCRIBE_EVENT)
http://lwn.net/Articles/389081/ http://lwn.net/Articles/420512/
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
"Single" might be making the problem impossibly hard to solve well. One-size-fits-all solutions have a tendency to fall short on meeting someone's critical requirement. I will agree that "less than n", for some small n, is certainly desirable.
The memory allocators and managers ideally satisfy the requirements imposed by device hardware, what userspace applications are expected to do with the buffers, and system performance. (And maybe the platform architecture, I/O bus, and dedicated video memory?)
I am currently aware of the following solutions floating around the net that all solve different parts of the problem:
In the kernel: GEM and TTM. Out-of-tree: HWMEM, UMP, CMA, VCM, CMEM, PMEM.
Prior to a meeting one would probably want to capture for each allocator:
- What are the attributes of the memory allocated by this allocator?
- For what domain was this allocator designed: GPU, video capture, video decoder, etc.
- How are applications expected to use objects from this allocator?
- What are the estimated sizes and lifetimes of objects that would be allocated by this allocator?
- Beyond memory allocation, what other functionality is built into this allocator: buffer queue management, event notification, etc.?
- Of the requirements that this allocator satisfies, what are the performance-critical requirements?
Maybe there are better questions to ask.
Regards, Andy
On Tue, 8 Mar 2011 09:13:59 +0100 Hans Verkuil hverkuil@xs4all.nl wrote:
All these memory-related modules have the same purpose: make it possible to allocate/reserve large amounts of memory and share it between different subsystems (primarily framebuffer, GPU and V4L).
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
There is a memory management summit at the LF Collaboration Summit next month. Perhaps this would be a good topic to raise there? I've added Hugh to the Cc list in case he has any thoughts on the matter - and besides, he doesn't have enough to do...:)
jon
On Tue, Mar 8, 2011 at 10:04 AM, Jonathan Corbet corbet@lwn.net wrote:
On Tue, 8 Mar 2011 09:13:59 +0100 Hans Verkuil hverkuil@xs4all.nl wrote:
All these memory-related modules have the same purpose: make it possible to allocate/reserve large amounts of memory and share it between different subsystems (primarily framebuffer, GPU and V4L).
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
There is a memory management summit at the LF Collaboration Summit next month. Perhaps this would be a good topic to raise there? I've added Hugh to the Cc list in case he has any thoughts on the matter - and besides, he doesn't have enough to do...:)
It's a good suggestion, Jon, thank you. But I don't see that any of the prime movers in this area have applied to come to LSF/MM this year: except for Kamezawa-san, who is coming (but I expect will be focussing on other issues). And now we're full up.
Let me keep it in mind when drawing up the agenda; but I doubt this will be the forum to get such a ball rolling this year.
Hugh
Hello,
On Tuesday, March 08, 2011 9:14 AM Hans Verkuil wrote:
We had a discussion yesterday regarding ways in which linaro can assist V4L2 development. One topic was that of sorting out memory providers like GEM and HWMEM.
Today I learned of yet another one: UMP from ARM.
http://blogs.arm.com/multimedia/249-making-the-mali-gpu-device-driver-open-source/page__cid__133__show__newcomment/
I really wonder what the opinion of the ARM Linux maintainer is on this memory allocator. Russell - could you comment on it? Is this the preferred memory provider/allocator on the ARM Linux platform? What about the still-to-be-resolved issues with mapping memory regions for DMA transfers and different cache settings for each mapping?
This is getting out of hand. I think that organizing a meeting to solve this mess should be on the top of the list. Companies keep on solving the same problem time and again and since none of it enters the mainline kernel any driver using it is also impossible to upstream.
All these memory-related modules have the same purpose: make it possible to allocate/reserve large amounts of memory and share it between different subsystems (primarily framebuffer, GPU and V4L).
It really shouldn't be that hard to get everyone involved together and settle on a single solution (either based on an existing proposal or create a 'the best of' vendor-neutral solution).
I am currently aware of the following solutions floating around the net that all solve different parts of the problem:
In the kernel: GEM and TTM. Out-of-tree: HWMEM, UMP, CMA, VCM, CMEM, PMEM.
I'm sure that last list is incomplete.
Best regards -- Marek Szyprowski Samsung Poland R&D Center
On Thu, Mar 10, 2011 at 03:14:11PM +0100, Marek Szyprowski wrote:
Hello,
On Tuesday, March 08, 2011 9:14 AM Hans Verkuil wrote:
We had a discussion yesterday regarding ways in which linaro can assist V4L2 development. One topic was that of sorting out memory providers like GEM and HWMEM.
Today I learned of yet another one: UMP from ARM.
http://blogs.arm.com/multimedia/249-making-the-mali-gpu-device-driver-open-source/page__cid__133__show__newcomment/
I really wonder what the opinion of the ARM Linux maintainer is on this memory allocator. Russell - could you comment on it?
First I've heard about it. I'll have to do some reading first, but I'm rather busy at the present time.
As far as DMA memory allocation goes, I do have that patch lying around which preallocates the DMA coherent and writecombine memory, but in spite of sending it to the mailing list, there was very little in the way of feedback.
Someone was going to go through the various platforms and work out which could be reduced down to 1MB coherent/1MB writecombine, but I never saw any follow-up to that.
I've been debating just throwing it in the kernel for this coming merge window anyway - I suspect most people just don't care how DMA memory is provided, so long as it works and works reliably.
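(For context, the driver-facing side of that is just the standard DMA API; a minimal sketch, error handling trimmed. The assumption here is that a preallocation change only affects where this memory comes from, not how drivers ask for it.)

    /* Sketch: how a driver typically obtains DMA coherent memory. */
    #include <linux/dma-mapping.h>

    static int alloc_example(struct device *dev)
    {
        dma_addr_t dma_handle;
        void *cpu_addr;

        cpu_addr = dma_alloc_coherent(dev, 1024 * 1024, &dma_handle, GFP_KERNEL);
        if (!cpu_addr)
            return -ENOMEM;
        /* ... hand dma_handle to the device, access the buffer via cpu_addr ... */
        dma_free_coherent(dev, 1024 * 1024, cpu_addr, dma_handle);
        return 0;
    }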