Hi!
I just want to clarify some buffer object operations and terminology that seem to confuse people and that are used by most modern GPU drivers.
I think it's useful to be aware of this, going forward in the memory manager discussions.
Terminology:
Scanout buffer: Buffer that is used for continuous access by a device. Needs to be permanently pinned.
Pinned buffer: A pinned buffer may not move and may not change backing pages. Allows it to be mapped to a device.
Synchronization object: An object that is either in a signaled or non-signaled state. Signaled means that the device is done with the buffer, and has flushed its caches. A synchronization object has a device-specific part that may, for example, contain flushing state.
Basic device use of a buffer:
Scanout buffers (and perhaps also capture buffers?) are typically pinned.
Other buffers that are temporarily used by a GPU and, for example, a video decoding engine or image processor are typically *not* pinned. The usage pattern for submitting any commands that affect the buffer is as follows:
1) Take a mutex that stops the buffer from being moved. This mutex could be global (stops all buffers from being moved) or per-buffer.
2) Wait on any previous synchronization objects attached to the buffer, if those sync objects would not be implicitly signaled when the device executes its work. This is where it becomes bad to have a global mutex under 1).
3) Validate the buffer. This means setting up any missing (contiguous) device mappings or moving the buffer to VRAM, and flushing CPU caches if necessary.
4) Patch up the device commands to reflect any movement of the buffer in 3). New offsets, SG-lists etc.
5) Submit the device commands.
6) Create a new synchronization object and attach it to the buffer.
7) Release the mutex taken in 1).
The buffer will not be moved until the synchronization object has signaled, and mappings set up under 3) will not be torn down until the memory manager receives a request to free up mapping resources.
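A minimal sketch of steps 1) to 7) in C, assuming a per-buffer mutex and a simulated device that completes work as soon as it is waited on. All type and function names (buffer_object, sync_object, submit_commands, buffer_may_move) are made up for illustration and are not taken from any driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; no real driver uses exactly these. */
struct sync_object {
    bool signaled;              /* device done with the buffer, caches flushed */
};

struct buffer_object {
    pthread_mutex_t move_lock;  /* per-buffer variant of the mutex in step 1 */
    struct sync_object *sync;   /* most recently attached sync object, or NULL */
    uint64_t device_offset;     /* valid only while mapped */
    size_t size;
    bool mapped;
    bool pinned;
};

struct command {
    uint64_t buffer_offset;     /* patched in step 4 */
    bool submitted;
};

static uint64_t next_free_offset = 0x1000;  /* toy device address allocator */

static void sync_wait(struct sync_object *s)
{
    /* A real driver would block here; the simulated device finishes at once. */
    s->signaled = true;
}

void submit_commands(struct buffer_object *bo, struct command *cmd,
                     struct sync_object *new_sync)
{
    assert(bo && cmd && new_sync);

    pthread_mutex_lock(&bo->move_lock);         /* 1) stop the buffer moving */

    /* 2) wait for earlier work; a global lock would stall everyone here */
    if (bo->sync && !bo->sync->signaled)
        sync_wait(bo->sync);

    if (!bo->mapped) {                          /* 3) validate */
        bo->device_offset = next_free_offset;
        next_free_offset += bo->size;
        bo->mapped = true;
    }

    cmd->buffer_offset = bo->device_offset;     /* 4) patch up the commands */
    cmd->submitted = true;                      /* 5) submit (simulated) */

    new_sync->signaled = false;                 /* 6) attach a new sync object */
    bo->sync = new_sync;

    pthread_mutex_unlock(&bo->move_lock);       /* 7) release the mutex */
}

/* The memory manager may move a buffer only if it is unpinned and idle. */
bool buffer_may_move(const struct buffer_object *bo)
{
    return !bo->pinned && (bo->sync == NULL || bo->sync->signaled);
}

/* Usage: returns true if the buffer is held until its sync object signals. */
bool demo_submit(void)
{
    struct buffer_object bo = { .size = 4096 };
    struct command cmd = { 0 };
    struct sync_object fence = { 0 };

    pthread_mutex_init(&bo.move_lock, NULL);
    submit_commands(&bo, &cmd, &fence);
    bool held = !buffer_may_move(&bo);      /* work in flight */
    fence.signaled = true;                  /* device signals completion */
    bool freed = buffer_may_move(&bo);
    pthread_mutex_destroy(&bo.move_lock);

    return held && freed && cmd.submitted &&
           cmd.buffer_offset == bo.device_offset;
}
```

Note that the buffer is protected by the sync object, not by the mutex, once submit_commands() returns; the mutex only covers the validate/patch/submit window.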
I'd call this "Generation 2" device buffer management. (Intel (uses busy lists, no sync objects), Radeon, Nouveau, vmwgfx, New VIA)
"Generation 1" was using a global memory manager for pinned buffers (SiS, old VIA DRM drivers)
Generation 3 would be page based device MMUs with programmable apertures to access VRAM.
What we were discussing today is basically creating a unified gen 1 manager, with a new user-space interface.
/Thomas
DRM support for platform devices landed last year and was drastically
improved earlier this year. Qualcomm uses it for a really weak DRM driver
that handles memory for X but does GPU and display through a different
interface. Feel free to flame me for that :).
https://www.codeaurora.org/gitweb/quic/la/?p=kernel/msm.git;a=blob;f=driver…
And I believe OMAP also has a solution somewhere (sorry, I couldn't find a URL).
Jordan
Hi!
Just wanted to share some thoughts about CMA, TTM and the problems with
conflicting mappings.
1) CMA seems to be a nice way to handle the problem with contiguous
pages, although it seems Arnd has some concerns. It would be nice to
hear about those. Some thoughts:
a) It seems fairly straightforward to interface CMA with TTM. The
benefit would be that CMA could ask TTM at any time (using a shrinker
callback) to release its contiguous pages, and TTM would do so once the
GPU is finished with them (unless of course they are used with a pinned
buffer object, like a scanout buffer). CMA would need to be extended
with a small API to create / free a contiguous range and to actually
populate that range with pages.
b) DRM, TTM and it seems CMA all need a range allocator. There is a
reasonable implementation in drm_mm.c, which since the original
implementation has seen a fair bit of improvement. Should we try to move
that to linux/lib ?
c) Could the CMA technique be used also to keep a pool of pages that are
unmapped from the linear kernel map? Essentially a range of HIGHMEM
pages? The benefit compared to just having a pool of HIGHMEM pages by
itself would be that the graphics subsystem would have priority over
normal system use (moving out movable contents), and could use these
pages with nonstandard caching attribute maps if needed. If this is
considered a good idea, we could perhaps consider placing the default
CMA region in HIGHMEM.
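To make point a) a bit more concrete, here is a rough sketch of what such a shrinker-style callback could look like. Everything in it is an assumption for illustration: the contig_buffer type and release_contiguous() exist in neither CMA nor TTM, and instead of waiting for the GPU the sketch simply skips buffers that are still busy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type: neither CMA nor TTM defines this. */
struct contig_buffer {
    size_t num_pages;   /* contiguous pages currently held, 0 if released */
    bool pinned;        /* e.g. a scanout buffer: never released */
    bool gpu_busy;      /* device still using the pages */
};

/*
 * Shrinker-style callback: release whole buffers until at least `wanted`
 * pages are freed or no candidates remain. Pinned and busy buffers are
 * skipped. Returns the number of pages actually released.
 */
size_t release_contiguous(struct contig_buffer *bufs, size_t count,
                          size_t wanted)
{
    size_t released = 0;

    for (size_t i = 0; i < count && released < wanted; i++) {
        struct contig_buffer *b = &bufs[i];

        if (b->pinned || b->gpu_busy || b->num_pages == 0)
            continue;
        /* A real implementation would migrate the contents elsewhere here. */
        released += b->num_pages;
        b->num_pages = 0;
    }
    return released;
}

/* Usage: four buffers, the allocator asks for 20 pages back. */
size_t demo_release(void)
{
    struct contig_buffer bufs[] = {
        { .num_pages = 16, .pinned = true },    /* scanout: skipped */
        { .num_pages = 8,  .gpu_busy = true },  /* in flight: skipped */
        { .num_pages = 4 },                     /* idle: released */
        { .num_pages = 32 },                    /* idle: released */
    };
    size_t released = release_contiguous(bufs, 4, 20);

    assert(bufs[0].num_pages == 16 && bufs[1].num_pages == 8);
    return released;
}
```

Because whole buffers are released, the callback can return more pages than were asked for; a real interface would have to decide whether that is acceptable.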
/Thomas
Hi all,
I've updated the mini-summit wiki with a couple more details:
https://wiki.linaro.org/Events/2011-05-MM
one of which is a sample use case description from Samsung. I would
encourage everyone to look at that and see if there are other use
cases they think would make more sense, or if there is clarification
or amendment of the current proposal. The discussion around this is
slated for Tuesday, so we have some time before it comes up in the
summit. As we proceed, I'll be moving sections of the agenda over
onto the discussion blueprints on launchpad (as that's how Linaro
tracks stuff), but everything will also be available on the wiki as
well as this list for those that can't or don't want to use launchpad.
Also, there are still a few components without representatives in the
summit, and while it would be nice to be able to have those in the
early parts of the sessions, I would rather flex the agenda than omit
them. Even if there is no presentation on a component at the summit,
it would still be good to have written overviews of those, so I'll ask
again for volunteers.
cheers,
Jesse
Hey
This is a quick heads up that MM summit starts at *2pm* today (in 15 minutes)
and not at 3pm. The schedule is incorrect because we can't overlap the
plenaries, but discussion will start at 2pm.
See you there!
--
Loïc Minier
Hi all,
I just wanted to remind everyone about the room change and the fact
that the scheduler wouldn't let me show the session as starting at
1400, due to the plenaries. See you all soon.
cheers,
jesse
Hi,
From today's V4L presentation, there were two missing topics that may be
useful to include in our discussions:
a) V4L overlay mode;
b) dvb.
So, I'm bringing up those two topics for discussion. If needed, I can give a
presentation about them, but it seemed better to start the discussion via the
ML, in order to learn more about the interest in those two subjects.
a) V4L overlay mode
================
The V4L overlay mode was used a lot during the kernel 2.2 and 2.4 days, when
most hardware was not capable enough to do real-time processing of video
streams. It is supported by xawtv and an Xorg v4l driver, and uses XV overlay
extensions to display video. It is simple to set up and requires no CPU
usage, as the video framebuffer is passed directly to the video hardware,
which programs DMA to write directly into the fb memory.
The main structures used in overlay mode (from the kernel's include/linux/videodev2.h)
are:
struct v4l2_pix_format {
    __u32 width;
    __u32 height;
    __u32 pixelformat;
    enum v4l2_field field;
    __u32 bytesperline; /* for padding, zero if unused */
    __u32 sizeimage;
    enum v4l2_colorspace colorspace;
    __u32 priv; /* private data, depends on pixelformat */
};

struct v4l2_framebuffer {
    __u32 capability;
    __u32 flags;
    /* FIXME: in theory we should pass something like PCI device + memory
     * region + offset instead of some physical address */
    void *base;
    struct v4l2_pix_format fmt;
};
/* Flags for the 'capability' field. Read only */
#define V4L2_FBUF_CAP_EXTERNOVERLAY 0x0001
#define V4L2_FBUF_CAP_CHROMAKEY 0x0002
#define V4L2_FBUF_CAP_LIST_CLIPPING 0x0004
#define V4L2_FBUF_CAP_BITMAP_CLIPPING 0x0008
#define V4L2_FBUF_CAP_LOCAL_ALPHA 0x0010
#define V4L2_FBUF_CAP_GLOBAL_ALPHA 0x0020
#define V4L2_FBUF_CAP_LOCAL_INV_ALPHA 0x0040
#define V4L2_FBUF_CAP_SRC_CHROMAKEY 0x0080
/* Flags for the 'flags' field. */
#define V4L2_FBUF_FLAG_PRIMARY 0x0001
#define V4L2_FBUF_FLAG_OVERLAY 0x0002
#define V4L2_FBUF_FLAG_CHROMAKEY 0x0004
#define V4L2_FBUF_FLAG_LOCAL_ALPHA 0x0008
#define V4L2_FBUF_FLAG_GLOBAL_ALPHA 0x0010
#define V4L2_FBUF_FLAG_LOCAL_INV_ALPHA 0x0020
#define V4L2_FBUF_FLAG_SRC_CHROMAKEY 0x0040
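Note that the 'capability' and 'flags' bits for the same feature have different values, so one cannot be copied into the other. A small sketch of how an application might filter the flags it wants against what the device reports, assuming each FLAG requires the same-named CAP; fbuf_supported_flags() and the cap_to_flag table are hypothetical helpers, not part of V4L2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the excerpt above so the sketch builds on its own. */
#define V4L2_FBUF_CAP_CHROMAKEY         0x0002
#define V4L2_FBUF_CAP_LOCAL_ALPHA       0x0010
#define V4L2_FBUF_CAP_GLOBAL_ALPHA      0x0020
#define V4L2_FBUF_CAP_LOCAL_INV_ALPHA   0x0040
#define V4L2_FBUF_CAP_SRC_CHROMAKEY     0x0080

#define V4L2_FBUF_FLAG_CHROMAKEY        0x0004
#define V4L2_FBUF_FLAG_LOCAL_ALPHA      0x0008
#define V4L2_FBUF_FLAG_GLOBAL_ALPHA     0x0010
#define V4L2_FBUF_FLAG_LOCAL_INV_ALPHA  0x0020
#define V4L2_FBUF_FLAG_SRC_CHROMAKEY    0x0040

/* Hypothetical table: pairs each capability bit with the flag it enables. */
static const struct { uint32_t cap, flag; } cap_to_flag[] = {
    { V4L2_FBUF_CAP_CHROMAKEY,       V4L2_FBUF_FLAG_CHROMAKEY },
    { V4L2_FBUF_CAP_LOCAL_ALPHA,     V4L2_FBUF_FLAG_LOCAL_ALPHA },
    { V4L2_FBUF_CAP_GLOBAL_ALPHA,    V4L2_FBUF_FLAG_GLOBAL_ALPHA },
    { V4L2_FBUF_CAP_LOCAL_INV_ALPHA, V4L2_FBUF_FLAG_LOCAL_INV_ALPHA },
    { V4L2_FBUF_CAP_SRC_CHROMAKEY,   V4L2_FBUF_FLAG_SRC_CHROMAKEY },
};

/* Returns the subset of `wanted` flags that `capability` says are supported. */
uint32_t fbuf_supported_flags(uint32_t capability, uint32_t wanted)
{
    uint32_t ok = 0;

    for (size_t i = 0; i < sizeof(cap_to_flag) / sizeof(cap_to_flag[0]); i++)
        if ((capability & cap_to_flag[i].cap) &&
            (wanted & cap_to_flag[i].flag))
            ok |= cap_to_flag[i].flag;
    return ok;
}

/* Usage: device offers chroma-key and global alpha; the application
 * asks for chroma-key and local alpha and gets only chroma-key. */
void demo_flags(void)
{
    uint32_t caps = V4L2_FBUF_CAP_CHROMAKEY | V4L2_FBUF_CAP_GLOBAL_ALPHA;
    uint32_t want = V4L2_FBUF_FLAG_CHROMAKEY | V4L2_FBUF_FLAG_LOCAL_ALPHA;

    assert(fbuf_supported_flags(caps, want) == V4L2_FBUF_FLAG_CHROMAKEY);
}
```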
Using it is as simple as selecting a format that the video display framebuffer
supports, and sending a couple of ioctls to the video adapter.
This is what the Xorg v4l driver (v4l.c) does (simplified, to ease
comprehension):
struct v4l2_framebuffer yuv_fbuf;
struct v4l2_window yuv_win;
struct v4l2_format fmt;
int on = 1;

if (-1 == ioctl(V4L_FD, VIDIOC_G_FBUF, &yuv_fbuf))
    return;

/* Sets the framebuffer data: width, height, pitch, format and base */
yuv_fbuf.fmt.width = surface->width;
yuv_fbuf.fmt.height = surface->height;
yuv_fbuf.fmt.bytesperline = surface->pitches[0];
yuv_fbuf.fmt.pixelformat = V4L2_PIX_FMT_YUYV;
yuv_fbuf.base = (char *)(memPhysBase + surface->offsets[0]);
if (-1 == ioctl(V4L_FD, VIDIOC_S_FBUF, &yuv_fbuf))
    return;

/* Sets the display position and size of the overlay window */
memset(&yuv_win, 0, sizeof(yuv_win));
yuv_win.w.left = 0;
yuv_win.w.top = 0;
yuv_win.w.width = surface->width;
yuv_win.w.height = surface->height;

/* Sets mem transfer type to overlay mode, using the window set up above */
memset(&fmt, 0, sizeof(fmt));
fmt.type = V4L2_BUF_TYPE_VIDEO_OVERLAY;
memcpy(&fmt.fmt.win, &yuv_win, sizeof(yuv_win));
if (-1 == ioctl(V4L_FD, VIDIOC_S_FMT, &fmt))
    return;

/* Enables overlay mode. Data is transferred directly from the video capture device into the display framebuffer */
if (-1 == ioctl(V4L_FD, VIDIOC_OVERLAY, &on))
    return;
The main issue with the overlay mode, as discussed on the first day,
is that the framebuffer pointer is a physical address. The original
idea, in V4L2, was to use some framebuffer ID.
That said, it wouldn't be hard to add a new flag to v4l2_framebuffer.flags
to indicate that it should use a GEM ID. I had some discussions
with David Airlie about that when I submitted the v4l driver fixes due to
the removal of the old V4L1 API. I'm planning to submit something like that in
the future, when I have some spare time for doing it. If Linaro
is interested, it could be an interesting project, as it may solve some of
the current needs.
It is probably simpler to do that than to add another mode to the V4L MMAP stuff.
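Just to illustrate the idea, such an extension could look roughly like the sketch below. This is purely hypothetical: neither the HYP_FBUF_FLAG_GEM_ID flag nor the hyp_framebuffer struct exists in videodev2.h, and the value 0x0080 is only the first 'flags' bit not used in the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL: neither this flag nor this struct exists in videodev2.h.
 * 0x0080 is simply the first 'flags' bit not used in the excerpt above.
 */
#define HYP_FBUF_FLAG_GEM_ID 0x0080

struct hyp_framebuffer {
    uint32_t capability;
    uint32_t flags;
    union {
        void *base;         /* physical address (existing behaviour) */
        uint32_t gem_id;    /* buffer ID, valid when HYP_FBUF_FLAG_GEM_ID is set */
    } target;
};

/* Select a buffer by ID instead of by physical address. */
void hyp_fbuf_set_gem(struct hyp_framebuffer *fb, uint32_t gem_id)
{
    fb->flags |= HYP_FBUF_FLAG_GEM_ID;
    fb->target.gem_id = gem_id;
}

/* Fall back to the existing physical-address behaviour. */
void hyp_fbuf_set_phys(struct hyp_framebuffer *fb, void *base)
{
    fb->flags &= ~(uint32_t)HYP_FBUF_FLAG_GEM_ID;
    fb->target.base = base;
}

bool hyp_fbuf_uses_gem(const struct hyp_framebuffer *fb)
{
    return (fb->flags & HYP_FBUF_FLAG_GEM_ID) != 0;
}

/* Usage: switch between the two addressing modes. */
void demo_gem(void)
{
    struct hyp_framebuffer fb = { 0 };

    hyp_fbuf_set_gem(&fb, 42);
    assert(hyp_fbuf_uses_gem(&fb) && fb.target.gem_id == 42);

    hyp_fbuf_set_phys(&fb, NULL);
    assert(!hyp_fbuf_uses_gem(&fb));
}
```

A driver that does not know the flag would keep interpreting the field as a physical address, so a real patch would also need a capability bit or an error return for unknown flags.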
b) DVB
===
Several new ARM devices are now shipped with Digital TV integrated. In my
country, we have several mobile phones, tablets and GPS devices with DTV receivers
inside. Modern TV sets and set-top boxes already use Linux with DVB support inside.
GoogleTV will for sure need DTV support, as will similar products.
Even though DVB is used everywhere, no big vendor has so far tried to send us patches to
improve their DVB support, but I suspect that this should happen soon. This is
just an educated guess. It would be nice to have some feedback about that from the
vendors.
The DVB API is completely different from the V4L one, and there are two different
types of DVB devices:
- Full-featured DVB devices, with MPEG-TS, audio and video codec inside it;
- "simple" devices that just provide a read() interface to get an MPEG-TS stream.
As modern ARM SoC devices can have a codec DSP processor, it makes sense for them
to use the full-featured API, providing just audio and video via the DVB API
(yes, DVB has a different way to control and export audio/video than V4L/alsa).
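For the "simple" devices above, the application gets raw MPEG-TS from read() and has to demultiplex it itself. A minimal sketch of that first step, assuming the buffer starts on a packet boundary; ts_packet_pid() and ts_count_pid() are illustrative helpers, not part of the DVB API. The 188-byte packet size, the 0x47 sync byte and the 13-bit PID layout come from the MPEG-TS format itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE   0x47

/* Returns the 13-bit PID of one packet, or -1 if the sync byte is missing. */
int ts_packet_pid(const uint8_t *pkt)
{
    if (pkt[0] != TS_SYNC_BYTE)
        return -1;
    return ((pkt[1] & 0x1F) << 8) | pkt[2];
}

/* Counts the packets carrying `pid` in a buffer as filled by read(). */
size_t ts_count_pid(const uint8_t *buf, size_t len, int pid)
{
    size_t n = 0;

    for (size_t off = 0; off + TS_PACKET_SIZE <= len; off += TS_PACKET_SIZE)
        if (ts_packet_pid(buf + off) == pid)
            n++;
    return n;
}

/* Usage: two hand-built packets, one on PID 0x0000 and one null packet. */
void demo_ts(void)
{
    uint8_t buf[2 * TS_PACKET_SIZE] = { 0 };

    buf[0] = TS_SYNC_BYTE;                  /* packet 0: PID 0x0000 */
    buf[1] = 0x40;                          /* payload_unit_start set, PID high bits 0 */
    buf[TS_PACKET_SIZE] = TS_SYNC_BYTE;     /* packet 1: PID 0x1FFF (null packet) */
    buf[TS_PACKET_SIZE + 1] = 0x1F;
    buf[TS_PACKET_SIZE + 2] = 0xFF;

    assert(ts_packet_pid(buf) == 0x0000);
    assert(ts_count_pid(buf, sizeof(buf), 0x1FFF) == 1);
}
```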
The question here is: is there any demand for it right now? If so, what are the
requirements? Are the memory management requirements identical to the current
ones?
Thanks,
Mauro
I've added it to the wiki along with the existing use case.
cheers,
jesse
On Mon, May 9, 2011 at 5:15 PM, Sakari Ailus
<sakari.ailus(a)maxwell.research.nokia.com> wrote:
> Jesse Barker wrote:
>> Hi all,
>
> Hi Jesse,
>
>> I've updated the mini-summit wiki with a couple more details:
>>
>> https://wiki.linaro.org/Events/2011-05-MM
>>
>> one of which is a sample use case description from Samsung. I would
>> encourage everyone to look at that and see if there are other use
>> cases they think would make more sense, or if there is clarification
>> or amendment of the current proposal. The discussion around this is
>
> I have a small set of slides on a use case related to camera on TI OMAP
> 3. The slides are attached.
>
> The Samsung example also looks very good to me.
>
> Kind regards,
>
> --
> Sakari Ailus
> sakari.ailus(a)maxwell.research.nokia.com
>