Hi All,
In response to the discussion about power savings with the memory regions
feature, the following measurements were taken.
Title:
On a system with 2GB of memory, 1GB is static and the other 1GB is
placed in various power states.
Brief environment description:
A Samsung smdk-exynos board is used for this work, and full board-level
power consumption is measured, comprising the CPU and other components.
The board has 2 DMCs (Dynamic Memory Controllers), each supporting 1 GB
of DDR3 memory. The power characteristics of the memory controlled by
DMC0 remain constant, while the memory controlled by DMC1 is moved
through 4 different power states. The following numbers describe the
maximum power savings measured after executing, from DMC0-controlled
memory, software which changes the power states of the DMC1-controlled
memory. The actual numbers are not given here; instead the percentage
power savings is shown relative to the change in overall power
consumption. The memory region patches are expected to facilitate
transition of memory into one of the following low power states.
1) Percentage power savings when DMC1 (1GB) is moved to self-refresh
mode from idle/unaccessed mode = 2.69%
2) Percentage power savings when DMC1 (1GB) is moved to precharge mode
from idle/unaccessed mode = 3.23%
3) Percentage power savings when the DMC1 (1GB) clock is gated = 6.32%
The above power savings are indicative of the benefits that memory
regions could provide on this platform.
Thanks & Regards,
Amit Daniel Kachhap
Samsung India s/w operations, Bangalore
On Sat, May 28, 2011 at 1:26 PM, Andrew Morton
<akpm(a)linux-foundation.org> wrote:
> On Fri, 27 May 2011 18:01:28 +0530 Ankita Garg <ankita(a)in.ibm.com> wrote:
>
>> This patchset proposes a generic memory regions infrastructure that can be
>> used to tag boundaries of memory blocks which belongs to a specific memory
>> power management domain and further enable exploitation of platform memory
>> power management capabilities.
>
> A couple of quick thoughts...
>
> I'm seeing no estimate of how much energy we might save when this work
> is completed. But saving energy is the entire point of the entire
> patchset! So please spend some time thinking about that and update and
> maintain the [patch 0/n] description so others can get some idea of the
> benefit we might get from all of this. That estimate should include an
> estimate of what proportion of machines are likely to have hardware
> which can use this feature and in what timeframe.
>
> IOW, if it saves one microwatt on 0.001% of machines, not interested ;)
>
>
> Also, all this code appears to be enabled on all machines? So machines
> which don't have the requisite hardware still carry any additional
> overhead which is added here. I can see that ifdeffing a feature like
> this would be ghastly but please also have a think about the
> implications of this and add that discussion also.
>
> If possible, it would be good to think up some microbenchmarks which
> probe the worst-case performance impact and describe those and present
> the results. So others can gain an understanding of the runtime costs.
>
>
>
Hello everyone,
As I promised during the Memory Management summit at the Linaro Meeting
in Budapest, I have continued the development of CMA. The goal is to
integrate it as tightly as possible with other kernel subsystems (like
memory management and dma-mapping) and finally merge it to mainline.
This version introduces integration with the DMA-mapping subsystem for
the ARM architecture, but I believe that similar integration can be done
for other archs too. I've also rebased all the code onto the latest
v3.0-rc2 kernel.
A few words for those who see CMA for the first time:
The Contiguous Memory Allocator (CMA) makes it possible for device
drivers to allocate big contiguous chunks of memory after the system
has booted.
The main difference from similar frameworks is that CMA allows the
memory region reserved for big chunk allocations to be transparently
reused as system memory, so no memory is wasted when no big chunk is
allocated. Once an allocation request is issued, the framework migrates
system pages to create the required big chunk of physically contiguous
memory.
For more information see the changelog and links to previous versions
of CMA framework.
The current version of CMA is just an allocator that handles allocation
of contiguous memory blocks. The differences between this patchset and
Kamezawa's alloc_contig_pages() are:
1. alloc_contig_pages() requires MAX_ORDER alignment of allocations,
which may be unsuitable for embedded systems where only a few MiBs are
required.
The lack of an alignment requirement means that several threads might
try to access the same pageblock/page. To prevent this from happening,
CMA uses a mutex so that only one cm_alloc()/cm_free() call may run at
a time.
2. CMA may use its own migratetype (MIGRATE_CMA) which behaves
similarly to ZONE_MOVABLE but can be put in arbitrary places.
This is required for us since we need to define two disjoint memory
ranges inside system RAM (i.e. in two memory banks; do not confuse
these with nodes).
3. alloc_contig_pages() scans memory in search of a range that could be
migrated. CMA, on the other hand, maintains its own allocator to decide
where to allocate memory for device drivers and then tries to migrate
pages from that range if needed. This is not strictly required, but I
suspect it might be faster.
The integration with the ARM DMA-mapping subsystem is quite
straightforward: once a CMA context is available, an alloc_pages() call
can be replaced by a cm_alloc() call.
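To illustrate the idea, here is a minimal sketch of how that substitution
could look inside the ARM coherent allocation path. Note that the
cm_alloc() signature and the dev->archdata.cma field used below are
illustrative shorthand for this mail, not necessarily the exact interface
from the patches:

    /* Sketch only: prefer the device's CMA context (if platform code
     * assigned one) and fall back to the regular buddy allocator. */
    static struct page *__dma_alloc_buffer(struct device *dev, size_t size,
                                           gfp_t gfp)
    {
            unsigned int order = get_order(size);

            if (dev->archdata.cma)          /* assumed per-device context */
                    return cm_alloc(dev->archdata.cma, size, order);

            return alloc_pages(gfp, order);
    }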
The current version has been tested on a Samsung S5PC110 based Aquila
machine with the s5p-fimc V4L2 driver. The driver itself uses the
videobuf2 dma-contig memory allocator, which in turn relies on
dma_alloc_coherent() from the DMA-mapping subsystem. By integrating CMA
with DMA-mapping we managed to get this driver working with CMA without
a single change required in the driver or the videobuf2-dma-contig
allocator.
TODO:
1. use struct page * or pfn internally instead of physical addresses
2. use some simple bitmap based allocator instead of genalloc
3. provide a function similar to dma_declare_coherent_memory(), which
will create and register a CMA area for a particular device
4. code cleanup and simplification
5. discussion
6. double-mapping issues with ARMv6+ and coherent memory
Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
Links to previous versions of the patchset:
v9: <http://article.gmane.org/gmane.linux.kernel.mm/60787>
v8: <http://article.gmane.org/gmane.linux.kernel.mm/56855>
v7: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
v6: <http://article.gmane.org/gmane.linux.kernel.mm/55626>
v5: (intentionally left out as CMA v5 was identical to CMA v4)
v4: <http://article.gmane.org/gmane.linux.kernel.mm/52010>
v3: <http://article.gmane.org/gmane.linux.kernel.mm/51573>
v2: <http://article.gmane.org/gmane.linux.kernel.mm/50986>
v1: <http://article.gmane.org/gmane.linux.kernel.mm/50669>
Changelog:
v10:
1. Rebased onto 3.0-rc2 and resolved all conflicts
2. Simplified CMA to be just a pure memory allocator, for use
with platform/bus specific subsystems, like dma-mapping.
Removed all device specific functions and calls.
3. Integrated with ARM DMA-mapping subsystem.
4. Code cleanup here and there.
5. Removed private context support.
v9: 1. Rebased onto 2.6.39-rc1 and resolved all conflicts
2. Fixed a bunch of nasty bugs that happened when the allocation
failed (mainly kernel oops due to NULL ptr dereference).
3. Introduced testing code: cma-regions compatibility layer and
videobuf2-cma memory allocator module.
v8: 1. The alloc_contig_range() function has now been separated from
CMA and put in page_allocator.c. This function tries to
migrate all LRU pages in specified range and then allocate the
range using alloc_contig_freed_pages().
2. Support for MIGRATE_CMA has been separated from the CMA code.
I have not tested if CMA works with ZONE_MOVABLE but I see no
reasons why it shouldn't.
3. I have added a @private argument when creating CMA contexts so
that one can reserve memory and not share it with the rest of
the system. This way, CMA acts only as an allocation algorithm.
v7: 1. A lot of functionality that handled driver->allocator_context
mapping has been removed from the patchset. This is not to say
that this code is not needed, it's just not worth posting
everything in one patchset.
Currently, CMA is "just" an allocator. It uses its own
migratetype (MIGRATE_CMA) for defining ranges of pageblocks
which behave just like ZONE_MOVABLE but, unlike the latter, can
be put in arbitrary places.
2. The migration code that was introduced in the previous version
actually started working.
v6: 1. Most importantly, v6 introduces support for memory migration.
The implementation is not yet complete though.
Migration support means that when CMA is not using memory
reserved for it, page allocator can allocate pages from it.
When CMA wants to use the memory, the pages have to be moved
and/or evicted as to make room for CMA.
To make it possible it must be guaranteed that only movable and
reclaimable pages are allocated in CMA controlled regions.
This is done by introducing a MIGRATE_CMA migrate type that
guarantees exactly that.
Some of the migration code is "borrowed" from Kamezawa
Hiroyuki's alloc_contig_pages() implementation. The main
difference is that, thanks to the MIGRATE_CMA migrate type, CMA
assumes that memory controlled by CMA is always movable or
reclaimable, so it makes allocation decisions regardless of
whether some pages are actually allocated, and migrates them
if needed.
The most interesting patches from the patchset that implement
the functionality are:
09/13: mm: alloc_contig_free_pages() added
10/13: mm: MIGRATE_CMA migration type added
11/13: mm: MIGRATE_CMA isolation functions added
12/13: mm: cma: Migration support added [wip]
Currently, kernel panics in some situations which I am trying
to investigate.
2. cma_pin() and cma_unpin() functions have been added (after
a conversation with Johan Mossberg). The idea is that whenever
hardware does not use the memory (no transaction is on) the
chunk can be moved around. This would allow defragmentation to
be implemented if desired. No defragmentation algorithm is
provided at this time.
3. Sysfs support has been replaced with debugfs. I always felt
unsure about the sysfs interface and when Greg KH pointed it
out I finally got to rewrite it to debugfs.
v5: (intentionally left out as CMA v5 was identical to CMA v4)
v4: 1. The "asterisk" flag has been removed in favour of requiring
that platform will provide a "*=<regions>" rule in the map
attribute.
2. The terminology has been changed slightly renaming "kind" to
"type" of memory. In the previous revisions, the documentation
indicated that device drivers define memory kinds and now,
v3: 1. The command line parameters have been removed (and moved to
a separate patch, the fourth one). As a consequence, the
cma_set_defaults() function has been changed -- it no longer
accepts a string with list of regions but an array of regions.
2. The "asterisk" attribute has been removed. Now, each region
has an "asterisk" flag which lets one specify whether this
region should be considered an "asterisk" region.
3. SysFS support has been moved to a separate patch (the third one
in the series) and now also includes list of regions.
v2: 1. The "cma_map" command line have been removed. In exchange,
a SysFS entry has been created under kernel/mm/contiguous.
The intended way of specifying the attributes is
a cma_set_defaults() function called by platform initialisation
code. "regions" attribute (the string specified by "cma"
command line parameter) can be overwritten with command line
parameter; the other attributes can be changed during run-time
using the SysFS entries.
2. The behaviour of the "map" attribute has been modified
slightly. Currently, if no rule matches given device it is
assigned regions specified by the "asterisk" attribute. It is
by default built from the region names given in "regions"
attribute.
3. Devices can register private regions as well as regions that
can be shared but are not reserved using standard CMA
mechanisms. A private region has no name and can be accessed
only by devices that have the pointer to it.
4. The way allocators are registered has changed. Currently,
a cma_allocator_register() function is used for that purpose.
Moreover, allocators are attached to regions the first time
memory is registered from the region or when allocator is
registered which means that allocators can be dynamic modules
that are loaded after the kernel booted (of course, it won't be
possible to allocate a chunk of memory from a region if
allocator is not loaded).
5. Index of new functions:
+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions, size_t size,
+ dma_addr_t alignment)
+static inline int
+cma_info_about(struct cma_info *info, const char *regions)
+int __must_check cma_region_register(struct cma_region *reg);
+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);
+int cma_allocator_register(struct cma_allocator *alloc);
Patches in this patchset:
lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
lib: genalloc: Generic allocator improvements
Some improvements to the genalloc API (most importantly the
possibility to allocate memory with an alignment requirement).
mm: move some functions from memory_hotplug.c to page_isolation.c
mm: alloc_contig_freed_pages() added
Code "stolen" from Kamezawa. The first patch just moves code
around and the second provide function for "allocates" already
freed memory.
mm: alloc_contig_range() added
This is what Kamezawa asked for: a function that tries to migrate
all pages from a given range and then uses alloc_contig_freed_pages()
(defined by the previous commit) to allocate those pages.
mm: MIGRATE_CMA migration type added
mm: MIGRATE_CMA isolation functions added
Introduction of the new migratetype and support for it in CMA.
MIGRATE_CMA works similarly to ZONE_MOVABLE except that almost
any memory range can be marked as one.
mm: cma: Contiguous Memory Allocator added
The core CMA code. Manages CMA contexts and performs memory
allocations.
ARM: integrate CMA with dma-mapping subsystem
The main client of the CMA framework. CMA serves as an alloc_pages()
replacement if the device has a CMA context assigned.
ARM: S5PV210: add CMA support for FIMC devices on Aquila board
Example of platform/board specific code that creates a CMA
context and assigns it to particular devices.
Patch summary:
KAMEZAWA Hiroyuki (2):
mm: move some functions from memory_hotplug.c to page_isolation.c
mm: alloc_contig_freed_pages() added
Marek Szyprowski (3):
mm: cma: Contiguous Memory Allocator added
ARM: integrate CMA with dma-mapping subsystem
ARM: S5PV210: add CMA support for FIMC devices on Aquila board
Michal Nazarewicz (5):
lib: bitmap: Added alignment offset for bitmap_find_next_zero_area()
lib: genalloc: Generic allocator improvements
mm: alloc_contig_range() added
mm: MIGRATE_CMA migration type added
mm: MIGRATE_CMA isolation functions added
arch/arm/include/asm/device.h | 3 +
arch/arm/include/asm/dma-mapping.h | 19 ++
arch/arm/mach-s5pv210/Kconfig | 1 +
arch/arm/mach-s5pv210/mach-aquila.c | 26 +++
arch/arm/mm/dma-mapping.c | 60 +++++--
include/linux/bitmap.h | 24 ++-
include/linux/cma.h | 189 ++++++++++++++++++
include/linux/genalloc.h | 50 +++---
include/linux/mmzone.h | 43 ++++-
include/linux/page-isolation.h | 50 ++++--
lib/bitmap.c | 22 ++-
lib/genalloc.c | 190 +++++++++++--------
mm/Kconfig | 29 +++-
mm/Makefile | 1 +
mm/cma.c | 358 +++++++++++++++++++++++++++++++++++
mm/compaction.c | 10 +
mm/internal.h | 3 +
mm/memory_hotplug.c | 111 -----------
mm/page_alloc.c | 292 ++++++++++++++++++++++++++---
mm/page_isolation.c | 130 ++++++++++++-
20 files changed, 1319 insertions(+), 292 deletions(-)
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma.c
--
1.7.1.569.g6f426
Hi all,
Just a reminder (as previously discussed in Budapest and in the recent
IRC meeting), the next face to face mini-summit will be at the Linaro
Connect event from August 1-5 in Cambourne, UK (just outside
Cambridge). The actual mini-summit will be, as with Budapest,
afternoons on Monday through Wednesday. We will also be co-locating
the next V4L2 brainstorming meeting at the Connect as well (right,
Laurent?). Details for registering, booking the conference hotel,
etc. can be found here:
https://wiki.linaro.org/Events/LinaroConnectQ3.11
Please let me know if you have questions, concerns, etc.
cheers,
Jesse
Hello,
Following the discussion about the driver for the IOMMU controller on
the Samsung Exynos4 platform and Arnd's suggestions, I've decided to
start working on a redesign of the dma-mapping implementation for the
ARM architecture. The goal is to add support for IOMMU in the way
preferred by the community :)
Some of the ideas about merging the dma-mapping API and the IOMMU API
come from the following threads:
http://www.spinics.net/lists/linux-media/msg31453.html
http://www.spinics.net/lists/arm-kernel/msg122552.html
http://www.spinics.net/lists/arm-kernel/msg124416.html
They were also discussed on Linaro memory management meeting at UDS
(Budapest 9-12 May).
I've finally managed to clean up my work a bit and present the initial,
very proof-of-concept version of the patches that were ready just before
the Linaro meeting.
What have been implemented:
1. Introduced arm_dma_ops
dma_map_ops from include/linux/dma-mapping.h suffers from the following
limitations:
- lack of a start address for sync operations
- lack of write-combine methods
- lack of mmap-to-user-space methods
- lack of a map_single method
For the initial version I've decided to use a custom arm_dma_ops.
Extending the common interface will take time; until then I wanted to
have something already working.
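As a rough illustration of the direction (the exact structure in the
patches may differ), an ops table that fills the four gaps listed above
could look something like this sketch:

    /* Sketch only -- mirrors the missing pieces listed above; it is
     * not the literal structure from the patches. */
    struct arm_dma_ops {
            void *(*alloc_attrib)(struct device *dev, size_t size,
                                  dma_addr_t *handle, gfp_t gfp,
                                  struct dma_attrs *attrs);
            void (*free_attrib)(struct device *dev, size_t size,
                                void *cpu_addr, dma_addr_t handle,
                                struct dma_attrs *attrs);
            /* mmap to user space, honouring the same attributes */
            int (*mmap_attrib)(struct device *dev, struct vm_area_struct *vma,
                               void *cpu_addr, dma_addr_t handle, size_t size,
                               struct dma_attrs *attrs);
            dma_addr_t (*map_single)(struct device *dev, void *cpu_addr,
                                     size_t size, enum dma_data_direction dir);
            /* sync ops take an explicit offset, unlike stock dma_map_ops */
            void (*sync_single_for_cpu)(struct device *dev, dma_addr_t handle,
                                        unsigned long offset, size_t size,
                                        enum dma_data_direction dir);
    };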
dma_{alloc,free,mmap}_{coherent,writecombine} have been consolidated
into dma_{alloc,free,mmap}_attrib, as was suggested at the Linaro
meeting. A new attribute for WRITE_COMBINE memory has been introduced.
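For example, with the consolidated entry point the old writecombine
helper maps onto an attribute bit roughly like this (the exact
dma_alloc_attrib() signature is my assumption based on the description
above; DEFINE_DMA_ATTRS() and dma_set_attr() are the existing helpers
from include/linux/dma-attrs.h):

    DEFINE_DMA_ATTRS(attrs);
    dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);   /* new attribute */

    /* replaces dma_alloc_writecombine(dev, size, &dma, GFP_KERNEL) */
    cpu_addr = dma_alloc_attrib(dev, size, &dma, GFP_KERNEL, &attrs);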
2. Moved all inline ARM dma-mapping related operations to
arch/arm/mm/dma-mapping.c and turned them into methods of the generic
arm_dma_ops structure. The dma-mapping.c code definitely needs cleanup,
but this is just a first step.
3. Added very initial IOMMU support. Right now it is limited to
dma_alloc_attrib, dma_free_attrib and dma_mmap_attrib. It has been
tested with the s5p-fimc driver on the Samsung Exynos4 platform.
4. Adapted the Samsung Exynos4 IOMMU driver to make use of the
introduced iommu_dma proposal.
This patch series contains only patches for common dma-mapping part.
There is also a patch that adds driver for Samsung IOMMU controller on
Exynos4 platform. All required patches are available on:
git://git.infradead.org/users/kmpark/linux-2.6-samsung dma-mapping branch
Git web interface:
http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads…
Future:
1. Add all missing operations for IOMMU mappings (map_single/page/sg,
sync_*)
2. Move sync_* operations into separate function for better code sharing
between iommu and non-iommu dma-mapping code
3. Splitting out the dma bounce code from the non-bounce code into a
separate set of dma methods. Right now the dma-bounce code is compiled
conditionally and spread over arch/arm/mm/dma-mapping.c and
arch/arm/common/dmabounce.c.
4. Merging dma_map_single with dma_map_page. I haven't investigated
deeply why they have separate implementations on ARM. If this is a
requirement then dma_map_ops needs to be extended with another method.
5. Fix dma_alloc to unmap from linear mapping.
6. Convert IO address space management code from gen-alloc to some
simpler bitmap based solution.
7. resolve issues that might arise during discussion & comments
Please note that this is a very early version of the patches, definitely
NOT intended for merging. I just wanted to make sure that the direction
is right and share the code with others who might want to cooperate on
dma-mapping improvements.
Best regards
--
Marek Szyprowski
Samsung Poland R&D Center
Patch summary:
Marek Szyprowski (2):
ARM: Move dma related inlines into arm_dma_ops methods
ARM: initial proof-of-concept IOMMU mapper for DMA-mapping
arch/arm/Kconfig | 1 +
arch/arm/include/asm/device.h | 3 +
arch/arm/include/asm/dma-iommu.h | 30 ++
arch/arm/include/asm/dma-mapping.h | 653 +++++++++++------------------
arch/arm/mm/dma-mapping.c | 817 +++++++++++++++++++++++++++++++++---
arch/arm/mm/vmregion.h | 2 +-
include/linux/dma-attrs.h | 1 +
7 files changed, 1033 insertions(+), 474 deletions(-)
create mode 100644 arch/arm/include/asm/dma-iommu.h
--
1.7.1.569.g6f426
Hi all,
As Linaro refines its release cycles and related processes, we would
very much like to ensure that work stays on track and that we're able
to make progress. To that end, we'd like to have a sync-up meeting on
IRC (#linaro-mm-sig on irc.linaro.org or irc.freenode.net) to cover
status and next steps on the topics we've discussed in the summit, on
this list, and others. To address the timezone issue (there are a lot
of them between us), I'd like to offer to usurp the normal meeting
slot for the graphics working group, which, if nothing else, should
ensure that at least the working group members that are assigned to
memory management topics will be there; specifically, 1200UTC on
Wednesday, June 22. The meeting details are here:
https://wiki.linaro.org/OfficeofCTO/MemoryManagement/Notes/2011-06-22
I've put in a preliminary agenda, and the minutes and actions will be
culled from the channel log after the meeting. Please let me know if
you can't make it or if you want to see other items on the agenda but
can't edit the wiki for some reason (I'm not clear on the write access
to that page, but I'm happy to make proxy edits).
cheers,
Jesse
Hi,
I have the below use case for the UMM.
The Samsung EXYNOS4 SoC has a hardware IP for JPEG encoding/decoding.
The Android gallery application uses the JPEG decoder to draw
images/thumbnails. As of now, the skia library which handles this uses
a software JPEG decoder instead.
A buffer is allocated by the Skia library for the JPEG image file to be
decoded. Now if we want to use the hardware IP for decoding, we need to
either
a) change the buffer allocation mechanism in Skia to get the buffers
from the JPEG driver (mmapped), or
b) pass the user-allocated buffer into the JPEG driver, and then to the
IP through the proposed DMA-IOMMU framework.
We feel (b) is the nicer way to handle this, with minimal changes to the
Android framework. But the open question is whether the UMM is going to
address this scenario.
Please mail me back if you need any clarifications on the requirement.
Regards,
Subash
Hello Ilias,
I would prefer to have a fortnightly meeting at a preferred time of
14:00 UTC (to suit IN and further east TZs). Also, conference calls are
preferred.
more preferred.
Regards,
Subash
Samsung India - Linaro,
Bangalore - India.
Launchpad: https://launchpad.net/~subashp/
On 02/06/11 01:58, Jesse Barker wrote:
> > * Communication and Meetings
> > - New IRC channel #linaro-mm-sig for meetings and general
> > communication between those working on and interested in these topics
> > (already created).
> > - IRC meetings will be weekly with an option for the constituency to
> > decide on ultimate frequency (logs to be emailed to linaro-mm-sig
> > list).
> > - Linaro can provide wiki services and any dial-in needed.
> > - Next face-to-face meetings:
> > . Linaro mid-cycle summit (August 1-5, see
> > https://wiki.linaro.org/Events/2011-08-LDS)
> > . Linux Plumbers Conference (September 7-9, see
> > http://www.linuxplumbersconf.org/2011/ocw/proposals/567)
> > . V4L2 brainstorm meeting (Hans Verkuil to update with details)
> >
Since this is an area of key interest to many parties, a periodic
meeting could provide a channel to all who are interested to participate
and discuss. I can set it up and send Google calendar invitations.
One obvious issue with this idea would be: With participants in this
list from 5-6 timezones having 1 meeting time would be challenging, but
perhaps a time slot around UTC16:00 would suit most?
Means: IRC as Jesse mentioned above, we can also setup a call, via
Canonical's conferencing system.
Frequency: Is there a specific need for discussing weekly? Assuming
once-every-fortnight frequency, there could be around 2-3 meetings
before the Linaro sprint in August 1-5. Of course if there is
participation on the IRC channel, then communication can happen more
often...
There are a couple more items I wanted to ask about:
1. I think we need a single wiki page with all the relevant pointers and
consolidated info, at wiki.linaro.org. I can collect the information
pointers available and setup the wiki page.
2. Tracking work progress: certainly this work has been planned via
Launchpad blueprints added by Jesse. I do not know if everyone is on
Launchpad - I'd like to ask for suggestions on how to track work
progress especially from those who are not using Launchpad. Would
progress updates via the wiki suffice? If you have other suggestions
please let me know.
BR,
-- Ilias Biris, Aallonkohina 2D 19, 02320 Espoo, Finland Tel: +358 50
4839608 (mobile) Email: ilias dot biris at linaro dot org Skype:
ilias_biris
Memory Management Mini-Summit
Linaro Developer Summit, Budapest, May 9-11, 2011
=================================================
Hi all. Apologies for this report being so long in coming. I know
others have thrown in their perceptions and opinions on how the
mini-summit went, so I suppose it's my turn.
Outcomes:
---------
* Approach (full proposal under draft, to be sent to the lists below)
- Modified CMA for additional physically contiguous buffer support.
- dma-mapping API changes, enhancements and ARM architecture support.
- "struct dma_buf" based buffer sharing infrastructure with support
from device drivers.
- Pick any "low-hanging fruit" with respect to consolidation
(supporting the ARM arch/sub-arch goals).
* Proposal for work around allocation, mapping and buffer sharing to
be announced on:
- dri-devel
- linux-arm-kernel
- linux-kernel
- linux-media
- linux-mm
- linux-mm-sig
* Communication and Meetings
- New IRC channel #linaro-mm-sig for meetings and general
communication between those working on and interested in these topics
(already created).
- IRC meetings will be weekly with an option for the constituency to
decide on ultimate frequency (logs to be emailed to linaro-mm-sig
list).
- Linaro can provide wiki services and any dial-in needed.
- Next face-to-face meetings:
. Linaro mid-cycle summit (August 1-5, see
https://wiki.linaro.org/Events/2011-08-LDS)
. Linux Plumbers Conference (September 7-9, see
http://www.linuxplumbersconf.org/2011/ocw/proposals/567)
. V4L2 brainstorm meeting (Hans Verkuil to update with details)
Overview and Goals for the 3 days:
----------------------------------
* Day 1 - Component overviews, expected to spill over into day 2
* Day 2 - Concrete use case that outlines a definition of the problem
that we are trying to solve, and shows that we have solved it.
* Day 3 - Dig into the lower level details of the current
implementations. What do we have, what's missing, what's not
implemented for ARM.
This is about memory management, zero-copy pipelines, kernel/userspace
interfaces, memory management, memory reservations and much more :-)
In particular, what we would like to end up with is:
* Understand who is working on what; avoid work duplication.
* Focus on a specific problem we want to solve and discuss possible solutions.
* Come up with a plan to fix this specific problem.
* Start enumerating work items that the Linaro Graphics WG can work
on in this cycle.
Day 1:
------
The first day got off to a little bit of a stutter start as the summit
scheduler would not let us indicate that our desired starting time was
immediately after lunch, during the plenaries. However, that didn't
stop people from flocking to the session in droves. By the time I
made the kickoff comments on why we were there, and what we were there
to accomplish (see "Overview and Goals for the 3 days" above), we had
brought in an extra 10 chairs and there were people on the floor and
spilling out into the hallway.
Based upon our experiences from the birds-of-a-feather at the Embedded
Linux Conference, 2 things dominated day 1. First things first, I
assigned someone to take notes ;-). Etherpad made it really easy for
people to take notes collectively, including those participating
remotely, and for everyone to see who was writing what, but we
definitely needed someone whose focus would be capturing the
proceedings, so thanks to Dave Rusling for shouldering that burden.
The second thing was that we desperately needed an education in each
others components and subsystems. Without this, we would risk missing
significant areas of discussion, or possibly even be violently
agreeing on something without realizing it. So, we started with a
series of component overviews. These were presentations on the order
of 20 minutes with some room for Q&A. On day 1, we had:
* V4L2 - Hans Verkuil
* DRM/GEM/KMS - Daniel Vetter
* TTM - Thomas Hellstrom
* CMA - Marek Szyprowski
* VCMM - Zach Pfeffer
All of these (as well as the ones from day 2) are available through
links on the mini-summit wiki
(https://wiki.linaro.org/Events/2011-05-MM).
Day 2:
------
The second day got off to a bit better a start than did day 1 as we
more clearly communicated the start time to everyone involved, and
forgot about the summit scheduler. We (conceptually) picked up where
day 1 left off with one more component overview:
* UMP - Ketil Johnson
and, covered the MediaController API for good measure. From there, we
spent a fair amount of time discussing use cases to illustrate our
problem space. We started (via pre-summit submissions) with a couple
of variations on what amounted to basically the same thing. I think
the actual case is probably best illustrated by the pdf slides from
Sakari Ailus (see the link on the mini-summit wiki). Basically, we
want to take a video input, either from a camera or from a file,
decode it, process it, render to it and/or with it and display it.
These pipeline stages may be handled by hardware, by software on the
CPU or some combination of the two; each stage should be handled by
accepting a buffer from the last stage and operating on it in some
fashion (no copies wherever possible). It turned out that still image
capture can actually be a more complicated version of this use case,
but even something as simple as taking input through the camera and
displaying it (image preview) can involve much of the underpinnings
required to support the more complicated cases. We may indeed start
with this simple case as a proof-of-concept.
Once we had the use case nailed down, we moved onto the actual
components/subsystems that would need to share buffers in order for
the use case to work properly with the zero-copy (or at least
minimal-copy) requirement. We had:
* DRM
* V4L2
* fbdev
* ALSA
* DSP
* User-space (kind of all encompassing and could include things like
OpenCL, which also makes an interesting use case).
* DVB
* Out-of-tree GPU drivers
We wound out the day by discussing exactly what metadata we would want
to track in order to enable the desired levels of sharing with
simultaneous device mappings, cache management and other
considerations (e.g., device peculiarities). What we came up with is
a struct (we called it "dma_buf") that has the following info:
* Size
* Creator/Allocator
* Attributes:
- sharable?
- contiguous?
- device-local?
* Reference count
* Pinning reference count
* CPU cache management data
* Device private data (e.g., quirky tiling modes)
* Scatter list
* Synchronization data (for managing in-flight device transactions)
* Mapping data
These last few (device privates through mapping data) are lists of
data, one for each device that has a mapping of the buffer. The
mapping data is nominally an address and per-device cache management
data. We actually got through this part fairly quickly. The
biggest part of the discussion was what to use for handles/identifiers
in the buffer sharing scheme. The discussion was between global
identifiers like GEM uses, or file descriptors as favored by Android.
Initially, there was an informal consensus around unique IDs, though
it was not a definitive decision (yet). The atomicity of passing file
descriptors between processes makes them quite attractive for the
task.
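To make that metadata list concrete, it could be sketched as a C
structure along the following lines; the field names here are invented
for illustration and this is not a proposed kernel interface:

    /* Illustrative sketch only -- mirrors the bullet list above. */
    struct dma_buf {
            size_t size;
            struct device *allocator;      /* creator/allocator */
            unsigned long flags;           /* sharable/contiguous/device-local */
            struct kref refcount;          /* reference count */
            atomic_t pin_count;            /* pinning reference count */
            void *cpu_cache_data;          /* CPU cache management data */
            struct scatterlist *sglist;    /* scatter list */
            struct list_head dev_privs;    /* per-device private data */
            struct list_head sync_objs;    /* in-flight device transactions */
            struct list_head mappings;     /* per-device address + cache data */
    };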
Day 3:
------
By the third day, there was a sense of running out of time and really
needing to ensure that we left with a reasonable set of outcomes (see
the overview and goals section above). In short, we wanted to make
sure that we had a plan/roadmap, a reasonably actionable set of tasks
that could be picked up by Linaro engineers and community members
alike, and that we would not only avoid duplicating new work, but also
reduce some of the existing code duplication that got us to this point
in the first place.
But, we weren't done. We still had to cover the requirements around
allocation and explore the dma-mapping and IOMMU APIs.
This took most of the day, but was a quite fruitful set of
discussions. As with the rest of the discussions, we focused on
leveraging existing technologies as much as possible. With
allocations, however, this wasn't entirely possible as we have devices
on ARM SoCs that do not have an IOMMU and require physically
contiguous buffers in order to operate. After a fair amount of
discussion, it was decided that a modified version of the current CMA
(see Marek's slides linked from the wiki). It assumes the pages are
movable and manages them and not the mappings. There was concern that
the API didn't quite fit with other related API, so the changes from
the current state will be around those details.
On the mapping side, we focused on the dma-mapping API, layered on the
IOMMU API where appropriate. Without going into crazy detail, we are
looking at something like 4 implementations of the dma_map_ops functions
for ARM: with and without IOMMU, and with and without bounce buffer (the
latter two exist, but do not use the dma_map_ops API). Marek has put out
patches for comment on the IOMMU based implementation of this, based
upon work he had in progress. Also in the area of dma_map_ops, the
sync related APIs need a start address and offset, and the alloc and
free need attribute parameters like map and unmap already have (to
support cacheable/coherent/write-combined). In the "not involving
dma_map_ops" category, we have a couple of changes that are likely to
be non-trivial (not that any of the other proposed work is). It was
proposed to modify (actually, the word thrown about in the discussions
was "fix") dma_alloc_coherent for ARM to support unmapping from the
kernel linear mapping and the use of HIGHMEM; two separate
implementations, configured at build-time. And, last but not least,
there was a fair amount of concern over the cache management API and
its ability to live cleanly with the IOMMU code and to resist breakage
from other architecture implementations.
At this point, we reviewed what we had done and finalized the outcomes
(see the outcomes section at the top). And, with a half an hour to
spare, I re-instigated the file descriptors versus unique identifiers
discussion from day 2. I think file descriptors were winning by the
end (especially after people started posting pointers to code samples
of how to actually pass them between processes)....
Attendees:
----------
I will likely miss people here trying to list out everyone, especially
given that some of the sessions were quite literally overflowing the
room we were in. For as accurate an account of attendance as I can
muster, check out the list of attendees on the mini-summit wiki page
or the discussion blueprints we used for scheduling:
https://wiki.linaro.org/Events/2011-05-MM#Attendees
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-m…
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-m…
https://blueprints.launchpad.net/linaro-graphics-wg/+spec/linaro-graphics-m…
The occupants of the fishbowl (the front/center of the room in closest
proximity to the microphones) were primarily:
Arnd Bergmann
Laurent Pinchart
Hans Verkuil
Mauro Chehab
Daniel Vetter
Sakari Ailus
Thomas Hellstrom
Marek Szyprowski
Jesse Barker
The IRC fishbowl seemed to consist of:
Rob Morell
Jordan Crouse
David Brown
There were certainly others both local and remote participating to
varying degrees that I do not intend to omit, and a special thanks
goes out to Joey Stanford for arranging a larger room for us on days 2
and 3 when we had people sitting on the floor and spilling into the
hallway during day 1.
On Mon, May 30, 2011 at 12:30 PM, PRASANNA KUMAR
<prasanna_tsm_kumar(a)yahoo.co.in> wrote:
> USB graphics devices from displaylink does not have 3D hardware. To get 3D
> effects (compiz, GNOME 3, KWin, OpenGL apps etc) with these device in Linux
> the native (primary) GPU can be used to provide hardware acceleration. All
> the graphics operation is done using the native (primary) GPU and the end
> result is taken and send to the displaylink device. Can this be achieved? If
> so is it possible to implement a generic framework so that any device (USB,
> thunderbolt or any new technology) can use this just by implementing device
> specific (compression and) data transport? I am not sure this is the correct
> mailing list.
fwiw, this situation is not too far different from the SoC world. For
example, there are multiple ARM SoC's that share the same IMG/PowerVR
core or ARM/mali 3d core, but each have their own unique display
controller..
I don't know quite the best way to deal with this (either at the
DRM/kernel layer or xorg driver layer), but there would certainly be
some benefit to be able to make DRM driver a bit more modular to
combine a SoC specific display driver (mostly the KMS part) with a
different 2d and/or 3d accelerator IP. Of course the (or some of the)
challenge here is that different display controllers might have
different memory mgmt requirements (for ex, depending on whether the
display controller has an IOMMU or not) and formats, and that the flip
command should somehow come via the 2d/3d command stream.
I have an (experimental) DRM/KMS driver for OMAP which tries to solve
the issue by way of a simple plugin API, ie the idea being to separate
the PVR part from the OMAP display controller part more cleanly. I
don't think it is perfect, but it is an attempt. (I'll send patches
as an RFC, but wanted to do some cleanup first.. just haven't had time
yet.) But I'm definitely open to suggestions here.
BR,
-R
> Thanks,
> Prasanna Kumar
> _______________________________________________
> dri-devel mailing list
> dri-devel(a)lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
After the mm panels, I had a few discussions with Hans, Rob and Daniel,
among others, during the V4L and KMS discussions and after that. Based
on those discussions, I'm pretty much convinced that the normal MMAP
way of streaming (the VIDIOC_[REQBUF|STREAMON|STREAMOFF|QBUF|DQBUF]
ioctls) is not the best way to share data with framebuffers. We probably
need something that is close to VIDIOC_FBUF/VIDIOC_OVERLAY, but it is
still not the same thing.
I suspect that working on such an API is somewhat orthogonal to the
decision between a file-pointer-based and a buffer-ID-based kABI for
passing the buffer parameters to the new V4L calls, but we cannot decide
on the type of buffer ID that we'll use until we finish working on an
initial RFC for the V4L API, as the way the buffers will be passed into
it will depend on how we design that API.
It should also be noticed that, while for the shared buffers some
definitions can be postponed until later (as it is basically
a kernelspace-only ABI - at least initially), the V4L API should be
designed with all possible scenarios in mind, as "diamonds and userspace
APIs are forever"(tm).
It seems to me that the proper way to develop such an API is to start
working with the Xorg V4L driver, changing it to work with KMS and with
the new API (probably porting some parts of it to kernelspace).
One of the problems with a shared framebuffer is that an overlaid V4L
stream may, in the worst case, be sent to up to 4 different GPUs and/or
displays, like:
===================+===================
|                  |                  |
|    D1       +----|---+     D2       |
|             | V4L|   |              |
+-------------|----+---|--------------|
|             |    |   |              |
|    D3       +----+---+     D4       |
|                  |                  |
=======================================
Where D1, D2, D3 and D4 are 4 different displays, and the same V4L
framebuffer is partially shared between them (the above is an example of
a V4L input, although the reverse scenario of having one framebuffer
divided into 4 V4L outputs also seems to be possible).
As the same image may be divided across 4 monitors, the buffer filling
should be synced with all of them, in order to avoid flipping artifacts.
Also, the buffer can't be reused until all displays finish reading it.
Display APIs currently have similar issues. From what I understood from
Rob and Daniel, this is solved there by dynamically allocating buffers.
So we may need to do something similar at the V4L side (in fact, there's
currently a proposal to extend REQBUFS in order to allow the V4L API to
dynamically create more buffers than are used by a stream). It makes
sense to me to discuss that proposal together with the above
discussions, in order to keep the API consistent.
From my side, I'm expecting the responsible party (or parties) for the
API proposals to also provide open source drivers and userspace
application(s) that allow testing and validating such an API RFC.
Thanks,
Mauro
Hi,
Here are my own notes from the Linaro memory management mini-summit in
Budapest. I've written them from my own point of view, which is mostly
V4L2 in embedded devices and camera related use cases. I attempted to
summarise the discussion, mostly concentrating on the parts which I
considered important, and ignored the rest.
So please do not consider this as the generic notes of the mini-summit.
:-) I still felt like sharing this since it might be found useful by
those who are working with similar systems with similar problems.
Memory buffer management --- the future
=======================================
Memory buffer management can be split into the following sub-problems,
which may have dependencies, both in implementation and possibly in
the APIs as well:
- Fulfilling buffer allocation requirements
- API to allocate buffers
- Sharing buffers among kernel subsystems (e.g. V4L2, DRM, FB)
- Sharing buffers between processes
- Cache coherency
It was agreed that we need the kernel to recognise a DMA buffer which
may be passed between user processes and different kernel subsystems.
Fulfilling buffer allocation requirements
-----------------------------------------
APIs, as well as devices, have different requirements on the buffers.
It is difficult to come up with generic requirements for buffer
allocation, and to keep the solution future-proof is challenging as
well. In principle the user is interested in being able to share
buffers between subsystems without knowing the exact requirements of
the devices, which makes it possible to keep the requirement handling
internal to the kernel. Whether this is the way to go or not, will be
seen in the future. The buffer allocation remains a problem to be
resolved in the future.
The majority of devices' requirements could be fulfilled using a few
allocators: one for physically contiguous memory and another for
physically non-contiguous memory built from single-page allocations.
Being able to allocate large pages would also be beneficial in many
cases.
API to allocate buffers
-----------------------
It was agreed there was a need to have a generic interface for buffer
object creation. This could be either a new system call which would be
supported by all devices supporting such buffers in subsystem APIs
(such as V4L2), or a new dedicated character device.
Different subsystems have different ways of describing the properties
of the buffers, such as how the data in the buffer should be
interpreted. The V4L2 has width, height, bytesperline and pixel
format, for example. The generic buffers should not recognise such
properties since this is very subsystem specific information. Instead,
the user, who is aware of the different subsystems, must come up with a
matching set of buffer properties using the subsystem specific
interfaces.
Sharing buffers among kernel subsystems
---------------------------------------
There was discussion on how to refer to generic DMA buffers, and the
audience was at first mostly split between using buffer IDs to refer to
the buffers and using file handles for the purpose. Using file handles
has pros and cons compared to numeric IDs:
+ Easy life cycle management. Deallocation of buffers no longer in use
is trivial.
+ Access control for files exists already. Passing file descriptors
between processes is possible through Unix sockets.
- Allocating an extremely large number of buffers would require as many
file descriptors. This is not likely to be an important issue.
Before the day ended, it was felt that the file handles are the right
way to go.
The generic DMA buffers further need to be associated to the subsystem
buffers. This is up to the subsystem APIs. In V4L2, this would most
likely mean that there will be a new buffer type for the generic DMA
buffers.
Sharing buffers between processes
---------------------------------
Numeric IDs can be easily shared between processes while sharing file
handles is more difficult. However, it can be done using the Unix
sockets between any two processes. This also gives automatically
the same access control mechanism as every other file. Access control
mechanisms are mandatory when making the buffer shareable between
processes.
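Since passing a file descriptor through a Unix socket is the mechanism
everything above relies on, here is a minimal userspace sketch using the
standard SCM_RIGHTS ancillary message (plain POSIX, not code from any of
the patchsets discussed in this thread):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send fd across a connected AF_UNIX socket; the receiver gets its
     * own descriptor referring to the same open file. */
    static int send_fd(int sock, int fd)
    {
            char byte = 0;          /* must carry at least one data byte */
            struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
            char ctrl[CMSG_SPACE(sizeof(int))] = { 0 };
            struct msghdr msg = {
                    .msg_iov = &iov, .msg_iovlen = 1,
                    .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
            };
            struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

            cmsg->cmsg_level = SOL_SOCKET;
            cmsg->cmsg_type = SCM_RIGHTS;
            cmsg->cmsg_len = CMSG_LEN(sizeof(int));
            memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

            return sendmsg(sock, &msg, 0);
    }

The receiving side calls recvmsg() and reads the new descriptor back out
of the control message with CMSG_DATA().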
Cache coherency
---------------
Cache coherency is seen as largely orthogonal to the other sub-problems
in memory buffer management. In few cases this might have something in
common with buffer allocation. Some architectures, ARM in particular, do
not have coherent caches, meaning that the operating system must know
when to invalidate or clean various parts of the cache. There are two
ways to approach the issue, independently of the cache implementation:
1. Allocate non-cacheable memory, or
2. invalidate or clean (or flush) the cache when necessary.
Allocating non-cacheable memory is a valid solution to cache coherency
handling in some situations, but mostly only when the buffer is only
partially accessed by the CPU or at least not multiple times. In other
cases, invalidating or cleaning the cache is the way to go.
The exact circumstances in which using non-cacheable memory gives a
performance benefit over invalidating or cleaning the cache when
necessary are very system and use case dependent. This should be
selectable from the user space.
The cache invalidation or cleaning can be either on the whole (data)
cache or a particular memory area. Performing the operation on a
particular memory area may be difficult since it should be done to all
mappings of the memory in the system. Also, there is a limit beyond
which performing an invalidate or clean on an area is always more
expensive than a full cache flush: on many machines the cache line
size is 64 bytes, and the invalidate/clean must be performed cache line
by cache line over the whole buffer, which in cameras could be tens of
megabytes in size.
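As a short illustration of option 2 in kernel terms, the existing
streaming DMA API already expresses the ownership transfers that decide
when to clean or invalidate. A driver receiving data from a device would
do roughly the following (sketch only; error handling omitted and
process_frame() stands in for whatever consumes the data):

    dma_addr_t handle = dma_map_single(dev, buf, size, DMA_FROM_DEVICE);

    /* ... device DMAs the captured frame into the buffer ... */

    /* On a non-coherent ARM system this invalidates the cache lines
     * covering the buffer before the CPU looks at the new data. */
    dma_sync_single_for_cpu(dev, handle, size, DMA_FROM_DEVICE);
    process_frame(buf);
    dma_sync_single_for_device(dev, handle, size, DMA_FROM_DEVICE);

    dma_unmap_single(dev, handle, size, DMA_FROM_DEVICE);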
Mapping buffers to application memory is not always necessary --- the
buffers may only be used by the devices, in which case a scatterlist
of the pages in the buffer is necessary to map the buffer to the IOMMU.
More (impartial :-)) information can be found here:
<URL:http://summit.ubuntu.com/uds-o/meeting/linaro-graphics-memory-managemen…>
<URL:http://summit.ubuntu.com/uds-o/meeting/linaro-graphics-memory-managemen…>
<URL:http://summit.ubuntu.com/uds-o/meeting/linaro-graphics-memory-managemen…>
Regards,
--
Sakari Ailus
sakari.ailus(a)maxwell.research.nokia.com
Hi all,
During the Budapest meetings it was mentioned that you can pass a fd between
processes. How does that work? Does someone have a code example or a link to
code that does that? Just to satisfy my curiosity.
Regards,
Hans
Thanks Jesse for initiating the mailing list.
We need to address the requirements of Graphics and Multimedia Accelerators
(IPs).
What we really need is a permanent solution (at upstream) which accommodates
the following requirements and conforms to Graphics and Multimedia use
cases.
1. Mechanism to map/unmap the memory. Some of the IPs have the ability
to address virtual memory and some can address only physically
contiguous address space. We need to address both these cases.
2. Mechanism to allocate and release memory.
3. Method to share the memory (ZERO copy is a MUST for better
performance) between different device drivers (for example, camera
output to the multimedia encoder).
4. Method to share the memory with different processes in userspace. The
sharing mechanism should include built-in security features.
Are there any special requirements from V4L or DRM perspectives?
Thanks,
Sree
(Disclaimer: I come from a graphics background, so sorry if I use graphicsy
terminology; please let me know if any of this isn't clear. I tried.)
There is a wide range of hardware capabilities that require different
programming approaches in order to perform optimally. We need to define an
interface that is flexible enough to handle each of them, or else it won't be
used and we'll be right back where we are today: with vendors rolling their own
support for the things they need.
I'm going to try to enumerate some of the more unique usage patterns as
I see them here.
- Many or all engines may sit behind asynchronous command stream interfaces.
Programming is done through "batch buffers"; a set of commands operating on a
set of in-memory buffers is prepared and then submitted to the kernel to be
queued. The kernel will first make sure all of the buffers are resident
(which may require paging or mapping into an IOMMU/GART, a.k.a. "pinning"),
then queue the batch of commands. The hardware will process the commands at
its earliest convenience, and then interrupt the CPU to notify it that it's
done with the buffers (i.e. they can now be "unpinned").
Those familiar with graphics may recognize this programming model as a
classic GPU command stream. But it doesn't need to be used exclusively with
GPUs; any number of devices may have such an on-demand paging mechanism.
- In contrast, some engines may also stream to or from memory continuously
(e.g., video capture or scanout); such buffers need to be pinned for an
extended period of time, not tied to the command streams described above.
- There can be multiple different command streams working at the same time on
the same buffers. (There may be hardware synchronization primitives between
the multiple command streams so the CPU doesn't have to babysit too much, for
both performance and power reasons.)
- In some systems, IOMMU/GART may be much smaller than physical memory; older
GPUs and SoCs have this. To support these, we need to be able to map and
unmap pages into the IOMMU on demand in our host command stream flow. This
model also requires patching up pending batch buffers before queueing them to
the hardware, to update them to point to the newly-mapped location in the
IOMMU.
- In other systems, IOMMU/GART may be much larger than physical memory; more
modern GPUs and SoCs have this. With these, we can reserve virtual (IOMMU)
address space for each buffer up front. To userspace, the buffers always
appear "mapped". This is similar in concept to how the CPU virtual space in
userspace sticks around even when the underlying memory is paged out to disk.
In this case, pinning is performed at the same time as the small-IOMMU case
above, but in the normal/fast case, the pages are never paged out of the
IOMMU, and the pin step just increments a refcount to prevent the pages from
being evicted.
It is desirable to keep the same IOMMU address for:
a) implementing features such as
http://www.opengl.org/registry/specs/NV/shader_buffer_load.txt
(OpenGL client applications and shaders manipulate GPU vaddr pointers
directly; a GPU virtual address is assumed to be valid forever).
b) performance: scanning through the command buffers to patch up pointers can
be very expensive.
One other important note: buffer format properties may be necessary to set up
mappings (both CPU and iommu mappings). For example, both types of mappings
may need to know tiling properties of the buffer. This may be a property of
the mapping itself (consider it baked into the page table entries), not
necessarily something a different driver or userspace can program later
independently.
Some of the discussion I heard this morning tended towards being overly
simplistic and didn't seem to cover each of these cases well. Hopefully this
will help get everyone on the same page.
Thanks,
Robert
Hi all,
A bit later than what I've hoped for, but here we go [Jesse and Dave,
please correct/clarify/extend where you see fit]:
The core idea of GEM is to identify graphics buffer objects with 32bit ids.
The reason being "X runs out of open fds" (KDE easily reaches a few
thousand).
The core design principle behind GEM is that the kernel is in full control
of the allocation of these buffer objects and is free to move the around
in any way it sees fit. This is to make concurrent rendering by multiple
processes possible while userspace can still assume that it is in sole
possession of the gpu - GEM means "graphics execution manager".
Below are some more details on what GEM is and does, what it does
_not_ do, and how it relates to other graphics subsystems.
GEM does ...
------------
- lifecycle management. Userspace references are associated with the drm
fd and get reaped on close (in case userspace forgets about them).
- per-device global names to exchange buffers between processes (eg dri2).
These names are again 32bit ids. These global ids do not count as
userspace references and don't prevent a buffer from being reaped.
- it implements very few generic ioctls:
* flink for creating a global name for a buffer object
* open for getting a per-fd handle to a buffer object with a global name
* close for dropping a per-fd handle.
- a little bit of kernel-internal helpers to facilitate mmap (by blending
multiple buffer objects into the single drm device address space) and a
few other things.
That's it, i.e. GEM is very much meant to be as simple as possible.
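For concreteness, the three generic ioctls look like this from userspace,
using the uapi structures from drm.h (drm_fd, other_fd and handle are
assumed to exist already, with handle coming from a driver-specific
buffer creation ioctl):

    #include <sys/ioctl.h>
    #include <drm/drm.h>

    /* flink: export a per-fd handle as a per-device global name */
    struct drm_gem_flink flink = { .handle = handle };
    ioctl(drm_fd, DRM_IOCTL_GEM_FLINK, &flink);   /* sets flink.name */

    /* open: turn the global name into a handle on another drm fd */
    struct drm_gem_open op = { .name = flink.name };
    ioctl(other_fd, DRM_IOCTL_GEM_OPEN, &op);     /* sets op.handle, op.size */

    /* close: drop the per-fd handle */
    struct drm_gem_close cl = { .handle = op.handle };
    ioctl(other_fd, DRM_IOCTL_GEM_CLOSE, &cl);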
Driver-specific GEM ioctls
--------------------------
The generic GEM stuff is obviously not very useful. So drivers implement
quite a bit driver-specific ioctls, like:
- buffer creation. In recent kernels there is some support to create dumb
scanout objects for KMS. But they're only really useful for boot splashes
and unaccelerated dumb KMS drivers. Creating buffers usable for
rendering is only possible with driver specific ioctls.
- command submission. An important part is mapping abstract buffer ids to
actual gpu addresses (and rewriting batchbuffers with these). In the
future, with support for virtual gpu address spaces, this might change.
- tiling management. The kernel needs to know this to correctly
tile/detile buffers when moving them around (e.g. evicting from vram).
- command completion signalling and gpu/cpu synchronization.
There are currently two approaches for implementing a GEM driver:
- roll-your-own, used by drm/i915 (and sometimes getting flak for NIH).
- ttm-based: radeon & nouveau.
GEM does not ...
----------------
This still leaves out a few things that I've seen mentioned as
ideas/requirements here and elsewhere:
- cross-device buffer sharing and namespaces (see below) and
- buffer format handling and mediation between different users (except
tiling as mentioned above). The reason here is that gpus are a mess
and one of the worst parts is format handling. Better keep that out
of the kernel ...
KMS (kernel mode setting)
-------------------------
KMS is essentially just a port of the xrandr api to the kernel as an ioctl
interface:
- crtcs feed (possibly multiple) outputs and get their data from a
framebuffer object. A major part of KMS is also the support for
vsynced-pageflipping of framebuffers.
- Internally there's some support infrastructure to simplify drivers (all
the drm_*_helper.c code).
- framebuffers are created from a opaque driver-specific 32bit id and a
format description. For GEM drivers these ids name GEM objects, but that
need not be: The recently merged qemu kms driver does not implement gem
and has one unique buffer object with id 0.
- as mentioned above, there is now a generic ioctl to create an object
suitable as a dumb scanout buffer (plus some support to mmap it).
- currently KMS has no generic support for overlays (there are
driver-specific ioctls in i915 and vmgfx, though). Jesse Barnes has
posted an RFC to remedy this:
http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg10415.html
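To give a feel for the ioctl interface, a minimal libdrm-based modeset
looks roughly like this (a sketch: it assumes a connected connector with
at least one mode, a known crtc id and an already-created buffer object,
and skips most error handling):

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Wrap a driver buffer object in a kms framebuffer and point a crtc at it. */
static int kms_light_up(int fd, uint32_t crtc_id, uint32_t connector_id,
			uint32_t bo_handle, uint32_t pitch)
{
	drmModeConnector *conn = drmModeGetConnector(fd, connector_id);
	drmModeModeInfo mode = conn->modes[0];	/* usually the preferred mode */
	uint32_t fb_id;
	int ret;

	/* A framebuffer is a buffer object id plus a format description. */
	ret = drmModeAddFB(fd, mode.hdisplay, mode.vdisplay, 24, 32,
			   pitch, bo_handle, &fb_id);
	if (ret == 0)
		ret = drmModeSetCrtc(fd, crtc_id, fb_id, 0, 0,
				     &connector_id, 1, &mode);
	drmModeFreeConnector(conn);
	return ret;
}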
GEM and PRIME
-------------
PRIME is a proof-of-concept implementation from Dave Airlie for sharing
GEM objects between drivers/devices: Buffer sharing is done with a list of
struct page pointers. While being shared, buffers can't be moved anymore.
No further buffer description is passed along in the kernel, format/layout
mediation is to be handled in userspace.
Blog-post describing the initial design for sharing buffers between an
integrated Intel igd and a discrete ATI gpu:
http://airlied.livejournal.com/71734.html
Other code using the same framework to render on an Intel igd and display
the framebuffer on a USB-connected display:
http://git.kernel.org/?p=linux/kernel/git/airlied/drm-testing.git;a=shortlo…
GEM/KMS and fbdev
-----------------
There's some minimal support to emulate an fbdev with a gem/kms driver.
Resolution can't be changed and it's unaccelerated. There's been some
muttering once in a while to better integrate this with either a kms
kernel console driver or by routing fbdev resolution changes to kms.
But the main use case is to display a kernel oops, which works. For
everything else there's X (or an EGL client that understands kms).
-Daniel
--
Daniel Vetter
Mail: daniel(a)ffwll.ch
Mobile: +41 (0)79 365 57 48
Hi!
I just want to clarify some buffer object operations and terminology that seem confusing to people and that are used by most modern GPU drivers.
I think it's useful to be aware of this, going forward in the memory manager discussions.
Terminology:
Scanout buffer: Buffer that is used for continuous access by a device. Needs to be permanently pinned.
Pinned buffer: A pinned buffer may not move and may not change backing pages. Allows it to be mapped to a device.
Synchronization object: An object that is either in a signaled or non-signaled state. Signaled means that the device is done with the buffer, and has flushed its caches. A synchronization object has a device-specific part that may, for example, contain flushing state.
Basic device use of a buffer:
Scanout buffers (and perhaps also capture buffers?) are typically pinned.
Other buffers that are temporarily used by a GPU or, for example, a video decoding engine or image processor are typically *not* pinned. The usage pattern for submitting any commands that affect the buffer is as follows (sketched in pseudocode after this list):
1) Take a mutex that stops the buffer from being moved. This mutex could be global (stops all buffers from being moved) or per-buffer.
2) Wait on any previous synchronization objects attached to the buffer, if those sync objects would not be implicitly signaled when the device executes its work. This is where it becomes bad to have a global mutex under 1).
3) Validate the buffer. This means setting up any missing (contiguous) device mappings or a move to VRAM, and flushing CPU caches if necessary.
4) Patch up the device commands to reflect any movement of the buffer in 3). New offsets, SG-lists etc.
5) Submit the device commands.
6) Create a new synchronization object and attach it to the buffer.
7) Release the mutex taken in 1).
The buffer will not be moved until the synchronization object has signaled, and mappings set up under 3) will not be torn down until the memory manager receives a request to free up mapping resources.
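To make the ordering concrete, here is a pseudocode sketch of steps 1)-7).
Every identifier in it is made up; the real code lives in the individual
drivers and in TTM:

/* Per-buffer mutex variant; a global mutex would serialize all of this. */
static int submit_commands(struct dev *dev, struct buffer *buf,
			   struct cmd_stream *cmds)
{
	mutex_lock(&buf->move_mutex);		/* 1) buffer may not move now */
	sync_obj_wait(buf->sync_obj);		/* 2) wait for previous users */
	validate(dev, buf);			/* 3) mappings / VRAM move / cache flush */
	patch_offsets(cmds, buf->gpu_offset);	/* 4) rewrite commands with new offsets */
	ring_submit(dev, cmds);			/* 5) hand the commands to the device */
	buf->sync_obj = sync_obj_create(dev);	/* 6) signals when the device is done */
	mutex_unlock(&buf->move_mutex);		/* 7) */
	return 0;
}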
I'd call this "Generation 2" device buffer management. (Intel (uses busy lists, no sync objects), Radeon, Nouveau, vmwgfx, New VIA)
"Generation 1" was using a global memory manager for pinned buffers (SiS, old VIA DRM drivers)
Generation 3 would be page based device MMUs with programmable apertures to access VRAM.
What we were discussing today is basically creating a unified gen 1 manager, with a new user-space interface.
/Thomas
DRM support for platform devices landed last year and was drastically
improved earlier this year. Qualcomm uses it for a really weak DRM driver
that handles memory for X but does GPU and display through a different
interface. Feel free to flame me for that.. :).
https://www.codeaurora.org/gitweb/quic/la/?p=kernel/msm.git;a=blob;f=driver…
And I believe OMAP also has a solution somewhere (sorry, I couldn't find a URL).
Jordan
Hi!
Just wanted to share some thoughts about CMA, TTM and the problems with
conflicting mappings.
1) CMA seems to be a nice way to handle the problem with contiguous
pages, although it seems Arnd has some concerns. It would be nice to
hear about those. Some thoughts:
a) It seems fairly straightforward to interface CMA with TTM. The
benefit would be that CMA could ask TTM at any time (using a shrinker
callback) to release its contiguous pages, and TTM would do so once the
GPU is finished with them (unless of course they are used with a pinned
buffer object, like a scanout buffer). CMA would need to be extended
with a small API to create / free a contiguous range and to actually
populate that range with pages. (A rough sketch of such a shrinker
follows after this list.)
b) DRM, TTM and, it seems, CMA all need a range allocator. There is a
reasonable implementation in drm_mm.c, which has seen a fair bit of
improvement since it was originally written. Should we try to move
that to linux/lib?
c) Could the CMA technique also be used to keep a pool of pages that are
unmapped from the linear kernel map? Essentially a range of HIGHMEM
pages? The benefit compared to just having a pool of HIGHMEM pages by
itself would be that the graphics subsystem would have priority over
normal system use (moving out movable contents), and could use these
pages with nonstandard caching attributes if needed. If this is
considered a good idea, we could perhaps consider placing the default
CMA region in HIGHMEM.
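To make idea a) a bit more concrete, a shrinker hook could look roughly
like this. Only the shrinker registration itself is the standard mm
interface; cma_reclaimable_pages() and ttm_release_contiguous() are
hypothetical names for the small API that would have to be added:

#include <linux/mm.h>

static int cma_ttm_shrink(struct shrinker *shrink, struct shrink_control *sc)
{
	if (!sc->nr_to_scan)
		return cma_reclaimable_pages();	/* report what could be freed */

	/* Ask TTM to release contiguous ranges; TTM complies once the GPU
	 * is finished with them (pinned buffers are skipped, of course). */
	return ttm_release_contiguous(sc->nr_to_scan);
}

static struct shrinker cma_shrinker = {
	.shrink	= cma_ttm_shrink,
	.seeks	= DEFAULT_SEEKS,
};

/* somewhere in CMA init: register_shrinker(&cma_shrinker); */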
/Thomas
Hi all,
I've updated the mini-summit wiki with a couple more details:
https://wiki.linaro.org/Events/2011-05-MM
one of which is a sample use case description from Samsung. I would
encourage everyone to look at that and see if there are other use
cases they think would make more sense, or if there is clarification
or amendment of the current proposal. The discussion around this is
slated for Tuesday, so we have some time before it comes up in the
summit. As we proceed, I'll be moving sections of the agenda over
onto the discussion blueprints on launchpad (as that's how Linaro
tracks stuff), but everything will also be available on the wiki as
well as this list for those that can't or don't want to use launchpad.
Also, there are still a few components without representatives in the
summit, and while it would be nice to be able to have those in the
early parts of the sessions, I would rather flex the agenda than omit
them. Even if there is no presentation on a component at the summit,
it would still be good to have written overviews of those, so I'll ask
again for volunteers.
cheers,
Jesse
Hey
This is a quick heads up that the MM summit starts at *2pm* today (in 15
minutes) and not at 3pm. The schedule is incorrect because we can't overlap
the plenaries, but discussion will start at 2pm.
See you there!
--
Loïc Minier
Hi all,
I just wanted to remind everyone about the room change and the fact
that the scheduler wouldn't let me show the session as starting at
1400, due to the plenaries. See you all soon.
cheers,
jesse
Hi,
From today's V4L presentation, there were two missing topics that may be
useful to include in our discussions:
a) V4L overlay mode;
b) DVB.
So, I'm bringing those two topics up for discussion. If needed, I can do a
presentation about them, but it seemed better to start the discussion via
the ML, in order to learn more about the interest in those two subjects.
a) V4L overlay mode
================
The V4L overlay mode was used a lot during the kernel 2.2 and 2.4 days, when
most hardware was not capable enough to do real-time processing of video
streams. It is supported by xawtv and a Xorg v4l driver, and uses XV overlay
extensions to display video. It is simple to set up and requires no CPU
usage, as the video framebuffer is passed directly to the video hardware,
which programs DMA to write directly into the fb memory.
The main structures used in overlay mode (from kernel include/linux/videodev2.h)
are:
struct v4l2_pix_format {
__u32 width;
__u32 height;
__u32 pixelformat;
enum v4l2_field field;
__u32 bytesperline; /* for padding, zero if unused */
__u32 sizeimage;
enum v4l2_colorspace colorspace;
__u32 priv; /* private data, depends on pixelformat */
};
struct v4l2_framebuffer {
__u32 capability;
__u32 flags;
/* FIXME: in theory we should pass something like PCI device + memory
* region + offset instead of some physical address */
void *base;
struct v4l2_pix_format fmt;
};
/* Flags for the 'capability' field. Read only */
#define V4L2_FBUF_CAP_EXTERNOVERLAY 0x0001
#define V4L2_FBUF_CAP_CHROMAKEY 0x0002
#define V4L2_FBUF_CAP_LIST_CLIPPING 0x0004
#define V4L2_FBUF_CAP_BITMAP_CLIPPING 0x0008
#define V4L2_FBUF_CAP_LOCAL_ALPHA 0x0010
#define V4L2_FBUF_CAP_GLOBAL_ALPHA 0x0020
#define V4L2_FBUF_CAP_LOCAL_INV_ALPHA 0x0040
#define V4L2_FBUF_CAP_SRC_CHROMAKEY 0x0080
/* Flags for the 'flags' field. */
#define V4L2_FBUF_FLAG_PRIMARY 0x0001
#define V4L2_FBUF_FLAG_OVERLAY 0x0002
#define V4L2_FBUF_FLAG_CHROMAKEY 0x0004
#define V4L2_FBUF_FLAG_LOCAL_ALPHA 0x0008
#define V4L2_FBUF_FLAG_GLOBAL_ALPHA 0x0010
#define V4L2_FBUF_FLAG_LOCAL_INV_ALPHA 0x0020
#define V4L2_FBUF_FLAG_SRC_CHROMAKEY 0x0040
Using it is as simple as selecting a format that the video display framebuffer
supports, and sending a couple of ioctls to the video adapter.
This is what the Xorg v4l driver (v4l.c) does (simplified, to ease
comprehension):
struct v4l2_framebuffer yuv_fbuf;
struct v4l2_window yuv_win;
struct v4l2_format fmt;
int on = 1;
if (-1 == ioctl(V4L_FD, VIDIOC_G_FBUF, &yuv_fbuf))
return;
/* Sets the framebuffer data: width, height, bpp, format, base and display position */
yuv_fbuf.fmt.width = surface->width;
yuv_fbuf.fmt.height = surface->height;
yuv_fbuf.fmt.bytesperline = surface->pitches[0];
yuv_fbuf.fmt.pixelformat = V4L2_PIX_FMT_YUYV;
yuv_fbuf.base = (char *)(memPhysBase + surface->offsets[0]);
memset(&yuv_win, 0, sizeof(yuv_win));
yuv_win.w.left = 0;
yuv_win.w.top = 0;
yuv_win.w.width = surface->width;
yuv_win.w.height = surface->height;
if (-1 == ioctl(V4L_FD, VIDIOC_S_FBUF, &yuv_fbuf))
return;
/* Sets the memory transfer type to overlay mode and the overlay window */
memset(&fmt, 0, sizeof(fmt));
fmt.type = V4L2_BUF_TYPE_VIDEO_OVERLAY;
memcpy(&fmt.fmt.win, &yuv_win, sizeof(yuv_win));
if (-1 == ioctl(V4L_FD, VIDIOC_S_FMT, &fmt))
return;
/* Enables overlay mode. Data is transferred directly from the video capture device into the display framebuffer */
if (-1 == ioctl(V4L_FD, VIDIOC_OVERLAY, &on))
return;
The main issue with the overlay mode, as discussed on the first day,
is that the framebuffer pointer is a physical address. The original
idea, in v4l2, was to use some framebuffer ID.
That said, it wouldn't be hard to add a new flag to v4l2_framebuffer.flags
meaning that it should use a GEM ID. I had some discussions
with Dave Airlie about that when I submitted the v4l driver fixes due to
the removal of the old V4L1 API. I'm planning to submit something like that in
the future, when I have some spare time for it. If Linaro
is interested, it could be an interesting project, as it may solve some of
the current needs.
It is probably simpler to do that than to add another mode to the V4L mmap stuff.
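A purely hypothetical sketch of what that flag could look like; neither
V4L2_FBUF_FLAG_GEM nor this interpretation of 'base' exists in the kernel,
it only illustrates the proposal:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

#define V4L2_FBUF_FLAG_GEM	0x0080	/* hypothetical: 'base' holds a GEM name */

static int set_gem_fb(int fd, uint32_t gem_name, struct v4l2_pix_format *pix)
{
	struct v4l2_framebuffer fb;

	memset(&fb, 0, sizeof(fb));
	fb.flags = V4L2_FBUF_FLAG_OVERLAY | V4L2_FBUF_FLAG_GEM;
	fb.base = (void *)(unsigned long)gem_name; /* 32bit GEM name, not a phys addr */
	fb.fmt = *pix;
	return ioctl(fd, VIDIOC_S_FBUF, &fb);
}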
b) DVB
===
Several new ARM devices now ship with digital TV integrated. In my
country, we have several mobile phones, tablets and GPS devices with DTV
receivers inside. Modern TV sets and set-top boxes already use Linux with
DVB support. GoogleTV will certainly need DTV support, as will similar
products.
Even though it is used everywhere, no big vendor has so far tried to send
us patches to improve their DVB support, but I suspect that this will
happen soon. This is just an educated guess. It would be nice to have some
feedback about that from the vendors.
The DVB API is completely different from the V4L one, and there are two different
types of DVB devices:
- Full-featured DVB devices, with MPEG-TS, audio and video codec inside it;
- "simple" devices that just provide a read() interface to get an MPEG-TS stream.
As modern ARM SoC devices can have a codec DSP processor, it makes sense for them
to use the full-featured API, providing audio and video via the DVB API
(yes, DVB has a different way of controlling and exporting audio/video than V4L/ALSA).
The question here is: is there any demand for it right now? If so, what are the
requirements? Are the memory management requirements identical to the current
ones?
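For reference, the "simple device" case really is just a read loop on the
dvr node (a sketch, assuming the frontend has already been tuned and the
demux configured; real code would also handle short reads and errors):

#include <fcntl.h>
#include <unistd.h>

static int read_ts_stream(void)
{
	unsigned char pkt[188];	/* one MPEG-TS packet */
	int fd = open("/dev/dvb/adapter0/dvr0", O_RDONLY);

	if (fd < 0)
		return -1;
	while (read(fd, pkt, sizeof(pkt)) == sizeof(pkt)) {
		/* hand the packet to a demuxer/decoder */
	}
	close(fd);
	return 0;
}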
Thanks,
Mauro
I've added it to the wiki along with the existing use case.
cheers,
jesse
On Mon, May 9, 2011 at 5:15 PM, Sakari Ailus
<sakari.ailus(a)maxwell.research.nokia.com> wrote:
> Jesse Barker wrote:
>> Hi all,
>
> Hi Jesse,
>
>> I've updated the mini-summit wiki with a couple more details:
>>
>> https://wiki.linaro.org/Events/2011-05-MM
>>
>> one of which is a sample use case description from Samsung. I would
>> encourage everyone to look at that and see if there are other use
>> cases they think would make more sense, or if there is clarification
>> or amendment of the current proposal. The discussion around this is
>
> I have a small set of slides on a use case related to camera on TI OMAP
> 3. The slides are attached.
>
> The Samsung example also looks very good to me.
>
> Kind regards,
>
> --
> Sakari Ailus
> sakari.ailus(a)maxwell.research.nokia.com
>