Re: [Linaro-mm-sig] Memory region attribute bits and multiple mappings

21 Apr 2011


      On Thu, Apr 21, 2011 at 3:33 AM, Tom Cooksey Tom.Cooksey@arm.com wrote:
...
At least with Mali, we use user-space sub-allocators. So we’ll acquire
hunks of memory from the kernel, map it into the GPU context’s virtual
address space and let the user-space driver divide it up into however many
buffers it needs. So I guess we could still acquire the hunks of memory from
whatever kernel-space allocator we want, but fundamentally, kernel-space
will never see what individual buffers that memory is divided up into. I’m
also struggling to understand how we could securely share a buffer after it
has been allocated given we’d have to share entire pages. It’s quite likely
a buffer we want to share will have data in pages used by other buffers
which we don’t want to share? Fundamentally, we need to know at allocation
time if a buffer is intended to be shared with another device or process so
we can make sure the allocation gets its own pages.
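
A minimal sketch of the constraint described above, assuming a simple bump-style user-space sub-allocator over one chunk obtained from the kernel. The names `suballoc`, `suballoc_alloc` and the `shareable` flag are hypothetical and do not come from the Mali driver; the 4096-byte page size and 16-byte private alignment are also assumptions. The point it illustrates is that a buffer flagged as shareable at allocation time gets whole pages to itself, while private buffers are packed tightly.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed page size, for illustration only */
#define PRIVATE_ALIGN   16u     /* assumed alignment for private buffers */

/* Hypothetical sub-allocator state for one chunk obtained from the kernel. */
struct suballoc {
    size_t chunk_size;  /* total bytes in the chunk */
    size_t next;        /* offset of the first free byte */
};

static size_t round_up(size_t v, size_t align)
{
    return (v + align - 1) / align * align;
}

/*
 * Returns the offset of the new buffer inside the chunk, or (size_t)-1 on
 * failure. A shareable buffer starts on a page boundary and is padded to a
 * whole number of pages, so no other buffer ever lives in its pages.
 */
static size_t suballoc_alloc(struct suballoc *sa, size_t size, int shareable)
{
    size_t align, start, len;

    assert(sa != NULL);
    align = shareable ? PAGE_SIZE_BYTES : PRIVATE_ALIGN;
    start = round_up(sa->next, align);
    len = shareable ? round_up(size, PAGE_SIZE_BYTES) : size;

    if (size == 0 || start > sa->chunk_size || len > sa->chunk_size - start)
        return (size_t)-1;

    sa->next = start + len;
    return start;
}
```

The cost of this approach is internal fragmentation: a small shareable buffer still consumes at least one full page, which is why the decision has to be made when the buffer is allocated rather than afterwards.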
I know lots of GPU driver code suballocates, and I expect folks will do
exactly what you've suggested: allocate a chunk and then manage it elsewhere.
I don't think there's any way to manage security at less than a page
boundary.  Do you have an MMU for your GPU with smaller page tables than
that?  I think if you do that kind of thing you are breaking the security
model and there's no way around it.  Theoretically we could manage security
at less than an allocation granularity -- I'm thinking you'd mmap at an
offset.  I implemented that for PMEM and it's a bit of a metadata management
mess, but it's a possible extension.  It would still be at a page boundary,
though.
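
As a rough illustration of the "mmap at an offset" idea, here is a sketch using the standard POSIX `mmap` call. The helper name `map_shared_subrange` and the assumption that the buffer is exposed as a file descriptor are hypothetical; this is not the PMEM implementation mentioned above. It shows only that a sub-range of an allocation can be mapped on its own, provided the offset and length fall on page boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map only [offset, offset + len) of the buffer behind
 * fd. Returns NULL if the range is not page-aligned or if mmap fails.
 * Rejecting unaligned ranges up front makes the page-granularity rule
 * explicit instead of silently exposing neighbouring data.
 */
static void *map_shared_subrange(int fd, off_t offset, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    if (page <= 0 || len == 0 || offset < 0)
        return NULL;
    if (offset % page != 0 || len % (size_t)page != 0)
        return NULL;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    return p == MAP_FAILED ? NULL : p;
}
```

A real allocator would also need to track which sub-ranges each client is allowed to map, which is the metadata bookkeeping referred to above.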
...
Thinking about the uncached issue some more, I think I’ve convinced myself
that it should be a requirement. We’ll have to support older (pre-A9) cores
for some time where using uncached memory provides better performance (which
seems to be the consensus opinion here?).
Agreed.
...
Cheers,
Tom
*From:* rschultz@google.com [mailto:rschultz@google.com] *On Behalf Of *Rebecca
Schultz Zavin
*Sent:* 20 April 2011 22:53
*To:* Dave Airlie
*Cc:* Tom Cooksey; Arnd Bergmann; linaro-mm-sig@lists.linaro.org
*Subject:* Re: [Linaro-mm-sig] Memory region attribute bits and multiple
mappings
I've been buried all day, but here's my response to a bunch of the
discussion points today:
I definitely don't think it makes sense to solve part of the problem and
then leave the GPUs to require their own allocator for their common case.
Why solve the allocator problem twice?  Also, we want to be able to support
passing these buffers between the camera, GPU, video decoder etc.  It'd
be much easier if they came from a common source.
The Android team's graphics folks and their counterparts at Intel,
Imagination Technologies, NVIDIA, Qualcomm and ARM have all told me that they
need a method for mapping buffers uncached to userspace.  The common case for
this is to write vertices, textures etc. to these buffers once and never touch
them again.  This may happen several times per frame, or even tens of times
or more.  My experience with cache flushes on ARM architectures matches
Marek's.  Typically write combining makes streaming writes really fast,
and on several SoCs we've found it cheaper to flush the whole cache than to
flush by line.  Clearly this hurts the performance of the rest of the system,
not to mention that a couple of large textures will totally blow your caches
for everything else.
Once we've solved the multiple mapping problem, it becomes quite easy to
support both cached AND uncached accesses from the CPU, so I think that
covers the cases where it's actually desirable to have a cached mapping --
software rendering, processing frames from a camera etc.
I think the issues of contiguous memory allocation and cache attributes are
totally separate, except that in cases where large contiguous regions are
necessary -- the problem Qualcomm's pmem and friends were written to solve --
you pretty much end up needing to put aside a pool of buffers at boot anyway
in order to guarantee the availability of large order allocations.  Once
you've done that, the attributes problem goes away since your memory is not
in the direct map.
On Wed, Apr 20, 2011 at 1:34 PM, Dave Airlie airlied@gmail.com wrote:
...
[TC] I’m not sure I completely agree with this being a use case. From my
understanding, the problem we’re trying to solve here isn’t a generic
graphics memory manager but rather a memory manager to facilitate
cross-device sharing of 2D image buffers. GPU drivers will still have their
own allocators for textures which will probably be in a tiled or other
proprietary format no other device can understand anyway. The use case where
we (GPU driver vendors) might want uncached memory is for one-time texture
upload. If we have a texture we know we’re only going to write to once (with
the CPU), there is no benefit in using cached memory. In fact, there’s a
potential performance drop if you used cached memory because the texture
upload will cause useful cache lines to be evicted and replaced with useless
lines for the texture. However, I don’t see any use case where we’d then
want to share that CPU-uploaded texture with another device, in which case
we would use our own (uncached) allocator, not this “cross-device”
allocator. There’s also a school of thought (so I’m told) that for one-time
texture upload you still want to use cached memory because more modern cores
have smaller write buffers (so you want to use the cache for better
combining of writes) and are good at detecting large sequential writes and
thus don’t use the whole cache for those anyway. So, other than one-time
texture upload, are there other graphics use cases you know of where it
might be more optimal to use uncached memory? What about video decoder
use-cases?
The memory manager should be used for all internal GPU memory
management as well, if desired.
If we have any hope of ever getting open source ARM GPU drivers
upstream, they can't all just
go reinventing the wheel. They need to be based on a common layer.
Dave.
