<snip>

 

> ARM has stated that if you have the same physical memory mapped with two
> different sets of attribute bits you get undefined behavior.  I think it's
> going to be a requirement that some of the memory allocated via the unified
> memory manager is mapped uncached.

This may be a stupid question, but do we have an agreement that it
is actually a requirement to have uncached mappings? With the
streaming DMA mapping API, it should be possible to work around
noncoherent DMA by flushing the caches at the right times, which
probably results in better performance than simply doing noncached
mappings. What is the specific requirement for noncached memory
regions?
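
For clarity, the pattern I have in mind with the streaming API looks roughly
like this; it is only a sketch, and the function and variable names (dev, buf,
len, stream_to_device) are made up rather than taken from any existing driver:

    #include <linux/dma-mapping.h>

    /*
     * Sketch only: hand a CPU-written buffer to a noncoherent device
     * with the streaming DMA API instead of keeping an uncached mapping.
     */
    static int stream_to_device(struct device *dev, void *buf, size_t len)
    {
            dma_addr_t handle;

            /* The mapping itself flushes the CPU cache for DMA_TO_DEVICE */
            handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
            if (dma_mapping_error(dev, handle))
                    return -ENOMEM;

            /* ... start the device transfer and wait for it to finish ... */

            /*
             * Only if the CPU touches the buffer again between transfers
             * do we pay for another flush, bracketed by the sync calls:
             */
            dma_sync_single_for_cpu(dev, handle, len, DMA_TO_DEVICE);
            /* ... CPU updates the buffer ... */
            dma_sync_single_for_device(dev, handle, len, DMA_TO_DEVICE);

            dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
            return 0;
    }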

 

That was my original plan, but our graphics folks and those at our partner companies have basically convinced me that the common case is for userspace to stream data into memory, say copying an image into a texture, and never read from it or touch it again. The alternative would mean a lot of cache flushes for small memory regions, which in and of itself becomes a performance problem. I think we want to optimize for this case, rather than the much less likely case of read-modify-write to these buffers.
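
To make the write-once case concrete, what I have in mind for the uncached
mapping is roughly the sketch below; the umm_* names and the pfn field are
placeholders rather than code from the actual allocator:

    #include <linux/fs.h>
    #include <linux/mm.h>

    /* Placeholder for the allocator's per-buffer bookkeeping */
    struct umm_buffer {
            unsigned long pfn;      /* first page frame of the allocation */
    };

    static int umm_mmap(struct file *file, struct vm_area_struct *vma)
    {
            struct umm_buffer *buf = file->private_data;
            unsigned long size = vma->vm_end - vma->vm_start;

            /*
             * Write-combined instead of the default cached mapping, so a
             * write-once upload from userspace never needs a cache flush.
             */
            vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

            return remap_pfn_range(vma, vma->vm_start, buf->pfn,
                                   size, vma->vm_page_prot);
    }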

 

[TC] I’m not sure I completely agree with this being a use case. From my understanding, the problem we’re trying to solve here isn’t a generic graphics memory manager but rather a memory manager to facilitate cross-device sharing of 2D image buffers. GPU drivers will still have their own allocators for textures, which will probably be in a tiled or other proprietary format no other device can understand anyway.

The use case where we (GPU driver vendors) might want uncached memory is one-time texture upload. If we have a texture we know we’re only going to write to once (with the CPU), there is no benefit in using cached memory. In fact, there’s a potential performance drop if you use cached memory, because the texture upload will cause useful cache lines to be evicted and replaced with useless lines for the texture. However, I don’t see any use case where we’d then want to share that CPU-uploaded texture with another device, in which case we would use our own (uncached) allocator, not this “cross-device” allocator.

There’s also a school of thought (so I’m told) that for one-time texture upload you still want to use cached memory, because more modern cores have smaller write buffers (so you want to use the cache for better combining of writes) and are good at detecting large sequential writes, and thus don’t use the whole cache for those anyway.

So, other than one-time texture upload, are there other graphics use cases you know of where it might be more optimal to use uncached memory? What about video decoder use cases?

 

