On Wednesday 15 June 2011 23:39:58 Larry Bassel wrote:
On 15 Jun 11 10:36, Marek Szyprowski wrote:
On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
I've seen this split-bank allocation in Qualcomm and TI SoCs; with Samsung, that makes three major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by "can't do" I mean that 1080p and higher decode is not functionally useful). Far from being special, this would appear to be the default.
We at Qualcomm have some platforms with memory of differing performance characteristics. Some drivers will need a way to specify that they need fast memory for an allocation (and would prefer an error if it is not available, rather than a fallback to slower memory). It would also be bad if allocations that don't need fast memory got it "accidentally", depriving those that really do.
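One existing mechanism that gives roughly these semantics is a per-device coherent pool: dma_declare_coherent_memory() with DMA_MEMORY_EXCLUSIVE makes dma_alloc_coherent() fail rather than fall back to ordinary system memory. A minimal sketch, assuming FAST_BANK_BASE, FAST_BANK_SIZE and FAST_BUF_SIZE are placeholders describing the fast bank on some hypothetical board:

#include <linux/dma-mapping.h>

static int example_probe(struct device *dev)
{
        dma_addr_t handle;
        void *buf;
        int rc;

        /*
         * Point this device's coherent allocations at the fast bank only.
         * DMA_MEMORY_EXCLUSIVE: fail instead of falling back to ordinary
         * system memory when the declared region is exhausted.
         */
        rc = dma_declare_coherent_memory(dev, FAST_BANK_BASE, FAST_BANK_BASE,
                                         FAST_BANK_SIZE,
                                         DMA_MEMORY_MAP | DMA_MEMORY_EXCLUSIVE);
        if (!(rc & DMA_MEMORY_MAP))
                return -ENOMEM;

        /* Comes from the fast bank or is NULL -- never silently "slow". */
        buf = dma_alloc_coherent(dev, FAST_BUF_SIZE, &handle, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;

        return 0;
}

This only covers coherent allocations for a single device, of course, not a general policy for steering all allocators.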
Can you describe how the memory areas differ specifically? Is there one that is always faster but very small, or are there just specific circumstances under which some memory is faster than another?
The possible conflict that I still see with per-bank CMA regions is:
- It completely destroys memory power management in cases where that is based on powering down entire memory banks.
I don't think that per-bank CMA regions destroy memory power management any more than the global CMA pool does. Please note that contiguous buffers (or DMA buffers in general) are unmovable right now, so they don't fit well into memory power management anyway.
We also have platforms where a well-defined part of the memory can be powered off while other parts can't (or won't) be. We need a way to steer allocations towards the memory that won't be turned off, so that CMA allocations are not an obstacle to memory hot-remove.
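A rough sketch of such steering, assuming a placement hook along the lines of the dma_declare_contiguous(dev, size, base, limit) call in this series (the exact name and signature depend on the patch revision; STABLE_BANK_BASE, STABLE_BANK_LIMIT and example_codec_dev are board-specific placeholders):

static void __init example_board_reserve(void)
{
        /*
         * Place the codec's contiguous region inside the bank that is
         * never powered down, so its unmovable CMA buffers don't block
         * hot-remove (or power-down) of the other banks.
         */
        dma_declare_contiguous(&example_codec_dev, 32 * 1024 * 1024,
                               STABLE_BANK_BASE, STABLE_BANK_LIMIT);
}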
We already established that we have to know something about the banks, and your additional input makes it even clearer that we need to consider the bigger picture here: we need to describe parts of memory separately with regard to general performance, device-specific allocations and hotplug characteristics.
It still sounds to me as though this can be done using the NUMA properties that Linux already understands, by teaching more subsystems about them, but maybe the memory hotplug developers have already come up with another scheme. The way that memory hotplug and CMA choose their memory regions certainly needs to take both into account. As far as I can see, there are both conflicting and synergistic effects when you combine the two.
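To make that concrete: if each bank (or each performance class of memory) were exposed as its own node, the existing node-aware allocators would already give the "fast memory or an error" behaviour asked for above. A rough sketch, with FAST_NID as a made-up node number:

#include <linux/gfp.h>

#define FAST_NID        1       /* placeholder: node id backing the fast bank */

static struct page *alloc_fast_pages(unsigned int order)
{
        /*
         * __GFP_THISNODE restricts the allocation to the requested node,
         * so we get NULL instead of a silent fallback to slower memory
         * on another node.
         */
        return alloc_pages_node(FAST_NID, GFP_KERNEL | __GFP_THISNODE, order);
}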
Arnd