(bouncing to linaro-dev as it's generally interesting)
On Fri, Sep 16, 2011 at 8:17 AM, Ramana Radhakrishnan ramana.radhakrishnan@linaro.org wrote:
Hi,
I've been looking at some of the perf regressions we've been seeing lately, trying to understand what's going on in these cases. I can use perf record and perf report, along with other statistics, to work out why there is a perf regression between two binaries, but I wonder whether u-boot could be used to measure what's going on more accurately. I would like to read the values of the performance counters between two program points.
I am aware that there are patches floating around that let users set and reset the PMU counters by allowing user-level access to them from the kernel. While that may be useful to some, I'm not sure I want to take the chance of some other process getting scheduled in the middle of a measurement, even if parts of the kernel save and restore the per-process PMU counters across context switches. I'm looking for measurements that are as accurate as possible, and I wonder if u-boot is the best bet for this (in the absence of a dedicated hardware debug / trace unit), given that not all of us have one.
At a minimum, I believe this requires u-boot or some start-up code to:
- Turn on the i-cache and d-cache. (The current u-boot for panda that I get from the linaro-uboot git repo git://git.linaro.org/boot/u-boot-linaro-stable.git says "Warning Caches turned off" when starting up.) Googling around, I find a few patches from Aneesh at TI, posted in August, that turn on the d-cache. We should consider getting these in at some point.
- Looking in $(UBOOT_TOP)/examples/api I see that there are simple printf routines and simple stand-alone applications which could be used for this purpose. The one problem is that u-boot appears to require -ffixed-r8 for its own purposes, which *might* mean we need the flag as well if we were to use API calls into standard u-boot functions.
I wonder if R8 is used in the current ARM version? There's no reason we can't cherry-pick parts such as the serial I/O out into a library and make the app completely self-contained. Skip all of the initialisation stuff and assume the boot loader has done it for you.
- Turn on / off speculative prefetching. I believe the kernel does this already for a few boards, but could this be done in u-boot just before it launches a test application?
- Turn on the VFP and NEON units.
- Turn on unaligned access so that unaligned accesses are allowed in the test applications. GCC is moving towards generating unaligned accesses on versions of the architecture that support them; the upstream patches have now been approved.
- Memory map / linker scripts to make sure we are putting things in the right places (sigh, has to be per-board).
But everything goes in RAM so you have one generic linker script and a per board MEMORY definition. Similar to: http://bazaar.launchpad.net/~stm32f-dev/stm32f-dev/stm32f-startup/view/head:...
...but even lighter.
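To make the set-up steps in the list above concrete, here is a rough sketch of the control-register values involved, assuming an ARMv7-A core (the Cortex-A9 on panda) running in a privileged mode. The names bench_sctlr, bench_cpacr and bench_cpu_setup are made up for illustration and are not u-boot functions. The sketch leaves out cache invalidation before enabling, the MMU set-up that the d-cache needs in order to have any effect on data accesses, and the speculative prefetch controls, which live in the implementation-specific auxiliary control register.

```c
#include <stdint.h>

/* ARMv7-A SCTLR (CP15 c1, c0, 0) control bits. */
#define SCTLR_A (1u << 1)   /* alignment fault checking */
#define SCTLR_C (1u << 2)   /* data/unified cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* CPACR (CP15 c1, c0, 2): full access to cp10 and cp11 (VFP/NEON). */
#define CPACR_CP10_CP11_FULL (0xFu << 20)
/* FPEXC.EN: global enable for the VFP/NEON unit. */
#define FPEXC_EN (1u << 30)

/* Hypothetical helper: SCTLR value with caches and branch prediction on
 * and alignment checking off, so unaligned accesses do not fault. */
static uint32_t bench_sctlr(uint32_t sctlr)
{
    sctlr |= SCTLR_I | SCTLR_C | SCTLR_Z;
    sctlr &= ~SCTLR_A;
    return sctlr;
}

/* Hypothetical helper: CPACR value granting access to VFP/NEON. */
static uint32_t bench_cpacr(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}

#if defined(__arm__)
/* Assumes -march=armv7-a and a VFP-enabled -mfpu setting, a privileged
 * mode, and caches that have already been invalidated. */
void bench_cpu_setup(void)
{
    uint32_t v;

    /* Allow coprocessor access first, then enable the FP unit. */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 2" : "=r"(v));
    v = bench_cpacr(v);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 2" : : "r"(v) : "memory");
    __asm__ volatile("isb" : : : "memory");
    v = FPEXC_EN;
    __asm__ volatile("vmsr fpexc, %0" : : "r"(v));

    /* Caches, branch prediction, no alignment faults. */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    v = bench_sctlr(v);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(v) : "memory");
    __asm__ volatile("isb" : : : "memory");
}
#endif
```

Keeping the bit arithmetic in plain functions means the values can be checked on a host build, with only the mrc/mcr wrappers being target-specific.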
We then write a set of library functions that select the performance counters of interest to us and track them, resetting them to 0 and making sure they haven't overflowed.
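As a rough illustration of such library functions, the sketch below handles only the ARMv7 PMU cycle counter: it resets the counter to 0, and afterwards reads it back together with its overflow flag. The names pmu_start, pmu_stop, struct pmu_sample and sim_advance are invented here. The non-ARM branch is a host-side stand-in for the registers, included only so the logic can be exercised off-target; it is not a model of the real PMU.

```c
#include <stdint.h>

#define PMCR_E   (1u << 0)   /* PMCR: enable all counters */
#define PMCR_C   (1u << 2)   /* PMCR: reset cycle counter to 0 */
#define PMU_CCNT (1u << 31)  /* cycle counter bit in PMCNTENSET / PMOVSR */

#if defined(__arm__)
/* ARMv7 PMU registers, accessed through CP15 c9 in a privileged mode. */
static inline void pmu_write_pmcr(uint32_t v)
{ __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(v)); }
static inline void pmu_write_cntenset(uint32_t v)
{ __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(v)); }
static inline void pmu_write_ovsr(uint32_t v)
{ __asm__ volatile("mcr p15, 0, %0, c9, c12, 3" : : "r"(v)); }
static inline uint32_t pmu_read_ovsr(void)
{ uint32_t v; __asm__ volatile("mrc p15, 0, %0, c9, c12, 3" : "=r"(v)); return v; }
static inline uint32_t pmu_read_ccnt(void)
{ uint32_t v; __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v)); return v; }
#else
/* Host stand-in (an assumption for off-target testing only). */
static uint32_t sim_ccnt, sim_ovsr;
static inline void pmu_write_pmcr(uint32_t v) { if (v & PMCR_C) sim_ccnt = 0; }
static inline void pmu_write_cntenset(uint32_t v) { (void)v; }
static inline void pmu_write_ovsr(uint32_t v) { sim_ovsr &= ~v; } /* write 1 to clear */
static inline uint32_t pmu_read_ovsr(void) { return sim_ovsr; }
static inline uint32_t pmu_read_ccnt(void) { return sim_ccnt; }
/* Pretend `cycles` cycles elapsed; flag a wrap of the 32-bit counter. */
static void sim_advance(uint32_t cycles)
{
    uint32_t before = sim_ccnt;
    sim_ccnt += cycles;
    if (sim_ccnt < before)
        sim_ovsr |= PMU_CCNT;
}
#endif

struct pmu_sample {
    uint32_t cycles;     /* cycle count since pmu_start() */
    int      overflowed; /* non-zero if the counter wrapped */
};

/* Clear any stale overflow flag, enable the cycle counter, reset it to 0.
 * Writing only E and C also leaves the divide-by-64 bit clear. */
static void pmu_start(void)
{
    pmu_write_ovsr(PMU_CCNT);
    pmu_write_cntenset(PMU_CCNT);
    pmu_write_pmcr(PMCR_E | PMCR_C);
}

/* Read the cycle counter and report whether it overflowed. */
static struct pmu_sample pmu_stop(void)
{
    struct pmu_sample s;
    s.cycles = pmu_read_ccnt();
    s.overflowed = (pmu_read_ovsr() & PMU_CCNT) != 0;
    return s;
}
```

A test application would call pmu_start() before the code under test and pmu_stop() after it, discarding any sample whose overflowed flag is set. Event counters could be handled along the same lines through the event select registers, which this sketch does not cover.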
Has anyone else in the group played with u-boot before, or have any thoughts in this direction? I am not suggesting that we do this work right now, but it seems like an interesting idea of where we could get to with this.
My worry is that we miss turning on a feature and get results that aren't representative. That should be easy enough to check by baselining against a Linux-hosted run.
We can use NFS or kermit to load the programs. u-boot has a network console which is nice when you don't have serial. This combined with an expect script (or LAVA? Paul?) should automate the whole process.
-- Michael