On Wed, 3 Jul 2013 15:59:47 +0100 Mans Rullgard mans.rullgard@linaro.org wrote:
On 3 July 2013 14:13, Renato Golin renato.golin@linaro.org wrote:
Hi Folks,
I'm running two buildbots here at home and am getting consistent failures from the Pandas because of overheating. I've set up a monitor that tells me the current CPU temperature and the allowed maximum, and when the temperature passes 90% of that maximum, the bot shuts itself off.
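For reference, a minimal sketch of that kind of watchdog on Linux. This is illustrative, not the actual monitor described above; it assumes the standard sysfs thermal interface (temperatures in millidegrees Celsius), and the specific paths and poll interval are assumptions:

```python
#!/usr/bin/env python3
"""Illustrative CPU-temperature watchdog: shut down when the temperature
passes 90% of the allowed maximum. Assumes the standard Linux sysfs
thermal interface, which reports millidegrees Celsius."""
import subprocess
import time

# Typical sysfs locations (assumed; zone numbering varies per board).
TEMP_PATH = "/sys/class/thermal/thermal_zone0/temp"              # current temp
TRIP_PATH = "/sys/class/thermal/thermal_zone0/trip_point_0_temp" # allowed max


def read_millicelsius(path):
    with open(path) as f:
        return int(f.read().strip())


def over_threshold(temp_mc, max_mc, fraction=0.9):
    """True when the temperature reaches the given fraction of the maximum."""
    return temp_mc >= max_mc * fraction


def watchdog(poll_seconds=5):
    max_mc = read_millicelsius(TRIP_PATH)
    while True:
        if over_threshold(read_millicelsius(TEMP_PATH), max_mc):
            # Power off before the board cooks itself.
            subprocess.run(["shutdown", "-h", "now"])
            return
        time.sleep(poll_seconds)
```

The decision logic is kept in `over_threshold` so the hardware-independent part can be sanity-checked without a board attached.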
The problem is that I'm running with heat-sinks and the boards are on top of three fans, so there really isn't much more I can do to solve this problem.
I personally think this is a hardware problem, since everything is on the same die: CPU, GPU and RAM, and the physical dimensions of the chip are quite small. I remember when Intel chips started overheating (around the 486DX2-66) and the die was huge (more heat dissipation), plus RAM and GPU were separate, and it still needed a hefty heat-sink.
It's true that gates are far smaller today, but it's not true that a dual-core 1.3GHz CPU + GPU + RAM will produce less heat on a small die than a 66MHz CPU on a huge die, so why anyone thinks it's a good idea to release a 1+GHz chip without *any* form of heat dissipation is beyond my comprehension.
Modern silicon processes are much more power-efficient than those of the 90s. For example, an old ~500MHz Alpha machine I have readily consumes 90W even when idle. A quad-core Intel i7 typically has a TDP of 130W at full load. That's orders of magnitude more gates clocked at 6x the frequency and still using only marginally more power.
BTW, the RAM is a separate chip mounted on top of the SoC (package-on-package).
Manufacturers have only got away with it so far because people rarely use 100% of the CPU power for extended periods of time: ARM devices end up as set-top boxes, mobile phones and tablets. However, even those devices will heat up when playing two-hour films or games, and they do have some form of heat sink.
An OMAP4460 will run at 1.2GHz indefinitely without overheating in reasonable ambient temperature. The higher frequencies are only meant to be used in conjunction with (software) thermal management to throttle back if temperature rises.
If you don't have thermal management in the kernel you're running, you need to clamp the clock at a safe value.
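That clamp can be expressed through the cpufreq sysfs interface. A hedged sketch, assuming the usual layout (`scaling_available_frequencies` in kHz, `scaling_max_freq` writable as root); the 1.2 GHz safe value follows the OMAP4460 figure above, and the frequency list in the usage comment is illustrative:

```python
#!/usr/bin/env python3
"""Sketch: clamp the CPU clock to the highest advertised frequency that is
still thermally safe. Assumes the standard Linux cpufreq sysfs layout;
paths and the safe limit are illustrative, not board-verified."""

CPUFREQ = "/sys/devices/system/cpu/cpu0/cpufreq"
SAFE_KHZ = 1200000  # 1.2 GHz, per the OMAP4460 discussion above


def pick_safe_freq(available_khz, safe_khz):
    """Highest advertised frequency not exceeding the safe limit."""
    candidates = [f for f in available_khz if f <= safe_khz]
    if not candidates:
        raise ValueError("no advertised frequency at or below the safe limit")
    return max(candidates)


def clamp():
    with open(f"{CPUFREQ}/scaling_available_frequencies") as f:
        available = [int(tok) for tok in f.read().split()]
    # Writing scaling_max_freq caps all governors at the chosen value.
    with open(f"{CPUFREQ}/scaling_max_freq", "w") as f:
        f.write(str(pick_safe_freq(available, SAFE_KHZ)))
```

With a kernel that does have thermal management, this becomes unnecessary: the thermal framework lowers the cap dynamically instead of pinning it.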
By the way, power consumption is not constant and depends heavily on what the CPU is actually doing. 100% CPU load in one application does not mean it consumes the same amount of power as 100% CPU load in another. With some targeted "optimisations" it is possible to boost power consumption by roughly a factor of 1.5 compared to the heaviest workloads seen in real applications. I have a collection of ARM cpuburn programs, empirically tuned for different microarchitectures (which means they can quite possibly still be "improved"):
https://github.com/ssvb/cpuburn
It is possible that the Cortex-A15 would show a similar ~1.5x power consumption boost if somebody tuned cpuburn for it. But I'm a bit reluctant to dismantle my ARM Chromebook to hook up a multimeter (developer boards with no batteries and barrel power connectors are much easier to deal with).
Some time ago, I tossed my Cortex-A9 cpuburn to the ODROID-X people. And coincidentally they quickly got the thermal framework properly integrated into their kernels and also started to offer optional active coolers to their customers :-)
Now if you also consider that SoCs usually contain a lot more than just the CPU cores, the peak power consumption can be really high. Designing the cooling system to handle that peak is overkill: it would be expensive and/or bulky. But if you instead restrict the CPU clock frequency so that power consumption can never exceed a safe threshold, you end up clocking the CPU at a really low speed. In my opinion, the right solution for modern ARM SoCs is to always ensure proper throttling support (both in hardware and in software). ARM can even call it "turbo-boost", "turbo-core" or some other marketing buzzword ;-)