Re: [Linaro-validation] Overheating Pandas

3 Jul 2013


I believe that in the LAVA lab there are a few pandas with USB keys
that are used for builds to try and overcome some reliability
problems. Don't know if it was a temperature problem or something
else. With any luck someone who knows more about that issue can speak
up and share what they found. You could also try running "stress --cpu
4 --vm 2" and see if any errors show. I find that on my desktop
running 2x the number of CPU stress threads as I have CPUs is about
right to eat all available resources. That will just stress RAM and
CPU, not disk I/O, which should help narrow down the problem. Plenty of other
options (http://www.hecticgeek.com/2012/11/stress-test-your-ubuntu-computer-with-stre...)...
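As a rough sketch of the "2x the number of CPUs" rule of thumb, the stress command line could be built like this. The helper name build_stress_command and the --timeout value are my own additions for illustration, not something from the link above; on a dual-core Panda it yields the same "--cpu 4 --vm 2" as suggested.

```python
import os


def build_stress_command(cpu_count=None, vm_workers=2, timeout_s=600):
    """Build a `stress` argument list with 2x as many CPU workers as CPUs.

    build_stress_command is a hypothetical helper; the timeout is an
    assumption added so the run ends on its own.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    cpu_workers = 2 * cpu_count
    return [
        "stress",
        "--cpu", str(cpu_workers),
        "--vm", str(vm_workers),
        "--timeout", str(timeout_s),
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(" ".join(build_stress_command()))
```

Passing the resulting list to subprocess.run would start the load; printing it first seemed the safer default for a board that is already overheating.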
Is running at 100% of the thermal limit really an issue? Isn't the
point that it is the limit, which itself should have some safety built
in? I don't know off hand if the OMAP 4 SoCs incorporate hardware
frequency limiting or if it is entirely software, in which case the
kernel frequency governor should (at a guess) be throttling back.
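One way to check what the governor is doing is to read the standard Linux cpufreq files from sysfs while the board is under load. This is only a sketch: I am assuming the Panda kernel exposes the usual cpufreq directory for cpu0, and the helper names are made up for illustration.

```python
from pathlib import Path

# Assumed location of the generic Linux cpufreq interface for the first core.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")

FIELDS = ("scaling_governor", "scaling_cur_freq", "cpuinfo_max_freq")


def read_cpufreq(base=CPUFREQ_DIR):
    """Return the cpufreq fields as strings, or None where a file is missing."""
    result = {}
    for name in FIELDS:
        try:
            result[name] = (Path(base) / name).read_text().strip()
        except OSError:
            result[name] = None
    return result


def below_hardware_max(cur_khz, max_khz):
    """True if the current frequency is lower than the hardware maximum."""
    return int(cur_khz) < int(max_khz)


if __name__ == "__main__":
    info = read_cpufreq()
    print(info)
    if info["scaling_cur_freq"] and info["cpuinfo_max_freq"]:
        print("below max:", below_hardware_max(info["scaling_cur_freq"],
                                               info["cpuinfo_max_freq"]))
```

A current frequency below the maximum while the CPUs are fully loaded would hint that something is scaling the clock back; on an idle board it only means the governor is saving power.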
I did have a panda give up on me about a year ago. It wasn't being
worked hard, but did refuse to get through a boot most of the time (it
did power on and get part way through booting). Those boards aren't
designed for high reliability and it may be that you just need to get
a couple of replacements.
James
On 3 July 2013 14:13, Renato Golin renato.golin@linaro.org wrote:
...
Hi Folks,
I'm running two buildbots here at home and am getting consistent failures
from the Pandas because of overheating. I've set up a monitor that will tell
me the current CPU temperature and the allowed maximum, and when the bot
passes 90%, it shuts itself off.
The problem is that I'm running with heat-sinks and the boards are on top of
three fans, so there really isn't much more I can do to solve this problem.
I personally think this is a hardware problem, since everything is in the
same die, CPU, GPU and RAM, and the physical dimensions of the chip are
quite small. I remember when Intel started overheating (around 486DX66) and
the die was huge (more heat dissipation), plus RAM and GPU were separate,
and it still needed a hefty heat-sink.
It's true that gates are far smaller today, but it's not true that a dual
core 1.3GHz + GPU + RAM will produce less heat on a small die than a 66MHz
CPU on a huge die, so why anyone thinks it's a good idea to release a 1+GHz
chip without *any* form of heat dissipation is beyond my comprehension.
Manufacturers only got away with it, so far, because people rarely use 100%
of the CPU power for extended periods of time, because ARM devices end up as
set-top boxes, mobile phones and tablets. However, even those devices will
heat up when playing 2 h films or games, and they do have some form of heat
sink.
We, at the toolchain group, make things worse by using 100% CPU, 24 / 7,
something that Panda boards, or Arndales were not designed to do. However,
with ARM moving into the server space, their designs will have to be
re-thought, and what better place than Linaro for making sure we get it
right?
For the time being, I believe we *must* have air conditioning in the Lab all
the time, and we *must* have heat-sinks on every board, and we *must*
monitor the CPU temperature of the boards, at least until we're comfortable
that they're not failing all the time.
Can we make a temperature monitor (like the one attached) a default feature
on Linaro Ubuntu distributions? We could dump that info to the syslog/dmesg
whenever it crosses the (say) 75% threshold, and report more often when it
crosses the 95%, possibly dumping the process(es) that are consuming the most
CPU at the time, to enable post-mortem debugging.
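To make the proposal concrete, here is a minimal sketch of the threshold logic (this is not the attached monitor). The sysfs paths are assumptions based on the generic Linux thermal framework and may differ on Panda kernels; the function names and the use of Python's logging module are illustrative only.

```python
import logging
import subprocess
from pathlib import Path

# Assumed sysfs files (values in millidegrees Celsius); adjust per board.
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")
MAX_PATH = Path("/sys/class/thermal/thermal_zone0/trip_point_0_temp")


def read_millidegrees(path):
    """Read an integer temperature from a sysfs-style file."""
    return int(Path(path).read_text().strip())


def classify(temp, limit, warn=0.75, crit=0.95):
    """Map the temperature/limit ratio onto 'ok', 'warning' or 'critical'."""
    ratio = temp / limit
    if ratio >= crit:
        return "critical"
    if ratio >= warn:
        return "warning"
    return "ok"


def top_cpu_processes(count=3):
    """Return the `count` busiest processes as reported by procps `ps`."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm", "--sort=-pcpu"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[1:count + 1]  # skip the header line


def check_once(temp_path=TEMP_PATH, max_path=MAX_PATH):
    """Read both values once, log according to the level, and return it."""
    temp = read_millidegrees(temp_path)
    limit = read_millidegrees(max_path)
    level = classify(temp, limit)
    if level == "warning":
        logging.warning("CPU at %d of %d millidegrees", temp, limit)
    elif level == "critical":
        logging.critical("CPU at %d of %d millidegrees; busiest: %s",
                         temp, limit, top_cpu_processes())
    return level


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    check_once()
```

Calling check_once from a loop with a shorter sleep once the level reaches "critical" would give the "report more often" behaviour, and attaching logging.handlers.SysLogHandler would send the messages to syslog.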
cheers,
--renato
As a side note, the quad-A9 ODroid does ship with a massive heat-sink, which
also serves as a fancy case. Quite clever, really.

linaro-validation mailing list
linaro-validation@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-validation
-- 
James Tunnicliffe
