I believe there are a few pandas in the LAVA lab that have USB keys
and are used for builds, to try to overcome some reliability
problems.

Is running at 100% of the thermal limit really an issue? Isn't the
point of a limit that it already has some safety margin built in? I
don't know offhand whether the OMAP 4 SoCs incorporate hardware
frequency limiting or whether it is done entirely in software; if the
latter, the kernel frequency governor should (at a guess) be throttling
back.

I did have a panda give up on me about a year ago. It wasn't being
worked hard, but it refused to get through a boot most of the time (it
did power on and get partway through booting). Those boards aren't
designed for high reliability, and it may be that you just need to get
a couple of replacements.