New subject: Problems with kernel support for hardware watchpoints

14 Feb 2011

      "Will Deacon" will.deacon@arm.com wrote on 02/14/2011 11:30:45 AM:
...
...

In testing on Versatile Express, I noticed what appear to be SMP-related
bugs in handling regular software breakpoints: occasionally,
software breakpoints simply are not hit and execution continues as if
the underlying code had not been changed at all.  This symptom
completely goes away if GDB and the debugged process are forced to
the same CPU using the affinity feature (e.g. with schedtool).

I've seen this issue in the past but I thought I'd fixed it. What kernel
are you using and do you have CONFIG_ARM_ERRATA_720789 enabled?

I'm using the 2.6.37-1002-linaro-vexpress kernel from the Linaro package
of the same name.  This does *not* have CONFIG_ARM_ERRATA_720789 enabled
(presumably because the mach-vexpress/Kconfig file does not add it?) ...
...
...
My guess, just from seeing those symptoms, would be that when inserting
a software breakpoint via ptrace, not all i-caches on all CPUs are
reliably flushed ...   Any thoughts on this?

There was an I-cache aliasing problem in the kernel coupled with a TLB
invalidation hardware bug on the Versatile Express. I fixed these though
and haven't seen any problems since.

Hmm, a TLB flush problem could also explain the symptom (because the write
of the breakpoint to the text section causes a copy-on-write operation
which installs a new page ...)

I'll try rebuilding the kernel with the above config option enabled.
...
Hmmm, I'll need to have a think about this. What does GDB do if it
receives a SIGTRAP with si_addr set to (potentially) complete nonsense?
As an aside, Cortex-A15 reports the faulting address for a watchpoint
correctly, so we will be able to use multiple watchpoints there.

The GDB common core can handle either of the following two indications:
A) The (read/write/access) watchpoint at address XXX triggered.
B) A write watchpoint may have triggered at some address.
In the case of B, GDB will scan all the write watchpoints it is currently
tracking and compare the current value at each address with the last value
it remembers being present there.  Any changes GDB sees will cause it to
report the corresponding watchpoint as triggered.
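
To illustrate case B, here is a minimal sketch of the "compare against the
remembered value" scan.  It is not GDB's actual code: the struct and
function names are made up for this example, and it reads the process's own
memory, whereas a real debugger would read the inferior's memory (e.g. via
ptrace).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one write watchpoint (not GDB's data structure). */
struct write_watch {
    const void *addr;   /* watched location */
    size_t len;         /* watched length in bytes, at most sizeof last */
    uint8_t last[8];    /* value remembered from the previous check */
    int triggered;      /* set by the scan if the value changed */
};

/* Start tracking a location and remember its current value. */
static void watch_init(struct write_watch *w, const void *addr, size_t len)
{
    assert(len <= sizeof w->last);
    w->addr = addr;
    w->len = len;
    memcpy(w->last, addr, len);
    w->triggered = 0;
}

/* Case B: the address that triggered is unknown, so check every tracked
 * write watchpoint and flag those whose value differs from the remembered
 * one.  Returns the number of watchpoints flagged. */
static size_t scan_write_watchpoints(struct write_watch *w, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        w[i].triggered = memcmp(w[i].addr, w[i].last, w[i].len) != 0;
        if (w[i].triggered) {
            memcpy(w[i].last, w[i].addr, w[i].len);  /* remember new value */
            hits++;
        }
    }
    return hits;
}
```

Note that a scan of this kind can only detect writes that change the
value, and cannot say anything about read or access watchpoints, which is
why case B is limited to write watchpoints.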
As far as the kernel interface is concerned, the important issue is that
the ARM native target in GDB is able to understand what the kernel reports,
so it can in turn report either case A or B to the common core.
This means as long as there is some way for GDB to understand the kernel
is reporting a write watchpoint hit at an unknown address, everything is
fine.  This could be done e.g. by reporting a "slot" zero in si_errno to
indicate the slot (and then also the address) triggering the watchpoint
is unknown ...
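
As a rough sketch of how a native target might map such a report onto
cases A and B: the function below assumes the convention suggested above
(slot zero means "unknown slot and address"), which is only a proposal in
this mail and not a statement about what the kernel does.  The type and
function names are hypothetical, and the slot and address are passed in as
plain parameters rather than read from a real siginfo.

```c
#include <stddef.h>

/* The two indications the GDB common core can handle. */
enum wp_report {
    WP_KNOWN_ADDRESS,   /* case A: watchpoint at a known address triggered */
    WP_UNKNOWN_WRITE    /* case B: some write watchpoint may have triggered */
};

struct wp_event {
    enum wp_report kind;
    const void *addr;   /* only meaningful for WP_KNOWN_ADDRESS */
};

/* Assumed convention: slot == 0 means the kernel could not identify the
 * slot, so the reported address must not be trusted. */
static struct wp_event classify_watch_trap(int slot, const void *reported_addr)
{
    struct wp_event ev;
    if (slot == 0) {
        ev.kind = WP_UNKNOWN_WRITE;
        ev.addr = NULL;             /* ignore the possibly bogus address */
    } else {
        ev.kind = WP_KNOWN_ADDRESS;
        ev.addr = reported_addr;
    }
    return ev;
}
```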
...
...

Finally, I noticed when reading kernel code that under some
circumstances, the kernel will automatically do a single step to
get off a watchpoint that was just hit.  However, this does not
happen for user-space watchpoints installed via ptrace, right?
(Just wanting to confirm; since GDB currently does that single
step itself -- we don't want *both* kernel and GDB to issue a
single step each ...)

If the {break,watch}point has been inserted via ptrace, the kernel will
send a SIGTRAP instead of stepping the instruction.

OK, thanks for the confirmation!
...
...
I haven't gotten to looking further into other hardware (IGEP,
Panda) -- that's next on the list.

Good stuff, keep me posted if you see any further problems!

Sure, will do!

Best Regards
Ulrich Weigand
--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
  IBM Deutschland Research & Development GmbH
  Chairman of the Supervisory Board: Martin Jetter | Management: Dirk Wittkopp
  Registered office: Böblingen | Registration court: Amtsgericht Stuttgart, HRB 243294
