"Will Deacon" will.deacon@arm.com wrote on 02/14/2011 11:30:45 AM:
- In testing on Versatile Express, I noticed what appear to be SMP-related bugs in handling regular software breakpoints: occasionally, software breakpoints simply are not hit and execution continues as if the underlying code had not been changed at all. This symptom goes away completely if GDB and the debugged process are forced onto the same CPU using the affinity feature (e.g. with schedtool).
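For anyone wanting to reproduce the affinity workaround without schedtool, here is a minimal sketch using Python's `os.sched_setaffinity` (Linux-only). The helper name `pin_to_one_cpu` is made up for illustration, and this is not a description of what schedtool does internally.

```python
import os


def pin_to_one_cpu(pid=0, cpu=None):
    """Restrict `pid` (0 = the calling process) to a single CPU and return it.

    Hypothetical helper for illustration only; it relies on the Linux-specific
    os.sched_getaffinity / os.sched_setaffinity calls.
    """
    allowed = os.sched_getaffinity(pid)
    if cpu is None:
        cpu = min(allowed)  # pick any CPU the process is already allowed to use
    elif cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, {cpu})
    return cpu
```

Because the affinity mask is inherited across fork and preserved across exec, calling `pin_to_one_cpu()` in a small launcher before starting GDB should keep both GDB and the process it debugs on the same core.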
I've seen this issue in the past but I thought I'd fixed it. What kernel are you using and do you have CONFIG_ARM_ERRATA_720789 enabled?
I'm using the 2.6.37-1002-linaro-vexpress kernel from the Linaro package of the same name. This does *not* have CONFIG_ARM_ERRATA_720789 enabled (presumably because the mach-vexpress/Kconfig file does not add it?) ...
My guess, just from seeing those symptoms, would be that when inserting a software breakpoint via ptrace, not all i-caches on all CPUs are reliably flushed ... Any thoughts on this?
There was an I-cache aliasing problem in the kernel coupled with a TLB invalidation hardware bug on the Versatile Express. I fixed these, though, and haven't seen any problems since.
Hmm, a TLB flush problem could also explain the symptom (because the write of the breakpoint to the text section causes a copy-on-write operation which installs a new page ...)
I'll try rebuilding the kernel with the above config option enabled.
Hmmm, I'll need to have a think about this. What does GDB do if it receives a SIGTRAP with si_addr set to (potentially) complete nonsense? As an aside, Cortex-A15 reports the faulting address for a watchpoint correctly, so we will be able to use multiple watchpoints there.
The GDB common core can handle either of the following two indications:
A) The (read/write/access) watchpoint at address XXX triggered.
B) A write watchpoint may have triggered at some address.
In the case of B, GDB will scan all the write watchpoints it is currently tracking and compare the current value at each watched address with the last value it remembers being present there. Any change GDB sees will cause it to report the corresponding watchpoint as triggered.
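To make the case B logic concrete, here is a small Python model of the value comparison. It is only a sketch of the idea described above, not GDB's actual code; `WriteWatchpoint`, `scan_write_watchpoints` and the `read_memory` callback are names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WriteWatchpoint:
    address: int
    length: int
    last_value: bytes  # value remembered from the previous stop


def scan_write_watchpoints(
    watchpoints: List[WriteWatchpoint],
    read_memory: Callable[[int, int], bytes],
) -> List[WriteWatchpoint]:
    """Return the watchpoints whose watched memory changed since the last scan.

    The remembered value of each reported watchpoint is refreshed, so the
    same change is not reported twice.
    """
    triggered = []
    for wp in watchpoints:
        current = read_memory(wp.address, wp.length)
        if current != wp.last_value:
            triggered.append(wp)
            wp.last_value = current
    return triggered
```

One consequence of comparing values this way is that a write storing the value already present goes unnoticed, which is why case A is preferable whenever the kernel can report the address.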
As far as the kernel interface is concerned, the important issue is that the ARM native target in GDB is able to understand what the kernel reports, so it can in turn report either case A or B to the common core.
This means that as long as there is some way for GDB to understand that the kernel is reporting a write watchpoint hit at an unknown address, everything is fine. This could be done e.g. by reporting a "slot" zero in si_errno to indicate that the slot (and therefore also the address) triggering the watchpoint is unknown ...
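As a sketch of how a native target could map such a report onto case A or B, assuming the slot-zero convention suggested here were adopted (it is only a proposal in this thread, not an existing kernel interface), with `classify_watch_report` and `WatchReport` as made-up names:

```python
from typing import NamedTuple, Optional

UNKNOWN_SLOT = 0  # assumption: the "slot zero" convention proposed in this thread


class WatchReport(NamedTuple):
    case: str                # "A" = known watchpoint, "B" = unknown address
    slot: Optional[int]
    address: Optional[int]


def classify_watch_report(si_errno: int, si_addr: int) -> WatchReport:
    """Map a (hypothetical) kernel report onto case A or B for the common core."""
    if si_errno == UNKNOWN_SLOT:
        # Slot unknown, so si_addr cannot be trusted either.
        return WatchReport("B", None, None)
    return WatchReport("A", si_errno, si_addr)
```

With this mapping, a report carrying slot zero is passed to the common core as "some write watchpoint may have triggered", whatever si_addr happens to contain.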
- Finally, I noticed when reading kernel code that under some circumstances, the kernel will automatically do a single step to get off a watchpoint that was just hit. However, this does not happen for user-space watchpoints installed via ptrace, right? (Just wanting to confirm; since GDB currently does that single step itself -- we don't want *both* kernel and GDB to issue a single step each ...)
If the {break,watch}point has been inserted via ptrace, the kernel will send a SIGTRAP instead of stepping the instruction.
OK, thanks for the confirmation!
I haven't gotten to looking further into other hardware (IGEP, Panda) -- that's next on the list.
Good stuff, keep me posted if you see any further problems!
Sure, will do!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter | Management: Dirk Wittkopp
Registered office: Böblingen | Court of registration: Amtsgericht Stuttgart, HRB 243294
"Will Deacon" will.deacon@arm.com wrote on 02/14/2011 11:30:45 AM:
- In testing on Versatile Express, I noticed what appear to be SMP-related bugs in handling regular software breakpoints: occasionally, software breakpoints simply are not hit and execution continues as if the underlying code had not been changed at all. This symptom goes away completely if GDB and the debugged process are forced onto the same CPU using the affinity feature (e.g. with schedtool).
I've seen this issue in the past but I thought I'd fixed it. What kernel are you using and do you have CONFIG_ARM_ERRATA_720789 enabled?
I'm using the 2.6.37-1002-linaro-vexpress kernel from the Linaro package of the same name. This does *not* have CONFIG_ARM_ERRATA_720789 enabled (presumably because the mach-vexpress/Kconfig file does not add it?) ...
I've now built a kernel with CONFIG_ARM_ERRATA_720789 enabled, and the symptoms indeed seem to have disappeared completely ...
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
Hi Ulrich,
I've now built a kernel with CONFIG_ARM_ERRATA_720789 enabled, and the symptoms indeed seem to have disappeared completely ...
Yup - that's because without it, invalidating a TLB entry for a particular process isn't broadcast correctly, so you can end up using the old (pre-COW) mappings if you're running on a different core.
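The effect can be illustrated with a toy model in Python. This is purely a sketch of the reasoning: the `ToyCore` class, the page labels and the assumption that GDB writes the breakpoint from core 0 while the debugged process runs on core 1 are all made up for the example, and real TLBs (ASIDs, hardware page-table walks) are far more involved.

```python
class ToyCore:
    """One CPU core with a private TLB: virtual page -> physical page."""

    def __init__(self):
        self.tlb = {}

    def translate(self, page_table, vpage):
        # Use a cached translation if present, otherwise walk the page table.
        if vpage not in self.tlb:
            self.tlb[vpage] = page_table[vpage]
        return self.tlb[vpage]


def copy_on_write(page_table, cores, vpage, new_ppage, writer, broadcast):
    """Remap `vpage` to a fresh physical page and invalidate TLB entries.

    With broadcast=False only the writing core's TLB entry is dropped,
    which models the invalidation not reaching the other cores.
    """
    page_table[vpage] = new_ppage
    targets = cores if broadcast else [cores[writer]]
    for core in targets:
        core.tlb.pop(vpage, None)


def demo(broadcast):
    """Return what each core sees at the text page after the breakpoint write."""
    page_table = {0x10: "original text page"}
    cores = [ToyCore(), ToyCore()]
    for core in cores:
        core.translate(page_table, 0x10)  # both cores cache the old mapping
    # Assumed roles: core 0 performs the ptrace write, core 1 runs the inferior.
    copy_on_write(page_table, cores, 0x10, "page with breakpoint",
                  writer=0, broadcast=broadcast)
    return [core.translate(page_table, 0x10) for core in cores]
```

In this model `demo(False)` leaves core 1 still translating to the original text page, so the breakpoint is never seen there, while `demo(True)` gives both cores the new page. Pinning GDB and the inferior to one core sidesteps the problem because only a single TLB is involved.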
Will
linaro-toolchain@lists.linaro.org