From: "David A. Long" <dave.long(a)linaro.org>
This patch series adds basic uprobes support to ARM. It is based on patches
developed earlier by Rabin Vincent. That approach of adding special cases into
the kprobes instruction parsing code was not well received. This approach
separates the ARM instruction parsing code in kprobes out into a separate set
of functions which can be used by both kprobes and uprobes. Both kprobes and
uprobes then provide their own semantic action tables to process the results of
the parsing.
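The split described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in these patches: the names (`Action`, `DECODE_TABLE`, `decode`, `KPROBES_ACTIONS`, `UPROBES_ACTIONS`) are hypothetical, the bit patterns are heavily simplified ARM encodings, and the handler strings are placeholders rather than real behaviour.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical instruction classes produced by the shared parser."""
    BRANCH = auto()
    LOAD_STORE = auto()
    DATA_PROCESSING = auto()


# Shared decode table: (mask, value, action). The first match wins.
# The patterns are simplified for illustration and do not cover the
# full ARM instruction set.
DECODE_TABLE = [
    (0x0E000000, 0x0A000000, Action.BRANCH),           # B / BL
    (0x0C000000, 0x04000000, Action.LOAD_STORE),       # LDR / STR
    (0x0C000000, 0x00000000, Action.DATA_PROCESSING),  # ALU operations
]


def decode(insn, actions):
    """Classify insn with the shared table, then dispatch to the
    client-specific action table (kprobes or uprobes)."""
    for mask, value, action in DECODE_TABLE:
        if (insn & mask) == value:
            handler = actions.get(action)
            return handler(insn) if handler else "reject"
    return "reject"


# Each client supplies its own semantics for the same parse result.
KPROBES_ACTIONS = {
    Action.BRANCH: lambda insn: "kprobes: simulate branch",
    Action.LOAD_STORE: lambda insn: "kprobes: emulate load/store",
    Action.DATA_PROCESSING: lambda insn: "kprobes: emulate ALU op",
}

UPROBES_ACTIONS = {
    Action.BRANCH: lambda insn: "uprobes: simulate branch",
    Action.LOAD_STORE: lambda insn: "uprobes: execute load/store out of line",
    Action.DATA_PROCESSING: lambda insn: "uprobes: execute ALU op out of line",
}
```

The point of the design is that `decode()` and `DECODE_TABLE` exist once, while each client only maintains its own action table.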
One regression bug fix is still in progress on this, and some more definitions
may be moved from kprobes*.h files into more generic include files. However,
at this point feedback on the basic approach would be appreciated.
These patches are based on v3.10-rc7.
David A. Long (3):
  uprobes: move function declarations out of arch
  ARM: Separate kprobes instruction parsing into routines
  ARM: add uprobes support

Rabin Vincent (4):
  uprobes: allow ignoring of probe hits
  uprobes: allow arch access to xol slot
  uprobes: allow arch-specific initialization
  uprobes: add arch write opcode hook
arch/arm/Kconfig | 4 +
arch/arm/include/asm/kprobes.h | 17 +-
arch/arm/include/asm/probes.h | 23 ++
arch/arm/include/asm/ptrace.h | 6 +
arch/arm/include/asm/thread_info.h | 5 +-
arch/arm/include/asm/uprobes.h | 34 ++
arch/arm/kernel/Makefile | 3 +-
arch/arm/kernel/kprobes-arm.c | 495 ++++++-----------------------
arch/arm/kernel/kprobes-common.c | 260 +---------------
arch/arm/kernel/kprobes-thumb.c | 31 +-
arch/arm/kernel/kprobes.c | 7 +-
arch/arm/kernel/kprobes.h | 51 +--
arch/arm/kernel/probes-arm.h | 66 ++++
arch/arm/kernel/probes.c | 624 +++++++++++++++++++++++++++++++++++++
arch/arm/kernel/signal.c | 4 +
arch/arm/kernel/uprobes-arm.c | 220 +++++++++++++
arch/arm/kernel/uprobes.c | 203 ++++++++++++
arch/arm/kernel/uprobes.h | 25 ++
arch/powerpc/include/asm/uprobes.h | 1 -
arch/x86/include/asm/uprobes.h | 7 -
include/linux/uprobes.h | 17 +
kernel/events/uprobes.c | 54 +++-
22 files changed, 1426 insertions(+), 731 deletions(-)
create mode 100644 arch/arm/include/asm/probes.h
create mode 100644 arch/arm/include/asm/uprobes.h
create mode 100644 arch/arm/kernel/probes-arm.h
create mode 100644 arch/arm/kernel/probes.c
create mode 100644 arch/arm/kernel/uprobes-arm.c
create mode 100644 arch/arm/kernel/uprobes.c
create mode 100644 arch/arm/kernel/uprobes.h
--
1.8.1.2
More than 256 entries in the ACPI MADT have been supported since the ACPI 3.0
specification, so the outdated description of MADT entries should be removed.
Signed-off-by: Hanjun Guo <hanjun.guo(a)linaro.org>
---
Documentation/cpu-hotplug.txt | 3 ---
1 file changed, 3 deletions(-)
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 9f40135..2e36e40 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -370,9 +370,6 @@ A: There is no clear spec defined way from ACPI that can give us that
CPUs in MADT as hotpluggable CPUS. In the case there are no disabled CPUS
we assume 1/2 the number of CPUs currently present can be hotplugged.
- Caveat: Today's ACPI MADT can only provide 256 entries since the apicid field
- in MADT is only 8 bits.
-
User Space Notification
Hotplug support for devices is common in Linux today. Its being used today to
--
1.7.9.5
This is a first draft of a process for getting things integrated into
the Linaro Stable Kernel (LSK) releases. Any feedback is appreciated;
hopefully this is all fairly boring and uncontroversial, so there
shouldn't be too many surprises.
The first thing to say here is that all LSK releases will be based on
the latest generic Linux stable kernel, so the best way to get a change
into a Linaro release is to get it into the generic Linux stable kernel
using the standard processes. This will maximise the number of people
who can use the change and is generally good practice.
New features
------------
Features for the stable kernel are agreed by the TSC. Once a feature
has been agreed by the TSC there should be an owner assigned to deliver
a feature branch into the stable kernel and work with the stable kernel
team to resolve any integration issues at least up until the feature has
been included in a release. This will be done per kernel version.
These feature branches should be based on the relevant upstream kernel
as far as possible (any dependencies on other branches should be
discussed with the stable kernel team). Some information about where
the code came from should be included along with the code, in order of
preference:
1. Commit IDs from the standard kernel in the changelogs of the
individual patches.
2. A description of how the equivalent change was made upstream or
why it isn't required in LSK (e.g., explaining that this is taken
care of by features not present in the stable kernel).
3. References to where out of tree development is happening
including contact information for followup.
The code should be sent as a pull request or patches, with review by the
stable team following normal kernel process and focusing on backporting
and integration issues. Relevant testing infrastructure should also be
available in LAVA for QA and the review will also include ensuring that
the testsuite passes with the changes integrated.
Once the code has been accepted it will be stored as a branch in the LSK
tree and the submission branch can be deleted.
Updating code in the LSK
------------------------
The LSK can be updated either by replacing an existing topic branch or
by submitting incremental patches. Replacement would be most useful in
cases where a feature has not yet been merged into the standard kernel
and is still being redeveloped there; otherwise, incremental updates
are preferred. The process for submitting changes is the same as for
new features with the exception that incremental updates should be based
on the topic branch in the LSK git rather than the standard kernel.
Hi,
A number of patch sets related to power-efficient scheduling have been
posted over the last couple of months. Most of them do not have much
data to back them up, so I decided to do some testing.
All but one of the patch sets that I have tested attempt to pack tasks
on as few cpus as possible to allow the remaining cpus to enter deeper
sleep states - a strategy that should make sense on most platforms that
support per-cpu power gating and on multi-socket machines.
Kernel: 3.9
Patch sets:
rlb-v4: sched: use runnable load based balance (Alex Shi)
<https://lkml.org/lkml/2013/4/27/13>
pas-v7: sched: power aware scheduling (Alex Shi)
<https://lkml.org/lkml/2013/4/3/732>
pst-v3: sched: packing small tasks (Vincent Guittot)
<https://lkml.org/lkml/2013/3/22/183>
pst-v4: sched: packing small tasks (Vincent Guittot)
<https://lkml.org/lkml/2013/4/25/396>
Configuration:
pas-v7: Set to "powersaving" mode.
pst-v4: Set to "Full" packing mode.
Platform:
ARM TC2 (test-chip), 2xCortex-A15 + 3xCortex-A7. Cortex-A15s disabled.
Measurement technique:
Time spent non-idle (not in idle state) for each cpu based on cpuidle
ftrace events. TC2 does not have per-core power-gating, so packing
inside the A7 cluster does not lead to any significant power savings.
Note that any product grade hardware (TC2 is a test-chip) will very
likely have per-core power-gating, so in those cases packing will have
an appreciable effect on power savings.
Measuring non-idle time rather than power should give a clearer idea
of the effect of the patch sets, given that the idle back-end is
highly implementation-specific.
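A minimal sketch of how non-idle time per cpu could be derived from idle enter/exit events is shown here. It is not the script used for these measurements. It assumes the trace has already been parsed into hypothetical (timestamp, cpu, state) tuples sorted by time, that the exit marker is the value 4294967295, and that every cpu is non-idle at the start of the window.

```python
# Assumed marker for "cpu leaves idle" in the parsed events.
EXIT_STATE = 4294967295


def non_idle_percent(events, start, end, num_cpus):
    """Return {cpu: non-idle percentage} over the window [start, end].

    events: iterable of (timestamp_s, cpu, state) sorted by timestamp.
    Assumes each cpu is non-idle at `start` (a simplification).
    """
    idle_since = {}                                   # cpu -> idle entry time
    idle_total = {cpu: 0.0 for cpu in range(num_cpus)}

    for ts, cpu, state in events:
        if state != EXIT_STATE:
            # Entering an idle state; keep the earliest entry time.
            idle_since.setdefault(cpu, ts)
        elif cpu in idle_since:
            # Leaving idle: account for the completed idle period.
            idle_total[cpu] += ts - idle_since.pop(cpu)

    # Close any idle period still open at the end of the window.
    for cpu, since in idle_since.items():
        idle_total[cpu] += end - since

    window = end - start
    return {cpu: 100.0 * (window - idle_total[cpu]) / window
            for cpu in idle_total}
```

For example, a cpu that is idle for 2 s of a 4 s window would be reported as 50% non-idle, and a cpu with no idle events at all as 100% non-idle.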
Benchmarks:
audio playback (Android): 30s mp3 file playback on Android.
bbench+audio (Android): Web page rendering while doing mp3 playback.
andebench_native (Android): Android benchmark running in native mode.
cyclictest: Short periodic tasks.
Results:
Two runs for each patch set.
audio playback (Android) SMP
non-idle %    cpu 0    cpu 1    cpu 2
3.9_1         11.96     2.86     2.48
3.9_2         12.64     2.81     1.88
rlb-v4_1      12.61     2.44     1.90
rlb-v4_2      12.45     2.44     1.90
pas-v7_1      16.17     0.03     0.24
pas-v7_2      16.08     0.28     0.07
pst-v3_1      15.18     2.76     1.70
pst-v3_2      15.13     0.80     0.38
pst-v4_1      16.14     0.05     0.00
pst-v4_2      16.34     0.06     0.00
bbench+audio (Android) SMP
non-idle %    cpu 0    cpu 1    cpu 2   render time
3.9_1         25.00    20.73    21.22           812
3.9_2         24.29    19.78    22.34           795
rlb-v4_1      23.84    19.36    22.74           782
rlb-v4_2      24.07    19.36    22.74           797
pas-v7_1      28.29    17.86    16.01           869
pas-v7_2      28.62    18.54    15.05           908
pst-v3_1      29.14    20.59    21.72           830
pst-v3_2      27.69    18.81    20.06           830
pst-v4_1      42.20    13.63     2.29           880
pst-v4_2      41.56    14.40     2.17           935
andebench_native (8 threads) (Android) SMP
non-idle %    cpu 0    cpu 1    cpu 2   Score
3.9_1         99.22    98.88    99.61    4139
3.9_2         99.56    99.31    99.46    4148
rlb-v4_1      99.49    99.61    99.53    4153
rlb-v4_2      99.56    99.61    99.53    4149
pas-v7_1      99.53    99.59    99.29    4149
pas-v7_2      99.42    99.63    99.48    4150
pst-v3_1      97.89    99.33    99.42    4097
pst-v3_2      99.16    99.62    99.42    4097
pst-v4_1      99.34    99.01    99.59    4146
pst-v4_2      99.49    99.52    99.20    4146
cyclictest SMP
non-idle %    cpu 0    cpu 1    cpu 2
3.9_1          9.13     8.88     8.41
3.9_2         10.27     8.02     6.30
rlb-v4_1       8.88     8.09     8.11
rlb-v4_2       8.49     8.09     8.11
pas-v7_1      10.20     0.02    11.50
pas-v7_2       7.86    14.31     0.02
pst-v3_1      20.44     8.68     7.97
pst-v3_2      20.41     0.78     1.00
pst-v4_1      21.32     0.21     0.05
pst-v4_2      21.56     0.21     0.04
Overall, pas-v7 seems to do a fairly good job at packing. The idle time
distribution seems to be somewhere between pst-v3 and the more
aggressive pst-v4 for all the benchmarks. pst-v4 manages to keep two
cpus nearly idle (<0.25% non-idle) for both cyclictest and audio, which
is better than both pst-v3 and pas-v7. pas-v7 fails to pack cyclictest.
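As a rough cross-check of the packing comparison, the cyclictest figures above can be reduced to a count of nearly idle cpus per run. The sketch below is illustrative only: the 0.25% threshold is taken from the text, while the helper name `nearly_idle_cpus` and the data layout are assumptions.

```python
# cyclictest non-idle percentages (cpu 0, cpu 1, cpu 2), copied from the
# results table above.
CYCLICTEST = {
    "3.9_1":    (9.13, 8.88, 8.41),
    "3.9_2":    (10.27, 8.02, 6.30),
    "rlb-v4_1": (8.88, 8.09, 8.11),
    "rlb-v4_2": (8.49, 8.09, 8.11),
    "pas-v7_1": (10.20, 0.02, 11.50),
    "pas-v7_2": (7.86, 14.31, 0.02),
    "pst-v3_1": (20.44, 8.68, 7.97),
    "pst-v3_2": (20.41, 0.78, 1.00),
    "pst-v4_1": (21.32, 0.21, 0.05),
    "pst-v4_2": (21.56, 0.21, 0.04),
}


def nearly_idle_cpus(non_idle, threshold=0.25):
    """Count cpus whose non-idle percentage is below the threshold."""
    return sum(1 for value in non_idle if value < threshold)


for run, values in CYCLICTEST.items():
    print(f"{run:10s} nearly idle cpus: {nearly_idle_cpus(values)}")
```

With this threshold only the pst-v4 runs leave two cpus nearly idle, while the pas-v7 runs leave one.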
Packing does come at a cost, which can be seen for bbench+audio, where
pst-v3 and rlb-v4 get better render times than pas-v7 and pst-v4, which
do more aggressive packing. rlb-v4 does not pack; it is only included
for reference.
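To put a number on that cost, the bbench+audio render times from the table above can be averaged over the two runs and compared with the unpatched 3.9 kernel. This is a small sketch, not part of the original measurements; the helper names are hypothetical and two runs per patch set is too few to say anything about variance.

```python
# bbench+audio render times for the two runs of each kernel, copied from
# the results table above.
RENDER_TIMES = {
    "3.9":    (812, 795),
    "rlb-v4": (782, 797),
    "pas-v7": (869, 908),
    "pst-v3": (830, 830),
    "pst-v4": (880, 935),
}


def mean(values):
    return sum(values) / len(values)


def relative_to_baseline(times, baseline="3.9"):
    """Percentage change of the mean render time versus the baseline."""
    base = mean(times[baseline])
    return {name: 100.0 * (mean(runs) - base) / base
            for name, runs in times.items()}


for name, change in relative_to_baseline(RENDER_TIMES).items():
    print(f"{name:8s} {change:+.1f}%")
```

On these figures the mean render time rises by roughly 3% for pst-v3, 11% for pas-v7 and 13% for pst-v4, and drops by about 2% for rlb-v4.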
From a packing perspective pst-v4 seems to do the best job for the
workloads that I have tested on ARM TC2. The less aggressive packing in
pst-v3 may be a better choice in terms of performance.
I'm well aware that these tests are heavily focused on mobile workloads.
I would therefore encourage people to share test results for their
workloads on their platforms to complete the picture. Comments are also
welcome.
Thanks,
Morten