Hello Namhyung,
Sorry for the late reply to your question. I don't include the ELF headers
here, but the problem can be seen with a perf.data packet dump of a user
instruction trace capture. The problem is with the non-zero pgoff. The
arm-cs-trace-disasm.py script was never passed the pgoff information needed
to adjust the start/end disassembly range passed to objdump. This patch
distributes the fix between perf and the arm-cs-trace-disasm.py script.
Here's a brief excerpt from an e-mail I sent to James Clark describing
the patch before I submitted it.
Regards,
Steve C.
-----------------------------------------------------------------
.
.
This e-mail is to document what I know and don’t know about the problem.
The background is that Fedora 38 introduced non-zero text offsets into common
shared objects and executables (e.g. libc.so.6, etc.).
If you were to ‘perf report --dump’ the perf.data of a user mode
instruction trace on Fedora 37 and grep the PERF_RECORD_MMAP2 packets
you’ll notice all zero values (@ 0) for shared binary and executable
text offsets. Repeat the same for a user trace collected on Fedora 38/39,
and these text offsets show as non-zero.
Fedora 37:
4294967295 18446744073709551615 0x8ac8 [0x78]: PERF_RECORD_MMAP2
2389577/2389577: [0xffff85124000(0x42000) @ 0 103:04 805306555
1941233998]: r-xp /usr/lib/ld-linux-aarch64.so.1
4294967295 18446744073709551615 0x8b40 [0x60]: PERF_RECORD_MMAP2
2389577/2389577: [0xffff85161000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
4294967295 18446744073709551615 0x8ba0 [0x68]: PERF_RECORD_MMAP2
2389577/2389577: [0xaaaab09b0000(0x10000) @ 0 103:04 1140851919
994218038]: r-xp /usr/bin/dd
4294967295 18446744073709551615 0x8c08 [0x70]: PERF_RECORD_MMAP2
2389577/2389577: [0xffff84f70000(0x1a9000) @ 0 103:04 1677721733
3891973508]: r-xp /usr/lib64/libc.so.6
Fedora 39:
4294967295 18446744073709551615 0x8ac8 [0x78]: PERF_RECORD_MMAP2
18229/18229: [0xffffa5512000(0x1d000) @ 0x10000 103:04 161 4093340249]:
r-xp /usr/lib/ld-linux-aarch64.so.1
4294967295 18446744073709551615 0x8b40 [0x60]: PERF_RECORD_MMAP2
18229/18229: [0xffffa554f000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
4294967295 18446744073709551615 0x8ba0 [0x68]: PERF_RECORD_MMAP2
18229/18229: [0xaaaade7e0000(0xb000) @ 0x10000 103:04 536876423
421616320]: r-xp /usr/bin/dd
4294967295 18446744073709551615 0x8c08 [0x70]: PERF_RECORD_MMAP2
18229/18229: [0xffffa5360000(0x11f000) @ 0x30000 103:04 536873199
2415801053]: r-xp /usr/lib64/libc.so.6
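For reference, the pgoff field can be pulled out of such a dump line mechanically. The sketch below is only an illustration: the regular expression and the parse_mmap2() helper are hypothetical names, not part of perf, and it assumes each record has first been joined back onto a single line.

```python
import re

# Hypothetical pattern for the bracketed part of a PERF_RECORD_MMAP2 dump line:
#   [<start>(<len>) @ <pgoff> <maj:min> <ino> <ino_generation>]: <prot> <path>
MMAP2_RE = re.compile(
    r"\[(?P<start>0x[0-9a-f]+)\((?P<len>0x[0-9a-f]+)\) @ (?P<pgoff>0x[0-9a-f]+|0) "
    r".*\]: (?P<prot>\S+) (?P<path>\S+)"
)

def parse_mmap2(line):
    """Return the mapping fields of one dump line, or None if it does not match."""
    m = MMAP2_RE.search(line)
    if m is None:
        return None
    return {
        "start": int(m["start"], 16),
        "len": int(m["len"], 16),
        "pgoff": int(m["pgoff"], 16),  # a bare "0" also parses as 0
        "prot": m["prot"],
        "path": m["path"],
    }

# Lines taken from the Fedora 39 dump above, joined onto one line each.
libc = parse_mmap2(
    "18229/18229: [0xffffa5360000(0x11f000) @ 0x30000 103:04 536873199 "
    "2415801053]: r-xp /usr/lib64/libc.so.6"
)
vdso = parse_mmap2(
    "18229/18229: [0xffffa554f000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]"
)

print(hex(libc["pgoff"]), libc["path"])
print(hex(vdso["pgoff"]), vdso["path"])
```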
The arm-cs-trace-disasm.py script never gets to see the text offset into
the dso binaries, so it has no opportunity to adjust the start/end address
range passed to objdump. This wasn’t a problem on Fedora 37 and earlier
since there’s no start/end adjustment for a zero text offset. On Fedora
38/39 distros, the instruction trace shows unconditional branch
instructions which do not branch to the target address, the clearest
indication of trouble.
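To make the adjustment concrete, here is a minimal sketch of the address arithmetic, not the script's actual code. The objdump_range() helper and the traced range are hypothetical, and it assumes that for the mapped text segment the file offset and the link-time address coincide, so adding the page offset is enough.

```python
def objdump_range(start_addr, stop_addr, map_start, map_pgoff=0):
    """Translate runtime addresses into a start/stop range for objdump.

    Assumption: subtracting the mapping start and adding the mapping's
    page offset yields the address objdump expects for this dso.
    """
    return (start_addr - map_start + map_pgoff,
            stop_addr - map_start + map_pgoff)

# libc.so.6 mapping from the Fedora 39 dump: 0xffffa5360000 @ 0x30000
LIBC_START = 0xffffa5360000
LIBC_PGOFF = 0x30000

# Hypothetical traced range, 0x1000..0x1040 bytes into the mapping.
trace_start = LIBC_START + 0x1000
trace_stop = LIBC_START + 0x1040

without_pgoff = objdump_range(trace_start, trace_stop, LIBC_START)
with_pgoff = objdump_range(trace_start, trace_stop, LIBC_START, LIBC_PGOFF)

print([hex(a) for a in without_pgoff])  # ['0x1000', '0x1040']
print([hex(a) for a in with_pgoff])     # ['0x31000', '0x31040']
```

With a zero pgoff, as on Fedora 37, both calls give the same range, which is why the missing adjustment went unnoticed there.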
-----------------------------------------------------------------
On 10/7/2024 10:51 PM, Namhyung Kim wrote:
> Hello,
>
> Sorry for the long delay. But can you please explain your problem in
> more detail? Showing ELF program (or section) header would be helpful
> as well.
>
> Is the problem when the text mapping has non-zero pgoff only? Is the
> kernel symbols working correctly? Or is it just the Python script
> broken?
>
> Thanks,
> Namhyung
>
> On Mon, Sep 09, 2024 at 03:30:02PM -0600, Steve Clevenger wrote:
>> Changes in V8:
>> - in arm-cs-trace-disasm.py, ensure map_pgoff is not converted to
>> string.
>> - Remove map_pgoff integer conversion in dso not found print
>> message.
>>
>> Changes in V7:
>> - In arm-cs-trace-disasm.py, fix print message core dump resulting
>> from mixed type arithmetic.
>> - Modify CS_ETM_TRACE_ON filter to filter zero start_addr. The
>> CS_ETM_TRACE_ON message is changed to print only in verbose mode.
>> - Removed verbose mode only notification for start_addr/stop_addr
>> outside of dso address range.
>>
>> Changes in V6:
>> - In arm-cs-trace-disasm.py, zero map_pgoff for kernel files. Add
>> map_pgoff to start/end address for dso not found message.
>> - Added "Reviewed-by" trailer for patches 1-3 previously reviewed
>> by Leo Yan in V4 and V5.
>>
>> Changes in V5:
>> - In symbol-elf.c, branch to exit_close label if open file.
>> - In trace-event-python.c, correct indentation. set_sym_in_dict
>> call parameter "map_pgoff" renamed as "addr_map_pgoff" to
>> match local naming.
>>
>> Changes in V4:
>> - In trace-event-python.c, fixed perf-tools-next merge problem.
>>
>> Changes in V3:
>> - Rebased to linux-perf-tools branch.
>> - Squash symbol-elf.c and symbol.h into same commit.
>> - In map.c, merge dso__is_pie() call into existing if statement.
>> - In arm-cs-trace-disasm.py, remove debug artifacts.
>>
>> Changes in V2:
>> - In dso__is_pie() (symbol-elf.c), decrease indentation, add null pointer
>> checks per Leo Yan review.
>> - Updated mailing list distribution.
>>
>> Steve Clevenger (4):
>> Add dso__is_pie call to identify ELF PIE
>> Force MAPPING_TYPE__IDENTITY for PIE
>> Add map pgoff to python dictionary based on MAPPING_TYPE
>> Adjust objdump start/end range per map pgoff parameter
>>
>> .../scripts/python/arm-cs-trace-disasm.py | 17 ++++--
>> tools/perf/util/map.c | 4 +-
>> .../scripting-engines/trace-event-python.c | 13 +++-
>> tools/perf/util/symbol-elf.c | 61 +++++++++++++++++++
>> tools/perf/util/symbol.h | 1 +
>> 5 files changed, 86 insertions(+), 10 deletions(-)
>>
>> --
>> 2.44.0
>>
On 08/10/2024 07:52, Leo Yan wrote:
> On 10/7/2024 9:05 PM, Leo Yan wrote:
>>
>> Hi Julien,
>>
>> On Wed, Sep 25, 2024 at 03:13:56PM +0200, Julien Meunier wrote:
>>> The previous implementation limited the tracing capabilities when perf
>>> was run in the init PID namespace, making it impossible to trace
>>> applications in non-init PID namespaces.
>>>
>>> This update improves the tracing process by verifying the event owner.
>>> This allows us to determine whether the user has the necessary
>>> permissions to trace the application.
>>
>> The original commit aab473867fed is not about constraining permissions. It is
>> about a PID namespace mismatch issue.
>>
>> E.g. Perf runs in a non-root namespace, thus it records process info in the
>> non-root PID namespace. On the other hand, Arm CoreSight traces the PID for
>> the root namespace; as a result, decoding becomes a mess.
>>
>> With this change, I am not convinced that Arm CoreSight can trace the PID for
>> a non-root PID namespace. It seems to me the issue in question still exists
>> - it might cause a PID mismatch between the hardware trace data and
>> Perf's process info.
>
> I thought again and found I was wrong with the above conclusion. This patch is a
> good fix for perf running in the root namespace to profile programs in a
> non-root namespace. Sorry for the noise.
>
> Maybe it would be good to improve the comments a bit to avoid confusion. See below.
>
> [...]
>
>>> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>> index bf01f01964cf..8365307b1aec 100644
>>> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
>>> @@ -695,7 +695,7 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
>>>
>>> /* Only trace contextID when runs in root PID namespace */
>
> We can claim the requirement for the *tool* running in the root PID namespace.
>
> /* Only trace contextID when the tool runs in root PID namespace */
Minor nit: I wouldn't call it the "tool". Let's keep it as "event owner".
/* Only trace contextID when the event owner is in root PID namespace */
Julien,
Please could you respin the patch with the comments addressed.
Kind regards
Suzuki
>
>
>>> if ((attr->config & BIT(ETM_OPT_CTXTID)) &&
>>> - task_is_in_init_pid_ns(current))
>>> + task_is_in_init_pid_ns(event->owner))
>>> /* bit[6], Context ID tracing bit */
>>> config->cfg |= TRCCONFIGR_CID;
>>>
>>> @@ -710,7 +710,7 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
>>> goto out;
>>> }
>>> /* Only trace virtual contextID when runs in root PID namespace */
>
> Ditto.
>
> /* Only trace virtual contextID when the tool runs in root PID namespace */
>
> With above change:
>
> Reviewed-by: Leo Yan <leo.yan(a)arm.com>
>
>>> - if (task_is_in_init_pid_ns(current))
>>> + if (task_is_in_init_pid_ns(event->owner))
>>> config->cfg |= TRCCONFIGR_VMID | TRCCONFIGR_VMIDOPT;
>>> }
>>>
>>> --
>>> 2.34.1
>>>
>>>
On 25/09/2024 14:13, Julien Meunier wrote:
> The previous implementation limited the tracing capabilities when perf
> was run in the init PID namespace, making it impossible to trace
> applications in non-init PID namespaces.
>
> This update improves the tracing process by verifying the event owner.
> This allows us to determine whether the user has the necessary
> permissions to trace the application.
>
> Cc: stable(a)vger.kernel.org
> Fixes: aab473867fed ("coresight: etm4x: Don't trace PID for non-root PID namespace")
> Signed-off-by: Julien Meunier <julien.meunier(a)nokia.com>
Thanks for the fix, I will queue this for v6.13
Suzuki
> ---
> drivers/hwtracing/coresight/coresight-etm4x-core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index bf01f01964cf..8365307b1aec 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -695,7 +695,7 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
>
> /* Only trace contextID when runs in root PID namespace */
> if ((attr->config & BIT(ETM_OPT_CTXTID)) &&
> - task_is_in_init_pid_ns(current))
> + task_is_in_init_pid_ns(event->owner))
> /* bit[6], Context ID tracing bit */
> config->cfg |= TRCCONFIGR_CID;
>
> @@ -710,7 +710,7 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
> goto out;
> }
> /* Only trace virtual contextID when runs in root PID namespace */
> - if (task_is_in_init_pid_ns(current))
> + if (task_is_in_init_pid_ns(event->owner))
> config->cfg |= TRCCONFIGR_VMID | TRCCONFIGR_VMIDOPT;
> }
>
On Wed, Sep 25, 2024 at 04:04:55PM -0700, Namhyung Kim wrote:
> On Wed, Sep 25, 2024 at 10:54:31AM +0100, James Clark wrote:
> >
> >
> > On 25/09/2024 12:39 am, Ilkka Koskinen wrote:
> > > If one builds perf with DEBUG=1, captures data on multiple CPUs and
> > > finally runs 'perf report -C <cpu>' for only one of the cpus, assert()
> > > aborts the program. This happens because there are empty queues with
> > > format set. This patch changes the condition to abort only if a queue
> > > is not empty and if the format is unset.
> > >
> > > $ make -C tools/perf DEBUG=1 CORESIGHT=1 CSLIBS=/usr/lib CSINCLUDES=/usr/include install
> > > $ perf record -o kcore --kcore -e cs_etm/timestamp/k -s -C 0-1 dd if=/dev/zero of=/dev/null bs=1M count=1
> > > $ perf report --input kcore/data --vmlinux=/home/ikoskine/projects/linux/vmlinux -C 1
> > > Aborted (core dumped)
> > >
> > > Fixes: 57880a7966be ("perf: cs-etm: Allocate queues for all CPUs")
> > > Signed-off-by: Ilkka Koskinen <ilkka(a)os.amperecomputing.com>
> > > ---
> > > tools/perf/util/cs-etm.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > > index 90f32f327b9b..40f047baef81 100644
> > > --- a/tools/perf/util/cs-etm.c
> > > +++ b/tools/perf/util/cs-etm.c
> > > @@ -3323,7 +3323,7 @@ static int cs_etm__create_decoders(struct cs_etm_auxtrace *etm)
> > > * Don't create decoders for empty queues, mainly because
> > > * etmq->format is unknown for empty queues.
> > > */
> > > - assert(empty == (etmq->format == UNSET));
> > > + assert(empty || etmq->format != UNSET);
> > > if (empty)
> > > continue;
> >
> > Oops I didn't realize you could filter on CPU in report mode. Thanks for the
> > fix. Adding a test to the end of test_arm_coresight.sh might be quite
> > useful. Either way:
> >
> > Reviewed-by: James Clark <james.clark(a)linaro.org>
>
> Thanks, it should go to the perf-tool. Arnaldo, please pick up.
Right, picking it now.
Thanks,
- Arnaldo
Changes since V4:
1. Use ^ete(-[0-9]+)?$ as the node name pattern -- comments from Krzysztof Kozlowski <krzk(a)kernel.org>
2. Update the commit message -- comments from Rob Herring <robh(a)kernel.org>
Changes since V3:
1. Use ^ete-[0-9]+$ as the node name pattern -- comments from Rob Herring
Changes since V2:
1. Change the name in the binding to 'ete'.
Changes since V1:
1. Remove the pattern match of the ETE node name.
2. Update the tmc-etr node name in the DT.
Mao Jinlong (2):
dt-bindings: arm: coresight: Update the pattern of ete node name
arm64: dts: qcom: sm8450: Add coresight nodes
.../arm/arm,embedded-trace-extension.yaml | 6 +-
arch/arm64/boot/dts/qcom/sm8450.dtsi | 726 ++++++++++++++++++
2 files changed, 729 insertions(+), 3 deletions(-)
--
2.46.0
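As a quick illustration of how the two node name patterns mentioned in the changelog differ, the sketch below runs both against a few hypothetical node names. It uses Python's re module as a stand-in; the binding schema itself is evaluated by the devicetree tooling, whose regex dialect is assumed to behave the same for these simple patterns.

```python
import re

# Patterns quoted in the changelog; the candidate names are made up.
V3_PATTERN = re.compile(r"^ete-[0-9]+$")
V5_PATTERN = re.compile(r"^ete(-[0-9]+)?$")

candidates = ["ete", "ete-0", "ete-12", "ete0", "ete-"]

# Map each candidate name to (matches V3 pattern, matches V5 pattern).
results = {
    name: (bool(V3_PATTERN.match(name)), bool(V5_PATTERN.match(name)))
    for name in candidates
}

for name, (v3_ok, v5_ok) in results.items():
    print(f"{name:8} v3={v3_ok} v5={v5_ok}")
```

The only candidate treated differently is the bare "ete" name, which the later pattern accepts because the numeric suffix is optional.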
On 25/09/2024 12:39 am, Ilkka Koskinen wrote:
> If one builds perf with DEBUG=1, captures data on multiple CPUs and
> finally runs 'perf report -C <cpu>' for only one of the cpus, assert()
> aborts the program. This happens because there are empty queues with
> format set. This patch changes the condition to abort only if a queue
> is not empty and if the format is unset.
>
> $ make -C tools/perf DEBUG=1 CORESIGHT=1 CSLIBS=/usr/lib CSINCLUDES=/usr/include install
> $ perf record -o kcore --kcore -e cs_etm/timestamp/k -s -C 0-1 dd if=/dev/zero of=/dev/null bs=1M count=1
> $ perf report --input kcore/data --vmlinux=/home/ikoskine/projects/linux/vmlinux -C 1
> Aborted (core dumped)
>
> Fixes: 57880a7966be ("perf: cs-etm: Allocate queues for all CPUs")
> Signed-off-by: Ilkka Koskinen <ilkka(a)os.amperecomputing.com>
> ---
> tools/perf/util/cs-etm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 90f32f327b9b..40f047baef81 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -3323,7 +3323,7 @@ static int cs_etm__create_decoders(struct cs_etm_auxtrace *etm)
> * Don't create decoders for empty queues, mainly because
> * etmq->format is unknown for empty queues.
> */
> - assert(empty == (etmq->format == UNSET));
> + assert(empty || etmq->format != UNSET);
> if (empty)
> continue;
>
Oops I didn't realize you could filter on CPU in report mode. Thanks for
the fix. Adding a test to the end of test_arm_coresight.sh might be
quite useful. Either way:
Reviewed-by: James Clark <james.clark(a)linaro.org>
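For readers comparing the two assert conditions in the patch, the following sketch enumerates all four combinations of "queue is empty" and "format is unset" and reports where the old and new conditions disagree. It is a Python model of the boolean logic only; the function names are hypothetical and nothing here touches the real cs-etm code.

```python
from itertools import product

def old_assert_holds(empty, format_unset):
    # Mirrors: assert(empty == (etmq->format == UNSET));
    return empty == format_unset

def new_assert_holds(empty, format_unset):
    # Mirrors: assert(empty || etmq->format != UNSET);
    return empty or not format_unset

# Collect the (empty, format_unset) cases where the two conditions disagree.
differences = [
    (empty, format_unset)
    for empty, format_unset in product([False, True], repeat=2)
    if old_assert_holds(empty, format_unset) != new_assert_holds(empty, format_unset)
]

print(differences)  # [(True, False)]
```

The single differing case is an empty queue whose format is already set, which is the situation the commit message describes for 'perf report -C <cpu>'. A non-empty queue with an unset format still fails under both conditions.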
On 18/09/2024 12:23 pm, Ganapatrao Kulkarni wrote:
>
> Hi James,
>
> On 16-09-2024 07:27 pm, James Clark wrote:
>> A set of changes that came out of the issues reported here [1].
>>
>> * First 2 patches fix a decode bug in Perf and add support for new
>> consistency checks in OpenCSD
>> * The remaining ones make the disassembly script easier to test
>> and use. This also involves adding a new Python binding to
>> Perf to get a config value (perf_config_get())
>>
>> [1]:
>> https://lore.kernel.org/linux-arm-kernel/20240719092619.274730-1-gankulkarn…
>>
>
> Tried this series with the commands below, and the issue reported in [1]
> is no longer seen.
>
> record:
> timeout 8s ./perf record -e cs_etm// -C 1 -o kcore --kcore dd
> if=/dev/zero of=/dev/null
>
> decode:
> ./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -- -d
> objdump -k kcore/kcore_dir/kcore
>
> ./perf script -i ./kcore -s scripts/python/arm-cs-trace-disasm.py -F
> cpu,event,ip,addr,sym -- -d objdump -k kcore/kcore_dir/kcore
>
> Feel free to add for 1/7 and 2/7.
> Tested-by: Ganapatrao Kulkarni <gankulkarni(a)os.amperecomputing.com>
>
Thanks for testing!
A set of changes that came out of the issues reported here [1].
* First 2 patches fix a decode bug in Perf and add support for new
consistency checks in OpenCSD
* The remaining ones make the disassembly script easier to test
and use. This also involves adding a new Python binding to
Perf to get a config value (perf_config_get())
[1]: https://lore.kernel.org/linux-arm-kernel/20240719092619.274730-1-gankulkarn…
Changes since V1:
* Keep the flush function for discontinuities
* Still remove the flush when the buffer fills, but now add
cs_etm__end_block() for the end trace. That way we won't drop
the last branch stack if the instruction sample period wasn't
hit at the very end.
James Clark (7):
perf cs-etm: Don't flush when packet_queue fills up
perf cs-etm: Use new OpenCSD consistency checks
perf scripting python: Add function to get a config value
perf scripts python cs-etm: Update to use argparse
perf scripts python cs-etm: Improve arguments
perf scripts python cs-etm: Add start and stop arguments
perf test: cs-etm: Test Coresight disassembly script
.../perf/Documentation/perf-script-python.txt | 2 +-
.../scripts/python/Perf-Trace-Util/Context.c | 11 ++
.../scripts/python/arm-cs-trace-disasm.py | 109 +++++++++++++-----
.../tests/shell/test_arm_coresight_disasm.sh | 63 ++++++++++
tools/perf/util/config.c | 22 ++++
tools/perf/util/config.h | 1 +
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 7 +-
tools/perf/util/cs-etm.c | 25 ++--
8 files changed, 205 insertions(+), 35 deletions(-)
create mode 100755 tools/perf/tests/shell/test_arm_coresight_disasm.sh
--
2.34.1
A set of changes that came out of the issues reported here [1].
* First 2 patches fix a decode bug in Perf and add support for new
consistency checks in OpenCSD
* The remaining ones make the disassembly script easier to test
and use. This also involves adding a new Python binding to
Perf to get a config value (perf_config_get())
[1]: https://lore.kernel.org/linux-arm-kernel/20240719092619.274730-1-gankulkarn…
Changes since V2:
* Check validity of start/stop arguments
* Make test work if Perf was installed
* Document that start and stop time are monotonic clock values
Changes since V1:
* Keep the flush function for discontinuities
* Still remove the flush when the buffer fills, but now add
cs_etm__end_block() for the end trace. That way we won't drop
the last branch stack if the instruction sample period wasn't
hit at the very end.
James Clark (7):
perf cs-etm: Don't flush when packet_queue fills up
perf cs-etm: Use new OpenCSD consistency checks
perf scripting python: Add function to get a config value
perf scripts python cs-etm: Update to use argparse
perf scripts python cs-etm: Improve arguments
perf scripts python cs-etm: Add start and stop arguments
perf test: cs-etm: Test Coresight disassembly script
.../perf/Documentation/perf-script-python.txt | 2 +-
.../scripts/python/Perf-Trace-Util/Context.c | 11 ++
.../scripts/python/arm-cs-trace-disasm.py | 127 ++++++++++++++----
.../tests/shell/test_arm_coresight_disasm.sh | 65 +++++++++
tools/perf/util/config.c | 22 +++
tools/perf/util/config.h | 1 +
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 7 +-
tools/perf/util/cs-etm.c | 25 +++-
8 files changed, 225 insertions(+), 35 deletions(-)
create mode 100755 tools/perf/tests/shell/test_arm_coresight_disasm.sh
--
2.34.1
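The kind of start/stop argument validation listed in the V2 changelog could look roughly like the sketch below. This is an assumption-laden illustration, not the script's code: the option names, the parse_args() and in_window() helpers, and the nanosecond unit are all hypothetical.

```python
import argparse

def parse_args(argv):
    """Parse a hypothetical start/stop time window and reject invalid ones."""
    parser = argparse.ArgumentParser(description="start/stop window sketch")
    # Option names are illustrative; times are assumed to be monotonic clock ns.
    parser.add_argument("--start-time", type=int, default=None)
    parser.add_argument("--stop-time", type=int, default=None)
    args = parser.parse_args(argv)
    if (args.start_time is not None and args.stop_time is not None
            and args.start_time >= args.stop_time):
        # parser.error() prints a usage message and exits with status 2.
        parser.error("--start-time must be less than --stop-time")
    return args

def in_window(timestamp, args):
    """True if the sample timestamp falls inside the requested window."""
    if args.start_time is not None and timestamp < args.start_time:
        return False
    if args.stop_time is not None and timestamp > args.stop_time:
        return False
    return True

args = parse_args(["--start-time", "100", "--stop-time", "200"])
print(in_window(150, args), in_window(250, args))  # True False
```

Checking the pair once at parse time keeps the per-sample filter trivial, and leaving either bound unset simply makes the window open-ended on that side.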