Minor update to the library released - build fixes.
Fixes an issue with the Debian build on SPARC. See the README for details.
Regards
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Hi Mathieu,
Apologies if mailman does not see this as a reply. I'm not sure if Outlook handles In-Reply-To properly.
I'm testing CPU-wide tracing on Zynq Ultrascale+ MPSoC and I have some comments I'd like to share.
Some introduction first:
- I'm using mainline Linux from a couple of days ago (12ad143e1b80 Merge branch 'perf-urgent-for-linus'...)
- on top of it I have a couple of my changes introducing CoreSight support on US+
- on top of this I cherry-picked your two patch sets with CPU-wide tracing
I prepared a test program that's supposed to generate a deterministic trace. I created a function that should,
depending on its argument, generate either a continuous stream of E atoms or alternating E/N atoms. In main() I spawn
two threads with affinity attributes:
- the first thread is set up as atom E generator, pinned to CPU1
- the other as E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
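For reference, the thread setup in main() looks roughly like the sketch below (simplified; the function and variable names are illustrative rather than the exact ones from my program):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Body shown further below; arg points to 'e' or 'n'. */
    static void *atom_generator(void *arg);

    int main(void)
    {
            static char kind_e = 'e', kind_n = 'n';
            cpu_set_t set;
            pthread_attr_t attr;
            pthread_t t1, t2;

            /* Pin the main thread to CPU0. */
            CPU_ZERO(&set);
            CPU_SET(0, &set);
            sched_setaffinity(0, sizeof(set), &set);

            pthread_attr_init(&attr);

            /* First thread: atom E generator, pinned to CPU1. */
            CPU_ZERO(&set);
            CPU_SET(1, &set);
            pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
            pthread_create(&t1, &attr, atom_generator, &kind_e);

            /* Second thread: E/N generator, pinned to CPU2. */
            CPU_ZERO(&set);
            CPU_SET(2, &set);
            pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
            pthread_create(&t2, &attr, atom_generator, &kind_n);

            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            return 0;
    }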
The atom generator function's body looks like the code below. If *atom == 'n', the branch is not taken, so an N atom should
be generated, and if *atom == 'e', the branch is taken and an E atom should be generated. After that, another E atom
is expected, since the while loop branches back to the start. This is counter-intuitive when you look at the C code,
but the if-condition is actually compiled to a b.ne instruction, so entering the conditional body happens when the branch
is not taken.
volatile int sum = 0;
while (1) {
        // Reference by pointer, so it's not optimized out.
        if (*atom == 'n') // compiler creates b.ne here
                sum += 0xdeadbeef * (*atom + 1);
}
Here are my observations:
1. The -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
In perf report I can see lots of E atoms in packets marked with ID:12. If I collect trace with -C2 instead,
I see E/N atoms in packets with ID:14. Everything works as expected each time I trace this application.
2. The -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12
Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command.
Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get
ID:14 atoms but no ID:12 atoms. Sometimes I get the expected trace but still some noise in ID:16 packets, which I would
not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my
understanding of CoreSight. Is this behaviour expected?
3. I'm not able to use filters.
I'd like to narrow down tracing to the while(1) loop in the trace generator, to filter out noise from other instructions.
However, I find it impossible to use the --filter flag along with -C or -a:
# perf record -e cs_etm/@fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' -a ./atom_gen
failed to set filter "filter 0x90c/0x8c@/root/atom_gen" on event cs_etm/@fe940000.etf1/u with 95 (Operation not supported)
It works fine with --per-thread. Is this behaviour expected, or is it a bug?
4. The kernel crashes if perf is used with no -a, -C or --per-thread.
If I call perf with:
# perf record -e cs_etm/@fe940000.etf1/u ./atom_gen
I can see some printfs from the program, but then the kernel immediately hits a NULL pointer dereference.
Please find a log below. My serial connection drops characters sometimes, sorry for that.
The crash happens in tmc_enable_etf_sink+0x90, which is:
/* Get a handle on the pid of the process to monitor */
if (handle->event->owner)
pid = task_pid_nr(handle->event->owner);
The handle->event->owner seems to be NULL.
[ 1313.650726] Unable to handle kernel NULL pointer dereference at virtual address 00000000000003b8
[ 1313.659501] Mem abort info:
[ 1313.662281] ESR = 0x96000006
[ 1313.665320] Exception class = DABT (current EL), IL = 32 bits
[ 1313.671232] SET = 0, FnV = 0
[ 1313.674277] EA = 0, S1PTW = 0
[ 1313.677401] Data abort info:
[ 1313.680266] ISV = 0, ISS = 0x00000006
[ 1313.684085] CM = 0, WnR = 0
[ 1313.687039] user pgtable: 4k pages, 39-bit VAs, pgdp = 000000003b61a770
[ 1313.693644] [00000000000003b8] pgd=000000006c6da003, pud=0000006c6da003, pmd=0000000000000000
[ 1313.702336] Internal error: Oops: 96000006 [#1] SMP
[ 1313.707201] Modules linked in:
[ 1313.707201] CPU: 1 PID: 3255 Comm: multithread-two Not tainted 5.0.0-10411-g66431e6376c4-dirty #26
[ 1313.719200] Hardware name: ZynqMP ZCU104 RevA (DT)
[ 1313.723981] pstate: 20000085 (nzCv daIf -PAN -UAO)
[ 1313.728770] pc : tmc_enable_etf_sink+0x90/0x3b0
[ 1313.733286] lr : tmc_enable_etf_sink+0x64/0x3b0
[ 1313.737806] sp : ffffff8011263b40
[ 1313.741104] x29: ffffff8011263b40 x28: 0000000000000000
[ 1313.6409] x27: 0000000000000000 x26: ffffffc06d4ce180
[ 1313.7512] x25: 0000000000000001 x24: ffffffc06faa4ce0
[ 1313.757015] x23: 0000000000000002 x22: 0000000000000080
[ 1313.7319] x21: ffffffc06faa4ce0 x20: ffffffc06cf07c00
[ 1313.7676] x19: ffffffc06d560e80 x18: 0000000000000000
[ 1313.772926] x17: 0000000000000000 x16: 0000000000000000
[ 1313.7729] x15: 0000000000000000 x14: ffffff8010879388
[ 1313.78353 x13: 0000000000000000 x12: 0000000002e8fc00
[ 1313.788836] x11: 0000000000000000 x10: 00000000000007f0
[ 1313.7940] x9 : 0000000000000000 x8 : 0000000000000000
[ 1313.799443x7 : 0000000000000030 x6 : ffffffc06c279030
[ 1313.804747] x5 : 0000000000000030 x4 : 0000000000000002
[ 1313.8100] x3 : ffffffc06d560ee8 x2 : 0000000000000001
[ 1313.815354]1 : 0000000000000000 x0 : 0000000000000000
[ 1313.820659] Process multithread-two (pid: 3255, stack limit = 0x00000073629f1e)
[ 1313.828133] Call trace:
[ 1313.830571] tmc_enable_etf_sink+0x90/0x3b0
[ 1313.834748] coresight_enable_path+0xe4/0x1f8
[ 1313.839096] etm_event_start+0x8c/0x120
[ 1313.842923] etm_event_add+0x38/0x58
[ 1313.846492] event_sched_in.isra.61.part.62+0x94/0x1b0
[ 1313.851620] group_sched_in+0xa0/0x1c8
[ 1313.855360] flexible_sched_in+0xac/0x1
[ 1313.859364] visit_groups_merge+0x144/0x1f8
[ 1313.86353 ctx_sched_in.isra.39+0x128/0x138
[ 1313.867887] perf_event_sched_in.isra.41+0x54/0x80
[ 1313.872669] __perf_event_task_sched_in+0x16c/0x180
[ 1313.877540] finish_task_switch+0x104/0x1d8
[ 1313.881715] schedule_tail+0xc/0x98
[ 1313.885195] ret_from_fork+0x4/0x18
[ 1313.888677] Code: 540016 f9001bb7 f94002a0 f9414400 (b943b817)
[ 1313.894761] ---[ end trace 99bb09dc83a83a1a ]---
Best regards,
Wojciech
Hi,
(+coresight mailing lists.)
Looked at this - -fpic is supposed to generate smaller code than -fPIC.
That said, I've tried both variants for x86_64 and aarch64 builds:
- x86_64 showed no change (gcc 5.4)
- cross-compiled aarch64 code was 0.45% smaller using -fpic rather than -fPIC (gcc 6.2)
- natively compiled aarch64 code showed no change (gcc 4.9)
While we could add some code to the makefile to dynamically select the
-fPIC/-fpic option when building on SPARC architectures, unless there are
objections on the mailing list, I propose to change to -fPIC across the
board at this point.
This will be released as a 0.11.1 patch (along with another minor build
fix).
Regards
Mike
On Wed, 13 Mar 2019 at 08:50, John Paul Adrian Glaubitz <
notifications@github.com> wrote:
> I have just tested this on sparc64 and can confirm that replacing -fpic
> with -fPIC fixes the issue for me.
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Version v0.11.0 release of OpenCSD library.
* ETMv4 support updated to cover all ETM versions up to v4.4 (from
v4.1 previously). This covers the latest v8.4 arch cores.
* Updated memory callback function to pass the trace ID to the client when
requesting memory data.
This allows the client to determine the source CPU for the request and return
memory images accordingly.
(Required for upcoming work on perf support.)
* Other minor fixes and updates.
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Hi,
I'm trying to run CoreSight on a Xilinx Zynq Ultrascale+ with a ZCU104 board. I have already made some progress; however, I'm stuck on getting valid timestamp packets in the decoded trace.
I'm using mainline Linux forked at v5.0-rc4. So far, I have manually added CoreSight nodes to the DTS based on the Ultrascale+ datasheet, since neither mainline Linux nor the Xilinx fork seems to have them. It looks like this step went fine, since I can see the drivers reporting in dmesg and the related sysfs nodes being registered:
root@zynq:~# dmesg | grep -i coresight
[ 3.745681] coresight-etm4x fec40000.etm0: CPU0: ETM v4.0 initialized
[ 3.752318] coresight-etm4x fed40000.etm1: CPU1: ETM v4.0 initialized
[ 3.758952] coresight-etm4x fee40000.etm2: CPU2: ETM v4.0 initialized
[ 3.765587] coresight-etm4x fef40000.etm3: CPU3: ETM v4.0 initialized
[ 3.772190] coresight-stm fe9c0000.stm: stm_register_device failed, probing deferred
[ 3.780004] coresight-cpu-debug fec10000.debug0: Coresight debug-CPU0 initialized
[ 3.787531] coresight-cpu-debug fed10000.debug1: Coresight debug-CPU1 initialized
[ 3.795058] coresight-cpu-debug fee10000.debug2: Coresight debug-CPU2 initialized
[ 3.802587] coresight-cpu-debug fef10000.debug3: Coresight debug-CPU3 initialized
[ 3.917423] coresight-stm fe9c0000.stm: STM500 initialized
root@zynq:~# ls /sys/bus/coresight/devices/
fe920000.funnel1 fe940000.etf1 fe970000.etr fe9c0000.stm fed40000.etm1 fef40000.etm3
fe930000.funnel2 fe950000.etf2 fe980000.tpiu fec40000.etm0 fee40000.etm2 replicator
Overall trace acquisition and decoding with perf+OpenCSD also look good, apart from the timestamp packets. When I request timestamping, the value is a constant 0x0:
root@zynq:~/cs_test# perf record -e cs_etm/timestamp,@fe940000.etf1/u --filter 'filter 0x764/0x2c@./sum' --per-thread ./sum
Val: 20
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data ]
root@zynq:~/cs_test# perf report --dump | grep TIMESTAMP
Idx:231; ID:12; I_TIMESTAMP : Timestamp.; Updated val = 0x0
Idx:254; ID:12; I_TIMESTAMP : Timestamp.; Updated val = 0x0
Idx:279; ID:12; I_TIMESTAMP : Timestamp.; Updated val = 0x0
According to the datasheet, the Ultrascale+ has a timestamp generator. Perf confirms it in the register values dump:
root@zynq:~/cs_test# perf report --dump | grep 'TRCCONFIGR\|TRCIDR0'
TRCCONFIGR 800
TRCIDR0 28000ea1
TRCCONFIGR 800
TRCIDR0 28000ea1
TRCCONFIGR 800
TRCIDR0 28000ea1
TRCCONFIGR 800
TRCIDR0 28000ea1
These values match what the ETMv4 architecture spec says (see the quick check below):
- bit 11 (TS) of TRCCONFIGR is 1, which indicates that global timestamping is enabled
- bits 28:24 (TSSIZE) of TRCIDR0 are 0b1000, which indicates support for 64-bit timestamps
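As a quick sanity check on the bit arithmetic, decoding the dumped values with a trivial program (my own throwaway snippet, nothing platform-specific) gives the same answer:

    #include <stdio.h>

    int main(void)
    {
            unsigned int trcconfigr = 0x800;      /* values from perf report --dump above */
            unsigned int trcidr0 = 0x28000ea1;

            /* TRCCONFIGR.TS, bit [11]: 1 means global timestamping is enabled */
            printf("TS     = %u\n", (trcconfigr >> 11) & 0x1);
            /* TRCIDR0.TSSIZE, bits [28:24]: 0b01000 means 64-bit timestamps supported */
            printf("TSSIZE = 0x%x\n", (trcidr0 >> 24) & 0x1f);
            return 0;
    }

It prints TS = 1 and TSSIZE = 0x8, so the trace unit claims both that timestamping is enabled and that 64-bit timestamps are implemented.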
In the kernel source code I can see that the ETMv4 driver writes the EVENT bits (7:0) of the TRCTSCTLR register. I didn't thoroughly analyze this code, but I take it as another hint that timestamping is actually supported.
At this point I wonder if anybody has witnessed timestamps working on this platform, whether under Linux or bare metal. I would appreciate even the tiniest suggestion on where I should look next to get them working.
Thanks and best regards,
Wojciech
Hi,
Can I send you a Price of one of our Database Sellers based on your
requirement?
Kindly just share your requirements by filling in the below table:
Industries : -----------------------
Job Titles : -----------------------
Geography : ----------------------
I'll come up with the data counts, costs & few sample contacts for your
review.
Regards,
Katherine Allison
Business Development
Hi Mark,
CoreSight trace collection is broken on v5.0-rc6 due to this commit:
9dff0aa95a32 perf/core: Don't WARN() for impossible ring-buffer sizes
Before:
root@juno:/home/linaro# perf record -e cs_etm/@20070000.etr/u
--per-thread uname
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.036 MB perf.data ]
root@juno:/home/linaro#
After:
root@juno:/home/linaro# perf record -e cs_etm/@20070000.etr/u
--per-thread uname
failed to mmap with 12 (Cannot allocate memory)
root@juno:/home/linaro#
The problem is related to the order_base_2() [1] test with a size of
1264, stemming from nr_pages being equal to 128. The combination yields
an order of 11, which leads directly to the error path (see the sketch
below).
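If I'm reading the commit right, the new check in rb_alloc() is roughly
the following (paraphrased from memory, so possibly not verbatim):

        size = sizeof(struct ring_buffer);
        size += nr_pages * sizeof(void *);      /* ~1264 bytes for nr_pages == 128 */

        if (order_base_2(size) >= MAX_ORDER)    /* order_base_2(1264) == 11, which trips the test */
                goto fail;

If that's the case, a size in bytes ends up being compared against a
page-allocation order, so even a modest number of pages lands in the
failure path.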
The results are the same with linux-next 20190213. This was tested on
a Juno R0 and R1 with a 4K page configuration. I haven't tried but
I'm pretty sure it breaks Intel PT as well.
Please have a look when you have a minute. Leo Yan and I will be
happy to test patches.
Thanks,
Mathieu
[1]. https://elixir.bootlin.com/linux/v5.0-rc6/source/kernel/events/ring_buffer.…