CoreSight ETM4x devices could be accessed either via MMIO (handled via
amba_driver) or CPU system instructions (handled via platform driver). But
this has the following issues :
- Each new CPU comes up with its own PID and thus we need to keep on
adding the "known" PIDs to get it working with AMBA driver. While
the ETM4 architecture (and CoreSight architecture) defines way to
identify a device as ETM4. Thus older kernels won't be able to
"discover" a newer CPU, unless we add the PIDs.
- With ACPI, the ETM4x devices have the same HID to identify the device
irrespective of the mode of access. This creates a problem where two
different drivers (both AMBA based driver and platform driver) would
hook into the "HID" and could conflict. e.g., if AMBA driver gets
hold of a non-MMIO device, the probe fails. If we have single driver
hooked into the given "HID", we could handle them seamlessly,
irrespective of the mode of access.
- CoreSight is heavily dependent on the runtime power management. With
ACPI, amba_driver doesn't get us anywhere with handling the power
and thus one need to always turn the power ON to use them. Moving to
platform driver gives us the power management for free.
Due to all of the above, we are moving the MMIO based etm4x devices to be
supported via platform driver. The series makes the existing platform
driver generic to handle both type of the access modes. With that we can
also remove the etm4x amba driver.
Finally, we need a way to make sure the new driver gets control of the
ETM4x device on a DT based system. CoreSight devices have always had the
"arm,primecell" in the compatible list. But the way this is handled
currently in OF code is a bit messy. The ETM4x devices are identified by
"arm,coresight-etm4x". The platform driver can never get a chance to probe
these devices, since the "arm,primecell" takes priority and is hard-coded
in the OF code. We have two options here :
1) Remove the arm,primecell from all DTS. This is fine for "new" kernels
with this change. But, for existing boards, using an older kernel will
break. Thus, is not preferred.
2) Add a white list of "compatibles" where the "priority" of the
"arm,primecell" can be ignored.
The series implements (2) above and applies on 6.3-rc2.
Cc: Steve Clevenger <scclevenger(a)os.amperecomputing.com>
Cc: Rob Herring <robh+dt(a)kernel.org>
Cc: Frank Rowand <frowand.list(a)gmail.com>
Cc: Russell King (Oracle) <linux(a)armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Cc: Len Brown <lenb(a)kernel.org>
Cc: Sudeep Holla <sudeep.holla(a)arm.com>
Cc: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Cc: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: Mike Leach <mike.leach(a)linaro.org>
Cc: Leo Yan <leo.yan(a)linaro.org>
Cc: devicetree(a)vger.kernel.org
Cc: linux-acpi(a)vger.kernel.org
Cc: coresight(a)lists.linaro.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-kernel(a)vger.kernel.org
Anshuman Khandual (6):
coresight: etm4x: Allocate and device assign 'struct etmv4_drvdata' earlier
coresight: etm4x: Drop iomem 'base' argument from etm4_probe()
coresight: etm4x: Drop pid argument from etm4_probe()
coresight: etm4x: Change etm4_platform_driver driver for MMIO devices
of/platform: Skip coresight etm4x devices from AMBA bus
coresight: etm4x: Drop the AMBA driver
Suzuki Poulose (1):
coresight: etm4x: Add ACPI support in platform driver
drivers/acpi/acpi_amba.c | 1 -
.../coresight/coresight-etm4x-core.c | 171 ++++++++----------
drivers/hwtracing/coresight/coresight-etm4x.h | 3 +
drivers/of/platform.c | 10 +-
include/linux/coresight.h | 56 ++++++
5 files changed, 143 insertions(+), 98 deletions(-)
--
2.25.1
Hi Steve
On 06/03/2023 05:54, Steve Clevenger wrote:
> An Ampere Computing design decision is MMIO reads are considered the
> same as an external debug access. If TRCOSLAR.OSLK is left set, the
> TRCIDR1 access results in a bus fault followed by a kernel panic. A
> TRCIDR1 read should be valid regardless of TRCOSLAR.OSLK provided MMIO
> access (now deprecated) is supported.
>
> The Ampere work around is to early clear ETM TRCOSLAR.OSLK prior to
> TRCIDR1 access. Do not distinguish between manufacturers.
> AC03_DEBUG_06 is described in the AmpereOne Developer Errata:
>
> Add Ampere ETM PID required for Coresight ETM driver support.
>
> https://solutions.amperecomputing.com/customer-connect/products/AmpereOne-d…
>
> Signed-off-by: Steve Clevenger <scclevenger(a)os.amperecomputing.com>
I went back to the CoreSight ETM4x architecture and this is not an
erratum at your end. This is actually a bug in the ETMv4 driver.
TRCIDR1 is part of the "Trace" and not "Management" registers of
ETMv4. Access to "Trace" registers without OSLK is going to
cause issues. (Un)Fortunately, we never hit this on existing
systems, but that doesn't mean it can stay there.
So we should really fix our detection and only rely on
accessing the TRCDEVARCH to detect the device to be ETMv4.
I have a sent out the fix here [0], are you able to test it :
[0]
https://lkml.kernel.org/r/20230317115728.1358368-1-suzuki.poulose@arm.com
Suzuki
On the HiSilicon platform, there is one ETM per logic core.
When setting up SMT, not every process has an ETM. So the path
".../cs_etm/cpux/trcidr/trcidr0" does not exist, the function
perf_pmu__scan_file() return an error. However, the command
'perf record' will returns silently and don't print.
After this patch, log a error when read fails that makes it easy
for users to debug.
root@localhost:/# perf record -e /cs_etm/@ultra_smb0/ -C 3 uname -a
cs_etm: can't read file cpu3/trcidr/trcidr0
root@localhost:/#
Signed-off-by: Junhao He <hejunhao3(a)huawei.com>
---
tools/perf/arch/arm/util/cs-etm.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 7f71c8a237ff..19ea17430399 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -585,7 +585,6 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu)
{
- bool ret = false;
char path[PATH_MAX];
int scan;
unsigned int val;
@@ -600,9 +599,10 @@ static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu)
/* The file was read successfully, we have a winner */
if (scan == 1)
- ret = true;
+ return true;
- return ret;
+ pr_err("%s: can't read file %s\n", CORESIGHT_ETM_PMU_NAME, path);
+ return false;
}
static int cs_etm_get_ro(struct perf_pmu *pmu, int cpu, const char *path)
--
2.33.0
On 16/03/2023 03:20, Hao Zhang wrote:
> Introduction of Coresight Dummy subunit
> The Coresight Dummy subunit is for Coresight Dummy component, there are some
> specific Coresight devices that HLOS don't have permission to access. Such as
What is HLOS ?
> some TPDMs, they would be configured in NON-HLOS side, but it's necessary to
What is NON-HLOS ?
> build Coresight path for it to debug. So there need driver to register dummy
> devices as Coresight devices.
Build a path for who to debug ? If this is used by some privileged
software, shouldn't that do all of the work ?
Suzuki
>
> Commit link:
> https://git.codelinaro.org/clo/linux-kernel/coresight/-/tree/coresight-dummy
>
> Hao Zhang (3):
> Coresight: Add coresight dummy driver
> dt-bindings: arm: Add Coresight Dummy Trace YAML schema
> Documentation: trace: Add documentation for Coresight Dummy Trace
>
> .../bindings/arm/qcom,coresight-dummy.yaml | 129 +++++++++++++
> .../trace/coresight/coresight-dummy.rst | 58 ++++++
> drivers/hwtracing/coresight/Kconfig | 11 ++
> drivers/hwtracing/coresight/Makefile | 1 +
> drivers/hwtracing/coresight/coresight-dummy.c | 176 ++++++++++++++++++
> 5 files changed, 375 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/arm/qcom,coresight-dummy.yaml
> create mode 100644 Documentation/trace/coresight/coresight-dummy.rst
> create mode 100644 drivers/hwtracing/coresight/coresight-dummy.c
>
This is printed as a warning but it is normal behavior that users
shouldn't be expected to do anything about. Reduce the warning level to
debug3 so it's only seen in verbose mode to avoid confusion.
Signed-off-by: James Clark <james.clark(a)arm.com>
---
tools/perf/arch/arm/util/cs-etm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 7f71c8a237ff..59b50dd70330 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -694,8 +694,8 @@ static void cs_etm_save_etmv4_header(__u64 data[], struct auxtrace_record *itr,
data[CS_ETMV4_TS_SOURCE] = (__u64) cs_etm_get_ro_signed(cs_etm_pmu, cpu,
metadata_etmv4_ro[CS_ETMV4_TS_SOURCE]);
else {
- pr_warning("[%03d] pmu file 'ts_source' not found. Fallback to safe value (-1)\n",
- cpu);
+ pr_debug3("[%03d] pmu file 'ts_source' not found. Fallback to safe value (-1)\n",
+ cpu);
data[CS_ETMV4_TS_SOURCE] = (__u64) -1;
}
}
@@ -729,8 +729,8 @@ static void cs_etm_save_ete_header(__u64 data[], struct auxtrace_record *itr, in
data[CS_ETE_TS_SOURCE] = (__u64) cs_etm_get_ro_signed(cs_etm_pmu, cpu,
metadata_ete_ro[CS_ETE_TS_SOURCE]);
else {
- pr_warning("[%03d] pmu file 'ts_source' not found. Fallback to safe value (-1)\n",
- cpu);
+ pr_debug3("[%03d] pmu file 'ts_source' not found. Fallback to safe value (-1)\n",
+ cpu);
data[CS_ETE_TS_SOURCE] = (__u64) -1;
}
}
--
2.34.1
On Thu, Mar 09, 2023 at 10:06:41AM -0800, Yang Shi wrote:
[...]
> > I reviewed your shared dump, the bad and good perf data both contain the
> > dummy event with 'text_poke = 1'. Could you confirm the shared dump in
> > your previous email is correct or not?
>
> Oops, sorry. I pasted the wrong log. The good one looks like
> (generated by v5.19):
>
> # captured on : Wed Mar 8 18:02:58 2023
> # header version : 1
> # data offset : 408
> # data size : 22640
> # feat offset : 23048
> # hostname : fedora
> # os release : 6.2.0-coresight+
> # perf version : 5.19.g3d7cb6b04c3f
> # arch : aarch64
> # nrcpus online : 128
> # nrcpus avail : 128
> # cpuid : 0x00000000c00fac30
> # total memory : 2108862504 kB
> # cmdline : /home/yshi/linux/tools/perf/perf record -e
> cs_etm/@tmc_etf63/k --kcore --per-thread -- taskset --cpu-list 1 uname
> # event : name = cs_etm/@tmc_etf63/k, , id = { 3832 }, type = 9, size
> = 128, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID, d
> isabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec = 1,
> sample_id_all = 1, { bp_len, config2 } = 0x12792918
> # event : name = dummy:u, , id = { 3833 }, type = 1, size = 128,
> config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID,
> disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1,
> enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1,
> mmap2 = 1, comm_exec = 1,
> context_switch = 1, ksymbol = 1, bpf_event = 1
> # CPU_TOPOLOGY info available, use -I to display
> # NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
> uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
> 11, kprobe = 6
> # contains AUX area data (e.g. instruction trace)
> # CACHE info available, use -I to display
> # time of first sample : 18446744073.709551
> # time of last sample : 18446744073.709551
> # sample duration : 0.000 ms
> # MEM_TOPOLOGY info available, use -I to display
> # missing features: TRACING_DATA CPUDESC BRANCH_STACK GROUP_DESC STAT
> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
> HYBRID_CPU_PMU_CAPS
Thanks for confirmation.
Just a quick summary, here we have two issues:
- With command:
perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
-- taskset --cpu-list 1 uname",
perf doesn't enable "text poke" attribution.
- With command:
perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
-- taskset --cpu-list 1 true (in your previous email), or ...
perf record --kcore -e cs_etm/@tmc_etf63/k --per-thread \
-- taskset --cpu-list 1 uname (in your shared perf data file),
perf enables "text poke" attribution, in this case, perf fails to decode
Arm CoreSight trace data.
[...]
> > Do you mind to share the bad perf.data file with James and me?
>
> Please check the attachment out. Thanks for looking into this problem.
Thank you for sharing the data. We will look into it.
Leo
On Thu, Mar 09, 2023 at 12:46:55PM -0800, Steve Clevenger wrote:
[...]
> > #include <linux/io-64-nonatomic-hi-lo.h>
> >
> > static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
> > {
> > if (csa->io_mem) {
> > if (csa->no_quad_mmio)
> > return hi_lo_readq_relaxed(csa->base + offset);
> > else
> > return readq_relaxed(csa->base + offset);
> > } else {
> > return read_etm4x_sysreg_offset(offset, true);
> > }
> > }
> >
> > I'm not saying to play magic on {writeq|readq}_relaxed(), would it be
> > fine to directly call hi_lo_readq_relaxed()?
>
> I can do this, and it would work fine. Without compile protection, one
> concern is the silent inclusion of io-64-nonatomic-hi-lo.h definitions
> for readq_relaxed/writeq_relaxed if the include hierarchy gets changed.
> The only (minor) risk I see is to performance.
Yeah, we could include io.h ahead io-64-nonatomic-hi-lo.h:
#include <linux/io.h>
#include <linux/io-64-nonatomic-lo-hi.h>
Thus we can get the definitions for {readq|writeq}_relaxed() from
"io.h".
> Second, it seems to me the hi_lo and lo_hi macros were intended to deal
> with endianness. IMO, the implementation I offered is agnostic without a
> hint to endianness or atomicity.
Seems to me, kernel functions with explict endianness naming is not bad
thing :)
> The CoreSight maintainer audience is
> limited and already aware of these issues, so I'll make the change for you.
Thanks! Suzuki, Mike, I think it's good to get your opinion before
Steve can proceed for updating patch.
Leo