On Thu, Mar 09, 2023 at 12:46:55PM -0800, Steve Clevenger wrote:
[...]
> > #include <linux/io-64-nonatomic-hi-lo.h>
> >
> > static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
> > {
> > if (csa->io_mem) {
> > if (csa->no_quad_mmio)
> > return hi_lo_readq_relaxed(csa->base + offset);
> > else
> > return readq_relaxed(csa->base + offset);
> > } else {
> > return read_etm4x_sysreg_offset(offset, true);
> > }
> > }
> >
> > I'm not saying to play magic on {writeq|readq}_relaxed(), would it be
> > fine to directly call hi_lo_readq_relaxed()?
>
> I can do this, and it would work fine. Without compile protection, one
> concern is the silent inclusion of io-64-nonatomic-hi-lo.h definitions
> for readq_relaxed/writeq_relaxed if the include hierarchy gets changed.
> The only (minor) risk I see is to performance.
Yeah, we could include io.h ahead io-64-nonatomic-hi-lo.h:
#include <linux/io.h>
#include <linux/io-64-nonatomic-lo-hi.h>
Thus we can get the definitions for {readq|writeq}_relaxed() from
"io.h".
> Second, it seems to me the hi_lo and lo_hi macros were intended to deal
> with endianness. IMO, the implementation I offered is agnostic without a
> hint to endianness or atomicity.
Seems to me, kernel functions with explict endianness naming is not bad
thing :)
> The CoreSight maintainer audience is
> limited and already aware of these issues, so I'll make the change for you.
Thanks! Suzuki, Mike, I think it's good to get your opinion before
Steve can proceed for updating patch.
Leo
Hi Steve,
On Wed, Mar 08, 2023 at 11:08:34AM -0800, Steve Clevenger wrote:
[...]
> I'm familiar with this interface. The reason I chose not to use it is
> this solution is configured at compile time. The writeq_relaxed (used by
> etm4x_relaxed_write64) and readq_relaxed (used by etm4x_relaxed_read64)
> otherwise default to 64-bit access. I was uncertain how a compile time
> change would go over with the maintainers. Correct me, but compile time
> configurations in the kernel seem to be related to ARM64 erratum, versus
> manufacturer specific?
>
> My take (based on limited knowledge) of the changes necessary to support
> a compile time decision seemed to exceed the changes compared with the
> implementation I submitted upstream. If this becomes a sticking point,
> let me know.
I am suggesting something like below:
#include <linux/io-64-nonatomic-hi-lo.h>
static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
{
if (csa->io_mem) {
if (csa->no_quad_mmio)
return hi_lo_readq_relaxed(csa->base + offset);
else
return readq_relaxed(csa->base + offset);
} else {
return read_etm4x_sysreg_offset(offset, true);
}
}
I'm not saying to play magic on {writeq|readq}_relaxed(), would it be
fine to directly call hi_lo_readq_relaxed()?
Thanks,
Leo
Hi Yang,
On Wed, Mar 08, 2023 at 11:56:38AM -0800, Yang Shi wrote:
[...]
> > Dumping raw events could show the events from the bad data file. But
> > it has zero samples after event collapse.
> >
> > The only difference is --kcore inserted a new text_poke dummy event.
> > It seems coresight also inserted a dummy event with my command but
> > your command didn't. So it seems like the two dummy events confused
> > event collapse.
> >
> > The text_poke dummy event is added by commit
> > f42c0ce573df79d1b8bd169008c994dcdd43585a ("perf record: Always get
> > text_poke events with --kcore option"). If I reverted this commit,
> > then it works. But I'm not sure whether this is the right fix or real
> > root cause or not. Or coresight shouldn't insert its own dummy event?
>
> It seems like coresight needs to insert the dummy event if
> full_auxtrace is on IIUC. So it sounds like event collapse can't
> handle such a case?
I am struggling to understand the meaning "event collapse" :)
I reviewed your shared dump, the bad and good perf data both contain the
dummy event with 'text_poke = 1'. Could you confirm the shared dump in
your previous email is correct or not?
> I also tried v5.19 (before "perf record: Always
> get text_poke events with --kcore option", which was merged in v6.0),
> it works. So it seems like a regression.
Yeah, we need to fix it. I am not sure the Linux kernel for Arm64
supports text poke or not (kernel needs some specific handling when
alter instructions), the kernel change is the prerequisites.
On the other hand, in the current code cs-etm misses to handle the
event PERF_RECORD_TEXT_POKE in the function cs_etm__process_event().
This might be the cause for the failure.
Do you mind to share the bad perf.data file with James and me?
Thanks,
Leo
Currently there is a refcount leak in CTI when using system wide mode
or tracing multithreaded applications. See the last commit for a
reproducer. This prevents the module from being unloaded.
Historically there have been a few issues and fixes attempted around
here which have resulted in some extra logic and a member to keep
track of CTI being enabled 'struct coresight_device->ect_enabled'.
The fix in commit 665c157e0204 ("coresight: cti: Fix hang in
cti_disable_hw()") was also related to CTI having its own
enable/disable path which came later than other devices.
If we make CTI a helper device and enable helper devices adjacent to
the path we get very similar enable/disable behavior to now, but with
more reuse of the existing reference counting logic in the coresight
core code. This also affects CATU which can have a little bit of
its hard coded enable/disable code removed.
Enabling CATU on the generic path does require that input connections
are tracked so that it can get its associated ETR buffer.
Applies to coresight/next (669c4614236a7) but also requires the
realloc_array patch here [1].
Also available in full here [2].
[1]: https://lore.kernel.org/linux-arm-kernel/20230306152723.3090195-1-james.cla…
[2]: https://gitlab.arm.com/linux-arm/linux-jc/-/tree/james-cs-cti-module-refcou…
James Clark (8):
coresight: Use enum type for cs_mode wherever possible
coresight: Change name of pdata->conns
coresight: Rename nr_outports to nr_outconns
coresight: Dynamically add connections
coresight: Store in-connections as well as out-connections
coresight: Refactor out buffer allocation function for ETR
coresight: Enable and disable helper devices adjacent to the path
coresight: Fix CTI module refcount leak by making it a helper device
drivers/hwtracing/coresight/coresight-catu.c | 34 ++-
drivers/hwtracing/coresight/coresight-core.c | 258 +++++++++++-------
.../hwtracing/coresight/coresight-cti-core.c | 56 ++--
.../hwtracing/coresight/coresight-cti-sysfs.c | 4 +-
drivers/hwtracing/coresight/coresight-cti.h | 4 +-
drivers/hwtracing/coresight/coresight-etb10.c | 3 +-
.../coresight/coresight-etm3x-core.c | 6 +-
.../coresight/coresight-etm4x-core.c | 6 +-
.../hwtracing/coresight/coresight-platform.c | 168 +++++++++---
drivers/hwtracing/coresight/coresight-priv.h | 9 +-
drivers/hwtracing/coresight/coresight-stm.c | 6 +-
drivers/hwtracing/coresight/coresight-sysfs.c | 1 -
.../hwtracing/coresight/coresight-tmc-etf.c | 2 +-
.../hwtracing/coresight/coresight-tmc-etr.c | 88 +++---
drivers/hwtracing/coresight/coresight-tmc.h | 2 +
drivers/hwtracing/coresight/coresight-tpdm.c | 4 +-
drivers/hwtracing/coresight/coresight-tpiu.c | 3 +-
drivers/hwtracing/coresight/coresight-trbe.c | 3 +-
drivers/hwtracing/coresight/ultrasoc-smb.c | 3 +-
drivers/hwtracing/coresight/ultrasoc-smb.h | 2 +-
include/linux/coresight.h | 92 ++++---
21 files changed, 483 insertions(+), 271 deletions(-)
--
2.34.1
On 06/03/2023 21:13, Yang Shi wrote:
> Hi,
>
> I'm seeking some help regarding perf record --kcore + coresight. When
> I tracing with perf record --kcore -e cs_etm ... , perf report shows
> "The perf.data/data data has no samples!".
>
> I tried other combinations:
> 1. perf record --kcore (w/o using coresight): works
> 2. perf record -e cs_etm ... : works
> 3. Manually copy kcore with perf buildid-cache, then run perf record
> -e cs_etm... : works
>
> So just "perf record --kcore -e cs_etm ..." doesn't work.
>
> I'm using v6.2 kernel and the perf is built from the same kernel
> source with OpenCSD 1.4. Also attached the sample file generated by
> perf.
>
> Any suggestion is appreciated.
>
> Thanks,
> Yang
Hi Yang,
I don't see any issue with this command and I still get samples:
perf record -e cs_etm// --kcore -- true
You might want to try running both record and report in verbose and
stdio mode (-vvv --stdio) to see if you see any warnings.
I can't think of any way --kcore would cause an issue because all it
does is save kcore into the .debug cache rather than affecting the
actual perf.data file.
Are you running perf report in the same place the recording was made? Or
on another system?
James
Hi Steve
On 07/03/2023 01:23, Steve Clevenger wrote:
>
> Hi Suzuki,
>
> To answer your question, Ampere plans to use the existing MMIO
> implementation to introduce CoreSight HW Assisted Trace since we're
> preparing for release. As a minimum, we know this would require a respin
Please note that MMIO interface is for "external debug" not for
self-hosted usecase, with system instruction support. This is why you need
to work around the driver "as an errata".
> of our CoreSight DSDT. Also, if I didn't misunderstand, it sounded like
> you planned supporting work (e.g. ETMv4 not handled as an AMBA
> device). Since our ETMv4 sink components (ETF, ETR, +CATU) remain memory
> mapped, do these remain AMBA?
Just to be clear, all of these components will be MMIO. We are simply
moving the "Linux" driver framework from AMBA driver to platform device.
This will be done in stages. ETMv4x would be moved in the first pass to
allow us to support system instruction based devices seemlessly with
ACPI.
>
> We understand ETMv4/ETE MMIO is going away. As a sysreg quick test, I
As mentioned above it is not going away. MMIO is the way to access if
that is *the only* access mode. For ETMv4.4+ and ETE MMIO system
instructions is *the mode* of access for self-hosted. Please be aware
that, going down this route may prevent them being virtualised (when
we add the support in the near future).
> bypassed the code which checks for an ETMv4 base address in order to
> default to sysreg access. Trace collection failed with an error. I don't
> have the time to chase after this right now, but I intend to budget the
> time in the near future.
If we can get to the bottom of this, we should be able to support
the platform in a future proof way than adding this ugly work around
of accessing via the *external MMIO* interface.
Suzuki
>
> Thanks and regards,
> Steve
>
> On 3/6/2023 2:29 AM, Suzuki K Poulose wrote:
>> Hi Steve,
>>
>> On 06/03/2023 05:54, Steve Clevenger wrote:
>>> Ampere ETMv4.x support. Added Ampere ETM ID, and changes required by
>>> the Ampere ETMv4.x hardware implementation.
>>>
>>
>> I don't see any mention of the access via system instructions. Where did
>> that end up ? What is the conclusion on that front ?
>>
>> Kind regards
>> Suzuki
>>
>>
>>> Steve Clevenger (3):
>>> Add known list of Ampere ETMv4 errata
>>> coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
>>> coresight etm4x: Add 32-bit read/write option to split 64-bit words
>>>
>>> Documentation/arm64/silicon-errata.rst | 6 +-
>>> .../coresight/coresight-etm4x-core.c | 50 +++++++++++-----
>>> drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
>>> include/linux/coresight.h | 3 +
>>> 4 files changed, 89 insertions(+), 28 deletions(-)
>>>
>>
On 06/03/2023 05:54, Steve Clevenger wrote:
> Ampere ETMv4.x support. Added Ampere ETM ID, and changes required by
> the Ampere ETMv4.x hardware implementation.
>
nit: Subject should be :
[PATCH <version> 0/X] <subsystem>: <component>: <Title>
in this case case:
<version> = NULL if version == 1
= version, otherwise
Helps us to keep track of different versions of the postings.
<subsystem> == coresight
This helps the reviewers to "notice" or "ignore" a given series
depending on their interest.
<component> == etm4x
This further helps the reviewers to filter the mails within a subsystem.
[PATCH v2 0/3] coresight: etm4x: Support for Ampere computing
Also, please include a changelog from the previous version to indicate
what has changed. e.g,
"Changes since v1:
- Modified xyz
- Dropped abc
- Addressed comments on ijk (<name of the requester>
"
That helps the reviewers to get a picture of what they should be looking
at and better spend their time.
Kind regards
Suzuki
> Steve Clevenger (3):
> Add known list of Ampere ETMv4 errata
> coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
> coresight etm4x: Add 32-bit read/write option to split 64-bit words
>
> Documentation/arm64/silicon-errata.rst | 6 +-
> .../coresight/coresight-etm4x-core.c | 50 +++++++++++-----
> drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
> include/linux/coresight.h | 3 +
> 4 files changed, 89 insertions(+), 28 deletions(-)
>
Hi Steve,
On 06/03/2023 05:54, Steve Clevenger wrote:
> Ampere ETMv4.x support. Added Ampere ETM ID, and changes required by
> the Ampere ETMv4.x hardware implementation.
>
I don't see any mention of the access via system instructions. Where did
that end up ? What is the conclusion on that front ?
Kind regards
Suzuki
> Steve Clevenger (3):
> Add known list of Ampere ETMv4 errata
> coresight etm4x: Early clear TRCOSLAR.OSLK prior to TRCIDR1 read
> coresight etm4x: Add 32-bit read/write option to split 64-bit words
>
> Documentation/arm64/silicon-errata.rst | 6 +-
> .../coresight/coresight-etm4x-core.c | 50 +++++++++++-----
> drivers/hwtracing/coresight/coresight-etm4x.h | 58 ++++++++++++++-----
> include/linux/coresight.h | 3 +
> 4 files changed, 89 insertions(+), 28 deletions(-)
>