On 22/09/2021 10:58, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> The TRBE driver makes sure that there is enough space for a meaningful
>> run, otherwise pads the given space and restarts the offset calculation
>> once. But there is no guarantee that we may find space or hit "no space".
>
> So what happens currently when it neither finds the required minimum buffer
> space for a meaningful run nor does it hit the "no space" scenario ?
It tries once today and assumes that it will either hit :
- No space
OR
- Enough space
which is reasonable, given the minimum space needed is a few bytes.
But this may no longer be true with other erratum workaround.
>
>> Make sure that we repeat the step until, either :
>> - We have the minimum space
>> OR
>> - There is NO space at all.
>>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 3373f4e2183b..02f9e00e2091 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -451,10 +451,14 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>> * If the head is too close to the limit and we don't
>> * have space for a meaningful run, we rather pad it
>> * and start fresh.
>> + *
>> + * We might have to do this more than once to make sure
>> + * we have enough required space.
>
> OR no space at all, as explained in the commit message.
> Hence this comment needs an update.
>
>> */
>> - if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>> + while (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
>> trbe_pad_buf(handle, limit - head);
>> limit = __trbe_normal_offset(handle);
>> + head = PERF_IDX2OFF(handle->head, buf);
>
> Should the loop be bound with a retry limit as well ?
No. We will eventually hit No-space as we keep on padding
the buffer.
Suzuki
On 22/09/2021 08:23, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> Now that we have the work around implmented in the TRBE
>> driver, add the Kconfig entries and document the errata.
>>
>> Cc: Mark Rutland <mark.rutland(a)arm.com>
>> Cc: Will Deacon <will(a)kernel.org>
>> Cc: Catalin Marinas <catalin.marinas(a)arm.com>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> Documentation/arm64/silicon-errata.rst | 4 +++
>> arch/arm64/Kconfig | 39 ++++++++++++++++++++++++++
>> 2 files changed, 43 insertions(+)
>>
>> diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
>> index d410a47ffa57..2f99229d993c 100644
>> --- a/Documentation/arm64/silicon-errata.rst
>> +++ b/Documentation/arm64/silicon-errata.rst
>> @@ -92,12 +92,16 @@ stable kernels.
>> +----------------+-----------------+-----------------+-----------------------------+
>> | ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 |
>> +----------------+-----------------+-----------------+-----------------------------+
>> +| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 |
>> ++----------------+-----------------+-----------------+-----------------------------+
>> | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
>> +----------------+-----------------+-----------------+-----------------------------+
>> | ARM | Neoverse-N1 | #1349291 | N/A |
>> +----------------+-----------------+-----------------+-----------------------------+
>> | ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 |
>> +----------------+-----------------+-----------------+-----------------------------+
>> +| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 |
>> ++----------------+-----------------+-----------------+-----------------------------+
>> | ARM | MMU-500 | #841119,826419 | N/A |
>> +----------------+-----------------+-----------------+-----------------------------+
>> +----------------+-----------------+-----------------+-----------------------------+
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 077f2ec4eeb2..eac4030322df 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -666,6 +666,45 @@ config ARM64_ERRATUM_1508412
>>
>> If unsure, say Y.
>>
>> +config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> + bool
>> +
>> +config ARM64_ERRATUM_2119858
>> + bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
>> + default y
>> + depends on CORESIGHT_TRBE
>> + select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> + help
>> + This option adds the workaround for ARM Cortex-A710 erratum 2119858.
>> +
>> + Affected Cortex-A710 cores could overwrite upto 3 cache lines of trace
>> + data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
>> + the event of a WRAP event.
>> +
>> + Work around the issue by always making sure we move the TRBPTR_EL1 by
>> + 256bytes before enabling the buffer and filling the first 256bytes of
>> + the buffer with ETM ignore packets upon disabling.
>> +
>> + If unsure, say Y.
>> +
>> +config ARM64_ERRATUM_2139208
>> + bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
>> + default y
>> + depends on CORESIGHT_TRBE
>> + select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
>> + help
>> + This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
>> +
>> + Affected Neoverse-N2 cores could overwrite upto 3 cache lines of trace
>> + data at the base of the buffer (ponited by TRBASER_EL1) in FILL mode in
>
> s/ponited/pointed
>
>> + the event of a WRAP event.
>> +
>> + Work around the issue by always making sure we move the TRBPTR_EL1 by
>> + 256bytes before enabling the buffer and filling the first 256bytes of
>> + the buffer with ETM ignore packets upon disabling.
>> +
>> + If unsure, say Y.
>> +
>> config CAVIUM_ERRATUM_22375
>> bool "Cavium erratum 22375, 24313"
>> default y
>>
>
> The real errata problem description for both these erratums are exactly
> the same. Rather a more generalized description should be included for
> the ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE, which abstracts out the
> problem and a corresponding solution that is implemented in the driver.
> This should also help us reduce current redundancy.
>
The issue is what a user wants to see. A user who wants to configure the
kernel specifically for a given CPU (think embedded systems), they would
want to hand pick the errata for the particular CPU. So, moving the help
text to an implicitly selected Kconfig symbol. I would rather keep this
as it is to keep it user friendly. This doesn't affect the code size
anyways.
The other option is to remove all the CPU specific Kconfig symbols and
update the "title" to reflect both the CPU/erratum numbers.
Kind regards
Suzuki
Hi Tao
Are there no sinks at all on this platform ? I had this question on the
previous series. How is CoreSight useful on this platform otherwise ?
On 13/09/2021 07:40, Tao Zhang wrote:
> Add the basic coresight components found on Qualcomm SM8250 Soc. The
> basic coresight components include ETF, ETMs,STM and the related
> funnels.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
> ---
> arch/arm64/boot/dts/qcom/qrb5165-rb5.dts | 442 ++++++++++++++++++++++-
> 1 file changed, 438 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts b/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
> index 8ac96f8e79d4..9c8f87d80afc 100644
> --- a/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
> +++ b/arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
> @@ -222,11 +222,445 @@
> regulator-max-microvolt = <1800000>;
> regulator-always-on;
> };
> -};
>
> -&adsp {
> - status = "okay";
> - firmware-name = "qcom/sm8250/adsp.mbn";
Unrelated change ? Please keep it separate from the CoreSight changes.
Suzuki
Hi Tao
On 13/09/2021 07:40, Tao Zhang wrote:
> This series adds Coresight support for SM8250 Soc on RB5 board.
> It is composed of two elements.
> a) Add ETM PID for Kryo-5XX.
> b) Add coresight support to DTS for RB5.
>
> This series applies to coresight/next
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>
Please could you mention what has changed since the previous version
in the cover letter ?
Kind regards
Suzuki
> Tao Zhang (2):
> coresight: etm4x: Add ETM PID for Kryo-5XX
> arm64: dts: qcom: sm8250: Add Coresight support
>
> arch/arm64/boot/dts/qcom/qrb5165-rb5.dts | 442 +++++++++++++++++-
> .../coresight/coresight-etm4x-core.c | 1 +
> 2 files changed, 439 insertions(+), 4 deletions(-)
>
Debugfs is nice and so are module parameters, but
* debugfs doesn't take effect early (e.g., if drivers are locking up
before user space gets anywhere)
* module parameters either add a lot to the kernel command line, or
else take effect late as well (if you build =m and configure in
/etc/modprobe.d/)
So in the same spirit as these
CONFIG_PANIC_ON_OOPS (also available via cmdline or modparam)
CONFIG_INTEL_IOMMU_DEFAULT_ON (also available via cmdline)
add a new Kconfig option.
Module parameters and debugfs can still override.
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
drivers/hwtracing/coresight/Kconfig | 13 +++++++++++++
drivers/hwtracing/coresight/coresight-cpu-debug.c | 2 +-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
index f026e5c0e777..8b638eb3cb7d 100644
--- a/drivers/hwtracing/coresight/Kconfig
+++ b/drivers/hwtracing/coresight/Kconfig
@@ -150,6 +150,19 @@ config CORESIGHT_CPU_DEBUG
To compile this driver as a module, choose M here: the
module will be called coresight-cpu-debug.
+config CORESIGHT_CPU_DEBUG_DEFAULT_ON
+ bool "Enable CoreSight CPU Debug by default
+ depends on CORESIGHT_CPU_DEBUG
+ help
+ Say Y here to enable the CoreSight Debug panic-debug by default. This
+ can also be enabled via debugfs, but this ensures the debug feature
+ is enabled as early as possible.
+
+ Has the same effect as setting coresight_cpu_debug.enable=1 on the
+ kernel command line.
+
+ Say N if unsure.
+
config CORESIGHT_CTI
tristate "CoreSight Cross Trigger Interface (CTI) driver"
depends on ARM || ARM64
diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 00de46565bc4..8845ec4b4402 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -105,7 +105,7 @@ static DEFINE_PER_CPU(struct debug_drvdata *, debug_drvdata);
static int debug_count;
static struct dentry *debug_debugfs_dir;
-static bool debug_enable;
+static bool debug_enable = IS_ENABLED(CONFIG_CORESIGHT_CPU_DEBUG_DEFAULT_ON);
module_param_named(enable, debug_enable, bool, 0600);
MODULE_PARM_DESC(enable, "Control to enable coresight CPU debug functionality");
--
2.33.0.153.gba50c8fa24-goog
The AUX bounce buffer is allocated with API dma_alloc_coherent(), in the
low level's architecture code, e.g. for Arm64, it maps the memory with
the attribution "Normal non-cacheable"; this can be concluded from the
definition for pgprot_dmacoherent() in arch/arm64/include/asm/pgtable.h.
Later when access the AUX bounce buffer, since the memory mapping is
non-cacheable, it's low efficiency due to every load instruction must
reach out DRAM.
This patch changes to allocate pages with dma_alloc_noncoherent(), the
driver can access the memory via cacheable mapping; therefore, load
instructions can fetch data from cache lines rather than always read
data from DRAM, the driver can boost memory performance. After using
the cacheable mapping, the driver uses dma_sync_single_for_cpu() to
invalidate cacheline prior to read bounce buffer so can avoid read stale
trace data.
By measurement the duration for function tmc_update_etr_buffer() with
ftrace function_graph tracer, it shows the performance significant
improvement for copying 4MiB data from bounce buffer:
# echo tmc_etr_get_data_flat_buf > set_graph_notrace // avoid noise
# echo tmc_update_etr_buffer > set_graph_function
# echo function_graph > current_tracer
before:
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) | tmc_update_etr_buffer() {
...
2) # 8148.320 us | }
after:
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) | tmc_update_etr_buffer() {
...
2) # 2525.420 us | }
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
---
Changes from v3:
Refined change to use dma_alloc_noncoherent()/dma_free_noncoherent()
(Robin Murphy);
Retested functionality and performance on Juno-r2 board.
Changes from v2:
Sync the entire buffer in one go when the tracing is wrap around
(Suzuki);
Add Suzuki's review tage.
.../hwtracing/coresight/coresight-tmc-etr.c | 26 ++++++++++++++++---
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index acdb59e0e661..a049b525a274 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -609,8 +609,9 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
if (!flat_buf)
return -ENOMEM;
- flat_buf->vaddr = dma_alloc_coherent(real_dev, etr_buf->size,
- &flat_buf->daddr, GFP_KERNEL);
+ flat_buf->vaddr = dma_alloc_noncoherent(real_dev, etr_buf->size,
+ &flat_buf->daddr,
+ DMA_FROM_DEVICE, GFP_KERNEL);
if (!flat_buf->vaddr) {
kfree(flat_buf);
return -ENOMEM;
@@ -631,14 +632,18 @@ static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
if (flat_buf && flat_buf->daddr) {
struct device *real_dev = flat_buf->dev->parent;
- dma_free_coherent(real_dev, flat_buf->size,
- flat_buf->vaddr, flat_buf->daddr);
+ dma_free_noncoherent(real_dev, etr_buf->size,
+ flat_buf->vaddr, flat_buf->daddr,
+ DMA_FROM_DEVICE);
}
kfree(flat_buf);
}
static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
{
+ struct etr_flat_buf *flat_buf = etr_buf->private;
+ struct device *real_dev = flat_buf->dev->parent;
+
/*
* Adjust the buffer to point to the beginning of the trace data
* and update the available trace data.
@@ -648,6 +653,19 @@ static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
etr_buf->len = etr_buf->size;
else
etr_buf->len = rwp - rrp;
+
+ /*
+ * The driver always starts tracing at the beginning of the buffer,
+ * the only reason why we would get a wrap around is when the buffer
+ * is full. Sync the entire buffer in one go for this case.
+ */
+ if (etr_buf->offset + etr_buf->len > etr_buf->size)
+ dma_sync_single_for_cpu(real_dev, flat_buf->daddr,
+ etr_buf->size, DMA_FROM_DEVICE);
+ else
+ dma_sync_single_for_cpu(real_dev,
+ flat_buf->daddr + etr_buf->offset,
+ etr_buf->len, DMA_FROM_DEVICE);
}
static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
--
2.25.1
This patch series fixes the snapshot mode in the CoreSight driver.
The first patch simplifies the head pointer handling for AUX buffer, the
second patch updates the driver comments to reflect the latest status
for perf tool.
Changes from v3:
- Made the comments generic in CoreSight drivers (Suzuki).
Changes from v2:
- Minor improvement the commits for patches 01 and 02.
Leo Yan (2):
coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer
coresight: Update comments for removing cs_etm_find_snapshot()
drivers/hwtracing/coresight/coresight-etb10.c | 5 ++---
drivers/hwtracing/coresight/coresight-tmc-etf.c | 5 ++---
drivers/hwtracing/coresight/coresight-tmc-etr.c | 15 +++++----------
3 files changed, 9 insertions(+), 16 deletions(-)
--
2.25.1
Hi Peter,
On Tue, Sep 14, 2021 at 11:51:36AM +0200, Peter Zijlstra wrote:
> On Sun, Aug 29, 2021 at 06:56:57PM +0800, Leo Yan wrote:
> > Hi Peter, or any x86 maintainer,
> >
> > On Mon, Aug 09, 2021 at 07:14:02PM +0800, Leo Yan wrote:
> > > Since BTS is coherent, simply add a compiler barrier to separate the BTS
> > > update and aux_head store.
> >
> > Could you reivew this patch and check if BTS needs the comipler
> > barrier in this case? Thanks.
>
> Yes, a compiler barrier is sufficient.
>
> You want me to pick it up?
Maybe other maintainers are more suitable than me to answer this :)
Yeah, I think it's great if you could pick it.
Thanks,
Leo
This patch series is to refine the memory barriers for AUX ring buffer.
Patches 01 ~ 04 to address the barriers usage in the kernel. The first
patch is to make clear comment for how to use the barriers between the
data store and aux_head store, this asks the driver to make sure the
data is visible. Patches 02 ~ 04 is to refine the drivers for barriers
after the data store.
Patch 05 is to use WRITE_ONCE() for updating aux_tail.
Patches 06 ~ 09 is to drop the legacy __sync functions, and polish for
duplicate code and cleanup the build and feature test after
SYNC_COMPARE_AND_SWAP is not used.
For easier review and more clear patch organization, comparing against
to the previous patch series, the patches for support compat mode for
AUX trace have been left out and will be sent out as a separate patch
set.
This patch set have been tested on Arm64 Juno platform.
Changes from v4:
- Refined comment for CoreSight ETR/ETF drivers (Suzuki/Peter);
- Changed to use compiler barrier for BTS (mentioned by Peter, but have
not received response from Intel developer);
- Refined the coding style for patch 07 (Adrian).
Changes from v3:
- Removed the inapprocate paragraph in the commit log for patch "perf
auxtrace: Drop legacy __sync functions" (Adrian);
- Added new patch to remove feature-sync-compare-and-swap test (Adrian);
- Th patch for "perf auxtrace: Use WRITE_ONCE() for updating aux_tail",
is a standlone and simple change, so moved it ahead in the patch set
for better ordering;
- Minor improvement for commit logs in the last two patches.
Changes from v2:
- Removed auxtrace_mmap__read_snapshot_head(), which has the duplicated
code with auxtrace_mmap__read_head();
- Cleanuped the build for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT (Adrian);
- Added global variable "kernel_is_64_bit" (Adrian);
- Added compat variants compat_auxtrace_mmap__{read_head|write_tail}
(Adrian).
Leo Yan (9):
perf/ring_buffer: Add comment for barriers on AUX ring buffer
coresight: tmc-etr: Add barrier after updating AUX ring buffer
coresight: tmc-etf: Add comment for store ordering
perf/x86: Add compiler barrier after updating BTS
perf auxtrace: Use WRITE_ONCE() for updating aux_tail
perf auxtrace: Drop legacy __sync functions
perf auxtrace: Remove auxtrace_mmap__read_snapshot_head()
perf: Cleanup for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
tools: Remove feature-sync-compare-and-swap feature detection
arch/x86/events/intel/bts.c | 6 ++++
.../hwtracing/coresight/coresight-tmc-etf.c | 5 +++
.../hwtracing/coresight/coresight-tmc-etr.c | 8 +++++
kernel/events/ring_buffer.c | 9 ++++++
tools/build/Makefile.feature | 1 -
tools/build/feature/Makefile | 4 ---
tools/build/feature/test-all.c | 4 ---
.../feature/test-sync-compare-and-swap.c | 15 ---------
tools/perf/Makefile.config | 4 ---
tools/perf/util/auxtrace.c | 18 +++--------
tools/perf/util/auxtrace.h | 31 +------------------
11 files changed, 34 insertions(+), 71 deletions(-)
delete mode 100644 tools/build/feature/test-sync-compare-and-swap.c
--
2.25.1